What is the Puppet configuration management tool, and how does it work?

Puppet is an open source alternative to commercial IT configuration management tools. In this column, one of Puppet's founders discusses how the tool can be useful in the data center.

Puppet is an open source IT automation tool that allows IT organizations to encode the configuration of services as a policy, which the framework then audits and enforces.

At first glance, a battle-hardened system administrator might dismiss a new configuration management tool as unnecessary; after all, she can do the same thing with machine images and some shell scripts. That is like a lumberjack who has just heard about chainsaws and doesn't see why anyone would ever want more than an ax.

Puppet was first released in 2005, and the project has been growing ever since. Puppet now manages systems at Google, Twitter, Sun, Sony, Red Hat, the New York Stock Exchange, Digg, SlideShare, Shopzilla, and Harvard and Stanford universities. IT operations teams, from enterprises to Web 2.0 startups, use the tool to get more work done. These organizations have recognized that running commands over Secure Shell (SSH) in a loop is not a solution.

How the Puppet configuration management tool works
Puppet consists of a language, client-server processes and the Resource Abstraction Layer.

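The language lets an administrator describe a server's configuration in terms of the resources she already thinks in: users, groups, packages, files, cron jobs, mounts and services, to name a few. The Resource Abstraction Layer translates these abstract resources into platform-specific actions, so the same package resource can be installed with yum on one system and apt on another.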
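The resource model can be illustrated outside Puppet. The Python below is a sketch only, not Puppet's code or syntax: a resource is a typed, named record of desired attributes, and a hypothetical lookup table (`PACKAGE_COMMANDS`) stands in for the Resource Abstraction Layer by mapping one abstract package resource to a platform-specific install command. The OS family names and command lists are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """An abstract resource: what should exist, not how to create it."""
    type: str                                  # e.g. "package", "file", "service", "user"
    title: str                                 # the resource's name, e.g. "nginx"
    attrs: dict = field(default_factory=dict)  # desired attributes, e.g. {"ensure": "installed"}


# Hypothetical stand-in for the Resource Abstraction Layer: one abstract
# "package" resource, two platform-specific ways to install it.
PACKAGE_COMMANDS = {
    "debian": ["apt-get", "install", "-y"],
    "redhat": ["yum", "install", "-y"],
}


def install_command(resource: Resource, os_family: str) -> list:
    """Translate an abstract package resource into a concrete command line."""
    if resource.type != "package":
        raise ValueError(f"no install command for resource type {resource.type!r}")
    return PACKAGE_COMMANDS[os_family] + [resource.title]


nginx = Resource("package", "nginx", {"ensure": "installed"})
print(install_command(nginx, "debian"))  # ['apt-get', 'install', '-y', 'nginx']
print(install_command(nginx, "redhat"))  # ['yum', 'install', '-y', 'nginx']
```

The policy names only the `nginx` package; which tool installs it is decided per platform, which is the point of the abstraction.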

The relationships between resources are also specified. For example, a service depends on its configuration file, and that file depends on a package being installed. These relationships determine the order in which the policy is applied and allow Puppet to restart dependent services when their configurations change.
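A minimal Python sketch of how such relationships can drive both ordering and restarts, assuming the package, configuration file and service from the example above and a hypothetical `DEPENDS_ON` map. In Puppet itself these links are declared with relationship metaparameters such as `require` and `subscribe`; here they are plain sets, ordered with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). For simplicity, the sketch restarts only services that depend directly on a changed resource.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each resource lists the resources it depends on.
DEPENDS_ON = {
    "package[nginx]": set(),
    "file[/etc/nginx/nginx.conf]": {"package[nginx]"},
    "service[nginx]": {"file[/etc/nginx/nginx.conf]"},
}


def apply_order(graph: dict) -> list:
    """Return resources so that every dependency comes before its dependents."""
    return list(TopologicalSorter(graph).static_order())


def services_to_restart(graph: dict, changed: set) -> list:
    """Services that depend directly on a resource changed during this run."""
    return sorted(
        name
        for name, deps in graph.items()
        if name.startswith("service[") and deps & changed
    )


print(apply_order(DEPENDS_ON))
# ['package[nginx]', 'file[/etc/nginx/nginx.conf]', 'service[nginx]']
print(services_to_restart(DEPENDS_ON, {"file[/etc/nginx/nginx.conf]"}))
# ['service[nginx]']
```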

Resources can also be composed into logical collections. To reuse the previous example, a package, a configuration file and a service can be grouped together, and the group can then be reused and treated as a single logical entity in other Puppet code.

The client-server setup provides a secure mechanism for transporting configurations from the central description to individual hosts over HTTP with SSL authentication and encryption, the same SSL used to secure online banking and e-commerce. Each host receives only the configuration compiled specifically for it.
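The grouping and per-host compilation can be sketched in a few lines of Python. The group names, resources and hostnames are hypothetical, and the SSL transport is left out entirely; the point is only that each host's configuration is expanded from reusable groups and contains nothing meant for other hosts. (In the Puppet language, such a reusable group is typically written as a class.)

```python
# Hypothetical reusable groups of resources (roughly what Puppet calls classes).
CLASSES = {
    "ntp": ["package[ntp]", "file[/etc/ntp.conf]", "service[ntpd]"],
    "webserver": ["package[nginx]", "file[/etc/nginx/nginx.conf]", "service[nginx]"],
}

# Hypothetical assignment of groups to hosts in the central description.
NODES = {
    "web01.example.com": ["ntp", "webserver"],
    "db01.example.com": ["ntp"],
}


def compile_catalog(hostname: str) -> list:
    """Expand a host's assigned groups into the flat resource list it receives."""
    catalog = []
    for group in NODES[hostname]:
        for resource in CLASSES[group]:
            if resource not in catalog:  # a resource is declared at most once
                catalog.append(resource)
    return catalog


print(compile_catalog("db01.example.com"))
# ['package[ntp]', 'file[/etc/ntp.conf]', 'service[ntpd]']
```

The database host never sees the Web server's resources, even though both draw on the same `ntp` group.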

Configurations are audited and applied by querying the current state, comparing it with the desired state, and taking the appropriate action for each resource. Applying the configuration can take a freshly installed base operating system to a fully configured server. Changes to the central policy can also update configuration files or apply patches. This cycle of auditing systems and syncing them with their assigned policy is then used to manage them throughout their lifecycle.
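The audit-and-sync loop itself can be sketched in Python against an in-memory stand-in for a host. `FakeHost` and `audit_and_sync` are hypothetical names, and only two kinds of resource (packages and file contents) are modeled. The `dry_run` flag reports differences without fixing them, similar in spirit to Puppet's noop mode.

```python
from dataclasses import dataclass, field


@dataclass
class FakeHost:
    """Hypothetical in-memory stand-in for a real machine's current state."""
    packages: set = field(default_factory=set)
    files: dict = field(default_factory=dict)  # path -> file content


def audit_and_sync(host: FakeHost, want_packages: set, want_files: dict,
                   dry_run: bool = False) -> list:
    """Compare current state with desired state; fix differences unless dry_run."""
    actions = []
    # Query and compare: which desired packages are missing?
    for pkg in sorted(want_packages - host.packages):
        actions.append(f"install package {pkg}")
        if not dry_run:
            host.packages.add(pkg)
    # Query and compare: which files have the wrong (or no) content?
    for path, content in sorted(want_files.items()):
        if host.files.get(path) != content:
            actions.append(f"update file {path}")
            if not dry_run:
                host.files[path] = content
    return actions


host = FakeHost(packages={"openssh"})
policy_pkgs = {"openssh", "nginx"}
policy_files = {"/etc/motd": "Managed by Puppet\n"}

print(audit_and_sync(host, policy_pkgs, policy_files, dry_run=True))
# ['install package nginx', 'update file /etc/motd']  (host unchanged)
print(audit_and_sync(host, policy_pkgs, policy_files))
# ['install package nginx', 'update file /etc/motd']
print(audit_and_sync(host, policy_pkgs, policy_files))
# []  (already in the desired state)
```

Running the same policy a second time changes nothing; that idempotence is what makes it safe to repeat the cycle for the whole life of a system.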

The auditing and syncing cycle can ensure consistency across the entire network. With traditional tools and techniques, the chance that two machines providing the same service are configured identically is quite small. As server counts climb, inconsistencies created by configuration drift cause confusion and mistakes.
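One way to see drift is to fingerprint the same configuration file on every host that provides a service. This Python sketch, with hypothetical hostnames and file contents, hashes each copy with SHA-256 and flags the hosts that no longer match the intended version; a regular audit-and-sync cycle is what keeps that list empty.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """SHA-256 digest of a file's content, used to compare copies cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def group_by_config(configs: dict) -> dict:
    """Group hostnames by the fingerprint of their copy of the file."""
    groups = defaultdict(list)
    for host, text in sorted(configs.items()):
        groups[fingerprint(text)].append(host)
    return dict(groups)


def drifted_hosts(configs: dict, desired: str) -> list:
    """Hosts whose copy differs from the desired content."""
    want = fingerprint(desired)
    return sorted(h for h, text in configs.items() if fingerprint(text) != want)


# Hypothetical copies of one Web server setting across three hosts.
configs = {
    "web01": "worker_processes 4;\n",
    "web02": "worker_processes 4;\n",
    "web03": "worker_processes 8;\n",  # hand-edited at some point
}
print(len(group_by_config(configs)))                    # 2 distinct variants
print(drifted_hosts(configs, "worker_processes 4;\n"))  # ['web03']
```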

Infrastructure is code
The impact of virtualization technology in the modern data center is undeniable. The future is abstracted storage, network and compute resources driven by APIs. While virtualization allows for greater service isolation and hardware utilization, each virtual machine represents as much configuration as a physical machine. With 10 or more virtual machines running on the same hardware, or a cluster of VMs launched through Web service APIs, IT organizations can reduce hardware and capital expenses, but they multiply the configurations that need to be managed.

Image-based service management seems like a solution. The problem with the image-based approach isn't obvious on the first day, but after months of managing images, an administrator realizes she has replaced sprawling configuration drift with a sprawling collection of machine images and little or no insight into any of them.

With a configuration management tool like Puppet, the notion of an API extends to the configuration of each system. The encoded services provide a mechanism for building and maintaining systems, but the code also provides something a 500 MB machine image doesn't: semantics. The code shows not only what is configured on a system but also why. Understanding why leads to much better decisions when managing existing systems and designing new services.

Process is technology
Change is the biggest cause of outages for most organizations (hardware failure is a distant second). Most IT departments adopt heavy change management processes to protect themselves, but the business needs change, because change is what enables the business. Business demand for change, set against the IT organization's attempts to slow it, often puts businesses at odds with their own IT departments.

With Puppet, infrastructure is code, which means an entire ecosystem of software development tools and processes can be brought to bear. Version control, indispensable in software development, is rarely employed in system administration work. It provides transparency and the ability to collaborate, make experimental changes and roll them back, and it is the foundation for nearly all other software best practices.

Starting with version control, encoding infrastructure allows the organization to apply continuous integration and the develop-test-deploy cycle to server configurations, just as it would to other applications. Especially when coupled with virtualization technology, Puppet significantly shortens the feedback loop when changing systems. Make a change to a service, try the new reverse proxy, load test a new database, all as part of a development process. When the changes are ready, promote the work to all the production machines by merging the new code into their defined policy.

Once the configuration of the Web server, application server or database server is defined and maintained by Puppet, adding a new server is almost trivial, because the work went into creating the generic policy up front. Need to patch all the Web servers? Change the central description of the Web server and Puppet will take care of the rest. Need a new Web server configured and added to the load balancer? Puppet can manage that too, and adding capacity is no longer an ad hoc one-off. Use Puppet to apply a policy that has been vetted and hardened.
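The "change it centrally, let every host follow" workflow can be sketched in Python as a plan computed from a role-based policy. The role name, hosts and version strings are hypothetical; the sketch only shows that patching every Web server and adding a new one both reduce to editing the central data, after which the per-host changes fall out mechanically.

```python
# Hypothetical central policy: each role pins the package versions it needs.
POLICY = {"webserver": {"nginx": "1.0", "openssl": "1.0"}}

# Hypothetical inventory: host -> (assigned role, currently installed versions).
INVENTORY = {
    "web01": ("webserver", {"nginx": "1.0", "openssl": "1.0"}),
    "web02": ("webserver", {"nginx": "1.0", "openssl": "1.0"}),
}


def plan(policy: dict, inventory: dict) -> list:
    """Return (host, package, current_version, wanted_version) for every mismatch."""
    changes = []
    for host, (role, installed) in sorted(inventory.items()):
        for pkg, wanted in sorted(policy[role].items()):
            current = installed.get(pkg)  # None means not installed yet
            if current != wanted:
                changes.append((host, pkg, current, wanted))
    return changes


print(plan(POLICY, INVENTORY))  # []: every Web server matches the policy

# Patch all Web servers: one edit to the central description.
patched = {"webserver": {**POLICY["webserver"], "openssl": "1.1"}}
print(plan(patched, INVENTORY))
# [('web01', 'openssl', '1.0', '1.1'), ('web02', 'openssl', '1.0', '1.1')]

# Add capacity: a new host only needs a role assignment.
grown = {**INVENTORY, "web03": ("webserver", {})}
print(plan(POLICY, grown))
# [('web03', 'nginx', None, '1.0'), ('web03', 'openssl', None, '1.0')]
```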

Approaches like this provide for change management processes that can quickly deliver more flexibility with less risk than the slower process using traditional techniques. Trade in shell scripts for Puppet, axes for chainsaws, and embrace change instead of fighting to slow it.

Puppet is open source software, created and supported by Reductive Labs, with the goal of pushing the evolution of system management tools. Puppet is currently supported on all major Unix and Linux distributions, including Red Hat (Fedora), CentOS, Ubuntu, Debian, SUSE, Solaris, OS X, FreeBSD, HP-UX and AIX. The Puppet wiki has documentation on the language and executables, along with more information about getting started. There is also an active and generally helpful user community that can be reached through the mailing list and on IRC (#puppet on irc.freenode.net).

