Table of contents
1. Introduction
2. Using PuppetDB
   2.1. Checking node status
3. Maintaining and tuning
   3.1. Monitor the performance dashboard
   3.2. Deactivate or expire decommissioned nodes
4. PuppetDB CLI
   4.1. Step 1: Install and configure Puppet
   4.2. Step 2: Install and configure the PuppetDB CLI
      4.2.1. Example configuration file (pe-client-tools)
      4.2.2. Example configuration file (puppet-client-tools)
   4.3. Step 3: Enjoy!
5. Exporting and anonymizing data
   5.1. Using the export command
   5.2. Using the import command
6. Scaling recommendations
   6.1. Bottleneck: Node check-in interval
   6.2. Bottleneck: CPU cores and number of worker threads
   6.3. Bottleneck: Single point of failure
   6.4. Bottleneck: SSL performance
7. Debugging with remote REPL
   7.1. Enabling the REPL
   7.2. Connecting to a remote REPL
   7.3. Executing functions
   7.4. Redefining functions
8. Frequently Asked Questions
   8.1. Describe the architecture of Puppet
   8.2. Which command is used to run Puppet on demand from the CLI?
   8.3. What is Puppet used for?
9. Conclusion
Last Updated: Mar 27, 2024

Advanced Features of PuppetDB

Introduction

Puppet Enterprise, or PE, is the commercial version of Puppet, built on top of the open source Puppet platform. It lets IT operations teams manage and automate more infrastructure and more complex workflows, more simply and powerfully. It gives users a consistent approach to automation across the entire infrastructure lifecycle, from initial provisioning to system configuration, application deployment, and intelligent change orchestration. This blog discusses the advanced features of PuppetDB in detail.

Using PuppetDB

Currently, PuppetDB is mostly used to make advanced Puppet functionality available. We anticipate that more applications will be created on PuppetDB as usage grows.

The navigation sidebar contains links to the API specifications if you want to develop apps using PuppetDB.

Checking node status

$ sudo puppet node status <NODE>

where <NODE> is the name of the node you want to investigate. This tells you whether the node is active, when its last catalog was submitted, and when its last facts were submitted.
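If you prefer to hit the HTTP API directly, the same information is available from the nodes endpoint. A minimal sketch, assuming PuppetDB's default cleartext port 8080 on localhost and the v4 query API:

$ curl http://localhost:8080/pdb/query/v4/nodes/<NODE>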

Maintaining and tuning

PuppetDB needs a relatively small amount of maintenance and tuning.

Monitor the performance dashboard

PuppetDB hosts a performance dashboard on port 8080, which listens only on localhost by default. The standard way to reach it from another machine is via an SSH tunnel. For example,

ssh -L 8080:localhost:8080 root@<PUPPETDB_HOST>

and then visit http://localhost:8080 in your browser. If PuppetDB is running locally, or is running on a remote host and accepting external cleartext connections from your machine, you can skip the SSH tunnel and go directly to http://localhost:8080 or http://<PUPPETDB_HOST>:8080.

On this page, PuppetDB displays a web-based dashboard with performance data and analytics, such as memory utilisation, queue depth, command processing metrics, duplication rate, and query stats. Each metric is shown as its min/max/median over a configurable duration, together with an animated SVG "sparkline" (a simple line chart showing general variation). The dashboard also shows the current version of PuppetDB and checks for updates, displaying a link to the latest package if your deployment is outdated.

You can use the following URL parameters to adjust the dashboard (an example URL follows the list):

  • width: the width of each sparkline, in pixels
  • height: the height of each sparkline, in pixels
  • nHistorical: how many historical data points to use in each sparkline
  • pollingInterval: how often to poll PuppetDB for updates, in milliseconds
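For example, the following URL shrinks the sparklines and polls every 10 seconds (illustrative; recent PuppetDB versions serve the dashboard at /pdb/dashboard/index.html, so adjust the path to match your version):

http://localhost:8080/pdb/dashboard/index.html?width=400&height=80&pollingInterval=10000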
[Screenshot of the performance dashboard]

Deactivate or expire decommissioned nodes

When a node is removed from your Puppet deployment, its status in PuppetDB needs to be changed to "deactivated". This ensures that any resources exported by that node stop appearing in the catalogs served to the remaining agent nodes.

  • PuppetDB can automatically mark nodes that haven't checked in for a while as expired. Expiration is simply the automated counterpart of deactivation; the difference matters only for record keeping, and the same rules apply to deactivated and expired nodes. By default, nodes expire after 7 days of inactivity; use the node-ttl setting to change this (see the example below).
  • To manually deactivate nodes, use the following command on your primary server:
$ sudo puppet node deactivate <node> [<node> ...]
  • Any deactivated or expired node is reactivated if PuppetDB receives new catalogs or facts for it.

Although deactivated and expired nodes are excluded from storeconfigs queries, their data is still preserved.
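A minimal sketch of the node-ttl setting; the config directory path is an assumption that varies by version (older installs use /etc/puppetdb/conf.d, newer ones /etc/puppetlabs/puppetdb/conf.d):

# /etc/puppetlabs/puppetdb/conf.d/database.ini
[database]
# Expire nodes that have not checked in for 14 days (default: 7d)
node-ttl = 14d

Restart PuppetDB after changing this setting.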

PuppetDB CLI

Step 1: Install and configure Puppet

Install Puppet from the official website if it hasn't already been fully installed and configured, then request, sign, and retrieve a certificate for the node.

Your node needs to be running the Puppet agent and must have a certificate signed by your Puppet Server. Running puppet agent --test should complete a run with the message Notice: Applied catalog in X.XX seconds.

For example,

$ export PATH=/opt/puppetlabs/bin:$PATH
$ export MANPATH=/opt/puppetlabs/client-tools/share/man:$MANPATH

The remainder of this documentation assumes that these two directories have been added to your PATH and MANPATH as shown above.

Step 2: Install and configure the PuppetDB CLI

To install the PuppetDB CLI from RubyGems:

$ gem install --bindir /opt/puppetlabs/bin puppetdb_cli

On a computer that does not already have Puppet installed, such as your own workstation, you can skip the --bindir option and install the executables in the usual Ruby bindir:

$ gem install puppetdb_cli

If the node on which you installed the CLI is not your PuppetDB server, you must add the CLI node's certname to the PuppetDB certificate-whitelist and supply the paths to the node's CA cert, cert, and private key when using the CLI, either with flags or via a configuration file.

To avoid passing flags on every invocation, add a configuration file at $HOME/.puppetlabs/client-tools/puppetdb.conf (or %USERPROFILE%\.puppetlabs\client-tools\puppetdb.conf on Windows). For more details, see the installed man page:

$ man puppetdb_conf
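For reference, the flag-based equivalent might look like the sketch below. The flag names here are an assumption and may vary between CLI versions, so confirm them against the installed man pages:

$ puppet query --urls https://<PUPPETDB_HOST>:8081 \
    --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem \
    --cert /etc/puppetlabs/puppet/ssl/certs/<WORKSTATION_HOST>.pem \
    --key /etc/puppetlabs/puppet/ssl/private_keys/<WORKSTATION_HOST>.pem \
    "nodes [ certname ] { limit 1 }"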

The PuppetDB CLI configuration files (the user-specific or global files) accept the following settings:

  • server_urls: either a JSON string (for a single URL) or array (for multiple URLs) of the PuppetDB servers to manage or query via the CLI commands. Default value: https://127.0.0.1:8080
  • cacert: the path to the CA certificate. Default: /etc/puppetlabs/puppet/ssl/certs/ca.pem on *nix systems, C:\ProgramData\PuppetLabs\puppet\etc\ssl\certs\ca.pem on Windows
  • cert: an SSL certificate signed by your site's Puppet CA.
  • key: the private key for that certificate.

Example configuration file (pe-client-tools)

The PE version of the PuppetDB CLI supports token authentication, so the only necessary configuration items are server_urls and cacert:

{
  "puppetdb": {
    "server_urls": "https://<PUPPETDB_HOST>:8081",
    "cacert": "/etc/puppetlabs/puppet/ssl/certs/ca.pem"
  }
}

On Windows, escape the backslashes in the CA certificate path:

{
  "puppetdb": {
    "server_urls": "https://<PUPPETDB_HOST>:8081",
    "cacert": "C:\\ProgramData\\PuppetLabs\\puppet\\etc\\ssl\\certs\\ca.pem"
  }
}

Example configuration file (puppet-client-tools)

The open source version of the PuppetDB CLI requires certificate authentication for SSL connections to PuppetDB. To configure certificate authentication, set cacert, cert, and key:

{
  "puppetdb": {
    "server_urls": "https://<PUPPETDB_HOST>:8081",
    "cacert": "/etc/puppetlabs/puppet/ssl/certs/ca.pem",
    "cert": "/etc/puppetlabs/puppet/ssl/certs/<WORKSTATION_HOST>.pem",
    "key": "/etc/puppetlabs/puppet/ssl/private_keys/<WORKSTATION_HOST>.pem"
  }
}

On Windows, escape the backslashes in the paths:

{
  "puppetdb": {
    "server_urls": "https://<PUPPETDB_HOST>:8081",
    "cacert": "C:\\ProgramData\\PuppetLabs\\puppet\\ssl\\certs\\ca.pem",
    "cert": "C:\\ProgramData\\PuppetLabs\\puppet\\ssl\\certs\\<WORKSTATION_HOST>.pem",
    "key": "C:\\ProgramData\\PuppetLabs\\puppet\\ssl\\private_keys\\<WORKSTATION_HOST>.pem"
  }
}

Step 3: Enjoy!

Here are some examples of using the CLI.

Using puppet query 

Query PuppetDB using PQL:

$ puppet query "nodes [ certname ]{ limit 1 }"

Or query PuppetDB using the AST syntax:

$ puppet query '["from", "nodes", ["extract", "certname"], ["limit", 1]]'

For more information on the query command:

$ man puppet-query
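The same PQL can also be sent straight to the query API over HTTP. A minimal sketch, assuming the default cleartext port 8080 on the PuppetDB host and the v4 query endpoint:

$ curl -G http://localhost:8080/pdb/query/v4 --data-urlencode 'query=nodes[certname]{ limit 1 }'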

Using puppet db 

Handle your PuppetDB exports:

$ puppet db export pdb-archive.tgz --anonymization full

Or handle your PuppetDB imports:

$ puppet db import pdb-archive.tgz

For more information on the db command:

$ man puppet-db

Exporting and anonymizing data

This section covers PuppetDB's export, import, and anonymization tools.

The export tool archives your entire PuppetDB, and that archive can then be imported into another PuppetDB. The export tool can also anonymize the archive as it creates it, which is especially helpful when transferring sensitive PuppetDB data.

Using the export command

To create an anonymized PuppetDB archive directly, use the puppet db subcommand from any node with puppet-client-tools installed (the --anonymization flag accepts several profiles, such as full, moderate, low, and none):

$ puppet db export my-puppetdb-export.tar.gz --anonymization moderate

Using the import command

To import an anonymized PuppetDB tarball, use the puppet db subcommand from any node with puppet-client-tools installed:

$ puppet db import my-puppetdb-export.tar.gz

Scaling recommendations

PuppetDB will be an essential component of your Puppet deployment: agent nodes will not be able to request catalogs if it goes down. You should therefore ensure that it can handle your site's load and is resilient against failures.

When scaling any service there are many possible performance and reliability bottlenecks. You can address each of them as it becomes a problem.

Bottleneck: Node check-in interval

The more frequently your Puppet nodes check in, the heavier the load on your PuppetDB server.

You can reduce that load by increasing the runinterval setting in each Puppet node's puppet.conf file (or, if you run the Puppet agent from cron, by reducing the cron job's frequency); see the sketch below.

How frequently nodes should check in depends on your site's standards and expectations; this is as much a cultural decision as a technical one. MCollective's puppetd plugin can be used to establish a longer default check-in interval while still allowing immediate runs when necessary.
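For example, a minimal puppet.conf sketch (the interval value is illustrative; pick one that matches your site's policy):

# /etc/puppetlabs/puppet/puppet.conf on each agent node
[agent]
# Check in every 2 hours instead of the default 30 minutes
runinterval = 2h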

Bottleneck: CPU cores and number of worker threads

PuppetDB can use multiple CPU cores to process the commands in its queue, with one worker thread per core. By default, PuppetDB uses half of the machine's cores. Running PuppetDB on a machine with many CPU cores and then tuning the number of worker threads (see the sketch after this list) can improve performance:

  • More threads allow PuppetDB to handle more incoming commands per minute. To determine whether you need more threads, keep an eye on the queue depth in the performance dashboard.
  • Too many worker threads can starve the message queue and web server of resources, which delays the entry of incoming commands into the queue. Check the CPU consumption on your server to see whether the cores are overloaded.
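A minimal sketch of the worker-thread setting. PuppetDB merges every .ini file in its config directory, so the filename below is arbitrary, and the directory path is an assumption that varies by version (older installs use /etc/puppetdb/conf.d, newer ones /etc/puppetlabs/puppetdb/conf.d):

# /etc/puppetlabs/puppetdb/conf.d/config.ini
[command-processing]
# Use 8 worker threads instead of the default (half the machine's cores)
threads = 8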

Bottleneck: Single point of failure

A single PuppetDB and PostgreSQL server can likely handle an entire site's demand, but you may wish to run multiple servers for robustness and redundancy. To set up PuppetDB for high availability, you should:

  • Use a reverse proxy or load balancer to divide traffic between multiple PuppetDB instances on different servers.
  • Set up a cluster of PostgreSQL servers for high availability. The PostgreSQL manual and wiki have additional information.
  • Configure every PuppetDB instance to use the same PostgreSQL database. With clustered PostgreSQL servers they may be talking to different machines, but logically they should all be writing to the same database.

Bottleneck: SSL performance

PuppetDB uses its own built-in SSL processing, which is rarely a performance bottleneck. However, large deployments can squeeze out additional performance by terminating SSL with Apache or NGINX instead. If you are running multiple PuppetDB servers behind a reverse proxy, we recommend terminating SSL at the proxy.

Detailed instructions for external SSL termination are outside the scope of this guide. However, if your site is large enough for this to matter, you have likely already set it up for several other services.
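As a rough illustration of the idea, the NGINX sketch below assumes the proxy terminates SSL on the standard PuppetDB SSL port 8081 and forwards cleartext traffic to a local PuppetDB listening on 8080; the hostname and certificate paths are hypothetical, and a real deployment would also need to handle client-certificate verification for agents:

# Illustrative only; not a hardened production configuration.
server {
    listen 8081 ssl;
    server_name puppetdb.example.com;

    ssl_certificate     /etc/nginx/ssl/puppetdb.crt;
    ssl_certificate_key /etc/nginx/ssl/puppetdb.key;

    location / {
        proxy_pass http://127.0.0.1:8080;
    }
}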

Debugging with remote REPL

PuppetDB includes a remote REPL interface, which is disabled by default.

This interface is mostly useful to developers who are familiar with Clojure and with PuppetDB's code, because it lets you modify the code of a running PuppetDB on the fly. Since most users will never need it, the REPL is typically left disabled for security reasons.

Enabling the REPL

To enable the REPL, edit PuppetDB's config file to enable it, configure the listening IP address, and choose a port:

# /etc/puppetdb/conf.d/repl.ini
[nrepl]
enabled = true
port = 8082
host = 127.0.0.1

After configuration, restart the PuppetDB service.
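For example, on a systemd-based system (assuming the service is named after the package):

$ sudo systemctl restart puppetdb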

Connecting to a remote REPL

Once PuppetDB is accepting remote REPL connections, you can connect to it and begin issuing low-level debugging commands and Clojure code.

For example, with nREPL configured on port 8082, connect using Leiningen:

# lein repl :connect localhost:8082
Connecting to nREPL at localhost:8082
REPL-y 0.3.1
Clojure 1.6.0
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e


user=> (+ 1 2 3)
6

Executing functions

Within the REPL, you can interactively execute PuppetDB's functions. For example, to manually compact the database:

user=> (use 'puppetlabs.puppetdb.cli.services)
nil
user=> (use 'puppetlabs.puppetdb.scf.storage)
nil
user=> (use 'clojure.java.jdbc)
nil
user=> (garbage-collect! (:database configuration))
(0)

Redefining functions

You can also modify the running PuppetDB instance by dynamically redefining functions. Suppose you want to log every time a catalog is deleted, for debugging purposes. You can simply redefine the existing delete-catalog! function on the fly:

user=> (ns puppetlabs.puppetdb.scf.storage)
nil
puppetlabs.puppetdb.scf.storage=>
(def original-delete-catalog! delete-catalog!)
#'puppetlabs.puppetdb.scf.storage/original-delete-catalog!
puppetlabs.puppetdb.scf.storage=>
(defn delete-catalog!
  [catalog-hash]
  (log/info (str "Deleting catalog " catalog-hash))
  (original-delete-catalog! catalog-hash))
#'puppetlabs.puppetdb.scf.storage/delete-catalog!

Frequently Asked Questions

Describe the architecture of Puppet

Puppet uses a master-slave (master-agent) architecture. To establish a secure connection, the Puppet slave sends a request to the Puppet master. The Puppet master returns its certificate along with a request for the slave's certificate. The Puppet slave then sends its certificate to the Puppet master along with a request for data. After receiving the request, the Puppet master pushes the configuration to the Puppet slave.

Which command is used to run Puppet on demand from the CLI?

The puppet job run command is used to run Puppet on demand from the CLI.
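For example, a Puppet Enterprise orchestrator sketch with hypothetical node names (available flags vary by PE version, so check puppet job run --help):

$ puppet job run --nodes agent1.example.com,agent2.example.com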

What is Puppet used for?

Puppet is an open source tool used to automate and centralise configuration management. It is one of the most popular configuration management tools for deploying, configuring, and administering servers.

Conclusion

Having studied the basic operations of the PuppetDB database earlier, in this article we learned about some of its advanced features: checking node status, monitoring and tuning performance, and deactivating or expiring decommissioned nodes. We also covered the PuppetDB CLI, exporting and anonymizing data, and debugging with the remote REPL.

Check out this problem - Connect Nodes At Same Level

If you wish to enhance your skills in Data Structures and Algorithms, Competitive Programming, JavaScript, etc., you should check out our Guided Path column at Coding Ninjas Studio. We at Coding Ninjas Studio organize many contests in which you can participate. You can also prepare for the contests and test your coding skills with the available mock test series. In case you have just started the learning process and your dream is to crack major tech giants like Amazon, Microsoft, etc., then you should check out the most frequently asked problems and the interview experiences of your seniors, which will surely help you land a job at your dream company.

Do upvote if you find the blogs helpful.

Happy Learning!

 
