Table of contents
1. Introduction
2. Troubleshooting and support
2.1. PDB Architectural Overview
2.2. Terminus
2.3. Message Queue
2.4. Command processing
2.5. Database
3. PDB Diagnostics
3.1. PDB Logs
3.2. PostgreSQL logs
3.3. atop output
4. Troubleshooting: Session Logging
4.1. What is Session Logging?
4.2. Foreground debugging
4.3. Daemonized Debugging
4.4. Caveats
5. Frequently Asked Questions
5.1. What is a puppet bolt?
5.2. What is Puppet Enterprise?
5.3. What are puppet manifests?
6. Conclusion
Last Updated: Mar 27, 2024

Troubleshooting of PuppetDB


Introduction

We assume that you already have some idea of what PuppetDB is. If not, let us summarize it for you.

Puppet is an open-source software configuration management and deployment tool. It is most commonly used on Linux and Windows to pull the strings of multiple application servers simultaneously.

In this article, you will learn how to troubleshoot PuppetDB and understand its internals.

So let's begin!! 


Troubleshooting and support

PDB Architectural Overview

In PDB, data passes through four components: it travels from the terminus to a filesystem queue, is picked up from the queue by command processing, and is then inserted into the database, where it is stored. The terminus, the PDB process, and the database often reside on three different machines.

Terminus

The terminus resides on the Puppet Server and forwards agent data to PDB in the form of "commands". PDB has four commands, as described in the commands documentation.

Message Queue

Currently, all commands sent from the terminus to PDB are written to a filesystem queue for processing at a later time, when they will be added to the database in roughly the same order as they were received.

Deferred messages were formerly stored in ActiveMQ, which kept them opaquely in binary files under vardir/mq. Now that PDB uses stockpile to store messages, which saves each message as a regular file inside vardir/stockpile/cmd/q, it is possible to manually remove queued commands while PDB is stopped, although that is unlikely to be necessary.

Message filenames are constructed to reveal something about their content. For example:

stockpile/cmd/q/167-1478013718598_facts_5_somehost.json


This file contains a version 5 "replace facts" command received for certname "somehost" at 1478013718598 milliseconds after the epoch (1970-01-01 UTC). The 167 is a stockpile sequence number, and, as the .json extension implies, the command files are plain-text JSON representations of the incoming commands.
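To make the naming scheme concrete, here is a small illustrative Python helper that splits such a filename into its parts. It assumes exactly the layout described above (sequence-timestamp, command, version, certname) and is only a sketch, not part of PDB:

# Illustrative only: split a stockpile queue filename of the form
#   <sequence>-<timestamp-ms>_<command>_<version>_<certname>.json
# into its parts, as described above.
from datetime import datetime, timezone

def parse_queue_filename(name):
    stem = name[:-5] if name.endswith(".json") else name
    seq_ts, command, version, certname = stem.split("_", 3)
    seq, ts_ms = seq_ts.split("-", 1)
    return {
        "sequence": int(seq),
        "received": datetime.fromtimestamp(int(ts_ms) / 1000, tz=timezone.utc),
        "command": command,
        "version": int(version),
        "certname": certname,
    }

print(parse_queue_filename("167-1478013718598_facts_5_somehost.json"))
# -> sequence 167, command 'facts', version 5, certname 'somehost'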

This layout makes it easy to examine the queue with standard filesystem commands. For instance, something like this:

find cmd/q | grep -cE '^cmd/q/[0-9]+-[0-9]+_facts_'

should provide a count of "replace facts" commands in the queue, and something like this:

find cmd/q -printf "%s %p\n" | sort -n | tail

should list the largest commands in the queue.

Note that filesystem constraints may require altering the certname. At the moment, doing so entails converting the characters "\", "/", ":", and the null character to "-", and trimming the certname so that the full UTF-8 encoded filename never exceeds roughly 255 characters. A truncated certname will be followed by an underscore and a hash of the complete certname. For instance:

stockpile/cmd/q/167-1478013718598_facts_5_LONGNAME_HASH.json

As a result, PDB expects the vardir filesystem to be able to accommodate all of your certnames, UTF-8 encoded, as filenames, with the obvious exception of any characters that, as noted above, are converted to dashes.

Although the actual queue message filename lengths will depend on your certnames, PDB expects the vardir filesystem to support filenames up to a maximum of 255 bytes. Filesystems such as ext4 and XFS should work fine.
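For illustration, here is a rough Python sketch of the certname-to-filename transformation described above. The choice of SHA-1 and the exact length arithmetic are assumptions made for this example; PDB's real (Clojure) implementation may differ in detail:

# Rough sketch of the certname-to-filename transformation described above.
# The hash algorithm (SHA-1) and exact length handling are assumptions.
import hashlib

MAX_BYTES = 255  # approximate filename limit on ext4/XFS

def sanitize_certname(certname):
    # Convert "\", "/", ":" and the null character to "-"
    for ch in ("\\", "/", ":", "\x00"):
        certname = certname.replace(ch, "-")
    encoded = certname.encode("utf-8")
    if len(encoded) <= MAX_BYTES:
        return certname
    # Truncate, then append an underscore and a hash of the full certname
    digest = hashlib.sha1(encoded).hexdigest()
    truncated = encoded[: MAX_BYTES - len(digest) - 1].decode("utf-8", "ignore")
    return truncated + "_" + digest

print(sanitize_certname("db:replica/01.example.com"))
# -> db-replica-01.example.com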

In addition to the message files, you may also see files in the queue whose names start with "tmp-". These are merely temporary files created during message transmission and can be disregarded. They should be very short-lived under normal conditions, but if they were to build up (likely a sign of another issue), PDB will try to clean them up when it restarts.

Command processing

PDB processes the queue concurrently, using the number of threads specified in its configuration. Different commands are handled according to their type:

  • store report: These commands are inexpensive, mainly inserting a row in the reports table.
  • replace catalog: When a replace catalogue command arrives, PDB first checks whether the database already contains a more recent catalogue for that node. If so, the incoming catalogue is discarded and nothing is done. If not, PDB diffs the incoming catalogue against the catalogue stored in the database and inserts only the resources and edges that have changed.
  • replace facts: PDB stores facts as key-value pairs relating "paths" to "values". Conceptually, every fact is stored as a tree: a "path" describes the route from the root of a fact (such as a structured fact) to a leaf value, and the "value" is the leaf value itself. To demonstrate, the fact
"foo" => "bar"

is stored as

"foo" => "bar"

while the fact

"foo" => {"a" => "bar", "b" => "baz"}

is stored as

"foo#~a" => "bar"
"foo#~b" => "baz"

For the array case, the fact

"foo" => ["bar", "baz"]

is stored as

"foo#~0" => "bar"
"foo#~1" => "baz"

For larger structures, the same rules apply recursively. When PDB receives a replace facts command, it compares the incoming fact's paths and values against those already in the database, adds any new paths/values that are needed, and removes any pairs that have been invalidated. (A minimal sketch of this flattening follows the list below.)

  • deactivate node: The deactivate node command is unlikely to be the cause of performance problems because it merely updates a field in the certnames table.
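As promised above, here is a minimal Python sketch of the path/value flattening for facts, using the "#~" separator shown in the examples. It is illustrative only and is not PDB's actual code:

# Minimal sketch of the path/value flattening described above, using the
# "#~" separator from the examples; not PuppetDB's actual implementation.
def flatten_fact(name, value):
    if isinstance(value, dict):
        pairs = {}
        for key, sub in value.items():
            pairs.update(flatten_fact(f"{name}#~{key}", sub))
        return pairs
    if isinstance(value, list):
        pairs = {}
        for idx, sub in enumerate(value):
            pairs.update(flatten_fact(f"{name}#~{idx}", sub))
        return pairs
    return {name: value}

print(flatten_fact("foo", {"a": "bar", "b": ["baz", "qux"]}))
# {'foo#~a': 'bar', 'foo#~b#~0': 'baz', 'foo#~b#~1': 'qux'}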

Database

PDB uses PostgreSQL. The easiest way to become comfortable with the schema is to generate an ERD from your database and do your own exploring with the psql interactive terminal on an active instance. DB Visualizer is a fantastic tool for this. Additionally, the PDB team is available to answer questions on the freenode channels #puppet and #puppet-dev, as well as on the mailing list.
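If you prefer scripting over an ERD tool, a quick way to get a feel for where the data lives is to list the largest tables. The sketch below assumes local access to a database named puppetdb and the psycopg2 driver; psql works just as well:

# List the ten largest tables in the puppetdb database
# (database name and driver are assumptions for this example).
import psycopg2

conn = psycopg2.connect(dbname="puppetdb")
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
        FROM pg_catalog.pg_statio_user_tables
        ORDER BY pg_total_relation_size(relid) DESC
        LIMIT 10
    """)
    for table, size in cur.fetchall():
        print(table, size)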

PDB Diagnostics

When an issue is encountered with PDB, the first priority should be to collect and inspect the following:

  • PostgreSQL logs
  • Screenshot of PDB dashboard
  • PDB logs
  • atop output on PDB system

PDB Logs

Search the PDB logs for recurring errors that line up with the timing of the issue. Some errors that may show in the PDB logs are:

  • Database constraint violations: These will occasionally occur in most installs owing to concurrent command processing, but they usually signify a problem if they happen frequently and across several nodes. Note that when a command fails because of a constraint violation, it will be retried 16 more times over roughly a day, with the attempt number shown in the log. A single command reaching 16 retries suggests a problem with that command, not necessarily with the programme.
  • Out of memory errors: PDB may crash if it receives a command that is too large for its heap. Raising the Xmx value in the JAVA_ARGS entry in /etc/sysconfig/puppetdb on Red Hat derivatives or /etc/default/puppetdb on Debian derivatives can easily remedy this. However, crashes caused by OOMs typically mean that PDB is being misused, so it is critical to identify and examine the commands that caused the crash to determine whether there is a Puppet misuse that could be fixed.

The most frequent causes of OOMs during command processing are blobs of binary data stored in catalogue resources, massive structured facts, and reports with an enormous number of log entries. An out of memory error should produce a heap dump in the log directory, ending in .hprof, that contains the offending command.
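Before digging into a heap dump, it is often quicker to check whether an oversized command is still sitting in the queue. A minimal sketch, assuming a default package install (adjust the path for your vardir):

# Print the ten largest command files waiting in the stockpile queue.
# The path below assumes a default package install; adjust for your vardir.
from pathlib import Path

queue = Path("/opt/puppetlabs/server/data/puppetdb/stockpile/cmd/q")
files = sorted(queue.glob("*.json"), key=lambda p: p.stat().st_size, reverse=True)
for path in files[:10]:
    print(f"{path.stat().st_size / (1024 * 1024):.1f} MiB  {path.name}")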

PostgreSQL logs

Before examining the logs, make sure the following settings (or sensible equivalents) are enabled in postgresql.conf:

log_line_prefix = '%m [db:%d,sess:%c,pid:%p,vtid:%v,tid:%x] '
log_min_duration_statement = 5000

Check the postgres logs for:

  • Postgres errors (marked "ERROR") coming from the puppetdb database
  • Slow queries

Slow queries are the bigger concern here because most failures will already have been recorded in the PDB logs. Slow queries commonly come from deletes linked to garbage collection and from event-counts and aggregate-event-counts queries (which in Puppet Enterprise typically show up as slow page loads in the event inspector). Since garbage-collection deletes occur only seldom, some lag there is typically not a problem.

The two most frequent exacerbators of slow queries against PDB's REST API are insufficient memory allotted to PostgreSQL and database bloat. In either case, the first step should be to copy the query from the log and determine the plan Postgres is using by examining the output of the psql statement explain analyze <query>; this will show you which operations and tables the query is spending the most time on. You can then use the pgstattuple module to look for obvious bloat:

select * from pgstattuple('reports'); -- (in the case of reports)

The primary indicator of memory starvation is explain plans that include disk-based sorting. You might try running pgtune against your postgresql.conf and examining the configuration it suggests to see your options for optimisation. Keep in mind that pgtune assumes PostgreSQL is the only application running on the server, so its output is probably not ideal for your needs; treat it as guidance rather than a prescription. work_mem, shared_buffers, and effective_cache_size are normally the parameters that deserve the most attention.
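To illustrate the explain step described above, the sketch below re-runs a query copied from the Postgres log under EXPLAIN ANALYZE and prints the plan. The database name and the placeholder query are assumptions:

# Re-run a slow query from the Postgres log under EXPLAIN ANALYZE and print
# the plan; look for the nodes that dominate the runtime and for sorts that
# spill to disk, which point at insufficient work_mem.
import psycopg2

slow_query = "SELECT count(*) FROM reports"  # paste the offending query here

conn = psycopg2.connect(dbname="puppetdb")
with conn, conn.cursor() as cur:
    cur.execute("EXPLAIN (ANALYZE, BUFFERS) " + slow_query)
    for (line,) in cur.fetchall():
        print(line)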

atop output

To identify which system components are slowing down PDB, use atop. Install atop with your package manager and see its manpage for complete documentation. Disk, CPU, and memory use are displayed on the default screen; if any of them look out of the norm, you can access further detail by typing d, s, or m within atop.

Troubleshooting: Session Logging


What is Session Logging?

The default log level for PuppetDB only records successfully negotiated connections over HTTP or HTTPS. Sessions that are terminated before reaching the application layer leave no log record. Although this is typically the desired behaviour, it can impede diagnosing sessions that were expected to succeed but did not, or unexpected traffic that degraded the service without leaving traces. These unsuccessful connections can be viewed and examined by turning on session logging.

Session logging can produce a lot of noise and may reduce the PuppetDB node's availability. It is advisable to turn it on when necessary and turn it off after troubleshooting is finished.

Foreground debugging

Running PuppetDB in the foreground enables all logging, including session logging. Although easy to set up, it is very noisy. Stop the daemonized service, then run puppetdb foreground --debug as root. A connection that fails to negotiate will appear in the output and resemble:

2016-01-05 01:09:31,132 DEBUG [qtp296414558-71] [o.e.j.s.HttpConnection]
javax.net.ssl.SSLHandshakeException: null cert chain
    at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431) ~[na:1.8.0_60]
    at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535) ~[na:1.8.0_60]
    at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:813) ~[na:1.8.0_60]
    at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:781) ~[na:1.8.0_60]
    at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624) ~[na:1.8.0_60]
    at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.fill(SslConnection.java:516) ~[jetty-io-9.2.10.v20150310.jar:9.2.10.v20150310]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:239) ~[jetty-server-9.2.10.v20150310.jar:9.2.10.v20150310]
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [jetty-io-9.2.10.v20150310.jar:9.2.10.v20150310]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635) [jetty-util-9.2.10.v20150310.jar:9.2.10.v20150310]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555) [jetty-util-9.2.10.v20150310.jar:9.2.10.v20150310]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

When troubleshooting is complete, cancel the foreground job (commonly ctrl+c/^C) and restart the daemonized service.

Daemonized Debugging

To selectively enable session logging, or to make it part of your permanent configuration, the file logback.xml inside the puppetdb directory (e.g. /etc/puppetlabs/puppetdb/logback.xml) must be edited. Inside the configuration element, add a logger element for org.eclipse.jetty.server.HttpConnection with a level of debug:

<configuration scan="true">
    <!-- existing configuration content here -->
    <logger name="org.eclipse.jetty.server.HttpConnection" level="debug"/>
</configuration>

Restart the service. Failed connections will now log to puppetdb.log or puppetdb-access.log, depending on protocol, in the configured logdir (e.g. /var/log/puppetlabs/puppetdb/puppetdb.log and /var/log/puppetlabs/puppetdb/puppetdb-access.log).

Caveats

Even so, PuppetDB will only log sessions that reach the Java process. Attempts that are blocked by a firewall such as iptables, or that are directed at an IP address PuppetDB is not listening on, remain unseen. For those sessions, check the OS or firewall logs.

The increased logging could significantly affect the node's load and, consequently, availability, particularly if the PuppetDB ports are exposed to the general public. This logging is not advised during regular operation, only during active troubleshooting.

Frequently Asked Questions

What is a puppet bolt?

Bolt is an open-source orchestration tool that simplifies the labour-intensive manual tasks involved in maintaining your infrastructure.

What is Puppet Enterprise?

The commercial version of Puppet, called Puppet Enterprise (PE), is based on the Puppet platform.

What are puppet manifests?

Puppet manifests are files, written in Puppet's native declarative language, that contain the configuration for all nodes (Puppet Agents) and are held on the Puppet Master.

Conclusion

To conclude this blog, we looked at the causes of common problems in PuppetDB and ways to resolve them. We also covered its internals: the architectural overview, the terminus, the message queue, command processing, PDB diagnostics, and session logging.

For more content, refer to our guided paths on Coding Ninjas Studio to upskill yourself.

Do upvote our blogs if you find them helpful and engaging!

Happy Learning!
