Wire Formats in PuppetDB - Naukri Code 360

Introduction

Puppet is a software configuration management tool. Puppet can be installed on multiple systems linked together via a network. A master server node configures all the other nodes: it adds software, creates files, performs tasks, etc.

Like any network software, we need to have a proper format for communication between nodes. It needs to be a standard, well-defined format. Let's take a look at this in more detail.

Wire Formats

Wire Formats define how data is represented as bytes when it is transmitted or stored. A Wire Format has to define the character encoding, how numbers are stored (e.g., big endian or little endian), the format for objects such as dates, etc.

They are called Wire Formats because it is assumed this data will eventually be sent over a network (i.e., on the wire).
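
As a minimal sketch of what such low-level rules pin down, the Python snippet below encodes the same integer in big-endian and little-endian byte order, and a short string in UTF-8. The number 1025 and the sample string are arbitrary values chosen for illustration.

```python
import struct

# The same 32-bit unsigned integer written in two byte orders.
number = 1025  # 0x00000401
big_endian = struct.pack(">I", number)
little_endian = struct.pack("<I", number)

# Character encoding is part of a wire format too: "é" takes two bytes in UTF-8.
text_bytes = "café".encode("utf-8")

print(big_endian.hex())     # 00000401
print(little_endian.hex())  # 01040000
print(len(text_bytes))      # 5
```

A reader that assumes the wrong byte order would decode a different number from the same bytes, which is why a format has to state these choices explicitly.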

That is the technical definition. In common usage, a Wire Format is simply a set of rules for how a document should look when it is used by a particular piece of software.

We can compare this with a simple format for Letters. We know that an address must be present at the top left. We know a date and a subject line must be present before the letter starts. This tells us how to write the letter in the correct and accepted format.

Wire Formats contain more technical details than that, and they are more rigidly enforced. Many software systems define their own Wire Formats for various documents. If a document does not follow the format, the software will not accept it.

How Does Puppet Use Wire Formats?

Let us recall some of the different types of documents in Puppet.

Catalog - a JSON document that the master node compiles. This defines the desired state of an agent node.
Report - Puppet Agents enforce the changes required on their systems. They then create a report recording all the changes made.
Facts - Facts are pieces of information about Agent nodes, such as the current state of the system, installed packages, etc. Agent nodes compile these facts themselves.

These are just a few examples. We already know that these documents have to be interchanged between nodes. Agents compile data about themselves into Facts, and send them to the Server. The Server sends the Catalog to Agents. Agents send the Report to the Server.

Thus Agent and Server nodes need to decide on a standard format for these documents (and all others) to make communication possible. This helps in removing ambiguity, parsing data, converting data into instructions, etc.

PuppetDB stores all these documents in a database so that we can query and analyze them later. This article will discuss the Wire Formats in which PuppetDB receives the documents. These may be slightly different from the Wire Formats in which the nodes exchange these documents.

Different Wire Formats in Puppet

Let us look at some different wire formats in Puppet. We will look at all the latest versions of the wire formats in this article.

Catalog Wire Format

We are looking at v9 of the Catalog Wire Format. The catalog should be serialized as a JSON document and encoded in strict UTF-8. Unless specified otherwise, null is not allowed anywhere.

The wire format is as follows:

 {
  "certname": <string>,
  "version": <string>,
  "environment": <string>,
  "transaction_uuid": <string>,
  "catalog_uuid": <string>,
  "code_id": <string>,
  "job_id": <string>,
  "producer_timestamp": <datetime>,
  "producer": <string>,
  "edges":
      [<edge>, <edge>, ...],
  "resources":
      [<resource>, <resource>, ...]
 }

Let's break each term down and understand it.

Certname

This is a string - it is simply the name of the node for which this catalog was created.

Version

This is a string. It is used to uniquely identify this catalog across time. Puppet generally sets the version to the number of seconds since the epoch at the time the catalog was created. We can override this with the config_version setting.

Environment

A string that specifies the environment associated with the node when this catalog was compiled.

transaction_uuid

A string that is used to match the report to a Puppet run. When cached catalogs are not being used, this also matches the report to its corresponding catalog. This field may be set to null.

catalog_uuid

When cached catalogs are being used, this field is used to match the report with the catalog that generated it. This field may be set to null.

code_id

A string that is used to match the catalog with the Puppet code that was used to generate it. This may be something like a Git commit SHA.

job_id

This is a string that's used to match a report to a Puppet run that was initiated by the Puppet Enterprise Orchestrator.

producer_timestamp

This marks the time when the catalog was submitted to PuppetDB by the Puppet Server. Puppet Server populates this field itself.

producer

The certname of the Puppet Server that sent this catalog to PuppetDB.

Edges

This is a list of Edge objects. It should contain every relationship between any two resources in the node.

Edge objects are in the format:

{"source": <resource-spec>,
 "target": <resource-spec>,
 "relationship": <relationship>}

Where source and target are of the data type resource-spec. Source is the resource that needs to be managed first (i.e., before the target), and target is the resource that needs to be managed after the source.

Resource-Spec objects are defined as:

{"type": <string>,
 "title": <string>}

Where type is the type of resource (such as a file, or a user), and title is the resource's name. The resource identified by this type and title must be present in the catalog's resources list.

Relationship defines the relationship between these two resources. It can be one of the following strings:

contains - source contains target
before - source before target
required-by - source is required by target (functionally a synonym of before; the distinct name records how the relationship was declared)
notifies - source notifies target
subscription-of - source is a subscription of target (functionally a synonym of notifies; the distinct name records how the relationship was declared)

Regardless of the relationship, source is always managed before target.
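
The rules above can be checked mechanically. The following is a minimal sketch, not PuppetDB's own validation: the function name check_edges and the sample resources are made up for illustration. It verifies that each edge uses one of the five relationship strings and that both ends of the edge appear in the resources list.

```python
VALID_RELATIONSHIPS = {
    "contains", "before", "required-by", "notifies", "subscription-of",
}

def check_edges(catalog):
    """Return a list of problems found in the catalog's edges (empty if none)."""
    known = {(r["type"], r["title"]) for r in catalog["resources"]}
    problems = []
    for i, edge in enumerate(catalog["edges"]):
        if edge["relationship"] not in VALID_RELATIONSHIPS:
            problems.append(f"edge {i}: unknown relationship {edge['relationship']!r}")
        for end in ("source", "target"):
            spec = (edge[end]["type"], edge[end]["title"])
            if spec not in known:
                problems.append(f"edge {i}: {end} {spec} is not in resources")
    return problems

# Hypothetical catalog fragment: only the keys used by the check are present.
sample = {
    "resources": [
        {"type": "Package", "title": "nginx"},
        {"type": "Service", "title": "nginx"},
    ],
    "edges": [
        {"source": {"type": "Package", "title": "nginx"},
         "target": {"type": "Service", "title": "nginx"},
         "relationship": "before"},
    ],
}
print(check_edges(sample))  # []
```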

Resources

Contains a list of resource objects. It should contain every resource in the node.

A resource object is defined as:

{"type": <string>,
 "title": <string>,
 "aliases": [<string>, <string>, ...],
 "exported": <boolean>,
 "file": <string>,
 "line": <string>,
 "tags": [<string>, <string>, ...],
 "parameters": {<string>: <JSON object>,
                <string>: <JSON object>,
                ...}
}

Type - the type of resource.

Title - The name of the resource.

Aliases - The list of aliases for the resource (other names for it).

Exported - Whether this resource is exported, i.e., made available so that other nodes may also be able to access it.

File - The manifest file in which this resource is defined.

Line - The line number of the manifest file that contains this resource definition.

Tags - Includes every tag the resource has.

Parameters - A JSON object containing all of the attributes of the resource, keyed by attribute name. Each resource type has a list of available attributes. E.g., a file resource will have attributes such as owner, content, etc.
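
To make the catalog format concrete, here is a small sketch that builds a catalog as a Python dictionary, checks the "null is not allowed" rule, and serializes the result as UTF-8 encoded JSON. All of the values (host names, paths, IDs) are invented for illustration, and find_nulls is a hypothetical helper, not part of Puppet. The "line" value is written as a string only because the schema above lists it that way.

```python
import json

# Top-level fields that the descriptions above say may be null.
NULLABLE = {"transaction_uuid", "catalog_uuid"}

def find_nulls(value, path=""):
    """Yield the path of every null found, skipping the nullable top-level fields."""
    if value is None:
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            if path == "" and key in NULLABLE:
                continue
            yield from find_nulls(child, f"{path}/{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_nulls(child, f"{path}/{index}")

catalog = {
    "certname": "web01.example.com",
    "version": "1700000000",
    "environment": "production",
    "transaction_uuid": None,
    "catalog_uuid": None,
    "code_id": "abc123",
    "job_id": "1",
    "producer_timestamp": "2024-01-01T00:00:00.000Z",
    "producer": "puppet.example.com",
    "edges": [],
    "resources": [
        {"type": "File", "title": "/etc/motd", "aliases": [], "exported": False,
         "file": "/etc/puppetlabs/code/site.pp", "line": "3", "tags": ["file"],
         "parameters": {"owner": "root"}},
    ],
}

payload = json.dumps(catalog).encode("utf-8")  # JSON text, UTF-8 encoded
print(list(find_nulls(catalog)))  # []
```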

Facts Wire Format

Facts should be JSON serialized and encoded in strict UTF-8. Unless specified, null is not allowed anywhere here. We are looking at the v5 facts format.

Facts are defined as:

{"certname": <string>,
 "environment": <string>,
 "producer_timestamp": <datetime>,
 "producer": <string>,
 "values": {
     <string>: <any>,
     ...
     },
 "package_inventory": [<package_tuple>, <package_tuple>, ...]
}

Certname

This is the certname that the facts are associated with.

Environment

Each node has an environment associated with it. The environment of the node to which these facts belong, when the facts were collected, is stored here.

Values

This is a JSON object that holds the facts. Each key is the fact name, and the value is the fact value.

Producer_Timestamp

This is a timestamp value, indicating when this fact set was submitted to PuppetDB by Puppet Server.

Timestamps are of the format datetime. This is a string representing a date and time with a timezone. It is formatted based on the recommendations given in ISO 8601. UTC times use the format YYYY-MM-DDThh:mm:ss.sssZ. For non-UTC times, the Z is replaced with ±hh:mm to signify the time zone. This same datetime format is followed throughout PuppetDB Wire Formats.
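
A short sketch of producing this datetime format in Python follows. The helper name to_wire_datetime is hypothetical, and the sample times are arbitrary. Python's isoformat writes UTC as +00:00, so the sketch swaps that suffix for Z.

```python
from datetime import datetime, timedelta, timezone

def to_wire_datetime(moment):
    """Format a timezone-aware datetime with millisecond precision."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    text = moment.isoformat(timespec="milliseconds")
    # isoformat() writes UTC as "+00:00"; the format above uses "Z" instead.
    if text.endswith("+00:00"):
        text = text[:-6] + "Z"
    return text

utc_time = datetime(2024, 3, 5, 14, 30, 15, 250000, tzinfo=timezone.utc)
ist_time = datetime(2024, 3, 5, 20, 0, 15, 250000,
                    tzinfo=timezone(timedelta(hours=5, minutes=30)))

print(to_wire_datetime(utc_time))  # 2024-03-05T14:30:15.250Z
print(to_wire_datetime(ist_time))  # 2024-03-05T20:00:15.250+05:30
```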

Producer

This is the certname of the Puppet Server that sent this fact set to PuppetDB. This field is explicitly allowed to be null.

Package_Inventory

This is an optional field. If present, it must be a list of values that follow this format:

[ "<package_name>", "<version>", "<provider>" ]

Package name is the name of the package, version is its version, and provider is the package provider (for example, a package manager such as apt or yum) that manages this package.
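
Putting the facts fields together, a facts document could be assembled as in the sketch below. The function build_facts and every sample value are assumptions made for illustration. Note that JSON has no tuple type, so each package entry is emitted as a three-element array.

```python
import json

def build_facts(certname, environment, values, packages, producer, timestamp):
    """Assemble a facts document; packages is a list of (name, version, provider)."""
    return {
        "certname": certname,
        "environment": environment,
        "producer_timestamp": timestamp,
        "producer": producer,
        "values": values,
        # Each package becomes a 3-element JSON array.
        "package_inventory": [list(p) for p in packages],
    }

facts = build_facts(
    certname="web01.example.com",
    environment="production",
    values={"kernel": "Linux", "processors": {"count": 4}},
    packages=[("openssl", "3.0.2", "apt"), ("nginx", "1.18.0", "apt")],
    producer="puppet.example.com",
    timestamp="2024-03-05T14:30:15.250Z",
)
print(json.dumps(facts, indent=2))
```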

Report Wire Format

A report is a JSON object, encoded using UTF-8. Unless specified otherwise, null is not allowed anywhere in the report. All values are mandatory unless specified, but values that are arrays may be empty arrays. We are looking at the v8 report format.

It is defined as:

{
    "certname": <string>,
    "environment": <string>,
    "puppet_version": <string>,
    "report_format": <int>,
    "configuration_version": <string>,
    "start_time": <datetime>,
    "end_time": <datetime>,
    "producer_timestamp": <datetime>,
    "producer": <string>,
    "resources": [<resource>, <resource>, ...],
    "metrics": [<metric>, <metric>, ...],
    "logs": [<log>, <log>, ...],
    "transaction_uuid": <string>,
    "catalog_uuid": <string>,
    "code_id": <string>,
    "job_id": <string>,
    "cached_catalog_status": <string>,
    "status": <string>,
    "noop": <boolean>,
    "corrective_change": <boolean>,
    "noop_pending": <boolean>,
    "type": <string>
}

certname

The certname that this report is associated with.

environment

The environment associated with the node (for which this report is being generated) at the time of the report's generation.

puppet_version

The version of Puppet that was used to generate the report.

report_format

This is the version number of the report format used to generate this report. The current latest version is v8 (the one we are looking at). This is a constant, generated by Puppet.

configuration_version

This is a value generated by Puppet. This is used for matching a catalog (for a node) to a Puppet run (an instance when Puppet code ran and made changes to the agent). This helps in tracking which run of Puppet resulted in the generation of which catalogs and reports.

start_time

This denotes the time on the Agent node when the Puppet run started.

end_time

This denotes the time on the Agent node when the Puppet run ended.

producer_timestamp

This marks the time when the report was submitted to PuppetDB by the Puppet Server. Puppet Server populates this field itself.

producer

The certname of the Puppet Server that sent this report to PuppetDB.

transaction_uuid

A string that is used to match the report to a Puppet run. When cached catalogs are not being used, this also matches the report to its corresponding catalog. This field may be set to null.

catalog_uuid

When cached catalogs are being used, this field is used to match the report with the catalog that generated it. This field may be set to null.

code_id

A string that is used to match the catalog with the Puppet code that was used to generate it. This may be something like a Git commit SHA.

job_id

This is a string that's used to match a report to a Puppet run that was initiated by the Puppet Enterprise Orchestrator.

status

The status of the Puppet run, such as changed, unchanged, or failed.

noop

This is a boolean that reflects whether the report was generated from a noop Puppet run. A noop Puppet run is a run that allows us to view what changes would be made by a Puppet run, without actually making those changes.

noop_pending

Boolean - this indicates whether this report contained any "noop" events. This field may be set to null.

corrective_change

This is a boolean that reflects whether this report contained any changes to correct drifts in configuration. This field may be set to null.

type

This is a string that tells us the source of the report. This may be either agent or plan. If this field is absent, it is set to the default value, agent.
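
As a small illustration of reading these top-level fields, the sketch below summarizes a report and applies the default for type. The function names and the sample report are hypothetical, and the report is deliberately partial: it only contains the fields that the code reads.

```python
def report_source(report):
    """Return the report's type, falling back to "agent" when it is absent."""
    return report.get("type", "agent")

def describe(report):
    """Build a one-line summary from a few top-level report fields."""
    mode = "noop" if report["noop"] else "enforcing"
    return (f"{report['certname']}: {report['status']} "
            f"({mode} run, source: {report_source(report)})")

# A deliberately partial report: only the fields used here are included.
sample_report = {
    "certname": "web01.example.com",
    "status": "changed",
    "noop": False,
}
print(describe(sample_report))
# web01.example.com: changed (enforcing run, source: agent)
```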

resources

An array of resource objects, which are of the following form:

{
  "timestamp": <timestamp (from agent) when the resource was managed>,
  "resource_type": <type of resource>,
  "resource_title": <title of resource>,
  "skipped" : <boolean for whether or not the resource was skipped>,
  "events" : [<event>, <event>, ...],
  "file": <manifest file containing resource definition>,
  "line": <line in manifest file on which resource is defined>,
  "containment_path": <containment hierarchy of resource within catalog>
}

The values of containment_path, file, and line may be set to null.

The meaning of each field is given in the definition itself. The events array is a list of events, which take the form:

{
  "status": <status of event (`success`, `failure`, `noop`)>,
  "timestamp": <timestamp (from agent) at which event occurred>,
  "property": <property/parameter of resource on which event occurred>,
  "name": <name of resource on which event occurred>,
  "new_value": <new value for resource property>,
  "old_value": <old value of resource property>,
  "message": <description of what happened during event>,
  "corrective_change": <flag indicating whether the event corrected system drift>
}

The values of new_value, old_value, message, corrective_change, property, and status may be set to null.
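
Since events are nested inside resources, a consumer usually has to walk both levels. The sketch below tallies events by status across a report's resources. The function name and sample data are made up for illustration, and because status may be null, such events are counted under an "unknown" label chosen here.

```python
from collections import Counter

def count_event_statuses(report):
    """Count the events in a report by their status."""
    counts = Counter()
    for resource in report["resources"]:
        for event in resource["events"]:
            # status may be null, so group those events under a readable label.
            counts[event["status"] or "unknown"] += 1
    return counts

# Partial sample: only the keys the function reads are present.
sample_report = {
    "resources": [
        {"resource_type": "File", "resource_title": "/etc/motd",
         "events": [{"status": "success"}, {"status": None}]},
        {"resource_type": "Service", "resource_title": "nginx",
         "events": [{"status": "failure"}]},
        {"resource_type": "Package", "resource_title": "nginx", "events": []},
    ],
}
print(dict(count_event_statuses(sample_report)))
# {'success': 1, 'unknown': 1, 'failure': 1}
```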

metrics

This is a list of metric objects, which take the following form:

{
  "category" : <category of metric ("resources", "time", "changes", or "events")>,
  "name" : <name of the metric>,
  "value" : <value of the metric (double precision)>
}

This field may be set to null.
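
Metrics arrive as a flat list, so it is often convenient to regroup them by category. Here is a minimal sketch of that, with a hypothetical function name and invented sample values. It also tolerates a null metrics field, as allowed above.

```python
def group_metrics(metrics):
    """Turn a flat list of metric objects into {category: {name: value}}."""
    grouped = {}
    for metric in metrics or []:  # the metrics field may be null
        grouped.setdefault(metric["category"], {})[metric["name"]] = metric["value"]
    return grouped

sample_metrics = [
    {"category": "resources", "name": "total", "value": 42.0},
    {"category": "resources", "name": "changed", "value": 3.0},
    {"category": "time", "name": "total", "value": 12.5},
]
print(group_metrics(sample_metrics))
# {'resources': {'total': 42.0, 'changed': 3.0}, 'time': {'total': 12.5}}
```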

logs

This is a list of log objects, which take the following form:

{
  "file" : <file of resource declaration>,
  "line" : <line of resource declaration>,
  "level" : <log level>,
  "message" : <log message>,
  "source" : <log source>,
  "tags" : [<resource tag>],
  "time" : <log line timestamp>
 }

Deactivate Node Wire Format

Documents are not the only things with wire formats; commands have them too. A command to deactivate a node, when sent to PuppetDB, has its own wire format. We are looking at v3, which is JSON serialized and UTF-8 encoded.

The format is:

 {
  "certname": <string>,
  "producer_timestamp": <datetime>
 }

certname

The certname of the node to be deactivated.

producer_timestamp

The time when this command was submitted by Puppet Server to PuppetDB. This is optional but highly recommended. It helps in resolving conflicts, e.g., if another Puppet Server sent a different command regarding the same node. When the timestamp is present, it is used to determine which command takes precedence.
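
The command body is small enough to build by hand. The sketch below assembles it and serializes it as UTF-8 encoded JSON. The function name and certname are hypothetical, and the sketch only builds the payload; it does not submit anything to PuppetDB.

```python
import json
from datetime import datetime, timezone

def deactivate_node_payload(certname, when=None):
    """Build the body of a deactivate node command as a dictionary."""
    when = when or datetime.now(timezone.utc)
    # Normalize to UTC and use the YYYY-MM-DDThh:mm:ss.sssZ form.
    stamp = when.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "certname": certname,
        "producer_timestamp": stamp.replace("+00:00", "Z"),
    }

fixed_time = datetime(2024, 3, 5, 14, 30, 15, tzinfo=timezone.utc)
command = deactivate_node_payload("web01.example.com", fixed_time)
body = json.dumps(command).encode("utf-8")
print(command["producer_timestamp"])  # 2024-03-05T14:30:15.000Z
```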

Frequently Asked Questions

What if I'm using old versions of wire formats?

You can certainly do so, and Puppet documentation has the details about them as well. However, it is recommended that you upgrade to the newer versions as quickly as possible. They contain improvements that help in increasing performance, security, and reliability.

Why does it matter if the Server's time is used or the Agent's - won't they be synced?

We always try to keep the nodes on a network synchronized in time. However, it is not always possible to do so exactly. To sync clocks, nodes have to communicate with each other, which requires us to estimate the time needed for the signals to propagate. This cannot always be measured reliably. With computerized systems, even a millisecond can make a difference. Hence it is important to define which node we are taking the time value from.

Do nodes communicate in the same wire formats as PuppetDB?

Nodes communicate in similar, well-defined wire formats, but these are not always exactly the same as the ones PuppetDB accepts. The differences may be minor or more substantial. Puppet Server is responsible for converting the documents it receives into the wire formats accepted by PuppetDB before sending them there.

Conclusion

This blog has explored what wire formats are and what their role is in communication. We have also seen how Puppet uses them to communicate and store data. We went through detailed examples of wire formats for Catalogs, Reports, Facts, etc.

We hope you leave this article with a broader knowledge of Puppet, Wire Formats, and network communication.