Configuring Disaster Recovery in Puppet

Introduction

Hello Ninjas, welcome back to another article on Puppet. Have you ever wondered what disaster recovery in Puppet is and how to configure it? If so, you are in the right place. In this article, we will learn how to configure disaster recovery in Puppet.

But, before that, let's learn about Puppet.

Puppet

Puppet is a software configuration management tool: a platform for configuring system and software settings. It has its own declarative language for describing those settings, so you do not need much programming knowledge to use it. Puppet also supports capacity control, scaling of systems, and management of all configured machines from one place, so changes made centrally are propagated automatically.

Before proceeding with the blog, you can read Installing and Configuring Puppet Enterprise.

Disaster recovery

Disaster recovery creates a replica of your primary server. You can add disaster recovery to an installation with or without compilers, but you can have only one replica at a time. FIPS-compliant installations do not support disaster recovery.

Disaster recovery architecture

The replica is not an exact copy of the primary server. Instead, the replica duplicates specific infrastructure components and services. Hiera data and other custom configurations are not replicated.

Replication can be read-write or read-only. With read-write replication, data can be written to a service on either the primary server or the replica, and the data is synchronized to both nodes. With read-only replication, data is written only to the primary server and then synchronized to the replica. Some services, such as Puppet Server and the console service UI, are not replicated because they contain no native data.

Component or service	Type of replication	Activated when a replica is
Puppet Server	none	enabled
File sync client	read-only	enabled
PuppetDB	read-write	enabled
Certificate authority	read-only	promoted
RBAC service	read-only	enabled
Node classifier service	read-only	enabled
Activity service	read-only	enabled
Orchestration service	read-only	promoted
Console service UI	none	promoted
Agentless Catalog Executor service	none	promoted
Bolt service	none	promoted

What happens during failover

Failover occurs when the replica takes over tasks typically handled by the primary server.

Failover happens automatically. When disaster recovery is enabled, Puppet runs are directed to the primary server first. If the primary server is fully or partially unreachable, runs are directed to the replica.

In a partial failover, Puppet runs can use the Puppet Server, node classifier, or PuppetDB on the replica when that service cannot be reached on the primary server. For instance, if the node classifier on the primary server fails but its Puppet Server is still running, agent runs keep using the primary server's Puppet Server and fail over to the replica's node classifier.
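The per-service behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not how Puppet implements failover: the service names, the `pick_endpoints` function, and the health dictionaries are all hypothetical.

```python
# Hypothetical model of partial failover: each service is resolved on its own,
# preferring the primary server and falling back to the replica.
SERVICES = ("puppet_server", "node_classifier", "puppetdb")


def pick_endpoints(primary_up, replica_up):
    """Return, per service, which node an agent run would use.

    primary_up and replica_up map a service name to True when that
    service is reachable on the corresponding node.
    """
    chosen = {}
    for service in SERVICES:
        if primary_up.get(service, False):
            chosen[service] = "primary"
        elif replica_up.get(service, False):
            chosen[service] = "replica"
        else:
            chosen[service] = None  # unreachable on both nodes
    return chosen


# The node classifier on the primary server has failed, everything else is up.
primary = {"puppet_server": True, "node_classifier": False, "puppetdb": True}
replica = {"puppet_server": True, "node_classifier": True, "puppetdb": True}
print(pick_endpoints(primary, replica))
```

In this example only the node classifier is served by the replica, while Puppet Server and PuppetDB stay on the primary server.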

What works during failover:

Reporting and queries based on PuppetDB data
Catalog compilation
Scheduled Puppet runs
Viewing classification data using the node classifier API

What does not work during failover:

Using the console
Deploying new Puppet code
Most CLI tools
Editing node classifier data

System and software requirements for disaster recovery

Component	Requirement
Operating system	All supported PE primary server platforms.
Software	After you enable a replica, you must use Code Manager to deploy code to both the primary server and the replica. You must use the default Puppet Enterprise node classifier so that disaster recovery classification can be applied to nodes. Orchestrator must be enabled so that agents are updated when you provision or enable a replica.
Replica	It must be an agent node that does not already have a specific function. You can decommission a node, uninstall its Puppet packages, and then recommission it as a replica. A compiler, however, cannot perform two functions at once, such as acting as both a compiler and a replica. The replica must have the same hardware specifications and capabilities as your primary server.
Firewall	To ensure that the replica can function as the primary server during failover, it must adhere to the same port specifications as the primary server.
Node names	The primary server and replica node names you supply must be resolvable domain names.
RBAC tokens	You need an admin RBAC token to run the puppet infrastructure commands provision, enable, and forget. You can generate a token with the puppet-access command.

Load balancer timeout in disaster recovery installations

Disaster recovery configuration uses timeouts to decide when to fail over to the replica. If the load balancer timeout is shorter than the server and agent timeout, agent connections may be terminated during failover.

Set the load balancer's timeout option to four minutes or longer to prevent timeouts. Compilers have adequate time to make the necessary PuppetDB and node classifier service queries during this period. Using options in the haproxy or f5 modules, you can modify the load balancer timeout setting.
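As a rough illustration of the rule above, here is a small Python check you could run against your own values. The function name and its arguments are hypothetical, and the numbers you pass in must come from your own load balancer and Puppet settings.

```python
# Four minutes, the minimum load balancer timeout suggested in the text.
RECOMMENDED_MIN_SECONDS = 4 * 60


def check_lb_timeout(lb_timeout_seconds, server_agent_timeout_seconds):
    """Return a list of warnings for a load balancer timeout (hypothetical helper)."""
    problems = []
    if lb_timeout_seconds < RECOMMENDED_MIN_SECONDS:
        problems.append("load balancer timeout is below four minutes")
    if lb_timeout_seconds < server_agent_timeout_seconds:
        problems.append("load balancer timeout is shorter than the server and agent timeout")
    return problems


# Example values only: a 60 second load balancer timeout against a 120 second
# server and agent timeout triggers both warnings.
for warning in check_lb_timeout(60, 120):
    print(warning)
```

An empty list means the timeout satisfies both conditions described in this section.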

Configure disaster recovery

Provision and enable a replica

When a replica is provisioned, certain elements and services from the primary server are copied to the replica. When a replica is enabled, most duplicated services and components are turned on, and agents and infrastructure nodes are given instructions on interacting in a failover situation.

In the console, click Node groups, and in the PE Infrastructure group, select the PE Agent > PE Infrastructure Agent group.
If you manage your load balancers with agents, pin the load balancers to the group on the Rules tab.
Pinning load balancers to the PE Infrastructure Agent group ensures that they communicate directly with the primary server.
On the Classes tab, find the puppet_enterprise::profile::agent class and set the parameters listed in the table below.
Remove any values set for pcp_broker_ws_uris.
Commit the changes.
Run Puppet on all agents classified into the PE Infrastructure Agent group.

Parameter	Value
manage_puppet_conf	Set this to true so that the server_list setting is configured in the expected location and persists through Puppet runs.
pcp_broker_list	The hostname of your primary server. Hostnames must include port 8142.
primary_uris and server_list	The hostname of your primary server. This setting assumes port 8140.
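To make the table concrete, the Python sketch below builds the values you would enter for a given primary server. It is only a helper for visualising the parameters: the function name and the hostname are hypothetical, and representing each value as a list is an assumption, so check the exact format the console expects.

```python
import json


def agent_profile_params(primary_host):
    """Build illustrative values for puppet_enterprise::profile::agent.

    Assumption: list-style values. Only pcp_broker_list carries an explicit
    port (8142); primary_uris and server_list rely on the default port 8140.
    """
    return {
        "manage_puppet_conf": True,
        "pcp_broker_list": [f"{primary_host}:8142"],
        "primary_uris": [primary_host],
        "server_list": [primary_host],
    }


# "primary.example.com" is a placeholder hostname.
print(json.dumps(agent_profile_params("primary.example.com"), indent=2))
```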

Promote a replica

Make sure the primary server is permanently offline. If the primary server comes back online during promotion, your agents may run into connection problems as they try to connect to two active primary servers.
On the replica, as the root user, run puppet infrastructure promote replica. Promotion can take as long as it took to install PE initially. Don't change your code or classification during or after promotion.
When promotion has finished, update any systems or settings that refer to the old primary server, such as PE client tool configurations, Code Manager hooks, and CNAME records.
Run Puppet to deploy the new configuration to the nodes, or wait for the next scheduled run.
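Before promoting, it can help to confirm that the old primary server really is unreachable. The Python sketch below is one simple way to probe a TCP port. It is not part of Puppet, the function name is hypothetical, and a closed port on its own does not prove the server is permanently offline.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `is_port_open("old-primary.example.com", 8140)` (a placeholder hostname) returning False suggests that Puppet Server on the old primary server is not accepting connections on port 8140.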

Enable a new replica using a failed primary server

Run the command puppet infrastructure run enable_ha_failover with the following parameters on your promoted replica as the root user:

host: The hostname of the failed primary server. This node becomes your new replica.

topology: The architecture of your environment, either mono (standard) or mono-with-compile (large).

replication_timeout_secs: The number of seconds allowed for provisioning and enabling the new replica before the command fails.

tmpdir: The path to the directory used to upload and run temporary files.
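The Python sketch below assembles the command line from these parameters so you can see how they fit together. It assumes that parameters are passed as key=value pairs and that the timeout and temporary directory can be left out; both are assumptions to verify against your PE version. The function name and hostname are hypothetical.

```python
ALLOWED_TOPOLOGIES = ("mono", "mono-with-compile")


def build_enable_ha_failover_command(host, topology,
                                     replication_timeout_secs=None, tmpdir=None):
    """Return the command as a list of arguments (assumed key=value syntax)."""
    if topology not in ALLOWED_TOPOLOGIES:
        raise ValueError(f"topology must be one of {ALLOWED_TOPOLOGIES}")
    args = [
        "puppet", "infrastructure", "run", "enable_ha_failover",
        f"host={host}",
        f"topology={topology}",
    ]
    # The last two parameters are treated as optional here (an assumption).
    if replication_timeout_secs is not None:
        args.append(f"replication_timeout_secs={replication_timeout_secs}")
    if tmpdir is not None:
        args.append(f"tmpdir={tmpdir}")
    return args


# "old-primary.example.com" is a placeholder for the failed primary server.
print(" ".join(build_enable_ha_failover_command("old-primary.example.com", "mono")))
```

Building the command as a list keeps each parameter separate, which makes it easy to review before you run it as the root user on the promoted replica.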

Forget a replica

Forgetting a replica cleans up classification and database state, which prevents performance from degrading over time.

Run the forget command whenever a replica node is destroyed, even if you plan to replace it with a replica of the same name.

On the primary server, as the root user, run puppet infrastructure forget <REPLICA NODE NAME>.
We recommend deleting your secret key file from the replica, because leaving it there is a security risk. The secret key file is located at /etc/puppetlabs/orchestration-services/conf.d/secrets/keys.json.

Reinitialize a replica

If you run into certain errors on your replica after provisioning, you can reinitialize it. Reinitializing deletes and then recreates the replica databases you specify.

Reinitialization is not meant to fix slow queries or intermittent failures. Reinitialize your replica only if it is inoperable or showing replication errors.

Frequently Asked Questions

Is Puppet a tool for ongoing monitoring?

Puppet continuously checks the configuration of the hosts it manages, and if a setting has changed, it reverts the host to its predefined configuration. It can manage a large infrastructure, applying centralized configuration to every node.

Why is Puppet used?

Puppet is an open-source tool that simplifies and centralizes configuration management. It is one of the most popular tools for installing, configuring, and managing servers.

What is Puppet Enterprise?

Puppet Enterprise (PE) is the commercial version of Puppet, built on the open-source Puppet platform. Both let you manage the configuration of thousands of nodes. Open-source Puppet does this through desired state management.

Does Puppet use SSL?

Yes. For all of its Secure Sockets Layer (SSL) communications, Puppet can use either an existing external certificate authority (CA) or its own built-in PKI and CA tools.

Conclusion

In this article, we learned about configuring disaster recovery in Puppet. We hope it helps you understand how disaster recovery works in Puppet.

Happy Coding!