Introduction
Do you know what troubleshooting is? How can you troubleshoot a tool like Puppet?
Puppet is an open-source, Ruby-based configuration management tool used to manage infrastructure on physical or virtual machines. Troubleshooting is a form of problem-solving used to find and fix broken components or operations on a machine or system.
In this article, we will learn about the advanced concepts of troubleshooting in Puppet. Before continuing, make sure you are familiar with the basic concepts of troubleshooting in Puppet.
Troubleshooting the Databases
Use the following strategies to troubleshoot issues with the databases that support the console. Some common database issues include:
Database taking too much space
The PostgreSQL database can take up a lot of disk space. The autovacuum=on setting keeps the databases from growing too large by reclaiming space regularly, and routine vacuuming is enabled by default. To resolve this issue, verify that autovacuum is still set to on.
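One way to verify the setting without opening a database session is to inspect the configuration text. The sketch below is a minimal Python helper, not part of Puppet or PostgreSQL. The function name autovacuum_enabled is hypothetical, and it assumes you pass it the contents of a postgresql.conf-style file. Running SHOW autovacuum; in psql remains the authoritative check.

```python
def autovacuum_enabled(conf_text: str) -> bool:
    """Return the effective autovacuum value found in postgresql.conf-style text."""
    enabled = True  # PostgreSQL's built-in default is autovacuum = on
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Settings are written as "name = value" or "name value"
        name, _, value = line.replace("=", " ", 1).partition(" ")
        if name.strip().lower() != "autovacuum":
            continue  # ignores related keys such as autovacuum_naptime
        value = value.strip().strip("'\"").lower()
        enabled = value in {"on", "true", "yes", "1"}  # a later entry overrides an earlier one
    return enabled
```

For example, autovacuum_enabled("autovacuum = off") returns False, while text that never mentions the setting returns True because the default applies.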
PostgreSQL buffer memory causes failure in the installation
The buffer memory settings of PostgreSQL sometimes cause the installation to fail. This happens when installing PE on machines with a large amount of RAM. If this issue is present, pgstartup.log contains the following error:
FATAL: could not create shared memory segment: No space left on device
DETAIL: Failed system call was shmget(key=5432001, size=34427584512,03600).
When this issue occurs, follow these steps:
Set the shmmax kernel setting to approximately 50% of the total RAM.
Divide the shmmax value by the page size to get the shmall kernel setting value. Run the command getconf PAGE_SIZE to confirm the page size.
Apply the new kernel settings with sysctl, setting kernel.shmmax and kernel.shmall to the values you calculated.
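The arithmetic in these steps can be sketched in Python. This is an illustration under stated assumptions, not a tool shipped with PE: the function names are hypothetical, the RAM size and page size are values you supply yourself, and the generated sysctl -w lines must still be run as root on the affected machine.

```python
def shared_memory_settings(total_ram_bytes: int, page_size: int) -> dict:
    """Compute shmmax (50% of RAM, in bytes) and shmall (shmmax in pages)."""
    shmmax = total_ram_bytes // 2
    shmall = shmmax // page_size
    return {"kernel.shmmax": shmmax, "kernel.shmall": shmall}


def sysctl_commands(settings: dict) -> list:
    """Format each setting as a 'sysctl -w key=value' command line."""
    return [f"sysctl -w {key}={value}" for key, value in settings.items()]


# Example: a machine with 64 GiB of RAM and a 4096-byte page size
settings = shared_memory_settings(64 * 1024**3, 4096)
for command in sysctl_commands(settings):
    print(command)
# prints:
# sysctl -w kernel.shmmax=34359738368
# sysctl -w kernel.shmall=8388608
```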
PuppetDB default port conflicts with another service
The default port for PuppetDB communications is 8081. This can conflict with other services, such as McAfee ePolicy Orchestrator. To avoid the conflict, install in text mode and set a non-default port with the puppet_enterprise::puppetdb_port parameter in the pe.conf file.
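Before installing, it can help to check whether something is already listening on port 8081. The following is a minimal Python sketch. The helper name port_in_use is hypothetical, and it assumes the conflicting service accepts TCP connections on the local loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(8081):
        print("Port 8081 is taken; consider a non-default puppetdb_port.")
    else:
        print("Port 8081 appears to be free.")
```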
Generation of Ruby Errors
If puppet apply is configured incorrectly, puppet resource stops functioning and returns a Ruby run error.
To fix this, modify the routes.yaml file so that puppet apply connects to PuppetDB correctly without affecting other functions.
Troubleshooting SAML connections
SAML stands for Security Assertion Markup Language. It is an open standard for authentication, based on the XML (Extensible Markup Language) format, that transfers authentication data between an identity provider (IdP) and a service provider (SP).
Some common errors and issues occur when connecting a SAML IdP to PE. For example:
Failed redirects: These occur when the URLs configured in PE and in the identity provider do not match. In this case:
If redirection from the IdP to PE is failing, correct the mismatched URLs in the SAML configuration on your identity provider's side.
If redirection from PE to the IdP is failing, correct the URLs in PE's SAML configuration.
Rejected communications: If either PE or the IdP is rejecting communications or returning an error, check the console-services.log file. This usually happens because of mismatched certificates.
Failed user-group binding: If users are not bound to their groups or permissions are missing:
Ensure that the attribute binding values in your IdP and PE SAML configurations are the same.
Make sure that the group export is correct in the IdP's configuration.
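Comparing attribute bindings by eye is error-prone, so a small script can help. The Python sketch below is only an illustration: the function name and the attribute names in the example (user, email, groups) are hypothetical placeholders, and it assumes you have copied the bindings from both configurations into dictionaries by hand.

```python
def binding_mismatches(pe_bindings: dict, idp_bindings: dict) -> dict:
    """Map each attribute whose values differ to a (PE value, IdP value) pair."""
    mismatches = {}
    for key in sorted(set(pe_bindings) | set(idp_bindings)):
        pe_value = pe_bindings.get(key)    # None if missing on the PE side
        idp_value = idp_bindings.get(key)  # None if missing on the IdP side
        if pe_value != idp_value:
            mismatches[key] = (pe_value, idp_value)
    return mismatches


# Hypothetical values copied from each side's SAML configuration
pe = {"user": "uid", "email": "mail", "groups": "memberOf"}
idp = {"user": "uid", "email": "email", "groups": "memberOf"}
print(binding_mismatches(pe, idp))
# prints: {'email': ('mail', 'email')}
```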
Troubleshooting backup and restore
In case of backup and restore failure, you can check for the following issues:
If the puppet-backup create command fails with an error about a missing gem:
This happens when a gem installed on Puppet Server is not present in the agent environment on the primary server.
Check the backup logs to find which gem is causing the issue. Then install the missing or incorrectly versioned gems in the agent environment.
If the puppet-backup restore command fails with an error about a duplicate operator family:
You have to log in to your PostgreSQL instance using this command:
sudo su - pe-postgres -s /bin/bash -c "/opt/puppetlabs/server/bin/psql pe-rbac"
After that, you can run the following commands:
ALTER EXTENSION citext ADD operator family citext_ops using btree;
ALTER EXTENSION citext ADD operator family citext_ops using hash;
Now you can exit the PostgreSQL shell and re-run the backup utility.
Troubleshooting Code Manager
Code Manager depends on proper coordination between multiple components. These components include source control, the file sync service, and r10k.
If you are having trouble with Code Manager, check the following:
Code Manager logs
Code Manager writes its log messages to the Puppet Server log:
/var/log/puppetlabs/puppetserver/puppetserver.log
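Because this log is shared with the rest of Puppet Server, filtering by severity can make problems easier to spot. The sketch below is a minimal Python example. The function name is hypothetical, and it assumes each log line contains its level (such as ERROR or WARN) surrounded by spaces. Adjust the matching if your logging configuration formats lines differently.

```python
from typing import Iterable, List


def problem_lines(log_lines: Iterable[str], levels=("ERROR", "WARN")) -> List[str]:
    """Return the lines whose text contains one of the given log levels."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(f" {level} " in line for level in levels)
    ]


if __name__ == "__main__":
    log_path = "/var/log/puppetlabs/puppetserver/puppetserver.log"
    try:
        with open(log_path, encoding="utf-8", errors="replace") as handle:
            for line in problem_lines(handle):
                print(line)
    except OSError as err:  # missing file or insufficient permissions
        print(f"Could not read {log_path}: {err}")
```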
Check Code Manager's Status
Use the puppet-code status command to verify that file sync and Code Manager are responding correctly.
This command might return errors such as:
Code Manager couldn't connect to the server
Code Manager reports invalid configuration
File sync storage service reports unknown status
Test the connection to the control repository
The control repository controls which environments exist. It also ensures that the correct versions of all required modules are in each environment.
Run the following command to check whether Code Manager can connect to the control repository:
puppet-code deploy --dry-run
It will return a list of all the environments that are present on the current repo. If there is any error with the connection, the command will return the following message:
Unable to determine current branches for Git source
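If you want to run this check from a script, the message above can be detected automatically. The following Python sketch is an assumption-laden illustration: the function names are hypothetical, and it assumes puppet-code is on the PATH and that the error text appears in the command's standard output or standard error.

```python
import subprocess

CONNECTION_ERROR = "Unable to determine current branches for Git source"


def control_repo_reachable(command_output: str) -> bool:
    """Return False if the dry-run output contains the connection error."""
    return CONNECTION_ERROR not in command_output


def dry_run_output() -> str:
    """Run the dry run and return everything it printed (stdout and stderr)."""
    result = subprocess.run(
        ["puppet-code", "deploy", "--dry-run"],
        capture_output=True,
        text=True,
        check=False,  # inspect the output ourselves instead of raising
    )
    return result.stdout + result.stderr
```

Calling control_repo_reachable(dry_run_output()) on the primary server then gives a simple yes or no answer for monitoring scripts.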
Check the Puppetfile for errors
Check the environment's Puppetfile for syntax errors, and verify that each listed module is present. To do this, you need a temporary copy of the Puppetfile.
To check the Puppetfile syntax, run r10k puppetfile check within the temporary directory. The command returns Syntax OK if the syntax is correct. To verify that the sources in your Puppetfile are configured correctly, perform a test installation by running r10k puppetfile install in the same temporary directory.
For a full test deployment, run r10k deploy environment with the r10k.yaml file that Code Manager uses. This test only writes to the code staging directory. It doesn't trigger file sync and is intended only for ad-hoc testing.
When the command succeeds, the directory /etc/puppetlabs/code-staging is populated with directory-based environments. It is also populated with all necessary modules.
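The "temporary copy" step and the syntax check can be combined in a short script. This Python sketch is a hedged example rather than an official workflow: both function names are hypothetical, and it assumes the r10k executable is on the PATH (on a PE primary server you may need to pass its full path) and that a zero exit status from r10k puppetfile check means the syntax is valid.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def stage_puppetfile(source: str) -> Path:
    """Copy a Puppetfile into a fresh temporary directory and return that directory."""
    workdir = Path(tempfile.mkdtemp(prefix="puppetfile-check-"))
    shutil.copy(source, workdir / "Puppetfile")
    return workdir


def puppetfile_syntax_ok(workdir: Path, r10k: str = "r10k") -> bool:
    """Run 'r10k puppetfile check' inside workdir and report whether it succeeded."""
    result = subprocess.run(
        [r10k, "puppetfile", "check"],
        cwd=workdir,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode == 0
```

Working on a staged copy keeps the test away from the live environment directory, so a failed check or test installation cannot affect deployed code.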
Monitor /v1/deploys logs
When you experience deployment errors, you can monitor the logs while calling the endpoint. Deployments of this kind are triggered by a POST to /v1/deploys, for example from a webhook. In a terminal window, follow the console-services.log file (for example, with tail -f) while the deployment runs.
Code deployment can take a long time if the environments are heavily loaded, and the system might time out before the deployment completes. If deployments are timing out too early, increase the timeouts_deploy setting. You might also need to increase the following settings:
timeouts_shutdown
timeouts_sync
timeouts_wait
File sync stops when Code Manager tries to deploy the code
Deploying code with Code Manager involves accessing many small files. If you store your Puppet code on network-attached storage, performance can be poor because of a slow network or problems with the backend hardware. Try the following if you are experiencing performance issues:
Adjust the network to support a large number of small files.
Store the Puppet code on a local or directly connected storage device.
Classes are missing after deployment
If a class is not available in the PE console even after a successful code deployment, try the following steps:
Run Puppet to refresh classes
Refresh your browser
Verify whether the environment directory exists on disk or not
Check the environment of your node group setting. You need to redeploy environments or run Puppet after changing environment settings.
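The on-disk check from the list above is easy to script. This is a minimal Python sketch with a hypothetical function name. It assumes the default environment path /etc/puppetlabs/code/environments, so pass a different directory if your environmentpath setting has been changed.

```python
from pathlib import Path

DEFAULT_ENVIRONMENTS_DIR = "/etc/puppetlabs/code/environments"  # default environmentpath


def environment_exists(name: str, environments_dir: str = DEFAULT_ENVIRONMENTS_DIR) -> bool:
    """Return True if a directory for the named environment exists on disk."""
    return (Path(environments_dir) / name).is_dir()


if __name__ == "__main__":
    for env in ("production", "development"):
        status = "found" if environment_exists(env) else "missing"
        print(f"{env}: {status}")
```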
Troubleshooting Windows
Issues can occur with PE on Windows nodes during and after installation. For example:
Failed installations
Failed upgrades
Problems applying manifests, etc
Installation fails
If Puppet agent installation fails on a Windows node, check for the following issues:
The installation package is not accessible
The installation was attempted without administrator privileges
Upgrade fails
The Puppet agent .msi package overwrites existing entries in the puppet.conf file. If you reinstall or upgrade the agent with a different primary server hostname, Puppet applies the new value in the $confdir\puppet.conf file. When upgrading, use the same primary server hostname that you set when you first installed the Windows agent.
Errors when doing Puppet agent run or applying a manifest
If your manifests are not applied on Windows nodes, or are applied incorrectly, check for the following issues:
Incorrect file or path separators
Inconsistent use of case
Shell built-ins that are not executed
PowerShell scripts that are not executed
Services referenced by display name instead of short name
Error messages
You can encounter the following error messages if you are using Puppet on Windows nodes:-
Forge connection or SSL certificate errors
Service 'Puppet Agent' (puppet) failed to start. Verify that you have sufficient privileges to start system services.
Cannot run on Microsoft Windows without the <GEM_NAME> gem.
/Stage[main]//Scheduled_task[task_system]: Could not evaluate: The operation completed successfully.
/Stage[main]//Exec[C:/tmp/<FILE_NAME>.exe]/returns: change from notrun to 0 failed: CreateProcess() failed: Access is denied.
getaddrinfo: The storage control blocks were destroyed.
Could not request certificate: The certificate retrieved from the primary does not match the agent's private key.
Could not send report: SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed. This is often because the time is out of sync on the server or client.
Could not parse for environment production: Syntax error at '='; expected '}'
Logging and debugging
The Windows Event Log can be useful when troubleshooting problems with Windows nodes.
Stop and restart the Puppet service with the --debug and --trace options (for example, sc stop puppet && sc start puppet --debug --trace) so that the Puppet agent sends debug and trace messages to the Windows Event Log.
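Frequently Asked Questions
How do Puppet agents get their configurations?
Puppet uses a pull model in which agents periodically poll the primary server (every 30 minutes by default) to retrieve site-specific and node-specific configurations.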
Is Puppet easy to use?
Puppet is not the easiest tool to learn, because its configurations are written in its own language, known as the Puppet DSL (domain-specific language).
What port does Puppet use for communication?
Puppet's HTTPS traffic uses port 8140 for communication by default.
Why is Puppet used?
Puppet is an open-source software configuration management and deployment tool. It's most commonly used on Linux and Windows to pull the strings on multiple application servers simultaneously.
Why is Puppet used in DevOps?
Puppet is software that helps you in the management and automation of the configuration of servers.
Conclusion
In this article, we have discussed advanced troubleshooting concepts in Puppet. We covered troubleshooting the databases, SAML connections, backup and restore, Code Manager, and Windows nodes.
We hope this blog has helped you improve your Puppet troubleshooting knowledge. If you want to learn more, check out our articles on Ansible vs. Puppet, DevOps best things, DevOps tools, and reasons to build a career in DevOps.