Code360 powered by Coding Ninjas X Naukri.com
Table of contents
1. Introduction
2. Authorize the Ops Agent
2.1. Adding Credentials
2.2. Creating a Service account
2.3. Copying the private key to your instance
3. Authorize the Ops Agent for Linux
4. Configure the Ops Agent
4.1. Configuration Model
4.2. User-specified configuration
4.3. Structure of logging receivers
4.4. Logging processors
4.5. Logging pipelines
4.6. Metrics configurations
5. Troubleshooting the Ops Agent
5.1. Agent diagnostics tool for Linux VMs
5.2. Agent fails to install
5.3. Agent is installed but not running
5.3.1. Agent services are not running
5.3.2. Conflict with currently installed agents
5.3.3. Agent is running, but data is not ingested
5.3.4. Is the agent sending logs to Cloud Logging?
5.3.5. Is the agent sending metrics to Cloud Monitoring?
6. Frequently Asked Questions
6.1. Why is a service account preferred for authentication?
6.2. What is a receiver in the configuration model?
6.3. What is the default value for listen port of syslog receivers?
7. Conclusion
Last Updated: Mar 27, 2024

Managing Ops Agent

Introduction

The Ops Agent is the primary agent for collecting telemetry from Compute Engine instances. It combines logging and metrics into a single agent: it uses Fluent Bit for logs, which supports high-throughput logging, and the OpenTelemetry Collector for metrics. It can also parse log files from third-party applications.

Authorize the Ops Agent

Before authorizing the Ops Agent, check the authorization scopes of your Compute Engine instance by running the command below on the instance.

curl --silent --connect-timeout 1 -f -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/scopes

Look for one or more of the following authorization scopes in the output:

https://www.googleapis.com/auth/logging.write
https://www.googleapis.com/auth/logging.admin
https://www.googleapis.com/auth/monitoring.write
https://www.googleapis.com/auth/monitoring.admin
https://www.googleapis.com/auth/cloud-platform
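
The check above can be scripted. The following is a minimal sketch, not an official tool: the function name has_ops_agent_scopes is hypothetical, and the rule that one logging scope plus one monitoring scope is needed (unless cloud-platform is present) is an assumption based on the agent writing both logs and metrics.

```shell
# Sketch: read OAuth scope URLs (one per line) on stdin and succeed when
# they look sufficient for the Ops Agent. The function name is hypothetical.
has_ops_agent_scopes() {
  local scopes
  scopes="$(cat)"

  # cloud-platform covers all Google Cloud APIs, so it is enough on its own.
  if printf '%s\n' "$scopes" |
    grep -qx 'https://www\.googleapis\.com/auth/cloud-platform'; then
    return 0
  fi

  # Assumption: otherwise require one logging scope AND one monitoring scope.
  printf '%s\n' "$scopes" |
    grep -qxE 'https://www\.googleapis\.com/auth/logging\.(write|admin)' &&
    printf '%s\n' "$scopes" |
    grep -qxE 'https://www\.googleapis\.com/auth/monitoring\.(write|admin)'
}
```

On a VM, the output of the curl command shown above could be piped into has_ops_agent_scopes, and the exit status used to decide whether separate credentials are needed.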

Adding Credentials

Authorization is the process of determining what permissions an authenticated client has for a set of resources.

Authorizing the Ops Agent on a VM instance involves the following steps:

  • Create a service account with the required privileges and private-key credentials in the Google Cloud project associated with the VM instance.
  • Copy the private-key credentials to the VM instance, where they act as Application Default Credentials for software running on the instance.
  • Install the agent, or restart it if it is already installed.

Creating a Service account

Authentication is the process of determining a client's identity. For authentication, it is recommended to use a service account: an account associated with your Google Cloud project, as opposed to a specific user. A service account can be used regardless of where the code runs: on Compute Engine, on App Engine, or on-premises.

To create a service account, complete the Create a service account procedure with the following choices:

  • Choose the Google Cloud project in which to create the service account. For a Compute Engine instance, select the project in which the instance was created.
  • From the Role drop-down menu, choose the following roles:
    • Monitoring > Monitoring Metric Writer
    • Logging > Logs Writer
  • Select JSON for the key type when creating the key
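
The same choices can also be made from the command line. The sketch below assumes the gcloud CLI is installed and authenticated; the function names and arguments are hypothetical, and the role IDs roles/monitoring.metricWriter and roles/logging.logWriter are the identifiers behind the two roles listed above.

```shell
# Hypothetical helper: the email address gcloud gives a user-created
# service account, NAME@PROJECT_ID.iam.gserviceaccount.com.
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Sketch: create the account, grant both roles, and download a JSON key.
# Arguments: account name, project ID, output key file.
create_ops_agent_sa() {
  local name="$1" project="$2" key_file="$3" email role
  email="$(sa_email "$name" "$project")"

  gcloud iam service-accounts create "$name" \
    --project "$project" --display-name "Ops Agent" || return 1

  for role in roles/monitoring.metricWriter roles/logging.logWriter; do
    gcloud projects add-iam-policy-binding "$project" \
      --member "serviceAccount:$email" --role "$role" || return 1
  done

  # JSON is the default key type for this command.
  gcloud iam service-accounts keys create "$key_file" \
    --iam-account "$email" --project "$project"
}
```

For example, create_ops_agent_sa ops-agent my-project ./ops-agent-key.json would leave the private key in ./ops-agent-key.json; all three values here are placeholders.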

Copying the private key to your instance

After creating the service account, copy the private-key file to one of the following locations on your VM instance so that the agent can find the credentials. Any file-copy tool can be used.

  • Linux only
/etc/google/auth/application_default_credentials.json
  • Windows only
C:\ProgramData\Google\Auth\application_default_credentials.json
  • Both Linux and Windows: any location that you store in the variable GOOGLE_APPLICATION_CREDENTIALS. The variable must be visible to the agent's process.


If both your workstation and your instance run Linux, use the following file-copy instructions. When you created the service account, the private-key credentials were stored on your workstation; the commands below assume that you saved that location in the variable CREDS.

On your workstation, copy the key with the gcloud command-line tool. You can find [YOUR-INSTANCE-NAME] and [YOUR-INSTANCE-ZONE] on the VM instances page of the Google Cloud console:

REMOTE_USER="$USER"
INSTANCE="[YOUR-INSTANCE-NAME]"
ZONE="[YOUR-INSTANCE-ZONE]"
gcloud compute scp "$CREDS" "$REMOTE_USER@$INSTANCE:~/temp.json" --zone "$ZONE"


Run the following commands on your Compute Engine instance:

GOOGLE_APPLICATION_CREDENTIALS="/etc/google/auth/application_default_credentials.json"
sudo mkdir -p /etc/google/auth
sudo mv "$HOME/temp.json" "$GOOGLE_APPLICATION_CREDENTIALS"
sudo chown root:root "$GOOGLE_APPLICATION_CREDENTIALS"
sudo chmod 0400 "$GOOGLE_APPLICATION_CREDENTIALS"

Authorize the Ops Agent for Linux

  • Create the following configuration file, or edit it if it already exists:
/etc/systemd/system.conf
  • Add the following line to the file, replacing path_to_credentials_file with the path to your credentials file:
DefaultEnvironment="GOOGLE_APPLICATION_CREDENTIALS=path_to_credentials_file"
  • Reload the environment variables:
sudo systemctl daemon-reload
  • Restart the agent on your VM instance:
sudo service google-cloud-ops-agent restart
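
Editing the file by hand works, but repeated runs of a setup script should not add the line twice. The following is a minimal sketch of an idempotent edit; the function name is hypothetical, and it assumes the file already contains the [Manager] section header that systemd expects this setting to appear under.

```shell
# Sketch: append the DefaultEnvironment line to a systemd config file
# unless an identical line is already present.
# Arguments: config file path, credentials file path.
set_default_credentials_env() {
  local conf_file="$1" creds_path="$2" line
  line="DefaultEnvironment=\"GOOGLE_APPLICATION_CREDENTIALS=$creds_path\""

  # -x matches whole lines, -F treats the line as a fixed string.
  if [ -f "$conf_file" ] && grep -qxF -- "$line" "$conf_file"; then
    return 0
  fi
  printf '%s\n' "$line" >> "$conf_file"
}
```

The function needs write access to the file, so for /etc/systemd/system.conf it would have to run as root, followed by the daemon-reload and restart commands shown above.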

Configure the Ops Agent

Configuration Model

The Ops Agent uses a built-in default configuration. You can't modify it directly, but you can override it by creating a file that is merged with the built-in configuration whenever the agent restarts.

The configuration contains the following building blocks:

  • receivers: describes what the agent collects.
  • processors: describes how the agent can modify the collected information.
  • service: links receivers and processors together to create data flows known as pipelines. The service element contains a pipelines element, which can in turn contain multiple pipelines.

User-specified configuration

To override the built-in default configuration, add new configuration elements to the user configuration file. The user-specified configuration is merged with the built-in configuration whenever the agent restarts. Put the configuration for the Ops Agent in the following file:

  • For Linux
/etc/google-cloud-ops-agent/config.yaml
  • For Windows
\Program Files\Google\Cloud Operations\Ops Agent\config\config.yaml

Structure of logging receivers

Each receiver is identified by a unique RECEIVER_ID and must include a type element. The valid types are:

  • files: Collects logs by tailing files on disk.
  • fluent_forward: Collects logs that are sent through the Fluent Forward protocol over TCP.
  • syslog: Collects syslog messages sent over TCP or UDP.
  • tcp: Collects logs in JSON format by listening on a TCP port.
  • windows_event_log: Collects Windows event logs.
  • systemd_journald: Collects logs from the systemd-journald service.


The receivers structure looks like this:

receivers:
  RECEIVER_ID:
    type: files
    ...
  RECEIVER_ID_2:
    type: syslog
    ...

The available configuration options depend on the value of the type element:

  • files receivers:
    • include_paths: a list of filesystem paths to read by tailing each file. A wildcard (*) can be used in the paths.
    • exclude_paths: optional. A list of filesystem path patterns to exclude from the set matched by include_paths.
  • fluent_forward receivers:
    • listen_host: the IP address to listen on. The default value is 127.0.0.1.
    • listen_port: the port to listen on. The default value is 24224.
  • syslog receivers:
    • transport_protocol: supported values are tcp and udp. The default value is tcp.
    • listen_host: the IP address to listen on. The default value is 0.0.0.0.
    • listen_port: the port to listen on. The default value is 5140.
  • tcp receivers:
    • format: the log format. It is mandatory. The supported value is json.
    • listen_host: the IP address to listen on. The default value is 127.0.0.1.
    • listen_port: the port to listen on. The default value is 5170.

Logging processors

The processors element contains a set of processing directives, each identified by a PROCESSOR_ID. A processor describes how to manipulate the information collected by a receiver.

Each processor must have a unique identifier and must include a type element. The valid types are:

  • parse_json: Parses JSON-formatted structured logs.
  • parse_multiline: Parses multiline logs.
  • parse_regex: Parses text-formatted logs with regex patterns to turn them into JSON-formatted structured logs.
  • exclude_logs: Excludes logs that match specified rules.
  • modify_fields: Sets or transforms fields in log entries.


The processors structure looks like this:

processors:
  PROCESSOR_ID:
    type: parse_json
    ...
  PROCESSOR_ID_2:
    type: parse_regex
    ...

Logging pipelines

The pipelines element can contain multiple pipeline IDs and definitions. Each pipeline definition consists of:

  • receivers: required for new pipelines. A list of receiver IDs; their order is irrelevant. The pipeline collects data from all of the listed receivers.
  • processors: a list of processor IDs. Their order matters: each record is run through the processors in the listed order.
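
To see how receivers, processors, and pipelines fit together, here is a minimal sketch of a complete logging configuration, wrapped in a shell function that prints it. The function name, the IDs (app_files, app_json, app_pipeline), and the /var/log/myapp paths are all hypothetical. The top-level logging key is an assumption taken from the agent's configuration layout; the snippets above omit it.

```shell
# Sketch: print a user configuration that tails application log files,
# parses each line as JSON, and links both in one pipeline.
# All IDs and paths are placeholders.
logging_config_yaml() {
  cat <<'EOF'
logging:
  receivers:
    app_files:
      type: files
      include_paths:
        - /var/log/myapp/*.log
      exclude_paths:
        - /var/log/myapp/debug.log
  processors:
    app_json:
      type: parse_json
  service:
    pipelines:
      app_pipeline:
        receivers: [app_files]
        processors: [app_json]
EOF
}
```

The output could be written to the user configuration file, for example with logging_config_yaml | sudo tee /etc/google-cloud-ops-agent/config.yaml, after which the agent must be restarted for the merge to take effect.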

Metrics configurations

Metrics use the same configuration model as logging:

  • receivers: a list of receiver definitions. A receiver describes the source of the metrics and can be shared among multiple pipelines.
  • processors: a list of processor definitions. A processor describes how to modify the metrics collected by a receiver.
  • service: contains a pipelines section, which in turn contains a list of pipeline definitions. A pipeline connects a list of receivers and a list of processors to form the data flow.
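
As an illustration, the sketch below prints a metrics section that collects host metrics every 60 seconds and drops one metric family. The type names hostmetrics and exclude_metrics, the collection_interval and metrics_pattern options, and the top-level metrics key are not covered in this article and are given here as assumptions about the agent's configuration; the function name and the IDs are hypothetical.

```shell
# Sketch: print a metrics configuration with one receiver, one processor,
# and one pipeline that connects them. IDs and the pattern are placeholders.
metrics_config_yaml() {
  cat <<'EOF'
metrics:
  receivers:
    hostmetrics:
      type: hostmetrics
      collection_interval: 60s
  processors:
    metrics_filter:
      type: exclude_metrics
      metrics_pattern:
        - agent.googleapis.com/processes/*
  service:
    pipelines:
      default_pipeline:
        receivers: [hostmetrics]
        processors: [metrics_filter]
EOF
}
```

The printed section would sit in the same user configuration file as any logging settings, and takes effect after the agent restarts.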

Troubleshooting the Ops Agent

Agent diagnostics tool for Linux VMs

The agent diagnostics tool collects critical local debugging information from your Linux VMs for the Ops Agent, the legacy Logging agent, and the legacy Monitoring agent. The debugging information includes project info, VM info, agent configuration, agent logs, and agent service status: information that typically requires manual work to gather. The tool also checks whether the local VM environment meets the requirements for the agents to function properly.

Before filing a customer case for an agent on a Linux VM, run the agent diagnostics tool and attach the collected information to the case after redacting any sensitive information.

Use the following commands to download and run the agent diagnostics tool:

curl -sSO https://dl.google.com/cloudagents/diagnose-agents.sh
sudo bash diagnose-agents.sh

To locate the files with the collected information, follow the script's output. By default they are in the /var/tmp/google-agents directory, unless you customized the output directory.

Agent fails to install

The following are common errors that you may encounter when running the installation script:

  • If the operating system is not supported, the error message looks like this:
https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-el6-x86_64-all/repodata/repomd.xml: [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 404 Not Found"
Trying other mirrors.
To address this issue, please refer to the below wiki article
https://wiki.centos.org/yum-errors
If the above article doesn't help to resolve this issue, please use https://bugs.centos.org/.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: google-cloud-ops-agent. Please verify its path and try again
  • A Cloud Logging agent or Cloud Monitoring agent may already be installed on the VM and conflict with the new agent. In that case, the error message looks like this:
Error:
Problem: problem with installed package stackdriver-agent-6.0.5-1.el8.x86_64 - package google-cloud-ops-agent-0.1.0-1.el8.x86_64 conflicts with stackdriver-agent provided by stackdriver-agent-6.0.5-1.el8.x86_64

To fix this error:

  • Save any custom configuration for the Cloud Monitoring agent and the Cloud Logging agent.
  • Uninstall the old Cloud Monitoring agent and Cloud Logging agent.
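
Before uninstalling, it helps to confirm which legacy agents are actually present. The sketch below filters a list of installed package names; the function name is hypothetical. The package name stackdriver-agent appears in the error message above, while google-fluentd is assumed here to be the package name of the legacy Cloud Logging agent.

```shell
# Sketch: read installed package names (one per line) on stdin and print
# only those that belong to the legacy agents.
legacy_agents_in() {
  # "|| true" keeps the exit status zero when nothing matches.
  grep -xE 'stackdriver-agent|google-fluentd' || true
}
```

On an RPM-based system the input could come from rpm -qa --qf '%{NAME}\n', and on a Debian-based system from dpkg-query -W -f='${Package}\n'. An empty result means neither legacy package was found.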

Agent is installed but not running

Agent services are not running

If the agent service is not running as expected, you might see the following status:

$ sudo service google-cloud-ops-agent status
● google-cloud-ops-agent.service - Google Cloud Ops Agent
  Loaded: loaded (/lib/systemd/system/google-cloud-ops-agent.service; enabled; vendor preset: enabled)
  Active: inactive (dead) since Wed 2021-06-30 21:20:43 UTC; 6s ago

To fix this, use the following command:

sudo service google-cloud-ops-agent start


Conflict with currently installed agents

If the VM already has a Cloud Logging agent or Cloud Monitoring agent installed, its configuration can conflict with the Ops Agent's configuration. To fix this error, you have two options:

  • Disable the conflicting section of the Ops Agent configuration file.
  • Disable the conflicting Cloud Logging agent or Cloud Monitoring agent.


Agent is running, but data is not ingested

Use Metrics Explorer to query the agent uptime metric, and verify that the agent component google-cloud-ops-agent-metrics or google-cloud-ops-agent-logging is writing to the metric.

  • In the Google Cloud console, click Monitoring.
  • In the navigation pane, click Metrics Explorer.
  • Select the MQL tab.
  • Enter the following query, and then click Run:
fetch gce_instance
| metric 'agent.googleapis.com/agent/uptime'
| align rate(1m)
| every 1m


Is the agent sending logs to Cloud Logging?

The following steps require you to SSH into the VM. To check whether the logging module is running, use the following command:

sudo systemctl status google-cloud-ops-agent"*"


Check the logging module log.

Logging module logs can be found at /var/log/google-cloud-ops-agent/subagents/*.log for Linux and C:\ProgramData\Google\Cloud Operations\Ops Agent\log\logging-module.log for Windows. If there are no logs, the agent service is not running properly.

  • You might get a 403 error when writing to the Logging API. For example:
[2020/10/13 18:55:09] [ warn] [output:stackdriver:stackdriver.0] error
{
"error": {
  "code": 403,
  "message": "Cloud Logging API has not been used in project 147627806769 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.",
  "status": "PERMISSION_DENIED",
  "details": [
    {
      "@type": "type.googleapis.com/google.rpc.Help",
      "links": [
        {
          "description": "Google developers console API activation",
          "url": "https://console.developers.google.com/apis/api/logging.googleapis.com/overview?project=147627806769"
        }
      ]
    }
  ]
}
}

To fix this error, enable the Logging API and grant the Logs Writer role.

  • There might be a quota issue for the Logging API. To fix it, raise the quota or reduce the log throughput.
  • The following error might appear in the module log:
{"error":"invalid_request","error_description":"Service account not enabled on this instance"}

This error indicates that the agent was deployed with no service account.


Is the agent sending metrics to Cloud Monitoring?

Check the metrics module log.

The metrics module logs can be found in syslog. If there are no logs, the agent service is not running properly.

  • A PermissionDenied error might occur when writing to the Monitoring API. This occurs when the Ops Agent is not properly configured. To fix this error, enable the Monitoring API and grant the Monitoring Metric Writer role.
  • A ResourceExhausted error might occur when writing to the Monitoring API. This occurs when the project hits the limit of a Monitoring API quota. To fix this error, either raise the quota or reduce the metrics throughput.
  • The following error might appear in the module log:
{"error":"invalid_request","error_description":"Service account not enabled on this instance"}
This indicates that the agent was deployed with no service account.
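
The error cases described for both modules can be summarized as a small lookup. The following is a rough sketch, not part of the agent: the function name and the short labels it prints are hypothetical, and matching the upper-case status strings PERMISSION_DENIED and RESOURCE_EXHAUSTED in addition to the names used above is an assumption about how the APIs spell them in JSON responses.

```shell
# Sketch: map one module log line to the cause and fix described above.
suggest_fix() {
  case "$1" in
    *"Service account not enabled on this instance"*)
      echo "no-service-account: the agent was deployed with no service account" ;;
    *PERMISSION_DENIED* | *PermissionDenied*)
      echo "permission-denied: enable the API and grant the writer role" ;;
    *RESOURCE_EXHAUSTED* | *ResourceExhausted*)
      echo "quota: raise the quota or reduce the throughput" ;;
    *)
      echo "unknown" ;;
  esac
}
```

A log file could be scanned by calling suggest_fix on each line inside a while IFS= read -r line loop; lines that print unknown still need manual inspection.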

Frequently Asked Questions

Why is a service account preferred for authentication?

A service account is preferred for authentication because it is an account associated with a Google Cloud project rather than with a specific user.

What is a receiver in the configuration model?

A receiver is an element that describes what is collected by the agent.

What is the default value for listen port of syslog receivers?

The default value for listen port of syslog receivers is 5140.

Conclusion

In this article, we have discussed how to authorize, configure, and troubleshoot the Ops Agent.

