Table of contents
1. Introduction
2. Profiler Interface
3. Profiling Applications
3.1. Java Applications
3.2. External Applications
4. Measure App Performance
5. Flame graphs
5.1. Focusing Flame graphs
5.2. Comparing Profiles
5.3. View historical trends
6. Frequently Asked Questions
6.1. What are some of the errors that occur while using a Python profiling agent?
6.2. What roles and permissions can be set to access the profiling activities in Google Cloud?
6.3. How to set the range of time for which the profiling data is displayed?
7. Conclusion
Last Updated: Mar 27, 2024

Cloud Profiler

Author Yashesvinee V
Introduction

Google Cloud Operations provides various managed services for applications running on Google Cloud and beyond. These include integrated monitoring, logging, and tracing. Cloud Profiler is a statistical profiler that continuously gathers CPU usage and memory-allocation information from applications in production. The collected data is attributed to the application's source code, helping users identify the parts that consume the most resources. Cloud Profiler is supported for Compute Engine, App Engine, GKE, and applications running on-premises.

Profiler Interface

The charts created by the Profiler UI can be used to help identify the performance bottlenecks. The Profiler library samples performance traits and produces reports, which can then be analysed with the Profiler UI.

After an agent collects some profiling data, the Profiler interface shows how the statistics for CPU and memory usage correlate with areas of an application. The Cloud Profiler interface provides a flame graph view and a history view as its graphical elements. The flame graph shows relative resource usage, while the history view displays the daily average resource usage for selected functions. Profile data is retained for 30 days, so users can analyse performance data for any period within the last 30 days. Profiles can be downloaded for long-term storage.
 

Profiler Interface

The Profiler interface displays data mainly in the form of flame graphs. It also includes buttons for several purposes. The list button shows the different functions and their metric consumption. The history button displays the historical trends of the past 30 days. The filter button controls the visualisation of the selected profiles. The download button saves the displayed profile to the local system, and further options link to the official documentation for help.

Profiling Applications

Continuous profiling of applications in production is an effective way to find where resources such as CPU time and memory are consumed. Cloud Profiler support depends on the language in which a program is written. A profiling agent instruments the application to capture profile data. The agent comes as a library that is attached to the application at runtime, and it is installed on the virtual machines where the application runs. The data it collects can be viewed in the Profiler interface in the Google Cloud console.

Java Applications

The profile types supported for Java are CPU time, Heap and Wall time. 

Supported language versions are OpenJDK and Oracle JDK for Java 8, 11 or later.

Supported environments include Compute Engine, GKE, App Engine Flexible and Standard Environment, Dataproc and other external environments.

The Profiler API must be enabled before a profiling agent can be used. We can enable this API by running the following command in the gcloud CLI.

gcloud services enable cloudprofiler.googleapis.com

The profiling agent is installed according to the application's environment. To get a list of all the agent versions, run the following command.

gsutil ls gs://cloud-profiler/java/cloud-profiler-*

The next step is to load the agent. Run the Java application and specify the agent-configuration options. To configure the profiling agent, add an -agentpath flag when starting the application.

-agentpath:INSTALL_DIR/profiler_java_agent.so=OPTION1,OPTION2,OPTION3

We specify a service-name argument and an optional service-version argument to configure it. The service name allows the Profiler to collect profiling data for all replicas of that service. 
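As a rough illustration of how these options fit together, the sketch below assembles the -agentpath argument in Python. It assumes the service name and version are passed as the agent options -cprof_service and -cprof_service_version; the install directory, service name, version and app.jar are hypothetical placeholders, and build_agentpath is a helper invented for this example.

```python
from typing import Optional


def build_agentpath(install_dir: str, service: str,
                    service_version: Optional[str] = None) -> str:
    """Return a JVM -agentpath argument for the Profiler Java agent.

    Assumes the agent options are named -cprof_service and
    -cprof_service_version (not stated in the article itself).
    """
    if not service:
        raise ValueError("a service name is required")
    options = [f"-cprof_service={service}"]
    if service_version:
        # The version is optional, so it is added only when given.
        options.append(f"-cprof_service_version={service_version}")
    return f"-agentpath:{install_dir}/profiler_java_agent.so=" + ",".join(options)


# Hypothetical install directory, service name, version and jar file.
agent_arg = build_agentpath("/opt/cprof", "checkout-service", "1.0.0")
print(" ".join(["java", agent_arg, "-jar", "app.jar"]))
```

Keeping the option list in one place makes it harder to start two replicas of the same service with different service names by accident.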

External Applications

Although the application and the Cloud Profiler agent run outside Google Cloud, the Cloud Profiler interface is required to analyse the profiling data. This requires a Google Cloud Project.

Step 1: Create a Google Cloud project and enable the Profiler API.

Step 2: Obtain the credentials that the profiling agent will use when uploading profiles.

Step 3: Configure and allow the agent to use the credentials and the ID of the Google Cloud project.

We can use a service account to authorise the agent. The account must have the roles/cloudprofiler.agent role to write profiling data. Alternatively, to use Application Default Credentials, run the following command.

gcloud auth application-default login

Link the agent to the Google Cloud project. The profiling agent must be told the ID of the Google Cloud project to which profiles are uploaded. This requires an additional Java agent configuration option, cprof_project_id, on the Java invocation.

-cprof_project_id=PROJECT_ID
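The steps above can be sketched in Python as a helper that prepares the environment and command line for an application running outside Google Cloud. This is only a sketch under assumptions: it supposes a service account key file is exposed through the GOOGLE_APPLICATION_CREDENTIALS environment variable, and that -cprof_project_id is passed as one of the comma-separated agent options alongside -cprof_service. The project ID, key path, service name and app.jar are hypothetical.

```python
import os
from typing import Dict, List, Optional, Tuple


def external_launch(project_id: str, service: str, key_file: str,
                    install_dir: str = "INSTALL_DIR",
                    base_env: Optional[Dict[str, str]] = None
                    ) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment, argv) for starting a profiled Java application."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed way of handing the service account key to the agent.
    env["GOOGLE_APPLICATION_CREDENTIALS"] = key_file
    options = ",".join([
        f"-cprof_service={service}",
        f"-cprof_project_id={project_id}",
    ])
    argv = ["java",
            f"-agentpath:{install_dir}/profiler_java_agent.so={options}",
            "-jar", "app.jar"]
    return env, argv


# Hypothetical values; pass env and argv to subprocess.run to start the app.
env, argv = external_launch("my-project", "checkout-service",
                            "/keys/profiler-agent.json", base_env={})
print(" ".join(argv))
```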

Measure App Performance

Measuring the performance of production systems is difficult. Test environments usually fail to replicate the pressures on a production system. Using Cloud Profiler greatly reduces the complexity of this task.

Let’s consider a Go program that creates a CPU-intensive workload to provide data to the profiler.

git clone https://github.com/GoogleCloudPlatform/golang-samples.git
cd golang-samples/profiler/profiler_quickstart

After changing to the directory that contains the code, start the program and let it run.

go run main.go

The program is designed to load the CPU as it runs. It is configured to use the Profiler to collect profiling data and periodically save it. After a few minutes, we receive an indication that the profiler has started. In the Profiler interface, we can then see an array of controls and a flame graph for exploring the profiling data.
 

Profiler

  • The grey frame represents the entire executable. This accounts for 100% of the consumed resources.
  • The green main frame represents the Go runtime.
  • The orange main frame is the main routine of the sample program.
  • The orange busyloop frame is called from the sample's main.
  • The orange main.load frame is called from the sample's main.

Flame graphs

Flame graphs help visualise hierarchical data and the stack traces of profiled software so that the most frequent code paths can be identified quickly and accurately. Each rectangle represents a stack frame. The wider a frame is, the more often it was present in the sampled stacks. Flame charts, by contrast, put the passage of time on the x-axis, so time-based patterns are studied with flame charts rather than flame graphs. Flame graphs use screen space efficiently by presenting a large amount of information in a compact and readable format.

The following diagram shows how a tree can be converted into a flame graph. Each tree node is represented with a frame.

Conversion of a tree to a flame graph

The frame width is a relative measure of that function's total CPU time. ‘Y’ represents the total CPU usage of function f1. The empty space below a frame is the relative measure of the self CPU time for the function in the frame. ‘X’ represents the self CPU usage of f1.

frames in a flame graph
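The relationship between total and self usage can be made concrete with a small Python sketch. It aggregates a handful of made-up stack samples into per-frame totals (the frame width, 'Y' above) and self costs (the empty space below the frame, 'X' above). The sample data, the function names other than f1, and the frame_costs helper are invented for illustration and are not how Cloud Profiler itself stores profiles.

```python
from collections import defaultdict


def frame_costs(samples):
    """Aggregate stack samples into {call path: (total, self)}.

    Each sample is (stack, cost), where stack is a tuple of function
    names ordered from the root to the function that was executing.
    """
    total = defaultdict(int)
    self_cost = defaultdict(int)
    for stack, cost in samples:
        # Every frame on the stack gets the cost added to its total.
        for depth in range(1, len(stack) + 1):
            total[stack[:depth]] += cost
        # Only the executing (last) frame gets it as self cost.
        self_cost[stack] += cost
    return {path: (total[path], self_cost[path]) for path in total}


# Made-up CPU samples in milliseconds.
samples = [
    (("main", "f1"), 2),
    (("main", "f1", "f2"), 3),
    (("main", "f1", "f3"), 1),
    (("main", "f4"), 4),
]
costs = frame_costs(samples)
root_total = costs[("main",)][0]
for path, (total, own) in sorted(costs.items()):
    width = 100 * total / root_total
    print(f"{' > '.join(path):<16} width={width:5.1f}%  total={total}  self={own}")
```

In this data f1 has a total of 6 ms and a self cost of 2 ms, so its frame would span 60% of the root frame and the gap beneath it would correspond to a third of its own width.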

Focusing Flame graphs

The focus filter in the Profiler interface selects a single function, and the flame graph then displays only the code paths that pass through that function. A focused graph helps analyse the aggregate resource consumption of a function that is called from multiple places, as well as the proportion of that consumption attributable to each caller. The focus filter effectively builds two flame graphs for the function and joins them together. For example, with a function named Sort as the focus, Sort appears as a single highlighted, full-width frame. The half of the graph below the Sort frame is a standard flame graph that starts at Sort and shows all of its callees. The half above the Sort frame shows the callers of Sort, with their other callees hidden.

Selecting a frame in a focused graph redraws the flame graph displaying the frame's call stack in more detail. Users can set focus filters by using the graph, the focus list or the filter bar present in the interface. The focus filter can be removed by clicking Close on the filter.
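A minimal Python sketch of what a focus filter computes is shown below: it keeps only the samples whose stack passes through the focus function, then sums the cost overall, per direct caller and per direct callee. The stacks, costs and the focus helper are hypothetical, and the real interface works on complete call trees rather than on direct callers and callees only.

```python
from collections import defaultdict


def focus(samples, func):
    """Return (total, callers, callees) for every stack containing func.

    For recursive stacks only the first occurrence of func is used.
    """
    total = 0
    callers = defaultdict(int)
    callees = defaultdict(int)
    for stack, cost in samples:
        if func not in stack:
            continue  # this code path does not pass through the focus
        i = stack.index(func)
        total += cost
        callers[stack[i - 1] if i > 0 else "<root>"] += cost
        if i + 1 < len(stack):
            callees[stack[i + 1]] += cost
    return total, dict(callers), dict(callees)


# Made-up samples: Sort is reached from two different handlers.
samples = [
    (("main", "handleA", "Sort", "compare"), 30),
    (("main", "handleA", "Sort"), 10),
    (("main", "handleB", "Sort", "swap"), 20),
    (("main", "handleB", "parse"), 40),
]
total, callers, callees = focus(samples, "Sort")
print(total, callers, callees)
```

Here Sort accounts for 60 units in total, of which 40 come through handleA and 20 through handleB, which is the per-caller proportion a focused graph makes visible.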

Comparing Profiles

Cloud Profiler allows users to visually compare two profiles of the same type from the same service in a project. Profiles may differ by ending times, zones, service versions or weight.

To set up a comparison, we specify the parameters for an original profile and a compared profile. The comparison type is set from the Compare to menu. Fields like Timespan, End time, Profile type and Service apply to the profiles set for comparison.

A comparison graph differs from the standard flame graph in terms of colours, block size and metric information. The colours in a comparison graph represent the difference between the total metric consumption of a function in the original profile and in the compared profile. The larger the difference in consumption between the two profiles, the more saturated the colour. The size of the function blocks indicates the relative average consumption of the metric being analysed. We can turn off comparison mode by setting Compare to to None.
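The idea behind the colouring can be sketched in a few lines of Python. The example computes the per-function difference in total consumption between two profiles and scales its magnitude to a value between 0 and 1. The linear scaling, the function names and the numbers are assumptions made for illustration; the actual colour mapping used by Cloud Profiler is not reproduced here.

```python
def compare_profiles(original, compared):
    """Return {function: compared total - original total}."""
    names = sorted(set(original) | set(compared))
    return {n: compared.get(n, 0) - original.get(n, 0) for n in names}


def saturation(diffs):
    """Scale each |difference| to 0..1; larger changes are more saturated."""
    biggest = max((abs(d) for d in diffs.values()), default=0)
    return {n: (abs(d) / biggest if biggest else 0.0)
            for n, d in diffs.items()}


# Made-up total CPU time per function in two profiles.
original = {"main": 100, "load": 60, "parse": 40}
compared = {"main": 120, "load": 90, "parse": 30}

diffs = compare_profiles(original, compared)
levels = saturation(diffs)
for name, diff in diffs.items():
    trend = "more" if diff > 0 else "less" if diff < 0 else "same"
    print(f"{name:<6} {diff:+4d} ({trend}), saturation {levels[name]:.2f}")
```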

View historical trends

The Cloud Profiler’s history view displays data for the most recent 30 days. Each line in a history chart shows a function's resource usage history. The Value type menu can display profile data as a percentage of the resource usage for all functions or as the absolute value in the metric's units. The Show up to menu allows users to configure the maximum number of functions to display. By default, it is set to 5 functions.

The chart title indicates if the chart shows self-usage or the total usage. It also identifies the resource being displayed. The chart legend lists the names of the functions on display.
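The effect of the Value type and Show up to menus can be approximated with the following Python sketch. It assumes each day's profile is a mapping from function name to self usage, so that the day's total is the sum over all functions; the dates, function names, numbers and the history_series helper are hypothetical.

```python
def history_series(daily, value_type="percent", show_up_to=5):
    """Return {function: [value per day]} for the most expensive functions.

    daily maps a day to {function: self usage}. With value_type
    "percent" each value is a share of that day's total; any other
    value_type leaves the absolute numbers unchanged.
    """
    days = sorted(daily)
    functions = {f for usage in daily.values() for f in usage}
    series = {}
    for f in functions:
        values = []
        for day in days:
            usage = daily[day]
            value = usage.get(f, 0)
            if value_type == "percent":
                day_total = sum(usage.values())
                value = 100 * value / day_total if day_total else 0.0
            values.append(value)
        series[f] = values
    # Rank by average value, breaking ties by name, and keep the top N.
    ranked = sorted(series, key=lambda f: (-sum(series[f]) / len(days), f))
    return {f: series[f] for f in ranked[:show_up_to]}


# Made-up daily self CPU usage per function.
daily = {
    "2024-03-25": {"load": 60, "parse": 40},
    "2024-03-26": {"load": 75, "parse": 25},
}
print(history_series(daily))
print(history_series(daily, value_type="absolute", show_up_to=1))
```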

Frequently Asked Questions

What are some of the errors that occur while using a Python profiling agent?

The start function raises a NotImplementedError exception when the application runs in a non-Linux environment. It raises a ValueError exception when its arguments are invalid or when required values cannot be determined.

What roles and permissions can be set to access the profiling activities in Google Cloud?

Identity and Access Management (IAM) provides separate permissions to create, list and modify profiles. Two roles can be assigned to users, groups and service accounts: roles/cloudprofiler.agent for a Profiler agent and roles/cloudprofiler.user for a Profiler user.

How to set the range of time for which the profiling data is displayed?

The Timespan menu, the Now button, and the End time menu can be used to assign a time range. By default, Timespan is set to seven days, and End time contains the time when Profiler started and cannot be modified.

Conclusion

This blog discusses Cloud Profiler on GCP in detail. It describes the Profiler interface along with flame graphs. It also explains how to profile applications and measure app performance using Cloud Profiler.
