Last Updated: Mar 27, 2024

Cloud Composer


Introduction

Cloud Composer is a fully managed workflow orchestration service that allows you to build, schedule, monitor, and manage workflows that span clouds and on-premises data centers.

Cloud Composer is based on the popular Apache Airflow open source project and is programmed in Python.

Using Cloud Composer instead of a local Airflow instance gives you the best of Apache Airflow without the overhead of installing or managing it. Cloud Composer enables you to quickly create Airflow environments and use Airflow-native tools such as the powerful Airflow web interface and command-line tools, allowing you to concentrate on your workflows rather than your infrastructure.

DAGs, workflows, and tasks

A workflow in data analytics is a set of tasks for ingesting, transforming, analyzing, or utilizing data. Workflows in Airflow are built with DAGs, or "Directed Acyclic Graphs."


A DAG is a collection of tasks we want to schedule and run, organized so that their relationships and dependencies are reflected. DAGs are created using Python scripts, which use code to define the DAG structure (tasks and their dependencies).

A DAG task can represent almost anything—for example, one task could perform any of the following functions:

  • Data preparation for ingestion
  • API monitoring
  • Sending an email
  • Running a pipeline

A DAG is not concerned with the function of each constituent task; instead, its job is to ensure that each task runs at the right time, in the right order, and with the proper error handling.
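As a rough illustration, here is a minimal sketch of a DAG (assuming Airflow 2 import paths; the DAG id, task ids, schedule, and commands are made up for the example) that prepares data, runs a pipeline step, and sends a notification, with the dependencies expressed directly in Python:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def notify(**context):
    # Stand-in for a notification step, e.g. sending an email or a chat message.
    print(f"Pipeline finished for run {context['ds']}")


with DAG(
    dag_id="example_daily_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",       # run once per day
    catchup=False,
) as dag:
    # Task 1: prepare data for ingestion (here just a stand-in shell command).
    prepare_data = BashOperator(
        task_id="prepare_data",
        bash_command="echo 'preparing data'",
    )

    # Task 2: run the downstream pipeline step.
    run_pipeline = BashOperator(
        task_id="run_pipeline",
        bash_command="echo 'running pipeline'",
    )

    # Task 3: send a notification once the pipeline completes.
    send_notification = PythonOperator(
        task_id="send_notification",
        python_callable=notify,
    )

    # Dependencies: prepare_data runs first, then run_pipeline, then send_notification.
    prepare_data >> run_pipeline >> send_notification

Once a file like this is added to an environment, Airflow schedules the three tasks daily and always runs them in the declared order.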


Environments for Cloud Composer


To run workflows, you must first set up an environment. Because Airflow relies on numerous microservices to function, Cloud Composer sets up Google Cloud components to power your workflows. A Cloud Composer environment is made up of these components.

Environments are self-contained Google Kubernetes Engine-based Airflow deployments. They communicate with other Google Cloud services via Airflow connectors.

Configurations of the environment architecture

The following architecture configurations are possible in Cloud Composer 1 environments:

  1. Public IP architecture
  2. Private IP architecture
  3. Private IP with Domain restricted sharing (DRS) architecture

Each configuration modifies the architecture of environment resources slightly.

Cloud Composer Features

Airflow environments

A Cloud Composer environment is a wrapper around Apache Airflow. For each environment, Cloud Composer creates the following components:

  • GKE clusters: Airflow schedulers, workers, and Redis Queue run as GKE workloads on a single cluster and are in charge of processing and executing DAGs. Other Cloud Composer components hosted on the cluster include the Composer Agent and Airflow Monitoring, which help manage the Cloud Composer environment, collect logs to store in Cloud Logging, and collect metrics to upload to Cloud Monitoring.
     

  • Database: The Apache Airflow metadata is stored in the database.
     

  • Web server: It runs the Apache Airflow web interface, and Identity-Aware Proxy protects it. See Airflow Web Interface for more information.
     


  • Cloud Storage bucket: A Cloud Storage bucket is associated with the environment by Cloud Composer. The associated bucket holds the environment's DAGs, logs, custom plugins, and data. See Data Stored in Cloud Storage for more information on the Cloud Composer storage bucket.
     
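For example, one common way to deploy a DAG is to copy its Python file into the dags/ folder of this bucket. A minimal sketch using the google-cloud-storage client library is shown below; the bucket name and file name are placeholders, since the real bucket name is generated when the environment is created:

from google.cloud import storage

# The bucket name below is a placeholder; Cloud Composer generates the real
# name for each environment.
client = storage.Client()
bucket = client.bucket("us-central1-example-environment-bucket")

# DAG files copied into the dags/ folder are picked up automatically by the
# environment's Airflow scheduler.
blob = bucket.blob("dags/example_daily_pipeline.py")
blob.upload_from_filename("example_daily_pipeline.py")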

Airflow management

You can use the following Airflow-native tools to access and manage your Airflow environments:

  • Web interface: With the appropriate permissions, you can access the Airflow web interface via the Google Cloud console or a direct URL.
  • Command-line utilities: After installing the Google Cloud CLI, you can run Airflow CLI commands against Cloud Composer environments using gcloud composer environments run.
     

The Cloud Composer REST and RPC APIs allow you to access and manage your Airflow environments programmatically.

Configuration of Airflow

In general, the configurations provided by Cloud Composer for Apache Airflow are the same as those offered by a locally hosted Airflow deployment. Some Airflow configurations are preconfigured in Cloud Composer, and their configuration properties cannot be changed. Other settings can be specified when you create or update your environment.

Access Control

Security is managed at the Google Cloud project level through Identity and Access Management (IAM) roles. IAM roles determine which individual users can create or modify environments; anyone without the essential Cloud Composer IAM roles is unable to access any of your environments.

Monitoring and logging

Airflow logs associated with individual DAG tasks can be viewed in the Airflow web interface and in the logs folder in the environment's Cloud Storage bucket.

Cloud Composer now supports streaming logs.

Cloud Composer also provides audit logs for your Google Cloud projects, such as Admin Activity audit logs.
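As a small illustration of how task logs get there, anything a task writes through Python's standard logging module is captured in that task's Airflow log (assuming Airflow 2 import paths; the task id and callable below are hypothetical):

import logging

from airflow.operators.python import PythonOperator


def log_row_count(**context):
    # Messages written via the standard logging module appear in the task's
    # log in the Airflow web interface and in the logs folder of the
    # environment's Cloud Storage bucket.
    logging.info("Processing run %s", context["ds"])


check_rows = PythonOperator(
    task_id="check_rows",           # hypothetical task id
    python_callable=log_row_count,
    dag=dag,                        # assumes an existing DAG object named dag
)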

Security and networking

Cloud Composer deploys a Standard mode VPC-native Google Kubernetes Engine cluster by default.

Cloud Composer also supports the following features for added security and networking flexibility.

  1. Shared VPC
  2. VPC-native Cloud Composer environment
  3. Private IP Cloud Composer environment

Cloud Composer security


Cloud Composer includes several security features and compliances useful for enterprise businesses with stringent security requirements.

These three sections provide details on Cloud Composer's security features:

  1. Basic security features: Features that are available by default in Cloud Composer environments.
  2. Advanced security features: Features you can use to tailor Cloud Composer to your security needs.
  3. Compliance with standards: A list of standards with which Cloud Composer complies.

Basic security features

This section lists the security features enabled by default in each Cloud Composer environment.

At-rest encryption

In Google Cloud, Cloud Composer employs encryption at rest.

Data is stored in various services by Cloud Composer. The Airflow Metadata DB, for example, uses a Cloud SQL database, and DAGs are stored in Cloud Storage buckets.

Data is encrypted by default with encryption keys managed by Google.

If you prefer, Cloud Composer environments can be encrypted using customer-managed encryption keys.

Uniform bucket-level access

Uniform bucket-level access enables you to control access to your Cloud Storage resources uniformly. This mechanism also applies to the bucket in your environment that stores your DAGs and plugins.
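As an illustration only, you can check whether uniform bucket-level access is enabled on a bucket with the google-cloud-storage client library; the bucket name below is a placeholder:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("us-central1-example-environment-bucket")  # placeholder name

# With uniform bucket-level access enabled, object access is governed purely
# by IAM rather than by per-object ACLs.
ubla = bucket.iam_configuration.uniform_bucket_level_access_enabled
print("Uniform bucket-level access enabled:", ubla)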

Private IP mode for Cloud Composer environments.

Cloud Composer environments can be created in the Private IP networking configuration.

Nodes in your environment's cluster do not have external IP addresses and do not communicate via the public internet when in Private IP mode.

Shielded VMs are used in your environment's cluster.

Shielded VMs are virtual machines (VMs) on Google Cloud that have been hardened with a set of security controls to help defend against rootkits and bootkits.

Shielded VMs run the nodes of Cloud Composer 1 environments that were created using GKE versions 1.18 and later.

Advanced security features

This section contains a list of advanced security features for Cloud Composer environments.

Customer-managed encryption keys (CMEK)

Cloud Composer supports customer-managed encryption keys (CMEK). CMEK gives you more control over the keys used to encrypt data at rest within a Google Cloud project.

CMEK can be used with Cloud Composer to encrypt and decrypt data produced by a Cloud Composer environment.

Support for VPC Service Controls (VPC SC)

VPC Service Controls is a mechanism for reducing the risk of data exfiltration.

Cloud Composer can be configured as a secure service within a VPC Service Controls perimeter. Cloud Composer's underlying resources are all configured to support and adhere to the VPC Service Controls architecture. In a VPC SC perimeter, only Private IP environments can be created.

When you deploy Cloud Composer environments with VPC Service Controls, you get:

  • Reduced likelihood of data exfiltration.
  • Protection against data exposure caused by incorrectly configured access controls.
  • Reduced risk of malicious users copying data to unauthorized Google Cloud resources or external attackers gaining access to Google Cloud resources via the internet.
     

Web server network access control lists (ACLs)

Airflow web servers in Cloud Composer are always configured with an externally accessible IP address and network access control lists (ACLs). You can specify which IP addresses are allowed to access the Airflow UI. Cloud Composer supports both IPv4 and IPv6 addresses.

Web server access restrictions can be set in the console, gcloud, API, and Terraform.

Secret Manager serves as a repository for sensitive configuration data.

Airflow can be configured in Cloud Composer to use Secret Manager as a backend for storing Airflow connections and variables.

DAG developers can read the variables and connections stored in Secret Manager directly from their DAG code.
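The sketch below shows what that looks like in DAG code (assuming Airflow 2 import paths; the variable name and connection id are hypothetical). With Google Cloud's Secret Manager backend enabled, these lookups are served from secrets named with the default prefixes, such as airflow-variables-my_api_key and airflow-connections-my_db:

from airflow.hooks.base import BaseHook
from airflow.models import Variable

# With the Secret Manager backend enabled, these calls are resolved from
# Secret Manager first, and otherwise fall back to Airflow's other sources
# (environment variables and the metadata database).
api_key = Variable.get("my_api_key")          # hypothetical variable name
db_conn = BaseHook.get_connection("my_db")    # hypothetical connection id

print(db_conn.host, db_conn.login)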

Authentication methods 


The following authentication methods are supported by Cloud Composer.

Service accounts

Service accounts are recommended for almost all use cases, whether you are developing locally or running a production application.
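As a rough sketch, code running inside an environment (or locally, with GOOGLE_APPLICATION_CREDENTIALS pointing at a service account key) can pick up that service account through Application Default Credentials; the scope and the commented-out URL below are placeholders:

import google.auth
from google.auth.transport.requests import AuthorizedSession

# Application Default Credentials resolve to the environment's service
# account when running inside Cloud Composer, or to the account configured
# locally during development.
credentials, project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)

# An authorized session can then call Google Cloud REST APIs as that
# service account.
session = AuthorizedSession(credentials)
# response = session.get("https://example.googleapis.com/v1/resource")  # placeholder URL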

User Accounts

You can authenticate users directly to your application when the application needs to access resources on behalf of an end user.

When making a method call in your application that uses end-user authentication, you must specify OAuth scopes. Per-method OAuth scopes can be found in the Cloud Composer Reference.

Access control

Roles limit the ability of an authenticated identity to access resources. When developing a production application, only grant an identity the permissions required to interact with Google Cloud APIs, features, or resources.

Frequently Asked Questions

Cloud Composer runs which version of Apache Airflow?

Airflow 1 and Airflow 2 are both supported by Cloud Composer.

Cloud Composer environments are constructed from Cloud Composer images. When creating an environment, you can choose an image with a specific Airflow version.

You have complete control over your environment's Apache Airflow version. You can upgrade your environment to a newer version of the Cloud Composer image. Each Cloud Composer release includes support for multiple Apache Airflow versions.

Is it possible to use our database as the Airflow Metadata DB?

For the Airflow Metadata DB, Cloud Composer employs a managed database service. A user-supplied database cannot be used as the Airflow Metadata DB.

Is it possible to use the native Airflow UI and CLI?

You can access your environment's Apache Airflow web interface. Each of your environments has its own Airflow UI.

You use gcloud commands to run Airflow CLI commands in your environments. See Airflow command-line interface for more information on running Airflow CLI commands in Cloud Composer environments.

Conclusion

In this blog, we have extensively discussed the concept of Cloud Composer. We started by introducing Cloud Composer, DAGs, workflows, and tasks; discussed Cloud Composer environments, features, and security; and concluded with Cloud Composer authentication.

Recommended Reading: 

Clean Architecture

We hope this blog has helped you enhance your knowledge of Cloud Composer. You can also explore related articles such as Hibernate, Hibernate Map Mapping, and Hibernate Configuration.

You can also refer to our Guided Path on Coding Ninjas Studio to upskill yourself in Data Structures and Algorithms, Competitive Programming, System Design, and many more! You may also check out the mock test series and participate in the contests hosted on Coding Ninjas Studio! For placement preparations, you must look at the problems, interview experiences, and interview bundle.

Nevertheless, you may consider our paid courses to give your career an edge over others!

Happy Coding!

Topics covered
1. Introduction
2. DAGs, workflows, and tasks
3. Environments for Cloud Composer
   3.1. Configurations of the environment architecture
4. Cloud Composer Features
   4.1. Airflow environments
   4.2. Airflow management
   4.3. Configuration of Airflow
   4.4. Access Control
   4.5. Monitoring and logging
   4.6. Security and networking
5. Cloud Composer security
   5.1. Basic security features
      5.1.1. At-rest encryption
      5.1.2. Uniform bucket-level access
      5.1.3. Private IP mode for Cloud Composer environments
      5.1.4. Shielded VMs are used in your environment's cluster
   5.2. Advanced security features
      5.2.1. Customer-managed encryption keys (CMEK)
      5.2.2. Support for VPC Service Controls (VPC SC)
      5.2.3. Web server network access control lists (ACLs)
      5.2.4. Secret Manager serves as a repository for sensitive configuration data
6. Authentication methods
   6.1. Service accounts
   6.2. User Accounts
   6.3. Access control
7. Frequently Asked Questions
   7.1. Cloud Composer runs which version of Apache Airflow?
   7.2. Is it possible to use our database as the Airflow Metadata DB?
   7.3. Is it possible to use the native Airflow UI and CLI?
8. Conclusion