Code360 powered by Coding Ninjas X Naukri.com
Last Updated: Mar 27, 2024

Overview of Cloud GPU

Introduction

Compute Engine provides graphics processing units (GPUs) that can be added to virtual machine (VM) instances. GPUs can accelerate specific workloads on VMs, such as machine learning and data processing. Compute Engine provides NVIDIA GPUs in passthrough mode, so the VMs have direct control over the GPUs. If the user has a graphics-intensive workload such as 3D visualization or 3D rendering, NVIDIA RTX Virtual Workstations can be used.

NVIDIA RTX Virtual Workstations for Graphics Workloads

If the user has a graphics-intensive workload, such as 3D visualization, the user can create a virtual workstation that uses NVIDIA RTX Virtual Workstation. When the user creates a virtual workstation, an NVIDIA RTX Virtual Workstation license is automatically added to the VM.

The following NVIDIA RTX Virtual Workstation models are available, listed with their accelerator type names:

NVIDIA T4 Virtual Workstation: nvidia-tesla-t4-vws

NVIDIA P100 Virtual Workstation: nvidia-tesla-p100-vws

NVIDIA P4 Virtual Workstation: nvidia-tesla-p4-vws
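
As a rough illustration of how one of these model names might be referenced, the sketch below builds the accelerator portion of a VM request body as a plain Python dictionary. The field names `guestAccelerators`-style entries (`acceleratorType`, `acceleratorCount`) and the `zones/<zone>/acceleratorTypes/<name>` path are assumed to follow the Compute Engine REST API and should be checked against the current reference. The project ID `my-project` and the helper name `guest_accelerators` are hypothetical.

```python
# Virtual workstation accelerator type names taken from the list above.
VWS_MODELS = {
    "T4": "nvidia-tesla-t4-vws",
    "P100": "nvidia-tesla-p100-vws",
    "P4": "nvidia-tesla-p4-vws",
}


def guest_accelerators(project: str, zone: str, model: str, count: int = 1) -> list:
    """Build the accelerator section of a VM request body (assumed field names)."""
    if model not in VWS_MODELS:
        raise ValueError(f"unknown virtual workstation model: {model!r}")
    if count < 1:
        raise ValueError("count must be at least 1")
    # Accelerator types are referenced per zone, so the zone is part of the path.
    accelerator_type = (
        f"projects/{project}/zones/{zone}/acceleratorTypes/{VWS_MODELS[model]}"
    )
    return [{"acceleratorType": accelerator_type, "acceleratorCount": count}]


# "my-project" is a hypothetical project ID used only for illustration.
print(guest_accelerators("my-project", "us-west1-a", "T4"))
```

The zone appears in the path because the accelerator is attached to a VM that lives in a specific zone.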

Regions and Zones

Compute Engine resources are hosted in multiple locations across the world. These locations are composed of regions and zones. A region is a specific geographical location where the user can host resources. Regions have three or more zones. For example, us-west1 is a region on the west coast of the United States that has three zones: us-west1-a, us-west1-b, and us-west1-c.

Zonal resources are resources that live in a zone, such as virtual machine instances or zonal persistent disks. Other resources, such as static external IP addresses, are regional. Regional resources can be used by any resource in that region, regardless of zone, but zonal resources can only be used by other resources in the same zone.

Putting resources in different zones in a region reduces the risk of an infrastructure outage affecting all the resources at the same time. Putting resources in different regions offers an even higher degree of failure independence. This helps the user design more robust systems with resources spread across different failure domains.

Zones and clusters

Compute Engine implements a layer of abstraction between zones and the physical clusters where the zones are hosted. A cluster represents a distinct physical infrastructure that is housed in a data center. Each zone is hosted in one or more clusters, and Compute Engine independently maps zones to clusters for each organization.

Decoupling zones from clusters provides a number of benefits to the user and to Compute Engine:

  • It gives Compute Engine the ability to ensure that resources are balanced across the clusters in a region.
  • It keeps the list of zones that the user chooses from manageable as Compute Engine continues to grow its regions over time by adding more clusters.


Compute Engine makes sure that all the projects in an organization have a consistent zone-to-cluster mapping. For organizations whose projects use VPC Network Peering or private service access to share networks or services with other organizations, Compute Engine tries to keep the mapping consistent across those organizations as well.

Choosing a region and zone

The user chooses which region or zone hosts the resources, which controls where data is stored and used. Choosing a region and zone is important for the reasons below.

Handling failures

Resources should be distributed across multiple zones and regions so that they can tolerate outages. Google designs zones to minimize the risk of correlated failures caused by physical infrastructure. If a zone becomes unavailable, the user can transfer traffic to another zone within the same region and keep the services running.
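
The failover idea can be sketched in a few lines of Python. This is a minimal illustration, not a Compute Engine API: the function name `pick_failover_zone` and the set of healthy zones are hypothetical, and the region is derived from the zone name by dropping the final `-<letter>` suffix.

```python
from typing import Iterable, Optional


def region_of(zone: str) -> str:
    """Return the region part of a zone name such as 'us-west1-a'."""
    return zone.rpartition("-")[0]


def pick_failover_zone(failed_zone: str, healthy_zones: Iterable[str]) -> Optional[str]:
    """Pick another healthy zone in the same region, or None if there is none."""
    region = region_of(failed_zone)
    # Sort so that the choice is deterministic for a given set of zones.
    for zone in sorted(healthy_zones):
        if zone != failed_zone and region_of(zone) == region:
            return zone
    return None


healthy = {"us-west1-b", "us-west1-c", "us-east1-b"}
print(pick_failover_zone("us-west1-a", healthy))
```

If no zone in the same region is healthy, the function returns `None`, which is the point at which a multi-region design would route traffic to another region.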

Decreased network latency

To decrease network latency, the user should choose a region or zone that is close to the point of service.

Identifying a region or zone

As mentioned above, each region in Compute Engine has multiple zones. A zone name has two parts: the first part identifies the region, and the second part identifies the zone within that region.

  • Region
    Regions are collections of zones. Each zone has high-bandwidth, low-latency network connections to other zones in the same region. Google recommends deploying applications across multiple zones and regions to build fault-tolerant applications that have high availability. This also helps protect against unexpected component failures, up to and including the failure of a single zone or region.
  • Zone
    A zone is a deployment area within a region. The fully qualified name of a zone has the form <region>-<zone>.
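
The naming scheme can be illustrated with a small parser. This is a minimal sketch: the `ZoneName` type and `parse_zone` function are hypothetical helpers, and the code only splits on the last hyphen without checking that the region or zone actually exists.

```python
from typing import NamedTuple


class ZoneName(NamedTuple):
    region: str       # for example "us-west1"
    zone_suffix: str  # for example "a"


def parse_zone(zone: str) -> ZoneName:
    """Split a fully qualified zone name of the form <region>-<zone>."""
    region, separator, suffix = zone.rpartition("-")
    if not separator or not region or not suffix:
        raise ValueError(f"not a <region>-<zone> name: {zone!r}")
    return ZoneName(region, suffix)


parsed = parse_zone("us-west1-a")
print(parsed.region, parsed.zone_suffix)
```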

Transparent maintenance

Google maintains its infrastructure as quickly and efficiently as possible by patching systems with the latest software, performing routine tests, and carrying out preventative maintenance. By default, all maintenance events are transparent to applications and workloads. Google uses a combination of data center innovations and operational best practices to move running virtual machine instances out of the way of maintenance that is being performed.

All virtual machines are set to live migrate by default, but the user can instead set a virtual machine to stop and reboot. The two options differ as follows.

  • Live migrate
    Compute Engine automatically migrates the running instance. The instance remains online throughout the migration, although guest performance is impacted to some degree.
  • Stop and reboot
    Compute Engine automatically sends a signal to shut down the instance, waits a short time for it to shut down cleanly, and then restarts it away from the maintenance event.
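
The two options could be expressed as instance scheduling settings along the following lines. This is a hedged sketch: the field names `onHostMaintenance` and `automaticRestart` and the values `MIGRATE` and `TERMINATE` are assumed to match the Compute Engine API's scheduling settings and should be verified against the current reference, while the option labels and the function name `scheduling_for` are hypothetical.

```python
def scheduling_for(option: str) -> dict:
    """Return scheduling settings for one of the two maintenance options."""
    policies = {
        # Keep the instance online and move it to another host.
        "live-migrate": "MIGRATE",
        # Shut the instance down, then restart it away from the maintenance event.
        "stop-and-reboot": "TERMINATE",
    }
    try:
        on_host_maintenance = policies[option]
    except KeyError:
        raise ValueError(f"unknown maintenance option: {option!r}") from None
    # automaticRestart asks for the instance to be started again after it stops.
    return {"onHostMaintenance": on_host_maintenance, "automaticRestart": True}


print(scheduling_for("stop-and-reboot"))
```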

Quotas

Resources like static IPs, images, firewall rules, and VPC networks have predefined project-wide quota limits and per-region quota limits. When a resource is created, it counts toward the project-wide quota or the per-region quota, whichever applies. If a quota limit is exceeded, the user cannot add more resources of the same type in that project.
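
The quota behaviour can be modelled with a small sketch. The `Quota` class, the `try_allocate` function, and the limits used here are hypothetical illustrations of the idea, not real Compute Engine quota values or APIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Quota:
    limit: int
    usage: int = 0

    def has_room(self, count: int) -> bool:
        return self.usage + count <= self.limit


def try_allocate(count: int, quotas: List[Quota]) -> bool:
    """Count a new resource against every applicable quota, or refuse it."""
    if not all(quota.has_room(count) for quota in quotas):
        return False  # at least one limit would be exceeded
    for quota in quotas:
        quota.usage += count
    return True


# Hypothetical limits: 3 addresses project-wide, 2 in a single region.
project_wide = Quota(limit=3)
per_region = Quota(limit=2)
print(try_allocate(2, [project_wide, per_region]))  # fits within both limits
print(try_allocate(1, [project_wide, per_region]))  # regional limit already reached
```

Checking every quota before updating any of them keeps the counters consistent when a request is refused.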

Global, regional, and zonal resources

Google Cloud resources are hosted in multiple locations across the world. These locations are composed of regions, with zones within those regions. Putting resources in different zones in a region isolates them from many types of infrastructure, hardware, and software failures, and putting resources in different regions provides an even higher degree of failure independence. The scope of a resource indicates how accessible it is to other resources. All resources, whether global, regional, or zonal, must be unique.

Global resources 

Global resources are accessible by any resource in any zone within the same project. When creating a global resource, there is no need to provide a scope specification. The global resources include:

Addresses

The global address list is a collection of the global static external IP addresses that the user has reserved for their project. Global static external IP addresses are a global resource and are used for global load balancers.

Images

Images can be used by any instance or disk resource in the same project as the image. Google provides a set of preconfigured images that the user can use to boot an instance. The user can customize one of these images or build their own.

Snapshots

Persistent disk snapshots are available to all disks within the same project as the snapshot.

Instance templates

An instance template can be used to create VM instances and managed instance groups. It is a global resource. However, the user can specify some zonal resources in an instance template, which restricts the use of that template to the location of the specified zonal resource.

Cloud Interconnects

A Cloud Interconnect is a highly available connection from the user's on-premises network to Google's network. The connection itself is a global resource, but the interconnect attachments that run inside it are regional resources.

VPC Network

A VPC network is a global resource, though its individual subnets are regional resources.

Firewalls

Firewall rules apply to a single VPC network, but they are considered a global resource because packets can reach them from other networks.

Routes

Routes let the user create complex networking scenarios by managing how traffic destined for a specific IP range is routed, similar to how a router directs traffic within a local area network. Routes apply to VPC networks within a Google Cloud project and are considered a global resource.

Regional resources

Regional resources are accessible by any resource within the same region. Each region has one or more zones. The regional resources include:

Addresses

The regional address list is a collection of the regional static external IP addresses that the user has reserved for their project. Regional static external IP addresses are a regional resource. They are used by instances in the same region as the address and by regional forwarding rules, either for regional load balancers or for protocol forwarding.

Cloud Interconnect attachments

An interconnect attachment allocates a VLAN on the user's Cloud Interconnect connection and connects that VLAN to a VPC network. The attachment is a regional resource, even though the Cloud Interconnect connection is a global resource.

Subnets

Subnets regionally segment a network IP space into prefixes. They also control which prefix an instance's internal IP address is allocated from.

Regional managed instance groups

Regional managed instance groups are collections of identical instances that span multiple zones. They let the user spread an app across multiple zones, rather than confining the app to a single zone or managing multiple instance groups across different zones.

Regional operations

Operations can be per-zone, per-region, or global resources. An operation is considered a per-region operation if the user performs it on a regional resource.

Zonal resources

Resources that are hosted in a zone are known as per-zone resources. They are unique to that zone and are only available for use by other resources in the same zone. Per-zone resources include:

Instances

A virtual machine instance resides in a zone and can access global resources as well as resources within the same zone.

Persistent disks

Persistent disks can be accessed by instances within the same zone. The user can attach a disk only to instances in the same zone as the disk. The user can share disk resources with other projects so that those projects can create images and snapshots from the disks, but instances in other projects cannot attach the disks.

Machine types

Machine types are per-zone resources. Instances and disks can only use machine types that are in the same zone.

Zonal-managed instance groups

A zonal managed instance group uses an instance template to create a group of identical instances within a single zone. The user can manage the VM instances in a managed instance group as a single entity.

Per-zone Operations

If any operation is performed on a zone-specific resource, then the operation is known as a per-zone operation.
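
The three scopes described in this section can be summarized as a small access check. This is only a sketch of the rules as stated above: the `Resource` type and the `usable_from` function are hypothetical, and the resource names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    name: str
    scope: str                      # "global", "regional", or "zonal"
    location: Optional[str] = None  # region or zone name; None for global


def usable_from(resource: Resource, instance_zone: str) -> bool:
    """Report whether an instance in instance_zone can use the resource."""
    if resource.scope == "global":
        return True
    if resource.scope == "regional":
        # The region is the zone name without its final "-<zone>" part.
        return instance_zone.rpartition("-")[0] == resource.location
    if resource.scope == "zonal":
        return instance_zone == resource.location
    raise ValueError(f"unknown scope: {resource.scope!r}")


disk = Resource("disk-1", "zonal", "us-west1-a")
address = Resource("ip-1", "regional", "us-west1")
image = Resource("image-1", "global")
for resource in (disk, address, image):
    print(resource.name, usable_from(resource, "us-west1-b"))
```

In this example the zonal disk is not usable from us-west1-b, while the regional address and the global image are.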

Zone virtualization

Zone virtualization is a technique that Google uses to map public zones to clusters of internal physical hardware within its data centers. It enables Google to expand zones seamlessly, upgrade hardware, and decommission physical infrastructure without customer-facing impact. Zones are logical groups of resources that are designed to avoid correlated failures. Placing resources in multiple zones within a region reduces the risk of correlated physical and software infrastructure failures affecting the user's application.

Clusters

All Google Cloud hardware is organized into clusters. A cluster represents a set of compute, network, and storage resources that are supported by building, power, and cooling infrastructure. Infrastructure components generally support a single cluster, which ensures that clusters share few dependencies. Components with highly demonstrated reliability and downstream redundancy can also be shared between clusters.

Zone-to-cluster mapping

When a project uses a region for the first time, Google Cloud selects a unique cluster for each zone in that region. The selected cluster is used for the project's zonal resources. This selection is known as zone-to-cluster mapping. To ensure that every customer experiences the same capabilities and performance by default, zone-to-cluster mappings are selected on a per-project basis. The mapping between a logical zone and a physical cluster is consistent within a project. However, the mapping might be completely different for another project, depending on that project's zonal resources. A project can never have two zones mapped to the same physical cluster. Using VPC networks, the user can align the zone-to-cluster mappings between projects: Google Cloud tries to assign the same zone-to-cluster map to all projects that share a VPC network.
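
The properties of the mapping can be shown with a toy model. This is emphatically not Google's selection algorithm, which is not described in this article. The function `map_zones_to_clusters`, the cluster names, and the hash-based rotation are all assumptions chosen only to demonstrate three properties: the mapping is stable within a project, it can differ between projects, and no two zones of one project share a cluster.

```python
import hashlib
from typing import Dict, List


def map_zones_to_clusters(project: str, zones: List[str], clusters: List[str]) -> Dict[str, str]:
    """Toy per-project mapping that assigns each zone its own cluster."""
    unique_zones = sorted(set(zones))
    unique_clusters = sorted(set(clusters))
    if len(unique_clusters) < len(unique_zones):
        raise ValueError("need at least one distinct cluster per zone")
    # Derive a stable, project-specific starting point in the cluster list.
    digest = hashlib.sha256(project.encode("utf-8")).digest()
    offset = digest[0] % len(unique_clusters)
    rotated = unique_clusters[offset:] + unique_clusters[:offset]
    # zip stops after the last zone, so each zone gets a different cluster.
    return dict(zip(unique_zones, rotated))


zones = ["us-west1-a", "us-west1-b", "us-west1-c"]
clusters = ["cluster-1", "cluster-2", "cluster-3", "cluster-4"]  # hypothetical names
print(map_zones_to_clusters("project-x", zones, clusters))
print(map_zones_to_clusters("project-y", zones, clusters))
```

Two projects may or may not end up with the same mapping in this toy model, which mirrors the statement that a zone name in one project does not necessarily refer to the same physical cluster in another.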

Virtualized Zones

Each zone is supported by multiple clusters as the regions expand. The aim is to group the clusters with shared infrastructure, like building or cooling infrastructure, into logical zones so that the shared infrastructure failures affect only one zone within a region.

Though zone-to-cluster mappings seldom change, changes do occur as capacity needs and underlying hardware offerings evolve. In the case of a cluster outage, only the logical zone associated with that cluster is reported as having an outage, and not all customer resources in the zone are affected, since the zone might be composed of multiple clusters.

Choose a compute engine deployment strategy for the workload

Assess the workload

Use the questions below to analyze the key requirements of the workload that the user wants to deploy. The answers help map the capabilities of each deployment option to the requirements of the workload.
 

Application state

Is the application stateful?

A stateful application stores details such as a client or session ID until that data is no longer required, whereas a stateless application does not store any client, transaction, or session data.

Provisioning

  • Is there a need for the VMs to use a mix of machine types or images? For example, is there a need for memory-optimized machine types for some VMs while the others use general-purpose machine types?
  • Is there a need for the infrastructure to scale automatically in tune with the changes in load to maintain an optimal balance between cost and response time?
  • Is it possible to run all the VMs within a single zone, VPC network, and subnet?

Operations

  • Do you wish to manage the VMs as a single group? 
  • Is there a requirement for a custom or a third-party tool to manage the VMs?
  • Is there a need to control the handling of failed VMs?
  • Is there a need to control the start-stop-suspend-resume sequence or schedule of your VMs?

Resilience

  • Is there a need for protection of the application against zonal failures?
  • Should Compute Engine recreate the VM automatically if the VM stops or crashes, or if the application stops responding to requests?
  • Is there a need for a fixed internal or external IP address for the VMs that host the application?


Review the available deployment options

The following options are available when deploying a workload to Compute Engine:

Standalone VMs

This option allows the user to choose the machine type, image, disks, and other attributes individually for each VM that the user provisions, and to manage the VMs as separate resources.

Unmanaged instance group

An unmanaged instance group allows the user to add standalone VMs to an instance group, which can then be used as a backend for a load balancer.

Managed instance group

A managed instance group (MIG) is a group of identical or similarly configured instances that the user provisions by using an instance template.

  • Specific disks or metadata can be preserved by making the MIG stateful.
  • The user can enable autoscaling and can configure a scaling policy for a stateless MIG.
  • When creating a MIG, the user can either deploy the VMs within a single zone or distribute them across more than one zone in a region for high availability.
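
The assessment questions and the options above can be combined into a rough decision helper. This is a simplified sketch under stated assumptions, not an official selection procedure: the function name `suggest_deployment`, its parameters, and the order in which the questions are checked are all hypothetical.

```python
def suggest_deployment(
    *,
    identical_vms: bool,
    manage_as_group: bool,
    stateful: bool = False,
    needs_autoscaling: bool = False,
    survive_zone_failure: bool = False,
) -> str:
    """Map answers to a few assessment questions onto a deployment option."""
    if not manage_as_group:
        return "standalone VMs"
    if not identical_vms:
        # A mix of machine types or images rules out a single instance template.
        return "unmanaged instance group"
    spread = "regional" if survive_zone_failure else "zonal"
    kind = "stateful" if stateful else "stateless"
    suggestion = f"{spread} {kind} managed instance group"
    # The article describes autoscaling only for stateless MIGs.
    if needs_autoscaling and not stateful:
        suggestion += " with autoscaling"
    return suggestion


print(suggest_deployment(
    identical_vms=True,
    manage_as_group=True,
    needs_autoscaling=True,
    survive_zone_failure=True,
))
```

A real decision would weigh more of the questions listed earlier, such as fixed IP addresses and custom management tools, which this sketch leaves out.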

Frequently asked questions

What machine type must be used for running NVIDIA A100 GPUs?

The accelerator-optimized (A2) machine type must be used for running NVIDIA A100 GPUs.

Which workstation can be used if the user has a graphics-intensive workload?

An NVIDIA RTX Virtual Workstation can be used if the user has a graphics-intensive workload.

What is the syntax of a fully qualified zone name?

A fully qualified zone name has the form <region>-<zone>.

Conclusion

In this article, we have extensively discussed the Overview of Cloud GPU.
