Introduction
Google Bigtable is a column-oriented, distributed data store built by Google to handle very large amounts of structured data associated with the company's Internet search and web services operations. Applications like the Google App Engine Datastore, Google Personalized Search, Google Earth, and Google Analytics all use Bigtable as their database. Bigtable was initially created for applications that need tremendous scalability; the technology was designed to be used with petabytes of data.
In this blog, let's start our discussion with replication for Bigtable and then gradually move on to backup and restore.
Replication
Replication for Cloud Bigtable lets you increase the durability and availability of your data by copying it across multiple regions, or across multiple zones within the same region. You can also separate workloads by routing different types of requests to different clusters.
How it works
Bigtable supports replicated clusters in up to eight Google Cloud regions where Bigtable is available, with at most one cluster per zone in a region. By spreading clusters across multiple zones or regions, you can still access your instance's data even if a Google Cloud zone or region becomes unavailable.
When you create an instance with more than one cluster, Bigtable immediately begins synchronizing your data between the clusters, creating a separate, independent copy of your data in each zone where the instance has a cluster. Similarly, when you add a new cluster to an existing instance, Bigtable copies your existing data from the original cluster's zone to the new cluster's zone and then synchronizes subsequent changes between the zones.
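As a rough sketch of what this looks like in practice, the snippet below uses the google-cloud-bigtable Python client to create an instance with two clusters and then add a third. The project, instance, cluster, and zone IDs are hypothetical placeholders.

```python
from google.cloud import bigtable

# All IDs below are hypothetical; replace them with your own.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance", display_name="Replicated instance")

# Two clusters in different zones; Bigtable begins synchronizing data
# between them as soon as the instance is created.
cluster_a = instance.cluster("cluster-a", location_id="us-east1-b", serve_nodes=3)
cluster_b = instance.cluster("cluster-b", location_id="us-east1-c", serve_nodes=3)
instance.create(clusters=[cluster_a, cluster_b]).result(timeout=480)

# Adding a cluster to an existing instance: Bigtable copies the existing
# data into the new zone, then keeps all three zones in sync.
cluster_c = instance.cluster("cluster-c", location_id="us-west1-a", serve_nodes=3)
cluster_c.create().result(timeout=480)
```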
Bigtable automatically replicates all data changes, including all of the following kinds of updates:
- Data updates to existing tables
- Added and deleted tables
- Additions and deletions of column families
- Modifications to the garbage collection rules for a column family
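Schema changes like these are replicated just like data writes. As a minimal sketch with the Python client, continuing the hypothetical instance above (the table and column family names are made up):

```python
import datetime

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Creating a table and its column family is replicated to every cluster.
table = instance.table("events")
table.create(column_families={"metrics": column_family.MaxVersionsGCRule(5)})

# Changing the column family's garbage-collection rule is replicated too,
# subject to the limitations on GC changes for replicated tables noted below.
family = table.column_family(
    "metrics", gc_rule=column_family.MaxAgeGCRule(datetime.timedelta(days=30))
)
family.update()
```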
Because Bigtable treats each cluster in your instance as a primary cluster, you can read and write in each cluster. You can also configure your instance so that requests from different types of applications are routed to different clusters.
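Routing is controlled through app profiles. As a sketch, assuming the hypothetical instance and cluster IDs above, an app profile with single-cluster routing pins one type of traffic to one cluster:

```python
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# An app profile that always routes requests to cluster-a. Single-row
# transactions are disabled here; they can only be enabled with
# single-cluster routing.
serving_profile = instance.app_profile(
    "serving-profile",
    routing_policy_type=enums.RoutingPolicyType.SINGLE,
    cluster_id="cluster-a",
    allow_transactional_writes=False,
)
serving_profile.create(ignore_warnings=True)
```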
Before adding clusters to an instance, you should be aware of the limitations that apply when you change garbage collection policies for replicated tables.
Use cases
Some common use cases for Bigtable replication are:
Isolate serving applications from batch reads
Users of an application can see a performance hit when a batch analytics job with many large reads runs on the same cluster as an application that serves a mix of reads and writes. With replication, you can use app profiles with single-cluster routing to send batch analytics jobs and application traffic to different clusters, so that batch jobs don't affect your applications' users.
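One way to express this split with the Python client is to bind each workload's table handle to its own app profile. This sketch assumes a second hypothetical profile, batch-analytics, routed to cluster-b, plus the events table and serving-profile from the earlier sketches:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Each table handle is bound to an app profile, so requests made through
# it are routed to that profile's cluster.
serving_table = instance.table("events", app_profile_id="serving-profile")
batch_table = instance.table("events", app_profile_id="batch-analytics")

# Application writes go through the serving profile to cluster-a.
row = serving_table.direct_row(b"user#123")
row.set_cell("metrics", b"clicks", b"1")
row.commit()

# A heavy scan issued through the batch profile never touches the
# serving cluster.
for row in batch_table.read_rows():
    pass  # placeholder for the actual analytics work
```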
Improve availability
If an instance has only one cluster, your data's durability and availability are limited to the zone where that cluster is located. Replication can increase both by keeping separate copies of your data in several zones or regions and dynamically switching between clusters as necessary.
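A common way to get that automatic switching is an app profile with multi-cluster routing; a sketch with the same hypothetical IDs:

```python
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Multi-cluster routing: requests go to the nearest available cluster,
# and Bigtable fails over automatically if it becomes unreachable.
ha_profile = instance.app_profile(
    "ha-profile",
    routing_policy_type=enums.RoutingPolicyType.ANY,
)
ha_profile.create(ignore_warnings=True)
```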
Provide near-real-time backup
In some circumstances you'll always need to route requests to a single cluster, for instance when you can't afford to read stale data. You can still employ replication by using one cluster to handle requests and keeping a second cluster as a near-real-time backup. If the serving cluster becomes unreachable, you can reduce downtime by manually failing over to the backup cluster.
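A manual failover can be as simple as repointing the single-cluster app profile at the backup cluster. This is only a sketch, assuming the hypothetical serving-profile from earlier and cluster-b as the backup:

```python
from google.cloud import bigtable
from google.cloud.bigtable import enums

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Repoint the serving profile from cluster-a to the backup cluster;
# requests made through this profile are then routed to cluster-b.
profile = instance.app_profile(
    "serving-profile",
    routing_policy_type=enums.RoutingPolicyType.SINGLE,
    cluster_id="cluster-b",
)
profile.update(ignore_warnings=True)
```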
Ensure your data has a global presence
Replication can be set up in several places around the world to bring your data closer to your customers. For instance, you might create an instance with replicated clusters in the US, Europe, and Asia, and direct application traffic to the closest cluster.
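Such a three-region instance might be created like this (the IDs and zones are illustrative):

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("global-instance")

# One cluster per continent; app profiles can then route application
# traffic to the nearest cluster.
clusters = [
    instance.cluster("cluster-us", location_id="us-central1-b", serve_nodes=3),
    instance.cluster("cluster-eu", location_id="europe-west1-c", serve_nodes=3),
    instance.cluster("cluster-asia", location_id="asia-east1-a", serve_nodes=3),
]
instance.create(clusters=clusters).result(timeout=900)
```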
Replication settings
Some common replication settings for Cloud Bigtable are:
Isolate batch analytics workloads from other applications
As described above, a batch analytics job that executes many large reads can degrade performance for an application that shares its cluster and serves a mix of reads and writes. Sending application traffic and batch analytics jobs to different clusters, using app profiles with single-cluster routing, keeps batch jobs from affecting the users of your apps.
Create high availability (HA)
If an instance has only one cluster, your data's durability and availability are limited to the zone where that cluster is located. Maintaining separate copies of your data in several zones or regions, and switching between clusters dynamically as necessary, increases both.
Maintain high availability and regional resilience
Suppose you have concentrations of customers in two different areas of a continent. You want to serve each concentration from Bigtable clusters as close to those customers as practical, with high availability within each region, and you might also want a failover option in case one or more of your clusters becomes unavailable.
For this use case, you can create an instance with two clusters in region A and two clusters in region B. This setup provides high availability even if an entire Google Cloud region becomes unreachable. It also offers regional resilience: if one zone goes down, the other cluster in that zone's region remains operational.
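A sketch of that setup with the Python client, using us-east1 and us-west1 as illustrative stand-ins for regions A and B:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("resilient-instance")

# Two clusters per region, each in its own zone: a zone outage leaves the
# other cluster in that region serving, and a region outage leaves the
# other region serving.
clusters = [
    instance.cluster("cluster-a1", location_id="us-east1-b", serve_nodes=3),
    instance.cluster("cluster-a2", location_id="us-east1-c", serve_nodes=3),
    instance.cluster("cluster-b1", location_id="us-west1-a", serve_nodes=3),
    instance.cluster("cluster-b2", location_id="us-west1-b", serve_nodes=3),
]
instance.create(clusters=clusters).result(timeout=900)
```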
Store data close to your users
If you have users around the world, you can reduce latency by running your application close to your users and keeping your data as close to your application as possible. With Bigtable, your data is automatically replicated across all of the clusters you create in different Google Cloud regions.