Introduction
When you work with large amounts of data, you have to store it in tables. But what do you do when a table grows to billions of rows and thousands of columns? That is where Cloud Bigtable comes in.
Cloud Bigtable can store terabytes or even petabytes of data in a single large table with billions of rows and thousands of columns.
In this blog, you'll learn the basics of Cloud Bigtable along with the concept of migrating data into and out of it.
Let's dive in.
Import and export data
This section covers the techniques for importing data into and exporting data out of Cloud Bigtable.
Avro files
You can export data from Bigtable as Avro files and then import that data back into Bigtable using the following Dataflow templates. The Google Cloud console or the gcloud command-line tool can be used to run the templates.
Cloud Bigtable to Cloud Storage Avro
The Cloud Bigtable to Cloud Storage Avro template is a pipeline that reads data from a Bigtable table and writes it to a Cloud Storage bucket in Avro format. You can use the template to move data from Bigtable to Cloud Storage; a sample gcloud invocation follows the conditions below.
Conditions for using this pipeline:
- There must be a Bigtable table.
- Before starting the pipeline, the output Cloud Storage bucket must be present.
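For reference, launching this export with gcloud might look roughly like the sketch below. The template path, region, and parameter names (bigtableProjectId, outputDirectory, filenamePrefix, and so on) are assumptions based on the publicly hosted Dataflow templates, so check the template documentation for the exact values in your version.

```sh
# Sketch only: run the Cloud Bigtable to Cloud Storage Avro template.
# Placeholder values and parameter names are assumptions; verify before use.
gcloud dataflow jobs run export-bigtable-avro \
    --gcs-location gs://dataflow-templates/latest/Cloud_Bigtable_to_GCS_Avro \
    --region us-central1 \
    --parameters \
bigtableProjectId=MY_PROJECT,\
bigtableInstanceId=MY_INSTANCE,\
bigtableTableId=MY_TABLE,\
outputDirectory=gs://MY_BUCKET/avro-export/,\
filenamePrefix=table-export-
```

The same job can also be started from the Dataflow page in the Google Cloud console by selecting the template and filling in the same parameters.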
Cloud Storage Avro to Cloud Bigtable
The Cloud Storage Avro to Cloud Bigtable template is a pipeline that reads Avro files from a Cloud Storage bucket and writes the data to a Bigtable table. You can use the template to move data from Cloud Storage to Bigtable; a sample gcloud invocation follows the conditions below.
Conditions for using this pipeline:
- There must be a Bigtable table with the same column families as those exported in the Avro files.
- Before starting the pipeline, the input Avro files must be present in a Cloud Storage bucket.
- Bigtable requires the input Avro files to conform to a certain schema.
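To load those Avro files back, a gcloud command could look roughly like this sketch; again, the template path and parameter names (inputFilePattern in particular) are assumptions to verify against the template reference.

```sh
# Sketch only: run the Cloud Storage Avro to Cloud Bigtable template.
# Placeholder values and parameter names are assumptions; verify before use.
gcloud dataflow jobs run import-avro-bigtable \
    --gcs-location gs://dataflow-templates/latest/GCS_Avro_to_Cloud_Bigtable \
    --region us-central1 \
    --parameters \
bigtableProjectId=MY_PROJECT,\
bigtableInstanceId=MY_INSTANCE,\
bigtableTableId=MY_TABLE,\
inputFilePattern=gs://MY_BUCKET/avro-export/table-export-*
```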
Parquet files
You can export data from Bigtable as Parquet files and then import that data back into Bigtable using the following Dataflow templates. The Google Cloud console or the gcloud command-line tool can be used to run the templates.
Cloud Bigtable to Cloud Storage Parquet
The Cloud Bigtable to Cloud Storage Parquet template is a pipeline that reads data from a Bigtable table and writes it to a Cloud Storage bucket in Parquet format. You can use the template to move data from Bigtable to Cloud Storage; a sample gcloud invocation follows the conditions below.
Conditions for using this pipeline:
- There must be a Bigtable table.
- Before starting the pipeline, the output Cloud Storage bucket must be present.
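As with the Avro export, you might launch this template with a command along the lines of the sketch below; the template path and parameter names (including numShards) are assumptions, so confirm them in the template documentation.

```sh
# Sketch only: run the Cloud Bigtable to Cloud Storage Parquet template.
# Placeholder values and parameter names are assumptions; verify before use.
gcloud dataflow jobs run export-bigtable-parquet \
    --gcs-location gs://dataflow-templates/latest/Cloud_Bigtable_to_GCS_Parquet \
    --region us-central1 \
    --parameters \
bigtableProjectId=MY_PROJECT,\
bigtableInstanceId=MY_INSTANCE,\
bigtableTableId=MY_TABLE,\
outputDirectory=gs://MY_BUCKET/parquet-export/,\
filenamePrefix=table-export-,\
numShards=0
```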
Cloud Storage Parquet to Cloud Bigtable
The Cloud Storage Parquet to Cloud Bigtable template is a pipeline that reads data from Parquet files in a Cloud Storage bucket and writes it to a Bigtable table. You can use the template to move data from Cloud Storage to Bigtable; a sample gcloud invocation follows the conditions below.
Conditions for using this pipeline:
- There must be a Bigtable table with the same column families as those exported in the Parquet files.
- Before starting the pipeline, the input Parquet files must be present in a Cloud Storage bucket.
- Bigtable requires the input Parquet files to conform to a certain schema.
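A matching import command could look roughly like this; parameter names such as inputFilePattern are assumptions to check against the template reference.

```sh
# Sketch only: run the Cloud Storage Parquet to Cloud Bigtable template.
# Placeholder values and parameter names are assumptions; verify before use.
gcloud dataflow jobs run import-parquet-bigtable \
    --gcs-location gs://dataflow-templates/latest/GCS_Parquet_to_Cloud_Bigtable \
    --region us-central1 \
    --parameters \
bigtableProjectId=MY_PROJECT,\
bigtableInstanceId=MY_INSTANCE,\
bigtableTableId=MY_TABLE,\
inputFilePattern=gs://MY_BUCKET/parquet-export/table-export-*
```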
SequenceFiles
The following Dataflow templates allow you to export data from Bigtable as SequenceFiles and then import the data back into Bigtable. You can execute the templates by using the gcloud command-line tool or the Google Cloud console.
Cloud Bigtable to Cloud Storage SequenceFile
The Cloud Bigtable to Cloud Storage SequenceFile template is a pipeline that reads data from a Bigtable table and writes it to a Cloud Storage bucket in SequenceFile format. You can use the template to move data from Bigtable to Cloud Storage; a sample gcloud invocation follows the conditions below.
Conditions for using this pipeline:
- There must be a Bigtable table.
- Before starting the pipeline, the output Cloud Storage bucket must be present.
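A gcloud launch for this template might look roughly like the sketch below. Note that the SequenceFile templates are assumed to use slightly different parameter names (for example bigtableProject and destinationPath rather than bigtableProjectId and outputDirectory); verify them against the template reference.

```sh
# Sketch only: run the Cloud Bigtable to Cloud Storage SequenceFile template.
# Placeholder values and parameter names are assumptions; verify before use.
gcloud dataflow jobs run export-bigtable-seqfile \
    --gcs-location gs://dataflow-templates/latest/Cloud_Bigtable_to_GCS_SequenceFile \
    --region us-central1 \
    --parameters \
bigtableProject=MY_PROJECT,\
bigtableInstanceId=MY_INSTANCE,\
bigtableTableId=MY_TABLE,\
destinationPath=gs://MY_BUCKET/seqfile-export/,\
filenamePrefix=table-export-
```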
Cloud Storage SequenceFile to Cloud Bigtable
The Cloud Storage SequenceFile to Cloud Bigtable template is a pipeline that reads data from SequenceFiles in a Cloud Storage bucket and writes it to a Bigtable table. You can use the template to move data from Cloud Storage to Bigtable; a sample gcloud invocation follows the conditions below.
Conditions for using this pipeline:
- There must be a Bigtable table.
- Before starting the pipeline, the input SequenceFiles must be in a Cloud Storage bucket.
- The input SequenceFiles must have been exported from Bigtable or HBase.
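The corresponding import could be launched roughly as follows; again, the parameter names (such as sourcePattern) are assumptions to confirm in the template documentation.

```sh
# Sketch only: run the Cloud Storage SequenceFile to Cloud Bigtable template.
# Placeholder values and parameter names are assumptions; verify before use.
gcloud dataflow jobs run import-seqfile-bigtable \
    --gcs-location gs://dataflow-templates/latest/GCS_SequenceFile_to_Cloud_Bigtable \
    --region us-central1 \
    --parameters \
bigtableProject=MY_PROJECT,\
bigtableInstanceId=MY_INSTANCE,\
bigtableTableId=MY_TABLE,\
sourcePattern=gs://MY_BUCKET/seqfile-export/table-export-*
```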