Table of contents
1. Introduction
2. Integrations with Bigtable
3. Google Cloud services
3.1. BigQuery
3.2. Cloud Asset Inventory
3.3. Cloud Functions
3.4. Dataflow
3.5. Dataproc
4. Big Data
5. Geospatial databases
6. Graph databases
7. Infrastructure management
8. Time-series databases and monitoring
9. Adding a cache layer to Google Cloud databases
10. The code
11. Set up a machine within the network
12. Optionally connect to Memcached via Telnet
13. Run the code
14. Cleaning up
15. Monitoring time-series data with OpenTSDB on Cloud Bigtable and GKE
16. Create a Bigtable instance
16.1. Create a GKE cluster
17. Create OpenTSDB tables in Bigtable
18. Create the OpenTSDB services
18.1. Prepare the application
19. Deploy the function
20. Trigger the function
21. Clean up
22. Frequently Asked Questions
22.1. What is Cloud Bigtable?
22.2. Does Bigtable support column-level security restrictions?
22.3. What graph databases does Bigtable integrate with?
22.4. What is a system integrator in cloud computing?
22.5. What infrastructure management tools does Bigtable integrate with?
23. Conclusion
Last Updated: Mar 27, 2024

Basics of Integration Concept in Cloud BigTable

Author Muskan Sharma

Introduction

When you work with large datasets, you usually store the data in tables. But what do you do when a table has billions of rows and thousands of columns? That is the problem Cloud Bigtable is designed for.

Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, letting you store terabytes or even petabytes of data.

In this article, you'll learn the basics of how Cloud Bigtable integrates with other products and services.

Integrations with Bigtable

This section describes the integrations between Cloud Bigtable and other products and services.


Google Cloud services

The Google Cloud services that Bigtable interfaces with are described in this section.

BigQuery

BigQuery is Google's fully managed, petabyte-scale, low-cost analytics data warehouse. You can use BigQuery to query data stored in Bigtable.

Cloud Asset Inventory

Cloud Asset Inventory, an inventory service built on a time-series database, supports and returns Bigtable resource types.

Cloud Functions

Cloud Functions is an event-driven serverless compute platform that integrates with Bigtable.

Dataflow

Dataflow is a cloud service and programming model for big data processing. It supports both batch and streaming processing. You can use Dataflow to process data stored in Bigtable or to store the output of your pipeline in Bigtable.

Dataproc

Dataproc offers managed cloud services for Apache Hadoop and associated technologies. You can execute Hadoop jobs that read from and write to Bigtable using Dataproc.

Big Data

This section covers the big data products that Bigtable integrates with.

Apache Hadoop

Apache Hadoop is a framework that enables distributed processing of large data sets across clusters of computers. You can use Dataproc to create a Hadoop cluster and then run MapReduce jobs that read from and write to Bigtable.

StreamSets Data Collector

StreamSets Data Collector is a data-streaming application that you can configure to write data to Bigtable.

Geospatial databases

This section describes the geospatial databases that Bigtable integrates with.

GeoMesa

GeoMesa is a distributed spatio-temporal database that supports spatial querying and data management. GeoMesa can use Bigtable to store its data.

Graph databases

The graph databases that Bigtable interfaces with are described in this section.

HGraphDB

HGraphDB is a client layer for using Apache HBase or Bigtable as a graph database. It implements the Apache TinkerPop 3 interfaces.

JanusGraph

JanusGraph is a scalable graph database optimized for storing and querying graphs that contain hundreds of billions of vertices and edges.

Infrastructure management

Tools for infrastructure management that Bigtable integrates with are described in this section.

Pivotal Cloud Foundry

Pivotal Cloud Foundry is an application development and deployment platform that lets you bind an application to Bigtable.

Terraform

Terraform is an open-source tool that codifies APIs into declarative configuration files. These files can be shared among team members, treated as code, edited, reviewed, and versioned.

Time-series databases and monitoring

The time-series databases and monitoring programs that Bigtable connects with are described in this section.

Heroic

Heroic is a time-series database and monitoring system. Bigtable can be used by Heroic to store its data.

OpenTSDB

OpenTSDB is a time-series database that can use Bigtable for storage. The section "Monitoring time-series data with OpenTSDB on Cloud Bigtable and GKE" later in this article shows how to use OpenTSDB to collect, record, and monitor time-series data on Google Cloud.

Adding a cache layer to Google Cloud databases

This tutorial shows you how to use Memcached to speed up your application's response time for frequently accessed data:

MemcachedClient mcc = new MemcachedClient(new InetSocketAddress(memcachedIP, 11211));
Object value = mcc.get(rowId);
if (value != null) {
    System.out.println("Value fetched from cache: " + value);
} else {
    // Read the row from the database (pseudocode)
    Row row = dataClient.readRow(rowId);
    mcc.set(rowId, 30 * 60, row.toString()); // Cache for 30 minutes.
    System.out.println("Value fetched from db and cached: " + row);
}

Databases are designed for specific schemas, queries, and throughput. However, if some of your data is accessed much more frequently for a period of time, consider adding a cache layer to reduce the load on your database.

The code

A cache's overall logic can be summed up as follows:

  1. Pick a row key to query.
  2. If the row key is in the cache, return the value.
  3. Otherwise:
     • Look up the row in Cloud Bigtable.
     • Add the value to the cache with an expiration.
     • Return the value.

try {
  MemcachedClient mcc = new MemcachedClient(new InetSocketAddress(discoveryEndpoint, 11211));
  System.out.println("Connected with Memcached successfully");

  // Get value from the cache
  String rowkey = "phone#4c410523#20190501";
  String columnFamily = "stats_summary";
  String column = "os_build";
  String cacheKey = String.format("%s:%s:%s", rowkey, columnFamily, column);

  Object value = mcc.get(cacheKey);

  if (value != null) {
    System.out.println("Value fetched from the cache: " + value);
  } else {
    System.out.println("didn't get value from the cache");
    // Get data from the Bigtable source and cache it for 30 minutes.
    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {
      Row row = dataClient.readRow(tableId, rowkey);
      String cellValue = row.getCells(columnFamily, column).get(0).getValue().toStringUtf8();
      System.out.println("got data from bt " + cellValue);
      // Set data into Memcached server.
      mcc.set(cacheKey, 30 * 60, cellValue);
      System.out.println("Value fetched from Bigtable: " + cellValue);
    } catch (Exception e) {
      System.out.println("Could not set cache value.");
      e.printStackTrace();
    }
  }
  mcc.shutdown();
} catch (Exception e) {
  System.out.println("Could not get cache value.");
  e.printStackTrace();
}
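
The same look-aside logic can be sketched without Memcached or Bigtable, which makes the hit, miss, and expiry paths easy to see. The sketch below is only an illustration: `TtlCache` is a hypothetical in-memory stand-in for Memcached, `load` stands in for the Bigtable read, and the sample value is made up.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """A tiny in-memory stand-in for Memcached: values expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries behave like misses and are dropped.
            del self._items[key]
            return None
        return value

    def set(self, key: str, ttl_seconds: float, value: str) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def read_through(cache: TtlCache, key: str, load: Callable[[str], str],
                 ttl_seconds: float = 30 * 60) -> Tuple[str, bool]:
    """Return (value, served_from_cache) following the look-aside steps."""
    cached = cache.get(key)
    if cached is not None:
        return cached, True
    value = load(key)                   # stands in for the Bigtable read
    cache.set(key, ttl_seconds, value)  # cache the value with an expiration
    return value, False


if __name__ == "__main__":
    # Hypothetical data; a real application would read from Bigtable here.
    fake_table = {"phone#4c410523#20190501": "example-build"}
    cache = TtlCache()
    for _ in range(2):
        print(read_through(cache, "phone#4c410523#20190501", fake_table.__getitem__))
```

The first call goes to the backing store and the second is served from the cache, which mirrors what you see when you run the Java program twice.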

You can use any of the following key-value pairs for your cache:

  • rowkey: encoded row
  • start_row_key-end_row_key: array of encoded rows
  • SQL queries: results
  • row prefix: array of encoded rows

Consider your use case when deciding how to set up your cache. Note that Bigtable row keys can be up to 4 KB, whereas Memcached keys are limited to 250 bytes, so a row key may be too large to use directly as a cache key.
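
One possible way to cope with the key-size mismatch is to fall back to a fixed-length digest whenever the natural key does not fit. The sketch below is an assumption of this article's editor, not part of the tutorial's sample code: the function name `memcached_key` and the `sha256:` prefix are hypothetical choices.

```python
import hashlib

MAX_MEMCACHED_KEY_BYTES = 250


def memcached_key(rowkey: str, column_family: str, column: str) -> str:
    """Build a cache key that respects Memcached's key rules."""
    raw = f"{rowkey}:{column_family}:{column}"
    encoded = raw.encode("utf-8")
    fits = len(encoded) <= MAX_MEMCACHED_KEY_BYTES
    # Memcached's text protocol also rejects whitespace and control characters.
    clean = all(byte > 32 and byte != 127 for byte in encoded)
    if fits and clean:
        return raw
    # Fallback (an assumption of this sketch): a fixed-length SHA-256 digest.
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


if __name__ == "__main__":
    print(memcached_key("phone#4c410523#20190501", "stats_summary", "os_build"))
    print(memcached_key("x" * 4096, "stats_summary", "os_build"))
```

The trade-off is that a hashed key cannot be mapped back to its row key, so this only suits lookups where the caller already knows the row key.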

Create a Memcached instance

This tutorial creates a Memorystore for Memcached instance. For testing purposes, you can instead install and run a local Memcached instance.

  • Enable the Memorystore for Memcached API:
gcloud services enable memcache.googleapis.com
  • Create the smallest Memcached instance possible on the default network in the best region for your application:
gcloud beta memcache instances create bigtable-cache --node-count=1 --node-cpu=1 --node-memory=1GB --region=us-central1
  • Get the Memcached instance details and note the IP address of the discoveryEndpoint:
gcloud beta memcache instances describe bigtable-cache --region=us-central1

Set up a machine within the network

You need a place on the same network as your Memcached instance where you can run code. You could use a serverless option such as Cloud Functions, but a Compute Engine VM requires less configuration.

  • Create a Compute Engine instance with enabled API scopes for Cloud Bigtable data on the default network:
gcloud beta compute instances create bigtable-memcached-vm \
    --zone=us-central1-a \
    --machine-type=e2-micro \
    --image=debian-10-buster-v20200910 \
    --image-project=debian-cloud \
    --boot-disk-size=10GB \
    --boot-disk-type=pd-standard \
    --boot-disk-device-name=bigtable-memcached-vm \
    --scopes=https://www.googleapis.com/auth/bigtable.data,https://www.googleapis.com/auth/servicecontrol,https://www.googleapis.com/auth/service.management.readonly,https://www.googleapis.com/auth/logging.write,https://www.googleapis.com/auth/monitoring.write,https://www.googleapis.com/auth/trace.append,https://www.googleapis.com/auth/devstorage.read_only
  • SSH into your new VM:
gcloud beta compute ssh --zone "us-central1-a" bigtable-memcached-vm

Optionally connect to Memcached via Telnet

The Memorystore for Memcached documentation describes this procedure in more detail, but you can simply run the commands below to set and get a value in the cache:

sudo apt-get install telnet
telnet $DISCOVERY_ENDPOINT_ID 11211
set greeting 1 0 11
hello world
get greeting
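
In the Telnet session above, the fields after `set greeting` are the flags (1), the expiration time in seconds (0, meaning no expiration), and the length of the value in bytes (11, the length of "hello world"). The sketch below builds the same commands as raw bytes to make that framing explicit; the helper names are hypothetical and the code does not open a network connection.

```python
def build_set_command(key: str, value: str, flags: int = 0, exptime: int = 0) -> bytes:
    """Frame a Memcached text-protocol 'set' command.

    Format: set <key> <flags> <exptime> <bytes>\r\n<data>\r\n
    """
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("utf-8")
    return header + data + b"\r\n"


def build_get_command(key: str) -> bytes:
    """Frame a Memcached text-protocol 'get' command."""
    return f"get {key}\r\n".encode("utf-8")


if __name__ == "__main__":
    print(build_set_command("greeting", "hello world", flags=1, exptime=0))
    print(build_get_command("greeting"))
```

If the byte count does not match the length of the data line, the server rejects the command, which is a common mistake when typing these commands by hand.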

Run the code

Now you are ready to run the program on your VM.

1. Install Git on the VM:

sudo apt-get install git

2. Clone the repository that contains the code:

git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git

3. Go to the directory where the code is located.

cd java-docs-samples/bigtable/memorystore

4. Install Maven:

sudo apt-get install maven

5. Configure environment variables for your setup:

PROJECT_ID=your-project-id
MEMCACHED_DISCOVERY_ENDPOINT="0.0.0.0"

6. Run the program once to get the value from the database, and then run it again to see that the value is fetched from the cache:

mvn compile exec:java -Dexec.mainClass=Memcached \
    -DbigtableProjectId=$PROJECT_ID \
    -DbigtableInstanceId=bt-cache \
    -DbigtableTableId=mobile-time-series \
    -DmemcachedDiscoveryEndpoint=$MEMCACHED_DISCOVERY_ENDPOINT

Cleaning up

If you followed the instructions in this article, run the following commands to delete your Cloud Bigtable instance, Memcached instance, and VM so that you are not charged for their continued use.

cbt deleteinstance bt-cache
gcloud beta memcache instances delete bigtable-cache --region=us-central1
gcloud compute instances delete bigtable-memcached-vm --zone=us-central1-a

Monitoring time-series data with OpenTSDB on Cloud Bigtable and GKE 

This guide describes how to collect, record, and monitor time-series data on Google Cloud using OpenTSDB running on Google Kubernetes Engine (GKE), with Bigtable storing the data.

Create a Bigtable instance

This guide uses Bigtable to store the time-series data that you collect, so you must create a Bigtable instance.

1. In Cloud Shell, set the environment variables for the instance identifier of your Bigtable cluster and for the Google Cloud zone where you will create your Bigtable cluster and GKE cluster:

export BIGTABLE_INSTANCE_ID=BIGTABLE_INSTANCE_ID
export ZONE=ZONE

2. Create a Bigtable instance:

gcloud bigtable instances create ${BIGTABLE_INSTANCE_ID} \
    --cluster-config=id=${BIGTABLE_INSTANCE_ID}-${ZONE},zone=${ZONE},nodes=1 \
    --display-name=OpenTSDB

Create a GKE cluster

GKE provides a managed Kubernetes environment. After you create a GKE cluster, you can deploy Kubernetes Pods to it. This guide uses GKE and Kubernetes Pods to run OpenTSDB.

1. In Cloud Shell, set the environment variables for the name, version, and node type of your GKE cluster. The cluster is created in the zone that you stored in ZONE earlier:

export GKE_CLUSTER_NAME=GKE_CLUSTER_NAME
export GKE_VERSION=1.20
export GKE_NODE_TYPE=n1-standard-4

2. Create a GKE cluster:

gcloud container clusters create ${GKE_CLUSTER_NAME} \
    --zone=${ZONE} \
    --cluster-version=${GKE_VERSION} \
    --machine-type ${GKE_NODE_TYPE} \
    --scopes "https://www.googleapis.com/auth/cloud-platform"

3. Get the credentials that you need to connect to your GKE cluster:

gcloud container clusters get-credentials ${GKE_CLUSTER_NAME} --zone ${ZONE}

Create OpenTSDB tables in Bigtable

Before OpenTSDB can read or write data, you must create the Bigtable tables that store that data. To do so, you create a Kubernetes job that creates the tables.

1. Run the job in Cloud Shell:

envsubst < jobs/opentsdb-init.yaml.tpl | kubectl create -f -

2. Retrieve the logs of the table creation job:

OPENTSDB_INIT_POD=$(kubectl get pods --selector=job-name=opentsdb-init \
                    --output=jsonpath={.items..metadata.name})
kubectl logs $OPENTSDB_INIT_POD

Create the OpenTSDB services

To provide consistent network connectivity to the deployments, you create two Kubernetes services: one service writes metrics into OpenTSDB and the other reads them.

1. In Cloud Shell, create the service for writing metrics:

kubectl create -f services/opentsdb-write.yaml

2. Create the service for reading metrics:

kubectl create -f services/opentsdb-read.yaml
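
Once the write service is reachable, clients usually send data points to OpenTSDB's HTTP API at the /api/put endpoint as JSON. The sketch below only builds such a payload and sends nothing; the helper name `build_put_payload`, the metric name, and the tag values are hypothetical, and the service address depends on your cluster.

```python
import json
import time
from typing import Dict, Optional


def build_put_payload(metric: str, value: float, tags: Dict[str, str],
                      timestamp: Optional[int] = None) -> str:
    """Serialize one data point in the JSON shape that /api/put accepts."""
    if not tags:
        # OpenTSDB requires at least one tag on every data point.
        raise ValueError("OpenTSDB data points need at least one tag")
    point = {
        "metric": metric,
        # Unix epoch seconds; defaults to the current time.
        "timestamp": int(time.time()) if timestamp is None else timestamp,
        "value": value,
        "tags": tags,
    }
    return json.dumps(point, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical metric and tag, for illustration only.
    print(build_put_payload("sys.cpu.user", 42.5, {"host": "web01"},
                            timestamp=1356998400))
```

A client would POST this string to the write service and later query the same metric through the read service.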

Prepare the application

1. Clone the sample app repository to your local machine:

Node.js

git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git

Python

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

Go

git clone https://github.com/GoogleCloudPlatform/golang-samples.git

2. Change to the directory that contains the Cloud Functions sample code for accessing Bigtable:

Node.js

cd nodejs-docs-samples/functions/bigtable/

Python

cd python-docs-samples/functions/bigtable/

Go

cd golang-samples/functions/bigtable/


3. Take a look at the sample code:

Node.js

// Imports the Google Cloud client library
const {Bigtable} = require('@google-cloud/bigtable');

// Instantiates a client
const bigtable = new Bigtable();

exports.readRows = async (req, res) => {
  // Get a reference to a Cloud Bigtable instance and table
  const instance = bigtable.instance(req.body.instanceId);
  const table = instance.table(req.body.tableId);

  // Execute the query
  try {
    const prefix = 'phone#';
    const rows = [];
    await table
      .createReadStream({
        prefix,
      })
      .on('error', err => {
        // Set the status before sending; send() ends the response.
        res.status(500).send(`Error querying Bigtable: ${err}`);
      })
      .on('data', row => {
        rows.push(
          `rowkey: ${row.id}, ` +
            `os_build: ${row.data['stats_summary']['os_build'][0].value}\n`
        );
      })
      .on('end', () => {
        rows.forEach(r => res.write(r));
        res.status(200).end();
      });
  } catch (err) {
    res.status(500).send(`Error querying Bigtable: ${err}`);
  }
};

Python

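from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

# Create the client once, outside the handler, so it is reused across invocations.
client = bigtable.Client()


def bigtable_read_data(request):
    instance = client.instance(request.headers.get("instance_id"))
    table = instance.table(request.headers.get("table_id"))

    # Scan every row whose key starts with the prefix: the exclusive end key
    # is the prefix with its last character incremented ('phone#' -> 'phone$').
    prefix = 'phone#'
    end_key = prefix[:-1] + chr(ord(prefix[-1]) + 1)

    outputs = []
    row_set = RowSet()
    row_set.add_row_range_from_keys(prefix.encode("utf-8"),
                                    end_key.encode("utf-8"))

    rows = table.read_rows(row_set=row_set)
    for row in rows:
        output = 'Rowkey: {}, os_build: {}'.format(
            row.row_key.decode('utf-8'),
            row.cells["stats_summary"]["os_build".encode('utf-8')][0]
            .value.decode('utf-8'))
        outputs.append(output)

    return '\n'.join(outputs)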

Go

// Package bigtable contains an example of using Bigtable from a Cloud Function.
package bigtable

import (
        "context"
        "fmt"
        "log"
        "net/http"
        "sync"

        "cloud.google.com/go/bigtable"
)

// client is a global Bigtable client, to avoid initializing a new client for
// every request.
var client *bigtable.Client
var clientOnce sync.Once

// BigtableRead is an example of reading Bigtable from a Cloud Function.
func BigtableRead(w http.ResponseWriter, r *http.Request) {
        clientOnce.Do(func() {
                // Declare a separate err variable to avoid shadowing client.
                var err error
                client, err = bigtable.NewClient(context.Background(), r.Header.Get("projectID"), r.Header.Get("instanceId"))
                if err != nil {
                        http.Error(w, "Error initializing client", http.StatusInternalServerError)
                        log.Printf("bigtable.NewClient: %v", err)
                        return
                }
        })

        tbl := client.Open(r.Header.Get("tableID"))
        err := tbl.ReadRows(r.Context(), bigtable.PrefixRange("phone#"),
                func(row bigtable.Row) bool {
                        osBuild := ""
                        for _, col := range row["stats_summary"] {
                                if col.Column == "stats_summary:os_build" {
                                        osBuild = string(col.Value)
                                }
                        }

                        fmt.Fprintf(w, "Rowkey: %s, os_build:  %s\n", row.Key(), osBuild)
                        return true
                })

        if err != nil {
                http.Error(w, "Error reading rows", http.StatusInternalServerError)
                log.Printf("tbl.ReadRows(): %v", err)
        }
}
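
All three samples scan the rows whose keys start with `phone#`. The Python sample does this by computing an exclusive end key from the prefix (it increments the last character). A standalone sketch of that calculation is shown here; the helper name `prefix_range` is hypothetical, and the handling of trailing 0xff bytes is an addition that the sample does not need for this prefix.

```python
from typing import Optional, Tuple


def prefix_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (start_key, end_key) covering every key that starts with prefix.

    end_key is exclusive; None means the range has no upper bound.
    """
    # A trailing 0xff byte cannot be incremented, so drop it and bump the
    # byte before it instead.
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return prefix, None
    end_key = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, end_key


if __name__ == "__main__":
    start, end = prefix_range(b"phone#")
    print(start, end)
    # Any key with the prefix sorts inside [start, end).
    print(start <= b"phone#4c410523#20190501" < end)
```

Because Bigtable stores rows sorted by key, a prefix scan is a single contiguous range read, which is why row key design matters for this kind of query.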

Deploy the function

Node.js

gcloud functions deploy get \
--entry-point readRows \
--runtime nodejs16 --trigger-http

Python

gcloud functions deploy bigtable_read_data \
--runtime python310 --trigger-http

Go

gcloud functions deploy BigtableRead \
--runtime go116 --trigger-http

Trigger the function

Node.js

curl "https://REGION-PROJECT_ID.cloudfunctions.net/get" -H "Content-Type: application/json" -d '{"instanceId": "test-instance", "tableId": "test-table"}'

Python

curl "https://REGION-PROJECT_ID.cloudfunctions.net/bigtable_read_data" -H "instance_id: test-instance" -H "table_id: test-table"

Go

curl "https://REGION-PROJECT_ID.cloudfunctions.net/BigtableRead" -H "projectID: PROJECT_ID" -H "instanceId: test-instance" -H "tableID: test-table"

Clean up

To avoid incurring further charges to your Google Cloud account for the Bigtable and Cloud Functions resources used in this tutorial:

1. Delete the instance:

gcloud bigtable instances delete test-instance

2. Delete the function that you deployed:

gcloud functions delete get
gcloud functions delete bigtable_read_data
gcloud functions delete BigtableRead

 


Frequently Asked Questions

What is Cloud Bigtable?

You can store terabytes or even petabytes of data in Cloud Bigtable, a sparsely populated table that can scale to billions of rows and thousands of columns.

Does Bigtable support column-level security restrictions?

No. Bigtable does not support row-level, column-level, or cell-level security restrictions.

What graph databases does Bigtable integrate with?

Bigtable integrates with HGraphDB and JanusGraph. Google is not associated with these integrations and does not support them.

What is a system integrator in cloud computing?

A system integrator provides a strategy for the complicated process of designing and building a cloud platform.

What infrastructure management tools does Bigtable integrate with?

Bigtable integrates with the infrastructure management tools Pivotal Cloud Foundry and Terraform.

Conclusion

This blog has covered the basics of integration in Cloud Bigtable: the Google Cloud services and third-party products it integrates with, adding a Memcached cache layer, monitoring time-series data with OpenTSDB, and reading Bigtable data from Cloud Functions. We hope it has helped you understand how Cloud Bigtable fits together with other tools.

To learn more, see these related articles on the Coding Ninjas website: Advanced Concepts of Integration Concept, Overview of Cloud Bigtable, and Overview of Cloud Billing Concepts.
