Code360 powered by Coding Ninjas
Table of contents
Integrations with Bigtable
Google Cloud services
Cloud Asset Inventory
Cloud Functions
Big Data
Geospatial databases
Graph databases
Infrastructure management
Time-series databases and monitoring
Adding a cache layer to Google Cloud databases
The code
Set up a machine within the network
Optionally connect to Memcached via Telnet
Run the code
Cleaning up
Monitoring time-series data with OpenTSDB on Cloud Bigtable and GKE 
Create a Bigtable instance
Create a GKE cluster
Create OpenTSDB tables in Bigtable
Create the OpenTSDB services
Prepare the application
Deploy the function
Trigger the function
Clean up
Frequently Asked Questions
What is Cloud Bigtable?
Does Bigtable support column-level security restrictions?
What graph databases does Bigtable integrate with?
What is a system integrator in cloud computing?
What infrastructure management tools does Bigtable integrate with?
Last Updated: Mar 27, 2024

Basics of Integration Concept in Cloud BigTable

Author: Muskan Sharma


When working with large data sets, we have to store the data in tables. But what do you do when you have billions of rows and thousands of columns? For that, we have Cloud Bigtable.

You can store terabytes or even petabytes of data in Cloud Bigtable, a large table that can contain billions of rows and thousands of columns.

So in this article, you'll learn the basics of the integration concept in Cloud Bigtable.

Integrations with Bigtable

Integrations between Cloud Bigtable and other products and services are discussed on this page.


Google Cloud services

This section describes the Google Cloud services that Bigtable integrates with.


BigQuery

BigQuery is Google's fully managed, petabyte-scale, low-cost analytics data warehouse. You can use BigQuery to query data stored in Bigtable.

Cloud Asset Inventory

Cloud Asset Inventory is an inventory service based on a time series database. It supports and returns Bigtable resource types.

Cloud Functions

Cloud Functions is an event-driven serverless compute platform that integrates with Bigtable.


Dataflow

Dataflow is a cloud service and programming model for big data processing. Dataflow supports both batch and streaming processing. You can use Dataflow to process Bigtable data and to store the output of your pipeline.


Dataproc

Dataproc provides managed cloud services for Apache Hadoop and related technologies. With Dataproc, you can run Hadoop jobs that read from and write to Bigtable.

Big Data

This section covers the Big Data products that Bigtable integrates with.

Apache Hadoop

Apache Hadoop is a platform that enables the distributed processing of large data sets across clusters of computers. You can use Dataproc to create a Hadoop cluster, then run MapReduce jobs that read from and write to Bigtable.

StreamSets Data Collector

You can configure the data-streaming application StreamSets Data Collector to write data to Bigtable.

Geospatial databases

This section describes the geospatial databases that Bigtable integrates with.


GeoMesa

GeoMesa is a distributed spatio-temporal database that supports spatial data management and querying. GeoMesa can use Bigtable to store its data.

Graph databases

This section describes the graph databases that Bigtable integrates with.


HGraphDB

HGraphDB is a client layer for using Apache HBase or Bigtable as a graph database. It implements the Apache TinkerPop 3 interfaces.


JanusGraph

JanusGraph is a scalable graph database. It is designed for storing and querying graphs containing hundreds of billions of vertices and edges.

Infrastructure management

This section describes the infrastructure management tools that Bigtable integrates with.

Pivotal Cloud Foundry

Pivotal Cloud Foundry is an application development and deployment platform that lets you bind an application to Bigtable.


Terraform

Terraform is an open-source tool that codifies APIs into declarative configuration files. These files can be shared among team members, treated as code, edited, reviewed, and versioned.

Time-series databases and monitoring

This section describes the time-series databases and monitoring tools that Bigtable integrates with.


Heroic

Heroic is a monitoring system and time-series database. Heroic can use Bigtable to store its data.


OpenTSDB

OpenTSDB is a time-series database that can use Bigtable for storage. The section Monitoring time-series data with OpenTSDB on Cloud Bigtable and GKE, later in this article, demonstrates how to use OpenTSDB to collect, store, and monitor time-series data on Google Cloud.

Adding a cache layer to Google Cloud databases

This tutorial shows you how to use Memcached to improve your application's response time for frequently accessed data:

MemcachedClient mcc = new MemcachedClient(new InetSocketAddress(memcachedIP, 11211));
Object value = mcc.get(rowId);
if (value != null) {
    System.out.println("Value fetched from cache: " + value);
} else {
    // Read the row from the database (pseudocode)
    Row row = dataClient.readRow(rowId);
    mcc.set(rowId, 30 * 60, row.toString()); // Cache for 30 minutes.
    System.out.println("Value fetched from db and cached: " + row);
}

Databases are designed for specific schemas, queries, and throughput. Still, if some of your data is accessed more frequently for a period of time, you might want to add a cache layer to reduce the load on your database.

The code

A cache's overall logic can be summed up as follows:

  1. Select a row key for the query.
  2. If the row key is in the cache, return the value.
  3. Otherwise:

  • Look up the row in Cloud Bigtable.
  • Add the value to the cache with an expiration.
  • Return the value.
try {
  MemcachedClient mcc = new MemcachedClient(new InetSocketAddress(discoveryEndpoint, 11211));
  System.out.println("Connected with Memcached successfully");

  // Get value from the cache
  String rowkey = "phone#4c410523#20190501";
  String columnFamily = "stats_summary";
  String column = "os_build";
  String cacheKey = String.format("%s:%s:%s", rowkey, columnFamily, column);

  Object value = mcc.get(cacheKey);

  if (value != null) {
    System.out.println("Value fetched from the cache: " + value);
  } else {
    System.out.println("Didn't get value from the cache");
    // Get data from the Bigtable source and cache it for 30 minutes.
    try (BigtableDataClient dataClient = BigtableDataClient.create(projectId, instanceId)) {
      Row row = dataClient.readRow(tableId, rowkey);
      String cellValue = row.getCells(columnFamily, column).get(0).getValue().toStringUtf8();
      System.out.println("got data from bt " + cellValue);
      // Set data into the Memcached server.
      mcc.set(cacheKey, 30 * 60, cellValue);
      System.out.println("Value fetched from Bigtable: " + cellValue);
    } catch (Exception e) {
      System.out.println("Could not set cache value.");
    }
  }
} catch (Exception e) {
  System.out.println("Could not get cache value.");
}
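The same cache-aside flow can also be sketched language-agnostically. Below is a minimal, illustrative Python version that uses a plain dict with expiry timestamps in place of Memcached and a dict in place of Bigtable; the row key and value are made-up sample data, not output of the tutorial:

```python
import time

cache = {}  # cache_key -> (value, expiry_timestamp)
database = {"phone#4c410523#20190501": "PQ2A.190405.003"}  # stand-in for Bigtable

def get_with_cache(rowkey, ttl_seconds=30 * 60):
    """Cache-aside read: return from the cache if present and fresh,
    otherwise read from the database and cache the result with a TTL."""
    entry = cache.get(rowkey)
    if entry is not None and entry[1] > time.time():
        return entry[0]                      # cache hit
    value = database[rowkey]                 # cache miss: read the source
    cache[rowkey] = (value, time.time() + ttl_seconds)
    return value

print(get_with_cache("phone#4c410523#20190501"))  # first call reads the database
print(get_with_cache("phone#4c410523#20190501"))  # second call is served from the cache
```

The expiry check means stale entries are simply re-read from the source; a real Memcached server evicts expired entries for you.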

The following key-value pairs could be used for your cache:

  • rowkey: encoded row
  • start_row_key-end_row_key: array of encoded rows
  • SQL query: results
  • row prefix: array of encoded rows

Consider your use case when choosing the setup for your cache. For example, Bigtable row keys can be up to 4 KB, but Memcached keys are limited to 250 bytes, so a row key might be too large to use directly as a cache key.
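One common workaround for over-long keys is to hash them. The sketch below is illustrative Python, not part of the tutorial's Java sample: it keeps short row keys as-is and replaces long ones with a fixed-length SHA-256 digest, which always fits within Memcached's limit:

```python
import hashlib

MEMCACHED_MAX_KEY_BYTES = 250

def cache_key_for(rowkey: str) -> str:
    """Return a Memcached-safe cache key: the row key itself if it fits,
    otherwise a fixed-length SHA-256 digest of it."""
    encoded = rowkey.encode("utf-8")
    if len(encoded) <= MEMCACHED_MAX_KEY_BYTES:
        return rowkey
    return "sha256:" + hashlib.sha256(encoded).hexdigest()

short = cache_key_for("phone#4c410523#20190501")
long_key = cache_key_for("x" * 5000)  # longer than the 250-byte limit
print(short)          # the short key is unchanged
print(len(long_key))  # 71 characters ("sha256:" + 64 hex digits), always under the limit
```

Hashing trades readability for bounded size; distinct row keys colliding under SHA-256 is not a practical concern.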

Create a Memcached instance

For testing purposes, you can install and run a local Memcached instance instead of creating the Memorystore instance used in this tutorial.

  • Enable the Memorystore for Memcached API:
gcloud services enable
  • Create the smallest possible Memcached instance on the default network, in the region that is best for your application:
gcloud beta memcache instances create bigtable-cache --node-count=1 --node-cpu=1 --node-memory=1GB --region=us-central1
  • Get the Memcached instance details, including the IP address of the discoveryEndpoint:
gcloud beta memcache instances describe bigtable-cache --region=us-central1

Set up a machine within the network

You need a machine on the network from which you can run code that connects to your Memcached instance. A Compute Engine VM requires less configuration than a serverless option such as Cloud Functions.

  • Create a Compute Engine instance with enabled API scopes for Cloud Bigtable data on the default network:
gcloud beta compute instances create bigtable-memcached-vm \
    --zone=us-central1-a \
    --machine-type=e2-micro \
    --image=debian-10-buster-v20200910 \
    --image-project=debian-cloud \
    --boot-disk-size=10GB \
    --boot-disk-type=pd-standard \
    --boot-disk-device-name=bigtable-memcached-vm \
    --scopes=https://www.googleapis.com/auth/bigtable.data
  • Connect to the VM over SSH:
gcloud beta compute ssh --zone "us-central1-a" bigtable-memcached-vm

Optionally connect to Memcached via Telnet

You can find more details about this procedure in the Memorystore for Memcached documentation, but you can simply use the commands below to connect to the cache (replacing DISCOVERY_ENDPOINT_IP with your instance's discoveryEndpoint IP address) and set and get a value:

sudo apt-get install telnet
telnet DISCOVERY_ENDPOINT_IP 11211
set greeting 1 0 11
hello world
get greeting
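The numbers in `set greeting 1 0 11` are the flags, the expiration time in seconds (0 means never expire), and the length of the value in bytes ("hello world" is 11 bytes). As a purely illustrative aside (not part of the tutorial), a small Python helper that frames such a command shows the layout of the text protocol:

```python
def frame_set_command(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build the bytes for a Memcached text-protocol 'set' command.

    The header line carries the key, flags, expiration time (seconds,
    0 = never), and the value length in bytes; the value follows on
    its own line, and both lines end with CRLF.
    """
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"

cmd = frame_set_command("greeting", b"hello world", flags=1)
print(cmd)  # b'set greeting 1 0 11\r\nhello world\r\n'
```

Computing the byte count from the value itself, as above, avoids the most common mistake when typing these commands by hand: a length field that doesn't match the payload.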

Run the code


1. Install Git on the VM:

sudo apt-get install git

2. Copy the code from the repository:

git clone https://github.com/GoogleCloudPlatform/java-docs-samples.git

3. Go to the directory where the code is located.

cd java-docs-samples/bigtable/memorystore

4. Install Maven:

sudo apt-get install maven

5. Configure environment variables for your setup:


6. Run the program once to get the value from the database, then run it again to see that the value is fetched from the cache:

mvn compile exec:java -Dexec.mainClass=Memcached \
    -DbigtableProjectId=$PROJECT_ID \
    -DbigtableInstanceId=bt-cache \
    -DbigtableTableId=mobile-time-series

Cleaning up

Use these commands to delete the virtual machine, Cloud Bigtable instance, and Memcached instance that you created in this article, so that you aren't charged for their continued use.

cbt deleteinstance bt-cache
gcloud beta memcache instances delete bigtable-cache --region=us-central1
gcloud compute instances delete bigtable-memcached-vm --zone=us-central1-a

Monitoring time-series data with OpenTSDB on Cloud Bigtable and GKE 

This section explains how to use OpenTSDB, running on Google Kubernetes Engine (GKE), together with Bigtable to collect, store, and analyze time-series data on Google Cloud.

Create a Bigtable instance

Because this guide uses Bigtable to store the time-series data that you collect, you must create a Bigtable instance.

1. In Cloud Shell, set environment variables for your Google Cloud zone and the identifier of your Bigtable instance:

export ZONE=ZONE
export BIGTABLE_INSTANCE_ID=BIGTABLE_INSTANCE_ID

2. Create a Bigtable instance:

gcloud bigtable instances create ${BIGTABLE_INSTANCE_ID} \
    --cluster-config=id=${BIGTABLE_INSTANCE_ID}-${ZONE},zone=${ZONE},nodes=1

Create a GKE cluster

GKE offers a managed Kubernetes environment. After you create a GKE cluster, you can deploy Kubernetes Pods to it. This guide uses GKE and Kubernetes Pods to run OpenTSDB.

1. In Cloud Shell, set environment variables for the name, node type, and version of your GKE cluster:

export GKE_CLUSTER_NAME=GKE_CLUSTER_NAME
export GKE_VERSION=1.20
export GKE_NODE_TYPE=n1-standard-4

2. Create a GKE cluster:

gcloud container clusters create ${GKE_CLUSTER_NAME} \
    --zone=${ZONE} \
    --cluster-version=${GKE_VERSION} \
    --machine-type=${GKE_NODE_TYPE} \
    --scopes="https://www.googleapis.com/auth/bigtable.admin","https://www.googleapis.com/auth/bigtable.data"

3. Get the credentials that you need to connect to your GKE cluster:

gcloud container clusters get-credentials ${GKE_CLUSTER_NAME} --zone ${ZONE}

Create OpenTSDB tables in Bigtable

Before OpenTSDB can read or write data, the Bigtable tables that hold the data must exist. You create the tables by running a Kubernetes job.

1. Run the job in Cloud Shell:

envsubst < jobs/opentsdb-init.yaml.tpl | kubectl create -f -

2. Check the job logs to verify that the tables were created:

OPENTSDB_INIT_POD=$(kubectl get pods --selector=job-name=opentsdb-init \
    --output=jsonpath={.items[0].metadata.name})
kubectl logs $OPENTSDB_INIT_POD

Create the OpenTSDB services

To guarantee consistent network connectivity to the deployments, you create two Kubernetes services: one service writes metrics into OpenTSDB, and the other reads them.

1. In Cloud Shell, create the service for writing metrics:

kubectl create -f services/opentsdb-write.yaml

2. Create the service for reading metrics:

kubectl create -f services/opentsdb-read.yaml

Prepare the application

1. Clone the repository for the sample app to your local machine. Depending on the language you want to use, clone one of the following (the directory names in the next step come from these repositories):

Node.js:

git clone https://github.com/GoogleCloudPlatform/nodejs-docs-samples.git

Python:

git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git

Go:

git clone https://github.com/GoogleCloudPlatform/golang-samples.git

2. Change to the directory that contains the Cloud Functions sample code for accessing Bigtable:

Node.js:

cd nodejs-docs-samples/functions/bigtable/

Python:

cd python-docs-samples/functions/bigtable/

Go:

cd golang-samples/functions/bigtable/

3. Look at the following example of code:


Node.js:

// Imports the Google Cloud client library
const {Bigtable} = require('@google-cloud/bigtable');

// Instantiates a client
const bigtable = new Bigtable();

exports.readRows = async (req, res) => {
  // Get reference to a Cloud Bigtable instance and database
  const instance = bigtable.instance(req.body.instanceId);
  const table = instance.table(req.body.tableId);

  // Execute the query
  try {
    const prefix = 'phone#';
    const rows = [];
    await table
      .createReadStream({prefix})
      .on('error', err => {
        res.send(`Error querying Bigtable: ${err}`);
        res.status(500).end();
      })
      .on('data', row => {
        rows.push(
          `rowkey: ${row.id}, ` +
            `os_build: ${row.data['stats_summary']['os_build'][0].value}\n`
        );
      })
      .on('end', () => {
        rows.forEach(r => res.write(r));
        res.status(200).end();
      });
  } catch (err) {
    res.send(`Error querying Bigtable: ${err}`);
    res.status(500).end();
  }
};


Python:

from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="project-id", admin=True)

def bigtable_read_data(request):
    instance = client.instance(request.headers.get("instance_id"))
    table = instance.table(request.headers.get("table_id"))

    prefix = 'phone#'
    end_key = prefix[:-1] + chr(ord(prefix[-1]) + 1)

    outputs = []
    row_set = RowSet()
    row_set.add_row_range_from_keys(prefix.encode("utf-8"), end_key.encode("utf-8"))

    rows = table.read_rows(row_set=row_set)
    for row in rows:
        output = 'Rowkey: {}, os_build: {}'.format(
            row.row_key.decode("utf-8"),
            row.cells["stats_summary"][b"os_build"][0].value.decode("utf-8"),
        )
        outputs.append(output)

    return '\n'.join(outputs)


Go:

// Package bigtable contains an example of reading Bigtable from a Cloud Function.
package bigtable

import (
        "context"
        "fmt"
        "log"
        "net/http"
        "sync"

        "cloud.google.com/go/bigtable"
)

// client is reused between function invocations.
var client *bigtable.Client
var clientOnce sync.Once

// BigtableRead is an example of reading the Bigtable from the Cloud Function.
func BigtableRead(w http.ResponseWriter, r *http.Request) {
        clientOnce.Do(func() {
                // Declare a separate err variable to avoid the shadowing client.
                var err error
                client, err = bigtable.NewClient(context.Background(), r.Header.Get("projectID"), r.Header.Get("instanceId"))
                if err != nil {
                        http.Error(w, "Error initializing client", http.StatusInternalServerError)
                        log.Printf("bigtable.NewClient: %v", err)
                        return
                }
        })

        tbl := client.Open(r.Header.Get("tableID"))
        err := tbl.ReadRows(r.Context(), bigtable.PrefixRange("phone#"),
                func(row bigtable.Row) bool {
                        osBuild := ""
                        for _, col := range row["stats_summary"] {
                                if col.Column == "stats_summary:os_build" {
                                        osBuild = string(col.Value)
                                }
                        }
                        fmt.Fprintf(w, "Rowkey: %s, os_build: %s\n", row.Key(), osBuild)
                        return true
                })
        if err != nil {
                http.Error(w, "Error reading rows", http.StatusInternalServerError)
                log.Printf("tbl.ReadRows(): %v", err)
        }
}

Deploy the function


Node.js:

gcloud functions deploy get \
--runtime nodejs16 --trigger-http

Python:

gcloud functions deploy bigtable_read_data \
--runtime python310 --trigger-http

Go:

gcloud functions deploy BigtableRead \
--runtime go116 --trigger-http

Trigger the function


Node.js:

curl "https://" -H "instance_id: test-instance" -H "table_id: test-table"

Python:

curl "" -H "instance_id: test-instance" -H "table_id: test-table"

Go:

curl "" -H "instance_id: test-instance" -H "table_id: test-table"
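The function can also be triggered from any HTTP client, not just curl; the instance and table IDs travel as request headers. Here is a small illustrative Python helper (the URL shown is a placeholder for your function's actual trigger URL, which the commands above elide):

```python
def build_trigger_request(function_url: str, instance_id: str, table_id: str):
    """Return the URL and headers for triggering the read function over HTTP.

    The deployed function reads the Bigtable instance and table IDs from
    the 'instance_id' and 'table_id' request headers.
    """
    headers = {"instance_id": instance_id, "table_id": table_id}
    return function_url, headers

url, headers = build_trigger_request(
    "https://REGION-PROJECT_ID.cloudfunctions.net/bigtable_read_data",  # placeholder URL
    "test-instance",
    "test-table",
)
print(headers)  # {'instance_id': 'test-instance', 'table_id': 'test-table'}
# e.g. with the requests library:  requests.get(url, headers=headers)
```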

Clean up

To avoid incurring further charges to your Google Cloud account for the Bigtable and Cloud Functions resources used in this topic:

1. Delete the instance:

gcloud bigtable instances delete test-instance

2. Delete the function that you deployed:

gcloud functions delete get
gcloud functions delete bigtable_read_data
gcloud functions delete BigtableRead



Frequently Asked Questions

What is Cloud Bigtable?

You can store terabytes or even petabytes of data in Cloud Bigtable, a sparsely populated table that can scale to billions of rows and thousands of columns.
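"Sparsely populated" means that empty cells take no storage: each row holds only the cells that were actually written. A toy illustration in plain Python (purely conceptual, not the Bigtable API; the row keys and values are sample data):

```python
# A toy model of a sparse wide table: each row stores only the cells
# that were actually written, keyed by (column family, qualifier).
table = {}

def write_cell(rowkey, family, qualifier, value):
    """Write one cell; rows and cells are created lazily."""
    table.setdefault(rowkey, {})[(family, qualifier)] = value

write_cell("phone#4c410523#20190501", "stats_summary", "os_build", "PQ2A.190405.003")
write_cell("phone#5c10102#20190501", "stats_summary", "connected_cell", "1")

# Even if the schema allows thousands of columns, each row stores only
# what was written -- a single cell apiece here.
print({rowkey: len(cells) for rowkey, cells in table.items()})
```

This is why a Bigtable table can declare thousands of columns without cost: a row pays only for the columns it uses.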

Does Bigtable support column-level security restrictions?

No. Bigtable does not support row-level, column-level, or cell-level security restrictions.

What graph databases does Bigtable integrate with?

Bigtable integrates with the graph databases HGraphDB and JanusGraph. Note that Google is not associated with these integrations and does not support them.

What is a system integrator in cloud computing?

A system integrator provides a strategy for the complex process of building a cloud platform.

What infrastructure management tools does Bigtable integrate with?

Bigtable integrates with infrastructure management tools such as Pivotal Cloud Foundry and Terraform.


This blog has extensively discussed the basics of the integration concept in Cloud Bigtable, including creating a Bigtable instance, Google Cloud services, and Cloud Functions. We hope this blog has helped you learn about the integration concept in Cloud Bigtable. If you want to learn more, check out the excellent content on the Coding Ninjas website:

Advanced Concepts of Integration Concept, Overview of Cloud Bigtable, Overview of Cloud Billing Concepts

Refer to our guided paths on the Coding Ninjas Studio platform to learn more about DSA, DBMS, Competitive Programming, Python, Java, JavaScript, etc. 

Refer to the links: problems, top 100 SQL problems, resources, and mock tests to enhance your knowledge.

For placement preparations, visit interview experiences and interview bundles.

Thank You

Do upvote our blog to help other ninjas grow. Happy Coding!
