Table of contents
1. Introduction
2. Integrations with Bigtable
2.1. Running JanusGraph on GKE with Cloud Bigtable
2.2. Prepare your environment
2.3. Create a Bigtable instance
3. Install and configure Helm
3.1. Use Helm to install JanusGraph and Elasticsearch
3.2. Load and query a sample dataset
3.3. Clean up
4. Create a Hadoop cluster
5. Create a Cloud Storage bucket
6. Create the Dataproc cluster
6.1. Test the Dataproc cluster
6.2. Run the sample Hadoop job
6.3. Delete the Dataproc cluster
7. Frequently Asked Questions
7.1. What is Cloud Bigtable?
7.2. Does Bigtable support column-level security restrictions?
7.3. What graph databases does Bigtable integrate with?
7.4. What is a system integrator in cloud computing?
7.5. What infrastructure management tools does Bigtable integrate with?
8. Conclusion
Last Updated: Mar 27, 2024

Advanced Concepts of Integration Concept in Cloud BigTable

Author: Muskan Sharma

Introduction

While working with large amounts of data, we have to store the data in tables. So what do you do when you have billions of rows and thousands of columns? For that, we have Cloud Bigtable.

Cloud Bigtable is a large, sparsely populated table that can scale to billions of rows and thousands of columns, in which you can store terabytes or even petabytes of data.

So in this article, you'll learn about the advanced concepts of integration in Cloud Bigtable.

Integrations with Bigtable

This section discusses integrations between Cloud Bigtable and other products and services.

Running JanusGraph on GKE with Cloud Bigtable

By modeling your data entities and the relationships between them, graph databases can help you discover new insights. JanusGraph is a graph database that can handle vast volumes of data. This section demonstrates how to run JanusGraph on Google Cloud, using Bigtable as the storage backend and Google Kubernetes Engine (GKE) as the orchestration platform.

JanusGraph data in Bigtable

JanusGraph stores graph data as an adjacency list. Each row represents a vertex, its adjacent vertices (edges), and any property metadata about the vertex and its edges. The row key uniquely identifies the vertex. Each relationship between the vertex and another vertex, along with any attributes that further describe that relationship, is stored as an edge or edge-property column. Following Bigtable best practices, the column qualifier and the column value together record the data describing the edge. Each vertex property is stored as a separate column, again using both the column qualifier and the column value to describe the property.
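
As a rough illustration (a simplified sketch, not the exact on-disk encoding that JanusGraph uses), a single vertex row might be laid out like this:

row key: <vertex id>
    edge column:     qualifier = <edge label + adjacent vertex id>, value = <edge metadata>
    property column: qualifier = <property key>, value = <property value>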

Prepare your environment

In this tutorial, you enter commands in Cloud Shell. Cloud Shell gives you access to the command line in the Google Cloud console and includes the Google Cloud CLI and the other tools that you need to build on Google Cloud.

1. Activate Cloud Shell in the console.

2. In Cloud Shell, set environment variables for the Compute Engine zone where your Bigtable cluster and GKE cluster will be created, and for the name, node type, and version of your GKE cluster:

export PROJECT_ID=PROJECT_ID
export GCP_ZONE=ZONE
export GKE_CLUSTER_NAME=GKE_CLUSTER_NAME
export GKE_NODE_TYPE=n1-standard-4
export GKE_VERSION=1.20

3. In order to deploy JanusGraph, create a GKE cluster:

gcloud container clusters create ${GKE_CLUSTER_NAME} \
    --zone=${GCP_ZONE} \
    --cluster-version=${GKE_VERSION} \
    --machine-type ${GKE_NODE_TYPE} \
    --scopes "https://www.googleapis.com/auth/cloud-platform"

Create a Bigtable instance

This tutorial uses Bigtable, which can scale quickly to meet your needs, as the storage backend for JanusGraph. For this tutorial, a single-node cluster is both practical and adequate.

1. In Cloud Shell, set an environment variable for your Bigtable instance identifier:

    export BIGTABLE_INSTANCE_ID=BIGTABLE_INSTANCE_ID

2. Create the Bigtable instance:

gcloud bigtable instances create ${BIGTABLE_INSTANCE_ID} \
    --cluster-config=id=${BIGTABLE_INSTANCE_ID}-${GCP_ZONE},zone=${GCP_ZONE},nodes=1 \
    --display-name=${BIGTABLE_INSTANCE_ID}-${GCP_ZONE}
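
You can optionally confirm that the instance was created with a quick check from the gcloud CLI:

gcloud bigtable instances list --project=${PROJECT_ID}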

Install and configure Helm

You use Helm to deploy applications to your Kubernetes cluster. In this tutorial, you use Helm to deploy both the JanusGraph and Elasticsearch services on your GKE cluster.

1. In Cloud Shell, install Helm:

curl -fsSL -o get_helm.sh \
    https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
DESIRED_VERSION=v3.5.0 ./get_helm.sh

2. Add the elastic chart repository so that the JanusGraph chart deployment can locate the Elasticsearch chart dependency:

helm repo add elastic https://helm.elastic.co
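
After adding the repository, you can optionally refresh the local chart index so that Helm sees the latest published Elasticsearch chart versions:

helm repo update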

Use Helm to install JanusGraph and Elasticsearch

The Helm chart is downloaded from GitHub. The deployment defined in the Helm chart repository creates three JanusGraph Pods behind a Service that launches an internal HTTP(S) load balancer.

1. Set the following environment variables in Cloud Shell for Helm and JanusGraph names:

export HELM_REPO=bigtable-janusgraph-helm
export JANUSGRAPH_VERSION=0.5.3
export HELM_CHART_RELEASE_VERSION=1
export HELM_CHART_RELEASE_TAG=${JANUSGRAPH_VERSION}-${HELM_CHART_RELEASE_VERSION}
export HELM_CHART_RELEASE_TAG_HASH=f8b271a4854d4a553dd5e9ba014d077fb098d9ab
export HELM_CHART_NAME=janusgraph-bigtable

2. Download the Helm chart from GitHub:

git clone https://github.com/GoogleCloudPlatform/${HELM_REPO} \
   --branch ${HELM_CHART_RELEASE_TAG}

3. Change to the Helm chart directory:

cd ${HELM_REPO}

4. For security, use the commit hash to verify that you have checked out the expected release:

HEAD_COMMIT_HASH=$(git rev-parse --verify HEAD)
if [ _${HEAD_COMMIT_HASH} == _${HELM_CHART_RELEASE_TAG_HASH} ]
then
    echo "Commit hash verified"
fi

5. Update the chart dependencies:

helm dep update

6. Change to the parent directory:

cd ..

7. Set environment variables for the Helm and JanusGraph entity names:

export HELM_RELEASE_NAME=janusgraph-bigtable-elastic
export ELASTICSEARCH_CLUSTER_NAME=${HELM_RELEASE_NAME}-elasticsearch
export BIGTABLE_JANUSGRAPH_TABLE=janusgraph-table

8. To provide Helm with the configuration properties to use when it deploys the JanusGraph chart, create a values.yaml file as follows:

cat > values.yaml << EOF

image:
  repository: docker.io/janusgraph/janusgraph
  tag: 0.5.3
  pullPolicy: IfNotPresent

replicaCount: 3

service:
  type: LoadBalancer
  port: 8182
  serviceAnnotations:
    networking.gke.io/load-balancer-type: "Internal"

elasticsearch:
  deploy: true
  clusterName: ${ELASTICSEARCH_CLUSTER_NAME}

properties:
  storage.backend: hbase
  storage.directory: null
  storage.hbase.ext.google.bigtable.instance.id: ${BIGTABLE_INSTANCE_ID}
  storage.hbase.ext.google.bigtable.project.id: ${PROJECT_ID}
  storage.hbase.ext.hbase.client.connection.impl: com.google.cloud.bigtable.hbase2_x.BigtableConnection
  storage.hbase.short-cf-names: true
  storage.hbase.table: ${BIGTABLE_JANUSGRAPH_TABLE}
  index.search.backend: elasticsearch
  index.search.hostname: ${ELASTICSEARCH_CLUSTER_NAME}-master
  index.search.directory: null
  index.search.elasticsearch.health-request-timeout: 90s
  cache.db-cache: true
  cache.db-cache-clean-wait: 20
  cache.db-cache-time: 180000
  cache.db-cache-size: 0.5
  cluster.max-partitions: 1024
  graph.replace-instance-if-exists: true

persistence:
  enabled: false

debugLevel: INFO
EOF

9. Deploy the JanusGraph Helm chart using the values.yaml file that you created:

helm upgrade --install \
    --wait \
    --timeout 600s \
    ${HELM_RELEASE_NAME} \
    ./${HELM_REPO} \
    -f values.yaml
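
Once the release is installed, you can check that the JanusGraph Pods are running and that the internal load balancer Service has been assigned an IP address. This is an optional sanity check; the exact Pod and Service names depend on the chart's naming conventions:

kubectl get pods
kubectl get svc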

Load and query a sample dataset

You can start loading and querying your data once JanusGraph has been deployed and you have established a connection to it using Gremlin.

1. Load the Graph of the Gods sample dataset into the graph:

GraphOfTheGodsFactory.load(graph)

2. Send a graph traversal query to find all of Jupiter's brothers:

g.V().has('name', 'jupiter').out('brother').values('name')
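
In the Graph of the Gods dataset, this traversal should return Jupiter's brothers, neptune and pluto. If you are curious how JanusGraph persisted the graph as adjacency-list rows, you can also peek at the underlying Bigtable table from Cloud Shell with the cbt CLI (a sketch that assumes the table name set in values.yaml earlier):

cbt -project ${PROJECT_ID} -instance ${BIGTABLE_INSTANCE_ID} read ${BIGTABLE_JANUSGRAPH_TABLE} count=5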

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
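
If you keep the project, the individual resources created in this tutorial can be removed with commands like the following (a hedged sketch that reuses the environment variables set earlier):

# Remove the JanusGraph and Elasticsearch release from the GKE cluster
helm uninstall ${HELM_RELEASE_NAME}

# Delete the GKE cluster
gcloud container clusters delete ${GKE_CLUSTER_NAME} --zone=${GCP_ZONE}

# Delete the Bigtable instance
gcloud bigtable instances delete ${BIGTABLE_INSTANCE_ID}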

Create a Hadoop cluster

You can use Dataproc to create one or more Compute Engine instances that can connect to a Cloud Bigtable instance and run Hadoop jobs.

Create a Cloud Storage bucket

Dataproc keeps temporary files in a Cloud Storage bucket. To avoid file-naming conflicts, create a separate bucket for Dataproc:

gsutil mb -p [PROJECT_ID] gs://[BUCKET_NAME]
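
You can confirm that the bucket exists before creating the cluster (optional):

gsutil ls -b gs://[BUCKET_NAME]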

Create the Dataproc cluster

To create a Dataproc cluster with four worker nodes, run the following command, replacing the values in brackets with the appropriate values:

gcloud dataproc clusters create [DATAPROC_CLUSTER_NAME] --bucket [BUCKET_NAME] \
    --zone [ZONE] --num-workers 4 --master-machine-type n1-standard-4 \
    --worker-machine-type n1-standard-4

Test the Dataproc cluster

Once your Dataproc cluster is configured, you can test it by running a sample Hadoop job that counts how many times a word appears in a text file.

Run the sample Hadoop job

1. Go to the java/dataproc-wordcount directory in the folder where you cloned the GitHub repository.

2. To build the project, run the following command, replacing the values in brackets with the appropriate values:

mvn clean package -Dbigtable.projectID=[PROJECT_ID] \
    -Dbigtable.instanceID=[BIGTABLE_INSTANCE_ID]

3. Start the Hadoop job by entering the following command, substituting the values in brackets with the required values:

  ./cluster.sh start [DATAPROC_CLUSTER_NAME]
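
When the job completes, the word counts are written to a table in your Bigtable instance. You can list the tables and read a few rows with the cbt CLI; the exact table name created by the sample is not shown here, so [TABLE_NAME] is a placeholder:

cbt -project [PROJECT_ID] -instance [BIGTABLE_INSTANCE_ID] ls
cbt -project [PROJECT_ID] -instance [BIGTABLE_INSTANCE_ID] read [TABLE_NAME] count=10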

Delete the Dataproc cluster

When you have finished using the Dataproc cluster, run the following command to shut it down and delete it, replacing [DATAPROC_CLUSTER_NAME] with the name of your Dataproc cluster:

    gcloud dataproc clusters delete [DATAPROC_CLUSTER_NAME]
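
If you no longer need the temporary Cloud Storage bucket that Dataproc used, you can delete it and its contents as well, replacing [BUCKET_NAME] with your bucket name:

gsutil rm -r gs://[BUCKET_NAME]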

Frequently Asked Questions

What is Cloud Bigtable?

You can store terabytes or even petabytes of data in Cloud Bigtable, a sparsely populated table that can scale to billions of rows and thousands of columns.

Does Bigtable support column-level security restrictions?

Bigtable does not support row-level, column-level, or cell-level security restrictions.

What graph databases does Bigtable integrate with?

Bigtable integrates with graph databases such as JanusGraph, which can use Bigtable as its storage backend through the HBase API, as shown in this article.

What is a system integrator in cloud computing?

A system integrator provides a strategy for the complex process of building a cloud platform.

What infrastructure management tools does Bigtable integrate with?

Bigtable integrates with infrastructure-management tools; for example, Terraform can be used to provision and manage Bigtable instances and tables.

Conclusion

This blog has extensively discussed the advanced concepts of integration in Cloud Bigtable, including running JanusGraph on GKE and creating a Hadoop cluster with Dataproc. We hope this blog has helped you learn about the Advanced Concepts of Integration Concept in Cloud BigTable. If you want to learn more, check out the excellent content on the Coding Ninjas Website:

Overview of Cloud Bigtable, Overview of cloud billing concepts

Refer to our guided paths on the Coding Ninjas Studio platform to learn more about DSA, DBMS, Competitive Programming, Python, Java, JavaScript, etc. 

Refer to the links: problems, top 100 SQL problems, resources, and mock tests to enhance your knowledge.

For placement preparations, visit interview experiences and interview bundles.

Do upvote our blog to help other ninjas grow. Happy Coding!
