Table of contents
1. Introduction
2. Overview Of Cloud Inference API
3. Upload and query time-series data
  3.1. Before you begin
  3.2. Upload a Dataset
  3.3. Get the status of uploaded datasets
  3.4. Query a Loaded Dataset
4. Read from BigQuery
5. Scoring Guide
  5.1. Background probability exponent
  5.2. Probability of Rare Terms
  5.3. Timespan Parameters
6. Frequently Asked Questions
  6.1. How many different types of storage are available in GCP?
  6.2. What exactly is the API in Google Cloud?
  6.3. Which hypervisor does Google use in GCP?
7. Conclusion
Last Updated: Mar 27, 2024

Cloud Inference API

Author: Harsh

Introduction

Many companies rely on time-series analysis on a daily basis. The most common use cases are evaluating foot traffic and conversion for retail stores, detecting anomalies in data, establishing correlations over sensor data in real time, and delivering high-quality recommendations.


With Cloud Inference API (Alpha), you can extract insights from typed time-series datasets in real time. With it, you can:

1. Detect patterns and anomalies using event time markers.

2. Handle datasets with tens of billions of events and perform thousands of queries per second.

3. Rely on low latency for interactive, user-facing apps and use the API as a recommendation back end.

4. Focus on insights rather than infrastructure, since Cloud Inference API is fully managed.

Overview Of Cloud Inference API


Before going into the depths of the Cloud Inference API, you should familiarize yourself with the following concepts:
 

1. An 'event' is a single data entry; for instance, "the pressure is 110 PSI". (A sketch of a single event record follows this list.)

2. Each event must have a start time and, if possible, an end time.

3. Each event has a type, which is referred to as the data name.

4. A 'group' is a logical grouping of events. Groups can include data of any type.

5. Each group additionally contains a series of events for time-based aggregation. This series is created automatically based on the timestamps of the supplied events.

6. The system's scoring operates across groups: when you query the system for patterns and trends, the group is the unit of aggregation.

7. The Cloud Inference API enforces some per-project limits. Clients can list, create, and remove datasets at up to 1 query per second (QPS), query at up to 10 QPS, and ingest at up to 100 QPS.
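To make these concepts concrete, here is a minimal sketch of what a single event record might look like. The field names mirror the BigQuery schema described later in this blog; the exact JSON layout expected in upload files is an assumption for illustration, not the documented format:

{
  "group_id": 42,
  "data_name": "Pressure",
  "data_value": "110 PSI",
  "start_time": "2018-04-01T00:00:00Z",
  "end_time": "2018-04-01T00:05:00Z"
}

Here group_id ties the event to its group, data_name carries the event type, and the timestamps drive the time-based aggregation described above.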

Upload and query time-series data


The Cloud Inference API lets you quickly and easily integrate Google's search and analysis tools for time-series data into your applications. You can use it to process time-series datasets and run inference queries over loaded datasets.

Before you begin

Before you start uploading or querying time-series data, you need to do the following:

1. Choose or create a Google Cloud project via the Google Cloud console's project selector page.
 

2. A cloud project should have billing enabled.
 

3. Enable the Cloud Inference API.
 

4. Create a service account.
 

5. Create a service account key.
 

6. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file containing your service account key:

export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"


Replace KEY_PATH with the actual path to your key file.
 

7. Install and initialize the Google Cloud CLI. (Steps 3 to 6 are consolidated into a gcloud sketch after this list.)
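Steps 3 to 6 can also be scripted with the gcloud CLI. The following is a minimal sketch: the service name infer.googleapis.com is inferred from the API endpoint used later in this blog, and the service account name inference-client is illustrative.

# Enable the Cloud Inference API (service name assumed from the API endpoint)
gcloud services enable infer.googleapis.com

# Create a service account ("inference-client" is an illustrative name)
gcloud iam service-accounts create inference-client

# Create and download a JSON key for it (replace PROJECT_ID with your project ID)
gcloud iam service-accounts keys create key.json \
  --iam-account=inference-client@PROJECT_ID.iam.gserviceaccount.com

# Point the curl examples below at the downloaded key
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/key.json"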
 

Upload a Dataset


To create a Cloud Inference API dataset using the CreateDataset REST method, follow these steps:
 

1. Create a JSON request file with the following code and save it as a plain text file named create-gdelt-dataset.json:
 

{
  "name":"gdelt_2018_04_data",
  "data_names": [
    "PageURL",
    "PageDomain",
    "PageCountry",
    "PageLanguage",
    "PageTextTheme",
    "PageTextGeo",
    "ImageURL",
    "ImagePopularityRawScore",
    "ImagePopularity",
    "ImageSafeSearch",
    "ImageLabel",
    "ImageWebEntity",
    "ImageWebEntityBestGuessLabel",
    "ImageGeoLandmark",
    "ImageFaceToneHas"
  ],
  "data_sources": [
    { "uri":"gs://inference-gdelt-demo/inference-gdelt-demo.201804.json" },
  ]
}


This JSON snippet indicates that we plan to build a dataset of GDELT-annotated news items on which to conduct Cloud Inference API queries.
 

2. Verify that you can obtain an access token using the following command:

gcloud auth application-default print-access-token
 

3. Make a CreateDataset request using curl, passing it the access token and the filename of the JSON request file you produced in step 1.

curl -s -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets \
  -d @create-gdelt-dataset.json


Replace PROJECT_NUMBER with your project number. You should get the following response in the console:

{
  "name": "gdelt_2018_04_data",
  "state": "STATE_PENDING"
}

 

Get the status of uploaded datasets


Using the ListDatasets REST method, you can check the status of each dataset that your client project has supplied to the Cloud Inference API for processing.

Use the following command to get the status of your datasets:

curl -s -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets


Replace PROJECT_NUMBER with your project number.
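The response lists each dataset along with its processing state. The shape below is illustrative and the exact field names are an assumption; the states shown, STATE_PENDING and STATE_LOADED, are the ones referenced elsewhere in this blog:

{
  "datasets": [
    {
      "name": "gdelt_2018_04_data",
      "state": "STATE_LOADED"
    }
  ]
}

A dataset must reach STATE_LOADED before you can query it.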

Query a Loaded Dataset

1. Create the following JSON request file and save it as a plain text file named query-gdelt-dataset.json:

{
  "name": "gdelt_2018_04_data",
  "queries": [{
    "query": {
      "type": "TYPE_TERM",
      "term": {
      "name": "ImageWebEntity",
      "value": "Vacation"
      }
    },
    "distribution_configs": {
      "bgprob_exp": 0.7,
      "data_name": "ImageLabel",
      "max_result_entries": 5,
    }
  }]
}


This JSON snippet indicates that we wish to query the gdelt_2018_04_data dataset, which we previously submitted to the Cloud Inference API via a CreateDataset call and which has been reported as STATE_LOADED.
 

2. Obtain an access token in the same way as for the CreateDataset request above.
 

3. Make a query request using curl, passing it the access token and the filename of the JSON request file you created in step 1.

curl -s -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
  https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets/gdelt_2018_04_data:query \
  -d @query-gdelt-dataset.json


4. You should now get the query results in the form of a JSON object.
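For orientation, the response has roughly the following shape; the field names here are approximations for illustration, not the authoritative schema. Each distribution entry pairs an event value with its score:

{
  "results": [{
    "distributions": [{
      "dataName": "ImageLabel",
      "entries": [
        { "value": "Beach", "score": 5.3 },
        { "value": "Sky", "score": 4.1 }
      ]
    }]
  }]
}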

Read from BigQuery

In addition to JSON files on Google Cloud Storage, the Inference API can read from BigQuery tables. For a table to be handled by the Inference API, it must contain the following columns:

1. group_id of type INTEGER

2. data_name of type STRING

3. data_value of type STRING

4. start_time of type TIMESTAMP

5. end_time (optional) of type TIMESTAMP
 

First, create and upload your dataset as explained above. Once the dataset has been loaded, you can query it. To use your own BigQuery table, you must provide the full table ID in the format <project id>:<dataset name>.<table name>. The account used to call the Inference API must also have the BigQuery Data Viewer role on the table, as shown in the sketch below.
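As a minimal sketch of preparing such a table, the bq CLI can create it with the required schema, and gcloud can grant the Data Viewer role. The dataset and table names below are illustrative, and the role is granted project-wide here for brevity:

# Create a BigQuery table with the columns the Inference API expects
# ("sensor_data.events" is an illustrative dataset.table name)
bq mk --table sensor_data.events \
  group_id:INTEGER,data_name:STRING,data_value:STRING,start_time:TIMESTAMP,end_time:TIMESTAMP

# Grant the calling service account the BigQuery Data Viewer role
# (replace PROJECT_ID and the service account email with your own values)
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:inference-client@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"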


Scoring Guide


In response to requests, the Cloud Inference API delivers a set of distributions. A distribution entry is an event with a score. Conditional probability lies at the heart of Inference scoring: the likelihood of an event occurring within your dataset, given that the query also occurs. These scores can be summarized as follows:

P(event | query) / P(event)^exp

Background probability exponent

The exp in this formula represents bgprobExp, a critical parameter that controls how the background probability is incorporated into the score. The background probability is simply the likelihood of an event occurring for a random group in the dataset, regardless of whether the query occurs.

If the parameter is 0, the raw conditional probability is returned. When the parameter is 1, a pure ratio known as a lift score is returned: it indicates how much more or less likely the co-occurring event is, relative to the baseline.

The default bgprobExp setting of 0.7 falls between these two extremes. It tunes the scores to surface occurrences that are unusual within the context of your dataset, while maintaining some scoring weight for event popularity. A hypothetical worked example follows.
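As a hypothetical worked example, suppose P(event | query) = 0.13 and the background probability P(event) = 0.005 (both numbers invented for illustration):

exp = 0.0:  0.13 / 0.005^0.0 = 0.13 / 1.0    = 0.13   (raw conditional probability)
exp = 1.0:  0.13 / 0.005^1.0 = 0.13 / 0.005  = 26.0   (pure lift score)
exp = 0.7:  0.13 / 0.005^0.7 ≈ 0.13 / 0.0245 ≈ 5.3    (default: a middle ground)

The rarer an event is in the background, the more the default exponent boosts it, without letting extremely rare events dominate outright.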
 

Example:

The example gdelt_2018_04_data dataset will be used to demonstrate how bgprobExp can expose various features of a dataset. Try sending a request that includes the compound query below with bgprobExp set to 0.0. This query identifies groups of articles where the tagged news photos include happy people and the text is from the United Kingdom.

{
  "name": "gdelt_2018_04_data",
  "queries": [{
    "query": {
      "type": "TYPE_AND",
      "children": [{
        "type": "TYPE_TERM",
        "term": {
          "name": "ImageFaceToneHas",
          "value": "Joy"
        }
      },{
        "type": "TYPE_TERM",
        "term": {
          "name": "PageTextGeo",
          "value": "United Kingdom"
        }
      }]
    },
    "distribution_configs": {
      "data_name": "ImageWebEntity",
      "bgprobExp": 0.0,
      "max_result_entries": 5
    }
  }]
}

 

Because bgprobExp is set to zero, the scored results contain only raw conditional probabilities; the top event score will be ~0.13.

To see the difference, set bgprobExp to 0.7 and run the query again. You will notice that this time the top event score will be ~5.0.

Probability of Rare Terms

The conditional probabilities P(event | query) returned by the Inference API may be lower than expected given the raw group counts in your data. The Inference API is designed to avoid returning very uncommon, and potentially noisy, events: instead of the direct probability estimate, it returns the lower bound of a 90% confidence interval. For rare events, this may be significantly lower than the estimate based on group count alone.

Timespan Parameters

By default, the Inference API evaluates P(event | query) in terms of full groups: if a group matches the query, the whole set of events in the group is considered to co-occur with the query. Setting the timespan options max_before_timespan and max_after_timespan limits which events are aggregated.

Each event that matches the query inside a group is considered a "hit". If the timespan parameters are supplied, the aggregation takes place only within the indicated time constraints, as sketched below.
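A minimal sketch of a query using these options follows. The underscored parameter names, their placement inside distribution_configs, and the duration string format are assumptions modeled on the API's other fields, not confirmed syntax:

{
  "name": "gdelt_2018_04_data",
  "queries": [{
    "query": {
      "type": "TYPE_TERM",
      "term": { "name": "ImageWebEntity", "value": "Vacation" }
    },
    "distribution_configs": {
      "data_name": "ImageLabel",
      "max_result_entries": 5,
      "max_before_timespan": "3600s",
      "max_after_timespan": "3600s"
    }
  }]
}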

Frequently Asked Questions


How many different types of storage are available in GCP?

Google Cloud offers three major storage services: Persistent Disks for block storage, Filestore for network file storage, and Cloud Storage for object storage.
 

What exactly is the API in Google Cloud?

The Google Cloud APIs provide programmatic access to Google Cloud Platform services. They are an important component of the Google Cloud Platform, letting you effortlessly add the power of computing, networking, storage, and machine-learning-based data analysis to your apps.
 

Which hypervisor does Google use in GCP?

Google Compute Engine uses KVM as its hypervisor. It supports Linux and Windows guest images, which are used to launch virtual machines with 64-bit x86 architectures.
 

Conclusion

In this blog, we discussed the Cloud Inference API: uploading and querying time-series data, reading from BigQuery, and the scoring guide, along with timespan parameters.


Cheers, you have reached the end. Hope you liked the blog and that it has added some knowledge to your life. Please look at these similar topics to learn more: BigQuery, GCP vs AWS, and GCP Certifications.
 

Refer to our Coding Ninjas Studio Guided Path to learn Data Structures and Algorithms, Competitive Programming, JavaScript, System Design, and even more! You can also check out the mock test series and participate in the contests hosted by Coding Ninjas Studio! But say you're just starting and want to learn about questions posed by tech titans like Amazon, Microsoft, Uber, and so on. In such a case, for placement preparations, you can also look at the problems, interview experiences, and interview bundle.

You should also consider our premium courses to offer your career advantage over others!

Please upvote our blogs if you find them useful and exciting!

Thank You

Happy Coding!
