Upload and query time-series data
The Cloud Inference API lets you quickly and easily integrate Google's search and analysis tools for time-series data into your applications. With it, you can process time-series datasets and run inference queries over loaded datasets.
Before you begin
Before you start uploading or querying time-series data, you need to do the following:
1. Choose or create a Google Cloud project via the Google Cloud console's project selector page.
2. Make sure that billing is enabled for your Cloud project.
3. Enable the Cloud Inference API.
4. Create a service account.
5. Create a service account key.
6. Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the JSON file that contains your service account key:
export GOOGLE_APPLICATION_CREDENTIALS="KEY_PATH"
Replace KEY_PATH with the actual path to your key file.
7. Install and initialize the Google Cloud CLI.
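If you prefer to script this setup, steps 3 through 5 can also be done with the gcloud CLI. The sketch below is a minimal example; the service name infer.googleapis.com and the account and key names are assumptions, so adjust them for your project:

# Enable the Cloud Inference API (service name assumed to be infer.googleapis.com)
gcloud services enable infer.googleapis.com

# Create a service account (the name "inference-demo" is just an example)
gcloud iam service-accounts create inference-demo \
    --display-name="Cloud Inference API demo"

# Create and download a JSON key for the service account
gcloud iam service-accounts keys create ~/inference-key.json \
    --iam-account=inference-demo@PROJECT_ID.iam.gserviceaccount.com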
Upload a Dataset
To create a Cloud Inference API dataset using the createdataset REST method, follow these steps:
1. Create a JSON request file with the following code and save it as a plain text file named create-gdelt-dataset.json:
{
  "name": "gdelt_2018_04_data",
  "data_names": [
    "PageURL",
    "PageDomain",
    "PageCountry",
    "PageLanguage",
    "PageTextTheme",
    "PageTextGeo",
    "ImageURL",
    "ImagePopularityRawScore",
    "ImagePopularity",
    "ImageSafeSearch",
    "ImageLabel",
    "ImageWebEntity",
    "ImageWebEntityBestGuessLabel",
    "ImageGeoLandmark",
    "ImageFaceToneHas"
  ],
  "data_sources": [
    { "uri": "gs://inference-gdelt-demo/inference-gdelt-demo.201804.json" }
  ]
}
This JSON snippet indicates that we plan to build a dataset of GDELT-annotated news items on which to conduct Cloud Inference API queries.
2. Obtain an access token using the following command:
gcloud auth application-default print-access-token
3. Make a createdataset request with curl, passing it the access token and the filename of the JSON request you created in step 1:
curl -s -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets \
-d @create-gdelt-dataset.json

Replace PROJECT_NUMBER with your project number.
You should get the following response in the console:
{
  "name": "gdelt_2018_04_data",
  "state": "STATE_PENDING"
}
Get the status of uploaded datasets
Using the ListDatasets REST method, you can check the status of each dataset that you have submitted to the Cloud Inference API from your client project.
Use the following command to get the status of your datasets:
curl -s -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets
Replace PROJECT_NUMBER with your project number.
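A successful call returns the datasets you have created along with their processing state. The shape below is only illustrative (the top-level field name is an assumption; your output may include additional fields):

{
  "datasets": [
    {
      "name": "gdelt_2018_04_data",
      "state": "STATE_LOADED"
    }
  ]
}

Wait until your dataset reaches STATE_LOADED before querying it.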
Query a Loaded Dataset
1. Create the following JSON request file and save it as a plain text file named query-gdelt-dataset.json:
{
  "name": "gdelt_2018_04_data",
  "queries": [{
    "query": {
      "type": "TYPE_TERM",
      "term": {
        "name": "ImageWebEntity",
        "value": "Vacation"
      }
    },
    "distribution_configs": {
      "bgprob_exp": 0.7,
      "data_name": "ImageLabel",
      "max_result_entries": 5
    }
  }]
}
This JSON snippet indicates that we wish to query the gdelt_2018_04_data dataset, which we previously submitted to the Cloud Inference API via a createdataset call and which has reached the STATE_LOADED state.
2. Obtain an access token in the same way as in the createdataset steps above.
3. Make a query request with curl, passing it the access token and the filename of the JSON request you created in step 1:
curl -s -H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
https://infer.googleapis.com/v1/projects/PROJECT_NUMBER/datasets/gdelt_2018_04_data:query \
-d @query-gdelt-dataset.json
4. The query results are returned as a JSON object.
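The response contains one distribution per distribution_configs entry, with scored values for the requested data_name. The snippet below is only a sketch of the expected shape; the field names, values, and scores are illustrative assumptions, not actual output:

{
  "results": [{
    "distributions": [{
      "dataName": "ImageLabel",
      "entries": [
        { "value": "Beach", "score": 4.2 },
        { "value": "Resort", "score": 3.8 }
      ]
    }]
  }]
}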
Read from BigQuery
In addition to JSON files on Cloud Storage, the Inference API can also read from BigQuery tables. For the Inference API to handle it, the BigQuery table must contain the following columns:
1. group_id in INTEGER type
2. data_name in STRING type
3. data_value in STRING type
4. start_time in TIMESTAMP type
5. end_time (optional) in TIMESTAMP type
First, create and upload your dataset as explained above; once the dataset has been loaded, you can query it. To use your own BigQuery table, you must provide the full table ID in the format <project id>:<dataset name>.<table name>. The account used to call the Inference API must also have the BigQuery Data Viewer role on the table.
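For example, a createdataset request that reads from a BigQuery table might look like the following. This is a sketch: it assumes the same data_sources uri field used for Cloud Storage above also accepts a BigQuery table ID, and the project, dataset, and table names are hypothetical:

{
  "name": "my_bigquery_dataset",
  "data_names": [ "data_name_1", "data_name_2" ],
  "data_sources": [
    { "uri": "my-project:my_dataset.my_table" }
  ]
}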
Source: Google Cloud Docs
Scoring Guide
In response to query requests, the Cloud Inference API returns a set of distributions. A distribution entry is an event with a score. Conditional probability lies at the heart of Inference scoring: the likelihood of an event occurring within your dataset given that the query also occurs. These scores can be summarized as follows:
P(event | query) / P(event)^exp
Background probability exponent
The exp in this formula represents bgprobExp, a critical parameter that controls how the background probability is included in the score. The background probability is the likelihood of an event occurring for a random group in the dataset, regardless of whether or not the query occurs.
If the parameter is 0, the raw conditional probability is returned. When the parameter is 1, a pure ratio known as a lift score is returned: the lift score indicates how much more or less likely the co-occurring event is, relative to the baseline.
The default bgprobExp setting of 0.7 falls between these two extremes. It tunes the scores to surface events that are unusual within the context of your dataset, while still giving some weight to event popularity.
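To make this concrete, here is a worked example with made-up numbers. Suppose P(event | query) = 0.13 and the background probability P(event) = 0.02:

bgprobExp = 0.0:  0.13 / 0.02^0.0 = 0.13 / 1      = 0.13  (raw conditional probability)
bgprobExp = 1.0:  0.13 / 0.02^1.0 = 0.13 / 0.02   = 6.5   (lift score)
bgprobExp = 0.7:  0.13 / 0.02^0.7 ≈ 0.13 / 0.0647 ≈ 2.0   (default blend)

Note how the rarer the background event, the more the score is boosted as the exponent grows.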
Example:
The example gdelt_2018_04_data dataset will be used to demonstrate how bgprobExp can expose various features of a dataset. Try sending a request that includes the compound query below, with bgprobExp set to 0.0. This query identifies groups of articles where the tagged news photos include happy people and the text is from the United Kingdom.
{
  "name": "gdelt_2018_04_data",
  "queries": [{
    "query": {
      "type": "TYPE_AND",
      "children": [{
        "type": "TYPE_TERM",
        "term": {
          "name": "ImageFaceToneHas",
          "value": "Joy"
        }
      },{
        "type": "TYPE_TERM",
        "term": {
          "name": "PageTextGeo",
          "value": "United Kingdom"
        }
      }]
    },
    "distribution_configs": {
      "data_name": "ImageWebEntity",
      "bgprobExp": 0.0,
      "max_result_entries": 5
    }
  }]
}
Because bgprobExp is set to zero, the scored results contain only raw conditional probabilities; the top event score will be ~0.13.
To see the difference, set bgprobExp to 0.7 and run the query again. You will notice that this time the event score is ~5.0.
Probability of Rare Terms
The conditional probabilities P(event | query) returned by the Inference API may be lower than expected given the raw group counts in your data. This is because the Inference API is designed to avoid returning very uncommon and potentially noisy events: instead of the direct probability estimate, it returns the lower bound of a 90% confidence interval. For rare events, this can be significantly lower than the estimate based on group count alone.
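As a rough illustration with made-up numbers (using a normal approximation; the API's exact interval method is not documented here): if an event appears in 3 of 10 matching groups, the raw estimate is 0.3, but the reported score would be closer to the interval's lower bound:

p_hat = 3/10 = 0.30
lower bound ≈ p_hat - 1.645 * sqrt(p_hat * (1 - p_hat) / n)
            ≈ 0.30 - 1.645 * sqrt(0.30 * 0.70 / 10)
            ≈ 0.30 - 1.645 * 0.145
            ≈ 0.06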
Timespan Parameters
By default, the Inference API evaluates P(event | query) in terms of full groups: if a group matches the query, the whole set of events in the group is assumed to co-occur with the query. Setting the timespan options max_before_timespan and max_after_timespan restricts which events are aggregated.
Each event within a group that matches the query is considered a "hit." If the timespan parameters are supplied, aggregation takes place only within the indicated time window around each hit, as in the sketch below.
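Here is a minimal sketch of a distribution_configs block with timespan limits. The field names max_before_timespan and max_after_timespan are taken from the parameter names above, but the exact JSON spelling and duration format are assumptions; check the API reference before relying on them:

"distribution_configs": {
  "data_name": "ImageLabel",
  "max_result_entries": 5,
  "max_before_timespan": "3600s",
  "max_after_timespan": "3600s"
}

With these limits, only events within an hour before or after each hit would be counted toward the distribution.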
Frequently Asked Questions
How many different types of storage are available in GCP?
Google Cloud offers three major storage services: Persistent Disks for block storage, Filestore for network file storage, and Cloud Storage for object storage.
What exactly is the API in Google Cloud?
The Google Cloud APIs provide programmatic access to Google Cloud Platform services. They are an important component of the Google Cloud Platform, letting you effortlessly add the power of computing, networking, storage, and machine-learning-based data analysis to your apps.
Which hypervisor does Google use in GCP?
KVM serves as the hypervisor for Google Compute Engine, which supports Linux and Windows guest images used to launch virtual machines with 64-bit x86 architectures.
Conclusion
In this blog, we discussed the Cloud Inference API: uploading and querying time-series data, reading from BigQuery, and the scoring guide, along with timespan parameters.
Cheers, you have reached the end. We hope you liked the blog and that it has added some knowledge to your life. Please take a look at these similar topics to learn more: BigQuery, GCP vs AWS, and GCP Certifications.
Refer to our Coding Ninjas Studio Guided Path to learn Data Structures and Algorithms, Competitive Programming, JavaScript, System Design, and even more! You can also check out the mock test series and participate in the contests hosted by Coding Ninjas Studio! If you're just starting out and want to learn about the questions asked by tech giants like Amazon, Microsoft, Uber, and so on, then for placement preparations you can also look at the problems, interview experiences, and interview bundle.
You should also consider our premium courses to give your career an advantage over others!
Please upvote our blogs if you find them useful and exciting!
Happy Coding!