Code360 powered by Coding Ninjas X Naukri.com
Last Updated: Mar 27, 2024
Difficulty: Easy

Basic Concepts of BigQuery


Introduction

BigQuery is a cloud-based data warehouse that provides fast SQL queries and interactive analysis of very large datasets. It is designed for read-only analytical workloads and is built on Google's Dremel technology.


In this article, we will discuss some basic concepts of BigQuery. This is the second part of the BigQuery series. For the first and final parts, you may visit Introduction to BigQuery and Advanced Concepts of BigQuery.

Load and Query Data


In this section, we will learn how to create datasets, load data, and query tables with the bq command-line tool:

Before you start

  1. Create a Google Cloud account if you do not already have one.
     
  2. Go to the Project Selector page and click on Create a Google Cloud Project.
     
  3. Ensure that billing is enabled for the project.
     
  4. Enable the BigQuery API.
     
  5. Set up the gcloud CLI.

Create a dataset

  1. Create a dataset named babynames:

    bq mk babynames
     
  2. Confirm that the dataset now appears in your project:

    bq ls

Load data into a table

  1. Load the source file yob2010.txt into a new table named names2010 in the babynames dataset:

    bq load babynames.names2010 yob2010.txt name:string,gender:string,count:integer
     
  2. Confirm that the table names2010 now appears in the babynames dataset:

    bq ls babynames
     
  3. Confirm that the schema of your new names2010 table is name: string, gender: string, and count: integer:

    bq show babynames.names2010
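The inline schema used above can also be kept in a JSON schema file and passed to bq load with --schema. Here is a minimal sketch in Python that writes such a file (the filename schema.json is just an example):

```python
import json

# Fields matching the inline schema name:string,gender:string,count:integer.
fields = [
    {"name": "name", "type": "STRING"},
    {"name": "gender", "type": "STRING"},
    {"name": "count", "type": "INTEGER"},
]

# Write the JSON schema file that bq load can consume via --schema=schema.json.
with open("schema.json", "w") as f:
    json.dump(fields, f, indent=2)
```

You could then run bq load --schema=schema.json babynames.names2010 yob2010.txt instead of spelling out the schema inline.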

Query table data

  1. Check the most popular girls' names in the given data:

    bq query --use_legacy_sql=false \
        'SELECT
          name,
          count
        FROM
          `babynames.names2010`
        WHERE
          gender = "F"
        ORDER BY
          count DESC
        LIMIT 5;'
     
  2. Check the most popular boys' names in the given data:

    bq query --use_legacy_sql=false \
        'SELECT
          name,
          count
        FROM
          `babynames.names2010`
        WHERE
          gender = "M"
        ORDER BY
          count DESC
        LIMIT 5;'

 

The minimum count is 5 as the source data omits names with fewer than 5 occurrences.
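The same top-5 logic can be sketched locally in Python over a file in the yob2010.txt format (name,gender,count per line); the sample rows below are made up for illustration:

```python
import csv
import io

# A few made-up rows in the yob2010.txt format: name,gender,count.
SAMPLE = """Isabella,F,22913
Sophia,F,20643
Jacob,M,22127
Emma,F,17345
Ethan,M,17985
"""

def top_names(data, gender, limit=5):
    """Mirror the query: filter by gender, order by count DESC, limit."""
    rows = csv.reader(io.StringIO(data))
    picked = [(name, int(count)) for name, g, count in rows if g == gender]
    picked.sort(key=lambda row: row[1], reverse=True)
    return picked[:limit]

print(top_names(SAMPLE, "F"))
# → [('Isabella', 22913), ('Sophia', 20643), ('Emma', 17345)]
```

This is only a local illustration of what the SQL does; for real datasets, the query runs inside BigQuery itself.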

Clean Up

Delete the project from your Google Cloud account to avoid incurring charges. Follow the steps below:

Delete the project

  1. Go to the Manage resources page in the Google Cloud console.
     
  2. Select the project that you wish to delete from the project list. Click on Delete.
     
  3. Type the project ID in the dialog and click on Shut down to delete.

Delete the resources

  1. Delete the babynames dataset:

    bq rm --recursive=true babynames
     
  2. Enter y to confirm your request.

Create and Use Tables


There are many ways to create a table in BigQuery. Let's check them out.

  • By using a CREATE TABLE statement.
  • By defining a table over an external (third-party) data source.
  • By calling the tables.insert API method.
  • By loading data.
  • By using the client libraries.
  • By using query results.

Create an empty table with a schema definition

There are four ways by which you can create an empty table with a schema definition:

  • Using the Google Cloud console.
  • Specifying the schema inline with the bq command-line tool.
  • Submitting a JSON schema file with the bq command-line tool.
  • Using table resources in the API.

We can also create an empty table without a schema definition. Let's see how.

Create an empty table without a schema definition

To create an empty table without a schema definition, we can use the Java client library:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.TableDefinition;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

public class CreateTableWithoutSchema {

  public static void main(String[] args) {
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    createTableWithoutSchema(datasetName, tableName);
  }

  public static void createTableWithoutSchema(String datasetName, String tableName) {
    try {
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      TableDefinition tableDefinition = StandardTableDefinition.of(Schema.of());
      TableInfo tableInfo = TableInfo.newBuilder(tableId, tableDefinition).build();

      bigquery.create(tableInfo);
      System.out.println("Table created successfully");
    } catch (BigQueryException e) {
      System.out.println("Table was not created. \n" + e.toString());
    }
  }
}

Create a table using query result

The following example uses a CREATE TABLE statement to create a trips table from data in the public bikeshare_trips table:

  1. Go to the BigQuery page in the Google Cloud Console.
     
  2. Enter the following statement in the query editor.

    CREATE TABLE mydataset.trips AS (
      SELECT
        bikeid,
        start_time,
        duration_minutes
      FROM
        `bigquery-public-data.austin_bikeshare.bikeshare_trips`
    );
     
  3. Click on Run.

Partitioned Tables


A partitioned table is a table divided into segments, called partitions, that make it easier to manage and query your data. Partitioning can improve query performance and control costs because a query over a partition scans fewer bytes.

There are three types of table partitioning:

  • Time-unit column.
  • Ingestion time.
  • Integer range.
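As a sketch in DDL, the three kinds look like this (the dataset, table, and column names here are hypothetical):

```sql
-- Time-unit column partitioning: partition by a DATE/TIMESTAMP column.
CREATE TABLE mydataset.by_date (bikeid STRING, start_time TIMESTAMP)
PARTITION BY DATE(start_time);

-- Ingestion-time partitioning: partition by when each row was loaded.
CREATE TABLE mydataset.by_ingestion (bikeid STRING)
PARTITION BY _PARTITIONDATE;

-- Integer-range partitioning: partition by ranges of an INTEGER column.
CREATE TABLE mydataset.by_range (customer_id INT64)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10));
```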

Loading CSV data 

When you load CSV data from Cloud Storage, you can do one of two things:

  • Create a new table or partition.
  • Append to or overwrite an existing table.

Loading data into a table

You can load data into a table using any of the following methods. Choose one from the list below.

  • Console
  • SQL
  • bq command-line tool
  • API
  • C#
  • Go
  • Java
  • Node.js
  • PHP
  • Python
  • Ruby

 

Here, we will discuss loading data into a table using SQL. The example below loads a CSV file into the table mytable.

  1. Go to the BigQuery page in the Google Cloud Console.
     
  2. Enter the following query in the query editor.

    LOAD DATA OVERWRITE mydataset.mytable
    FROM FILES (
      format = 'CSV',
      uris = ['gs://bucket/path/file.csv']);
     
  3. Click on Run.

Column-based time partitioning

We will now look at column-based time partitioning using Go. Before trying this, it helps to be familiar with the BigQuery quickstart using client libraries. Let's see an example.

import (
        "context"
        "fmt"
        "time"

        "cloud.google.com/go/bigquery"
)

// importPartitionedTable demonstrates specifying time partitioning for a BigQuery table when loading
// CSV data from Cloud Storage.
func importPartitionedTable(projectID, destDatasetID, destTableID string) error {
        // projectID := "my-project-id"
        // datasetID := "mydataset"
        // tableID := "mytable"
        ctx := context.Background()
        client, err := bigquery.NewClient(ctx, projectID)
        if err != nil {
                return fmt.Errorf("bigquery.NewClient: %v", err)
        }
        defer client.Close()

        gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states-by-date.csv")
        gcsRef.SkipLeadingRows = 1
        gcsRef.Schema = bigquery.Schema{
                {Name: "name", Type: bigquery.StringFieldType},
                {Name: "post_abbr", Type: bigquery.StringFieldType},
                {Name: "date", Type: bigquery.DateFieldType},
        }
        loader := client.Dataset(destDatasetID).Table(destTableID).LoaderFrom(gcsRef)
        loader.TimePartitioning = &bigquery.TimePartitioning{
                Field:      "date",
                Expiration: 90 * 24 * time.Hour,
        }
        loader.WriteDisposition = bigquery.WriteEmpty

        job, err := loader.Run(ctx)
        if err != nil {
                return err
        }
        status, err := job.Wait(ctx)
        if err != nil {
                return err
        }

        if status.Err() != nil {
                return fmt.Errorf("job completed with error: %v", status.Err())
        }
        return nil
}

Frequently Asked Questions

What is BigQuery used for?

BigQuery is a fully managed enterprise data warehouse that offers built-in features like machine learning, geospatial analysis, and business intelligence to help you collect and analyze your data.

Does BigQuery use SQL?

BigQuery supports the Google Standard SQL dialect; a legacy SQL dialect is also offered. Google Standard SQL provides the widest breadth of capability and is the best choice if you're new to BigQuery; for example, it is the only dialect that supports features such as DDL and DML statements.

Is BigQuery an ETL tool?

Google BigQuery can serve as an ETL solution built on a cloud data warehouse, with transformations expressed as SQL queries. Users can also gain analytical insights from the serverless product through its built-in ML.

Conclusion

We have discussed the basic concepts of BigQuery. We have learned how to load and query data, create and use tables, work with partitioned tables, and load CSV data.

We hope this blog has helped you enhance your knowledge of the basic concepts of BigQuery. If you want to learn more, check out our similar articles on: 

 

And many more on our platform Coding Ninjas Studio.


But suppose you have just started your learning process and are looking for questions from tech giants like Amazon, Microsoft, Uber, etc. In that case, you must look at the problems, interview experiences, and interview bundles for placement trials.

However, you may consider our paid courses to give your career an edge over others!

Happy Learning!
