Introduction
Google Cloud Platform, commonly abbreviated as GCP, is a collection of cloud computing services that run on the same internal infrastructure Google uses for its end-user products. Data Catalog, which is discussed further in this blog, is one such service.
By end-user products, we mean products used directly by consumers, for example Gmail, Google Search, YouTube, and Google Drive. Google offers several categories of products, such as storage and database products, networking products, and big data products.

Data Catalog
Data Catalog is a fully managed, scalable metadata management service within Dataplex. Dataplex is a data fabric that unifies distributed data and automates data management. A large number of organizations have now realized the importance of informed decision-making and refer to their organized data as data assets. Data Catalog has been helping, and will continue to help, organizations manage their data assets, search for insightful data, understand that data, and make it useful for their firm.

With Data Catalog, clients gain a unified view of their data, both technical and business metadata, and efficient data management capabilities. Its three prominent functions are searching for data entries, tagging data entries, and providing column-level security for BigQuery tables.
Tags and Tag Templates
Handling a large number of data entries is quite difficult, and the difficulty increases when those entries are used by different groups within the same organization, each with its own needs. It is common to find each group creating its own set of data entries and metadata describing the same data, resulting in duplicated effort and incomplete information. Data Catalog addresses this with tags, which let organizations create, search, and manage these data entries and their metadata.
The two key Data Catalog concepts are tags and tag templates. We will discuss both concepts in the rest of this blog.

🍁Tags
Tags in Data Catalog, like any other tags, are used to provide context. You attach custom metadata fields to data entries, and these fields serve as tags for those entries. Adding tags gives meaningful context to anyone who wants to use the asset. Tags are of two types, public and private, differentiated by their use and access controls.
🌻 Private Tags
Private tags come with strict access controls. You can search for and view these tags, and the data entries they are attached to, only if you have the required permissions on both the data entries and the private tag template.
🌻 Public Tags
Public tags come with less strict access controls. Any user with view permission on a data entry can view all the public tags attached to it, so searching and viewing these tags is easier and more accessible.
🍁 Tag Templates
You need one or more tag templates before you can start tagging data. A tag template can be public or private; public is the default when you create a tag template. A tag template is a collection of metadata key-value pairs known as fields, so having a set of tag templates is like having a schema for your data.
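As an illustration, here is a minimal sketch of creating a tag template with the Data Catalog Java client; the project ID, location, template ID, and field name below are placeholder assumptions, not values from this walkthrough.
import com.google.cloud.datacatalog.v1.CreateTagTemplateRequest;
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.FieldType;
import com.google.cloud.datacatalog.v1.LocationName;
import com.google.cloud.datacatalog.v1.TagTemplate;
import com.google.cloud.datacatalog.v1.TagTemplateField;
import java.io.IOException;
// Sketch: create a simple tag template with one STRING field
public class CreateDemoTagTemplate {
  public static void main(String[] args) throws IOException {
    String projectId = "my-project-id";          // placeholder
    String location = "us-central1";             // placeholder region
    String tagTemplateId = "demo_tag_template";  // placeholder template ID
    try (DataCatalogClient client = DataCatalogClient.create()) {
      // Each field of a template is a metadata key-value pair definition.
      TagTemplateField sourceField =
          TagTemplateField.newBuilder()
              .setDisplayName("Source of data asset")
              .setType(
                  FieldType.newBuilder()
                      .setPrimitiveType(FieldType.PrimitiveType.STRING)
                      .build())
              .build();
      TagTemplate tagTemplate =
          TagTemplate.newBuilder()
              .setDisplayName("Demo Tag Template")
              .putFields("source", sourceField)
              .build();
      CreateTagTemplateRequest request =
          CreateTagTemplateRequest.newBuilder()
              .setParent(LocationName.of(projectId, location).toString())
              .setTagTemplateId(tagTemplateId)
              .setTagTemplate(tagTemplate)
              .build();
      TagTemplate created = client.createTagTemplate(request);
      System.out.println("Created template: " + created.getName());
    }
  }
}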
To help users get started, Data Catalog provides a gallery of sample tag templates that illustrate common tagging use cases. To use the template gallery, go to the Tag Templates page and click the Create tag template option; the template gallery is displayed as part of the Create Template page.
How to Tag a BigQuery Table using Data Catalog
Before starting the whole process, make sure to set up a project by following the steps below.
- Create an account in Google Cloud, and then using the Console page, create a Google Cloud Project.
- Enable a few options such as Data Catalog and BigQuery APIs, install Google Cloud CLI, and initialize it.
- Once the project is set up, add a public data entry to your project using the Explorer section of the BigQuery page.
- After that, create a dataset using the Actions icon of the Explorer panel.
- Once you have created the dataset, copy a publicly accessible table into it using the copy table pane under the Explorer pane (a programmatic sketch of this setup follows the list).
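Here is a minimal sketch of the same setup with the BigQuery Java client; the dataset name and the public source table used below are placeholder assumptions.
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.DatasetInfo;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
// Sketch: create a dataset and copy a public table into it
public class SetUpDemoDataset {
  public static void main(String[] args) throws InterruptedException {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    // Create a dataset to hold the copied table (the name is a placeholder).
    bigquery.create(DatasetInfo.newBuilder("demo_dataset").build());
    // Copy a publicly accessible table into the new dataset;
    // bigquery-public-data.samples.shakespeare is used here only as an example source.
    Table source =
        bigquery.getTable(TableId.of("bigquery-public-data", "samples", "shakespeare"));
    Job copyJob = source.copy(TableId.of("demo_dataset", "shakespeare_copy"));
    copyJob.waitFor();
    System.out.println("Dataset created and public table copied.");
  }
}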
You are done setting up a project. Now, we will proceed with the steps necessary to tag a BigQuery Table.
🌻 Creating a Template and Attaching it
To create a tag template and attach it to a table, follow the steps below.
- Open the Dataplex Tag Templates page.
- Create a tag template, add the necessary details, and click Create.
- After that, go to the Dataplex search page and search for your dataset.
- In the results, you will see the dataset and the table. Click on the table.
- A page opens with the table details. Attach tags using the Attach Tags panel and click Save.
You can also create an overview for the same table by using the Add overview option.
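If you prefer to tag the table programmatically, here is a minimal sketch using the Data Catalog Java client. It assumes the demo tag template sketched earlier already exists, and the project, location, dataset, and table names are placeholders.
import com.google.cloud.datacatalog.v1.CreateTagRequest;
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.Entry;
import com.google.cloud.datacatalog.v1.LookupEntryRequest;
import com.google.cloud.datacatalog.v1.Tag;
import com.google.cloud.datacatalog.v1.TagField;
import com.google.cloud.datacatalog.v1.TagTemplateName;
import java.io.IOException;
// Sketch: look up the BigQuery table's entry and attach a tag to it
public class AttachDemoTag {
  public static void main(String[] args) throws IOException {
    String projectId = "my-project-id"; // placeholder
    // Placeholder linked resource pointing at the copied BigQuery table.
    String linkedResource =
        "//bigquery.googleapis.com/projects/my-project-id/datasets/demo_dataset/tables/shakespeare_copy";
    try (DataCatalogClient client = DataCatalogClient.create()) {
      // Look up the entry that Data Catalog automatically created for the BigQuery table.
      Entry entry =
          client.lookupEntry(
              LookupEntryRequest.newBuilder().setLinkedResource(linkedResource).build());
      // Build a tag from the demo template and fill its "source" field.
      Tag tag =
          Tag.newBuilder()
              .setTemplate(
                  TagTemplateName.of(projectId, "us-central1", "demo_tag_template").toString())
              .putFields("source", TagField.newBuilder().setStringValue("BigQuery").build())
              .build();
      // Attach the tag to the table's entry.
      Tag createdTag =
          client.createTag(
              CreateTagRequest.newBuilder().setParent(entry.getName()).setTag(tag).build());
      System.out.println("Created tag: " + createdTag.getName());
    }
  }
}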
🌻 Deletion
You can delete a Tag Template, a dataset, or even the complete project as per your requirements.
To delete a tag template, go to the Templates page in the Data Catalog window. Under the Demo Tag Template entry, click Actions and delete the template.
To delete a dataset, use the BigQuery page. Under the Explorer panel, search for the dataset, click the Actions option, and delete the selected dataset.
To delete the complete project, open the Manage Resources page, select the project from the project list, and delete it.
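Tag templates can also be deleted programmatically. The following minimal sketch uses the Data Catalog Java client with placeholder project, location, and template IDs; forcing the deletion also removes any tags created from the template.
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.DeleteTagTemplateRequest;
import com.google.cloud.datacatalog.v1.TagTemplateName;
import java.io.IOException;
// Sketch: delete a tag template and the tags created from it
public class DeleteDemoTagTemplate {
  public static void main(String[] args) throws IOException {
    // Placeholder project, location, and template ID.
    String templateName =
        TagTemplateName.of("my-project-id", "us-central1", "demo_tag_template").toString();
    try (DataCatalogClient client = DataCatalogClient.create()) {
      // force = true also deletes tags that were created from this template.
      client.deleteTagTemplate(
          DeleteTagTemplateRequest.newBuilder().setName(templateName).setForce(true).build());
      System.out.println("Deleted template: " + templateName);
    }
  }
}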
Searching Data Assets with Data Catalog
Data Catalog gives you several options for searching data assets, and you can choose whichever is most convenient: using the Console and filters, implementations in Java, Node.js, or Python, or using REST and the command line.

You can use any of these methods to search for your desired data assets. Here, we will look at the implementation in Java.
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.DataCatalogClient.SearchCatalogPagedResponse;
import com.google.cloud.datacatalog.v1.SearchCatalogRequest;
import com.google.cloud.datacatalog.v1.SearchCatalogRequest.Scope;
import com.google.cloud.datacatalog.v1.SearchCatalogResult;
import java.io.IOException;
// Sample to search the catalog
public class SearchAssets {
  public static void main(String[] args) throws IOException {
    String projectId = "my-project-id";
    String query = "type=dataset";
    searchCatalog(projectId, query);
  }

  public static void searchCatalog(String projectId, String query) throws IOException {
    // To limit the search to a given organization, create a scope with the org ID:
    // Scope scope = Scope.newBuilder().addIncludeOrgIds(orgId).build();
    // Alternatively, search using a project scope.
    Scope scope = Scope.newBuilder().addIncludeProjectIds(projectId).build();

    // Initialize a client for sending requests. The client needs to be created only once
    // and can be reused for several requests. The try-with-resources block calls "close"
    // on the client when done, safely cleaning up any remaining background resources.
    try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {
      // Search the catalog.
      SearchCatalogRequest searchCatalogRequest =
          SearchCatalogRequest.newBuilder().setScope(scope).setQuery(query).build();
      SearchCatalogPagedResponse response = dataCatalogClient.searchCatalog(searchCatalogRequest);
      System.out.println("Search results:");
      for (SearchCatalogResult result : response.iterateAll()) {
        System.out.println(result);
      }
    }
  }
}
Viewing Data Assets with Data Catalog
Data Catalog can also be used to view table details within the Cloud Console. You need to follow the steps below to view any table.
- Open the Dataplex Search page and in the search box, type the name of the dataset whose table you want to view.
- Click on the table. The BigQuery table details page opens.
The table details include Tags, Schema and Column Tags, and other details.
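As an alternative to the console, here is a minimal sketch that fetches the same entry programmatically with the Data Catalog Java client. The fully qualified table name below is a placeholder, and the sql_resource format shown is an assumption based on the lookup conventions; identifiers containing hyphens are backquoted.
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.Entry;
import com.google.cloud.datacatalog.v1.LookupEntryRequest;
import java.io.IOException;
// Sketch: fetch the Data Catalog entry for a BigQuery table and print its details
public class ViewTableEntry {
  public static void main(String[] args) throws IOException {
    // Placeholder fully qualified BigQuery table name.
    String sqlResource = "bigquery.table.`my-project-id`.demo_dataset.shakespeare_copy";
    try (DataCatalogClient client = DataCatalogClient.create()) {
      // Look up the entry that Data Catalog maintains for the BigQuery table.
      Entry entry =
          client.lookupEntry(LookupEntryRequest.newBuilder().setSqlResource(sqlResource).build());
      System.out.println("Entry name: " + entry.getName());
      System.out.println("Schema: " + entry.getSchema());
    }
  }
}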
Creating Custom Data Catalog Entries
You can use the implementation in Java below to create custom data catalog entries.
import com.google.cloud.datacatalog.v1.ColumnSchema;
import com.google.cloud.datacatalog.v1.CreateEntryGroupRequest;
import com.google.cloud.datacatalog.v1.CreateEntryRequest;
import com.google.cloud.datacatalog.v1.DataCatalogClient;
import com.google.cloud.datacatalog.v1.Entry;
import com.google.cloud.datacatalog.v1.EntryGroup;
import com.google.cloud.datacatalog.v1.LocationName;
import com.google.cloud.datacatalog.v1.Schema;
import java.io.IOException;
// Sample to create a custom entry
public class CreateEntry {
  public static void main(String[] args) throws IOException {
    String projectId = "my-project";
    String entryGroupId = "onprem_entry_group";
    String entryId = "onprem_entry_id";
    createCustomEntry(projectId, entryGroupId, entryId);
  }

  public static void createCustomEntry(String projectId, String entryGroupId, String entryId)
      throws IOException {
    // Currently, Data Catalog stores metadata in the us-central1 region.
    String location = "us-central1";

    // Initialize a client for sending requests. The client needs to be created only once
    // and can be reused for several requests. The try-with-resources block calls "close"
    // on the client when done, safely cleaning up any remaining background resources.
    try (DataCatalogClient dataCatalogClient = DataCatalogClient.create()) {
      // Construct the EntryGroup for the EntryGroup request.
      EntryGroup entryGroup =
          EntryGroup.newBuilder()
              .setDisplayName("My awesome Entry Group")
              .setDescription("This Entry Group represents an external system")
              .build();

      // Construct the EntryGroup request to be sent by the client.
      CreateEntryGroupRequest entryGroupRequest =
          CreateEntryGroupRequest.newBuilder()
              .setParent(LocationName.of(projectId, location).toString())
              .setEntryGroupId(entryGroupId)
              .setEntryGroup(entryGroup)
              .build();

      // Use the client to send the API request.
      EntryGroup createdEntryGroup = dataCatalogClient.createEntryGroup(entryGroupRequest);

      // Construct the Entry for the Entry request.
      Entry entry =
          Entry.newBuilder()
              .setUserSpecifiedSystem("onprem_data_system")
              .setUserSpecifiedType("onprem_data_asset")
              .setDisplayName("My awesome data asset")
              .setDescription("This data asset is managed by an external system.")
              .setLinkedResource("//my-onprem-server.com/dataAssets/my-awesome-data-asset")
              .setSchema(
                  Schema.newBuilder()
                      .addColumns(
                          ColumnSchema.newBuilder()
                              .setColumn("first_column")
                              .setDescription("This column consists of ....")
                              .setMode("NULLABLE")
                              .setType("DOUBLE")
                              .build())
                      .addColumns(
                          ColumnSchema.newBuilder()
                              .setColumn("second_column")
                              .setDescription("This column consists of ....")
                              .setMode("REQUIRED")
                              .setType("STRING")
                              .build())
                      .build())
              .build();

      // Construct the Entry request to be sent by the client.
      CreateEntryRequest entryRequest =
          CreateEntryRequest.newBuilder()
              .setParent(createdEntryGroup.getName())
              .setEntryId(entryId)
              .setEntry(entry)
              .build();

      // Use the client to send the API request.
      Entry createdEntry = dataCatalogClient.createEntry(entryRequest);
      System.out.printf("Custom entry created with name: %s%n", createdEntry.getName());
    }
  }
}