Dataplex is an intelligent data fabric that offers a mechanism to securely make your data available to a range of analytics and data science tools while centrally managing, monitoring, and governing your data across data lakes, data warehouses, and data marts.
You can use Dataplex to logically group your Cloud Storage and BigQuery data into lakes and zones, automate data management and governance across that data, and enable large-scale analytics.
Quickstart
First, create a lake using the Google Cloud console:
Go to Dataplex in the console.
Navigate to the Manage view.
Click Create.
Enter a Display name.
The lake ID is automatically generated for you.
Specify the Region in which to create the lake.
Click Create.
Then add a zone to your lake.
Next, attach an asset. Data is attached as assets to zones within a Dataplex lake.
After you create your lake, zones, and assets, you can start using the lake.
Finally, to avoid incurring charges to your Google Cloud account, clean up the resources used on this page.
Create a lake
This section describes how to create a Dataplex lake using the Google Cloud console, the gcloud CLI, or the lakes.create API method.
Make sure that you have been granted the predefined role roles/dataplex.admin or roles/dataplex.editor so that you can create and manage your lake.
Next, create a metastore (a Dataproc Metastore service).
Create a Dataplex lake
Go to Dataplex in the console.
Navigate to the Manage view.
Click Create.
Enter a Display name.
The lake ID is automatically generated for you.
Specify the region, and then click Create.
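The same step can also be performed through the lakes.create API method. The following is a minimal sketch, not an official client: it only builds the HTTP request and does not send it. It assumes the public endpoint https://dataplex.googleapis.com, a lakeId query parameter that carries the lake ID (the console generates this ID for you; the API expects you to supply it), and a displayName field in the request body. The project, region, lake ID, and access token values are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint of the Dataplex REST API.
DATAPLEX_ENDPOINT = "https://dataplex.googleapis.com"


def build_create_lake_request(project, location, lake_id, display_name, access_token):
    """Build (but do not send) a lakes.create request."""
    # Matches POST /v1/{parent=projects/*/locations/*}/lakes
    parent = f"projects/{project}/locations/{location}"
    query = urllib.parse.urlencode({"lakeId": lake_id})
    url = f"{DATAPLEX_ENDPOINT}/v1/{parent}/lakes?{query}"
    body = json.dumps({"displayName": display_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with your own project, region, and token.
create_request = build_create_lake_request(
    project="my-project",
    location="us-central1",
    lake_id="sales-lake",
    display_name="Sales lake",
    access_token="ACCESS_TOKEN",
)
print(create_request.get_method(), create_request.full_url)
# To actually send it: urllib.request.urlopen(create_request)
```

Building the request separately from sending it makes it easy to inspect the URL and body before any resource is created.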
Managing a lake
Before you can manage a lake, you must create one.
Access control
To update or delete a lake, you need an IAM role that contains the dataplex.lakes.update or dataplex.lakes.delete permission, respectively. The Dataplex roles roles/dataplex.editor and roles/dataplex.admin can be used to grant update and delete permissions.
Viewing a lake
To view a lake, go to the Dataplex page in the console and click the name of the lake that you want to view.
Updating a lake
You can update a lake's details on the lake's edit page in the console or by using the Dataplex API method lakes.patch.
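As an illustration of an update through lakes.patch, the sketch below builds a PATCH request without sending it. It assumes the method uses the same resource path as lakes.get, that an updateMask query parameter lists the fields to overwrite, and that displayName and description are updatable lake fields. The lake name and token are placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_patch_lake_request(lake_name, changes, access_token):
    """Build (but do not send) a lakes.patch request that updates only `changes`."""
    # The update mask names exactly the fields present in the request body.
    update_mask = ",".join(sorted(changes))
    query = urllib.parse.urlencode({"updateMask": update_mask})
    url = f"https://dataplex.googleapis.com/v1/{lake_name}?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(changes).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder lake name in the projects/*/locations/*/lakes/* form.
patch_request = build_patch_lake_request(
    lake_name="projects/my-project/locations/us-central1/lakes/sales-lake",
    changes={"displayName": "Sales lake (EU)", "description": "Curated sales data"},
    access_token="ACCESS_TOKEN",
)
print(patch_request.get_method(), patch_request.full_url)
```

Deriving the mask from the keys of the change set keeps the two in sync, so a field is never sent without being listed in the mask.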
Deleting a lake
You can delete a lake by using the Delete button on the lake page in the console or by using the Dataplex API method lakes.delete.
Discover data
This section describes how to enable and use Dataplex Discovery. Discovery scans the data in a data lake, extracts its metadata, and registers that metadata with Dataproc Metastore, BigQuery, and Data Catalog for analysis, search, and exploration.
Discovery configuration
By default, Discovery is enabled when you create a new zone or asset. You can turn Discovery off at the zone or asset level. When you create a zone or an asset, you can choose whether the asset inherits the zone-level Discovery settings or overrides them at the asset level.
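The inherit-or-override behavior can be pictured with a small model. This is only an illustrative sketch: the Zone and Asset classes and their field names are hypothetical and are not the Dataplex API's own types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Zone:
    """Hypothetical stand-in for a zone and its Discovery setting."""
    name: str
    discovery_enabled: bool = True  # Discovery is on by default for a new zone


@dataclass
class Asset:
    """Hypothetical stand-in for an asset attached to a zone."""
    name: str
    zone: Zone
    # None means "inherit the zone setting"; True or False overrides it.
    discovery_override: Optional[bool] = None

    def discovery_is_enabled(self) -> bool:
        if self.discovery_override is None:
            return self.zone.discovery_enabled
        return self.discovery_override


raw_zone = Zone(name="raw")
inheriting_asset = Asset(name="events-bucket", zone=raw_zone)
opted_out_asset = Asset(name="scratch-bucket", zone=raw_zone, discovery_override=False)

print(inheriting_asset.discovery_is_enabled())
print(opted_out_asset.discovery_is_enabled())
```

Using None for "inherit" keeps the asset in step with later changes to the zone setting, whereas an explicit override stays fixed.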
View discovered tables and filesets
You can search for discovered tables and filesets in the Dataplex Discover view in the console.
Discovery actions
Discovery raises admin actions whenever data-related issues are detected during scans.
Resolve Discovery actions
Data with actions is checked again by subsequent Discovery scans. When the issue that triggered an action is fixed, the next scheduled Discovery scan automatically resolves the action.
Other actions
Missing resource: A matching dataset or bucket for an existing asset cannot be located.
Unauthorized resource: Dataplex lacks the permissions it needs to perform Discovery on the bucket or dataset it manages, or to apply security policies to it.
Issues with security policy propagation: Security policies that were specified for a lake, zone, or asset could not be propagated to the underlying buckets or datasets. This type of action can be raised at the lake, zone, and asset levels, whereas all other actions are raised at the asset level.
We recommend that you call this service using the client libraries supplied by Google. If your application needs to use your own libraries to call this service, use the following information when making API requests.
REST Resource: v1.projects.locations
get
GET /v1/{name=projects/*/locations/*}
Gets information about a location.
list
GET /v1/{name=projects/*}/locations
Lists information about the supported locations for this service.
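The paths in this reference use a template notation in which {name=projects/*/locations/*} binds a variable to a resource name and each * stands for one path segment. As a rough sketch of how such a template can be read, the hypothetical helper below converts a template into a regular expression and extracts the bound variable. It handles only the simple forms that appear on this page.

```python
import re


def template_to_regex(template: str) -> "re.Pattern[str]":
    """Convert a path template such as '/v1/{name=projects/*}/locations' to a regex."""
    parts = []
    pos = 0
    for match in re.finditer(r"\{(\w+)=([^}]*)\}", template):
        # Literal text before the variable is matched as-is.
        parts.append(re.escape(template[pos:match.start()]))
        variable, pattern = match.group(1), match.group(2)
        # Each '*' matches exactly one non-empty path segment.
        segments = [
            "[^/]+" if segment == "*" else re.escape(segment)
            for segment in pattern.split("/")
        ]
        parts.append(f"(?P<{variable}>{'/'.join(segments)})")
        pos = match.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")


get_location = template_to_regex("/v1/{name=projects/*/locations/*}")
list_locations = template_to_regex("/v1/{name=projects/*}/locations")

found = get_location.match("/v1/projects/my-project/locations/us-central1")
print(found.group("name"))
```

The captured group is the full resource name, which is the value you would pass as name, parent, or resource in the corresponding call.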
REST Resource: v1.projects.locations.lakes
create
POST /v1/{parent=projects/*/locations/*}/lakes
Creates a lake resource.
get
GET /v1/{name=projects/*/locations/*/lakes/*}
Retrieves a lake resource.
delete
DELETE /v1/{name=projects/*/locations/*/lakes/*}
Deletes a lake resource.
list
GET /v1/{parent=projects/*/locations/*}/lakes
Lists lake resources in a project and location.
getIamPolicy
GET /v1/{resource=projects/*/locations/*/lakes/*}:getIamPolicy
Gets the access control policy for a resource.
setIamPolicy
POST /v1/{resource=projects/*/locations/*/lakes/*}:setIamPolicy
Sets the access control policy on the specified resource.
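A common way to use getIamPolicy and setIamPolicy together is to read the current policy, add a binding, and write the result back. The sketch below covers only the middle step, as a pure function over the policy's JSON form. It assumes the standard Google Cloud IAM policy shape (a bindings list of role and members entries, plus an etag) and a setIamPolicy request body of the form {"policy": ...}. The member and etag values are placeholders.

```python
import copy
import json


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` in which `member` is granted `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the same role.
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Placeholder for the policy returned by getIamPolicy.
current_policy = {
    "etag": "ETAG_FROM_GET",
    "bindings": [
        {"role": "roles/dataplex.admin", "members": ["user:admin@example.com"]},
    ],
}

new_policy = add_binding(current_policy, "roles/dataplex.editor", "user:analyst@example.com")
set_iam_policy_body = json.dumps({"policy": new_policy}, indent=2)
print(set_iam_policy_body)
```

Keeping the etag from the read in the written policy lets the service detect a concurrent change instead of silently overwriting it.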