Introduction
An important part of the scientific method is recording both your parameters and the observations of an experiment. In data science, it is also vital to track the artifacts, parameters, and metrics used in a machine learning experiment. This metadata helps you:
Analyze runs of an ML experiment to assess the efficacy of different sets of hyperparameters.
Analyze runs of a production ML system to understand variations in prediction accuracy.
Trace the lineage of ML artifacts, such as datasets and models, to understand what went into their creation or how they were used to produce descendant artifacts.
Rerun an ML workflow with the exact same settings and artifacts.
Track the downstream usage of ML objects for governance purposes.
Vertex ML Metadata captures your ML system's metadata as a graph.
In the metadata graph, artifacts and executions are nodes, and events are edges that link artifacts as inputs or outputs of executions. Contexts represent subgraphs that are used to logically group sets of artifacts and executions.
Data model and resources
Vertex ML Metadata organizes resources hierarchically: every resource belongs to a MetadataStore. You must create a MetadataStore before you can create any other metadata resources.
Vertex ML Metadata terminologies
The following sections describe the data model and terminology used for Vertex ML Metadata components and resources.
MetadataStore
A MetadataStore is the top-level container for metadata resources. It is regionalized and associated with a specific Google Cloud project. Typically, an organization uses one shared MetadataStore for each project's metadata resources.
Metadata resources
Vertex ML Metadata exposes a graph-like data model for describing the metadata produced and consumed by ML workflows. Its core concepts are artifacts, executions, events, and contexts.
Artifact
An artifact is a discrete entity or piece of data produced and consumed by an ML workflow. Examples of artifacts include datasets, models, input files, and training logs.
Context
A context is a single, typed grouping of artifacts and executions. Contexts can be used to represent sets of metadata. A run of an ML pipeline is one example of a context.
For example, you can use contexts to represent sets of metadata such as:
A pipeline run in Vertex AI Pipelines. Here the context represents a single run, and each execution represents a step in the ML pipeline.
An experiment run in a Jupyter notebook. In this scenario, the context can represent the notebook, and each execution can represent a notebook cell.
Configure
You may monitor and examine the metadata generated by your machine learning (ML) workflows with Vertex ML Metadata. Vertex AI produces your project's MetadataStore the first time you run a PipelineJob or create an experiment in the Vertex SDK.
If you want your metadata encrypted with a customer-managed encryption key (CMEK), you must create your metadata store using a CMEK before you use Vertex ML Metadata to track or analyze metadata.
Once created, the CMEK that the metadata store uses is independent of any CMEK used by the processes that report metadata, such as a pipeline run.
Create a metadata store that uses a CMEK
Use the following instructions to create a CMEK and set up a Vertex ML Metadata metadata store that uses this CMEK.
To set up a customer-managed encryption key, use the Cloud Key Management Service.
To construct the default metadata store for your project using your CMEK, use the following REST call.
Before using any of the request data, make the following replacements:
LOCATION: Your region.
PROJECT: Your project ID.
KEY_RING: The name of the Cloud Key Management Service key ring that contains your encryption key.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT/locations/LOCATION/metadataStores?metadata_store_id=default
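As a sketch of how the placeholders come together, the snippet below builds the request URL and a request body for this call. The project, region, and key names are hypothetical, and the `encryptionSpec.kmsKeyName` body field is the assumed way to reference the CMEK (verify against the current API reference before use):

```python
import json

# Hypothetical values -- substitute your own project, region, and key names.
LOCATION = "us-central1"
PROJECT = "my-project"
KEY_RING = "my-key-ring"
KEY_NAME = "my-key"

# The create call names the store "default" via the metadata_store_id query param.
url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/"
    f"projects/{PROJECT}/locations/{LOCATION}/metadataStores"
    "?metadata_store_id=default"
)

# The CMEK is referenced by its full Cloud KMS resource name.
body = {
    "encryptionSpec": {
        "kmsKeyName": (
            f"projects/{PROJECT}/locations/{LOCATION}/"
            f"keyRings/{KEY_RING}/cryptoKeys/{KEY_NAME}"
        )
    }
}
print(url)
print(json.dumps(body, indent=2))
```

Sending this body with an authenticated POST (for example via curl with a bearer token) creates the default metadata store encrypted with your key.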
Vertex ML Metadata lets you track and analyze the metadata produced by your machine learning (ML) workflows. The steps below demonstrate how to log metadata.
Create an execution
Executions represent a step in your ML workflow. Use the following instructions to create an execution.
input_artifacts: A list of aiplatform.Artifact instances representing the input artifacts.
output_artifacts: A list of aiplatform.Artifact instances representing the output artifacts.
project: Your project ID.
execution_id: The RESOURCE_ID portion of the execution name.
metadata: A set of properties, such as the execution's parameters, that describe the execution.
schema_version: The version of the schema that describes the metadata field.
description: A human-readable string that describes the purpose of the execution to be created.
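To make the fields above concrete, here is a sketch of an execution record as it might be sent to the REST API. The IDs and metadata values are hypothetical, and the camelCase field names and the "system.Run" schema title are assumptions based on the API's conventions:

```python
import json

# Hypothetical identifiers for illustration.
PROJECT = "my-project"
LOCATION = "us-central1"
METADATA_STORE = "default"
EXECUTION_ID = "my-training-run"

# The execution is created under the metadata store, with the ID as a query param.
url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT}/locations/{LOCATION}/"
    f"metadataStores/{METADATA_STORE}/executions"
    f"?execution_id={EXECUTION_ID}"
)

# Fields mirror the parameters described above (REST uses camelCase names).
execution = {
    "displayName": "Model training run",
    "schemaTitle": "system.Run",  # assumed predefined system schema
    "schemaVersion": "0.0.1",
    "description": "Trains the demo model on the sample dataset.",
    "metadata": {"learning_rate": 0.01, "epochs": 10},
}
print(url)
print(json.dumps(execution, indent=2))
```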
Look up an existing artifact
Artifacts represent data used or produced by your ML workflow, such as datasets and models. Use the following instructions to look up an existing artifact.
Before using any of the request data, make the following replacements:
LOCATION: Your region.
PROJECT: Your project ID.
METADATA_STORE: The ID of the metadata store in which the artifact was created. The name default refers to the default metadata store.
PAGE_SIZE: (Optional) The maximum number of artifacts to return. If this value is not set, the service returns up to 100 records.
PAGE_TOKEN: (Optional) A page token from a previous MetadataService.ListArtifacts response.
FILTER: The conditions an artifact must satisfy to be included in the result set.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/metadataStores/METADATA_STORE/artifacts?pageSize=PAGE_SIZE&pageToken=PAGE_TOKEN&filter=FILTER
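Because the filter expression contains spaces and quotes, it must be URL-encoded before it goes into the query string. The sketch below assembles the list URL with hypothetical values; the filter syntax shown (schema_title and display_name comparisons) is an assumption worth checking against the filter documentation:

```python
from urllib.parse import urlencode

# Hypothetical values for illustration.
LOCATION = "us-central1"
PROJECT = "my-project"
METADATA_STORE = "default"

# Example filter: datasets with a particular display name.
params = {
    "pageSize": 50,
    "filter": 'schema_title = "system.Dataset" AND display_name = "my-dataset"',
}

# urlencode percent-escapes the spaces and quotes in the filter expression.
url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT}/locations/{LOCATION}/"
    f"metadataStores/{METADATA_STORE}/artifacts?" + urlencode(params)
)
print(url)
```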
artifact_id: The RESOURCE_ID portion of the artifact name, which is unique within a metadata store and has the following format: projects/PROJECT/locations/LOCATION/metadataStores/METADATA_STORE_ID/artifacts/RESOURCE_ID.
display_name: A name for the artifact.
schema_version: The version of the schema that describes the metadata field.
description: A human-readable string that describes the intended use of the artifact.
metadata: A set of properties that describe the artifact.
Create Events to link artifacts to an execution
Events represent the relationship between an execution and its input and output artifacts. Use the instructions below to create events that link artifacts to an execution.
Before using any of the request data, make the following replacements:
LOCATION: Your region.
PROJECT: Your project ID.
METADATA_STORE: The ID of the metadata store in which the execution was created. The name default refers to the default metadata store.
EXECUTION: The ID of the execution record.
ARTIFACT: The artifact's resource name, formatted as projects/PROJECT/locations/LOCATION/metadataStores/METADATA_STORE/artifacts/ARTIFACT_ID.
EVENT_TYPE: (Optional) A value from the EventType enumeration indicating whether the artifact is an input or output of the execution.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/metadataStores/METADATA_STORE/executions/EXECUTION:addExecutionEvents
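A sketch of the request body for this call is shown below: each entry in the events list names one artifact and its role relative to the execution. All identifiers are hypothetical, and the exact body field names are assumptions to verify against the API reference:

```python
import json

# Hypothetical identifiers for illustration.
PROJECT = "my-project"
LOCATION = "us-central1"
METADATA_STORE = "default"
EXECUTION = "my-training-run"

# The full resource name of the artifact being linked.
artifact_name = (
    f"projects/{PROJECT}/locations/{LOCATION}/"
    f"metadataStores/{METADATA_STORE}/artifacts/my-dataset"
)

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT}/locations/{LOCATION}/"
    f"metadataStores/{METADATA_STORE}/executions/{EXECUTION}:addExecutionEvents"
)

# Each event links one artifact to the execution as an INPUT or OUTPUT.
body = {
    "events": [
        {"artifact": artifact_name, "type": "INPUT"},
    ]
}
print(url)
print(json.dumps(body, indent=2))
```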
Use the following instructions to add artifacts and executions to a context.
Before using any of the request data, make the following replacements:
LOCATION: Your region.
PROJECT: Your project ID.
METADATA_STORE: The ID of the metadata store in which the execution was created. The name default refers to the default metadata store.
CONTEXT: The ID of the context record.
ARTIFACT: The resource name of each artifact you want to add to this context, formatted as projects/PROJECT/locations/LOCATION/metadataStores/METADATA_STORE/artifacts/ARTIFACT_ID.
EXECUTION: The resource name of each execution you want to add to this context, formatted as projects/PROJECT/locations/LOCATION/metadataStores/METADATA_STORE/executions/EXECUTION_ID.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/metadataStores/METADATA_STORE/contexts/CONTEXT:addContextArtifactsAndExecutions
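A sketch of this call's request body follows: it carries two lists of full resource names, one for artifacts and one for executions. The identifiers are hypothetical, and the body field names are assumptions to confirm against the API reference:

```python
import json

# Hypothetical identifiers for illustration.
PROJECT = "my-project"
LOCATION = "us-central1"
METADATA_STORE = "default"
CONTEXT = "my-pipeline-run"

# Common resource-name prefix shared by artifacts and executions.
prefix = f"projects/{PROJECT}/locations/{LOCATION}/metadataStores/{METADATA_STORE}"

# The body lists full resource names of the artifacts and executions to add.
body = {
    "artifacts": [f"{prefix}/artifacts/my-dataset"],
    "executions": [f"{prefix}/executions/my-training-run"],
}

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/{prefix}/"
    f"contexts/{CONTEXT}:addContextArtifactsAndExecutions"
)
print(url)
print(json.dumps(body, indent=2))
```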
To monitor and examine the metadata created by your machine learning (ML) systems, use Vertex ML Metadata. Analyzing the behavior of your ML system is made simpler by keeping track of this metadata. This might assist you in comparing the artifacts your ML system generated or in understanding variations in the performance of your system.
You can learn how to query for the ML metadata that you want to analyze in the following ways:
Query for artifacts, executions, and contexts
Use the following methods to find artifacts, executions, and the events that link artifacts to executions in a given context.
Before using any of the request data, make the following replacements:
LOCATION: Your region.
PROJECT: Your project ID.
METADATA_STORE: The ID of the metadata store in which the execution was created. The name default refers to the default metadata store.
EXECUTION_ID: The ID of the execution record.
Query for an execution's input and output artifacts
Use the instructions below to query for an execution's input and output artifacts, along with the events that connect them to the execution.
Before using any of the request data, make the following replacements:
LOCATION: Your region.
PROJECT: Your project ID.
METADATA_STORE: The ID of the metadata store in which the execution was created. The name default refers to the default metadata store.
EXECUTION_ID: The ID of the execution record.
Query for a context's lineage subgraph
Use the below instructions to query for the executions and artifacts in the specified context, along with the events that connect artifacts to executions.
Before using any of the request data, make the following replacements:
LOCATION: Your region.
PROJECT: Your project ID.
METADATA_STORE: The ID of the metadata store in which the context was created. The name default refers to the default metadata store.
CONTEXT_ID: The ID of the context record.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/metadataStores/METADATA_STORE/contexts/CONTEXT_ID:queryContextLineageSubgraph
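For completeness, the lineage-subgraph URL can be assembled the same way as the earlier calls. The identifiers below are hypothetical:

```python
# Hypothetical identifiers for illustration.
LOCATION = "us-central1"
PROJECT = "my-project"
METADATA_STORE = "default"
CONTEXT_ID = "my-pipeline-run"

# The :queryContextLineageSubgraph custom method hangs off the context resource.
url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT}/locations/{LOCATION}/"
    f"metadataStores/{METADATA_STORE}/"
    f"contexts/{CONTEXT_ID}:queryContextLineageSubgraph"
)
print(url)
```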
You should see output similar to the following, where EXECUTION_ID is the ID of the execution record (if no execution ID was specified, Vertex ML Metadata generated a unique identifier) and ARTIFACT_ID is the ID of the artifact record.
The first time that you use Vertex ML Metadata in a Google Cloud project, Vertex AI creates your project's metadata store. If you want your metadata encrypted using a customer-managed encryption key (CMEK), you must create your metadata store using a CMEK before you use Vertex ML Metadata to track or analyze metadata.
It uses management techniques such as:
Artifact management
Execution management
Context management
These management techniques mostly do the following functions:
Creating an artifact, execution, or context.
Looking up an existing artifact, execution, or context.
Deleting an existing artifact, execution, or context.
Purging artifacts, executions, or contexts.
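Purging differs from deleting in that it removes every resource matching a filter rather than one named resource. The sketch below shows an assumed shape for a purge request body; both the filter expression and the meaning of "force": False (a dry run that only counts matching records) are assumptions to verify against the API reference:

```python
import json

# Assumed purge request body: filter selects the records, force=False
# is taken here to mean a dry run that only reports what would be purged.
purge_body = {
    "filter": 'create_time < "2022-01-01T00:00:00Z"',
    "force": False,
}
print(json.dumps(purge_body, indent=2))
```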
System Schemas
Each metadata resource is associated with a specific Metadata Schema. To simplify the metadata resource creation process Vertex ML Metadata publishes predefined types called system schemas for common ML concepts. System schemas live under the namespace system. You can access system schemas as MetadataSchema resources in the Vertex ML Metadata API. Schemas are always versioned.
How to use system schemas
Vertex AI uses system schemas to create metadata resources for tracking your ML workflows. You can then filter and group resources in metadata queries by using the schema_title field.
You can also use system schemas through the Vertex ML Metadata API to create metadata resources directly. You can identify a system schema by its schema title and schema version. Fields in system schemas are always considered optional. Users aren't restricted to the predefined fields of system schemas and can also log additional arbitrary metadata to any metadata resource.
System schema examples
The following examples are common system schemas that are available for immediate use.
Artifact
system.Artifact is a generic schema that can hold metadata about any artifact. No specific fields are defined in this schema.
Dataset
system.Dataset represents a container of data that was either consumed or produced by an ML workflow step. A dataset can point to either a file location or a query, for example a BigQuery URI.
title: system.Dataset
version: 0.0.1
type: object
properties:
  container_format:
    type: string
    description: "Format of the container. Examples include 'TFRecord', 'Text', or 'Parquet'."
  payload_format:
    type: string
    description: "Format of the payload. For example, 'proto:TFExample', 'CSV', or 'JSON'."
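To show how this schema is used in practice, here is a sketch of an artifact record that conforms to system.Dataset. The URI and metadata values are hypothetical, and the camelCase field names follow the REST API's assumed conventions:

```python
import json

# A hypothetical artifact record typed with the system.Dataset schema.
dataset_artifact = {
    "displayName": "training-data",
    "uri": "gs://my-bucket/training/data.tfrecord",  # hypothetical location
    "schemaTitle": "system.Dataset",
    "schemaVersion": "0.0.1",
    # Keys match the optional properties the schema defines.
    "metadata": {
        "container_format": "TFRecord",
        "payload_format": "proto:TFExample",
    },
}
print(json.dumps(dataset_artifact, indent=2))
```

Because system-schema fields are optional, the metadata block could also be empty or carry additional arbitrary keys.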
Model
system.Model represents a trained model. The model's URI can point to a file location (for example, a Cloud Storage bucket or local drive) or to an API resource such as the Model resource in the Vertex AI API.
title: system.Model
version: 0.0.1
type: object
properties:
  framework:
    type: string
    description: "The framework type. For example: 'TensorFlow' or 'Scikit-Learn'."
  framework_version:
    type: string
    description: "The framework version. For example: '1.15' or '2.1'."
  payload_format:
    type: string
    description: "The format of the Model payload, for example: 'SavedModel' or 'TFLite'."
Metrics
system.Metrics represents evaluation metrics produced during an ML workflow. Metrics are application and use case dependent and can consist of simple scalar metrics like accuracy or complex metrics that are stored elsewhere in the system.
title: system.Metrics
version: 0.0.1
type: object
properties:
  accuracy:
    type: number
    description: "Optional summary metric describing accuracy of a model."
  precision:
    type: number
    description: "Optional summary metric describing precision of a model."
  recall:
    type: number
    description: "Optional summary metric describing the recall of a model."
  f1score:
    type: number
    description: "Optional summary metric describing the f1-score of a model."
  mean_absolute_error:
    type: number
    description: "Optional summary metric describing the mean absolute error of a model."
  mean_squared_error:
    type: number
    description: "Optional summary metric describing the mean-squared error of a model."
Frequently Asked Questions
What is Google Cloud Platform?
Google Cloud Platform (GCP) is Google's suite of cloud computing services. It provides a wide range of services across the storage, compute, database, migration, and networking domains.
What are the GCP cloud storage libraries and tools?
The Google Cloud Platform Console, which performs basic object and bucket operations.
The gsutil command-line tool, which provides a command-line interface for Cloud Storage.
Cloud Storage client libraries, which provide programming support for languages such as Java, Ruby, and Python.
How does ML training work in a container?
A container runs its training on samples and outputs its results back to storage. The container saves the checkpoints of its ML model to an outside data source and not on the container so that new instances of the container can pick up from where they left off.
Conclusion
I hope this article gave you insights into the vertex ML metadata supported by Google.
We hope this blog has helped you increase your knowledge of Vertex ML Metadata. If you liked this blog, check out our other articles. Do upvote our blog to help other ninjas grow. Happy Coding!