Table of contents
1. Introduction
2. Monitor feature skew and drift
   2.1. Create a Model Monitoring job
        2.1.1. Skew detection
        2.1.2. Drift detection
   2.2. Update a Model Monitoring job
   2.3. Analyze skew and drift data
3. Monitor feature attribution skew and drift
   3.1. Steps to enable skew or drift detection
        3.1.1. Skew detection
        3.1.2. Drift detection
   3.2. Analyze skew and drift data
4. Access Transparency in Vertex AI
   4.1. Supported services
   4.2. Limitations of Access Transparency in Vertex AI
5. Frequently Asked Questions
   5.1. What are the GCP cloud storage libraries and tools?
   5.2. Name some of the built-in algorithms available for training on AI Platform Training.
   5.3. What are the different roles in a distributed training structure?
6. Conclusion
Last Updated: Mar 27, 2024

Vertex AI Model Monitoring

Introduction

A model deployed in production performs best on prediction input data that is similar to its training data. When the input data deviates from the data used to train the model, the model's performance can deteriorate, even though the model itself has not changed.

To help you maintain the performance of the model, Model Monitoring monitors the model's prediction input data for feature drift and skew:

  • Training-serving skew occurs when the feature data distribution in production differs from the distribution of the data used to train the model. If the original training data is available, you can enable skew detection to monitor your models for training-serving skew.
  • Prediction drift occurs when the feature data distribution in production changes significantly over time. If the original training data is not available, you can enable drift detection to monitor the input data for changes over time.

You can enable both drift and skew detection.
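Conceptually, both checks compare two feature distributions with a statistical distance measure and raise an alert when the distance crosses the threshold you set. The sketch below is a simplified, hypothetical illustration (not the Vertex AI implementation) using the L-infinity distance for a categorical feature; the function name and the data are made up for the example:

```python
from collections import Counter

def l_infinity_distance(baseline, current):
    """L-infinity distance between two categorical value distributions.

    Each argument is a list of raw categorical values; the distance is the
    largest absolute difference between the relative frequencies of any
    single category in the two datasets.
    """
    base_freq = Counter(baseline)
    cur_freq = Counter(current)
    categories = set(base_freq) | set(cur_freq)
    n_base, n_cur = len(baseline), len(current)
    return max(
        abs(base_freq[c] / n_base - cur_freq[c] / n_cur) for c in categories
    )

# Training data: 50% "US", 30% "UK", 20% "IN"
training = ["US"] * 50 + ["UK"] * 30 + ["IN"] * 20
# Serving data shifted toward "IN": 30% / 20% / 50%
serving = ["US"] * 30 + ["UK"] * 20 + ["IN"] * 50

distance = l_infinity_distance(training, serving)
print(f"L-infinity distance: {distance:.2f}")  # prints 0.30

ALERT_THRESHOLD = 0.3  # analogous to the alert threshold set in the console
if distance >= ALERT_THRESHOLD:
    print("Skew alert: serving distribution deviates from training baseline")
```

A skew job compares serving data against the training distribution; a drift job compares consecutive windows of serving data against each other in the same way.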

Model Monitoring supports feature skew and drift detection for categorical and numerical features:

  • Categorical features take values from a limited set of possibilities, typically grouped by qualitative properties; for example, product type, country, or customer type.
  • Numerical features can take any numeric value; for example, weight and height.

The example below shows skew or drift between the baseline and latest distributions of a categorical feature:

(Figure: baseline distribution vs. latest distribution of the feature.)

Monitor feature skew and drift

The following steps describe how to create, manage, and interpret the results of Model Monitoring jobs for models deployed to online prediction endpoints. Vertex AI Model Monitoring supports skew and drift detection for categorical and numerical input features.

You can enable skew detection if you provide the original training dataset for your model; otherwise, you should enable drift detection.

Create a Model Monitoring job

To set up either drift or skew detection, create a model deployment monitoring job:

  1. In the Google Cloud console, go to the Vertex AI Endpoints page.
  2. Click Create Endpoint.
  3. In the New endpoint pane, set a region and name your endpoint.
  4. Click Continue.
  5. In the Model name field, choose an imported tabular AutoML or custom-trained model.
  6. In the Version field, choose a version for your model.
  7. Click Continue.
  8. In the Model monitoring pane, make sure Enable model monitoring for this endpoint is toggled on.
  9. Enter a Monitoring job display name.
  10. Click Continue. The Monitoring objective pane opens, with options for skew or drift detection:

Skew detection

  1. Select Training-serving skew detection.
  2. Under Training data source, provide a training data source.
  3. Under Target column, enter the name of the column in the training data that the model is trained to predict. This field is excluded from the monitoring analysis.
  4. Optional: Under Alert thresholds, specify thresholds at which to trigger alerts.
  5. Click Create.

Drift Detection

  1. Select Prediction drift detection.
  2. Optional: Under Alert thresholds, specify thresholds at which to trigger alerts.
  3. Click Create.

Update a Model Monitoring job

You can pause, view, update, and delete a Model Monitoring job. You must pause a job before you can delete it.

Pausing and deleting a job is not possible in the console; use the gcloud CLI instead.
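For example, with the gcloud CLI (MONITORING_JOB_ID and the region are placeholders for your own values):

```shell
# Find the ID of the monitoring job attached to your endpoint
gcloud ai model-monitoring-jobs list --region=us-central1

# Pause the job first, then delete it
gcloud ai model-monitoring-jobs pause MONITORING_JOB_ID --region=us-central1
gcloud ai model-monitoring-jobs delete MONITORING_JOB_ID --region=us-central1
```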

To update the parameters for a Model Monitoring job:

  1. In the console, go to the Vertex AI Endpoints page.
  2. Click the name of the endpoint you want to edit.
  3. Click Edit settings.
  4. In the Edit endpoint pane, select Model monitoring or Monitoring objectives and update the parameters you want to change.
  5. Click Update.

To view alerts, metrics, and monitoring properties for a model:

  1. In the console, go to the Vertex AI Endpoints page.
  2. Click the name of the endpoint.
  3. In the Monitoring column for the model you want to view, select Enabled.

Analyze skew and drift data

Using the console, you can see how each monitored feature's distribution varies over time and discover which changes caused skew or drift. The feature value distributions are displayed as histograms.

Go to the Endpoints page of the console to access the feature distribution histograms.

  1. On the Endpoints page, click the endpoint you want to study.
  2. The detail page for the endpoint lists all the models deployed to it. To analyze a model, click its name.
  3. The details page for the model lists its input features and relevant information.
  4. To analyze a feature, click its name.
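To quantify how far apart two of these histograms are, one common distance for numerical features is the Jensen-Shannon divergence. The sketch below is a self-contained, hypothetical illustration of comparing a baseline histogram against a recent serving-window histogram; the bucket counts, feature name, and the 0.03 threshold are made up:

```python
import math

def normalize(counts):
    """Turn raw histogram bucket counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    Ranges from 0 (identical distributions) to ln(2) (completely disjoint).
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical bucket counts for a numerical feature "age",
# using the same bucket boundaries for both windows.
baseline_buckets = [10, 40, 30, 20]   # e.g. training data
latest_buckets = [30, 30, 20, 20]     # e.g. last 24h of serving data

divergence = js_divergence(normalize(baseline_buckets),
                           normalize(latest_buckets))
print(f"JS divergence: {divergence:.4f}")
if divergence > 0.03:  # hypothetical alert threshold
    print("Drift alert for feature 'age'")
```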

Monitor feature attribution skew and drift

Feature attributions show how much each feature in your model contributed to the prediction for each given instance. When you request predictions, you get predicted values appropriate for your model. When you request explanations, you get the predictions along with feature attribution information.

Attribution scores quantify the contribution of a feature to a model's prediction. They are signed, indicating whether a feature raises or lowers the prediction, and the attributions across all features add up to the model's prediction score.
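As a toy numeric sketch of this additive property (all values here are hypothetical; in practice the attribution scores come back in the explanation response from Vertex Explainable AI, and for methods such as Sampled Shapley they sum to the prediction minus a baseline prediction):

```python
# Hypothetical attribution scores for a single prediction instance.
# Positive scores raise the prediction; negative scores lower it.
attributions = {
    "age": +0.15,
    "income": +0.30,
    "tenure": -0.05,
}

baseline_prediction = 0.20  # model output on the (hypothetical) baseline input

# Summing the attributions recovers the prediction score
prediction = baseline_prediction + sum(attributions.values())
print(f"Reconstructed prediction score: {prediction:.2f}")  # prints 0.60
```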

By monitoring feature attributions, Model Monitoring tracks changes in a feature's contributions to a model's predictions over time. A change in a key feature's attribution score often signals that the feature has changed in a way that can impact the accuracy of the model's predictions.

Steps to enable skew or drift detection

To create a model deployment monitoring job using the console, create an endpoint:

  1. In the Google Cloud console, go to the Vertex AI Endpoints page.
  2. Click Create Endpoint.
  3. In the New endpoint pane, set a region and name your endpoint.
  4. Click Continue.
  5. In the Model name field, choose an imported custom-trained or tabular AutoML model.
  6. In the Version field, choose a version for your model.
  7. Click Continue.
  8. In the Model monitoring pane, make sure Enable model monitoring for this endpoint is toggled on.
  9. Enter a Monitoring job display name.
  10. Click Continue. The Monitoring objective pane opens, with options for skew or drift detection:

Skew Detection

  1. Select Training-serving skew detection.
  2. Under Training data source, provide a training data source.
  3. Under Target column, enter the name of the column in the training data that the model is trained to predict. This field is excluded from the monitoring analysis.
  4. Optional: Under Alert thresholds, specify thresholds at which to trigger alerts.
  5. Click Create.

Drift Detection

  1. Select Prediction drift detection.
  2. Optional: Under Alert thresholds, specify thresholds at which to trigger alerts.
  3. Click Create.

Analyze skew and drift data

The console lets you see the feature attributions for each monitored feature and discover which changes caused skew or drift. The feature value distributions can be viewed as a time series or a histogram.


In a reliable machine learning system, the relative importance of features typically holds steady over time. If a key feature loses prominence, something may have changed with that feature. Typical causes of feature attribution drift or skew include:

  • Data source alterations.
  • Changes to the data schema and logging.
  • Changes in the mix or behavior of end users (for example, due to seasonal changes or outlier events).
  • Upstream changes to features that are produced by another machine learning model. Examples include:
    1. Model updates that cause a decrease/increase in coverage.
    2. Change in performance of the model.
    3. Updates to the data pipeline can cause a decrease in overall coverage.

Access Transparency in Vertex AI

Access Transparency provides you with logs that capture the actions Google personnel take when accessing your content.

Cloud Audit Logs record the actions that members of your organization take on content in your Google Cloud projects. Access Transparency provides similar records of actions taken by Google personnel.

If a Google Cloud project belongs to an organization, you can enable Access Transparency for that project.

Supported services

Access Transparency supports the following Vertex AI services:

  1. Vertex AI Feature Store
  2. Vertex AI Model Monitoring
  3. Vertex AI AutoML training
  4. Vertex AI custom training
  5. Vertex AI data labeling
  6. Vertex AI Pipelines
  7. Vertex AI prediction

Limitations of Access Transparency in Vertex AI

All access to your data in Vertex AI by Google personnel is logged, except for the following scenarios:

  • Situations in which none of the supported Google Cloud services produce Access Transparency logs
  • Serving online or batch prediction requests from custom-trained models that use custom containers
  • Using Vertex Explainable AI with custom containers
  • Using resources related to AutoML forecasting, such as forecasting datasets and models

Frequently Asked Questions

What are the GCP cloud storage libraries and tools?

The Google Cloud Platform Console, which performs basic object and bucket operations.

The gsutil command-line tool, which provides a command-line interface for Cloud Storage. The Cloud Storage client libraries provide programming support for various languages such as Java, Ruby, and Python.

Name some of the built-in algorithms available for training on AI Platform Training.

Built-in algorithms help train models for various use cases that can be solved with classification and regression. Linear learner, Wide and Deep, TabNet, XGBoost, image classification, and object detection are some of the built-in algorithms available on AI Platform Training.

What are the different roles in a distributed Training structure?

The master role manages the other roles and reports the status of the job as a whole. Workers are one or more replicas that do their portion of the work as specified in the job configuration. Parameter servers coordinate the shared model state between the workers.

Conclusion

I hope this article gave you insights into the Vertex AI Model Monitoring service offered by Google.

Refer to our guided paths on Coding Ninjas Studio to learn more about DSA, Competitive Programming, System Design, JavaScript, etc. Enroll in our courses, refer to the mock test and problems available, interview puzzles, and look at the interview bundle and interview experiences for placement preparations.

We hope this blog has helped you increase your knowledge of Vertex AI Model Monitoring, and if you liked this blog, check out the other links. Do upvote our blog to help other ninjas grow. Happy Coding!
