Vertex AI Feature Store offers a unified repository for organizing, storing, and serving ML features. With a central featurestore, an enterprise can efficiently share, discover, and reuse ML features at scale, which speeds up the development and deployment of new ML applications. Because the service is fully managed, Vertex AI provisions and scales the underlying infrastructure, including storage and compute, so data scientists can concentrate on feature computation logic rather than the difficulties of putting features into production.
Vertex AI Feature Store is an integral component of Vertex AI. It can be used on its own or as part of Vertex AI workflows. For example, you can fetch data from the Feature Store to train custom or AutoML models in Vertex AI.
To learn more about Vertex AI Feature Store, let's dive into the article.
Overview of Vertex AI Feature Store
With Vertex AI Feature Store you create and manage featurestores, the top-level resources that store your features and their values. Once a featurestore is created, authorized users can add and share features without further engineering assistance. After defining features, users can ingest (import) feature values from different data sources.
Any authorized user can search the featurestore and obtain values. For instance, you can find features and then do a batch export to obtain training data for ML model building. For quick online predictions, you can also retrieve feature values in real-time.
Benefits
Before using Vertex AI Feature Store, you may have computed feature values and saved them in multiple places, such as BigQuery tables and Cloud Storage files, and you may have built and maintained separate solutions for storing and consuming those values. Vertex AI Feature Store, in contrast, offers a single solution for batch and online storage as well as serving of ML features. Its advantages are described in the following sections.
Share features across your organization
You can instantly share features you create in a featurestore with others for tasks like training or serving. Teams don't need to re-engineer features for different projects or use cases. And because you manage and serve features from a central repository, you maintain consistency across your organization and avoid duplicated effort, especially for high-value features.
Vertex AI Feature Store offers search and filtering functionality to make it simple for others to find and reuse existing features. For each feature, you can review relevant metadata to learn about its quality and usage patterns. For instance, you can view the statistical distribution of feature values and the percentage of entities that have a valid value for a given feature (also known as feature coverage).
Managed solution for online serving at scale
Vertex AI Feature Store offers a managed solution for online feature serving (low-latency serving), which is essential for making timely online predictions. Vertex AI Feature Store builds and runs the low-latency serving infrastructure for you and scales it as necessary: you write the logic to compute features, while serving them is delegated to the service. This integrated management reduces the friction of developing new features, letting data scientists focus on their work without worrying about deployment.
Mitigate training-serving skew
When the feature data distribution you use in production differs from the feature data distribution you used to train your model, this is known as training-serving skew. The performance of a model during training and its performance in production frequently differ due to this skew. Examples of how Vertex AI Feature Store can address various sources of training-serving skew are provided below:
Vertex AI Feature Store ensures that feature values are imported into a featurestore once and then reused for both training and serving. Without a featurestore, you might have different code paths for generating features for training and for serving, so feature values can diverge between the two.
Vertex AI Feature Store offers point-in-time lookups to retrieve historical data for training. These lookups help you reduce data leakage by retrieving only the feature values that were available up to the time of a prediction, not afterward.
Detecting drift
Vertex AI Feature Store helps you detect drift, that is, significant changes to your feature data distribution over time. It continuously monitors the distribution of feature values ingested into the featurestore. As feature drift becomes significant, you might need to retrain the models that use the affected features.
Quotas and limits
Vertex AI Feature Store enforces quotas and limits to help you manage resources, letting you set your own usage caps and protecting the Google Cloud user community from unanticipated usage spikes. To avoid hitting unexpected constraints, review the Vertex AI Feature Store quotas on the Quotas and limits page. For example, Vertex AI Feature Store limits the maximum number of online serving nodes and the maximum number of online serving requests per minute.
Data retention
Vertex AI Feature Store retains feature values up to a retention limit. The limit is based on the feature values' timestamps, not on the date and time the values were imported. Values with timestamps that exceed the limit are scheduled for deletion.
Pricing
The cost of Vertex AI Feature Store depends on a variety of factors, including how much data you store and how many online serving nodes you use. Charges begin as soon as you create a featurestore.
Data model and resources
The Vertex AI Feature Store data model and the vocabulary used to describe its resources and components are introduced in the following sections.
Vertex AI Feature Store data model
Vertex AI Feature Store uses a time-series data model to store a series of values for each feature. This architecture lets Vertex AI Feature Store preserve feature values as they change over time. Vertex AI Feature Store organizes resources hierarchically in the following order: Featurestore -> EntityType -> Feature. You must create these resources before you can import data into Vertex AI Feature Store.
Take the following sample source data from a BigQuery table as an example. This source data describes movies and their features.
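For illustration, such a table might look like the following; the column names and values are hypothetical, chosen to match the movie example used throughout this article:

movie_id | average_rating | title           | update_time
movie_01 | 4.5            | The Shining     | 2021-04-15T08:28:14Z
movie_02 | 4.3            | The Dark Knight | 2021-04-15T08:28:14Z

Each row describes one entity (a movie), each column other than movie_id and update_time is a candidate feature, and update_time is the feature generation timestamp.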
Featurestore
The primary storage location for entity types, features, and feature values is a featurestore. Typically, an organization creates one shared featurestore for feature ingestion, serving, and sharing across all of its teams. In some cases, you might create several featurestores within a single project to isolate environments; for instance, you might have distinct featurestores for experimentation, testing, and production.
Entity type
A group of features with a common semantic meaning makes up an entity type. You create your entity types based on the ideas pertinent to your use case. The entity types movie and user, for instance, might be used to bundle aspects related to movies or users in a movie service.
Entity
An entity is an instance of an entity type. For example, movie_01 and movie_02 are entities of the entity type movie. Each entity in a featurestore must have a unique ID, and the ID must be of type STRING.
Feature
A feature is a measurable attribute or property of a particular entity type. For example, the movie entity type tracks movie attributes with features such as average_rating and title. Features are associated with a specific entity type: feature IDs don't have to be globally unique, but they must be unique within an entity type. For instance, if title is used for two different entity types, Vertex AI Feature Store treats it as two different features. When reading feature values, you provide both the entity type and the feature in the request.
When you create a feature, you specify its value type, such as BOOL_ARRAY, DOUBLE, DOUBLE_ARRAY, or STRING. This value type determines what value types you can ingest for that feature.
Feature value
Vertex AI Feature Store records a feature's values at particular points in time. In other words, a given feature for a given entity can have more than one value. For example, the movie_01 entity can have multiple values for the average_rating feature: the value might change from 4.4 to 4.8 over time. Each feature value is identified by a tuple (entity_id, feature_id, timestamp), which Vertex AI Feature Store uses to look up data at serving time.
Even though time is continuous, the Vertex AI Feature Store saves discrete values. Vertex AI Feature Store returns the most recent value stored at or before time t when you request a feature value at time t.
Feature ingestion
Feature ingestion is the process of importing feature values computed by your feature engineering jobs into a featurestore. Before you can ingest data, the corresponding entity type and features must be defined in the featurestore. Vertex AI Feature Store provides batch ingestion so you can bulk-import values into a featurestore. For example, your computed source data might live in BigQuery or Cloud Storage; you can then ingest it from those sources into a featurestore so that feature values are served consistently from the central featurestore.
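As a rough sketch, a batch ingestion request through the REST API targets the importFeatureValues method of an entity type. The field names below follow the public v1 API as the author understands it, and the BigQuery URI, entity ID column, timestamp column, and feature IDs are placeholders to replace with your own:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:importFeatureValues

{
  "bigquerySource": { "inputUri": "bq://PROJECT.DATASET.TABLE" },
  "entityIdField": "movie_id",
  "featureTimeField": "update_time",
  "featureSpecs": [
    { "id": "average_rating" },
    { "id": "title" }
  ],
  "workerCount": 1
}

The request can be sent with curl in the same way as the other POST examples later in this article.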
Feature serving
Exporting stored feature values for training or inference is a procedure known as feature serving. There are two ways to serve features in the Vertex AI Feature Store: batch and online. For high throughput and serving huge volumes of data for offline processing, use batch serving (like for model training or batch predictions). Online serving is used for small-batch, low-latency data retrieval for real-time processing (like for online predictions).
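For online serving, the REST API exposes a readFeatureValues method on the entity type. A minimal sketch, assuming the v1 field names (entityId and featureSelector.idMatcher.ids) and reusing the movie example:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:readFeatureValues

{
  "entityId": "movie_01",
  "featureSelector": { "idMatcher": { "ids": ["average_rating", "title"] } }
}

The response contains the latest stored value of each requested feature for that entity.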
Entity view
When you retrieve values from a featurestore, the service returns an entity view containing the requested feature values. An entity view is a projection of the features and values that Vertex AI Feature Store returns in response to an online or batch serving request:
For online serving requests, you can retrieve all features for a particular entity type or just a subset of them.
For batch serving requests, you can retrieve all features for one or more entity types, or only a selection. For example, you can retrieve features spread across different entity types in a single request, joining them together, and then feed the results into model training or a batch prediction job, as sketched below.
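A hedged sketch of such a batch serving request against the featurestore's batchReadFeatureValues method, joining features from a movie and a user entity type into a BigQuery table. The field names follow the public v1 API as the author understands it, and all URIs and IDs are placeholders:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID:batchReadFeatureValues

{
  "bigqueryReadInstances": { "inputUri": "bq://PROJECT.DATASET.READ_INSTANCES_TABLE" },
  "destination": { "bigqueryDestination": { "outputUri": "bq://PROJECT.DATASET.TRAINING_DATA_TABLE" } },
  "entityTypeSpecs": [
    { "entityTypeId": "movie", "featureSelector": { "idMatcher": { "ids": ["average_rating", "title"] } } },
    { "entityTypeId": "user", "featureSelector": { "idMatcher": { "ids": ["age"] } } }
  ]
}

The read-instances table lists the entity IDs and timestamps for which feature values should be fetched, which is what enables the point-in-time lookups described earlier.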
Export data
Vertex AI Feature Store's export functionality lets you back up and archive feature values from your featurestores. You can export a snapshot of the most recent feature values or perform a full export of feature values.
Source data requirements
Vertex AI Feature Store can ingest data from BigQuery tables or from files in Cloud Storage. Files in Cloud Storage must be in Avro or CSV format.
Each item (or row) must meet the following requirements (a short CSV example follows the list):
You must have a column for entity IDs, and the values must be of type STRING. This column contains the entity IDs that the feature values are for.
The value types of your source data must match the value types of the destination features in the featurestore. For example, boolean data must be ingested into a feature of type BOOL.
Each column must have a header of type STRING. There are no restrictions on header names.
The column name serves as the column header for BigQuery tables.
The Avro schema connected to the binary data defines the column header for Avro.
The first row of a CSV file is the column header.
Use one of the timestamp formats listed below if you include a column for the feature generation timestamps:
For BigQuery tables, timestamps must be in a column of type TIMESTAMP.
For Avro, timestamps must be of type long with the logical type timestamp-micros.
Timestamps for CSV files must adhere to RFC 3339 specifications.
Array data types cannot be included in CSV files. Instead, use BigQuery or Avro.
Arrays for array data types cannot contain null values; however, you can use an empty array.
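As a quick illustration, a CSV file that satisfies these requirements might look like the following (header names and values are hypothetical):

movie_id,average_rating,title,update_time
movie_01,4.5,The Shining,2021-04-15T08:28:14Z
movie_02,4.3,The Dark Knight,2021-04-15T08:28:14Z

Here movie_id is the entity ID column, average_rating and title are features, and update_time is an RFC 3339 timestamp column.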
Feature value timestamps
For batch ingestion, Vertex AI Feature Store requires user-provided timestamps for the ingested feature values. You can either set the same timestamp for all values or provide a different timestamp for each value:
If the timestamps differ across feature values, provide them in a column in your source data. Each row needs a timestamp indicating when the feature value was generated. In your ingestion request, you identify the timestamp column by its column name.
If the timestamp is consistent across all feature values, you can include it as a parameter in your ingestion request. In your source data, you may also define the timestamp in a column where every row has the same timestamp.
Data source region
If your source data is in BigQuery or Cloud Storage, the source dataset or bucket must be in the same region or the same multi-region location as your featurestore. For example, a featurestore in us-central1 can only ingest data from BigQuery datasets or Cloud Storage buckets that are in us-central1 or in the US multi-region location; you cannot ingest data from, say, us-east1. Source data in dual-region buckets is also not supported.
Setup
This section explains how to set up a project for Vertex AI Feature Store and which permissions are required to use it.
Configure project
To use Vertex AI Feature Store, you need a project with the Vertex AI API enabled; this section explains how to create one. If you already have a project with the Vertex AI API enabled, you can use it instead of creating a new one.
If you're new to Google Cloud, create an account to evaluate how the products perform in real-world scenarios. New customers also receive $300 in free credits to run, test, and deploy workloads.
Choose or create a Google Cloud project from the project selector page in the Google Cloud dashboard.
Go to project selector.
Make sure your Cloud project's billing is enabled. Find out how to determine whether billing is enabled for a project.
Enable the Vertex AI API.
Enable the API
Vertex AI Feature Store service account
Vertex AI Feature Store performs operations on your behalf, such as accessing your source data, by using a Google-managed service account: service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com. By default, this service account has access to source data in the same project as your featurestore. If the source data is in a different project from your featurestore, you must grant the service account access to the project that contains the source data.
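For example, if your source data lives in BigQuery in another project, you might grant the service account a read role on that project with a command along the following lines. The project IDs are placeholders, and the specific role to grant (here roles/bigquery.dataViewer) depends on where your source data is stored:

gcloud projects add-iam-policy-binding SOURCE_DATA_PROJECT \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"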
IAM permissions
Vertex AI administrators have access to Vertex AI Feature Store. If you need more granularity, Vertex AI Feature Store offers several predefined IAM roles. These roles grant different sets of permissions based on the following personas:
IT operations and DevOps
DevOps and IT operations teams oversee Google Cloud resources and are responsible for creating featurestores and optimizing their performance. They can use the featurestoreAdmin or featurestoreInstanceCreator roles. The instance creator role can manage featurestores but cannot view or write data to them.
Data scientists and data engineers
Data engineers and data scientists create features and related metadata in featurestores. The featurestoreDataWriter role can read and write feature values, while the featurestoreResourceEditor role can manage entity types and features.
ML researchers and business analysts
To train models or make predictions, ML researchers and business analysts search for features and export feature values; they don't need to create new features or write data. They can use the featurestoreResourceViewer role to browse or search for features and the featurestoreDataViewer role to read feature values.
Manage featurestores
This section covers creating, listing, describing, updating, and deleting featurestores. A featurestore is the top-level container for entity types, features, and feature values.
Online and offline storage
Vertex AI Feature Store uses online storage and offline storage, which are priced differently. Every featurestore has offline storage, and online storage is optional.
Online storage retains the latest feature values (those with the most recent timestamps) so that online serving requests can be handled efficiently. When you run an ingestion job through the API, you can control whether the data is written to the online store. Skipping the online store avoids putting load on the online serving nodes. For example, when you run backfill jobs, you can disable writes to the online store and write only to the offline store.
Vertex AI Feature Store keeps data in offline storage until it reaches the retention limit or until you delete it. You can manage how much data you store to control offline storage costs, and you can see how much offline and online storage you currently use in the console.
Online serving nodes
Online serving nodes provide the compute resources needed to store and serve feature values for low-latency online serving. These nodes are always active, even when they aren't serving data, and you are billed for each node hour. If you don't need online serving, create or update your featurestores to have zero nodes so you aren't charged for online serving nodes. The number of online serving nodes you need is directly related to the following two factors:
The rate at which the featurestore receives online serving requests (in queries per second)
The number of ingestion jobs that write to online storage
Both factors affect how well the nodes perform and how much CPU they use. From the console, you can view the following metrics:
Queries per second - the rate at which your featurestore receives requests
Node count - the number of online serving nodes
CPU utilization - the CPU utilization of your nodes
Consider expanding the number of online serving nodes for your featurestore if CPU use is persistently high.
Scaling Options
To set the number of online serving nodes, you can choose between the following two options:
Autoscaling (preview)
Allocating a fixed node count
With autoscaling, the featurestore adjusts the number of nodes based on CPU utilization, adding nodes as traffic increases and removing nodes as traffic falls. Autoscaling analyzes traffic patterns to maintain performance while saving costs. It works well for traffic patterns that grow and shrink gradually, but it is less effective for traffic with frequent spikes.
With a fixed node count, the number of nodes stays constant regardless of traffic patterns. As long as there are enough nodes to handle the load, the nodes operate efficiently, and the constant node count keeps costs predictable. You can manually change the fixed node count to adapt to changes in traffic.
Additional Considerations
Whichever scaling approach you choose, keep the following points in mind:
After you add online serving nodes, the online store needs time to rebalance the data. It can take up to 20 minutes before you see a noticeable performance improvement under load, so adding nodes might not help with brief traffic spikes. This limitation applies to both manual scaling and autoscaling.
If you increase the number of online serving nodes from 0 to 1 or higher, Vertex AI Feature Store doesn't transfer any data to the online store; online serving queries return empty responses as if no data had been ingested. To populate the online store, ingest your data again, for example by exporting your existing data and re-ingesting it. When you provision online serving nodes, wait for the long-running operation to finish before ingesting data; ingestion jobs that are already in progress don't write to the online store while nodes are being provisioned.
If you reduce the number of online serving nodes from 1 or higher to 0, the data in the online store is lost and cannot be recovered; the entire online store is deleted when there are no online serving nodes. For example, you can't temporarily disable your online store and then re-enable it with its data intact. Reducing the node count to 0 does not affect the offline store.
Online serving requests to a featurestore that has no online serving nodes return an error.
Create a featurestore
Create a featurestore resource to contain entity types and features. Your featurestore must be in the same location as your source data. For example, if your featurestore is in us-central1, you can ingest data from Cloud Storage buckets in us-central1 or in the US multi-region location; dual-region buckets are not supported. Similarly, you can ingest data from BigQuery tables located in us-central1 or in the US multi-region location.
To create a featurestore, send a POST request using the featurestores.create method. The following example creates a featurestore with a fixed node count of 1. The node count specifies the number of online serving nodes, which determines how many online serving requests the featurestore can handle; if there aren't enough nodes for the volume of incoming requests, latency can increase.
Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is created, for example, us-central1.
PROJECT: The project ID.
FEATURESTORE_ID: The featurestore's ID.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores?featurestoreId=FEATURESTORE_ID
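A sketch of the request body and the curl call for this example. The onlineServingConfig.fixedNodeCount field name follows the public v1 API as the author understands it; adjust the node count to your needs:

Request JSON body:
{
  "onlineServingConfig": {
    "fixedNodeCount": 1
  }
}

Save the body in a file named request.json and run:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores?featurestoreId=FEATURESTORE_ID"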
To create a featurestore that uses a customer-managed encryption key (CMEK), first use Cloud Key Management Service to configure a key and set up permissions, if you haven't already. The following sample creates a featurestore that uses a CMEK key. If Vertex AI loses access to the key, all resources and values in featurestores encrypted with that key become inaccessible until Vertex AI is granted access to the key again.
If Vertex AI still doesn't have access to the CMEK key after 30 days, it deletes all featurestores encrypted with that key. You cannot reuse those featurestore names when creating new featurestores.
REST & CMD LINE
Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is created, for example, us-central1.
PROJECT: The ID for your project.
FEATURESTORE_ID: The featurestore's ID.
CMEK_PROJECT: The project ID or project number that contains your CMEK.
KEY_RING: The name of the Cloud Key Management Service key ring that contains your encryption key.
KEY_NAME: The name of the encryption key to use.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores?featurestoreId=FEATURESTORE_ID
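The request body for the CMEK sample might look like the following. The encryptionSpec.kmsKeyName field is assumed from the public v1 API, and the key name uses the standard Cloud KMS resource name format:

{
  "onlineServingConfig": { "fixedNodeCount": 1 },
  "encryptionSpec": {
    "kmsKeyName": "projects/CMEK_PROJECT/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY_NAME"
  }
}

Send it with curl as in the previous example.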
To list the featurestores for a specific region in your project, send a GET request using the featurestores.list method. Make the following substitutions before utilizing any request data:
LOCATION: The area where the featurestore is situated, for example, us-central1.
PROJECT: Your unique project ID.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores
Get information about a featurestore, such as its name and online serving configuration. You can also view Cloud Monitoring metrics for featurestores from the console. To get details about a particular featurestore, send a GET request using the featurestores.get method. Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is situated, for example, us-central1.
PROJECT: Your unique project ID.
FEATURESTORE_ID: The featurestore's ID.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID
Update a featurestore
You might update a featurestore, for instance, to change the number of online serving nodes or to change its labels. To update a featurestore, send a PATCH request using the featurestores.patch method. The following sample sets the featurestore to two online serving nodes; other settings are left unchanged.
Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is situated, for example, us-central1.
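A hedged sketch of such a PATCH request. The updateMask query parameter and the onlineServingConfig.fixedNodeCount field path are assumptions based on the public v1 API; the other placeholders are the same as in the previous examples:

PATCH https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID?updateMask=onlineServingConfig.fixedNodeCount

{
  "onlineServingConfig": {
    "fixedNodeCount": 2
  }
}

Save the body as request.json and send it with curl -X PATCH as in the earlier examples.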
Delete a featurestore
If the featurestore contains existing entity types and features, enable the force query parameter to delete the featurestore and all of its contents. To delete a featurestore and all of its contents, send a DELETE request using the featurestores.delete method. Make the following substitutions before utilizing any request data:
LOCATION: The area where the featurestore is situated, for example, us-central1.
PROJECT: The ID for your project.
FEATURESTORE_ID: The featurestore's ID.
BOOLEAN: Whether to delete the featurestore even if it contains entity types and features. The force query parameter is optional and is false by default.
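For example, the delete request can be sent with curl along these lines (set force=true to also delete nested entity types and features):

curl -X DELETE \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID?force=BOOLEAN"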
This section explains how to create, list, and delete entity types.
Create an entity type
Create an entity type so that you can create features for it.
Web UI
Navigate to the Features page in the Vertex AI part of the Google Cloud console.
Go to the Features page.
Click Create entity type on the action bar to access the Create entity type window.
From the Region drop-down list, select the region that contains the featurestore where you want to create the entity type.
Select a featurestore.
Enter a name for the entity type.
Optionally, enter a description for the entity type.
To enable feature value monitoring (Preview), set monitoring to Enabled and specify the snapshot interval in days. This monitoring configuration applies to all features under this entity type.
Click Create.
REST & CMD LINE
To create an entity type, send a POST request using the featurestores.entityTypes.create method. Make the following substitutions before utilizing any request data:
LOCATION: The area where the featurestore is situated, for example, us-central1.
PROJECT: The ID for your project.
FEATURESTORE_ID: The featurestore's ID.
ENTITY_TYPE_ID: The entity type's ID.
DESCRIPTION: A description of the entity type.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes?entityTypeId=ENTITY_TYPE_ID
Request JSON body:
{
"description": "DESCRIPTION"
}
To send your request, save the request body in a file named request.json and run the following command:
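A sketch of the curl call, using the standard pattern for authenticated Google Cloud REST requests (substitute the placeholders as above):

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes?entityTypeId=ENTITY_TYPE_ID"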
List every entity type that exists in a featurestore.
Web UI
Navigate to the Features page in the Vertex AI part of the Google Cloud console.
Go to the Features page
From the Region drop-down menu, choose a region.
View the Entity type column in the features table to see the entity types in your project for the chosen region.
REST & CMD LINE
To list entity types, send a GET request using the featurestores.entityTypes.list method. Make the following substitutions before utilizing any request data:
LOCATION: The area where the featurestore is situated, for example, us-central1.
PROJECT: The ID for your project.
FEATURESTORE_ID: The featurestore's ID.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes
Delete an entity type. If you use the Google Cloud console, the entity type and its contents are deleted. If you use the API, use the force query parameter to delete the entity type and its contents.
Web UI
Navigate to the Features page in the Vertex AI section of the Google Cloud console.
Go to the Features page.
From the Region drop-down menu, choose a region.
Find the entity type to delete by looking at the Entity type column in the features table.
To select an entity type, click its name.
Click Delete in the action bar.
To delete the entity type, click Confirm.
REST & CMD LINE
To delete an entity type, send a DELETE request using the featurestores.entityTypes.delete method. Make the following substitutions before utilizing any request data:
LOCATION: The area where the featurestore is situated, for example, us-central1.
PROJECT: The ID for your project.
FEATURESTORE_ID: The featurestore's ID.
ENTITY_TYPE_ID: The entity type's ID.
BOOLEAN: Whether to delete the entity type even if it contains features. The force query parameter is optional and is false by default.
This section explains how to create and manage features.
Create a feature
For an existing entity type, create a single feature. Send a POST request using the featurestores.entityTypes.features.create method to add a feature to an existing entity type. Make the following substitutions before utilizing any request data:
LOCATION: The area where the featurestore is situated, for example, us-central1.
PROJECT: The ID for your project.
FEATURESTORE_ID: The featurestore's ID.
ENTITY_TYPE_ID: The entity type's ID.
FEATURE_ID: The feature's ID.
DESCRIPTION: Feature description.
VALUE_TYPE: The feature's value type.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID?featureId=FEATURE_ID
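A sketch of the request body for this call. The description and valueType fields belong to the Feature resource in the public v1 API; the values shown are placeholders:

{
  "description": "DESCRIPTION",
  "valueType": "VALUE_TYPE"
}

For a numeric feature such as average_rating, for example, you would set valueType to DOUBLE. Save the body as request.json and send it with curl as in the earlier examples.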
Create features in bulk for an existing entity type. For batch creation requests, Vertex AI Feature Store creates several features at once, which is quicker than calling the featurestores.entityTypes.features.create method once per feature.
Send a POST request using the featurestores.entityTypes.features.batchCreate method, as seen in the accompanying sample, to generate one or more features for an existing entity type. Make the following substitutions before utilizing any request data:
LOCATION: The area where the featurestore is situated, for example, us-central1.
PROJECT: The ID for your project.
FEATURESTORE_ID: The featurestore's ID.
ENTITY_TYPE_ID: The entity type's ID.
PARENT: The entity type's resource name under which the features should be created. Required format: projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID
FEATURE_ID: An ID for the feature.
DESCRIPTION: Feature description.
VALUE_TYPE: The feature's value type.
DURATION: (Optional) The number of seconds between each snapshot. The value must end with 's', for example, 86400s.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/features:batchCreate
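A hedged sketch of a batch request body that creates two features. The requests, featureId, and feature fields are assumptions based on the public v1 batchCreate API, and the feature IDs and descriptions are placeholders:

{
  "requests": [
    {
      "parent": "PARENT",
      "featureId": "average_rating",
      "feature": { "valueType": "DOUBLE", "description": "Average user rating" }
    },
    {
      "parent": "PARENT",
      "featureId": "title",
      "feature": { "valueType": "STRING", "description": "Movie title" }
    }
  ]
}

Save the body as request.json and send it with curl as in the earlier examples.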
List every feature at a particular location. To search across all entity types and featurestores in a specific region, see Searching for features. To list all features for a specific entity type, send a GET request using the featurestores.entityTypes.features.list method.
Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is situated, for example, us-central1.
PROJECT: Your unique project ID.
FEATURESTORE_ID: The featurestore's ID.
ENTITY_TYPE_ID: The entity type's ID.
The HTTP method and URL for this request, and a curl call that sends it, are sketched below.
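The list call is a plain GET on the features collection; the URL follows the same pattern as the other endpoints in this article:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/features

curl \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/features"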
You can search for features based on one or more of their properties, such as the feature ID, entity type ID, or feature description. Vertex AI Feature Store searches across all entity types and featurestores in a given region. You can also restrict the results by filtering on specific featurestores, value types, and labels.
To search for features, send a GET request using the featurestores.searchFeatures method. The following example uses the search parameters featureId:test AND valueType=STRING, which return features whose IDs contain "test" and whose values are of type STRING.
Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is situated, for example, us-central1.
PROJECT: Your unique project ID.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores:searchFeatures?query="featureId:test%20AND%20valueType=STRING"
Send the request using curl or the HTTP client of your choice, as in the earlier examples.
You can also get details about a single feature, such as its value type and description. To view a feature's details, send a GET request using the featurestores.entityTypes.features.get method.
Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is situated, for example, us-central1.
PROJECT: Your unique project ID.
FEATURESTORE_ID: The featurestore's ID.
ENTITY_TYPE_ID: The entity type's ID.
FEATURE_ID: The feature's ID.
HTTP method and URL:
GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/features/FEATURE_ID
Send the request using curl or the HTTP client of your choice, as in the earlier examples.
Delete a feature and all of its values. To delete a feature, send a DELETE request using the featurestores.entityTypes.features.delete method. Make the following substitutions before utilizing any request data:
LOCATION: The area where the featurestore is situated, for example, us-central1.
Monitoring
You can monitor featurestores and features in Vertex AI Feature Store and create alerts on them. For example, an operations team might monitor a featurestore to track its CPU usage, while data scientists and other feature owners monitor feature values to spot drift over time.
The following sections describe the featurestore and feature monitoring options:
Featurestore monitoring
Feature value monitoring
Featurestore monitoring
Vertex AI Feature Store sends metrics, such as CPU load, storage capacity, and request latencies, about your featurestore to Cloud Monitoring. These metrics are gathered and reported for you by Vertex AI. No configuration or featurestore monitoring enablement is required.
Use Cloud Monitoring to set thresholds and notifications. For instance, you can configure an alert for when the average CPU load exceeds 70%, which may need adding more featurestore nodes.
You may also view featurestore metrics in the Vertex AI console to examine trends over time. The console displays aggregated or computed numbers for various visualizations to make the information easier to understand. The raw data is always accessible in Cloud Monitoring.
Feature value monitoring
A feature's value distribution in a featurestore can be monitored to see how much it changes over time. The following are two varieties of feature value monitoring that are supported:
Snapshot analysis: Vertex AI Feature Store periodically takes snapshots of your feature values. As you ingest more data, the distribution of your feature values may shift over time; such a shift suggests that models using those features may need retraining. You can set a threshold so that an anomaly is recorded in Cloud Logging whenever the distribution deviation exceeds the threshold.
Import feature analysis: each ImportFeatureValues operation generates distribution statistics for the values ingested into Vertex AI Feature Store. You can choose to detect anomalies by comparing these statistics with the distribution of previously imported feature values or, if snapshot analysis is enabled, with the snapshot distribution.
Consider a feature that captures recent home sale prices, which feed a model that forecasts home prices. A batch of imported values may deviate considerably from the training data, or the prices of recently sold homes may shift significantly over time. Vertex AI Feature Store alerts you to this change so you can retrain your model on the most recent data.
Set a monitoring configuration
To begin monitoring, define a monitoring configuration on the entity type; it enables monitoring for all features of the following types:
BOOL
STRING
DOUBLE
INT64
When you create or modify an entity type, you can specify its monitoring configuration. You can also opt out of monitoring an individual feature by using the disableMonitoring property. The entity type monitoring configuration includes the following settings:
Whether monitoring is enabled. Monitoring is disabled by default.
Thresholds for identifying anomalies. The default threshold is 0.3.
The interval between snapshots and the lookback window (for snapshot analysis). The default value is 21.
Whether import feature analysis is enabled. It is disabled by default.
Web UI
From the UI, only snapshot analysis is supported.
Navigate to the Features page in the Vertex AI part of the Google Cloud console.
From the Region drop-down menu, choose a region.
Select Create entity type.
In the Feature monitoring section, toggle Enabled.
In the Monitoring time interval field, enter the number of days between snapshots.
In the Monitoring lookback window field, specify how many days to look back for each snapshot.
In the Numerical alerting threshold field, enter the threshold used to detect anomalies for numerical features.
In the Categorical alerting threshold field, enter the threshold used to detect anomalies for categorical features in this entity type.
Click Create.
Similarly, you can select an existing entity type in the features table and click to add features. If monitoring is enabled for the parent entity type, you can turn it off for individual features.
REST & CMD LINE
To create an entity type, send a POST request using the featurestores.entityTypes.create method. Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is situated, for example, us-central1.
PROJECT: Your project ID.
FEATURESTORE_ID: The featurestore's ID.
ENTITY_TYPE_ID: The entity type's ID.
DURATION: The number of days between each snapshot.
STALENESS_DAYS: The number of days to look back for each snapshot (the monitoring lookback window).
NUMERICAL_THRESHOLD_VALUE: The threshold for detecting anomalies in numerical features for this entity type. Statistical deviation is calculated using the Jensen-Shannon divergence.
CATEGORICAL_THRESHOLD_VALUE: The threshold for detecting anomalies in categorical features for this entity type. Statistical deviation is calculated using the L-infinity distance.
IMPORT_FEATURE_ANALYSIS_STATE: The status indicating whether to enable import feature analysis.
IMPORT_FEATURE_ANALYSIS_BASELINE: The import feature analysis baseline, if enabled.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT/locations/LOCATION/featurestores/FEATURESTORE_ID/entityTypes?entityTypeId=ENTITY_TYPE_ID
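A hedged sketch of the request body, mapping the placeholders above onto the entity type's monitoringConfig. The field names follow the v1beta1 API as the author understands them and should be treated as assumptions:

{
  "monitoringConfig": {
    "snapshotAnalysis": {
      "monitoringIntervalDays": DURATION,
      "stalenessDays": STALENESS_DAYS
    },
    "numericalThresholdConfig": { "value": NUMERICAL_THRESHOLD_VALUE },
    "categoricalThresholdConfig": { "value": CATEGORICAL_THRESHOLD_VALUE },
    "importFeaturesAnalysis": {
      "state": "IMPORT_FEATURE_ANALYSIS_STATE",
      "anomalyDetectionBaseline": "IMPORT_FEATURE_ANALYSIS_BASELINE"
    }
  }
}

Save the body as request.json and send it with curl as in the earlier examples.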
To update an entity type, send a PATCH request using the featurestores.entityTypes.patch method. Make the following substitutions before utilizing any request data:
LOCATION: The region where the featurestore is situated, for example, us-central1.
PROJECT: Your project ID.
FEATURESTORE_ID: The featurestore's ID.
ENTITY_TYPE_ID: The entity type's ID.
DURATION_IN_DAYS: The number of days between each snapshot.
STALENESS_DAYS: The number of days to look back for each snapshot (the monitoring lookback window).
NUMERICAL_THRESHOLD_VALUE: The threshold for detecting anomalies in numerical features for this entity type. Statistical deviation is calculated using the Jensen-Shannon divergence.
CATEGORICAL_THRESHOLD_VALUE: The threshold for detecting anomalies in categorical features for this entity type. Statistical deviation is calculated using the L-infinity distance.
IMPORT_FEATURE_ANALYSIS_STATE: The status indicating whether to enable import feature analysis.
IMPORT_FEATURE_ANALYSIS_BASELINE: The import feature analysis baseline, if enabled.
View the distribution of feature values over time using the console.
Navigate to the Features page in the Vertex AI section of the Google Cloud console.
Go to the Features page.
From the Region drop-down menu, choose a region.
Find the feature you wish to view details about by looking through the Features column in the features table.
The feature's status shows whether monitoring is enabled. All other feature monitoring settings are inherited from the parent entity type.
To view the feature value distribution metrics, click Metrics.
View feature value anomalies
When the distribution of a feature deviates beyond the threshold, an anomaly is detected and, for each monitoring pipeline, an anomaly log entry is written to Cloud Logging. The logs can be synced to any downstream service that Cloud Logging supports, such as Pub/Sub. The anomalies are recorded in the featurestore log in Cloud Logging.
Frequently Asked Questions
What is a feature store?
A feature store is a tool for storing commonly used features. As data scientists create new features for their machine learning models, they can add them to the feature store, which allows those features to be reused.
What distinguishes a feature store from a database?
A feature store is a repository for machine learning features. Unlike a data warehouse, it is a dual database, with one store providing features to online applications at minimal latency and the other storing massive volumes of features.
Why is an online feature store necessary?
A feature store is a crucial tool for data scientists in this process. With feature stores, data scientists can streamline how features are maintained, enabling more efficient workflows while ensuring that features are properly stored, documented, and tested.
Conclusion
In this article, we have extensively discussed Vertex AI Feature Store. We have also explained the benefits of Vertex AI Feature Store, its data model and resources, source data requirements, setup steps, monitoring, and more in detail.