Introduction
AI Platform Training in Google Cloud supports several machine learning frameworks, such as TensorFlow, scikit-learn, and XGBoost, and offers various options for configuring your development environment. After training, you can deploy your trained model to AI Platform Prediction. Before a training application can run on AI Platform Training, its code must be uploaded to the Cloud. Packaging the training application, then running and monitoring it, constitute the training workflow in Google Cloud.
Packaging an Application
To train an application with AI Platform Training, you must upload your code and any dependencies into a Cloud Storage bucket that your Google Cloud project can access. Let's look at the different ways to package an application and move it to the Cloud.
Using gcloud for Packaging
The simplest way to package a training application and upload it along with its dependencies is a single command in the Google Cloud CLI: gcloud ai-platform jobs submit training.
It's helpful to define the configuration values as shell variables in the CLI:
PACKAGEPATH='LOCAL_PACKAGE_PATH'
MODULENAME='MODULE_NAME'
STAGINGBUCKET='BUCKET_NAME'
JOBNAME='JOB_NAME'
JOBDIR='JOB_OUTPUT_PATH'
REGION='REGION'
gcloud ai-platform jobs submit training $JOBNAME \
--staging-bucket=$STAGINGBUCKET \
--job-dir=$JOBDIR \
--package-path=$PACKAGEPATH \
--module-name=$MODULENAME \
--region=$REGION \
-- \
--user_first_arg=first_arg_value \
--user_second_arg=second_arg_value
- PACKAGE_PATH is the path to the package's directory in the local environment.
- MODULE_NAME is the full name of the training module.
- BUCKET_NAME is the name of a Cloud Storage bucket.
- JOB_NAME is a name for the training job.
- JOB_OUTPUT_PATH is the URI of a Cloud Storage directory where the training job will save its output.
- REGION is the region where the training job should run.
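For instance, the variables might be filled in as follows (the bucket, module, and job names here are purely hypothetical):
PACKAGEPATH='./trainer'
MODULENAME='trainer.task'
STAGINGBUCKET='gs://my-staging-bucket'
JOBNAME='my_training_job_001'
JOBDIR='gs://my-staging-bucket/job-output'
REGION='us-central1'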
Working with Dependencies
Dependencies are the packages that the code imports; an application may have any number of them. A training application runs on training instances that come with many Python packages preinstalled, so a user only needs to add what is missing. There are two types of dependencies to add:
- Standard dependencies - common Python packages available on PyPI.
- Custom packages - packages developed by the user or internal to an organisation.
Standard dependencies can be listed in the setup.py script in the training application's root directory; pip then installs them on the training instances:
from setuptools import find_packages
from setuptools import setup
# PyPI dependencies to install on the training instances.
REQUIRED_PACKAGES = ['some_PyPI_package>=1.0']
setup(
name='trainer_gcp',
version='0.2',
install_requires=REQUIRED_PACKAGES,
packages=find_packages(),
include_package_data=True,
description='The training application packages.')
Run the following command to execute the setup.py script and build a source distribution of the application:
python setup.py sdist
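Assuming the setup.py above, the build writes a source archive under the dist/ directory, named from the package's name and version fields:
ls dist/
# trainer_gcp-0.2.tar.gz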
There are also options to specify custom dependencies for the training application. Users can pass the URI of each dependency package as part of the job configuration; in that case, all custom dependencies have to be stored in Cloud Storage.
In the gcloud CLI, users can specify dependencies from both their local machine and Cloud Storage as part of the gcloud ai-platform jobs submit training command: the --packages flag accepts the dependencies as a comma-separated list.
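As a sketch (the dependency file names here are hypothetical), a submission that combines a locally built package with custom dependencies might look like this:
gcloud ai-platform jobs submit training $JOBNAME \
--staging-bucket=$STAGINGBUCKET \
--package-path=$PACKAGEPATH \
--module-name=$MODULENAME \
--packages dep1.tar.gz,dep2.whl \
--region=$REGION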
Uploading Existing Packages
Users can also upload previously built packages with the Cloud CLI, either from the local system or from Cloud Storage. In the gcloud ai-platform jobs submit training command:
- Set the --packages flag to the path of the packaged application.
- Set the --module-name flag to the name of the application's main module, using your package's namespace dot notation.
gcloud ai-platform jobs submit training $JOBNAME \
--staging-bucket=$STAGINGBUCKET \
--job-dir=$JOBDIR \
--packages=trainer-0.0.1.tar.gz \
--module-name=$MODULENAME \
--region=us-central1 \
-- \
--user_first_arg=first_arg_value \
--user_second_arg=second_arg_value
Packages can also be uploaded manually by using the gsutil tool:
gsutil cp /local/path/to/package1.tar.gz gs://bucket/path/
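Once a package is in Cloud Storage, its gs:// URI can be passed directly to --packages (the bucket and path here are the hypothetical ones from the gsutil example above):
gcloud ai-platform jobs submit training $JOBNAME \
--staging-bucket=$STAGINGBUCKET \
--packages=gs://bucket/path/package1.tar.gz \
--module-name=$MODULENAME \
--region=$REGION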