{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ur8xi4C7S06n"
},
"outputs": [],
"source": [
"# Copyright 2021 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JAPoU8Sm5E6e"
},
"source": [
"# Vertex AI: Track parameters and metrics for custom training jobs\n",
"\n",
"<table align=\"left\">\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-custom-jobs.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fml_metadata%2Fsdk-metric-parameter-tracking-for-custom-jobs.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-custom-jobs.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
"<a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-custom-jobs.ipynb\" target='_blank'>\n",
" <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
" </a>\n",
" </td>\n",
"</table>\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2e0464050974"
},
"source": [
"## Overview\n",
"\n",
"This notebook demonstrates how to track metrics and parameters for Vertex AI custom training jobs, and how to perform detailed analysis using this data.\n",
"\n",
"Learn more about [Vertex ML Metadata](https://cloud.google.com/vertex-ai/docs/ml-metadata),\n",
"[Custom training](https://cloud.google.com/vertex-ai/docs/training/custom-training), and \n",
"[Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b95ab729fccd"
},
"source": [
"### Objective\n",
"\n",
"In this notebook, you learn how to use Vertex AI SDK for Python to:\n",
"\n",
"This tutorial uses the following Google Cloud ML services and resources:\n",
"- Vertex AI Dataset\n",
"- Vertex AI Model\n",
"- Vertex AI Endpoint\n",
"- Vertex AI Custom Training Job\n",
"- Vertex AI Experiments\n",
"\n",
"The steps performed include:\n",
"- Track training parameters and prediction metrics for a custom training job.\n",
"- Extract and perform analysis for all parameters and metrics within an Experiment."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9fd87cf689bf"
},
"source": [
"### Dataset\n",
"\n",
"This example uses the Abalone Dataset. For more information about this dataset please visit: https://archive.ics.uci.edu/ml/datasets/abalone"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tvgnzT1CKxrO"
},
"source": [
"### Costs \n",
"\n",
"\n",
"This tutorial uses billable components of Google Cloud:\n",
"\n",
"* Vertex AI\n",
"* Cloud Storage\n",
"\n",
"Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and \n",
"[Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the \n",
"[Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i7EUnXsZhAGF"
},
"source": [
"### Get Started\n",
"Install Vertex AI SDK for Python and other required packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qblyW_dcyOQA"
},
"outputs": [],
"source": [
"! pip3 install --upgrade tensorflow \\\n",
" google-cloud-aiplatform \\\n",
" scikit-learn -q \\\n",
" pandas"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5eec42e37bcf"
},
"source": [
"### Restart runtime (Colab only)\n",
"To use the newly installed packages, you must restart the runtime on Google Colab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dcc98768955f"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
"\n",
" import IPython\n",
"\n",
" app = IPython.Application.instance()\n",
" app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4de1bd77992b"
},
"source": [
"<div class=\"alert alert-block alert-warning\">,\n",
"<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>,\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "56e219dbcb9a"
},
"source": [
"### Authenticate your notebook environment (Colab only)\n",
"Authenticate your environment on Google Colab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c97be6a73155"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
"\n",
" from google.colab import auth\n",
"\n",
" auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "442da99b7efa"
},
"source": [
"### Set Google Cloud project information\n",
"Learn more about [setting up a project and a development environment.](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cde8e0876d62"
},
"outputs": [],
"source": [
"PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n",
"LOCATION = \"us-central1\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "zgPO1eR3CYjk"
},
"source": [
"### Create a Cloud Storage bucket\n",
"\n",
"Create a storage bucket to store intermediate artifacts such as datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MzGDU7TWdts_"
},
"outputs": [],
"source": [
"BUCKET_URI = f\"gs://your-bucket-name-{PROJECT_ID}-unique\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-EcIXiGsCePi"
},
"source": [
"**If your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NIq7R4HZCfIc"
},
"outputs": [],
"source": [
"! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XoEqT2Y4DJmf"
},
"source": [
"### Import libraries and define constants"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y9Uo3tifg1kx"
},
"source": [
"Import required libraries.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pRUOFELefqf1"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"from google.cloud import aiplatform\n",
"from sklearn.metrics import mean_absolute_error, mean_squared_error\n",
"from tensorflow.python.keras.utils import data_utils"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "O8XJZB3gR8eL"
},
"source": [
"## Initialize Vertex AI and set an experiment\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xtXZWmYqJ1bh"
},
"source": [
"Define experiment name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JIOrI-hoJ46P"
},
"outputs": [],
"source": [
"EXPERIMENT_NAME = \"my-experiment-unique\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DKIsYVjj56_X"
},
"source": [
"Initialize the *client* for Vertex AI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Wrlk2B2nJ7-X"
},
"outputs": [],
"source": [
"aiplatform.init(\n",
" project=PROJECT_ID,\n",
" location=LOCATION,\n",
" staging_bucket=BUCKET_URI,\n",
" experiment=EXPERIMENT_NAME,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6PlilQPFeS_h"
},
"source": [
"### Tracking parameters and metrics in Vertex AI custom training jobs"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f8fd397cc4f6"
},
"source": [
"# Download the Dataset to Cloud Storage"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "V_T10yTTqcS_"
},
"outputs": [],
"source": [
"!wget https://storage.googleapis.com/download.tensorflow.org/data/abalone_train.csv\n",
"!gsutil cp abalone_train.csv {BUCKET_URI}/data/\n",
"\n",
"gcs_csv_path = f\"{BUCKET_URI}/data/abalone_train.csv\""
]
},
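{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, preview the first few rows of the downloaded training data. This is a minimal sketch that reads the local copy created by `wget` above and uses the same column names as the training script below."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Preview the local copy of the training data (the CSV has no header row).\n",
"preview_col_names = [\n",
"    \"Length\",\n",
"    \"Diameter\",\n",
"    \"Height\",\n",
"    \"Whole weight\",\n",
"    \"Shucked weight\",\n",
"    \"Viscera weight\",\n",
"    \"Shell weight\",\n",
"    \"Age\",\n",
"]\n",
"pd.read_csv(\"abalone_train.csv\", names=preview_col_names).head()"
]
},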
{
"cell_type": "markdown",
"metadata": {
"id": "35QVNhACqcTJ"
},
"source": [
"### Create a Vertex AI Tabular dataset from CSV data\n",
"\n",
"A Vertex AI dataset can be used to create an AutoML model or a custom model. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4OfCqaYRqcTJ"
},
"outputs": [],
"source": [
"ds = aiplatform.TabularDataset.create(display_name=\"abalone\", gcs_source=[gcs_csv_path])\n",
"\n",
"ds.resource_name"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VcEOYYolqcTN"
},
"source": [
"### Write the training script\n",
"\n",
"Next, you create the training script that is used in the sample custom training job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OauJqJmJqcTO"
},
"outputs": [],
"source": [
"%%writefile training_script.py\n",
"\n",
"import pandas as pd\n",
"import argparse\n",
"import os\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"from tensorflow.keras import layers\n",
"\n",
"parser = argparse.ArgumentParser()\n",
"parser.add_argument('--epochs', dest='epochs',\n",
" default=10, type=int,\n",
" help='Number of epochs.')\n",
"parser.add_argument('--num_units', dest='num_units',\n",
" default=64, type=int,\n",
" help='Number of unit for first layer.')\n",
"args = parser.parse_args()\n",
"\n",
"col_names = [\"Length\", \"Diameter\", \"Height\", \"Whole weight\", \"Shucked weight\", \"Viscera weight\", \"Shell weight\", \"Age\"]\n",
"target = \"Age\"\n",
"\n",
"def aip_data_to_dataframe(wild_card_path):\n",
" return pd.concat([pd.read_csv(fp.numpy().decode(), names=col_names)\n",
" for fp in tf.data.Dataset.list_files([wild_card_path])])\n",
"\n",
"def get_features_and_labels(df):\n",
" return df.drop(target, axis=1).values, df[target].values\n",
"\n",
"def data_prep(wild_card_path):\n",
" return get_features_and_labels(aip_data_to_dataframe(wild_card_path))\n",
"\n",
"\n",
"model = tf.keras.Sequential([layers.Dense(args.num_units), layers.Dense(1)])\n",
"model.compile(loss='mse', optimizer='adam')\n",
"\n",
"model.fit(*data_prep(os.environ[\"AIP_TRAINING_DATA_URI\"]),\n",
" epochs=args.epochs ,\n",
" validation_data=data_prep(os.environ[\"AIP_VALIDATION_DATA_URI\"]))\n",
"print(model.evaluate(*data_prep(os.environ[\"AIP_TEST_DATA_URI\"])))\n",
"\n",
"# save as Vertex AI Managed model\n",
"tf.saved_model.save(model, os.environ[\"AIP_MODEL_DIR\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Yp2clkOJSDhR"
},
"source": [
"### Launch a custom training job and track its trainig parameters on Vertex ML Metadata"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "btb6d48lqcTT"
},
"outputs": [],
"source": [
"job = aiplatform.CustomTrainingJob(\n",
" display_name=\"train-abalone-dist-1-replica\",\n",
" script_path=\"training_script.py\",\n",
" container_uri=\"us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest\",\n",
" requirements=[\"gcsfs==0.7.1\"],\n",
" model_serving_container_image_uri=\"us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "k_QorXXztzPH"
},
"source": [
"Start a new experiment run to track training parameters and start the training job. Note that this operation takes around 10 minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "oVTORjQpJ7-Y"
},
"outputs": [],
"source": [
"aiplatform.start_run(\n",
" \"custom-training-run-unique\"\n",
") # Change this to your desired run name\n",
"parameters = {\"epochs\": 10, \"num_units\": 64}\n",
"aiplatform.log_params(parameters)\n",
"\n",
"model = job.run(\n",
" ds,\n",
" replica_count=1,\n",
" model_display_name=\"abalone-model\",\n",
" args=[f\"--epochs={parameters['epochs']}\", f\"--num_units={parameters['num_units']}\"],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5vhDsMJNqcTW"
},
"source": [
"### Deploy model and calculate prediction metrics"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "O-uCOL3Naap4"
},
"source": [
"Next, deploy your Vertex AI Model resource to a Vertex AI endpoint resource. This operation takes 10-20 minutes."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Y9GH72wWqcTX"
},
"outputs": [],
"source": [
"endpoint = model.deploy(machine_type=\"n1-standard-4\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JY-5skFhasWs"
},
"source": [
"### Prediction dataset preparation and online prediction"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "saw50bqwa-dR"
},
"source": [
"Once model is deployed, perform online prediction using the `abalone_test` dataset and calculate prediction metrics.\n",
"\n",
"Prepare the prediction dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ABZQmqsWISQv"
},
"outputs": [],
"source": [
"def read_data(uri):\n",
" dataset_path = data_utils.get_file(\"abalone_test.data\", uri)\n",
" col_names = [\n",
" \"Length\",\n",
" \"Diameter\",\n",
" \"Height\",\n",
" \"Whole weight\",\n",
" \"Shucked weight\",\n",
" \"Viscera weight\",\n",
" \"Shell weight\",\n",
" \"Age\",\n",
" ]\n",
" dataset = pd.read_csv(\n",
" dataset_path,\n",
" names=col_names,\n",
" na_values=\"?\",\n",
" comment=\"\\t\",\n",
" sep=\",\",\n",
" skipinitialspace=True,\n",
" )\n",
" return dataset\n",
"\n",
"\n",
"def get_features_and_labels(df):\n",
" target = \"Age\"\n",
" return df.drop(target, axis=1).values, df[target].values\n",
"\n",
"\n",
"test_dataset, test_labels = get_features_and_labels(\n",
" read_data(\n",
" \"https://storage.googleapis.com/download.tensorflow.org/data/abalone_test.csv\"\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_HphZ38obJeB"
},
"source": [
"Perform online prediction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "eXD-OvsrKmCt"
},
"outputs": [],
"source": [
"prediction = endpoint.predict(test_dataset.tolist())\n",
"prediction"
]
},
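{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, spot-check a few predictions against the ground-truth labels before computing aggregate metrics. The next cell is a minimal sketch that assumes each entry in `prediction.predictions` is a single-element list containing the predicted age."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: compare the first few predicted ages with the actual labels.\n",
"# Assumes each prediction is a single-element list, e.g. [9.87].\n",
"pd.DataFrame(\n",
"    {\n",
"        \"actual_age\": test_labels[:5],\n",
"        \"predicted_age\": [p[0] for p in prediction.predictions[:5]],\n",
"    }\n",
")"
]
},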
{
"cell_type": "markdown",
"metadata": {
"id": "TDKiv_O7bNwE"
},
"source": [
"Calculate and track prediction evaluation metrics."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cj0fHucbKopn"
},
"outputs": [],
"source": [
"mse = mean_squared_error(test_labels, prediction.predictions)\n",
"mae = mean_absolute_error(test_labels, prediction.predictions)\n",
"\n",
"aiplatform.log_metrics({\"mse\": mse, \"mae\": mae})"
]
},
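{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, mark the experiment run as complete now that all parameters and metrics are logged. This is a minimal sketch using `aiplatform.end_run()`; it assumes the run started earlier with `aiplatform.start_run()` is still active."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# End the active experiment run (optional). By default the run is marked as complete.\n",
"aiplatform.end_run()"
]
},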
{
"cell_type": "markdown",
"metadata": {
"id": "CCGmesdIbbHf"
},
"source": [
"### Extract all parameters and metrics created during this experiment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KlcEBou-Pl4Z"
},
"outputs": [],
"source": [
"aiplatform.get_experiment_df()"
]
},
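{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can analyze the returned DataFrame with standard pandas operations. The next cell is a minimal sketch that assumes logged values appear in columns prefixed with `param.` and `metric.` (for example, `metric.mse`); adjust the column names if your output differs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_df = aiplatform.get_experiment_df()\n",
"\n",
"# Keep the run name plus any parameter and metric columns.\n",
"# The \"run_name\", \"param.\" and \"metric.\" column names are assumptions about the output.\n",
"cols = [\n",
"    c\n",
"    for c in experiment_df.columns\n",
"    if c == \"run_name\" or c.startswith((\"param.\", \"metric.\"))\n",
"]\n",
"\n",
"# Sort runs by mean squared error when that metric was logged.\n",
"if \"metric.mse\" in experiment_df.columns:\n",
"    experiment_df = experiment_df.sort_values(\"metric.mse\")\n",
"\n",
"experiment_df[cols]"
]
},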
{
"cell_type": "markdown",
"metadata": {
"id": "WTHvPMweMlP1"
},
"source": [
"### View data in the Cloud Console"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "F19_5lw0MqXv"
},
"source": [
"Parameters and metrics can also be viewed in the Cloud Console. \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GmN9vE9pqqzt"
},
"outputs": [],
"source": [
"print(\"Vertex AI Experiments:\")\n",
"print(\n",
" f\"https://console.cloud.google.com/ai/platform/experiments/experiments?folder=&organizationId=&project={PROJECT_ID}\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TpV-iwP9qw9c"
},
"source": [
"## Cleaning up\n",
"\n",
"To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n",
"project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
"\n",
"Otherwise, you can delete the individual resources you created in this tutorial:\n",
"Training Job\n",
"Model\n",
"Cloud Storage Bucket\n",
"\n",
"* Vertex AI Dataset\n",
"* Training Job\n",
"* Model\n",
"* Endpoint\n",
"* Cloud Storage Bucket\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rwPZoZISHhaY"
},
"outputs": [],
"source": [
"# Warning: Setting this to true deletes everything in your bucket\n",
"delete_bucket = False\n",
"\n",
"# Delete dataset\n",
"ds.delete()\n",
"\n",
"# Delete experiment\n",
"experiment = aiplatform.Experiment(\n",
" experiment_name=EXPERIMENT_NAME, project=PROJECT_ID, location=LOCATION\n",
")\n",
"experiment.delete()\n",
"\n",
"# Delete the training job\n",
"job.delete()\n",
"\n",
"# Undeploy model from endpoint\n",
"endpoint.undeploy_all()\n",
"\n",
"# Delete the endpoint\n",
"endpoint.delete()\n",
"\n",
"# Delete the model\n",
"model.delete()\n",
"\n",
"\n",
"if delete_bucket:\n",
" ! gsutil -m rm -r $BUCKET_URI"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "sdk-metric-parameter-tracking-for-custom-jobs.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}