{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "ur8xi4C7S06n" }, "outputs": [], "source": [ "# Copyright 2021 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "JAPoU8Sm5E6e" }, "source": [ "# Vertex AI: Track parameters and metrics for custom training jobs\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-custom-jobs.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fml_metadata%2Fsdk-metric-parameter-tracking-for-custom-jobs.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a 
href=\"https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-custom-jobs.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", "<a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-custom-jobs.ipynb\" target='_blank'>\n", " <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n", " </a>\n", " </td>\n", "</table>\n", "<br/><br/><br/>" ] }, { "cell_type": "markdown", "metadata": { "id": "2e0464050974" }, "source": [ "## Overview\n", "\n", "This notebook demonstrates how to track metrics and parameters for Vertex AI custom training jobs, and how to perform detailed analysis using this data.\n", "\n", "Learn more about [Vertex ML Metadata](https://cloud.google.com/vertex-ai/docs/ml-metadata),\n", "[Custom training](https://cloud.google.com/vertex-ai/docs/training/custom-training), and \n", "[Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments)." 
] }, { "cell_type": "markdown", "metadata": { "id": "b95ab729fccd" }, "source": [ "### Objective\n", "\n", "In this notebook, you learn how to use the Vertex AI SDK for Python to track and analyze parameters and metrics for a custom training job.\n", "\n", "This tutorial uses the following Google Cloud ML services and resources:\n", "- Vertex AI Dataset\n", "- Vertex AI Model\n", "- Vertex AI Endpoint\n", "- Vertex AI Custom Training Job\n", "- Vertex AI Experiments\n", "\n", "The steps performed include:\n", "- Track training parameters and prediction metrics for a custom training job.\n", "- Extract and analyze all parameters and metrics within an experiment." ] }, { "cell_type": "markdown", "metadata": { "id": "9fd87cf689bf" }, "source": [ "### Dataset\n", "\n", "This example uses the Abalone Dataset. For more information about this dataset, see https://archive.ics.uci.edu/ml/datasets/abalone" ] }, { "cell_type": "markdown", "metadata": { "id": "tvgnzT1CKxrO" }, "source": [ "### Costs\n", "\n", "\n", "This tutorial uses billable components of Google Cloud:\n", "\n", "* Vertex AI\n", "* Cloud Storage\n", "\n", "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and \n", "[Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the \n", "[Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "id": "i7EUnXsZhAGF" }, "source": [ "### Get started\n", "Install the Vertex AI SDK for Python and other required packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qblyW_dcyOQA" }, "outputs": [], "source": [ "! pip3 install --upgrade -q tensorflow \\\n", " google-cloud-aiplatform \\\n", " scikit-learn \\\n", " pandas" ] }, { "cell_type": "markdown", "metadata": { "id": "5eec42e37bcf" }, "source": [ "### Restart runtime (Colab only)\n", "To use the newly installed packages, you must restart the runtime on Google Colab."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "dcc98768955f" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", " import IPython\n", "\n", " app = IPython.Application.instance()\n", " app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "4de1bd77992b" }, "source": [ "<div class=\"alert alert-block alert-warning\">\n", "<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { "id": "56e219dbcb9a" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "Authenticate your environment on Google Colab." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "c97be6a73155" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "442da99b7efa" }, "source": [ "### Set Google Cloud project information\n", "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cde8e0876d62" }, "outputs": [], "source": [ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n", "LOCATION = \"us-central1\"" ] }, { "cell_type": "markdown", "metadata": { "id": "zgPO1eR3CYjk" }, "source": [ "### Create a Cloud Storage bucket\n", "\n", "Create a storage bucket to store intermediate artifacts such as datasets."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "MzGDU7TWdts_" }, "outputs": [], "source": [ "BUCKET_URI = f\"gs://your-bucket-name-{PROJECT_ID}-unique\" # @param {type:\"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "-EcIXiGsCePi" }, "source": [ "**If your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NIq7R4HZCfIc" }, "outputs": [], "source": [ "! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}" ] }, { "cell_type": "markdown", "metadata": { "id": "XoEqT2Y4DJmf" }, "source": [ "### Import libraries and define constants" ] }, { "cell_type": "markdown", "metadata": { "id": "Y9Uo3tifg1kx" }, "source": [ "Import required libraries.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "pRUOFELefqf1" }, "outputs": [], "source": [ "import pandas as pd\n", "from google.cloud import aiplatform\n", "from sklearn.metrics import mean_absolute_error, mean_squared_error\n", "from tensorflow.python.keras.utils import data_utils" ] }, { "cell_type": "markdown", "metadata": { "id": "O8XJZB3gR8eL" }, "source": [ "## Initialize Vertex AI and set an experiment\n" ] }, { "cell_type": "markdown", "metadata": { "id": "xtXZWmYqJ1bh" }, "source": [ "Define experiment name." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JIOrI-hoJ46P" }, "outputs": [], "source": [ "EXPERIMENT_NAME = \"my-experiment-unique\"" ] }, { "cell_type": "markdown", "metadata": { "id": "DKIsYVjj56_X" }, "source": [ "Initialize the *client* for Vertex AI." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Wrlk2B2nJ7-X" }, "outputs": [], "source": [ "aiplatform.init(\n", " project=PROJECT_ID,\n", " location=LOCATION,\n", " staging_bucket=BUCKET_URI,\n", " experiment=EXPERIMENT_NAME,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "6PlilQPFeS_h" }, "source": [ "### Tracking parameters and metrics in Vertex AI custom training jobs" ] }, { "cell_type": "markdown", "metadata": { "id": "f8fd397cc4f6" }, "source": [ "### Download the dataset to Cloud Storage" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "V_T10yTTqcS_" }, "outputs": [], "source": [ "!wget https://storage.googleapis.com/download.tensorflow.org/data/abalone_train.csv\n", "!gsutil cp abalone_train.csv {BUCKET_URI}/data/\n", "\n", "gcs_csv_path = f\"{BUCKET_URI}/data/abalone_train.csv\"" ] }, { "cell_type": "markdown", "metadata": { "id": "35QVNhACqcTJ" }, "source": [ "### Create a Vertex AI Tabular dataset from CSV data\n", "\n", "A Vertex AI dataset can be used to create an AutoML model or a custom model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4OfCqaYRqcTJ" }, "outputs": [], "source": [ "ds = aiplatform.TabularDataset.create(display_name=\"abalone\", gcs_source=[gcs_csv_path])\n", "\n", "ds.resource_name" ] }, { "cell_type": "markdown", "metadata": { "id": "VcEOYYolqcTN" }, "source": [ "### Write the training script\n", "\n", "Next, you create the training script that is used in the sample custom training job."
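] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Inside the training container, Vertex AI passes the managed dataset splits and the model output location to the script through environment variables such as `AIP_TRAINING_DATA_URI`, `AIP_VALIDATION_DATA_URI`, `AIP_TEST_DATA_URI`, and `AIP_MODEL_DIR`. A minimal sketch of reading them follows; the fallback values are only a local-testing assumption, not part of the job contract:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Vertex AI sets these variables inside the training container; the defaults\n", "# below are stand-ins so the snippet also runs outside a training job.\n", "train_uri = os.environ.get(\"AIP_TRAINING_DATA_URI\", \"local/train-*.csv\")\n", "model_dir = os.environ.get(\"AIP_MODEL_DIR\", \"local/model\")\n", "print(train_uri, model_dir)"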
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OauJqJmJqcTO" }, "outputs": [], "source": [ "%%writefile training_script.py\n", "\n", "import pandas as pd\n", "import argparse\n", "import os\n", "import tensorflow as tf\n", "from tensorflow import keras\n", "from tensorflow.keras import layers\n", "\n", "parser = argparse.ArgumentParser()\n", "parser.add_argument('--epochs', dest='epochs',\n", " default=10, type=int,\n", " help='Number of epochs.')\n", "parser.add_argument('--num_units', dest='num_units',\n", " default=64, type=int,\n", " help='Number of units for the first layer.')\n", "args = parser.parse_args()\n", "\n", "col_names = [\"Length\", \"Diameter\", \"Height\", \"Whole weight\", \"Shucked weight\", \"Viscera weight\", \"Shell weight\", \"Age\"]\n", "target = \"Age\"\n", "\n", "def aip_data_to_dataframe(wild_card_path):\n", " return pd.concat([pd.read_csv(fp.numpy().decode(), names=col_names)\n", " for fp in tf.data.Dataset.list_files([wild_card_path])])\n", "\n", "def get_features_and_labels(df):\n", " return df.drop(target, axis=1).values, df[target].values\n", "\n", "def data_prep(wild_card_path):\n", " return get_features_and_labels(aip_data_to_dataframe(wild_card_path))\n", "\n", "\n", "model = tf.keras.Sequential([layers.Dense(args.num_units), layers.Dense(1)])\n", "model.compile(loss='mse', optimizer='adam')\n", "\n", "model.fit(*data_prep(os.environ[\"AIP_TRAINING_DATA_URI\"]),\n", " epochs=args.epochs,\n", " validation_data=data_prep(os.environ[\"AIP_VALIDATION_DATA_URI\"]))\n", "print(model.evaluate(*data_prep(os.environ[\"AIP_TEST_DATA_URI\"])))\n", "\n", "# save as a Vertex AI managed model\n", "tf.saved_model.save(model, os.environ[\"AIP_MODEL_DIR\"])" ] }, { "cell_type": "markdown", "metadata": { "id": "Yp2clkOJSDhR" }, "source": [ "### Launch a custom training job and track its training parameters in Vertex ML Metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "btb6d48lqcTT" }, 
"outputs": [], "source": [ "job = aiplatform.CustomTrainingJob(\n", " display_name=\"train-abalone-dist-1-replica\",\n", " script_path=\"training_script.py\",\n", " container_uri=\"us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest\",\n", " requirements=[\"gcsfs==0.7.1\"],\n", " model_serving_container_image_uri=\"us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest\",\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "k_QorXXztzPH" }, "source": [ "Start a new experiment run to track training parameters and start the training job. Note that this operation takes around 10 minutes." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "oVTORjQpJ7-Y" }, "outputs": [], "source": [ "aiplatform.start_run(\n", " \"custom-training-run-unique\"\n", ") # Change this to your desired run name\n", "parameters = {\"epochs\": 10, \"num_units\": 64}\n", "aiplatform.log_params(parameters)\n", "\n", "model = job.run(\n", " ds,\n", " replica_count=1,\n", " model_display_name=\"abalone-model\",\n", " args=[f\"--epochs={parameters['epochs']}\", f\"--num_units={parameters['num_units']}\"],\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "5vhDsMJNqcTW" }, "source": [ "### Deploy model and calculate prediction metrics" ] }, { "cell_type": "markdown", "metadata": { "id": "O-uCOL3Naap4" }, "source": [ "Next, deploy your Vertex AI Model resource to a Vertex AI endpoint resource. This operation takes 10-20 minutes." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Y9GH72wWqcTX" }, "outputs": [], "source": [ "endpoint = model.deploy(machine_type=\"n1-standard-4\")" ] }, { "cell_type": "markdown", "metadata": { "id": "JY-5skFhasWs" }, "source": [ "### Prediction dataset preparation and online prediction" ] }, { "cell_type": "markdown", "metadata": { "id": "saw50bqwa-dR" }, "source": [ "Once the model is deployed, perform online prediction using the `abalone_test` dataset and calculate prediction metrics.\n", "\n", "Prepare the prediction dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ABZQmqsWISQv" }, "outputs": [], "source": [ "def read_data(uri):\n", " dataset_path = data_utils.get_file(\"abalone_test.data\", uri)\n", " col_names = [\n", " \"Length\",\n", " \"Diameter\",\n", " \"Height\",\n", " \"Whole weight\",\n", " \"Shucked weight\",\n", " \"Viscera weight\",\n", " \"Shell weight\",\n", " \"Age\",\n", " ]\n", " dataset = pd.read_csv(\n", " dataset_path,\n", " names=col_names,\n", " na_values=\"?\",\n", " comment=\"\\t\",\n", " sep=\",\",\n", " skipinitialspace=True,\n", " )\n", " return dataset\n", "\n", "\n", "def get_features_and_labels(df):\n", " target = \"Age\"\n", " return df.drop(target, axis=1).values, df[target].values\n", "\n", "\n", "test_dataset, test_labels = get_features_and_labels(\n", " read_data(\n", " \"https://storage.googleapis.com/download.tensorflow.org/data/abalone_test.csv\"\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "_HphZ38obJeB" }, "source": [ "Perform online prediction." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "eXD-OvsrKmCt" }, "outputs": [], "source": [ "prediction = endpoint.predict(test_dataset.tolist())\n", "prediction" ] }, { "cell_type": "markdown", "metadata": { "id": "TDKiv_O7bNwE" }, "source": [ "Calculate and track prediction evaluation metrics."
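] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both metrics are simple enough to sanity-check by hand: MSE is the mean of the squared errors and MAE is the mean of the absolute errors. A toy sketch with made-up labels and predictions (the values are illustrative only):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "labels = [9.0, 11.0, 10.0]\n", "preds = [8.0, 12.0, 10.0]\n", "\n", "# Mean squared error and mean absolute error, computed directly.\n", "toy_mse = sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels)\n", "toy_mae = sum(abs(y - p) for y, p in zip(labels, preds)) / len(labels)\n", "print(toy_mse, toy_mae)"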
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cj0fHucbKopn" }, "outputs": [], "source": [ "mse = mean_squared_error(test_labels, prediction.predictions)\n", "mae = mean_absolute_error(test_labels, prediction.predictions)\n", "\n", "aiplatform.log_metrics({\"mse\": mse, \"mae\": mae})" ] }, { "cell_type": "markdown", "metadata": { "id": "CCGmesdIbbHf" }, "source": [ "### Extract all parameters and metrics created during this experiment" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KlcEBou-Pl4Z" }, "outputs": [], "source": [ "aiplatform.get_experiment_df()" ] }, { "cell_type": "markdown", "metadata": { "id": "WTHvPMweMlP1" }, "source": [ "### View data in the Cloud Console" ] }, { "cell_type": "markdown", "metadata": { "id": "F19_5lw0MqXv" }, "source": [ "Parameters and metrics can also be viewed in the Cloud Console.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "GmN9vE9pqqzt" }, "outputs": [], "source": [ "print(\"Vertex AI Experiments:\")\n", "print(\n", " f\"https://console.cloud.google.com/ai/platform/experiments/experiments?folder=&organizationId=&project={PROJECT_ID}\"\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "TpV-iwP9qw9c" }, "source": [ "## Cleaning up\n", "\n", "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", "\n", "Otherwise, you can delete the individual resources you created in this tutorial:\n", "\n", "* Vertex AI Dataset\n", "* Training Job\n", "* Model\n", "* Endpoint\n", "* Cloud Storage Bucket\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "rwPZoZISHhaY" }, "outputs": [], "source": [ "# Warning: Setting this to True deletes everything in your bucket\n", "delete_bucket = 
False\n", "\n", "# Delete dataset\n", "ds.delete()\n", "\n", "# Delete experiment\n", "experiment = aiplatform.Experiment(\n", " experiment_name=EXPERIMENT_NAME, project=PROJECT_ID, location=LOCATION\n", ")\n", "experiment.delete()\n", "\n", "# Delete the training job\n", "job.delete()\n", "\n", "# Undeploy model from endpoint\n", "endpoint.undeploy_all()\n", "\n", "# Delete the endpoint\n", "endpoint.delete()\n", "\n", "# Delete the model\n", "model.delete()\n", "\n", "\n", "if delete_bucket:\n", " ! gsutil -m rm -r $BUCKET_URI" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "sdk-metric-parameter-tracking-for-custom-jobs.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }