{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ur8xi4C7S06n"
},
"outputs": [],
"source": [
"# Copyright 2022 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JAPoU8Sm5E6e"
},
"source": [
"# Vertex AI: Track parameters and metrics for locally trained models\n",
"\n",
"<table align=\"left\">\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-locally-trained-models.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fml_metadata%2Fsdk-metric-parameter-tracking-for-locally-trained-models.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-locally-trained-models.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
"<a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/ml_metadata/sdk-metric-parameter-tracking-for-locally-trained-models.ipynb\" target='_blank'>\n",
" <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
" </a>\n",
" </td>\n",
"</table>\n",
"<br/><br/><br/>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e150477c2d92"
},
"source": [
"## Overview\n",
"\n",
"This notebook demonstrates how to track metrics and parameters for ML training jobs and analyze this metadata using the Vertex AI SDK for Python.\n",
"\n",
"Learn more about [Vertex ML Metadata](https://cloud.google.com/vertex-ai/docs/ml-metadata)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tvgnzT1CKxrO"
},
"source": [
"### Objective\n",
"\n",
"In this notebook, you learn how to use Vertex ML Metadata to track training parameters and evaluation metrics.\n",
"\n",
"This tutorial uses the following Google Cloud ML services:\n",
"\n",
"- Vertex ML Metadata\n",
"- Vertex AI Experiments\n",
"\n",
"The steps performed include:\n",
"\n",
"- Track parameters and metrics for a locally trained model.\n",
"- Extract and perform analysis for all parameters and metrics within an experiment."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d2b71369d6d3"
},
"source": [
"### Dataset\n",
"\n",
"In this notebook, you train a simple deep neural network (DNN) model to predict an automobile's miles per gallon (MPG) based on automobile information in the [auto-mpg dataset](https://www.kaggle.com/devanshbesain/exploration-and-analysis-auto-mpg)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0c997d8d92ce"
},
"source": [
"### Costs \n",
"\n",
"\n",
"This tutorial uses billable components of Google Cloud:\n",
"\n",
"* Vertex AI\n",
"* Cloud Storage\n",
"\n",
"\n",
"Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and \n",
"[Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the \n",
"[Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "i7EUnXsZhAGF"
},
"source": [
"## Get started\n",
"Install the Vertex AI SDK for Python and other required packages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IaYsrh0Tc17L"
},
"outputs": [],
"source": [
"! pip install --upgrade --quiet google-cloud-aiplatform \\\n",
" tensorflow==2.11 \\\n",
" matplotlib \\\n",
" pandas \\\n",
" 'numpy<2.0.0'"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5eec42e37bcf"
},
"source": [
"### Restart runtime (Colab only)\n",
"To use the newly installed packages, you must restart the runtime on Google Colab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dcc98768955f"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
"\n",
" import IPython\n",
"\n",
" app = IPython.Application.instance()\n",
" app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4de1bd77992b"
},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
"<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "56e219dbcb9a"
},
"source": [
"### Authenticate your notebook environment (Colab only)\n",
"Authenticate your environment on Google Colab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c97be6a73155"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
"\n",
" from google.colab import auth\n",
"\n",
" auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "442da99b7efa"
},
"source": [
"### Set Google Cloud project information\n",
"Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "oM1iC_MfAts1"
},
"outputs": [],
"source": [
"PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n",
"LOCATION = \"us-central1\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XoEqT2Y4DJmf"
},
"source": [
"### Import libraries and define constants\n",
"Import required libraries."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pRUOFELefqf1"
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"from google.cloud import aiplatform\n",
"from tensorflow.keras import Sequential, layers\n",
"from tensorflow.python.keras.utils import data_utils"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xtXZWmYqJ1bh"
},
"source": [
"Define a name for your experiment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JIOrI-hoJ46P"
},
"outputs": [],
"source": [
"EXPERIMENT_NAME = \"my-experiment-name-unique\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xuny18aMcWDb"
},
"source": [
"## Concepts\n",
"\n",
"To better understand how parameters and metrics are stored and organized, this section introduces the following concepts:\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "NThDci5bp0Uw"
},
"source": [
"### Experiment\n",
"Experiments describe a context that groups your runs and the artifacts you create into a logical session. For example, in this notebook you create an experiment and log data to that experiment."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SAyRR3Ydp4X5"
},
"source": [
"### Run\n",
"A run represents a single path that you take while performing an experiment. A run includes the artifacts that you used as inputs or outputs and the parameters that you used in that execution. An experiment can contain multiple runs."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "l1YW2pgyegFP"
},
"source": [
"## Getting started tracking parameters and metrics\n",
"\n",
"You can use the Vertex AI SDK for Python to track metrics and parameters for models trained locally.\n",
"\n",
"In the following example, you train a simple deep neural network (DNN) model to predict an automobile's miles per gallon (MPG) based on automobile information in the [auto-mpg dataset](https://www.kaggle.com/devanshbesain/exploration-and-analysis-auto-mpg)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KPY41M9_AhZU"
},
"source": [
"### Load and process the training dataset"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bfMQSmRuUuX-"
},
"source": [
"Download and process the dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RiQuMv4bmpuV"
},
"outputs": [],
"source": [
"def read_data(uri):\n",
"    # Download the raw file (cached locally by Keras) and parse it.\n",
"    dataset_path = data_utils.get_file(\"auto-mpg.data\", uri)\n",
"    column_names = [\n",
"        \"MPG\",\n",
"        \"Cylinders\",\n",
"        \"Displacement\",\n",
"        \"Horsepower\",\n",
"        \"Weight\",\n",
"        \"Acceleration\",\n",
"        \"Model Year\",\n",
"        \"Origin\",\n",
"    ]\n",
"    raw_dataset = pd.read_csv(\n",
"        dataset_path,\n",
"        names=column_names,\n",
"        na_values=\"?\",\n",
"        comment=\"\\t\",\n",
"        sep=\" \",\n",
"        skipinitialspace=True,\n",
"    )\n",
"    # Drop rows with missing values; copy so the assignment below modifies\n",
"    # an independent DataFrame.\n",
"    dataset = raw_dataset.dropna().copy()\n",
"    dataset[\"Origin\"] = dataset[\"Origin\"].map({1: \"USA\", 2: \"Europe\", 3: \"Japan\"})\n",
"    # One-hot encode Origin as floats so the columns count as numeric\n",
"    # when the features are normalized later.\n",
"    dataset = pd.get_dummies(dataset, prefix=\"\", prefix_sep=\"\", dtype=float)\n",
"    return dataset\n",
"\n",
"\n",
"dataset = read_data(\n",
" \"http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y06J7A7yU21t"
},
"source": [
"Split dataset for training and testing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "p5JBCBKyH-NC"
},
"outputs": [],
"source": [
"def train_test_split(dataset, split_frac=0.8, random_state=0):\n",
" train_dataset = dataset.sample(frac=split_frac, random_state=random_state)\n",
" test_dataset = dataset.drop(train_dataset.index)\n",
" train_labels = train_dataset.pop(\"MPG\")\n",
" test_labels = test_dataset.pop(\"MPG\")\n",
"\n",
" return train_dataset, test_dataset, train_labels, test_labels\n",
"\n",
"\n",
"train_dataset, test_dataset, train_labels, test_labels = train_test_split(dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gaNNTFPaU7KT"
},
"source": [
"Normalize the features in the dataset for better model performance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "VGq5QCoyIEWJ"
},
"outputs": [],
"source": [
"def normalize_dataset(train_dataset, test_dataset):\n",
"    train_stats = train_dataset.describe()\n",
"    train_stats = train_stats.transpose()\n",
"\n",
"    def norm(x):\n",
"        return (x - train_stats[\"mean\"]) / train_stats[\"std\"]\n",
"\n",
"    normed_train_data = norm(train_dataset)\n",
"    normed_test_data = norm(test_dataset)\n",
"\n",
"    return normed_train_data, normed_test_data\n",
"\n",
"\n",
"normed_train_data, normed_test_data = normalize_dataset(train_dataset, test_dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UBXUgxgqA_GB"
},
"source": [
"### Define ML model and training function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "66odBYKrIN4q"
},
"outputs": [],
"source": [
"def train(\n",
"    train_data,\n",
"    train_labels,\n",
"    num_units=64,\n",
"    activation=\"relu\",\n",
"    dropout_rate=0.0,\n",
"    validation_split=0.2,\n",
"    epochs=1000,\n",
"):\n",
"    # Two hidden layers with dropout in between, then a single regression output.\n",
"    model = Sequential(\n",
"        [\n",
"            layers.Dense(\n",
"                num_units,\n",
"                activation=activation,\n",
"                # Size the input from the data passed in, not a global variable.\n",
"                input_shape=[len(train_data.keys())],\n",
"            ),\n",
"            layers.Dropout(rate=dropout_rate),\n",
"            layers.Dense(num_units, activation=activation),\n",
"            layers.Dense(1),\n",
"        ]\n",
"    )\n",
"\n",
"    model.compile(loss=\"mse\", optimizer=\"adam\", metrics=[\"mae\", \"mse\"])\n",
"    model.summary()\n",
"\n",
"    history = model.fit(\n",
"        train_data, train_labels, epochs=epochs, validation_split=validation_split\n",
"    )\n",
"\n",
"    return model, history"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "O8XJZB3gR8eL"
},
"source": [
"### Initialize the Vertex AI SDK for Python and create an Experiment\n",
"\n",
"Initialize the *client* for Vertex AI and create an experiment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "o_wnT10RJ7-W"
},
"outputs": [],
"source": [
"aiplatform.init(project=PROJECT_ID, location=LOCATION, experiment=EXPERIMENT_NAME)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u-iTnzt3B6Z_"
},
"source": [
"### Start several model training runs\n",
"\n",
"Training parameters and metrics are logged for each run."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "i2wnpu8_7JfV"
},
"outputs": [],
"source": [
"parameters = [\n",
"    {\"num_units\": 16, \"epochs\": 3, \"dropout_rate\": 0.1},\n",
"    {\"num_units\": 16, \"epochs\": 10, \"dropout_rate\": 0.1},\n",
"    {\"num_units\": 16, \"epochs\": 10, \"dropout_rate\": 0.2},\n",
"    {\"num_units\": 32, \"epochs\": 10, \"dropout_rate\": 0.1},\n",
"    {\"num_units\": 32, \"epochs\": 10, \"dropout_rate\": 0.2},\n",
"]\n",
"\n",
"for i, params in enumerate(parameters):\n",
"    # Each start_run call opens a new run in the experiment set by aiplatform.init.\n",
"    aiplatform.start_run(run=f\"auto-mpg-lcl-run-{i}\")\n",
"    aiplatform.log_params(params)\n",
"    model, history = train(\n",
"        normed_train_data,\n",
"        train_labels,\n",
"        num_units=params[\"num_units\"],\n",
"        activation=\"relu\",\n",
"        epochs=params[\"epochs\"],\n",
"        dropout_rate=params[\"dropout_rate\"],\n",
"    )\n",
"\n",
"    # Log the final-epoch value of each training and validation metric.\n",
"    for metric, values in history.history.items():\n",
"        try:\n",
"            aiplatform.log_metrics({metric: values[-1]})\n",
"        except Exception:\n",
"            # Fall back to 0.0 if the value can't be logged.\n",
"            aiplatform.log_metrics({metric: 0.0})\n",
"\n",
"    loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=2)\n",
"    try:\n",
"        aiplatform.log_metrics({\"eval_loss\": loss, \"eval_mae\": mae, \"eval_mse\": mse})\n",
"    except Exception:\n",
"        aiplatform.log_metrics({\"eval_loss\": 0.0, \"eval_mae\": 0.0, \"eval_mse\": 0.0})\n",
"\n",
"    # Mark the run as complete before starting the next one.\n",
"    aiplatform.end_run()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jZLrJZTfL7tE"
},
"source": [
"### Extract parameters and metrics into a dataframe for analysis"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "A1PqKxlpOZa2"
},
"source": [
"You can also extract all parameters and metrics associated with any experiment into a dataframe for further analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jbRf1WoH_vbY"
},
"outputs": [],
"source": [
"experiment_df = aiplatform.get_experiment_df()\n",
"experiment_df"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EYuYgqVCMKU1"
},
"source": [
"### Visualizing an experiment's parameters and metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "r8orCj8iJuO1"
},
"outputs": [],
"source": [
"plt.rcParams[\"figure.figsize\"] = [15, 5]\n",
"\n",
"ax = pd.plotting.parallel_coordinates(\n",
"    experiment_df.reset_index(level=0),\n",
"    \"run_name\",\n",
"    cols=[\n",
"        \"param.num_units\",\n",
"        \"param.dropout_rate\",\n",
"        \"param.epochs\",\n",
"        \"metric.loss\",\n",
"        \"metric.val_loss\",\n",
"        \"metric.eval_loss\",\n",
"    ],\n",
"    # One color per run (five runs were logged above).\n",
"    color=[\"blue\", \"green\", \"pink\", \"red\", \"orange\"],\n",
")\n",
"ax.set_yscale(\"symlog\")\n",
"ax.legend(bbox_to_anchor=(1.0, 0.5))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WTHvPMweMlP1"
},
"source": [
"## Visualizing experiments in Cloud Console"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "F19_5lw0MqXv"
},
"source": [
"Run the following cell to get the URL of the Vertex AI Experiments page for your project.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GmN9vE9pqqzt"
},
"outputs": [],
"source": [
"print(\"Vertex AI Experiments:\")\n",
"print(\n",
" f\"https://console.cloud.google.com/ai/platform/experiments/experiments?folder=&organizationId=&project={PROJECT_ID}\"\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TpV-iwP9qw9c"
},
"source": [
"## Cleaning up\n",
"\n",
"Delete the individual resources you created in this tutorial:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "d02bde73377a"
},
"outputs": [],
"source": [
"from google.cloud import aiplatform\n",
"\n",
"# Delete the experiment and the runs associated with it.\n",
"experiment = aiplatform.Experiment(\n",
"    experiment_name=EXPERIMENT_NAME, project=PROJECT_ID, location=LOCATION\n",
")\n",
"\n",
"# Also delete any backing TensorBoard runs created for the experiment runs.\n",
"experiment.delete(delete_backing_tensorboard_runs=True)"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "sdk-metric-parameter-tracking-for-locally-trained-models.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}