notebooks/official/automl/automl-tabular-classification.ipynb (581 lines of code) (raw):
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "copyright"
},
"outputs": [],
"source": [
"# Copyright 2021 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "JAPoU8Sm5E6e"
},
"source": [
"# Vertex AI SDK for Python: AutoML tabular training and prediction\n",
"\n",
"<table align=\"left\">\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl-tabular-classification.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fautoml%2Fautoml-tabular-classification.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl-tabular-classification.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
"<a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/automl/automl-tabular-classification.ipynb\" target='_blank'>\n",
" <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
" </a>\n",
" </td>\n",
"</table>\n",
"<br/><br/><br/>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "411c6c769293"
},
"source": [
"## Overview\n",
"\n",
"This tutorial demonstrates how to use the Vertex AI Python client library to train and deploy a tabular classification model for online prediction.\n",
"\n",
"**Note**: you may incur charges for training, prediction, storage, or usage of other Google Cloud products in connection with testing this SDK.\n",
"\n",
"Learn more about [Classification for tabular data](https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/overview)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "tvgnzT1CKxrO"
},
"source": [
"### Objective\n",
"\n",
"In this tutorial, you learn how to train and make predictions on an AutoML model based on a tabular dataset. Alternatively, you can train and make predictions on models by using the `gcloud` command-line tool or by using the Google Cloud Console.\n",
"\n",
"This tutorial uses the following Google Cloud ML services and resources:\n",
"\n",
"- Vertex AI\n",
"- AutoML Tabular\n",
"\n",
"The steps performed include the following:\n",
"\n",
"- Create a Vertex AI model training job.\n",
"- Train an AutoML Tabular model.\n",
"- Deploy the model resource to a serving endpoint resource.\n",
"- Make a prediction by sending data.\n",
"- Undeploy the model resource."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d87e05416046"
},
"source": [
"### Dataset\n",
"\n",
"The dataset we are using is the PetFinder Dataset, available locally in Colab. To learn more about this dataset, visit https://www.kaggle.com/c/petfinder-adoption-prediction."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5e2eba58ad71"
},
"source": [
"### Costs \n",
"\n",
"This tutorial uses billable components of Google Cloud:\n",
"\n",
"* Vertex AI\n",
"* Cloud Storage\n",
"\n",
"Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and \n",
"[Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the \n",
"[Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "install_aip"
},
"source": [
"## Installation\n",
"\n",
"Install the packages required for executing this notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PyQmSRbKA8r-"
},
"outputs": [],
"source": [
"# Install the packages\n",
"! pip3 install --quiet --upgrade google-cloud-aiplatform \\\n",
" google-cloud-storage"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5eec42e37bcf"
},
"source": [
"### Restart runtime (Colab only)\n",
"To use the newly installed packages, you must restart the runtime on Google Colab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dcc98768955f"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
"\n",
" import IPython\n",
"\n",
" app = IPython.Application.instance()\n",
" app.kernel.do_shutdown(True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4de1bd77992b"
},
"source": [
"<div class=\"alert alert-block alert-warning\">,\n",
"<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>,\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "befa6ca14bc0"
},
"source": [
"### Authenticate your notebook environment (Colab only)\n",
"Authenticate your environment on Google Colab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7de6ef0fac42"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
"\n",
" from google.colab import auth\n",
"\n",
" auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "80b8daedb2c6"
},
"source": [
"### Set Google Cloud project information\n",
"To get started using Vertex AI, you must have an existing Google Cloud project. Learn more about [setting up a project and a development environment.](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "575f9339da1d"
},
"outputs": [],
"source": [
"PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n",
"LOCATION = \"us-central1\" # @param {type:\"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bucket:mbsdk"
},
"source": [
"### Create a Cloud Storage bucket\n",
"\n",
"Create a storage bucket to store intermediate artifacts such as datasets."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bucket"
},
"outputs": [],
"source": [
"BUCKET_URI = (\n",
" f\"gs://your-bucket-name-unique-{PROJECT_ID}-unique\" # @param {type:\"string\"}\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "create_bucket"
},
"source": [
"**If your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "create_bucket"
},
"outputs": [],
"source": [
"! gsutil mb -l $LOCATION $BUCKET_URI"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d89a66b8923f"
},
"source": [
"### Copy dataset into your Cloud Storage bucket"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "59a46204cddb"
},
"outputs": [],
"source": [
"IMPORT_FILE = \"petfinder-tabular-classification.csv\"\n",
"! gsutil cp gs://cloud-samples-data/ai-platform-unified/datasets/tabular/{IMPORT_FILE} {BUCKET_URI}/data/\n",
"\n",
"gcs_source = f\"{BUCKET_URI}/data/{IMPORT_FILE}\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y9Uo3tifg1kx"
},
"source": [
"### Import Vertex AI SDK for Python\n",
"\n",
"Import the Vertex AI SDK into your Python environment and initialize it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pRUOFELefqf1"
},
"outputs": [],
"source": [
"from google.cloud import aiplatform\n",
"\n",
"aiplatform.init(project=PROJECT_ID, location=LOCATION)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "643dfd86b00d"
},
"source": [
"## Tutorial\n",
"\n",
"Now you're ready to create your AutoML Tabular model."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8f4f50a0112c"
},
"source": [
"### Create a Managed tabular dataset from a CSV\n",
"\n",
"This section creates a dataset from a CSV file stored on your GCS bucket."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "f1eef64ee47b"
},
"outputs": [],
"source": [
"ds = dataset = aiplatform.TabularDataset.create(\n",
" display_name=\"petfinder-tabular-dataset\",\n",
" gcs_source=gcs_source,\n",
")\n",
"\n",
"ds.resource_name"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ba5011d50ac7"
},
"source": [
"### Launch a training job to create a model\n",
"\n",
"Once you've defined your training script, you'll create a model. The `run` function creates a training pipeline that trains and creates a model object. After the training pipeline completes, the `run` function returns the model object."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "24c2c081d683"
},
"outputs": [],
"source": [
"job = aiplatform.AutoMLTabularTrainingJob(\n",
" display_name=\"train-petfinder-automl-1\",\n",
" optimization_prediction_type=\"classification\",\n",
" column_transformations=[\n",
" {\"categorical\": {\"column_name\": \"Type\"}},\n",
" {\"numeric\": {\"column_name\": \"Age\"}},\n",
" {\"categorical\": {\"column_name\": \"Breed1\"}},\n",
" {\"categorical\": {\"column_name\": \"Color1\"}},\n",
" {\"categorical\": {\"column_name\": \"Color2\"}},\n",
" {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n",
" {\"categorical\": {\"column_name\": \"FurLength\"}},\n",
" {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n",
" {\"categorical\": {\"column_name\": \"Sterilized\"}},\n",
" {\"categorical\": {\"column_name\": \"Health\"}},\n",
" {\"numeric\": {\"column_name\": \"Fee\"}},\n",
" {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n",
" ],\n",
")\n",
"\n",
"# This takes about an hour to run\n",
"model = job.run(\n",
" dataset=ds,\n",
" target_column=\"Adopted\",\n",
" training_fraction_split=0.8,\n",
" validation_fraction_split=0.1,\n",
" test_fraction_split=0.1,\n",
" model_display_name=\"adopted-prediction-model\",\n",
" disable_early_stopping=False,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "93a4d034c73b"
},
"source": [
"### Deploy your model\n",
"\n",
"Before you use your model to make predictions, you need to deploy it to an endpoint. You can do this by calling the `deploy` function on the model resource. This function does two things:\n",
"\n",
"1. Creates an endpoint resource to which the model resource is deployed.\n",
"2. Deploys the model resource to the endpoint resource.\n",
"\n",
"Deploy your model.\n",
"\n",
"### NOTE: Wait until the model **FINISHES** deployment before proceeding to prediction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "a371544057d9"
},
"outputs": [],
"source": [
"endpoint = model.deploy(\n",
" machine_type=\"n1-standard-4\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fd44380b9ae3"
},
"source": [
"### Predict on the endpoint\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "13f3e8aa27c0"
},
"source": [
"* This sample instance is taken from an observation in which `Adopted` = **Yes**\n",
"* Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. The transformations you defined when creating your `AutoMLTabularTrainingJob` inform Vertex AI to transform the inputs to their defined types.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "00c0d01dc8ae"
},
"outputs": [],
"source": [
"prediction = endpoint.predict(\n",
" [\n",
" {\n",
" \"Type\": \"Cat\",\n",
" \"Age\": \"3\",\n",
" \"Breed1\": \"Tabby\",\n",
" \"Gender\": \"Male\",\n",
" \"Color1\": \"Black\",\n",
" \"Color2\": \"White\",\n",
" \"MaturitySize\": \"Small\",\n",
" \"FurLength\": \"Short\",\n",
" \"Vaccinated\": \"No\",\n",
" \"Sterilized\": \"No\",\n",
" \"Health\": \"Healthy\",\n",
" \"Fee\": \"100\",\n",
" \"PhotoAmt\": \"2\",\n",
" }\n",
" ]\n",
")\n",
"\n",
"print(prediction)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "154258dfb12f"
},
"source": [
"### Undeploy the model\n",
"\n",
"To undeploy your model resource from the serving endpoint resource, use the endpoint's `undeploy` method with the following parameter:\n",
"\n",
"- `deployed_model_id`: The model deployment identifier returned by the prediction service when the model resource is deployed. You can retrieve the `deployed_model_id` using the prediction object's `deployed_model_id` property."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "186856f896fc"
},
"outputs": [],
"source": [
"endpoint.undeploy(deployed_model_id=prediction.deployed_model_id)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d7d2aa967f46"
},
"source": [
"# Cleaning up\n",
"\n",
"To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n",
"\n",
"Otherwise, you can delete the individual resources you created in this tutorial:\n",
"\n",
"- Training Job\n",
"- Model\n",
"- Endpoint\n",
"- Cloud Storage Bucket\n",
"\n",
"**Note**: You must delete any model resources deployed to the endpoint resource before deleting the endpoint resource."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1a9c201f8589"
},
"outputs": [],
"source": [
"# Warning: Setting this to true will delete everything in your bucket\n",
"delete_bucket = False\n",
"\n",
"# Delete the training job\n",
"job.delete()\n",
"\n",
"# Delete the model\n",
"model.delete()\n",
"\n",
"# Delete the endpoint\n",
"endpoint.delete()\n",
"\n",
"if delete_bucket:\n",
" ! gsutil -m rm -r $BUCKET_URI"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [],
"name": "automl-tabular-classification.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}