notebooks/official/automl/automl-tabular-classification.ipynb

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "copyright" }, "outputs": [], "source": [ "# Copyright 2021 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "JAPoU8Sm5E6e" }, "source": [ "# Vertex AI SDK for Python: AutoML tabular training and prediction\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl-tabular-classification.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fautoml%2Fautoml-tabular-classification.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/automl/automl-tabular-classification.ipynb\">\n", " <img 
src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", "<a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/automl/automl-tabular-classification.ipynb\" target='_blank'>\n", " <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n", " </a>\n", " </td>\n", "</table>\n", "<br/><br/><br/>" ] }, { "cell_type": "markdown", "metadata": { "id": "411c6c769293" }, "source": [ "## Overview\n", "\n", "This tutorial demonstrates how to use the Vertex AI Python client library to train and deploy a tabular classification model for online prediction.\n", "\n", "**Note**: you may incur charges for training, prediction, storage, or usage of other Google Cloud products in connection with testing this SDK.\n", "\n", "Learn more about [Classification for tabular data](https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/overview)." ] }, { "cell_type": "markdown", "metadata": { "id": "tvgnzT1CKxrO" }, "source": [ "### Objective\n", "\n", "In this tutorial, you learn how to train and make predictions on an AutoML model based on a tabular dataset. 
Alternatively, you can train and make predictions on models by using the `gcloud` command-line tool or by using the Google Cloud Console.\n", "\n", "This tutorial uses the following Google Cloud ML services and resources:\n", "\n", "- Vertex AI\n", "- AutoML Tabular\n", "\n", "The steps performed include the following:\n", "\n", "- Create a Vertex AI model training job.\n", "- Train an AutoML Tabular model.\n", "- Deploy the model resource to a serving endpoint resource.\n", "- Make a prediction by sending data.\n", "- Undeploy the model resource." ] }, { "cell_type": "markdown", "metadata": { "id": "d87e05416046" }, "source": [ "### Dataset\n", "\n", "This tutorial uses the PetFinder dataset, which a later step copies from a public Cloud Storage bucket into your own bucket. To learn more about this dataset, visit https://www.kaggle.com/c/petfinder-adoption-prediction." ] }, { "cell_type": "markdown", "metadata": { "id": "5e2eba58ad71" }, "source": [ "### Costs\n", "\n", "This tutorial uses billable components of Google Cloud:\n", "\n", "* Vertex AI\n", "* Cloud Storage\n", "\n", "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and \n", "[Cloud Storage pricing](https://cloud.google.com/storage/pricing), and use the \n", "[Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "id": "install_aip" }, "source": [ "## Installation\n", "\n", "Install the packages required for executing this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PyQmSRbKA8r-" }, "outputs": [], "source": [ "# Install the packages\n", "! pip3 install --quiet --upgrade google-cloud-aiplatform \\\n", " google-cloud-storage" ] }, { "cell_type": "markdown", "metadata": { "id": "5eec42e37bcf" }, "source": [ "### Restart runtime (Colab only)\n", "To use the newly installed packages, you must restart the runtime on Google Colab." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "dcc98768955f" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", "    import IPython\n", "\n", "    app = IPython.Application.instance()\n", "    app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "4de1bd77992b" }, "source": [ "<div class=\"alert alert-block alert-warning\">\n", "<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>\n", "</div>" ] }, { "cell_type": "markdown", "metadata": { "id": "befa6ca14bc0" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "Authenticate your environment on Google Colab." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7de6ef0fac42" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", "    from google.colab import auth\n", "\n", "    auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "80b8daedb2c6" }, "source": [ "### Set Google Cloud project information\n", "To get started using Vertex AI, you must have an existing Google Cloud project. Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "575f9339da1d" }, "outputs": [], "source": [ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n", "LOCATION = \"us-central1\" # @param {type:\"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "bucket:mbsdk" }, "source": [ "### Create a Cloud Storage bucket\n", "\n", "Create a storage bucket to store intermediate artifacts such as datasets." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "bucket" }, "outputs": [], "source": [ "BUCKET_URI = (\n", " f\"gs://your-bucket-name-unique-{PROJECT_ID}-unique\" # @param {type:\"string\"}\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "create_bucket" }, "source": [ "**If your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "create_bucket" }, "outputs": [], "source": [ "! gsutil mb -l $LOCATION $BUCKET_URI" ] }, { "cell_type": "markdown", "metadata": { "id": "d89a66b8923f" }, "source": [ "### Copy dataset into your Cloud Storage bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "59a46204cddb" }, "outputs": [], "source": [ "IMPORT_FILE = \"petfinder-tabular-classification.csv\"\n", "! gsutil cp gs://cloud-samples-data/ai-platform-unified/datasets/tabular/{IMPORT_FILE} {BUCKET_URI}/data/\n", "\n", "gcs_source = f\"{BUCKET_URI}/data/{IMPORT_FILE}\"" ] }, { "cell_type": "markdown", "metadata": { "id": "Y9Uo3tifg1kx" }, "source": [ "### Import Vertex AI SDK for Python\n", "\n", "Import the Vertex AI SDK into your Python environment and initialize it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "pRUOFELefqf1" }, "outputs": [], "source": [ "from google.cloud import aiplatform\n", "\n", "aiplatform.init(project=PROJECT_ID, location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": { "id": "643dfd86b00d" }, "source": [ "## Tutorial\n", "\n", "Now you're ready to create your AutoML Tabular model." ] }, { "cell_type": "markdown", "metadata": { "id": "8f4f50a0112c" }, "source": [ "### Create a Managed tabular dataset from a CSV\n", "\n", "This section creates a dataset from a CSV file stored on your GCS bucket." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "f1eef64ee47b" }, "outputs": [], "source": [ "ds = aiplatform.TabularDataset.create(\n", "    display_name=\"petfinder-tabular-dataset\",\n", "    gcs_source=gcs_source,\n", ")\n", "\n", "ds.resource_name" ] }, { "cell_type": "markdown", "metadata": { "id": "ba5011d50ac7" }, "source": [ "### Launch a training job to create a model\n", "\n", "Once you've created the dataset, define an AutoML training job and run it to create a model. The `run` method creates a training pipeline that trains and creates a model object. After the training pipeline completes, the `run` method returns the model object." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "24c2c081d683" }, "outputs": [], "source": [ "job = aiplatform.AutoMLTabularTrainingJob(\n", "    display_name=\"train-petfinder-automl-1\",\n", "    optimization_prediction_type=\"classification\",\n", "    column_transformations=[\n", "        {\"categorical\": {\"column_name\": \"Type\"}},\n", "        {\"numeric\": {\"column_name\": \"Age\"}},\n", "        {\"categorical\": {\"column_name\": \"Breed1\"}},\n", "        {\"categorical\": {\"column_name\": \"Color1\"}},\n", "        {\"categorical\": {\"column_name\": \"Color2\"}},\n", "        {\"categorical\": {\"column_name\": \"MaturitySize\"}},\n", "        {\"categorical\": {\"column_name\": \"FurLength\"}},\n", "        {\"categorical\": {\"column_name\": \"Vaccinated\"}},\n", "        {\"categorical\": {\"column_name\": \"Sterilized\"}},\n", "        {\"categorical\": {\"column_name\": \"Health\"}},\n", "        {\"numeric\": {\"column_name\": \"Fee\"}},\n", "        {\"numeric\": {\"column_name\": \"PhotoAmt\"}},\n", "    ],\n", ")\n", "\n", "# This takes about an hour to run\n", "model = job.run(\n", "    dataset=ds,\n", "    target_column=\"Adopted\",\n", "    training_fraction_split=0.8,\n", "    validation_fraction_split=0.1,\n", "    test_fraction_split=0.1,\n", "    model_display_name=\"adopted-prediction-model\",\n", "    disable_early_stopping=False,\n", ")" ] }, { "cell_type": "markdown", 
"metadata": { "id": "93a4d034c73b" }, "source": [ "### Deploy your model\n", "\n", "Before you use your model to make predictions, you need to deploy it to an endpoint. You can do this by calling the `deploy` method on the model resource. This method does two things:\n", "\n", "1. Creates an endpoint resource to which the model resource is deployed.\n", "2. Deploys the model resource to the endpoint resource.\n", "\n", "**Note**: Wait until the model finishes deploying before you proceed to prediction." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "a371544057d9" }, "outputs": [], "source": [ "endpoint = model.deploy(\n", "    machine_type=\"n1-standard-4\",\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "fd44380b9ae3" }, "source": [ "### Predict on the endpoint\n" ] }, { "cell_type": "markdown", "metadata": { "id": "13f3e8aa27c0" }, "source": [ "* This sample instance is taken from an observation in which `Adopted` = **Yes**.\n", "* Note that the values are all strings. Since the original data was in CSV format, everything is treated as a string. 
The transformations you defined when creating your `AutoMLTabularTrainingJob` tell Vertex AI how to convert the inputs to their defined types.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "00c0d01dc8ae" }, "outputs": [], "source": [ "prediction = endpoint.predict(\n", "    [\n", "        {\n", "            \"Type\": \"Cat\",\n", "            \"Age\": \"3\",\n", "            \"Breed1\": \"Tabby\",\n", "            \"Gender\": \"Male\",\n", "            \"Color1\": \"Black\",\n", "            \"Color2\": \"White\",\n", "            \"MaturitySize\": \"Small\",\n", "            \"FurLength\": \"Short\",\n", "            \"Vaccinated\": \"No\",\n", "            \"Sterilized\": \"No\",\n", "            \"Health\": \"Healthy\",\n", "            \"Fee\": \"100\",\n", "            \"PhotoAmt\": \"2\",\n", "        }\n", "    ]\n", ")\n", "\n", "print(prediction)" ] }, { "cell_type": "markdown", "metadata": { "id": "154258dfb12f" }, "source": [ "### Undeploy the model\n", "\n", "To undeploy your model resource from the serving endpoint resource, use the endpoint's `undeploy` method with the following parameter:\n", "\n", "- `deployed_model_id`: The model deployment identifier returned by the prediction service when the model resource is deployed. You can retrieve the `deployed_model_id` using the prediction object's `deployed_model_id` property." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "186856f896fc" }, "outputs": [], "source": [ "endpoint.undeploy(deployed_model_id=prediction.deployed_model_id)" ] }, { "cell_type": "markdown", "metadata": { "id": "d7d2aa967f46" }, "source": [ "## Cleaning up\n", "\n", "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", "\n", "Otherwise, you can delete the individual resources you created in this tutorial:\n", "\n", "- Training Job\n", "- Model\n", "- Endpoint\n", "- Cloud Storage Bucket\n", "\n", "**Note**: You must delete any model resources deployed to the endpoint resource before deleting the endpoint resource." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1a9c201f8589" }, "outputs": [], "source": [ "# Warning: Setting this to True will delete everything in your bucket\n", "delete_bucket = False\n", "\n", "# Delete the training job\n", "job.delete()\n", "\n", "# Delete the model\n", "model.delete()\n", "\n", "# Delete the endpoint\n", "endpoint.delete()\n", "\n", "if delete_bucket:\n", "    ! gsutil -m rm -r $BUCKET_URI" ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "automl-tabular-classification.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }
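
The prediction cell in the notebook only prints the raw response. As a minimal sketch of reading it, the helper below picks the most likely class from one prediction row. It assumes each element of `prediction.predictions` is a mapping with parallel `classes` and `scores` lists; that shape is an assumption to verify against your own output. The name `top_class` and the sample row values are hypothetical and are not taken from the notebook.

```python
from typing import Any, Mapping, Sequence, Tuple


def top_class(prediction_row: Mapping[str, Sequence[Any]]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score in one row.

    Assumes the row has parallel "classes" and "scores" sequences.
    """
    classes = prediction_row["classes"]
    scores = prediction_row["scores"]
    if not classes or len(classes) != len(scores):
        raise ValueError("expected non-empty, equal-length 'classes' and 'scores'")
    # Index of the largest score; ties resolve to the first occurrence.
    best = max(range(len(scores)), key=lambda i: scores[i])
    return str(classes[best]), float(scores[best])


# Hypothetical row, standing in for one element of prediction.predictions.
sample_row = {"classes": ["Yes", "No"], "scores": [0.72, 0.28]}
label, score = top_class(sample_row)
print(f"Predicted Adopted={label} (score {score:.2f})")
```

With a live endpoint, the same helper could be applied to each row, for example `[top_class(row) for row in prediction.predictions]`, provided the rows have the assumed shape.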
