notebooks/official/workbench/predictive_maintainance/predictive_maintenance_usecase.ipynb

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "18ebbd838e32" }, "outputs": [], "source": [ "# Copyright 2022 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "aef73cfa8725" }, "source": [ "# Predictive maintenance using Vertex AI\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/workbench/predictive_maintainance/predictive_maintenance_usecase.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fworkbench%2Fpredictive_maintainance%2Fpredictive_maintenance_usecase.ipynb\">\n", " <img width=\"32px\" src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", "<a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/workbench/predictive_maintainance/predictive_maintenance_usecase.ipynb\" target='_blank'>\n", " <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Workbench\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/workbench/predictive_maintainance/predictive_maintenance_usecase.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", "</table>\n", "<br/><br/><br/>\n", "\n", "\n", "## Table of contents\n", "* [Overview](#section-1)\n", "* [Objective](#section-2)\n", "* [Dataset](#section-3)\n", "* [Costs](#section-4)\n", "* [Data analysis](#section-5)\n", "* [Fit a regression model](#section-6)\n", "* [Evaluate the trained model](#section-7)\n", "* [Save the model](#section-8)\n", "* [Running a notebook end-to-end using the executor](#section-9)\n", "* [Hosting the model on Vertex AI](#section-10)\n", " * [Create an endpoint](#section-11)\n", " * [Deploy the model to the created endpoint](#section-12)\n", " * [Test calling the endpoint](#section-13)\n", "* [Clean up](#section-14)" ] }, { "cell_type": "markdown", "metadata": { "id": "e10c5167a061" }, "source": [ "## Overview\n", "<a name=\"section-1\"></a>\n", "\n", "In this notebook, you 
work through a predictive maintenance use case on industrial data using machine learning techniques, deploy the machine learning model on Vertex AI, and automate the workflow using the executor feature of Vertex AI Workbench.\n", "\n", "*Note: This notebook was developed to run in a [Vertex AI Workbench managed notebooks](https://console.cloud.google.com/vertex-ai/workbench/list/managed) instance using the XGBoost (Local) kernel. Some components of this notebook may not work in other notebook environments.*\n", "\n", "Learn more about [Vertex AI Workbench](https://cloud.google.com/vertex-ai/docs/workbench/introduction) and [Vertex AI training](https://cloud.google.com/vertex-ai/docs/training/overview)." ] }, { "cell_type": "markdown", "metadata": { "id": "fead9e83ebd7" }, "source": [ "### Objective\n", "<a name=\"section-2\"></a>\n", "\n", "In this tutorial, you learn how to use the executor feature of Vertex AI Workbench to automate a workflow to train and deploy a model.\n", "\n", "This tutorial uses the following Google Cloud ML services:\n", "\n", "- Vertex AI training\n", "- Vertex AI model evaluation\n", "\n", "The steps performed are:\n", "\n", "- Loading the required dataset from a Cloud Storage bucket.\n", "- Analyzing the fields present in the dataset.\n", "- Selecting the required data for the predictive maintenance model.\n", "- Training an XGBoost regression model for predicting the remaining useful life.\n", "- Evaluating the model.\n", "- Running the notebook end-to-end as a training job using the executor.\n", "- Deploying the model on Vertex AI.\n", "- Cleaning up." ] }, { "cell_type": "markdown", "metadata": { "id": "a71f4d96bf80" }, "source": [ "### Dataset\n", "<a name=\"section-3\"></a>\n", "\n", "The dataset used in this notebook is part of the [NASA Turbofan Engine Degradation Simulation dataset](https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/), which consists of simulated time-series data for four fleets of engines under different combinations of operational conditions and fault modes. A version of this dataset saved to a public Cloud Storage bucket is used in this notebook. The simulated data for one of the fleets (FD001) is used to analyze and train a model that can predict the engines' remaining useful life." ] }, { "cell_type": "markdown", "metadata": { "id": "36c53c95b4b9" }, "source": [ "### Costs\n", "<a name=\"section-4\"></a>\n", "\n", "This tutorial uses the following billable components of Google Cloud:\n", "\n", "- Vertex AI\n", "- Cloud Storage\n", "\n", "Learn about [Vertex AI\n", "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", "Calculator](https://cloud.google.com/products/calculator/)\n", "to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "id": "629f52f6efe1" }, "source": [ "### Kernel selection\n", "Select the <b>XGBoost</b> kernel when running this notebook on a Vertex AI Workbench managed instance. 
Otherwise, ensure that the following libraries are installed in the environment where this notebook is run.\n", "- XGBoost\n", "- Pandas\n", "- Seaborn\n", "- Scikit-learn\n", "\n", "Along with the above libraries, the following Google Cloud client libraries are also used in this notebook.\n", "\n", "- google.cloud.aiplatform\n", "- google.cloud.storage" ] }, { "cell_type": "markdown", "metadata": { "id": "d1ea81ac77f0" }, "source": [ "## Get started" ] }, { "cell_type": "markdown", "metadata": { "id": "e5d353aa47ac" }, "source": [ "### Install Vertex AI SDK for Python and other required packages\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "69520a67e54c" }, "outputs": [], "source": [ "! pip3 install --quiet --upgrade google-cloud-aiplatform \\\n", " google-cloud-storage \\\n", " xgboost==1.7.1 \\\n", " seaborn \\\n", " scikit-learn \\\n", " fsspec \\\n", " gcsfs \\\n", " pandas -q" ] }, { "cell_type": "markdown", "metadata": { "id": "16220914acc5" }, "source": [ "### Restart runtime (Colab only)\n", "\n", "To use the newly installed packages, you must restart the runtime on Google Colab." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "157953ab28f0" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", " import IPython\n", "\n", " app = IPython.Application.instance()\n", " app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "c87a2a5d7e35" }, "source": [ "<div class=\"alert alert-block alert-warning\">\n", "<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>\n", "</div>\n" ] }, { "cell_type": "markdown", "metadata": { "id": "5dccb1c8feb6" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "Authenticate your environment on Google Colab.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cc7251520a07" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "c2fc3d7b6bfa" }, "source": [ "### Set Google Cloud project information and initialize Vertex AI SDK for Python\n", "\n", "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "be175254a715" }, "outputs": [], "source": [ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n", "\n", "# set the project id\n", "! gcloud config set project $PROJECT_ID\n", "\n", "LOCATION = \"us-central1\" # @param {type: \"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "e5755d1a554f" }, "source": [ "### Create a Cloud Storage bucket\n", "\n", "Create a storage bucket to store intermediate artifacts such as datasets."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d2de92accb67" }, "outputs": [], "source": [ "BUCKET_URI = f\"gs://your-bucket-name-{PROJECT_ID}-unique\" # @param {type:\"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "b72bfdf29dae" }, "source": [ "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "a4453435d115" }, "outputs": [], "source": [ "! gsutil mb -l {LOCATION} {BUCKET_URI}" ] }, { "cell_type": "markdown", "metadata": { "id": "06571eb4063b" }, "source": [ "#### UUID\n", "\n", "To avoid name collisions between users on created resources, create a UUID for each session instance. Append these UUIDs to the respective names of the resources created in this tutorial." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "697568e92bd6" }, "outputs": [], "source": [ "import random\n", "import string\n", "\n", "\n", "# Generate a uuid of a specifed length(default=8)\n", "def generate_uuid(length: int = 8) -> str:\n", " return \"\".join(random.choices(string.ascii_lowercase + string.digits, k=length))\n", "\n", "\n", "UUID = generate_uuid()" ] }, { "cell_type": "markdown", "metadata": { "id": "4c0f6aac282a" }, "source": [ "### Import the required libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "33f840806ed4" }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "\n", "%matplotlib inline\n", "import os\n", "\n", "import numpy as np\n", "import seaborn as sns\n", "import xgboost as xgb\n", "from google.cloud import aiplatform, storage\n", "from sklearn.metrics import mean_squared_error, r2_score\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "markdown", "metadata": { "id": "4df7ab06a373" }, "source": [ "## Initialize Vertex AI SDK for Python\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "32414d85ae7c" }, "outputs": [], "source": [ "aiplatform.init(project=PROJECT_ID, location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": { "id": "08bcba53eb99" }, "source": [ "Load the data and check the data shape." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2890f46a1907" }, "outputs": [], "source": [ "# load the data from the source\n", "INPUT_PATH = \"gs://cloud-samples-data/ai-platform-unified/datasets/tabular/predictive_maintenance.csv\" # data source\n", "raw_data = pd.read_csv(INPUT_PATH, sep=\" \", header=None)\n", "# check the data\n", "print(raw_data.shape)\n", "raw_data.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "8cfc304d35b5" }, "source": [ "The data itself doesn't contain any feature names and thus needs its columns to be renamed. The data source already provides some data description. Apparently, the <b>ID</b> column represents the unit-number of the fleet-engine and <b>Cycle</b> represents the time in cycles. <b>OpSet1</b>,<b>Opset2</b> & <b>Opset3</b> represent the three operational settings that are described in the original data source and have a substantial effect on engine performance. The rest of the fields show sensor readings collected from 21 different sensors." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "34251837c120" }, "outputs": [], "source": [ "# name the columns (based on the original data source page)\n", "raw_data = raw_data[[f for f in range(0, 26)]]\n", "raw_data.columns = [\n", " \"ID\",\n", " \"Cycle\",\n", " \"OpSet1\",\n", " \"OpSet2\",\n", " \"OpSet3\",\n", " \"SensorMeasure1\",\n", " \"SensorMeasure2\",\n", " \"SensorMeasure3\",\n", " \"SensorMeasure4\",\n", " \"SensorMeasure5\",\n", " \"SensorMeasure6\",\n", " \"SensorMeasure7\",\n", " \"SensorMeasure8\",\n", " \"SensorMeasure9\",\n", " \"SensorMeasure10\",\n", " \"SensorMeasure11\",\n", " \"SensorMeasure12\",\n", " \"SensorMeasure13\",\n", " \"SensorMeasure14\",\n", " \"SensorMeasure15\",\n", " \"SensorMeasure16\",\n", " \"SensorMeasure17\",\n", " \"SensorMeasure18\",\n", " \"SensorMeasure19\",\n", " \"SensorMeasure20\",\n", " \"SensorMeasure21\",\n", "]\n", "raw_data.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "da8c475ddc86" }, "source": [ "## Data Analysis\n", "<a name=\"section-5\"></a>\n", "The current dataset consists of timeseries data for various unit IDs. The data is represented in terms of cycles. Lets first see the distribution of number of cycles across the units." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e18ef3931be5" }, "outputs": [], "source": [ "# plot the cycle count for each IDs\n", "raw_data[[\"ID\", \"Cycle\"]].groupby(by=[\"ID\"]).count().plot(kind=\"bar\", figsize=(12, 5))" ] }, { "cell_type": "markdown", "metadata": { "id": "43c3f01352ad" }, "source": [ "On an average, there seem to be around 225 cycles per each ID in the dataset. Next, lets check the data types of the fields and the number of null records in the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d83cddab2b28" }, "outputs": [], "source": [ "# check the data-types\n", "raw_data.info()" ] }, { "cell_type": "markdown", "metadata": { "id": "da698b792397" }, "source": [ "The data doesn't have any null records or any categorical fields. Next, lets check the numerical distribution of the fields." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "18d8731e68aa" }, "outputs": [], "source": [ "# check the numerical characteristics of the data\n", "raw_data.describe().T" ] }, { "cell_type": "markdown", "metadata": { "id": "b632592001d5" }, "source": [ "Features **OpSet3**, **SensorMeasure1**, **SensorMeasure10**, **SensorMeasure18** & **SensorMeasure19** seem to be constant throughout the dataset and thus can be eliminated. Apart from the fields that are constant throughout the data, fields that are correlated highly can also be considered for dropping. Having highly correlated fields in the data often leads to multi-collinearity situation which unnecessarily increases the size of feature-space even if it doesn't affect the accuracy much. Such fields can be identified through correlation-matrices and heatmaps." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2d41a0e277b2" }, "outputs": [], "source": [ "# plot the correlation matrix\n", "plt.figure(figsize=(15, 10))\n", "cols = [\n", " i\n", " for i in raw_data.columns\n", " if i\n", " not in [\n", " \"ID\",\n", " \"Cycle\",\n", " \"OpSet3\",\n", " \"SensorMeasure1\",\n", " \"SensorMeasure10\",\n", " \"SensorMeasure18\",\n", " \"SensorMeasure19\",\n", " ]\n", "]\n", "corr_mat = raw_data[cols].corr()\n", "matrix = np.triu(corr_mat)\n", "\n", "sns.heatmap(corr_mat, annot=True, mask=matrix, fmt=\".1g\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "284debdf4294" }, "source": [ "Fields **SensorMeasure7**, **SensorMeasure12**, **SensorMeasure20** & **SensorMeasure21** correlate highly with many other fields. These fields can be omitted. Further, **SensorMeasure8**, **SensorMeasure11** and **SensorMeasure4** seem highly correlated with each other and so any one of them, for example, **SensorMeasure4**, can be kept and the rest can be omitted." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "40477e592ce7" }, "outputs": [], "source": [ "cols = [\n", " i\n", " for i in cols\n", " if i\n", " not in [\n", " \"SensorMeasure7\",\n", " \"SensorMeasure12\",\n", " \"SensorMeasure20\",\n", " \"SensorMeasure21\",\n", " \"SensorMeasure8\",\n", " \"SensorMeasure11\",\n", " ]\n", "]\n", "corr_mat = raw_data[cols].corr()\n", "matrix = np.triu(corr_mat)\n", "plt.figure(figsize=(9, 5))\n", "sns.heatmap(corr_mat, annot=True, mask=matrix, fmt=\".1g\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "8197cdef2cff" }, "source": [ "As the current objective is to predict the remaining useful life (RUL) of each unit (ID), the target variable needs to be identified. Since you're dealing with a timeseries data that represents the lifetime of a unit, remaining useful life of a unit can be calculated by subtracting the current cycle from the maximum cycle of that unit.\n", "\n", "\t\t\t\t\tRUL = Max. Cycle - Current Cycle \n", "## RUL calculation and Feature selection" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "b9cd2797cdae" }, "outputs": [], "source": [ "# get max-cycle of the ids\n", "cols = [\"ID\", \"Cycle\"] + cols\n", "max_cycles_df = (\n", " raw_data.groupby([\"ID\"], sort=False)[\"Cycle\"]\n", " .max()\n", " .reset_index()\n", " .rename(columns={\"Cycle\": \"MaxCycleID\"})\n", ")\n", "# merge back to original dataset\n", "FD001_df = pd.merge(raw_data, max_cycles_df, how=\"inner\", on=\"ID\")\n", "# calculate rul from max-cycle and current-cycle\n", "FD001_df[\"RUL\"] = FD001_df[\"MaxCycleID\"] - FD001_df[\"Cycle\"]" ] }, { "cell_type": "markdown", "metadata": { "id": "53a73de9ee27" }, "source": [ "To ensure that the target field is generated properly, the RUL field can be plotted." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ecd5aa0a130f" }, "outputs": [], "source": [ "# plot the RUL vs Cycles\n", "one_engine = []\n", "for i, r in FD001_df.iterrows():\n", " rul = r[\"RUL\"]\n", " one_engine.append(rul)\n", " if rul == 0:\n", " plt.plot(one_engine)\n", " one_engine = []\n", "\n", "plt.grid()" ] }, { "cell_type": "markdown", "metadata": { "id": "fc3b82355cdc" }, "source": [ "The above plot suggests that the RUL, in other words, the remaining cycles, is decreasing as the current cycle increases which is expected. Further, lets see the how the other fields relate to RUL in the current dataset." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "30284ff6c8ab" }, "outputs": [], "source": [ "# plot feature vs the RUL\n", "def plot_feature(feature):\n", " plt.figure(figsize=(10, 5))\n", " for i in FD001_df[\"ID\"].unique():\n", " if i % 10 == 0: # only plot every 10th ID\n", " plt.plot(\"RUL\", feature, data=FD001_df[FD001_df[\"ID\"] == i])\n", " plt.xlim(250, 0) # reverse the x-axis so RUL counts down to zero\n", " plt.xticks(np.arange(0, 275, 25))\n", " plt.ylabel(feature)\n", " plt.xlabel(\"RUL\")\n", " plt.show()\n", "\n", "\n", "for i in cols:\n", " if i not in [\"ID\", \"Cycle\"]:\n", " plot_feature(i)" ] }, { "cell_type": "markdown", "metadata": { "id": "828154ec8d90" }, "source": [ "The following set of observations can be made from the outcome of the above cell :\n", "- Fields **SensorMeasure5** and **SensorMeasure16** don't show much variance with the RUL and seem constant all the time. Hence, they can be removed.\n", "- Fields **SensorMeasure2**, **SensorMeasure3**, **SensorMeasure4**, **SensorMeasure13**, **SensorMeasure15** & **SensorMeasure17** show a similar rising trend.\n", "- **SensorMeasure9** and **SensorMeasure14** show a similar trend.\n", "- **SensorMeasure6** shows a flatline most of the time except in a very few places and therefore can be ignored." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "28395edac8f7" }, "outputs": [], "source": [ "# remove the unnecessary fields\n", "cols = [\n", " i\n", " for i in cols\n", " if i not in [\"ID\", \"SensorMeasure5\", \"SensorMeasure6\", \"SensorMeasure16\"]\n", "]\n", "cols" ] }, { "cell_type": "markdown", "metadata": { "id": "cae198bd96ef" }, "source": [ "## Split the data into train and test\n", "\n", "Divide the dataset with the selected features into train and test sets." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "40a38e17b6b0" }, "outputs": [], "source": [ "# split data into train and test\n", "X = FD001_df[cols].copy()\n", "y = FD001_df[\"RUL\"].copy()\n", "\n", "# split the data into 70-30 ratio of train-test\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, train_size=0.7, random_state=36\n", ")\n", "X_train.shape, y_train.shape, X_test.shape, y_test.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "43a26d74c687" }, "source": [ "## Fit a regression model\n", "<a name=\"section-6\"></a>\n", "\n", "Initialize and train a regression model using the XGBoost library with the calculated RUL as the target feature." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "306a693a00ff" }, "outputs": [], "source": [ "model = xgb.XGBRegressor()\n", "model.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": { "id": "ef2a6b567e33" }, "source": [ "## Evaluate the trained model\n", "<a name=\"section-7\"></a>\n", "\n", "Check the R2 scores of the model on train and test sets." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e30af6e41081" }, "outputs": [], "source": [ "# print test R2 score\n", "y_train_pred = model.predict(X_train)\n", "train_score = r2_score(y_train, y_train_pred)\n", "y_test_pred = model.predict(X_test)\n", "test_score = r2_score(y_test, y_test_pred)\n", "print(\"Train score:\", train_score)\n", "print(\"Test score:\", test_score)" ] }, { "cell_type": "markdown", "metadata": { "id": "a9e00d61b71c" }, "source": [ "Check the RMSE errors on train and test sets." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5e32f5bcd6b0" }, "outputs": [], "source": [ "# print train and test RMSEs\n", "train_error = mean_squared_error(y_train, y_train_pred, squared=False)\n", "test_error = mean_squared_error(y_test, y_test_pred, squared=False)\n", "print(\"Train error:\", train_error)\n", "print(\"Test error:\", test_error)" ] }, { "cell_type": "markdown", "metadata": { "id": "18e4cf950007" }, "source": [ "Plot the predicted values against the target values. The closer the plot to a straight line passing through origin with a unit slope, the better the model. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5bde609ba3f9" }, "outputs": [], "source": [ "# plot the train and test predictions\n", "plt.scatter(y_train, y_train_pred)\n", "plt.xlabel(\"Target\")\n", "plt.ylabel(\"Prediction\")\n", "plt.title(\"Train\")\n", "plt.show()\n", "plt.scatter(y_test, y_test_pred)\n", "plt.xlabel(\"Target\")\n", "plt.ylabel(\"Prediction\")\n", "plt.title(\"Test\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "cbbcd5de298f" }, "source": [ "## Save the model\n", "<a name=\"section-8\"></a>\n", "\n", "Save the model to a booster file." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0fdd760d9a74" }, "outputs": [], "source": [ "# save the trained model to a local file \"model.bst\"\n", "FILE_NAME = \"model.bst\"\n", "model.save_model(FILE_NAME)" ] }, { "cell_type": "markdown", "metadata": { "id": "a83c6f381722" }, "source": [ "Copy the model to the cloud-storage bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "81fa220c7fd4" }, "outputs": [], "source": [ "# Upload the saved model file to Cloud Storage\n", "BLOB_PATH = \"mfg_predictive_maintenance/\"\n", "BLOB_NAME = os.path.join(BLOB_PATH, FILE_NAME)\n", "bucket = storage.Client().bucket(BUCKET_URI[5:])\n", "blob = bucket.blob(BLOB_NAME)\n", "blob.upload_from_filename(FILE_NAME)" ] }, { "cell_type": "markdown", "metadata": { "id": "4bd88d7f4bbb" }, "source": [ "## Running a notebook end-to-end using executor\n", "<a name=\"section-9\"></a>\n", "\n", "**Note:** This section can only be considered when running this notebook on Managed instances from Vertex AI Workbench.\n", "### Automating the notebook execution\n", "All the steps followed until now can be run as a training job without using any additional code using the Vertex AI Workbench executor. The executor can help you run a notebook file from start to end, with your choice of the environment, machine type, input parameters, and other characteristics. After setting up an execution, the notebook is executed as a job in Vertex AI custom training. Your jobs can be monitored from the Executor pane in the left sidebar.\n", "\n", "<img src=\"images/executor.PNG\">\n", "\n", "The executor also lets you choose the environment and machine type while automating the runs similar to Vertex AI training jobs without switching to the training jobs UI. Apart from the custom container that replicates the existing kernel by default, pre-built environments like TensorFlow Enterprise, PyTorch, and others can also be selected to run the notebook. The required compute power can be specified by choosing from the list of machine types available, including GPUs.\n", "\n", "### Scheduled runs on executor\n", "\n", "Notebook runs can also be scheduled recurringly with the executor. 
To do so, select Schedule-based recurring executions as the run type instead of One-time execution. The frequency of the job and the time when it executes are provided when you create the execution.\n", "\n", "<img src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/7_Vertex_AI_Workbench.max-1100x1100.jpg\">\n", "\n", "### Parameterizing the variables\n", "\n", "The executor lets you run a notebook with different sets of input parameters. If you’ve added parameter tags to any of your notebook cells, you can pass your parameter values to the executor. Learn more about how to use this feature in this [blog post](https://cloud.google.com/blog/products/ai-machine-learning/schedule-and-execute-notebooks-with-vertex-ai-workbench).\n", "\n", "<img src=\"https://storage.googleapis.com/gweb-cloudblog-publish/images/6_Vertex_AI_Workbench.max-700x700.jpg\">\n" ] }, { "cell_type": "markdown", "metadata": { "id": "57522cb1d5e7" }, "source": [ "## Hosting the model on Vertex AI\n", "<a name=\"section-10\"></a>\n", "\n", "### Create a model resource\n", "\n", "The model saved to Cloud Storage can be deployed easily using the Vertex AI SDK. To do so, first create a model resource." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "246206c9eedc" }, "outputs": [], "source": [ "ARTIFACT_GCS_PATH = f\"{BUCKET_URI}/{BLOB_PATH}\"" ] }, { "cell_type": "markdown", "metadata": { "id": "1aa75b3d4616" }, "source": [ "Give a display name to the Vertex AI model resource." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "02ca350dba6c" }, "outputs": [], "source": [ "# Set the model display name\n", "MODEL_DISPLAY_NAME = \"[your-model-display-name]\" # @param {type:\"string\"}\n", "\n", "# Otherwise, use the default name\n", "if (\n", " MODEL_DISPLAY_NAME == \"[your-model-display-name]\"\n", " or MODEL_DISPLAY_NAME is None\n", " or MODEL_DISPLAY_NAME == \"\"\n", "):\n", " MODEL_DISPLAY_NAME = \"pred_maint_model_\" + UUID\n", "\n", "print(MODEL_DISPLAY_NAME)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "49f1acde47e3" }, "outputs": [], "source": [ "# Create a Vertex AI model resource\n", "\n", "model = aiplatform.Model.upload(\n", " display_name=MODEL_DISPLAY_NAME,\n", " artifact_uri=ARTIFACT_GCS_PATH,\n", " serving_container_image_uri=\"us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-7:latest\",\n", ")\n", "\n", "model.wait()\n", "\n", "print(model.display_name)\n", "print(model.resource_name)" ] }, { "cell_type": "markdown", "metadata": { "id": "7784e605fe20" }, "source": [ "### Create an endpoint\n", "<a name=\"section-11\"></a>\n", "\n", "\n", "Next, create an endpoint resource for deploying the model."
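, "\n", "\n", "If you re-run this notebook, you may prefer to reuse an endpoint rather than create a new one each time. A minimal sketch, assuming `ENDPOINT_DISPLAY_NAME` is already defined as in the next cell:\n", "\n", "```python\n", "# Sketch: reuse an existing endpoint with a matching display name, if any\n", "matches = aiplatform.Endpoint.list(filter=f'display_name=\"{ENDPOINT_DISPLAY_NAME}\"')\n", "endpoint = matches[0] if matches else aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)\n", "```"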
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e1e0cd571992" }, "outputs": [], "source": [ "# Set the endpoint-dsiplay-name\n", "ENDPOINT_DISPLAY_NAME = \"[your-endpoint-display-name]\" # @param {type:\"string\"}\n", "\n", "# Otherwise, use the default name\n", "if (\n", " ENDPOINT_DISPLAY_NAME == \"[your-endpoint-display-name]\"\n", " or ENDPOINT_DISPLAY_NAME is None\n", " or ENDPOINT_DISPLAY_NAME == \"\"\n", "):\n", " ENDPOINT_DISPLAY_NAME = \"pred_maint_endpoint_\" + UUID\n", "\n", "print(ENDPOINT_DISPLAY_NAME)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6ae2e3b9555e" }, "outputs": [], "source": [ "# Create the Endpoint resource\n", "endpoint = aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)\n", "\n", "print(endpoint.display_name)\n", "print(endpoint.resource_name)" ] }, { "cell_type": "markdown", "metadata": { "id": "08128843a059" }, "source": [ "### Deploy the model to the created Endpoint\n", "<a name=\"section-12\"></a>\n", "\n", "\n", "Configure the following parameters and deploy the model to the created endpoint.\n", "\n", "- `endpoint`: The `Endpoint` object created using Vertex AI SDK.\n", "- `deployed_model_display_name`: A display-name for the deployment.\n", "- `machine_type`: Type of the machine required for the deployment environment. See [here](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute) for references." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3c3e7597d233" }, "outputs": [], "source": [ "# deploy the model to the endpoint\n", "model.deploy(\n", " endpoint=endpoint,\n", " deployed_model_display_name=MODEL_DISPLAY_NAME + \"_deployment\",\n", " machine_type=\"n1-standard-2\",\n", ")\n", "\n", "model.wait()\n", "\n", "print(model.display_name)\n", "print(model.resource_name)" ] }, { "cell_type": "markdown", "metadata": { "id": "5d24eb85cf12" }, "source": [ "## Test calling the endpoint\n", "<a name=\"section-13\"></a>\n", "\n", "Send some sample data to the deployed model on the endpoint to get predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "fb6b0fc29a16" }, "outputs": [], "source": [ "# get predictions on sample data\n", "instances = X_test.iloc[0:2].to_numpy().tolist()\n", "print(endpoint.predict(instances=instances).predictions)" ] }, { "cell_type": "markdown", "metadata": { "id": "3cec4549f32c" }, "source": [ "## Clean up\n", "<a name=\"section-14\"></a>\n", "\n", "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", "\n", "Otherwise, you can delete the individual resources you created in this tutorial:\n", "* Vertex AI Model\n", "* Vertex AI Endpoint\n", "* Cloud Storage bucket\n", "\n", "Set `delete_bucket` to **True** to delete the Cloud Storage bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "737a7b8eb6d6" }, "outputs": [], "source": [ "# Undeploy all the models from the endpoint\n", "endpoint.undeploy_all()\n", "\n", "# Delete the endpoint resource\n", "endpoint.delete()\n", "\n", "# Delete the model resource\n", "model.delete()\n", "\n", "# Delete the Cloud Storage bucket\n", "delete_bucket = False\n", "if delete_bucket:\n", " ! 
gsutil -m rm -r $BUCKET_URI\n", "\n", "!rm -rf model.bst" ] } ], "metadata": { "colab": { "name": "predictive_maintenance_usecase.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }