notebooks/official/workbench/fraud_detection/fraud-detection-model.ipynb

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "5fcd3e4da897" }, "outputs": [], "source": [ "# Copyright 2022 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "05c670d35496" }, "source": [ "# Build a fraud detection model on Vertex AI\n", "\n", "<table align=\"left\">\n", "\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/workbench/fraud_detection/fraud-detection-model.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"\n", "https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fvertex-ai-samples%2Fmain%2Fnotebooks%2Fofficial%2Fworkbench%2Ffraud_detection%2Ffraud-detection-model.ipynb\">\n", " <img width=\"32px\" src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", "<a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/vertex-ai-samples/main/notebooks/official/workbench/fraud_detection/fraud-detection-model.ipynb\" target='_blank'>\n", " <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Workbench\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/workbench/fraud_detection/fraud-detection-model.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", "</table>" ] }, { "cell_type": "markdown", "metadata": { "id": "962e636b5cee" }, "source": [ "**_NOTE_**: This notebook has been tested in the following environment:\n", "\n", "* Python version = 3.9" ] }, { "cell_type": "markdown", "metadata": { "id": "b7fdf2b19b83" }, "source": [ "## Overview\n", "\n", "This tutorial shows you how to build, deploy, and analyze predictions from a simple [Random Forest](https://en.wikipedia.org/wiki/Random_forest) model using tools like scikit-learn, Vertex AI, and the [What-IF Tool (WIT)](https://cloud.google.com/ai-platform/prediction/docs/using-what-if-tool) on a synthetic fraud transaction dataset to solve a financial fraud detection problem.\n", "\n", "**Note:** The What-If tool widget used in this notebook only runs in a Colab environment. It isn't explicitly supported for Vertex AI user-managed notebooks instances. \n", "\n", "Learn more about [Vertex AI Workbench](https://cloud.google.com/vertex-ai/docs/workbench/introduction) and [Custom training](https://cloud.google.com/vertex-ai/docs/training/overview)." ] }, { "cell_type": "markdown", "metadata": { "id": "411d886b6144" }, "source": [ "### Objective\n", "\n", "This tutorial demonstrates data analysis and model-building using a synthetic financial dataset. The model is trained on identifying fraudulent cases among the transactions. Then, the trained model is deployed on a Vertex AI Endpoint and analyzed using the What-If Tool. The steps taken in this tutorial are as follows: \n", "\n", "This tutorial uses the following Google Cloud ML services and resources:\n", "\n", "- Vertex AI Model\n", "- Vertex AI Endpoint\n", "\n", "The steps performed include:\n", "\n", "- Installation of required libraries\n", "- Reading the dataset from a Cloud Storage bucket\n", "- Performing exploratory analysis on the dataset\n", "- Preprocessing the dataset\n", "- Training a random forest model using scikit-learn\n", "- Saving the model to a Cloud Storage bucket\n", "- Creating a Vertex AI model resource and deploying to an endpoint\n", "- Running the What-If Tool on test data\n", "- Un-deploying the model and cleaning up the model resources" ] }, { "cell_type": "markdown", "metadata": { "id": "3037523e7523" }, "source": [ "### Dataset\n", "\n", "\n", "The dataset used in this tutorial is publicly available at Kaggle. See [Synthetic Financial Datasets For Fraud Detection](https://www.kaggle.com/ealaxi/paysim1)." ] }, { "cell_type": "markdown", "metadata": { "id": "65f7cca50e5c" }, "source": [ "### Costs\n", "\n", "\n", "This tutorial uses billable components of Google Cloud:\n", "\n", "* Vertex AI\n", "* Cloud Storage\n", "\n", "Learn about [Vertex AI\n", "pricing](https://cloud.google.com/vertex-ai/pricing) and [Cloud Storage\n", "pricing](https://cloud.google.com/storage/pricing), and use the [Pricing\n", "Calculator](https://cloud.google.com/products/calculator/)\n", "to generate a cost estimate based on your projected usage. " ] }, { "cell_type": "markdown", "metadata": { "id": "d1ea81ac77f0" }, "source": [ "## Get started" ] }, { "cell_type": "markdown", "metadata": { "id": "e5d353aa47ac" }, "source": [ "### Install Vertex AI SDK for Python and other required packages\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "172533a994ad" }, "outputs": [], "source": [ "! pip3 install --upgrade --quiet google-cloud-aiplatform \\\n", " witwidget \\\n", " fsspec \\\n", " gcsfs\n", "! pip3 install --quiet scikit-learn==1.2 \\\n", " protobuf==3.20.1" ] }, { "cell_type": "markdown", "metadata": { "id": "16220914acc5" }, "source": [ "### Restart runtime (Colab only)\n", "\n", "To use the newly installed packages, you must restart the runtime on Google Colab." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "157953ab28f0" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", " import IPython\n", "\n", " app = IPython.Application.instance()\n", " app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "c87a2a5d7e35" }, "source": [ "<div class=\"alert alert-block alert-warning\">\n", "<b>⚠️ The kernel is going to restart. Wait until it's finished before continuing to the next step. ⚠️</b>\n", "</div>\n" ] }, { "cell_type": "markdown", "metadata": { "id": "5dccb1c8feb6" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "Authenticate your environment on Google Colab.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cc7251520a07" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", "\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "c2fc3d7b6bfa" }, "source": [ "### Set Google Cloud project information and initialize Vertex AI SDK for Python\n", "\n", "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com). Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "be175254a715" }, "outputs": [], "source": [ "PROJECT_ID = \"[your-project-id]\" # @param {type:\"string\"}\n", "\n", "# set the project id\n", "! gcloud config set project $PROJECT_ID\n", "\n", "LOCATION = \"us-central1\" # @param {type: \"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "e5755d1a554f" }, "source": [ "### Create a Cloud Storage bucket\n", "\n", "Create a storage bucket to store intermediate artifacts such as datasets." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d2de92accb67" }, "outputs": [], "source": [ "BUCKET_URI = f\"gs://your-bucket-name-{PROJECT_ID}-unique\" # @param {type:\"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "b72bfdf29dae" }, "source": [ "**Only if your bucket doesn't already exist**: Run the following cell to create your Cloud Storage bucket." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "a4453435d115" }, "outputs": [], "source": [ "! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}" ] }, { "cell_type": "markdown", "metadata": { "id": "4bf0287233a6" }, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e4c19cf811b7" }, "outputs": [], "source": [ "import os\n", "import pickle\n", "import sys\n", "import warnings\n", "\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "from google.cloud import aiplatform, storage\n", "from IPython.display import display\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.metrics import (average_precision_score, classification_report,\n", " confusion_matrix, f1_score)\n", "from sklearn.model_selection import train_test_split\n", "from witwidget.notebook.visualization import WitConfigBuilder, WitWidget\n", "\n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": { "id": "5abe40a9d335" }, "source": [ "### Initialize Vertex AI SDK for Python\n", "\n", "Initialize the Vertex AI SDK for Python for your project." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6a16f0d6e5a1" }, "outputs": [], "source": [ "aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)" ] }, { "cell_type": "markdown", "metadata": { "id": "fdcb614c716f" }, "source": [ "## Load dataset\n", "\n", "Load the dataset from the public csv file path using Pandas." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ef87c3afe3a3" }, "outputs": [], "source": [ "# set the dataset path\n", "DATASET_SOURCE_PATH = \"gs://cloud-samples-data/vertex-ai/managed_notebooks/fraud_detection/fraud_detection_data.csv\"\n", "# read the csv data using pandas\n", "df = pd.read_csv(DATASET_SOURCE_PATH)" ] }, { "cell_type": "markdown", "metadata": { "id": "5467471277e9" }, "source": [ "## Analyze the dataset\n", "Take a quick look at the dataset and the number of rows." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2439f8994596" }, "outputs": [], "source": [ "# print the shape of dataframe\n", "print(\"shape : \", df.shape)\n", "# display the dataframe\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "677d10ff3a8f" }, "source": [ "Check for null values." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "f1cf29145731" }, "outputs": [], "source": [ "# print the total null count per column\n", "df.isnull().sum()" ] }, { "cell_type": "markdown", "metadata": { "id": "7841d9921c72" }, "source": [ "Check the type of transactions involved and total amount associated with each type." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0af58be0e70d" }, "outputs": [], "source": [ "# check value counts for type\n", "print(df.type.value_counts())\n", "# show total amount per type as a bar chart\n", "var = df[[\"type\", \"amount\"]].groupby(\"type\").sum()\n", "var.plot(kind=\"bar\")\n", "plt.title(\"Total amount per transaction type\")\n", "plt.xlabel(\"Type of Transaction\")\n", "plt.ylabel(\"Amount\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "3ea666cc9ea7" }, "source": [ "## Working with imbalanced data\n", "\n", "Althuogh the outcome variable \"isFraud\" seems to be very imbalanced, a base model can be trained on it to check the quality of fraudulent transactions in the data. If needed, counter measures like undersampling of majority class or oversampling of the minority class can be considered." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "62c36cf345a4" }, "outputs": [], "source": [ "# Count number of fraudulent/non-fraudulent transactions\n", "df.isFraud.value_counts()" ] }, { "cell_type": "markdown", "metadata": { "id": "6fa96dc368fb" }, "source": [ "Show the percentage of fraudulent transactions detected as a pie chart." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4792737cfe01" }, "outputs": [], "source": [ "# plot the percentage of frauds as a pie chart\n", "piedata = df[[\"isFlaggedFraud\", \"isFraud\"]].groupby([\"isFlaggedFraud\"]).sum()\n", "f, axes = plt.subplots(1, 1, figsize=(6, 6))\n", "axes.set_title(\"% of fraud transaction detected\")\n", "piedata.plot(\n", " kind=\"pie\", y=\"isFraud\", ax=axes, fontsize=14, shadow=False, autopct=\"%1.1f%%\"\n", ")\n", "axes.set_ylabel(\"\")\n", "plt.legend(loc=\"upper left\", labels=[\"Not Detected\", \"Detected\"])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "id": "6d2de35a680d" }, "source": [ "## Prepare data for modeling\n", "To prepare the dataset for training, a few columns need to be dropped that contain unique data ('nameOrig','nameDest') and redundant fields ('isFlaggedFraud'). The categorical field \"type\" which describes the type of transaction and is important for fraud detection needs to be one-hot encoded." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6f3ff2800803" }, "outputs": [], "source": [ "# drop the unnecessary fields\n", "df.drop([\"nameOrig\", \"nameDest\", \"isFlaggedFraud\"], axis=1, inplace=True)\n", "# encode the \"type\" field\n", "X = pd.concat([df.drop(\"type\", axis=1), pd.get_dummies(df[\"type\"])], axis=1)\n", "X.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "32d8675912f6" }, "source": [ "Remove the outcome variable from the training data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d28694b6532e" }, "outputs": [], "source": [ "# copy the target data\n", "y = X[[\"isFraud\"]]\n", "# remove the target field from the features\n", "X = X.drop([\"isFraud\"], axis=1)" ] }, { "cell_type": "markdown", "metadata": { "id": "a86d250f6e89" }, "source": [ "Split the data and assign 70% for training and 30% for testing. \n", "\n", "For splitting, you specify the following parameters to Sklearn's `train_test_split` method:\n", "\n", "- `*arrays`: The feature array(X) and the target array(y).\n", "- `test_size`: Percentage(float) or number(integer) of test samples.\n", "- `random_state`: Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.\n", "- `stratify`: If none, no stratified sampling is performed.\n", "\n", "As the data is imbalanced, you use stratified sampling while splitting. Learn more about [stratified sampling and other parameters for train-test-splitting](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9a1c56f01b4b" }, "outputs": [], "source": [ "# split the data into train and test\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X, y, test_size=0.3, random_state=42, stratify=y\n", ")\n", "# check the data shapes\n", "print(X_train.shape, X_test.shape)" ] }, { "cell_type": "markdown", "metadata": { "id": "c2755e0dcfa2" }, "source": [ "## Fit a Random Forest model\n", "\n", "Fit a simple Random Forest classifier on the preprocessed training dataset.\n", "\n", "Note: Setting `n_jobs` to -1 while defining the `RandomForestClassifier` object allows it to parallelize the training process using all processors. \n", "\n", "Learn more about [Random Forest algorithm](https://en.wikipedia.org/wiki/Random_forest) and Sklearn's [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2e01c5fc5df7" }, "outputs": [], "source": [ "# create a randomforestclassifier object\n", "forest = RandomForestClassifier(n_jobs=-1, verbose=1)\n", "# fit the model on the data\n", "forest.fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": { "id": "dcd6f1e38b31" }, "source": [ "## Analyze the results\n", "\n", "Generate the prediction classes and the probabilty scores on test data.\n", "\n", "Evaluate the model using the following metrics:\n", "\n", "- `AP`: Average precision summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight.\n", "- F1 Score: The F1 score is the harmonic mean of the precision and recall.\n", "- Confusion matrix: Matrix indicating the true positives, true negatives, false positives and false negatives predicted.\n", "- Classification report: Sklearn's classification report is a text report showing the main classification metrics like precision, recall, f1score, accuracy, and weighted and macro averages of those metrics.\n", "\n", "Learn more about [Sklearn metrics](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2cff9b77c9f7" }, "outputs": [], "source": [ "# predict probability of fraudulent transactions over test set\n", "y_prob = forest.predict_proba(X_test)\n", "# predict the classes over test set\n", "y_pred = forest.predict(X_test)\n", "# check the average precision score\n", "print(\"AP :\", (average_precision_score(y_test, y_prob[:, 1])))\n", "# check the f1-score\n", "print(\"F1 - score :\", (f1_score(y_test, y_pred)))\n", "# print the confusion matrix\n", "print(\"Confusion_matrix : \")\n", "print(confusion_matrix(y_test, y_pred))\n", "# print the classification report\n", "print(\"classification_report\")\n", "print(classification_report(y_test, y_pred))" ] }, { "cell_type": "markdown", "metadata": { "id": "5f9309a5c6de" }, "source": [ "Use `RandomForestClassifier`'s `feature_importances_` function to get a better understanding about which features are the most useful to the model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e4e8d572079c" }, "outputs": [], "source": [ "importances = forest.feature_importances_\n", "std = np.std([tree.feature_importances_ for tree in forest.estimators_], axis=0)\n", "forest_importances = pd.Series(importances, index=list(X_train))\n", "fig, ax = plt.subplots()\n", "forest_importances.plot.bar(yerr=std, ax=ax)\n", "ax.set_title(\"Feature Importance for Fraud Transaction Detection Model\")\n", "ax.set_ylabel(\"Importance\")\n", "fig.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "id": "f96d2120eaf7" }, "source": [ "## Save the model to Cloud Storage\n", "\n", "Save your model to a pickle file and then, upload your model to Cloud Storage bucket. The uploaded model path is later used for creating a model in the Vertex AI Model Registry.\n", "\n", "Note: You can also upload the model to Vertex AI Model Registry from your local environment using the latest Vertex AI SDK for Python." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "53b6d3053566" }, "outputs": [], "source": [ "# save the trained model to a local file\n", "LOCAL_FILE_NAME = \"model.pkl\"\n", "with open(LOCAL_FILE_NAME, \"wb\") as file:\n", " pickle.dump(forest, file)\n", "\n", "# Upload the saved model file to Cloud Storage\n", "BLOB_PATH = \"fraud-detect-model-path-unique\" # @param {type:\"string\"}\n", "BLOB_NAME = os.path.join(BLOB_PATH, LOCAL_FILE_NAME)\n", "\n", "bucket = storage.Client(PROJECT_ID).bucket(BUCKET_URI[5:])\n", "blob = bucket.blob(BLOB_NAME)\n", "blob.upload_from_filename(LOCAL_FILE_NAME)" ] }, { "cell_type": "markdown", "metadata": { "id": "624a66e36aef" }, "source": [ "## Create a model in Vertex AI\n", "\n", "Set the parameters required for model creation in Vertex AI Model Registry." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5604b728b7bb" }, "outputs": [], "source": [ "# set model display name\n", "MODEL_DISPLAY_NAME = \"fraud-detection-model-unique\" # @param {type:\"string\"}\n", "# set the GCS path to the model artifact\n", "ARTIFACT_GCS_PATH = f\"{BUCKET_URI}/{BLOB_PATH}\"\n", "# set the prediction container uri\n", "SERVING_CONTAINER_IMAGE_URI = (\n", " \"us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-2:latest\"\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "fa7f96efe3c5" }, "source": [ "Create a model resouce in Vertex AI using the `Model.upload` method.\n", "\n", "Learn more about [Vertex AI Model Registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "a6629488879c" }, "outputs": [], "source": [ "# create a Vertex AI model resource\n", "model = aiplatform.Model.upload(\n", " display_name=MODEL_DISPLAY_NAME,\n", " artifact_uri=ARTIFACT_GCS_PATH,\n", " serving_container_image_uri=SERVING_CONTAINER_IMAGE_URI,\n", ")\n", "# print the model's display name\n", "print(\"Display name:\\n\", model.display_name)\n", "# print the model's resource name\n", "print(\"Resource name:\\n\", model.resource_name)" ] }, { "cell_type": "markdown", "metadata": { "id": "208e1e07c9f6" }, "source": [ "## Create an endpoint\n", "\n", "Set the display name and create an endpoint for deploying the model.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "224016f69665" }, "outputs": [], "source": [ "# set the endpoint display name\n", "ENDPOINT_DISPLAY_NAME = \"fraud-detect-endpoint-unique\" # @param {type:\"string\"}\n", "# create the Endpoint\n", "endpoint = aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)\n", "# print the endpoint display name\n", "print(\"Display name:\\n\", endpoint.display_name)\n", "# print the endpoint resource name\n", "print(\"Resource name:\\n\", endpoint.resource_name)" ] }, { "cell_type": "markdown", "metadata": { "id": "fb18c0e6f2ce" }, "source": [ "### Deploy model to the endpoint\n", "\n", "Set the following parameters for endpoint deployment:\n", "\n", "- `endpoint`: The Vertex AI Endpoint resource created in the last step.\n", "- `deployed_model_display_name`: Display name for the model. If not provided, model's display name is used.\n", "- `machine_type`: Machine type required for serving the model on the endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7fcb85a9c097" }, "outputs": [], "source": [ "# set the display name for the deployed model\n", "DEPLOYED_MODEL_NAME = \"fraud-detection-deployed-model\"\n", "# set the machine type for the endpoint\n", "MACHINE_TYPE = \"n1-standard-2\"" ] }, { "cell_type": "markdown", "metadata": { "id": "fbc1fa101b59" }, "source": [ "Deploy the model to the created endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "32a896d705ef" }, "outputs": [], "source": [ "# deploy the model to the endpoint\n", "model.deploy(\n", " endpoint=endpoint,\n", " deployed_model_display_name=DEPLOYED_MODEL_NAME,\n", " machine_type=MACHINE_TYPE,\n", ")\n", "# print the model display name\n", "print(model.display_name)\n", "# print the model resource name\n", "print(model.resource_name)" ] }, { "cell_type": "markdown", "metadata": { "id": "c70d82a5f6d5" }, "source": [ "## What-If Tool \n", "\n", "The What-If Tool can be used to analyze the model predictions on a test data. See a [brief introduction to the What-If Tool](https://pair-code.github.io/what-if-tool/). \n", "\n", "In this tutorial, the What-If Tool is configured and run with the locally trained model as well as the model deployed on Vertex AI Endpoint.\n", "\n", "[WitConfigBuilder](https://github.com/PAIR-code/what-if-tool/blob/master/witwidget/notebook/visualization.py#L30) provides the `set_ai_platform_model()` method to configure the What-If Tool with a model deployed as a version on AI Platform models. This feature currently supports AI Platform only but not Vertex AI models. However, there is also an option to pass a custom function for generating predictions through the `set_custom_predict_fn()` method where either the locally trained model or a function that returns predictions from a Vertex AI model can be passed." ] }, { "cell_type": "markdown", "metadata": { "id": "d0b3caf9b49e" }, "source": [ "### Prepare test samples\n", "\n", "Set some samples aside from the test data for both the available classes (Fraud/not-Fraud) to analyze the model using the What-If Tool." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "c07ef36a521a" }, "outputs": [], "source": [ "# set sample size\n", "SAMPLE = 10\n", "\n", "# collect samples for each class-label from the test data\n", "pos_samples = y_test[y_test[\"isFraud\"] == 1].sample(SAMPLE).index\n", "neg_samples = y_test[y_test[\"isFraud\"] == 0].sample(SAMPLE).index\n", "test_samples_y = pd.concat([y_test.loc[pos_samples], y_test.loc[neg_samples]])\n", "test_samples_X = X_test.loc[test_samples_y.index].copy()" ] }, { "cell_type": "markdown", "metadata": { "id": "0691c01e8fee" }, "source": [ "### Running the What-If Tool on the local model\n", "\n", "Build the What-IF tool widget using the local model's `predict_proba` function.\n", "\n", "The following step generates an interactive widget for analyzing the predictions. \n", "\n", "Note: This following cell only runs in a Colab environment where the What-If tool is supported." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9cccdcf14cea" }, "outputs": [], "source": [ "# check for Colab environment\n", "IS_COLAB = \"google.colab\" in sys.modules\n", "\n", "# run what-if tool\n", "if IS_COLAB:\n", " # define target and labels\n", " TARGET_FEATURE = \"isFraud\"\n", " LABEL_VOCAB = [\"not-fraud\", \"fraud\"]\n", "\n", " # define the function to adjust the predictions\n", "\n", " def adjust_prediction(pred):\n", " return [1 - pred, pred]\n", "\n", " # Combine the features and labels into one array for the What-If Tool\n", " test_examples = np.hstack(\n", " (test_samples_X.to_numpy(), test_samples_y.to_numpy().reshape(-1, 1))\n", " )\n", "\n", " # Configure the WIT to run on the locally trained model\n", " config_builder = (\n", " WitConfigBuilder(\n", " test_examples.tolist(), test_samples_X.columns.tolist() + [\"isFraud\"]\n", " )\n", " .set_custom_predict_fn(forest.predict_proba)\n", " .set_target_feature(TARGET_FEATURE)\n", " .set_label_vocab(LABEL_VOCAB)\n", " )\n", "\n", " # display the WIT widget\n", " display(WitWidget(config_builder, height=600))" ] }, { "cell_type": "markdown", "metadata": { "id": "a97c4880ff33" }, "source": [ "### Running the What-If Tool on the deployed Vertex AI model\n", "\n", "In this step, you define a function that sends requests to the deployed model on endpoint and returns the formatted predictions. The function is then used to build the What-IF tool widget for analyzing the predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "362ef35a5475" }, "outputs": [], "source": [ "# run the what-if tool\n", "if IS_COLAB:\n", " # configure the target and class-labels\n", " TARGET_FEATURE = \"isFraud\"\n", " LABEL_VOCAB = [\"not-fraud\", \"fraud\"]\n", "\n", " # function to return predictions from the deployed Model\n", "\n", " def endpoint_predict_sample(instances: list):\n", " prediction = endpoint.predict(instances=instances)\n", " preds = [[1 - i, i] for i in prediction.predictions]\n", " return preds\n", "\n", " # Combine the features and labels into one array for the What-If Tool\n", " test_examples = np.hstack(\n", " (test_samples_X.to_numpy(), test_samples_y.to_numpy().reshape(-1, 1))\n", " )\n", "\n", " # Configure the WIT with the prediction function\n", " config_builder = (\n", " WitConfigBuilder(\n", " test_examples.tolist(), test_samples_X.columns.tolist() + [\"isFraud\"]\n", " )\n", " .set_custom_predict_fn(endpoint_predict_sample)\n", " .set_target_feature(TARGET_FEATURE)\n", " .set_label_vocab(LABEL_VOCAB)\n", " )\n", "\n", " # run the WIT-widget\n", " display(WitWidget(config_builder, height=400))" ] }, { "cell_type": "markdown", "metadata": { "id": "aea461dd4549" }, "source": [ "## Cleaning up\n", "\n", "\n", "To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud\n", "project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.\n", "\n", "Otherwise, you can delete the individual resources you created in this tutorial:\n", "\n", "- Vertex AI Endpoint\n", "- Vertex AI Model\n", "- Cloud Storage bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "f62e2c499f70" }, "outputs": [], "source": [ "# undeploy the model from the endpoint\n", "endpoint.undeploy_all()\n", "\n", "# delete the endpoint\n", "endpoint.delete()\n", "\n", "# delete the model\n", "model.delete()\n", "\n", "# delete the bucket\n", "delete_bucket = False\n", "if delete_bucket:\n", " ! gsutil rm -r $BUCKET_URI\n", "\n", "# delete the local files\n", "! rm $LOCAL_FILE_NAME" ] } ], "metadata": { "colab": { "name": "fraud-detection-model.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }

notebooks/official/workbench/fraud_detection/fraud-detection-model.ipynb (1,099 lines of code) (raw):