deploy-models/real-time-inference/multi_model_catboost/multi_model_catboost.ipynb (611 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"id": "38d13858",
"metadata": {},
"source": [
"# Multi-Model Endpoint - CatBoost\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n",
"\n",
"\n",
"\n",
"---"
]
},
{
"cell_type": "markdown",
"id": "38d13858",
"metadata": {},
"source": [
"\n",
"This example notebook showcases how to use a custom container to host multiple CatBoost models on a SageMaker Multi Model Endpoint. The model this notebook deploys is taken from this [CatBoost tutorial](https://github.com/catboost/tutorials/blob/master/python_tutorial_with_tasks.ipynb). \n",
"\n",
"We are using this framework as an example to demonstrate deployment and serving using MultiModel Endpoint and showcase the capability. This notebook can be extended to any framework.\n",
"\n",
"Catboost is gaining in popularity and is not yet supported as a framework for SageMaker MultiModelEndpoint. Further this example serves to demostrate how to bring your own container to a MultiModelEndpoint\n",
"\n",
"In this Notebook we will use identical model to simulate multiple models for loading and inference"
]
},
{
"cell_type": "markdown",
"id": "ad05f099",
"metadata": {},
"source": [
"## Prerequisites\n",
"### Packages and Permissions\n",
"The SageMaker SDK uses the SageMaker default S3 bucket when needed. If the get_execution_role does not return a role with the appropriate permissions, you'll need to specify an IAM role ARN that does. Please make sure the `SageMakerFullAccess` policy is attached to the execution role you are using."
]
},
{
"cell_type": "markdown",
"id": "780f5b91",
"metadata": {},
"source": [
"## Load model and test local inference\n",
"Here, install `catboost` to test we can load up the model locally and make inference. \n",
"\n",
"We load up the model locally using `CatBoostClassifier()`. `test_data.csv` contains a single row of test inference data."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73923ffc",
"metadata": {},
"outputs": [],
"source": [
"!pip install catboost"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "49a23684",
"metadata": {},
"outputs": [],
"source": [
"from catboost import CatBoostClassifier, Pool as CatboostPool, cv\n",
"import os\n",
"import pandas\n",
"\n",
"model_file = CatBoostClassifier()\n",
"model_file = model_file.load_model(\"catboost_model.bin\")\n",
"df = pandas.read_csv(\"test_data.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c69eb076",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import io\n",
"import json\n",
"\n",
"out = model_file.predict_proba(df)\n",
"print(out)"
]
},
{
"cell_type": "markdown",
"id": "8dadea21",
"metadata": {},
"source": [
"## Upload tar ball to s3\n"
]
},
{
"cell_type": "markdown",
"id": "7803c750",
"metadata": {},
"source": [
"### Create a model tar ball\n",
"\n",
"SageMaker requires our model to be packaged in a tar.gz file."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "cb1e94e5",
"metadata": {},
"outputs": [],
"source": [
"! tar -czvf catboost-model.tar.gz catboost_model.bin"
]
},
{
"cell_type": "markdown",
"id": "c026ca66",
"metadata": {},
"source": [
"### Upload 100 copies of the model to S3\n",
"\n",
"Multi-Model Endpoints require all our models to be in a specific S3 prefix. Here we upload 100 of them to our default bucket. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "941e1c75",
"metadata": {},
"outputs": [],
"source": [
"import sagemaker\n",
"\n",
"sess = sagemaker.Session()\n",
"s3_bucket = sess.default_bucket() # Replace with your own bucket name if needed\n",
"print(s3_bucket)"
]
},
{
"cell_type": "markdown",
"id": "4abdedf7",
"metadata": {},
"source": [
"### Upload the model tar balls using boto3 with a unique name"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "540086f1",
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"\n",
"s3 = boto3.client(\"s3\")\n",
"for i in range(0, 100):\n",
" with open(\"catboost-model.tar.gz\", \"rb\") as f:\n",
" s3.upload_fileobj(f, s3_bucket, \"catboost/catboost-model-{}.tar.gz\".format(i))"
]
},
{
"cell_type": "markdown",
"id": "c325d98a",
"metadata": {},
"source": [
"### List all models in s3 prefix we will use for our Multi-Model Endpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d39463fa",
"metadata": {},
"outputs": [],
"source": [
"!aws s3 ls s3://$s3_bucket/catboost/"
]
},
{
"cell_type": "markdown",
"id": "337c3b8c",
"metadata": {},
"source": [
"## Building the custom container"
]
},
{
"cell_type": "markdown",
"id": "92924607",
"metadata": {},
"source": [
"The container folder in this example contains 3 files:\n",
"```\n",
"\u251c\u2500\u2500 container\n",
"\u2502 \u251c\u2500\u2500 dockerd-entrypoint.py\n",
"\u2502 \u251c\u2500\u2500 Dockerfile\n",
"\u2502 \u2514\u2500\u2500 model_handler.py\n",
"```\n",
"\n",
"- `dockerd-entrypoint.py` is the entry point script that will start the multi model server.\n",
"- `Dockerfile` contains the container definition that will be used to assemble the image. This includes the packages that need to be installed.\n",
"- `model_handler.py` is the script that will contain the logic to load up the model and make inference.\n",
"\n",
"Take a look through the files to see if there is any customization that you would like to do.\n",
"Below cells highlight the main part of the files. \n"
]
},
{
"cell_type": "markdown",
"id": "2b7b1822",
"metadata": {},
"source": [
"### Install catboost in the `Dockerfile`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8a17e523",
"metadata": {},
"outputs": [],
"source": [
"! sed -n '26,30p' container/Dockerfile"
]
},
{
"cell_type": "markdown",
"id": "7badc7f1",
"metadata": {},
"source": [
"### Update `initialize` function in `model_handler.py` with logic to load up the model\n",
"In this case we are using `CatBoostClassifier()`. Feel free to update the loading logic in this function to your needs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ecac159c",
"metadata": {},
"outputs": [],
"source": [
"! sed -n '22,40p' container/model_handler.py"
]
},
{
"cell_type": "markdown",
"id": "764c9b2e",
"metadata": {},
"source": [
"### Update `handle` function in `model_handler.py` with logic to load up the model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5c739eee",
"metadata": {},
"outputs": [],
"source": [
"! sed -n '70,85p' container/model_handler.py"
]
},
{
"cell_type": "markdown",
"id": "2a22ba5f",
"metadata": {},
"source": [
"### Build and Push the custom image to ECR"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "32693051",
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"\n",
"# The name of our algorithm\n",
"algorithm_name=catboost-sagemaker-multimodel\n",
"\n",
"cd container\n",
"\n",
"account=$(aws sts get-caller-identity --query Account --output text)\n",
"\n",
"# Get the region defined in the current configuration (default to us-east-1 if none defined)\n",
"region=$(aws configure get region)\n",
"region=${region:-us-east-1}\n",
"\n",
"fullname=\"${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest\"\n",
"\n",
"# If the repository doesn't exist in ECR, create it.\n",
"aws ecr describe-repositories --repository-names \"${algorithm_name}\" > /dev/null 2>&1\n",
"\n",
"if [ $? -ne 0 ]\n",
"then\n",
" aws ecr create-repository --repository-name \"${algorithm_name}\" > /dev/null\n",
"fi\n",
"\n",
"# Get the login command from ECR and execute it directly\n",
"$(aws ecr get-login --region ${region} --no-include-email)\n",
"\n",
"# Build the docker image locally with the image name and then push it to ECR\n",
"# with the full name.\n",
"\n",
"docker build -q -t ${algorithm_name} .\n",
"docker tag ${algorithm_name} ${fullname}\n",
"\n",
"docker push ${fullname}"
]
},
{
"cell_type": "markdown",
"id": "313f6105",
"metadata": {},
"source": [
"### Deploy Multi Model Endpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e91012c4",
"metadata": {},
"outputs": [],
"source": [
"from sagemaker import get_execution_role\n",
"\n",
"sm_client = boto3.client(service_name=\"sagemaker\")\n",
"runtime_sm_client = boto3.client(service_name=\"sagemaker-runtime\")\n",
"\n",
"account_id = boto3.client(\"sts\").get_caller_identity()[\"Account\"]\n",
"region = boto3.Session().region_name\n",
"\n",
"role = get_execution_role()"
]
},
{
"cell_type": "markdown",
"id": "a52b175a",
"metadata": {},
"source": [
"### Create the SageMaker Multi-Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "82795067",
"metadata": {},
"outputs": [],
"source": [
"from time import gmtime, strftime\n",
"\n",
"model_name = \"catboost-multimodel-\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
"model_url = \"s3://{}/catboost/\".format(s3_bucket) ## MODEL S3 URL\n",
"container = \"{}.dkr.ecr.{}.amazonaws.com/catboost-sagemaker-multimodel:latest\".format(\n",
" account_id, region\n",
")\n",
"instance_type = \"ml.m5.xlarge\"\n",
"\n",
"print(\"Model name: \" + model_name)\n",
"print(\"Model data Url: \" + model_url)\n",
"print(\"Container image: \" + container)\n",
"\n",
"container = {\"Image\": container, \"ModelDataUrl\": model_url, \"Mode\": \"MultiModel\"}\n",
"\n",
"create_model_response = sm_client.create_model(\n",
" ModelName=model_name, ExecutionRoleArn=role, Containers=[container]\n",
")\n",
"\n",
"print(\"Model ARN: \" + create_model_response[\"ModelArn\"])"
]
},
{
"cell_type": "markdown",
"id": "2e425655",
"metadata": {},
"source": [
"### Create the SageMaker Endpoint Configuration\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28ffd4b6",
"metadata": {},
"outputs": [],
"source": [
"endpoint_config_name = \"catboost-multimodel-config\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
"print(\"Endpoint config name: \" + endpoint_config_name)\n",
"\n",
"create_endpoint_config_response = sm_client.create_endpoint_config(\n",
" EndpointConfigName=endpoint_config_name,\n",
" ProductionVariants=[\n",
" {\n",
" \"InstanceType\": instance_type,\n",
" \"InitialInstanceCount\": 1,\n",
" \"InitialVariantWeight\": 1,\n",
" \"ModelName\": model_name,\n",
" \"VariantName\": \"AllTraffic\",\n",
" }\n",
" ],\n",
")\n",
"\n",
"print(\"Endpoint config ARN: \" + create_endpoint_config_response[\"EndpointConfigArn\"])"
]
},
{
"cell_type": "markdown",
"id": "76dcfb2e",
"metadata": {},
"source": [
"### Create the SageMaker Multi-Model Endpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5809862f",
"metadata": {},
"outputs": [],
"source": [
"%%time\n",
"\n",
"import time\n",
"\n",
"endpoint_name = \"catboost-multimodel-endpoint-\" + strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())\n",
"print(\"Endpoint name: \" + endpoint_name)\n",
"\n",
"create_endpoint_response = sm_client.create_endpoint(\n",
" EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n",
")\n",
"print(\"Endpoint Arn: \" + create_endpoint_response[\"EndpointArn\"])\n",
"\n",
"resp = sm_client.describe_endpoint(EndpointName=endpoint_name)\n",
"status = resp[\"EndpointStatus\"]\n",
"print(\"Endpoint Status: \" + status)\n",
"\n",
"print(\"Waiting for {} endpoint to be in service...\".format(endpoint_name))\n",
"waiter = sm_client.get_waiter(\"endpoint_in_service\")\n",
"waiter.wait(EndpointName=endpoint_name)"
]
},
{
"cell_type": "markdown",
"id": "3dfcf43c",
"metadata": {},
"source": [
"### Invoke each of the 100 models\n",
"We have identical models here to simulate multiple models belonging to the same framework"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b12b05e2",
"metadata": {},
"outputs": [],
"source": [
"for i in range(0, 100):\n",
" response = runtime_sm_client.invoke_endpoint(\n",
" EndpointName=endpoint_name,\n",
" TargetModel=\"catboost-model-{}.tar.gz\".format(i),\n",
" Body=df.to_csv(index=False),\n",
" )\n",
" print(json.loads(response[\"Body\"].read().decode(\"utf-8\")))"
]
},
{
"cell_type": "markdown",
"id": "a6e7a780",
"metadata": {},
"source": [
"### Invoke just one of models 1000 times \n",
"Since the models are in memory and loaded, these invocations should not have any latency \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d3d8831e",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"results = []\n",
"for i in range(0, 1000):\n",
" start = time.time()\n",
" response = runtime_sm_client.invoke_endpoint(\n",
" EndpointName=endpoint_name,\n",
" TargetModel=\"catboost-model-1.tar.gz\",\n",
" Body=df.to_csv(index=False),\n",
" )\n",
" results.append((time.time() - start) * 1000)\n",
"print(\"\\nPredictions for model latency: \\n\")\n",
"print(\"\\nP95: \" + str(np.percentile(results, 95)) + \" ms\\n\")\n",
"print(\"P90: \" + str(np.percentile(results, 90)) + \" ms\\n\")\n",
"print(\"Average: \" + str(np.average(results)) + \" ms\\n\")"
]
},
{
"cell_type": "markdown",
"source": [
"## Optional Clean up\n",
"Clean up and delete the end point"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%% md\n"
}
}
},
{
"cell_type": "code",
"execution_count": null,
"outputs": [],
"source": [
"# delete the end point\n",
"\n",
"sm_client.delete_endpoint(EndpointName=endpoint_name)"
],
"metadata": {
"collapsed": false,
"pycharm": {
"name": "#%%\n"
}
}
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Notebook CI Test Results\n",
"\n",
"This notebook was tested in multiple regions. The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "conda_pytorch_p36",
"language": "python",
"name": "conda_pytorch_p36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}