training/built-in-frameworks/create_endpoint.ipynb (1,228 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create Endpoint \n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "This notebook's CI test result for us-west-2 is as follows. CI test results in other regions can be found at the end of the notebook. \n", "\n", "![This us-west-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-2/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "In this notebook, you will learn the basics of hosting your trained model on Amazon SageMaker for inference. There are two ways you can use Amazon SageMaker for inference:\n", "1. Set up a persistent endpoint for real-time online inference.\n", "2. Gather the data to be predicted into batches and use SageMaker batch transform for offline inference. \n", "\n", "In this notebook, we focus on the first option; we will discuss batch transform in another notebook. \n", "\n", "We highly recommend that you go through [the section on model deployment](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-deployment.html) in the official docs before moving on.\n", "\n", "\n", "The pricing for setting up an endpoint can be found [here](https://aws.amazon.com/sagemaker/pricing/).\n", "\n", "As with [CreateTrainingJob](https://github.com/hsl89/amazon-sagemaker-examples/blob/sagemaker-fundamentals/sagemaker-fundamentals/create-training-job/create-training-job.ipynb), Amazon SageMaker interacts with your inference logic via a containerized environment. 
\n", "\n", "The following APIs are relevant:\n", "* [`CreateModel`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model)\n", "* [`CreateEndpointConfig`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config)\n", "* [`CreateEndpoint`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint)\n", "\n", "We highly recommend that you go through them. It's okay if you don't understand everything; we will go through them in detail in this notebook. \n", "\n", "The outline of this notebook is:\n", "* Create an IAM role for SageMaker\n", "* Build an inference image\n", "* Test the inference image / container locally and push it to ECR\n", "* Use the ECR address of the inference container to define a model by calling `CreateModel`\n", "* Specify the configuration of an endpoint by calling `CreateEndpointConfig`\n", "* Use the model definition and the endpoint configuration from the previous two steps to create an endpoint by calling `CreateEndpoint`\n", "* Invoke the endpoint by using the SageMaker runtime client " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# setups\n", "import boto3\n", "import datetime\n", "import pprint\n", "import os\n", "import time\n", "import re\n", "\n", "pp = pprint.PrettyPrinter(indent=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Set up a service role for SageMaker\n", "\n", "Review the [notebook on execution role](https://github.com/hsl89/amazon-sagemaker-examples/blob/execution-role/sagemaker-fundamentals/execution-role/execution-role.ipynb) for step-by-step instructions on how to create an IAM Role.\n", "\n", "The service role is intended to be assumed by the SageMaker service to procure resources in your AWS account on your behalf.\n", "\n", "1. 
If you are running this notebook on SageMaker infrastructure like Notebook Instances or Studio, then we will use the role you used to spin up those resources.\n", "\n", "2. If you are running this notebook on an EC2 instance, then we will create a service role and attach `AmazonSageMakerFullAccess` to it. If you already have a SageMaker service role, you can paste its role_arn here.\n", "\n", "\n", "First, copy the helper functions we created in that notebook; they help us create an execution role. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "cp ../execution-role/iam_helpers.py ." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set up service role for SageMaker\n", "from iam_helpers import create_execution_role\n", "\n", "sts = boto3.client(\"sts\")\n", "caller = sts.get_caller_identity()\n", "\n", "if \":user/\" in caller[\"Arn\"]: # as IAM user\n", "    # either paste in an existing role_arn or create a new role and attach\n", "    # AmazonSageMakerFullAccess\n", "    role_name = \"sm\"\n", "    role_arn = create_execution_role(role_name=role_name)[\"Role\"][\"Arn\"]\n", "\n", "    # attach the permission to the role\n", "    # skip it if you use a SageMaker service role that\n", "    # already has AmazonSageMakerFullAccess\n", "    iam = boto3.client(\"iam\")\n", "    iam.attach_role_policy(\n", "        RoleName=role_name, PolicyArn=\"arn:aws:iam::aws:policy/AmazonSageMakerFullAccess\"\n", "    )\n", "elif \"assumed-role\" in caller[\"Arn\"]: # on SageMaker infra\n", "    assumed_role = caller[\"Arn\"]\n", "    role_arn = re.sub(r\"^(.+)sts::(\\d+):assumed-role/(.+?)/.*$\", r\"\\1iam::\\2:role/\\3\", assumed_role)\n", "else:\n", "    print(\"I assume you are on an EC2 instance launched with an IAM role\")\n", "    role_arn = caller[\"Arn\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build an inference image\n", "\n", "Your inference image must be a self-contained web server. 
When you run your inference container locally, it should listen on port 8080, accept POST requests to the `/invocations` endpoint, and answer GET requests to the `/ping` health-check endpoint. The payload of each POST request is the data that you want your model to run a prediction on. Since the inference container is essentially a web server, you should expect it to look different from the container we used for [`CreateTrainingJob`](https://github.com/hsl89/amazon-sagemaker-examples/blob/sagemaker-fundamentals/sagemaker-fundamentals/create-training-job/create-training-job.ipynb). \n", "\n", "In this notebook, we use a minimal python stack to build our web server:\n", "![Request serving stack](stack.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Further reading on the serving stack\n", "\n", "* [Overview of the stack](https://flask.palletsprojects.com/en/1.1.x/deploying/uwsgi/)\n", "* [Nginx homepage](https://www.nginx.com/resources/wiki/start/) \n", "* [Gunicorn (WSGI server) homepage](https://gunicorn.org/)\n", "* [Flask homepage](https://flask.palletsprojects.com/en/1.1.x/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How SageMaker runs your container\n", "\n", "SageMaker runs your container like this:\n", "\n", "```sh\n", "docker run <image> serve\n", "```\n", "\n", "This means you need to have an executable called `serve` in the `PATH`. In this notebook, we will create a python script as an **executable** and put it in the working directory of the docker image, which the Dockerfile also adds to `PATH`. \n", " \n", "The folder `container/src` contains the configs and entry point of the web server." ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nginx.conf predictor.py serve wsgi.py\n" ] } ], "source": [ "!ls container/src" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Entrypoint for the Nginx server\n", "\n", "`serve` is a python executable that is intended to be used as the entrypoint for the inference image."
] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "#!/usr/bin/env python\n", "\n", "# This file implements the scoring service shell. You don't necessarily need to modify it for various\n", "# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until\n", "# gunicorn exits.\n", "#\n", "# The flask server is specified to be the app object in wsgi.py\n", "#\n", "# We set the following parameters:\n", "#\n", "# Parameter Environment Variable Default Value\n", "# --------- -------------------- -------------\n", "# number of workers MODEL_SERVER_WORKERS the number of CPU cores\n", "# timeout MODEL_SERVER_TIMEOUT 60 seconds\n", "\n", "import multiprocessing\n", "import os\n", "import signal\n", "import subprocess\n", "import sys\n", "\n", "cpu_count = multiprocessing.cpu_count()\n", "\n", "model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)\n", "model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', cpu_count))\n", "\n", "def sigterm_handler(nginx_pid, gunicorn_pid):\n", " try:\n", " os.kill(nginx_pid, signal.SIGQUIT)\n", " except OSError:\n", " pass\n", " try:\n", " os.kill(gunicorn_pid, signal.SIGTERM)\n", " except OSError:\n", " pass\n", "\n", " sys.exit(0)\n", "\n", "def start_server():\n", " print('Starting the inference server with {} workers.'.format(model_server_workers))\n", "\n", " # link the log streams to stdout/err so they will be logged to the container logs\n", " subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])\n", " subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])\n", "\n", " nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])\n", " gunicorn = subprocess.Popen(['gunicorn',\n", " '--timeout', str(model_server_timeout),\n", " '-k', 'sync',\n", " '-b', 'unix:/tmp/gunicorn.sock',\n", " '-w', str(model_server_workers),\n", " 
'wsgi:app'])\n", "\n", " signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))\n", "\n", " # If either subprocess exits, so do we.\n", " pids = set([nginx.pid, gunicorn.pid])\n", " while True:\n", " pid, _ = os.wait()\n", " if pid in pids:\n", " break\n", "\n", " sigterm_handler(nginx.pid, gunicorn.pid)\n", " print('Inference server exiting')\n", "\n", "# The main routine just invokes the start function.\n", "\n", "if __name__ == '__main__':\n", " start_server()\n" ] } ], "source": [ "!cat container/src/serve" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Config file for Ngnix server\n", "`nginx.conf` is the config file for the nginx server." ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "worker_processes 1;\n", "daemon off; # Prevent forking\n", "\n", "\n", "pid /tmp/nginx.pid;\n", "error_log /var/log/nginx/error.log;\n", "\n", "events {\n", " # defaults\n", "}\n", "\n", "http {\n", " include /etc/nginx/mime.types;\n", " default_type application/octet-stream;\n", " access_log /var/log/nginx/access.log combined;\n", " \n", " upstream gunicorn {\n", " server unix:/tmp/gunicorn.sock;\n", " }\n", "\n", " server {\n", " listen 8080 deferred;\n", " client_max_body_size 5m;\n", "\n", " keepalive_timeout 5;\n", " proxy_read_timeout 1200s;\n", "\n", " location ~ ^/(ping|invocations) {\n", " proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;\n", " proxy_set_header Host $http_host;\n", " proxy_redirect off;\n", " proxy_pass http://gunicorn;\n", " }\n", "\n", " location / {\n", " return 404 \"{}\";\n", " }\n", " }\n", "}\n" ] } ], "source": [ "!cat container/src/nginx.conf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### WSGI config" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "import predictor as myapp\n", "\n", "# This 
is just a simple wrapper for gunicorn to find your app.\n", "# If you want to change the algorithm file, simply change \"predictor\" above to the\n", "# new file.\n", "\n", "app = myapp.app\n" ] } ], "source": [ "!cat container/src/wsgi.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Inference logic\n", "\n", "The most important file in `container/src` is `predictor.py`. It contains the inference logic. The other files in `container/src` can be used **as is**, but you will need to customize `predictor.py` to implement your own inference logic. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pygmentize container/src/predictor.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Build the container\n", "\n", "We build the image from `container/Dockerfile` and call it `example-serve`. " ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# Build an image that can do training and inference in SageMaker\n", "# This is a Python 3 image that uses the nginx, gunicorn, flask stack\n", "# for serving inferences in a stable way.\n", "\n", "FROM ubuntu:18.04\n", "\n", "MAINTAINER Amazon AI <sage-learner@amazon.com>\n", "\n", "\n", "RUN apt-get -y update && apt-get install -y --no-install-recommends \\\n", " wget \\\n", " python3-pip \\\n", " python3-setuptools \\\n", " nginx \\\n", " ca-certificates \\\n", " && rm -rf /var/lib/apt/lists/*\n", "\n", "RUN ln -s /usr/bin/python3 /usr/bin/python\n", "RUN ln -s /usr/bin/pip3 /usr/bin/pip\n", "\n", "# Here we get all python packages.\n", "# There's substantial overlap between scipy and numpy that we eliminate by\n", "# linking them together. Likewise, pip leaves the install caches populated which uses\n", "# a significant amount of space. 
These optimizations save a fair amount of space in the\n", "# image, which reduces start up time.\n", "RUN pip --no-cache-dir install numpy==1.16.2 scipy==1.2.1 scikit-learn==0.20.2 pandas flask gunicorn\n", "\n", "# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard\n", "# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE\n", "# keeps Python from writing the .pyc files which are unnecessary in this case. We also update\n", "# PATH so that the train and serve programs are found when the container is invoked.\n", "\n", "ENV PYTHONUNBUFFERED=TRUE\n", "ENV PYTHONDONTWRITEBYTECODE=TRUE\n", "ENV PATH=\"/opt/program:${PATH}\"\n", "\n", "# Set up the program in the image\n", "COPY src /opt/program\n", "WORKDIR /opt/program\n", "\n" ] } ], "source": [ "!cat container/Dockerfile" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%sh\n", "# build the image\n", "cd container/\n", "\n", "# tag it as example-serve:latest\n", "docker build -t example-serve:latest ." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test your image\n", "\n", "As in the [notebook for CreateTrainingJob](https://github.com/hsl89/amazon-sagemaker-examples/blob/sagemaker-fundamentals/sagemaker-fundamentals/create-training-job/create-training-job.ipynb), we replicate the Amazon SageMaker hosting environment and test your image locally before serving in production. You are encouraged to read through the section on [Use Your Own Inference Code with Hosting Services](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html) and think about how you would replicate the SageMaker hosting environment before moving on. \n", "\n", "As for `CreateTrainingJob`, SageMaker reserves the `/opt/ml` directory in your image to inject ML-related info for `CreateEndpoint`. 
In particular, it downloads your trained model artifact and injects it into the directory `/opt/ml/model`. When calling `CreateModel`, you tell SageMaker the S3 URI of your model artifact (the `ModelDataUrl` field of the container definition). SageMaker then pulls the artifact and injects it into `/opt/ml/model`. This means that when defining your own inference logic, you should load your trained model from `/opt/ml/model`. \n", "\n", "We will run your image with `docker run` and mount `container/local_test/ml` to `/opt/ml` as a bind mount. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# look at what's inside `container/local_test/ml`\n", "!ls container/local_test/ml" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The inference logic we implemented in `container/src/predictor.py` under `def inference():` does not require a real ML model. Therefore we do not need to inject anything for the purpose of the local test. We will discuss how to load a real model in a more advanced notebook. \n", "\n", "<span style=\"color:red\"> TODO for Dev: add link to the advanced notebook when it is ready</span>." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Run the container\n", "\n", "To run the container `example-serve`, open a terminal in the current directory and go to `container/local_test`:\n", "\n", "```sh\n", "cd container/local_test\n", "```\n", "\n", "Then run the following command:\n", "\n", "```sh\n", "docker run -v $(pwd)/ml:/opt/ml -p 8080:8080 --rm example-serve:latest serve \n", "```\n", "\n", "`-v $(pwd)/ml:/opt/ml` bind-mounts the directory `ml` (in `container/local_test`) to `/opt/ml` in the container. Docker requires an absolute host path for a bind mount; a bare `ml` would be treated as a named volume.\n", "\n", "`-p 8080:8080` exposes port 8080 inside the container as port 8080 on the host.\n", "\n", "`--rm` removes the container when it is stopped. 
\n", "\n", "We suggest you run the image from the shell instead of within the notebook because, when you are debugging your own container, you can more easily see stdout from the container when you have a shell process running it. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Ping your container\n", "Once your container is up, you can reach it at `http://localhost:8080`. \n", "\n", "To trigger the logic under `def ping():` in `container/src/predictor.py`, do\n", "\n", "```sh\n", "curl localhost:8080/ping\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To trigger the logic under `def inference():` in `container/src/predictor.py` with a json string, do\n", "\n", "```sh\n", "curl --header \"Content-Type: application/json\" \\\n", "    --request POST \\\n", "    --data '{\"key\":\"value\"}' \\\n", "    http://localhost:8080/invocations\n", "```\n", "\n", "To trigger the logic under `def inference():` in `container/src/predictor.py` with a non-json payload, do\n", "\n", "```sh\n", "curl --header \"Content-Type: text/csv\" \\\n", "    --request POST \\\n", "    http://localhost:8080/invocations\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To stop the container, go to the terminal that runs your container and press `Control + C`. Alternatively, you can find its container id by grepping for a docker process that binds port 8080 on the host and remove it manually.\n", "\n", "```sh\n", "docker rm -f $(docker ps | grep -e \"0.0.0.0:8080->8080/tcp\" | awk '{print $1}')\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Push the image to ECR\n", "Now that you have tested your image, the next thing to do is to push it to your ECR so that SageMaker can download it. 
We have discussed this in the [previous notebook on `CreateTrainingJob`](https://github.com/hsl89/amazon-sagemaker-examples/blob/sagemaker-fundamentals/sagemaker-fundamentals/create-training-job/create-training-job.ipynb) in the section where we push the training image to ECR. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a repo" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'ResponseMetadata': {'HTTPHeaders': {'content-length': '393',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Tue, 23 Mar 2021 00:24:00 GMT',\n", " 'x-amzn-requestid': '547615ed-af77-4f7a-ac0e-bff9aa37d34b'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': '547615ed-af77-4f7a-ac0e-bff9aa37d34b',\n", " 'RetryAttempts': 0},\n", " 'repository': {'createdAt': datetime.datetime(2021, 3, 23, 0, 24, 1, tzinfo=tzlocal()),\n", " 'encryptionConfiguration': {'encryptionType': 'AES256'},\n", " 'imageScanningConfiguration': {'scanOnPush': False},\n", " 'imageTagMutability': 'MUTABLE',\n", " 'registryId': '688520471316',\n", " 'repositoryArn': 'arn:aws:ecr:us-west-2:688520471316:repository/example-serve',\n", " 'repositoryName': 'example-serve',\n", " 'repositoryUri': '688520471316.dkr.ecr.us-west-2.amazonaws.com/example-serve'}}\n" ] } ], "source": [ "ecr = boto3.client(\"ecr\")\n", "\n", "try:\n", " # The repository might already exist\n", " # in your ECR\n", " cr_res = ecr.create_repository(repositoryName=\"example-serve\")\n", " pp.pprint(cr_res)\n", "except Exception as e:\n", " print(e)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Push the image to ECR" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Login Succeeded\n", "The push refers to repository [688520471316.dkr.ecr.us-west-2.amazonaws.com/example-serve]\n", "4d7e5149d4e3: Preparing\n", "b520e3bd5eba: 
Preparing\n", "3a3b4090fe28: Preparing\n", "5b850ff3c508: Preparing\n", "408c63ea099b: Preparing\n", "9f10818f1f96: Preparing\n", "27502392e386: Preparing\n", "c95d2191d777: Preparing\n", "9f10818f1f96: Waiting\n", "c95d2191d777: Waiting\n", "3a3b4090fe28: Pushed\n", "5b850ff3c508: Pushed\n", "4d7e5149d4e3: Pushed\n", "9f10818f1f96: Pushed\n", "27502392e386: Pushed\n", "c95d2191d777: Pushed\n", "408c63ea099b: Pushed\n", "b520e3bd5eba: Pushed\n", "latest: digest: sha256:24bb29a270095e8a3491c89288cdffe45fe03bc46f728bbd1d0a54acea31f711 size: 1989\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "WARNING! Your password will be stored unencrypted in /home/ubuntu/.docker/config.json.\n", "Configure a credential helper to remove this warning. See\n", "https://docs.docker.com/engine/reference/commandline/login/#credentials-store\n", "\n" ] } ], "source": [ "%%bash\n", "account=$(aws sts get-caller-identity --query Account | sed -e 's/^\"//' -e 's/\"$//')\n", "region=$(aws configure get region)\n", "ecr_account=${account}.dkr.ecr.${region}.amazonaws.com\n", "\n", "# Give docker your ECR login password\n", "aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $ecr_account\n", "\n", "# Fullname of the repo\n", "fullname=$ecr_account/example-serve:latest\n", "\n", "#echo $fullname\n", "# Tag the image with the fullname\n", "docker tag example-serve:latest $fullname\n", "\n", "# Push to ECR\n", "docker push $fullname" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create model\n", "Now we use the image you just pushed to ECR to create a model in Amazon SageMaker. This is done by calling the `CreateModel` API. Once a model is created, we will be able to host it on an Amazon SageMaker endpoint by creating an endpoint configuration and calling the `CreateEndpoint` API. 
" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'ModelArn': 'arn:aws:sagemaker:us-west-2:688520471316:model/example-serve',\n", " 'ResponseMetadata': {'HTTPHeaders': {'content-length': '75',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Wed, 10 Mar 2021 23:01:20 GMT',\n", " 'x-amzn-requestid': '58071bde-0339-4400-9e05-17dca25ca4bc'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': '58071bde-0339-4400-9e05-17dca25ca4bc',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "sm_boto3 = boto3.client(\"sagemaker\")\n", "\n", "region = boto3.Session().region_name\n", "account_id = boto3.client(\"sts\").get_caller_identity()[\"Account\"]\n", "\n", "image_uri = \"{}.dkr.ecr.{}.amazonaws.com/example-serve:latest\".format(account_id, region)\n", "\n", "cm_res = sm_boto3.create_model(\n", "    ModelName=\"example-serve\", # the model name does not need to match the image repo name\n", "    Containers=[\n", "        {\n", "            \"Image\": image_uri,\n", "        },\n", "    ],\n", "    ExecutionRoleArn=role_arn, # role ARN obtained in the service role section\n", "    EnableNetworkIsolation=False,\n", ")\n", "\n", "pp.pprint(cm_res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create endpoint configuration\n", "\n", "Next we configure the resources we need to deploy this model by creating an endpoint configuration. This is done by calling the `CreateEndpointConfig` API. For more info about this API, read its [API reference](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config)."
] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "model_name = \"example-serve\" # model defined above\n", "initial_instance_count = 1\n", "instance_type = \"ml.t2.medium\"\n", "\n", "variant_name = \"AMeaningfulProdVarName\" # ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}\n", "\n", "production_variants = [\n", "    {\n", "        \"VariantName\": variant_name,\n", "        \"ModelName\": model_name,\n", "        \"InitialInstanceCount\": initial_instance_count,\n", "        \"InstanceType\": instance_type,\n", "    }\n", "]\n", "\n", "endpoint_config_name = \"ExampleServeConfig\" # ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}\n", "\n", "endpoint_config = {\n", "    \"EndpointConfigName\": endpoint_config_name,\n", "    \"ProductionVariants\": production_variants,\n", "}\n", "\n", "ep_conf_res = sm_boto3.create_endpoint_config(**endpoint_config)" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'EndpointConfigArn': 'arn:aws:sagemaker:us-west-2:688520471316:endpoint-config/exampleserveconfig',\n", " 'ResponseMetadata': {'HTTPHeaders': {'content-length': '99',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Wed, 10 Mar 2021 23:42:10 GMT',\n", " 'x-amzn-requestid': 'a9b17d8e-0ac9-472e-a43b-e26878925854'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': 'a9b17d8e-0ac9-472e-a43b-e26878925854',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "pp.pprint(ep_conf_res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Endpoint\n", "Putting everything together, we are ready to create an endpoint using the model and the endpoint configuration. We will create an endpoint by calling the `CreateEndpoint` API. The API reference is [here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint). 
" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'EndpointArn': 'arn:aws:sagemaker:us-west-2:688520471316:endpoint/example-endpoint',\n", " 'ResponseMetadata': {'HTTPHeaders': {'content-length': '84',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Wed, 10 Mar 2021 23:47:37 GMT',\n", " 'x-amzn-requestid': '473836e2-9010-4895-a13f-3ba86bf9e187'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': '473836e2-9010-4895-a13f-3ba86bf9e187',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "endpoint_name = \"example-endpoint\"\n", "ep_res = sm_boto3.create_endpoint(\n", "    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n", ")\n", "pp.pprint(ep_res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inspect endpoint status\n", "It takes a little while for the endpoint to be fully ready, because SageMaker needs to provision the EC2 instance hosting it. To get an update on the endpoint status, we can call `DescribeEndpoint`. 
" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'CreationTime': datetime.datetime(2021, 3, 10, 23, 47, 38, 119000, tzinfo=tzlocal()),\n", " 'EndpointArn': 'arn:aws:sagemaker:us-west-2:688520471316:endpoint/example-endpoint',\n", " 'EndpointConfigName': 'ExampleServeConfig',\n", " 'EndpointName': 'example-endpoint',\n", " 'EndpointStatus': 'Creating',\n", " 'LastModifiedTime': datetime.datetime(2021, 3, 10, 23, 47, 38, 119000, tzinfo=tzlocal()),\n", " 'ResponseMetadata': {'HTTPHeaders': {'content-length': '256',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Wed, 10 Mar 2021 23:51:09 GMT',\n", " 'x-amzn-requestid': '4adc90e4-75cf-4700-99db-e10c09727b67'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': '4adc90e4-75cf-4700-99db-e10c09727b67',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "ep_des_res = sm_boto3.describe_endpoint(EndpointName=endpoint_name)\n", "\n", "\n", "pp.pprint(ep_des_res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "`EndpointStatus` field from `ep_des_res` takes the following value (See [AWS API Documentation](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeEndpoint.html#sagemaker-DescribeEndpoint-response-EndpointStatus)):\n", "\n", "`OutOfService`: Endpoint is not available to take incoming requests.\n", "\n", "`Creating`: CreateEndpoint is executing.\n", "\n", "`Updating`: UpdateEndpoint or UpdateEndpointWeightsAndCapacities is executing.\n", "\n", "`SystemUpdating`: Endpoint is undergoing maintenance and cannot be updated or deleted or re-scaled until it has completed. This maintenance operation does not change any customer-specified values such as VPC config, KMS encryption, model, instance type, or instance count.\n", "\n", "`RollingBack`: Endpoint fails to scale up or down or change its variant weight and is in the process of rolling back to its previous configuration. 
Once the rollback completes, endpoint returns to an InService status. This transitional status only applies to an endpoint that has autoscaling enabled and is undergoing variant weight or capacity changes as part of an UpdateEndpointWeightsAndCapacities call or when the UpdateEndpointWeightsAndCapacities operation is called explicitly.\n", "\n", "`InService`: Endpoint is available to process incoming requests.\n", "\n", "`Deleting`: DeleteEndpoint is executing.\n", "\n", "`Failed`: Endpoint could not be created, updated, or re-scaled. Use DescribeEndpoint:FailureReason for information about the failure. DeleteEndpoint is the only operation that can be performed on a failed endpoint.\n", "\n", "---\n", "\n", "To get real time update on the endpoint status, we can call `DescribeEndpoint` once every few seconds until the status becomes `InService` or `Failed`." ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'CreationTime': datetime.datetime(2021, 3, 10, 23, 47, 38, 119000, tzinfo=tzlocal()),\n", " 'EndpointArn': 'arn:aws:sagemaker:us-west-2:688520471316:endpoint/example-endpoint',\n", " 'EndpointConfigName': 'ExampleServeConfig',\n", " 'EndpointName': 'example-endpoint',\n", " 'EndpointStatus': 'InService',\n", " 'LastModifiedTime': datetime.datetime(2021, 3, 10, 23, 56, 2, 741000, tzinfo=tzlocal()),\n", " 'ProductionVariants': [{'CurrentInstanceCount': 1,\n", " 'CurrentWeight': 1.0,\n", " 'DeployedImages': [{'ResolutionTime': datetime.datetime(2021, 3, 10, 23, 47, 41, 524000, tzinfo=tzlocal()),\n", " 'ResolvedImage': '688520471316.dkr.ecr.us-west-2.amazonaws.com/example-serve@sha256:24bb29a270095e8a3491c89288cdffe45fe03bc46f728bbd1d0a54acea31f711',\n", " 'SpecifiedImage': '688520471316.dkr.ecr.us-west-2.amazonaws.com/example-serve:latest'}],\n", " 'DesiredInstanceCount': 1,\n", " 'DesiredWeight': 1.0,\n", " 'VariantName': 'AMeaningfulProdVarName'}],\n", " 'ResponseMetadata': 
{'HTTPHeaders': {'content-length': '700',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 11 Mar 2021 00:00:55 GMT',\n", " 'x-amzn-requestid': '73391382-7f2b-48f5-bdd4-b198a764a497'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': '73391382-7f2b-48f5-bdd4-b198a764a497',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "import time\n", "\n", "while True:\n", "    ep_des_res = sm_boto3.describe_endpoint(EndpointName=endpoint_name)\n", "    pp.pprint(ep_des_res)\n", "    # stop polling once the endpoint has left the Creating state\n", "    if ep_des_res[\"EndpointStatus\"] != \"Creating\":\n", "        break\n", "    time.sleep(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Test the endpoint\n", "Now that the endpoint is in service, let's invoke it with the [SageMaker runtime client](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html)." ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'Body': <botocore.response.StreamingBody object at 0x7f50a501aac8>,\n", " 'ContentType': 'text/plain; charset=utf-8',\n", " 'InvokedProductionVariant': 'AMeaningfulProdVarName',\n", " 'ResponseMetadata': {'HTTPHeaders': {'content-length': '41',\n", " 'content-type': 'text/plain; '\n", " 'charset=utf-8',\n", " 'date': 'Thu, 11 Mar 2021 00:16:24 GMT',\n", " 'x-amzn-invoked-production-variant': 'AMeaningfulProdVarName',\n", " 'x-amzn-requestid': '51a58074-4e0d-4886-93b4-888e6835ba14'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': '51a58074-4e0d-4886-93b4-888e6835ba14',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "# invoke endpoint\n", "import json\n", "\n", "sm_runtime = boto3.client(\"sagemaker-runtime\")\n", "\n", "body = json.dumps(\"a json string\")\n", "content_type = \"application/json\"\n", "\n", "# response type\n", "accept = \"text/plain\"\n", "\n", "res = sm_runtime.invoke_endpoint(\n", "    EndpointName=endpoint_name,\n", "    Body=body, # encoded input 
data\n", "    ContentType=content_type, # tells the endpoint how the request body is encoded\n", "    Accept=accept, # tells the endpoint which format we want for its response\n", ")\n", "\n", "pp.pprint(res)" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'I am fed with json. Therefore, I am happy'" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# decode the response body\n", "res_body = res[\"Body\"]\n", "res_body.read().decode(\"utf-8\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clean up\n", "Congratulations! You now understand the basics of creating an endpoint on Amazon SageMaker. The endpoint you just created does not do much ML, so feel free to delete all the related resources. " ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'ResponseMetadata': {'HTTPHeaders': {'content-length': '288',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 11 Mar 2021 00:29:08 GMT',\n", " 'x-amzn-requestid': '17914b8b-edc1-4f2c-acdf-81ecef87084f'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': '17914b8b-edc1-4f2c-acdf-81ecef87084f',\n", " 'RetryAttempts': 0},\n", " 'repository': {'createdAt': datetime.datetime(2021, 3, 10, 21, 49, tzinfo=tzlocal()),\n", " 'imageTagMutability': 'MUTABLE',\n", " 'registryId': '688520471316',\n", " 'repositoryArn': 'arn:aws:ecr:us-west-2:688520471316:repository/example-serve',\n", " 'repositoryName': 'example-serve',\n", " 'repositoryUri': '688520471316.dkr.ecr.us-west-2.amazonaws.com/example-serve'}}\n" ] } ], "source": [ "# delete the ECR repo\n", "ecr = boto3.client(\"ecr\")\n", "del_repo_res = ecr.delete_repository(repositoryName=\"example-serve\", force=True)\n", "\n", "pp.pprint(del_repo_res)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "{'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 11 Mar 2021 00:29:22 GMT',\n", " 'x-amzn-requestid': 'c24e0777-f7fd-4481-a005-2a61810279f9'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': 'c24e0777-f7fd-4481-a005-2a61810279f9',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "# delete the model\n", "del_model_res = sm_boto3.delete_model(ModelName=model_name)\n", "\n", "pp.pprint(del_model_res)" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 11 Mar 2021 00:30:34 GMT',\n", " 'x-amzn-requestid': '5f4fe112-a192-4dbb-ae63-586fbce5d265'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': '5f4fe112-a192-4dbb-ae63-586fbce5d265',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "# delete endpoint config\n", "del_ep_config_res = sm_boto3.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n", "\n", "pp.pprint(del_ep_config_res)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'ResponseMetadata': {'HTTPHeaders': {'content-length': '0',\n", " 'content-type': 'application/x-amz-json-1.1',\n", " 'date': 'Thu, 11 Mar 2021 00:32:13 GMT',\n", " 'x-amzn-requestid': 'fb77806c-7aeb-471b-8de0-190a9b592c6d'},\n", " 'HTTPStatusCode': 200,\n", " 'RequestId': 'fb77806c-7aeb-471b-8de0-190a9b592c6d',\n", " 'RetryAttempts': 0}}\n" ] } ], "source": [ "# delete the endpoint\n", "del_ep_res = sm_boto3.delete_endpoint(EndpointName=endpoint_name)\n", "\n", "pp.pprint(del_ep_res)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Notebook CI Test Results\n", "\n", "This notebook was tested in multiple regions. 
The test results are as follows, except for us-west-2 which is shown at the top of the notebook.\n", "\n", "![This us-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This us-east-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-east-2/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This us-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/us-west-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This ca-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ca-central-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This sa-east-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/sa-east-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This eu-west-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This eu-west-2 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-2/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This eu-west-3 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-west-3/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This eu-central-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-central-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This eu-north-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/eu-north-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This ap-southeast-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This ap-southeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-southeast-2/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This ap-northeast-1 badge failed to load. 
Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This ap-northeast-2 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-northeast-2/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n", "\n", "![This ap-south-1 badge failed to load. Check your device's internet connectivity, otherwise the service is currently unavailable](https://h75twx4l60.execute-api.us-west-2.amazonaws.com/sagemaker-nb/ap-south-1/sagemaker-fundamentals|create-endpoint|create_endpoint.ipynb)\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }