sdk/python/endpoints/online/managed/online-endpoints-safe-rollout.ipynb

{ "cells": [
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Safe rollout for managed online endpoints\n", "\n", "You have an existing model deployed in production and you want to deploy a new version of it. How do you roll out the new machine learning model without causing any disruption? A good answer is blue-green deployment, an approach in which a new version of a web service is introduced to production by rolling out the change to a small subset of users/requests before rolling it out completely." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Requirements - To benefit from this tutorial, you will need:\n", "- Familiarity with online endpoints, which this sample notebook assumes you're using; for more information, see [What are Azure Machine Learning endpoints?](https://docs.microsoft.com/azure/machine-learning/concept-endpoints).\n", "- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)\n", "- An Azure ML workspace with a compute cluster - [Configure workspace](../../jobs/configuration.ipynb)\n", "- The Azure Machine Learning Python SDK v2 installed - [install instructions](../../README.md) - check the getting started section" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### In this sample, you'll learn to:\n", "\n", "1. Create a new online endpoint with a deployment called \"blue\" that serves version 1 of the model\n", "1. Scale this deployment so that it can handle more requests\n", "1. Deploy version 2 of the model to a deployment called \"green\" that accepts no live traffic\n", "1. Test the green deployment in isolation\n", "1. Send 10% of live traffic to the green deployment\n", "1. Fully cut over all live traffic to the green deployment\n", "1. Delete the now-unused v1 blue deployment" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Connect to Azure Machine Learning Workspace\n", "The [workspace](https://docs.microsoft.com/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the endpoint will be created." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 1.1 Import the required libraries" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "import_libraries" }, "outputs": [], "source": [ "# import required libraries\n", "from azure.ai.ml import MLClient\n", "from azure.ai.ml.entities import (\n", "    ManagedOnlineEndpoint,\n", "    ManagedOnlineDeployment,\n", "    Model,\n", "    Environment,\n", "    CodeConfiguration,\n", ")\n", "from azure.identity import DefaultAzureCredential" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 1.2 Configure workspace details and get a handle to the workspace\n", "\n", "To connect to a workspace, we need identifier parameters: a subscription, a resource group, and a workspace name. We will pass these details to the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](../../jobs/configuration.ipynb) for more details on how to configure credentials and connect to a workspace."
] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "workspace_details" }, "outputs": [], "source": [ "# enter details of your AML workspace\n", "subscription_id = \"<SUBSCRIPTION_ID>\"\n", "resource_group = \"<RESOURCE_GROUP>\"\n", "workspace = \"<AML_WORKSPACE_NAME>\"" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "workspace_handle" }, "outputs": [], "source": [ "# get a handle to the workspace\n", "ml_client = MLClient(\n", "    DefaultAzureCredential(), subscription_id, resource_group, workspace\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 2. Create Online Endpoint\n", "\n", "Online endpoints are endpoints that are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.\n", "\n", "To create an online endpoint we will use `ManagedOnlineEndpoint`. This class lets you configure the following key aspects:\n", "\n", "- `name` - Name of the endpoint. Must be unique within the Azure region.\n", "- `auth_mode` - The authentication method for the endpoint. Key-based authentication and Azure ML token-based authentication are supported. Keys don't expire, but Azure ML tokens do. Possible values are `key` or `aml_token`.\n", "- `identity` - The managed identity configuration for accessing Azure resources for endpoint provisioning and inference.\n", "  - `type` - The type of managed identity. Azure Machine Learning supports `system_assigned` or `user_assigned` identity.\n", "  - `user_assigned_identities` - List (array) of fully qualified resource IDs of the user-assigned identities. This property is required if `identity.type` is `user_assigned`.\n", "- `description` - Description of the endpoint."
] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1 Configure the endpoint" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "configure_endpoint" }, "outputs": [], "source": [ "# Create an endpoint name with a random numeric suffix to reduce the chance of name conflicts\n", "import random\n", "\n", "online_endpoint_name = \"endpt-moe-\" + str(random.randint(0, 10000))\n", "\n", "# create an online endpoint\n", "endpoint = ManagedOnlineEndpoint(\n", "    name=online_endpoint_name,\n", "    description=\"this is a sample online endpoint\",\n", "    auth_mode=\"key\",\n", "    tags={\"foo\": \"bar\"},\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2 Create the endpoint\n", "Using the `MLClient` created earlier, we will now create the endpoint in the workspace. `begin_create_or_update` starts the endpoint creation and returns a poller; calling `.result()` on it blocks until the creation completes." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "create_endpoint" }, "outputs": [], "source": [ "ml_client.online_endpoints.begin_create_or_update(endpoint).result()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Create a blue deployment\n", "\n", "A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `ManagedOnlineDeployment` class. This class lets you configure the following key aspects:\n", "\n", "- `name` - Name of the deployment.\n", "- `endpoint_name` - Name of the endpoint to create the deployment under.\n", "- `model` - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.\n", "- `environment` - The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification.\n", "- `code_configuration` - The configuration for the source code and scoring script.\n", "  - `code` - Path to the source code directory for scoring the model.\n", "  - `scoring_script` - Relative path to the scoring file in the source code directory.\n", "- `instance_type` - The VM size to use for the deployment. For the list of supported sizes, see [Managed online endpoints SKU list](https://docs.microsoft.com/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list).\n", "- `instance_count` - The number of instances to use for the deployment." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3.1 Configure blue deployment" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "configure_deployment" }, "outputs": [], "source": [ "# create blue deployment\n", "model = Model(path=\"../model-1/model/sklearn_regression_model.pkl\")\n", "env = Environment(\n", "    conda_file=\"../model-1/environment/conda.yaml\",\n", "    image=\"mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04\",\n", ")\n", "\n", "blue_deployment = ManagedOnlineDeployment(\n", "    name=\"blue\",\n", "    endpoint_name=online_endpoint_name,\n", "    model=model,\n", "    environment=env,\n", "    code_configuration=CodeConfiguration(\n", "        code=\"../model-1/onlinescoring\", scoring_script=\"score.py\"\n", "    ),\n", "    instance_type=\"Standard_DS3_v2\",\n", "    instance_count=1,\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3.2 Create the deployment\n", "\n", "Using the `MLClient` created earlier, we will now create the deployment in the workspace. `begin_create_or_update` starts the deployment creation and returns a poller; calling `.result()` on it blocks until the creation completes."
] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "create_deployment" }, "outputs": [], "source": [ "ml_client.online_deployments.begin_create_or_update(blue_deployment).result()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "deployment_traffic" }, "outputs": [], "source": [ "# blue deployment takes 100% of the traffic\n", "endpoint.traffic = {\"blue\": 100}\n", "ml_client.online_endpoints.begin_create_or_update(endpoint).result()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 4. Test the endpoint with sample data\n", "\n", "Using the `MLClient` created earlier, we can invoke the endpoint with the `invoke` method, which takes the following parameters:\n", "\n", "- `endpoint_name` - Name of the endpoint\n", "- `request_file` - File with request data\n", "- `deployment_name` - Name of the specific deployment to test in an endpoint\n", "\n", "We will send a sample request using a [json](../model-1/sample-request.json) file." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "test_deployment" }, "outputs": [], "source": [ "# test the blue deployment with some sample data\n", "ml_client.online_endpoints.invoke(\n", "    endpoint_name=online_endpoint_name,\n", "    deployment_name=\"blue\",\n", "    request_file=\"../model-1/sample-request.json\",\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 5. Scale the deployment\n", "\n", "Using the `MLClient` created earlier, we will get a handle to the deployment. The deployment can be scaled by increasing or decreasing its `instance_count`."
] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "scale_deployment" }, "outputs": [], "source": [ "# scale the deployment\n", "blue_deployment = ml_client.online_deployments.get(\n", "    name=\"blue\", endpoint_name=online_endpoint_name\n", ")\n", "blue_deployment.instance_count = 2\n", "ml_client.online_deployments.begin_create_or_update(blue_deployment).result()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 6. Get endpoint details" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "get_endpoint_details" }, "outputs": [], "source": [ "# Get the details for online endpoint\n", "endpoint = ml_client.online_endpoints.get(name=online_endpoint_name)\n", "\n", "# existing traffic details\n", "print(endpoint.traffic)\n", "\n", "# Get the scoring URI\n", "print(endpoint.scoring_uri)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 7. Deploy a new model, but send no traffic yet\n", "Create a new deployment named `green`." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "configure_new_deployment" }, "outputs": [], "source": [ "# create green deployment\n", "model2 = Model(path=\"../model-2/model/sklearn_regression_model.pkl\")\n", "env2 = Environment(\n", "    conda_file=\"../model-2/environment/conda.yaml\",\n", "    image=\"mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04\",\n", ")\n", "\n", "green_deployment = ManagedOnlineDeployment(\n", "    name=\"green\",\n", "    endpoint_name=online_endpoint_name,\n", "    model=model2,\n", "    environment=env2,\n", "    code_configuration=CodeConfiguration(\n", "        code=\"../model-2/onlinescoring\", scoring_script=\"score.py\"\n", "    ),\n", "    instance_type=\"Standard_DS3_v2\",\n", "    instance_count=1,\n", ")" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "create_new_deployment" }, "outputs": [], "source": [ "# use MLClient to create green deployment\n", "ml_client.online_deployments.begin_create_or_update(green_deployment).result()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 7.1 Test the new deployment\n", "Though `green` has 0% of traffic allocated, you can still invoke it directly by passing `deployment_name`, using the sample [json](../model-2/sample-request.json) file." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "test_new_deployment" }, "outputs": [], "source": [ "ml_client.online_endpoints.invoke(\n", "    endpoint_name=online_endpoint_name,\n", "    deployment_name=\"green\",\n", "    request_file=\"../model-2/sample-request.json\",\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 8. Test the deployment with mirrored traffic\n", "Once you've tested your `green` deployment, you can copy (or 'mirror') a percentage of the live traffic to it. Mirroring traffic doesn't change the results returned to clients: requests still flow 100% to the `blue` deployment. The mirrored percentage of the traffic is copied and submitted to the `green` deployment so you can gather metrics and logs without impacting your clients. Mirroring is useful when you want to validate a new deployment, for example to check that latency is within acceptable bounds and that there are no HTTP errors."
] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The following command mirrors 10% of the traffic to the `green` deployment:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "new_deployment_traffic" }, "outputs": [], "source": [ "endpoint.mirror_traffic = {\"green\": 10}\n", "ml_client.begin_create_or_update(endpoint).result()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "several_tests_to_mirror_traffic" }, "outputs": [], "source": [ "# You can test mirror traffic by invoking the endpoint several times\n", "for i in range(20):\n", "    ml_client.online_endpoints.invoke(\n", "        endpoint_name=online_endpoint_name,\n", "        request_file=\"../model-1/sample-request.json\",\n", "    )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "After testing, you can set the mirror traffic to zero to disable mirroring:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "disable_traffic_mirroring" }, "outputs": [], "source": [ "endpoint.mirror_traffic = {\"green\": 0}\n", "ml_client.begin_create_or_update(endpoint).result()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 9. Test the new deployment with a small percentage of live traffic\n", "Once you've tested your `green` deployment, allocate a small percentage of traffic to it:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "allocate_some_traffic" }, "outputs": [], "source": [ "endpoint.traffic = {\"blue\": 90, \"green\": 10}\n", "ml_client.begin_create_or_update(endpoint).result()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Your `green` deployment will now receive 10% of requests." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 10. Send all traffic to your new deployment\n", "Once you're fully satisfied with your `green` deployment, switch all traffic to it."
] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "allocate_all_traffic" }, "outputs": [], "source": [ "endpoint.traffic = {\"blue\": 0, \"green\": 100}\n", "ml_client.begin_create_or_update(endpoint).result()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 11. Remove the old deployment\n" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "remove_old_deployment" }, "outputs": [], "source": [ "ml_client.online_deployments.begin_delete(\n", "    name=\"blue\", endpoint_name=online_endpoint_name\n", ").wait()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# 12. Delete endpoint" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "name": "delete_endpoint" }, "outputs": [], "source": [ "ml_client.online_endpoints.begin_delete(name=online_endpoint_name)" ] }
], "metadata": { "description": { "description": "Safely roll out a new version of a web service to production by rolling out the change to a small subset of users/requests before rolling it out completely" }, "kernelspec": { "display_name": "Python 3.10 - SDK V2", "language": "python", "name": "python310-sdkv2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4 (main, Mar 31 2022, 08:41:55) [GCC 7.5.0]" }, "vscode": { "interpreter": { "hash": "c54d4b4f21f908d21f1064b6d031502c08620e465e849bef5aa76d1f6a474870" } } }, "nbformat": 4, "nbformat_minor": 2 }
