sdk/python/using-mlflow/deploy/mlflow_sdk_online_endpoints_progresive.ipynb

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Progressive rollout of MLflow deployments\n", "\n", "Online Endpoints have the concept of __Endpoint__ and __Deployment__. An endpoint represents the API that customers uses to consume the model, while a deployment indicates the specific implementation of that API. This distinction allows users to decouple the API from the implementation and to change the underlying implementation without affecting the consumer." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Connect to Azure Machine Learning Workspace" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Import the namespaces:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlflow.tracking import MlflowClient\n", "from azure.ai.ml import MLClient\n", "from azure.identity import DefaultAzureCredential\n", "\n", "import json\n", "import requests\n", "import mlflow\n", "import pandas as pd" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### If you are working in a Compute Instance in Azure Machine Learning\n", "\n", "If you are working in Azure Machine Learning Compute Instances, your MLflow installation is automatically connected to Azure Machine Learning, and you don't need to do anything." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### If you are working in your local machine or in a cloud outside Azure Machine Learning\n", "\n", "You will need to connect MLflow to the Azure Machine Learning workspace you want to work on. MLflow uses the tracking URI to indicate the MLflow server you want to connect to. There are multiple ways to get the Azure Machine Learning MLflow Tracking URI. In this tutorial we will use the Azure ML SDK for Python, but you can check [Set up tracking environment - Azure Machine Learning Docs](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow-cli-runs#set-up-tracking-environment) for more alternatives." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "subscription_id = \"<SUBSCRIPTION_ID>\"\n", "resource_group = \"<RESOURCE_GROUP>\"\n", "workspace = \"<AML_WORKSPACE_NAME>\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client = MLClient(\n", " DefaultAzureCredential(), subscription_id, resource_group, workspace\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "You can use the workspace object to get the tracking URI:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "azureml_tracking_uri = ml_client.workspaces.get(\n", " ml_client.workspace_name\n", ").mlflow_tracking_uri\n", "mlflow.set_tracking_uri(azureml_tracking_uri)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Registering the model in the registry\n", "\n", "This example uses an MLflow model based on the [UCI Heart Disease Data Set](https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The database contains 76 attributes, but we are using a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. 
It is integer valued from 0 (no presence) to 1 (presence).\n", "\n", "The model has been trained using an XGBoost classifier and all the required preprocessing has been packaged as a scikit-learn pipeline, making this model an end-to-end pipeline that goes from raw data to predictions.\n", "\n", "Let's ensure the model is registered in the workspace:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_name = \"heart-classifier\"\n", "model_local_path = \"model\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's check if the model is registered:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mlflow_client = MlflowClient()\n", "model_versions = mlflow_client.search_model_versions(\n", " filter_string=f\"name = '{model_name}'\"\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "If not, let's create one:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if any(model_versions):\n", " version = model_versions[0].version\n", "else:\n", " try:\n", " mlflow_client.create_registered_model(model_name)\n", " except Exception as e:\n", " print(e)\n", " registered_model = mlflow_client.create_model_version(\n", " name=model_name, source=f\"file://{model_local_path}\"\n", " )\n", " version = registered_model.version" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"We are going to deploy model {model_name} with version {version}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Create an Online Endpoint\n", "\n", "Online endpoints are endpoints that are used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We are going to exploit this functionality by deploying multiple versions of the same model under the same endpoint. However, the new deployment will receive 0% of the traffic at the beginning. Once we are sure the new model works correctly, we will progressively move traffic from one deployment to the other." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "First, let's create an MLflow deployment client for Azure Machine Learning:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlflow.deployments import get_deploy_client\n", "\n", "deployment_client = get_deploy_client(mlflow.get_tracking_uri())" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Configure the endpoint" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's create an endpoint explicitly now. We can configure the properties of this endpoint using a configuration file. In this case, we are configuring the authentication mode of the endpoint to be \"key\". The configuration file is optional."
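] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> As a hedged aside: Azure ML online endpoints also support Azure Active Directory token authentication. The sketch below shows what such a configuration could look like; the `aml_token` value and the extra `description` and `tags` keys are assumptions on our side, so verify them against the Azure ML documentation before relying on them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hedged sketch of an alternative endpoint configuration (not used below).\n", "# \"aml_token\" would enable Azure AD token auth instead of static keys; \"description\"\n", "# and \"tags\" are assumed to be accepted keys. Check the Azure ML docs to confirm.\n", "alt_endpoint_config = {\n", "    \"auth_mode\": \"aml_token\",\n", "    \"identity\": {\"type\": \"system_assigned\"},\n", "    \"description\": \"Heart disease classifier (progressive rollout demo)\",\n", "    \"tags\": {\"project\": \"heart-classifier\"},\n", "}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "For this tutorial we stick with key-based authentication:"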
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_config = {\"auth_mode\": \"key\", \"identity\": {\"type\": \"system_assigned\"}}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's write this configuration into a `JSON` file:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_config_path = \"endpoint_config.json\"\n", "with open(endpoint_config_path, \"w\") as outfile:\n", " outfile.write(json.dumps(endpoint_config))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Endpoints require a name, which needs to be unique in the same region. Let's ensure to create one that doesn't exist:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "import string\n", "\n", "# Creating a unique endpoint name by including a random suffix\n", "allowed_chars = string.ascii_lowercase + string.digits\n", "endpoint_suffix = \"\".join(random.choice(allowed_chars) for x in range(5))\n", "endpoint_name = \"heart-classifier-\" + endpoint_suffix\n", "\n", "print(f\"Endpoint name: {endpoint_name}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Create an Online Endpoint\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Create the endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint = deployment_client.create_endpoint(\n", " name=endpoint_name,\n", " config={\"endpoint-config-file\": endpoint_config_path},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Get the scoring URI from the endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scoring_uri = deployment_client.get_endpoint(endpoint=endpoint_name)[\"properties\"][\n", " \"scoringUri\"\n", "]\n", "print(scoring_uri)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Create deployments" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 4.1 Create a blue deployment under the endpoint" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "So far, the endpoint is empty. There are no deployments on it. Let's create the first one by deploying the same model we were working on before. We will call this deployment \"default\" and this will represent our \"blue deployment\"." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.1 Configure the deployment" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "blue_deployment_name = \"default\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Configure the default deployment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deploy_config = {\n", " \"instance_type\": \"Standard_DS3_v2\",\n", " \"instance_count\": 1,\n", "}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Write the configuration to a file:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_config_path = \"deployment_config.json\"\n", "with open(deployment_config_path, \"w\") as outfile:\n", " outfile.write(json.dumps(deploy_config))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Create the deployment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "blue_deployment = deployment_client.create_deployment(\n", " name=blue_deployment_name,\n", " endpoint=endpoint_name,\n", " model_uri=f\"models:/{model_name}/{version}\",\n", " config={\"deploy-config-file\": deployment_config_path},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.1.2 Assign all the traffic to the created deployment" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "By default, new deployments receive none of the traffic from the endpoint. Let's assign all of it to the deployment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "traffic_config = {\"traffic\": {blue_deployment_name: 100}}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's write the configuration to a file:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "traffic_config_path = \"traffic_config.json\"\n", "with open(traffic_config_path, \"w\") as outfile:\n", " outfile.write(json.dumps(traffic_config))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We are going to use the key `endpoint-config-file` to update the configuration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_client.update_endpoint(\n", " endpoint=endpoint_name,\n", " config={\"endpoint-config-file\": traffic_config_path},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now all the traffic is on our blue deployment." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 4.2 Test the endpoint" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The following code samples 5 observations from the training dataset, removes the `target` column (as the model will predict it), and creates a data frame we can use to test the deployment." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "samples = (\n", " pd.read_csv(\"data/heart.csv\")\n", " .sample(n=5)\n", " .drop(columns=[\"target\"])\n", " .reset_index(drop=True)\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.2.1 Invoke the endpoint using the deployment client\n", "\n", "You can use the MLflow deployment client to invoke the endpoint and test it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_client.predict(endpoint=endpoint_name, df=samples)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.2.2 Making REST requests\n", "\n", "Online Endpoints support both key-based authentication or Azure Active Directory. In this case we are going to use key-based authentication which is based on a secret that the caller needs to include in the headers of the request. You can get this key using:\n", "\n", "- Azure ML SDK for Python\n", "- Azure ML CLI\n", "- [Azure ML studio](https://ml.azure.com)\n", "\n", "In our case, we are going to use the Azure ML SDK for Python. If you didn't create an `MLClient` before, create a client for the Azure Machine Learning workspace:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client = MLClient(\n", " DefaultAzureCredential(), subscription_id, resource_group, workspace\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's get the secrets of the endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint_secret_key = ml_client.online_endpoints.get_keys(\n", " name=endpoint_name\n", ").access_token" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Once you have the secret key, we need to create the headers for the request:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "headers = {\n", " \"Content-Type\": \"application/json\",\n", " \"Authorization\": (\"Bearer \" + endpoint_secret_key),\n", "}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Make a post to the endpoint. Azure Machine Learning requires the key `input_data` to be added to the input examples that you want to provide to the service. Notice that this is not the case of the command `mlflow model serve`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sample_request = {\n", " \"input_data\": json.loads(samples.to_json(orient=\"split\", index=False))\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "req = requests.post(scoring_uri, json=sample_request, headers=headers)\n", "req.json()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 4.3 Create a green deployment under the endpoint\n", "\n", "Let's imagine that there is a new version of the model created by the development team and it is ready to be in production. We can first try to fly this model and once we are confident, we can update the endpoint to route the traffic to it." 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.3.1 Register a new model version" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "registered_model = mlflow_client.create_model_version(\n", " name=model_name, source=f\"file://{model_local_path}\"\n", ")\n", "version = registered_model.version" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.3.2 Create a new deployment under the same endpoint\n", "\n", "We will call this new deployment `xgboost-model-<version>` and this correspond to our \"green deployment\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "green_deployment_name = f\"xgboost-model-{version}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "new_deployment = deployment_client.create_deployment(\n", " name=green_deployment_name,\n", " endpoint=endpoint_name,\n", " model_uri=f\"models:/{model_name}/{version}\",\n", " config={\"deploy-config-file\": deployment_config_path},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> We are using the same hardware confirmation indicated in the `deployment-config-file`. However, there is no requirements to have the same configuration. You can configure different hardware for different models depending on the requirements." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.3.3 Test the new deployment\n", "\n", "Let's test the new deployment. By default, the endpoint is configure to do not route any request to the green deployment. However, we can bypass the router by adding an specific deployment in our request:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_client.predict(endpoint=endpoint_name, deployment_name=green_deployment_name df=samples)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can also use REST to invoke this specific deployment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "headers = {\n", " \"Content-Type\": \"application/json\",\n", " \"Authorization\": (\"Bearer \" + endpoint_secret_key),\n", " \"azureml-model-deployment\": green_deployment_name,\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "req = requests.post(scoring_uri, json=sample_request, headers=headers)\n", "req.json()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Progressively update the traffic" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 5.1 Update traffic\n", "\n", "One we are confident with the new deployment, we can update the traffic to route some of it to the new deployment. 
Traffic is configured at the endpoint level:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "traffic_config = {\"traffic\": {blue_deployment_name: 90, green_deployment_name: 10}}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's write the configuration to a file:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "traffic_config_path = \"traffic_config.json\"\n", "with open(traffic_config_path, \"w\") as outfile:\n", " outfile.write(json.dumps(traffic_config))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We are going to use the key `endpoint-config-file` to update the configuration:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_client.update_endpoint(\n", " endpoint=endpoint_name,\n", " config={\"endpoint-config-file\": traffic_config_path},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 5.2 Update all the traffic\n", "\n", "Let's see how we can transfer all the traffic to the new deployment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "traffic_config = {\"traffic\": {blue_deployment_name: 0, green_deployment_name: 100}}" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "traffic_config_path = \"traffic_config.json\"\n", "with open(traffic_config_path, \"w\") as outfile:\n", " outfile.write(json.dumps(traffic_config))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_client.update_endpoint(\n", " endpoint=endpoint_name,\n", " config={\"endpoint-config-file\": traffic_config_path},\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 5.3 If you want, you can delete the old deployment now" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_client.delete_deployment(blue_deployment_name, endpoint=endpoint_name)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Notice that at this point, the former \"blue deployment\" has been deleted and the new \"green deployment\" has taken its place." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Delete resources" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Once you are ready, delete the created resources:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "deployment_client.delete_endpoint(endpoint_name)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> This operation deletes the endpoint along with all of its deployments." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.10 - SDK V2", "language": "python", "name": "python310-sdkv2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.4" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "2139c70ac98f3202d028164a545621647e07f47fd6f5d8ac55cf952bf7c15ed1" } } }, "nbformat": 4, "nbformat_minor": 2 }