{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploying Hugging Face Models into Azure ML Managed Online Endpoint\n",
"\n",
"This notebook demonstrates how to deploy models registered in an Azure ML workspace\n",
"to an Azure ML managed online endpoint for real-time inference.\n",
"\n",
"[Learn more about Azure ML Managed Online Endpoints](https://learn.microsoft.com/en-us/azure/machine-learning/concept-endpoints-online?view=azureml-api-2)\n",
"\n",
"## Prerequisites\n",
"- An Azure account with an active subscription. [Create one for free](https://azure.microsoft.com/free/).\n",
"- An Azure ML workspace set up. [Learn how to set up a workspace](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace?view=azureml-api-2&tabs=python).\n",
"- Install the Azure ML client library for Python with pip.\n",
"  ```bash\n",
"  pip install azure-ai-ml\n",
"  pip install azure-identity\n",
"  ```\n",
"- Execute the `az login` command to sign in to your Azure subscription. For detailed instructions, refer to the \"Authenticate with Azure Subscription\" section in the markdown file provided [here](../setup/populating_secrets.md).\n",
"- A Hugging Face model should be present in the Azure ML model catalog. If it is missing, execute the [notebook](./download_and_register_hf_model_aml.ipynb) to download and register the Hugging Face model in the Azure ML registry."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load Environment Variables\n",
"\n",
"Load the necessary environment variables from a `.env` file.\n",
"\n",
"For example, to deploy the Hugging Face model `cognitivecomputations/Wizard-Vicuna-13B-Uncensored` in your Azure environment, set the following environment variables in the `.env` file:\n",
"\n",
"1. **AZURE_SUBSCRIPTION_ID**\n",
"   - Obtain your Azure Subscription ID, essential for accessing Azure services.\n",
"\n",
"2. **AZURE_RESOURCE_GROUP**\n",
"   - Identify the Resource Group where your Azure Machine Learning (Azure ML) workspace is located.\n",
"\n",
"3. **AZURE_ML_WORKSPACE_NAME**\n",
"   - Specify the name of your Azure ML workspace where the model will be registered.\n",
"\n",
"4. **AZURE_ML_REGISTRY_NAME**\n",
"   - Specify the name of the Azure ML registry, such as \"HuggingFace\". It is used to check whether the model already exists in the Azure ML Hugging Face registry.\n",
"\n",
"5. **AZURE_ML_MODEL_NAME_TO_DEPLOY**\n",
"   - If the model is listed in the Azure ML Hugging Face model catalog, then supply the model name as shown in the following image.\n",
"   <br> <img src=\"./../../assets/aml_hf_model.png\" alt=\"aml_hf_model.png\" height=\"400\"/> <br>\n",
"   - If you intend to deploy the model from the Azure ML workspace model registry, then use the model name as shown in the subsequent image.\n",
"   <br> <img src=\"./../../assets/aml_ws_model.png\" alt=\"aml_ws_model.png\" height=\"400\"/> <br>\n",
"\n",
"6. **AZURE_ML_MODEL_VERSION_TO_DEPLOY**\n",
"   - The model version is shown in the images from the previous step, alongside the respective model.\n",
"\n",
"7. **AZURE_ML_MODEL_DEPLOY_INSTANCE_SIZE**\n",
"   - Select the size of the compute instance for deploying the model, ensuring it is at least double the size of the model for effective inference.\n",
"\n",
"8. **AZURE_ML_MODEL_DEPLOY_INSTANCE_COUNT**\n",
"   - Number of compute instances for model deployment.\n",
"\n",
"9. **AZURE_ML_MODEL_DEPLOY_REQUEST_TIMEOUT_MS**\n",
"   - Set the Azure ML inference endpoint request timeout in milliseconds; the recommended value is 60000.\n",
"\n",
"10. **AZURE_ML_MODEL_DEPLOY_LIVENESS_PROBE_INIT_DELAY_SECS**\n",
"    - Configure the liveness probe initial delay value for the Azure ML container hosting your model. The default `initial_delay` value for the liveness probe, as established by Azure ML managed compute, is 600 seconds. Consider raising this value for the deployment of larger models.\n"
]
},
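{
"cell_type": "markdown",
"metadata": {},
"source": [
"Several of the variables listed above are converted with `int(...)` when they are loaded, so an unset variable surfaces as a `TypeError` rather than a clear message. The following is a minimal sketch, not part of the original notebook, of a check that could run first. The helper name `find_missing_env_vars` and the `REQUIRED_VARS` list are hypothetical; the variable names themselves are the ones documented above.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"# Variable names taken from the list above\n",
"REQUIRED_VARS = [\n",
"    \"AZURE_SUBSCRIPTION_ID\",\n",
"    \"AZURE_RESOURCE_GROUP\",\n",
"    \"AZURE_ML_WORKSPACE_NAME\",\n",
"    \"AZURE_ML_REGISTRY_NAME\",\n",
"    \"AZURE_ML_MODEL_NAME_TO_DEPLOY\",\n",
"    \"AZURE_ML_MODEL_VERSION_TO_DEPLOY\",\n",
"    \"AZURE_ML_MODEL_DEPLOY_INSTANCE_SIZE\",\n",
"    \"AZURE_ML_MODEL_DEPLOY_INSTANCE_COUNT\",\n",
"    \"AZURE_ML_MODEL_DEPLOY_REQUEST_TIMEOUT_MS\",\n",
"    \"AZURE_ML_MODEL_DEPLOY_LIVENESS_PROBE_INIT_DELAY_SECS\",\n",
"]\n",
"\n",
"\n",
"def find_missing_env_vars(names, environ=None):\n",
"    \"\"\"Hypothetical helper: return the names that are unset or empty.\"\"\"\n",
"    environ = os.environ if environ is None else environ\n",
"    return [name for name in names if not environ.get(name)]\n",
"\n",
"\n",
"missing = find_missing_env_vars(REQUIRED_VARS)\n",
"if missing:\n",
"    print(f\"Missing environment variables: {', '.join(missing)}\")\n",
"```\n",
"\n",
"Running such a check after `load_dotenv()` and before the `int(...)` conversions would report every missing name at once instead of failing on the first one."
]
},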
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import random\n",
"import string\n",
"\n",
"from dotenv import load_dotenv\n",
"\n",
"# Load the environment variables from the .env file\n",
"load_dotenv()\n",
"\n",
"subscription_id = os.getenv(\"AZURE_SUBSCRIPTION_ID\")\n",
"resource_group = os.getenv(\"AZURE_RESOURCE_GROUP\")\n",
"workspace_name = os.getenv(\"AZURE_ML_WORKSPACE_NAME\")\n",
"registry_name = os.getenv(\"AZURE_ML_REGISTRY_NAME\")\n",
"model_to_deploy = os.getenv(\"AZURE_ML_MODEL_NAME_TO_DEPLOY\")\n",
"model_version = os.getenv(\"AZURE_ML_MODEL_VERSION_TO_DEPLOY\")\n",
"instance_type = os.getenv(\"AZURE_ML_MODEL_DEPLOY_INSTANCE_SIZE\")\n",
"instance_count = int(os.getenv(\"AZURE_ML_MODEL_DEPLOY_INSTANCE_COUNT\"))\n",
"request_timeout_ms = int(os.getenv(\"AZURE_ML_MODEL_DEPLOY_REQUEST_TIMEOUT_MS\"))\n",
"liveness_probe_initial_delay = int(os.getenv(\"AZURE_ML_MODEL_DEPLOY_LIVENESS_PROBE_INIT_DELAY_SECS\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"Subscription ID: {subscription_id}\")\n",
"print(f\"Resource group: {resource_group}\")\n",
"print(f\"Workspace name: {workspace_name}\")\n",
"print(f\"Registry name: {registry_name}\")\n",
"print(f\"Model to deploy: {model_to_deploy}\")\n",
"print(f\"Model version: {model_version}\")\n",
"print(f\"Instance type: {instance_type}\")\n",
"print(f\"Instance count: {instance_count}\")\n",
"print(f\"Request timeout in millis: {request_timeout_ms}\")\n",
"print(f\"Liveness probe initial delay in secs: {liveness_probe_initial_delay}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"lines_to_next_cell": 0
},
"source": [
"## Configure Credentials\n",
"\n",
"Set up the `DefaultAzureCredential` for seamless authentication with Azure services. This method should handle most authentication scenarios. If you encounter issues, refer to the [Azure Identity documentation](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity?view=azure-python) for alternative credentials.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from typing import Union\n",
"\n",
"from azure.ai.ml import MLClient\n",
"from azure.core.exceptions import ResourceNotFoundError\n",
"from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential\n",
"\n",
"try:\n",
"    credential: Union[DefaultAzureCredential, InteractiveBrowserCredential] = DefaultAzureCredential()\n",
"    # Request a token up front so that authentication problems surface here\n",
"    credential.get_token(\"https://management.azure.com/.default\")\n",
"except Exception:\n",
"    # Fall back to interactive browser sign-in when the default credential chain fails\n",
"    credential = InteractiveBrowserCredential()\n",
"\n",
"workspace_ml_client = MLClient(\n",
"    credential, subscription_id=subscription_id, resource_group_name=resource_group, workspace_name=workspace_name\n",
")\n",
"registry_ml_client = MLClient(credential, registry_name=registry_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def check_model_version_exists(client, model_name, version) -> bool:\n",
"    \"\"\"\n",
"    Checks for the existence of a specific version of a model with the given name in the client registry.\n",
"\n",
"    This function lists all models with the given name in the registry using the provided client. It then checks if the specified version exists among those models.\n",
"\n",
"    Args:\n",
"        client: The client object used to interact with the model registry. This can be an Azure ML model catalog client or an Azure ML workspace model client.\n",
"        model_name (str): The name of the model to check in the registry.\n",
"        version (str): The specific version of the model to check for.\n",
"\n",
"    Returns:\n",
"        bool: True if the model with the specified version exists in the registry, False otherwise.\n",
"    \"\"\"\n",
"    model_found = False\n",
"    try:\n",
"        models = list(client.models.list(name=model_name))\n",
"        model_found = any(model.version == version for model in models)\n",
"    except ResourceNotFoundError:\n",
"        # The client may point at the workspace or at a registry, so keep the message generic\n",
"        print(f\"Model {model_name} not found in this registry; please try another registry.\")\n",
"    return model_found"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Check if the Hugging Face model exists in the Azure ML workspace model registry\n",
"model = None\n",
"if check_model_version_exists(workspace_ml_client, model_to_deploy, model_version):\n",
"    print(\"Model found in the Azure ML workspace model registry.\")\n",
"    model = workspace_ml_client.models.get(model_to_deploy, model_version)\n",
"    print(\n",
"        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(model.name, model.version, model.id)\n",
"    )\n",
"# Check if the Hugging Face model exists in the Azure ML model catalog registry\n",
"elif check_model_version_exists(registry_ml_client, model_to_deploy, model_version):\n",
"    print(\"Model found in the Azure ML model catalog registry.\")\n",
"    model = registry_ml_client.models.get(model_to_deploy, model_version)\n",
"    print(\n",
"        \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for inferencing\".format(model.name, model.version, model.id)\n",
"    )\n",
"else:\n",
"    raise ValueError(\n",
"        f\"Model {model_to_deploy} not found in any registry. Please run the notebook (download_and_register_hf_model_aml.ipynb) to download and register the Hugging Face model in the Azure ML workspace model registry.\"\n",
"    )\n",
"endpoint_name = model_to_deploy + str(model_version)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def get_updated_endpoint_name(endpoint_name):\n",
"    \"\"\"\n",
"    Generates a unique string based on the Azure ML endpoint name.\n",
"\n",
"    This function takes the first 26 characters of the given endpoint name and appends\n",
"    a hyphen followed by a 5-character random alphanumeric string to ensure uniqueness.\n",
"    \"\"\"\n",
"    # Take the first 26 characters of the endpoint name\n",
"    base_name = endpoint_name[:26]\n",
"\n",
"    # Generate a 5-char random alphanumeric string, prefixed with '-'\n",
"    random_suffix = \"-\" + \"\".join(random.choices(string.ascii_letters + string.digits, k=5))\n",
"\n",
"    updated_name = f\"{base_name}{random_suffix}\"\n",
"\n",
"    return updated_name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"endpoint_name = get_updated_endpoint_name(endpoint_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(f\"Endpoint name: {endpoint_name}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Create an Azure ML managed online endpoint**\n",
"\n",
"To define an endpoint, you need to specify:\n",
"\n",
"- Endpoint name: The name of the endpoint. It must be unique in the Azure region. For more information on the naming rules, see managed online endpoint limits.\n",
"- Authentication mode: The authentication method for the endpoint. Choose between key-based authentication and Azure Machine Learning token-based authentication. A key doesn't expire, but a token does expire."
]
},
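{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the endpoint name in this notebook is derived from the model name, it may contain characters that the naming rules reject. The following is a minimal sketch, not part of the original notebook, of one way to normalize a name first. The helper `sanitize_endpoint_name` is hypothetical, and the rules it encodes (a leading letter, only alphanumeric characters and hyphens, at most 32 characters) are assumptions that should be checked against the current managed online endpoint limits.\n",
"\n",
"```python\n",
"import re\n",
"\n",
"\n",
"def sanitize_endpoint_name(raw_name: str, max_length: int = 32) -> str:\n",
"    \"\"\"Hypothetical helper: coerce a model name into an endpoint-safe string.\"\"\"\n",
"    # Replace every character other than ASCII letters, digits, and hyphens\n",
"    cleaned = re.sub(r\"[^A-Za-z0-9-]\", \"-\", raw_name)\n",
"    # Assumed rule: the name must start with a letter, so add a prefix if needed\n",
"    if not cleaned or not cleaned[0].isalpha():\n",
"        cleaned = \"m-\" + cleaned\n",
"    # Truncate, then drop any trailing hyphen left by the cut\n",
"    return cleaned[:max_length].rstrip(\"-\")\n",
"\n",
"\n",
"print(sanitize_endpoint_name(\"my_model.v2\"))\n",
"```\n",
"\n",
"In this notebook, such a helper would apply to `model_to_deploy + str(model_version)` before `get_updated_endpoint_name` truncates the result and appends its random suffix."
]
},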
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.ai.ml.entities import (\n",
"    ManagedOnlineDeployment,\n",
"    ManagedOnlineEndpoint,\n",
"    OnlineRequestSettings,\n",
"    ProbeSettings,\n",
")\n",
"\n",
"# create an online endpoint\n",
"endpoint = ManagedOnlineEndpoint(\n",
"    name=endpoint_name, description=f\"Online endpoint for {model_to_deploy}\", auth_mode=\"key\"\n",
")\n",
"workspace_ml_client.begin_create_or_update(endpoint).wait()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Add deployment to an Azure ML endpoint created above**\n",
"\n",
"Deployment may take some time, particularly for larger models. Once the deployment is finished, the provisioning state is marked as 'Succeeded', as illustrated in the image below.\n",
"<br> <img src=\"./../../assets/aml_endpoint_deployment.png\" alt=\"aml_endpoint_deployment.png\" height=\"400\"/> <br>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create the liveness probe settings\n",
"liveness_probe = ProbeSettings(initial_delay=liveness_probe_initial_delay)\n",
"\n",
"# Create a deployment that reuses the endpoint name as the deployment name\n",
"deployment = ManagedOnlineDeployment(\n",
"    name=endpoint_name,\n",
"    endpoint_name=endpoint_name,\n",
"    model=model.id,\n",
"    instance_type=instance_type,\n",
"    instance_count=instance_count,\n",
"    request_settings=OnlineRequestSettings(request_timeout_ms=request_timeout_ms),\n",
"    liveness_probe=liveness_probe,\n",
")\n",
"workspace_ml_client.online_deployments.begin_create_or_update(deployment).wait()\n",
"workspace_ml_client.begin_create_or_update(endpoint).result()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}