sdk/python/endpoints/batch/deploy-models/huggingface-text-summarization/text-summarization-batch.ipynb

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "# Using batch deployments for NLP processing\n", "\n", "Batch Endpoints can be used for processing tabular data, but also any other file type like text. Those deployments are supported in both MLflow and custom models. In this tutorial we will learn how to deploy a model that can perform text summarization of long sequences of text using a model from HuggingFace." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "This notebook requires:\n", "\n", "- `azure-ai-ml`\n", "- `mlflow`\n", "- `azureml-mlflow`\n", "- `numpy`\n", "- `pandas`\n", "- `huggingface`\n", "- `Torch`" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Connect to Azure Machine Learning Workspace\n", "\n", "The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.\n", "\n", "### 1.1. Import the required libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "name" ] }, "outputs": [], "source": [ "from azure.ai.ml import MLClient, Input\n", "from azure.ai.ml.entities import (\n", " BatchEndpoint,\n", " ModelBatchDeployment,\n", " ModelBatchDeploymentSettings,\n", " Model,\n", " AmlCompute,\n", " Data,\n", " BatchRetrySettings,\n", " CodeConfiguration,\n", " Environment,\n", ")\n", "from azure.ai.ml.constants import AssetTypes, BatchDeploymentOutputAction\n", "from azure.identity import DefaultAzureCredential" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### 1.2. Configure workspace details and get a handle to the workspace\n", "\n", "To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use the default [default azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](../../jobs/configuration.ipynb) for more details on how to configure credentials and connect to a workspace." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "subscription_id = \"<SUBSCRIPTION_ID>\"\n", "resource_group = \"<RESOURCE_GROUP>\"\n", "workspace = \"<AML_WORKSPACE_NAME>\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client = MLClient(\n", " DefaultAzureCredential(), subscription_id, resource_group, workspace\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "If you are working in a Azure Machine Learning compute, you can simply:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client = MLClient.from_config(DefaultAzureCredential())" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## 2. Registering the model\n", "\n", "### 2.1 About the model\n", "\n", "The model we are going to work with was built using the popular library transformers from HuggingFace along with a pre-trained model from Facebook with the BART architecture. It was introduced in the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation. This model has the following constrains that are important to keep in mind for deployment:\n", "\n", "- It can work with sequences up to 1024 tokens.\n", "- It is trained for summarization of text in English.\n", "- We are going to use TensorFlow as a backend." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "from transformers import pipeline\n", "\n", "summarizer = pipeline(\"summarization\", model=\"facebook/bart-large-cnn\", framework=\"pt\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "Let's save this model locally:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "model_local_path = \"model\"\n", "summarizer.save_pretrained(model_local_path)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### 2.2 Registering the model in the workspace\n", "\n", "We need to register the model in order to use it with Azure Machine Learning:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "model_name = \"bart-text-summarization\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "if not any(filter(lambda m: m.name == model_name, ml_client.models.list())):\n", " print(f\"Model {model_name} is not registered. 
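, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "As a quick, optional sanity check of the 1024-token constraint, we can inspect how the pipeline's tokenizer encodes a long piece of text. This is an illustrative sketch only, using the `summarizer` object created above; the sample text is made up:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative check: the tokenizer truncates anything beyond 1024 tokens.\n", "sample_text = \"Batch endpoints let us process large volumes of text asynchronously. \" * 300\n", "encoding = summarizer.tokenizer(sample_text, truncation=True, max_length=1024)\n", "print(f\"Number of tokens after truncation: {len(encoding['input_ids'])}\")"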
Creating...\")\n", " model = ml_client.models.create_or_update(\n", " Model(name=model_name, path=model_local_path, type=AssetTypes.CUSTOM_MODEL)\n", " )" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "Let's get a reference to the model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "model = ml_client.models.get(name=model_name, label=\"latest\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 3 Create Batch Endpoint\n", "\n", "Batch endpoints are endpoints that are used batch inferencing on large volumes of data over a period of time. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters. Batch endpoints store outputs to a data store for further analysis.\n", "\n", "To create an online endpoint we will use `BatchEndpoint`. This class allows user to configure the following key aspects:\n", "- `name` - Name of the endpoint. Needs to be unique at the Azure region level\n", "- `auth_mode` - The authentication method for the endpoint. Currently only Azure Active Directory (Azure AD) token-based (`aad_token`) authentication is supported. \n", "- `description`- Description of the endpoint.\n", "\n", "### 3.1 Configure the endpoint\n", "\n", "First, let's create the endpoint that is going to host the batch deployments. Remember that each endpoint can host multiple deployments at any time, however, only one of them is the default one:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "import random\n", "import string\n", "\n", "# Creating a unique endpoint name by including a random suffix\n", "allowed_chars = string.ascii_lowercase + string.digits\n", "endpoint_suffix = \"\".join(random.choice(allowed_chars) for x in range(5))\n", "endpoint_name = \"text-summarization-\" + endpoint_suffix" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's configure the endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "endpoint = BatchEndpoint(\n", " name=endpoint_name,\n", " description=\"An batch service to perform text sumarization of content in CSV files\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Create the endpoint\n", "Using the `MLClient` created earlier, we will now create the Endpoint in the workspace. This command will start the endpoint creation and return a confirmation response while the endpoint creation continues." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "ml_client.batch_endpoints.begin_create_or_update(endpoint).result()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## 4. Create a batch deployment\n", "\n", "A deployment is a set of resources required for hosting the model that does the actual inferencing. 
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## 4. Create a batch deployment\n", "\n", "A deployment is a set of resources required for hosting the model that does the actual inferencing. We will create a deployment for our endpoint using the `ModelBatchDeployment` class.\n", "\n", "### 4.1 Creating a scoring script to work with the model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "%%writefile code/batch_driver.py\n", "\n", "import os\n", "import time\n", "import torch\n", "import subprocess\n", "import mlflow\n", "from pprint import pprint\n", "from transformers import AutoTokenizer, BartForConditionalGeneration\n", "from optimum.bettertransformer import BetterTransformer\n", "from datasets import load_dataset\n", "\n", "def init():\n", " global model\n", " global tokenizer\n", " global device\n", "\n", " cuda_available = torch.cuda.is_available()\n", " device = \"cuda\" if cuda_available else \"cpu\"\n", "\n", " if cuda_available:\n", " print(f\"[INFO] CUDA version: {torch.version.cuda}\")\n", " print(f\"[INFO] ID of current CUDA device: {torch.cuda.current_device()}\")\n", " print(\"[INFO] nvidia-smi output:\")\n", " pprint(subprocess.run(['nvidia-smi'], stdout=subprocess.PIPE).stdout.decode('utf-8'))\n", " else:\n", " print(\"[WARN] CUDA acceleration is not available. This model takes hours to run on medium size data.\")\n", "\n", " # AZUREML_MODEL_DIR is an environment variable created during deployment\n", " model_path = os.path.join(os.environ[\"AZUREML_MODEL_DIR\"], \"model\")\n", "\n", " # Load the tokenizer\n", " tokenizer = AutoTokenizer.from_pretrained(\n", " model_path, truncation=True, max_length=1024\n", " )\n", "\n", " # Load the model\n", " try:\n", " model = BartForConditionalGeneration.from_pretrained(model_path, device_map=\"auto\")\n", " except Exception as e:\n", " print(f\"[ERROR] Error happened when loading the model on GPU or the default device. Error: {e}\")\n", " print(\"[INFO] Trying on CPU.\")\n", " model = BartForConditionalGeneration.from_pretrained(model_path)\n", " device = \"cpu\"\n", "\n", " # Optimize the model\n", " if device != \"cpu\":\n", " try:\n", " model = BetterTransformer.transform(model, keep_original_model=False)\n", " print(\"[INFO] BetterTransformer loaded.\")\n", " except Exception as e:\n", " print(f\"[ERROR] Error when converting to BetterTransformer. An unoptimized version of the model will be used.\\n\\t> {e}\")\n", "\n", " mlflow.log_param(\"device\", device)\n", " mlflow.log_param(\"model\", type(model).__name__)\n", "\n", "def run(mini_batch):\n", " resultList = []\n", "\n", " print(f\"[INFO] Reading new mini-batch of {len(mini_batch)} file(s).\")\n", " ds = load_dataset(\"csv\", data_files={\"score\": mini_batch})\n", "\n", " start_time = time.perf_counter()\n", " for idx, text in enumerate(ds[\"score\"][\"text\"]):\n", " # Perform inference\n", " inputs = tokenizer.batch_encode_plus(\n", " [text], truncation=True, padding=True, max_length=1024, return_tensors=\"pt\"\n", " )\n", " input_ids = inputs[\"input_ids\"].to(device)\n", " summary_ids = model.generate(\n", " input_ids, max_length=130, min_length=30, do_sample=False\n", " )\n", " summaries = tokenizer.batch_decode(\n", " summary_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False\n", " )\n", "\n", " # Get results (idx is 0-based, so idx + 1 rows have been processed so far)\n", " resultList.append(summaries[0])\n", " rps = (idx + 1) / (time.perf_counter() - start_time + 0.1)\n", " print(\"Rows per second:\", rps)\n", "\n", " mlflow.log_metric(\"rows_per_second\", rps)\n", " return resultList" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "> The tokenizer is configured to truncate the length of the text, as the model has a limit of 1024 tokens." ] }
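, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Before deploying, we can optionally smoke-test the scoring script on the local machine. The sketch below is not part of the deployment; it assumes the script's dependencies (`torch`, `transformers`, `optimum`, `datasets`) are installed locally, that the model was saved to `./model` earlier in this notebook, and that `sample.csv` is a hypothetical CSV file with a `text` column:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional local smoke test for the batch driver.\n", "import os\n", "\n", "# init() resolves the model from AZUREML_MODEL_DIR/model; point it at the local copy.\n", "os.environ[\"AZUREML_MODEL_DIR\"] = \".\"\n", "\n", "%run code/batch_driver.py\n", "\n", "init()\n", "print(run([\"sample.csv\"]))"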
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### 4.2 Creating the compute\n", "\n", "Batch deployments can run on any Azure ML compute that already exists in the workspace. That means that multiple batch deployments can share the same compute infrastructure. In this example, we are going to work on an AzureML compute cluster called `gpu-cluster` with GPU hardware acceleration. Let's verify that the compute exists in the workspace, or create it otherwise." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "compute_name = \"gpu-cluster\"\n", "if not any(filter(lambda m: m.name == compute_name, ml_client.compute.list())):\n", " compute_cluster = AmlCompute(\n", " name=compute_name,\n", " description=\"GPU cluster compute\",\n", " size=\"Standard_NV6\",\n", " min_instances=0,\n", " max_instances=2,\n", " )\n", " ml_client.compute.begin_create_or_update(compute_cluster).result()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### 4.3 Creating the environment\n", "\n", "Let's create the environment. In our case, our model runs on `PyTorch`. We will build a custom environment from a conda file that lists the required packages, on top of a base image provided by Azure Machine Learning that already has CUDA support installed, so we can reuse it."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "environment = Environment(\n", " conda_file=\"environment/torch200-conda.yaml\",\n", " image=\"mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04:latest\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ " ### 4.4 Configuring the deployment\n", " \n", " We will create a deployment for our endpoint using the `BatchDeployment` class. This class allows user to configure the following key aspects.\n", "- `name` - Name of the deployment.\n", "- `endpoint_name` - Name of the endpoint to create the deployment under.\n", "- `model` - The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.\n", "- `environment` - The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification.\n", "- `code_path`- Path to the source code directory for scoring the model\n", "- `scoring_script` - Relative path to the scoring file in the source code directory\n", "- `compute` - Name of the compute target to execute the batch scoring jobs on\n", "- `instance_count`- The number of nodes to use for each batch scoring job.\t\t1\n", "- `max_concurrency_per_instance`- The maximum number of parallel scoring_script runs per instance.\n", "- `mini_batch_size`\t- The number of files the code_configuration.scoring_script can process in one `run()` call.\n", "- `retry_settings`- Retry settings for scoring each mini batch.\t\t\n", " - `max_retries`- The maximum number of retries for a failed or timed-out mini batch (default is 3)\n", " - `timeout`- The timeout in seconds for scoring a mini batch (default is 30)\n", "- `output_action`- Indicates how the output should be organized in the output file. Allowed values are `append_row` or `summary_only`. Default is `append_row`\n", "- `output_file_name`- Name of the batch scoring output file. Default is `predictions.csv`\n", "- `environment_variables`- Dictionary of environment variable name-value pairs to set for each batch scoring job.\n", "- `logging_level`- The log verbosity level.\tAllowed values are `warning`, `info`, `debug`. Default is `info`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "deployment = ModelBatchDeployment(\n", " name=\"text-summarization-hfbart\",\n", " description=\"A text summarization deployment implemented with HuggingFace and BART architecture\",\n", " endpoint_name=endpoint.name,\n", " model=model,\n", " environment=environment,\n", " code_configuration=CodeConfiguration(\n", " code=\"code\",\n", " scoring_script=\"batch_driver.py\",\n", " ),\n", " compute=compute_name,\n", " settings=ModelBatchDeploymentSettings(\n", " instance_count=2,\n", " max_concurrency_per_instance=1,\n", " mini_batch_size=1,\n", " output_action=BatchDeploymentOutputAction.APPEND_ROW,\n", " output_file_name=\"predictions.csv\",\n", " retry_settings=BatchRetrySettings(max_retries=3, timeout=3000),\n", " logging_level=\"info\",\n", " ),\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 4.5 Create the deployment\n", "Using the `MLClient` created earlier, we will now create the deployment in the workspace. This command will start the deployment creation and return a confirmation response while the deployment creation continues." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "ml_client.batch_deployments.begin_create_or_update(deployment).result()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "Let's update the default deployment name in the endpoint:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "endpoint = ml_client.batch_endpoints.get(endpoint_name)\n", "endpoint.defaults.deployment_name = deployment.name\n", "ml_client.batch_endpoints.begin_create_or_update(endpoint).result()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "We can see the endpoint URL as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "print(f\"The default deployment is {endpoint.defaults.deployment_name}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### 4.6 Testing the deployment\n", "\n", "Once the deployment is created, it is ready to recieve jobs.\n", "\n", "#### 4.6.1 Creating a data asset\n", "\n", "Let's first register a data asset so we can run the job against it. This data asset is a folder containing a sample of BillSum, the first dataset for summarization of US Congressional and California state bills. The BillSum dataset consists of three parts: US training bills, US test bills and California test bills. 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "data_path = \"data\"\n", "dataset_name = \"billsummary-small\"\n", "\n", "billsummary_data = Data(\n", " path=data_path,\n", " type=AssetTypes.URI_FOLDER,\n", " description=\"A sample of the billsum dataset for text summarization, in CSV file format\",\n", " name=dataset_name,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "billsummary_data = ml_client.data.create_or_update(billsummary_data)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's wait for the data asset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from time import sleep\n", "\n", "print(f\"Waiting for data asset {dataset_name}\", end=\"\")\n", "while not any(filter(lambda m: m.name == dataset_name, ml_client.data.list())):\n", " sleep(10)\n", " print(\".\", end=\"\")\n", "\n", "print(\" [DONE]\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Let's get a reference to the new data asset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "billsummary_data = ml_client.data.get(name=dataset_name, label=\"latest\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "#### 4.6.2 Creating an input for the deployment" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "input = Input(type=AssetTypes.URI_FOLDER, path=billsummary_data.id)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "#### 4.6.3 Invoke the deployment\n", "\n", "Using the `MLClient` created earlier, we will get a handle to the endpoint.
The endpoint can be invoked using the `invoke` method with the following parameters:\n", "- `endpoint_name` - Name of the endpoint to invoke.\n", "- `input` - The input data; in our case, an `Input` pointing to the registered data asset." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "job = ml_client.batch_endpoints.invoke(endpoint_name=endpoint.name, input=input)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "#### 4.6.4 Get the details of the invoked job\n", "\n", "Let's get the details and logs of the invoked job:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "ml_client.jobs.get(job.name)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "We can wait for the job to finish using the following code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client.jobs.stream(job.name)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "### 4.7 Exploring the results\n", "\n", "#### 4.7.1 Download the results\n", "\n", "The deployment creates a child job that executes the scoring. We can get its details using the following code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scoring_job = list(ml_client.jobs.list(parent_job_name=job.name))[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"Job name:\", scoring_job.name)\n", "print(\"Job status:\", scoring_job.status)\n", "print(\n", " \"Job duration:\",\n", " scoring_job.creation_context.last_modified_at\n", " - scoring_job.creation_context.created_at,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The outputs generated by the deployment job will be placed in an output named `score`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "ml_client.jobs.download(name=scoring_job.name, download_path=\".\", output_name=\"score\")" ] }
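, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Since `append_row` was used, each line of `predictions.csv` corresponds to one item returned by `run()`. The following sketch prints the first few lines of the downloaded file; the local path is an assumption based on how `ml_client.jobs.download` lays out named outputs (`named-outputs/<output_name>/`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Peek at the first few lines of the downloaded predictions file.\n", "from pathlib import Path\n", "\n", "# Assumed location of the output, relative to download_path above.\n", "predictions_file = Path(\"named-outputs\") / \"score\" / \"predictions.csv\"\n", "with open(predictions_file, encoding=\"utf-8\") as f:\n", " for _ in range(3):\n", " print(f.readline().strip())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8 - AzureML", "language": "python", "name": "python38-azureml" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 0 }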