sdk/python/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items-batch-scoring/image-object-detection-batch-scoring-non-mlflow-model.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Batch Predictions for an Object Detection model trained using AutoML.\n", "\n", "**Requirements** - In order to benefit from this tutorial, you will need:\n", "- A basic understanding of Machine Learning\n", "- An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F)\n", "- An Azure ML workspace. [Check this notebook for creating a workspace](../../../resources/workspace/workspace.ipynb) \n", "- A Compute Cluster. [Check this notebook to create a compute cluster](../../../resources/compute/compute.ipynb)\n", "- A python environment\n", "- Installed Azure Machine Learning Python SDK v2 - [install instructions](../../../README.md) - check the getting started section\n", "\n", "**Learning Objectives** - By the end of this tutorial, you should be able to:\n", "- Connect to your AML workspace from the Python SDK\n", "- Create an `AutoML Object Detection Training Job` with the 'image_object_detection()' factory-function.\n", "- Train the model using AmlCompute by submitting/running the AutoML training job\n", "- Obtain the best model, register it and deploy it to a batch endpoint\n", "- Generate batch predictions using the batch endpoint\n", "\n", "**Please note**: For this notebook you can use an existing image object detection model trained using AutoML for Images or use the simple model training we included below for convenience. For detailed instructions on how to train an image object detection model with AutoML, please refer to the official [documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?tabs=CLI-v2) and to the image object detection [notebook](../automl-image-object-detection-task-fridge-items\\automl-image-object-detection-task-fridge-items.ipynb).\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Connect to Azure Machine Learning Workspace\n", "\n", "The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.\n", "\n", "## 1.1. Import the required libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import required libraries\n", "from azure.identity import DefaultAzureCredential\n", "from azure.ai.ml import MLClient\n", "\n", "from azure.ai.ml.constants import AssetTypes, InputOutputModes\n", "from azure.ai.ml import Input\n", "from azure.ai.ml.automl import SearchSpace\n", "from azure.ai.ml.sweep import (\n", " Choice,\n", " Uniform,\n", " BanditPolicy,\n", ")\n", "\n", "from azure.ai.ml import automl" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.2. Configure workspace details and get a handle to the workspace\n", "\n", "To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use the [default azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. 
Check the [configuration notebook](../../configuration.ipynb) for more details on how to configure credentials and connect to a workspace." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "validation-workspace" ] }, "outputs": [], "source": [ "credential = DefaultAzureCredential()\n", "ml_client = None\n", "try:\n", " ml_client = MLClient.from_config(credential)\n", "except Exception as ex:\n", " print(ex)\n", " # Enter details of your AML workspace\n", " subscription_id = \"<SUBSCRIPTION_ID>\"\n", " resource_group = \"<RESOURCE_GROUP>\"\n", " workspace = \"<AML_WORKSPACE_NAME>\"\n", " ml_client = MLClient(credential, subscription_id, resource_group, workspace)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. MLTable with input Training Data\n", "\n", "In order to generate models for computer vision tasks with automated machine learning, you need to bring labeled image data as input for model training in the form of an MLTable. You can create an MLTable from labeled training data in JSONL format. If your labeled training data is in a different format (like, pascal VOC or COCO), you can use a conversion script to first convert it to JSONL, and then create an MLTable. Alternatively, you can use Azure Machine Learning's [data labeling tool](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-image-labeling-projects) to manually label images, and export the labeled data to use for training your AutoML model.\n", "\n", "In this notebook, we use a toy dataset called Fridge Objects, which consists of 128 images of 4 labels of beverage container {`can`, `carton`, `milk bottle`, `water bottle`} photos taken on different backgrounds.\n", "\n", "All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1. Download the Data\n", "We first download and unzip the data locally. By default, the data would be downloaded in `./data` folder in current directory. \n", "If you prefer to download the data at a different location, update it in `dataset_parent_dir = ...` in the next cell." 
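] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have already downloaded and extracted the dataset in an earlier run, you can optionally skip the download. The next cell is a small sketch of such a check; it assumes the default location (`./data`) and dataset name (`odFridgeObjects`) used by the download cell further below." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Optional sanity check: skip the download below if the dataset is already extracted.\n", "# Assumes the defaults used in the download cell: ./data/odFridgeObjects\n", "existing_dataset_dir = os.path.join(\"./data\", \"odFridgeObjects\")\n", "if os.path.isdir(existing_dataset_dir):\n", "    print(f\"Dataset already present at {existing_dataset_dir}; you can skip the download cell.\")\n", "else:\n", "    print(\"Dataset not found locally; run the download cell below.\")"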
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import urllib\n", "from zipfile import ZipFile\n", "\n", "# Change to a different location if you prefer\n", "dataset_parent_dir = \"./data\"\n", "\n", "# create data folder if it doesnt exist.\n", "os.makedirs(dataset_parent_dir, exist_ok=True)\n", "\n", "# download data\n", "download_url = \"https://automlsamplenotebookdata-adcuc7f7bqhhh8a4.b02.azurefd.net/image-object-detection/odFridgeObjects.zip\"\n", "\n", "# Extract current dataset name from dataset url\n", "dataset_name = os.path.split(download_url)[-1].split(\".\")[0]\n", "# Get dataset path for later use\n", "dataset_dir = os.path.join(dataset_parent_dir, dataset_name)\n", "\n", "# Get the data zip file path\n", "data_file = os.path.join(dataset_parent_dir, f\"{dataset_name}.zip\")\n", "\n", "# Download the dataset\n", "urllib.request.urlretrieve(download_url, filename=data_file)\n", "\n", "# extract files\n", "with ZipFile(data_file, \"r\") as zip:\n", " print(\"extracting files...\")\n", " zip.extractall(path=dataset_parent_dir)\n", " print(\"done\")\n", "# delete zip file\n", "os.remove(data_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a sample image from this dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image\n", "\n", "sample_image = os.path.join(dataset_dir, \"images\", \"31.jpg\")\n", "Image(filename=sample_image)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2. Upload the images to Datastore through an AML Data asset (URI Folder)\n", "\n", "In order to use the data for training in Azure ML, we upload it to our default Azure Blob Storage of our Azure ML Workspace.\n", "\n", "[Check this notebook for AML data asset example](https://github.com/Azure/azureml-examples/blob/b674293eaec6c60cb26cb3e5672c100c41ab4769/sdk/assets/data/data.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Uploading image files by creating a 'data asset URI FOLDER':\n", "\n", "from azure.ai.ml.entities import Data\n", "from azure.ai.ml.constants import AssetTypes, InputOutputModes\n", "from azure.ai.ml import Input\n", "\n", "my_data = Data(\n", " path=dataset_dir,\n", " type=AssetTypes.URI_FOLDER,\n", " description=\"Fridge-items images Object detection\",\n", " name=\"fridge-items-images-object-detection\",\n", ")\n", "\n", "uri_folder_data_asset = ml_client.data.create_or_update(my_data)\n", "\n", "print(uri_folder_data_asset)\n", "print(\"\")\n", "print(\"Path to folder in Blob Storage:\")\n", "print(uri_folder_data_asset.path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.3. Convert the downloaded data to JSONL\n", "\n", "In this example, the fridge object dataset is annotated in Pascal VOC format, where each image corresponds to an xml file. Each xml file contains information on where its corresponding image file is located and also contains information about the bounding boxes and the object labels. \n", "\n", "For documentation on preparing the datasets beyond this notebook, please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n", "\n", "\n", "In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. 
The following script creates two `.jsonl` files (one for training and one for validation) in the corresponding MLTable folder. The train / validation split sends 20% of the data to the validation file. For further details on the JSONL file format used for the image object detection task in automated ML, please refer to the [data schema documentation for image object-detection task](https://learn.microsoft.com/en-us/azure/machine-learning/reference-automl-images-schema#object-detection)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First generate JSONL files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The JSONL conversion helpers require the pycocotools and simplification packages." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install pycocotools\n", "!pip install simplification==0.7.12\n", "!pip install scikit-image\n", "!pip install pandas==2.2.1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import sys\n", "\n", "sys.path.insert(0, \"../jsonl-conversion/\")\n", "from base_jsonl_converter import write_json_lines\n", "from voc_jsonl_converter import VOCJSONLConverter\n", "\n", "base_url = os.path.join(uri_folder_data_asset.path, \"images/\")\n", "converter = VOCJSONLConverter(base_url, os.path.join(dataset_dir, \"annotations\"))\n", "jsonl_annotations = os.path.join(dataset_dir, \"annotations_voc.jsonl\")\n", "write_json_lines(converter, jsonl_annotations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Then split into train and validation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# We'll copy each JSONL file within its related MLTable folder\n", "training_mltable_path = os.path.join(dataset_parent_dir, \"training-mltable-folder\")\n", "validation_mltable_path = os.path.join(dataset_parent_dir, \"validation-mltable-folder\")\n", "\n", "# First, let's create the folders if they don't exist\n", "os.makedirs(training_mltable_path, exist_ok=True)\n", "os.makedirs(validation_mltable_path, exist_ok=True)\n", "\n", "train_validation_ratio = 5\n", "\n", "# Path to the training and validation files\n", "train_annotations_file = os.path.join(training_mltable_path, \"train_annotations.jsonl\")\n", "validation_annotations_file = os.path.join(\n", "    validation_mltable_path, \"validation_annotations.jsonl\"\n", ")\n", "\n", "with open(jsonl_annotations, \"r\") as annot_f:\n", "    json_lines = annot_f.readlines()\n", "\n", "index = 0\n", "with open(train_annotations_file, \"w\") as train_f:\n", "    with open(validation_annotations_file, \"w\") as validation_f:\n", "        for json_line in json_lines:\n", "            if index % train_validation_ratio == 0:\n", "                # validation annotation\n", "                validation_f.write(json_line)\n", "            else:\n", "                # train annotation\n", "                train_f.write(json_line)\n", "            index += 1" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "validation-remove" ] }, "source": [ "## 2.4. Convert annotation file from COCO to JSONL\n", "If you want to try a dataset in COCO format, the script below shows how to convert it to `jsonl` format. The file \"odFridgeObjects_coco.json\" contains the annotation information for the `odFridgeObjects` dataset."
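] }, { "cell_type": "markdown", "metadata": { "tags": [ "validation-remove" ] }, "source": [ "If you are unsure whether your COCO file has the expected structure, the following optional cell peeks at its top-level keys and a sample annotation before conversion. It is a minimal sketch and assumes the `odFridgeObjects_coco.json` file shipped alongside this notebook; a COCO object-detection file typically contains `images`, `annotations` and `categories` sections." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "validation-remove" ] }, "outputs": [], "source": [ "import json\n", "\n", "# Optional: inspect the COCO annotation file before converting it to JSONL.\n", "# Assumes odFridgeObjects_coco.json is present next to this notebook.\n", "with open(\"./odFridgeObjects_coco.json\", \"r\") as coco_file:\n", "    coco_data = json.load(coco_file)\n", "\n", "print(\"Top-level keys:\", list(coco_data.keys()))\n", "print(\"Number of images:\", len(coco_data.get(\"images\", [])))\n", "annotations = coco_data.get(\"annotations\", [])\n", "print(\"Number of annotations:\", len(annotations))\n", "if annotations:\n", "    print(\"Sample annotation (bbox is [x, y, width, height]):\", annotations[0])"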
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "validation-remove" ] }, "outputs": [], "source": [ "import sys\n", "\n", "sys.path.insert(0, \"../jsonl-conversion/\")\n", "from base_jsonl_converter import write_json_lines\n", "from coco_jsonl_converter import COCOJSONLConverter\n", "\n", "base_url = os.path.join(uri_folder_data_asset.path, \"images/\")\n", "print(base_url)\n", "converter = COCOJSONLConverter(base_url, \"./odFridgeObjects_coco.json\")\n", "jsonl_annotations = os.path.join(dataset_dir, \"annotations_coco.jsonl\")\n", "write_json_lines(converter, jsonl_annotations)" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "validation-remove" ] }, "source": [ "### Visualize bounding boxes\n", "Please refer to the \"Visualize data\" section in the following [tutorial](https://docs.microsoft.com/en-us/azure/machine-learning/tutorial-auto-train-image-models#visualize-data) to see how to easily visualize your ground truth bounding boxes before starting to train." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.5. Create MLTable data input\n", "\n", "Create MLTable data input using the jsonl files created above.\n", "\n", "For documentation on creating your own MLTable assets for jobs beyond this notebook, please refer to below resources\n", "- [MLTable YAML Schema](https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-mltable) - covers how to write MLTable YAML, which is required for each MLTable asset.\n", "- [Create MLTable data asset](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK#create-a-mltable-data-asset) - covers how to create MLTable data asset. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_ml_table_file(filename):\n", " \"\"\"Create ML Table definition\"\"\"\n", "\n", " return (\n", " \"paths:\\n\"\n", " \" - file: ./{0}\\n\"\n", " \"transformations:\\n\"\n", " \" - read_json_lines:\\n\"\n", " \" encoding: utf8\\n\"\n", " \" invalid_lines: error\\n\"\n", " \" include_path_column: false\\n\"\n", " \" - convert_column_types:\\n\"\n", " \" - columns: image_url\\n\"\n", " \" column_type: stream_info\"\n", " ).format(filename)\n", "\n", "\n", "def save_ml_table_file(output_path, mltable_file_contents):\n", " with open(os.path.join(output_path, \"MLTable\"), \"w\") as f:\n", " f.write(mltable_file_contents)\n", "\n", "\n", "# Create and save train mltable\n", "train_mltable_file_contents = create_ml_table_file(\n", " os.path.basename(train_annotations_file)\n", ")\n", "save_ml_table_file(training_mltable_path, train_mltable_file_contents)\n", "\n", "# Save train and validation mltable\n", "validation_mltable_file_contents = create_ml_table_file(\n", " os.path.basename(validation_annotations_file)\n", ")\n", "save_ml_table_file(validation_mltable_path, validation_mltable_file_contents)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Training MLTable defined locally, with local data to be uploaded\n", "my_training_data_input = Input(type=AssetTypes.MLTABLE, path=training_mltable_path)\n", "\n", "# Validation MLTable defined locally, with local data to be uploaded\n", "my_validation_data_input = Input(type=AssetTypes.MLTABLE, path=validation_mltable_path)\n", "\n", "# WITH REMOTE PATH: If available already in the cloud/workspace-blob-store\n", "# my_training_data_input = Input(type=AssetTypes.MLTABLE, 
path=\"azureml://datastores/workspaceblobstore/paths/vision-classification/train\")\n", "# my_validation_data_input = Input(type=AssetTypes.MLTABLE, path=\"azureml://datastores/workspaceblobstore/paths/vision-classification/valid\")" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "validation-remove" ] }, "source": [ "To create data input from TabularDataset created using V1 sdk, specify the `type` as `AssetTypes.MLTABLE`, `mode` as `InputOutputModes.DIRECT` and `path` in the following format `azureml:<tabulardataset_name>:<version>`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "validation-remove" ] }, "outputs": [], "source": [ "\"\"\"\n", "# Training MLTable with v1 TabularDataset\n", "my_training_data_input = Input(\n", " type=AssetTypes.MLTABLE, path=\"azureml:odFridgeObjectsTrainingDataset:1\",\n", " mode=InputOutputModes.DIRECT\n", ")\n", "\n", "# Validation MLTable with v1 TabularDataset\n", "my_validation_data_input = Input(\n", " type=AssetTypes.MLTABLE, path=\"azureml:odFridgeObjectsValidationDataset:1\",\n", " mode=InputOutputModes.DIRECT\n", ")\n", "\"\"\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Compute target setup\n", "\n", "We will need to provide a [Compute Target](https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture#computes) that will be used for your AutoML model training. AutoML models for image tasks require [GPU SKUs](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-gpu) such as the ones from the NC, NCv2, NCv3, ND, NDv2 and NCasT4 series. We recommend using the NCsv3-series (with v100 GPUs) for faster training. Using a compute target with a multi-GPU VM SKU will leverage the multiple GPUs to speed up training. Additionally, setting up a compute target with multiple nodes will allow for faster model training by leveraging parallelism, when tuning hyperparameters for your model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "validation-compute" ] }, "outputs": [], "source": [ "from azure.ai.ml.entities import AmlCompute\n", "from azure.core.exceptions import ResourceNotFoundError\n", "\n", "compute_name = \"gpu-cluster-nc6s\"\n", "\n", "try:\n", " _ = ml_client.compute.get(compute_name)\n", " print(\"Found existing compute target.\")\n", "except ResourceNotFoundError:\n", " print(\"Creating a new compute target...\")\n", " compute_config = AmlCompute(\n", " name=compute_name,\n", " type=\"amlcompute\",\n", " size=\"Standard_NC6s_v3\",\n", " idle_time_before_scale_down=120,\n", " min_instances=0,\n", " max_instances=4,\n", " )\n", " ml_client.begin_create_or_update(compute_config).result()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4. Configure and run the AutoML for Images Object Detection training job\n", "\n", "AutoML allows you to easily train models for Image Classification, Object Detection & Instance Segmentation on your image data. You can control the model algorithm to be used, specify hyperparameter values for your model as well as perform a sweep across the hyperparameter space to generate an optimal model.\n", "\n", "When using AutoML for image tasks, you need to specify the model algorithms using the model_name parameter. You can either specify a single model or choose to sweep over multiple models. 
Please refer to the [documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?tabs=CLI-v2#configure-model-algorithms-and-hyperparameters) for the list of supported model algorithms.\n", "\n", "## 4.1. Using default hyperparameter values for the specified algorithm\n", "Before doing a large sweep to search for the optimal models and hyperparameters, we recommend trying the default values for a given model to get a first baseline. Next, you can explore multiple hyperparameters for the same model before sweeping over multiple models and their parameters. This allows an iterative approach, as with multiple models and multiple hyperparameters for each (as we showcase in the next section), the search space grows exponentially, and you need more iterations to find optimal configurations.\n", "\n", "Following functions are used to configure the AutoML image job:\n", "\n", "### image_object_detection() function parameters:\n", "The `image_object_detection()` factory function allows user to configure the training job.\n", "\n", "- `compute` - The compute on which the AutoML job will run. In this example we are using a compute called 'gpu-cluster' present in the workspace. You can replace it any other compute in the workspace.\n", "- `experiment_name` - The name of the experiment. An experiment is like a folder with multiple runs in Azure ML Workspace that should be related to the same logical machine learning experiment.\n", "- `name` - The name of the Job/Run. This is an optional property. If not specified, a random name will be generated.\n", "- `primary_metric` - The metric that AutoML will optimize for model selection.\n", "- `target_column_name` - The name of the column to target for predictions. It must always be specified. This parameter is applicable to 'training_data' and 'validation_data'.\n", "- `training_data` - The data to be used for training. It should contain both training feature columns and a target column. Optionally, this data can be split for segregating a validation or test dataset. \n", "You can use a registered MLTable in the workspace using the format '<mltable_name>:<version>' OR you can use a local file or folder as a MLTable. For e.g Input(mltable='my_mltable:1') OR Input(mltable=MLTable(local_path=\"./data\"))\n", "The parameter 'training_data' must always be provided.\n", "\n", "### set_limits() parameters:\n", "This is an optional configuration method to configure limits parameters such as timeouts. \n", " \n", "- `timeout_minutes` - Maximum amount of time in minutes that the whole AutoML job can take before the job terminates. If not specified, the default job's total timeout is 6 days (8,640 minutes).\n", "- `max_trials` - Parameter for maximum number of configurations to sweep. Must be an integer between 1 and 1000. When exploring just the default hyperparameters for a given model algorithm, set this parameter to 1. Default value is 1.\n", "- `max_concurrent_trials` - Maximum number of runs that can run concurrently. If not specified, all runs launch in parallel. If specified, must be an integer between 1 and 100. Default value is 1.\n", " NOTE: The number of concurrent runs is gated on the resources available in the specified compute target. 
Ensure that the compute target has the available resources for the desired concurrency.\n", "\n", "### set_training_parameters() function parameters:\n", "This is an optional configuration method to configure fixed settings or parameters that don't change during the parameter space sweep. Some of the key parameters of this function are:\n", "\n", "- `model_name` - The name of the ML algorithm that we want to use in the training job. Please refer to this [documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?tabs=CLI-v2#supported-model-algorithms) for the supported model algorithms.\n", "- `number_of_epochs` - The number of training epochs. It must be a positive integer (default value is 15).\n", "- `layers_to_freeze` - The number of layers to freeze in the model for transfer learning. It must be a positive integer (default value is 0).\n", "- `early_stopping` - Enables early stopping logic during training. It must be a boolean value (default is True).\n", "- `optimizer` - Type of optimizer to use in training. It must be one of sgd, adam, or adamw (default is sgd).\n", "- `distributed` - Enables distributed training if the compute target contains multiple GPUs. It must be a boolean value (default is True).\n", "\n", "If you wish to use the default hyperparameter values for a given algorithm (say `yolov5`), you can specify the job for your AutoML Image runs as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# set up experiment name\n", "exp_name = \"dpv2-image-object-detection-experiment\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "validation-scenario" ] }, "outputs": [], "source": [ "# Create the AutoML job with the related factory function.\n", "\n", "image_object_detection_job = automl.image_object_detection(\n", "    compute=compute_name,\n", "    experiment_name=exp_name,\n", "    training_data=my_training_data_input,\n", "    validation_data=my_validation_data_input,\n", "    target_column_name=\"label\",\n", ")\n", "\n", "# Set limits\n", "image_object_detection_job.set_limits(timeout_minutes=60)\n", "\n", "# Pass the fixed settings or parameters\n", "image_object_detection_job.set_training_parameters(model_name=\"yolov5\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Submitting an AutoML job for Computer Vision tasks\n", "Once you've configured your job, you can submit it to the workspace in order to train a vision model using your training dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Submit the AutoML job\n", "returned_job = ml_client.jobs.create_or_update(image_object_detection_job)\n", "\n", "print(f\"Created job: {returned_job}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client.jobs.stream(returned_job.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 5. Retrieve the Best Trial (Best Model's trial/run)\n", "Use the MLFlowClient to access the results (such as Models, Artifacts, Metrics) of a previously completed AutoML trial." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Initialize MLFlow Client\n", "\n", "The models and artifacts that are produced by AutoML can be accessed via the MLFlow interface.\n", "Initialize the MLFlow client here, and set the backend as Azure ML, via 
the MLFlow Client.\n", "\n", "IMPORTANT, you need to have installed the latest MLFlow packages with:\n", "\n", " pip install azureml-mlflow\n", "\n", " pip install mlflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.1 Obtain the tracking URI for MLFlow" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import mlflow\n", "\n", "# Obtain the tracking URL from MLClient\n", "MLFLOW_TRACKING_URI = ml_client.workspaces.get(\n", " name=ml_client.workspace_name\n", ").mlflow_tracking_uri\n", "\n", "print(MLFLOW_TRACKING_URI)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Set the MLFLOW TRACKING URI\n", "\n", "mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)\n", "\n", "print(f\"\\nCurrent tracking uri: {mlflow.get_tracking_uri()}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from mlflow.tracking.client import MlflowClient\n", "\n", "# Initialize MLFlow client\n", "mlflow_client = MlflowClient()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.2 Get the AutoML parent Job and find the best run" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "job_name = returned_job.name\n", "\n", "# Get the parent run\n", "mlflow_parent_run = mlflow_client.get_run(job_name)\n", "\n", "# Print parent run tags. 'automl_best_child_run_id' tag should be there.\n", "print(mlflow_parent_run.data.tags)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get the best model's child run\n", "best_child_run_id = mlflow_parent_run.data.tags[\"automl_best_child_run_id\"]\n", "print(\"Found best child run id: \", best_child_run_id)\n", "\n", "best_run = mlflow_client.get_run(best_child_run_id)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5.3 Download the best model locally\n", "\n", "Access the results (such as Models, Artifacts, Metrics) of a previously completed AutoML Run and download them locally. We will need these artifacts when deploying the model." 
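] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before downloading, you can optionally inspect the metrics logged for the best child run. The cell below is a small sketch; the exact metric names available (for example `mean_average_precision`) depend on the task and the primary metric." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: print the metrics logged for the best child run.\n", "# Metric names depend on the task; object detection runs typically log mean_average_precision.\n", "for metric_name, metric_value in best_run.data.metrics.items():\n", "    print(f\"{metric_name}: {metric_value}\")"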
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "# Create local folder\n", "local_dir = \"./artifact_downloads\"\n", "if not os.path.exists(local_dir):\n", " os.mkdir(local_dir)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download run's artifacts/outputs\n", "local_path = mlflow_client.download_artifacts(\n", " best_run.info.run_id, \"outputs\", local_dir\n", ")\n", "print(f\"Artifacts downloaded in: {local_path}\")\n", "print(f\"Artifacts: {os.listdir(local_path)}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import yaml\n", "\n", "\n", "def update_recordclass_version(file_path, old_version, new_version):\n", " with open(file_path, \"r\") as file:\n", " data = yaml.safe_load(file)\n", "\n", " if \"dependencies\" in data:\n", " dependencies = data[\"dependencies\"]\n", " for index, dependency in enumerate(dependencies):\n", " if (\n", " isinstance(dependency, str)\n", " and dependency == f\"recordclass={old_version}\"\n", " ):\n", " dependencies[index] = f\"recordclass={new_version}\"\n", "\n", " with open(file_path, \"w\") as file:\n", " yaml.dump(data, file)\n", "\n", "\n", "# Example usage\n", "file_name = \"conda_env_v_1_0_0.yml\"\n", "file_path = os.path.join(local_path, file_name)\n", "old_version = \"0.15.1\"\n", "new_version = \"0.21.1\"\n", "\n", "update_recordclass_version(file_path, old_version, new_version)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "mlflow_model_dir = os.path.join(local_dir, \"outputs\", \"mlflow-model\")\n", "\n", "# Show the contents of the MLFlow model folder\n", "os.listdir(mlflow_model_dir)\n", "\n", "# You should see a list of files such as the following:\n", "# ['artifacts', 'conda.yaml', 'MLmodel', 'python_env.yaml', 'python_model.pkl', 'requirements.txt']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 6. Deploy the non-MLFlow model with batch endpoints and run batch scoring\n", "\n", "We will now deploy the non-MLFlow model to batch endpoint. Batch endpoint simply means, a REST endpoint which is capable of handing inputs in batch format.\n", "\n", "To create a batch deployment, you need all the following items:\n", "\n", "- **Model files**, or a registered model in your workspace referenced using `azureml:<model-name>:<model-version>`. In this notebook we will take the model file from the best_child_run.\n", "- **The code to score the model**. \n", "- **The environment** in which the model runs. It can be a Docker image with Conda dependencies, or an environment already registered in your workspace referenced using `azureml:<environment-name>:<environment-version>`. In this notebook we will create a environment object using the environment definition downloaded from the best_child_run artifacts.\n", "- **The pre-created compute** referenced using `azureml:<compute-name>`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.1 Regsiter The Model in the workspace\n", "For deploying the model, we first need to register it with the existing workspace so that we can discover it during runtime." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.entities import Model\n", "\n", "tags = dict()\n", "tags[\"mean_average_precision\"] = best_run.data.metrics[\"mean_average_precision\"]\n", "\n", "model_name = \"od-fridge-items-model\"\n", "model = Model(\n", " path=f\"azureml://jobs/{best_run.info.run_id}/outputs/artifacts/outputs/model.pt\",\n", " name=model_name,\n", " description=\"fridge object Detection model\",\n", " # tags=tags,\n", ")\n", "\n", "# for downloaded file\n", "# model = Model(path=\"artifact_downloads/outputs/model.pt\", name=model_name)\n", "\n", "registered_model = ml_client.models.create_or_update(model)\n", "registered_model.id" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# recordclass=0.15.1 is not available in conda\n", "\n", "conda_file_name = \"artifact_downloads/outputs/conda_env_v_1_0_0.yml\"\n", "\n", "with open(conda_file_name, \"r\") as f:\n", " conda_file_lines = f.readlines()\n", "\n", "with open(conda_file_name, \"w\") as f:\n", " for line in conda_file_lines:\n", " if not \"recordclass\" in line:\n", " f.write(line)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.2 Configure environment\n", "It is recommended to use the same envionment for model deployment as the model training, therefore we are creating the Environment object using the conda environment file downloaded as artifacts. We also need to specify the base image, which in case of vision tasks, is `mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.6-cudnn8-ubuntu20.04`\n", "\n", "To read more about environments, please follow this [notebook](./../../../assets/environment/environment.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.entities import Environment\n", "\n", "env = Environment(\n", " name=\"automl-images-env\",\n", " description=\"environment for automl images inference\",\n", " image=\"mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.6-cudnn8-ubuntu20.04\",\n", " conda_file=\"artifact_downloads/outputs/conda_env_v_1_0_0.yml\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.3 Get A Scoring Script\n", "\n", "To do the scoring, you need to create a batch scoring script batch_scoring.py, and write it to the scripts folder in current directory. The script takes a minibatch of input images, applies the object detection model, and outputs the predictions to a results file.\n", "\n", "While creating the batch scoring script, refer to the scoring scripts generated under the outputs folder of the Automl training runs. This will help to identify the right model settings to be used in the batch scoring script init method while loading the model. Note: The batch scoring script we generate in the subsequent step is different from the scoring script generated by the training runs in the below screenshot. We refer to it just to identify the right model settings to be used in the batch scoring script.\n", "\n", "![scoring-script.png](./ui_outputs_updated.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Understanding the scoring script\n", "\n", "The scoring_script must contain two functions:\n", "\n", "- init(): Use this function to load the model into a global object. 
This function will be called once at the beginning of the process.\n", "- run(mini_batch): This function will be called for each mini_batch and do the actual scoring.\n", " + `mini_batch`: The mini_batch value is a list of file paths.\n", " + `result`: The run() method should return a pandas DataFrame or an array. Each returned output element indicates one successful run of an input element in the input mini_batch.\n", " \n", "**Note** The scoring script used in this notebook is shown below" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# View the batch scoring script. Use the model settings as appropriate for your model.\n", "with open(\"./scripts/batch_scoring.py\", \"r\") as f:\n", " print(f.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.4 Deploy the model to batch endpoint\n", "**Now, let's deploy the model with batch endpoints and run batch scoring.** \n", "\n", "It has three steps.\n", "- Create a batch endpoint\n", "- configure the endpoint\n", "- deploy the endpoint using MLClient.\n", "\n", "### 6.4.1 Create A Batch Endpoint\n", "\n", "**Please Note** that The name of the endpoint must be unique in the Azure region. For more information on the naming rules, see [managed endpoint limits](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-quotas#azure-machine-learning-managed-online-endpoints). \n", "\n", "Optionally, you can add description, tags to your endpoint." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.entities import (\n", " BatchEndpoint,\n", " BatchDeployment,\n", " BatchRetrySettings,\n", ")\n", "from azure.ai.ml.constants import BatchDeploymentOutputAction\n", "\n", "import datetime\n", "\n", "batch_endpoint_name = \"batch-endpoint-\" + datetime.datetime.now().strftime(\"%Y%m%d%H%M\")\n", "\n", "# create a batch endpoint\n", "endpoint = BatchEndpoint(\n", " name=batch_endpoint_name,\n", " description=\"Sample batch endpoint for detecting fridge objects\",\n", " tags={\"dataset\": \"od-fridge-objects\"},\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the MLClient created earlier, we'll now create the Endpoint in the workspace. This command will start the endpoint creation and return a confirmation response while the endpoint creation continues." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client.begin_create_or_update(endpoint).result()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6.4.2 Create batch compute\n", "\n", "Batch endpoint runs only on cloud computing resources. The cloud computing resource is a reusable virtual computer cluster. \n", "The following code creates an Azure Machine Learning compute cluster." 
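] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before creating the inference compute, you can optionally confirm that the endpoint from the previous step finished provisioning. This is a small sketch; it assumes the `BatchEndpoint` entity returned by `ml_client.batch_endpoints.get` exposes `provisioning_state` and `scoring_uri` attributes, as in recent SDK v2 versions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional: fetch the endpoint back from the workspace to confirm it was created.\n", "created_endpoint = ml_client.batch_endpoints.get(name=batch_endpoint_name)\n", "print(f\"Endpoint name: {created_endpoint.name}\")\n", "print(f\"Provisioning state: {created_endpoint.provisioning_state}\")\n", "print(f\"Scoring URI: {created_endpoint.scoring_uri}\")"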
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.entities import AmlCompute\n", "from azure.core.exceptions import ResourceNotFoundError\n", "\n", "compute_name = \"cpu-inference-cluster\"\n", "\n", "try:\n", " _ = ml_client.compute.get(compute_name)\n", " print(\"Found existing compute target.\")\n", "except ResourceNotFoundError:\n", " print(\"Creating a new compute target...\")\n", " compute_config = AmlCompute(\n", " name=compute_name,\n", " type=\"amlcompute\",\n", " size=\"Standard_DS12_v2\",\n", " idle_time_before_scale_down=120,\n", " min_instances=0,\n", " max_instances=5,\n", " )\n", " ml_client.begin_create_or_update(compute_config).result()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6.4.3 Configure the deployment \n", "\n", "A deployment is a set of resources required for hosting the model that does the actual inferencing. We'll create a deployment for our endpoint using the BatchDeployment class. This class allows user to configure the following key aspects.\n", "\n", "- **name**: Name of the deployment.\n", "- **endpoint_name**: Name of the endpoint to create the deployment under.\n", "- **model**: The model to use for the deployment. This value can be either a reference to an existing versioned model in the workspace or an inline model specification.\n", "- **environment**: The environment to use for the deployment. This value can be either a reference to an existing versioned environment in the workspace or an inline environment specification. It should of type `azure.ai.ml.entities.environment` \n", "- **code_path**: Path to the source code directory for scoring the model\n", "- **scoring_script**: Relative path to the scoring file in the source code directory\n", "- **compute**: Name of the compute target to execute the batch scoring jobs on\n", "- **instance_count**: The number of nodes to use for each batch scoring job.\n", "- **max_concurrency_per_instance**: The maximum number of parallel scoring_script runs per instance.\n", "- **mini_batch_size**: The number of files the code_configuration.scoring_script can process in one run() call.\n", "- **retry_settings**: Retry settings for scoring each mini batch.\n", "- **max_retries**: The maximum number of retries for a failed or timed-out mini batch (default is 3)\n", "- **timeout**: The timeout in seconds for scoring a mini batch (default is 30)\n", "- **output_action**: Indicates how the output should be organized in the output file. Allowed values are append_row or summary_only. Default is append_row\n", "- **output_file_name**: Name of the batch scoring output file. Default is predictions.csv\n", "- **environment_variables**: Dictionary of environment variable name-value pairs to set for each batch scoring job.\n", "- **logging_level**: The log verbosity level. Allowed values are warning, info, debug. 
Default is info.\n", "\n", "\n", "For more details about the various arguments of the BatchDeployment method, please refer to the documentation [here](https://docs.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.batchdeployment?view=azure-python-preview)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a batch deployment\n", "deployment_name = \"object-detection-model\"\n", "deployment = BatchDeployment(\n", " name=deployment_name,\n", " description=\"sample batch deployment for fridge objects detection model\",\n", " endpoint_name=batch_endpoint_name,\n", " model=registered_model.id,\n", " code_path=\"./scripts\",\n", " scoring_script=\"batch_scoring.py\",\n", " environment=env,\n", " compute=compute_name,\n", " instance_count=2,\n", " max_concurrency_per_instance=2,\n", " mini_batch_size=10,\n", " environment_variables={\"batch_size\": 16},\n", " # Specify inference batch_size, otherwise uses default value. (This is different from the mini_batch_size above)\n", " # NOTE: Large batch sizes may result in OOM errors.\n", " output_action=BatchDeploymentOutputAction.APPEND_ROW,\n", " output_file_name=\"predictions.csv\",\n", " retry_settings=BatchRetrySettings(max_retries=3, timeout=30),\n", " logging_level=\"info\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 6.4.4 Create the deployment\n", "\n", "Using the `MLClient` created earlier, we'll now create the deployment in the workspace. This command will start the deployment creation and return a confirmation response while the deployment creation continues." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client.begin_create_or_update(deployment).result()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.5 Test the endpoint with sample data\n", "\n", "Using the MLClient created earlier, we'll get a handle to the endpoint. The endpoint can be invoked using the invoke command with the following parameters:\n", "\n", "- **endpoint_name** - Name of the endpoint\n", "- **input** - Dataset object holding the test dataset\n", "- **deployment_name** - Name of the specific deployment to test in an endpoint\n", "- **params_override** - Optionally, this parameter could be used to override the deployment configurations. 
\n", "\n", "For more details, please refer the documentation [here](https://docs.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.operations.batchendpointoperations?view=azure-python-preview#azure-ai-ml-operations-batchendpointoperations-invoke)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "registered_data_asset = ml_client.data.get(\n", " name=\"fridge-items-images-object-detection\", label=\"latest\"\n", ")\n", "\n", "test_data = Input(type=AssetTypes.URI_FILE, path=registered_data_asset.id)\n", "\n", "# invoke the endpoint for batch scoring job\n", "job = ml_client.batch_endpoints.invoke(\n", " endpoint_name=batch_endpoint_name,\n", " input=test_data,\n", " deployment_name=deployment_name, # name is required as default deployment is not set\n", " params_override=[{\"mini_batch_size\": \"10\"}, {\"compute.instance_count\": \"1\"}],\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# get the details of the job\n", "job_name = job.name\n", "batch_job = ml_client.jobs.get(name=job_name)\n", "print(batch_job)\n", "# stream the job logs\n", "ml_client.jobs.stream(name=job_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.6 Download and review output" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scoring_job = list(ml_client.jobs.list(parent_job_name=job.name))[0]\n", "\n", "ml_client.jobs.download(\n", " name=scoring_job.name,\n", " download_path=os.path.join(dataset_parent_dir),\n", " output_name=\"score\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Print the first five lines of the output\n", "result_file = os.path.join(\n", " dataset_parent_dir, \"named-outputs\", \"score\", deployment.output_file_name\n", ")\n", "with open(result_file) as f:\n", " for x in range(5):\n", " print(next(f))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6.7 Choose a random file for visualization" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "import json\n", "\n", "with open(result_file, \"r\") as f:\n", " contents = f.readlines()\n", "rand_file = contents[random.randrange(len(contents))]\n", "prediction = json.loads(rand_file)\n", "print(prediction[\"filename\"])\n", "print(prediction[\"boxes\"])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download the image file from the datastore\n", "file_path_list_on_vm = prediction[\"filename\"].split(\"/\")\n", "local_file_path = (\n", " f\"./data/odFridgeObjects/{file_path_list_on_vm[-2]}/{file_path_list_on_vm[-1]}\"\n", ")\n", "local_file_path" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg\n", "import matplotlib.patches as patches\n", "from PIL import Image\n", "import numpy as np\n", "import json\n", "\n", "IMAGE_SIZE = (18, 12)\n", "plt.figure(figsize=IMAGE_SIZE)\n", "img_np = mpimg.imread(local_file_path)\n", "img = Image.fromarray(img_np.astype(\"uint8\"), \"RGB\")\n", "x, y = img.size\n", "\n", "fig, ax = plt.subplots(1, figsize=(15, 15))\n", "# Display the image\n", "ax.imshow(img_np)\n", "\n", "# draw box and label for each detection\n", "detections = prediction\n", "for detect in detections[\"boxes\"]:\n", " label = 
detect[\"label\"]\n", " box = detect[\"box\"]\n", " conf_score = detect[\"score\"]\n", " if conf_score > 0.6:\n", " ymin, xmin, ymax, xmax = (\n", " box[\"topY\"],\n", " box[\"topX\"],\n", " box[\"bottomY\"],\n", " box[\"bottomX\"],\n", " )\n", " topleft_x, topleft_y = x * xmin, y * ymin\n", " width, height = x * (xmax - xmin), y * (ymax - ymin)\n", " print(\n", " f\"{detect['label']}: [{round(topleft_x, 3)}, {round(topleft_y, 3)}, \"\n", " f\"{round(width, 3)}, {round(height, 3)}], {round(conf_score, 3)}\"\n", " )\n", "\n", " color = np.random.rand(3) #'red'\n", " rect = patches.Rectangle(\n", " (topleft_x, topleft_y),\n", " width,\n", " height,\n", " linewidth=3,\n", " edgecolor=color,\n", " facecolor=\"none\",\n", " )\n", "\n", " ax.add_patch(rect)\n", " plt.text(topleft_x, topleft_y - 10, label, color=color, fontsize=20)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 7. Delete the endpoint\n", "Delete the endpoint to release the resources" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ml_client.batch_endpoints.begin_delete(name=batch_endpoint_name)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.10 - SDK V2", "language": "python", "name": "python310-sdkv2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" } }, "nbformat": 4, "nbformat_minor": 4 }