sdk/python/responsible-ai/vision/responsibleaidashboard-image-flower-classification.ipynb

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "98605bcd", "metadata": {}, "source": [ "# Image Classification scenario with RAI Dashboard\n", "\n", "The Flower dataset classifies images into 5 distinct flower categories, each commonly found in the United Kingdom.. This example notebook demonstrates how to use a fine-tuned fastai computer vision model on the dataset to evaluate the model in AzureML.\n", "\n", "First, we need to specify the version of the RAI components which are available in the workspace. This was specified when the components were uploaded." ] }, { "cell_type": "code", "execution_count": null, "id": "53b4eeac", "metadata": { "gather": { "logged": 1700481891472 } }, "outputs": [], "source": [ "%pip install azure-ai-ml\n", "%pip install fastai" ] }, { "cell_type": "code", "execution_count": null, "id": "f035b8eb", "metadata": {}, "outputs": [], "source": [ "version_string = \"0.0.21\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "06008690", "metadata": {}, "source": [ "We also need to give the name of the compute cluster we want to use in AzureML. Later in this notebook, we will create it if it does not already exist:" ] }, { "cell_type": "code", "execution_count": null, "id": "f1ad79f9", "metadata": { "gather": { "logged": 1700481893606 } }, "outputs": [], "source": [ "compute_name = \"ResponsibleAI\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9fc65dc7", "metadata": {}, "source": [ "Finally, we need to specify a version for the data and components we will create while running this notebook. This should be unique for the workspace, but the specific value doesn't matter:" ] }, { "cell_type": "code", "execution_count": null, "id": "78053935", "metadata": { "gather": { "logged": 1700481895693 } }, "outputs": [], "source": [ "rai_example_version_string = \"14\"" ] }, { "cell_type": "markdown", "id": "0ba978b3-3779-4561-a3cd-0704db497a91", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "# 1. Connect to Azure Machine Learning Workspace\n", "\n", "The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run." ] }, { "cell_type": "markdown", "id": "8123a6d4-bb8c-4f4d-ba7f-b04cc27b675a", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "First, we need to upload the datasets to our workspace. 
We start by creating an `MLClient` for interactions with AzureML:" ] }, { "cell_type": "code", "execution_count": null, "id": "3c18eaeb-82d9-412f-b86e-305e509fceb1", "metadata": { "gather": { "logged": 1700481898493 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "# Enter details of your AML workspace\n", "subscription_id = \"<SUBSCRIPTION_ID>\"\n", "resource_group = \"<RESOURCE_GROUP>\"\n", "workspace = \"<AML_WORKSPACE_NAME>\"" ] }, { "cell_type": "code", "execution_count": null, "id": "2bd9adab-57a2-4ac5-90a0-e63abde900bb", "metadata": { "gather": { "logged": 1700481901485 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "# Handle to the workspace\n", "from azure.ai.ml import MLClient\n", "from azure.identity import DefaultAzureCredential\n", "\n", "credential = DefaultAzureCredential()\n", "ml_client = MLClient(\n", " credential=credential,\n", " subscription_id=subscription_id,\n", " resource_group_name=resource_group,\n", " workspace_name=workspace,\n", ")\n", "\n", "print(ml_client)" ] }, { "cell_type": "markdown", "id": "68d97c28-e7e6-401f-90cb-976ad7b67c13", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "We can now upload the data to AzureML:" ] }, { "attachments": {}, "cell_type": "markdown", "id": "73be2b63", "metadata": {}, "source": [ "# 2. Accessing the Data\n", "\n", "We supply the data as a zip archive of images. We can download it, preprocess it, and take a brief look:" ] }, { "cell_type": "code", "execution_count": null, "id": "5f875f18", "metadata": { "gather": { "logged": 1700481905894 } }, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import json\n", "\n", "from urllib.request import urlretrieve\n", "from zipfile import ZipFile" ] }, { "cell_type": "markdown", "id": "83f65ca9-d112-4e47-8420-f7cc2e7becf6", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## 2.1 Download Data" ] }, { "cell_type": "code", "execution_count": null, "id": "1742abf4-d2f9-490a-9eaa-173fe91245e0", "metadata": { "gather": { "logged": 1700481925085 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "import os\n", "import urllib\n", "from zipfile import ZipFile\n", "\n", "# Change to a different location if you prefer\n", "dataset_parent_dir = \"./data\"\n", "\n", "# create data folder if it doesn't exist.\n", "os.makedirs(dataset_parent_dir, exist_ok=True)\n", "\n", "# download data\n", "download_url = \"https://publictestdatasets.blob.core.windows.net/data/image_test_classification.zip\"\n", "\n", "# Extract current dataset name from dataset url\n", "dataset_name = os.path.split(download_url)[-1].split(\".\")[0]\n", "# Get dataset path for later use\n", "dataset_dir = os.path.join(dataset_parent_dir, dataset_name)\n", "os.makedirs(dataset_dir, exist_ok=True)\n", "\n", "# Get the data zip file path\n", "data_file = os.path.join(dataset_parent_dir, f\"{dataset_name}.zip\")\n", "\n", "# Download the dataset\n", "urllib.request.urlretrieve(download_url, filename=data_file)\n", "\n", "# extract files\n", "with ZipFile(data_file, \"r\") as zip_file:\n", " print(\"extracting files...\")\n", " 
zip_file.extractall(path=dataset_dir)\n", " print(\"done\")\n", "\n", "\n", "# delete zip file\n", "os.remove(data_file)" ] }, { "cell_type": "markdown", "id": "579ed90b-58c0-4040-99aa-8faeff1e4cbe", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## 2.2. Upload the images to Datastore through an AML Data asset (URI Folder)\n", "\n", "In order to use the data in Azure ML, we upload it to the default Azure Blob Storage of our Azure ML workspace.\n", "\n", "See the URI folder data asset example for further details: https://github.com/Azure/azureml-examples/blob/samuel100/data-samples/sdk/assets/data/data.ipynb" ] }, { "cell_type": "code", "execution_count": null, "id": "f130f65c-d11b-4db4-b9a9-ce4f89828892", "metadata": { "gather": { "logged": 1700481970331 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "# Uploading image files by creating a 'data asset URI FOLDER':\n", "\n", "from azure.ai.ml.entities import Data\n", "from azure.ai.ml.constants import AssetTypes, InputOutputModes\n", "from azure.ai.ml import Input\n", "\n", "input_test_data_folder = \"image-classification-flowers-new\"\n", "\n", "try:\n", " uri_folder_data_asset = ml_client.data.get(\n", " name=input_test_data_folder,\n", " version=rai_example_version_string,\n", " )\n", "except Exception:\n", " my_data = Data(\n", " path=dataset_dir,\n", " type=AssetTypes.URI_FOLDER,\n", " description=\"Images of flowers for image classification\",\n", " name=input_test_data_folder,\n", " version=rai_example_version_string,\n", " )\n", "\n", " uri_folder_data_asset = ml_client.data.create_or_update(my_data)\n", "\n", "print(uri_folder_data_asset)\n", "print(\"\")\n", "print(\"Path to folder in Blob Storage:\")\n", "print(uri_folder_data_asset.path)" ] }, { "cell_type": "markdown", "id": "4a3c7dd0-0a2f-4154-9927-dc0dc6c7807b", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## 2.3. Convert the downloaded data to JSONL\n", "\n", "In this example, the flower dataset is stored in a directory. There are five different folders inside:\n", "- /daisy\n", "- /dandelion\n", "- /rose\n", "- /sunflower\n", "- /tulip\n", "\n", "This is the most common data format for multiclass image classification. Each folder title corresponds to the image label for the images contained inside. In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. Please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).\n", "\n", "\n", "The following script creates a .jsonl file in the corresponding MLTable folder. 
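\n", "\n", "Each generated line pairs an image URL in the datastore with its label. For illustration, a line will look roughly like this (the exact `image_url` prefix comes from the data asset path printed above, and the file name here is hypothetical):\n", "\n", "```json\n", "{\"image_url\": \"azureml://.../paths/image_test_classification/daisy/image_0001.jpg\", \"label\": \"daisy\"}\n", "```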
" ] }, { "cell_type": "code", "execution_count": null, "id": "fb3520cc-bc32-49d0-9200-f1f446b7313e", "metadata": { "gather": { "logged": 1700481975389 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "import json\n", "import os\n", "\n", "# We'll copy each JSONL file within its related MLTable folder\n", "validation_mltable_path = os.path.join(dataset_parent_dir, \"validation-mltable-folder\")\n", "\n", "# First, let's create the folder if it doesn't exist\n", "os.makedirs(validation_mltable_path, exist_ok=True)\n", "\n", "# Path to the combined JSONL file\n", "validation_annotations_file = os.path.join(\n", " validation_mltable_path, \"validation_annotations.jsonl\"\n", ")\n", "\n", "# Baseline of json line dictionary\n", "json_line_sample = {\n", " \"image_url\": uri_folder_data_asset.path,\n", " \"label\": \"\",\n", "}\n", "\n", "# Scan each subdirectory and generate a jsonl line per image, combined in a single JSONL file\n", "with open(validation_annotations_file, \"w\") as combined_f:\n", " for class_name in os.listdir(dataset_dir):\n", " sub_dir = os.path.join(dataset_dir, class_name)\n", " if not os.path.isdir(sub_dir):\n", " continue\n", "\n", " # Scan each subdirectory\n", " print(f\"Parsing {sub_dir}\")\n", " for image in os.listdir(sub_dir):\n", " if image.endswith(\".jpg\"):\n", " json_line = dict(json_line_sample)\n", " json_line[\"image_url\"] += f\"{class_name}/{image}\"\n", " json_line[\"label\"] = class_name\n", "\n", " # Write each json line to the combined JSONL file\n", " combined_f.write(json.dumps(json_line) + \"\\n\")" ] }, { "cell_type": "markdown", "id": "efacc176-87a0-433c-9d45-e8384759926e", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## 2.4. Create MLTable data input for training an AutomatedML Model\n", "\n", "Create MLTable data input using the jsonl files created above.\n", "\n", "For documentation on creating your own MLTable assets for jobs beyond this notebook, please refer to below resources\n", "- [MLTable YAML Schema](https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-mltable) - covers how to write MLTable YAML, which is required for each MLTable asset.\n", "- [Create MLTable data asset](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK#create-a-mltable-data-asset) - covers how to create mltable data asset. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "ec5e326f-003a-44db-becc-2f3a2d68947a", "metadata": { "gather": { "logged": 1700481980484 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "def create_ml_table_file(filename):\n", " return (\n", " \"$schema: https://azureml/sdk-2-0/MLTable.json\\n\"\n", " \"type: mltable\\n\"\n", " \"paths:\\n\"\n", " \" - file: ./{0}\\n\"\n", " \"transformations:\\n\"\n", " \" - read_json_lines:\\n\"\n", " \" encoding: utf8\\n\"\n", " \" invalid_lines: error\\n\"\n", " \" include_path_column: false\\n\"\n", " \" - convert_column_types:\\n\"\n", " \" - columns: image_url\\n\"\n", " \" column_type: stream_info\"\n", " ).format(filename)\n", "\n", "\n", "def save_ml_table_file(output_path, mltable_file_contents):\n", " with open(os.path.join(output_path, \"MLTable\"), \"w\") as f:\n", " f.write(mltable_file_contents)\n", "\n", "\n", "# Create and save validation mltable\n", "validation_mltable_file_contents = create_ml_table_file(\n", " os.path.basename(validation_annotations_file)\n", ")\n", "save_ml_table_file(validation_mltable_path, validation_mltable_file_contents)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a2c4ebb4", "metadata": {}, "source": [ "Load some data for a quick view:" ] }, { "cell_type": "code", "execution_count": null, "id": "c32fdb88-3a17-468b-aaea-df71dad5b251", "metadata": { "gather": { "logged": 1700482011206 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "import mltable\n", "\n", "tbl = mltable.load(validation_mltable_path)\n", "test_df: pd.DataFrame = tbl.to_pandas_dataframe()\n", "\n", "display(test_df)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1115ac59", "metadata": {}, "source": [ "The label columns contain the classes:" ] }, { "cell_type": "code", "execution_count": null, "id": "5b42df3d", "metadata": { "gather": { "logged": 1700482044985 } }, "outputs": [], "source": [ "target_column_name = \"label\"" ] }, { "cell_type": "code", "execution_count": null, "id": "62eb02a2", "metadata": { "gather": { "logged": 1700482188622 } }, "outputs": [], "source": [ "from azure.ai.ml.entities import Data\n", "from azure.ai.ml.constants import AssetTypes\n", "\n", "input_test_data = \"Flower_classification_new_MLTable_1\"\n", "\n", "try:\n", " test_data = ml_client.data.get(\n", " name=input_test_data,\n", " version=rai_example_version_string,\n", " )\n", "except Exception:\n", " test_data = Data(\n", " path=validation_mltable_path,\n", " type=AssetTypes.MLTABLE,\n", " description=\"RAI Flower classification test data\",\n", " name=input_test_data,\n", " version=rai_example_version_string,\n", " )\n", " ml_client.data.create_or_update(test_data)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6815ba75", "metadata": {}, "source": [ "# 3. Creating the Model\n", "\n", "A restnet50 model trained on approximately 3000 images from the 5 classes is used in this scenario.\n", "\n", "To simplify the model creation process, we're going to use a pipeline. 
We create a directory for the training script:" ] }, { "cell_type": "code", "execution_count": null, "id": "e78d869b", "metadata": { "gather": { "logged": 1700482191940 } }, "outputs": [], "source": [ "import os\n", "\n", "os.makedirs(\"component_src\", exist_ok=True)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ea86e55d", "metadata": {}, "source": [ "Next, we write out our script to retrieve the trained model:" ] }, { "cell_type": "code", "execution_count": null, "id": "a523f144", "metadata": {}, "outputs": [], "source": [ "%%writefile component_src/training_script.py\n", "\n", "import argparse\n", "import logging\n", "import json\n", "import os\n", "import time\n", "from zipfile import ZipFile\n", "\n", "import mlflow\n", "import mlflow.pyfunc\n", "\n", "from azureml.core import Run\n", "\n", "from fastai.learner import load_learner\n", "\n", "from raiutils.common.retries import retry_function\n", "\n", "from urllib.request import urlretrieve\n", "\n", "_logger = logging.getLogger(__file__)\n", "logging.basicConfig(level=logging.INFO)\n", "\n", "MODEL_NAME = \"flower_classification_model.pkl\"\n", "\n", "def parse_args():\n", " # setup arg parser\n", " parser = argparse.ArgumentParser()\n", "\n", " # add arguments\n", " parser.add_argument(\n", " \"--model_output_path\", type=str, help=\"Path to write model info JSON\"\n", " )\n", " parser.add_argument(\n", " \"--model_base_name\", type=str, help=\"Name of the registered model\"\n", " )\n", " parser.add_argument(\n", " \"--model_name_suffix\", type=int, help=\"Set negative to use epoch_secs\"\n", " )\n", " parser.add_argument(\n", " \"--device\", type=int, help=(\n", " \"Device for CPU/GPU support. Setting this to -1 will leverage \"\n", " \"CPU, >=0 will run the model on the associated CUDA device id.\")\n", " )\n", "\n", " # parse args\n", " args = parser.parse_args()\n", "\n", " # return args\n", " return args\n", "\n", "\n", "class FetchModel(object):\n", " def __init__(self):\n", " pass\n", "\n", " def fetch(self):\n", " # Blobstore link to the trained model\n", " url = (\"https://publictestdatasets.blob.core.windows.net/models/\"+\n", " \"flower_classification_model.zip\")\n", " urlretrieve(url, filename=\"flower_classification_model.zip\")\n", " data_file = os.path.join(\"./\", \"flower_classification_model.zip\")\n", "\n", " with ZipFile(data_file, \"r\") as zip_file:\n", " zip_file.extractall(path=\"./\")\n", " \n", " os.remove(data_file)\n", "\n", "\n", "def main(args):\n", " current_experiment = Run.get_context().experiment\n", " tracking_uri = current_experiment.workspace.get_mlflow_tracking_uri()\n", " _logger.info(\"tracking_uri: {0}\".format(tracking_uri))\n", " mlflow.set_tracking_uri(tracking_uri)\n", " mlflow.set_experiment(current_experiment.name)\n", "\n", " _logger.info(\"Getting device\")\n", " device = args.device\n", "\n", " _logger.info(\"Downloading and loading the fastai model\")\n", "\n", " # Load the fastai model\n", " fetcher = FetchModel()\n", " action_name = \"Model download\"\n", " err_msg = \"Failed to download model\"\n", " max_retries = 4\n", " retry_delay = 60\n", " retry_function(fetcher.fetch, action_name, err_msg,\n", " max_retries=max_retries,\n", " retry_delay=retry_delay)\n", " model = load_learner(MODEL_NAME)\n", "\n", " if device >= 0:\n", " model = model.cuda()\n", "\n", " if args.model_name_suffix < 0:\n", " suffix = int(time.time())\n", " else:\n", " suffix = args.model_name_suffix\n", " registered_name = 
\"{0}_{1}\".format(args.model_base_name, suffix)\n", " _logger.info(f\"Registering model as {registered_name}\")\n", "\n", " # Saving model with mlflow\n", " _logger.info(\"Saving with mlflow\")\n", "\n", " mlflow.fastai.log_model(\n", " model,\n", " artifact_path=registered_name,\n", " registered_model_name=registered_name\n", " )\n", "\n", " _logger.info(\"Writing JSON\")\n", " dict = {\"id\": \"{0}:1\".format(registered_name)}\n", " output_path = os.path.join(args.model_output_path, \"model_info.json\")\n", " with open(output_path, \"w\") as of:\n", " json.dump(dict, fp=of)\n", "\n", "\n", "# run script\n", "if __name__ == \"__main__\":\n", " # add space in logs\n", " print(\"*\" * 60)\n", " print(\"\\n\\n\")\n", "\n", " # parse args\n", " args = parse_args()\n", "\n", " # run main function\n", " main(args)\n", "\n", " # add space in logs\n", " print(\"*\" * 60)\n", " print(\"\\n\\n\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e115dd6e", "metadata": {}, "source": [ "Now, we can build this into an AzureML component:" ] }, { "cell_type": "code", "execution_count": null, "id": "3d54e43f", "metadata": { "gather": { "logged": 1700482292209 } }, "outputs": [], "source": [ "from azure.ai.ml import load_component\n", "\n", "yaml_contents = f\"\"\"\n", "$schema: http://azureml/sdk-2-0/CommandComponent.json\n", "name: rai_flower_classification_training_component_new_1\n", "display_name: RAI Flower classification training component\n", "version: {rai_example_version_string}\n", "type: command\n", "inputs:\n", " model_base_name:\n", " type: string\n", " model_name_suffix: # Set negative to use epoch_secs\n", " type: integer\n", " default: -1\n", " device: # set to >= 0 to use GPU\n", " type: integer\n", " default: 0\n", "outputs:\n", " model_output_path:\n", " type: path\n", "code: ./component_src/\n", "environment: azureml://registries/azureml/environments/responsibleai-vision/versions/15\n", "command: >-\n", " python training_script.py\n", " --model_base_name ${{{{inputs.model_base_name}}}}\n", " --model_name_suffix ${{{{inputs.model_name_suffix}}}}\n", " --device ${{{{inputs.device}}}}\n", " --model_output_path ${{{{outputs.model_output_path}}}}\n", "\"\"\"\n", "\n", "yaml_filename = \"TrainingComponent.yaml\"\n", "\n", "with open(yaml_filename, \"w\") as f:\n", " f.write(yaml_contents)\n", "\n", "train_component_definition = load_component(source=yaml_filename)\n", "\n", "ml_client.components.create_or_update(train_component_definition)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6d165e2b", "metadata": {}, "source": [ "We need a compute target on which to run our jobs. The following checks whether the compute specified above is present; if not, then the compute target is created." ] }, { "cell_type": "code", "execution_count": null, "id": "1e40fc38", "metadata": { "gather": { "logged": 1700482299156 } }, "outputs": [], "source": [ "from azure.ai.ml.entities import AmlCompute\n", "\n", "all_compute_names = [x.name for x in ml_client.compute.list()]\n", "\n", "if compute_name in all_compute_names:\n", " print(f\"Found existing compute: {compute_name}\")\n", "else:\n", " my_compute = AmlCompute(\n", " name=compute_name,\n", " size=\"STANDARD_DS13_V2\",\n", " min_instances=0,\n", " max_instances=4,\n", " idle_time_before_scale_down=3600,\n", " )\n", " ml_client.compute.begin_create_or_update(my_compute)\n", " print(\"Initiated compute creation\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9d8eb868", "metadata": {}, "source": [ "## 3.1. 
Running a training pipeline\n", "\n", "Now that we have our training component, we can run it. We begin by setting the name and suffix under which the model will be registered." ] }, { "cell_type": "code", "execution_count": null, "id": "ad76242b", "metadata": { "gather": { "logged": 1700482305489 } }, "outputs": [], "source": [ "import time\n", "\n", "model_base_name = \"flower_classification_model\"\n", "model_name_suffix = 12492\n", "device = -1" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d49615a7", "metadata": {}, "source": [ "Next, we define our training pipeline. It consists of the training component which we defined above; the training script itself registers the model in AzureML:" ] }, { "cell_type": "code", "execution_count": null, "id": "cb6c6cec", "metadata": { "gather": { "logged": 1700482313283 } }, "outputs": [], "source": [ "from azure.ai.ml import dsl, Input\n", "\n", "train_model_component = ml_client.components.get(\n", " name=\"rai_flower_classification_training_component_new_1\",\n", " version=rai_example_version_string,\n", ")\n", "\n", "\n", "@dsl.pipeline(\n", " compute=compute_name,\n", " description=\"Register Model for RAI Flower classification\",\n", " experiment_name=f\"RAI_Flower_classification_Model_Training_{model_name_suffix}\",\n", ")\n", "def my_training_pipeline(model_base_name, model_name_suffix, device):\n", " trained_model = train_model_component(\n", " model_base_name=model_base_name,\n", " model_name_suffix=model_name_suffix,\n", " device=device,\n", " )\n", " trained_model.set_limits(timeout=7200)\n", "\n", " return {}\n", "\n", "\n", "model_registration_pipeline_job = my_training_pipeline(\n", " model_base_name, model_name_suffix, device\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2fa66ea6", "metadata": {}, "source": [ "With the training pipeline defined, we can submit it for execution in AzureML. We define a helper function to wait for the job to complete:" ] }, { "cell_type": "code", "execution_count": null, "id": "682cf6ab-fe46-45fd-8177-d1ceae06c6db", "metadata": { "gather": { "logged": 1700482316296 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "from azure.ai.ml.entities import PipelineJob\n", "from IPython.core.display import HTML\n", "from IPython.display import display\n", "\n", "\n", "def submit_and_wait(ml_client, pipeline_job) -> PipelineJob:\n", " created_job = ml_client.jobs.create_or_update(pipeline_job)\n", " assert created_job is not None\n", "\n", " print(\"Pipeline job can be accessed at the following URL:\")\n", " display(HTML('<a href=\"{0}\">{0}</a>'.format(created_job.studio_url)))\n", "\n", " while created_job.status not in [\n", " \"Completed\",\n", " \"Failed\",\n", " \"Canceled\",\n", " \"NotResponding\",\n", " ]:\n", " time.sleep(30)\n", " created_job = ml_client.jobs.get(created_job.name)\n", " print(\"Latest status : {0}\".format(created_job.status))\n", " assert created_job.status == \"Completed\"\n", " return created_job" ] }, { "cell_type": "code", "execution_count": null, "id": "f854eef5", "metadata": { "gather": { "logged": 1700482353473 } }, "outputs": [], "source": [ "# This is the actual submission\n", "training_job = submit_and_wait(ml_client, model_registration_pipeline_job)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0722395e", "metadata": {}, "source": [ "# 4. 
Creating the RAI Vision Insights\n", "\n", "Now that we have our model, we can generate RAI Vision insights for it. We will need the `id` of the registered model, which will be as follows:" ] }, { "cell_type": "code", "execution_count": null, "id": "7d3e6e6e", "metadata": { "gather": { "logged": 1700482353767 } }, "outputs": [], "source": [ "expected_model_id = f\"{model_base_name}_{model_name_suffix}:1\"\n", "azureml_model_id = f\"azureml:{expected_model_id}\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "310aa659", "metadata": {}, "source": [ "Next, we load the RAI components, so that we can construct a pipeline:" ] }, { "cell_type": "code", "execution_count": null, "id": "cf25558f-72df-4bba-8543-a5d3d5bf1da0", "metadata": { "gather": { "logged": 1700487342945 }, "jupyter": { "outputs_hidden": false, "source_hidden": false }, "nteract": { "transient": { "deleting": false } } }, "outputs": [], "source": [ "test_mltable = Input(\n", " type=\"mltable\",\n", " path=f\"{input_test_data}:{rai_example_version_string}\",\n", " mode=\"download\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "d67b942e", "metadata": { "gather": { "logged": 1700487346200 } }, "outputs": [], "source": [ "registry_name = \"azureml\"\n", "credential = DefaultAzureCredential()\n", "\n", "ml_client_registry = MLClient(\n", " credential=credential,\n", " subscription_id=ml_client.subscription_id,\n", " resource_group_name=ml_client.resource_group_name,\n", " registry_name=registry_name,\n", ")\n", "\n", "rai_vision_insights_component = ml_client_registry.components.get(\n", " name=\"rai_vision_insights\", version=version_string\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c98cd2d9", "metadata": {}, "source": [ "## 4.1 Constructing the pipeline in the SDK\n", "\n", "We can now specify our pipeline. Complex objects (such as lists of column names) have to be converted to JSON strings before being passed to the components." 
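, "\n", "For example, the five flower class names could be passed explicitly as `json.dumps([\"daisy\", \"dandelion\", \"rose\", \"sunflower\", \"tulip\"])` for the `classes` argument; in the submission below we simply pass `\"[]\"`, an empty JSON list."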
] }, { "cell_type": "code", "execution_count": null, "id": "a62105a7", "metadata": { "gather": { "logged": 1700487348739 } }, "outputs": [], "source": [ "import json\n", "from azure.ai.ml import Input\n", "from azure.ai.ml.constants import AssetTypes\n", "\n", "\n", "@dsl.pipeline(\n", " compute=compute_name,\n", " description=\"Example RAI computation on Flower classification\",\n", " experiment_name=f\"RAI_Flower_classification_RAIInsights_Computation_{model_name_suffix}\",\n", ")\n", "def image_classification_pipeline(\n", " target_column_name,\n", " test_data,\n", " classes,\n", " use_model_dependency,\n", "):\n", " # Initiate the RAIInsights\n", " rai_image_job = rai_vision_insights_component(\n", " task_type=\"image_classification\",\n", " model_info=expected_model_id,\n", " model_input=Input(type=AssetTypes.MLFLOW_MODEL, path=azureml_model_id),\n", " test_dataset=test_data,\n", " target_column_name=target_column_name,\n", " classes=classes,\n", " model_type=\"fastai\",\n", " dataset_type=\"private\",\n", " use_model_dependency=use_model_dependency,\n", " # Set this to true to be able to view explanations without computing\n", " # them on demand - however, running the dashboard may take several hours\n", " # for several hundred images.\n", " precompute_explanation=False,\n", " )\n", " rai_image_job.set_limits(timeout=720000)\n", "\n", " rai_image_job.outputs.dashboard.mode = \"upload\"\n", " rai_image_job.outputs.ux_json.mode = \"upload\"\n", "\n", " return {\n", " \"dashboard\": rai_image_job.outputs.dashboard,\n", " \"ux_json\": rai_image_job.outputs.ux_json,\n", " }" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6b5b14a9", "metadata": {}, "source": [ "Next, we define the pipeline object itself, and ensure that the outputs will be available for download:" ] }, { "cell_type": "code", "execution_count": null, "id": "e4d86ec2", "metadata": { "gather": { "logged": 1700487367354 } }, "outputs": [], "source": [ "import uuid\n", "from azure.ai.ml import Output\n", "\n", "insights_pipeline_job = image_classification_pipeline(\n", " target_column_name=target_column_name,\n", " test_data=test_mltable,\n", " classes=\"[]\",\n", " use_model_dependency=True,\n", ")\n", "\n", "rand_path = str(uuid.uuid4())\n", "insights_pipeline_job.outputs.dashboard = Output(\n", " path=f\"azureml://datastores/workspaceblobstore/paths/{rand_path}/dashboard/\",\n", " mode=\"upload\",\n", " type=\"uri_folder\",\n", ")\n", "insights_pipeline_job.outputs.ux_json = Output(\n", " path=f\"azureml://datastores/workspaceblobstore/paths/{rand_path}/ux_json/\",\n", " mode=\"upload\",\n", " type=\"uri_folder\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "25f34573", "metadata": {}, "source": [ "And submit the pipeline to AzureML for execution:" ] }, { "cell_type": "code", "execution_count": null, "id": "2ca757f7", "metadata": { "gather": { "logged": 1700483346337 } }, "outputs": [], "source": [ "insights_job = submit_and_wait(ml_client, insights_pipeline_job)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1381768a", "metadata": {}, "source": [ "The dashboard should appear in the AzureML portal in the registered model view. 
The following cell computes the expected URI:" ] }, { "cell_type": "code", "execution_count": null, "id": "e86ab611", "metadata": { "gather": { "logged": 1700483150165 } }, "outputs": [], "source": [ "sub_id = ml_client._operation_scope.subscription_id\n", "rg_name = ml_client._operation_scope.resource_group_name\n", "ws_name = ml_client.workspace_name\n", "\n", "expected_uri = f\"https://ml.azure.com/model/{expected_model_id}/model_analysis?wsid=/subscriptions/{sub_id}/resourcegroups/{rg_name}/workspaces/{ws_name}\"\n", "\n", "print(f\"Please visit {expected_uri} to see your analysis\")" ] } ], "metadata": { "celltoolbar": "Raw Cell Format", "kernel_info": { "name": "python310-sdkv2" }, "kernelspec": { "display_name": "Python 3.10 - SDK v2", "language": "python", "name": "python310-sdkv2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" }, "microsoft": { "host": { "AzureML": { "notebookHasBeenCompleted": true } }, "ms_spell_check": { "ms_spell_check_language": "en" } }, "nteract": { "version": "nteract-front-end@1.0.0" } }, "nbformat": 4, "nbformat_minor": 5 }