# AutoML Image Classification scenario with RAI Dashboard

This example notebook demonstrates how to train a computer vision model with AutoML on the fridge items dataset and then evaluate that model in AzureML with the Responsible AI (RAI) dashboard.

First, we need to specify the version of the RAI components which are available in the workspace. This was specified when the components were uploaded.

In [None]:
version_string = "0.0.20"

We can optionally provide the name of the compute cluster we want to use in AzureML. Later in this notebook, we will create it if it does not already exist as an example. AzureML can also run on serverless computes if a compute is not explicitly set. 

In [None]:
train_compute_name = "gpu-cluster-nc6-v3"

rai_compute_name = "cpucluster"

Finally, we need to specify a version for the data and components we will create while running this notebook. This should be unique for the workspace, but the specific value doesn't matter:

In [None]:
rai_example_version_string = "63"
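
If you would rather not pick this value by hand, one option is to derive it from the current time. This is only a sketch: the helper name `make_version_string` is hypothetical, and it assumes a purely numeric version string is acceptable for the assets created here. Note that a fresh value on every run also means that assets from earlier runs will not be reused.

```python
import time


def make_version_string() -> str:
    # Seconds since the epoch: numeric, and different on each run
    # as long as runs start at least one second apart.
    return str(int(time.time()))


example_version = make_version_string()
print(example_version)
```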

# 1. Connect to Azure Machine Learning Workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

In [None]:
from azure.ai.ml.entities import Data, Model
from azure.ai.ml.constants import AssetTypes

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

from azure.ai.ml.automl import ClassificationPrimaryMetrics
from azure.ai.ml import automl, Input, dsl

In [None]:
# Enter details of your AML workspace
subscription_id = "<SUBSCRIPTION_ID>"
resource_group = "<RESOURCE_GROUP>"
workspace = "<AML_WORKSPACE_NAME>"

In [None]:
# Handle to the workspace
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

try:
    credential = DefaultAzureCredential()
    ml_client = MLClient(
        credential=credential,
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace,
    )
except Exception:
    # If in compute instance we can get the config automatically
    from azureml.core import Workspace

    workspace = Workspace.from_config()
    workspace.write_config()
    ml_client = MLClient.from_config(
        credential=DefaultAzureCredential(exclude_shared_token_cache_credential=True),
        logging_enable=True,
    )

print(ml_client)

#### Compute target setup

There are two ways to submit a job - through a compute or a serverless job.

##### Serverless Job:

In a serverless job, there is no need to create a compute explicitly.
Simply pass the desired instance type value to the `instance_type` parameter while creating a pipeline job.
This allows for quick and convenient job submission without the need for managing a compute cluster.

##### Compute Job:

The following code demonstrates how to create a GPU compute cluster.
After creating the compute cluster, pass its name to the `compute` parameter when creating the job. This ensures that the job runs on the specified compute cluster, allowing for more control and customization.

You will need to provide a [Compute Target](https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture#computes) that will be used for your AutoML model training. AutoML models for image tasks require [GPU SKUs](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-gpu) such as the ones from the NCv3, ND, NDv2 and NCasT4 series. We recommend using the NCsv3-series (with V100 GPUs) for faster training. Using a compute target with a multi-GPU VM SKU will leverage the multiple GPUs to speed up training. Additionally, setting up a compute target with multiple nodes will allow for faster model training by leveraging parallelism when tuning hyperparameters for your model.



In [None]:
from azure.ai.ml.entities import AmlCompute

all_compute_names = [x.name for x in ml_client.compute.list()]

if train_compute_name in all_compute_names:
    print(f"Found existing compute: {train_compute_name}")
else:
    train_compute_config = AmlCompute(
        name=train_compute_name,
        type="amlcompute",
        size="Standard_NC6s_v3",
        min_instances=0,
        max_instances=4,
        idle_time_before_scale_down=120,
    )
    ml_client.compute.begin_create_or_update(train_compute_config).result()

In [None]:
from azure.ai.ml.entities import AmlCompute

all_compute_names = [x.name for x in ml_client.compute.list()]

if rai_compute_name in all_compute_names:
    print(f"Found existing compute: {rai_compute_name}")
else:
    rai_compute_config = AmlCompute(
        name=rai_compute_name,
        size="STANDARD_DS3_V2",
        min_instances=0,
        max_instances=4,
        idle_time_before_scale_down=3600,
    )
    # Wait for creation to finish, as is done for the training cluster above
    ml_client.compute.begin_create_or_update(rai_compute_config).result()

# 2. Accessing the Data

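We download the data as a zip archive of images, upload the images to the workspace, and describe them with JSONL annotation files and accompanying `MLTable` files. The steps below download the data, convert it, and prepare it for training.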

The [fridge dataset](https://github.com/microsoft/computervision-recipes/tree/master/scenarios/classification) classifies images into four types of items commonly found in the Microsoft New England R&D office refrigerator - carton, water bottle, can and milk bottle. 

In [None]:
import os
import pandas as pd

# Python 3 location of urlretrieve (the Python 2 fallback is no longer needed)
from urllib.request import urlretrieve

## 2.1 Download Data

Download the 'fridge items' dataset as a zip archive and extract it locally.

In this notebook, we use a toy dataset called Fridge Objects, which consists of 134 photos of beverage containers in 4 classes {can, carton, milk bottle, water bottle}, taken against different backgrounds.

All images in this notebook are hosted in [this repository](https://github.com/microsoft/computervision-recipes) and are made available under the [MIT license](https://github.com/microsoft/computervision-recipes/blob/master/LICENSE).

In [None]:
import os
import urllib.request
from zipfile import ZipFile

# Change to a different location if you prefer
dataset_parent_dir = "./data"

# Create the data folder if it doesn't exist.
os.makedirs(dataset_parent_dir, exist_ok=True)

# download data
download_url = (
    "https://publictestdatasets.blob.core.windows.net/computervision/fridgeObjects.zip"
)

# Extract current dataset name from dataset url
dataset_name = os.path.split(download_url)[-1].split(".")[0]
# Get dataset path for later use
dataset_dir = os.path.join(dataset_parent_dir, dataset_name)

# Get the data zip file path
data_file = os.path.join(dataset_parent_dir, f"{dataset_name}.zip")

# Download the dataset
urllib.request.urlretrieve(download_url, filename=data_file)

# extract files
with ZipFile(data_file, "r") as zip_file:  # avoid shadowing the built-in zip()
    print("extracting files...")
    zip_file.extractall(path=dataset_parent_dir)
    print("done")
# delete zip file
os.remove(data_file)
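
After extraction, it can be worth confirming that each class folder holds the images we expect. The helper below is a sketch of such a check; `count_images_per_class` is a hypothetical name, and the demonstration runs on a throwaway directory with made-up file names rather than on the real dataset.

```python
import os
import tempfile


def count_images_per_class(root_dir):
    """Return {class_folder_name: number_of_files} for each sub-folder of root_dir."""
    counts = {}
    for class_name in sorted(os.listdir(root_dir)):
        sub_dir = os.path.join(root_dir, class_name)
        if os.path.isdir(sub_dir):
            counts[class_name] = len(os.listdir(sub_dir))
    return counts


# Demonstration on a temporary directory with made-up file names
with tempfile.TemporaryDirectory() as demo_dir:
    for class_name, n_files in [("can", 2), ("carton", 1)]:
        os.makedirs(os.path.join(demo_dir, class_name))
        for i in range(n_files):
            open(os.path.join(demo_dir, class_name, f"{i}.jpg"), "w").close()
    demo_counts = count_images_per_class(demo_dir)

print(demo_counts)
```

In this notebook the equivalent call would be `count_images_per_class(dataset_dir)`.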

## 2.2. Upload the images to Datastore through an AML Data asset (URI Folder) for training an AutomatedML Model

To use the data for training in Azure ML, we upload it to the default Azure Blob Storage of our Azure ML workspace.

For further details, refer to the URI folder data asset example: https://github.com/Azure/azureml-examples/blob/samuel100/data-samples/sdk/assets/data/data.ipynb

In [None]:
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

input_test_data = "fridge-items-images"

try:
    uri_folder_data_asset = ml_client.data.get(
        name=input_test_data, version=rai_example_version_string
    )
except Exception:
    my_data = Data(
        path=dataset_dir,
        type=AssetTypes.URI_FOLDER,
        description="Fridge-items images",
        name=input_test_data,
        version=rai_example_version_string,
    )
    uri_folder_data_asset = ml_client.data.create_or_update(my_data)
print(uri_folder_data_asset)
print("")
print("Path to folder in Blob Storage:")
print(uri_folder_data_asset.path)

## 2.3. Convert the downloaded data to JSONL

In this example, the fridge object dataset is stored in a directory. There are four different folders inside:

- /water_bottle
- /milk_bottle
- /carton
- /can

This is the most common data format for multiclass image classification. Each folder title corresponds to the image label for the images contained inside. In order to use this data to create an AzureML MLTable, we first need to convert it to the required JSONL format. Please refer to the [documentation on how to prepare datasets](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-prepare-datasets-for-automl-images).


The following script creates two .jsonl files (one for training and one for validation), each in its corresponding MLTable folder. Every fifth image goes into the validation file, so roughly 20% of the data is used for validation.

In [None]:
import json
import os


# We'll copy each JSONL file within its related MLTable folder
training_mltable_path = os.path.join(dataset_parent_dir, "training-mltable-folder")
validation_mltable_path = os.path.join(dataset_parent_dir, "validation-mltable-folder")

# First, let's create the folders if they don't exist
os.makedirs(training_mltable_path, exist_ok=True)
os.makedirs(validation_mltable_path, exist_ok=True)

train_validation_ratio = 5

# Path to the training and validation files
train_annotations_file = os.path.join(training_mltable_path, "train_annotations.jsonl")
validation_annotations_file = os.path.join(
    validation_mltable_path, "validation_annotations.jsonl"
)

# Baseline of json line dictionary
json_line_sample = {
    "image_url": uri_folder_data_asset.path,
    "label": "",
}

index = 0
# Scan each subdirectory and generate a JSONL line per image, distributed across the train and validation JSONL files
with open(train_annotations_file, "w") as train_f:
    with open(validation_annotations_file, "w") as validation_f:
        for class_name in os.listdir(dataset_dir):
            sub_dir = os.path.join(dataset_dir, class_name)
            if not os.path.isdir(sub_dir):
                continue

            # Scan each subdirectory
            print(f"Parsing {sub_dir}")
            for image in os.listdir(sub_dir):
                json_line = dict(json_line_sample)
                json_line["image_url"] += f"{class_name}/{image}"
                json_line["label"] = class_name

                if index % train_validation_ratio == 0:
                    # validation annotation
                    validation_f.write(json.dumps(json_line) + "\n")
                else:
                    # train annotation
                    train_f.write(json.dumps(json_line) + "\n")
                index += 1
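
As a quick check of the split used above, the arithmetic can be reproduced without touching any files. This sketch assumes the dataset size of 134 images stated earlier and that the running index covers all images across the class folders.

```python
train_validation_ratio = 5
n_images = 134  # dataset size stated earlier in this notebook

# An image goes to validation when its running index is a multiple of the ratio
validation_indices = [i for i in range(n_images) if i % train_validation_ratio == 0]
n_validation = len(validation_indices)
n_train = n_images - n_validation

print(n_train, n_validation, round(n_validation / n_images, 3))
# → 107 27 0.201
```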

## 2.4 Create MLTable data input for training an AutomatedML Model

Create MLTable data input using the jsonl files created above.

For documentation on creating your own MLTable assets for jobs beyond this notebook, please refer to the resources below:
- [MLTable YAML Schema](https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-mltable) - covers how to write MLTable YAML, which is required for each MLTable asset.
- [Create MLTable data asset](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK#create-a-mltable-data-asset) - covers how to create mltable data asset. 

In [None]:
def create_ml_table_file(filename):
    return (
        "$schema: https://azureml/sdk-2-0/MLTable.json\n"
        "type: mltable\n"
        "paths:\n"
        " - file: ./{0}\n"
        "transformations:\n"
        "  - read_json_lines:\n"
        "        encoding: utf8\n"
        "        invalid_lines: error\n"
        "        include_path_column: false\n"
        "  - convert_column_types:\n"
        "      - columns: image_url\n"
        "        column_type: stream_info"
    ).format(filename)


def save_ml_table_file(output_path, mltable_file_contents):
    with open(os.path.join(output_path, "MLTable"), "w") as f:
        f.write(mltable_file_contents)


# Create and save train mltable
train_mltable_file_contents = create_ml_table_file(
    os.path.basename(train_annotations_file)
)
save_ml_table_file(training_mltable_path, train_mltable_file_contents)

# Create and save validation mltable
validation_mltable_file_contents = create_ml_table_file(
    os.path.basename(validation_annotations_file)
)
save_ml_table_file(validation_mltable_path, validation_mltable_file_contents)

In [None]:
# Training MLTable defined locally, with local data to be uploaded
my_training_data_input = Input(type=AssetTypes.MLTABLE, path=training_mltable_path)

# Validation MLTable defined locally, with local data to be uploaded
my_validation_data_input = Input(type=AssetTypes.MLTABLE, path=validation_mltable_path)

# WITH REMOTE PATH: If available already in the cloud/workspace-blob-store
# my_training_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/vision-classification/train")
# my_validation_data_input = Input(type=AssetTypes.MLTABLE, path="azureml://datastores/workspaceblobstore/paths/vision-classification/valid")

The label column contains the classes:

In [None]:
target_column_name = "label"

# 3. Configure and run the AutoML for Images Classification-MultiClass training job

Here, we use automatic hyperparameter sweeping (AutoMode). For details on individual runs or manual hyperparameter sweeps, refer to the [automl-image-classification-multiclass-task-fridge-items.ipynb notebook](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/automl-standalone-jobs/automl-image-classification-multiclass-task-fridge-items/automl-image-classification-multiclass-task-fridge-items.ipynb).

When using AutoML for Images, we can perform an automatic hyperparameter sweep to find the optimal model (we call this functionality AutoMode). The system will choose a model architecture and values for the learning_rate, number_of_epochs, training_batch_size, etc. based on the number of runs. There is no need to specify the hyperparameter search space, sampling method or early termination policy. A number of runs between 10 and 20 will likely work well on many datasets.

AutoMode is triggered by setting `max_trials` to a value greater than 1 in limits and by omitting the hyperparameter space, sampling method and termination policy.

The following functions configure AutoML image jobs for automatic sweeps:

### image_classification() function parameters:
The `image_classification()` factory function allows the user to configure the training job.

- `compute` - The compute on which the AutoML job will run. In this example the parameter is commented out, so the job runs on serverless compute; to use the GPU cluster created above, pass `train_compute_name`. You can replace it with any other compute in the workspace.
- `experiment_name` - The name of the experiment. An experiment is like a folder with multiple runs in Azure ML Workspace that should be related to the same logical machine learning experiment.
- `name` - The name of the Job/Run. This is an optional property. If not specified, a random name will be generated.
- `primary_metric` - The metric that AutoML will optimize for model selection.
- `target_column_name` - The name of the column to target for predictions. It must always be specified. This parameter is applicable to 'training_data' and 'validation_data'.
- `training_data` - The data to be used for training. It should contain both training feature columns and a target column. Optionally, this data can be split for segregating a validation or test dataset. You can use an MLTable registered in the workspace, referenced as '<mltable_name>:<version>', or a local file or folder as an MLTable, for example `Input(type=AssetTypes.MLTABLE, path="./data")` as done earlier in this notebook. The parameter `training_data` must always be provided.

### set_limits() function parameters:
This is an optional configuration method to configure limits parameters such as timeouts.

- `max_trials` - Parameter for maximum number of configurations to sweep. Must be an integer between 1 and 1000. When exploring just the default hyperparameters for a given model algorithm, set this parameter to 1. Default value is 1.
- `max_concurrent_trials` - Maximum number of runs that can run concurrently. If specified, must be an integer between 1 and 100. Default value is 1.
    NOTE: The number of concurrent runs is gated on the resources available in the specified compute target. Ensure that the compute target has the available resources for the desired concurrency.
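
The ranges and the concurrency note above can be captured in a small pre-flight check. This is a sketch, not part of the AzureML SDK: `check_limits` is a hypothetical helper, and it assumes for simplicity that each trial occupies one node of the cluster.

```python
def check_limits(max_trials, max_concurrent_trials, max_nodes):
    """Validate sweep limits and return the concurrency the cluster can sustain.

    Assumes one trial occupies one node, which is a simplification.
    """
    if not 1 <= max_trials <= 1000:
        raise ValueError("max_trials must be between 1 and 1000")
    if not 1 <= max_concurrent_trials <= 100:
        raise ValueError("max_concurrent_trials must be between 1 and 100")
    # Concurrency cannot exceed the number of trials or the number of nodes
    return min(max_concurrent_trials, max_trials, max_nodes)


effective = check_limits(max_trials=10, max_concurrent_trials=2, max_nodes=4)
print(effective)
# → 2
```

With the values used in this notebook (10 trials, 2 concurrent, a cluster of up to 4 nodes), the requested concurrency fits within the cluster.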


In [None]:
# set up experiment name
exp_name = "dpv2-image-classification-experiment"

This job uses serverless compute. To use the compute you created above, uncomment the `compute` parameter line.

In [None]:
# Create the AutoML job with the related factory-function.

import random
import string

allowed_chars = string.ascii_lowercase + string.digits
suffix = "".join(random.choice(allowed_chars) for x in range(5))
job_name = "dpv2-image-classification-job-02" + suffix

image_classification_job = automl.image_classification(
    # compute=train_compute_name,
    name=job_name,
    experiment_name=exp_name,
    training_data=my_training_data_input,
    validation_data=my_validation_data_input,
    target_column_name="label",
    primary_metric=ClassificationPrimaryMetrics.ACCURACY,
    tags={"my_custom_tag": "My custom value"},
)

image_classification_job.set_limits(
    max_trials=10,
    max_concurrent_trials=2,
)

In [None]:
# Submit the AutoML job
returned_job = ml_client.jobs.create_or_update(
    image_classification_job
)  # submit the job to the backend

print(f"Created job: {returned_job}")

In [None]:
ml_client.jobs.stream(returned_job.name)

# 4. Retrieve the Best Trial (Best Model's trial/run) and Register the Best Model
Use the MLflow `MlflowClient` to access the results (such as Models, Artifacts, Metrics) of a previously completed AutoML Trial.

### 4.1 Initialize MLFlow Client

The models and artifacts that are produced by AutoML can be accessed via the MLflow interface.
Initialize the MLflow client here, and set Azure ML as the tracking backend.

IMPORTANT: you need to have installed the latest MLflow packages with:

    pip install azureml-mlflow

    pip install mlflow

In [None]:
import mlflow

# Obtain the tracking URL from MLClient
MLFLOW_TRACKING_URI = ml_client.workspaces.get(
    name=ml_client.workspace_name
).mlflow_tracking_uri

print(MLFLOW_TRACKING_URI)

In [None]:
# Set the MLFLOW TRACKING URI
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
print(f"\nCurrent tracking uri: {mlflow.get_tracking_uri()}")

In [None]:
from mlflow.tracking.client import MlflowClient

# Initialize MLFlow client
mlflow_client = MlflowClient()

In [None]:
job_name = returned_job.name

# # Example if providing a specific Job name/ID
# job_name = "happy_yam_40fq53m7c2" #"ashy_net_gdd31zf2fq"

# Get the parent run
mlflow_parent_run = mlflow_client.get_run(job_name)

print("Parent Run: ")
print(mlflow_parent_run)

In [None]:
# Print parent run tags. 'automl_best_child_run_id' tag should be there.
print(mlflow_parent_run.data.tags.keys())

### 4.2 Get the AutoML best child run

In [None]:
# Get the best model's child run

best_child_run_id = mlflow_parent_run.data.tags["automl_best_child_run_id"]
print(f"Found best child run id: {best_child_run_id}")

best_run = mlflow_client.get_run(best_child_run_id)

print("Best child run: ")
print(best_run)

In [None]:
import json

hyperparameter_tag_dict = json.loads(best_run.data.tags["hyperparameters"])
print(hyperparameter_tag_dict)

In [None]:
import pandas as pd

# Show the metrics logged by the best child run
pd.DataFrame(best_run.data.metrics, index=[0]).T

### 4.3 Download the best model locally
Download the artifacts of the best child run, which include the trained MLflow model.

In [None]:
# Create local folder
import os

local_dir = "./artifact_downloads"
os.makedirs(local_dir, exist_ok=True)

In [None]:
# Download run's artifacts/outputs
local_path = mlflow_client.download_artifacts(
    best_run.info.run_id, "outputs", local_dir
)
print(f"Artifacts downloaded in: {local_path}")
print(f"Artifacts: {os.listdir(local_path)}")

In [None]:
import os

mlflow_model_dir = os.path.join(local_dir, "outputs", "mlflow-model")

# Show the contents of the MLFlow model folder
os.listdir(mlflow_model_dir)

# You should see a list of files such as the following:
# ['artifacts', 'conda.yaml', 'MLmodel', 'python_env.yaml', 'python_model.pkl', 'requirements.txt']

### 4.4 Register model

In [None]:
model_name = "ic-mc-rai-fridge-items-model" + suffix
model = Model(
    path=f"azureml://jobs/{best_run.info.run_id}/outputs/artifacts/outputs/mlflow-model/",
    name=model_name,
    description="my sample image classification multiclass model",
    type=AssetTypes.MLFLOW_MODEL,
)

# for downloaded file
# model = Model(
#     path=mlflow_model_dir,
#     name=model_name,
#     description="my sample image classification multiclass model",
#     type=AssetTypes.MLFLOW_MODEL,
# )

registered_model = ml_client.models.create_or_update(model)

In [None]:
registered_model.id

The name and version of the registered model identify it in the workspace; we will use them to reference the model in the next section:

In [None]:
print(registered_model.name, registered_model.version)

# 5. Creating the RAI Vision Insights

Now that we have our model, we can generate RAI Vision insights for it. We will need the `id` of the registered model, which will be as follows:

In [None]:
expected_model_id = f"{registered_model.name}:{registered_model.version}"
azureml_model_id = f"azureml:{expected_model_id}"

Next, we load the RAI components, so that we can construct a pipeline:

In [None]:
registry_name = "azureml"
credential = DefaultAzureCredential()

ml_client_registry = MLClient(
    credential=credential,
    subscription_id=ml_client.subscription_id,
    resource_group_name=ml_client.resource_group_name,
    # workspace_name=ml_client.workspace_name,
    registry_name=registry_name,
)

rai_vision_insights_component = ml_client_registry.components.get(
    name="rai_vision_insights", label="latest"
)

## 5.1 Constructing the pipeline with the SDK
We can now specify our pipeline. Complex objects (such as lists of column names) have to be converted to JSON strings before being passed to the components.


Note:
1. `guided_gradcam` doesn't work with transformer vision models.
2. `shap` isn't supported for AutoML image models.

For more details on XAI parameters, refer to this [page](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-image-models?tabs=cli#generate-explanations-for-predictions).

In [None]:
# Prepare XAI parameters
xai_algorithm = (
    "guided_gradcam"  # xrai, integrated_gradients, guided_gradcam, guided_backprop
)
n_steps = 50  # applicable for xrai, integrated_gradients
xrai_fast = True  # applicable for xrai
approximation_method = "gausslegendre"  # applicable for integrated_gradients
confidence_score_threshold_multilabel = 0.5  # applicable for multilabel classification

# Note: latest automl wraps model_name property in model
if "model" in hyperparameter_tag_dict:
    hyperparameter_tag_dict = hyperparameter_tag_dict["model"]

if "model_name" in hyperparameter_tag_dict:
    # Use a separate variable so the registered model's name (model_name) is not overwritten
    automl_model_name = hyperparameter_tag_dict["model_name"]
    if "vit" in automl_model_name:
        # guided_gradcam doesn't work with transformer vision models
        # override defaults
        xai_algorithm = "xrai"  # xrai, integrated_gradients, guided_backprop
        n_steps = 50  # applicable for xrai, integrated_gradients
        xrai_fast = True  # applicable for xrai
        approximation_method = "gausslegendre"  # applicable for integrated_gradients
        confidence_score_threshold_multilabel = (
            0.5  # applicable for multilabel classification
        )
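
The selection logic in the cell above can also be written as a small function, which makes it easy to try on different inputs. This is a sketch: `choose_xai_algorithm` is a hypothetical helper, and the tag values below are made up for illustration rather than taken from a real run.

```python
import json


def choose_xai_algorithm(hyperparameters, default="guided_gradcam"):
    """Pick an explanation algorithm from the model name in the AutoML hyperparameters."""
    # Newer AutoML versions nest the settings under a "model" key
    if "model" in hyperparameters:
        hyperparameters = hyperparameters["model"]
    name = hyperparameters.get("model_name", "")
    # guided_gradcam does not work with transformer (ViT) models, so fall back to xrai
    return "xrai" if "vit" in name else default


# Hypothetical tag values, for illustration only
flat_tag = json.loads('{"model_name": "seresnext"}')
nested_tag = json.loads('{"model": {"model_name": "vitb16r224"}}')

print(choose_xai_algorithm(flat_tag))
# → guided_gradcam
print(choose_xai_algorithm(nested_tag))
# → xrai
```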

In [None]:
import json
from azure.ai.ml import Input
from azure.ai.ml.constants import AssetTypes


@dsl.pipeline(
    compute=rai_compute_name,
    description="Example RAI computation on Fridge data",
    experiment_name="RAI_Fridge_Example_RAIInsights_Computation",
)
def rai_fridge_image_classification_pipeline(target_column_name, test_data, classes):
    # Initiate the RAIInsights
    rai_image_job = rai_vision_insights_component(
        model_input=Input(type=AssetTypes.MLFLOW_MODEL, path=azureml_model_id),
        test_dataset=test_data,
        task_type="image_classification",
        model_info=expected_model_id,
        target_column_name=target_column_name,
        classes=classes,
        dataset_type="private",
        model_type="pyfunc",
        precompute_explanation=True,
        enable_error_analysis=True,
        xai_algorithm=xai_algorithm,
        n_steps=n_steps,
        xrai_fast=xrai_fast,
        approximation_method=approximation_method,
        confidence_score_threshold_multilabel=confidence_score_threshold_multilabel,
    )
    rai_image_job.set_limits(timeout=7200)

    rai_image_job.outputs.dashboard.mode = "upload"
    rai_image_job.outputs.ux_json.mode = "upload"

    return {
        "dashboard": rai_image_job.outputs.dashboard,
        "ux_json": rai_image_job.outputs.ux_json,
    }

Next, we define the pipeline object itself, and ensure that the outputs will be available for download:

In [None]:
import uuid
from azure.ai.ml import Output

insights_pipeline_job = rai_fridge_image_classification_pipeline(
    target_column_name=target_column_name,
    test_data=my_validation_data_input,  # rai_fridge_validation_mltable,
    classes="[]",
)

rand_path = str(uuid.uuid4())
insights_pipeline_job.outputs.dashboard = Output(
    path=f"azureml://datastores/workspaceblobstore/paths/{rand_path}/dashboard/",
    mode="upload",
    type="uri_folder",
)
insights_pipeline_job.outputs.ux_json = Output(
    path=f"azureml://datastores/workspaceblobstore/paths/{rand_path}/ux_json/",
    mode="upload",
    type="uri_folder",
)

And submit the pipeline to AzureML for execution:

In [None]:
import time
from azure.ai.ml.entities import PipelineJob


def submit_and_wait(ml_client, pipeline_job) -> PipelineJob:
    created_job = ml_client.jobs.create_or_update(pipeline_job)
    assert created_job is not None

    while created_job.status not in [
        "Completed",
        "Failed",
        "Canceled",
        "NotResponding",
    ]:
        time.sleep(30)
        created_job = ml_client.jobs.get(created_job.name)
        print("Latest status : {0}".format(created_job.status))
    assert created_job.status == "Completed"
    return created_job
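
The loop above polls until the job reaches a terminal state, however long that takes. If we want an upper bound on the wait, the same pattern can carry a deadline. The sketch below is independent of AzureML: `wait_for_terminal_status` is a hypothetical helper, and a fixed sequence of statuses stands in for calls to `ml_client.jobs.get(...)`.

```python
import time

TERMINAL_STATES = {"Completed", "Failed", "Canceled", "NotResponding"}


def wait_for_terminal_status(get_status, poll_seconds=30, timeout_seconds=7200):
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in state {status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Stand-in for `lambda: ml_client.jobs.get(job_name).status`, replaying a fixed sequence
fake_statuses = iter(["Running", "Running", "Completed"])
final_status = wait_for_terminal_status(lambda: next(fake_statuses), poll_seconds=0)
print(final_status)
# → Completed
```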

In [None]:
insights_job = submit_and_wait(ml_client, insights_pipeline_job)

In [None]:
insights_job

The dashboard should appear in the AzureML portal in the registered model view. The following cell computes the expected URI:

In [None]:
# Use the public MLClient properties rather than private attributes
sub_id = ml_client.subscription_id
rg_name = ml_client.resource_group_name
ws_name = ml_client.workspace_name

expected_uri = f"https://ml.azure.com/model/{expected_model_id}/model_analysis?wsid=/subscriptions/{sub_id}/resourcegroups/{rg_name}/workspaces/{ws_name}"

print(f"Please visit {expected_uri} to see your analysis")

## 5.2 Constructing the pipeline in YAML

It is also possible to specify the pipeline as a YAML file, and submit that using the command line. We will now create a YAML specification of the above pipeline and submit that:

In [None]:
my_validation_data_input

In [None]:
yaml_contents = f"""
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
experiment_name: AML_RAI_Vision_Sample_{rai_example_version_string}_From_YAML
type: pipeline

compute: azureml:{rai_compute_name}

inputs:
  registered_model_name: {registered_model.name}
  vision_model_info: {expected_model_id}
  dataset_type: private
  my_test_data:
    type: mltable
    path: {my_validation_data_input["path"]}
    mode: download

settings:
  default_datastore: azureml:workspaceblobstore
  default_compute: azureml:{rai_compute_name}
  continue_on_step_failure: false

jobs:
  analyse_model:
    type: command
    component: azureml://registries/azureml-preview/components/rai_vision_insights/versions/{version_string}
    inputs:
      task_type: image_classification
      model_input:
        type: mlflow_model
        path: {azureml_model_id}
      model_info: ${{{{parent.inputs.vision_model_info}}}}
      test_dataset:
        type: mltable
        path: ${{{{parent.inputs.my_test_data}}}}
      dataset_type: ${{{{parent.inputs.dataset_type}}}}
      target_column_name: {target_column_name}
      maximum_rows_for_test_dataset: 5000
      classes: '[]'
      precompute_explanation: True
      model_type: pyfunc
      xai_algorithm: {xai_algorithm}
      n_steps: {n_steps}
      xrai_fast: {xrai_fast}
      approximation_method: {approximation_method}
      confidence_score_threshold_multilabel: {confidence_score_threshold_multilabel}
"""

yaml_pipeline_filename = "rai_automl_vision_example.yaml"

with open(yaml_pipeline_filename, "w") as f:
    f.write(yaml_contents)
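
A note on the braces in the YAML template above: because the template is an f-string, `{{` and `}}` each produce a single literal brace. The pipeline binding syntax needs two literal braces on each side, hence the four in the template. A minimal illustration, using one of the bindings from the template and a sample column name:

```python
target_column_name = "label"  # sample value, as used in this notebook

# Four braces on each side collapse to the two that the binding syntax needs
binding = f"${{{{parent.inputs.vision_model_info}}}}"
# Single braces are ordinary f-string substitutions
substituted = f"target_column_name: {target_column_name}"

print(binding)
# → ${{parent.inputs.vision_model_info}}
print(substituted)
# → target_column_name: label
```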

The created file can then be submitted using the Azure CLI:

In [None]:
cmd_line = [
    "az",
    "ml",
    "job",
    "create",
    "--resource-group",
    rg_name,
    "--workspace",
    ws_name,
    "--file",
    yaml_pipeline_filename,
]

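import shutil
import subprocess

# Resolve the Azure CLI executable (az.cmd on Windows) so the command can run without a shell.
# With shell=True and a list of arguments, a POSIX shell would run only "az" and drop the rest.
cmd_line[0] = shutil.which("az") or "az"

try:
    cmd = subprocess.run(cmd_line, check=True, capture_output=True)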
except subprocess.CalledProcessError as cpe:
    print(f"Error invoking: {cpe.args}")
    print(cpe.stdout)
    print(cpe.stderr)
    raise
else:
    print("Azure CLI submission completed")