sdk/python/responsible-ai/text/responsibleaidashboard-text-classification-blbooksgenre.ipynb

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "98605bcd", "metadata": {}, "source": [ "# Binary Text Classification scenario with RAI Dashboard as Deployed Endpoint\n", "\n", "The [blbooksgenre dataset](https://huggingface.co/datasets/blbooksgenre) labels book titles as fiction or nonfiction.\n", "\n", "In this notebook we work through this binary text classification scenario end to end. The Hugging Face model is logged with MLflow and deployed in AzureML as a web service endpoint, which is then treated as a black-box model." ] }, { "attachments": {}, "cell_type": "markdown", "id": "bca59e45", "metadata": {}, "source": [ "Install `datasets` to retrieve this dataset from Hugging Face, together with `ml-wrappers` and a pandas version below 2.0.0:" ] }, { "cell_type": "code", "execution_count": null, "id": "888ae09f", "metadata": {}, "outputs": [], "source": [ "%pip install datasets\n", "%pip install ml-wrappers\n", "%pip install \"pandas<2.0.0\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fd9b5e44", "metadata": {}, "source": [ "First, we need to specify the version of the RAI components which are available in the workspace. This was specified when the components were uploaded." ] }, { "cell_type": "code", "execution_count": null, "id": "53b4eeac", "metadata": {}, "outputs": [], "source": [ "version_string = \"0.0.20\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "06008690", "metadata": {}, "source": [ "We also need to give the name of the compute cluster we want to use in AzureML. Later in this notebook, we will create it if it does not already exist:" ] }, { "cell_type": "code", "execution_count": null, "id": "f1ad79f9", "metadata": {}, "outputs": [], "source": [ "compute_name = \"cpucluster\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9fc65dc7", "metadata": {}, "source": [ "Finally, we need to specify a version for the data and components we will create while running this notebook. 
This should be unique for the workspace, but the specific value doesn't matter:" ] }, { "cell_type": "code", "execution_count": null, "id": "78053935", "metadata": {}, "outputs": [], "source": [ "rai_example_version_string = \"41\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "73be2b63", "metadata": {}, "source": [ "## Accessing the Data\n", "\n", "We download the dataset from Hugging Face, keep a small test sample, and write it out as a parquet file with an accompanying `MLTable` file. We can then take a brief look:" ] }, { "cell_type": "code", "execution_count": null, "id": "5f875f18", "metadata": {}, "outputs": [], "source": [ "import os\n", "import datasets\n", "import pandas as pd\n", "\n", "from sklearn import preprocessing\n", "from sklearn.model_selection import train_test_split\n", "\n", "NUM_TEST_SAMPLES = 100" ] }, { "cell_type": "code", "execution_count": null, "id": "ccbfd923", "metadata": {}, "outputs": [], "source": [ "def load_dataset(split):\n", " config_kwargs = {\"name\": \"title_genre_classifiction\"}\n", " dataset = datasets.load_dataset(\n", " \"blbooksgenre\", split=split, trust_remote_code=True, **config_kwargs\n", " )\n", " return pd.DataFrame({\"text\": dataset[\"title\"], \"label\": dataset[\"label\"]})\n", "\n", "\n", "pd_data = load_dataset(\"train\")\n", "\n", "_, pd_test_data = train_test_split(pd_data, test_size=0.2, random_state=0)\n", "\n", "test_data = pd_test_data[:NUM_TEST_SAMPLES]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "17d53df4", "metadata": {}, "source": [ "Now create the mltable:" ] }, { "cell_type": "code", "execution_count": null, "id": "4c7bbe58", "metadata": {}, "outputs": [], "source": [ "pq_filename = \"hf_data.parquet\"\n", "\n", "\n", "def create_ml_table_file_contents(pq_filename):\n", " return (\n", " \"$schema: http://azureml/sdk-2-0/MLTable.json\\n\"\n", " \"type: mltable\\n\"\n", " \"paths:\\n\"\n", " \" - file: ./{0}\\n\"\n", " \"transformations:\\n\"\n", " \" - read_parquet\\n\"\n", " 
).format(pq_filename)\n", "\n", "\n", "def write_to_parquet(data, path, pq_filename):\n", " os.makedirs(path, exist_ok=True)\n", " data.to_parquet(os.path.join(path, pq_filename), index=False)\n", "\n", "\n", "def create_ml_table_file(path, contents):\n", " with open(os.path.join(path, \"MLTable\"), \"w\") as f:\n", " f.write(contents)\n", "\n", "\n", "test_data_path = \"test_data\"\n", "\n", "write_to_parquet(test_data, test_data_path, pq_filename)\n", "\n", "mltable_file_contents = create_ml_table_file_contents(pq_filename)\n", "create_ml_table_file(test_data_path, mltable_file_contents)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "a2c4ebb4", "metadata": {}, "source": [ "Load some data for a quick view:" ] }, { "cell_type": "code", "execution_count": null, "id": "1027fa92", "metadata": {}, "outputs": [], "source": [ "import mltable\n", "\n", "tbl = mltable.load(test_data_path)\n", "test_df: pd.DataFrame = tbl.to_pandas_dataframe()\n", "\n", "display(test_df)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1115ac59", "metadata": {}, "source": [ "The label column contains the classes:" ] }, { "cell_type": "code", "execution_count": null, "id": "5b42df3d", "metadata": {}, "outputs": [], "source": [ "target_column_name = \"label\"" ] }, { "attachments": {}, "cell_type": "markdown", "id": "52e79b04", "metadata": {}, "source": [ "First, we need to upload the datasets to our workspace. 
We start by creating an `MLClient` for interactions with AzureML:" ] }, { "cell_type": "code", "execution_count": null, "id": "2ac60b07", "metadata": {}, "outputs": [], "source": [ "# Enter details of your AML workspace\n", "subscription_id = \"<SUBSCRIPTION_ID>\"\n", "resource_group = \"<RESOURCE_GROUP>\"\n", "workspace = \"<AML_WORKSPACE_NAME>\"" ] }, { "cell_type": "code", "execution_count": null, "id": "395435fc", "metadata": {}, "outputs": [], "source": [ "# Handle to the workspace\n", "from azure.ai.ml import MLClient\n", "from azure.identity import DefaultAzureCredential\n", "\n", "try:\n", " credential = DefaultAzureCredential()\n", " ml_client = MLClient(\n", " credential=credential,\n", " subscription_id=subscription_id,\n", " resource_group_name=resource_group,\n", " workspace_name=workspace,\n", " )\n", "except Exception:\n", " # If in compute instance we can get the config automatically\n", " from azureml.core import Workspace\n", "\n", " workspace = Workspace.from_config()\n", " workspace.write_config()\n", " ml_client = MLClient.from_config(\n", " credential=DefaultAzureCredential(exclude_shared_token_cache_credential=True),\n", " logging_enable=True,\n", " )\n", "\n", "print(ml_client)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "7b501735", "metadata": {}, "source": [ "We can now upload the data to AzureML:" ] }, { "cell_type": "code", "execution_count": null, "id": "62eb02a2", "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.entities import Data\n", "from azure.ai.ml.constants import AssetTypes\n", "\n", "input_test_data = \"blbooksgenre_Test_MLTable\"\n", "\n", "try:\n", " test_data = ml_client.data.get(\n", " name=input_test_data,\n", " version=rai_example_version_string,\n", " )\n", "except Exception:\n", " test_data = Data(\n", " path=test_data_path,\n", " type=AssetTypes.MLTABLE,\n", " description=\"RAI blbooksgenre test data\",\n", " name=input_test_data,\n", " version=rai_example_version_string,\n", " )\n", " 
ml_client.data.create_or_update(test_data)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6815ba75", "metadata": {}, "source": [ "## Creating the Model\n", "\n", "To simplify the model creation process, we're going to use a pipeline.\n", "\n", "We create a directory for the training script:" ] }, { "cell_type": "code", "execution_count": null, "id": "e78d869b", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.makedirs(\"blbooksgenre_component_src\", exist_ok=True)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ea86e55d", "metadata": {}, "source": [ "Next, we write out our script to retrieve the trained model:" ] }, { "cell_type": "code", "execution_count": null, "id": "a523f144", "metadata": {}, "outputs": [], "source": [ "%%writefile blbooksgenre_component_src/training_script.py\n", "\n", "import argparse\n", "import logging\n", "import json\n", "import os\n", "import time\n", "import numpy as np\n", "\n", "import mlflow\n", "import mlflow.pyfunc\n", "\n", "import zipfile\n", "from azureml.core import Run\n", "\n", "from transformers import AutoModelForSequenceClassification, \\\n", " AutoTokenizer, pipeline\n", "\n", "from raiutils.common.retries import retry_function\n", "\n", "try:\n", " from urllib import urlretrieve\n", "except ImportError:\n", " from urllib.request import urlretrieve\n", "\n", "\n", "_logger = logging.getLogger(__file__)\n", "logging.basicConfig(level=logging.INFO)\n", "\n", "\n", "BLBOOKSGENRE_MODEL_NAME = \"blbooksgenre_model\"\n", "NUM_LABELS = 2\n", "\n", "\n", "def parse_args():\n", " # setup arg parser\n", " parser = argparse.ArgumentParser()\n", "\n", " # add arguments\n", " parser.add_argument(\n", " \"--model_output_path\", type=str, help=\"Path to write model info JSON\"\n", " )\n", " parser.add_argument(\n", " \"--model_base_name\", type=str, help=\"Name of the registered model\"\n", " )\n", " parser.add_argument(\n", " \"--model_name_suffix\", type=str, help=\"Model name suffix\"\n", " 
)\n", " parser.add_argument(\n", " \"--device\", type=int, help=(\n", " \"Device for CPU/GPU supports. Setting this to -1 will leverage \"\n", " \"CPU, >=0 will run the model on the associated CUDA device id.\")\n", " )\n", "\n", " # parse args\n", " args = parser.parse_args()\n", "\n", " # return args\n", " return args\n", "\n", "\n", "class FetchModel(object):\n", " def __init__(self):\n", " pass\n", "\n", " def fetch(self):\n", " zipfilename = BLBOOKSGENRE_MODEL_NAME + '.zip'\n", " url = ('https://publictestdatasets.blob.core.windows.net/models/' +\n", " BLBOOKSGENRE_MODEL_NAME + '.zip')\n", " urlretrieve(url, zipfilename)\n", " with zipfile.ZipFile(zipfilename, 'r') as unzip:\n", " unzip.extractall(BLBOOKSGENRE_MODEL_NAME)\n", "\n", "\n", "def retrieve_blbooksgenre_model():\n", " fetcher = FetchModel()\n", " action_name = \"Model download\"\n", " err_msg = \"Failed to download model\"\n", " max_retries = 4\n", " retry_delay = 60\n", " retry_function(fetcher.fetch, action_name, err_msg,\n", " max_retries=max_retries,\n", " retry_delay=retry_delay)\n", " model = AutoModelForSequenceClassification.from_pretrained(\n", " BLBOOKSGENRE_MODEL_NAME, num_labels=NUM_LABELS)\n", " return model\n", "\n", "\n", "class HuggingfaceWrapper(mlflow.pyfunc.PythonModel):\n", " def __init__(self, pipeline):\n", " self._pipeline = pipeline\n", "\n", " def predict(self, context, model_input):\n", " if isinstance(model_input, np.ndarray):\n", " model_input = model_input.tolist()\n", " result = self._pipeline(model_input)\n", " return result\n", "\n", "\n", "def main(args):\n", " current_experiment = Run.get_context().experiment\n", " tracking_uri = current_experiment.workspace.get_mlflow_tracking_uri()\n", " _logger.info(\"tracking_uri: {0}\".format(tracking_uri))\n", " mlflow.set_tracking_uri(tracking_uri)\n", " mlflow.set_experiment(current_experiment.name)\n", "\n", " _logger.info(\"Getting device\")\n", " device = args.device\n", "\n", " _logger.info(\"Loading parquet input\")\n", 
"\n", " # load the model and tokenizer\n", " tokenizer = AutoTokenizer.from_pretrained(\"bert-base-uncased\")\n", " model = retrieve_blbooksgenre_model()\n", "\n", " if device >= 0:\n", " model = model.cuda()\n", "\n", " # build a pipeline object to do predictions\n", " _logger.info(\"Building model\")\n", " # set to false to only return predicted score in webservice\n", " return_all_scores = False\n", " pred = pipeline(\n", " \"text-classification\",\n", " model=model,\n", " tokenizer=tokenizer,\n", " device=device,\n", " return_all_scores=return_all_scores\n", " )\n", "\n", " suffix = args.model_name_suffix\n", " registered_name = \"{0}_{1}\".format(args.model_base_name, suffix)\n", " _logger.info(f\"Registering model as {registered_name}\")\n", "\n", " # my_mlflow = PyfuncModel(pred)\n", " my_mlflow = HuggingfaceWrapper(pred)\n", "\n", " # Saving model with mlflow\n", " _logger.info(\"Saving with mlflow\")\n", " mlflow.pyfunc.log_model(\n", " python_model=my_mlflow,\n", " registered_model_name=registered_name,\n", " artifact_path=registered_name,\n", " pip_requirements=['mlflow', 'torch>=2.2.2', 'transformers>=4.17.0,<=4.44.0', 'numpy<2.0.0'])\n", "\n", " _logger.info(\"Writing JSON\")\n", " dict = {\"id\": \"{0}:1\".format(registered_name)}\n", " output_path = os.path.join(args.model_output_path, \"model_info.json\")\n", " with open(output_path, \"w\") as of:\n", " json.dump(dict, fp=of)\n", "\n", "\n", "# run script\n", "if __name__ == \"__main__\":\n", " # add space in logs\n", " print(\"*\" * 60)\n", " print(\"\\n\\n\")\n", "\n", " # parse args\n", " args = parse_args()\n", "\n", " # run main function\n", " main(args)\n", "\n", " # add space in logs\n", " print(\"*\" * 60)\n", " print(\"\\n\\n\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e115dd6e", "metadata": {}, "source": [ "Now, we can build this into an AzureML component:" ] }, { "cell_type": "code", "execution_count": null, "id": "3d54e43f", "metadata": {}, "outputs": [], "source": [ 
"from azure.ai.ml import load_component\n", "\n", "yaml_contents = f\"\"\"\n", "$schema: http://azureml/sdk-2-0/CommandComponent.json\n", "name: rai_training_component\n", "display_name: blbooksgenre training component for RAI example\n", "version: {rai_example_version_string}\n", "type: command\n", "inputs:\n", " model_base_name:\n", " type: string\n", " model_name_suffix:\n", " type: string\n", " device: # set to >= 0 to use GPU\n", " type: integer\n", " default: 0\n", "outputs:\n", " model_output_path:\n", " type: path\n", "code: ./blbooksgenre_component_src/\n", "environment: azureml://registries/azureml/environments/responsibleai-text/versions/13\n", "command: >-\n", " python training_script.py\n", " --model_base_name ${{{{inputs.model_base_name}}}}\n", " --model_name_suffix ${{{{inputs.model_name_suffix}}}}\n", " --device ${{{{inputs.device}}}}\n", " --model_output_path ${{{{outputs.model_output_path}}}}\n", "\"\"\"\n", "\n", "yaml_filename = \"blbooksgenreTextTrainingComp.yaml\"\n", "\n", "with open(yaml_filename, \"w\") as f:\n", " f.write(yaml_contents)\n", "\n", "train_component_definition = load_component(source=yaml_filename)\n", "\n", "ml_client.components.create_or_update(train_component_definition)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6d165e2b", "metadata": {}, "source": [ "We need a compute target on which to run our jobs. The following checks whether the compute specified above is present; if not, then the compute target is created." 
] }, { "cell_type": "code", "execution_count": null, "id": "1e40fc38", "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.entities import AmlCompute\n", "\n", "all_compute_names = [x.name for x in ml_client.compute.list()]\n", "\n", "if compute_name in all_compute_names:\n", " print(f\"Found existing compute: {compute_name}\")\n", "else:\n", " my_compute = AmlCompute(\n", " name=compute_name,\n", " size=\"STANDARD_DS3_V2\",\n", " min_instances=0,\n", " max_instances=4,\n", " idle_time_before_scale_down=3600,\n", " )\n", " ml_client.compute.begin_create_or_update(my_compute)\n", " print(\"Initiated compute creation\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9d8eb868", "metadata": {}, "source": [ "## Running a training pipeline\n", "\n", "Now that we have our training component, we can run it. We begin by generating a unique name for the model:" ] }, { "cell_type": "code", "execution_count": null, "id": "a80b5b12", "metadata": {}, "outputs": [], "source": [ "import random\n", "import string\n", "\n", "# Creating a unique model and endpoint name by including a random suffix\n", "allowed_chars = string.ascii_lowercase + string.digits\n", "endpoint_suffix = \"\".join(random.choice(allowed_chars) for x in range(5))" ] }, { "cell_type": "code", "execution_count": null, "id": "ad76242b", "metadata": {}, "outputs": [], "source": [ "import time\n", "\n", "model_base_name = \"blbooksgenre_model\"\n", "model_name_suffix = endpoint_suffix\n", "device = -1" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d49615a7", "metadata": {}, "source": [ "Next, we define our training pipeline. 
It has a single step: the training component defined above, whose script also registers the model in AzureML:" ] }, { "cell_type": "code", "execution_count": null, "id": "cb6c6cec", "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml import dsl, Input\n", "\n", "train_model_component = ml_client.components.get(\n", " name=\"rai_training_component\", version=rai_example_version_string\n", ")\n", "\n", "\n", "@dsl.pipeline(\n", " compute=compute_name,\n", " description=\"Register Model for RAI blbooksgenre example\",\n", " experiment_name=f\"RAI_blbooksgenre_Example_Model_Training_{model_name_suffix}\",\n", ")\n", "def my_training_pipeline(model_base_name, model_name_suffix, device):\n", " trained_model = train_model_component(\n", " model_base_name=model_base_name,\n", " model_name_suffix=model_name_suffix,\n", " device=device,\n", " )\n", " trained_model.set_limits(timeout=3600)\n", "\n", " return {}\n", "\n", "\n", "model_registration_pipeline_job = my_training_pipeline(\n", " model_base_name, model_name_suffix, device\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "2fa66ea6", "metadata": {}, "source": [ "With the training pipeline defined, we can submit it for execution in AzureML. 
We define a helper function to wait for the job to complete:" ] }, { "cell_type": "code", "execution_count": null, "id": "f854eef5", "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.entities import PipelineJob\n", "\n", "\n", "def submit_and_wait(ml_client, pipeline_job) -> PipelineJob:\n", " created_job = ml_client.jobs.create_or_update(pipeline_job)\n", " assert created_job is not None\n", "\n", " while created_job.status not in [\n", " \"Completed\",\n", " \"Failed\",\n", " \"Canceled\",\n", " \"NotResponding\",\n", " ]:\n", " time.sleep(30)\n", " created_job = ml_client.jobs.get(created_job.name)\n", " print(\"Latest status : {0}\".format(created_job.status))\n", " print(\"pipeline job outputs: \", created_job.outputs)\n", " print(created_job.outputs)\n", " # get job details\n", " print(created_job)\n", " # stream the job logs\n", " ml_client.jobs.stream(name=created_job.name)\n", " assert created_job.status == \"Completed\"\n", " return created_job\n", "\n", "\n", "# This is the actual submission\n", "training_job = submit_and_wait(ml_client, model_registration_pipeline_job)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "36ee01e0", "metadata": {}, "source": [ "Now we can define the registered model id:" ] }, { "cell_type": "code", "execution_count": null, "id": "6f9364c7", "metadata": {}, "outputs": [], "source": [ "model_name = f\"{model_base_name}_{model_name_suffix}\"\n", "model_version = 1" ] }, { "attachments": {}, "cell_type": "markdown", "id": "39e98d0e", "metadata": {}, "source": [ "### Create Scoring Endpoint " ] }, { "cell_type": "code", "execution_count": null, "id": "3ee0683b", "metadata": {}, "outputs": [], "source": [ "endpoint_name = \"blbooksgenre-classifier-\" + endpoint_suffix\n", "\n", "print(f\"Endpoint name: {endpoint_name}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "49583af0", "metadata": {}, "outputs": [], "source": [ "import os\n", "import mlflow\n", "\n", "mlflow_tracking_key = 
\"MLFLOW_TRACKING_URI\"\n", "\n", "if mlflow_tracking_key in os.environ:\n", " target_uri = os.environ[mlflow_tracking_key]\n", "else:\n", " ws = ml_client.workspaces.get(name=workspace)\n", " target_uri = ws.mlflow_tracking_uri\n", " mlflow.set_tracking_uri(target_uri)" ] }, { "cell_type": "code", "execution_count": null, "id": "d0b9713f", "metadata": {}, "outputs": [], "source": [ "from mlflow.deployments import get_deploy_client\n", "\n", "deployment_client = get_deploy_client(target_uri)" ] }, { "cell_type": "code", "execution_count": null, "id": "066bff17", "metadata": {}, "outputs": [], "source": [ "endpoint = deployment_client.create_endpoint(endpoint_name)" ] }, { "cell_type": "code", "execution_count": null, "id": "d6adcde5", "metadata": {}, "outputs": [], "source": [ "scoring_uri = deployment_client.get_endpoint(endpoint=endpoint_name)[\"properties\"][\n", " \"scoringUri\"\n", "]\n", "print(scoring_uri)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8aad33f2", "metadata": {}, "source": [ "### Create a deployment" ] }, { "cell_type": "code", "execution_count": null, "id": "fe1dcb36", "metadata": {}, "outputs": [], "source": [ "deployment_name = \"default\"" ] }, { "cell_type": "code", "execution_count": null, "id": "59600239", "metadata": {}, "outputs": [], "source": [ "deploy_config = {\n", " \"instance_type\": \"Standard_DS4_v2\",\n", " \"instance_count\": 1,\n", " \"readiness_probe\": {\"initial_delay\": 60, \"timeout\": 5},\n", "}" ] }, { "cell_type": "code", "execution_count": null, "id": "8a57c773", "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "deployment_config_path = \"deployment_config.json\"\n", "with open(deployment_config_path, \"w\") as outfile:\n", " outfile.write(json.dumps(deploy_config))" ] }, { "cell_type": "code", "execution_count": null, "id": "8d154900", "metadata": {}, "outputs": [], "source": [ "f\"models:/{model_name}/{model_version}\"" ] }, { "cell_type": "code", "execution_count": null, "id": 
"314aa10d", "metadata": {}, "outputs": [], "source": [ "deployment = deployment_client.create_deployment(\n", " name=deployment_name,\n", " endpoint=endpoint_name,\n", " model_uri=f\"models:/{model_name}/{model_version}\",\n", " config={\"deploy-config-file\": deployment_config_path},\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "ce6b6c82", "metadata": {}, "outputs": [], "source": [ "traffic_config = {\"traffic\": {deployment_name: 100}}" ] }, { "cell_type": "code", "execution_count": null, "id": "a13eb07d", "metadata": {}, "outputs": [], "source": [ "traffic_config_path = \"traffic_config.json\"\n", "with open(traffic_config_path, \"w\") as outfile:\n", " outfile.write(json.dumps(traffic_config))" ] }, { "cell_type": "code", "execution_count": null, "id": "9379c15d", "metadata": {}, "outputs": [], "source": [ "deployment_client.update_endpoint(\n", " endpoint=endpoint_name,\n", " config={\"endpoint-config-file\": traffic_config_path},\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "f3224875", "metadata": {}, "outputs": [], "source": [ "endpoint_secret_key = ml_client.online_endpoints.get_keys(\n", " name=endpoint_name\n", ").access_token" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1dfff912", "metadata": {}, "source": [ "Now let's wrap the deployed endpoint in a model object that we can register in AzureML:" ] }, { "cell_type": "code", "execution_count": null, "id": "52b43f7b", "metadata": {}, "outputs": [], "source": [ "from ml_wrappers.model import EndpointWrapperModel\n", "\n", "extra_headers = {\"azureml-model-deployment\": deployment_name}\n", "endpoint_wrapper = EndpointWrapperModel(\n", " endpoint_secret_key,\n", " scoring_uri,\n", " extra_headers,\n", " transform_output_dict=True,\n", " class_names=[\"LABEL_0\", \"LABEL_1\"],\n", " wrap_input_data_dict=True,\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "e4ed64a7", "metadata": {}, "outputs": [], "source": [ 
"endpoint_wrapper.predict(test_df.iloc[0:2][\"text\"].tolist())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "26e8e45a", "metadata": {}, "source": [ "Now we can register the endpoint wrapper using mlflow as another model:" ] }, { "cell_type": "code", "execution_count": null, "id": "d28aaa31", "metadata": {}, "outputs": [], "source": [ "endpoint_model_name = \"wrapped_endpoint_\" + model_name\n", "expected_model_id = f\"{endpoint_model_name}:{model_version}\"\n", "azureml_model_id = f\"azureml:{expected_model_id}\"\n", "mlflow.pyfunc.log_model(\n", " python_model=endpoint_wrapper,\n", " registered_model_name=endpoint_model_name,\n", " artifact_path=endpoint_model_name,\n", " pip_requirements=[\"ml-wrappers\"],\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0722395e", "metadata": {}, "source": [ "## Creating the RAI Text Insights\n", "\n", "Now that we have our model, we can generate RAI Text insights for it.\n", "Next, we load the RAI components, so that we can construct a pipeline:" ] }, { "cell_type": "code", "execution_count": null, "id": "d67b942e", "metadata": {}, "outputs": [], "source": [ "blbooksgenre_test_mltable = Input(\n", " type=\"mltable\",\n", " path=f\"{input_test_data}:{rai_example_version_string}\",\n", " mode=\"download\",\n", ")\n", "\n", "registry_name = \"azureml\"\n", "credential = DefaultAzureCredential()\n", "\n", "ml_client_registry = MLClient(\n", " credential=credential,\n", " subscription_id=ml_client.subscription_id,\n", " resource_group_name=ml_client.resource_group_name,\n", " registry_name=registry_name,\n", ")\n", "\n", "rai_text_insights_component = ml_client_registry.components.get(\n", " name=\"rai_text_insights\", version=version_string\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c98cd2d9", "metadata": {}, "source": [ "We can now specify our pipeline. Complex objects (such as lists of column names) have to be converted to JSON strings before being passed to the components." 
] }, { "cell_type": "code", "execution_count": null, "id": "a62105a7", "metadata": {}, "outputs": [], "source": [ "import json\n", "from azure.ai.ml import Input\n", "from azure.ai.ml.constants import AssetTypes\n", "\n", "\n", "@dsl.pipeline(\n", " compute=compute_name,\n", " description=\"Example RAI computation on blbooksgenre data\",\n", " experiment_name=f\"RAI_blbooksgenre_Example_RAIInsights_Computation_{model_name_suffix}\",\n", ")\n", "def rai_blbooksgenre_text_classification_pipeline(\n", " target_column_name,\n", " test_data,\n", " classes,\n", " use_model_dependency,\n", "):\n", " # Initiate the RAIInsights\n", " rai_text_job = rai_text_insights_component(\n", " task_type=\"text_classification\",\n", " model_info=expected_model_id,\n", " model_input=Input(type=AssetTypes.MLFLOW_MODEL, path=azureml_model_id),\n", " test_dataset=test_data,\n", " target_column_name=target_column_name,\n", " classes=classes,\n", " use_model_dependency=use_model_dependency,\n", " )\n", " rai_text_job.set_limits(timeout=7200)\n", "\n", " rai_text_job.outputs.dashboard.mode = \"upload\"\n", " rai_text_job.outputs.ux_json.mode = \"upload\"\n", "\n", " return {\n", " \"dashboard\": rai_text_job.outputs.dashboard,\n", " \"ux_json\": rai_text_job.outputs.ux_json,\n", " }" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6b5b14a9", "metadata": {}, "source": [ "Next, we define the pipeline object itself, and ensure that the outputs will be available for download:" ] }, { "cell_type": "code", "execution_count": null, "id": "e4d86ec2", "metadata": {}, "outputs": [], "source": [ "import uuid\n", "from azure.ai.ml import Output\n", "\n", "insights_pipeline_job = rai_blbooksgenre_text_classification_pipeline(\n", " target_column_name=target_column_name,\n", " test_data=blbooksgenre_test_mltable,\n", " classes=\"[]\",\n", " use_model_dependency=True,\n", ")\n", "\n", "rand_path = str(uuid.uuid4())\n", "insights_pipeline_job.outputs.dashboard = Output(\n", " 
path=f\"azureml://datastores/workspaceblobstore/paths/{rand_path}/dashboard/\",\n", " mode=\"upload\",\n", " type=\"uri_folder\",\n", ")\n", "insights_pipeline_job.outputs.ux_json = Output(\n", " path=f\"azureml://datastores/workspaceblobstore/paths/{rand_path}/ux_json/\",\n", " mode=\"upload\",\n", " type=\"uri_folder\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "25f34573", "metadata": {}, "source": [ "And submit the pipeline to AzureML for execution:" ] }, { "cell_type": "code", "execution_count": null, "id": "2ca757f7", "metadata": {}, "outputs": [], "source": [ "insights_job = submit_and_wait(ml_client, insights_pipeline_job)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "1381768a", "metadata": {}, "source": [ "The dashboard should appear in the AzureML portal in the registered model view. The following cell computes the expected URI:" ] }, { "cell_type": "code", "execution_count": null, "id": "e86ab611", "metadata": {}, "outputs": [], "source": [ "sub_id = ml_client._operation_scope.subscription_id\n", "rg_name = ml_client._operation_scope.resource_group_name\n", "ws_name = ml_client.workspace_name\n", "\n", "expected_uri = f\"https://ml.azure.com/model/{expected_model_id}/model_analysis?wsid=/subscriptions/{sub_id}/resourcegroups/{rg_name}/workspaces/{ws_name}\"\n", "\n", "print(f\"Please visit {expected_uri} to see your analysis\")" ] } ], "metadata": { "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python 3.10 - SDK V2", "language": "python", "name": "python310-sdkv2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.15" }, "vscode": { "interpreter": { "hash": "8fd340b5477ca1a0b454d48a3973beff39fee032ada47a04f6f3725b469a8988" } } }, "nbformat": 4, "nbformat_minor": 5 }
