{
"cells": [
{
"cell_type": "markdown",
"id": "a671f6c6-c6fb-442d-aef6-fb7c282e0221",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"# Loan Classification RAI dashboard\n",
"This notebook demonstrates the use of the `responsibleai` API to assess a classification model trained on Fabricated Loan classification dataset. The model predicts **if a loan applicant will be accepted or rejected for loan** based on the independent features:\n",
"\n",
"- Gender\n",
"- Loan_Requirement(1000$)\n",
"- Loan_Tenure\n",
"- Home_Ownership\n",
"- Income(1000$)\n",
"- Employment_Tenure\n",
"- Credit_score\n",
"- Age\n",
"- Credit_Utlization\n",
"- Active_Balance(1000$)\n",
"- Open_CreditAcc\n",
"- TotalIncome_to_Debt\n",
"- Pre-Approved\n",
"\n",
"The Data Dictionary can be accessed through the following link: [Data_dictionary_Finance](link-URL)\n",
"\n",
"The Notebook walks through the API calls necessary to create a widget with model analysis insights, then guides a visual analysis of the model."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1dd5ed12",
"metadata": {},
"source": [
"## **Installation** \n",
"\n",
"If you are **running the notebook for the first time**, you need to follow a few of steps for smooth execution of notebook:\n",
"\n",
"1. Un-comment the below cell.\n",
"2. Run the cell.\n",
"3. After execution of this cell, comment the cell.\n",
"4. Re-start the kernel\n",
"5. Continue with running of all cells.\n",
"\n",
"\n",
"**Reminder** -- Be sure to set your kernel to \"Python 3.8 - AzureML,\" via the drop-down menu at the right end of the taskbar. "
]
},
{
"cell_type": "markdown",
"id": "680805aa",
"metadata": {},
"source": [
"### Install required dependencies"
]
},
{
"cell_type": "markdown",
"id": "78b43899",
"metadata": {},
"source": [
"**Make sure it comment the below cell while executing the notebook more than once**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "913d7fa9",
"metadata": {},
"outputs": [],
"source": [
"%pip install azure-ai-ml\n",
"%pip install sklearn"
]
},
{
"cell_type": "markdown",
"id": "37287826-d891-49f4-b473-7eb2625d270d",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## **User Configuration** \n",
"Confirm the compute name listed here is the same that was created using the included ARM template. If not, change this name so they match. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "763128bc-b820-4324-b37a-2f2005abd0ae",
"metadata": {
"gather": {
"logged": 1681455138983
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# Pass the name of your compute instance (See step 6 below for it's use)\n",
"compute_name = \"raitextclusterlarge\""
]
},
{
"cell_type": "markdown",
"id": "7bf896d2-32e0-4df0-a69a-ccea3bd2e4c0",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## **After changing the above cell click on Run All.**\n",
"**The notebook will follow the below steps and complete execution in 15-30 minutes depending upon compute configurations**"
]
},
{
"cell_type": "markdown",
"id": "63bc551a-5c3a-4a5d-8a81-6f94d13b1428",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Automated Notebook steps:\n",
"\n",
"**Step 1:** Loading the Data.\n",
"\n",
"**Step 2:** Pre-processing.\n",
"\n",
"**Step 3:** Splitting into Train Test datasets.\n",
"\n",
"**Step 4:** Registering the datasets as data assets in AML.\n",
"\n",
"**Step 5:** Define training and registering scripts for use in Training Pipeline.\n",
"\n",
"**Step 6:** Create compute instance (if compute instance name not passed).\n",
"\n",
"**Step 7:** Executing Model Training pipeline.\n",
"\n",
"**Step 8:** Define components for Responsible AI Dashboard Generation Pipeline (The components are explained in later parts).\n",
"\n",
"**Step 9:** Execute Dashboard Generation Pipeline (generate scorecard and save in directory).\n",
"\n",
"**Step 10:** Click on the link at the end of the notebook to access the dashboard generated."
]
},
{
"cell_type": "markdown",
"id": "2f135ebb-9123-4721-9b55-1fc753a03b44",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Loading required modules"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "eb59e9ef-ea11-4107-a2cd-82c20e5fe85a",
"metadata": {
"gather": {
"logged": 1681455144593
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import os\n",
"\n",
"import sklearn\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.impute import SimpleImputer\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder\n",
"from sklearn.compose import ColumnTransformer\n",
"\n",
"import zipfile\n",
"from io import BytesIO\n",
"import requests"
]
},
{
"cell_type": "markdown",
"id": "db7edfd5-f5a5-42ff-a718-d21f34ce3a17",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Accessing the Data\n",
"\n",
"The following section examines the code necessary to create datasets and a model using components in AzureML."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "29bb4891-c33b-4413-a84b-353cce46f1ce",
"metadata": {
"gather": {
"logged": 1681455144885
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"def get_data(data_location, independent_features, target_feature, drop_col=None):\n",
" \"\"\"\n",
" Function to read data in Pandas dataframe\n",
" [TODO: Add any preprocessing steps within this function]\n",
"\n",
" Parameters\n",
" ----------\n",
" data_location: string\n",
" Path of the Dataset\n",
" independent_features: list\n",
" List of names of the independent features\n",
" target_feature: string\n",
" Name of the target/dependent features\n",
" drop_col: list\n",
" List of column names to drop\n",
"\n",
" Returns\n",
" -------\n",
" df: Pandas DataFrame\n",
" Pandas dataframe containing the dataset with the names passed\n",
" \"\"\"\n",
" column_names = independent_features + [target_feature]\n",
"\n",
" # Download the blob data from the provided URL\n",
" response = requests.get(data_location)\n",
" blob_content = response.content\n",
"\n",
" with zipfile.ZipFile(BytesIO(blob_content), \"r\") as zip_ref:\n",
" file_list = zip_ref.namelist()\n",
" if len(file_list) > 0:\n",
" # Assume the first file in the zip contains the data\n",
" inner_blob_name = file_list[0]\n",
" inner_blob_content = zip_ref.read(inner_blob_name)\n",
" df = pd.read_csv(BytesIO(inner_blob_content))\n",
"\n",
" # df = pd.read_csv(data_location)\n",
" l = list(df.columns)\n",
" l.remove(target_feature)\n",
" df = df[l + [target_feature]]\n",
" df.columns = column_names\n",
" if drop_col is not None:\n",
" df.drop(drop_col, axis=1, inplace=True)\n",
" return df"
]
},
{
"cell_type": "markdown",
"id": "49ebcbcc-b1ef-4425-b8d3-a09d6e7a3dca",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Reading & Encoding the dataset"
]
},
{
"cell_type": "markdown",
"id": "9c191e6f-1b7d-4eb2-a81d-604d95c1b1ce",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"We load the data from github Repo directly and do basic pre-processing steps."
]
},
{
"cell_type": "markdown",
"id": "f4ad76ec",
"metadata": {},
"source": [
"**Categorical Codes for \"LoanStatus\":**\n",
"\n",
" **Approved: The customer was approved for the Loan**\n",
" \n",
" **Rejected: The customer was not approved for the Loan**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f92f0dd0-c79b-4438-88dd-00d15d9a5b55",
"metadata": {
"gather": {
"logged": 1681455151112
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"data_df = get_data(\n",
" # data_location=\"./Fabricated_Loan_data.csv\",\n",
" data_location=\"https://publictestdatasets.blob.core.windows.net/data/RAI_fabricated_loan_data.zip\",\n",
" target_feature=\"LoanStatus\",\n",
" independent_features=[\n",
" \"Gender\",\n",
" \"Loan_Requirement (in 1000$)\",\n",
" \"Loan_Tenure\",\n",
" \"Home_Ownership\",\n",
" \"Income (in 1000$)\",\n",
" \"Employment_Tenure\",\n",
" \"Credit_score\",\n",
" \"Age\",\n",
" \"Credit_Utlization\",\n",
" \"Active_Credit_balance (in 1000$)\",\n",
" \"Open_CreditAcc\",\n",
" \"Income_to_debt_ratio\",\n",
" \"Pre-Approved\",\n",
" ],\n",
")\n",
"\n",
"data_encoded = data_df.copy()\n",
"\n",
"loan_encoding = {\n",
" True: \"Approved\",\n",
" False: \"Rejected\",\n",
"}\n",
"\n",
"data_encoded.replace({\"LoanStatus\": loan_encoding}, inplace=True)\n",
"data_encoded"
]
},
{
"cell_type": "markdown",
"id": "a0d0324a-7062-467b-bc9d-5f5ea6459654",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Splitting the Data into training and test datasets"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e3d9b848-7a43-4ef8-b27f-a069e1000b55",
"metadata": {
"gather": {
"logged": 1681455156987
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"data_train, data_test = train_test_split(\n",
" data_encoded, test_size=0.25, random_state=7, stratify=data_encoded[\"LoanStatus\"]\n",
")\n",
"\n",
"if len(data_test) <= 5000:\n",
" print(\"Proceed with the analysis\")\n",
"else:\n",
" print(\"Reduce your test data size\")"
]
},
{
"cell_type": "markdown",
"id": "67603e63-940d-4e86-8d5d-25817da8b9a3",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Get the Data to AzureML\n",
"\n",
"With the data now split into 'train' and 'test' DataFrames, we save them out to files in preparation for upload into AzureML:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9fdf2a40-6525-4d89-b8da-30d3fee832cf",
"metadata": {
"gather": {
"logged": 1681455157794
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"train_data_path = \"./data_loan_classification/train/\"\n",
"test_data_path = \"./data_loan_classification/test/\"\n",
"\n",
"os.makedirs(train_data_path, exist_ok=True)\n",
"os.makedirs(test_data_path, exist_ok=True)\n",
"\n",
"train_filename = train_data_path + \"loan_classification_train.parquet\"\n",
"test_filename = test_data_path + \"loan_classification_test.parquet\"\n",
"\n",
"data_train.to_parquet(train_filename, index=False)\n",
"data_test.to_parquet(test_filename, index=False)"
]
},
{
"cell_type": "markdown",
"id": "3d3ad3c1-7c34-45d7-8bd8-c4a85a135185",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"We are going to create two Datasets in AzureML, one for the train and one for the test datasets. The first step is to create an `MLClient` to perform the upload. The method we use assumes that there is a `config.json` file (downloadable from the Azure or AzureML portals) present in the same directory as this notebook file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "485b1fb8",
"metadata": {},
"outputs": [],
"source": [
"# Enter details of your AML workspace\n",
"subscription_id = \"<SUBSCRIPTION_ID>\"\n",
"resource_group = \"<RESOURCE_GROUP>\"\n",
"workspace = \"<AML_WORKSPACE_NAME>\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b6e1ca6-9ef9-43c2-95f8-a98a3b93dbb4",
"metadata": {
"gather": {
"logged": 1681455158948
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# Handle to the workspace\n",
"from azure.ai.ml import MLClient\n",
"from azure.identity import DefaultAzureCredential\n",
"\n",
"credential = DefaultAzureCredential()\n",
"ml_client = MLClient(\n",
" credential=credential,\n",
" subscription_id=subscription_id,\n",
" resource_group_name=resource_group,\n",
" workspace_name=workspace,\n",
")\n",
"print(ml_client)"
]
},
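{
"cell_type": "markdown",
"id": "a3f0c2d1",
"metadata": {},
"source": [
"As an alternative, if a `config.json` file (downloadable from the Azure or AzureML portals) sits next to this notebook, the `MLClient` can be built from it instead of the explicit values above. The cell below is a minimal optional sketch and only takes effect when that file is present."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4e1d3a2",
"metadata": {},
"outputs": [],
"source": [
"# Optional alternative: build the MLClient from a config.json file placed next to\n",
"# this notebook (this assumes such a file has been downloaded from the portal)\n",
"if os.path.exists(\"config.json\"):\n",
"    ml_client = MLClient.from_config(credential=DefaultAzureCredential())\n",
"    print(ml_client)"
]
},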
{
"cell_type": "code",
"execution_count": null,
"id": "3635442a-1b78-4814-b082-b8ddb7c231fa",
"metadata": {
"gather": {
"logged": 1681455164018
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"# Define Version string (optional)\n",
"rai_loan_classification_example_version_string = \"1\""
]
},
{
"cell_type": "markdown",
"id": "c2c758e6-a5cc-47fa-9714-16f22c8bf031",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"### Create an asset MLtable (or URI file) to register the Data into workspace\n",
"This is essential, as the dashboard recognizes only registered assets. \n",
"\n",
"Reference:\n",
"https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6629e7f2-fe9f-44f5-8244-d9d04c670d04",
"metadata": {
"gather": {
"logged": 1681455164688
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azure.ai.ml.entities import Data\n",
"from azure.ai.ml.constants import AssetTypes"
]
},
{
"cell_type": "markdown",
"id": "9d106eef-33ae-44db-8d29-3f161ff28e5f",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"#### Change the asset name of the below file if the train/test data has changed"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2b4360e-d977-43c5-a543-0a634532c5f4",
"metadata": {
"gather": {
"logged": 1681455168719
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"input_train_data = \"train_loan_classification\"\n",
"\n",
"try:\n",
" # Try getting data already registered in workspace\n",
" train_data = ml_client.data.get(\n",
" name=input_train_data,\n",
" version=rai_loan_classification_example_version_string,\n",
" )\n",
"\n",
"except Exception as e:\n",
" train_data = Data(\n",
" path=train_filename,\n",
" type=AssetTypes.URI_FILE,\n",
" description=\"RAI loan classification example training data\",\n",
" name=input_train_data,\n",
" version=rai_loan_classification_example_version_string,\n",
" )\n",
" ml_client.data.create_or_update(train_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e6e9317e-d755-4128-9700-733e16a9bec3",
"metadata": {
"gather": {
"logged": 1681455169017
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"input_test_data = \"test_loan_classification\"\n",
"\n",
"try:\n",
" # Try getting data already registered in workspace\n",
" test_data = ml_client.data.get(\n",
" name=input_test_data,\n",
" version=rai_loan_classification_example_version_string,\n",
" )\n",
"\n",
"except Exception as e:\n",
" test_data = Data(\n",
" path=test_filename,\n",
" type=AssetTypes.URI_FILE,\n",
" description=\"RAI loan classification example test data\",\n",
" name=input_test_data,\n",
" version=rai_loan_classification_example_version_string,\n",
" )\n",
" ml_client.data.create_or_update(test_data)"
]
},
{
"cell_type": "markdown",
"id": "3f66aeab-a4bd-4ca8-9e96-fc141f87d2ca",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## A model training pipeline\n",
"\n",
"To simplify the model creation process, we're going to use a pipeline. This will have two stages:\n",
"\n",
"1. The actual training component\n",
"2. A model registration component\n",
"\n",
"We have to register the model in AzureML in order for our RAI insights components to use it.\n",
"\n",
"### The Training Component\n",
"\n",
"The training component is for this particular model. In this case, we are going to train an `Logistic Classifier` on the input data and save it using MLFlow. We need command line arguments to specify the location of the input data, the location where MLFlow should write the output model, and the name of the target column in the dataset.\n",
"\n",
"We start by creating a directory to hold the component source:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9c0f9bde",
"metadata": {
"gather": {
"logged": 1681455169326
}
},
"outputs": [],
"source": [
"os.makedirs(\"./component_src\", exist_ok=True)\n",
"os.makedirs(\"./register_model_src\", exist_ok=True)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d9b95f7d",
"metadata": {},
"source": [
"**Create the training script** \n",
"This cell creates a machine learning pipeline that trains a Logistic classifier using labeled data and then saves the trained model to a specified output path using MLFlow. \n",
"- The code reads in the training data as a pandas dataframe from a specified path, extracts the target column name, and separates the target column from the feature columns. \n",
"- Feature columns are then preprocessed using both a standard scaler for numeric data and a one-hot encoder for categorical data. \n",
"- Preprocessed feature columns and target column are then fed into the Gaussian Naive Bayes classifier. \n",
"- The trained model is saved to a temporary directory and then copied to the specified output path. \n",
"- Code takes command-line arguments for the paths of the training data, the output model, and the name of the target column. \n",
"- The code also uses the Azure Machine Learning (AML) Python SDK to log the model and tracking information with MLFlow. \n",
"- Additional comments in the code provide details on each section of the pipeline."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bdb4e1af",
"metadata": {},
"outputs": [],
"source": [
"%%writefile component_src/classification_training_script.py\n",
"\n",
"import argparse\n",
"import os\n",
"import shutil\n",
"import tempfile\n",
"\n",
"\n",
"from azureml.core import Run\n",
"\n",
"import mlflow\n",
"import mlflow.sklearn\n",
"from sklearn.pipeline import Pipeline\n",
"from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder\n",
"from sklearn.compose import ColumnTransformer\n",
"\n",
"import pandas as pd\n",
"from sklearn.linear_model import LogisticRegression\n",
"\n",
"def parse_args():\n",
" # setup arg parser\n",
" parser = argparse.ArgumentParser()\n",
"\n",
" # add arguments\n",
" parser.add_argument(\"--training_data\", type=str, help=\"Path to training data\")\n",
" parser.add_argument(\"--target_column_name\", type=str, help=\"Name of target column\")\n",
" parser.add_argument(\"--model_output\", type=str, help=\"Path of output model\")\n",
"\n",
" # parse args\n",
" args = parser.parse_args()\n",
"\n",
" # return args\n",
" return args\n",
"\n",
"\n",
"def main(args):\n",
" current_experiment = Run.get_context().experiment\n",
" tracking_uri = current_experiment.workspace.get_mlflow_tracking_uri()\n",
" print(\"tracking_uri: {0}\".format(tracking_uri))\n",
" mlflow.set_tracking_uri(tracking_uri)\n",
" mlflow.set_experiment(current_experiment.name)\n",
"\n",
" # Read in data\n",
" print(\"Reading data\")\n",
" all_data = pd.read_parquet(args.training_data)\n",
"\n",
" print(\"Extracting X_train, y_train\")\n",
" print(\"all_data cols: {0}\".format(all_data.columns))\n",
" y_train = all_data[args.target_column_name]\n",
" X_train = all_data.drop(labels=args.target_column_name, axis=\"columns\")\n",
" print(\"X_train cols: {0}\".format(X_train.columns))\n",
"\n",
" print(\"Executing Model Training pipeline\")\n",
" # We create the preprocessing pipelines for both numeric and categorical data.\n",
" numeric_transformer = Pipeline(steps=[\n",
" ('scaler', StandardScaler())])\n",
"\n",
" categorical_transformer = Pipeline(steps=[\n",
" ('onehot', OneHotEncoder(handle_unknown='ignore'))])\n",
"\n",
" continuous_features_names = ['Active_Credit_balance (in 1000$)', 'Age', 'Credit_score', 'Credit_Utlization',\n",
" 'Income_to_debt_ratio', 'Employment_Tenure', 'Income (in 1000$)',\n",
" 'Loan_Requirement (in 1000$)', 'Loan_Tenure']\n",
" categorical_features_names = ['Gender', 'Home_Ownership', 'Pre-Approved', 'Open_CreditAcc']\n",
" transformations = ColumnTransformer(\n",
" transformers=[\n",
" ('num', numeric_transformer, continuous_features_names),\n",
" ('cat', categorical_transformer, categorical_features_names)])\n",
"\n",
" # Append classifier to preprocessing pipeline.\n",
" # Now we have a full prediction pipeline.\n",
" # The estimator can be changed to suit\n",
" model = Pipeline(steps=[('preprocessor', transformations),\n",
" ('classifier', LogisticRegression(solver='lbfgs', max_iter=1000))])\n",
"\n",
" model.fit(X_train, y_train)\n",
"\n",
" # Saving model with mlflow - leave this section unchanged\n",
" with tempfile.TemporaryDirectory() as td:\n",
" print(\"Saving model with MLFlow to temporary directory\")\n",
" tmp_output_dir = os.path.join(td, \"my_model_dir\")\n",
" mlflow.sklearn.save_model(sk_model=model, path=tmp_output_dir)\n",
"\n",
" print(\"Copying MLFlow model to output path\")\n",
" for file_name in os.listdir(tmp_output_dir):\n",
" print(\" Copying: \", file_name)\n",
" # As of Python 3.8, copytree will acquire dirs_exist_ok as\n",
" # an option, removing the need for listdir\n",
" shutil.copy2(src=os.path.join(tmp_output_dir, file_name), dst=os.path.join(args.model_output, file_name))\n",
"\n",
"\n",
"# run script\n",
"if __name__ == \"__main__\":\n",
" # add space in logs\n",
" print(\"*\" * 60)\n",
" print(\"\\n\\n\")\n",
"\n",
" # parse args\n",
" args = parse_args()\n",
"\n",
" # run main function\n",
" main(args)\n",
"\n",
" # add space in logs\n",
" print(\"*\" * 60)\n",
" print(\"\\n\\n\")"
]
},
{
"cell_type": "markdown",
"id": "d412b0ad",
"metadata": {},
"source": [
"**Define the YAML file**\n",
"\n",
"This code snippet defines an Azure Machine Learning Command Component for training a classification model on a dataset. It starts by defining a YAML configuration file that specifies the inputs and outputs of the component, the command to run, and the environment to use. The YAML file is then saved to disk.\n",
"\n",
"Next, the code uses the Azure ML Python SDK to load the Command Component from the YAML file. The resulting object can be used to run the component on a dataset, passing in the input paths and output paths as arguments.\n",
"\n",
"Overall, this code provides a simple and reusable way to define and run machine learning training components in Azure ML."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6a0ce700",
"metadata": {
"gather": {
"logged": 1681455169879
}
},
"outputs": [],
"source": [
"from azure.ai.ml import load_component\n",
"\n",
"yaml_contents = (\n",
" f\"\"\"\n",
"$schema: http://azureml/sdk-2-0/CommandComponent.json\n",
"name: rai_classification_training_component\n",
"display_name: Classification training component for RAI example\n",
"version: {rai_loan_classification_example_version_string}\n",
"type: command\n",
"inputs:\n",
" training_data:\n",
" type: path\n",
" target_column_name:\n",
" type: string\n",
"outputs:\n",
" model_output:\n",
" type: path\n",
"code: ./component_src/\n",
"environment: azureml://registries/azureml/environments/responsibleai-tabular/versions/14\n",
"\"\"\"\n",
" + r\"\"\"\n",
"command: >-\n",
" python classification_training_script.py\n",
" --training_data ${{{{inputs.training_data}}}}\n",
" --target_column_name ${{{{inputs.target_column_name}}}}\n",
" --model_output ${{{{outputs.model_output}}}}\n",
"\"\"\"\n",
")\n",
"\n",
"yaml_filename = \"RAILoanTrainingComponent.yaml\"\n",
"\n",
"with open(yaml_filename, \"w\") as f:\n",
" f.write(yaml_contents.format(yaml_contents))\n",
"\n",
"train_model_component = load_component(source=yaml_filename)"
]
},
{
"cell_type": "markdown",
"id": "523752b3",
"metadata": {},
"source": [
"This script loads a trained model, registers it via MLFlow, and saves the registered model information to a JSON file. Users need to provide the necessary arguments to register the model, including the path to the input model, path to the output model info JSON file, base name of the registered model, and an optional suffix for the registered model name.\n",
"\n",
"To use this script, the following arguments must be defined: \n",
"- model_input_path: Path to the input model \n",
"- model_info_output_path: Path to write the model info JSON \n",
"- model_base_name: Name of the registered model \n",
"- model_name_suffix: An integer value to add as a suffix to the registered model name. If this is negative, the epoch time is used as the suffix."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "280531f8",
"metadata": {},
"outputs": [],
"source": [
"%%writefile register_model_src/register.py\n",
"\n",
"# ---------------------------------------------------------\n",
"# Copyright (c) Microsoft Corporation. All rights reserved.\n",
"# ---------------------------------------------------------\n",
"\n",
"import argparse\n",
"import json\n",
"import os\n",
"import time\n",
"\n",
"\n",
"from azureml.core import Run\n",
"\n",
"import mlflow\n",
"import mlflow.sklearn\n",
"\n",
"# Based on example:\n",
"# https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-cli\n",
"# which references\n",
"# https://github.com/Azure/azureml-examples/tree/main/cli/jobs/train/lightgbm/iris\n",
"\n",
"\n",
"def parse_args():\n",
" # setup arg parser\n",
" parser = argparse.ArgumentParser()\n",
"\n",
" # add arguments\n",
" parser.add_argument(\"--model_input_path\", type=str, help=\"Path to input model\")\n",
" parser.add_argument(\n",
" \"--model_info_output_path\", type=str, help=\"Path to write model info JSON\"\n",
" )\n",
" parser.add_argument(\n",
" \"--model_base_name\", type=str, help=\"Name of the registered model\"\n",
" )\n",
" parser.add_argument(\n",
" \"--model_name_suffix\", type=int, help=\"Set negative to use epoch_secs\"\n",
" )\n",
"\n",
" # parse args\n",
" args = parser.parse_args()\n",
"\n",
" # return args\n",
" return args\n",
"\n",
"\n",
"def main(args):\n",
" current_experiment = Run.get_context().experiment\n",
" tracking_uri = current_experiment.workspace.get_mlflow_tracking_uri()\n",
" print(\"tracking_uri: {0}\".format(tracking_uri))\n",
" mlflow.set_tracking_uri(tracking_uri)\n",
" mlflow.set_experiment(current_experiment.name)\n",
"\n",
" print(\"Loading model\")\n",
" mlflow_model = mlflow.sklearn.load_model(args.model_input_path)\n",
"\n",
" if args.model_name_suffix < 0:\n",
" suffix = int(time.time())\n",
" else:\n",
" suffix = args.model_name_suffix\n",
" registered_name = \"{0}_{1}\".format(args.model_base_name, suffix)\n",
" print(f\"Registering model as {registered_name}\")\n",
"\n",
" print(\"Registering via MLFlow\")\n",
" mlflow.sklearn.log_model(\n",
" sk_model=mlflow_model,\n",
" registered_model_name=registered_name,\n",
" artifact_path=registered_name,\n",
" )\n",
"\n",
" print(\"Writing JSON\")\n",
" dict = {\"id\": \"{0}:1\".format(registered_name)}\n",
" output_path = os.path.join(args.model_info_output_path, \"model_info.json\")\n",
" with open(output_path, \"w\") as of:\n",
" json.dump(dict, fp=of)\n",
"\n",
"\n",
"# run script\n",
"if __name__ == \"__main__\":\n",
" # add space in logs\n",
" print(\"*\" * 60)\n",
" print(\"\\n\\n\")\n",
"\n",
" # parse args\n",
" args = parse_args()\n",
"\n",
" # run main function\n",
" main(args)\n",
"\n",
" # add space in logs\n",
" print(\"*\" * 60)\n",
" print(\"\\n\\n\")"
]
},
{
"cell_type": "markdown",
"id": "5386fd7f",
"metadata": {},
"source": [
"Now that the model registration script is saved on our local drive, we create a YAML file to describe it as a component to AzureML. This involves defining the inputs and outputs, specifing the AzureML environment which can run the script, and telling AzureML how to invoke the model registration script:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7ad3244d-096f-4e74-9836-54d7e84c5c7a",
"metadata": {
"gather": {
"logged": 1681455174584
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"yaml_contents = f\"\"\"\n",
"$schema: http://azureml/sdk-2-0/CommandComponent.json\n",
"name: register_model\n",
"display_name: Register Model\n",
"version: {rai_loan_classification_example_version_string}\n",
"type: command\n",
"is_deterministic: False\n",
"inputs:\n",
" model_input_path:\n",
" type: path\n",
" model_base_name:\n",
" type: string\n",
" model_name_suffix: # Set negative to use epoch_secs\n",
" type: integer\n",
" default: -1\n",
"outputs:\n",
" model_info_output_path:\n",
" type: path\n",
"code: ./register_model_src/\n",
"environment: azureml://registries/azureml/environments/responsibleai-tabular/versions/14\n",
"command: >-\n",
" python register.py\n",
" --model_input_path ${{{{inputs.model_input_path}}}}\n",
" --model_base_name ${{{{inputs.model_base_name}}}}\n",
" --model_name_suffix ${{{{inputs.model_name_suffix}}}}\n",
" --model_info_output_path ${{{{outputs.model_info_output_path}}}}\n",
"\n",
"\"\"\"\n",
"\n",
"yaml_filename = \"register.yaml\"\n",
"\n",
"with open(yaml_filename, \"w\") as f:\n",
" f.write(yaml_contents)\n",
"\n",
"register_component = load_component(source=yaml_filename)"
]
},
{
"cell_type": "markdown",
"id": "7f573806-90ff-4ed2-95e1-996520f884dc",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"We will create a new compute instance to run the jobs if it does not already exist by the name passed in the beginning of the notebook."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d04d6675-9de0-4e8e-9d9d-f92a2a2d2259",
"metadata": {
"gather": {
"logged": 1681455174815
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azure.ai.ml.entities import AmlCompute\n",
"\n",
"all_compute_names = [x.name for x in ml_client.compute.list()]\n",
"\n",
"if compute_name in all_compute_names:\n",
" print(f\"Found existing compute: {compute_name}\")\n",
"else:\n",
" my_compute = AmlCompute(\n",
" name=compute_name,\n",
" size=\"Standard_DS5_v2\",\n",
" min_instances=0,\n",
" max_instances=1,\n",
" idle_time_before_scale_down=3600,\n",
" )\n",
" ml_client.compute.begin_create_or_update(my_compute).result()\n",
" print(\"Initiated compute creation\")"
]
},
{
"cell_type": "markdown",
"id": "693f7706",
"metadata": {},
"source": [
"### Running a training pipeline\n",
"\n",
"The 2 YAML files (RAILoanTrainingComponent.yaml & register.yaml) are used to define the 2 components in the model training pipeline\n",
"\n",
"We start by ensuring that the compute cluster named in the begining exists:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0917f58e-5171-426d-80f8-a27d793b4aa9",
"metadata": {
"gather": {
"logged": 1681455175109
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import time\n",
"\n",
"model_name_suffix = int(time.time())\n",
"model_name = \"rai_loan_classsification_model\""
]
},
{
"cell_type": "markdown",
"id": "34bf9603-d4be-4c8a-99ba-69c1b484dd03",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"Next, we define the pipeline using objects from the AzureML SDKv2. As mentioned above, there are two component jobs: one to train the model, and one to register it:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9dcfd57a-03b5-446c-bf87-af2db9aadac8",
"metadata": {
"gather": {
"logged": 1681455175344
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azure.ai.ml import dsl, Input\n",
"\n",
"target_feature = \"LoanStatus\"\n",
"categorical_features = [\"Gender\", \"Home_Ownership\", \"Pre-Approved\", \"Open_CreditAcc\"]\n",
"\n",
"loan_train_pq = Input(\n",
" type=\"uri_file\",\n",
" path=f\"azureml:{input_train_data}:{rai_loan_classification_example_version_string}\",\n",
" mode=\"download\",\n",
")\n",
"loan_test_pq = Input(\n",
" type=\"uri_file\",\n",
" path=f\"azureml:{input_test_data}:{rai_loan_classification_example_version_string}\",\n",
" mode=\"download\",\n",
")\n",
"\n",
"\n",
"@dsl.pipeline(\n",
" compute=compute_name,\n",
" description=\"Register Model for RAI Loan classification example\",\n",
" experiment_name=f\"RAI_classification_Example_Model_Training_{model_name_suffix}\",\n",
")\n",
"def my_training_pipeline(target_column_name, training_data):\n",
" trained_model = train_model_component(\n",
" target_column_name=target_column_name, training_data=training_data\n",
" )\n",
" trained_model.set_limits(timeout=1200)\n",
"\n",
" _ = register_component(\n",
" model_input_path=trained_model.outputs.model_output,\n",
" model_base_name=model_name,\n",
" model_name_suffix=model_name_suffix,\n",
" )\n",
"\n",
" return {}\n",
"\n",
"\n",
"model_registration_pipeline_job = my_training_pipeline(target_feature, loan_train_pq)"
]
},
{
"cell_type": "markdown",
"id": "304560ea-b575-4118-af50-51b99ba509cb",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"With the pipeline definition created, we can submit it to AzureML. We define a helper function to do the submission, which waits for the submitted job to complete:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19ad43eb-513d-4755-93f3-78c5662b3847",
"metadata": {
"gather": {
"logged": 1681455222624
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"from azure.ai.ml.entities import PipelineJob\n",
"from IPython.core.display import HTML\n",
"from IPython.display import display\n",
"\n",
"\n",
"def submit_and_wait(ml_client, pipeline_job) -> PipelineJob:\n",
" created_job = ml_client.jobs.create_or_update(pipeline_job)\n",
" assert created_job is not None\n",
"\n",
" print(\"Pipeline job can be accessed in the following URL:\")\n",
" display(HTML('<a href=\"{0}\">{0}</a>'.format(created_job.studio_url)))\n",
"\n",
" while created_job.status not in [\n",
" \"Completed\",\n",
" \"Failed\",\n",
" \"Canceled\",\n",
" \"NotResponding\",\n",
" ]:\n",
" time.sleep(30)\n",
" created_job = ml_client.jobs.get(created_job.name)\n",
" print(\"Latest status : {0}\".format(created_job.status))\n",
" assert created_job.status == \"Completed\"\n",
" return created_job\n",
"\n",
"\n",
"# This is the actual submission\n",
"training_job = submit_and_wait(ml_client, model_registration_pipeline_job)"
]
},
{
"cell_type": "markdown",
"id": "c9889100",
"metadata": {},
"source": [
"## Creating the RAI Insights\n",
"\n",
"We have a registered model, and can now run a pipeline to create the RAI insights. First off, compute the name of the model we registered:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c65da4a0-81da-4d30-b8e0-65be2a2caaee",
"metadata": {
"gather": {
"logged": 1681455223037
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"expected_model_id = f\"{model_name}_{model_name_suffix}:1\"\n",
"azureml_model_id = f\"azureml:{expected_model_id}\""
]
},
{
"cell_type": "markdown",
"id": "d4bad8a5",
"metadata": {},
"source": [
"\n",
"Now, we create the RAI pipeline itself. There are four 'component stages' in this pipeline:\n",
"\n",
"1. Construct an empty `RAIInsights` object\n",
"1. Run the RAI tool components\n",
"1. Gather the tool outputs into a single `RAIInsights` object\n",
"1. (Optional) Generate a score card in pdf format summarizing model performance, and key aspects from the rai tool components\n",
"\n",
"We start by loading the RAI component definitions for use in our pipeline:"
]
},
{
"cell_type": "markdown",
"id": "32e1594d-0bba-43cc-bc20-44f761881350",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## Add different components of ResponsibleAI dashboard to the Pipeline\n",
"\n",
"Reference:\n",
"https://learn.microsoft.com/en-us/azure/machine-learning/how-to-responsible-ai-insights-sdk-cli?tabs=python"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73fcada8",
"metadata": {},
"outputs": [],
"source": [
"# Get handle to azureml registry for the RAI built in components\n",
"registry_name = \"azureml\"\n",
"ml_client_registry = MLClient(\n",
" credential=credential,\n",
" subscription_id=subscription_id,\n",
" resource_group_name=resource_group,\n",
" registry_name=registry_name,\n",
")\n",
"print(ml_client_registry)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d1fe7afc-2d6c-4a7d-9108-edd78eb019ae",
"metadata": {
"gather": {
"logged": 1681455234599
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"label = \"latest\"\n",
"\n",
"rai_constructor_component = ml_client_registry.components.get(\n",
" name=\"rai_tabular_insight_constructor\", label=label\n",
")\n",
"\n",
"# We get latest version and use the same version for all components\n",
"version = rai_constructor_component.version\n",
"print(\"The current version of RAI built-in components is: \" + version)\n",
"\n",
"rai_counterfactual_component = ml_client_registry.components.get(\n",
" name=\"rai_tabular_counterfactual\", version=version\n",
")\n",
"rai_erroranalysis_component = ml_client_registry.components.get(\n",
" name=\"rai_tabular_erroranalysis\", version=version\n",
")\n",
"\n",
"rai_explanation_component = ml_client_registry.components.get(\n",
" name=\"rai_tabular_explanation\", version=version\n",
")\n",
"\n",
"rai_gather_component = ml_client_registry.components.get(\n",
" name=\"rai_tabular_insight_gather\", version=version\n",
")\n",
"\n",
"rai_scorecard_component = ml_client_registry.components.get(\n",
" name=\"rai_tabular_score_card\", version=version\n",
")"
]
},
{
"cell_type": "markdown",
"id": "64a22471",
"metadata": {},
"source": [
"## Score card generation config\n",
"For score card generation, we need some additional configuration in a separate json file. Here we configure the following model performance metrics for reporting:\n",
"- accuracy\n",
"- precision"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f740ae68-8379-4d04-8445-ce5111dbc9c1",
"metadata": {
"gather": {
"logged": 1681455234846
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import json\n",
"\n",
"score_card_config_dict = {\n",
" \"Model\": {\n",
" \"ModelName\": \"Loan classification\",\n",
" \"ModelType\": \"Classification\",\n",
" \"ModelSummary\": \"<model summary>\",\n",
" },\n",
" \"Metrics\": {\"accuracy_score\": {\"threshold\": \">=0.5\"}, \"precision_score\": {}},\n",
"}\n",
"\n",
"score_card_config_filename = \"rai_loan_classification_score_card_config.json\"\n",
"\n",
"with open(score_card_config_filename, \"w\") as f:\n",
" json.dump(score_card_config_dict, f)\n",
"\n",
"score_card_config_path = Input(\n",
" type=\"uri_file\", path=score_card_config_filename, mode=\"download\"\n",
")"
]
},
{
"cell_type": "markdown",
"id": "cecaea36",
"metadata": {},
"source": [
"Now the pipeline itself. This creates an empty `RAIInsights` object, adds the analyses, and then gathers everything into the final `RAIInsights` output. Where complex objects need to be passed (such as a list of treatment feature names), they must be encoded as JSON strings.\n",
"\n",
"Note that the timeout for the counterfactual generation is longer, since this is a comparatively slow process."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "86692d9a-351e-404f-9ba0-1051d60be7f2",
"metadata": {
"gather": {
"logged": 1681455235207
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import json\n",
"from azure.ai.ml import Input\n",
"from azure.ai.ml.constants import AssetTypes\n",
"\n",
"classes_in_target = json.dumps([\"Approved\", \"Rejected\"])\n",
"\n",
"\n",
"@dsl.pipeline(\n",
" compute=compute_name,\n",
" description=\"Example RAI computation on Loan Classification\",\n",
" experiment_name=f\"RAI_Loan_Classification_Example_RAIInsights_Computation_{model_name_suffix}\",\n",
")\n",
"def rai_classification_pipeline(\n",
" target_column_name,\n",
" train_data,\n",
" test_data,\n",
" score_card_config_path,\n",
"):\n",
" # Initiate the RAIInsights\n",
" create_rai_job = rai_constructor_component(\n",
" title=\"RAI Dashboard Example\",\n",
" task_type=\"classification\",\n",
" model_info=expected_model_id,\n",
" model_input=Input(type=AssetTypes.MLFLOW_MODEL, path=azureml_model_id),\n",
" train_dataset=train_data,\n",
" test_dataset=test_data,\n",
" target_column_name=target_column_name,\n",
" categorical_column_names=json.dumps(categorical_features),\n",
" classes=classes_in_target,\n",
" )\n",
" create_rai_job.set_limits(timeout=1200)\n",
"\n",
" # Add an explanation\n",
" explain_job = rai_explanation_component(\n",
" comment=\"Explanation for the classification dataset\",\n",
" rai_insights_dashboard=create_rai_job.outputs.rai_insights_dashboard,\n",
" )\n",
" explain_job.set_limits(timeout=1200)\n",
"\n",
" # Add counterfactual analysis\n",
" counterfactual_job = rai_counterfactual_component(\n",
" rai_insights_dashboard=create_rai_job.outputs.rai_insights_dashboard,\n",
" total_cfs=10,\n",
" desired_class=\"opposite\",\n",
" )\n",
" counterfactual_job.set_limits(timeout=1200)\n",
"\n",
" # Add error analysis\n",
" erroranalysis_job = rai_erroranalysis_component(\n",
" rai_insights_dashboard=create_rai_job.outputs.rai_insights_dashboard,\n",
" )\n",
" erroranalysis_job.set_limits(timeout=1200)\n",
"\n",
" # Combine everything\n",
" rai_gather_job = rai_gather_component(\n",
" constructor=create_rai_job.outputs.rai_insights_dashboard,\n",
" insight_1=explain_job.outputs.explanation,\n",
" # insight_2=causal_job.outputs.causal,\n",
" insight_3=counterfactual_job.outputs.counterfactual,\n",
" insight_4=erroranalysis_job.outputs.error_analysis,\n",
" )\n",
" rai_gather_job.set_limits(timeout=1200)\n",
"\n",
" rai_gather_job.outputs.dashboard.mode = \"upload\"\n",
" rai_gather_job.outputs.ux_json.mode = \"upload\"\n",
"\n",
" # Generate score card in pdf format for a summary report on model performance,\n",
" # and observe distrbution of error between prediction vs ground truth.\n",
" rai_scorecard_job = rai_scorecard_component(\n",
" dashboard=rai_gather_job.outputs.dashboard,\n",
" pdf_generation_config=score_card_config_path,\n",
" )\n",
"\n",
" return {\n",
" \"dashboard\": rai_gather_job.outputs.dashboard,\n",
" \"ux_json\": rai_gather_job.outputs.ux_json,\n",
" \"scorecard\": rai_scorecard_job.outputs.scorecard,\n",
" }"
]
},
{
"cell_type": "markdown",
"id": "1c26b9a6",
"metadata": {},
"source": [
"Next, we define the pipeline object itself, and ensure that the outputs will be available for download:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7b877890-c3d8-45d8-b91f-906968b7b053",
"metadata": {
"gather": {
"logged": 1681455235502
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"import uuid\n",
"from azure.ai.ml import Output\n",
"\n",
"# Pipeline to construct the RAI Insights\n",
"insights_pipeline_job = rai_classification_pipeline(\n",
" target_column_name=target_feature,\n",
" train_data=loan_train_pq,\n",
" test_data=loan_test_pq,\n",
" score_card_config_path=score_card_config_path,\n",
")\n",
"\n",
"# Workaround to enable the download\n",
"rand_path = str(uuid.uuid4())\n",
"insights_pipeline_job.outputs.dashboard = Output(\n",
" path=f\"azureml://datastores/workspaceblobstore/paths/{rand_path}/dashboard/\",\n",
" mode=\"upload\",\n",
" type=\"uri_folder\",\n",
")\n",
"insights_pipeline_job.outputs.ux_json = Output(\n",
" path=f\"azureml://datastores/workspaceblobstore/paths/{rand_path}/ux_json/\",\n",
" mode=\"upload\",\n",
" type=\"uri_folder\",\n",
")\n",
"insights_pipeline_job.outputs.scorecard = Output(\n",
" path=f\"azureml://datastores/workspaceblobstore/paths/{rand_path}/scorecard/\",\n",
" mode=\"upload\",\n",
" type=\"uri_folder\",\n",
")"
]
},
{
"cell_type": "markdown",
"id": "204c6fe5",
"metadata": {},
"source": [
"And submit the pipeline to AzureML for execution:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a3f713bb-6a5f-4ab7-ac59-4fe2e0c4ec7a",
"metadata": {
"gather": {
"logged": 1681455891694
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"insights_job = submit_and_wait(ml_client, insights_pipeline_job)"
]
},
{
"cell_type": "markdown",
"id": "65437c8f",
"metadata": {},
"source": [
"The dashboard should appear in the AzureML portal in the registered model view. The following cell computes the expected URI:"
]
},
{
"cell_type": "markdown",
"id": "69721140",
"metadata": {},
"source": [
"## Downloading the Scorecard PDF\n",
"\n",
"We can download the scorecard PDF from our pipeline as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "65b2a0db-3efc-444c-aec1-8a2a00d70bdb",
"metadata": {
"gather": {
"logged": 1681455891992
}
},
"outputs": [],
"source": [
"target_directory = \".\"\n",
"\n",
"ml_client.jobs.download(\n",
" insights_job.name, download_path=target_directory, output_name=\"scorecard\"\n",
")"
]
},
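{
"cell_type": "markdown",
"id": "c5f2e4b3",
"metadata": {},
"source": [
"The other named outputs of the pipeline (`dashboard` and `ux_json`) can optionally be downloaded in the same way; the sketch below reuses the same `ml_client.jobs.download` call with the output names returned by the pipeline above."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6a3f5c4",
"metadata": {},
"outputs": [],
"source": [
"# Download the RAI dashboard data and its UX JSON alongside the scorecard\n",
"ml_client.jobs.download(\n",
"    insights_job.name, download_path=target_directory, output_name=\"dashboard\"\n",
")\n",
"ml_client.jobs.download(\n",
"    insights_job.name, download_path=target_directory, output_name=\"ux_json\"\n",
")"
]
},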
{
"cell_type": "markdown",
"id": "9a5cbfc6-698a-4d94-a305-0d1ccd2a9511",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"## To Access the Dashboard follow the link below"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e923ae8d-13bb-4056-8962-32b6959447bb",
"metadata": {
"gather": {
"logged": 1681455892310
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
},
"outputs": [],
"source": [
"sub_id = ml_client._operation_scope.subscription_id\n",
"rg_name = ml_client._operation_scope.resource_group_name\n",
"ws_name = ml_client.workspace_name\n",
"\n",
"expected_uri = f\"https://ml.azure.com/model/{expected_model_id}/model_analysis?wsid=/subscriptions/{sub_id}/resourcegroups/{rg_name}/workspaces/{ws_name}\"\n",
"\n",
"print(f\"Please visit {expected_uri} to see your analysis\")"
]
},
{
"cell_type": "markdown",
"id": "83d4b5d0-1704-49fb-a65c-8b30c58e7b0a",
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
},
"source": [
"Once this is complete, we can go to the Registered Models view in the AzureML portal, and find the model we have just registered. On the 'Model Details' page, there is a \"Responsible AI dashboard\" tab where we can view the insights which we have just uploaded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "34ffab18",
"metadata": {},
"outputs": [],
"source": [
"# Remove the temporary directories\n",
"from pathlib import Path\n",
"import shutil\n",
"\n",
"out_dir = Path(\"./named-outputs\")\n",
"comp_dir = Path(\"./component_src\")\n",
"reg_dir = Path(\"./register_model_src\")\n",
"list_dir = [out_dir, comp_dir, reg_dir]\n",
"\n",
"for dir in list_dir:\n",
" if dir.exists() and dir.is_dir():\n",
" shutil.rmtree(dir)\n",
"\n",
"\n",
"list_file = [\n",
" \"./RAILoanTrainingComponent.yaml\",\n",
" \"./register.yaml\",\n",
" \"./rai_loan_classification_score_card_config.json\",\n",
"]\n",
"\n",
"for file in list_file:\n",
" if os.path.exists(file):\n",
" os.remove(file)"
]
}
],
"metadata": {
"categories": [
"SDK v2",
"sdk",
"python",
"responsible-ai"
],
"kernel_info": {
"name": "python310-sdkv2"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"microsoft": {
"host": {
"AzureML": {
"notebookHasBeenCompleted": true
}
},
"ms_spell_check": {
"ms_spell_check_language": "en"
}
},
"nteract": {
"version": "nteract-front-end@1.0.0"
},
"vscode": {
"interpreter": {
"hash": "8fd340b5477ca1a0b454d48a3973beff39fee032ada47a04f6f3725b469a8988"
}
}
},
"nbformat": 4,
"nbformat_minor": 5
}