Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

# Automated Machine Learning

## Demand Forecasting Using Many Models (preview)

> [!IMPORTANT]
> Items marked (preview) in this article are currently in public preview.
> The preview version is provided without a service level agreement, and it's not recommended for production workloads. Certain features might not be supported or might have constrained capabilities.
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).

## Contents
1. [Introduction](#Introduction)
1. [Setup](#Setup)
1. [Compute](#Compute)
1. [Data](#Data)
1. [Train AutoML Model](#Train)
1. [Import Components From Registry](#ImportComponents)
1. [Create a Pipeline](#Pipeline)
1. [Kick Off Pipeline Runs](#PipelineRun)
1. [Download Output](#DownloadOutput)
1. [Compare Evaluation Results](#CompareResults)
1. [Deployment](#Deployment)

## 1. Introduction  <a id="Introduction">

The objective of this notebook is to illustrate how to use the component-based AutoML single model solution. It walks you through all stages of the model evaluation and production process, starting with data ingestion and concluding with batch endpoint deployment for production. In this tutorial we will illustrate how to leverage AutoML and train a distributed TCN model ([link](placeholder)). However, the same notebook can be used to train a non-distributed TCN as well as the conventional ML models.

We use a subset of UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)) with the objective of predicting electricity demand per consumer 24 hours ahead. The data was preprocessed using the [data prep notebook](https://github.com/Azure/azureml-examples/blob/main/v1/python-sdk/tutorials/automl-with-azureml/forecasting-data-preparation/auto-ml-forecasting-data-preparation.ipynb). Please refer to it for illustration on how to download the data from the source, aggregate to an hourly frequency, convert from wide to long format and upload to the Datastore. Here, we will work with the data that has been pre-processed and saved locally in the parquet format.
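The preprocessing mentioned above (hourly aggregation and wide-to-long conversion) can be sketched in plain Python. This is only an illustration, not the code from the data prep notebook: the sample rows, the customer IDs `MT_001` and `MT_002`, and the helper `wide_to_hourly_long` are hypothetical, and we assume that the raw readings are 15-minute values in kW, so each reading contributes a quarter of its value in kWh.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical wide-format sample: one row per 15-minute timestamp,
# one column per customer, values assumed to be in kW.
wide_rows = [
    {"datetime": "2014-01-01 00:00:00", "MT_001": 4.0, "MT_002": 8.0},
    {"datetime": "2014-01-01 00:15:00", "MT_001": 4.0, "MT_002": 8.0},
    {"datetime": "2014-01-01 00:30:00", "MT_001": 4.0, "MT_002": 8.0},
    {"datetime": "2014-01-01 00:45:00", "MT_001": 4.0, "MT_002": 8.0},
    {"datetime": "2014-01-01 01:00:00", "MT_001": 2.0, "MT_002": 0.0},
]


def wide_to_hourly_long(rows, time_col="datetime"):
    """Aggregate 15-minute kW readings to hourly kWh and reshape to long format."""
    totals = defaultdict(float)
    for row in rows:
        ts = datetime.strptime(row[time_col], "%Y-%m-%d %H:%M:%S")
        hour = ts.replace(minute=0, second=0)
        for customer_id, kw in row.items():
            if customer_id == time_col:
                continue
            # A 15-minute reading in kW contributes kW / 4 kilowatt hours.
            totals[(customer_id, hour)] += kw / 4.0
    # One output row per (customer, hour) pair, i.e. the long format.
    return [
        {"customer_id": customer_id, "datetime": hour, "usage": kwh}
        for (customer_id, hour), kwh in sorted(totals.items())
    ]


long_rows = wide_to_hourly_long(wide_rows)
for row in long_rows:
    print(row)
```

The output column names mirror the `datetime`, `usage` and `customer_id` columns used later in this notebook.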

There are a number of steps you need to take before you can put a model into production. A user needs to prepare the data, partition it into appropriate sets, select the best model, evaluate it against a baseline, and monitor the model in real life to collect enough observations on how it would perform had it been put in production. Some of these steps are time consuming, some require certain expertise in writing code. The steps shown in this notebook follow a typical thought process one follows before the model is put in production.

Make sure you have executed the [configuration](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) before running this notebook.

## 2. Setup <a id="Setup">

In [None]:
# Import required libraries
import os
import datetime
import json
import yaml
import azure.ai.ml

import pandas as pd

from time import sleep

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential

from azure.ai.ml import MLClient, Input, Output
from azure.ai.ml import load_component
from azure.ai.ml import automl
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.dsl import pipeline
from azure.ai.ml.entities import (
    BatchEndpoint,
    BatchDeployment,
    AmlCompute,
    PipelineComponentBatchDeployment,
)
from azure.ai.ml.entities._job.automl.tabular.forecasting_settings import (
    ForecastingSettings,
)

print(f"SDK version: {azure.ai.ml.__version__}")

## 2.1. Configure workspace details and get a handle to the workspace

The [workspace](https://docs.microsoft.com/en-us/azure/machine-learning/concept-workspace) is the top-level resource for Azure Machine Learning, providing a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section we will connect to the workspace in which the job will be run.

To connect to a workspace, we need identifier parameters - a subscription, resource group and workspace name. We will use these details in the `MLClient` from `azure.ai.ml` to get a handle to the required Azure Machine Learning workspace. We use the [default Azure authentication](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python) for this tutorial. Check the [configuration notebook](https://github.com/Azure/MachineLearningNotebooks/blob/master/configuration.ipynb) for more details on how to configure credentials and connect to a workspace.

In [None]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

In [None]:
try:
    ml_client = MLClient.from_config(credential)
except Exception as ex:
    print(ex)
    # Enter details of your AML workspace
    subscription_id = "<SUBSCRIPTION_ID>"
    resource_group = "<RESOURCE_GROUP>"
    workspace = "<AML_WORKSPACE_NAME>"
    ml_client = MLClient(credential, subscription_id, resource_group, workspace)
    print(ml_client)

### 2.2. Show Azure ML Workspace information

In [None]:
ws = ml_client.workspaces.get(name=ml_client.workspace_name)

output = {}
output["Workspace"] = ml_client.workspace_name
output["Subscription ID"] = ml_client.subscription_id
output["Resource Group"] = ws.resource_group
output["Location"] = ws.location
pd.DataFrame(data=output, index=[""]).T

## 3. Compute  <a id="Compute">

#### Create or Attach existing AmlCompute

You will need to create a compute target for your AutoML run. In this tutorial, you will create AmlCompute as your training compute resource.

> Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.


Here, we use a 10 node cluster of the `STANDARD_NC6s_v3` series for illustration purposes. You will need to adjust the compute type and the number of nodes based on your needs which can be driven by the speed needed for model selection, data size, etc. 

#### Creation of AmlCompute takes approximately 5 minutes. 
If the AmlCompute with that name is already in your workspace, this code will skip the creation process.
As with other Azure services, there are limits on certain resources (e.g. AmlCompute) associated with the Azure Machine Learning service. Please read [this article](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-manage-quotas) on the default limits and how to request more quota.

In [None]:
from azure.core.exceptions import ResourceNotFoundError

amlcompute_cluster_name = "distributed-tcn-cluster"

try:
    # Retrieve an already attached Azure Machine Learning Compute.
    compute_target = ml_client.compute.get(amlcompute_cluster_name)
except ResourceNotFoundError as e:
    compute_target = AmlCompute(
        name=amlcompute_cluster_name,
        size="Standard_NC6s_v3",
        type="amlcompute",
        min_instances=0,
        max_instances=10,
        idle_time_before_scale_down=600,
        # vm_priority="LowPriority"
    )
    poller = ml_client.begin_create_or_update(compute_target)
    poller.wait()

## 4. Data <a id="Data">

For illustration purposes we use the UCI electricity data ([link](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#)). The original dataset contains electricity consumption data for 370 consumers measured at 15 minute intervals. In the dataset for this demonstration, we have aggregated the data to an hourly frequency and converted it to kilowatt hours (kWh) for 10 customers. Each customer is assigned to one of the two groups as denoted by the entries in the `group_id` column. The following cells read and print the first few rows of the training data as well as print the number of unique time series in the dataset.

In [None]:
time_column_name = "datetime"
target_column_name = "usage"
time_series_id_column_names = ["group_id", "customer_id"]

In [None]:
dataset_type = "train"
df = pd.read_parquet(f"./data/{dataset_type}/uci_electro_small_{dataset_type}.parquet")
df.head(3)

In [None]:
nseries = df.groupby(time_series_id_column_names).ngroups
print(f"Data contains {nseries} individual time-series\n---")

In [None]:
df[time_series_id_column_names].drop_duplicates()

Please note that the data used in this experiment is of the Azure Machine Learning Tables (`MLTable`) format. This allows for the creation of a *blueprint* that defines how to load data files into memory as a Pandas or Spark data frame. See the following [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-mltable?view=azureml-api-2&tabs=cli) for more information on this topic. For documentation on creating your own MLTable assets for jobs beyond this notebook:

- https://learn.microsoft.com/en-us/azure/machine-learning/reference-yaml-mltable details how to write MLTable YAMLs (required for each MLTable asset).
- https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-data-assets?tabs=Python-SDK covers how to work with them in the v2 CLI/SDK.
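As a rough illustration of what such a blueprint looks like, the sketch below writes a minimal `MLTable` file next to a set of parquet files. The YAML content is an assumption for illustration (the files shipped in this notebook's `./data` folders are not shown here), and `write_mltable` is a hypothetical helper, not part of the SDK.

```python
import os
import tempfile

# Assumed minimal MLTable definition: read every parquet file in the folder.
MLTABLE_YAML = """\
paths:
  - pattern: ./*.parquet
transformations:
  - read_parquet:
      include_path_column: false
"""


def write_mltable(folder, content=MLTABLE_YAML):
    """Write an MLTable file into `folder` and return its path."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "MLTable")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


# Write into a temporary directory so the notebook's own data is untouched.
demo_dir = tempfile.mkdtemp()
mltable_path = write_mltable(os.path.join(demo_dir, "train"))
print(mltable_path)
```

The file must be named `MLTable` (no extension) and live in the same folder that is passed as the `path` of an `Input` of type `AssetTypes.MLTABLE`.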

Next, we upload the directory with the train, validation and test set data which will be used in this notebook.

In [None]:
# Training MLTable defined locally, with local data to be uploaded
train_dataset = Input(type=AssetTypes.MLTABLE, path="./data/train")
valid_dataset = Input(type=AssetTypes.MLTABLE, path="./data/valid")
test_dataset = Input(type=AssetTypes.URI_FOLDER, path="./data/test")

## 5. Train Distributed TCN <a id="Train">
    
In this section we will configure and run the AutoML job to train the model.


### 5.1 Configure the AutoML Job

First, we create a set of parameters which will be used to define the `forecasting()` factory function to kick off the model training. Think of this as the bare minimum settings that are necessary to define an AutoML job, and it contains the following properties:

|Property|Description|
|-|-|
| **task**               | forecasting |
| **target_column_name** | The name of the column to target for predictions. It must always be specified. This parameter is applicable to `training_data`, `validation_data` and `test_data`. |
| **primary_metric**     | This is the metric that you want to optimize. Forecasting supports the following primary metrics<ul><li>`normalized_root_mean_squared_error`</li><li>`normalized_mean_absolute_error`</li><li>`spearman_correlation`</li><li>`r2_score`</li></ul> We recommend using either the normalized root mean squared error (default metric) or normalized mean absolute error as a primary metric because they measure forecast accuracy. See the [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-automl-forecasting-faq#how-do-i-choose-the-primary-metric) for a more detailed discussion on this topic. |

Please note that the `forecasting()` function also requires training and/or validation data. We will provide these in section 5.2.

In [None]:
# automl settings
task = "forecasting"
target_column_name = target_column_name
primary_metric = "normalized_root_mean_squared_error"

Next, we define the forecasting-specific parameters for the experiment. Technically, there are only 2 parameters that are necessary for forecasting tasks (`forecast_horizon` and `time_column_name`). For greater control over the experiment, we also list optional parameters that users can set; they are marked with an asterisk $(*)$ in the table below. See the [forecast settings API doc](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.automl.forecastingjob#azure-ai-ml-automl-forecastingjob-set-forecast-settings) for a complete list of available parameters.

|Property|Description|
|-|-|
| **time_column_name**               | The name of the time column in the data. |
| **forecast_horizon** | The forecast horizon is how many periods forward you would like to forecast. This integer horizon is in units of the timeseries frequency (e.g. daily, weekly). |
| **time_series_id_column_names***    | The column names used to uniquely identify the time series in data that has multiple rows with the same timestamp. If the time series identifiers are not defined, AutoML will detect them for you. |

In [None]:
# forecast settings
forecast_horizon = 24
time_column_name = time_column_name
time_series_id_column_names = time_series_id_column_names
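Because the horizon is expressed in periods of the series frequency rather than in clock time, it can help to derive it explicitly. The sketch below is a minimal illustration; `horizon_in_periods` is a hypothetical helper, not an SDK function.

```python
from datetime import timedelta


def horizon_in_periods(lookahead, frequency):
    """Convert a look-ahead duration into a number of periods of the given frequency."""
    periods, remainder = divmod(lookahead, frequency)
    if remainder:
        raise ValueError("The look-ahead must be a whole multiple of the frequency.")
    return periods


# Hourly data, 24 hours ahead: the setting used in this notebook.
hourly_horizon = horizon_in_periods(timedelta(hours=24), timedelta(hours=1))
# The same look-ahead on 15-minute data would need a larger horizon.
quarter_hourly_horizon = horizon_in_periods(timedelta(hours=24), timedelta(minutes=15))
print(hourly_horizon, quarter_hourly_horizon)
```

For the hourly data in this notebook, a 24-hour look-ahead therefore corresponds to `forecast_horizon = 24`.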

Next, we set parameters to configure limits such as timeouts, number of trials, etc.

|Property|Description|
|-|-|
| **timeout_minutes**          | Maximum amount of time in minutes that the whole AutoML job can take before the job terminates. This timeout includes setup, featurization and training runs but does not include the ensembling and model explainability runs at the end of the process since those actions need to happen once all the trials (children jobs) are done. If not specified, the default job's total timeout is 6 days (8,640 minutes). To specify a timeout less than or equal to 1 hour (60 minutes), make sure your dataset's size is not greater than 10,000,000 (rows times columns) or an error results. |
| **max_trials**               | Represents the maximum number of trials an Automated ML job can try to run a training algorithm with different combination of hyperparameters. Its default value is set to 1000. If `enable_early_termination` is defined, then the number of trials used to run training algorithms can be smaller. |
| **max_concurrent_trials**    | The maximum number of trials (children jobs) that would be executed in parallel. It's highly recommended to set the number of concurrent runs to the number of nodes in the cluster (aml compute defined in `compute`). The default value is 1. |
| **max_nodes**                | Maximum number of nodes to use in training. This value should be set only for the distributed TCN training. We encourage this value to be a multiple of `max_concurrent_trials`. The multiple indicates the number of nodes that will be used by each concurrent trial. Minimum acceptable value to kick off distributed training is 2. |
| **enable_early_termination** | Represents whether to enable early termination of the experiment if the loss score doesn't improve after 'x' number of iterations. In an Automated ML job, no early stopping is applied on the first 20 iterations. The early stopping window starts only after the first 20 iterations. The default value is `True`. |

In [None]:
# training limits
timeout_minutes = 60
max_concurrent_trials = 5
max_trials = 15
max_nodes = 8
enable_early_termination = True
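The two numeric recommendations in the table above can be checked with a few lines of arithmetic. This is a sketch only: `nodes_per_trial` and `short_timeout_allowed` are hypothetical helpers that restate the documented rules, and how AutoML treats nodes left over when `max_nodes` is not an exact multiple is not covered here.

```python
def nodes_per_trial(max_nodes, max_concurrent_trials):
    """Return (nodes per concurrent trial, leftover nodes) for the given limits."""
    if max_nodes < 2:
        raise ValueError("Distributed training needs max_nodes of at least 2.")
    return divmod(max_nodes, max_concurrent_trials)


def short_timeout_allowed(n_rows, n_columns, timeout_minutes):
    """Check the documented size limit for timeouts of 60 minutes or less."""
    if timeout_minutes > 60:
        return True
    return n_rows * n_columns <= 10_000_000


# Values from the cell above: 8 nodes shared by 5 concurrent trials.
per_trial, leftover = nodes_per_trial(max_nodes=8, max_concurrent_trials=5)
print(f"{per_trial} node(s) per trial, {leftover} node(s) left over")

# An exact multiple, as the table recommends: 8 nodes and 4 concurrent trials.
print(nodes_per_trial(max_nodes=8, max_concurrent_trials=4))

# A 60-minute timeout with a hypothetical 100,000 x 5 dataset.
print(short_timeout_allowed(n_rows=100_000, n_columns=5, timeout_minutes=60))
```

A non-zero leftover means `max_nodes` is not a multiple of `max_concurrent_trials`, which is a cue to revisit one of the two settings.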

Finally, we set parameters to configure training, such as enabling DNN and blocking/allowing specific models.

|Property|Description|
|-|-|
| **enable_dnn_training**          | A flag to turn on or off the inclusion of DNN based models to try out during model selection. The default value is `False`. |
| **training_mode**    |The training mode to use. The possible values are `distributed` and `non_distributed` (default value). When this parameter is set to `distributed` and `enable_dnn_training=True`, a distributed TCN run will be kicked off. |
| **allowed_training_algorithms**               | A list of Time Series Forecasting algorithms to try out as base model for model training in an experiment. If it is omitted or set to `None`, then all supported algorithms are used during experiment, except algorithms specified in `blocked_training_algorithms`. The default value is `None`. |
| **blocked_training_algorithms**               | A list of Time Series Forecasting algorithms to not run as base model while model training in an experiment. If it is omitted or set to `None`, then all supported algorithms are used during model training.  The default value is `None`.|
| **enable_model_explainability**    | A flag to turn on model explainability, such as feature importance, for the best model evaluated by the Automated ML system. The default value is `True`. |

In [None]:
# training settings
enable_dnn_training = True
training_mode = "distributed"
allowed_training_algorithms = ["TCNForecaster"]
blocked_training_algorithms = None
enable_model_explainability = True
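The interplay between the allowed and blocked lists described in the table can be sketched as a simple filter. This is an illustration of the documented rule, not AutoML's implementation: `resolve_algorithms` is a hypothetical helper, `SUPPORTED` is an illustrative subset of algorithm names, and applying the blocked list on top of an explicit allowed list is an assumption.

```python
# Illustrative subset of forecasting algorithm names, not the full supported list.
SUPPORTED = ["TCNForecaster", "Prophet", "AutoArima", "ExponentialSmoothing"]


def resolve_algorithms(supported, allowed=None, blocked=None):
    """Return the candidate algorithms given optional allowed and blocked lists."""
    blocked_set = set(blocked or [])
    if allowed is None:
        # No allow list: start from every supported algorithm.
        candidates = list(supported)
    else:
        candidates = [name for name in supported if name in allowed]
    # Assumption: blocked names are always removed from the candidates.
    return [name for name in candidates if name not in blocked_set]


# The configuration used in this notebook: only the TCN is allowed.
tcn_only = resolve_algorithms(SUPPORTED, allowed=["TCNForecaster"], blocked=None)
# Alternative: keep everything except one blocked algorithm.
without_prophet = resolve_algorithms(SUPPORTED, blocked=["Prophet"])
print(tcn_only)
print(without_prophet)
```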

### 5.2. Create the AutoML Forecasting Job

In [None]:
# --  5.2.1. Create forecasting job
training_job = automl.forecasting(
    compute=amlcompute_cluster_name,
    training_data=Input(type="uri_folder", path="./data/train"),
    validation_data=Input(type="uri_folder", path="./data/valid"),
    target_column_name=target_column_name,
    primary_metric=primary_metric,
    enable_model_explainability=enable_model_explainability,
    outputs={"best_model": Output(type=AssetTypes.CUSTOM_MODEL)},
)

In [None]:
# --  5.2.2 Define forecasting settings
training_job.set_forecast_settings(
    forecast_horizon=forecast_horizon,
    time_column_name=time_column_name,
    time_series_id_column_names=time_series_id_column_names,
)

In [None]:
# -- 5.2.3 Set training parameters
training_job.set_training(
    enable_dnn_training=enable_dnn_training,
    training_mode=training_mode,
    enable_model_explainability=enable_model_explainability,
    allowed_training_algorithms=allowed_training_algorithms,
)

In [None]:
# -- 5.2.4 Set training limits
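training_job.set_limits(
    timeout_minutes=timeout_minutes,
    max_trials=max_trials,
    max_concurrent_trials=max_concurrent_trials,
    max_nodes=max_nodes,
    # Pass the early termination flag defined with the other training limits
    enable_early_termination=enable_early_termination,
)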

### 5.3. Train the AutoML model

In [None]:
# Submit the AutoML job
returned_job = ml_client.jobs.create_or_update(training_job)

In [None]:
# Wait until AutoML training runs are finished
ml_client.jobs.stream(returned_job.name)

### 5.4. Download Best Model's Artifacts

Next, we download the best TCN model's artifacts which will be used in the evaluation pipeline and batch deployment.

##### Obtain the tracking URI for MLFlow

In [None]:
import mlflow

# Obtain the tracking URL from MLClient
MLFLOW_TRACKING_URI = ml_client.workspaces.get(
    name=ml_client.workspace_name
).mlflow_tracking_uri

print(MLFLOW_TRACKING_URI)

In [None]:
# Set the MLFLOW TRACKING URI

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

print("\nCurrent tracking uri: {}".format(mlflow.get_tracking_uri()))

In [None]:
local_dir = "./artifact_downloads"
os.makedirs(local_dir, exist_ok=True)

for child in ml_client.jobs.list(parent_job_name=returned_job.name):
    print(f"{child.name}\n---")
    if "_HD" in child.name:
        print(f'Best child run ID: {child.properties["best_child_run_id"]}\n---')
        # Download best run's artifacts/outputs
        local_path = mlflow.artifacts.download_artifacts(
            run_id=child.properties["best_child_run_id"],
            artifact_path="outputs",
            dst_path=local_dir,
        )
        break
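Before moving on, it can be useful to confirm that the download actually contains an MLflow model folder, since later steps point at `./artifact_downloads/outputs/mlflow-model`. The sketch below relies only on the fact that an MLflow model directory contains a file named `MLmodel`; `find_mlflow_model_dirs` is a hypothetical helper, and the demo builds a temporary folder instead of touching the real download.

```python
import os
import tempfile


def find_mlflow_model_dirs(root):
    """Return all directories under `root` that contain an `MLmodel` file."""
    return sorted(
        dirpath for dirpath, _dirnames, filenames in os.walk(root) if "MLmodel" in filenames
    )


# Demo on a temporary folder that mimics the expected download layout.
demo_root = tempfile.mkdtemp()
demo_model_dir = os.path.join(demo_root, "outputs", "mlflow-model")
os.makedirs(demo_model_dir)
with open(os.path.join(demo_model_dir, "MLmodel"), "w", encoding="utf-8") as f:
    f.write("flavors: {}\n")

found_dirs = find_mlflow_model_dirs(demo_root)
print(found_dirs)
```

In the notebook itself, calling `find_mlflow_model_dirs(local_dir)` after the download should list the model folder; an empty list suggests that the best child run was not found or the download did not complete.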

## 6. Import Components From Registry <a id="ImportComponents">

An Azure Machine Learning component is a self-contained piece of code that does one step in a machine learning pipeline. A component is analogous to a function - it has a name, inputs, outputs, and a body. Components are the building blocks of the Azure Machine Learning pipelines. It's a good engineering practice to build a machine learning pipeline where each step has well-defined inputs and outputs. In Azure Machine Learning, a component represents one reusable step in a pipeline. Components are designed to help improve the productivity of pipeline building. Specifically, components offer:

- Well-defined interface: Components require a well-defined interface (input and output). The interface allows the user to build steps and connect steps easily. The interface also hides the complex logic of a step and removes the burden of understanding how the step is implemented.

- Share and reuse: As the building blocks of a pipeline, components can be easily shared and reused across pipelines, workspaces, and subscriptions. Components built by one team can be discovered and used by another team.

- Version control: Components are versioned. The component producers can keep improving components and publish new versions. Consumers can use specific component versions in their pipelines. This gives them compatibility and reproducibility.

For more detailed information on this subject, refer to this [link](https://learn.microsoft.com/en-us/azure/machine-learning/concept-component?view=azureml-api-2).

To import components, we need to get the registry. The following command obtains the public registry from which we will import components for our experiment.

In [None]:
# get registry for the inference component
ml_client_registry = MLClient(credential=credential, registry_name="azureml")
print(ml_client_registry)
print("---")

Next, we pull specific components and use them to build a pipeline of steps. For the illustration of the product evaluation workflow we will use the following components:

- Inference component: generates forecast for each partition. This can be done on the test and inference sets.
- Compute metrics component: calculates metrics per time series if the inference component was used on a test set.

In [None]:
inference_component = ml_client_registry.components.get(
    name="automl_forecasting_inference"
)
print(f"Inference component version: {inference_component.version}\n---")

In [None]:
compute_metrics_component = ml_client_registry.components.get(name="compute_metrics")
print(f"Compute metrics component version: {compute_metrics_component.version}\n---")

## 7. Build an evaluation pipeline <a id="Pipeline">

Now that we have imported the components, we will build an evaluation pipeline. This pipeline will allow us to generate a rolling forecast on the test set and calculate metrics on the test set output.

### 7.1. Set Pipeline Parameters

AzureML components can only receive specific object types such as strings, JSON/YML files, URI Folders and URI Files. Other object types are not accepted. Because of this, we will create the pipeline by utilizing the `pipeline_parameters` dictionary. In this notebook, the dictionary holds the settings for the evaluation steps of the pipeline.

Next, we declare the parameters that will be used by the inference and compute metrics components.

|Property|Description|
|-|-|
| **forecast_mode**               | Type of forecast to perform on the test set. Can be `recursive` or `rolling`. Rolling forecast can be used for evaluation purposes. The default value is `recursive`. |
| **forecast_step**               | The forecast step used for rolling forecast. See the following [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-auto-train-forecast?view=azureml-api-2#evaluating-model-accuracy-with-a-rolling-forecast) for more details. |

In [None]:
pipeline_parameters = dict(forecast_mode="rolling", forecast_step=24)
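To make the effect of `forecast_step` concrete, the sketch below enumerates the windows a rolling forecast would cover on a test set. It is a simplified illustration under stated assumptions, not the inference component's logic: `rolling_windows` is a hypothetical helper, positions are row offsets within one series, and windows that run past the end of the test set are assumed to be truncated.

```python
def rolling_windows(n_test_periods, forecast_horizon, forecast_step):
    """Return (start, end) offsets of each rolling forecast window; end is exclusive."""
    windows = []
    origin = 0
    while origin < n_test_periods:
        # Assumption: a window is cut short if it would run past the test data.
        end = min(origin + forecast_horizon, n_test_periods)
        windows.append((origin, end))
        origin += forecast_step
    return windows


# Settings from this notebook on a hypothetical 3-day hourly test set.
daily_windows = rolling_windows(n_test_periods=72, forecast_horizon=24, forecast_step=24)
# A smaller step produces overlapping windows.
overlapping_windows = rolling_windows(n_test_periods=48, forecast_horizon=24, forecast_step=12)
print(daily_windows)
print(overlapping_windows)
```

With `forecast_step` equal to the horizon, as configured above, the windows tile the test set without overlap, so each test period is forecast exactly once.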

In [None]:
@pipeline(
    description="AutoML Forecasting TCN Evaluation Pipeline",
)
def evaluation_pipeline(
    inference_data: Input(type=AssetTypes.MLTABLE),
    model_path: Input(type=AssetTypes.MLFLOW_MODEL),
):
    # 0. Extract pipeline parameters from the dictionary
    forecast_mode = pipeline_parameters.get("forecast_mode", "recursive")
    forecast_step = pipeline_parameters.get("forecast_step", 1)

    # 1. Inferencing step
    inference_node = inference_component(
        test_data=inference_data,
        model_path=model_path,
        target_column_name=target_column_name,
        forecast_mode=forecast_mode,
        forecast_step=forecast_step,
    )

    # 2. Metrics calculation step
    compute_metrics_node = compute_metrics_component(
        task="tabular-forecasting",
        prediction=inference_node.outputs.inference_output_file,
        ground_truth=inference_node.outputs.inference_output_file,
        evaluation_config=inference_node.outputs.evaluation_config_output_file,
    )
    compute_metrics_node.compute = amlcompute_cluster_name

    # 3. Specify pipeline outputs
    return {
        "output_files": compute_metrics_node.outputs.evaluation_result,
        "forecast_output": inference_node.outputs.inference_output_file,
    }

## 8. Kick Off the Evaluation Pipeline Run <a id="PipelineRun">
    
Now that the pipeline is defined, we will use it to kick off an experiment which will run inference and evaluate the performance of the best AutoML model.

In [None]:
pipeline_job = evaluation_pipeline(
    inference_data=Input(type=AssetTypes.URI_FOLDER, path="./data/test"),
    model_path=Input(
        type=AssetTypes.MLFLOW_MODEL, path="./artifact_downloads/outputs/mlflow-model"
    ),
)
print(pipeline_job)

In [None]:
# set pipeline level compute
pipeline_job.settings.default_compute = amlcompute_cluster_name

In [None]:
experiment_name = "tcn-evaluation-" + datetime.datetime.now().strftime("%Y%m%d")

pipeline_submitted_job = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=experiment_name,
)
ml_client.jobs.stream(pipeline_submitted_job.name)

In [None]:
# To rehydrate run
# RUN_ID = "<Paste the PipelineRunId from the output of the previous cell.>"
# pipeline_submitted_job = ml_client.jobs.get(RUN_ID)
# pipeline_submitted_job

## 9. Download Pipeline Output <a id="DownloadOutput">
    
Next, we will download the output files generated by the pipeline (the evaluation metrics and the forecast table) and save them in the corresponding subfolder of the `output` folder. First, we create the output directory. Then, we execute the `ml_client.jobs.download` command, which downloads the experiment's outputs.

In [None]:
# create output directories
automl_output_dir = os.path.join(os.getcwd(), "output/automl")
os.makedirs(automl_output_dir, exist_ok=True)

In [None]:
ml_client.jobs.download(
    name=pipeline_submitted_job.name,
    download_path=automl_output_dir,
    output_name="output_files",
)

In [None]:
ml_client.jobs.download(
    name=pipeline_submitted_job.name,
    download_path=automl_output_dir,
    output_name="forecast_output",
)

## 10. Evaluation Results <a id="CompareResults">

### 10.1. Examine Metrics

In this section, we examine the metrics from the evaluation pipeline run to quantify the accuracy of the AutoML model. First, we look at metrics that are calculated for the entire dataset. Since there are 10 unique time series in the test dataset, the individual metrics are aggregated into a single number. The non-normalized metrics can be misleading because the scale differs from one time series to another. The following [article (placeholder)](https://review.learn.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml?view=azureml-api-2&branch=pr-en-us-238443#forecasting-metrics-normalization-and-aggregation) explains this topic in greater detail.
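
The scale issue can be illustrated with a small, self-contained sketch. The two series below are hypothetical, and we assume that normalization means dividing the RMSE by the range of the actual values; the exact normalization used by AutoML is described in the article linked above.

```python
import math


def rmse(actuals, predictions):
    """Root mean squared error of one series."""
    squared = [(a - p) ** 2 for a, p in zip(actuals, predictions)]
    return math.sqrt(sum(squared) / len(squared))


def nrmse(actuals, predictions):
    """RMSE divided by the range of the actuals (assumed normalization)."""
    return rmse(actuals, predictions) / (max(actuals) - min(actuals))


# Two hypothetical series with the same relative error but different scales.
small_actuals = [10.0, 12.0, 14.0, 16.0]
small_preds = [11.0, 13.0, 15.0, 17.0]
large_actuals = [1000.0, 1200.0, 1400.0, 1600.0]
large_preds = [1100.0, 1300.0, 1500.0, 1700.0]

print(rmse(small_actuals, small_preds), rmse(large_actuals, large_preds))
print(nrmse(small_actuals, small_preds), nrmse(large_actuals, large_preds))
```

The raw RMSE of the second series is a hundred times larger, although both forecasts are equally good relative to the scale of their series. After normalization the two scores coincide, which makes them comparable and safe to aggregate.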

The code in the next cell loads the dataset-level metrics for the experiment.

In [None]:
metrics_artifacts_path = os.path.join(
    "named-outputs", "output_files", "evaluationResult"
)

with open(os.path.join(automl_output_dir, metrics_artifacts_path, "metrics.json")) as f:
    metrics_automl_series = json.load(f)
    metrics_automl = (
        pd.Series(metrics_automl_series).to_frame(name="score").reset_index(drop=False)
    )
metrics_automl

#### 10.1.1. Detailed Metrics

Next, we will load and examine the detailed accuracy metrics, since the aggregate metrics may not convey enough information to make a decision about forecast accuracy. It may be helpful to examine metrics at a more granular level, so we will extract metrics per time series. To do this, we create a helper function `extract_specific_metric` which reads the JSON file and returns a specified metric for each time series. Even though the file contains all of the following metrics, we will focus on the normalized root mean squared error (NRMSE) for illustration purposes. <ul>
    <li> `explained_variance` </li>
    <li> `mean_absolute_error` </li>
    <li> `mean_absolute_percentage_error`</li>
    <li> `median_absolute_error`</li>
    <li> `normalized_median_absolute_error`</li>
    <li> `normalized_root_mean_squared_error`</li>
    <li> `normalized_root_mean_squared_log_error`</li>
    <li> `r2_score`</li>
    <li> `root_mean_squared_log_error`</li>
    <li> `root_mean_squared_error`</li>
</ul>

In [None]:
def extract_specific_metric(path, metric_name):
    with open(path) as f:
        artifact = json.load(f)
    all_metrics = pd.DataFrame(artifact["data"])
    index_scores = ["customer_id"] + [metric_name]
    return all_metrics[index_scores]

In [None]:
metrics_table_relative_path = os.path.join(
    metrics_artifacts_path, "artifacts", "forecast_time_series_id_distribution_table"
)
automl_detailed_metrics = extract_specific_metric(
    os.path.join(automl_output_dir, metrics_table_relative_path),
    "normalized_root_mean_squared_error",
)
automl_detailed_metrics
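
Once per-series scores are available, a natural next step is to look at the series with the largest errors. The sketch below shows one way to do this with plain Python. The records, the customer identifiers, and the helper `worst_series` are hypothetical stand-ins for the contents of `automl_detailed_metrics`.

```python
# Hypothetical per-series scores; in the notebook they would come from
# `automl_detailed_metrics`.
scores = [
    {"customer_id": "MT_001", "normalized_root_mean_squared_error": 0.08},
    {"customer_id": "MT_002", "normalized_root_mean_squared_error": 0.21},
    {"customer_id": "MT_003", "normalized_root_mean_squared_error": 0.05},
]


def worst_series(records, metric_name, top_n=2):
    """Return the `top_n` records with the highest value of `metric_name`."""
    return sorted(records, key=lambda r: r[metric_name], reverse=True)[:top_n]


for row in worst_series(scores, "normalized_root_mean_squared_error"):
    print(row["customer_id"], row["normalized_root_mean_squared_error"])
```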

### 10.2. Generate Time Series Plots

Here, we generate a forecast versus actuals plot for each time series in the test set. Since we use rolling evaluation with a step size of 24 hours, this mimics the behavior of putting the model in production and monitoring it for the duration of the test set. This step helps you make informed decisions about model performance before incurring the costs of deploying the model and monitoring its performance in real life.

In the next block of code, we load the test set output of the run. Then, we generate and save time series plots.

In [None]:
forecast_table_relative_path = os.path.join(
    "named-outputs", "forecast_output", "inference_output_file"
)

forecast_column_name = "automl_prediction"
actual_column_name = "automl_actual"
forecast_origin_column_name = "automl_forecast_origin"

backtest = pd.read_json(
    os.path.join(automl_output_dir, forecast_table_relative_path), lines=True
)
print(f"AutoML backtest table size: {backtest.shape}\n---")
backtest.head()
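
The `lines=True` argument above tells pandas that the file is in JSON Lines format, where every line is a separate JSON object. As a rough illustration of what that means, the sketch below parses a hypothetical two-row sample with the standard library only. The sample values and the helper `read_jsonl` are made up for this example.

```python
import io
import json

# Hypothetical two-row sample in JSON Lines format: one JSON object per line.
sample = io.StringIO(
    '{"customer_id": "MT_001", "automl_actual": 5.0, "automl_prediction": 4.5}\n'
    '{"customer_id": "MT_002", "automl_actual": 7.0, "automl_prediction": 7.5}\n'
)


def read_jsonl(handle):
    """Parse a JSON Lines stream into a list of dictionaries, skipping blank lines."""
    return [json.loads(line) for line in handle if line.strip()]


rows = read_jsonl(sample)
print(len(rows), rows[0]["customer_id"])
```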

In [None]:
from scripts.helper_scripts import draw_one_plot
from matplotlib import pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

plot_filename = "forecast_vs_actual.pdf"

pdf = PdfPages(os.path.join(os.getcwd(), "./output", plot_filename))
for _, one_forecast in backtest.groupby("customer_id"):
    one_forecast[time_column_name] = pd.to_datetime(one_forecast[time_column_name])
    one_forecast.sort_values(time_column_name, inplace=True)
    draw_one_plot(
        one_forecast,
        time_column_name,
        target_column_name,
        ["customer_id"],
        [actual_column_name, forecast_column_name],
        pdf,
        plot_predictions=True,
    )
pdf.close()

In [None]:
from IPython.display import IFrame

IFrame(os.path.join("./output/forecast_vs_actual.pdf"), width=800, height=300)

## 11. Deployment <a id="Deployment">

In this section, we will illustrate how to perform batch inference using the inference component. Batch endpoints are used for batch inferencing on large volumes of data in an asynchronous way. They receive pointers to data, run jobs asynchronously to process the data in parallel on compute clusters, and store the outputs in a datastore for further analysis. For more information on batch endpoints see this [link](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-scoring-pipeline?view=azureml-api-2&tabs=python).

### 11.1. Build the pipeline

First, we create a pipeline consisting of one step that invokes the inference component. This pipeline takes the inference data set as a parameter, generates a forecast and returns the predictions. The only named output of this pipeline is `forecast`. It is a table with predictions, stored in JSONL format. In the current setup, users can also generate a distribution forecast. To do this, uncomment the `forecast_quantiles` line in the pipeline definition and specify the desired quantiles as a string. In the code example below, the values of 0.1 and 0.9 are entered as `"0.1,0.9"`.
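
If the quantiles are kept in a Python list, the string can be built programmatically. The helper below is a minimal sketch: `format_quantiles` is a hypothetical name, and both the range check and the sorting are our own choices rather than documented requirements of the component.

```python
def format_quantiles(quantiles):
    """Join quantiles into a comma-separated string such as "0.1,0.9"."""
    for q in quantiles:
        # Assumption: quantile levels must lie strictly between 0 and 1.
        if not 0 < q < 1:
            raise ValueError(f"quantile must be strictly between 0 and 1, got {q}")
    return ",".join(str(q) for q in sorted(quantiles))


print(format_quantiles([0.9, 0.1]))
```

The resulting string can then be passed as the `forecast_quantiles` argument in place of the hard-coded value.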

In [None]:
# Define pipeline
@pipeline(
    description="AutoML Inferencing Pipeline",
)
def demand_inference_single_model(
    test_data: Input(type=AssetTypes.MLTABLE),
    model_path: Input(type=AssetTypes.MLFLOW_MODEL),
    target_column_name: Input(type="string"),
    forecast_mode: Input(type="string"),
):
    inference_node = inference_component(
        test_data=test_data,
        model_path=model_path,
        target_column_name=target_column_name,
        forecast_mode=forecast_mode,
        # forecast_quantiles="0.1,0.9"
    )
    return {
        "forecast": inference_node.outputs.inference_output_file,
    }

### 11.2. Create Batch Endpoint

A batch endpoint's name needs to be unique in each region since the name is used to construct the invocation URI. To ensure uniqueness, the following code appends a random suffix to the name.

In [None]:
import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "sdk-tcn-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}\n---")

In [None]:
endpoint = BatchEndpoint(
    name=endpoint_name,
    description="An endpoint for component deployments",
    properties={"ComponentDeployment.Enabled": True},
)

The following command creates the endpoint in the workspace using the `MLClient` created earlier. The `begin_create_or_update` call starts the endpoint creation and returns a poller; calling `.result()` on it waits until the creation completes.

In [None]:
ml_client.batch_endpoints.begin_create_or_update(endpoint).result()

### 11.2. Create the Deployment

A deployment is a set of resources required for hosting the model that does the actual inferencing. Our pipeline is defined in a function. To transform it to a component, you'll use the `build()` method. Pipeline components are reusable compute graphs that can be included in batch deployments or used to compose more complex pipelines.

In [None]:
pipeline_component = demand_inference_single_model._pipeline_builder.build()

Now we can define the deployment.

In [None]:
deployment = PipelineComponentBatchDeployment(
    name="sdk-tcn-deployment",
    description="A TCN deployment.",
    endpoint_name=endpoint.name,
    component=pipeline_component,
    settings={"default_compute": amlcompute_cluster_name},
)

The following command creates the deployment in the workspace using the `MLClient` created earlier. The `begin_create_or_update` call starts the deployment creation and returns a poller; calling `.result()` on it waits until the creation completes.

In [None]:
ml_client.batch_deployments.begin_create_or_update(deployment).result()

### 11.3. Invoke the Endpoint

The next cell contains the command that invokes the endpoint to run a batch inference job. The `invoke` method accepts the `inputs` parameter, which holds the inputs necessary to execute the inference component on the endpoint. To convince yourself this is the case, compare the input parameters for the `inference_component_from_registry` in section 7.1 with the `inputs` we are providing in the next cell. They are identical.

Notice that the `forecast_mode` is set to `"recursive"`. In the evaluation pipeline this component was used to generate a rolling forecast to evaluate model performance on the test set. For more details on rolling evaluation, see our [forecasting model evaluation article](placeholder). Here, we are using it to generate a forecast.

In [None]:
batch_job = ml_client.batch_endpoints.invoke(
    endpoint_name=endpoint.name,
    deployment_name=deployment.name,
    inputs={
        "test_data": Input(path=os.path.join(os.getcwd(), "data", "inference")),
        "model_path": Input(
            path=os.path.join("./artifact_downloads", "outputs", "mlflow-model")
        ),
        "target_column_name": Input(type="string", default=target_column_name),
        "forecast_mode": Input(type="string", default="recursive"),
    },
)

Next, we will stream the job output to monitor the execution.

In [None]:
job_name = batch_job.name
batch_job = ml_client.jobs.get(name=job_name)
print(f"Batch job status: {batch_job.status}\n---")
ml_client.jobs.stream(name=job_name)
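
Streaming is convenient in a notebook. In a script, you may prefer to poll the job status instead. The sketch below shows the general pattern under a few assumptions: `wait_for_job` is a hypothetical helper, the set of terminal states is assumed and should be checked against your SDK version, and a fake status sequence stands in for calls to `ml_client.jobs.get(name=job_name).status`.

```python
import time

# Assumed terminal states; check the job status values in your SDK version.
TERMINAL_STATES = {"Completed", "Failed", "Canceled"}


def wait_for_job(get_status, poll_seconds=30, max_polls=120):
    """Call `get_status()` until it returns a terminal state or polls run out."""
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal state in time")


# Stand-in for `lambda: ml_client.jobs.get(name=job_name).status`.
fake_statuses = iter(["Queued", "Running", "Completed"])
print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0))
```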

### 11.4. Download Forecast Output

Finally, we download the forecast output and print the first few rows.

In [None]:
from time import sleep

fcst_output_dir = os.path.join(os.getcwd(), "forecast")

for child in ml_client.jobs.list(parent_job_name=job_name):
    print(f"{child.name}\n---\nDownloading data ...\n---")
    # Retry the download a few times in case the output is not available yet
    for attempt in range(3):
        print(f"Attempt: {attempt}")
        try:
            ml_client.jobs.download(
                child.name,
                download_path=fcst_output_dir,
                output_name="inference_output_file",
            )
            break
        except Exception as e:  # do not swallow KeyboardInterrupt or SystemExit
            print(f"Download failed: {e}")
            sleep(10)

In [None]:
fcst_df = pd.read_json(
    os.path.join(
        fcst_output_dir,
        "named-outputs",
        "inference_output_file",
        "inference_output_file",
    ),
    orient="records",
    lines=True,
)
fcst_df[time_column_name] = pd.to_datetime(fcst_df[time_column_name], unit="ms")
fcst_df.head()
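
The `unit="ms"` argument in the cell above indicates that the time column is read as integers counting milliseconds since the Unix epoch. As a small illustration of that conversion, the sketch below does the same for a single value with the standard library. The helper name `epoch_ms_to_utc` and the sample value are ours.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# 1388534400000 ms corresponds to midnight UTC on 1 January 2014.
print(epoch_ms_to_utc(1388534400000).isoformat())
```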

### 11.5. [Optional] Delete the Endpoint

In [None]:
ml_client.batch_endpoints.begin_delete(name=endpoint.name).wait()