sdk/python/using-mlflow/runs-management/run_history.ipynb (551 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Manage runs and experiments with MLflow\n",
"\n",
"You can query experiments and runs in Azure ML using the MLflow client. You no longer need Azure ML specific SDKs to manage anything that happens inside a training job, which lets you remove dependencies and makes the transition between local runs and the cloud more seamless.\n",
"\n",
"The MLflow client allows you to:\n",
"\n",
"* Create, delete and search for experiments in a workspace.\n",
"* Start, stop, cancel and query runs for experiments.\n",
"* Track and retrieve metrics, parameters, artifacts and models from runs."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Prerequisites to run this notebook"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Ensure you have the dependencies for this notebook\n",
"%pip install -r run_history.txt"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Import required namespaces:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import mlflow"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook works through an example that uses the following names:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiment_name = \"heart-classifier-sample\"\n",
"model_name = \"heart-classifier\"\n",
"artifact_path = \"pipeline\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"To demonstrate how to manage runs, let's create an experiment with three runs:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from trainer import train_and_log"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.set_experiment(experiment_name=experiment_name)\n",
"input_data = \"https://azuremlexampledata.blob.core.windows.net/data/heart-disease-uci/data/heart.csv\"\n",
"\n",
"for run_idx in range(3):\n",
"    with mlflow.start_run(run_name=f\"{experiment_name}-run-{run_idx}\"):\n",
"        train_and_log(input_data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting all the experiments\n",
"\n",
"You can get all the active experiments in the workspace using MLflow:"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"> __MLflow 2.0 advisory__: In legacy versions of MLflow (<2.0), use the method `list_experiments` instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"experiments = mlflow.search_experiments()\n",
"for exp in experiments:\n",
"    print(exp.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To retrieve archived experiments too, pass the option `view_type=ViewType.ALL`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from mlflow.entities import ViewType\n",
"\n",
"experiments = mlflow.search_experiments(view_type=ViewType.ALL)\n",
"for exp in experiments:\n",
"    print(exp.name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting a specific experiment"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"exp = mlflow.get_experiment_by_name(experiment_name)\n",
"print(exp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting runs inside an experiment\n",
"\n",
"MLflow allows you to search for runs inside any experiment. You will need either the experiment ID or the experiment name:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Getting all the runs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.search_runs(exp.experiment_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> Notice that `experiment_ids` accepts a list of experiment IDs, so you can search runs across multiple experiments if required. This may be useful when you want to compare runs of the same model logged in different experiments (by different people, different project iterations, etc.). You can also use `search_all_experiments=True` if you want to search across all the experiments in the workspace."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Filtering and ordering runs"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, runs are returned in descending order by `start_time`, which is the time the run was queued in Azure ML. You can change this default by using the parameter `order_by`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.search_runs(exp.experiment_id, order_by=[\"start_time DESC\"], max_results=2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also order runs by a metric to find which run generated the best results:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **Note:** Expressions containing `metrics.*` in the parameter `order_by` are not supported at the moment. Use the `sort_values` method from pandas instead, as shown in the next example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.search_runs(exp.experiment_id).sort_values(\"metrics.accuracy\", ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also look for runs with a specific combination of hyperparameters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mlflow.search_runs(\n",
"    exp.experiment_id, filter_string=\"params.num_boost_round='100'\", max_results=2\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also filter runs by status. This is useful to find runs that are running, completed, canceled or failed. In MLflow, `status` is an `attribute`, so you can access this value using the expression `attributes.status`. The following table shows the possible values:\n",
"\n",
"| Azure ML Job status | MLflow's `attributes.status` | Meaning |\n",
"| :-: | :-: | :- |\n",
"| Not started | `SCHEDULED` | The job/run was just registered in Azure ML but it has not been processed yet. |\n",
"| Queued | `SCHEDULED` | The job/run is scheduled for running, but it hasn't started yet. |\n",
"| Preparing | `SCHEDULED` | The job/run has not started yet, but a compute has been allocated for the execution and it is in a building state. |\n",
"| Running | `RUNNING` | The job/run is currently under active execution. |\n",
"| Completed | `FINISHED` | The job/run has completed without errors. |\n",
"| Failed | `FAILED` | The job/run has completed with errors. |\n",
"| Canceled | `KILLED` | The job/run has been canceled or killed by the user/system. |\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **Note:** Expressions containing `attributes.status` in the parameter `filter_string` are not supported at the moment. Use pandas filtering expressions instead, as shown in the next example."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"runs = mlflow.search_runs(exp.experiment_id)\n",
"runs[runs.status == \"FINISHED\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"By default, MLflow returns runs as a pandas `DataFrame`. You can get a list of Python objects instead by setting `output_format=\"list\"`, which may be useful to get details about each run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"runs = mlflow.search_runs(\n",
"    exp.experiment_id,\n",
"    filter_string=\"params.num_boost_round='100'\",\n",
"    output_format=\"list\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For instance, you can get the last run in the list of results matching the search criteria with:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"last_run = runs[-1]\n",
"print(last_run)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting metrics, params, artifacts and models from a run\n",
"\n",
"Once you have identified the run you are interested in, you can retrieve its details to explore it further."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Metrics"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"last_run.data.metrics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parameters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the same way, you can get all the parameters of the run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"last_run.data.params"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Artifacts"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For runs that log artifacts, you can also list them. This can't be done from the run object itself; you need an MLflow client object:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"client = mlflow.tracking.MlflowClient()\n",
"client.list_artifacts(last_run.info.run_id)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see in this example, three artifacts are available in the run:\n",
"\n",
"* `feature_importance_weight.json` -> the feature importance of the model we created.\n",
"* `feature_importance_weight.png` -> a plot of the feature importance mentioned above, stored as an image.\n",
"* `classifier` -> the path where the model is stored. Note that this artifact is a directory.\n",
"\n",
"You can download any artifact using the method `download_artifacts`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"file_path = mlflow.artifacts.download_artifacts(\n",
"    run_id=last_run.info.run_id, artifact_path=\"feature_importance_weight.png\"\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"> __MLflow 2.0 advisory__: In legacy versions of MLflow (<2.0), use the method `MlflowClient.download_artifacts()` instead."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the artifact is an image, we can display it in the following way:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import matplotlib.image as img\n",
"\n",
"image = img.imread(file_path)\n",
"plt.imshow(image)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Models\n",
"\n",
"Models can also be logged in the run and then retrieved directly from it. To retrieve a model, you need to know the artifact path where it is stored. The method `list_artifacts` can be used to find artifacts that represent a model, for instance those that are directories."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then download a model by passing the path where it is stored to the `download_artifacts` method:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model_local_path = mlflow.artifacts.download_artifacts(\n",
"    run_id=last_run.info.run_id, artifact_path=artifact_path\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then load the model from the local path using MLflow:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = mlflow.sklearn.load_model(model_local_path)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"MLflow also allows you to skip the download step and reference the model directly from the run. You can achieve this using the following URI format:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model = mlflow.sklearn.load_model(f\"runs:/{last_run.info.run_id}/{artifact_path}\")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"> This will download the model to a temporary folder and load the model from there. Note that loading a model requires you to have all its dependencies already installed in your environment. MLflow will still check whether the required versions are installed before loading."
]
}
],
"metadata": {
"interpreter": {
"hash": "5b8a1be1e2403238145698c0f44c56932b9f9ce2a9391696719cccf625cf1ff9"
},
"kernelspec": {
"display_name": "Python 3.10 - SDK V2",
"language": "python",
"name": "python310-sdkv2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}