tutorials/get-started-notebooks/train-model.ipynb

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Day 1: Train a model\n", "\n", "Learn how a data scientist uses Azure Machine Learning to train a model. In this example, we use the associated credit card dataset to show how you can use Azure Machine Learning for a classification problem. The goal is to predict if a customer has a high likelihood of defaulting on a credit card payment.\n", "\n", "The training script handles the data preparation, then trains and registers a model. This tutorial takes you through steps to submit a cloud-based training job (command job). If you would like to learn more about how to load your data into Azure, see [Create data assets](how-to-create-data-assets.md). \n", "\n", "The steps are:\n", "\n", " * Get a handle to your Azure Machine Learning workspace\n", " * Create a job environment\n", " * Create your training script\n", " * Create and run your command job to run the training script on the compute resource, configured with the appropriate job environment and the data source\n", " * View the output of your training script\n", " * Deploy the newly-trained model as an endpoint\n", " * Call the Azure Machine Learning endpoint for inferencing" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Prerequisites\n", "\n", "* If you opened this notebook from Azure Machine Learning studio, you need a compute instance to run the code. If you don't have a compute instance, select **Create compute** on the toolbar to first create one. You can use all the default settings. \n", "\n", " ![Create compute](./media/create-compute.png)\n", "\n", "* If your Azure Machine Learning workspace is configured with a managed virtual network, you may need to add outbound rules to allow access to the public Python package repositories. For more information, see [Scenario: Access public machine learning packages](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-managed-network#scenario-access-public-machine-learning-packages).\n", "\n", "* If you're seeing this notebook elsewhere, complete [Create resources you need to get started](https://docs.microsoft.com/azure/machine-learning/quickstart-create-resources) to create an Azure Machine Learning workspace and a compute instance.\n", "\n", "## Set your kernel\n", "\n", "* If your compute instance is stopped, start it now. \n", " \n", " ![Start compute](./media/start-compute.png)\n", "\n", "* Once your compute instance is running, make sure the that the kernel, found on the top right, is `Python 3.10 - SDK v2`. If not, use the dropdown to select this kernel.\n", "\n", " ![Set the kernel](./media/set-kernel.png)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "nteract": { "transient": { "deleting": false } } }, "source": [ "## Use a command job to train a model in Azure Machine Learning\n", "\n", "To train a model, you need to submit a *job*. The type of job you'll submit in this tutorial is a *command job*. Azure Machine Learning offers several different types of jobs to train models. Users can select their method of training based on complexity of the model, data size, and training speed requirements. In this tutorial, you'll learn how to submit a *command job* to run a *training script*. \n", "\n", "A command job is a function that allows you to submit a custom training script to train your model. This can also be defined as a custom training job. A command job in Azure Machine Learning is a type of job that runs a script or command in a specified environment. You can use command jobs to train models, process data, or any other custom code you want to execute in the cloud. \n", "\n", "In this tutorial, we'll focus on using a command job to create a custom training job that we'll use to train a model. For any custom training job, the below items are required:\n", "\n", "* compute resource (usually a compute cluster or [serverless compute (preview)](https://learn.microsoft.com/azure/machine-learning/how-to-use-serverless-compute?view=azureml-api-2&tabs=python))\n", "* environment\n", "* data\n", "* command job \n", "* training script\n", "\n", "\n", "In this tutorial we'll provide all these items for our example: creating a classifier to predict customers who have a high likelihood of defaulting on credit card payments.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Create handle to workspace\n", "\n", "Before we dive in the code, you need a way to reference your workspace. You'll create `ml_client` for a handle to the workspace. You'll then use `ml_client` to manage resources and jobs.\n", "\n", "In the next cell, enter your Subscription ID, Resource Group name and Workspace name. To find these values:\n", "\n", "1. In the upper right Azure Machine Learning studio toolbar, select your workspace name.\n", "1. Copy the value for workspace, resource group and subscription ID into the code.\n", "1. You'll need to copy one value, close the area and paste, then come back for the next one." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1677262283435 }, "name": "credential" }, "outputs": [], "source": [ "from azure.ai.ml import MLClient\n", "from azure.identity import DefaultAzureCredential\n", "\n", "# authenticate\n", "credential = DefaultAzureCredential()\n", "\n", "SUBSCRIPTION = \"<SUBSCRIPTION_ID>\"\n", "RESOURCE_GROUP = \"<RESOURCE_GROUP>\"\n", "WS_NAME = \"<AML_WORKSPACE_NAME>\"\n", "# Get a handle to the workspace\n", "ml_client = MLClient(\n", " credential=credential,\n", " subscription_id=SUBSCRIPTION,\n", " resource_group_name=RESOURCE_GROUP,\n", " workspace_name=WS_NAME,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> [!NOTE]\n", "> Creating MLClient will not connect to the workspace. The client initialization is lazy, it will wait for the first time it needs to make a call (this will happen in the next code cell)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Verify that the handle works correctly.\n", "# If you ge an error here, modify your SUBSCRIPTION, RESOURCE_GROUP, and WS_NAME in the previous cell.\n", "ws = ml_client.workspaces.get(WS_NAME)\n", "print(ws.location, \":\", ws.resource_group)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Create a job environment\n", "\n", "To run your Azure Machine Learning job on your compute resource, you need an [environment](https://learn.microsoft.com/articles/machine-learning/concept-environments). An environment lists the software runtime and libraries that you want installed on the compute where you’ll be training. It's similar to your python environment on your local machine.\n", "\n", "Azure Machine Learning provides many curated or ready-made environments, which are useful for common training and inference scenarios. \n", "\n", "In this example, you'll create a custom conda environment for your jobs, using a conda yaml file.\n", "\n", "First, create a directory to store the file in." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1677262301389 }, "name": "dependencies_dir" }, "outputs": [], "source": [ "import os\n", "\n", "dependencies_dir = \"./dependencies\"\n", "os.makedirs(dependencies_dir, exist_ok=True)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The cell below uses IPython magic to write the conda file into the directory you just created." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "name": "write_model" }, "outputs": [], "source": [ "%%writefile {dependencies_dir}/conda.yaml\n", "name: model-env\n", "channels:\n", " - conda-forge\n", "dependencies:\n", " - python=3.8\n", " - numpy=1.21.2\n", " - pip=21.2.4\n", " - scikit-learn=1.0.2\n", " - scipy=1.7.1\n", " - pandas>=1.1,<1.2\n", " - pip:\n", " - inference-schema[numpy-support]==1.3.0\n", " - mlflow==2.8.0\n", " - mlflow-skinny==2.8.0\n", " - azureml-mlflow==1.51.0\n", " - psutil>=5.8,<5.9\n", " - tqdm>=4.59,<4.60\n", " - ipykernel~=6.0\n", " - matplotlib" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The specification contains some usual packages, that you'll use in your job (numpy, pip).\n", "\n", "Reference this *yaml* file to create and register this custom environment in your workspace:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1677262314695 }, "name": "custom_env_name" }, "outputs": [], "source": [ "from azure.ai.ml.entities import Environment\n", "\n", "custom_env_name = \"aml-scikit-learn\"\n", "\n", "custom_job_env = Environment(\n", " name=custom_env_name,\n", " description=\"Custom environment for Credit Card Defaults job\",\n", " tags={\"scikit-learn\": \"1.0.2\"},\n", " conda_file=os.path.join(dependencies_dir, \"conda.yaml\"),\n", " image=\"mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest\",\n", ")\n", "custom_job_env = ml_client.environments.create_or_update(custom_job_env)\n", "\n", "print(\n", " f\"Environment with name {custom_job_env.name} is registered to workspace, the environment version is {custom_job_env.version}\"\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Configure a training job using the command function\n", "\n", "You create an Azure Machine Learning *command job* to train a model for credit default prediction. The command job runs a *training script* in a specified environment on a specified compute resource. You've already created the environment and the compute cluster. Next you'll create the training script. In our specific case, we're training our dataset to produce a classifier using the `GradientBoostingClassifier` model. \n", "\n", "The *training script* handles the data preparation, training and registering of the trained model. The method `train_test_split` handles splitting the dataset into test and training data. In this tutorial, you'll create a Python training script. \n", "\n", "Command jobs can be run from CLI, Python SDK, or studio interface. In this tutorial, you'll use the Azure Machine Learning Python SDK v2 to create and run the command job.\n", "\n", "## Create training script\n", "\n", "Let's start by creating the training script - the *main.py* python file.\n", "\n", "First create a source folder for the script:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1677262322022 }, "name": "train_src_dir" }, "outputs": [], "source": [ "import os\n", "\n", "train_src_dir = \"./src\"\n", "os.makedirs(train_src_dir, exist_ok=True)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This script handles the preprocessing of the data, splitting it into test and train data. It then consumes this data to train a tree based model and return the output model. \n", "\n", "[MLFlow](https://learn.microsoft.com/articles/machine-learning/concept-mlflow) is used to log the parameters and metrics during our job. The MLFlow package allows you to keep track of metrics and results for each model Azure trains. We'll be using MLFlow to first get the best model for our data, then we'll view the model's metrics on the Azure studio. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "name": "write_main" }, "outputs": [], "source": [ "%%writefile {train_src_dir}/main.py\n", "import os\n", "import argparse\n", "import pandas as pd\n", "import mlflow\n", "import mlflow.sklearn\n", "from sklearn.ensemble import GradientBoostingClassifier\n", "from sklearn.metrics import classification_report\n", "from sklearn.model_selection import train_test_split\n", "\n", "def main():\n", " \"\"\"Main function of the script.\"\"\"\n", "\n", " # input and output arguments\n", " parser = argparse.ArgumentParser()\n", " parser.add_argument(\"--data\", type=str, help=\"path to input data\")\n", " parser.add_argument(\"--test_train_ratio\", type=float, required=False, default=0.25)\n", " parser.add_argument(\"--n_estimators\", required=False, default=100, type=int)\n", " parser.add_argument(\"--learning_rate\", required=False, default=0.1, type=float)\n", " parser.add_argument(\"--registered_model_name\", type=str, help=\"model name\")\n", " args = parser.parse_args()\n", " \n", " # Start Logging\n", " mlflow.start_run()\n", "\n", " # enable autologging\n", " mlflow.sklearn.autolog()\n", "\n", " ###################\n", " #<prepare the data>\n", " ###################\n", " print(\" \".join(f\"{k}={v}\" for k, v in vars(args).items()))\n", "\n", " print(\"input data:\", args.data)\n", " \n", " credit_df = pd.read_csv(args.data, header=1, index_col=0)\n", "\n", " mlflow.log_metric(\"num_samples\", credit_df.shape[0])\n", " mlflow.log_metric(\"num_features\", credit_df.shape[1] - 1)\n", "\n", " #Split train and test datasets\n", " train_df, test_df = train_test_split(\n", " credit_df,\n", " test_size=args.test_train_ratio,\n", " )\n", " ####################\n", " #</prepare the data>\n", " ####################\n", "\n", " ##################\n", " #<train the model>\n", " ##################\n", " # Extracting the label column\n", " y_train = train_df.pop(\"default payment next month\")\n", "\n", " # convert the dataframe values to array\n", " X_train = train_df.values\n", "\n", " # Extracting the label column\n", " y_test = test_df.pop(\"default payment next month\")\n", "\n", " # convert the dataframe values to array\n", " X_test = test_df.values\n", "\n", " print(f\"Training with data of shape {X_train.shape}\")\n", "\n", " clf = GradientBoostingClassifier(\n", " n_estimators=args.n_estimators, learning_rate=args.learning_rate\n", " )\n", " clf.fit(X_train, y_train)\n", "\n", " y_pred = clf.predict(X_test)\n", "\n", " print(classification_report(y_test, y_pred))\n", " ###################\n", " #</train the model>\n", " ###################\n", "\n", " ##########################\n", " #<save and register model>\n", " ##########################\n", " # Registering the model to the workspace\n", " print(\"Registering the model via MLFlow\")\n", " mlflow.sklearn.log_model(\n", " sk_model=clf,\n", " registered_model_name=args.registered_model_name,\n", " artifact_path=args.registered_model_name,\n", " )\n", "\n", " # Saving the model to a file\n", " mlflow.sklearn.save_model(\n", " sk_model=clf,\n", " path=os.path.join(args.registered_model_name, \"trained_model\"),\n", " )\n", " ###########################\n", " #</save and register model>\n", " ###########################\n", " \n", " # Stop Logging\n", " mlflow.end_run()\n", "\n", "if __name__ == \"__main__\":\n", " main()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "In this script, once the model is trained, the model file is saved and registered to the workspace. Registering your model allows you to store and version your models in the Azure cloud, in your workspace. Once you register a model, you can find all other registered model in one place in the Azure Studio called the model registry. The model registry helps you organize and keep track of your trained models. \n", "\n", "## Configure the command\n", "\n", "Now that you have a script that can perform the classification task, use the general purpose **command** that can run command line actions. This command line action can be directly calling system commands or by running a script. \n", "\n", "Here, create input variables to specify the input data, split ratio, learning rate and registered model name. The command script will:\n", "* Use the environment created earlier - you can use the `@latest` notation to indicate the latest version of the environment when the command is run.\n", "* Configure the command line action itself - `python main.py` in this case. The inputs/outputs are accessible in the command via the `${{ ... }}` notation.\n", "* Since a compute resource was not specified, the script will be run on a [serverless compute cluster](https://learn.microsoft.com/azure/machine-learning/how-to-use-serverless-compute?view=azureml-api-2&tabs=python) that is automatically created." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1677262332367 }, "name": "registered_model_name" }, "outputs": [], "source": [ "from azure.ai.ml import command\n", "from azure.ai.ml import Input\n", "\n", "registered_model_name = \"credit_defaults_model\"\n", "\n", "job = command(\n", " inputs=dict(\n", " data=Input(\n", " type=\"uri_file\",\n", " path=\"https://azuremlexamples.blob.core.windows.net/datasets/credit_card/default_of_credit_card_clients.csv\",\n", " ),\n", " test_train_ratio=0.2,\n", " learning_rate=0.25,\n", " registered_model_name=registered_model_name,\n", " ),\n", " code=\"./src/\", # location of source code\n", " command=\"python main.py --data ${{inputs.data}} --test_train_ratio ${{inputs.test_train_ratio}} --learning_rate ${{inputs.learning_rate}} --registered_model_name ${{inputs.registered_model_name}}\",\n", " environment=\"aml-scikit-learn@latest\",\n", " display_name=\"credit_default_prediction\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Submit the job \n", "\n", "It's now time to submit the job to run in Azure Machine Learning studio. This time you'll use `create_or_update` on `ml_client`. `ml_client` is a client class that allows you to connect to your Azure subscription using Python and interact with Azure Machine Learning services. `ml_client` allows you to submit your jobs using Python." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "gather": { "logged": 1677262345449 }, "name": "create_job" }, "outputs": [], "source": [ "ml_client.create_or_update(job)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## View job output and wait for job completion\n", "\n", "View the job in Azure Machine Learning studio by selecting the link in the output of the previous cell. The output of this job will look like this in the Azure Machine Learning studio. Explore the tabs for various details like metrics, outputs etc. Once completed, the job will register a model in your workspace as a result of training. \n", "\n", "![Screenshot shows the overview page for the job.](./media/view-job.gif)\n", "\n", "> [!IMPORTANT]\n", "> Wait until the status of the job is complete before returning to this notebook to continue. The job will take 2 to 3 minutes to run. It could take longer (up to 10 minutes) if the compute cluster has been scaled down to zero nodes and custom environment is still building.\n", "\n", "When you run the cell, the notebook output shows a link to the job's details page on Azure Studio. Alternatively, you can also select Jobs on the left navigation menu. A job is a grouping of many runs from a specified script or piece of code. Information for the run is stored under that job. The details page gives an overview of the job, the time it took to run, when it was created, etc. The page also has tabs to other information about the job such as metrics, Outputs + logs, and code. Listed below are the tabs available in the job's details page:\n", "\n", "* Overview: The overview section provides basic information about the job, including its status, start and end times, and the type of job that was run\n", "* Inputs: The input section lists the data and code that were used as inputs for the job. This section can include datasets, scripts, environment configurations, and other resources that were used during training. \n", "* Outputs + logs: The Outputs + logs tab contains logs generated while the job was running. This tab assists in troubleshooting if anything goes wrong with your training script or model creation.\n", "* Metrics: The metrics tab showcases key performance metrics from your model such as training score, f1 score, and precision score. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Clean up resources\n", "\n", "If you plan to continue now to other tutorials, skip to [Next steps](#next-steps).\n", "\n", "### Stop compute instance\n", "\n", "If you're not going to use it now, stop the compute instance:\n", "\n", "1. In the studio, in the left navigation area, select **Compute**.\n", "1. In the top tabs, select **Compute instances**\n", "1. Select the compute instance in the list.\n", "1. On the top toolbar, select **Stop**.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Next Steps\n", "Learn about deploying a model \n", "\n", "[Deploy a model](https://learn.microsoft.com/articles/machine-learning/tutorial-deploy-model).\n", "\n", "This tutorial used an online data file. To learn more about other ways to access data, see [Tutorial: Upload, access and explore your data in Azure Machine Learning](https://learn.microsoft.com/articles/machine-learning/tutorial-explore-data).\n", "\n", "If you would like to learn more about different ways to train models in Azure Machine Learning, see [What is automated machine learning (AutoML)?](https://learn.microsoft.com/articles/machine-learning/concept-automated-ml). Automated ML is a supplemental tool to reduce the amount of time a data scientist spends finding a model that works best with their data.\n", "\n", "If you would like more examples similar to this tutorial, see [**Samples**](https://learn.microsoft.com/articles/machine-learning/quickstart-create-resources#learn-from-sample-notebooks) section of studio. These same samples are available at our [GitHub examples page.](https://github.com/Azure/azureml-examples) The examples include complete Python Notebooks that you can run code and learn to train a model. You can modify and run existing scripts from the samples, containing scenarios including classification, natural language processing, and anomaly detection. \n", "\n", "To train models by creating your own custom environments using a [docker image,](how-to-manage-environments-v2.md#create-an-environment-from-a-docker-build-context)." ] } ], "metadata": { "categories": [ "SDK v2", "tutorials" ], "kernel_info": { "name": "python310-sdkv2" }, "kernelspec": { "display_name": "Python 3.10 - SDK v2", "language": "python", "name": "python310-sdkv2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.6" }, "microsoft": { "host": { "AzureML": { "notebookHasBeenCompleted": true } }, "ms_spell_check": { "ms_spell_check_language": "en" } }, "nteract": { "version": "nteract-front-end@1.0.0" } }, "nbformat": 4, "nbformat_minor": 2 }

tutorials/get-started-notebooks/train-model.ipynb (592 lines of code) (raw):