sdk/python/foundation-models/system/finetune/question-answering/extractive-qa.ipynb

{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Question Answering - Extractive Q&A with the SQUAD (Wikipedia Q&A) dataset\n", "\n", "This sample shows how to use `question-answering` components from the `azureml` system registry to fine tune a model to extract answers from a given context using the SQUAD dataset. We then deploy it to an online endpoint for real time inference. The model is trained on tiny sample of the dataset with a small number of epochs to illustrate the fine tuning approach.\n", "\n", "### Training data\n", "We will use the [SQUAD](https://huggingface.co/datasets/squad) dataset. The [original source](https://rajpurkar.github.io/SQuAD-explorer/) of dataset describes it as follows: _\"Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.\"_\n", "\n", "### Model\n", "Models that can perform the `fill-mask` task are generally good foundation models to fine tune for `question-answering`, specifically the extractive Q&A type. We will use the `bert-base-uncased` model in this notebook. If you opened this notebook from a specific model card, remember to replace the specific model name. Optionally, if you need to fine tune a model that is available on HuggingFace, but not available in `azureml` system registry, you can either [import](https://github.com/Azure/azureml-examples) the model or use the `huggingface_id` parameter instruct the components to pull the model directly from HuggingFace. \n", "\n", "### Outline\n", "* Setup pre-requisites such as compute.\n", "* Pick a model to fine tune.\n", "* Pick and explore training data.\n", "* Configure the fine tuning job.\n", "* Run the fine tuning job.\n", "* Register the fine tuned model. \n", "* Deploy the fine tuned model for real time inference.\n", "* Clean up resources. \n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Setup pre-requisites\n", "* Install dependencies\n", "* Connect to AzureML Workspace. Learn more at [set up SDK authentication](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?tabs=sdk). Replace `<WORKSPACE_NAME>`, `<RESOURCE_GROUP>` and `<SUBSCRIPTION_ID>` below.\n", "* Connect to `azureml` system registry\n", "* Set an optional experiment name\n", "* Check or create compute. A single GPU node can have multiple GPU cards. For example, in one node of `Standard_NC24rs_v3` there are 4 NVIDIA V100 GPUs while in `Standard_NC12s_v3`, there are 2 NVIDIA V100 GPUs. Refer to the [docs](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-gpu) for this information. The number of GPU cards per node is set in the param `gpus_per_node` below. Setting this value correctly will ensure utilization of all GPUs in the node. The recommended GPU compute SKUs can be found [here](https://learn.microsoft.com/en-us/azure/virtual-machines/ncv3-series) and [here](https://learn.microsoft.com/en-us/azure/virtual-machines/ndv2-series)." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Install dependencies by running below cell. This is not an optional step if running in a new environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install azure-ai-ml\n", "%pip install azure-identity\n", "%pip install datasets==2.9.0\n", "%pip install mlflow\n", "%pip install azureml-mlflow" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml import MLClient\n", "from azure.identity import (\n", " DefaultAzureCredential,\n", " InteractiveBrowserCredential,\n", " ClientSecretCredential,\n", ")\n", "from azure.ai.ml.entities import AmlCompute\n", "import time\n", "\n", "try:\n", " credential = DefaultAzureCredential()\n", " credential.get_token(\"https://management.azure.com/.default\")\n", "except Exception as ex:\n", " credential = InteractiveBrowserCredential()\n", "\n", "try:\n", " workspace_ml_client = MLClient.from_config(credential=credential)\n", "except:\n", " workspace_ml_client = MLClient(\n", " credential,\n", " subscription_id=\"<SUBSCRIPTION_ID>\",\n", " resource_group_name=\"<RESOURCE_GROUP>\",\n", " workspace_name=\"<WORKSPACE_NAME>\",\n", " )\n", "\n", "# the models, fine tuning pipelines and environments are available in the AzureML system registry, \"azureml\"\n", "registry_ml_client = MLClient(credential, registry_name=\"azureml\")\n", "\n", "experiment_name = \"question-answering-extractive-qna\"\n", "\n", "# genrating a unique timestamp that can be used for names and versions that need to be unique\n", "timestamp = str(int(time.time()))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Pick a foundation model to fine tune\n", "\n", "Models that support `fill-mask` tasks are good candidates to fine tune for extractive Q&A style `question answering`. You can browse these models in the Model Catalog in the AzureML Studio, filtering by the `fill-mask` task. In this example, we use the `bert-base-uncased` model. If you have opened this notebook for a different model, replace the model name and version accordingly. \n", "\n", "Note the model id property of the model. This will be passed as input to the fine tuning job. This is also available as the `Asset ID` field in model details page in AzureML Studio Model Catalog. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model_name = \"bert-base-uncased\"\n", "foundation_model = registry_ml_client.models.get(model_name, label=\"latest\")\n", "print(\n", " \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for fine tuning\".format(\n", " foundation_model.name, foundation_model.version, foundation_model.id\n", " )\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Create a compute to be used with the job\n", "\n", "The finetune job works `ONLY` with `GPU` compute. The size of the compute depends on how big the model is and in most cases it becomes tricky to identify the right compute for the job. In this cell, we guide the user to select the right compute for the job.\n", "\n", "`NOTE1` The computes listed below work with the most optimized configuration. Any changes to the configuration might lead to Cuda Out Of Memory error. In such cases, try to upgrade the compute to a bigger compute size.\n", "\n", "`NOTE2` While selecting the compute_cluster_size below, make sure the compute is available in your resource group. If a particular compute is not available you can make a request to get access to the compute resources." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import ast\n", "\n", "if \"computes_allow_list\" in foundation_model.tags:\n", " computes_allow_list = ast.literal_eval(\n", " foundation_model.tags[\"computes_allow_list\"]\n", " ) # convert string to python list\n", " print(f\"Please create a compute from the above list - {computes_allow_list}\")\n", "else:\n", " computes_allow_list = None\n", " print(\"Computes allow list is not part of model tags\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If you have a specific compute size to work with change it here. By default we use the 1 x V100 compute from the above list\n", "compute_cluster_size = \"Standard_NC6s_v3\"\n", "\n", "# If you already have a gpu cluster, mention it here. Else will create a new one with the name 'gpu-cluster-big'\n", "compute_cluster = \"gpu-cluster-big\"\n", "\n", "try:\n", " compute = workspace_ml_client.compute.get(compute_cluster)\n", " print(\"The compute cluster already exists! Reusing it for the current run\")\n", "except Exception as ex:\n", " print(\n", " f\"Looks like the compute cluster doesn't exist. Creating a new one with compute size {compute_cluster_size}!\"\n", " )\n", " try:\n", " print(\"Attempt #1 - Trying to create a dedicated compute\")\n", " compute = AmlCompute(\n", " name=compute_cluster,\n", " size=compute_cluster_size,\n", " tier=\"Dedicated\",\n", " max_instances=2, # For multi node training set this to an integer value more than 1\n", " )\n", " workspace_ml_client.compute.begin_create_or_update(compute).wait()\n", " except Exception as e:\n", " try:\n", " print(\n", " \"Attempt #2 - Trying to create a low priority compute. Since this is a low priority compute, the job could get pre-empted before completion.\"\n", " )\n", " compute = AmlCompute(\n", " name=compute_cluster,\n", " size=compute_cluster_size,\n", " tier=\"LowPriority\",\n", " max_instances=2, # For multi node training set this to an integer value more than 1\n", " )\n", " workspace_ml_client.compute.begin_create_or_update(compute).wait()\n", " except Exception as e:\n", " print(e)\n", " raise ValueError(\n", " f\"WARNING! Compute size {compute_cluster_size} not available in workspace\"\n", " )\n", "\n", "\n", "# Sanity check on the created compute\n", "compute = workspace_ml_client.compute.get(compute_cluster)\n", "if compute.provisioning_state.lower() == \"failed\":\n", " raise ValueError(\n", " f\"Provisioning failed, Compute '{compute_cluster}' is in failed state. \"\n", " f\"please try creating a different compute\"\n", " )\n", "\n", "if computes_allow_list is not None:\n", " computes_allow_list_lower_case = [x.lower() for x in computes_allow_list]\n", " if compute.size.lower() not in computes_allow_list_lower_case:\n", " raise ValueError(\n", " f\"VM size {compute.size} is not in the allow-listed computes for finetuning\"\n", " )\n", "else:\n", " # Computes with K80 GPUs are not supported\n", " unsupported_gpu_vm_list = [\n", " \"standard_nc6\",\n", " \"standard_nc12\",\n", " \"standard_nc24\",\n", " \"standard_nc24r\",\n", " ]\n", " if compute.size.lower() in unsupported_gpu_vm_list:\n", " raise ValueError(\n", " f\"VM size {compute.size} is currently not supported for finetuning\"\n", " )\n", "\n", "\n", "# This is the number of GPUs in a single node of the selected 'vm_size' compute.\n", "# Setting this to less than the number of GPUs will result in underutilized GPUs, taking longer to train.\n", "# Setting this to more than the number of GPUs will result in an error.\n", "gpu_count_found = False\n", "workspace_compute_sku_list = workspace_ml_client.compute.list_sizes()\n", "available_sku_sizes = []\n", "for compute_sku in workspace_compute_sku_list:\n", " available_sku_sizes.append(compute_sku.name)\n", " if compute_sku.name.lower() == compute.size.lower():\n", " gpus_per_node = compute_sku.gpus\n", " gpu_count_found = True\n", "# if gpu_count_found not found, then print an error\n", "if gpu_count_found:\n", " print(f\"Number of GPU's in compute {compute.size}: {gpus_per_node}\")\n", "else:\n", " raise ValueError(\n", " f\"Number of GPU's in compute {compute.size} not found. Available skus are: {available_sku_sizes}.\"\n", " f\"This should not happen. Please check the selected compute cluster: {compute_cluster} and try again.\"\n", " )" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 4. Pick the dataset for fine-tuning the model\n", "\n", "We use the [SQUAD](https://huggingface.co/datasets/squad) dataset. The next few cells show basic data preparation for fine tuning:\n", "* Visualize some data rows. Take note of the dataset fields: `question`, `context`, `answers`, `id` and `title`. The `answers` field has `start_key` and `text` fields in json format inside the `answers` field . The keys `question` and `context`, `answers`, `answer_start` and `text` are the relevant fields that need to be mapped to the parameters of the fine tuning pipeline.\n", "* The dataset does not have a test split, split test into two halves, one for test and other for validation.\n", "* We want this sample to run quickly, so save smaller `train` and `validation` files containing 5% of the original. This means the fine tuned model will have lower accuracy, hence it should not be put to real-world use. \n", "\n", "##### Here is an example of how the data should look like\n", "\n", "| Question | Context | Answers |\n", "| :- | :- | :- |\n", "| What does Phosphorylation do? | After a chloroplast polypeptide is synthesized on a ribosome in the cytosol, an enzyme specific to chloroplast proteins phosphorylates, or adds a phosphate group to many (but not all) of them in their transit sequences. Phosphorylation helps many proteins bind the polypeptide, keeping it from folding prematurely. This is important because it prevents chloroplast proteins from assuming their active form and carrying out their chloroplast functions in the wrong place\\u2014the cytosol. At the same time, they have to keep just enough shape so that they can be recognized by the chloroplast. These proteins also help the polypeptide get imported into the chloroplast. | {\"text”: [\"helps many proteins bind the polypeptide\",\"helps many proteins bind the polypeptide\", \"helps many proteins bind the polypeptide\"], \"answer_start\": [236,236,236]} |\n", "| What is the basic unit of organization within the UMC? | The Annual Conference, roughly the equivalent of a diocese in the Anglican Communion and the Roman Catholic Church or a synod in some Lutheran denominations such as the Evangelical Lutheran Church in America, is the basic unit of organization within the UMC. The term Annual Conference is often used to refer to the geographical area it covers as well as the frequency of meeting. Clergy are members of their Annual Conference rather than of any local congregation, and are appointed to a local church or other charge annually by the conference's resident Bishop at the meeting of the Annual Conference. In many ways, the United Methodist Church operates in a connectional organization of the Annual Conferences, and actions taken by one conference are not binding upon another. | {\"text\": [\"The Annual Conference\",\"synod\",\"The Annual Conference\"],\"answer_start\": [0,120,0]} |\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# download the dataset using the helper script. This needs datasets library: https://pypi.org/project/datasets/\n", "import os\n", "\n", "exit_status = os.system(\"python ./download-dataset.py --download_dir squad-dataset\")\n", "if exit_status != 0:\n", " raise Exception(\"Error downloading dataset\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load the train.jsonl and validation.jsonl files from the ./squad-dataset/ folder and show first 5 rows\n", "import pandas as pd\n", "\n", "pd.set_option(\n", " \"display.max_colwidth\", 0\n", ") # set the max column width to 0 to display the full text\n", "train_df = pd.read_json(\"squad-dataset/train.jsonl\", lines=True)\n", "validation_df = pd.read_json(\"squad-dataset/validation.jsonl\", lines=True)\n", "train_df.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# change the frac parameter to control the number of examples to be saved\n", "# save a fraction of the rows from the validation and test dataframes into files with small_ prefix in the ./squad-dataset folder\n", "frac = 1\n", "train_df.sample(frac=frac).to_json(\n", " \"./squad-dataset/small_train.jsonl\", orient=\"records\", lines=True\n", ")\n", "# the original dataset does not have a test split, so split the validation dataframe into validation and test dataframes equally\n", "validation_df, test_df = (\n", " validation_df[: len(validation_df) // 2],\n", " validation_df[len(validation_df) // 2 :],\n", ")\n", "# save 5% of the rows from the validation and test dataframes into files with small_ prefix in the ./squad-dataset folder\n", "validation_df.sample(frac=frac).to_json(\n", " \"./squad-dataset/small_validation.jsonl\", orient=\"records\", lines=True\n", ")\n", "test_df.sample(frac=frac).to_json(\n", " \"./squad-dataset/small_test.jsonl\", orient=\"records\", lines=True\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Submit the fine tuning job using the the model and data as inputs\n", " \n", "Create the job that uses the `question-answering` pipeline component. [Learn more](https://github.com/Azure/azureml-assets/blob/main/training/finetune_acft_hf_nlp/components/pipeline_components/question_answering/README.md) about all the parameters supported for fine tuning." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define finetune parameters\n", "\n", "Finetune parameters can be grouped into 2 categories - training parameters, optimization parameters\n", "\n", "Training parameters define the training aspects such as - \n", "1. the optimizer, scheduler to use\n", "2. the metric to optimize the finetune\n", "3. number of training steps and the batch size\n", "and so on\n", "\n", "Optimization parameters help in optimizing the GPU memory and effectively using the compute resources. Below are few of the parameters that belong to this category. _The optimization parameters differs for each model and are packaged with the model to handle these variations._\n", "1. enable the deepspeed, ORT and LoRA\n", "2. enable mixed precision training\n", "2. enable multi-node training " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Training parameters\n", "training_parameters = dict(\n", " num_train_epochs=3,\n", " per_device_train_batch_size=1,\n", " per_device_eval_batch_size=1,\n", " learning_rate=2e-5,\n", " metric_for_best_model=\"exact\",\n", ")\n", "print(f\"The following training parameters are enabled - {training_parameters}\")\n", "\n", "# Optimization parameters - As these parameters are packaged with the model itself, lets retrieve those parameters\n", "if \"model_specific_defaults\" in foundation_model.tags:\n", " optimization_parameters = ast.literal_eval(\n", " foundation_model.tags[\"model_specific_defaults\"]\n", " ) # convert string to python dict\n", "else:\n", " optimization_parameters = dict(\n", " apply_lora=\"true\", apply_deepspeed=\"true\", apply_ort=\"true\"\n", " )\n", "print(f\"The following optimizations are enabled - {optimization_parameters}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.dsl import pipeline\n", "from azure.ai.ml.entities import CommandComponent, PipelineComponent, Job, Component\n", "from azure.ai.ml import PyTorchDistribution, Input\n", "\n", "# fetch the pipeline component\n", "pipeline_component_func = registry_ml_client.components.get(\n", " name=\"question_answering_pipeline\", label=\"latest\"\n", ")\n", "\n", "\n", "# define the pipeline job\n", "@pipeline()\n", "def create_pipeline():\n", " question_answering_pipeline = pipeline_component_func(\n", " # specify the foundation model available in the azureml system registry id identified in step #3\n", " mlflow_model_path=foundation_model.id,\n", " # huggingface_id = 'bert-base-uncased', # if you want to use a huggingface model, uncomment this line and comment the above line\n", " compute_model_import=compute_cluster,\n", " compute_preprocess=compute_cluster,\n", " compute_finetune=compute_cluster,\n", " compute_model_evaluation=compute_cluster,\n", " # map the dataset splits to parameters\n", " train_file_path=Input(\n", " type=\"uri_file\", path=\"./squad-dataset/small_train.jsonl\"\n", " ),\n", " validation_file_path=Input(\n", " type=\"uri_file\", path=\"./squad-dataset/small_validation.jsonl\"\n", " ),\n", " test_file_path=Input(type=\"uri_file\", path=\"./squad-dataset/small_test.jsonl\"),\n", " evaluation_config=Input(\n", " type=\"uri_file\", path=\"./question-answering-config.json\"\n", " ),\n", " # The following parameters map to the dataset fields\n", " # the question whose answer needs to be extracted from the provided context\n", " # question_key parameter maps to the \"question\" field in the SQuAD dataset\n", " question_key=\"question\",\n", " # the context that contains the answer to the question\n", " # context_key parameter maps to the \"context\" field in the SQuAD dataset\n", " context_key=\"context\",\n", " # The value of this field is text in json format with two nested keys, answer_start_key and answer_text_key with their corresponding values\n", " # answers_key parameter maps to the \"answers\" field in the SQuAD dataset\n", " answers_key=\"answers\",\n", " # Refers to the position where the answer beings in context. Needs a value that maps to a nested key in the values of the answers_key parameter.\n", " # in the SQuAD dataset, the answer_start_key maps \"answer_start\" under \"answer\"\n", " answer_start_key=\"answer_start\",\n", " # Contains the answer to the question. Needs a value that maps to a nested key in the values of the answers_key parameter\n", " # in the SQuAD dataset, the answer_text_key maps to \"text\" under \"answer\"\n", " answer_text_key=\"text\",\n", " # training settings\n", " number_of_gpu_to_use_finetuning=gpus_per_node, # set to the number of GPUs available in the compute\n", " **training_parameters,\n", " **optimization_parameters\n", " )\n", " return {\n", " # map the output of the fine tuning job to the output of the pipeline job so that we can easily register the fine tuned model\n", " # registering the model is required to deploy the model to an online or batch endpoint\n", " \"trained_model\": question_answering_pipeline.outputs.mlflow_model_folder\n", " }\n", "\n", "\n", "pipeline_object = create_pipeline()\n", "\n", "# don't use cached results from previous jobs\n", "pipeline_object.settings.force_rerun = True\n", "\n", "# set continue on step failure to False\n", "pipeline_object.settings.continue_on_step_failure = False" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Validate the pipeline against data and compute" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# comment this section to disable validation\n# Makesure to turn off the validation if your data is too big. Alternatively, validate the run with small data before launching runs with large datasets\n", "\n", "%run ../pipeline_validations/common.ipynb\n", "\n", "validate_pipeline(pipeline_object, workspace_ml_client)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Submit the job" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# submit the pipeline job\n", "pipeline_job = workspace_ml_client.jobs.create_or_update(\n", " pipeline_object, experiment_name=experiment_name\n", ")\n", "# wait for the pipeline job to complete\n", "workspace_ml_client.jobs.stream(pipeline_job.name)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 6. Review training and evaluation metrics\n", "Viewing the job in AzureML studio is the best way to analyze logs, metrics and outputs of jobs. You can create custom charts and compare metics across different jobs. See https://learn.microsoft.com/en-us/azure/machine-learning/how-to-log-view-metrics?tabs=interactive#view-jobsruns-information-in-the-studio to learn more. \n", "\n", "However, we may need to access and review metrics programmatically for which we will use MLflow, which is the recommended client for logging and querying metrics." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import mlflow, json\n", "\n", "mlflow_tracking_uri = workspace_ml_client.workspaces.get(\n", " workspace_ml_client.workspace_name\n", ").mlflow_tracking_uri\n", "mlflow.set_tracking_uri(mlflow_tracking_uri)\n", "# concat 'tags.mlflow.rootRunId=' and pipeline_job.name in single quotes as filter variable\n", "filter = \"tags.mlflow.rootRunId='\" + pipeline_job.name + \"'\"\n", "runs = mlflow.search_runs(\n", " experiment_names=[experiment_name], filter_string=filter, output_format=\"list\"\n", ")\n", "training_run = None\n", "evaluation_run = None\n", "# get the training and evaluation runs.\n", "# using a hacky way till 'Bug 2320997: not able to show eval metrics in FT notebooks - mlflow client now showing display names' is fixed\n", "for run in runs:\n", " # check if run.data.metrics.epoch exists\n", " if \"epoch\" in run.data.metrics:\n", " training_run = run\n", " # else, check if run.data.metrics.accuracy exists\n", " elif \"exact_match\" in run.data.metrics:\n", " evaluation_run = run" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if training_run:\n", " print(\"Training metrics:\\n\\n\")\n", " print(json.dumps(training_run.data.metrics, indent=2))\n", "else:\n", " print(\"No Training job found\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "if evaluation_run:\n", " print(\"Evaluation metrics:\\n\\n\")\n", " print(json.dumps(evaluation_run.data.metrics, indent=2))\n", "else:\n", " print(\"No Evaluation job found\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 7. Register the fine tuned model with the workspace\n", "\n", "We will register the model from the output of the fine tuning job. This will track lineage between the fine tuned model and the fine tuning job. The fine tuning job, further, tracks lineage to the foundation model, data and training code." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from azure.ai.ml.entities import Model\n", "from azure.ai.ml.constants import AssetTypes\n", "\n", "# check if the `trained_model` output is available\n", "print(\"pipeline job outputs: \", workspace_ml_client.jobs.get(pipeline_job.name).outputs)\n", "\n", "# fetch the model from pipeline job output - not working, hence fetching from fine tune child job\n", "model_path_from_job = \"azureml://jobs/{0}/outputs/{1}\".format(\n", " pipeline_job.name, \"trained_model\"\n", ")\n", "\n", "finetuned_model_name = model_name + \"-extractive-qna\"\n", "finetuned_model_name = finetuned_model_name.replace(\"/\", \"-\")\n", "print(\"path to register model: \", model_path_from_job)\n", "prepare_to_register_model = Model(\n", " path=model_path_from_job,\n", " type=AssetTypes.MLFLOW_MODEL,\n", " name=finetuned_model_name,\n", " version=timestamp, # use timestamp as version to avoid version conflict\n", " description=model_name + \" fine tuned model for extractive Q&A\",\n", ")\n", "print(\"prepare to register model: \\n\", prepare_to_register_model)\n", "# register the model from pipeline job output\n", "registered_model = workspace_ml_client.models.create_or_update(\n", " prepare_to_register_model\n", ")\n", "print(\"registered model: \\n\", registered_model)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 8. Deploy the fine tuned model to an online endpoint\n", "Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import time, sys\n", "from azure.ai.ml.entities import (\n", " ManagedOnlineEndpoint,\n", " ManagedOnlineDeployment,\n", " ProbeSettings,\n", ")\n", "\n", "# Create online endpoint - endpoint names need to be unique in a region, hence using timestamp to create unique endpoint name\n", "\n", "online_endpoint_name = \"ext-qna-\" + timestamp\n", "# create an online endpoint\n", "endpoint = ManagedOnlineEndpoint(\n", " name=online_endpoint_name,\n", " description=\"Online endpoint for \"\n", " + registered_model.name\n", " + \", fine tuned model for emotion detection\",\n", " auth_mode=\"key\",\n", ")\n", "workspace_ml_client.begin_create_or_update(endpoint).wait()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "You can find here the list of SKU's supported for deployment - [Managed online endpoints SKU list](https://learn.microsoft.com/en-us/azure/machine-learning/reference-managed-online-endpoints-vm-sku-list)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a deployment\n", "demo_deployment = ManagedOnlineDeployment(\n", " name=\"demo\",\n", " endpoint_name=online_endpoint_name,\n", " model=registered_model.id,\n", " instance_type=\"Standard_DS3_v2\",\n", " instance_count=1,\n", " liveness_probe=ProbeSettings(initial_delay=600),\n", ")\n", "workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()\n", "endpoint.traffic = {\"demo\": 100}\n", "workspace_ml_client.begin_create_or_update(endpoint).result()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 9. Test the endpoint with sample data\n", "\n", "We will fetch some sample data from the test dataset and submit to online endpoint for inference. We will then show the display the scored labels alongside the ground truth labels" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# read ./squad-dataset/small_test.jsonl into a pandas dataframe\n", "import pandas as pd\n", "import json\n", "\n", "test_df = pd.read_json(\"./squad-dataset/small_test.jsonl\", orient=\"records\", lines=True)\n", "# take 10 random samples\n", "test_df = test_df.sample(n=10)\n", "# rebuild index\n", "test_df.reset_index(drop=True, inplace=True)\n", "# flatten the json object in the \"answer\" column with the keys \"answer_start\" and \"text\"\n", "json_struct = json.loads(test_df.to_json(orient=\"records\"))\n", "test_df = pd.json_normalize(json_struct)\n", "# drop id and title columns\n", "test_df = test_df.drop(columns=[\"id\", \"title\"])\n", "test_df.head(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# create a json object with \"inputs\" as key and a list of json objects with \"question\" and \"context\" as keys\n", "test_df_copy = test_df[[\"question\", \"context\"]]\n", "test_json = {\"input_data\": test_df_copy.to_dict(\"split\")}\n", "print(test_json)\n", "# write the json object to a file named sample_score.json in the ./squad-dataset folder\n", "with open(\"./squad-dataset/sample_score.json\", \"w\") as f:\n", " json.dump(test_json, f)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# score the sample_score.json file using the online endpoint with the azureml endpoint invoke method\n", "response = workspace_ml_client.online_endpoints.invoke(\n", " endpoint_name=online_endpoint_name,\n", " deployment_name=\"demo\",\n", " request_file=\"./squad-dataset/sample_score.json\",\n", ")\n", "print(\"raw response: \\n\", response, \"\\n\")\n", "# convert the response to a pandas dataframe and rename the label column as scored_label\n", "response_df = pd.read_json(response)\n", "response_df = response_df.rename(columns={0: \"scored_answer\"})\n", "response_df.head(10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# merge the test dataframe and the response dataframe on the index\n", "merged_df = pd.merge(test_df, response_df, left_index=True, right_index=True)\n", "# drop the answers.answer_start, start and end columns and rename the answer column to scored_answer\n", "merged_df = merged_df.drop(columns=[\"answers.answer_start\"])\n", "# rename the answers.text column to ground_truth_answers\n", "merged_df = merged_df.rename(columns={\"answers.text\": \"ground_truth_answers\"})\n", "merged_df.head(10)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 10. Delete the online endpoint\n", "Don't forget to delete the online endpoint, else you will leave the billing meter running for the compute used by the endpoint" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "workspace_ml_client.online_endpoints.begin_delete(name=online_endpoint_name).wait()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.16" } }, "nbformat": 4, "nbformat_minor": 2 }

sdk/python/foundation-models/system/finetune/question-answering/extractive-qa.ipynb (810 lines of code) (raw):