3_llmops-aifoundry/3_3_optimizing/promptflow_with_evaluation

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Execute a batch groundedness evaluation flow using the Promptflow Python SDK\n",
    "\n",
    "### Overview\n",
    "\n",
    "Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, and evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.\n",
    "\n",
    "In this hands-on lab, you will:\n",
    "- Evaluate your flows and calculate quality and performance metrics from run result datasets.\n",
    "- Debug and iterate on your flows, including easily tracing interactions with LLMs.\n",
    "\n",
    "To calculate other metrics, such as accuracy or relevance scores, see [Develop evaluation flow](https://microsoft.github.io/promptflow/how-to-guides/develop-a-dag-flow/develop-evaluation-flow.html) to learn how to develop an evaluation flow.\n",
    "\n",
    "#### 1. Create Promptflow client with Credential and configuration\n",
    "\n",
    "#### 2. AI Foundry batch run to get the base run data\n",
    "\n",
    "#### 3. Run Groundedness Evaluation of the Promptflow\n",
    "\n",
    "[Note] Please use the `Python 3.10 - SDK v2 (azureml_py310_sdkv2)` conda environment.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "import os, sys\n",
    "lab_prep_dir = os.getcwd().split(\"slm-innovator-lab\")[0] + \"slm-innovator-lab/0_lab_preparation\"\n",
    "sys.path.append(os.path.abspath(lab_prep_dir))\n",
    "\n",
    "from common import check_kernel\n",
    "check_kernel()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import os\n",
    "import time\n",
    "\n",
    "# Import Prompt flow libraries\n",
    "from promptflow.azure import PFClient\n",
    "from promptflow.entities import Run\n",
    "# Import Azure authentication and environment helpers\n",
    "from azure.identity import DefaultAzureCredential, EnvironmentCredential, InteractiveBrowserCredential\n",
    "from dotenv import load_dotenv\n",
    "from azure.core.exceptions import HttpResponseError\n",
    "\n",
    "load_dotenv(\"../../.env\")\n",
    "\n",
    "with open('../3_2_prototyping/config.json', 'r') as f:\n",
    "    config = json.load(f)\n",
    "\n",
    "print(config[\"subscription_id\"])\n",
    "print(config[\"resource_group\"])\n",
    "print(config[\"workspace_name\"]) # Azure AI Foundry project name, which is not the same as the Azure ML workspace name\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tqdm import tqdm\n",
    "\n",
    "# Statuses after which a run no longer changes\n",
    "TERMINAL_STATUSES = (\"Completed\", \"Failed\", \"Canceled\")\n",
    "\n",
    "# Monitor the status of run_result until it reaches a terminal state\n",
    "def monitor_status(pf_azure_client: PFClient, run_result: Run):\n",
    "    with tqdm(total=3, desc=\"Running Status\", unit=\"step\") as pbar:\n",
    "        status = pf_azure_client.runs.get(run_result).status\n",
    "        if status == \"Preparing\":\n",
    "            pbar.update(1)\n",
    "        while status not in TERMINAL_STATUSES:\n",
    "            if status == \"Running\" and pbar.n < 2:\n",
    "                pbar.update(1)\n",
    "            print(f\"Current Status: {status}\")\n",
    "            time.sleep(10)\n",
    "            status = pf_azure_client.runs.get(run_result).status\n",
    "        # Fill the remaining progress steps once the run has finished\n",
    "        pbar.update(pbar.total - pbar.n)\n",
    "    print(f\"Promptflow run finished with status: {status}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Create Promptflow client with Credential and configuration\n",
    "\n",
    "- Create a Prompt flow client with the credential and configuration. The `config.json` file must contain `subscription_id`, `resource_group`, and `workspace_name`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    credential = DefaultAzureCredential()\n",
    "    # Check whether the credential can get a token successfully.\n",
    "    credential.get_token(\"https://management.azure.com/.default\")\n",
    "except Exception:\n",
    "    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work\n",
    "    credential = InteractiveBrowserCredential()\n",
    "# If you can use neither DefaultAzureCredential nor InteractiveBrowserCredential, set up a managed identity in your .env file\n",
    "\n",
    "pf_azure_client = PFClient.from_config(credential=credential, path=\"../3_2_prototyping/config.json\")\n",
    "\n",
    "try:\n",
    "    workspace = pf_azure_client.ml_client.workspaces.get(name=config[\"workspace_name\"])\n",
    "    print(f\"Connected to Azure AI Foundry Workspace: {workspace.name}\")\n",
    "    print(f\"Workspace Location: {workspace.location}\")\n",
    "    print(f\"Workspace ID: {workspace.id}\")\n",
    "except HttpResponseError as e:\n",
    "    print(f\"Failed to connect to Azure ML Workspace: {e}\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. AI Foundry batch run to get the base run data\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check the existing connections\n",
    "\n",
    "- Currently, connections can only be created in the Azure AI Foundry or Azure ML Studio UI. Check the existing connections in the workspace.\n",
    "  > ✨ **_important_** <br>\n",
    "  > Check your connection information in the Azure AI Foundry Management Center.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from jinja2 import Environment, FileSystemLoader\n",
    "from pathlib import Path\n",
    "\n",
    "env = Environment(loader=FileSystemLoader('.'))\n",
    "# Read the template file\n",
    "template = env.get_template('./flow-template/chat-serverless.flow.dag.yaml')\n",
    "\n",
    "# Define the template variables with your connection names for the chat-serverless flow\n",
    "variables = {\n",
    "    \"your_phi35_serverless_connection_name\": \"replace with your connection name\",\n",
    "    \"your_gpt4o_connection_name\": \"replace with your connection name\"\n",
    "}\n",
    "\n",
    "rendered_content = template.render(variables)\n",
    "Path('../3_2_prototyping/chat-serverless/flow.dag.yaml').write_text(rendered_content)\n",
    "\n",
    "print(Path('../3_2_prototyping/chat-serverless/flow.dag.yaml').read_text())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from jinja2 import Environment, FileSystemLoader\n",
    "from pathlib import Path\n",
    "\n",
    "env = Environment(loader=FileSystemLoader('.'))\n",
    "\n",
    "# Read the template file\n",
    "template = env.get_template('./flow-template/evaluation.flow.dag.yaml')\n",
    "\n",
    "# Define the template variables with your connection name for the evaluation flow\n",
    "variables = {\n",
    "    \"your_gpt4o_connection_name\": \"replace with your connection name\"\n",
    "}\n",
    "\n",
    "rendered_content = template.render(variables)\n",
    "Path('./evaluation/flow.dag.yaml').write_text(rendered_content)\n",
    "\n",
    "print(Path('./evaluation/flow.dag.yaml').read_text())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "flow_path = \"../3_2_prototyping/chat-serverless\"\n",
    "data_path = \"../3_2_prototyping/data/questions_outdoor.jsonl\"\n",
    "\n",
    "# Read the context from context_simple.json and pass it to column_mapping as a constant string\n",
    "with open('../3_2_prototyping/data/context_simple.json', 'r') as file:\n",
    "    context = json.load(file)\n",
    "\n",
    "column_mapping = {\n",
    "    \"question\": \"${data.question}\",\n",
    "    \"context\": context.get(\"context\")\n",
    "}\n",
    "\n",
    "base_run = pf_azure_client.run(\n",
    "    flow=flow_path,\n",
    "    type=\"chat\",\n",
    "    data=data_path,\n",
    "    column_mapping=column_mapping,\n",
    "    display_name=\"chat_serverless_context_data\",\n",
    "    tags={\"chat_serverless_context_jsonl\": \"\", \"1st_round\": \"\"},\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "monitor_status(pf_azure_client, base_run)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "detail = pf_azure_client.get_details(base_run)\n",
    "\n",
    "detail"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Run Groundedness Evaluation of the Promptflow\n",
    "\n",
    "The groundedness evaluation flow measures how well the model's predicted answers are grounded in the context. Even if an LLM's response is true, it is considered ungrounded if it cannot be verified against the context.\n",
    "\n",
    "> 🧪 **For Your Information**<br>\n",
    "> **Groundedness** is a measure of how well the model's responses are grounded in the context. A grounded response is one that is directly supported by the context. For example, if the context states that dogs are mammals, the response \"Dogs are mammals.\" is grounded, while \"Dogs can fly.\" is ungrounded.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "\n",
    "eval_groundedness_flow_path = \"./evaluation/\"\n",
    "data_path = \"./data/qna_outdoor.jsonl\"\n",
    "\n",
    "with open('../3_2_prototyping/data/context_simple.json', 'r') as file:\n",
    "    context = json.load(file)\n",
    "\n",
    "column_mapping = {\n",
    "    \"question\": \"${data.question}\",\n",
    "    \"context\": context.get(\"context\"),\n",
    "    \"answer\": \"${run.outputs.gpt4o_answer}\",\n",
    "}\n",
    "\n",
    "# Run names must be unique in the workspace, so append a timestamp (including seconds)\n",
    "timestamp = datetime.datetime.now().strftime(\"%m_%d_%H%M%S\")\n",
    "eval_name = f\"eval_groundedness_{timestamp}\"\n",
    "\n",
    "eval_groundedness_result = pf_azure_client.run(\n",
    "    flow=eval_groundedness_flow_path,\n",
    "    data=data_path,\n",
    "    run=base_run,  # the base run whose outputs ${run.outputs.*} refers to\n",
    "    column_mapping=column_mapping,\n",
    "    display_name=eval_name,\n",
    "    name=eval_name,\n",
    ")\n",
    "\n",
    "# Optionally stream the run logs instead of polling with monitor_status:\n",
    "# pf_azure_client.stream(eval_groundedness_result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "monitor_status(pf_azure_client, eval_groundedness_result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "detail = pf_azure_client.get_details(eval_groundedness_result)\n",
    "\n",
    "detail"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "azureml_py310_sdkv2",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
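The `monitor_status` helper in the notebook above polls `pf_azure_client.runs.get(...).status` every 10 seconds until the run finishes. The following is a minimal, Azure-free sketch of the same polling pattern, with an added timeout so that a stuck run cannot block the kernel indefinitely. `FakeRuns` and `poll_until_terminal` are hypothetical names used only for illustration, and the terminal status set (including `Canceled`) is an assumption, not a list taken from the Prompt flow documentation.

```python
import time
from typing import Callable, Iterator, List

# Assumed terminal statuses; intermediate states such as "Preparing" keep the loop going.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


class FakeRuns:
    """Hypothetical stand-in for a run client that replays a scripted status sequence."""

    def __init__(self, statuses: List[str]) -> None:
        self._statuses: Iterator[str] = iter(statuses)
        self._last = statuses[0]

    def get_status(self) -> str:
        # Return the next scripted status, repeating the last one once the script is exhausted.
        self._last = next(self._statuses, self._last)
        return self._last


def poll_until_terminal(get_status: Callable[[], str], interval: float = 10.0,
                        timeout: float = 3600.0) -> List[str]:
    """Poll get_status() until a terminal status appears; return every status observed."""
    observed = [get_status()]
    deadline = time.monotonic() + timeout
    while observed[-1] not in TERMINAL_STATUSES:
        if time.monotonic() > deadline:
            raise TimeoutError(f"Run still {observed[-1]!r} after {timeout} seconds")
        time.sleep(interval)
        observed.append(get_status())
    return observed


runs = FakeRuns(["Preparing", "Running", "Running", "Completed"])
print(poll_until_terminal(runs.get_status, interval=0.0))
# prints ['Preparing', 'Running', 'Running', 'Completed']
```

Against a real run, `get_status` could be a lambda such as `lambda: pf_azure_client.runs.get(base_run).status`, reusing the call the notebook already makes.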

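Both `pf_azure_client.run(...)` calls in the notebook pass a `column_mapping` that mixes three kinds of values: `${data.question}` (a column of the input JSONL file), `${run.outputs.gpt4o_answer}` (an output of the referenced base run), and the context string loaded from `context_simple.json`. That context value is not a `${...}` reference, so it appears to be passed as the same constant to every line. The sketch below models that resolution for a single line. `resolve_mapping` and the `REFERENCE` pattern are hypothetical; Prompt flow performs this resolution internally, and its real syntax may support reference forms not shown here.

```python
import re
from typing import Any, Dict

# Hypothetical pattern for whole-value references such as "${data.question}"
# or "${run.outputs.gpt4o_answer}"; Prompt flow's own parser may accept more forms.
REFERENCE = re.compile(r"^\$\{(data|run\.outputs)\.([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_mapping(mapping: Dict[str, Any], data_row: Dict[str, Any],
                    run_outputs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: build one line's flow inputs from a column mapping."""
    inputs: Dict[str, Any] = {}
    for flow_input, value in mapping.items():
        match = REFERENCE.match(value) if isinstance(value, str) else None
        if match is None:
            # Not a ${...} reference: use the value as a constant for every line.
            inputs[flow_input] = value
            continue
        source, column = match.groups()
        # "${data.x}" reads the data line; "${run.outputs.x}" reads the base run's outputs.
        row = data_row if source == "data" else run_outputs
        inputs[flow_input] = row[column]  # KeyError if the column does not exist
    return inputs


mapping = {
    "question": "${data.question}",
    "context": "example context string",
    "answer": "${run.outputs.gpt4o_answer}",
}
print(resolve_mapping(mapping,
                      data_row={"question": "example question"},
                      run_outputs={"gpt4o_answer": "example answer"}))
```

In this sketch a missing column raises `KeyError`; the service's actual error handling for a bad mapping may differ.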
3_llmops-aifoundry/3_3_optimizing/promptflow_with_evaluation_code.ipynb (345 lines of code) (raw):