2-notebooks/2-agent_service/3-file-search.ipynb

{ "cells": [ { "cell_type": "markdown", "id": "intro-health-file-search", "metadata": {}, "source": [ "# 🍏 Health Resource Search Agent Tutorial 🍎\n", "\n", "Welcome to the **Health Resource Search Agent** tutorial! We'll use **Azure AI Foundry** SDKs to build an assistant that can:\n", "\n", "1. **Upload** health and recipe files into a vector store.\n", "2. **Create an Agent** with a **File Search** tool.\n", "3. **Search** these documents for relevant dietary info.\n", "4. **Answer** health and wellness questions (with disclaimers!).\n", "\n", "### ⚠️ Important Medical Disclaimer ⚠️\n", "> **All health information in this notebook is for general educational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment.** Always seek the advice of a qualified healthcare professional with any questions you may have.\n", "\n", "## Prerequisites\n", "- Complete Agent basics notebook - [1-basics.ipynb](1-basics.ipynb)\n", "- **Roles** \n", " 1. **Azure AI Developer** on your Azure AI Foundry project.\n", " 2. **Storage Blob Data Contributor** on the project’s Storage account.\n", " 3. If standard agent setup is used with your own Search resource, also ensure you have **Cognitive Search Data Contributor** on that resource.\n", "\n", "## Let's Get Searching!\n", "We'll show you how to upload some sample files, create a vector store for them, then spin up an agent that can search these resources for dietary guidelines, recipes, and more. Enjoy!\n", "\n", "<img src=\"./seq-diagrams/3-file-search.png\" width=\"30%\"/>\n" ] }, { "cell_type": "markdown", "id": "initial-setup", "metadata": {}, "source": [ "## 1. Initial Setup\n", "Here we import needed libraries, load environment variables from `.env`, and initialize our **AIProjectClient**. Let's do this! 
🎉" ] }, { "cell_type": "code", "execution_count": null, "id": "init-client-code", "metadata": {}, "outputs": [], "source": [ "import os\n", "import time\n", "from pathlib import Path\n", "\n", "from dotenv import load_dotenv\n", "from azure.identity import DefaultAzureCredential\n", "from azure.ai.projects import AIProjectClient\n", "from azure.ai.projects.models import (\n", "    FileSearchTool,\n", "    FilePurpose,\n", "    MessageTextContent,\n", "    MessageRole\n", ")\n", "\n", "# Load environment variables from the .env file two directories above this notebook\n", "notebook_path = Path().absolute()\n", "parent_dir = notebook_path.parent\n", "load_dotenv(parent_dir.parent / '.env')\n", "\n", "# Initialize AIProjectClient\n", "try:\n", "    project_client = AIProjectClient.from_connection_string(\n", "        credential=DefaultAzureCredential(),\n", "        conn_str=os.environ.get(\"PROJECT_CONNECTION_STRING\")\n", "    )\n", "    print(\"✅ Successfully initialized AIProjectClient\")\n", "except Exception as e:\n", "    print(f\"❌ Error initializing project client: {e}\")" ] }, { "cell_type": "markdown", "id": "create-sample-resources", "metadata": {}, "source": [ "## 2. Prepare Sample Files 🍲🗒\n", "We'll create some dummy .md files (for recipes and guidelines). Then we'll store them in a vector store for searching.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "create-files-code", "metadata": {}, "outputs": [], "source": [ "def create_sample_files():\n", "    # Adjacent string literals are concatenated into a single Markdown document\n", "    recipes_md = (\n", "        \"# Healthy Recipes Database\\n\\n\"\n", "        \"## Gluten-Free Recipes\\n\"\n", "        \"1. Quinoa Bowl\\n\"\n", "        \"   - Ingredients: quinoa, vegetables, olive oil\\n\"\n", "        \"   - Instructions: Cook quinoa, add vegetables\\n\\n\"\n", "        \"2. Rice Pasta with Vegetables\\n\"\n", "        \"   - Ingredients: rice pasta, mixed vegetables\\n\"\n", "        \"   - Instructions: Boil pasta, sauté vegetables\\n\\n\"\n", "        \"## Diabetic-Friendly Recipes\\n\"\n", "        \"1. Low-Carb Stir Fry\\n\"\n", "        \"   - Ingredients: chicken, vegetables, tamari sauce\\n\"\n", "        \"   - Instructions: Cook chicken, add vegetables\\n\\n\"\n", "        \"2. Greek Salad\\n\"\n", "        \"   - Ingredients: cucumber, tomatoes, feta, olives\\n\"\n", "        \"   - Instructions: Chop vegetables, combine\\n\\n\"\n", "        \"## Heart-Healthy Recipes\\n\"\n", "        \"1. Baked Salmon\\n\"\n", "        \"   - Ingredients: salmon, lemon, herbs\\n\"\n", "        \"   - Instructions: Season salmon, bake\\n\\n\"\n", "        \"2. Mediterranean Bowl\\n\"\n", "        \"   - Ingredients: chickpeas, vegetables, tahini\\n\"\n", "        \"   - Instructions: Combine ingredients\\n\"\n", "    )\n", "\n", "    guidelines_md = (\n", "        \"# Dietary Guidelines\\n\\n\"\n", "        \"## General Guidelines\\n\"\n", "        \"- Eat a variety of foods\\n\"\n", "        \"- Control portion sizes\\n\"\n", "        \"- Stay hydrated\\n\\n\"\n", "        \"## Special Diets\\n\"\n", "        \"1. Gluten-Free Diet\\n\"\n", "        \"   - Avoid wheat, barley, rye\\n\"\n", "        \"   - Focus on naturally gluten-free foods\\n\\n\"\n", "        \"2. Diabetic Diet\\n\"\n", "        \"   - Monitor carbohydrate intake\\n\"\n", "        \"   - Choose low glycemic foods\\n\\n\"\n", "        \"3. Heart-Healthy Diet\\n\"\n", "        \"   - Limit saturated fats\\n\"\n", "        \"   - Choose lean proteins\\n\"\n", "    )\n", "\n", "    # Save to local .md files\n", "    with open(\"recipes.md\", \"w\", encoding=\"utf-8\") as f:\n", "        f.write(recipes_md)\n", "    with open(\"guidelines.md\", \"w\", encoding=\"utf-8\") as f:\n", "        f.write(guidelines_md)\n", "\n", "    print(\"📄 Created sample resource files: recipes.md, guidelines.md\")\n", "    return [\"recipes.md\", \"guidelines.md\"]\n", "\n", "sample_files = create_sample_files()" ] }, { "cell_type": "markdown", "id": "note-above-step-3", "metadata": {}, "source": [ "#### ✨ Note on Search Permissions\n", "When creating the vector store, you must also have the **Cognitive Search Data Contributor** role on your Azure AI Search resource (if you're using the standard agent setup with your own Search resource). 
Missing this role will often cause a **Forbidden** error. See [Authentication Setup](../../1-introduction/1-authentication.ipynb#4-add-agent-service-permissions) for details on configuring permissions.\n" ] }, { "cell_type": "markdown", "id": "create-vector-store", "metadata": {}, "source": [ "## 3. Create a Vector Store 📚\n", "We'll upload our newly created files and group them into a single vector store for searching. This is how the agent can later find relevant text." ] }, { "cell_type": "code", "execution_count": null, "id": "vector-store-code", "metadata": {}, "outputs": [], "source": [ "def create_vector_store(files, store_name=\"my_health_resources\"):\n", "    try:\n", "        # Step 1: Upload files to Azure AI Agent service\n", "        # Each file needs to be uploaded individually and we'll collect their IDs\n", "        uploaded_ids = []\n", "        for fp in files:\n", "            # upload_file_and_poll ensures the upload is complete before continuing\n", "            # FilePurpose.AGENTS tells the service these files are for agent usage\n", "            upl = project_client.agents.upload_file_and_poll(\n", "                file_path=fp,\n", "                purpose=FilePurpose.AGENTS\n", "            )\n", "            uploaded_ids.append(upl.id)\n", "            print(f\"✅ Uploaded: {fp} -> File ID: {upl.id}\")\n", "\n", "        # Step 2: Create a vector store from the uploaded files\n", "        # A vector store converts text into numerical vectors for semantic search\n", "        # create_vector_store_and_poll waits until indexing is complete\n", "        vs = project_client.agents.create_vector_store_and_poll(\n", "            file_ids=uploaded_ids,  # Pass all our uploaded file IDs\n", "            name=store_name  # Give our vector store a friendly name\n", "        )\n", "        print(f\"🎉 Created vector store '{store_name}', ID: {vs.id}\")\n", "        return vs, uploaded_ids\n", "    except Exception as e:\n", "        print(f\"❌ Error creating vector store: {e}\")\n", "        return None, []\n", "\n", "# Initialize empty variables to store our vector store and file IDs\n", "vector_store, file_ids = None, []\n", "\n", "# If we successfully created 
sample files earlier, create a vector store from them\n", "if sample_files:\n", "    vector_store, file_ids = create_vector_store(sample_files, \"health_resources_example\")" ] }, { "cell_type": "markdown", "id": "create-agent-file-search", "metadata": {}, "source": [ "## 4. Create the Health Resource Agent 🔎\n", "We use a **FileSearchTool** pointing to our newly created vector store, then create the Agent with instructions about disclaimers, dietary help, etc." ] }, { "cell_type": "code", "execution_count": null, "id": "create-agent-code", "metadata": {}, "outputs": [], "source": [ "def create_health_resource_agent(vstore_id):\n", "    try:\n", "        # Create a FileSearchTool that will allow our agent to search through documents\n", "        # FileSearchTool uses the vector store we created earlier to perform semantic search\n", "        # This means the agent can find relevant content even if the exact words don't match\n", "        file_search_tool = FileSearchTool(vector_store_ids=[vstore_id])\n", "\n", "        # Create an AI agent that will use our search tool and follow specific instructions\n", "        # The agent combines:\n", "        # 1. A base LLM model (specified in environment variables or defaults to gpt-4o-mini)\n", "        # 2. The ability to search our health documents using the FileSearchTool\n", "        # 3. Custom instructions that shape how it responds to questions\n", "        agent = project_client.agents.create_agent(\n", "            # Specify which LLM model to use - fallback to gpt-4o-mini if not set\n", "            model=os.environ.get(\"MODEL_DEPLOYMENT_NAME\", \"gpt-4o-mini\"),\n", "\n", "            # Give our agent a descriptive name\n", "            name=\"health-search-agent\",\n", "\n", "            # These instructions act like a personality and rule set for our agent\n", "            # They ensure consistent, responsible health advice\n", "            instructions=\"\"\"\n", "                You are a health resource advisor with access to dietary and recipe files.\n", "                You:\n", "                1. Always present disclaimers (you're not a doctor!)\n", "                2. 
Provide references to the files when possible\n", "                3. Focus on general nutrition or recipe tips.\n", "                4. Encourage professional consultation for more detailed advice.\n", "            \"\"\",\n", "\n", "            # Connect the search tool's interface definition and required resources\n", "            # This gives the agent the ability to actually perform searches\n", "            tools=file_search_tool.definitions,\n", "            tool_resources=file_search_tool.resources\n", "        )\n", "        print(f\"🎉 Created health resource agent, ID: {agent.id}\")\n", "        return agent\n", "    except Exception as e:\n", "        print(f\"❌ Error creating health resource agent: {e}\")\n", "        return None\n", "\n", "# Initialize our agent variable\n", "health_agent = None\n", "\n", "# Only create the agent if we successfully created a vector store earlier\n", "if vector_store:\n", "    health_agent = create_health_resource_agent(vector_store.id)" ] }, { "cell_type": "markdown", "id": "health-search-conversation", "metadata": {}, "source": [ "## 5. Searching Health Resources 🏋️👩‍🍳\n", "We'll create a new conversation thread and ask queries like “Gluten-free recipe ideas?” or “Heart-healthy meal plan?” The agent will run a file search against the vector store to find relevant information." 
] }, { "cell_type": "code", "execution_count": null, "id": "search-code", "metadata": {}, "outputs": [], "source": [ "def create_search_thread(agent):\n", "    try:\n", "        # In Azure AI Agent service, conversations happen in \"threads\"\n", "        # A thread maintains the context and history of a conversation\n", "        # Here we create a new empty thread to start a fresh conversation\n", "        thread = project_client.agents.create_thread()\n", "        print(f\"📝 Created new search thread, ID: {thread.id}\")\n", "        return thread\n", "    except Exception as e:\n", "        print(f\"❌ Error creating search thread: {e}\")\n", "        return None\n", "\n", "def ask_search_question(thread_id, agent_id, user_question):\n", "    try:\n", "        # First, we add the user's question as a message to the thread\n", "        # This is like typing a message in a chat interface\n", "        message = project_client.agents.create_message(\n", "            thread_id=thread_id,\n", "            role=\"user\",  # Specifies this message is from the user\n", "            content=user_question\n", "        )\n", "        print(f\"🔎 Searching: '{user_question}'\")\n", "\n", "        # Next, we create and process a \"run\" - this is where the magic happens!\n", "        # The agent will:\n", "        # 1. Read the user's question\n", "        # 2. Use the FileSearchTool to search our health documents\n", "        # 3. 
Generate a helpful response based on the search results\n", "        run = project_client.agents.create_and_process_run(\n", "            thread_id=thread_id,\n", "            agent_id=agent_id\n", "        )\n", "        print(f\"🤖 Run finished with status: {run.status}\")\n", "        if run.last_error:\n", "            print(f\"Error details: {run.last_error}\")\n", "        return run\n", "    except Exception as e:\n", "        print(f\"❌ Error searching question: {e}\")\n", "        return None\n", "\n", "# Now let's test our search functionality!\n", "# First check if we have our health agent available\n", "if health_agent:\n", "    # Create a new conversation thread\n", "    search_thread = create_search_thread(health_agent)\n", "\n", "    if search_thread:\n", "        # Define some test questions that demonstrate different types of health queries\n", "        # The agent will search our uploaded health documents to answer these\n", "        queries = [\n", "            \"Could you suggest a gluten-free lunch recipe?\",\n", "            \"Show me some heart-healthy meal ideas.\",\n", "            \"What guidelines do you have for someone with diabetes?\"\n", "        ]\n", "\n", "        # Process each query one at a time\n", "        # The agent will maintain conversation context between questions\n", "        for q in queries:\n", "            ask_search_question(search_thread.id, health_agent.id, q)" ] }, { "cell_type": "markdown", "id": "view-search-results", "metadata": {}, "source": [ "## 6. View Results & Citations 📄\n", "We'll read the conversation thread to see how the agent responded and check whether it cited the correct files." 
] }, { "cell_type": "code", "execution_count": 6, "id": "view-search-results-code", "metadata": {}, "outputs": [], "source": [ "def display_thread_messages(thread_id):\n", "    try:\n", "        # Retrieve all messages in this conversation thread using the Azure AI Agent SDK\n", "        # Messages contain the back-and-forth between user and AI agent\n", "        messages = project_client.agents.list_messages(thread_id=thread_id)\n", "\n", "        # Reverse the list (returned newest first by default) to print the conversation in chronological order\n", "        print(\"\\n🗣️ Conversation so far:\")\n", "        for m in reversed(messages.data):\n", "            # Each message may have multiple content pieces\n", "            # We're interested in the text content (vs other types like images)\n", "            if m.content:\n", "                last_content = m.content[-1]\n", "                if hasattr(last_content, \"text\"):\n", "                    # Print who said what (ASSISTANT or USER) along with their message\n", "                    print(f\"{m.role.upper()}: {last_content.text.value}\\n\")\n", "\n", "        # The agent can cite specific passages from the uploaded documents\n", "        # Let's check if it referenced any files in its responses\n", "        print(\"\\n📎 Checking for citations...\")\n", "        for c in messages.file_citation_annotations:\n", "            # Each citation includes the quoted text and which file it came from\n", "            # This helps users verify the agent's sources\n", "            print(f\"- Citation snippet: '{c.text}' from file ID: {c.file_citation['file_id']}\")\n", "\n", "    except Exception as e:\n", "        # Gracefully handle any errors that might occur when displaying messages\n", "        print(f\"❌ Error displaying messages: {e}\")\n", "\n", "# Display the conversation history for our search thread\n", "# globals().get avoids a NameError if the thread was never created in the previous step\n", "if globals().get(\"search_thread\"):\n", "    display_thread_messages(search_thread.id)" ] }, { "cell_type": "markdown", "id": "cleanup", "metadata": {}, "source": [ "## 7. Cleanup & Best Practices 🧹\n", "We'll optionally remove the vector store, the uploaded files, and the agent. In a production environment, you might keep them around longer. 
Meanwhile, here are some tips:\n", "\n", "1. **Resource Management**\n", "   - Keep files grouped by category, and regularly prune old or irrelevant files.\n", "   - Clear out test agents or vector stores once you're done.\n", "\n", "2. **Search Queries**\n", "   - Provide precise or multi-part queries.\n", "   - Consider synonyms or alternative keywords (\"gluten-free\" vs \"celiac\").\n", "\n", "3. **Health Information**\n", "   - Always disclaim that you are not a medical professional.\n", "   - Encourage users to see doctors for specific diagnoses.\n", "\n", "4. **Performance**\n", "   - Keep an eye on vector store size.\n", "   - Evaluate search accuracy with `azure-ai-evaluation`!\n" ] }, { "cell_type": "code", "execution_count": null, "id": "cleanup-code", "metadata": {}, "outputs": [], "source": [ "def cleanup_all():\n", "    try:\n", "        # Check if we have a vector store and delete it\n", "        # Vector stores are where we store the embeddings (numerical representations)\n", "        # of our documents for semantic search\n", "        if 'vector_store' in globals() and vector_store:\n", "            project_client.agents.delete_vector_store(vector_store.id)\n", "            print(\"🗑️ Deleted vector store.\")\n", "\n", "        # Remove any files we uploaded to the Azure AI Agent service\n", "        # These were the documents our agent used as its knowledge base\n", "        if 'file_ids' in globals() and file_ids:\n", "            for fid in file_ids:\n", "                project_client.agents.delete_file(fid)\n", "            print(\"🗑️ Deleted uploaded files from the service.\")\n", "\n", "        # Delete the AI agent we created\n", "        # This frees up resources since we're done with our demo\n", "        if 'health_agent' in globals() and health_agent:\n", "            project_client.agents.delete_agent(health_agent.id)\n", "            print(\"🗑️ Deleted health resource agent.\")\n", "\n", "        # Clean up any local files we created during the demo\n", "        # This keeps our workspace tidy\n", "        if 'sample_files' in globals() and sample_files:\n", "            for sf in sample_files:\n", "                if os.path.exists(sf):\n", "                    
os.remove(sf)\n", "            print(\"🗑️ Deleted local sample files.\")\n", "\n", "    except Exception as e:\n", "        # If anything goes wrong during cleanup, we'll see what happened\n", "        print(f\"❌ Error during cleanup: {e}\")\n", "\n", "# Run our cleanup function to remove all resources we created\n", "# This is good practice in a tutorial/demo environment\n", "cleanup_all()" ] }, { "cell_type": "markdown", "id": "conclusion", "metadata": {}, "source": [ "# Congratulations! 🎉\n", "You've created a **Health Resource Search Agent** that:\n", "1. Uses a **Vector Store** to store sample recipes & guidelines.\n", "2. **Searches** them to answer queries.\n", "3. **Provides disclaimers** reminding users to consult professionals.\n", "\n", "Feel free to adapt this approach for your own corporate documents, product manuals, or custom health resources.\n", "\n", "Happy Searching! 🎉\n", "\n", "#### Let's proceed to [4-bing_grounding.ipynb](4-bing_grounding.ipynb)" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 5 }
