2-notebooks/2-agent_service/3-file-search.ipynb
{
"cells": [
{
"cell_type": "markdown",
"id": "intro-health-file-search",
"metadata": {},
"source": [
"# Health Resource Search Agent Tutorial\n",
"\n",
"Welcome to the **Health Resource Search Agent** tutorial! We'll use **Azure AI Foundry** SDKs to build an assistant that can:\n",
"\n",
"1. **Upload** health and recipe files into a vector store.\n",
"2. **Create an Agent** with a **File Search** tool.\n",
"3. **Search** these documents for relevant dietary info.\n",
"4. **Answer** health and wellness questions (with disclaimers!).\n",
"\n",
"### ⚠️ Important Medical Disclaimer ⚠️\n",
"> **All health information in this notebook is for general educational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment.** Always seek the advice of a qualified healthcare professional with any questions you may have.\n",
"\n",
"## Prerequisites\n",
"- Complete the agent basics notebook: [1-basics.ipynb](1-basics.ipynb)\n",
"- **Roles** \n",
"   1. **Azure AI Developer** on your Azure AI Foundry project.\n",
"   2. **Storage Blob Data Contributor** on the project’s Storage account.\n",
"   3. If you use the standard agent setup with your own Azure AI Search resource, you also need **Cognitive Search Data Contributor** on that resource.\n",
"\n",
"## Let's Get Searching!\n",
"We'll show you how to upload some sample files, create a vector store for them, then spin up an agent that can search these resources for dietary guidelines, recipes, and more. Enjoy!\n",
"\n",
"<img src=\"./seq-diagrams/3-file-search.png\" width=\"30%\"/>\n"
]
},
{
"cell_type": "markdown",
"id": "initial-setup",
"metadata": {},
"source": [
"## 1. Initial Setup\n",
"Here we import the required libraries, load environment variables from `.env`, and initialize our **AIProjectClient**. Let's do this!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "init-client-code",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from pathlib import Path\n",
"\n",
"from dotenv import load_dotenv\n",
"from azure.identity import DefaultAzureCredential\n",
"from azure.ai.projects import AIProjectClient\n",
"from azure.ai.projects.models import FileSearchTool, FilePurpose\n",
"\n",
"# Load environment variables from the .env file at the repository root,\n",
"# two levels above this notebook's folder\n",
"notebook_path = Path().absolute()\n",
"parent_dir = notebook_path.parent\n",
"load_dotenv(parent_dir.parent / '.env')\n",
"\n",
"# Initialize AIProjectClient from the project connection string\n",
"try:\n",
"    project_client = AIProjectClient.from_connection_string(\n",
"        credential=DefaultAzureCredential(),\n",
"        conn_str=os.environ.get(\"PROJECT_CONNECTION_STRING\")\n",
"    )\n",
"    print(\"✅ Successfully initialized AIProjectClient\")\n",
"except Exception as e:\n",
"    print(f\"❌ Error initializing project client: {e}\")"
]
},
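{
"cell_type": "markdown",
"id": "sketch-env-check-md",
"metadata": {},
"source": [
"If `PROJECT_CONNECTION_STRING` is missing, `os.environ.get` returns `None` and the failure typically shows up only as an opaque client-initialization error. The optional cell below is a minimal fail-fast sketch, assuming the two variable names this notebook reads (`PROJECT_CONNECTION_STRING` and `MODEL_DEPLOYMENT_NAME`). The `check_env` helper is hypothetical and not part of the Azure SDK."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-env-check-code",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"def check_env(required, optional_defaults=None):\n",
"    \"\"\"Return the settings as a dict; raise if a required variable is missing or empty.\n",
"\n",
"    Hypothetical helper for this tutorial, not part of the Azure SDK.\n",
"    \"\"\"\n",
"    optional_defaults = optional_defaults or {}\n",
"    missing = [name for name in required if not os.environ.get(name)]\n",
"    if missing:\n",
"        raise RuntimeError(f\"Missing environment variables: {', '.join(missing)}\")\n",
"    settings = {name: os.environ[name] for name in required}\n",
"    # Optional variables fall back to the defaults this notebook already uses\n",
"    for name, default in optional_defaults.items():\n",
"        settings[name] = os.environ.get(name, default)\n",
"    return settings\n",
"\n",
"try:\n",
"    settings = check_env([\"PROJECT_CONNECTION_STRING\"], {\"MODEL_DEPLOYMENT_NAME\": \"gpt-4o-mini\"})\n",
"    print(\"✅ Environment OK; model deployment:\", settings[\"MODEL_DEPLOYMENT_NAME\"])\n",
"except RuntimeError as e:\n",
"    print(f\"❌ {e}\")"
]
},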
{
"cell_type": "markdown",
"id": "create-sample-resources",
"metadata": {},
"source": [
"## 2. Prepare Sample Files\n",
"We'll create two small sample Markdown files (`recipes.md` and `guidelines.md`), then add them to a vector store for searching.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "create-files-code",
"metadata": {},
"outputs": [],
"source": [
"def create_sample_files():\n",
" recipes_md = (\n",
"        # Adjacent string literals are concatenated into one Markdown document\n",
"        \"# Healthy Recipes Database\\n\\n\"\n",
"        \"## Gluten-Free Recipes\\n\"\n",
"        \"1. Quinoa Bowl\\n\"\n",
"        \"   - Ingredients: quinoa, vegetables, olive oil\\n\"\n",
"        \"   - Instructions: Cook quinoa, add vegetables\\n\\n\"\n",
"        \"2. Rice Pasta with Vegetables\\n\"\n",
"        \"   - Ingredients: rice pasta, mixed vegetables\\n\"\n",
"        \"   - Instructions: Boil pasta, sauté vegetables\\n\\n\"\n",
"        \"## Diabetic-Friendly Recipes\\n\"\n",
"        \"1. Low-Carb Stir Fry\\n\"\n",
"        \"   - Ingredients: chicken, vegetables, tamari sauce\\n\"\n",
"        \"   - Instructions: Cook chicken, add vegetables\\n\\n\"\n",
"        \"2. Greek Salad\\n\"\n",
"        \"   - Ingredients: cucumber, tomatoes, feta, olives\\n\"\n",
"        \"   - Instructions: Chop vegetables, combine\\n\\n\"\n",
"        \"## Heart-Healthy Recipes\\n\"\n",
"        \"1. Baked Salmon\\n\"\n",
"        \"   - Ingredients: salmon, lemon, herbs\\n\"\n",
"        \"   - Instructions: Season salmon, bake\\n\\n\"\n",
"        \"2. Mediterranean Bowl\\n\"\n",
"        \"   - Ingredients: chickpeas, vegetables, tahini\\n\"\n",
"        \"   - Instructions: Combine ingredients\\n\"\n",
" )\n",
"\n",
" guidelines_md = (\n",
"        \"# Dietary Guidelines\\n\\n\"\n",
"        \"## General Guidelines\\n\"\n",
"        \"- Eat a variety of foods\\n\"\n",
"        \"- Control portion sizes\\n\"\n",
"        \"- Stay hydrated\\n\\n\"\n",
"        \"## Special Diets\\n\"\n",
"        \"1. Gluten-Free Diet\\n\"\n",
"        \"   - Avoid wheat, barley, rye\\n\"\n",
"        \"   - Focus on naturally gluten-free foods\\n\\n\"\n",
"        \"2. Diabetic Diet\\n\"\n",
"        \"   - Monitor carbohydrate intake\\n\"\n",
"        \"   - Choose low glycemic foods\\n\\n\"\n",
"        \"3. Heart-Healthy Diet\\n\"\n",
"        \"   - Limit saturated fats\\n\"\n",
"        \"   - Choose lean proteins\\n\"\n",
" )\n",
"\n",
" # Save to local .md files\n",
" with open(\"recipes.md\", \"w\", encoding=\"utf-8\") as f:\n",
" f.write(recipes_md)\n",
" with open(\"guidelines.md\", \"w\", encoding=\"utf-8\") as f:\n",
" f.write(guidelines_md)\n",
"\n",
"    print(\"Created sample resource files: recipes.md, guidelines.md\")\n",
" return [\"recipes.md\", \"guidelines.md\"]\n",
"\n",
"sample_files = create_sample_files()"
]
},
{
"cell_type": "markdown",
"id": "note-above-step-3",
"metadata": {},
"source": [
"#### ✨ Note on Search Permissions\n",
"If you use the standard agent setup with your own Azure AI Search resource, you also need the **Cognitive Search Data Contributor** role on that resource before creating the vector store. Without it, vector store creation often fails with a **Forbidden** error. See [Authentication Setup](../../1-introduction/1-authentication.ipynb#4-add-agent-service-permissions) for details on configuring permissions.\n"
]
},
{
"cell_type": "markdown",
"id": "create-vector-store",
"metadata": {},
"source": [
"## 3. Create a Vector Store\n",
"We'll upload the newly created files and group them into a single vector store. The service splits each file into chunks and embeds them, which lets the agent later retrieve relevant passages by meaning rather than by exact keywords."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "vector-store-code",
"metadata": {},
"outputs": [],
"source": [
"def create_vector_store(files, store_name=\"my_health_resources\"):\n",
" try:\n",
" # Step 1: Upload files to Azure AI Agent service\n",
" # Each file needs to be uploaded individually and we'll collect their IDs\n",
" uploaded_ids = []\n",
" for fp in files:\n",
" # upload_file_and_poll ensures the upload is complete before continuing\n",
" # FilePurpose.AGENTS tells the service these files are for agent usage\n",
" upl = project_client.agents.upload_file_and_poll(\n",
" file_path=fp,\n",
" purpose=FilePurpose.AGENTS\n",
" )\n",
" uploaded_ids.append(upl.id)\n",
"            print(f\"✅ Uploaded: {fp} -> File ID: {upl.id}\")\n",
"\n",
" # Step 2: Create a vector store from the uploaded files\n",
" # A vector store converts text into numerical vectors for semantic search\n",
" # create_vector_store_and_poll waits until indexing is complete\n",
" vs = project_client.agents.create_vector_store_and_poll(\n",
" file_ids=uploaded_ids, # Pass all our uploaded file IDs\n",
" name=store_name # Give our vector store a friendly name\n",
" )\n",
"        print(f\"Created vector store '{store_name}', ID: {vs.id}\")\n",
" return vs, uploaded_ids\n",
" except Exception as e:\n",
"        print(f\"❌ Error creating vector store: {e}\")\n",
" return None, []\n",
"\n",
"# Initialize empty variables to store our vector store and file IDs\n",
"vector_store, file_ids = None, []\n",
"\n",
"# If we successfully created sample files earlier, create a vector store from them\n",
"if sample_files:\n",
" vector_store, file_ids = create_vector_store(sample_files, \"health_resources_example\")"
]
},
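{
"cell_type": "markdown",
"id": "sketch-toy-vector-search-md",
"metadata": {},
"source": [
"File search handles chunking, embedding, and similarity ranking inside the service, so none of it appears in the code above. To build intuition, the optional cell below is a toy, self-contained sketch of the same idea: it splits Markdown at `##` headings, represents each chunk as a word-count vector, and ranks chunks by cosine similarity to a query. The word-count embedding and the helpers `chunk_by_heading`, `embed`, `cosine`, and `search` are illustrative assumptions. The real service uses learned embedding models and its own chunking settings, so it can match related meanings that share no words, which this toy cannot."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-toy-vector-search-code",
"metadata": {},
"outputs": [],
"source": [
"import math\n",
"import re\n",
"from collections import Counter\n",
"\n",
"def chunk_by_heading(markdown_text):\n",
"    \"\"\"Split Markdown into chunks, starting a new chunk at each '## ' heading.\"\"\"\n",
"    chunks, current = [], []\n",
"    for line in markdown_text.splitlines():\n",
"        if line.startswith(\"## \") and current:\n",
"            chunks.append(\"\\n\".join(current).strip())\n",
"            current = []\n",
"        current.append(line)\n",
"    if current:\n",
"        chunks.append(\"\\n\".join(current).strip())\n",
"    return [c for c in chunks if c]\n",
"\n",
"def embed(text):\n",
"    \"\"\"Toy stand-in for an embedding: lowercase word counts (real services use learned vectors).\"\"\"\n",
"    return Counter(re.findall(r\"[a-z]+\", text.lower()))\n",
"\n",
"def cosine(a, b):\n",
"    \"\"\"Cosine similarity between two sparse word-count vectors.\"\"\"\n",
"    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())\n",
"    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))\n",
"    return dot / norm if norm else 0.0\n",
"\n",
"def search(chunks, query, top_k=2):\n",
"    \"\"\"Return the top_k (score, chunk) pairs ranked by similarity to the query.\"\"\"\n",
"    q = embed(query)\n",
"    scored = [(cosine(q, embed(c)), c) for c in chunks]\n",
"    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]\n",
"\n",
"doc = (\n",
"    \"# Healthy Recipes Database\\n\\n\"\n",
"    \"## Gluten-Free Recipes\\n1. Quinoa Bowl\\n   - Ingredients: quinoa, vegetables, olive oil\\n\\n\"\n",
"    \"## Heart-Healthy Recipes\\n1. Baked Salmon\\n   - Ingredients: salmon, lemon, herbs\\n\"\n",
")\n",
"for score, chunk in search(chunk_by_heading(doc), \"gluten-free quinoa lunch\", top_k=1):\n",
"    print(f\"{score:.2f} -> {chunk.splitlines()[0]}\")"
]
},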
{
"cell_type": "markdown",
"id": "create-agent-file-search",
"metadata": {},
"source": [
"## 4. Create the Health Resource Agent\n",
"We create a **FileSearchTool** that points to our new vector store, then create the agent with instructions covering disclaimers, file references, and the scope of its dietary advice."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "create-agent-code",
"metadata": {},
"outputs": [],
"source": [
"def create_health_resource_agent(vstore_id):\n",
" try:\n",
" # Create a FileSearchTool that will allow our agent to search through documents\n",
" # FileSearchTool uses the vector store we created earlier to perform semantic search\n",
" # This means the agent can find relevant content even if the exact words don't match\n",
" file_search_tool = FileSearchTool(vector_store_ids=[vstore_id])\n",
"\n",
" # Create an AI agent that will use our search tool and follow specific instructions\n",
" # The agent combines:\n",
" # 1. A base LLM model (specified in environment variables or defaults to gpt-4o-mini)\n",
" # 2. The ability to search our health documents using the FileSearchTool\n",
" # 3. Custom instructions that shape how it responds to questions\n",
" agent = project_client.agents.create_agent(\n",
" # Specify which LLM model to use - fallback to gpt-4o-mini if not set\n",
" model=os.environ.get(\"MODEL_DEPLOYMENT_NAME\", \"gpt-4o-mini\"),\n",
" \n",
" # Give our agent a descriptive name\n",
" name=\"health-search-agent\",\n",
" \n",
" # These instructions act like a personality and rule set for our agent\n",
" # They ensure consistent, responsible health advice\n",
" instructions=\"\"\"\n",
" You are a health resource advisor with access to dietary and recipe files.\n",
" You:\n",
" 1. Always present disclaimers (you're not a doctor!)\n",
" 2. Provide references to the files when possible\n",
" 3. Focus on general nutrition or recipe tips.\n",
" 4. Encourage professional consultation for more detailed advice.\n",
" \"\"\",\n",
" \n",
" # Connect the search tool's interface definition and required resources\n",
" # This gives the agent the ability to actually perform searches\n",
" tools=file_search_tool.definitions,\n",
" tool_resources=file_search_tool.resources\n",
" )\n",
"        print(f\"Created health resource agent, ID: {agent.id}\")\n",
" return agent\n",
" except Exception as e:\n",
"        print(f\"❌ Error creating health resource agent: {e}\")\n",
" return None\n",
"\n",
"# Initialize our agent variable\n",
"health_agent = None\n",
"\n",
"# Only create the agent if we successfully created a vector store earlier\n",
"if vector_store:\n",
" health_agent = create_health_resource_agent(vector_store.id)"
]
},
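{
"cell_type": "markdown",
"id": "sketch-disclaimer-guard-md",
"metadata": {},
"source": [
"The agent's instructions ask for disclaimers, but a model does not follow instructions with certainty. If you post-process replies before showing them to users, a simple client-side check can add a standard disclaimer when none is present. The optional cell below is a minimal sketch: the `ensure_disclaimer` helper, the marker phrases, and the disclaimer wording are illustrative assumptions to adapt to your own review requirements."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-disclaimer-guard-code",
"metadata": {},
"outputs": [],
"source": [
"DISCLAIMER = (\n",
"    \"This information is for general educational purposes only and is not a \"\n",
"    \"substitute for professional medical advice.\"\n",
")\n",
"\n",
"# Hypothetical phrases that count as an existing disclaimer (matched case-insensitively)\n",
"DISCLAIMER_MARKERS = (\"not a doctor\", \"medical advice\", \"healthcare professional\")\n",
"\n",
"def ensure_disclaimer(reply: str) -> str:\n",
"    \"\"\"Return the reply unchanged if it already has a disclaimer; otherwise append one.\"\"\"\n",
"    lowered = reply.lower()\n",
"    if any(marker in lowered for marker in DISCLAIMER_MARKERS):\n",
"        return reply\n",
"    return f\"{reply.rstrip()}\\n\\n{DISCLAIMER}\"\n",
"\n",
"print(ensure_disclaimer(\"Try a quinoa bowl with roasted vegetables.\"))"
]
},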
{
"cell_type": "markdown",
"id": "health-search-conversation",
"metadata": {},
"source": [
"## 5. Searching Health Resources\n",
"We'll create a new conversation thread and ask questions such as “Gluten-free recipe ideas?” or “Heart-healthy meal plan?” The agent runs a file search over the vector store to find relevant information."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "search-code",
"metadata": {},
"outputs": [],
"source": [
"def create_search_thread(agent):\n",
" try:\n",
" # In Azure AI Agent service, conversations happen in \"threads\"\n",
" # A thread maintains the context and history of a conversation\n",
" # Here we create a new empty thread to start a fresh conversation\n",
" thread = project_client.agents.create_thread()\n",
"        print(f\"Created new search thread, ID: {thread.id}\")\n",
" return thread\n",
" except Exception as e:\n",
"        print(f\"❌ Error creating search thread: {e}\")\n",
" return None\n",
"\n",
"def ask_search_question(thread_id, agent_id, user_question):\n",
" try:\n",
" # First, we add the user's question as a message to the thread\n",
" # This is like typing a message in a chat interface\n",
" message = project_client.agents.create_message(\n",
" thread_id=thread_id,\n",
" role=\"user\", # Specifies this message is from the user\n",
" content=user_question\n",
" )\n",
"        print(f\"Searching: '{user_question}'\")\n",
"\n",
" # Next, we create and process a \"run\" - this is where the magic happens!\n",
" # The agent will:\n",
" # 1. Read the user's question\n",
" # 2. Use the FileSearchTool to search our health documents\n",
" # 3. Generate a helpful response based on the search results\n",
" run = project_client.agents.create_and_process_run(\n",
" thread_id=thread_id,\n",
" agent_id=agent_id\n",
" )\n",
"        print(f\"🤖 Run finished with status: {run.status}\")\n",
" if run.last_error:\n",
" print(f\"Error details: {run.last_error}\")\n",
" return run\n",
" except Exception as e:\n",
"        print(f\"❌ Error searching question: {e}\")\n",
" return None\n",
"\n",
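"# Now let's test our search functionality!\n",
"# Initialize search_thread so later cells can check it even if no agent was created\n",
"search_thread = None\n",
"\n",
"# First check if we have our health agent available\n",
"if health_agent:\n",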
" # Create a new conversation thread\n",
" search_thread = create_search_thread(health_agent)\n",
"\n",
" if search_thread:\n",
" # Define some test questions that demonstrate different types of health queries\n",
" # The agent will search our uploaded health documents to answer these\n",
" queries = [\n",
" \"Could you suggest a gluten-free lunch recipe?\",\n",
" \"Show me some heart-healthy meal ideas.\",\n",
" \"What guidelines do you have for someone with diabetes?\"\n",
" ]\n",
"\n",
" # Process each query one at a time\n",
" # The agent will maintain conversation context between questions\n",
" for q in queries:\n",
" ask_search_question(search_thread.id, health_agent.id, q)"
]
},
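{
"cell_type": "markdown",
"id": "sketch-run-polling-md",
"metadata": {},
"source": [
"`create_and_process_run` hides the run lifecycle: it creates a run and then repeatedly checks its status until the run finishes. If you ever need to drive that loop yourself (for example, to enforce a timeout or report progress), the pattern looks roughly like the optional cell below. It takes a plain `get_status` callable so it runs without Azure. The helper `poll_until_terminal` is hypothetical rather than an SDK function, and the terminal status strings are assumed to match the ones agent runs report (for example `completed` or `failed`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-run-polling-code",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"# Statuses after which a run no longer changes (assumed to mirror agent run statuses)\n",
"TERMINAL_STATUSES = {\"completed\", \"failed\", \"cancelled\", \"expired\"}\n",
"\n",
"def poll_until_terminal(get_status, interval=1.0, timeout=60.0, sleep=time.sleep):\n",
"    \"\"\"Call get_status() until it returns a terminal status or the timeout elapses.\n",
"\n",
"    Hypothetical helper: get_status stands in for re-fetching a run's status.\n",
"    \"\"\"\n",
"    deadline = time.monotonic() + timeout\n",
"    while True:\n",
"        status = get_status()\n",
"        if status in TERMINAL_STATUSES:\n",
"            return status\n",
"        if time.monotonic() >= deadline:\n",
"            raise TimeoutError(f\"Run still '{status}' after {timeout} seconds\")\n",
"        sleep(interval)\n",
"\n",
"# Simulated run: queued, then in progress, then completed\n",
"fake_statuses = iter([\"queued\", \"in_progress\", \"in_progress\", \"completed\"])\n",
"print(poll_until_terminal(lambda: next(fake_statuses), interval=0.01))"
]
},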
{
"cell_type": "markdown",
"id": "view-search-results",
"metadata": {},
"source": [
"## 6. View Results & Citations\n",
"We'll read back the conversation thread to see how the agent responded and whether it cited the correct files."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "view-search-results-code",
"metadata": {},
"outputs": [],
"source": [
"def display_thread_messages(thread_id):\n",
" try:\n",
" # Retrieve all messages in this conversation thread using the Azure AI Agent SDK\n",
" # Messages contain the back-and-forth between user and AI agent\n",
" messages = project_client.agents.list_messages(thread_id=thread_id)\n",
"\n",
"        # list_messages returns the newest message first, so reverse the list\n",
"        # to print the conversation in chronological order (oldest first)\n",
"        print(\"\\n🗣️ Conversation so far:\")\n",
" for m in reversed(messages.data):\n",
" # Each message may have multiple content pieces\n",
" # We're interested in the text content (vs other types like images)\n",
" if m.content:\n",
" last_content = m.content[-1]\n",
" if hasattr(last_content, \"text\"):\n",
" # Print who said what (ASSISTANT or USER) along with their message\n",
" print(f\"{m.role.upper()}: {last_content.text.value}\\n\")\n",
"\n",
" # The agent can cite specific passages from the uploaded documents\n",
" # Let's check if it referenced any files in its responses\n",
" print(\"\\n๐ Checking for citations...\")\n",
" for c in messages.file_citation_annotations:\n",
"            # Each annotation records the marker text inserted into the reply\n",
"            # and the ID of the cited file, so users can verify the agent's sources\n",
"            print(f\"- Citation marker: '{c.text}' from file ID: {c.file_citation['file_id']}\")\n",
"\n",
" except Exception as e:\n",
" # Gracefully handle any errors that might occur when displaying messages\n",
" print(f\"โ Error displaying messages: {e}\")\n",
"\n",
"# Display the conversation history for our search thread\n",
"if search_thread:\n",
" display_thread_messages(search_thread.id)"
]
},
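{
"cell_type": "markdown",
"id": "sketch-citation-labels-md",
"metadata": {},
"source": [
"The citation loop above prints raw file IDs, which mean little to a reader. Because `create_vector_store` returns the uploaded IDs in the same order as `sample_files`, you can map each ID back to its local filename. The optional cell below is a minimal sketch that swaps citation markers in a reply for readable `[filename]` tags. It assumes each citation has already been reduced to a `(marker_text, file_id)` pair, for example from `c.text` and `c.file_citation['file_id']`; the `label_citations` helper and the sample markers and IDs are hypothetical."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-citation-labels-code",
"metadata": {},
"outputs": [],
"source": [
"def label_citations(reply_text, citations, id_to_name):\n",
"    \"\"\"Replace each citation marker in reply_text with a readable [filename] tag.\n",
"\n",
"    citations: list of (marker_text, file_id) pairs.\n",
"    id_to_name: mapping of file ID -> filename; unknown IDs are shown as-is.\n",
"    \"\"\"\n",
"    for marker, file_id in citations:\n",
"        label = id_to_name.get(file_id, file_id)\n",
"        reply_text = reply_text.replace(marker, f\" [{label}]\")\n",
"    return reply_text\n",
"\n",
"# Hypothetical IDs and markers for illustration; in this notebook you could build\n",
"# the mapping with dict(zip(file_ids, sample_files)) before the cleanup step runs\n",
"id_to_name = dict(zip([\"file-id-1\", \"file-id-2\"], [\"recipes.md\", \"guidelines.md\"]))\n",
"reply = \"Try a quinoa bowl.【4:0†source】 Avoid wheat, barley, and rye.【4:1†source】\"\n",
"citations = [(\"【4:0†source】\", \"file-id-1\"), (\"【4:1†source】\", \"file-id-2\")]\n",
"print(label_citations(reply, citations, id_to_name))"
]
},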
{
"cell_type": "markdown",
"id": "cleanup",
"metadata": {},
"source": [
"## 7. Cleanup & Best Practices 🧹\n",
"Finally, we'll remove the vector store, the uploaded files, the agent, and the local sample files. In a production environment, you might keep these resources longer. Before that, here are some best practices:\n",
"\n",
"1. **Resource Management**\n",
"   - Keep files grouped by category, and regularly prune old or irrelevant ones.\n",
" - Clear out test agents or vector stores once you're done.\n",
"\n",
"2. **Search Queries**\n",
" - Provide precise or multi-part queries.\n",
" - Consider synonyms or alternative keywords (\"gluten-free\" vs \"celiac\").\n",
" \n",
"3. **Health Information**\n",
"   - Always include a disclaimer that the agent is not a medical professional.\n",
" - Encourage users to see doctors for specific diagnoses.\n",
"\n",
"4. **Performance**\n",
" - Keep an eye on vector store size.\n",
" - Evaluate search accuracy with `azure-ai-evaluation`!\n"
]
},
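{
"cell_type": "markdown",
"id": "sketch-query-expansion-md",
"metadata": {},
"source": [
"The synonym tip can also be applied on the client side before a question reaches the agent. The optional cell below is a minimal sketch, assuming a small hand-maintained synonym map; the map contents and the `expand_query` helper are hypothetical. Semantic search already tolerates some differences in wording, so treat this as an optional nudge rather than a requirement."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-query-expansion-code",
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical, hand-maintained synonym map: term -> related terms to add to the query\n",
"SYNONYMS = {\n",
"    \"gluten-free\": [\"celiac\", \"wheat-free\"],\n",
"    \"diabetic\": [\"diabetes\", \"low glycemic\"],\n",
"    \"heart-healthy\": [\"cardiovascular\", \"low saturated fat\"],\n",
"}\n",
"\n",
"def expand_query(question: str) -> str:\n",
"    \"\"\"Append related terms for any synonym-map key found in the question.\"\"\"\n",
"    lowered = question.lower()\n",
"    extras = []\n",
"    for term, related in SYNONYMS.items():\n",
"        if term in lowered:\n",
"            # Skip related terms the question already mentions\n",
"            extras.extend(r for r in related if r not in lowered)\n",
"    if not extras:\n",
"        return question\n",
"    return f\"{question} (related terms: {', '.join(extras)})\"\n",
"\n",
"print(expand_query(\"Could you suggest a gluten-free lunch recipe?\"))"
]
},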
{
"cell_type": "code",
"execution_count": null,
"id": "cleanup-code",
"metadata": {},
"outputs": [],
"source": [
"def cleanup_all():\n",
" try:\n",
" # Check if we have a vector store and delete it\n",
" # Vector stores are where we store the embeddings (numerical representations) \n",
" # of our documents for semantic search\n",
" if 'vector_store' in globals() and vector_store:\n",
" project_client.agents.delete_vector_store(vector_store.id)\n",
"            print(\"Deleted vector store.\")\n",
"\n",
"        # Remove the files we uploaded through the Agent service\n",
" # These were the documents our agent used as its knowledge base\n",
" if 'file_ids' in globals() and file_ids:\n",
" for fid in file_ids:\n",
" project_client.agents.delete_file(fid)\n",
"            print(\"Deleted uploaded files from the service.\")\n",
"\n",
" # Delete the AI agent we created\n",
" # This frees up resources since we're done with our demo\n",
" if 'health_agent' in globals() and health_agent:\n",
" project_client.agents.delete_agent(health_agent.id)\n",
"            print(\"Deleted health resource agent.\")\n",
"\n",
" # Clean up any local files we created during the demo\n",
" # This keeps our workspace tidy\n",
" if 'sample_files' in globals() and sample_files:\n",
" for sf in sample_files:\n",
" if os.path.exists(sf):\n",
" os.remove(sf)\n",
"            print(\"Deleted local sample files.\")\n",
"\n",
" except Exception as e:\n",
" # If anything goes wrong during cleanup, we'll see what happened\n",
"        print(f\"❌ Error during cleanup: {e}\")\n",
"\n",
"# Run our cleanup function to remove all resources we created\n",
"# This is good practice in a tutorial/demo environment\n",
"cleanup_all()"
]
},
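{
"cell_type": "markdown",
"id": "sketch-best-effort-cleanup-md",
"metadata": {},
"source": [
"`cleanup_all` wraps every deletion in a single `try` block, so one failure (for example, a resource that was already deleted) skips all the remaining steps. The optional cell below is a minimal sketch of a best-effort alternative that attempts every step and reports failures at the end. `run_cleanup_steps` is a hypothetical helper, and the simulated steps stand in for calls such as `project_client.agents.delete_agent(health_agent.id)`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "sketch-best-effort-cleanup-code",
"metadata": {},
"outputs": [],
"source": [
"def run_cleanup_steps(steps):\n",
"    \"\"\"Run each (description, callable) step, continuing past errors.\n",
"\n",
"    Returns a list of (description, exception) pairs for the steps that failed.\n",
"    \"\"\"\n",
"    failures = []\n",
"    for description, action in steps:\n",
"        try:\n",
"            action()\n",
"            print(f\"Done: {description}\")\n",
"        except Exception as exc:  # broad on purpose: one failure should not stop cleanup\n",
"            failures.append((description, exc))\n",
"            print(f\"Failed: {description} ({exc})\")\n",
"    return failures\n",
"\n",
"def _already_deleted():\n",
"    # Simulates a delete call for a resource that no longer exists\n",
"    raise RuntimeError(\"resource not found\")\n",
"\n",
"# Simulated steps; in practice each callable would wrap a real delete call\n",
"failed = run_cleanup_steps([\n",
"    (\"delete agent\", lambda: None),\n",
"    (\"delete vector store\", _already_deleted),\n",
"    (\"delete local files\", lambda: None),\n",
"])\n",
"print(f\"{len(failed)} step(s) failed\")"
]
},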
{
"cell_type": "markdown",
"id": "conclusion",
"metadata": {},
"source": [
"# Congratulations!\n",
"You've created a **Health Resource Search Agent** that:\n",
"1. Uses a **Vector Store** to store sample recipes & guidelines.\n",
"2. **Searches** them to answer queries.\n",
"3. **Provides disclaimers** reminding users to consult professionals.\n",
"\n",
"Feel free to adapt this approach for your own corporate documents, product manuals, or custom health resources.\n",
"\n",
"Happy Searching!\n",
"\n",
"#### Let's proceed to [4-bing_grounding.ipynb](4-bing_grounding.ipynb)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}