# 🍏 Basic Retrieval-Augmented Generation (RAG) with AIProjectClient 🍎

In this notebook, we'll demonstrate a **basic RAG** flow using:
- **`azure-ai-projects`** (AIProjectClient)
- **`azure-ai-inference`** (Embeddings, ChatCompletions)
- **`azure-search-documents`** (Azure AI Search client, for vector or hybrid search)

Our theme is **Health & Fitness** 🍏, so we'll create a simple set of health tips, embed them, and store them in a search index. We'll then run a query that retrieves the relevant tips and pass them to an LLM to produce a final answer.

> **Disclaimer**: This is not medical advice. For real health questions, consult a professional.

## What is RAG?
Retrieval-Augmented Generation (RAG) is a technique where the LLM (Large Language Model) uses relevant retrieved text chunks from your data to craft a final answer. This helps ground the model's response in real data, reducing hallucinations.
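
To make the retrieve, augment, generate sequence concrete, here is a minimal sketch that runs without any Azure services. It is an illustration only: word overlap stands in for vector search, `generate` is a placeholder for the chat model, and all function names (`tokenize`, `retrieve`, `augment`, `generate`) are hypothetical helpers, not part of any SDK.

```python
import re

# Toy corpus standing in for an indexed knowledge base.
DOCS = [
    "Daily 30-minute walks help maintain a healthy weight and reduce stress.",
    "Stay hydrated by drinking 8-10 cups of water per day.",
    "For cardio endurance, try interval training like HIIT.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Rank docs by words shared with the query (a stand-in for vector search)."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Combine the retrieved text and the question into one grounded prompt."""
    notes = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these notes:\n{notes}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; it only reports how much text it received."""
    return f"[LLM would answer here, given {len(prompt)} prompt characters]"

question = "How much water should I drink per day?"
hits = retrieve(question, DOCS)
prompt = augment(question, hits)
print(prompt)
print(generate(prompt))
```

The rest of this notebook follows the same three stages, with real embeddings, a real search index, and a real chat model in place of the stand-ins.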


<img src="./seq-diagrams/3-basic-rag.png" width="30%"/>

## 1. Setup
We'll import libraries, load environment variables, and create an `AIProjectClient`.

> #### Complete [2-embeddings.ipynb](2-embeddings.ipynb) notebook before starting this one


In [None]:
import os
import time
import json
from dotenv import load_dotenv

# azure-ai-projects
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# We'll embed with azure-ai-inference
from azure.ai.inference import EmbeddingsClient, ChatCompletionsClient
from azure.ai.inference.models import UserMessage, SystemMessage

# For vector search or hybrid search
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.core.credentials import AzureKeyCredential
from pathlib import Path

# Load environment variables
notebook_path = Path().absolute()
parent_dir = notebook_path.parent
load_dotenv(parent_dir / '.env')

conn_string = os.environ.get("PROJECT_CONNECTION_STRING")
chat_model = os.environ.get("MODEL_DEPLOYMENT_NAME", "gpt-4o-mini")
embedding_model = os.environ.get("EMBEDDING_MODEL_DEPLOYMENT_NAME", "text-embedding-3-small")
search_index_name = os.environ.get("SEARCH_INDEX_NAME", "healthtips-index")

try:
    project_client = AIProjectClient.from_connection_string(
        credential=DefaultAzureCredential(),
        conn_str=conn_string,
    )
    print("✅ AIProjectClient created successfully!")
except Exception as e:
    print("❌ Error creating AIProjectClient:", e)

## 2. Create Sample Health Data
We'll create a few short doc chunks. In a real scenario, you might read from CSV or PDFs, chunk them up, embed them, and store them in your search index.
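
For longer source material, the chunking step could look roughly like the sketch below. It assumes a simple fixed-size word window with overlap; the helper names `chunk_text` and `to_docs` and the window sizes are illustrative choices, not something this notebook or the SDK prescribes. The output uses the same `id` / `content` / `source` shape as the sample data in this section.

```python
def chunk_text(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of at most max_words words, sharing `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks

def to_docs(text: str, source: str, prefix: str = "doc") -> list[dict]:
    """Wrap each chunk in an id/content/source dictionary."""
    return [
        {"id": f"{prefix}{i}", "content": chunk, "source": source}
        for i, chunk in enumerate(chunk_text(text), start=1)
    ]

# Synthetic 100-word text, used only to show the windowing behaviour.
sample = " ".join(f"word{i}" for i in range(100))
docs = to_docs(sample, source="Example Handbook")
print(len(docs), "chunks; first id:", docs[0]["id"])
```

Overlapping windows are one common way to avoid cutting a sentence's context in half at a chunk boundary; sentence- or heading-aware splitting is often a better fit for real documents.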


In [None]:
health_tips = [
    {
        "id": "doc1",
        "content": "Daily 30-minute walks help maintain a healthy weight and reduce stress.",
        "source": "General Fitness"
    },
    {
        "id": "doc2",
        "content": "Stay hydrated by drinking 8-10 cups of water per day.",
        "source": "General Fitness"
    },
    {
        "id": "doc3",
        "content": "Consistent sleep patterns (7-9 hours) improve muscle recovery.",
        "source": "General Fitness"
    },
    {
        "id": "doc4",
        "content": "For cardio endurance, try interval training like HIIT.",
        "source": "Workout Advice"
    },
    {
        "id": "doc5",
        "content": "Warm up with dynamic stretches before running to reduce injury risk.",
        "source": "Workout Advice"
    },
    {
        "id": "doc6",
        "content": "Balanced diets typically include protein, whole grains, fruits, vegetables, and healthy fats.",
        "source": "Nutrition"
    },
]
print("Created a small list of health tips.")

## 3.0. Create or Reset the Index
When creating a vector field in Azure AI Search, the **field definition** must include a `vector_search_profile_name` property that points to a matching profile name in your vector search settings. That profile in turn names the algorithm configuration to use.

We'll define a helper function to create (or reset) a vector index with an HNSW algorithm config.
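
HNSW is an approximate nearest-neighbor index: with the cosine metric, it approximates the ranking that an exhaustive cosine-similarity scan would produce, without comparing the query against every stored vector. The sketch below shows that exact scan on made-up 3-dimensional vectors so the scoring is easy to follow. The names `cosine_similarity` and `exact_knn` and the toy vectors are assumptions for illustration only.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)

def exact_knn(query: list[float], vectors: dict[str, list[float]], k: int = 2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Made-up vectors; real embeddings have hundreds or thousands of dimensions.
toy_vectors = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [0.9, 0.1, 0.0],
}
neighbors = exact_knn([1.0, 0.0, 0.0], toy_vectors, k=2)
print(neighbors)
```

With only six documents an exhaustive scan would be perfectly adequate; an approximate index pays off as the number of vectors grows.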


In [3]:
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    HnswParameters,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    VectorSearchProfile,
)

def create_healthtips_index(
        endpoint: str, api_key: str, index_name: str, 
        dimension: int = 1536 # if using text-embedding-3-small
        ):
    """Delete any existing index, then create a health-tips index with vector search capability."""
    
    index_client = SearchIndexClient(endpoint=endpoint, credential=AzureKeyCredential(api_key))
    
    # Try to delete existing index
    try:
        index_client.delete_index(index_name)
        print(f"Deleted existing index: {index_name}")
    except Exception:
        pass  # Index doesn't exist yet
        
    # Define vector search configuration
    vector_search = VectorSearch(
        algorithms=[
            HnswAlgorithmConfiguration(
                name="myHnsw",
                kind=VectorSearchAlgorithmKind.HNSW,
                parameters=HnswParameters(
                    m=4,
                    ef_construction=400,
                    ef_search=500,
                    metric=VectorSearchAlgorithmMetric.COSINE
                )
            )
        ],
        profiles=[
            VectorSearchProfile(
                name="myHnswProfile",
                algorithm_configuration_name="myHnsw"
            )
        ]
    )
    
    # Define fields
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SimpleField(name="source", type=SearchFieldDataType.String),
        SearchField(
            name="embedding", 
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            vector_search_dimensions=dimension,
            vector_search_profile_name="myHnswProfile" 
        ),
    ]
    
    # Create index definition
    index_def = SearchIndex(
        name=index_name,
        fields=fields,
        vector_search=vector_search
    )
    
    # Create the index
    index_client.create_index(index_def)
    print(f"✅ Created or reset index: {index_name}")

## 3.1. Create Index & Upload Health Tips 🏋️

Now we'll put our health tips into action by:
1. **Creating a search connection** to Azure AI Search
2. **Building our index** with vector search capability
3. **Generating embeddings** for each health tip
4. **Uploading** the tips with their embeddings

This creates our knowledge base that we'll search through later. Think of it as building our 'fitness library' that our AI assistant can reference! 📚💪
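
The cell below embeds one tip per request, which is fine for six documents. For larger collections, it can help to send several texts per request. The sketch below shows one possible batching helper. It assumes an `embed_fn` callable that takes a list of strings and returns one vector per string, in the same order; `batched`, `embed_in_batches`, and `fake_embed` are hypothetical names, and `fake_embed` exists only so the example runs offline.

```python
from typing import Callable, Iterator

Vector = list[float]

def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]

def embed_in_batches(
    texts: list[str],
    embed_fn: Callable[[list[str]], list[Vector]],
    batch_size: int = 16,
) -> list[Vector]:
    """Embed texts batch by batch, preserving the input order."""
    vectors: list[Vector] = []
    for batch in batched(texts, batch_size):
        result = embed_fn(batch)
        if len(result) != len(batch):
            raise ValueError("embed_fn returned a different number of vectors than inputs")
        vectors.extend(result)
    return vectors

def fake_embed(batch: list[str]) -> list[Vector]:
    """Offline stand-in: maps each text to [character count, word count]."""
    return [[float(len(t)), float(len(t.split()))] for t in batch]

texts = ["walk daily", "drink water", "sleep well", "try HIIT", "stretch first"]
vectors = embed_in_batches(texts, fake_embed, batch_size=2)
print(len(vectors), vectors[0])
```

In this notebook, `embed_fn` could plausibly be a small wrapper around `embeddings_client.embed(model=embedding_model, input=batch)` that collects `item.embedding` for each entry in `response.data`. Treat that as an assumption and check the result ordering before relying on it.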

In [None]:
from azure.ai.projects.models import ConnectionType

# Step 1: Get search connection
search_conn = project_client.connections.get_default(
    connection_type=ConnectionType.AZURE_AI_SEARCH, 
    include_credentials=True
)
if not search_conn:
    raise RuntimeError("❌ No default Azure AI Search connection found!")
print("✅ Got search connection")

# Step 2: Get embeddings client and check embedding length
embeddings_client = project_client.inference.get_embeddings_client()
print("✅ Created embeddings client")

sample_doc = health_tips[0]
emb_response = embeddings_client.embed(
        model=embedding_model,
        input=[sample_doc["content"]]
    )
embedding_length = len(emb_response.data[0].embedding)
print(f"✅ Got embedding length: {embedding_length}")

# Step 3: Create the index
create_healthtips_index(
    endpoint=search_conn.endpoint_url,
    api_key=search_conn.key,
    index_name=search_index_name,
    dimension=embedding_length   # measured from the sample embedding above, so it matches the deployed model
)

# Step 4: Create search client for uploading documents
search_client = SearchClient(
    endpoint=search_conn.endpoint_url,
    index_name=search_index_name,
    credential=AzureKeyCredential(search_conn.key)
)
print("✅ Created search client")


# Step 5: Embed and upload documents
search_docs = []
for doc in health_tips:
    # Get embedding for document content
    emb_response = embeddings_client.embed(
        model=embedding_model,
        input=[doc["content"]]
    )
    emb_vec = emb_response.data[0].embedding
    
    # Create document with embedding
    search_docs.append({
        "id": doc["id"],
        "content": doc["content"],
        "source": doc["source"],
        "embedding": emb_vec,
    })

# Upload documents to index and count how many were accepted
result = search_client.upload_documents(documents=search_docs)
succeeded = sum(1 for r in result if r.succeeded)
print(f"✅ Uploaded {succeeded}/{len(search_docs)} documents to search index '{search_index_name}'")

## 4. Basic RAG Flow
### 4.1. Retrieve
When a user asks a question, we:
1. Embed the user's question with the same embedding model used for the documents.
2. Search the vector index with that embedding to get the top matching docs.

### 4.2. Generate answer
We then pass the retrieved docs to the chat model.

> In a real scenario, you'd have a more advanced approach to chunking & summarizing. We'll keep it simple.
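
One detail worth isolating is how the retrieved docs are turned into the system message. The sketch below is a variation on the prompt used in the next cell, not a replacement for it: it numbers the sources so the model can cite them, applies a rough character budget, and handles the case where retrieval returns nothing. The function name `build_system_prompt`, the `max_chars` budget, and the citation format are assumptions for illustration.

```python
def build_system_prompt(hits: list[dict], max_chars: int = 2000) -> str:
    """Format retrieved hits as numbered sources within a rough character budget."""
    lines = []
    used = 0
    for n, hit in enumerate(hits, start=1):
        line = f"[{n}] ({hit['source']}) {hit['content']}"
        if used + len(line) > max_chars:
            break  # stop before the context grows past the budget
        lines.append(line)
        used += len(line)
    docs_block = "\n".join(lines) if lines else "(no documents retrieved)"
    return (
        "You are a health & fitness assistant.\n"
        "Answer user questions using ONLY the text from these docs, "
        "and cite sources like [1].\n"
        f"Docs:\n{docs_block}\n"
        "If unsure, say 'I'm not sure'.\n"
    )

# Hits shaped like the search results used in this notebook (content + source).
sample_hits = [
    {"content": "For cardio endurance, try interval training like HIIT.", "source": "Workout Advice"},
    {"content": "Daily 30-minute walks help maintain a healthy weight and reduce stress.", "source": "General Fitness"},
]
prompt = build_system_prompt(sample_hits)
print(prompt)
```

Keeping prompt assembly in a pure function like this makes it easy to inspect exactly what the model sees before any chat call is made.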


In [6]:
from azure.search.documents.models import VectorizedQuery

def rag_chat(query: str, top_k: int = 3) -> str:
    # 1) Embed user query
    user_vec = embeddings_client.embed(
        model=embedding_model,
        input=[query]).data[0].embedding

    # 2) Vector search using VectorizedQuery
    vector_query = VectorizedQuery(
        vector=user_vec,
        k_nearest_neighbors=top_k,
        fields="embedding"
    )

    results = search_client.search(
        search_text=None,  # No text query, so this is a pure vector search (pass the query text for hybrid search)
        vector_queries=[vector_query],
        select=["content", "source"]  # Only retrieve fields we need
    )

    # gather the top docs
    top_docs_content = []
    for r in results:
        c = r["content"]
        s = r["source"]
        top_docs_content.append(f"Source: {s} => {c}")

    # 3) Chat with retrieved docs
    system_text = (
        "You are a health & fitness assistant.\n"
        "Answer user questions using ONLY the text from these docs.\n"
        "Docs:\n"
        + "\n".join(top_docs_content)
        + "\nIf unsure, say 'I'm not sure'.\n"
    )

    with project_client.inference.get_chat_completions_client() as chat_client:
        response = chat_client.complete(
            model=chat_model,
            messages=[
                SystemMessage(content=system_text),
                UserMessage(content=query)
            ]
        )
    return response.choices[0].message.content

## 5. Try a Query 🎉
Let's ask a question about cardio for busy people.


In [None]:
user_query = "What's a good short cardio routine for me if I'm busy?"
answer = rag_chat(user_query)
print("🗣️ User Query:", user_query)
print("🤖 RAG Answer:", answer)

## 6. Conclusion
We've demonstrated a **basic RAG** pipeline with:
- **Embedding** docs & storing them in **Azure AI Search**.
- **Retrieving** the top docs for a user question.
- **Chatting** with the model, grounded in the retrieved docs.

🔎 You can expand this by adding advanced chunking, more robust retrieval, and quality checks. Enjoy your healthy coding! 🍎


🚀 Want to optimize this further with a small language model? Check out the next notebook, [4-phi-4.ipynb](4-phi-4.ipynb), to see how to use Phi-4 with the same Azure AI Foundry SDKs!