{
"cells": [
{
"cell_type": "markdown",
"id": "1e99a5b8",
"metadata": {},
"source": [
"# 🍎 Phi-4 Model with AIProjectClient 🍏\n",
"\n",
"**Phi-4** is a next-generation open model that aims to provide near GPT-4o capabilities at a fraction of the cost, making it ideal for many enterprise or personal use cases. It's especially great for chain-of-thought reasoning and RAG (Retrieval Augmented Generation) scenarios.\n",
"\n",
"In this notebook, you'll see how to:\n",
"1. **Initialize** an `AIProjectClient` for your Azure AI Foundry environment.\n",
"2. **Chat** with the **Phi-4** model using `azure-ai-inference`.\n",
"3. **Show** a Health & Fitness example, featuring disclaimers and wellness Q&A.\n",
"4. **Enjoy** the value proposition of a cheaper alternative to GPT-4 with strong reasoning capabilities. 🏋️\n",
"\n",
"> **Disclaimer**: This is not medical advice. Please consult professionals.\n",
"\n",
"## Why Phi-4?\n",
"Phi-4 is a 14B-parameter model trained on curated data for high reasoning performance.\n",
"- **Cost-Effective**: Get GPT-4-level performance for many tasks without the GPT-4 price.\n",
"- **Reasoning & RAG**: Perfect for chain-of-thought reasoning steps and retrieval augmented generation workflows.\n",
"- **Generous Context Window**: 16K tokens, enabling more context or longer user conversations.\n",
"\n",
"<img src=\"./seq-diagrams/4-phi-4.png\" width=\"30%\"/>\n"
]
},
{
"cell_type": "markdown",
"id": "e93357dd",
"metadata": {},
"source": [
"## 1. Setup\n",
"\n",
"Below, we'll install and import the necessary libraries:\n",
"- **azure-ai-projects**: For the `AIProjectClient`.\n",
"- **azure-ai-inference**: For calling your model, specifically the chat completions.\n",
"- **azure-identity**: For `DefaultAzureCredential`.\n",
"\n",
"Ensure you have a `.env` file with:\n",
"```bash\n",
"PROJECT_CONNECTION_STRING=<your-conn-string>\n",
"SERVERLESS_MODEL_NAME=phi-4\n",
"```\n",
"\n",
"> **Note**: It's recommended to complete the [`3-basic-rag.ipynb`](./3-basic-rag.ipynb) notebook before this one, as it covers important concepts that will be helpful here."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8b5634a0",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from dotenv import load_dotenv\n",
"from pathlib import Path\n",
"from azure.identity import DefaultAzureCredential\n",
"from azure.ai.projects import AIProjectClient\n",
"from azure.ai.inference.models import SystemMessage, UserMessage, AssistantMessage\n",
"\n",
"from pathlib import Path\n",
"\n",
"# Load environment variables\n",
"notebook_path = Path().absolute()\n",
"parent_dir = notebook_path.parent\n",
"load_dotenv(parent_dir / '.env')\n",
"\n",
"conn_string = os.getenv(\"PROJECT_CONNECTION_STRING\")\n",
"phi4_deployment = os.getenv(\"SERVERLESS_MODEL_NAME\", \"phi-4\")\n",
"\n",
"try:\n",
" project_client = AIProjectClient.from_connection_string(\n",
" credential=DefaultAzureCredential(),\n",
" conn_str=conn_string,\n",
" )\n",
" print(\"✅ AIProjectClient created successfully!\")\n",
"except Exception as e:\n",
" print(\"❌ Error creating AIProjectClient:\", e)"
]
},
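{
"cell_type": "markdown",
"id": "c0ffee01",
"metadata": {},
"source": [
"### Quick config check (optional)\n",
"Before calling the model, it can help to fail fast on missing configuration. The helper below is a minimal, hypothetical sketch (the function name `check_required_env` is invented for this notebook, not part of any SDK):\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee02",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"\n",
"def check_required_env(names):\n",
"    \"\"\"Return the subset of environment variable names that are unset or empty.\"\"\"\n",
"    return [name for name in names if not os.getenv(name)]\n",
"\n",
"missing = check_required_env([\"PROJECT_CONNECTION_STRING\"])\n",
"if missing:\n",
"    print(\"⚠️ Missing environment variables:\", \", \".join(missing))\n",
"else:\n",
"    print(\"✅ All required environment variables are set.\")"
]
},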
{
"cell_type": "markdown",
"id": "500d63ef",
"metadata": {},
"source": [
"## 2. Chat with Phi-4 🍏\n",
"We'll demonstrate a simple conversation using **Phi-4** in a health & fitness context. We'll define a system prompt that clarifies the role of the assistant. Then we'll ask some user queries.\n",
"\n",
"> Notice that Phi-4 is well-suited for chain-of-thought reasoning. We'll let it illustrate its reasoning steps for fun.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0bcc4772",
"metadata": {},
"outputs": [],
"source": [
"def chat_with_phi4(user_question, chain_of_thought=False):\n",
" \"\"\"Send a chat request to the Phi-4 model with optional chain-of-thought.\"\"\"\n",
" # We'll define a system message with disclaimers:\n",
" system_prompt = (\n",
" \"You are a Phi-4 AI assistant, focusing on health and fitness.\\n\"\n",
" \"Remind users that you are not a medical professional, but can provide general info.\\n\"\n",
" )\n",
"\n",
" # We can optionally instruct the model to show chain-of-thought. (Use carefully in production.)\n",
" if chain_of_thought:\n",
" system_prompt += \"Please show your step-by-step reasoning in your answer.\\n\"\n",
"\n",
" # We create messages for system + user.\n",
" system_msg = SystemMessage(content=system_prompt)\n",
" user_msg = UserMessage(content=user_question)\n",
"\n",
" with project_client.inference.get_chat_completions_client() as chat_client:\n",
" response = chat_client.complete(\n",
" model=phi4_deployment,\n",
" messages=[system_msg, user_msg],\n",
" temperature=0.8, # a bit creative\n",
" top_p=0.9,\n",
" max_tokens=400,\n",
" )\n",
"\n",
" return response.choices[0].message.content\n",
"\n",
"# Example usage:\n",
"question = \"I'm training for a 5K. Any tips on a weekly workout schedule?\"\n",
"answer = chat_with_phi4(question, chain_of_thought=True)\n",
"print(\"🗣️ User:\", question)\n",
"print(\"🤖 Phi-4:\", answer)"
]
},
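{
"cell_type": "markdown",
"id": "c0ffee03",
"metadata": {},
"source": [
"### Keeping long chats inside the context window\n",
"Phi-4's 16K-token window still fills up over many turns. Below is a rough, illustrative sketch of trimming older turns to fit a token budget. It uses a crude 4-characters-per-token estimate; a real tokenizer (or the usage stats the service returns) would be more accurate, and the helper names here are invented for this notebook:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee04",
"metadata": {},
"outputs": [],
"source": [
"def estimate_tokens(text):\n",
"    # Rough heuristic: ~4 characters per token for English text.\n",
"    return max(1, len(text) // 4)\n",
"\n",
"def trim_history(messages, max_tokens=16000):\n",
"    \"\"\"Keep the system message plus the most recent turns that fit the budget.\n",
"\n",
"    `messages` is a list of (role, content) tuples; the first is the system prompt.\n",
"    \"\"\"\n",
"    system, turns = messages[0], messages[1:]\n",
"    budget = max_tokens - estimate_tokens(system[1])\n",
"    kept = []\n",
"    for role, content in reversed(turns):\n",
"        cost = estimate_tokens(content)\n",
"        if cost > budget:\n",
"            break\n",
"        kept.append((role, content))\n",
"        budget -= cost\n",
"    return [system] + list(reversed(kept))\n",
"\n",
"demo = [\n",
"    (\"system\", \"You are a helpful fitness AI.\"),\n",
"    (\"user\", \"Hi!\"),\n",
"    (\"assistant\", \"Hello! How can I help with your training?\"),\n",
"    (\"user\", \"Any tips for my first 5K?\"),\n",
"]\n",
"# A deliberately tiny budget, so we can see older turns being dropped:\n",
"print(trim_history(demo, max_tokens=20))"
]
},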
{
"cell_type": "markdown",
"id": "bc68c40d",
"metadata": {},
"source": [
"## 3. RAG-like Example (Stub)\n",
"Phi-4 also excels in retrieval augmented generation scenarios, where you provide external context and let the model reason over it. Below is a **stub** example showing how you'd pass retrieved text as context.\n",
"\n",
"> In a real scenario, you'd embed & search for relevant passages, then feed them into the user/system message.\n"
]
},
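{
"cell_type": "markdown",
"id": "c0ffee05",
"metadata": {},
"source": [
"To make the retrieval step a bit more concrete, here's a toy stand-in that ranks candidate snippets by keyword overlap with the question. A real pipeline would use embeddings and a vector index (e.g. Azure AI Search); the function name `retrieve_best_doc` and the mini-corpus are invented for this sketch:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c0ffee06",
"metadata": {},
"outputs": [],
"source": [
"def retrieve_best_doc(question, docs):\n",
"    \"\"\"Toy retrieval: rank docs by word overlap with the question.\n",
"\n",
"    A real RAG pipeline would use embeddings, not keyword overlap;\n",
"    this is only a stand-in so the demo is end-to-end runnable.\n",
"    \"\"\"\n",
"    q_words = set(question.lower().split())\n",
"    return max(docs, key=lambda doc: len(q_words & set(doc.lower().split())))\n",
"\n",
"corpus = [\n",
"    \"Recommended to run 3 times per week and mix with cross-training.\",\n",
"    \"Strength training twice a week supports running economy.\",\n",
"    \"Hydration matters most on long runs over 60 minutes.\",\n",
"]\n",
"print(retrieve_best_doc(\"How often should I run each week?\", corpus))"
]
},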
{
"cell_type": "code",
"execution_count": null,
"id": "419ea578",
"metadata": {},
"outputs": [],
"source": [
"def chat_with_phi4_rag(user_question, retrieved_doc):\n",
" \"\"\"Simulate an RAG flow by appending retrieved context to the system prompt.\"\"\"\n",
" system_prompt = (\n",
" \"You are Phi-4, helpful fitness AI.\\n\"\n",
" \"We have some context from the user's knowledge base: \\n\"\n",
" f\"{retrieved_doc}\\n\"\n",
" \"Please use this context to help your answer. If the context doesn't help, say so.\\n\"\n",
" )\n",
"\n",
" system_msg = SystemMessage(content=system_prompt)\n",
" user_msg = UserMessage(content=user_question)\n",
"\n",
" with project_client.inference.get_chat_completions_client() as chat_client:\n",
" response = chat_client.complete(\n",
" model=phi4_deployment,\n",
" messages=[system_msg, user_msg],\n",
" temperature=0.3,\n",
" max_tokens=300,\n",
" )\n",
" return response.choices[0].message.content\n",
"\n",
"# Let's define a dummy doc snippet:\n",
"doc_snippet = \"Recommended to run 3 times per week and mix with cross-training.\\n\" \\\n",
" \"Include rest days or active recovery days for muscle repair.\"\n",
"\n",
"user_q = \"How often should I run weekly to prepare for a 5K?\"\n",
"rag_answer = chat_with_phi4_rag(user_q, doc_snippet)\n",
"print(\"🗣️ User:\", user_q)\n",
"print(\"🤖 Phi-4 (RAG):\", rag_answer)"
]
},
{
"cell_type": "markdown",
"id": "3a33a375",
"metadata": {},
"source": [
"## 4. Wrap-Up & Best Practices\n",
"1. **Chain-of-Thought**: Great for debugging or certain QA tasks, but be mindful about revealing chain-of-thought to end users.\n",
"2. **RAG**: Use `azure-ai-inference` with retrieval results to ground your answers.\n",
"3. **OpenTelemetry**: Optionally integrate `opentelemetry-sdk` and `azure-core-tracing-opentelemetry` for full observability.\n",
"4. **Evaluate**: Use `azure-ai-evaluation` to measure your model’s performance.\n",
"5. **Cost & Performance**: Phi-4 aims to provide near GPT-4 performance at lower cost. Evaluate for your domain needs.\n",
"\n",
"## 🎉 Congratulations!\n",
"You've seen how to:\n",
"- Use **Phi-4** with `AIProjectClient` and `azure-ai-inference`.\n",
"- Create a **chat** flow with chain-of-thought.\n",
"- Stub a **RAG** scenario.\n",
"\n",
"> Happy hacking! 🏋️\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}