demo-python/code/conference-demos/ignite-2024/azure-ai-search-e2e-build-demo.ipynb (1,350 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": { "redirect_from": ["https://github.com/azure/azure-search-vector-samples/blob/main/demo-python/code/e2e-demos/azure-ai-search-e2e-build-demo"] }, "source": [ "# Azure AI Search E2E RAG Demo - Bring Data from GCP and AWS \n", "\n", "## Introduction\n", "In this demo, we will show you how to seamlessly integrate data from Google Cloud Platform storage and Amazon Web Services (AWS) S3 with Azure AI Search using the latest AI vectorization techniques.\n", "\n", "## Prerequisites\n", "- 🐍 Python 3.9 or higher\n", "- ☁️ GCP Storage Bucket \n", "- ☁️ AWS S3 Storage Bucket \n", "- 🔗 Azure OpenAI Service\n", "- 🔗 Azure AI Search Service\n", "- 🔗 Azure AI Studio MaaS Cohere Embed V3 real-time endpoint\n", "\n", "## Features Covered\n", "This demo covers the following features:\n", "- ✅ Stored=False\n", "- ✅ Scalar Quantization to int8\n", "- ✅ Reranking w/full precision vectors\n", "- ✅ Oversampling\n", "- ✅ Integrated Vectorization\n", "- ✅ Reranking with Semantic Ranker\n", "- ✅ **[NEW]** OneLake Data Source Connector\n", "- ✅ **[NEW]** Azure AI Studio MaaS Embedding Skill (We'll use Cohere Embed V3)\n", "- ✅ **[NEW]** Azure OpenAI new `text-embedding-3-large` embedding model\n", "- ✅ **[NEW]** Score Threshold Filtering\n", "- ✅ **[NEW]** MaxTextSizeRecall property for Hybrid Search\n", "- ✅ **[NEW]** Hybrid Weighting\n", "- ✅ **[NEW]** RAG using Azure OpenAI Studio On Your Data SDK\n", "\n", "Let's get started!" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [], "source": [ "! pip install azure-search-documents --pre --quiet\n", "! 
pip install openai python-dotenv azure-identity cohere azure-ai-vision-imageanalysis --quiet" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "import cohere\n", "import openai\n", "import os\n", "import time\n", "from azure.core.credentials import AzureKeyCredential\n", "from azure.identity import DefaultAzureCredential\n", "from azure.search.documents import SearchClient\n", "from azure.search.documents.indexes import SearchIndexClient, SearchIndexerClient\n", "from azure.search.documents.indexes.models import (\n", " AzureMachineLearningSkill,\n", " AzureOpenAIEmbeddingSkill,\n", " AzureOpenAIModelName,\n", " AzureOpenAIVectorizer,\n", " AzureOpenAIParameters,\n", " ExhaustiveKnnAlgorithmConfiguration,\n", " ExhaustiveKnnParameters,\n", " FieldMapping,\n", " HnswAlgorithmConfiguration,\n", " HnswParameters,\n", " IndexProjectionMode,\n", " InputFieldMappingEntry,\n", " OutputFieldMappingEntry,\n", " ScalarQuantizationCompressionConfiguration,\n", " ScalarQuantizationParameters,\n", " SearchField,\n", " SearchFieldDataType,\n", " SearchIndex,\n", " SearchIndexer,\n", " SearchIndexerDataContainer,\n", " SearchIndexerDataSourceConnection,\n", " SearchIndexerIndexProjectionSelector,\n", " SearchIndexerIndexProjections,\n", " SearchIndexerIndexProjectionsParameters,\n", " SearchIndexerSkillset,\n", " SemanticConfiguration,\n", " SemanticField,\n", " SemanticPrioritizedFields,\n", " SemanticSearch,\n", " SplitSkill,\n", " VectorSearch,\n", " VectorSearchAlgorithmMetric,\n", " VectorSearchProfile,\n", ")\n", "from azure.search.documents.models import (\n", " HybridCountAndFacetMode,\n", " HybridSearch,\n", " SearchScoreThreshold,\n", " VectorSimilarityThreshold,\n", " VectorizableTextQuery,\n", " VectorizedQuery\n", ")\n", "from dotenv import load_dotenv\n" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# Load environment variables\n", "load_dotenv()\n", "\n", "# 
Configuration\n", "AZURE_AI_STUDIO_COHERE_API_KEY = os.getenv(\"AZURE_AI_STUDIO_COHERE_API_KEY\")\n", "AZURE_AI_STUDIO_COHERE_ENDPOINT = os.getenv(\"AZURE_AI_STUDIO_COHERE_ENDPOINT\")\n", "AZURE_OPENAI_API_KEY = os.getenv(\"AZURE_OPENAI_API_KEY\")\n", "AZURE_OPENI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME = os.getenv(\"AZURE_OPENAI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME\")\n", "AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME = os.getenv(\"AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME\")\n", "AZURE_OPENAI_ENDPOINT = os.getenv(\"AZURE_OPENAI_ENDPOINT\")\n", "INDEX_NAME = \"fsunavala-aws-openai\"\n", "ONELAKE_CONNECTION_STRING = os.getenv(\"ONELAKE_CONNECTION_STRING\")\n", "ONELAKE_CONTAINER_NAME = os.getenv(\"ONELAKE_CONTAINER_NAME\")\n", "SEARCH_SERVICE_API_KEY = os.getenv(\"AZURE_SEARCH_ADMIN_KEY\")\n", "SEARCH_SERVICE_ENDPOINT = os.getenv(\"AZURE_SEARCH_SERVICE_ENDPOINT\")" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using AAD for authentication.\n" ] } ], "source": [ "# User-specified parameter\n", "USE_AAD_FOR_SEARCH = True \n", "\n", "def authenticate_azure_search(api_key=None, use_aad_for_search=False):\n", " if use_aad_for_search:\n", " print(\"Using AAD for authentication.\")\n", " credential = DefaultAzureCredential()\n", " else:\n", " print(\"Using API keys for authentication.\")\n", " if api_key is None:\n", " raise ValueError(\"API key must be provided if not using AAD for authentication.\")\n", " credential = AzureKeyCredential(api_key)\n", " return credential\n", "\n", "azure_search_credential = authenticate_azure_search(api_key=SEARCH_SERVICE_API_KEY, use_aad_for_search=USE_AAD_FOR_SEARCH)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Connect to OneLake\n", "\n", "There are a few steps you need to complete to set this up. You can follow this tutorial on how to create a lakehouse with OneLake in Microsoft Fabric:
https://learn.microsoft.com/fabric/onelake/create-lakehouse-onelake\n", "\n", "In this example, I already configured shortcuts to Google Cloud Storage (GCS) and Amazon Web Services (AWS) S3. You can follow this tutorial on how to create a OneLake shortcut in Microsoft Fabric: https://learn.microsoft.com/fabric/onelake/create-gcs-shortcut" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a OneLake data source in Azure AI Search" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data source 'fsunavala-aws-openai-onelake' created or updated successfully.\n", "Data source 'fsunavala-aws-cohere-onelake' created or updated successfully.\n", "Data source 'fsunavala-gcp-openai-onelake' created or updated successfully.\n", "Data source 'fsunavala-gcp-cohere-onelake' created or updated successfully.\n" ] } ], "source": [ "def create_or_update_data_source(indexer_client, container_name, connection_string, index_name, query):\n", " \"\"\"\n", " Create or update a data source connection for Azure AI Search.\n", " \"\"\"\n", " try:\n", " container = SearchIndexerDataContainer(name=container_name, query=query)\n", " data_source_connection = SearchIndexerDataSourceConnection(\n", " name=f\"{index_name}-onelake\",\n", " type=\"onelake\",\n", " connection_string=connection_string, # Fabric Workspace ID \n", " container=container, # Lakehouse ID\n", " )\n", " data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)\n", " print(f\"Data source '{data_source.name}' created or updated successfully.\")\n", " except Exception as e:\n", " print(f\"Failed to create or update data source due to error: {e}\")\n", "\n", "indexer_client = SearchIndexerClient(SEARCH_SERVICE_ENDPOINT, azure_search_credential)\n", "\n", "index_names = [\"fsunavala-aws-openai\", \"fsunavala-aws-cohere\", \"fsunavala-gcp-openai\", \"fsunavala-gcp-cohere\"]\n", "queries = 
[\"fsunavala-aws-shortcut/pdfs\", \"fsunavala-aws-shortcut/pdfs\", \"fsunavala-gcp-shortcut/pdfs\", \"fsunavala-gcp-shortcut/pdfs\"]\n", "\n", "for index_name, query in zip(index_names, queries):\n", " create_or_update_data_source(indexer_client, ONELAKE_CONTAINER_NAME, ONELAKE_CONNECTION_STRING, index_name, query)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a search index" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created index: fsunavala-aws-openai\n", "Created index: fsunavala-aws-cohere\n", "Created index: fsunavala-gcp-openai\n", "Created index: fsunavala-gcp-cohere\n" ] } ], "source": [ "\n", "def create_fields(vector_search_dimensions):\n", " return [\n", " SearchField(\n", " name=\"parent_id\",\n", " type=SearchFieldDataType.String,\n", " sortable=True,\n", " filterable=True,\n", " facetable=True,\n", " ),\n", " SearchField(name=\"title\", type=SearchFieldDataType.String),\n", " SearchField(\n", " name=\"chunk_id\",\n", " type=SearchFieldDataType.String,\n", " key=True,\n", " sortable=True,\n", " filterable=True,\n", " facetable=True,\n", " analyzer_name=\"keyword\",\n", " ),\n", " SearchField(name=\"chunk\", type=SearchFieldDataType.String),\n", " SearchField(\n", " name=\"vector\",\n", " type=SearchFieldDataType.Collection(SearchFieldDataType.Single),\n", " vector_search_dimensions=vector_search_dimensions,\n", " vector_search_profile_name=\"myHnswProfileSQ\",\n", " stored=False\n", " ),\n", " ]\n", "\n", "\n", "def create_vector_search_configuration(vectorizer_name):\n", " return VectorSearch(\n", " algorithms=[\n", " HnswAlgorithmConfiguration(\n", " name=\"myHnsw\",\n", " parameters=HnswParameters(\n", " m=4,\n", " ef_construction=400,\n", " ef_search=500,\n", " metric=VectorSearchAlgorithmMetric.COSINE,\n", " ),\n", " ),\n", " ExhaustiveKnnAlgorithmConfiguration(\n", " name=\"myExhaustiveKnn\",\n", " 
parameters=ExhaustiveKnnParameters(\n", " metric=VectorSearchAlgorithmMetric.COSINE\n", " ),\n", " ),\n", " ],\n", " profiles=[\n", " VectorSearchProfile(\n", " name=\"myHnswProfileSQ\",\n", " algorithm_configuration_name=\"myHnsw\",\n", " compression_configuration_name=\"myScalarQuantization\",\n", " vectorizer=vectorizer_name,\n", " ),\n", " VectorSearchProfile(\n", " name=\"myExhaustiveKnnProfile\",\n", " algorithm_configuration_name=\"myExhaustiveKnn\",\n", " vectorizer=vectorizer_name,\n", " ),\n", " ],\n", " vectorizers=[\n", " AzureOpenAIVectorizer(\n", " name=\"myOpenAI\",\n", " kind=\"azureOpenAI\",\n", " azure_open_ai_parameters=AzureOpenAIParameters(\n", " resource_uri=AZURE_OPENAI_ENDPOINT,\n", " deployment_id=AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME,\n", " api_key=AZURE_OPENAI_API_KEY,\n", " model_name=AzureOpenAIModelName.TEXT_EMBEDDING3_LARGE,\n", " ),\n", " ),\n", " ],\n", " compressions=[\n", " ScalarQuantizationCompressionConfiguration(\n", " name=\"myScalarQuantization\",\n", " rerank_with_original_vectors=True,\n", " default_oversampling=10,\n", " parameters=ScalarQuantizationParameters(quantized_data_type=\"int8\"),\n", " )\n", " ],\n", " )\n", "\n", "\n", "def create_semantic_search_configuration():\n", " return SemanticSearch(\n", " configurations=[\n", " SemanticConfiguration(\n", " name=\"mySemanticConfig\",\n", " prioritized_fields=SemanticPrioritizedFields(\n", " content_fields=[SemanticField(field_name=\"chunk\")]\n", " ),\n", " )\n", " ]\n", " )\n", "\n", "\n", "def create_search_index(index_name, fields, vector_search, semantic_search):\n", " index = SearchIndex(\n", " name=index_name,\n", " fields=fields,\n", " vector_search=vector_search,\n", " semantic_search=semantic_search,\n", " )\n", " index_client.create_or_update_index(index)\n", "\n", "\n", "index_client = SearchIndexClient(\n", " endpoint=SEARCH_SERVICE_ENDPOINT, credential=azure_search_credential\n", ")\n", "\n", "index_names = [\n", " \"fsunavala-aws-openai\",\n", " 
\"fsunavala-aws-cohere\",\n", " \"fsunavala-gcp-openai\",\n", " \"fsunavala-gcp-cohere\",\n", "]\n", "\n", "for index_name in index_names:\n", " vector_search_dimensions = 1024 if \"cohere\" in index_name else 3072 \n", " fields = create_fields(vector_search_dimensions)\n", " vectorizer_name = \"myOpenAI\" if \"openai\" in index_name else None\n", " vector_search = create_vector_search_configuration(vectorizer_name)\n", " semantic_search = create_semantic_search_configuration()\n", " create_search_index(index_name, fields, vector_search, semantic_search)\n", " print(f\"Created index: {index_name}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create a Skillset " ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating skillset: fsunavala-aws-openai-skillset\n", "Skillset 'fsunavala-aws-openai-skillset' created or updated.\n", "Creating skillset: fsunavala-aws-cohere-skillset\n", "Skillset 'fsunavala-aws-cohere-skillset' created or updated.\n", "Creating skillset: fsunavala-gcp-openai-skillset\n", "Skillset 'fsunavala-gcp-openai-skillset' created or updated.\n", "Creating skillset: fsunavala-gcp-cohere-skillset\n", "Skillset 'fsunavala-gcp-cohere-skillset' created or updated.\n" ] } ], "source": [ "# Define the skills\n", "def create_split_skill():\n", " \"\"\"Creates a split skill to chunk documents into pages.\"\"\"\n", " return SplitSkill(\n", " description=\"Split skill to chunk documents\",\n", " text_split_mode=\"pages\",\n", " context=\"/document\",\n", " maximum_page_length=2000,\n", " page_overlap_length=500,\n", " inputs=[InputFieldMappingEntry(name=\"text\", source=\"/document/content\")],\n", " outputs=[OutputFieldMappingEntry(name=\"textItems\", target_name=\"pages\")],\n", " )\n", "\n", "\n", "def create_embedding_skill_cohere(\n", " azure_ai_studio_cohere_endpoint, azure_ai_studio_cohere_key\n", "):\n", " \"\"\"Defines the embedding skill for 
generating embeddings via AI Studio Cohere.\"\"\"\n", " return AzureMachineLearningSkill(\n", " description=\"Skill to generate embeddings via AI Studio Cohere\",\n", " context=\"/document/pages/*\",\n", " scoring_uri=f\"{azure_ai_studio_cohere_endpoint}/v1/embed\",\n", " authentication_key=azure_ai_studio_cohere_key,\n", " inputs=[\n", " InputFieldMappingEntry(name=\"texts\", source=\"=[$(/document/pages/*)]\"),\n", " InputFieldMappingEntry(name=\"input_type\", source=\"='search_document'\"),\n", " InputFieldMappingEntry(\n", " name=\"truncate\", source=\"='NONE'\"\n", " ), # Do not truncate; input over the model's token limit returns an error\n", " InputFieldMappingEntry(name=\"embedding_types\", source=\"=['float']\"),\n", " ],\n", " outputs=[\n", " OutputFieldMappingEntry(name=\"embeddings\", target_name=\"aml_vector_object\")\n", " ],\n", " )\n", "\n", "\n", "def create_embedding_skill_openai(\n", " azure_openai_endpoint, azure_openai_embedding_deployment, azure_openai_key\n", "):\n", " \"\"\"Defines the embedding skill for generating embeddings via Azure OpenAI.\"\"\"\n", " return AzureOpenAIEmbeddingSkill(\n", " description=\"Skill to generate embeddings via Azure OpenAI\",\n", " context=\"/document/pages/*\",\n", " resource_uri=azure_openai_endpoint,\n", " deployment_id=azure_openai_embedding_deployment,\n", " api_key=azure_openai_key,\n", " model_name=AzureOpenAIModelName.TEXT_EMBEDDING3_LARGE,\n", " dimensions=3072, # Take advantage of the larger model with variable dimension sizes\n", " inputs=[InputFieldMappingEntry(name=\"text\", source=\"/document/pages/*\")],\n", " outputs=[OutputFieldMappingEntry(name=\"embedding\", target_name=\"vector\")],\n", " )\n", "\n", "\n", "# Define the index projections\n", "def create_index_projections(index_name, is_cohere):\n", " \"\"\"Creates index projections for use in a skillset.\"\"\"\n", " vector_source = (\n", " \"/document/pages/*/aml_vector_object/float/0\"\n", " if is_cohere\n", " else \"/document/pages/*/vector\"\n", " )\n", " return 
SearchIndexerIndexProjections(\n", " selectors=[\n", " SearchIndexerIndexProjectionSelector(\n", " target_index_name=index_name,\n", " parent_key_field_name=\"parent_id\",\n", " source_context=\"/document/pages/*\",\n", " mappings=[\n", " InputFieldMappingEntry(name=\"chunk\", source=\"/document/pages/*\"),\n", " InputFieldMappingEntry(name=\"vector\", source=vector_source),\n", " InputFieldMappingEntry(\n", " name=\"title\", source=\"/document/metadata_storage_name\"\n", " ),\n", " ],\n", " ),\n", " ],\n", " parameters=SearchIndexerIndexProjectionsParameters(\n", " projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS\n", " ),\n", " )\n", "\n", "\n", "# Create or update the skillset\n", "def create_skillset(client, skillset_name, skills, index_projections):\n", " \"\"\"Creates or updates the skillset with embedding and indexing projection skills.\"\"\"\n", " skillset = SearchIndexerSkillset(\n", " name=skillset_name,\n", " description=\"Skillset to chunk documents and generate embeddings\",\n", " skills=skills,\n", " index_projections=index_projections,\n", " )\n", " try:\n", " client.create_or_update_skillset(skillset)\n", " print(f\"Skillset '{skillset_name}' created or updated.\")\n", " except Exception as e:\n", " print(f\"Failed to create or update skillset '{skillset_name}': {e}\")\n", "\n", "\n", "split_skill = create_split_skill()\n", "\n", "client = SearchIndexerClient(\n", " SEARCH_SERVICE_ENDPOINT, credential=azure_search_credential\n", ")\n", "\n", "index_names = [\n", " \"fsunavala-aws-openai\",\n", " \"fsunavala-aws-cohere\",\n", " \"fsunavala-gcp-openai\",\n", " \"fsunavala-gcp-cohere\",\n", "]\n", "\n", "for index_name in index_names:\n", " if \"cohere\" in index_name:\n", " embedding_skill = create_embedding_skill_cohere(\n", " AZURE_AI_STUDIO_COHERE_ENDPOINT, AZURE_AI_STUDIO_COHERE_API_KEY\n", " )\n", " else:\n", " embedding_skill = create_embedding_skill_openai(\n", " AZURE_OPENAI_ENDPOINT,\n", " 
AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME,\n", " AZURE_OPENAI_API_KEY,\n", " )\n", " index_projections = create_index_projections(index_name, \"cohere\" in index_name)\n", " skillset_name = f\"{index_name}-skillset\"\n", " print(f\"Creating skillset: {skillset_name}\")\n", " create_skillset(\n", " client, skillset_name, [split_skill, embedding_skill], index_projections\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Run Indexer" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Creating and running indexer: fsunavala-aws-openai-indexer\n", "fsunavala-aws-openai-indexer is created and running.\n", "Creating and running indexer: fsunavala-aws-cohere-indexer\n", "fsunavala-aws-cohere-indexer is created and running.\n", "Creating and running indexer: fsunavala-gcp-openai-indexer\n", "fsunavala-gcp-openai-indexer is created and running.\n", "Creating and running indexer: fsunavala-gcp-cohere-indexer\n", "fsunavala-gcp-cohere-indexer is created and running.\n" ] } ], "source": [ "def create_and_run_indexer(index_name, skillset_name, data_source, endpoint, credential):\n", " indexer_name = f\"{index_name}-indexer\"\n", " indexer = SearchIndexer(\n", " name=indexer_name,\n", " description=\"Indexer to index documents and generate embeddings\",\n", " skillset_name=skillset_name,\n", " target_index_name=index_name,\n", " data_source_name=data_source,\n", " field_mappings=[FieldMapping(source_field_name=\"metadata_storage_name\", target_field_name=\"title\")],\n", " )\n", " indexer_client = SearchIndexerClient(endpoint, credential)\n", " indexer_client.create_or_update_indexer(indexer)\n", " indexer_client.run_indexer(indexer_name)\n", " print(f\"{indexer_name} is created and running.\")\n", "\n", "index_names = [\n", " \"fsunavala-aws-openai\",\n", " \"fsunavala-aws-cohere\",\n", " \"fsunavala-gcp-openai\",\n", " \"fsunavala-gcp-cohere\",\n", "]\n", "\n", "for 
index_name in index_names:\n", " skillset_name = f\"{index_name}-skillset\"\n", " data_source = f\"{index_name}-onelake\"\n", " print(f\"Creating and running indexer: {index_name}-indexer\")\n", " create_and_run_indexer(index_name, skillset_name, data_source, SEARCH_SERVICE_ENDPOINT, azure_search_credential)\n", " time.sleep(30) # Wait for 30 seconds before starting the next indexer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Search our vector indexes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create function for querying Cohere Embed V3 " ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "def generate_cohere_query_embedding(api_key, endpoint, query):\n", " co = cohere.Client(api_key=api_key, base_url=f\"{endpoint}/v1\")\n", " response = co.embed(texts=[query], input_type=\"search_query\")\n", " return response.embeddings[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple vector search (Cohere)\n", "\n", "This code performs a simple vector search over the \"fsunavala-aws-cohere\" index using the Azure AI Studio MaaS Cohere Embed V3 for generating the query embedding. " ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Title: Northwind_Standard_Benefits_Details.pdf\n", "Score: 0.83207124\n", "Content: to \n", "\n", "see if there are any generic or over-the-counter options that may be more affordable. \n", "\n", "\n", "\n", "• Keep track of your allergies, including the type of allergy, the severity of the allergy, the \n", "\n", "medications you are taking, and the results of any allergy tests you have had. This \n", "\n", "information can be helpful for your doctor when making decisions about your care. 
\n", "\n", "Ambulance \n", "\n", "COVERED SERVICES: Ambulance \n", "\n", "Ambulance services are covered under the Northwind Standard plan, providing you with \n", "\n", "the medical assistance you need in the event of an emergency. When you are in need of an \n", "\n", "ambulance, you can be sure that Northwind Health will cover your transport to the closest \n", "\n", "hospital or medical facility. \n", "\n", "Covered Services: \n", "\n", "The Northwind Standard plan covers ambulance transport to the nearest hospital or \n", "\n", "medical facility in the event of an emergency. This service is available 24 hours a day, seven \n", "\n", "days a week, and is covered up to the plan's limit. Ambulance transport is covered up to the \n", "\n", "plan's limit, regardless of whether the ambulance is provided by an in-network provider or \n", "\n", "an out-of-network provider. \n", "\n", "Exceptions: \n", "\n", "Northwind Health does not cover ambulance services that are provided for non-emergency \n", "\n", "transport. Non-emergency transport includes transport for routine medical care, such as \n", "\n", "transport to a doctor's office or a laboratory for tests. Non-emergency transport is not \n", "\n", "covered by the Northwind Standard plan. \n", "\n", "Tips for Employees: \n", "\n", "• Keep the phone number of your local ambulance service handy in case of an emergency. \n", "\n", "• Make sure to provide your Northwind Health insurance information to the ambulance \n", "\n", "service at the time of transport, as this will help ensure that your costs for the service are \n", "\n", "covered. \n", "\n", "• If you are transported by an out-of-network provider, you may be responsible for paying a \n", "\n", "portion of the cost. Be sure to check with Northwind Health to determine what your cost-\n", "--------------------------------------------------\n", "Title: Northwind_Standard_Benefits_Details.pdf\n", "Score: 0.7986301\n", "Content: plan. 
\n", "\n", "Tips for Employees: \n", "\n", "• Keep the phone number of your local ambulance service handy in case of an emergency. \n", "\n", "• Make sure to provide your Northwind Health insurance information to the ambulance \n", "\n", "service at the time of transport, as this will help ensure that your costs for the service are \n", "\n", "covered. \n", "\n", "• If you are transported by an out-of-network provider, you may be responsible for paying a \n", "\n", "portion of the cost. Be sure to check with Northwind Health to determine what your cost-\n", "\n", "sharing responsibilities are in such an event. \n", "\n", "• Be sure to keep all receipts and paperwork related to your ambulance service, as you may \n", "\n", "need this information when filing your claim with Northwind Health. \n", "\n", "• Remember, ambulance services are only covered for emergency transport. If you need to \n", "\n", "be transported for non-emergency medical care, you will need to make other arrangements \n", "\n", "for your transportation. \n", "\n", "With the Northwind Standard plan, you can rest assured that you will be covered in the \n", "\n", "event of an emergency. In the event of an emergency, you can be transported by ambulance \n", "\n", "\n", "\n", "to the nearest hospital or medical facility and your costs will be covered up to the plan's \n", "\n", "limit. By following the tips outlined above, you can ensure that you get the most out of your \n", "\n", "Northwind Health coverage. \n", "\n", "Blood Products And Services \n", "\n", "COVERED SERVICES: Blood Products And Services \n", "\n", "Northwind Standard offers coverage for a variety of blood products and services. These \n", "\n", "include both red and white blood cells, platelets, and plasma. The plan also covers \n", "\n", "laboratory tests related to the collection, examination, and transfusion of blood products. 
\n", "\n", "For red and white blood cells, Northwind Standard covers screening and compatibility tests, \n", "\n", "as well as collection, storage, and transfusion of the cells. The plan also covers the cost of \n", "\n", "blood or blood products administered during a hospital stay or procedure.\n", "--------------------------------------------------\n", "Title: Northwind_Health_Plus_Benefits_Details.pdf\n", "Score: 0.7966438\n", "Content: your local ambulance provider. You should provide them with your \n", "\n", "Northwind Health Plus plan information and Northwind Health will cover the cost of the \n", "\n", "service, minus any applicable coinsurance and copays. \n", "\n", "In conclusion, Northwind Health Plus covers ambulance services when they are medically \n", "\n", "necessary. This includes emergency services and non-emergency services. If you need to \n", "\n", "use an ambulance, it must be one that is in your network and has been approved by \n", "\n", "Northwind Health. You should always contact your primary care physician to determine if a \n", "\n", "service is medically necessary before using an ambulance, as this will help you to avoid any \n", "\n", "out-of-pocket costs. \n", "\n", "Blood Products And Services \n", "\n", "COVERED SERVICES: Blood Products and Services \n", "\n", "Northwind Health Plus covers a variety of blood products and services that are necessary \n", "\n", "for a healthy life. This plan provides coverage for blood tests, transfusions, and other \n", "\n", "related services required for diagnosis, treatment, and management of a medical condition. \n", "\n", "In-Network Coverage: \n", "\n", "Northwind Health Plus provides in-network coverage for a variety of blood products and \n", "\n", "services. 
Services covered by this plan include: \n", "\n", "• Blood tests and transfusions \n", "\n", "• Blood typing \n", "\n", "\n", "\n", "• Platelet donation and collection \n", "\n", "• Hemoglobin testing \n", "\n", "• Hemophilia treatment \n", "\n", "• Anemia treatment \n", "\n", "Out-of-Network Coverage: \n", "\n", "Northwind Health Plus also covers blood products and services received from out-of-\n", "\n", "network providers. However, coverage for out-of-network services may be limited and you \n", "\n", "may be required to pay more for out-of-network services than for in-network services. \n", "\n", "Exceptions: \n", "\n", "Northwind Health Plus does not cover any blood products or services not specifically listed \n", "\n", "in the plan document. This includes any experimental treatments or other services that are \n", "\n", "not medically necessary. \n", "\n", "Tips for Employees:\n", "--------------------------------------------------\n" ] } ], "source": [ "query = \"Coverage and exclusions for emergency and non-emergency medical transport under Northwind health plans\"\n", "\n", "search_client = SearchClient(\n", " SEARCH_SERVICE_ENDPOINT,\n", " index_name=\"fsunavala-aws-cohere\",\n", " credential=azure_search_credential,\n", ")\n", "\n", "vector_query = VectorizedQuery(\n", " vector=generate_cohere_query_embedding(\n", " AZURE_AI_STUDIO_COHERE_API_KEY, AZURE_AI_STUDIO_COHERE_ENDPOINT, query\n", " ),\n", " k_nearest_neighbors=3,\n", " fields=\"vector\",\n", ")\n", "\n", "results = search_client.search(search_text=None, vector_queries=[vector_query])\n", "\n", "for result in results:\n", " print(f\"Title: {result['title']}\")\n", " print(f\"Score: {result['@search.score']}\")\n", " print(f\"Content: {result['chunk']}\")\n", " print(\"-\" * 50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple vector search (OpenAI)\n", "This code performs a simple vector search, retrieving the top 3 nearest neighbors. 
For the rest of the code, we'll only query the \"fsunavala-aws-openai\" index." ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Title: Northwind_Health_Plus_Benefits_Details.pdf\n", "Score: 0.75144523\n", "Content: your local ambulance provider. You should provide them with your \n", "\n", "Northwind Health Plus plan information and Northwind Health will cover the cost of the \n", "\n", "service, minus any applicable coinsurance and copays. \n", "\n", "In conclusion, Northwind Health Plus covers ambulance services when they are medically \n", "\n", "necessary. This includes emergency services and non-emergency services. If you need to \n", "\n", "use an ambulance, it must be one that is in your network and has been approved by \n", "\n", "Northwind Health. You should always contact your primary care physician to determine if a \n", "\n", "service is medically necessary before using an ambulance, as this will help you to avoid any \n", "\n", "out-of-pocket costs. \n", "\n", "Blood Products And Services \n", "\n", "COVERED SERVICES: Blood Products and Services \n", "\n", "Northwind Health Plus covers a variety of blood products and services that are necessary \n", "\n", "for a healthy life. This plan provides coverage for blood tests, transfusions, and other \n", "\n", "related services required for diagnosis, treatment, and management of a medical condition. \n", "\n", "In-Network Coverage: \n", "\n", "Northwind Health Plus provides in-network coverage for a variety of blood products and \n", "\n", "services. 
Services covered by this plan include: \n", "\n", "• Blood tests and transfusions \n", "\n", "• Blood typing \n", "\n", "\n", "\n", "• Platelet donation and collection \n", "\n", "• Hemoglobin testing \n", "\n", "• Hemophilia treatment \n", "\n", "• Anemia treatment \n", "\n", "Out-of-Network Coverage: \n", "\n", "Northwind Health Plus also covers blood products and services received from out-of-\n", "\n", "network providers. However, coverage for out-of-network services may be limited and you \n", "\n", "may be required to pay more for out-of-network services than for in-network services. \n", "\n", "Exceptions: \n", "\n", "Northwind Health Plus does not cover any blood products or services not specifically listed \n", "\n", "in the plan document. This includes any experimental treatments or other services that are \n", "\n", "not medically necessary. \n", "\n", "Tips for Employees:\n", "--------------------------------------------------\n", "Title: Northwind_Standard_Benefits_Details.pdf\n", "Score: 0.7493273\n", "Content: to \n", "\n", "see if there are any generic or over-the-counter options that may be more affordable. \n", "\n", "\n", "\n", "• Keep track of your allergies, including the type of allergy, the severity of the allergy, the \n", "\n", "medications you are taking, and the results of any allergy tests you have had. This \n", "\n", "information can be helpful for your doctor when making decisions about your care. \n", "\n", "Ambulance \n", "\n", "COVERED SERVICES: Ambulance \n", "\n", "Ambulance services are covered under the Northwind Standard plan, providing you with \n", "\n", "the medical assistance you need in the event of an emergency. When you are in need of an \n", "\n", "ambulance, you can be sure that Northwind Health will cover your transport to the closest \n", "\n", "hospital or medical facility. 
\n", "\n", "Covered Services: \n", "\n", "The Northwind Standard plan covers ambulance transport to the nearest hospital or \n", "\n", "medical facility in the event of an emergency. This service is available 24 hours a day, seven \n", "\n", "days a week, and is covered up to the plan's limit. Ambulance transport is covered up to the \n", "\n", "plan's limit, regardless of whether the ambulance is provided by an in-network provider or \n", "\n", "an out-of-network provider. \n", "\n", "Exceptions: \n", "\n", "Northwind Health does not cover ambulance services that are provided for non-emergency \n", "\n", "transport. Non-emergency transport includes transport for routine medical care, such as \n", "\n", "transport to a doctor's office or a laboratory for tests. Non-emergency transport is not \n", "\n", "covered by the Northwind Standard plan. \n", "\n", "Tips for Employees: \n", "\n", "• Keep the phone number of your local ambulance service handy in case of an emergency. \n", "\n", "• Make sure to provide your Northwind Health insurance information to the ambulance \n", "\n", "service at the time of transport, as this will help ensure that your costs for the service are \n", "\n", "covered. \n", "\n", "• If you are transported by an out-of-network provider, you may be responsible for paying a \n", "\n", "portion of the cost. Be sure to check with Northwind Health to determine what your cost-\n", "--------------------------------------------------\n", "Title: Northwind_Standard_Benefits_Details.pdf\n", "Score: 0.74687946\n", "Content: it’s important to read the plan documents carefully and contact Northwind Health with any \n", "\n", "questions. \n", "\n", "Coverage for Emergency Room Services \n", "\n", "Under the Northwind Standard plan, coverage is provided for medically necessary \n", "\n", "emergency room services. 
Coverage is only available when the condition is an acute medical \n", "\n", "emergency or injury, and when the emergency room is the only way to receive medical \n", "\n", "attention. For example, if you experience a broken bone, chest pain, or a head injury, you \n", "\n", "would be covered for emergency room services. \n", "\n", "Exclusions \n", "\n", "However, there are certain services that are not covered under the Northwind Standard \n", "\n", "plan. Services that are not considered medically necessary, such as elective procedures, are \n", "\n", "not covered by the plan. In addition, services that are provided in the emergency room that \n", "\n", "are not related to the medical emergency, such as lab tests, x-rays, and other diagnostic \n", "\n", "tests, are not covered by the plan. \n", "\n", "Tips for Utilizing Emergency Room Services \n", "\n", "If you find yourself in a situation where you need to visit the emergency room, there are a \n", "\n", "few tips that can help you get the most out of your coverage. First, be sure to provide \n", "\n", "Northwind Health with all the information they need to process your claim, such as the date \n", "\n", "of service, the medical provider, and any other relevant information. Additionally, you \n", "\n", "should contact Northwind Health before you receive any services to ensure that they are \n", "\n", "covered by the plan. \n", "\n", "It’s also important to remember that emergency room services can be expensive, so you \n", "\n", "should always take steps to avoid unnecessary visits. If you are feeling ill and it’s not an \n", "\n", "acute medical emergency, you should contact your primary care physician or an urgent care \n", "\n", "center before going to the emergency room. In most cases, these services are less expensive \n", "\n", "and can provide the same level of care. 
\n", "\n", "Finally, if you do need to visit the emergency room, you should be sure to keep all of your\n", "--------------------------------------------------\n" ] } ], "source": [ "search_client = SearchClient(\n", " SEARCH_SERVICE_ENDPOINT,\n", " index_name=\"fsunavala-aws-openai\",\n", " credential=azure_search_credential,\n", ")\n", "\n", "query = \"Coverage and exclusions for emergency and non-emergency medical transport under Northwind health plans\"\n", "\n", "vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=3, fields=\"vector\")\n", "\n", "results = search_client.search(search_text=None, vector_queries=[vector_query])\n", "\n", "for result in results:\n", " print(f\"Title: {result['title']}\")\n", " print(f\"Score: {result['@search.score']}\")\n", " print(f\"Content: {result['chunk']}\")\n", " print(\"-\" * 50) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple vector search with Score Thresholding\n", "\n", "This code performs a simple vector search, applying a score threshold of 0.75 to filter out lower quality matches." 
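, "\n", "\n", "The two threshold types that appear in the next cell are not interchangeable. As a hedged sketch, assuming the `vector` field uses the cosine metric (an assumption, not confirmed in this part of the notebook), the relation between `@search.score` and cosine similarity for a pure vector query is:\n", "\n", "$$\\text{score} = \\frac{1}{1 + (1 - \\cos\\theta)} = \\frac{1}{2 - \\cos\\theta}$$\n", "\n", "Under that assumption, a search score of 0.75 corresponds to $\\cos\\theta = 2 - 1/0.75 \\approx 0.667$, while a cosine similarity of 0.75 corresponds to a score of $1/1.25 = 0.8$, so `VectorSimilarityThreshold(value=0.75)` would be the stricter of the two filters."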
] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Title: Northwind_Health_Plus_Benefits_Details.pdf\n", "Score: 0.75144523\n", "--------------------------------------------------\n" ] } ], "source": [ "# Define the query\n", "query = \"Coverage and exclusions for emergency and non-emergency medical transport under Northwind health plans\"\n", "vector_query = VectorizableTextQuery(\n", " text=query,\n", " k_nearest_neighbors=3,\n", " fields=\"vector\",\n", " # Keep only matches whose @search.score is at least 0.75\n", " threshold=SearchScoreThreshold(value=0.75),\n", " # Alternative: filter on the vector similarity metric instead of the search score\n", " # threshold=VectorSimilarityThreshold(value=0.75),\n", ")\n", "\n", "# Perform the search\n", "results = search_client.search(search_text=None, vector_queries=[vector_query])\n", "\n", "# Print the results\n", "for result in results:\n", " print(f\"Title: {result['title']}\")\n", " print(f\"Score: {result['@search.score']}\")\n", " print(\"-\" * 50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hybrid Search with maxTextRecallSize\n", "\n", "This code performs a hybrid search with `max_text_recall_size=3`, so only the top 3 keyword (text) matches are passed to the hybrid ranker, and prints the total count and top results."
] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total count: 4\n", "Title: Northwind_Standard_Benefits_Details.pdf\n", "Score: 0.03279569745063782\n", "--------------------------------------------------\n", "Title: Northwind_Standard_Benefits_Details.pdf\n", "Score: 0.032786883413791656\n", "--------------------------------------------------\n", "Title: Northwind_Health_Plus_Benefits_Details.pdf\n", "Score: 0.01666666753590107\n", "--------------------------------------------------\n", "Title: Northwind_Health_Plus_Benefits_Details.pdf\n", "Score: 0.016129031777381897\n", "--------------------------------------------------\n" ] } ], "source": [ "# Define the query\n", "query = \"Coverage and exclusions for emergency and non-emergency medical transport under Northwind health plans\"\n", "vector_query = VectorizableTextQuery(\n", " text=query,\n", " k_nearest_neighbors=3,\n", " fields=\"vector\",\n", ")\n", "\n", "# Perform the search\n", "results = search_client.search(\n", " search_text=query,\n", " vector_queries=[vector_query],\n", " hybrid_search=HybridSearch(\n", " max_text_recall_size=3,\n", " count_and_facet_mode=HybridCountAndFacetMode.COUNT_RETRIEVABLE_RESULTS,\n", " # count_and_facet_mode=HybridCountAndFacetMode.COUNT_ALL_RESULTS,\n", " \n", " ),\n", " include_total_count=True,\n", ")\n", "\n", "# Print the total count\n", "print(f\"Total count: {results.get_count()}\")\n", "\n", "# Print the results\n", "for result in results:\n", " print(f\"Title: {result['title']}\")\n", " print(f\"Score: {result['@search.score']}\")\n", " print(\"-\" * 50) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hybrid Search Weighting\n", "\n", "This code performs a weighted hybrid search, making vector matches 2x more important than keyword matches, and prints the top 3 results." 
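, "\n", "\n", "As a rough sketch of how the weight enters the ranking: Azure AI Search merges hybrid results with Reciprocal Rank Fusion (RRF), and a vector query's `weight` scales that query's contribution to the fused score. Assuming the top position of each ranked list contributes $1/60$, the next $1/61$, and so on (an assumption inferred from the scores recorded in this notebook), the fused score of a document $d$ is:\n", "\n", "$$\\text{score}(d) = \\frac{w_{\\text{vector}}}{60 + p_{\\text{vector}}(d)} + \\frac{1}{60 + p_{\\text{text}}(d)}$$\n", "\n", "Here $p(d) = 0, 1, 2, \\dots$ is the document's zero-based position in each list (notation introduced here for illustration), and a document missing from a list gets no contribution from it. With $w_{\\text{vector}} = 2$, a chunk in second place on both lists scores $2/61 + 1/61 = 3/61 \\approx 0.04918$, which is consistent with the first score in the recorded output of the following cell."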
] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Title: Northwind_Standard_Benefits_Details.pdf\n", "Score: 0.049180325120687485\n", "--------------------------------------------------\n", "Title: Northwind_Standard_Benefits_Details.pdf\n", "Score: 0.048924729228019714\n", "--------------------------------------------------\n", "Title: Northwind_Health_Plus_Benefits_Details.pdf\n", "Score: 0.04871794953942299\n", "--------------------------------------------------\n" ] } ], "source": [ "# Define the query\n", "query = \"Coverage and exclusions for emergency and non-emergency medical transport under Northwind health plans\"\n", "vector_query = VectorizableTextQuery(\n", " text=query,\n", " k_nearest_neighbors=3,\n", " fields=\"vector\",\n", " # Weight vector matches 2x relative to keyword matches\n", " weight=2\n", ")\n", "\n", "# Perform the search\n", "results = search_client.search(\n", " search_text=query,\n", " vector_queries=[vector_query],\n", " top=3\n", ")\n", "\n", "# Print the results\n", "for result in results:\n", " print(f\"Title: {result['title']}\")\n", " print(f\"Score: {result['@search.score']}\")\n", " print(\"-\" * 50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Perform RAG on your data with Azure AI Search and Azure OpenAI Studio\n", "This code uses Azure OpenAI Studio's Chat On Your Data feature through the OpenAI client. The previously created Azure AI Search vector index serves as the retriever that grounds the generative response from GPT-4." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "AI Assistant (GPT-4): The Northwind Standard plan covers ambulance services for emergency transport to the nearest\n", "hospital or medical facility, available 24/7, up to the plan's limit. This coverage applies\n", "regardless of whether the ambulance provider is in-network or out-of-network. 
However, it does not\n", "cover non-emergency transport, such as transport to a doctor's office or for routine medical care\n", "[doc1]. For Northwind Health Plus members, emergency ambulance services are covered without any\n", "out-of-pocket expenses. Additionally, Northwind Health Plus also covers non-emergency ambulance\n", "services when they are deemed medically necessary by a primary care physician. However, if using an\n", "out-of-network ambulance for non-emergency services, members may be responsible for the entire cost\n", "[doc3]. It's important for members to provide their insurance information at the time of transport\n", "and to check with Northwind Health regarding their cost-sharing responsibilities, especially when\n", "transported by an out-of-network provider [doc1][doc2]. Members should also keep all receipts and\n", "paperwork related to ambulance services for claim filing [doc2]. In cases of traveling outside the\n", "United States or in life-threatening situations where the nearest hospital must be accessed,\n", "Northwind Health Plus may cover a portion of the cost even for non-participating providers [doc5].\n", "Members are advised to check with Northwind Health Plus before visiting a non-in-network provider\n", "and to review their Explanation of Benefits (EOB) post-visit to ensure proper coverage [doc5].\n" ] } ], "source": [ "client = openai.AzureOpenAI(\n", " azure_endpoint=AZURE_OPENAI_ENDPOINT,\n", " api_key=AZURE_OPENAI_API_KEY,\n", " api_version=\"2024-02-01\",\n", ")\n", "\n", "completion = client.chat.completions.create(\n", " model=AZURE_OPENI_CHAT_COMPLETION_DEPLOYED_MODEL_NAME,\n", " messages=[\n", " {\n", " \"role\": \"user\",\n", " \"content\": \"Coverage and exclusions for emergency and non-emergency medical transport under Northwind health plans\",\n", " },\n", " ],\n", " extra_body={\n", " \"data_sources\": [\n", " {\n", " \"type\": \"azure_search\",\n", " \"parameters\": {\n", " \"endpoint\": 
SEARCH_SERVICE_ENDPOINT,\n", " \"index_name\": \"fsunavala-aws-openai\",\n", " \"authentication\": {\n", " \"type\": \"api_key\",\n", " \"key\": SEARCH_SERVICE_API_KEY,\n", " },\n", " \"query_type\": \"vector_semantic_hybrid\",\n", " \"embedding_dependency\": {\n", " \"type\": \"deployment_name\",\n", " \"deployment_name\": AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME,\n", " },\n", " \"semantic_configuration\": \"mySemanticConfig\",\n", " },\n", " }\n", " ],\n", " },\n", ")\n", "\n", "import textwrap\n", "if completion.choices:\n", " message_content = completion.choices[0].message.content\n", " wrapped_message_content = textwrap.fill(message_content, width=100)\n", " print(f\"AI Assistant (GPT-4): {wrapped_message_content}\")" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 2 }