# Azure AI Search integrated vectorization sample

This Python notebook demonstrates the [integrated vectorization](https://learn.microsoft.com/azure/search/vector-search-integrated-vectorization) features of Azure AI Search that are currently in public preview. 

Integrated vectorization depends on indexers and skillsets: the Text Split skill chunks the data, and the AzureOpenAIEmbedding skill calls your Azure OpenAI resource to generate embeddings.

This example uses sample PDFs from the repository's `data` folder (`data/benefitdocs` by default) for chunking, embedding, indexing, and queries.

### Prerequisites

+ An Azure subscription, with [access to Azure OpenAI](https://aka.ms/oai/access).
 
+ Azure AI Search, any tier, but we recommend Basic or higher for this workload. [Enable semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run a hybrid query with semantic ranking.

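+ A deployment of an embedding model on Azure OpenAI. The code in this notebook defaults to `text-embedding-3-large` with 1,024 dimensions. If you deploy a different model, such as `text-embedding-ada-002`, set `AZURE_OPENAI_EMBEDDING_DEPLOYMENT`, `AZURE_OPENAI_EMBEDDING_MODEL_NAME`, and `AZURE_OPENAI_EMBEDDING_DIMENSIONS` to match.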

+ Azure Blob Storage. This notebook connects to your storage account and loads a container with the sample PDFs.


### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).

### Install packages

In [1]:
! pip install -r azure-search-integrated-vectorization-sample-requirements.txt --quiet

### Load .env file (Copy .env-sample to .env and update accordingly)

Optionally, you can test the following features of integrated vectorization using this notebook by setting the appropriate environment variables below:

1. [OCR](https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-ocr) every page using the built-in OCR functionality. This allows you to add page numbers for every chunk that is extracted. It requires an [AI Services account](https://learn.microsoft.com/en-us/azure/search/cognitive-search-attach-cognitive-services).
   1. Set `USE_OCR` to `true` and specify `AZURE_AI_SERVICES_ENDPOINT`. If you use key-based authentication, also specify `AZURE_AI_SERVICES_KEY`.
1. Use the [Document Layout skill](https://learn.microsoft.com/en-us/azure/search/cognitive-search-skill-document-intelligence-layout) to convert PDFs and other compatible documents to Markdown. It requires an [AI Services account](https://learn.microsoft.com/en-us/azure/search/cognitive-search-attach-cognitive-services) and a search service in a [supported region](https://learn.microsoft.com/en-us/azure/search/cognitive-search-attach-cognitive-services).
   1. Set `USE_LAYOUT` to `true` and specify `AZURE_AI_SERVICES_ENDPOINT`. If you use key-based authentication, also specify `AZURE_AI_SERVICES_KEY`.

`USE_OCR` and `USE_LAYOUT` can't be combined. The notebook raises an error if both are set.


In [14]:
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential
import os

load_dotenv(override=True) # take environment variables from .env.

# Variables not used here do not need to be updated in your .env file
endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
credential = AzureKeyCredential(os.getenv("AZURE_SEARCH_ADMIN_KEY")) if os.getenv("AZURE_SEARCH_ADMIN_KEY") else DefaultAzureCredential()
index_name = os.getenv("AZURE_SEARCH_INDEX", "int-vec")
blob_connection_string = os.environ["BLOB_CONNECTION_STRING"]
# The search data source connection string is optional and defaults to the blob connection string.
# Set it only if the search service uses a managed identity (MI) to connect to the data source:
# https://learn.microsoft.com/azure/search/search-howto-indexing-azure-blob-storage#supported-credentials-and-connection-strings
search_blob_connection_string = os.getenv("SEARCH_BLOB_DATASOURCE_CONNECTION_STRING", blob_connection_string)
blob_container_name = os.getenv("BLOB_CONTAINER_NAME", "int-vec")
azure_openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
azure_openai_key = os.getenv("AZURE_OPENAI_KEY")
azure_openai_embedding_deployment = os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT", "text-embedding-3-large")
azure_openai_model_name = os.getenv("AZURE_OPENAI_EMBEDDING_MODEL_NAME", "text-embedding-3-large")
azure_openai_model_dimensions = int(os.getenv("AZURE_OPENAI_EMBEDDING_DIMENSIONS", 1024))
# This field is only necessary if you want to use OCR to scan PDFs in the datasource or use the Document Layout skill without a key
azure_ai_services_endpoint = os.getenv("AZURE_AI_SERVICES_ENDPOINT", "")
# This field is only necessary if you want to use OCR to scan PDFs in the data source or use the Document Layout skill and you want to authenticate using a key to Azure AI Services
azure_ai_services_key = os.getenv("AZURE_AI_SERVICES_KEY", "")

# set USE_OCR to enable OCR to add page numbers. It cannot be combined with the document layout skill
use_ocr = os.getenv("USE_OCR", "false") == "true"
# set USE_LAYOUT to enable Document Intelligence Layout skill for chunking by markdown. It cannot be combined with the built-in OCR
use_document_layout = os.getenv("USE_LAYOUT", "false") == "true"
# Deepest nesting level in markdown that should be considered. See https://learn.microsoft.com/azure/search/cognitive-search-skill-document-intelligence-layout to learn more
document_layout_depth = os.getenv("LAYOUT_MARKDOWN_HEADER_DEPTH", "h3")
# OCR must be used to add page numbers
add_page_numbers = use_ocr

if use_ocr and use_document_layout:
    raise Exception("You can only specify one of USE_OCR or USE_LAYOUT")
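
One caveat about the flag parsing above: `os.getenv("USE_OCR", "false") == "true"` is case-sensitive, so a value such as `True` or `TRUE` in `.env` leaves the feature switched off without any warning. The following is a minimal sketch of a more forgiving parser. The `env_flag` helper and the spellings it accepts are hypothetical and not part of the sample.

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean environment variable, ignoring case and whitespace.

    Hypothetical helper: the accepted spellings below are an assumption,
    not something the notebook or the Azure SDK defines.
    """
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Possible drop-in replacements for the two flags in the cell above:
# use_ocr = env_flag("USE_OCR")
# use_document_layout = env_flag("USE_LAYOUT")
```

With this helper, `USE_OCR=True` and `USE_OCR=true` behave the same way.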

## Connect to Blob Storage and load documents

Upload documents to Blob Storage. The code uses the sample PDFs in `data/benefitdocs` by default, or those in `data/ocrdocuments` or `data/layoutdocuments` when `USE_OCR` or `USE_LAYOUT` is set.

In [5]:
from azure.storage.blob import BlobServiceClient  
import glob

def upload_sample_documents(
        blob_connection_string: str,
        blob_container_name: str,
        documents_directory: str,
        # Set to false if you want to use credentials included in the blob connection string
        # Otherwise your identity will be used as credentials
        use_user_identity: bool = True,
    ):
        # Connect to Blob Storage
        blob_service_client = BlobServiceClient.from_connection_string(logging_enable=True, conn_str=blob_connection_string, credential=DefaultAzureCredential() if use_user_identity else None)
        container_client = blob_service_client.get_container_client(blob_container_name)
        if not container_client.exists():
            container_client.create_container()

        pdf_files = glob.glob(os.path.join(documents_directory, '*.pdf'))
        for file in pdf_files:
            with open(file, "rb") as data:
                name = os.path.basename(file)
                if not container_client.get_blob_client(name).exists():
                    container_client.upload_blob(name=name, data=data)

def upload_documents():
    upload_sample_documents(
        blob_connection_string=blob_connection_string,
        blob_container_name=blob_container_name,
        documents_directory=os.path.join("..", "..", "..", "data", "benefitdocs")
    )

def upload_documents_with_ocr():
    upload_sample_documents(
        blob_connection_string=blob_connection_string,
        blob_container_name=blob_container_name,
        documents_directory = os.path.join("..", "..", "..", "data", "ocrdocuments")
    )

def upload_documents_with_layout():
    upload_sample_documents(
        blob_connection_string=blob_connection_string,
        blob_container_name=blob_container_name,
        documents_directory = os.path.join("..", "..", "..", "data", "layoutdocuments")
    )

if use_ocr:
    upload_documents_with_ocr()
elif use_document_layout:
    upload_documents_with_layout()
else:
    upload_documents()

print(f"Setup sample data in {blob_container_name}")

Setup sample data in demo-container


## Create a blob data source connector on Azure AI Search

In [15]:
from azure.search.documents.indexes import SearchIndexerClient
from azure.search.documents.indexes.models import (
    SearchIndexerDataContainer,
    SearchIndexerDataSourceConnection
)
from azure.search.documents.indexes.models import NativeBlobSoftDeleteDeletionDetectionPolicy

# Create a data source 
indexer_client = SearchIndexerClient(endpoint, credential)
container = SearchIndexerDataContainer(name=blob_container_name)
data_source_connection = SearchIndexerDataSourceConnection(
    name=f"{index_name}-blob",
    type="azureblob",
    connection_string=search_blob_connection_string,
    container=container,
    data_deletion_detection_policy=NativeBlobSoftDeleteDeletionDetectionPolicy()
)
data_source = indexer_client.create_or_update_data_source_connection(data_source_connection)

print(f"Data source '{data_source.name}' created or updated")

Data source 'my-demo-index-blob' created or updated


## Create a search index

Vector and nonvector content is stored in a search index.

In [16]:
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
    SemanticConfiguration,
    SemanticSearch,
    SemanticPrioritizedFields,
    SemanticField,
    SearchIndex
)

# Create a search index  
index_client = SearchIndexClient(endpoint=endpoint, credential=credential)  
fields = [  
    SearchField(name="parent_id", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=True),  
    SearchField(name="title", type=SearchFieldDataType.String),  
    SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True, facetable=True, analyzer_name="keyword"),  
    SearchField(name="chunk", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),  
    SearchField(name="vector", type=SearchFieldDataType.Collection(SearchFieldDataType.Single), vector_search_dimensions=azure_openai_model_dimensions, vector_search_profile_name="myHnswProfile"),  
]

if add_page_numbers:
    fields.append(
        SearchField(name="page_number", type=SearchFieldDataType.String, sortable=True, filterable=True, facetable=False)
    )

if use_document_layout:
    fields.extend([
        SearchField(name="header_1", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),
        SearchField(name="header_2", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False),
        SearchField(name="header_3", type=SearchFieldDataType.String, sortable=False, filterable=False, facetable=False)
    ])
  
# Configure the vector search configuration  
vector_search = VectorSearch(  
    algorithms=[  
        HnswAlgorithmConfiguration(name="myHnsw"),
    ],  
    profiles=[  
        VectorSearchProfile(  
            name="myHnswProfile",  
            algorithm_configuration_name="myHnsw",  
            vectorizer_name="myOpenAI",  
        )
    ],  
    vectorizers=[  
        AzureOpenAIVectorizer(  
            vectorizer_name="myOpenAI",  
            kind="azureOpenAI",  
            parameters=AzureOpenAIVectorizerParameters(  
                resource_url=azure_openai_endpoint,  
                deployment_name=azure_openai_embedding_deployment,
                model_name=azure_openai_model_name,
                api_key=azure_openai_key,
            ),
        ),  
    ],  
)  
  
semantic_config = SemanticConfiguration(  
    name="my-semantic-config",  
    prioritized_fields=SemanticPrioritizedFields(  
        content_fields=[SemanticField(field_name="chunk")],
        title_field=SemanticField(field_name="title")
    ),  
)
  
# Create the semantic search with the configuration  
semantic_search = SemanticSearch(configurations=[semantic_config])  
  
# Create the search index
index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search)  
result = index_client.create_or_update_index(index)  
print(f"{result.name} created")  


my-demo-index created
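
The `vector` field's `vector_search_dimensions` has to match the length of the embeddings the skill produces, so a mismatch between `AZURE_OPENAI_EMBEDDING_MODEL_NAME` and `AZURE_OPENAI_EMBEDDING_DIMENSIONS` is worth catching before the index is created. The sketch below is one way to do that. The `check_dimensions` helper is hypothetical, and the limits in the table are assumptions based on the published model sizes (`text-embedding-ada-002` always returns 1,536 values, while the `text-embedding-3` models accept a smaller requested size), so verify them against the current model documentation.

```python
# Assumed maximum embedding sizes per model; verify against current docs.
MAX_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}
# Models assumed to return a fixed-size vector (no shortening supported).
FIXED_SIZE_MODELS = {"text-embedding-ada-002"}


def check_dimensions(model_name: str, dimensions: int) -> None:
    """Raise ValueError if `dimensions` is not plausible for `model_name`."""
    if model_name not in MAX_DIMENSIONS:
        return  # Unknown model: nothing to check in this sketch.
    limit = MAX_DIMENSIONS[model_name]
    if model_name in FIXED_SIZE_MODELS and dimensions != limit:
        raise ValueError(f"{model_name} always returns {limit} dimensions, got {dimensions}")
    if not 1 <= dimensions <= limit:
        raise ValueError(f"{model_name} supports 1 to {limit} dimensions, got {dimensions}")


# Possible usage with the variables loaded earlier in the notebook:
# check_dimensions(azure_openai_model_name, azure_openai_model_dimensions)
```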


## Create a skillset

Skills drive integrated vectorization. [Text Split](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit) provides data chunking. [AzureOpenAIEmbedding](https://learn.microsoft.com/azure/search/cognitive-search-skill-azure-openai-embedding) handles calls to Azure OpenAI, using the connection information you provide in the environment variables. An [index projection](https://learn.microsoft.com/azure/search/index-projections-concept-intro) specifies the secondary index that receives the chunked data.

In [17]:
from azure.search.documents.indexes.models import (
    SplitSkill,
    InputFieldMappingEntry,
    OutputFieldMappingEntry,
    AzureOpenAIEmbeddingSkill,
    OcrSkill,
    SearchIndexerIndexProjection,
    SearchIndexerIndexProjectionSelector,
    SearchIndexerIndexProjectionsParameters,
    IndexProjectionMode,
    SearchIndexerSkillset,
    AIServicesAccountKey,
    AIServicesAccountIdentity,
    DocumentIntelligenceLayoutSkill
)

# Create a skillset name 
skillset_name = f"{index_name}-skillset"

def create_ocr_skillset():
    ocr_skill = OcrSkill(
        description="OCR skill to scan PDFs and other images with text",
        context="/document/normalized_images/*",
        line_ending="Space",
        default_language_code="en",
        should_detect_orientation=True,
        inputs=[
            InputFieldMappingEntry(name="image", source="/document/normalized_images/*")
        ],
        outputs=[
            OutputFieldMappingEntry(name="text", target_name="text"),
            OutputFieldMappingEntry(name="layoutText", target_name="layoutText")
        ]
    )

    split_skill = SplitSkill(  
        description="Split skill to chunk documents",  
        text_split_mode="pages",  
        context="/document/normalized_images/*",  
        maximum_page_length=2000,  
        page_overlap_length=500,  
        inputs=[  
            InputFieldMappingEntry(name="text", source="/document/normalized_images/*/text"),  
        ],  
        outputs=[  
            OutputFieldMappingEntry(name="textItems", target_name="pages")  
        ]
    )

    embedding_skill = AzureOpenAIEmbeddingSkill(  
        description="Skill to generate embeddings via Azure OpenAI",  
        context="/document/normalized_images/*/pages/*",  
        resource_url=azure_openai_endpoint,  
        deployment_name=azure_openai_embedding_deployment,  
        model_name=azure_openai_model_name,
        dimensions=azure_openai_model_dimensions,
        api_key=azure_openai_key,  
        inputs=[  
            InputFieldMappingEntry(name="text", source="/document/normalized_images/*/pages/*"),  
        ],  
        outputs=[
            OutputFieldMappingEntry(name="embedding", target_name="vector")  
        ]
    )

    index_projections = SearchIndexerIndexProjection(  
        selectors=[  
            SearchIndexerIndexProjectionSelector(  
                target_index_name=index_name,  
                parent_key_field_name="parent_id",  
                source_context="/document/normalized_images/*/pages/*",  
                mappings=[
                    InputFieldMappingEntry(name="chunk", source="/document/normalized_images/*/pages/*"),  
                    InputFieldMappingEntry(name="vector", source="/document/normalized_images/*/pages/*/vector"),
                    InputFieldMappingEntry(name="title", source="/document/metadata_storage_name"),
                    InputFieldMappingEntry(name="page_number", source="/document/normalized_images/*/pageNumber")
                ]
            )
        ],  
        parameters=SearchIndexerIndexProjectionsParameters(  
            projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  
        )  
    )

    skills = [ocr_skill, split_skill, embedding_skill]

    return SearchIndexerSkillset(  
        name=skillset_name,  
        description="Skillset to chunk documents and generate embeddings",  
        skills=skills,  
        index_projection=index_projections,
        cognitive_services_account=AIServicesAccountKey(key=azure_ai_services_key, subdomain_url=azure_ai_services_endpoint) if azure_ai_services_key else AIServicesAccountIdentity(identity=None, subdomain_url=azure_ai_services_endpoint)
    )

def create_layout_skillset():
    layout_skill = DocumentIntelligenceLayoutSkill(
        description="Layout skill to read documents",
        context="/document",
        output_mode="oneToMany",
        markdown_header_depth=document_layout_depth,  # from LAYOUT_MARKDOWN_HEADER_DEPTH (defaults to "h3")
        inputs=[
            InputFieldMappingEntry(name="file_data", source="/document/file_data")
        ],
        outputs=[
            OutputFieldMappingEntry(name="markdown_document", target_name="markdownDocument")
        ]
    )

    split_skill = SplitSkill(  
        description="Split skill to chunk documents",  
        text_split_mode="pages",  
        context="/document/markdownDocument/*",  
        maximum_page_length=2000,  
        page_overlap_length=500,  
        inputs=[  
            InputFieldMappingEntry(name="text", source="/document/markdownDocument/*/content"),  
        ],  
        outputs=[  
            OutputFieldMappingEntry(name="textItems", target_name="pages")  
        ]
    )

    embedding_skill = AzureOpenAIEmbeddingSkill(  
        description="Skill to generate embeddings via Azure OpenAI",  
        context="/document/markdownDocument/*/pages/*",  
        resource_url=azure_openai_endpoint,  
        deployment_name=azure_openai_embedding_deployment, 
        model_name=azure_openai_model_name,
        dimensions=azure_openai_model_dimensions,
        api_key=azure_openai_key,  
        inputs=[  
            InputFieldMappingEntry(name="text", source="/document/markdownDocument/*/pages/*"),  
        ],  
        outputs=[
            OutputFieldMappingEntry(name="embedding", target_name="vector")  
        ]
    )

    index_projections = SearchIndexerIndexProjection(  
        selectors=[  
            SearchIndexerIndexProjectionSelector(  
                target_index_name=index_name,  
                parent_key_field_name="parent_id",  
                source_context="/document/markdownDocument/*/pages/*",  
                mappings=[
                    InputFieldMappingEntry(name="chunk", source="/document/markdownDocument/*/pages/*"),  
                    InputFieldMappingEntry(name="vector", source="/document/markdownDocument/*/pages/*/vector"),
                    InputFieldMappingEntry(name="title", source="/document/metadata_storage_name"),
                    InputFieldMappingEntry(name="header_1", source="/document/markdownDocument/*/sections/h1"),
                    InputFieldMappingEntry(name="header_2", source="/document/markdownDocument/*/sections/h2"),
                    InputFieldMappingEntry(name="header_3", source="/document/markdownDocument/*/sections/h3"),
                ]
            )
        ],  
        parameters=SearchIndexerIndexProjectionsParameters(  
            projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  
        )  
    )

    skills = [layout_skill, split_skill, embedding_skill]

    return SearchIndexerSkillset(  
        name=skillset_name,  
        description="Skillset to chunk documents and generate embeddings",  
        skills=skills,  
        index_projection=index_projections,
        cognitive_services_account=AIServicesAccountKey(key=azure_ai_services_key, subdomain_url=azure_ai_services_endpoint) if azure_ai_services_key else AIServicesAccountIdentity(identity=None, subdomain_url=azure_ai_services_endpoint)
    )

def create_skillset():
    split_skill = SplitSkill(  
        description="Split skill to chunk documents",  
        text_split_mode="pages",  
        context="/document",  
        maximum_page_length=2000,  
        page_overlap_length=500,  
        inputs=[  
            InputFieldMappingEntry(name="text", source="/document/content"),  
        ],  
        outputs=[  
            OutputFieldMappingEntry(name="textItems", target_name="pages")  
        ]
    )

    embedding_skill = AzureOpenAIEmbeddingSkill(  
        description="Skill to generate embeddings via Azure OpenAI",  
        context="/document/pages/*",  
        resource_url=azure_openai_endpoint,  
        deployment_name=azure_openai_embedding_deployment,  
        model_name=azure_openai_model_name,
        dimensions=azure_openai_model_dimensions,
        api_key=azure_openai_key,  
        inputs=[  
            InputFieldMappingEntry(name="text", source="/document/pages/*"),  
        ],  
        outputs=[
            OutputFieldMappingEntry(name="embedding", target_name="vector")  
        ]
    )

    index_projections = SearchIndexerIndexProjection(  
        selectors=[  
            SearchIndexerIndexProjectionSelector(  
                target_index_name=index_name,  
                parent_key_field_name="parent_id",  
                source_context="/document/pages/*",  
                mappings=[
                    InputFieldMappingEntry(name="chunk", source="/document/pages/*"),  
                    InputFieldMappingEntry(name="vector", source="/document/pages/*/vector"),
                    InputFieldMappingEntry(name="title", source="/document/metadata_storage_name")
                ]
            )
        ],  
        parameters=SearchIndexerIndexProjectionsParameters(  
            projection_mode=IndexProjectionMode.SKIP_INDEXING_PARENT_DOCUMENTS  
        )  
    )

    skills = [split_skill, embedding_skill]

    return SearchIndexerSkillset(  
        name=skillset_name,  
        description="Skillset to chunk documents and generate embeddings",  
        skills=skills,  
        index_projection=index_projections
    )

skillset = create_ocr_skillset() if use_ocr else create_layout_skillset() if use_document_layout else create_skillset()
  
client = SearchIndexerClient(endpoint, credential)  
client.create_or_update_skillset(skillset)  
print(f"{skillset.name} created")  


my-demo-index-skillset created
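
To get a feel for what `maximum_page_length=2000` and `page_overlap_length=500` mean, the sketch below splits a string into fixed-size character windows that overlap. It is a deliberate simplification and not the Text Split skill's actual algorithm, which runs in the service and may choose different boundaries. The `chunk_text` function is hypothetical.

```python
def chunk_text(text: str, max_len: int = 2000, overlap: int = 500) -> list[str]:
    """Split `text` into windows of at most `max_len` characters.

    Each window starts `max_len - overlap` characters after the previous
    one, so consecutive windows share `overlap` characters.
    Simplified illustration only; the real skill runs server-side.
    """
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_len])
        if start + max_len >= len(text):
            break  # The last window already reaches the end of the text.
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(5000))
pieces = chunk_text(sample)
print(len(pieces), [len(p) for p in pieces])
```

The overlap means that a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.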


## Create an indexer

In [18]:
from azure.search.documents.indexes.models import (
    SearchIndexer,
    IndexingParameters,
    IndexingParametersConfiguration,
    BlobIndexerImageAction
)

# Create an indexer  
indexer_name = f"{index_name}-indexer"  

indexer_parameters = None
if use_ocr:
    indexer_parameters = IndexingParameters(
        configuration=IndexingParametersConfiguration(
            image_action=BlobIndexerImageAction.GENERATE_NORMALIZED_IMAGE_PER_PAGE,
            query_timeout=None))
elif use_document_layout:
    indexer_parameters = IndexingParameters(
        configuration=IndexingParametersConfiguration(
            allow_skillset_to_read_file_data=True,
            query_timeout=None))

indexer = SearchIndexer(  
    name=indexer_name,  
    description="Indexer to index documents and generate embeddings",  
    skillset_name=skillset_name,  
    target_index_name=index_name,  
    data_source_name=data_source.name,
    parameters=indexer_parameters
)  

indexer_client = SearchIndexerClient(endpoint, credential)  
indexer_result = indexer_client.create_or_update_indexer(indexer)  
  
# Run the indexer  
indexer_client.run_indexer(indexer_name)  
print(f' {indexer_name} is created and running. If queries return no results, please wait a bit and try again.')  


 my-demo-index-indexer is created and running. If queries return no results, please wait a bit and try again.
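
Instead of retrying queries by hand, you can poll the indexer until its run finishes. The sketch below assumes that `SearchIndexerClient.get_indexer_status` returns an object whose `last_result.status` is `"inProgress"` while a run is active. The `wait_for_indexer` helper, its parameters, and its timeout values are hypothetical. Note that right after `run_indexer` the service may still report the result of a previous run, so treat an immediate `"success"` with some caution.

```python
import time


def wait_for_indexer(get_status, indexer_name, poll_seconds=5.0,
                     timeout_seconds=600.0, sleep=time.sleep):
    """Poll `get_status(indexer_name)` until the last run is no longer in progress.

    `get_status` is any callable that returns an object with a `last_result`
    attribute (or None), which in turn has a `status` attribute. Passing
    `indexer_client.get_indexer_status` is the intended use (assumption about
    the SDK's return shape). Returns the final status value.
    """
    waited = 0.0
    while True:
        last = get_status(indexer_name).last_result
        if last is not None and last.status != "inProgress":
            return last.status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Indexer '{indexer_name}' still running after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Possible usage in this notebook:
# final_status = wait_for_indexer(indexer_client.get_indexer_status, indexer_name)
# print(f"Indexer finished with status: {final_status}")
```

Passing the status function and `sleep` as arguments keeps the helper easy to exercise without a live search service.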


## Perform a vector similarity search

This example shows a pure vector search using a vectorizable text query. You pass in text, and the vectorizer defined in the index handles query vectorization.

If you indexed the health plan PDF file, send queries that ask plan-related questions.

In [19]:
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

# Pure Vector Search
query = "Which is more comprehensive, Northwind Health Plus vs Northwind Standard?"
if use_ocr:
    query = "Who is the national director?"
if use_document_layout:
    query = "What is contoso?"
  
search_client = SearchClient(endpoint, index_name, credential=credential)
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields="vector", exhaustive=True)
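# To pass a precomputed embedding instead of relying on the vectorizer, use VectorizedQuery
# (from azure.search.documents.models). generate_embeddings is a helper you would need to define yourself.
# vector_query = VectorizedQuery(vector=generate_embeddings(query), k_nearest_neighbors=3, fields="vector")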
  
results = search_client.search(  
    search_text=None,  
    vector_queries= [vector_query],
    top=1
)  
  
for result in results:  
    print(f"parent_id: {result['parent_id']}")  
    print(f"chunk_id: {result['chunk_id']}")  
    if add_page_numbers:
        print(f"page_number: {result['page_number']}")
    print(f"Score: {result['@search.score']}")  
    print(f"Content: {result['chunk']}")   


parent_id: aHR0cHM6Ly9oZWlkaXN0c3RvcmFnZWRlbW9lYXN0dXMuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RlbW8tY29udGFpbmVyL0JlbmVmaXRfT3B0aW9ucy5wZGY1
chunk_id: def295033b1d_aHR0cHM6Ly9oZWlkaXN0c3RvcmFnZWRlbW9lYXN0dXMuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RlbW8tY29udGFpbmVyL0JlbmVmaXRfT3B0aW9ucy5wZGY1_pages_1
Score: 0.80918294
Content: a variety of in-network providers, including primary care 
physicians, specialists, hospitals, and pharmacies. This plan does not offer coverage for emergency 
services, mental health and substance abuse coverage, or out-of-network services.

Comparison of Plans 
Both plans offer coverage for routine physicals, well-child visits, immunizations, and other preventive 
care services. The plans also cover preventive care services such as mammograms, colonoscopies, and 
other cancer screenings. 

Northwind Health Plus offers more comprehensive coverage than Northwind Standard. This plan offers 
coverage for emergency services, both in-network and out-of-network, as well as mental healt

## Perform a hybrid search

In [20]:
# Hybrid Search
query = "Which is more comprehensive, Northwind Health Plus vs Northwind Standard?"  
if use_ocr:
    query = "Who is the national director?"
if use_document_layout:
    query = "What is contoso?"

search_client = SearchClient(endpoint, index_name, credential=credential)
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields="vector", exhaustive=True)
  
results = search_client.search(  
    search_text=query,  
    vector_queries= [vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    top=1
)  
  
for result in results:  
    print(f"parent_id: {result['parent_id']}")  
    print(f"chunk_id: {result['chunk_id']}")  
    print(f"Score: {result['@search.score']}")  
    print(f"Content: {result['chunk']}")  


parent_id: aHR0cHM6Ly9oZWlkaXN0c3RvcmFnZWRlbW9lYXN0dXMuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RlbW8tY29udGFpbmVyL0JlbmVmaXRfT3B0aW9ucy5wZGY1
chunk_id: def295033b1d_aHR0cHM6Ly9oZWlkaXN0c3RvcmFnZWRlbW9lYXN0dXMuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RlbW8tY29udGFpbmVyL0JlbmVmaXRfT3B0aW9ucy5wZGY1_pages_1
Score: 0.03333333507180214
Content: a variety of in-network providers, including primary care 
physicians, specialists, hospitals, and pharmacies. This plan does not offer coverage for emergency 
services, mental health and substance abuse coverage, or out-of-network services.

Comparison of Plans 
Both plans offer coverage for routine physicals, well-child visits, immunizations, and other preventive 
care services. The plans also cover preventive care services such as mammograms, colonoscopies, and 
other cancer screenings. 

Northwind Health Plus offers more comprehensive coverage than Northwind Standard. This plan offers 
coverage for emergency services, both in-network and out-of-network, as well as men

## Perform a hybrid search + semantic reranking

In [21]:
from azure.search.documents.models import (
    QueryType,
    QueryCaptionType,
    QueryAnswerType
)
# Semantic Hybrid Search
query = "Which is more comprehensive, Northwind Health Plus vs Northwind Standard?"
if use_ocr:
    query = "Who is the national director?"
if use_document_layout:
    query = "What is contoso?"

search_client = SearchClient(endpoint, index_name, credential)
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=1, fields="vector", exhaustive=True)

results = search_client.search(  
    search_text=query,
    vector_queries=[vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    query_type=QueryType.SEMANTIC,
    semantic_configuration_name='my-semantic-config',
    query_caption=QueryCaptionType.EXTRACTIVE,
    query_answer=QueryAnswerType.EXTRACTIVE,
    top=1
)

semantic_answers = results.get_answers()
if semantic_answers:
    for answer in semantic_answers:
        if answer.highlights:
            print(f"Semantic Answer: {answer.highlights}")
        else:
            print(f"Semantic Answer: {answer.text}")
        print(f"Semantic Answer Score: {answer.score}\n")

for result in results:
    print(f"parent_id: {result['parent_id']}")  
    print(f"chunk_id: {result['chunk_id']}")  
    print(f"Reranker Score: {result['@search.reranker_score']}")
    print(f"Content: {result['chunk']}")  

    captions = result["@search.captions"]
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")


Semantic Answer: <em>Northwind Health Plus </em>is a<em> comprehensive </em>plan that<em> offers more coverage than Northwind Standard.</em> The table below shows a  cost comparison between the different health plans offered by Contoso Electronics:  Next Steps  We hope that this information has been helpful in understanding the differences between Northwind  Health Plus and Northwind Stan...
Semantic Answer Score: 0.9670000076293945

parent_id: aHR0cHM6Ly9oZWlkaXN0c3RvcmFnZWRlbW9lYXN0dXMuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RlbW8tY29udGFpbmVyL0JlbmVmaXRfT3B0aW9ucy5wZGY1
chunk_id: def295033b1d_aHR0cHM6Ly9oZWlkaXN0c3RvcmFnZWRlbW9lYXN0dXMuYmxvYi5jb3JlLndpbmRvd3MubmV0L2RlbW8tY29udGFpbmVyL0JlbmVmaXRfT3B0aW9ucy5wZGY1_pages_1
Reranker Score: 3.378175973892212
Content: a variety of in-network providers, including primary care 
physicians, specialists, hospitals, and pharmacies. This plan does not offer coverage for emergency 
services, mental health and substance abuse coverage, or out-of-network ser
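
The `parent_id` values printed by the queries appear to be the source blob URL in URL-safe Base64, followed by a single digit that gives the number of stripped `=` padding characters. Assuming that encoding (your index may use a different key mapping), a small helper can map a result back to its blob. The `decode_parent_id` function is hypothetical and not part of the sample.

```python
import base64


def decode_parent_id(parent_id: str) -> str:
    """Decode a parent key back to the original blob URL.

    Assumption: the key is URL-safe Base64 with the '=' padding removed,
    and the final character is a digit giving the padding count.
    """
    body, padding = parent_id[:-1], int(parent_id[-1])
    return base64.urlsafe_b64decode(body + "=" * padding).decode("utf-8")


# parent_id copied from the query output in this notebook.
example = (
    "aHR0cHM6Ly9oZWlkaXN0c3RvcmFnZWRlbW9lYXN0dXMuYmxvYi5jb3JlLndpbmRvd3MubmV0"
    "L2RlbW8tY29udGFpbmVyL0JlbmVmaXRfT3B0aW9ucy5wZGY1"
)
print(decode_parent_id(example))
```

For the sample output, the decoded URL points at `Benefit_Options.pdf` in the `demo-container` container.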