# Azure AI Search vectorization using sentence-transformers

This code demonstrates how to use Azure AI Search with a Hugging Face embedding model, [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), and the Azure AI Search Documents Python SDK.
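all-MiniLM-L6-v2 maps a piece of text to a single 384-dimensional vector. According to its model card, it does this by mean-pooling the transformer's token embeddings (skipping padding tokens) and then L2-normalizing the result. The stdlib-only sketch below illustrates just those two pooling steps on made-up 4-dimensional token vectors. It is not the sentence-transformers implementation, and the helper names `mean_pool` and `l2_normalize` are hypothetical.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding positions are skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, x in enumerate(vec):
                summed[i] += x
            count += 1
    return [s / max(count, 1) for s in summed]


def l2_normalize(vec):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


# Made-up 4-dimensional token embeddings (the real model produces 384 dimensions).
tokens = [
    [1.0, 0.0, 2.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # padding token, ignored because its mask value is 0
]
mask = [1, 1, 0]

sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(sentence_vector)
```

Because the output vectors are unit length, cosine similarity between two of them reduces to a plain dot product.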

It uses `azd` and a Bicep template for all deployment steps so that you can focus on queries.

## Prerequisites

+ Follow the instructions in the [readme](./readme.md) to deploy all Azure resources, and to create and load the search index.

+ Check your search service to make sure the index exists. If you don't see an index, revisit the readme and run the `setup_search_service` script.

+ Don't add an `.env` file to this folder. Environment variables are read from the `azd` deployment.

+ Install the packages necessary for running the queries in this notebook. 

```python
! pip install azure-search-documents==11.6.0b3 --quiet
! pip install python-dotenv azure-identity --quiet
```

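```python
# Load all environment variables from the azd deployment
import subprocess
from io import StringIO

from dotenv import load_dotenv

# `azd env get-values` prints one KEY="value" pair per line.
# check=True raises an error if azd is missing or no environment is selected.
result = subprocess.run(["azd", "env", "get-values"], capture_output=True, text=True, check=True)
load_dotenv(stream=StringIO(result.stdout))
```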

```python
import os

# Endpoint of the search service provisioned by azd
search_url = f"https://{os.environ['AZURE_SEARCH_SERVICE']}.search.windows.net"
```
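`load_dotenv(stream=...)` treats the output of `azd env get-values` as the contents of a `.env` file: one `KEY="value"` pair per line. The sketch below shows that format handling in plain Python and then builds the endpoint the same way as the cell above. It is a simplification that assumes only quoted or unquoted single-line values (python-dotenv also handles escapes, `export` prefixes, and multi-line values). The helper `parse_env_lines` and the sample service name are hypothetical.

```python
def parse_env_lines(text):
    """Parse KEY="value" lines into a dict, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        raw = raw.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
            raw = raw[1:-1]
        values[key.strip()] = raw
    return values


# Hypothetical sample output; your azd environment will contain different keys and values.
sample = 'AZURE_SEARCH_SERVICE="my-search-svc"\nAZURE_LOCATION="eastus"\n'
env = parse_env_lines(sample)
search_url = f"https://{env['AZURE_SEARCH_SERVICE']}.search.windows.net"
print(search_url)  # https://my-search-svc.search.windows.net
```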

## Perform a vector similarity search

This example runs a pure vector search with `VectorizableTextQuery`. You pass in the query text, and the vectorizer defined on the index converts it to an embedding at query time, so the notebook doesn't need to compute embeddings itself.

```python
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery

# Pure vector search
query = "What's a performance review?"

search_client = SearchClient(search_url, "custom-embedding-index", credential=DefaultAzureCredential())

# The index's vectorizer embeds the query text; exhaustive=True compares against every vector instead of using HNSW
vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True)

# To send a precomputed embedding instead of using query vectorization, use VectorizedQuery.
# generate_embeddings() is not defined in this notebook; it stands for any function returning an all-MiniLM-L6-v2 embedding.
# from azure.search.documents.models import VectorizedQuery
# vector_query = VectorizedQuery(vector=generate_embeddings(query), k_nearest_neighbors=3, fields="vector")

results = search_client.search(
    search_text=None,  # no keyword query, so this is vector-only
    vector_queries=[vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    top=1,
)

for result in results:
    print(f"parent_id: {result['parent_id']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['chunk']}")
```
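Setting `exhaustive=True` asks the service to compare the query vector against every vector in the field (exhaustive k-nearest neighbors) instead of traversing the approximate HNSW graph, with `k_nearest_neighbors` playing the role of `k`. Conceptually, that brute-force scan looks like the stdlib sketch below. It assumes cosine similarity (the index's vector profile could specify a different metric) and uses made-up 3-dimensional vectors; the service's `@search.score` is derived from the similarity but is not necessarily the raw cosine value. `exhaustive_knn` is a hypothetical helper.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exhaustive_knn(query_vec, docs, k):
    """Score every document against the query (no index shortcuts) and keep the k best."""
    scored = ((cosine(query_vec, vec), doc_id) for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored)


# Hypothetical toy index: chunk id -> embedding.
docs = {
    "chunk-1": [0.9, 0.1, 0.0],
    "chunk-2": [0.1, 0.9, 0.0],
    "chunk-3": [0.7, 0.7, 0.0],
}
query_vec = [1.0, 0.0, 0.0]

top = exhaustive_knn(query_vec, docs, k=2)
for score, doc_id in top:
    print(doc_id, round(score, 3))
```

The cost grows linearly with the number of vectors, which is why exhaustive search is typically reserved for small indexes or for measuring the recall of the approximate search.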


## Perform a hybrid search

```python
# Hybrid search: full-text query and vector query in the same request
query = "What's a performance review?"

vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True)

results = search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    top=1,
)

for result in results:
    print(f"parent_id: {result['parent_id']}")
    print(f"chunk_id: {result['chunk_id']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['chunk']}")
```
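A hybrid query runs the full-text query and the vector query and then merges the two ranked lists. Azure AI Search documents Reciprocal Rank Fusion (RRF) as that merge step: each document scores the sum of `1 / (k + rank)` over the lists it appears in. The stdlib sketch below illustrates RRF on hypothetical rankings. The constant `k = 60` is the value commonly used with RRF and is an assumption here, as is the helper name `reciprocal_rank_fusion`.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


keyword_ranking = ["chunk-7", "chunk-2", "chunk-9"]  # hypothetical full-text (BM25) order
vector_ranking = ["chunk-2", "chunk-4", "chunk-7"]   # hypothetical vector order

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Documents found by both queries (`chunk-2`, `chunk-7`) rise above those found by only one, and because RRF uses only rank positions, the BM25 and vector scores never need to be on the same scale.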


## Perform a hybrid search with semantic reranking

```python
from azure.search.documents.models import QueryAnswerType, QueryCaptionType, QueryType

# Semantic hybrid search
query = "What's a performance review?"

vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True)

results = search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    select=["parent_id", "chunk_id", "chunk"],
    query_type=QueryType.SEMANTIC,
    semantic_configuration_name="my-semantic-config",  # semantic configuration defined on the index
    query_caption=QueryCaptionType.EXTRACTIVE,
    query_answer=QueryAnswerType.EXTRACTIVE,
    top=2,
)

# get_answers() returns None when the service produced no answers
semantic_answers = results.get_answers() or []
for answer in semantic_answers:
    if answer.highlights:
        print(f"Semantic Answer: {answer.highlights}")
    else:
        print(f"Semantic Answer: {answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")

for result in results:
    print(f"parent_id: {result['parent_id']}")
    print(f"chunk_id: {result['chunk_id']}")
    print(f"Score: {result['@search.score']}")
    print(f"Content: {result['chunk']}")

    # Prefer highlighted captions; fall back to plain caption text
    captions = result.get("@search.captions")
    if captions:
        caption = captions[0]
        if caption.highlights:
            print(f"Caption: {caption.highlights}\n")
        else:
            print(f"Caption: {caption.text}\n")
```
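With `query_type=QueryType.SEMANTIC`, the service takes the top results of the hybrid ranking (up to 50, per the Azure documentation) and reorders them with a semantic ranking model, reporting each result's new score as `@search.reranker_score` on a 0 to 4 scale while `@search.score` keeps the hybrid score. The ranking model itself can't be reproduced here. The sketch below only shows the effect on ordering, using plain dicts with made-up scores in place of SDK results; `reorder_by_reranker` is a hypothetical helper.

```python
# Hypothetical hybrid results in their original (RRF) order, with made-up reranker scores.
hybrid_results = [
    {"chunk_id": "chunk-2", "@search.score": 0.0325, "@search.reranker_score": 1.8},
    {"chunk_id": "chunk-7", "@search.score": 0.0323, "@search.reranker_score": 3.1},
    {"chunk_id": "chunk-4", "@search.score": 0.0161, "@search.reranker_score": 2.4},
]


def reorder_by_reranker(results):
    """Return a new list sorted by semantic reranker score, highest first."""
    return sorted(results, key=lambda r: r["@search.reranker_score"], reverse=True)


for r in reorder_by_reranker(hybrid_results):
    print(r["chunk_id"], r["@search.score"], r["@search.reranker_score"])
```

This is why a semantic query can return a result with a lower `@search.score` ahead of one with a higher score: the order follows `@search.reranker_score`.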
