demo-python/code/custom-vectorizer/azure-search-custom-vectorization-sample.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Azure AI Search vectorization using sentence-transformers\n",
"\n",
"This code demonstrates how to use Azure AI Search with a Hugging Face embedding model, [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2), and the Azure AI Search Documents Python SDK.\n",
"\n",
"It uses `azd` and a bicep template for all deployment steps so that you can focus on queries.\n",
"\n",
"## Prerequisites\n",
"\n",
"+ Follow the instructions in the [readme](./readme.md) to deploy all Azure resources, and to create and load the search index.\n",
"\n",
"+ Check your search service to make sure the index exists. If you don't see an index, revisit the readme and run the `setup_search_service` script.\n",
"\n",
"+ Don't add an `.env` file to this folder. Environment variables are read from the `azd` deployment.\n",
"\n",
"+ Install the packages necessary for running the queries in this notebook. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"! pip install azure-search-documents==11.6.0b3 --quiet\n",
"! pip install python-dotenv azure-identity --quiet"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load all environment variables from the azd deployment\n",
"import subprocess\n",
"from io import StringIO\n",
"from dotenv import load_dotenv\n",
"result = subprocess.run([\"azd\", \"env\", \"get-values\"], stdout=subprocess.PIPE)\n",
"load_dotenv(stream=StringIO(result.stdout.decode(\"utf-8\")))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"search_url = f\"https://{os.environ['AZURE_SEARCH_SERVICE']}.search.windows.net\""
]
},
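{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, verify that the `custom-embedding-index` index created by the `setup_search_service` script exists before running queries. The sketch below assumes the index name used elsewhere in this notebook and the same `DefaultAzureCredential` authentication."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: confirm the index created by setup_search_service exists before querying.\n",
"from azure.core.exceptions import ResourceNotFoundError\n",
"from azure.search.documents.indexes import SearchIndexClient\n",
"from azure.identity import DefaultAzureCredential\n",
"\n",
"index_client = SearchIndexClient(search_url, credential=DefaultAzureCredential())\n",
"try:\n",
"    index_client.get_index(\"custom-embedding-index\")\n",
"    print(\"Index 'custom-embedding-index' found.\")\n",
"except ResourceNotFoundError:\n",
"    print(\"Index not found. Revisit the readme and run the setup_search_service script.\")"
]
},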
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a vector similarity search\n",
"\n",
"This example shows a pure vector search using the vectorizable text query, all you need to do is pass in text and your vectorizer will handle the query vectorization."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.search.documents import SearchClient\n",
"from azure.search.documents.models import VectorizableTextQuery\n",
"from azure.identity import DefaultAzureCredential\n",
"# Pure Vector Search\n",
"query = \"What's a performance review?\" \n",
" \n",
"search_client = SearchClient(search_url, \"custom-embedding-index\", credential=DefaultAzureCredential())\n",
"vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"vector\", exhaustive=True)\n",
"# Use the below query to pass in the raw vector query instead of the query vectorization\n",
"# vector_query = RawVectorQuery(vector=generate_embeddings(query), k_nearest_neighbors=3, fields=\"vector\")\n",
" \n",
"results = search_client.search( \n",
" search_text=None, \n",
" vector_queries= [vector_query],\n",
" select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
" top=1\n",
") \n",
" \n",
"for result in results: \n",
" print(f\"parent_id: {result['parent_id']}\") \n",
" print(f\"Score: {result['@search.score']}\") \n",
" print(f\"Content: {result['chunk']}\") \n"
]
},
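{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above relies on the index's custom vectorizer to embed the query text. As a sketch of the commented-out alternative, you can embed the query locally with the same [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) model and pass the raw vector in a `VectorizedQuery`. This sketch assumes the `sentence-transformers` package is installed (`pip install sentence-transformers`); it is not installed by the cells above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: client-side query vectorization (assumes `sentence-transformers` is installed).\n",
"# Reuses `query` and `search_client` from the cell above.\n",
"from sentence_transformers import SentenceTransformer\n",
"from azure.search.documents.models import VectorizedQuery\n",
"\n",
"# Embed the query with the same model the sample's custom vectorizer uses.\n",
"model = SentenceTransformer(\"sentence-transformers/all-MiniLM-L6-v2\")\n",
"query_vector = model.encode(query).tolist()\n",
"\n",
"raw_vector_query = VectorizedQuery(vector=query_vector, k_nearest_neighbors=50, fields=\"vector\")\n",
"\n",
"results = search_client.search(\n",
"    search_text=None,\n",
"    vector_queries=[raw_vector_query],\n",
"    select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
"    top=1\n",
")\n",
"\n",
"for result in results:\n",
"    print(f\"parent_id: {result['parent_id']}\")\n",
"    print(f\"Score: {result['@search.score']}\")\n",
"    print(f\"Content: {result['chunk']}\")\n"
]
},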
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a hybrid search"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hybrid Search\n",
"query = \"What's a performance review?\" \n",
" \n",
"vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"vector\", exhaustive=True)\n",
" \n",
"results = search_client.search( \n",
" search_text=query, \n",
" vector_queries= [vector_query],\n",
" select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
" top=1\n",
") \n",
" \n",
"for result in results: \n",
" print(f\"parent_id: {result['parent_id']}\") \n",
" print(f\"chunk_id: {result['chunk_id']}\") \n",
" print(f\"Score: {result['@search.score']}\") \n",
" print(f\"Content: {result['chunk']}\") \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a hybrid search + Semantic reranking"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.search.documents.models import QueryType, QueryCaptionType, QueryAnswerType\n",
"\n",
"# Semantic Hybrid Search\n",
"query = \"What's a performance review?\"\n",
"\n",
"vector_query = VectorizableTextQuery(text=query, k_nearest_neighbors=50, fields=\"vector\", exhaustive=True)\n",
"\n",
"results = search_client.search( \n",
" search_text=query,\n",
" vector_queries=[vector_query],\n",
" select=[\"parent_id\", \"chunk_id\", \"chunk\"],\n",
" query_type=QueryType.SEMANTIC, semantic_configuration_name='my-semantic-config', query_caption=QueryCaptionType.EXTRACTIVE, query_answer=QueryAnswerType.EXTRACTIVE,\n",
" top=2\n",
")\n",
"\n",
"semantic_answers = results.get_answers()\n",
"for answer in semantic_answers:\n",
" if answer.highlights:\n",
" print(f\"Semantic Answer: {answer.highlights}\")\n",
" else:\n",
" print(f\"Semantic Answer: {answer.text}\")\n",
" print(f\"Semantic Answer Score: {answer.score}\\n\")\n",
"\n",
"for result in results:\n",
" print(f\"parent_id: {result['parent_id']}\") \n",
" print(f\"chunk_id: {result['chunk_id']}\") \n",
" print(f\"Score: {result['@search.score']}\") \n",
" print(f\"Content: {result['chunk']}\") \n",
"\n",
" captions = result[\"@search.captions\"]\n",
" if captions:\n",
" caption = captions[0]\n",
" if caption.highlights:\n",
" print(f\"Caption: {caption.highlights}\\n\")\n",
" else:\n",
" print(f\"Caption: {caption.text}\\n\")\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}