{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"redirect_from": [
"https://github.com/azure/azure-search-vector-samples/blob/main/demo-python/code/e2e-demos/azure-ai-search-cohere-embed-v3.ipynb"
]
},
"source": [
"# Azure AI Search with Cohere Embed V3 Int8 and Binary Embeddings Support\n",
"\n",
"In this notebook, we'll demonstrate how to leverage `Cohere Embed V3` to generate int8 and binary embeddings, significantly reducing memory costs while maintaining high search quality. We'll integrate these embeddings with Azure AI Search and perform Retrieval-Augmented Generation (RAG) using CommandR+ in Azure AI Studio.\n",
"\n",
"**This unofficial code sample was created for an Ignite 2024 demo. It's offered \"as-is\" and might not work for all customers and all scenarios.**\n",
"\n",
"## Benefits\n",
"- **Memory Efficiency**: int8 embeddings reduce memory usage by 4x, while binary embeddings offer a 32x reduction.\n",
"- **Cost Reduction**: Lower memory requirements lead to substantial cost savings.\n",
"- **Speed**: Faster search performance, up to 30% speed-up with int8 and 40x faster with binary embeddings.\n",
"- **Quality**: Maintain 99.99% of original search quality with int8 and up to 98% with binary embeddings.\n",
"\n",
"## Prerequisites\n",
"- 🐍 Python 3.9 or higher\n",
"- 🔗 [Azure AI Search Service](https://learn.microsoft.com/azure/search/)\n",
"- 🔗 [Cohere Embed V3](https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-cohere-embed) via [Azure AI Studio Model Inference API](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-api) or [Cohere Embed v3 API](https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-cohere-embed#cohere-embed-v3)\n",
"- 🔗 [Command R+](https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-cohere-command) via [Azure AI Studio Model Inference API](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-api) or [Cohere Chat API](https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-cohere-command#cohere-chat-api)\n",
"## Features Covered\n",
"- ✅ Int8 Embeddings\n",
"- ✅ Binary Embeddings\n",
"- ✅ Vectorizer\n",
"- ✅ RAG using Azure AI Search as a retriever grounding data to Cohere's CommandR+ LLM\n",
"\n",
"Let's get started!\n"
]
},
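{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a back-of-the-envelope check of the memory figures in the Benefits list, the following sketch computes the raw vector storage for 1024-dimensional embeddings. The corpus size of one million vectors is a hypothetical figure chosen for illustration, and the numbers cover the vector payload only, not any index overhead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"DIMENSIONS = 1024\n",
"NUM_VECTORS = 1_000_000  # hypothetical corpus size, for illustration only\n",
"\n",
"bytes_per_vector = {\n",
"    \"float32\": DIMENSIONS * 4,   # 4 bytes per dimension\n",
"    \"int8\": DIMENSIONS * 1,      # 1 byte per dimension\n",
"    \"ubinary\": DIMENSIONS // 8,  # 1 bit per dimension, packed 8 to a byte\n",
"}\n",
"\n",
"for name, size in bytes_per_vector.items():\n",
"    total_gb = size * NUM_VECTORS / 1e9\n",
"    reduction = bytes_per_vector[\"float32\"] // size\n",
"    print(f\"{name}: {size} bytes/vector, {total_gb:.3f} GB total, {reduction}x smaller than float32\")"
]
},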
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install required libraries"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"! pip install --quiet azure-search-documents==11.6.0b4\n",
"! pip install --quiet cohere python-dotenv azure-identity"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cohere\n",
"import numpy as np\n",
"import os\n",
"from azure.core.credentials import AzureKeyCredential\n",
"from azure.identity import DefaultAzureCredential\n",
"from azure.search.documents import SearchClient\n",
"from azure.search.documents.indexes import SearchIndexClient\n",
"from azure.search.documents.indexes.models import (\n",
" AIStudioModelCatalogName,\n",
" AzureMachineLearningParameters,\n",
" AzureMachineLearningVectorizer,\n",
" HnswAlgorithmConfiguration,\n",
" HnswParameters,\n",
" SearchField,\n",
" SearchFieldDataType,\n",
" SearchIndex,\n",
" SearchableField,\n",
" SimpleField,\n",
" VectorEncodingFormat,\n",
" VectorSearch,\n",
" VectorSearchAlgorithmKind,\n",
" VectorSearchAlgorithmMetric,\n",
" VectorSearchProfile\n",
")\n",
"from azure.search.documents.models import (\n",
" VectorizableTextQuery,\n",
" VectorizedQuery\n",
")\n",
"from dotenv import load_dotenv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set Up Cohere and Azure Credentials\n",
"Before generating embeddings or interacting with Azure AI Search, we need to set up our credentials for both Cohere and Azure AI Search."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Load environment variables\n",
"load_dotenv()\n",
"\n",
"# Azure AI Studio Cohere Configuration\n",
"AZURE_AI_STUDIO_COHERE_EMBED_KEY = os.getenv(\"AZURE_AI_STUDIO_COHERE_EMBED_KEY\")\n",
"AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT = os.getenv(\"AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT\")\n",
"AZURE_AI_STUDIO_COHERE_COMMAND_KEY = os.getenv(\"AZURE_AI_STUDIO_COHERE_COMMAND_KEY\")\n",
"AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT = os.getenv(\"AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT\")\n",
"# Index Names\n",
"INT8_INDEX_NAME = \"cohere-embed-v3-int8\"\n",
"BINARY_INDEX_NAME = \"cohere-embed-v3-binary\"\n",
"# Azure Search Service Configuration\n",
"SEARCH_SERVICE_API_KEY = os.getenv(\"AZURE_SEARCH_ADMIN_KEY\")\n",
"SEARCH_SERVICE_ENDPOINT = os.getenv(\"AZURE_SEARCH_SERVICE_ENDPOINT\")\n",
"# Create a Cohere client using the COHERE_API_KEY from the Cohere website\n",
"COHERE_API_KEY = os.getenv(\"COHERE_API_KEY\")\n",
"cohere_website_client = cohere.Client(COHERE_API_KEY)\n",
"\n",
"# Create a Cohere client using the AZURE_AI_STUDIO_COHERE_API_KEY and AZURE_AI_STUDIO_COHERE_ENDPOINT from Azure AI Studio\n",
"cohere_azure_client = cohere.Client(\n",
" base_url=f\"{AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT}/v1\", \n",
" api_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Using AAD for authentication.\n"
]
}
],
"source": [
"# User-specified parameter\n",
"USE_AAD_FOR_SEARCH = True \n",
"\n",
"def authenticate_azure_search(api_key=None, use_aad_for_search=False):\n",
" if use_aad_for_search:\n",
" print(\"Using AAD for authentication.\")\n",
" credential = DefaultAzureCredential()\n",
" else:\n",
" print(\"Using API keys for authentication.\")\n",
" if api_key is None:\n",
" raise ValueError(\"API key must be provided if not using AAD for authentication.\")\n",
" credential = AzureKeyCredential(api_key)\n",
" return credential\n",
"\n",
"azure_search_credential = authenticate_azure_search(api_key=SEARCH_SERVICE_API_KEY, use_aad_for_search=USE_AAD_FOR_SEARCH)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate Embeddings Function\n",
"This function `generate_embeddings` takes in a list of texts (or a single string), an input type, and an embedding type. It uses the Cohere client to generate embeddings for the provided texts using the specified model (\"embed-english-v3.0\"). The function ensures that the input texts are in a list format. If a single string is provided, it is converted into a list. The function then returns a list of embeddings.\n",
"\n",
"For this sample, we'll use, \"ubinary\" or \"int8\". The \"ubinary\" type is used for binary embeddings, while the \"int8\" type is used for integer embeddings. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def generate_embeddings(texts, input_type=\"search_document\", embedding_type=\"ubinary\"):\n",
" model = \"embed-english-v3.0\"\n",
"\n",
" # Convert single string to list\n",
" texts = [texts] if isinstance(texts, str) else texts\n",
"\n",
" response = cohere_azure_client.embed(\n",
" texts=texts,\n",
" model=model,\n",
" input_type=input_type,\n",
" embedding_types=[embedding_type],\n",
" )\n",
"\n",
" return [embedding for embedding in getattr(response.embeddings, embedding_type)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's check the dimension length of both the int8 and ubinary embeddings. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Length of search_document embeddings (ubinary):\n",
"128\n",
"128\n",
"128\n",
"\n",
"Length of search_document embeddings (int8):\n",
"1024\n",
"1024\n",
"1024\n",
"\n",
"Length of search_query embeddings (ubinary):\n",
"128\n",
"\n",
"Length of search_query embeddings (int8):\n",
"1024\n"
]
}
],
"source": [
"def generate_and_print_embeddings(texts, input_type, embedding_type):\n",
" embeddings = generate_embeddings(texts, input_type=input_type, embedding_type=embedding_type)\n",
" print(f\"\\nLength of {input_type} embeddings ({embedding_type}):\")\n",
" for embedding in embeddings:\n",
" print(len(embedding))\n",
"\n",
"# Define the documents and queries\n",
"documents = [\"Hello, world!\", \"Goodbye, world!\", \"How are you, world?\"]\n",
"queries = [\"world\"]\n",
"\n",
"# Generate and print the embeddings\n",
"generate_and_print_embeddings(documents, \"search_document\", \"ubinary\")\n",
"generate_and_print_embeddings(documents, \"search_document\", \"int8\")\n",
"generate_and_print_embeddings(queries, \"search_query\", \"ubinary\")\n",
"generate_and_print_embeddings(queries, \"search_query\", \"int8\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 'ubinary' format uses 8-bit quantization to compress the embeddings. This means each original floating-point number in the 1024-dimensional embedding is approximated by an 8-bit unsigned integer.\n",
"\n",
"When these 8-bit integers are packed together, eight of them can fit in the space that was originally used by one 64-bit floating-point number (since 64 divided by 8 equals 8).\n",
"\n",
"This is why the 'ubinary' embeddings are 8 times smaller than the original embeddings, resulting in a 128-dimensional vector instead of a 1024-dimensional one."
]
},
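{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of how a 1024-dimensional float vector can end up as 128 bytes, the sketch below keeps one bit per dimension and packs eight bits into each byte with `np.packbits`. The random vector is a stand-in rather than real Cohere output, and thresholding at zero is an assumption made for illustration; the exact quantization rule Cohere applies may differ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Stand-in for a 1024-dimensional float embedding (not real model output)\n",
"rng = np.random.default_rng(0)\n",
"float_embedding = rng.standard_normal(1024).astype(np.float32)\n",
"\n",
"# Assumed rule for illustration: one bit per dimension, set when the value is positive\n",
"bits = (float_embedding > 0).astype(np.uint8)\n",
"\n",
"# Pack 8 bits into each unsigned byte\n",
"packed = np.packbits(bits)\n",
"\n",
"print(len(bits), len(packed))  # 1024 128\n",
"print(float_embedding.nbytes // packed.nbytes)  # 32"
]
},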
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create or Update Azure AI Search Index\n",
"This function creates or updates an Azure AI Search index to include a vector field for storing the document embeddings."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def create_or_update_index(\n",
" client,\n",
" index_name,\n",
" vector_field_type,\n",
" scoring_uri,\n",
" authentication_key,\n",
" model_name,\n",
"):\n",
" fields = [\n",
" SimpleField(name=\"id\", type=SearchFieldDataType.String, key=True),\n",
" SearchField(\n",
" name=\"text\",\n",
" type=SearchFieldDataType.String,\n",
" searchable=True,\n",
" ),\n",
" SearchField(\n",
" name=\"embedding\",\n",
" type=vector_field_type,\n",
" vector_search_dimensions=1024,\n",
" vector_search_profile_name=\"my-vector-config\",\n",
" hidden=False,\n",
" stored=True,\n",
" vector_encoding_format=(\n",
" VectorEncodingFormat.PACKED_BIT\n",
" if vector_field_type == \"Collection(Edm.Byte)\"\n",
" else None\n",
" ),\n",
" ),\n",
" ]\n",
"\n",
" vector_search = VectorSearch(\n",
" profiles=[\n",
" VectorSearchProfile(\n",
" name=\"my-vector-config\",\n",
" algorithm_configuration_name=\"my-hnsw\",\n",
" vectorizer=\"my-vectorizer\"\n",
" )\n",
" ],\n",
" algorithms=[\n",
" HnswAlgorithmConfiguration(\n",
" name=\"my-hnsw\",\n",
" kind=VectorSearchAlgorithmKind.HNSW,\n",
" parameters=HnswParameters(\n",
" metric=(\n",
" VectorSearchAlgorithmMetric.HAMMING\n",
" if vector_field_type == \"Collection(Edm.Byte)\"\n",
" else VectorSearchAlgorithmMetric.COSINE\n",
" )\n",
" ),\n",
" )\n",
" ],\n",
" vectorizers=[\n",
" AzureMachineLearningVectorizer(\n",
" name=\"my-vectorizer\",\n",
" aml_parameters=AzureMachineLearningParameters(\n",
" scoring_uri=scoring_uri,\n",
" authentication_key=authentication_key,\n",
" model_name=model_name,\n",
" ),\n",
" )\n",
" ],\n",
" )\n",
"\n",
" index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)\n",
" client.create_or_update_index(index=index)"
]
},
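{
"cell_type": "markdown",
"metadata": {},
"source": [
"The branching inside `create_or_update_index` boils down to a small lookup from embedding type to field type, distance metric, and encoding. The sketch below restates it with plain strings. The names `INDEX_SETTINGS` and `index_settings_for` are hypothetical and are not used elsewhere in this notebook, which relies on the SDK enums instead."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Plain-string summary of the two configurations; the notebook itself uses the SDK enums\n",
"INDEX_SETTINGS = {\n",
"    \"int8\": {\"field_type\": \"Collection(Edm.SByte)\", \"metric\": \"cosine\", \"encoding\": None},\n",
"    \"ubinary\": {\"field_type\": \"Collection(Edm.Byte)\", \"metric\": \"hamming\", \"encoding\": \"packedBit\"},\n",
"}\n",
"\n",
"def index_settings_for(embedding_type):\n",
"    # Hypothetical helper: fail early on embedding types this sample does not cover\n",
"    if embedding_type not in INDEX_SETTINGS:\n",
"        raise ValueError(f\"Unsupported embedding type: {embedding_type}\")\n",
"    return INDEX_SETTINGS[embedding_type]\n",
"\n",
"print(index_settings_for(\"ubinary\"))"
]
},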
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Index Documents and Their Embeddings\n",
"Finally, this function indexes the documents along with their int8 embeddings into Azure AI Search."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def index_documents(search_client, documents, embeddings):\n",
" documents_to_index = [\n",
" {\"id\": str(idx), \"text\": doc, \"embedding\": emb}\n",
" for idx, (doc, emb) in enumerate(zip(documents, embeddings))\n",
" ]\n",
" search_client.upload_documents(documents=documents_to_index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the workflow"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Initialize Azure Search Index Client\n",
"search_index_client = SearchIndexClient(\n",
" endpoint=SEARCH_SERVICE_ENDPOINT,\n",
" credential=azure_search_credential\n",
")\n",
"\n",
"# Define the documents\n",
"documents = [\n",
" \"Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.\",\n",
" \"Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.\",\n",
" \"Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.\",\n",
" \"Marie Curie was a Polish and naturalized-French physicist and chemist who conducted pioneering research on radioactivity.\",\n",
"]\n",
"\n",
"# Create the \"ubinary\" index and generate embeddings\n",
"create_or_update_index(\n",
" search_index_client,\n",
" BINARY_INDEX_NAME,\n",
" \"Collection(Edm.Byte)\",\n",
" scoring_uri=AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT,\n",
" authentication_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,\n",
" model_name=AIStudioModelCatalogName.COHERE_EMBED_V3_ENGLISH,\n",
")\n",
"\n",
"ubinary_embeddings = generate_embeddings(\n",
" documents,\n",
" input_type=\"search_document\",\n",
" embedding_type=\"ubinary\"\n",
")\n",
"\n",
"ubinary_search_client = SearchClient(\n",
" endpoint=SEARCH_SERVICE_ENDPOINT,\n",
" credential=azure_search_credential,\n",
" index_name=BINARY_INDEX_NAME,\n",
")\n",
"\n",
"index_documents(ubinary_search_client, documents, ubinary_embeddings)\n",
"\n",
"# Create the \"int8\" index and generate embeddings\n",
"create_or_update_index(\n",
" search_index_client,\n",
" INT8_INDEX_NAME,\n",
" \"Collection(Edm.SByte)\",\n",
" scoring_uri=AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT,\n",
" authentication_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,\n",
" model_name=AIStudioModelCatalogName.COHERE_EMBED_V3_ENGLISH,\n",
")\n",
"\n",
"int8_embeddings = generate_embeddings(\n",
" documents,\n",
" input_type=\"search_document\",\n",
" embedding_type=\"int8\"\n",
")\n",
"\n",
"int8_search_client = SearchClient(\n",
" endpoint=SEARCH_SERVICE_ENDPOINT,\n",
" credential=azure_search_credential,\n",
" index_name=INT8_INDEX_NAME,\n",
")\n",
"\n",
"index_documents(int8_search_client, documents, int8_embeddings)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Perform a Vector Search"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Int8 Results:\n",
"Title: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.\n",
"Score: 0.6225287\n",
"\n",
"Title: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.\n",
"Score: 0.5917698\n",
"\n",
"Title: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.\n",
"Score: 0.5746157\n",
"\n",
"Binary Results:\n",
"Title: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.\n",
"Score: 0.002610966\n",
"\n",
"Title: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.\n",
"Score: 0.0024509805\n",
"\n",
"Title: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.\n",
"Score: 0.0023980816\n",
"\n"
]
}
],
"source": [
"# Query for vector search\n",
"query = \"foundational figures in computer science\"\n",
"\n",
"# Generate query embeddings\n",
"# Use input_type=\"search_query\" for query embeddings if you want to leverage the function defined above for generating query embeddings\n",
"query_embeddings = generate_embeddings(query, input_type=\"search_query\")\n",
"\n",
"# Note: If you do not want to use a vectorizer on your index, you can use VectorizedQuery instead of VectorizableTextQuery.\n",
"# Here's an example:\n",
"# vector_query_int8 = VectorizedQuery(\n",
"# vector=query_embeddings, k_nearest_neighbors=3, fields=\"embedding\"\n",
"# )\n",
"\n",
"# Search the \"int8\" index\n",
"vector_query_int8 = VectorizableTextQuery(\n",
" text=query, k_nearest_neighbors=3, fields=\"embedding\"\n",
")\n",
"\n",
"results_int8 = int8_search_client.search(\n",
" search_text=None, # No search text for pure vector search\n",
" vector_queries=[vector_query_int8],\n",
")\n",
"\n",
"# Search the \"binary\" index\n",
"vector_query_binary = VectorizableTextQuery(\n",
" text=query, k_nearest_neighbors=3, fields=\"embedding\"\n",
")\n",
"\n",
"results_binary = ubinary_search_client.search(\n",
" search_text=None, # No search text for pure vector search\n",
" vector_queries=[vector_query_binary],\n",
")\n",
"\n",
"# Print the results side by side\n",
"print(\"Int8 Results:\")\n",
"for result in results_int8:\n",
" print(f\"Title: {result['text']}\")\n",
" print(f\"Score: {result['@search.score']}\\n\")\n",
"\n",
"print(\"Binary Results:\")\n",
"for result in results_binary:\n",
" print(f\"Title: {result['text']}\")\n",
" print(f\"Score: {result['@search.score']}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference in the scores between the int8 results and the binary results is due to the use of different similarity metrics. Cosine similarity measures the similarity between numerical vectors, while Hamming distance measures the difference between binary vectors. Since cosine similarity is a measure of similarity and larger scores indicate greater similarity, the scores for the int8 results are larger than the scores for the binary results. On the other hand, since Hamming distance is a measure of dissimilarity and smaller scores indicate greater similarity, the scores for the binary results are smaller than the scores for the int8 results."
]
},
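{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the binary scores more concrete, the sketch below computes the Hamming distance between two packed-bit vectors and turns it into a score. The toy 16-bit vectors and both helper names are hypothetical, and the scoring rule 1 / (1 + distance) is an assumption, chosen because it reproduces the binary scores printed by the search cell."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def hamming_distance(a, b):\n",
"    # XOR leaves a 1 wherever the two packed vectors disagree; count those bits\n",
"    xor = np.bitwise_xor(np.asarray(a, dtype=np.uint8), np.asarray(b, dtype=np.uint8))\n",
"    return int(np.unpackbits(xor).sum())\n",
"\n",
"def hamming_score(distance):\n",
"    # Assumed scoring rule: a smaller distance yields a higher score\n",
"    return 1.0 / (1.0 + distance)\n",
"\n",
"doc_vector = np.array([0b10110000, 0b00001111], dtype=np.uint8)    # toy 16-bit vector\n",
"query_vector = np.array([0b10100001, 0b00001111], dtype=np.uint8)  # differs in 2 bits\n",
"\n",
"distance = hamming_distance(doc_vector, query_vector)\n",
"print(distance, hamming_score(distance))\n",
"print(hamming_score(382))  # a distance of 382 bits gives roughly 0.00261"
]
},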
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create a Chat Function"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"co_chat = cohere.Client(\n",
" base_url=f\"{AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT}/v1\", api_key=AZURE_AI_STUDIO_COHERE_COMMAND_KEY\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are many foundational figures who have made significant contributions to the field of computer science. Here are some of the most notable individuals:\n",
"\n",
"1. Alan Turing: Often considered the \"father of computer science,\" Alan Turing was a British mathematician and computer scientist who made groundbreaking contributions to computing, cryptography, and artificial intelligence. He is widely known for his work on the Turing machine, a theoretical device that served as a model for modern computers, and for his crucial role in breaking German Enigma codes during World War\n"
]
}
],
"source": [
"# Extract the documents from the search results\n",
"documents_binary = [{\"text\": result[\"text\"]} for result in results_binary]\n",
"\n",
"# Ground the documents from the \"binary\" index\n",
"chat_response_binary = co_chat.chat(\n",
" message=query, documents=documents_binary, max_tokens=100\n",
")\n",
"\n",
"# Using Binary Vectors\n",
"print(chat_response_binary.text)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": ".venv",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}