# Azure AI Search with Cohere Embed V3 Int8 and Binary Embeddings Support

In this notebook, we'll demonstrate how to use `Cohere Embed V3` to generate int8 and binary embeddings, significantly reducing memory costs while maintaining high search quality. We'll integrate these embeddings with Azure AI Search and perform Retrieval-Augmented Generation (RAG) using Command R+ in Azure AI Studio.

**This unofficial code sample was created for an Ignite 2024 demo. It's offered "as-is" and might not work for all customers and all scenarios.**

## Benefits
- **Memory Efficiency**: int8 embeddings reduce memory usage by 4x, while binary embeddings offer a 32x reduction.
- **Cost Reduction**: Lower memory requirements lead to substantial cost savings.
- **Speed**: Faster search performance, up to 30% speed-up with int8 and 40x faster with binary embeddings.
- **Quality**: Maintain 99.99% of original search quality with int8 and up to 98% with binary embeddings.
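
The memory figures above follow from simple arithmetic. The sketch below assumes the uncompressed baseline is a 1024-dimensional float32 embedding (4 bytes per dimension); the names `BYTES_PER_VECTOR` and `reduction_factor` are illustrative only and are not used elsewhere in this notebook.

```python
# Bytes needed to store one 1024-dimensional embedding in each format.
# Assumption: the uncompressed baseline is float32 (4 bytes per dimension).
DIMENSIONS = 1024

BYTES_PER_VECTOR = {
    "float32": DIMENSIONS * 4,   # 4 bytes per dimension
    "int8": DIMENSIONS * 1,      # 1 byte per dimension
    "ubinary": DIMENSIONS // 8,  # 1 bit per dimension, 8 dimensions per byte
}


def reduction_factor(embedding_type):
    """Return how many times smaller a format is than float32."""
    return BYTES_PER_VECTOR["float32"] / BYTES_PER_VECTOR[embedding_type]


for name, size in BYTES_PER_VECTOR.items():
    print(f"{name}: {size} bytes per vector, {reduction_factor(name):.0f}x smaller than float32")
```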

## Prerequisites
- 🐍 Python 3.9 or higher
- 🔗 [Azure AI Search Service](https://learn.microsoft.com/azure/search/)
- 🔗 [Cohere Embed V3](https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-cohere-embed) via [Azure AI Studio Model Inference API](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-api) or [Cohere Embed v3 API](https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-cohere-embed#cohere-embed-v3)
- 🔗 [Command R+](https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-cohere-command) via [Azure AI Studio Model Inference API](https://learn.microsoft.com/azure/ai-studio/reference/reference-model-inference-api) or [Cohere Chat API](https://learn.microsoft.com/azure/ai-studio/how-to/deploy-models-cohere-command#cohere-chat-api)

## Features Covered
- ✅ Int8 Embeddings
- ✅ Binary Embeddings
- ✅ Vectorizer
- ✅ RAG using Azure AI Search as a retriever grounding data to Cohere's Command R+ LLM

Let's get started!


## Install required libraries

In [7]:
! pip install --quiet azure-search-documents==11.6.0b4
! pip install --quiet cohere python-dotenv azure-identity

In [None]:
import cohere
import numpy as np
import os
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    AIStudioModelCatalogName,
    AzureMachineLearningParameters,
    AzureMachineLearningVectorizer,
    HnswAlgorithmConfiguration,
    HnswParameters,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SearchableField,
    SimpleField,
    VectorEncodingFormat,
    VectorSearch,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    VectorSearchProfile
)
from azure.search.documents.models import (
    VectorizableTextQuery,
    VectorizedQuery
)
from dotenv import load_dotenv

## Set Up Cohere and Azure Credentials
Before generating embeddings or interacting with Azure AI Search, we need to set up our credentials for both Cohere and Azure AI Search.

In [4]:
# Load environment variables
load_dotenv()

# Azure AI Studio Cohere Configuration
AZURE_AI_STUDIO_COHERE_EMBED_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_KEY")
AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT")
AZURE_AI_STUDIO_COHERE_COMMAND_KEY = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_KEY")
AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT = os.getenv("AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT")
# Index Names
INT8_INDEX_NAME = "cohere-embed-v3-int8"
BINARY_INDEX_NAME = "cohere-embed-v3-binary"
# Azure Search Service Configuration
SEARCH_SERVICE_API_KEY = os.getenv("AZURE_SEARCH_ADMIN_KEY")
SEARCH_SERVICE_ENDPOINT = os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT")
# Create a Cohere client using the COHERE_API_KEY from the Cohere website
COHERE_API_KEY = os.getenv("COHERE_API_KEY")
cohere_website_client = cohere.Client(COHERE_API_KEY)

# Create a Cohere client using the AZURE_AI_STUDIO_COHERE_EMBED_KEY and AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT from Azure AI Studio
cohere_azure_client = cohere.Client(
    base_url=f"{AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT}/v1", 
    api_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY
)

In [5]:
# User-specified parameter
USE_AAD_FOR_SEARCH = True  

def authenticate_azure_search(api_key=None, use_aad_for_search=False):
    if use_aad_for_search:
        print("Using AAD for authentication.")
        credential = DefaultAzureCredential()
    else:
        print("Using API keys for authentication.")
        if api_key is None:
            raise ValueError("API key must be provided if not using AAD for authentication.")
        credential = AzureKeyCredential(api_key)
    return credential

azure_search_credential = authenticate_azure_search(api_key=SEARCH_SERVICE_API_KEY, use_aad_for_search=USE_AAD_FOR_SEARCH)


Using AAD for authentication.


## Generate Embeddings Function
This function `generate_embeddings` takes in a list of texts (or a single string), an input type, and an embedding type. It uses the Cohere client to generate embeddings for the provided texts using the specified model ("embed-english-v3.0"). The function ensures that the input texts are in a list format. If a single string is provided, it is converted into a list. The function then returns a list of embeddings.

For this sample, we'll use "ubinary" or "int8". The "ubinary" type produces binary embeddings packed into unsigned bytes, while the "int8" type produces signed 8-bit integer embeddings.

In [6]:
def generate_embeddings(texts, input_type="search_document", embedding_type="ubinary"):
    model = "embed-english-v3.0"

    # Convert single string to list
    texts = [texts] if isinstance(texts, str) else texts

    response = cohere_azure_client.embed(
        texts=texts,
        model=model,
        input_type=input_type,
        embedding_types=[embedding_type],
    )

    return [embedding for embedding in getattr(response.embeddings, embedding_type)]
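
For larger corpora, the texts usually need to be sent in several requests, because embedding endpoints cap how many texts one call may carry. The sketch below splits the input into batches; the helper name `batched_texts` is hypothetical, and the default of 96 texts per call is an assumption to verify against the Cohere documentation for your deployment.

```python
def batched_texts(texts, batch_size=96):
    """Yield successive batches of texts.

    batch_size=96 is an assumed per-request limit, not a value taken
    from this notebook; check the limit of your own endpoint.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Accept a single string the same way generate_embeddings does
    texts = [texts] if isinstance(texts, str) else list(texts)
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


batches = list(batched_texts([f"doc {i}" for i in range(200)]))
print([len(batch) for batch in batches])  # [96, 96, 8]
```

Each batch could then be passed to `generate_embeddings`, with the returned lists concatenated in order.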

Let's check the dimension length of both the int8 and ubinary embeddings. 

In [7]:
def generate_and_print_embeddings(texts, input_type, embedding_type):
    embeddings = generate_embeddings(texts, input_type=input_type, embedding_type=embedding_type)
    print(f"\nLength of {input_type} embeddings ({embedding_type}):")
    for embedding in embeddings:
        print(len(embedding))

# Define the documents and queries
documents = ["Hello, world!", "Goodbye, world!", "How are you, world?"]
queries = ["world"]

# Generate and print the embeddings
generate_and_print_embeddings(documents, "search_document", "ubinary")
generate_and_print_embeddings(documents, "search_document", "int8")
generate_and_print_embeddings(queries, "search_query", "ubinary")
generate_and_print_embeddings(queries, "search_query", "int8")


Length of search_document embeddings (ubinary):
128
128
128

Length of search_document embeddings (int8):
1024
1024
1024

Length of search_query embeddings (ubinary):
128

Length of search_query embeddings (int8):
1024


The 'ubinary' format uses 1-bit quantization to compress the embeddings. Each floating-point value in the original 1024-dimensional embedding is reduced to a single bit.

These bits are then packed eight at a time into unsigned 8-bit integers (bytes), so the 1024 bits fit into 1024 / 8 = 128 bytes.

This is why the 'ubinary' embeddings come back as 128-element vectors instead of 1024-element ones. Compared with the original float32 embeddings (4 bytes per dimension, 4096 bytes per vector), that is a 32x reduction in size; the 'int8' format keeps all 1024 dimensions at 1 byte each, a 4x reduction.
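
To make the 1024-to-128 relationship concrete, here is a minimal sketch of packing one bit per dimension into bytes. The thresholding rule (positive values map to 1) and the most-significant-bit-first order are assumptions for illustration; the exact quantization Cohere applies server-side may differ, and `pack_bits` is a hypothetical helper.

```python
def pack_bits(values):
    """Quantize each value to one bit (1 if positive, else 0) and pack 8 bits per byte.

    Assumptions: sign thresholding and most-significant-bit-first packing.
    """
    if len(values) % 8 != 0:
        raise ValueError("number of dimensions must be a multiple of 8")
    packed = []
    for start in range(0, len(values), 8):
        byte = 0
        for value in values[start:start + 8]:
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return packed


# Toy 16-dimensional vector with made-up values
toy_vector = [0.3, -0.1, 0.7, 0.2, -0.5, -0.9, 0.4, 0.1,
              -0.2, -0.3, 0.6, -0.8, 0.9, 0.05, -0.4, -0.6]
packed = pack_bits(toy_vector)
print(packed)                          # [179, 44]
print(len(pack_bits([0.5] * 1024)))    # 128
```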

## Create or Update Azure AI Search Index
This function creates or updates an Azure AI Search index to include a vector field for storing the document embeddings.

In [8]:
def create_or_update_index(
    client,
    index_name,
    vector_field_type,
    scoring_uri,
    authentication_key,
    model_name,
):
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchField(
            name="text",
            type=SearchFieldDataType.String,
            searchable=True,
        ),
        SearchField(
            name="embedding",
            type=vector_field_type,
            vector_search_dimensions=1024,
            vector_search_profile_name="my-vector-config",
            hidden=False,
            stored=True,
            vector_encoding_format=(
                VectorEncodingFormat.PACKED_BIT
                if vector_field_type == "Collection(Edm.Byte)"
                else None
            ),
        ),
    ]

    vector_search = VectorSearch(
        profiles=[
            VectorSearchProfile(
                name="my-vector-config",
                algorithm_configuration_name="my-hnsw",
                vectorizer="my-vectorizer"
            )
        ],
        algorithms=[
            HnswAlgorithmConfiguration(
                name="my-hnsw",
                kind=VectorSearchAlgorithmKind.HNSW,
                parameters=HnswParameters(
                    metric=(
                        VectorSearchAlgorithmMetric.HAMMING
                        if vector_field_type == "Collection(Edm.Byte)"
                        else VectorSearchAlgorithmMetric.COSINE
                    )
                ),
            )
        ],
        vectorizers=[
            AzureMachineLearningVectorizer(
                name="my-vectorizer",
                aml_parameters=AzureMachineLearningParameters(
                    scoring_uri=scoring_uri,
                    authentication_key=authentication_key,
                    model_name=model_name,
                ),
            )
        ],
    )

    index = SearchIndex(name=index_name, fields=fields, vector_search=vector_search)
    client.create_or_update_index(index=index)

## Index Documents and Their Embeddings
Finally, this function indexes the documents along with their embeddings (int8 or binary, depending on the target index) into Azure AI Search.

In [9]:
def index_documents(search_client, documents, embeddings):
    documents_to_index = [
        {"id": str(idx), "text": doc, "embedding": emb}
        for idx, (doc, emb) in enumerate(zip(documents, embeddings))
    ]
    search_client.upload_documents(documents=documents_to_index)
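
Before uploading, it can help to confirm that each embedding matches the shape its index expects. The following is a minimal sketch, assuming the lengths printed earlier (1024 values for int8, 128 bytes for ubinary); `EXPECTED` and `validate_embedding` are hypothetical names, not part of any SDK.

```python
# Expected shape of each embedding type, based on the lengths printed earlier
EXPECTED = {
    "int8": {"length": 1024, "low": -128, "high": 127},   # signed bytes
    "ubinary": {"length": 128, "low": 0, "high": 255},    # unsigned bytes
}


def validate_embedding(embedding, embedding_type):
    """Raise ValueError if the embedding has the wrong length or value range."""
    spec = EXPECTED[embedding_type]
    if len(embedding) != spec["length"]:
        raise ValueError(
            f"{embedding_type} embedding has {len(embedding)} values, "
            f"expected {spec['length']}"
        )
    if any(not spec["low"] <= value <= spec["high"] for value in embedding):
        raise ValueError(f"{embedding_type} embedding has a value out of range")
    return True


print(validate_embedding([0] * 128, "ubinary"))  # True
```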

## Run the workflow

In [10]:
# Initialize Azure Search Index Client
search_index_client = SearchIndexClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    credential=azure_search_credential
)

# Define the documents
documents = [
    "Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.",
    "Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.",
    "Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.",
    "Marie Curie was a Polish and naturalized-French physicist and chemist who conducted pioneering research on radioactivity.",
]

# Create the "ubinary" index and generate embeddings
create_or_update_index(
    search_index_client,
    BINARY_INDEX_NAME,
    "Collection(Edm.Byte)",
    scoring_uri=AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT,
    authentication_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,
    model_name=AIStudioModelCatalogName.COHERE_EMBED_V3_ENGLISH,
)

ubinary_embeddings = generate_embeddings(
    documents,
    input_type="search_document",
    embedding_type="ubinary"
)

ubinary_search_client = SearchClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    credential=azure_search_credential,
    index_name=BINARY_INDEX_NAME,
)

index_documents(ubinary_search_client, documents, ubinary_embeddings)

# Create the "int8" index and generate embeddings
create_or_update_index(
    search_index_client,
    INT8_INDEX_NAME,
    "Collection(Edm.SByte)",
    scoring_uri=AZURE_AI_STUDIO_COHERE_EMBED_ENDPOINT,
    authentication_key=AZURE_AI_STUDIO_COHERE_EMBED_KEY,
    model_name=AIStudioModelCatalogName.COHERE_EMBED_V3_ENGLISH,
)

int8_embeddings = generate_embeddings(
    documents,
    input_type="search_document",
    embedding_type="int8"
)

int8_search_client = SearchClient(
    endpoint=SEARCH_SERVICE_ENDPOINT,
    credential=azure_search_credential,
    index_name=INT8_INDEX_NAME,
)

index_documents(int8_search_client, documents, int8_embeddings)


## Perform a Vector Search

In [11]:
# Query for vector search
query = "foundational figures in computer science"

# Generate query embeddings
# Use input_type="search_query" so the text is embedded as a query rather than a document.
# The default embedding_type is "ubinary"; pass embedding_type="int8" to query the int8 index instead.
query_embeddings = generate_embeddings(query, input_type="search_query")

# Note: If you do not want to use a vectorizer on your index, you can use VectorizedQuery instead of VectorizableTextQuery.
# generate_embeddings returns one embedding per input text, so pass the first element:
# vector_query_binary = VectorizedQuery(
#     vector=query_embeddings[0], k_nearest_neighbors=3, fields="embedding"
# )

# Search the "int8" index
vector_query_int8 = VectorizableTextQuery(
    text=query, k_nearest_neighbors=3, fields="embedding"
)

results_int8 = int8_search_client.search(
    search_text=None,  # No search text for pure vector search
    vector_queries=[vector_query_int8],
)

# Search the "binary" index
vector_query_binary = VectorizableTextQuery(
    text=query, k_nearest_neighbors=3, fields="embedding"
)

results_binary = ubinary_search_client.search(
    search_text=None,  # No search text for pure vector search
    vector_queries=[vector_query_binary],
)

# Print the results from each index, one after the other
print("Int8 Results:")
for result in results_int8:
    print(f"Title: {result['text']}")
    print(f"Score: {result['@search.score']}\n")

print("Binary Results:")
for result in results_binary:
    print(f"Title: {result['text']}")
    print(f"Score: {result['@search.score']}\n")

Int8 Results:
Title: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.
Score: 0.6225287

Title: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.5917698

Title: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, and author who was described in his time as a natural philosopher.
Score: 0.5746157

Binary Results:
Title: Alan Turing was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist.
Score: 0.002610966

Title: Albert Einstein was a German-born theoretical physicist who is widely held to be one of the greatest and most influential scientists of all time.
Score: 0.0024509805

Title: Isaac Newton was an English polymath active as a mathematician, physicist, astronomer, alchemist, theologian, an

The difference in scores between the int8 and binary results comes from the different similarity metrics, so the two sets of scores are not directly comparable. The int8 index uses cosine similarity, which compares the direction of numerical vectors, while the binary index uses Hamming distance, which counts the bit positions in which two binary vectors differ. In both cases Azure AI Search reports a `@search.score` for which a higher value means a closer match, which is why Alan Turing ranks first in both result sets. Because the Hamming distance between 1024-bit vectors can be in the hundreds, the binary scores are much smaller in magnitude than the cosine-based scores, even though the ranking is the same.
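
To illustrate the Hamming side of this comparison, the sketch below counts differing bits between two packed byte vectors and converts between distance and score. The conversion `score = 1 / (1 + distance)` is an assumption: the binary scores printed above are numerically consistent with it (about 1/383 and 1/408), but confirm it against the Azure AI Search documentation before relying on it. All function names here are hypothetical.

```python
def hamming_distance(packed_a, packed_b):
    """Count the bit positions in which two packed byte vectors differ."""
    if len(packed_a) != len(packed_b):
        raise ValueError("vectors must have the same length")
    return sum(bin(a ^ b).count("1") for a, b in zip(packed_a, packed_b))


def score_from_distance(distance):
    """Assumed mapping from a distance to a score where higher is better."""
    return 1.0 / (1.0 + distance)


def distance_from_score(score):
    """Invert the assumed mapping, rounding to the nearest whole bit count."""
    return round(1.0 / score - 1.0)


# Two toy 16-bit vectors that differ in three bit positions
print(hamming_distance([0b10110011, 0b00101100], [0b10010011, 0b00101111]))  # 3

# Under the assumed mapping, the top binary scores above correspond to these distances
print(distance_from_score(0.002610966))   # 382
print(distance_from_score(0.0024509805))  # 407
```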

## Create a Chat Function

In [12]:
co_chat = cohere.Client(
    base_url=f"{AZURE_AI_STUDIO_COHERE_COMMAND_ENDPOINT}/v1", api_key=AZURE_AI_STUDIO_COHERE_COMMAND_KEY
)

In [13]:
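# Re-run the search before extracting the documents: the paged result iterator
# can only be consumed once, and the print loop above has already exhausted it
results_binary = ubinary_search_client.search(
    search_text=None,
    vector_queries=[vector_query_binary],
)
documents_binary = [{"text": result["text"]} for result in results_binary]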

# Ground the documents from the "binary" index
chat_response_binary = co_chat.chat(
    message=query, documents=documents_binary, max_tokens=100
)

# Using Binary Vectors
print(chat_response_binary.text)

There are many foundational figures who have made significant contributions to the field of computer science. Here are some of the most notable individuals:

1. Alan Turing: Often considered the "father of computer science," Alan Turing was a British mathematician and computer scientist who made groundbreaking contributions to computing, cryptography, and artificial intelligence. He is widely known for his work on the Turing machine, a theoretical device that served as a model for modern computers, and for his crucial role in breaking German Enigma codes during World War
