# Reducing storage usage for vectors on Azure AI Search

This code demonstrates how to use the following features to reduce vector storage on Azure AI Search.

+ Use smaller "narrow" data types instead of `Edm.Single`. Types such as `Edm.Half` reduce storage overhead.
+ Disable storing vectors used in the query response. Vectors returned in a query response are stored separately from the vectors used during queries.
+ Quantize vectors. Use built-in scalar quantization to compress embeddings to `int8`, or binary quantization to compress them to one bit per dimension, without any reduction in query performance. Information loss from quantization can be compensated for by oversampling and rescoring with the original unquantized embeddings.
+ Truncate dimensions. Use the built-in truncation dimension option to reduce vector dimensionality with minimal reduction in query performance.
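
To get a feel for how much each option can save, the following sketch computes the raw bytes needed for a single embedding at different bit widths. It is only back-of-the-envelope arithmetic: `bytes_per_vector` is a hypothetical helper, not part of this sample, and the sizes an index actually reports also include overhead (for example, the HNSW graph) that this calculation ignores.

```python
# Raw bytes needed to hold one embedding, ignoring any index overhead.
BITS_PER_DIMENSION = {
    "Edm.Single (float32)": 32,
    "Edm.Half (float16)": 16,
    "scalar quantization (int8)": 8,
    "binary quantization (1 bit)": 1,
}

def bytes_per_vector(dimensions: int, bits_per_dimension: int) -> int:
    """Return the raw size of one vector in bytes, rounded up to a whole byte."""
    return (dimensions * bits_per_dimension + 7) // 8

# 3072 is the full text-embedding-3-large size; 1024 is the truncated size used later
for dims in (3072, 1024):
    for label, bits in BITS_PER_DIMENSION.items():
        print(f"{dims} dims, {label}: {bytes_per_vector(dims, bits)} bytes")
```

Under these assumptions, a 3072-dimension vector shrinks from 12,288 bytes as `Edm.Single` to 384 bytes with binary quantization, before any truncation is applied.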

### Prerequisites

+ An Azure subscription.
 
+ Azure AI Search, any tier, but we recommend Basic or higher for this workload. [Enable semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run a hybrid query with semantic ranking.

### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).

### Install packages

In [1]:
! pip install -r vector-quantization-and-storage-requirements.txt --quiet

### Load .env file (Copy .env-sample to .env and update accordingly)

In [2]:
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential
import os

load_dotenv(override=True) # take environment variables from .env.

# Variables not used here do not need to be updated in your .env file
endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
admin_key = os.getenv("AZURE_SEARCH_ADMIN_KEY", "")
# Use key-based authentication when an admin key is set; otherwise fall back to DefaultAzureCredential
credential = AzureKeyCredential(admin_key) if admin_key else DefaultAzureCredential()
base_index_name = os.getenv("AZURE_SEARCH_INDEX", "teststorage")
embedding_dimensions = int(os.getenv("AZURE_OPENAI_EMBEDDING_DIMENSIONS", 3072))


## Load embeddings

Load the embeddings from a precomputed file. These embeddings use [text-embedding-3-large](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#embeddings) with 3072 dimensions. The chunks are from the sample data in the document folder, chunked using the [Split Skill](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit).

In [3]:
import json
from lib.embeddings import content_path

with open(content_path, "r") as f:
    chunks = json.load(f)
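
Before uploading, it can be worth checking that every vector has the dimension count the index expects, because a mismatch causes the upload to be rejected. The sketch below is one way to do that. It assumes each chunk is a dictionary whose vector is stored under an `embedding` key (matching the index field name used later), and `find_dimension_mismatches` is a hypothetical helper that is not part of this sample.

```python
from typing import Dict, List

def find_dimension_mismatches(chunks: List[Dict], expected_dimensions: int,
                              field: str = "embedding") -> List[int]:
    """Return the positions of chunks whose vector length differs from expected_dimensions."""
    return [
        position
        for position, chunk in enumerate(chunks)
        if len(chunk.get(field, [])) != expected_dimensions
    ]

# Tiny stand-in data for illustration; the real file holds 3072-dimension vectors
sample_chunks = [
    {"id": "0", "embedding": [0.1, 0.2, 0.3]},
    {"id": "1", "embedding": [0.1, 0.2]},
]
print(find_dimension_mismatches(sample_chunks, expected_dimensions=3))
```

With the real data, the call would look like `find_dimension_mismatches(chunks, embedding_dimensions)`, and an empty list means every vector has the expected length.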

## Create indexes

To demonstrate the storage impact of the different options, the following code creates one index for each option, plus indexes that combine all the options.

In [4]:
# Function to define the indexes on the search service
from typing import List
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SimpleField,
    SearchFieldDataType,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    SemanticConfiguration,
    SemanticPrioritizedFields,
    SemanticField,
    SemanticSearch,
    SearchIndex,
    SearchField,
    ScalarQuantizationCompression,
    BinaryQuantizationCompression,
    VectorSearchCompression
)

def create_index(index_name, dimensions, use_scalar_compression=False, use_binary_compression=False, use_float16=False, use_stored=True, truncation_dimension=None):
    if use_float16:
        vector_type = "Collection(Edm.Half)"
    else:
        vector_type = "Collection(Edm.Single)"

    # Vector fields that aren't stored can never be returned in the response
    fields = [
        SimpleField(name="id", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True),
        SearchField(name="title", type=SearchFieldDataType.String),
        SearchField(name="chunk", type=SearchFieldDataType.String),
        SearchField(name="embedding", type=vector_type, searchable=True, stored=use_stored, vector_search_dimensions=dimensions, vector_search_profile_name="myHnswProfile")
    ]

    compression_configurations: List[VectorSearchCompression] = []
    if use_scalar_compression:
        compression_name = "myCompression"
        compression_configurations = [
            ScalarQuantizationCompression(compression_name=compression_name, truncation_dimension=truncation_dimension)
        ]
    elif use_binary_compression:
        compression_name = "myCompression"
        compression_configurations = [
            BinaryQuantizationCompression(compression_name=compression_name, truncation_dimension=truncation_dimension)
        ]
    else:
        compression_name = None
        compression_configurations = []
    
    vector_search = VectorSearch(
        algorithms=[
            HnswAlgorithmConfiguration(name="myHnsw")
        ],
        profiles=[
            VectorSearchProfile(name="myHnswProfile", algorithm_configuration_name="myHnsw", compression_name=compression_name)
        ],
        compressions=compression_configurations
    )

    semantic_config = SemanticConfiguration(
        name="my-semantic-config",
        prioritized_fields=SemanticPrioritizedFields(
            title_field=SemanticField(field_name="title"),
            content_fields=[SemanticField(field_name="chunk")]
        )
    )
    semantic_search = SemanticSearch(configurations=[semantic_config])

    return SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search)
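
For intuition about what the scalar, binary, and truncation options do to a vector, here is a simplified pure-Python sketch. It is an illustration under stated assumptions, not the service's implementation: the min/max scaling, the sign-based bit assignment, and the helper names `scalar_quantize`, `binary_quantize`, and `truncate` are all chosen for this example.

```python
from typing import List

def scalar_quantize(vector: List[float]) -> List[int]:
    """Map each float to an int8 value in [-128, 127] using the vector's own min and max."""
    lo, hi = min(vector), max(vector)
    if hi == lo:
        # A constant vector carries no range information; map everything to 0
        return [0] * len(vector)
    scale = 255 / (hi - lo)
    return [round((x - lo) * scale) - 128 for x in vector]

def binary_quantize(vector: List[float]) -> List[int]:
    """Keep one bit per component: 1 for positive values, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vector]

def truncate(vector: List[float], dimensions: int) -> List[float]:
    """Keep only the leading dimensions of the vector."""
    return vector[:dimensions]

sample = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(scalar_quantize(sample))   # [-128, -64, 0, 63, 127]
print(binary_quantize(sample))   # [0, 0, 0, 1, 1]
print(truncate(sample, 3))       # [-1.0, -0.5, 0.0]
```

Each step discards information, which is why the quantized indexes pair well with oversampling and rescoring against the original embeddings.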

In [5]:
# Create indexes to compare storage usage
# The baseline index does not use any options

indexes = {
    "baseline": {},
    "scalar-compression": {
        "use_scalar_compression": True
    },
    "binary-compression": {
        "use_binary_compression": True
    },
    "narrow": {
        "use_float16": True
    },
    "no-stored": {
        "use_stored": False
    },
    "scalar-compresssion-truncation-dimension": {
        "use_scalar_compression": True,
        "truncation_dimension": 1024
    },
    "binary-compression-truncation-dimension": {
        "use_binary_compression": True,
        "truncation_dimension": 1024
    },
    "all-options-with-scalar": {
        "use_scalar_compression": True,
        "use_float16": True,
        "use_stored": False,
        "truncation_dimension": 1024
    },
    "all-options-with-binary": {
        "use_binary_compression": True,
        "use_float16": True,
        "use_stored": False,
        "truncation_dimension": 1024
    }
}

search_index_client = SearchIndexClient(endpoint, credential)
for suffix, options in indexes.items():
    # Build each index definition under its own name, then create or update it on the service
    index = create_index(f"{base_index_name}-{suffix}", dimensions=embedding_dimensions, **options)
    search_index_client.create_or_update_index(index)

print("Created indexes")


Created indexes


In [6]:
# Function to upload the embeddings to each index

import json
from lib.embeddings import content_path
from azure.search.documents import SearchIndexingBufferedSender

def upload_embeddings(index_name):
    with open(content_path, "r") as f:
        content = json.load(f)
    
    with SearchIndexingBufferedSender(endpoint, index_name, credential) as client:
        client.upload_documents(content)

In [7]:
for index in indexes.keys():
    upload_embeddings(f"{base_index_name}-{index}")

## Check storage sizes

Find the new storage size in MB to demonstrate how the various options affect storage.
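
When reading the results, it can help to express each vector index size as a reduction relative to the baseline. The following sketch does that for a few of the vector sizes shown in the sample output recorded in this notebook. `reduction_vs_baseline` is a hypothetical helper, not part of the sample, and your own numbers will differ depending on your data.

```python
from typing import Dict

def reduction_vs_baseline(sizes_mb: Dict[str, float], baseline: str = "baseline") -> Dict[str, float]:
    """Return the percentage reduction of each entry relative to the baseline entry."""
    base = sizes_mb[baseline]
    return {
        name: round(100 * (1 - size / base), 1)
        for name, size in sizes_mb.items()
        if name != baseline
    }

# Vector sizes in MB, copied from the sample output recorded in this notebook
vector_sizes_mb = {
    "baseline": 4.8277,
    "narrow": 2.4254,
    "scalar-compression": 1.2242,
    "binary-compression": 0.1732,
}

for name, percent in reduction_vs_baseline(vector_sizes_mb).items():
    print(f"{name}: {percent}% smaller than baseline")
```

In this data, the narrow type roughly halves the vector index size and scalar quantization cuts it to about a quarter, which lines up with the change in bits per dimension.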

In [8]:
# Note: index statistics can lag behind document uploads, so sizes might not be current right away
# Index statistics is not a real-time API
# See https://learn.microsoft.com/rest/api/searchservice/preview-api/get-index-statistics for more information

def bytes_to_mb(num_bytes):
    # Convert a byte count to mebibytes, rounded to four decimal places
    return round(num_bytes / (1024 * 1024), 4)

def find_storage_size_mb(index_name):
    response = search_index_client.get_index_statistics(index_name)
    return bytes_to_mb(response["storage_size"]), bytes_to_mb(response["vector_index_size"])

index_sizes = [(find_storage_size_mb(index_name), index_name) for index_name in (f"{base_index_name}-{index}" for index in indexes.keys())]
index_sizes.sort(key=lambda item: item[0][0], reverse=True)

for ((storage_size, vector_size), index_name) in index_sizes:
    print("*" * 40)
    print(f"Index Name: {index_name}\nStorage Size: {storage_size}MB\nVector Size: {vector_size}MB")



****************************************
Index Name: my-demo-index-scalar-compression
Storage Size: 19.3605MB
Vector Size: 1.2242MB
****************************************
Index Name: my-demo-index-scalar-compresssion-truncation-dimension
Storage Size: 18.5597MB
Vector Size: 0.4234MB
****************************************
Index Name: my-demo-index-binary-compression
Storage Size: 18.3085MB
Vector Size: 0.1732MB
****************************************
Index Name: my-demo-index-binary-compression-truncation-dimension
Storage Size: 18.2084MB
Vector Size: 0.0731MB
****************************************
Index Name: my-demo-index-baseline
Storage Size: 18.1559MB
Vector Size: 4.8277MB
****************************************
Index Name: my-demo-index-narrow
Storage Size: 15.7536MB
Vector Size: 2.4254MB
****************************************
Index Name: my-demo-index-no-stored
Storage Size: 7.7143MB
Vector Size: 4.8277MB
****************************************
Index Name: my-demo-inde