{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reducing storage usage for vectors on Azure AI Search\n",
"\n",
"This code demonstrates how to use the following features to reduce vector storage on Azure AI Search.\n",
"\n",
"+ Use smaller \"narrow\" data types instead of `Edm.Single`. Types such as `Edm.Half` reduce storage overhead.\n",
"+ Disable storing vectors used in the query response. Vectors returned in a query response are stored separately from the vectors used during queries.\n",
"+ Quantizing vectors. Use built-in scalar or binary quantization to quantize embeddings to `Edm.Int8` without any reduction in query performance. Information loss from quantization can be compensated for using the original unquantized embeddings and oversampling.\n",
"+ Truncating dimensions. Use built-in truncation dimension option to reduce vector dimensionality with minimal reduction in query performance.\n",
"\n",
"### Prerequisites\n",
"\n",
"+ An Azure subscription.\n",
" \n",
"+ Azure AI Search, any tier, but we recommend Basic or higher for this workload. [Enable semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run a hybrid query with semantic ranking."
]
},
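{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Estimate per-vector storage\n",
"\n",
"Before creating any indexes, it helps to estimate what each option should save. The sketch below is a rough calculation based only on the raw size of one embedding (dimensions multiplied by bytes per dimension); a real index also stores the HNSW graph and other overhead, so the measured sizes later in this notebook will differ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Back-of-envelope raw size of a single 3072-dimension embedding under each option\n",
"# These are approximations; actual index sizes include graph and metadata overhead\n",
"dims = 3072\n",
"truncated_dims = 1024\n",
"\n",
"approximate_sizes = {\n",
"    \"float32 (Edm.Single)\": dims * 4,        # 4 bytes per dimension\n",
"    \"float16 (Edm.Half)\": dims * 2,          # 2 bytes per dimension\n",
"    \"scalar quantized (int8)\": dims,         # 1 byte per dimension\n",
"    \"binary quantized\": dims // 8,           # 1 bit per dimension, packed\n",
"    \"binary quantized + truncation\": truncated_dims // 8\n",
"}\n",
"\n",
"for option, size in approximate_sizes.items():\n",
"    print(f\"{option}: {size} bytes per vector\")"
]
},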
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Set up a Python virtual environment in Visual Studio Code\n",
"\n",
"1. Open the Command Palette (Ctrl+Shift+P).\n",
"1. Search for **Python: Create Environment**.\n",
"1. Select **Venv**.\n",
"1. Select a Python interpreter. Choose 3.10 or later.\n",
"\n",
"It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Install packages"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"! pip install -r vector-quantization-and-storage-requirements.txt --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load .env file (Copy .env-sample to .env and update accordingly)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"from azure.identity import DefaultAzureCredential\n",
"from azure.core.credentials import AzureKeyCredential\n",
"import os\n",
"\n",
"load_dotenv(override=True) # take environment variables from .env.\n",
"\n",
"# Variables not used here do not need to be updated in your .env file\n",
"endpoint = os.environ[\"AZURE_SEARCH_SERVICE_ENDPOINT\"]\n",
"credential = AzureKeyCredential(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\", \"\")) if len(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\", \"\")) > 0 else DefaultAzureCredential()\n",
"base_index_name = os.getenv(\"AZURE_SEARCH_INDEX\", \"teststorage\")\n",
"embedding_dimensions = int(os.getenv(\"AZURE_OPENAI_EMBEDDING_DIMENSIONS\", 3072))\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Load embeddings\n",
"\n",
"Load the embeddings from a precomputed file. These embeddings use [text-embedding-3-large](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#embeddings) with 3072 dimensions. The chunks are from the sample data in the document folder, chunked using the [Split Skill](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"from lib.embeddings import content_path\n",
"\n",
"with open(content_path, \"r\") as f:\n",
" chunks = json.load(f)"
]
},
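{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the next cell inspects the loaded chunks. This assumes each record in the precomputed file carries the same fields as the index schema defined below (`id`, `title`, `chunk`, and `embedding`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sanity check: how many chunks were loaded, what fields each record has,\n",
"# and the dimensionality of the first embedding\n",
"print(f\"Loaded {len(chunks)} chunks\")\n",
"print(f\"Fields: {sorted(chunks[0].keys())}\")\n",
"print(f\"Embedding dimensions: {len(chunks[0]['embedding'])}\")"
]
},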
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create indexes\n",
"\n",
"To demonstrate the storage impact of the different options, the following code creates indexes that use each option, and another index that combines all the options together"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Function to define the indexes on the search service\n",
"from typing import List\n",
"from azure.search.documents.indexes import SearchIndexClient\n",
"from azure.search.documents.indexes.models import (\n",
" SimpleField,\n",
" SearchFieldDataType,\n",
" VectorSearch,\n",
" HnswAlgorithmConfiguration,\n",
" VectorSearchProfile,\n",
" SemanticConfiguration,\n",
" SemanticPrioritizedFields,\n",
" SemanticField,\n",
" SemanticSearch,\n",
" SearchIndex,\n",
" SearchField,\n",
" ScalarQuantizationCompression,\n",
" BinaryQuantizationCompression,\n",
" VectorSearchCompression\n",
")\n",
"\n",
"def create_index(index_name, dimensions, use_scalar_compression=False, use_binary_compression=False, use_float16=False, use_stored=True, truncation_dimension=None):\n",
" if use_float16:\n",
" vector_type = \"Collection(Edm.Half)\"\n",
" else:\n",
" vector_type = \"Collection(Edm.Single)\"\n",
"\n",
" # Vector fields that aren't stored can never be returned in the response\n",
" fields = [\n",
" SimpleField(name=\"id\", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True),\n",
" SearchField(name=\"title\", type=SearchFieldDataType.String),\n",
" SearchField(name=\"chunk\", type=SearchFieldDataType.String),\n",
" SearchField(name=\"embedding\", type=vector_type, searchable=True, stored=use_stored, vector_search_dimensions=dimensions, vector_search_profile_name=\"myHnswProfile\")\n",
" ]\n",
"\n",
" compression_configurations: List[VectorSearchCompression] = []\n",
" if use_scalar_compression:\n",
" compression_name = \"myCompression\"\n",
" compression_configurations = [\n",
" ScalarQuantizationCompression(compression_name=compression_name, truncation_dimension=truncation_dimension)\n",
" ]\n",
" elif use_binary_compression:\n",
" compression_name = \"myCompression\"\n",
" compression_configurations = [\n",
" BinaryQuantizationCompression(compression_name=compression_name, truncation_dimension=truncation_dimension)\n",
" ]\n",
" else:\n",
" compression_name = None\n",
" compression_configurations = []\n",
" \n",
" vector_search = VectorSearch(\n",
" algorithms=[\n",
" HnswAlgorithmConfiguration(name=\"myHnsw\")\n",
" ],\n",
" profiles=[\n",
" VectorSearchProfile(name=\"myHnswProfile\", algorithm_configuration_name=\"myHnsw\", compression_name=compression_name)\n",
" ],\n",
" compressions=compression_configurations\n",
" )\n",
"\n",
" semantic_config = SemanticConfiguration(\n",
" name=\"my-semantic-config\",\n",
" prioritized_fields=SemanticPrioritizedFields(\n",
" title_field=SemanticField(field_name=\"title\"),\n",
" content_fields=[SemanticField(field_name=\"chunk\")]\n",
" )\n",
" )\n",
" semantic_search = SemanticSearch(configurations=[semantic_config])\n",
"\n",
" return SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search)"
]
},
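{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what the helper produces, the next cell builds one index definition in memory (nothing is sent to the service) and prints its compression settings. The index name used here is just a placeholder."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Build a sample definition locally to inspect the compression configuration\n",
"preview = create_index(\"preview-only\", dimensions=embedding_dimensions, use_binary_compression=True, truncation_dimension=1024)\n",
"for compression in preview.vector_search.compressions:\n",
"    print(f\"Compression: {compression.compression_name}, truncation dimension: {compression.truncation_dimension}\")"
]
},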
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Created indexes\n"
]
}
],
"source": [
"# Create indexes to compare storage usage\n",
"# The baseline index does not use any options\n",
"\n",
"indexes = {\n",
" \"baseline\": {},\n",
" \"scalar-compression\": {\n",
" \"use_scalar_compression\": True\n",
" },\n",
" \"binary-compression\": {\n",
" \"use_binary_compression\": True\n",
" },\n",
" \"narrow\": {\n",
" \"use_float16\": True\n",
" },\n",
" \"no-stored\": {\n",
" \"use_stored\": False\n",
" },\n",
" \"scalar-compresssion-truncation-dimension\": {\n",
" \"use_scalar_compression\": True,\n",
" \"truncation_dimension\": 1024\n",
" },\n",
" \"binary-compression-truncation-dimension\": {\n",
" \"use_binary_compression\": True,\n",
" \"truncation_dimension\": 1024\n",
" },\n",
" \"all-options-with-scalar\": {\n",
" \"use_scalar_compression\": True,\n",
" \"use_float16\": True,\n",
" \"use_stored\": False,\n",
" \"truncation_dimension\": 1024\n",
" },\n",
" \"all-options-with-binary\": {\n",
" \"use_binary_compression\": True,\n",
" \"use_float16\": True,\n",
" \"use_stored\": False,\n",
" \"truncation_dimension\": 1024\n",
" }\n",
"}\n",
"\n",
"search_index_client = SearchIndexClient(endpoint, credential)\n",
"for index, options in indexes.items():\n",
" index = create_index(f\"{base_index_name}-{index}\", dimensions=embedding_dimensions, **options)\n",
" search_index_client.create_or_update_index(index)\n",
"\n",
"print(\"Created indexes\")\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Function to upload the embeddings to each index\n",
"\n",
"import json\n",
"from lib.embeddings import content_path\n",
"from azure.search.documents import SearchIndexingBufferedSender\n",
"\n",
"def upload_embeddings(index_name):\n",
" with open(content_path, \"r\") as f:\n",
" content = json.load(f)\n",
" \n",
" with SearchIndexingBufferedSender(endpoint, index_name, credential) as client:\n",
" client.upload_documents(content)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"for index in indexes.keys():\n",
" upload_embeddings(f\"{base_index_name}-{index}\")"
]
},
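{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, verify that the uploads landed by checking the document count of each index. Counts can lag briefly after an upload, so rerun this cell if any index still shows 0."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Confirm each index has the expected number of documents\n",
"from azure.search.documents import SearchClient\n",
"\n",
"for index in indexes.keys():\n",
"    index_name = f\"{base_index_name}-{index}\"\n",
"    search_client = SearchClient(endpoint, index_name, credential)\n",
"    print(f\"{index_name}: {search_client.get_document_count()} documents\")"
]
},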
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check storage sizes\n",
"\n",
"Find the new storage size in MB to demonstrate how the various options affect storage."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"****************************************\n",
"Index Name: my-demo-index-scalar-compression\n",
"Storage Size: 19.3605MB\n",
"Vector Size: 1.2242MB\n",
"****************************************\n",
"Index Name: my-demo-index-scalar-compresssion-truncation-dimension\n",
"Storage Size: 18.5597MB\n",
"Vector Size: 0.4234MB\n",
"****************************************\n",
"Index Name: my-demo-index-binary-compression\n",
"Storage Size: 18.3085MB\n",
"Vector Size: 0.1732MB\n",
"****************************************\n",
"Index Name: my-demo-index-binary-compression-truncation-dimension\n",
"Storage Size: 18.2084MB\n",
"Vector Size: 0.0731MB\n",
"****************************************\n",
"Index Name: my-demo-index-baseline\n",
"Storage Size: 18.1559MB\n",
"Vector Size: 4.8277MB\n",
"****************************************\n",
"Index Name: my-demo-index-narrow\n",
"Storage Size: 15.7536MB\n",
"Vector Size: 2.4254MB\n",
"****************************************\n",
"Index Name: my-demo-index-no-stored\n",
"Storage Size: 7.7143MB\n",
"Vector Size: 4.8277MB\n",
"****************************************\n",
"Index Name: my-demo-index-all-options-with-scalar\n",
"Storage Size: 5.7287MB\n",
"Vector Size: 0.4245MB\n",
"****************************************\n",
"Index Name: my-demo-index-all-options-with-binary\n",
"Storage Size: 0.0019MB\n",
"Vector Size: 0.0369MB\n"
]
}
],
"source": [
"# Please note - there may be delays in finding index statistics after document upload\n",
"# Index statistics is not a real time API\n",
"# See https://learn.microsoft.com/rest/api/searchservice/preview-api/get-index-statistics for more information\n",
"\n",
"def bytes_to_mb(bytes):\n",
" return round(bytes / (1024 * 1024), 4)\n",
"\n",
"def find_storage_size_mb(index_name):\n",
" response = search_index_client.get_index_statistics(index_name)\n",
" return bytes_to_mb(response[\"storage_size\"]), bytes_to_mb(response[\"vector_index_size\"])\n",
"\n",
"index_sizes = [(find_storage_size_mb(index_name), index_name) for index_name in (f\"{base_index_name}-{index}\" for index in indexes.keys())]\n",
"index_sizes.sort(key=lambda item: item[0][0], reverse=True)\n",
"\n",
"for ((storage_size, vector_size), index_name) in index_sizes:\n",
" print(\"*\" * 40)\n",
" print(f\"Index Name: {index_name}\\nStorage Size: {storage_size}MB\\nVector Size: {vector_size}MB\")\n",
"\n"
]
}
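,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run a sample vector query with oversampling\n",
"\n",
"As a final check, run a vector query against one of the compressed indexes. Quantization loses some information, but oversampling retrieves extra candidates from the quantized index and reranks them with the original full-precision vectors. This is a minimal sketch: it reuses the first chunk's precomputed embedding as the query vector (avoiding a call to an embedding model) and assumes the loaded records contain an `embedding` field matching the index schema above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from azure.search.documents import SearchClient\n",
"from azure.search.documents.models import VectorizedQuery\n",
"\n",
"# Reuse a precomputed embedding as the query vector\n",
"query_vector = chunks[0][\"embedding\"]\n",
"\n",
"search_client = SearchClient(endpoint, f\"{base_index_name}-scalar-compression\", credential)\n",
"results = search_client.search(\n",
"    search_text=None,\n",
"    vector_queries=[\n",
"        # Oversampling fetches extra candidates from the quantized index,\n",
"        # then reranks them using the original full-precision vectors\n",
"        VectorizedQuery(vector=query_vector, k_nearest_neighbors=3, fields=\"embedding\", oversampling=10.0)\n",
"    ],\n",
"    select=[\"title\", \"chunk\"]\n",
")\n",
"\n",
"for result in results:\n",
"    print(f\"{result['title']}: {result['chunk'][:100]}...\")"
]
}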
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 2
}