demo-python/code/vector-quantization-and-storage/vector-quantization-and-storage.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reducing storage usage for vectors on Azure AI Search\n", "\n", "This code demonstrates how to use the following features to reduce vector storage on Azure AI Search.\n", "\n", "+ Use smaller \"narrow\" data types instead of `Edm.Single`. Types such as `Edm.Half` reduce storage overhead.\n", "+ Disable storing vectors used in the query response. Vectors returned in a query response are stored separately from the vectors used during queries.\n", "+ Quantizing vectors. Use built-in scalar or binary quantization to quantize embeddings to `Edm.Int8` without any reduction in query performance. Information loss from quantization can be compensated for using the original unquantized embeddings and oversampling.\n", "+ Truncating dimensions. Use built-in truncation dimension option to reduce vector dimensionality with minimal reduction in query performance.\n", "\n", "### Prerequisites\n", "\n", "+ An Azure subscription.\n", " \n", "+ Azure AI Search, any tier, but we recommend Basic or higher for this workload. [Enable semantic ranker](https://learn.microsoft.com/azure/search/semantic-how-to-enable-disable) if you want to run a hybrid query with semantic ranking." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up a Python virtual environment in Visual Studio Code\n", "\n", "1. Open the Command Palette (Ctrl+Shift+P).\n", "1. Search for **Python: Create Environment**.\n", "1. Select **Venv**.\n", "1. Select a Python interpreter. Choose 3.10 or later.\n", "\n", "It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "! pip install -r vector-quantization-and-storage-requirements.txt --quiet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load .env file (Copy .env-sample to .env and update accordingly)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from dotenv import load_dotenv\n", "from azure.identity import DefaultAzureCredential\n", "from azure.core.credentials import AzureKeyCredential\n", "import os\n", "\n", "load_dotenv(override=True) # take environment variables from .env.\n", "\n", "# Variables not used here do not need to be updated in your .env file\n", "endpoint = os.environ[\"AZURE_SEARCH_SERVICE_ENDPOINT\"]\n", "credential = AzureKeyCredential(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\", \"\")) if len(os.getenv(\"AZURE_SEARCH_ADMIN_KEY\", \"\")) > 0 else DefaultAzureCredential()\n", "base_index_name = os.getenv(\"AZURE_SEARCH_INDEX\", \"teststorage\")\n", "embedding_dimensions = int(os.getenv(\"AZURE_OPENAI_EMBEDDING_DIMENSIONS\", 3072))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load embeddings\n", "\n", "Load the embeddings from a precomputed file. These embeddings use [text-embedding-3-large](https://learn.microsoft.com/azure/ai-services/openai/concepts/models#embeddings) with 3072 dimensions. The chunks are from the sample data in the document folder, chunked using the [Split Skill](https://learn.microsoft.com/azure/search/cognitive-search-skill-textsplit)." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import json\n", "from lib.embeddings import content_path\n", "\n", "with open(content_path, \"r\") as f:\n", " chunks = json.load(f)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Create indexes\n", "\n", "To demonstrate the storage impact of the different options, the following code creates indexes that use each option, and another index that combines all the options together" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Function to define the indexes on the search service\n", "from typing import List\n", "from azure.search.documents.indexes import SearchIndexClient\n", "from azure.search.documents.indexes.models import (\n", " SimpleField,\n", " SearchFieldDataType,\n", " VectorSearch,\n", " HnswAlgorithmConfiguration,\n", " VectorSearchProfile,\n", " SemanticConfiguration,\n", " SemanticPrioritizedFields,\n", " SemanticField,\n", " SemanticSearch,\n", " SearchIndex,\n", " SearchField,\n", " ScalarQuantizationCompression,\n", " BinaryQuantizationCompression,\n", " VectorSearchCompression\n", ")\n", "\n", "def create_index(index_name, dimensions, use_scalar_compression=False, use_binary_compression=False, use_float16=False, use_stored=True, truncation_dimension=None):\n", " if use_float16:\n", " vector_type = \"Collection(Edm.Half)\"\n", " else:\n", " vector_type = \"Collection(Edm.Single)\"\n", "\n", " # Vector fields that aren't stored can never be returned in the response\n", " fields = [\n", " SimpleField(name=\"id\", type=SearchFieldDataType.String, key=True, sortable=True, filterable=True),\n", " SearchField(name=\"title\", type=SearchFieldDataType.String),\n", " SearchField(name=\"chunk\", type=SearchFieldDataType.String),\n", " SearchField(name=\"embedding\", type=vector_type, searchable=True, stored=use_stored, vector_search_dimensions=dimensions, vector_search_profile_name=\"myHnswProfile\")\n", " ]\n", "\n", " compression_configurations: List[VectorSearchCompression] = []\n", " if use_scalar_compression:\n", " compression_name = \"myCompression\"\n", " compression_configurations = [\n", " ScalarQuantizationCompression(compression_name=compression_name, truncation_dimension=truncation_dimension)\n", " ]\n", " elif use_binary_compression:\n", " compression_name = \"myCompression\"\n", " compression_configurations = [\n", " BinaryQuantizationCompression(compression_name=compression_name, truncation_dimension=truncation_dimension)\n", " ]\n", " else:\n", " compression_name = None\n", " compression_configurations = []\n", " \n", " vector_search = VectorSearch(\n", " algorithms=[\n", " HnswAlgorithmConfiguration(name=\"myHnsw\")\n", " ],\n", " profiles=[\n", " VectorSearchProfile(name=\"myHnswProfile\", algorithm_configuration_name=\"myHnsw\", compression_name=compression_name)\n", " ],\n", " compressions=compression_configurations\n", " )\n", "\n", " semantic_config = SemanticConfiguration(\n", " name=\"my-semantic-config\",\n", " prioritized_fields=SemanticPrioritizedFields(\n", " title_field=SemanticField(field_name=\"title\"),\n", " content_fields=[SemanticField(field_name=\"chunk\")]\n", " )\n", " )\n", " semantic_search = SemanticSearch(configurations=[semantic_config])\n", "\n", " return SearchIndex(name=index_name, fields=fields, vector_search=vector_search, semantic_search=semantic_search)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "Created indexes\n" ] } ], "source": [ "# Create indexes to compare storage usage\n", "# The baseline index does not use any options\n", "\n", "indexes = {\n", " \"baseline\": {},\n", " \"scalar-compression\": {\n", " \"use_scalar_compression\": True\n", " },\n", " \"binary-compression\": {\n", " \"use_binary_compression\": True\n", " },\n", " \"narrow\": {\n", " \"use_float16\": True\n", " },\n", " \"no-stored\": {\n", " \"use_stored\": False\n", " },\n", " \"scalar-compresssion-truncation-dimension\": {\n", " \"use_scalar_compression\": True,\n", " \"truncation_dimension\": 1024\n", " },\n", " \"binary-compression-truncation-dimension\": {\n", " \"use_binary_compression\": True,\n", " \"truncation_dimension\": 1024\n", " },\n", " \"all-options-with-scalar\": {\n", " \"use_scalar_compression\": True,\n", " \"use_float16\": True,\n", " \"use_stored\": False,\n", " \"truncation_dimension\": 1024\n", " },\n", " \"all-options-with-binary\": {\n", " \"use_binary_compression\": True,\n", " \"use_float16\": True,\n", " \"use_stored\": False,\n", " \"truncation_dimension\": 1024\n", " }\n", "}\n", "\n", "search_index_client = SearchIndexClient(endpoint, credential)\n", "for index, options in indexes.items():\n", " index = create_index(f\"{base_index_name}-{index}\", dimensions=embedding_dimensions, **options)\n", " search_index_client.create_or_update_index(index)\n", "\n", "print(\"Created indexes\")\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Function to upload the embeddings to each index\n", "\n", "import json\n", "from lib.embeddings import content_path\n", "from azure.search.documents import SearchIndexingBufferedSender\n", "\n", "def upload_embeddings(index_name):\n", " with open(content_path, \"r\") as f:\n", " content = json.load(f)\n", " \n", " with SearchIndexingBufferedSender(endpoint, index_name, credential) as client:\n", " client.upload_documents(content)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "for index in indexes.keys():\n", " upload_embeddings(f\"{base_index_name}-{index}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Check storage sizes\n", "\n", "Find the new storage size in MB to demonstrate how the various options affect storage." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "****************************************\n", "Index Name: my-demo-index-scalar-compression\n", "Storage Size: 19.3605MB\n", "Vector Size: 1.2242MB\n", "****************************************\n", "Index Name: my-demo-index-scalar-compresssion-truncation-dimension\n", "Storage Size: 18.5597MB\n", "Vector Size: 0.4234MB\n", "****************************************\n", "Index Name: my-demo-index-binary-compression\n", "Storage Size: 18.3085MB\n", "Vector Size: 0.1732MB\n", "****************************************\n", "Index Name: my-demo-index-binary-compression-truncation-dimension\n", "Storage Size: 18.2084MB\n", "Vector Size: 0.0731MB\n", "****************************************\n", "Index Name: my-demo-index-baseline\n", "Storage Size: 18.1559MB\n", "Vector Size: 4.8277MB\n", "****************************************\n", "Index Name: my-demo-index-narrow\n", "Storage Size: 15.7536MB\n", "Vector Size: 2.4254MB\n", "****************************************\n", "Index Name: my-demo-index-no-stored\n", "Storage Size: 7.7143MB\n", "Vector Size: 4.8277MB\n", "****************************************\n", "Index Name: my-demo-index-all-options-with-scalar\n", "Storage Size: 5.7287MB\n", "Vector Size: 0.4245MB\n", "****************************************\n", "Index Name: my-demo-index-all-options-with-binary\n", "Storage Size: 0.0019MB\n", "Vector Size: 0.0369MB\n" ] } ], "source": [ "# Please note - there may be delays in finding index statistics after document upload\n", "# Index statistics is not a real time API\n", "# See https://learn.microsoft.com/rest/api/searchservice/preview-api/get-index-statistics for more information\n", "\n", "def bytes_to_mb(bytes):\n", " return round(bytes / (1024 * 1024), 4)\n", "\n", "def find_storage_size_mb(index_name):\n", " response = search_index_client.get_index_statistics(index_name)\n", " return bytes_to_mb(response[\"storage_size\"]), bytes_to_mb(response[\"vector_index_size\"])\n", "\n", "index_sizes = [(find_storage_size_mb(index_name), index_name) for index_name in (f\"{base_index_name}-{index}\" for index in indexes.keys())]\n", "index_sizes.sort(key=lambda item: item[0][0], reverse=True)\n", "\n", "for ((storage_size, vector_size), index_name) in index_sizes:\n", " print(\"*\" * 40)\n", " print(f\"Index Name: {index_name}\\nStorage Size: {storage_size}MB\\nVector Size: {vector_size}MB\")\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 2 }