# Azure AI Search backup and restore sample

**This unofficial code sample is offered "as-is" and might not work for all customers and scenarios. If you run into difficulties, you should manually recreate and reload your search index on a new search service.**

This notebook demonstrates how to back up and restore a search index and migrate it to another instance of Azure AI Search. The target instance can be a different tier and configuration, but make sure it has available storage and quota, and that the [region has the features you require](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/?products=search).

> **Note**: Azure AI Search now supports [service upgrades](https://learn.microsoft.com/azure/search/search-how-to-upgrade) and [pricing tier changes](https://learn.microsoft.com/azure/search/search-capacity-planning#change-your-pricing-tier). If you're backing up and restoring your index for migration to a higher capacity service, you now have other options.

### Prerequisites

+ The search index has 100,000 documents or fewer. For larger indexes, use [Resumable backup and restore](../resumable-index-backup-restore/backup-and-restore.ipynb).

+ The search index you're backing up must have a `key` field that is `filterable` and `sortable`. If your document key doesn't meet these criteria, you can create and populate a new key field, and remove the `key=true` flag from the previous key field.

+ Only fields marked as `retrievable` can be successfully backed up and restored. You can toggle `retrievable` between true and false on any field, but as of this writing, the Azure portal doesn't allow you to modify `retrievable` on vector fields. As a workaround, use an Azure SDK or Postman with an Update Index REST call.

  Setting `retrievable` to true doesn't increase index size. A `retrievable` action pulls from content that already exists in your index.
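
The prerequisites above can be checked before you start a backup. The following is a minimal sketch, assuming the index definition is available as a plain dictionary shaped like the JSON of an index definition (a `fields` list whose entries carry `name`, `key`, `filterable`, `sortable`, and `retrievable`). The `check_prerequisites` helper and the sample definition are hypothetical and aren't part of this sample.

```python
def check_prerequisites(index_definition):
    """Return a list of human-readable problems found in an index definition."""
    problems = []
    fields = index_definition.get("fields", [])

    key_fields = [f for f in fields if f.get("key", False)]
    if not key_fields:
        problems.append("No key field found.")
    else:
        key_field = key_fields[0]
        for attribute in ("filterable", "sortable"):
            if not key_field.get(attribute, False):
                problems.append(f"Key field '{key_field['name']}' is not {attribute}.")

    # Assumption: a field with no explicit 'retrievable' attribute is treated as retrievable.
    hidden = [f["name"] for f in fields if not f.get("retrievable", True)]
    if hidden:
        problems.append("Fields that can't be backed up: " + ", ".join(hidden))
    return problems


# Hypothetical index definition used only for illustration.
sample_index = {
    "name": "hotels",
    "fields": [
        {"name": "id", "key": True, "filterable": True, "sortable": False, "retrievable": True},
        {"name": "description", "key": False, "retrievable": True},
        {"name": "embedding", "key": False, "retrievable": False},
    ],
}

for problem in check_prerequisites(sample_index):
    print(problem)
```

An empty result means the definition passes the checks listed above. It says nothing about document count, which you still need to confirm separately.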

### Set up a Python virtual environment in Visual Studio Code

1. Open the Command Palette (Ctrl+Shift+P).
1. Search for **Python: Create Environment**.
1. Select **Venv**.
1. Select a Python interpreter. Choose 3.10 or later.

It can take a minute to set up. If you run into problems, see [Python environments in VS Code](https://code.visualstudio.com/docs/python/environments).

### Install packages

In [None]:
! pip install -r azure-search-backup-and-restore-requirements.txt --quiet

### Load .env file (Copy .env-sample to .env and update accordingly)

In [None]:
from dotenv import load_dotenv
from azure.identity import DefaultAzureCredential
from azure.core.credentials import AzureKeyCredential
import os

load_dotenv(override=True) # take environment variables from .env.

# Variables not used here do not need to be updated in your .env file
source_endpoint = os.environ["AZURE_SEARCH_SERVICE_ENDPOINT"]
# Use the admin key if one is provided; otherwise fall back to DefaultAzureCredential.
# os.getenv avoids a KeyError when an optional variable is missing from .env.
source_admin_key = os.getenv("AZURE_SEARCH_ADMIN_KEY", "")
source_credential = AzureKeyCredential(source_admin_key) if source_admin_key else DefaultAzureCredential()
source_index_name = os.environ["AZURE_SEARCH_INDEX"]
# Default to same service for copying index
target_endpoint = os.getenv("AZURE_TARGET_SEARCH_SERVICE_ENDPOINT", "") or source_endpoint
target_admin_key = os.getenv("AZURE_TARGET_SEARCH_ADMIN_KEY", "")
target_credential = AzureKeyCredential(target_admin_key) if target_admin_key else DefaultAzureCredential()
target_index_name = os.environ["AZURE_TARGET_SEARCH_INDEX"]
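
The cell above treats an empty admin key or target endpoint as "not provided" and falls back to a default. As a rough sketch of that rule in isolation, the hypothetical `setting_or_default` helper below reads from a dictionary so it can be tried without a real `.env` file. The example values are placeholders, not real endpoints.

```python
import os


def setting_or_default(name, default=None, env=os.environ):
    """Return the value of `name` from `env`, or `default` when it is missing or empty."""
    value = env.get(name, "")
    return value if value else default


# Hypothetical values standing in for a loaded .env file.
example_env = {
    "AZURE_SEARCH_SERVICE_ENDPOINT": "https://source.example.net",
    "AZURE_TARGET_SEARCH_SERVICE_ENDPOINT": "",
}

source = setting_or_default("AZURE_SEARCH_SERVICE_ENDPOINT", env=example_env)
# An empty target endpoint falls back to the source endpoint.
target = setting_or_default("AZURE_TARGET_SEARCH_SERVICE_ENDPOINT", default=source, env=example_env)
print(target)
```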

This script demonstrates backing up and restoring an Azure AI Search index between two services. The `backup_and_restore_index` function retrieves the source index definition, creates a new target index, backs up all documents, and restores them to the target index.

In [None]:
from azure.search.documents import SearchClient  
from azure.search.documents.indexes import SearchIndexClient
import tqdm  
  
def create_clients(endpoint, credential, index_name):  
    search_client = SearchClient(endpoint=endpoint, index_name=index_name, credential=credential)  
    index_client = SearchIndexClient(endpoint=endpoint, credential=credential)  
    return search_client, index_client

def total_count(search_client):
    response = search_client.search(include_total_count=True, search_text="*", top=0)
    return response.get_count()
  
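def search_results_with_filter(search_client, key_field_name):
    # Each query returns at most 100,000 results, so read the index in key order
    # and resume after the last key seen until a query returns no documents.
    response = search_client.search(search_text="*", top=100000, order_by=key_field_name).by_page()
    while True:
        # Reset for every query so an empty response always ends the loop
        last_item = None
        for page in response:
            page = list(page)
            if len(page) > 0:
                last_item = page[-1]
                yield page

        if last_item is None:
            break
        # Double single quotes so the key is a valid OData string literal
        last_key = str(last_item[key_field_name]).replace("'", "''")
        response = search_client.search(search_text="*", top=100000, order_by=key_field_name, filter=f"{key_field_name} gt '{last_key}'").by_page()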

def search_results_without_filter(search_client):
    response = search_client.search(search_text="*", top=100000).by_page()
    for page in response:
        page = list(page)
        yield page

def backup_and_restore_index(source_endpoint, source_key, source_index_name, target_endpoint, target_key, target_index_name):  
    # Create search and index clients  
    source_search_client, source_index_client = create_clients(source_endpoint, source_key, source_index_name)  
    target_search_client, target_index_client = create_clients(target_endpoint, target_key, target_index_name)  
  
    # Get the source index definition  
    source_index = source_index_client.get_index(name=source_index_name)
    non_retrievable_fields = []
    key_field = None  # Initialize so the check below works when no key field exists
    for field in source_index.fields:
        if field.hidden:
            non_retrievable_fields.append(field)
        if field.key:
            key_field = field

    if key_field is None:
        raise Exception("Key Field Not Found")
    
    if len(non_retrievable_fields) > 0:
        print(f"WARNING: The following fields are not marked as retrievable and cannot be backed up and restored: {', '.join(f.name for f in non_retrievable_fields)}")
  
    # Create target index with the same definition 
    source_index.name = target_index_name
    target_index_client.create_or_update_index(source_index)
  
    document_count = total_count(source_search_client)
    can_use_filter = key_field.sortable and key_field.filterable
    if not can_use_filter:
        print("WARNING: The key field is not filterable or not sortable. A maximum of 100,000 records can be backed up and restored.")
    # Backup and restore documents  
    all_documents = search_results_with_filter(source_search_client, key_field.name) if can_use_filter else search_results_without_filter(source_search_client)

    print("Backing up and restoring documents:")  
    failed_documents = 0  
    failed_keys = []  
    with tqdm.tqdm(total=document_count) as progress_bar:  
        for page in all_documents:
            result = target_search_client.upload_documents(documents=page)  
            progress_bar.update(len(result))  
  
            for item in result:  
                if item.succeeded is not True:  
                    failed_documents += 1
                    # Each indexing result carries the key of the document it refers to
                    failed_keys.append(item.key)  
                    print(f"Document upload error: {item.error_message}")  
  
    if failed_documents > 0:  
        print(f"Failed documents: {failed_documents}")  
        print(f"Failed document keys: {failed_keys}")  
        print(f"Finished copying '{source_index_name}' to '{target_index_name}' with errors")  
    else:  
        print("All documents uploaded successfully.")  
        print(f"Successfully backed up '{source_index_name}' and restored to '{target_index_name}'")  
    return source_search_client, target_search_client, all_documents  

source_search_client, target_search_client, all_documents = backup_and_restore_index(source_endpoint, source_credential, source_index_name, target_endpoint, target_credential, target_index_name)  
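
The filter-based paging in `search_results_with_filter` is a form of keyset pagination: each query is capped (at 100,000 results in the cell above), so the next query resumes after the last key already seen. The sketch below illustrates the idea against an in-memory list instead of a search service. `fake_search`, `iterate_by_key`, `escape_odata_string`, and the tiny result cap are hypothetical stand-ins, not part of the Azure SDK.

```python
def escape_odata_string(value):
    """Double single quotes so a key can sit inside an OData string literal."""
    return str(value).replace("'", "''")


def fake_search(documents, key_field, after_key=None, limit=3):
    """Stand-in for one capped query: documents sorted by key, optionally after a key."""
    ordered = sorted(documents, key=lambda doc: doc[key_field])
    if after_key is not None:
        ordered = [doc for doc in ordered if doc[key_field] > after_key]
    return ordered[:limit]


def iterate_by_key(documents, key_field, limit=3):
    """Yield batches until a query returns nothing, resuming after the last key seen."""
    last_key = None
    while True:
        batch = fake_search(documents, key_field, after_key=last_key, limit=limit)
        if not batch:
            break
        yield batch
        last_key = batch[-1][key_field]


# Seven documents with string keys, read three at a time.
docs = [{"id": str(n)} for n in range(1, 8)]
batches = list(iterate_by_key(docs, "id"))
print([len(batch) for batch in batches])

# A key that contains a quote must be escaped before it goes into a filter.
escaped = escape_odata_string("o'brien")
print(f"id gt '{escaped}'")
```

Because the comparison is on string keys, the order is lexicographic rather than numeric. That doesn't matter for paging, as long as the sort order and the `gt` comparison agree.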


Use document counts to verify a successful restore. The `verify_counts` function compares the document counts of the source and target indexes after backup and restore, and prints whether they match.

Storage usage won't be exactly the same as the original index. Small variations in consumed storage are expected.

In [None]:
def verify_counts(source_search_client, target_search_client):  
    source_document_count = source_search_client.get_document_count()  
    target_document_count = target_search_client.get_document_count()  
  
    print(f"Source document count: {source_document_count}")  
    print(f"Target document count: {target_document_count}")  
  
    if source_document_count == target_document_count:  
        print("Document counts match.")  
    else:  
        print("Document counts do not match.")  
  
# Call the verify_counts function with the search_clients returned by the backup_and_restore_index function  
verify_counts(source_search_client, target_search_client)  
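
If you want the comparison as a value you can act on rather than a printed message, one option is a small polling helper. This is only a sketch: it assumes the target count might briefly lag right after the upload, which this notebook doesn't confirm, and `wait_for_matching_counts` is a hypothetical helper. It takes two callables so it can be tried here with stand-ins instead of real clients.

```python
import time


def wait_for_matching_counts(get_source_count, get_target_count, attempts=5, delay_seconds=2.0):
    """Poll both counts until they match or the attempts run out. Returns True on a match."""
    for attempt in range(attempts):
        if get_source_count() == get_target_count():
            return True
        # Don't sleep after the final attempt.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


# Hypothetical stand-ins: the target reports 90 documents first, then 100.
target_counts = iter([90, 100])
matched = wait_for_matching_counts(lambda: 100, lambda: next(target_counts), attempts=3, delay_seconds=0)
print(matched)
```

With the clients from this notebook, the two callables would be `source_search_client.get_document_count` and `target_search_client.get_document_count`.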
