# Upgrade an index to use the ELSER model

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/model-upgrades/upgrading-index-to-use-elser.ipynb)

This notebook shows how to upgrade an index to the ELSER model `.elser_model_2` using the [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex).

**Note:** Alternatively, you can use [Update by query](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html) to update an index in place to use ELSER. The examples in this notebook use the Reindex API.
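
For comparison, here is a minimal sketch of the in-place alternative. It assumes an ELSER ingest pipeline already exists and that `client` is an `Elasticsearch` client like the one created later in this notebook. The helper name `upgrade_in_place` and its parameters are hypothetical and not part of the notebook.

```python
def upgrade_in_place(client, index, pipeline_id, embedding_field):
    """Map a sparse_vector field on `index`, then re-run its documents through `pipeline_id`."""
    # Map the new field explicitly so the generated tokens are stored as sparse_vector.
    client.indices.put_mapping(
        index=index,
        properties={embedding_field: {"type": "sparse_vector"}},
    )
    # update_by_query rewrites each document in place, applying the ingest pipeline.
    return client.update_by_query(
        index=index,
        pipeline=pipeline_id,
        conflicts="proceed",  # do not abort the whole run on a version conflict
    )
```

Unlike a reindex, this keeps the original index name, but it also leaves any old embedding fields in place unless the pipeline removes them.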


Scenarios covered in this notebook:

1. Migrating an index that has no generated [`text_expansion`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-text-expansion-query.html) field to the ELSER model `.elser_model_2`
2. Upgrading an existing index from `.elser_model_1` to `.elser_model_2`
3. Upgrading an index that uses a different model to ELSER

# Install and Connect

To get started, we'll need to connect to our Elastic deployment using the Python client.
Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.
First we need to `pip` install the following packages:

- `elasticsearch`


In [None]:
!pip install "elasticsearch<9" -qU

Next, we will import all the modules that we need. 

In [3]:
from elasticsearch import Elasticsearch, helpers
from urllib.request import urlopen
from getpass import getpass
import json
import time

Now we will instantiate the Python Elasticsearch client. First we prompt for the Cloud ID and API key.

Then we create a `client` object that instantiates an instance of the `Elasticsearch` class.


In [4]:
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = getpass("Elastic Api Key: ")

# Create the client instance
client = Elasticsearch(
    cloud_id=ELASTIC_CLOUD_ID,
    api_key=ELASTIC_API_KEY,
)

print(client.info())

Elastic Cloud ID:  ········
Elastic Api Key:  ········


{'name': 'instance-0000000001', 'cluster_name': 'ad402eb9a59041458b8edfc021e91caf', 'cluster_uuid': 'ks_HfcCdSf2qrcKZQsk9Lg', 'version': {'number': '8.11.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'd9ec3fa628c7b0ba3d25692e277ba26814820b20', 'build_date': '2023-11-04T10:04:57.184859352Z', 'build_snapshot': False, 'lucene_version': '9.8.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}


# Download and Deploy ELSER v2 Model

Before we begin, we have to download and deploy the ELSER model `.elser_model_2`.

Follow the instructions in the [Download and Deploy ELSER Model](../search/03-ELSER.ipynb#download-and-deploy-elser-model) section of the [ELSER](../search/03-ELSER.ipynb) notebook.
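
Before continuing, it can help to confirm that the deployment has actually started. The sketch below reads the state from the trained model stats API. The helper name `elser_deployment_state` is hypothetical, and the response shape (`trained_model_stats[].deployment_stats.state`) is an assumption based on the 8.x API. If the model has not been downloaded at all, the client call raises a not-found error instead of returning.

```python
def elser_deployment_state(client, model_id=".elser_model_2"):
    """Return the deployment state of `model_id`, or None if it has no deployment."""
    stats = client.ml.get_trained_models_stats(model_id=model_id)
    for model in stats["trained_model_stats"]:
        # Assumption: `deployment_stats` is absent until a deployment exists.
        deployment = model.get("deployment_stats")
        if deployment:
            return deployment.get("state")
    return None
```

A value of `"started"` means the model is ready to serve inference requests from ingest pipelines.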


#  Case 1: Migrate an index with no `text_expansion` field

In this case we will see how to upgrade an index that has an [ingest pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html) configured so that it uses the ELSER model `.elser_model_2`.

# Create Ingestion pipeline with lowercase

We will create a simple pipeline that converts `title` field values to lowercase, and use it as the default pipeline on our index.

In [5]:
client.ingest.put_pipeline(
    id="ingest-pipeline-lowercase",
    description="Ingest pipeline to change title to lowercase",
    processors=[{"lowercase": {"field": "title"}}],
)

ObjectApiResponse({'acknowledged': True})
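
To check what a pipeline does before attaching it to an index, you can run a sample document through the simulate API. This is a minimal sketch; the helper name `preview_pipeline` and the sample document are hypothetical.

```python
def preview_pipeline(client, pipeline_id, sample_source):
    """Run one sample document through an ingest pipeline without indexing it."""
    result = client.ingest.simulate(id=pipeline_id, docs=[{"_source": sample_source}])
    entry = result["docs"][0]
    if "error" in entry:
        # A failing processor is reported per document rather than raised.
        raise RuntimeError(f"Pipeline {pipeline_id!r} failed: {entry['error']}")
    return entry["doc"]["_source"]
```

For example, `preview_pipeline(client, "ingest-pipeline-lowercase", {"title": "Se7en"})` should return the document with its `title` lowercased.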

# Create index - `movies` with mappings

Next, we will create an index that uses the `ingest-pipeline-lowercase` pipeline from the previous step as its default pipeline.

In [6]:
client.indices.delete(index="movies", ignore_unavailable=True)
client.indices.create(
    index="movies",
    settings={
        "index": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
            "default_pipeline": "ingest-pipeline-lowercase",
        }
    },
    mappings={
        "properties": {
            "plot": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'movies'})

# Insert Documents
We are now ready to insert a sample dataset of 12 movies into our `movies` index.

In [8]:
url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/notebooks/search/movies.json"
response = urlopen(url)

# Load the response data into a JSON object
data_json = json.loads(response.read())

# Prepare the documents to be indexed
documents = []
for doc in data_json:
    documents.append(
        {
            "_index": "movies",
            "_source": doc,
        }
    )

# Use helpers.bulk to index
helpers.bulk(client, documents)

time.sleep(5)
print("Done indexing documents into `movies` index!")

Done indexing documents into `movies` index!
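
The fixed `time.sleep(5)` above gives the index time to refresh so the new documents become searchable. A more deterministic option, sketched here with a hypothetical helper name, is to refresh the index explicitly and then count its documents.

```python
def refresh_and_count(client, index):
    """Make recent writes searchable, then return the number of documents in `index`."""
    client.indices.refresh(index=index)
    return client.count(index=index)["count"]
```

After the bulk request above, `refresh_and_count(client, "movies")` should report the 12 movies in the sample dataset.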


# Upgrade index `movies` to use ELSER model

We are ready to reindex `movies` into a new index that uses the ELSER model `.elser_model_2`. As a first step, we have to create a new ingest pipeline and a new index for ELSER.

# Create a new pipeline with ELSER 
Let's create a new ingest pipeline that runs inference with the ELSER model `.elser_model_2`.

In [9]:
client.ingest.put_pipeline(
    id="elser-ingest-pipeline",
    description="Ingest pipeline for ELSER",
    processors=[
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": "plot", "output_field": "plot_embedding"}
                ],
            }
        }
    ],
)

ObjectApiResponse({'acknowledged': True})

# Create an index with mappings

Next, create an index with the mappings required for ELSER.

In [13]:
client.indices.delete(index="elser-movies", ignore_unavailable=True)
client.indices.create(
    index="elser-movies",
    mappings={
        "properties": {
            "plot": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
            "plot_embedding": {"type": "sparse_vector"},
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-movies'})

**Note:**
- `plot_embedding` is the field that holds the generated tokens. It has the type [`sparse_vector`](https://www.elastic.co/guide/en/elasticsearch/reference/master/sparse-vector.html).
- `plot` is the field from which the [`sparse_vector`](https://www.elastic.co/guide/en/elasticsearch/reference/master/sparse-vector.html) tokens are generated.

# Reindex with updated pipeline 

With the [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex), we can copy data from the old index `movies` to the new index `elser-movies`, with the ingest pipeline set to `elser-ingest-pipeline`. On success, each document in `elser-movies` contains the tokens that ELSER inference generated from the `plot` field, ready to be used in `text_expansion` queries.

In [15]:
client.reindex(
    source={"index": "movies"},
    dest={"index": "elser-movies", "pipeline": "elser-ingest-pipeline"},
)
time.sleep(7)
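
For 12 documents a synchronous reindex is fine, but inference on a larger index can outlast the client's request timeout. One option, sketched below with a hypothetical helper name, is to start the reindex as a background task and poll the tasks API until it finishes.

```python
import time


def reindex_with_polling(client, source, dest, poll_seconds=2.0, sleep=time.sleep):
    """Start a reindex as a background task and block until it completes."""
    started = client.reindex(source=source, dest=dest, wait_for_completion=False)
    task_id = started["task"]
    while True:
        status = client.tasks.get(task_id=task_id)
        if status["completed"]:
            # A finished task carries the usual reindex summary under "response";
            # fall back to the full task status if that key is missing.
            return status["response"] if "response" in status else status
        sleep(poll_seconds)
```

It would be called with the same `source` and `dest` arguments as the `client.reindex` call above.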

Once the reindex is complete, inspect any document in the `elser-movies` index and notice that it has an additional field, `plot_embedding`, containing the terms that we will use in the `text_expansion` query.
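
One way to do that inspection from Python is sketched below. Both helper names are hypothetical, and the sketch assumes the generated field is stored in `_source` as a mapping of token to weight.

```python
def sample_embedding(client, index, field):
    """Fetch one document from `index` and return its `field` value ({} if absent)."""
    hits = client.search(index=index, size=1)["hits"]["hits"]
    return hits[0]["_source"].get(field, {}) if hits else {}


def top_tokens(embedding, n=5):
    """Return the n highest-weighted (token, weight) pairs, heaviest first."""
    return sorted(embedding.items(), key=lambda item: item[1], reverse=True)[:n]
```

For example, `top_tokens(sample_embedding(client, "elser-movies", "plot_embedding"))` lists the strongest expansion terms ELSER produced for one movie plot.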
 

# Querying documents with ELSER 

Let's try a semantic search on our index with the ELSER model `.elser_model_2`.

In [48]:
response = client.search(
    index="elser-movies",
    size=3,
    query={
        "text_expansion": {
            "plot_embedding": {
                "model_id": ".elser_model_2",
                "model_text": "investigation",
            }
        }
    },
)

for hit in response["hits"]["hits"]:
    doc_id = hit["_id"]
    score = hit["_score"]
    title = hit["_source"]["title"]
    plot = hit["_source"]["plot"]
    print(f"Score: {score}\nTitle: {title}\nPlot: {plot}\n")

Score: 6.403748
Title: se7en
Plot: Two detectives, a rookie and a veteran, hunt a serial killer who uses the seven deadly sins as his motives.

Score: 3.6703482
Title: the departed
Plot: An undercover cop and a mole in the police attempt to identify each other while infiltrating an Irish gang in South Boston.

Score: 2.9359207
Title: the usual suspects
Plot: A sole survivor tells of the twisty events leading up to a horrific gun battle on a boat, which began when five criminals met at a seemingly random police lineup.



# Case 2: Upgrade index with ELSER model to `.elser_model_2`

If you already have an index that uses the ELSER model `.elser_model_1` and would like to upgrade to `.elser_model_2`, you can use the Reindex API with an ingest pipeline that uses `.elser_model_2`.

**Note:** Before we begin, ensure that you are on Elasticsearch 8.11 and that the ELSER model `.elser_model_2` is deployed.
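
The version requirement can be checked against the `client.info()` output shown earlier. This is a minimal sketch with a hypothetical helper name. It assumes that any version from 8.11 onward qualifies, which the note above does not state explicitly.

```python
def meets_minimum_version(version_number, minimum=(8, 11)):
    """Return True if a version string such as '8.11.0' is at least `minimum` (major, minor)."""
    major, minor = (int(part) for part in version_number.split(".")[:2])
    return (major, minor) >= minimum
```

Usage: `meets_minimum_version(client.info()["version"]["number"])`.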


# Create a new ingestion pipeline

We will create a pipeline that uses `.elser_model_2`, which we will apply while reindexing.

In [37]:
client.ingest.put_pipeline(
    id="elser-pipeline-upgrade-demo",
    description="Ingest pipeline for ELSER upgrade demo",
    processors=[
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": "title", "output_field": "title_embedding"}
                ],
            }
        }
    ],
)

ObjectApiResponse({'acknowledged': True})

# Create a new index with mappings
We will create a new index with the mappings required for ELSER.

In [38]:
client.indices.delete(index="elser-upgrade-index-demo", ignore_unavailable=True)
client.indices.create(
    index="elser-upgrade-index-demo",
    mappings={
        "properties": {
            "title": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
            "title_embedding": {"type": "sparse_vector"},
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-upgrade-index-demo'})

# Use Reindex API
We will use the [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex) to move data from the old index to the new index `elser-upgrade-index-demo`. We exclude the old token field from the source and instead generate new tokens in the field `title_embedding` with `.elser_model_2` while reindexing.

**Note:** Make sure to replace `books` with the name of the index that you intend to upgrade, and `title_vector` with the name of the field that holds the previously generated tokens.



In [39]:
client.reindex(
    source={
        "index": "books",  # replace with your index name
        "_source": {
            "excludes": [
                "title_vector"
            ]  # replace with the field-name from your index, that has previously generated tokens
        },
    },
    dest={
        "index": "elser-upgrade-index-demo",
        "pipeline": "elser-pipeline-upgrade-demo",
    },
)
time.sleep(5)
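
The call above discards the reindex response, so a document that failed inference would go unnoticed. A minimal sketch of a check follows. The helper name is hypothetical, and it assumes the synchronous response carries the `failures`, `total`, `created` and `updated` keys.

```python
def summarize_reindex(response):
    """Raise if the reindex reported failures; otherwise return its document counts."""
    failures = response["failures"]
    if failures:
        raise RuntimeError(f"Reindex reported {len(failures)} failure(s): {failures[0]}")
    return {key: response[key] for key in ("total", "created", "updated")}
```

To use it, assign the result of `client.reindex(...)` to a variable and pass that variable to `summarize_reindex`.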

# Querying your data

Once reindexing is complete, you are ready to query on your data and perform semantic search 

In [40]:
response = client.search(
    index="elser-upgrade-index-demo",
    size=3,
    query={
        "text_expansion": {
            "title_embedding": {
                "model_id": ".elser_model_2",
                "model_text": "Programming Course",
            }
        }
    },
)

for hit in response["hits"]["hits"]:
    doc_id = hit["_id"]
    score = hit["_score"]
    title = hit["_source"]["title"]
    print(f"Score: {score}\nTitle: {title}\n")

Score: 14.755971
Title: Python Crash Course

Score: 14.168372
Title: The Pragmatic Programmer: Your Journey to Mastery

Score: 11.704832
Title: The Clean Coder: A Code of Conduct for Professional Programmers



# Case 3: Upgrade an index that uses a different model to ELSER

Now we will see how to move an index whose embeddings were generated by a different model.

Consider the index `books`, whose `title_vector` field was generated using the NLP model `sentence-transformers__all-minilm-l6-v2`. To learn more about how to load an NLP model and use it with an index, follow the steps in our notebook [loading-model-from-hugging-face.ipynb](../integrations/hugging-face/loading-model-from-hugging-face.ipynb).

Follow a procedure similar to the previous cases:
1. Create an ingest pipeline with the ELSER model `.elser_model_2`.
2. Create an index with the mappings required for ELSER.
3. Reindex through the pipeline created in step 1, excluding the field that holds the old embedding from the `books` index.
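
The three steps above can be sketched as a single function. The function name and its parameters are hypothetical, and the destination mapping is simplified compared with the cells below (no `keyword` sub-field).

```python
def upgrade_index_to_elser(client, source_index, dest_index, pipeline_id,
                           input_field, output_field, excludes=(),
                           model_id=".elser_model_2"):
    """Create an ELSER pipeline and destination index, then reindex into it."""
    # 1. Ingest pipeline that runs ELSER inference on `input_field`.
    client.ingest.put_pipeline(
        id=pipeline_id,
        processors=[{
            "inference": {
                "model_id": model_id,
                "input_output": [
                    {"input_field": input_field, "output_field": output_field}
                ],
            }
        }],
    )
    # 2. Destination index with a sparse_vector field for the generated tokens.
    client.indices.create(
        index=dest_index,
        mappings={"properties": {
            input_field: {"type": "text"},
            output_field: {"type": "sparse_vector"},
        }},
    )
    # 3. Reindex through the pipeline, leaving the old embedding fields behind.
    source = {"index": source_index}
    if excludes:
        source["_source"] = {"excludes": list(excludes)}
    return client.reindex(
        source=source,
        dest={"index": dest_index, "pipeline": pipeline_id},
    )
```

For this case the call would be `upgrade_index_to_elser(client, "books", "elser-books", "elser-pipeline-books", "title", "title_embedding", excludes=["title_vector"])`.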

Before we begin, let's take a look at our `books` index and its mappings.

In [41]:
client.indices.get(index="books")

ObjectApiResponse({'books': {'aliases': {}, 'mappings': {'properties': {'authors': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'num_reviews': {'type': 'long'}, 'publish_date': {'type': 'date'}, 'publisher': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'summary': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title_vector': {'type': 'dense_vector', 'dims': 384, 'index': True, 'similarity': 'cosine'}}}, 'settings': {'index': {'routing': {'allocation': {'include': {'_tier_preference': 'data_content'}}}, 'number_of_shards': '1', 'provided_name': 'books', 'creation_date': '1706118077023', 'number_of_replicas': '1', 'uuid': 'GxGfG_LtSBOIXsB-5bF2_A', 'version': {'created': '8500003'}}}}})

Notice the field `title_vector`. We will exclude this field from our new index and generate a new embedding from the `title` field of the `books` index.
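
On an index with many fields, the candidates to exclude can be found programmatically instead of by eye. The sketch below scans the top-level mapping properties for vector-like types. The helper name is hypothetical, and the set of types (including `rank_features`, which older ELSER setups are assumed to have used for token fields) is an assumption.

```python
VECTOR_TYPES = {"dense_vector", "sparse_vector", "rank_features"}


def find_vector_fields(index_info, index_name):
    """Return the sorted top-level field names in `index_name` that have a vector-like type."""
    properties = index_info[index_name]["mappings"]["properties"]
    return sorted(
        name for name, spec in properties.items()
        if spec.get("type") in VECTOR_TYPES
    )
```

With the mappings shown above, `find_vector_fields(client.indices.get(index="books"), "books")` should return `["title_vector"]`.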

# Create ingestion pipeline

Next, we will create a pipeline using the ELSER model `.elser_model_2`.


In [42]:
client.ingest.put_pipeline(
    id="elser-pipeline-books",
    description="Ingest pipeline for ELSER upgrade",
    processors=[
        {
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": "title", "output_field": "title_embedding"}
                ],
            }
        }
    ],
)

ObjectApiResponse({'acknowledged': True})

# Create index with mappings

Let's create an index `elser-books` with mappings.

In [43]:
client.indices.delete(index="elser-books", ignore_unavailable=True)
client.indices.create(
    index="elser-books",
    mappings={
        "properties": {
            "title": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            },
            "title_embedding": {"type": "sparse_vector"},
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-books'})

# Reindex API

We will use the [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex) to copy the data and generate `text_expansion` embeddings in our new index `elser-books`.

In [44]:
client.reindex(
    source={"index": "books", "_source": {"excludes": ["title_vector"]}},
    dest={"index": "elser-books", "pipeline": "elser-pipeline-books"},
)
time.sleep(5)

# Querying your data
Success! Now we can query data on the index `elser-books`.

In [47]:
response = client.search(
    index="elser-books",
    size=3,
    query={
        "text_expansion": {
            "title_embedding": {
                "model_id": ".elser_model_2",
                "model_text": "Python tutorial",
            }
        }
    },
)

for hit in response["hits"]["hits"]:
    doc_id = hit["_id"]
    score = hit["_score"]
    title = hit["_source"]["title"]
    print(f"Score: {score}\nTitle: {title}")

Score: 22.333044
Title: Python Crash Course
Score: 9.364547
Title: The Pragmatic Programmer: Your Journey to Mastery
Score: 8.410445
Title: Clean Code: A Handbook of Agile Software Craftsmanship
