# Semantic Reranking with Cohere Reranker

<a target="_blank" href="https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/10-semantic-reranking-retriever-cohere.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This example will show how to combine search and [semantic reranking](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-reranking.html) to improve the accuracy of your search results. We'll be using the [rerank feature from Cohere](https://cohere.com/rerank).

Note: for a complete integration with Cohere please refer to [this notebook](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/integrations/cohere/cohere-elasticsearch.ipynb). This example focuses on Cohere reranking only through an Elastic [retriever](https://www.elastic.co/guide/en/elasticsearch/reference/current/retrievers-overview.html) query.

## Requirements

For this example, you will need:

- An Elastic deployment:

  - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-notebook))

- Elasticsearch 8.15 or above, or [Elasticsearch serverless](https://www.elastic.co/elasticsearch/serverless)

- Cohere API key

## Create Elastic Cloud deployment

If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial.

## Install packages and connect with Elasticsearch Client

To get started, we'll need to connect to our Elastic deployment using the Python client (version 8.15.0 or above).
Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.

First we need to `pip` install the `elasticsearch` package:

In [None]:
!pip install elasticsearch

Next, we need to import the modules we need.

🔐 NOTE: `getpass` lets us prompt the user for credentials without echoing them to the terminal.

In [None]:
from elasticsearch import Elasticsearch, exceptions
from urllib.request import urlopen
from getpass import getpass
import json

Now we can instantiate the Python Elasticsearch client.

First we prompt the user for their Cloud ID and API key.
Then we create a `client` object, which is an instance of the `Elasticsearch` class.

In [None]:
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = getpass("Elastic API Key: ")

# Create the client instance
client = Elasticsearch(
    # For local development
    # hosts=["http://localhost:9200"]
    cloud_id=ELASTIC_CLOUD_ID,
    api_key=ELASTIC_API_KEY,
)

### Enable telemetry

Knowing that you are using this notebook helps us decide where to invest our efforts to improve our products. We'd like to ask you to run the following code so that we can gather anonymous usage statistics. See [telemetry.py](https://github.com/elastic/elasticsearch-labs/blob/main/telemetry/telemetry.py) for details. Thank you!

In [None]:
!curl -O -s https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/telemetry/telemetry.py
from telemetry import enable_telemetry

client = enable_telemetry(client, "10-semantic-reranking-retriever-cohere")

### Test the Client

Before you continue, confirm that the client has connected with this test.

In [None]:
print(client.info())
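Since this example needs Elasticsearch 8.15 or above, it can be handy to check the version reported by the cluster. The following is a minimal sketch, assuming the response exposes the version string under `["version"]["number"]`; the helper name `meets_minimum_version` and the sample dictionary are made up for illustration, and serverless projects may report a version that does not follow this scheme.

```python
def meets_minimum_version(info, minimum=(8, 15)):
    """Return True if the reported version is at least `minimum` (major, minor)."""
    number = info["version"]["number"]   # e.g. "8.15.0" or "8.16.0-SNAPSHOT"
    core = number.split("-")[0]          # drop any "-SNAPSHOT" style suffix
    major, minor = (int(part) for part in core.split(".")[:2])
    return (major, minor) >= minimum


# Hypothetical stand-in for the object returned by client.info()
sample_info = {"version": {"number": "8.15.0"}}
print(meets_minimum_version(sample_info))  # prints True
```

With a live connection, the same helper could be called as `meets_minimum_version(client.info())`.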

Refer to [the documentation](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new) to learn how to connect to a self-managed deployment.

Read [this page](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new) to learn how to connect using API keys.

## Set up Cohere Inference Endpoint

We'll be using the [Cohere rerank](https://cohere.com/rerank) feature to perform semantic reordering of search hits through an Elasticsearch [inference endpoint](https://www.elastic.co/guide/en/elasticsearch/reference/current/inference-apis.html).

Go to the [Cohere website](https://cohere.com/) and create an API key, then set it here.

In [None]:
COHERE_API_KEY = getpass("Cohere API key: ")

### Create the Inference Endpoint

Let's create the inference endpoint by using the [Create inference API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-inference-api.html).

For this example we'll use the [Cohere service](https://www.elastic.co/guide/en/elasticsearch/reference/current/infer-service-cohere.html), but the inference API also supports [many other inference services](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-inference-api.html#put-inference-api-desc).

In [None]:
try:
    client.inference.delete(inference_id="cohere-rerank-inference")
except exceptions.NotFoundError:
    # Inference endpoint does not exist
    pass

try:
    client.options(
        request_timeout=60, max_retries=3, retry_on_timeout=True
    ).inference.put(
        task_type="rerank",
        inference_id="cohere-rerank-inference",
        inference_config={
            "service": "cohere",
            "service_settings": {
                "api_key": COHERE_API_KEY,
                "model_id": "rerank-english-v3.0",
            },
        },
    )
    print("Inference endpoint created successfully")
except exceptions.BadRequestError as e:
    if e.error == "resource_already_exists_exception":
        print("Inference endpoint already exists")
    else:
        raise

## Create the Index

Now we need to create an index. Let's create one that enables us to perform search and semantic reranking on text articles.

In [None]:
client.indices.delete(index="semantic-reranking-articles", ignore_unavailable=True)
client.indices.create(
    index="semantic-reranking-articles",
    mappings={
        "properties": {
            "title": {"type": "text"},
            "text": {"type": "text"},
        },
    },
)

## Populate the Index

Let's populate the index with a couple of random article fragments from Wikipedia.

In [None]:
url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/notebooks/search/articles-wikipedia.json"
response = urlopen(url)
articles = json.loads(response.read())

operations = []
for article in articles:
    operations.append({"index": {"_index": "semantic-reranking-articles"}})
    operations.append(article)
client.bulk(index="semantic-reranking-articles", operations=operations, refresh=True)
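A bulk request can succeed as a whole while individual documents fail, so it is worth inspecting the response. Below is a small sketch, assuming the usual bulk response shape with a top-level `errors` flag and one entry per action in `items`; the helper name `collect_bulk_failures` and the sample response are invented for illustration.

```python
def collect_bulk_failures(bulk_response):
    """Return (position, error) pairs for every failed action in a bulk response."""
    if not bulk_response["errors"]:
        return []

    failures = []
    for position, item in enumerate(bulk_response["items"]):
        # Each item has a single key naming the action ("index", "create", ...)
        _action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, result["error"]))
    return failures


# Hypothetical response: the second document was rejected
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400,
                   "error": {"type": "mapper_parsing_exception",
                             "reason": "failed to parse"}}},
    ],
}

for position, error in collect_bulk_failures(sample_response):
    print(f"Document {position} failed: {error['type']}")
```

In the notebook, the return value of `client.bulk(...)` could be passed to this helper instead of the sample dictionary.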

## Search without reranking

First let's run a classic search that uses lexical text matching.

### Aside: Pretty printing Elasticsearch search results

Your `search` API calls will return hard-to-read nested JSON.
We'll create a little function called `pretty_search_response` to return nice, human-readable outputs from our examples.

In [None]:
def pretty_search_response(response):
    hits = response["hits"]["hits"]
    if not hits:
        print("Your search returned no results.")
        return

    for hit in hits:
        # Named doc_id rather than id to avoid shadowing the Python built-in
        doc_id = hit["_id"]
        score = hit["_score"]
        title = hit["_source"]["title"]
        text = hit["_source"]["text"]

        print(f"\nID: {doc_id}\nScore: {score}\nTitle: {title}\nText: {text}")

Assume we're interested in learning about solar eclipses, but we don't know the exact name of this phenomenon. We'll perform a classic search that matches the text _"the Moon covers the Sun"_. Let's see what results this finds:

In [67]:
query = {"match": {"text": "the Moon covers the Sun"}}
response = client.search(
    index="semantic-reranking-articles",
    query=query,
)

pretty_search_response(response)


ID: yyKfN5EB0RS1pNqNt7y0
Score: 2.0718374
Title: Cheshire
Text: Cheshire is a county in England. It is the North West part of the country. It is most famous for making salt and cheese. Cheshire is made up of lots of little towns including the Borough of Macclesfield which covers a large area of plains. The main attraction is in Kerridge where there is the famous landmark 'White Nancy.'

ID: 1CKfN5EB0RS1pNqNt7y0
Score: 0.8610966
Title: Sun Moon Lake
Text: Sun Moon Lake (; Thao: "Zintun") is a lake in Nantou County, Taiwan. It is the largest lake in Taiwan. Sun Moon Lake is one of the Eight Views of Taiwan. The lake was named because the east side of the lake looks like a sun, and the west side of the lake looks like a moon.

ID: 1SKfN5EB0RS1pNqNt7y0
Score: 0.83579814
Title: Unification Church
Text: The Unification Church is a religious movement started by Sun Myung Moon in Korea in the 1940s. It officially began as a church in 1954 in Seoul, South Korea. On October 12, 2009, it was ann

The top hits (Cheshire, Sun Moon Lake, Unification Church and so on) all come up because their text shares words with our query, for example "covers" or "sun". However, these articles are unrelated to the _meaning_ of our query. Further down, result #7 _is_ an article about the solar eclipse, but it's lost among the many other hits.
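To make the word-overlap effect concrete, here is a rough sketch that lists the words a query shares with two excerpts from the results above. It is only an illustration: the helper names are made up, and Elasticsearch's actual analysis and scoring take more into account (term frequency, term rarity, field length) than simple overlap.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shared_terms(query, text):
    """Words that appear in both the query and the text, sorted alphabetically."""
    return sorted(tokenize(query) & tokenize(text))


query_text = "the Moon covers the Sun"

# Excerpts copied from the search results above
cheshire = (
    "Cheshire is made up of lots of little towns including the Borough of "
    "Macclesfield which covers a large area of plains."
)
sun_moon_lake = "Sun Moon Lake is one of the Eight Views of Taiwan."

print(shared_terms(query_text, cheshire))       # ['covers', 'the']
print(shared_terms(query_text, sun_moon_lake))  # ['moon', 'sun', 'the']
```

Neither excerpt has anything to do with eclipses, yet both share enough words with the query to be returned by a lexical match.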

Can we somehow get more relevant results?

## Search with reranking

Enter semantic reranking! We'll instruct Elasticsearch to run the same query, but this time also perform semantic reranking on the top results. For this we need to wrap our query in a `text_similarity_reranker` [retriever](https://www.elastic.co/guide/en/elasticsearch/reference/current/retrievers-overview.html), and reference the previously created Cohere inference endpoint that will do the reranking.

In [68]:
query = {"match": {"text": "the Moon covers the Sun"}}
response = client.search(
    index="semantic-reranking-articles",
    retriever={
        "text_similarity_reranker": {
            "retriever": {"standard": {"query": query}},
            "field": "text",
            "rank_window_size": 20,
            "inference_id": "cohere-rerank-inference",
            "inference_text": "the Moon covers the Sun",
            "min_score": 0.40,
        },
    },
)

pretty_search_response(response)


ID: 0SKfN5EB0RS1pNqNt7y0
Score: 0.9812029
Title: Solar eclipse
Text: As seen from Earth, a solar eclipse /"ee-klips"/ happens when the Moon is directly between the Earth and the Sun. This makes the Moon fully or partially (partly) cover the sun. Solar eclipses can only happen during a new moon. Every year there are about two solar eclipses. Sometimes there are even five solar eclipses in a year. However, only two of these can be total solar eclipses, and often a year will pass without a total eclipse.

ID: 0iKfN5EB0RS1pNqNt7y0
Score: 0.9078038
Title: Solar eclipse of December 14, 2020
Text: A total solar eclipse occurred on Monday, December 14, 2020. A solar eclipse occurs when the Moon passes between Earth and the Sun, which will cover the image of the Sun for a viewer on Earth.

ID: zyKfN5EB0RS1pNqNt7y0
Score: 0.41584742
Title: Phases of the Moon
Text: As the Moon orbits around the Earth, the half of the Moon that faces the Sun will be lit up. The different shapes of the lit portion

Much better! Not only are the top results semantically close to our query _"the Moon covers the Sun"_, but the irrelevant results with a low score were also discarded from the response. As a result, the list of articles we ended up with contains those that provide the best answer to our question.

What's also great about reranking is that it can be used on top of existing search solutions out of the box. Under the hood, the same lexical search was executed as before (the one that returned mixed hits). Cohere then took the texts of the top articles and reordered them according to how closely they relate to the meaning of our query.
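The two-stage flow can be sketched in plain Python to show how `rank_window_size` and `min_score` interact. This is a simplified simulation, not what Elasticsearch or Cohere run internally: the function name `rerank_window` is hypothetical, and the relevance scores below are made-up numbers standing in for the reranking model's output.

```python
def rerank_window(hits, relevance, rank_window_size, min_score):
    """Re-score the first `rank_window_size` hits and drop those below `min_score`.

    `hits` is the first-stage result list, best match first.
    `relevance` is a callable that returns a semantic score for a hit.
    """
    window = hits[:rank_window_size]
    scored = [(relevance(hit), hit) for hit in window]
    kept = [(score, hit) for score, hit in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept


# First-stage (lexical) order, best match first
lexical_hits = ["Cheshire", "Sun Moon Lake", "Unification Church", "Solar eclipse"]

# Made-up scores for illustration only, not real Cohere output
toy_scores = {
    "Cheshire": 0.01,
    "Sun Moon Lake": 0.10,
    "Unification Church": 0.02,
    "Solar eclipse": 0.98,
}

print(rerank_window(lexical_hits, toy_scores.get, rank_window_size=3, min_score=0.4))
# []
print(rerank_window(lexical_hits, toy_scores.get, rank_window_size=4, min_score=0.4))
# [(0.98, 'Solar eclipse')]
```

The first call returns nothing because the relevant article sits outside the window: the reranker can only promote documents that the first-stage search placed within the top `rank_window_size` hits.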

Whether your search application uses lexical, vector or hybrid search, reranking can improve your results.
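Because the reranker simply wraps another retriever, the wrapping step can be factored into a small helper. The sketch below builds the same `text_similarity_reranker` body used earlier; the helper name `with_reranking` and its defaults are assumptions for illustration. In principle a different child retriever could be passed in the same way, though this notebook's index only supports the lexical one since it has no vector field.

```python
def with_reranking(child_retriever, inference_id, inference_text,
                   field="text", rank_window_size=20, min_score=None):
    """Wrap a retriever definition in a text_similarity_reranker retriever."""
    reranker = {
        "retriever": child_retriever,
        "field": field,
        "rank_window_size": rank_window_size,
        "inference_id": inference_id,
        "inference_text": inference_text,
    }
    # min_score is optional, so only include it when a threshold was given
    if min_score is not None:
        reranker["min_score"] = min_score
    return {"text_similarity_reranker": reranker}


question = "the Moon covers the Sun"
lexical = {"standard": {"query": {"match": {"text": question}}}}

retriever = with_reranking(
    lexical,
    inference_id="cohere-rerank-inference",
    inference_text=question,
    min_score=0.40,
)
print(retriever)
```

The resulting dictionary could be passed as the `retriever` argument of `client.search`, as in the reranking query above.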

## Conclusion

[Semantic reranking](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-reranking.html) is a powerful tool for improving the relevance of a search experience or a RAG tool. As this example showed, it lets us add semantic search capabilities to an existing Elasticsearch deployment without changing the index or the underlying query.