notebooks/model-upgrades/upgrading-index-to-use-elser.ipynb (863 lines of code) (raw):
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Upgrade an index to use ELSER model\n",
"\n",
"[](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/model-upgrades/upgrading-index-to-use-elser.ipynb)\n",
"\n",
"In this notebook we will see example on how to upgrade your index to ELSER model `.elser_model_2` using [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex). \n",
"\n",
"**Note:** Alternatively, you could also [Update by query](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update-by-query.html) to update index in place to use ELSER. In this notebook, we will see examples on using Reindex API. \n",
"\n",
"\n",
"Scenerios that we will see in this notebook:\n",
"\n",
"1. Migrating a index which hasn't generated [`text_expansion`](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-text-expansion-query.html) field to ELSER model `.elser_model_2` \n",
"2. Upgrade an existing index with `.elser_model_1` to use `.elser_model_2` model\n",
"3. Upgrade a index which use different model to use ELSER"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Install and Connect\n",
"\n",
"To get started, we'll need to connect to our Elastic deployment using the Python client.\n",
"Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.\n",
"First we need to `pip` install the following packages:\n",
"\n",
"- `elasticsearch`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install \"elasticsearch<9\" -qU"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we will import all the modules that we need. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from elasticsearch import Elasticsearch, helpers\n",
"from urllib.request import urlopen\n",
"from getpass import getpass\n",
"import json\n",
"import time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will instantiate the Python Elasticsearch client. First we prompt for password and Cloud ID.\n",
"\n",
"Then we create a `client` object that instantiates an instance of the `Elasticsearch` class.\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdin",
"output_type": "stream",
"text": [
"Elastic Cloud ID: ········\n",
"Elastic Api Key: ········\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'name': 'instance-0000000001', 'cluster_name': 'ad402eb9a59041458b8edfc021e91caf', 'cluster_uuid': 'ks_HfcCdSf2qrcKZQsk9Lg', 'version': {'number': '8.11.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'd9ec3fa628c7b0ba3d25692e277ba26814820b20', 'build_date': '2023-11-04T10:04:57.184859352Z', 'build_snapshot': False, 'lucene_version': '9.8.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}\n"
]
}
],
"source": [
"# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
"ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
"\n",
"# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
"ELASTIC_API_KEY = getpass(\"Elastic Api Key: \")\n",
"\n",
"# Create the client instance\n",
"client = Elasticsearch(\n",
" cloud_id=ELASTIC_CLOUD_ID,\n",
" api_key=ELASTIC_API_KEY,\n",
")\n",
"\n",
"print(client.info())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Download and Deploy ELSER v2 Model\n",
"\n",
"Before we begin, we have to download and deploy ELSER model `.elser_model_2`. \n",
"\n",
"Follow the instructions under the section [Download and Deploy ELSER Model](../search/03-ELSER.ipynb#download-and-deploy-elser-model) from the [ELSER](../search/03-ELSER.ipynb) notebook \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Case 1: Migrate an index with no `text_expansion` field\n",
"\n",
"In this case we will see how to upgrade an index which has a [ingestion pipeline](https://www.elastic.co/guide/en/elasticsearch/reference/current/ingest.html) configured, to use ELSER model `elser_model_2` "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create Ingestion pipeline with lowercase\n",
"\n",
"We will create a simple pipeline to convert title field values to lowercase and use this ingestion pipeline on our index. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'acknowledged': True})"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.ingest.put_pipeline(\n",
" id=\"ingest-pipeline-lowercase\",\n",
" description=\"Ingest pipeline to change title to lowercase\",\n",
" processors=[{\"lowercase\": {\"field\": \"title\"}}],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create index - `movies` with mappings\n",
"\n",
"Next, we will create a index with pipeline `ingest-pipeline-lowercase` that we created in previous step."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'movies'})"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.indices.delete(index=\"movies\", ignore_unavailable=True)\n",
"client.indices.create(\n",
" index=\"movies\",\n",
" settings={\n",
" \"index\": {\n",
" \"number_of_shards\": 1,\n",
" \"number_of_replicas\": 1,\n",
" \"default_pipeline\": \"ingest-pipeline-lowercase\",\n",
" }\n",
" },\n",
" mappings={\n",
" \"properties\": {\n",
" \"plot\": {\n",
" \"type\": \"text\",\n",
" \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n",
" },\n",
" }\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Insert Documents\n",
"we are now ready to insert sample dataset of 12 movies to our index `movies`"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done indexing documents into `movies` index!\n"
]
}
],
"source": [
"url = \"https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/notebooks/search/movies.json\"\n",
"response = urlopen(url)\n",
"\n",
"# Load the response data into a JSON object\n",
"data_json = json.loads(response.read())\n",
"\n",
"# Prepare the documents to be indexed\n",
"documents = []\n",
"for doc in data_json:\n",
" documents.append(\n",
" {\n",
" \"_index\": \"movies\",\n",
" \"_source\": doc,\n",
" }\n",
" )\n",
"\n",
"# Use helpers.bulk to index\n",
"helpers.bulk(client, documents)\n",
"\n",
"time.sleep(5)\n",
"print(\"Done indexing documents into `movies` index!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Upgrade index `movies` to use ELSER model\n",
"\n",
"we are ready to re-index `movies` to a new index with the ELSER model `.elser_model_2`. As a first step, we have to create new ingestion pipeline and index to use ELSER model. \n",
"\n",
"# Create a new pipeline with ELSER \n",
"Let's create a new ingestion pipeline with ELSER model `.elser_model_2`. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'acknowledged': True})"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.ingest.put_pipeline(\n",
" id=\"elser-ingest-pipeline\",\n",
" description=\"Ingest pipeline for ELSER\",\n",
" processors=[\n",
" {\n",
" \"inference\": {\n",
" \"model_id\": \".elser_model_2\",\n",
" \"input_output\": [\n",
" {\"input_field\": \"plot\", \"output_field\": \"plot_embedding\"}\n",
" ],\n",
" }\n",
" }\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create a index with mappings\n",
"\n",
"Next, create an index with required mappings for ELSER. "
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-movies'})"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.indices.delete(index=\"elser-movies\", ignore_unavailable=True)\n",
"client.indices.create(\n",
" index=\"elser-movies\",\n",
" mappings={\n",
" \"properties\": {\n",
" \"plot\": {\n",
" \"type\": \"text\",\n",
" \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n",
" },\n",
" \"plot_embedding\": {\"type\": \"sparse_vector\"},\n",
" }\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:**\n",
"- `plot_embedding` is the name of the field that contains generated token with the type [`sparse_vector`](https://www.elastic.co/guide/en/elasticsearch/reference/master/sparse-vector.html) \n",
"- `plot` is the name of the field from which the [`sparse_vector`](https://www.elastic.co/guide/en/elasticsearch/reference/master/sparse-vector.html) are created. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reindex with updated pipeline \n",
"\n",
"With the help of [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex), we can copy data from old index `movies` and to new index `elser-movies` with ingestion pipeline set to `elser-ingest-pipeline` . On success, the index `elser-movies` creates tokens on the `text_expansion` terms that you targeted for ELSER inference."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"client.reindex(\n",
" source={\"index\": \"movies\"},\n",
" dest={\"index\": \"elser-movies\", \"pipeline\": \"elser-ingest-pipeline\"},\n",
")\n",
"time.sleep(7)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once reindex is complete, inspect any document in the index `elser-movies` and notice that the document has a additional field `plot_embedding` with terms that we will be using in `text_expansion` query. \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Querying documents with ELSER \n",
"\n",
"Let's try a semantic search on our index with ELSER model `.elser_model_2`"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Score: 6.403748\n",
"Title: se7en\n",
"Plot: Two detectives, a rookie and a veteran, hunt a serial killer who uses the seven deadly sins as his motives.\n",
"\n",
"Score: 3.6703482\n",
"Title: the departed\n",
"Plot: An undercover cop and a mole in the police attempt to identify each other while infiltrating an Irish gang in South Boston.\n",
"\n",
"Score: 2.9359207\n",
"Title: the usual suspects\n",
"Plot: A sole survivor tells of the twisty events leading up to a horrific gun battle on a boat, which began when five criminals met at a seemingly random police lineup.\n",
"\n"
]
}
],
"source": [
"response = client.search(\n",
" index=\"elser-movies\",\n",
" size=3,\n",
" query={\n",
" \"text_expansion\": {\n",
" \"plot_embedding\": {\n",
" \"model_id\": \".elser_model_2\",\n",
" \"model_text\": \"investigation\",\n",
" }\n",
" }\n",
" },\n",
")\n",
"\n",
"for hit in response[\"hits\"][\"hits\"]:\n",
" doc_id = hit[\"_id\"]\n",
" score = hit[\"_score\"]\n",
" title = hit[\"_source\"][\"title\"]\n",
" plot = hit[\"_source\"][\"plot\"]\n",
" print(f\"Score: {score}\\nTitle: {title}\\nPlot: {plot}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Case 2: Upgrade index with ELSER model to `.elser_model_2`\n",
"\n",
"If you already have a index with ELSER model `.elser_model_1` and would like to upgrade to `.elser_model_2`, you can use the Reindex API with ingestion pipeline to use ELSER `.elser_model_2` model.\n",
"\n",
"**`Note:`** Before we begin, ensure that you are on Elasticsearch 8.11 version and ELSER model `.elser_model_2` is deployed. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create a new ingestion pipeline\n",
"\n",
"We will create a pipeline with `.elser_model_2` to enable us with reindexing. "
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'acknowledged': True})"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.ingest.put_pipeline(\n",
" id=\"elser-pipeline-upgrade-demo\",\n",
" description=\"Ingest pipeline for ELSER upgrade demo\",\n",
" processors=[\n",
" {\n",
" \"inference\": {\n",
" \"model_id\": \".elser_model_2\",\n",
" \"input_output\": [\n",
" {\"input_field\": \"title\", \"output_field\": \"title_embedding\"}\n",
" ],\n",
" }\n",
" }\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create a new index with mappings\n",
"We will create a new index with required mappings supporting ELSER"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-upgrade-index-demo'})"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.indices.delete(index=\"elser-upgrade-index-demo\", ignore_unavailable=True)\n",
"client.indices.create(\n",
" index=\"elser-upgrade-index-demo\",\n",
" mappings={\n",
" \"properties\": {\n",
" \"title\": {\n",
" \"type\": \"text\",\n",
" \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n",
" },\n",
" \"title_embedding\": {\"type\": \"sparse_vector\"},\n",
" }\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Use Reindex API\n",
"we will use [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex) to move data from old index to new index `elser-upgrade-index-demo`. We will be excluding target field from old index and instead generate new tokens in the field `plot_embedding` with `.elser_model_2` while reindexing. \n",
"\n",
"**`Note:`** Make sure to replace `my-index` with your index name that you intend to upgrade and the field `my-tokens-field` with the field name that you have generated tokens previously.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [],
"source": [
"client.reindex(\n",
" source={\n",
" \"index\": \"books\", # replace with your index name\n",
" \"_source\": {\n",
" \"excludes\": [\n",
" \"title_vector\"\n",
" ] # replace with the field-name from your index, that has previously generated tokens\n",
" },\n",
" },\n",
" dest={\n",
" \"index\": \"elser-upgrade-index-demo\",\n",
" \"pipeline\": \"elser-pipeline-upgrade-demo\",\n",
" },\n",
")\n",
"time.sleep(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Querying your data\n",
"\n",
"Once reindexing is complete, you are ready to query on your data and perform semantic search "
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Score: 14.755971\n",
"Title: Python Crash Course\n",
"Plot: Python Crash Course\n",
"\n",
"Score: 14.168372\n",
"Title: The Pragmatic Programmer: Your Journey to Mastery\n",
"Plot: The Pragmatic Programmer: Your Journey to Mastery\n",
"\n",
"Score: 11.704832\n",
"Title: The Clean Coder: A Code of Conduct for Professional Programmers\n",
"Plot: The Clean Coder: A Code of Conduct for Professional Programmers\n",
"\n"
]
}
],
"source": [
"response = client.search(\n",
" index=\"elser-upgrade-index-demo\",\n",
" size=3,\n",
" query={\n",
" \"text_expansion\": {\n",
" \"title_embedding\": {\n",
" \"model_id\": \".elser_model_2\",\n",
" \"model_text\": \"Programming Course\",\n",
" }\n",
" }\n",
" },\n",
")\n",
"\n",
"for hit in response[\"hits\"][\"hits\"]:\n",
" doc_id = hit[\"_id\"]\n",
" score = hit[\"_score\"]\n",
" title = hit[\"_source\"][\"title\"]\n",
" plot = hit[\"_source\"][\"title\"]\n",
" print(f\"Score: {score}\\nTitle: {title}\\nPlot: {plot}\\n\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Case 3: Upgrade a index with different model to ELSER\n",
"\n",
"Now we will see how to move your index which already has generated `embedding` using a different model. \n",
"\n",
"Lets consider the index - `books` and has generated `title_vector` using the NLP model `sentence-transformers__all-minilm-l6-v2`. In case you would like know about more how to load a NLP model to an index, follow the steps from our notebook [loading-model-from-hugging-face.ipynb](../integrations/hugging-face/loading-model-from-hugging-face.ipynb)\n",
"\n",
"Follow similiar proceedure that we did in previously: \n",
"1. Create a ingestion pipeline with ELSER model `.elser_model_2`\n",
"2. Create a index with mappings, with the pipeline we created in the previous step. \n",
"3. Reindex, excluding the field that has embedding from the `books` index\n",
"\n",
"Before we begin, lets take a look at our index `books` and see the mappings"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'books': {'aliases': {}, 'mappings': {'properties': {'authors': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'num_reviews': {'type': 'long'}, 'publish_date': {'type': 'date'}, 'publisher': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'summary': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title_vector': {'type': 'dense_vector', 'dims': 384, 'index': True, 'similarity': 'cosine'}}}, 'settings': {'index': {'routing': {'allocation': {'include': {'_tier_preference': 'data_content'}}}, 'number_of_shards': '1', 'provided_name': 'books', 'creation_date': '1706118077023', 'number_of_replicas': '1', 'uuid': 'GxGfG_LtSBOIXsB-5bF2_A', 'version': {'created': '8500003'}}}}})"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.indices.get(index=\"books\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice the field `title_vector`, We will exclude this field in our new index and generate new mapping against the field `title` from the `books` index"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create ingestion pipeline\n",
"\n",
"Next, we will create a pipeline using ELSER model `.elser_model_2`\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'acknowledged': True})"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.ingest.put_pipeline(\n",
" id=\"elser-pipeline-books\",\n",
" description=\"Ingest pipeline for ELSER upgrade\",\n",
" processors=[\n",
" {\n",
" \"inference\": {\n",
" \"model_id\": \".elser_model_2\",\n",
" \"input_output\": [\n",
" {\"input_field\": \"title\", \"output_field\": \"title_embedding\"}\n",
" ],\n",
" }\n",
" }\n",
" ],\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create index with mappings\n",
"\n",
"Lets create a index `elser-books` with mappings"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-books'})"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"client.indices.delete(index=\"elser-books\", ignore_unavailable=True)\n",
"client.indices.create(\n",
" index=\"elser-books\",\n",
" mappings={\n",
" \"properties\": {\n",
" \"title\": {\n",
" \"type\": \"text\",\n",
" \"fields\": {\"keyword\": {\"type\": \"keyword\", \"ignore_above\": 256}},\n",
" },\n",
" \"title_embedding\": {\"type\": \"sparse_vector\"},\n",
" }\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Reindex API\n",
"\n",
"we will use the [Reindex API](https://elasticsearch-py.readthedocs.io/en/stable/api.html#elasticsearch.Elasticsearch.reindex) to copy data and generate `text_expansion` embedding to our new index `elser-books`. "
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [],
"source": [
"client.reindex(\n",
" source={\"index\": \"books\", \"_source\": {\"excludes\": [\"title_vector\"]}},\n",
" dest={\"index\": \"elser-books\", \"pipeline\": \"elser-pipeline-books\"},\n",
")\n",
"time.sleep(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Querying your data\n",
"Success! Now we can query data on the index `elser-books`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Score: 22.333044\n",
"Title: Python Crash Course\n",
"Score: 9.364547\n",
"Title: The Pragmatic Programmer: Your Journey to Mastery\n",
"Score: 8.410445\n",
"Title: Clean Code: A Handbook of Agile Software Craftsmanship\n"
]
}
],
"source": [
"response = client.search(\n",
" index=\"elser-books\",\n",
" size=3,\n",
" query={\n",
" \"text_expansion\": {\n",
" \"title_embedding\": {\n",
" \"model_id\": \".elser_model_2\",\n",
" \"model_text\": \"Python tutorial\",\n",
" }\n",
" }\n",
" },\n",
")\n",
"\n",
"for hit in response[\"hits\"][\"hits\"]:\n",
" doc_id = hit[\"_id\"]\n",
" score = hit[\"_score\"]\n",
" title = hit[\"_source\"][\"title\"]\n",
" print(f\"Score: {score}\\nTitle: {title}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
},
"vscode": {
"interpreter": {
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}