notebooks/langchain/multi-query-retriever-examples/langchain-multi-query-retriever.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MultiQueryRetriever with Elasticsearch and LangChain\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/langchain/multi-query-retriever-examples/langchain-multi-query-retriever.ipynb)\n",
"\n",
"This notebook demonstrates how to use LangChain's [MultiQueryRetriever](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html) with Elasticsearch. The retriever uses an LLM to generate multiple versions of a user's query, runs each version against a vector store, and returns the unique union of the retrieved documents. This often yields a larger set of relevant documents than a single query would find.\n",
"\n",
"First, we create a small set of movie documents with `langchain`. Then, using [`ElasticsearchStore.from_documents`](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html#langchain.vectorstores.elasticsearch.ElasticsearchStore.from_documents), we create a `vectorstore` and index the data into Elasticsearch.\n",
"\n",
"Finally, we run an example query to show how the Elasticsearch-backed multi-query retriever expands a single question into several queries."
]
},
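{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a dependency-free sketch of this idea, not LangChain's implementation. The toy `CORPUS`, `fake_generate_queries` (standing in for the LLM) and `keyword_retrieve` (standing in for the vector store) are all hypothetical. In this toy setup, the original question matches nothing because `dream` and `dreams` are different words, while the generated variants find both dream movies. A movie retrieved by more than one query is kept only once."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch of the multi-query idea; not LangChain's implementation.\n",
"# fake_generate_queries stands in for the LLM, keyword_retrieve for the vector store.\n",
"CORPUS = {\n",
"    'Inception': 'a thief enters dreams within dreams',\n",
"    'Paprika': 'a detective gets lost in a series of dreams',\n",
"    'Toy Story': 'toys come alive',\n",
"}\n",
"\n",
"\n",
"def fake_generate_queries(question):\n",
"    # A real LLM would paraphrase the question; these variants are hard-coded.\n",
"    return [question, 'films that explore dreams', 'movies set inside dreams']\n",
"\n",
"\n",
"def keyword_retrieve(query):\n",
"    # Return titles whose text shares at least one word with the query.\n",
"    words = set(query.lower().split())\n",
"    return [title for title, text in CORPUS.items() if words & set(text.split())]\n",
"\n",
"\n",
"def multi_query_retrieve(question):\n",
"    # Run every query variant and merge the hits into one de-duplicated list.\n",
"    unique_titles = []\n",
"    for query in fake_generate_queries(question):\n",
"        for title in keyword_retrieve(query):\n",
"            if title not in unique_titles:  # keep the first hit, drop duplicates\n",
"                unique_titles.append(title)\n",
"    return unique_titles\n",
"\n",
"\n",
"print(keyword_retrieve('What are some movies about dream'))\n",
"print(multi_query_retrieve('What are some movies about dream'))"
]
},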
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Install packages and import modules"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"!python3 -m pip install -qU lark langchain langchain-elasticsearch langchain_openai tiktoken\n",
"\n",
"from langchain.schema import Document\n",
"from langchain_openai.embeddings import OpenAIEmbeddings\n",
"from langchain_elasticsearch import ElasticsearchStore\n",
"from langchain_openai.chat_models import ChatOpenAI\n",
"from langchain.retrievers.multi_query import MultiQueryRetriever\n",
"from getpass import getpass"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create documents\n",
"Next, we create a list of documents, each summarizing a movie, using LangChain's [`Document`](https://api.python.langchain.com/en/latest/schema/langchain.schema.document.Document.html) schema. Each document has a `page_content` string and a `metadata` dictionary."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"docs = [\n",
" Document(\n",
" page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n",
" metadata={\n",
" \"year\": 1993,\n",
" \"rating\": 7.7,\n",
" \"genre\": \"science fiction\",\n",
" \"director\": \"Steven Spielberg\",\n",
" \"title\": \"Jurassic Park\",\n",
" },\n",
" ),\n",
" Document(\n",
" page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n",
" metadata={\n",
" \"year\": 2010,\n",
" \"director\": \"Christopher Nolan\",\n",
" \"rating\": 8.2,\n",
" \"title\": \"Inception\",\n",
" },\n",
" ),\n",
" Document(\n",
" page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n",
" metadata={\n",
" \"year\": 2006,\n",
" \"director\": \"Satoshi Kon\",\n",
" \"rating\": 8.6,\n",
" \"title\": \"Paprika\",\n",
" },\n",
" ),\n",
" Document(\n",
" page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n",
" metadata={\n",
" \"year\": 2019,\n",
" \"director\": \"Greta Gerwig\",\n",
" \"rating\": 8.3,\n",
" \"title\": \"Little Women\",\n",
" },\n",
" ),\n",
" Document(\n",
" page_content=\"Toys come alive and have a blast doing so\",\n",
" metadata={\n",
" \"year\": 1995,\n",
" \"genre\": \"animated\",\n",
" \"director\": \"John Lasseter\",\n",
" \"rating\": 8.3,\n",
" \"title\": \"Toy Story\",\n",
" },\n",
" ),\n",
" Document(\n",
" page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n",
" metadata={\n",
" \"year\": 1979,\n",
" \"rating\": 9.9,\n",
" \"director\": \"Andrei Tarkovsky\",\n",
" \"genre\": \"science fiction\",\n",
" \"title\": \"Stalker\",\n",
" },\n",
" ),\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Connect to Elasticsearch\n",
"\n",
"ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?onboarding_token=vectorsearch&utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial.\n",
"\n",
"Because we are using an Elastic Cloud deployment, we'll use its **Cloud ID** to identify it. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n",
"\n",
"We will use [ElasticsearchStore](https://api.python.langchain.com/en/latest/vectorstores/langchain.vectorstores.elasticsearch.ElasticsearchStore.html) to connect to our Elastic Cloud deployment. Its `from_documents` method embeds the documents we created in the previous step with OpenAI embeddings and indexes them into Elasticsearch in a single call."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id\n",
"ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
"\n",
"# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key\n",
"ELASTIC_API_KEY = getpass(\"Elastic API key: \")\n",
"\n",
"# https://platform.openai.com/api-keys\n",
"OPENAI_API_KEY = getpass(\"OpenAI API key: \")\n",
"\n",
"embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)\n",
"\n",
"vectorstore = ElasticsearchStore.from_documents(\n",
" docs,\n",
" embeddings,\n",
" index_name=\"elasticsearch-multi-query-demo\",\n",
" es_cloud_id=ELASTIC_CLOUD_ID,\n",
" es_api_key=ELASTIC_API_KEY,\n",
")"
]
},
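{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each query the retriever runs is answered by a similarity search over the indexed document vectors: the query is embedded and the closest documents are returned. The cell below is a hypothetical, plain-Python sketch of that ranking step. The made-up 3-dimensional vectors, the `cosine` helper and the use of cosine similarity are assumptions for illustration; in this notebook, Elasticsearch performs the actual search over the OpenAI embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch: rank documents by cosine similarity to a query vector.\n",
"import math\n",
"\n",
"\n",
"def cosine(a, b):\n",
"    # Dot product divided by the product of the vector lengths.\n",
"    dot = sum(x * y for x, y in zip(a, b))\n",
"    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))\n",
"\n",
"\n",
"# Made-up 3-dimensional vectors; real OpenAI embeddings have many more dimensions.\n",
"doc_vectors = {\n",
"    'Inception': [0.9, 0.1, 0.0],\n",
"    'Paprika': [0.8, 0.2, 0.1],\n",
"    'Toy Story': [0.0, 0.1, 0.9],\n",
"}\n",
"query_vector = [1.0, 0.0, 0.0]\n",
"\n",
"# Sort titles from most to least similar, then keep the top k (k=2 here).\n",
"ranked = sorted(doc_vectors, key=lambda title: cosine(query_vector, doc_vectors[title]), reverse=True)\n",
"top_k = ranked[:2]\n",
"print(top_k)"
]
},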
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Set up the MultiQueryRetriever\n",
"\n",
"Next, we set up an OpenAI chat model. The MultiQueryRetriever uses it to generate alternative versions of each user query.\n",
"\n",
"We then instantiate the retriever with [MultiQueryRetriever.from_llm](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html), passing the Elasticsearch vector store (wrapped as a retriever with `vectorstore.as_retriever()`) and the LLM."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
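"# Set up the OpenAI chat model; temperature 0 reduces randomness in the generated queries\n",
"llm = ChatOpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)\n",
"\n",
"# Instantiate the retriever: the LLM rewrites the query and the Elasticsearch retriever runs each version\n",
"retriever = MultiQueryRetriever.from_llm(vectorstore.as_retriever(), llm)"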
]
},
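{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the next section, the logged queries still carry their `1.`, `2.`, `3.` numbering, which suggests the LLM's response is split line by line without further cleanup. As a hypothetical sketch of how such newline-separated text can become a list of queries, the cell below splits it on line breaks and drops blank lines; `parse_queries` and `sample_output` are illustrative names, not LangChain APIs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical sketch: split newline-separated LLM output into a list of queries.\n",
"def parse_queries(llm_output):\n",
"    # One query per non-blank line; numbering such as '1. ' is kept as-is.\n",
"    return [line.strip() for line in llm_output.strip().splitlines() if line.strip()]\n",
"\n",
"\n",
"# Illustrative LLM response with a stray blank line between queries.\n",
"sample_output = '''1. Can you recommend films about dreams?\n",
"\n",
"2. Are there movies set inside dreams?\n",
"3. Which films revolve around dreaming?'''\n",
"\n",
"print(parse_queries(sample_output))"
]
},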
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test the retriever with a simple query\n",
"\n",
"We test the retriever with a simple query: `What are some movies about dream`.\n",
"\n",
"The log shows the queries that the LLM generated from this question, and the output is the number of unique documents retrieved across all of them."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you recommend any films that explore the theme of dreams?', '2. Are there any movies that delve into the realm of dreams?', '3. Could you suggest some films that revolve around the concept of dreaming?']\n"
]
},
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Log the queries generated by the LLM\n",
"import logging\n",
"\n",
"logging.basicConfig()\n",
"logging.getLogger(\"langchain.retrievers.multi_query\").setLevel(logging.INFO)\n",
"\n",
"# Retrieve documents for all generated queries; duplicates are returned only once\n",
"question = \"What are some movies about dream\"\n",
"unique_docs = retriever.get_relevant_documents(query=question)\n",
"len(unique_docs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}