{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Building A RAG Ebook \"Librarian\" Using LlamaIndex\n",
"\n",
"_Authored by: [Jonathan Jin](https://huggingface.co/jinnovation)_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"This notebook demonstrates how to quickly build a RAG-based \"librarian\" for your\n",
"local ebook library.\n",
"\n",
"Think about the last time you visited a library and took advantage of the\n",
"expertise of the knowledgeable staff there to help you find what you need out of\n",
"the troves of textbooks, novels, and other resources at the library. Our RAG\n",
"\"librarian\" will do the same for us, except for our own local collection of\n",
"ebooks.\n",
"\n",
"## Requirements\n",
"\n",
"We'd like our librarian to be **lightweight** and **run locally as much as\n",
"possible** with **minimal dependencies**. This means that we will leverage\n",
"open-source to the fullest extent possible, as well as bias towards models that\n",
"can be **executed locally on typical hardware, e.g. M1 Macbooks**.\n",
"\n",
"## Components\n",
"\n",
"Our solution will consist of the following components:\n",
"\n",
"- [LlamaIndex], a data framework for LLM-based applications that's, unlike\n",
" [LangChain], designed specifically for RAG;\n",
"- [Ollama], a user-friendly solution for running LLMs such as Llama 2 locally;\n",
"- The [`BAAI/bge-base-en-v1.5`](https://huggingface.co/BAAI/bge-base-en-v1.5)\n",
" embedding model, which performs [reasonably well and is reasonably lightweight\n",
" in size](https://huggingface.co/spaces/mteb/leaderboard);\n",
"- [Llama 2], which we'll run via [Ollama].\n",
"\n",
"[LlamaIndex]: https://docs.llamaindex.ai/en/stable/index.html\n",
"[LangChain]: https://python.langchain.com/docs/get_started/introduction\n",
"[Ollama]: https://ollama.com/\n",
"[Llama 2]: https://ollama.com/library/llama2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dependencies\n",
"\n",
"First let's install our dependencies."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -q \\\n",
" llama-index \\\n",
" EbookLib \\\n",
" html2text \\\n",
" llama-index-embeddings-huggingface \\\n",
" llama-index-llms-ollama"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Ollama installation\n",
"\n",
"These dependencies help properly detect the GPU."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!apt install pciutils lshw"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Install Ollama."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!curl -fsSL https://ollama.com/install.sh | sh"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Run Ollama service in the background."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"get_ipython().system_raw('ollama serve &')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Pull Llama2 from the Ollama library."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"vscode": {
"languageId": "markdown"
}
},
"outputs": [],
"source": [
"!ollama pull llama2"
]
},
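{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can optionally confirm that the model is now available locally. (This uses Ollama's standard `list` command; the exact output format may vary by Ollama version.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!ollama list"
]
},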
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Test Library Setup\n",
"\n",
"Next, let's create our test \"library.\"\n",
"\n",
"For simplicity's sake, let's say that our \"library\" is simply a **nested directory of `.epub` files**. We can easily see this solution generalizing to, say, a Calibre library with a `metadata.db` database file. We'll leave that extension as an exercise for the reader. 😇\n",
"\n",
"Let's pull two `.epub` files from [Project Gutenberg](https://www.gutenberg.org/) for our library."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!mkdir -p \"./test/library/jane-austen\"\n",
"!mkdir -p \"./test/library/victor-hugo\"\n",
"!wget https://www.gutenberg.org/ebooks/1342.epub.noimages -O \"./test/library/jane-austen/pride-and-prejudice.epub\"\n",
"!wget https://www.gutenberg.org/ebooks/135.epub.noimages -O \"./test/library/victor-hugo/les-miserables.epub\""
]
},
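{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, let's list the `.epub` files we just downloaded:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!find ./test/library -name \"*.epub\""
]
},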
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## RAG with LlamaIndex\n",
"\n",
"RAG with LlamaIndex, at its core, consists of the following broad phases:\n",
"\n",
"1. **Loading**, in which you tell LlamaIndex where your data lives and how to\n",
" load it;\n",
"2. **Indexing**, in which you augment your loaded data to facilitate querying, e.g. with vector embeddings;\n",
"3. **Querying**, in which you configure an LLM to act as the query interface for\n",
" your indexed data.\n",
"\n",
"This explanation only scratches at the surface of what's possible with\n",
"LlamaIndex. For more in-depth details, I highly recommend reading the\n",
"[\"High-Level Concepts\" page of the LlamaIndex\n",
"documentation](https://docs.llamaindex.ai/en/stable/getting_started/concepts.html)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Loading\n",
"\n",
"Naturally, let's start with the **loading** phase.\n",
"\n",
"I mentioned before that LlamaIndex is designed specifically for RAG. This\n",
"immediately becomes obvious from its\n",
"[`SimpleDirectoryReader`](https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader.html)\n",
"construct, which ✨ **magically** ✨ supports a whole host of multi-model file\n",
"types for free. Conveniently for us, `.epub` is in the supported set."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import SimpleDirectoryReader\n",
"\n",
"loader = SimpleDirectoryReader(\n",
" input_dir=\"./test/\",\n",
" recursive=True,\n",
" required_exts=[\".epub\"],\n",
")\n",
"\n",
"documents = loader.load_data()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`SimpleDirectoryReader.load_data()` converts our ebooks into a set of [`Document`s](https://docs.llamaindex.ai/en/stable/api/llama_index.core.schema.Document.html) for LlamaIndex to work with.\n",
"\n",
"One important thing to note here is that the documents **have not been chunked at this stage** -- that will happen during indexing. Read on..."
]
},
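{
"cell_type": "markdown",
"metadata": {},
"source": [
"To get a feel for what the loader produced, we can peek at the resulting `Document`s. (This inspection cell is purely illustrative and isn't needed for the rest of the notebook.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Quick sanity check on what the loader produced.\n",
"print(f\"Loaded {len(documents)} document(s)\")\n",
"\n",
"# Each Document carries metadata populated by SimpleDirectoryReader,\n",
"# e.g. the path of the source file it came from.\n",
"print(documents[0].metadata)"
]
},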
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indexing\n",
"\n",
"Next up after **loading** the data is to **index** it. This will allow our RAG pipeline to look up the relevant context for our query to pass to our LLM to **augment** their generated response. This is also where document chunking will take place.\n",
"\n",
"[`VectorStoreIndex`](https://docs.llamaindex.ai/en/stable/module_guides/indexing/vector_store_index.html)\n",
"is a \"default\" entrypoint for indexing in LlamaIndex. By default,\n",
"`VectorStoreIndex` uses a simple, in-memory dictionary to store the indices, but\n",
"LlamaIndex also supports [a wide variety of vector storage\n",
"solutions](https://docs.llamaindex.ai/en/stable/module_guides/storing/vector_stores.html)\n",
"for you to graduate to as you scale.\n",
"\n",
"<Tip> \n",
"By default, LlamaIndex uses a chunk size of 1024 and a chunk overlap of\n",
"20. For more details, see the [LlamaIndex\n",
"documentation](https://docs.llamaindex.ai/en/stable/optimizing/basic_strategies/basic_strategies.html#chunk-sizes).\n",
"</Tip>\n",
"\n",
"\n",
"Like mentioned before, we'll use the\n",
"[`BAAI/bge-small-en-v1.5`](https://huggingface.co/BAAI/bge-base-en-v1.5) to\n",
"generate our embeddings. By default, [LlamaIndex uses\n",
"OpenAI](https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html)\n",
"(specifically `gpt-3.5-turbo`), which we'd like to avoid given our desire for a lightweight, locally-runnable end-to-end solution.\n",
"\n",
"Thankfully, LlamaIndex supports retrieving embedding models from Hugging Face through the convenient `HuggingFaceEmbedding` class, so we'll use that here."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n",
"\n",
"embedding_model = HuggingFaceEmbedding(model_name=\"BAAI/bge-small-en-v1.5\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll pass that in to `VectorStoreIndex` as our embedding model to circumvent the OpenAI default behavior."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core import VectorStoreIndex\n",
"\n",
"index = VectorStoreIndex.from_documents(\n",
" documents,\n",
" embed_model=embedding_model,\n",
")"
]
},
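{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tip above mentioned LlamaIndex's default chunk size of 1024 and chunk overlap of 20. If you'd like to experiment with different chunking, one way is to pass a `SentenceSplitter` transformation when building the index. The values below are illustrative rather than tuned, and this cell is optional -- the rest of the notebook uses the index built above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.core.node_parser import SentenceSplitter\n",
"\n",
"# Optional: rebuild the index with a custom chunk size and overlap.\n",
"custom_chunk_index = VectorStoreIndex.from_documents(\n",
"    documents,\n",
"    embed_model=embedding_model,\n",
"    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=64)],\n",
")"
]
},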
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Querying\n",
"\n",
"Now for the final piece of the RAG puzzle -- wiring up the query layer.\n",
"\n",
"We'll use Llama 2 for the purposes of this recipe, but I encourage readers to play around with different models to see which produces the \"best\" responses here.\n",
"\n",
"First let's start up the Ollama server. Unfortunately, there is no support in the [Ollama Python client](https://github.com/ollama/ollama-python) for actually starting and stopping the server itself, so we'll have to pop out of Python land for this.\n",
"\n",
"In a separate terminal, run: `ollama serve`. Remember to terminate this after we're done here!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's hook Llama 2 up to LlamaIndex and use it as the basis of our query engine."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"from llama_index.llms.ollama import Ollama\n",
"\n",
"llama = Ollama(\n",
" model=\"llama2\",\n",
" request_timeout=40.0,\n",
")\n",
"\n",
"query_engine = index.as_query_engine(llm=llama)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Final Result\n",
"\n",
"With that, our basic RAG librarian is set up and we can start asking questions about our library. For example:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Based on the context provided, there are two books available:\n",
"\n",
"1. \"Pride and Prejudice\" by Jane Austen\n",
"2. \"Les Misérables\" by Victor Hugo\n",
"\n",
"The context used to derive this answer includes:\n",
"\n",
"* The file path for each book, which provides information about the location of the book files on the computer.\n",
"* The titles of the books, which are mentioned in the context as being available for reading.\n",
"* A list of words associated with each book, such as \"epub\" and \"notebooks\", which provide additional information about the format and storage location of each book.\n"
]
}
],
"source": [
"print(query_engine.query(\"What are the titles of all the books available? Show me the context used to derive your answer.\"))"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The main character of 'Pride and Prejudice' is Elizabeth Bennet.\n"
]
}
],
"source": [
"print(query_engine.query(\"Who is the main character of 'Pride and Prejudice'?\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion and Future Improvements\n",
"\n",
"We've demonstrated how to build a basic RAG-based \"librarian\" that runs entirely locally, even on Apple silicon Macs. In doing so, we've also carried out a \"grand tour\" of LlamaIndex and how it streamlines the process of setting up RAG-based applications.\n",
"\n",
"That said, we've really only scratched the surface of what's possible here. Here are some ideas of how to refine and build upon this foundation.\n",
"\n",
"### Forcing Citations\n",
"\n",
"To guard against the risk of our librarian hallucinating, how might we require that it provide citations for everything that it says?\n",
"\n",
"### Using Extended Metadata\n",
"\n",
"Ebook library management solutions like [Calibre](https://calibre-ebook.com/) create additional metadata for ebooks in a library. This can provide information such as publisher or edition that might not be readily available in the text of the book itself. How could we extend our RAG pipeline to account for additional sources of information that aren't `.epub` files?\n",
"\n",
"### Efficient Indexing\n",
"\n",
"If we were to collect everything we built here into a script/executable, the resulting script would re-index our library on each invocation. For our tiny test library of two files, this is \"fine,\" but for any library of non-trivial size this will very quickly become annoying for users. How could we persist the embedding indices and only update them when the contents of the library have meaningfully changed, e.g. new books have been added?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}