# 🤖 LLM Text Generation with Retrieval-Augmented Generation (RAG)

Welcome to this beginner-friendly guide! 🚀 In this project, you'll learn how to:
- Use **LLaMA**, a powerful large language model (LLM), for **text generation**.
- Retrieve relevant text snippets using **Sentence Transformers**, a popular tool for embedding text.
- Combine these techniques to answer questions using the retrieved snippets as context.

## 🛠️ Setup: Installing the Necessary Libraries

First, we need to install the Python libraries this guide relies on:
- **`transformers`**: Provides access to LLaMA and other pre-trained models for text generation. 📚
- **`sentence-transformers`**: Generates embeddings, which let us compare text snippets numerically. 🧠

Run this command to install everything you need (`transformers` and `torch` are pulled in as dependencies of `sentence-transformers`):

In [None]:
!pip install -Uq sentence-transformers
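
Once the install has finished, it can help to confirm which versions ended up in your environment. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper written for this guide, not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Package names used in this guide; adjust the list if your setup differs.
for name, ver in installed_versions(["transformers", "sentence-transformers", "torch"]).items():
    print(f"{name}: {ver or 'not installed'}")
```

If any entry prints `not installed`, re-run the install command before continuing.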

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from sentence_transformers import SentenceTransformer
import torch

## 🔍 Step 1: Model Setup and Tokenizer

In [None]:
checkpoint = "meta-llama/Llama-3.2-3B-Instruct"              # Gated model: needs approved access and a Hugging Face login
device = "cuda" if torch.cuda.is_available() else "cpu"      # Fall back to CPU when no GPU is available

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
llama_model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)
generator = pipeline("text-generation", model=llama_model, tokenizer=tokenizer, device=device)   # Create a simple text generation pipeline

## 📑 Step 2: Text Snippet Retrieval Setup

In [6]:
text_snippets = [
    "Fiona thanked Ethan for his unwavering support and promised to cherish their friendship.",
    "As they ventured deeper into the forest, they encountered a wide array of obstacles.",
    "Ethan and Fiona crossed treacherous ravines using rickety bridges, relying on each other's strength.",
    "Overwhelmed with joy, Fiona thanked Ethan and disappeared into the embrace of her family.",
    "Ethan returned to his cottage, heart full of memories and a smile brighter than ever before.",
]

## 🔄 Step 3: Convert text snippets to embeddings for later comparison.

In [None]:
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings_text_snippets = model.encode(text_snippets)                      # Generate embeddings for the text snippets

## 🧮 Step 4: Create a function to retrieve the closest matching snippet using cosine similarity.

In [8]:
def retrieve_snippet(query):
    """Return the text snippet whose embedding is most similar to the query."""
    query_embedded = model.encode([query])                                      # Encode the query to obtain its embedding
    similarities = model.similarity(embeddings_text_snippets, query_embedded)   # Cosine similarity of each snippet to the query (one row per snippet)
    best_index = similarities.argmax().item()                                   # Index of the snippet with the highest similarity
    return text_snippets[best_index]
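
To see what the retrieval step computes, here is a minimal pure-Python sketch of cosine similarity followed by an argmax. The 3-dimensional vectors are made up for illustration (real embeddings from `all-MiniLM-L6-v2` have 384 dimensions), and `cosine_similarity` and `closest_index` are hypothetical helper names, not library functions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_index(query_vec, candidate_vecs):
    """Index of the candidate vector most similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy "embeddings": assumptions for illustration only.
toy_snippet_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
toy_query_vec = [0.5, 0.9, 0.1]

best = closest_index(toy_query_vec, toy_snippet_vecs)
print(best)  # → 2 (the third vector points in nearly the same direction as the query)
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common choice for comparing sentence embeddings.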


## 💬 Step 5: Create a function to generate the answer based on the retrieved snippet and query.


In [9]:
# Use the retrieved snippet as context so that LLaMA answers from it; this is the core idea of RAG.

def ask_query(query):
    """Retrieve the best-matching snippet, then print the model's answer to the query."""
    retrieved_texts = retrieve_snippet(query)

    # Prepare the messages for the text generation pipeline.
    # Adjacent string literals are joined as-is, so each fragment ends with its own space.
    messages = [
        {"role": "system", "content": "You are a helpful AI assistant. "
                "Answer ONLY the following query based on the context provided below. "
                "Do not generate or answer any other questions. "
                "Do not make up or infer any information that is not directly stated in the context. "
                "Provide a concise answer.\n\n"
                f"Context: {retrieved_texts}"},
        {"role": "user", "content": query}
    ]

    # Generate a response; the last message of the returned conversation is the assistant's reply
    response = generator(messages, max_new_tokens=128)[-1]["generated_text"][-1]["content"]
    print(f"Query: \n\t{query}")
    print(f"Context: \n\t{retrieved_texts}")
    print(f"Answer: \n\t{response}")
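
Python joins adjacent string literals without adding whitespace, so a multi-line prompt needs explicit spaces and a separator before the context. The sketch below isolates the prompt-building step so you can inspect it without loading a model; `build_messages` is a hypothetical helper, and the shortened instructions and the `Context:` label are illustrative choices rather than required wording.

```python
def build_messages(context, query):
    """Build chat messages that place the retrieved context in the system prompt."""
    system_prompt = (
        "You are a helpful AI assistant. "
        "Answer ONLY the following query based on the context provided below. "
        "Provide a concise answer.\n\n"
        f"Context: {context}"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]


messages = build_messages(
    "Fiona thanked Ethan for his unwavering support and promised to cherish their friendship.",
    "Why did Fiona thank Ethan?",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Printing the messages before sending them to the model is a quick way to catch fragments that have run together.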

## 🗣️ Step 6: Ask a Question

In [10]:
query = "Why did Fiona thank Ethan?"
ask_query(query)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Query: 
	Why did Fiona thank Ethan?
Context: 
	Fiona thanked Ethan for his unwavering support and promised to cherish their friendship.
Answer: 
	Fiona thanked Ethan for his unwavering support.
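
The answer is pulled out of the pipeline's return value with `[-1]["generated_text"][-1]["content"]`. For chat-style input, the text-generation pipeline returns a list with one dictionary per generated sequence, and its `"generated_text"` entry holds the conversation with the new assistant message appended. The sketch below walks through that indexing on a hand-written stand-in; `mock_output` is an assumption built from the example run, not real model output, and its system prompt is abbreviated.

```python
# Hand-written stand-in for what `generator(messages, ...)` returns for chat input.
mock_output = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful AI assistant. (abbreviated)"},
            {"role": "user", "content": "Why did Fiona thank Ethan?"},
            {"role": "assistant", "content": "Fiona thanked Ethan for his unwavering support."},
        ]
    }
]

last_sequence = mock_output[-1]                  # Last (here: only) generated sequence
conversation = last_sequence["generated_text"]   # Full chat, including the new reply
answer = conversation[-1]["content"]             # Text of the assistant's message
print(answer)
```

Splitting the chained indexing into named steps like this makes it easier to debug when the pipeline returns something unexpected.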
