# Basic/Semantic Caching with Azure Cache for Redis
Azure Cache for Redis is a fully managed caching service from Microsoft. It provides high-speed data access to improve application performance and scalability. Beyond key-value caching, it supports Redis modules such as RediSearch, which enable search and vector-based semantic caching for AI-powered apps.

This guide will help you set up Azure Cache for Redis and implement basic and semantic caching in your application.

<br>

## 1. Pre-requisites
---

Before running this code, ensure you have the following:
- An Azure account with an active subscription.
- An Azure Cache for Redis instance with the **`RediSearch`** module enabled. We recommend Azure Managed Redis for this hands-on lab.

    ![azure-redis](images/azure-redis.png)

    - Semantic caching requires the **`RediSearch`** module. Refer to the [documentation](https://learn.microsoft.com/en-us/azure/redis/redis-modules#redisearch) for instructions on how to enable it.
    - When creating the Redis Cache instance, add RediSearch under modules in the Advanced tab.

    - ![redisearch](images/redisearch.png)



After provisioning the Redis Cache instance, you can get the endpoint and access key from the Azure portal, as shown in the following screenshots:

![redis-endpoint](images/redis-endpoint.png)
![redis-access-key](images/redis-access-key.png)
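
The endpoint and access key are later combined into a `rediss://` connection URL. The sketch below is one way to build that URL so that special characters in the key (such as `/`, `+`, or `=`) cannot break URL parsing. The helper name `build_redis_url` and the environment variable names are hypothetical, and the endpoint is assumed to be in `host:port` form.

```python
import os
from urllib.parse import quote


def build_redis_url(endpoint: str, access_key: str) -> str:
    """Build a TLS (rediss://) URL with a percent-encoded access key."""
    # safe="" makes quote() encode every reserved character, including "/".
    return f"rediss://:{quote(access_key, safe='')}@{endpoint}"


# Hypothetical environment variable names; the placeholders are used if unset.
endpoint = os.getenv("REDIS_ENDPOINT", "[YOUR-REDIS-ENDPOINT]")
access_key = os.getenv("REDIS_PASSWORD", "[YOUR-REDIS-ACCESS-KEY]")
example_url = build_redis_url(endpoint, access_key)
```

Reading the values from the environment also keeps the access key out of the notebook itself.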

<br>

## 2. Azure Cache for Redis example

----

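After setting up Azure Cache for Redis, you can use it to cache LLM responses in your application. The example below uses LangChain with Redis as either a standard (exact-match) cache or a semantic cache in Python, depending on the `USE_SEMATIC_CACHE` flag.
Install the `openai`, `langchain`, and `redis` packages if you haven't already. The cells below also import `langchain_openai`, `dotenv`, `pandas`, and `matplotlib`.
- Reference: https://learn.microsoft.com/en-us/azure/redis/tutorial-semantic-cache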
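
Before running the cells, it can help to confirm that the required modules are importable. The following is a minimal sketch using only the standard library. The list holds import names (not pip package names) taken from the cells in this guide, and the helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec

# Import names used by the cells in this guide (not the pip package names).
REQUIRED_MODULES = [
    "openai",
    "langchain",
    "langchain_openai",
    "redis",
    "redisvl",
    "dotenv",
    "pandas",
    "matplotlib",
]


def missing_modules(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```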

In [None]:
from dotenv import load_dotenv

load_dotenv(override=True)

In [None]:
import time
import redis
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
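# Note: newer LangChain releases serve these names from the langchain-community package
# (langchain_community.cache and langchain_community.callbacks).
from langchain.cache import RedisCache, RedisSemanticCache
from langchain.callbacks import get_openai_callback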
from langchain.globals import set_llm_cache
from langchain_core.prompts import load_prompt

llm = AzureChatOpenAI(
    model="gpt-4o-mini",
    max_retries=2,
    temperature=0.1,
    max_tokens=200,
)

# Set your Redis endpoint (host:port) and access key. In production, load these from a secure store instead of hard-coding them.
REDIS_ENDPOINT = "[YOUR-REDIS-ENDPOINT]"
REDIS_PASSWORD = "[YOUR-REDIS-ACCESS-KEY]"
REDIS_URL = "rediss://:" + REDIS_PASSWORD + "@" + REDIS_ENDPOINT
USE_SEMATIC_CACHE = False

# Set the cache to Redis
if USE_SEMATIC_CACHE:
    # Semantic cache
    embeddings = AzureOpenAIEmbeddings(
        model="text-embedding-3-large",
        chunk_size=1000,
    )
    semantic_cache = RedisSemanticCache(
        redis_url=REDIS_URL, embedding=embeddings, score_threshold=0.05
    )
    set_llm_cache(semantic_cache)
else:
    # Standard cache. You can set up a Redis cache with a TTL (time to live) for the cache entries.
    redis_client = redis.Redis.from_url(REDIS_URL)
    redis_cache = RedisCache(redis_client, ttl=30)  # 30 seconds TTL
    set_llm_cache(redis_cache)
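
To make the standard-cache behavior concrete: it only hits when the prompt and model configuration match exactly, and entries disappear once their TTL elapses. The sketch below imitates that behavior with an in-memory dictionary. It is not LangChain's implementation. The class `TTLCache` and its injectable `clock` parameter are hypothetical and exist only so that expiry can be shown without sleeping.

```python
import time


class TTLCache:
    """In-memory stand-in for an exact-match cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated
        self._entries = {}  # (prompt, llm_string) -> (expires_at, value)

    def lookup(self, prompt, llm_string):
        """Return the cached value, or None if it is absent or expired."""
        key = (prompt, llm_string)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # drop the expired entry
            return None
        return value

    def update(self, prompt, llm_string, value):
        """Store a value that expires ttl_seconds from now."""
        expires_at = self.clock() + self.ttl_seconds
        self._entries[(prompt, llm_string)] = (expires_at, value)


# Simulated clock: a one-element list that the lambda reads on every call.
now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
cache.update("What is an LLM?", "gpt-4o-mini", "A large language model ...")

print(cache.lookup("What is an LLM?", "gpt-4o-mini"))  # exact match: hit
print(cache.lookup("what is an llm?", "gpt-4o-mini"))  # different text: miss
now[0] = 31.0
print(cache.lookup("What is an LLM?", "gpt-4o-mini"))  # past the TTL: miss
```

The second lookup misses because only the casing differs. That is the gap a semantic cache is meant to close.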

In [None]:
def execute_with_timing(prompt):
    with get_openai_callback() as cb:
        start_time = time.perf_counter()
        result = llm.invoke(prompt)
        end_time = time.perf_counter()
        elapsed_time = end_time - start_time
    return result, elapsed_time, cb


# First call (not cached)
prompt = "Explain the concept of LLM in three sentences."
result1, time1, cb1 = execute_with_timing(prompt)
print("First call (not cached):")
print(f"{result1}\nTime: {time1:.2f} seconds\n")

# Second call (should be cached)
result2, time2, cb2 = execute_with_timing(prompt)
print("Second call (cached):")
print(f"{result2}\nTime: {time2:.2f} seconds\n")

print(f"Speed improvement: {time1 / time2:.2f}x faster")

### Checking cache hit/miss

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import redis.asyncio as aioredis  # aliased so it does not shadow the sync `redis` import above


async def get_redis_info(redis_url):
    r = await aioredis.from_url(redis_url)
    info = await r.info()
    # print(info)
    print("Redis Server Information:")
    print(f"Used Memory  : {info['used_memory_human']}")
    # Display the Redis info in a pandas DataFrame and plot it

    def convert_memory_to_bytes(memory_str):
        units = {"K": 1024, "M": 1024**2, "G": 1024**3}
        if memory_str[-1] in units:
            return float(memory_str[:-1]) * units[memory_str[-1]]
        return float(memory_str)

    redis_info = {
        "Metric": ["Cache Hits", "Cache Misses", "Evicted Keys", "Expired Keys"],
        "Value": [
            info["keyspace_hits"],
            info["keyspace_misses"],
            info["evicted_keys"],
            info["expired_keys"],
        ],
    }

    df_redis_info = pd.DataFrame(redis_info)
    df_redis_info.plot(kind="barh", x="Metric", y="Value", legend=False)

    plt.title("Redis Server Information")
    plt.xlabel("Value")
    plt.ylabel("Metric")
    plt.show()

    await r.aclose()


await get_redis_info(REDIS_URL)
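
The hit and miss counters plotted above can also be reduced to a single hit ratio. The helper below is a small sketch that takes a dictionary shaped like the one returned by `info()`. The function name `cache_hit_ratio` and the sample numbers are made up for illustration. Note that `keyspace_hits` and `keyspace_misses` are server-wide counters, so they include lookups from anything else using the same Redis instance.

```python
def cache_hit_ratio(info):
    """Return hits / (hits + misses), or 0.0 when there have been no lookups."""
    hits = int(info.get("keyspace_hits", 0))
    misses = int(info.get("keyspace_misses", 0))
    total = hits + misses
    return hits / total if total else 0.0


# Made-up sample values standing in for a real `info()` result.
sample_info = {"keyspace_hits": 3, "keyspace_misses": 1}
print(f"Hit ratio: {cache_hit_ratio(sample_info):.0%}")  # prints "Hit ratio: 75%"
```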

### Clear the cache

In [None]:
if not USE_SEMATIC_CACHE:
    redis_cache.clear()
    print("Cache cleared")

<br>

## 3. Implementing Semantic Caching with Redis
---
LangChain already provides a caching mechanism, so you do not need to implement your own. If you want finer control, however, you can build one yourself. Below is an example of a custom semantic cache on Redis using RedisVL.

### Setting up Redis for semantic cache

In [None]:
from redisvl.utils.vectorize import AzureOpenAITextVectorizer
from redisvl.extensions.llmcache import SemanticCache

vectorizer = AzureOpenAITextVectorizer(
    model="text-embedding-3-large",
)
## You can test the vectorizer by embedding a sample text
# embedding = vectorizer.embed("Hello, world!") # Test

# Initializing a SemanticCache instance
llmcache = SemanticCache(
    name="llmcache",  # The name of the search index in Redis
    vectorizer=vectorizer,  # The vectorizer to use for embedding
    redis_url=REDIS_URL,  # Connection URL for Redis
    distance_threshold=0.1,  # Maximum vector distance for a cache match (lower is stricter)
)

# Raising the threshold can be useful if you want to allow for a bit more variation in the queries that can hit the cache.
# llmcache.set_threshold(0.2)
print("=== Connected to Redis and initialized SemanticCache.===")
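
To build intuition for `distance_threshold`, the sketch below computes a cosine distance between toy two-dimensional vectors and compares it to a threshold. It assumes the index uses cosine distance, where 0 means the vectors point in the same direction. The vectors and the helper names `cosine_distance` and `is_cache_hit` are made up for illustration. Real embeddings have far more dimensions.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_cache_hit(query_vec, cached_vec, distance_threshold):
    """A cached entry matches when it is within the distance threshold."""
    return cosine_distance(query_vec, cached_vec) <= distance_threshold


cached = [1.0, 0.0]     # toy embedding of a stored prompt
rephrased = [0.99, 0.1]  # toy embedding of a similar prompt
unrelated = [0.0, 1.0]   # toy embedding of a different topic

print(is_cache_hit(rephrased, cached, distance_threshold=0.1))  # True
print(is_cache_hit(unrelated, cached, distance_threshold=0.1))  # False
```

A larger threshold accepts more loosely related prompts, at the cost of returning cached answers that may not fit the new question.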

### Invoking the LLM

Semantic caching can return multiple matching results for a given query. The `num_results` argument below caps how many are returned.

In [None]:
def invoke_llm_with_cache(llm, query, num_results=2, distance_threshold=0.2):
    """
    Invoke the LLM with semantic caching.

    On a cache hit, return a list of up to `num_results` cached responses.
    On a cache miss, invoke the LLM, store the response, and return it as a string.
    """
    results = llmcache.check(
        prompt=query,
        num_results=num_results,
        distance_threshold=distance_threshold,
        return_fields=["prompt", "response"],
    )
    if results:
        print("=== Cache Hit ===")
        return [result["response"] for result in results]
    else:
        response = llm.invoke(query)
        response_text = response.content
        llmcache.store(prompt=query, response=response_text)
        print("=== Cache Miss. Set cache ===")
        return response_text
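
Note that the function above returns a list on a cache hit and a plain string on a cache miss. If downstream code prefers a single shape, a small wrapper can normalize the result. The helper name `as_response_list` is hypothetical.

```python
def as_response_list(result):
    """Wrap a single response string in a list; copy lists through unchanged."""
    if isinstance(result, str):
        return [result]
    return list(result)


print(as_response_list("one answer"))          # ['one answer']
print(as_response_list(["first", "second"]))   # ['first', 'second']
```

With this in place, callers can always iterate over the result, whether it came from the cache or from a fresh LLM call.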

### Invoke (not cached)

In [None]:
print(
    invoke_llm_with_cache(llm, "What is Machine Learning? Please explain in 100 words.")
)
print(
    invoke_llm_with_cache(
        llm,
        "What is the main difference between GenAI and Machine learning? Please explain in 100 words.",
    )
)
print(
    invoke_llm_with_cache(
        llm, "What is Microsoft's main products? Please explain in 200 words."
    )
)

### Invoke (Cached)

In [None]:
print(
    invoke_llm_with_cache(llm, "What is Machine Learning? Please explain in 100 words.")
)
print(
    invoke_llm_with_cache(
        llm,
        "What is the main difference between GenAI and Machine learning? Please explain in 100 words.",
    )
)
print(
    invoke_llm_with_cache(
        llm, "What is Microsoft's main products? Please explain in 200 words."
    )
)