{
"cells": [
{
"cell_type": "markdown",
"id": "7186eae3",
"metadata": {},
"source": [
"# Basic/Semantic Caching with Azure Cache for Redis\n",
"Azure Cache for Redis is a fully managed caching service by Microsoft. It provides high-speed data access to improve application performance and scalability. It supports key-value caching and advanced features like Redis Stack for search and vector-based semantic caching, enhancing AI-powered apps.\n",
"\n",
"This guide will help you set up Azure Cache for Redis and implement basic and semantic caching in your application."
]
},
{
"cell_type": "markdown",
"id": "5463cc7d",
"metadata": {},
"source": [
"<br>\n",
"\n",
"## 1. Pre-requisites\n",
"---\n",
"\n",
"Before running this code, ensure you have the following:\n",
"- An Azure account with an active subscription.\n",
"- An Azure Cache for Redis instance with **`RediSearch`** module is enabled. We recommend Azure Managed Redis for this hands-on lab.\n",
" \n",
" - Please note that you need to enable the RediSearch module in your Azure Cache for Redis instance in order to use Semantic Cache. Please refer to the [documentation](https://learn.microsoft.com/en-us/azure/redis/redis-modules#redisearch) for instructions on how to enable the **`RediSearch`** module.\n",
" - Please make sure to add RediSearch to modules in Advanced tab for creating the Redis Cache instance.\n",
"\n",
" - \n",
"\n",
"\n",
"\n",
"After provisioning the Redis Cache instance, you can get the endpoint and access key from the Azure portal. Please check the following screenshot for reference:\n",
"\n",
"\n",
""
]
},
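{
"cell_type": "markdown",
"id": "a1f09b3e",
"metadata": {},
"source": [
"Optionally, verify connectivity before proceeding. The cell below is a minimal sketch: the hostname, port, and access key are placeholders you must replace, and it assumes a TLS connection (Azure Managed Redis typically listens on port 10000, while classic Azure Cache for Redis uses 6380). It also lists the loaded modules so you can confirm RediSearch (`search`) is available."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b2c4d8e1",
"metadata": {},
"outputs": [],
"source": [
"import redis\n",
"\n",
"# Placeholder values -- replace with your own endpoint and access key\n",
"REDIS_HOST = \"[YOUR-REDIS-HOSTNAME]\"\n",
"REDIS_PORT = 10000  # 6380 for classic Azure Cache for Redis\n",
"REDIS_KEY = \"[YOUR-REDIS-ACCESS-KEY]\"\n",
"\n",
"r = redis.Redis(host=REDIS_HOST, port=REDIS_PORT, password=REDIS_KEY, ssl=True)\n",
"print(\"PING ->\", r.ping())  # True means the connection works\n",
"\n",
"# List loaded modules to confirm RediSearch (\"search\") is present\n",
"for module in r.module_list():\n",
"    print(module)"
]
},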
{
"cell_type": "markdown",
"id": "b6dcf46a",
"metadata": {},
"source": [
"<br>\n",
"\n",
"## 2. Azure Cache for Redis example\n",
"\n",
"----\n",
"\n",
"After setting up the Azure Cache for Redis, you can use it to cache data in your application. Below is an example of how to use Redis as a semantic cache in Python.\n",
"You need to install `openai`, `langchain` and `redis` packages if you haven't already:\n",
"- Reference: https://learn.microsoft.com/en-us/azure/redis/tutorial-semantic-cache"
]
},
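{
"cell_type": "code",
"execution_count": null,
"id": "c3d5e9f2",
"metadata": {},
"outputs": [],
"source": [
"# Optional: install the packages this notebook imports (versions are not pinned; adjust as needed)\n",
"%pip install -q python-dotenv redis redisvl langchain langchain-community langchain-openai pandas matplotlib"
]
},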
{
"cell_type": "code",
"execution_count": null,
"id": "3995f1cc",
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv(override=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a73d9ef9",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"import redis\n",
"from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings\n",
"from langchain.cache import RedisCache, RedisSemanticCache\n",
"from langchain.callbacks import get_openai_callback\n",
"from langchain.globals import set_llm_cache\n",
"from langchain_core.prompts import load_prompt\n",
"\n",
"llm = AzureChatOpenAI(\n",
" model=\"gpt-4o-mini\",\n",
" max_retries=2,\n",
" temperature=0.1,\n",
" max_tokens=200,\n",
")\n",
"\n",
"# Please set your Redis endpoint and access key. It is recommended to use a secure method to store and access these credentials in production.\n",
"REDIS_ENDPOINT = \"[YOUR-REDIS-ENDPOINT]\"\n",
"REDIS_PASSWORD = \"[YOUR-REDIS-ACCESS-KEY]\"\n",
"REDIS_URL = \"rediss://:\" + REDIS_PASSWORD + \"@\" + REDIS_ENDPOINT\n",
"USE_SEMATIC_CACHE = False\n",
"\n",
"# Set the cache to Redis\n",
"if USE_SEMATIC_CACHE:\n",
" # Semantic cache\n",
" embeddings = AzureOpenAIEmbeddings(\n",
" model=\"text-embedding-3-large\",\n",
" chunk_size=1000,\n",
" )\n",
" semantic_cache = RedisSemanticCache(\n",
" redis_url=REDIS_URL, embedding=embeddings, score_threshold=0.05\n",
" )\n",
" set_llm_cache(semantic_cache)\n",
"else:\n",
" # Standard cache. You can setup a Redis cache with a TTL (time to live) for the cache entries.\n",
" redis_client = redis.Redis.from_url(REDIS_URL)\n",
" redis_cache = RedisCache(redis_client, ttl=30) # 30 seconds TTL\n",
" set_llm_cache(redis_cache)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a476a508",
"metadata": {},
"outputs": [],
"source": [
"def execute_with_timing(prompt):\n",
" with get_openai_callback() as cb:\n",
" start_time = time.perf_counter()\n",
" result = llm.invoke(prompt)\n",
" end_time = time.perf_counter()\n",
" elapsed_time = end_time - start_time\n",
" return result, elapsed_time, cb\n",
"\n",
"\n",
"# First call (not cached)\n",
"prompt = \"Explain the concept of LLM in three sentences.\"\n",
"result1, time1, cb1 = execute_with_timing(prompt)\n",
"print(f\"First call (not cached):\")\n",
"print(f\"{result1}\\nTime: {time1:.2f} seconds\\n\")\n",
"\n",
"# Second call (should be cached)\n",
"result2, time2, cb2 = execute_with_timing(prompt)\n",
"print(f\"Second call (cached):\")\n",
"print(f\"{result2}\\nTime: {time2:.2f} seconds\\n\")\n",
"\n",
"print(f\"Speed improvement: {time1 / time2:.2f}x faster\")"
]
},
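{
"cell_type": "markdown",
"id": "f6a8b2c5",
"metadata": {},
"source": [
"The `get_openai_callback` context manager also tracks token usage. A cache hit returns the stored response without calling the API, so the second call should report zero tokens, which is a quick way to confirm the hit:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0a1b2c3d",
"metadata": {},
"outputs": [],
"source": [
"# Token usage captured by the callbacks above; a cache hit should report zero tokens\n",
"print(\"First call :\", cb1)\n",
"print(\"Second call:\", cb2)"
]
},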
{
"cell_type": "markdown",
"id": "5d72dd3a",
"metadata": {},
"source": [
"### Checking cache hit/miss"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "994c3363",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib as mpl\n",
"import redis.asyncio as redis\n",
"\n",
"\n",
"async def get_redis_info(redis_url):\n",
" r = await redis.from_url(redis_url)\n",
" info = await r.info()\n",
" # print(info)\n",
" print(\"Redis Server Information:\")\n",
" print(f\"Used Memory : {info['used_memory_human']}\")\n",
" # Display the Redis info in a pandas DataFrame and plot it\n",
"\n",
" def convert_memory_to_bytes(memory_str):\n",
" units = {\"K\": 1024, \"M\": 1024**2, \"G\": 1024**3}\n",
" if memory_str[-1] in units:\n",
" return float(memory_str[:-1]) * units[memory_str[-1]]\n",
" return float(memory_str)\n",
"\n",
" redis_info = {\n",
" \"Metric\": [\"Cache Hits\", \"Cache Misses\", \"Evicted Keys\", \"Expired Keys\"],\n",
" \"Value\": [\n",
" info[\"keyspace_hits\"],\n",
" info[\"keyspace_misses\"],\n",
" info[\"evicted_keys\"],\n",
" info[\"expired_keys\"],\n",
" ],\n",
" }\n",
"\n",
" df_redis_info = pd.DataFrame(redis_info)\n",
" df_redis_info.plot(kind=\"barh\", x=\"Metric\", y=\"Value\", legend=False)\n",
"\n",
" plt.title(\"Redis Server Information\")\n",
" plt.xlabel(\"Value\")\n",
" plt.ylabel(\"Metric\")\n",
" plt.show()\n",
"\n",
" await r.aclose()\n",
"\n",
"\n",
"await get_redis_info(REDIS_URL)"
]
},
{
"cell_type": "markdown",
"id": "ea9d4b36",
"metadata": {},
"source": [
"### Clear the cache"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6fbb7759",
"metadata": {},
"outputs": [],
"source": [
"if not USE_SEMATIC_CACHE:\n",
" redis_cache.clear()\n",
" print(\"Cache cleared\")"
]
},
{
"cell_type": "markdown",
"id": "8d56f51d",
"metadata": {},
"source": [
"<br>\n",
"\n",
"## 3. Implementing Semantic Caching with Redis\n",
"---\n",
"You do not need to implement your own caching mechanism since LangChain already provides a caching mechanism. But you can implement your own caching mechanism if you want to. Below is an example of how to implement a custom caching mechanism using Redis."
]
},
{
"cell_type": "markdown",
"id": "e9b627f5",
"metadata": {},
"source": [
"### Setting up Redis for semantic cache"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1db9faa7",
"metadata": {},
"outputs": [],
"source": [
"from redisvl.utils.vectorize import AzureOpenAITextVectorizer\n",
"from redisvl.extensions.llmcache import SemanticCache\n",
"\n",
"vectorizer = AzureOpenAITextVectorizer(\n",
" model=\"text-embedding-3-large\",\n",
")\n",
"## You can test the vectorizer by embedding a sample text\n",
"# embedding = vectorizer.embed(\"Hello, world!\") # Test\n",
"\n",
"# Initializing a SemanticCache instance\n",
"llmcache = SemanticCache(\n",
" name=\"llmcache\", # The name of the search index in Redis\n",
" vectorizer=vectorizer, # The vectorizer to use for embedding\n",
" redis_url=REDIS_URL, # Connection URL for Redis\n",
" distance_threshold=0.1, # Similarity threshold for cache matching\n",
")\n",
"\n",
"## This can be useful if you want to allow for a bit more variation in the queries that can hit the cache.\n",
"# llmcache.set_threshold(0.2)\n",
"print(\"=== Connected to Redis and initialized SemanticCache.===\")"
]
},
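{
"cell_type": "markdown",
"id": "1b2c3d4e",
"metadata": {},
"source": [
"Before wrapping it in a helper, it helps to exercise the raw `SemanticCache` API directly: `store()` writes a prompt/response pair, and `check()` runs a vector similarity search over the cached prompts. A minimal sketch (the prompt and response strings below are illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2c3d4e5f",
"metadata": {},
"outputs": [],
"source": [
"# Store one illustrative prompt/response pair, then look it up with a paraphrase\n",
"llmcache.store(\n",
"    prompt=\"What is the capital of France?\",\n",
"    response=\"The capital of France is Paris.\",\n",
")\n",
"\n",
"# A semantically similar (not identical) prompt can still hit the cache\n",
"hits = llmcache.check(prompt=\"Tell me the capital city of France.\", num_results=1)\n",
"for hit in hits:\n",
"    print(hit[\"prompt\"], \"->\", hit[\"response\"])"
]
},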
{
"cell_type": "markdown",
"id": "6ec75cc6",
"metadata": {},
"source": [
"### Invoking the LLM\n",
"\n",
"Semantic caching allows to return multiple matching results for a given query. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dd7b3483",
"metadata": {},
"outputs": [],
"source": [
"def invoke_llm_with_cache(llm, query, num_results=2, distance_threshold=0.2):\n",
" \"\"\"\n",
" Invoke the LLM with caching.\n",
" If the response is found in the cache, it will be returned.\n",
" If not, the LLM will be invoked, and the response will be cached.\n",
" \"\"\"\n",
" results = llmcache.check(\n",
" prompt=query,\n",
" num_results=num_results,\n",
" distance_threshold=distance_threshold,\n",
" return_fields=[\"prompt\", \"response\"],\n",
" )\n",
" if results:\n",
" print(\"=== Cache Hit ===\")\n",
" return [result[\"response\"] for result in results]\n",
" else:\n",
" response = llm.invoke(query)\n",
" response_text = response.content\n",
" llmcache.store(prompt=query, response=response_text)\n",
" print(\"=== Cache Miss. Set cache ===\")\n",
" return response_text"
]
},
{
"cell_type": "markdown",
"id": "231f40b4",
"metadata": {},
"source": [
"### Invoke (not cached)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "655ca3e2",
"metadata": {},
"outputs": [],
"source": [
"print(\n",
" invoke_llm_with_cache(llm, \"What is Machine Learning? Please explain in 100 words.\")\n",
")\n",
"print(\n",
" invoke_llm_with_cache(\n",
" llm,\n",
" \"What is the main difference between GenAI and Machine learning? Please explain in 100 words.\",\n",
" )\n",
")\n",
"print(\n",
" invoke_llm_with_cache(\n",
" llm, \"What is Microsoft's main products? Please explain in 200 words.\"\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f62c3141",
"metadata": {},
"source": [
"### Invoke (Cached)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b3fccc45",
"metadata": {},
"outputs": [],
"source": [
"print(\n",
" invoke_llm_with_cache(llm, \"What is Machine Learning? Please explain in 100 words.\")\n",
")\n",
"print(\n",
" invoke_llm_with_cache(\n",
" llm,\n",
" \"What is the main difference between GenAI and Machine learning? Please explain in 100 words.\",\n",
" )\n",
")\n",
"print(\n",
" invoke_llm_with_cache(\n",
" llm, \"What is Microsoft's main products? Please explain in 200 words.\"\n",
" )\n",
")"
]
}
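,
{
"cell_type": "markdown",
"id": "3d4e5f6a",
"metadata": {},
"source": [
"### Clear the semantic cache\n",
"\n",
"Finally, clean up the cached entries. In `redisvl`, `clear()` removes the cached entries while keeping the search index; `delete()` would drop the index as well."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4e5f6a7b",
"metadata": {},
"outputs": [],
"source": [
"# Remove all cached entries but keep the underlying search index\n",
"llmcache.clear()\n",
"print(\"Semantic cache cleared\")"
]
}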
],
"metadata": {
"kernelspec": {
"display_name": "py312-dev",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.2"
}
},
"nbformat": 4,
"nbformat_minor": 5
}