3_optimization-design-ptn/02_caching/01_caching-azure-redis.ipynb

{ "cells": [ { "cell_type": "markdown", "id": "7186eae3", "metadata": {}, "source": [ "# Basic/Semantic Caching with Azure Cache for Redis\n", "Azure Cache for Redis is a fully managed caching service by Microsoft. It provides high-speed data access to improve application performance and scalability. It supports key-value caching and advanced features like Redis Stack for search and vector-based semantic caching, enhancing AI-powered apps.\n", "\n", "This guide will help you set up Azure Cache for Redis and implement basic and semantic caching in your application." ] }, { "cell_type": "markdown", "id": "5463cc7d", "metadata": {}, "source": [ "<br>\n", "\n", "## 1. Pre-requisites\n", "---\n", "\n", "Before running this code, ensure you have the following:\n", "- An Azure account with an active subscription.\n", "- An Azure Cache for Redis instance with **`RediSearch`** module is enabled. We recommend Azure Managed Redis for this hands-on lab.\n", "![azure-redis](images/azure-redis.png) \n", " - Please note that you need to enable the RediSearch module in your Azure Cache for Redis instance in order to use Semantic Cache. Please refer to the [documentation](https://learn.microsoft.com/en-us/azure/redis/redis-modules#redisearch) for instructions on how to enable the **`RediSearch`** module.\n", " - Please make sure to add RediSearch to modules in Advanced tab for creating the Redis Cache instance.\n", "\n", " - ![redisearch](images/redisearch.png)\n", "\n", "\n", "\n", "After provisioning the Redis Cache instance, you can get the endpoint and access key from the Azure portal. Please check the following screenshot for reference:\n", "\n", "![redis-endpoint](images/redis-endpoint.png)\n", "![redis-access-key](images/redis-access-key.png)" ] }, { "cell_type": "markdown", "id": "b6dcf46a", "metadata": {}, "source": [ "<br>\n", "\n", "## 2. Azure Cache for Redis example\n", "\n", "----\n", "\n", "After setting up the Azure Cache for Redis, you can use it to cache data in your application. Below is an example of how to use Redis as a semantic cache in Python.\n", "You need to install `openai`, `langchain` and `redis` packages if you haven't already:\n", "- Reference: https://learn.microsoft.com/en-us/azure/redis/tutorial-semantic-cache" ] }, { "cell_type": "code", "execution_count": null, "id": "3995f1cc", "metadata": {}, "outputs": [], "source": [ "from dotenv import load_dotenv\n", "\n", "load_dotenv(override=True)" ] }, { "cell_type": "code", "execution_count": null, "id": "a73d9ef9", "metadata": {}, "outputs": [], "source": [ "import time\n", "import redis\n", "from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings\n", "from langchain.cache import RedisCache, RedisSemanticCache\n", "from langchain.callbacks import get_openai_callback\n", "from langchain.globals import set_llm_cache\n", "from langchain_core.prompts import load_prompt\n", "\n", "llm = AzureChatOpenAI(\n", " model=\"gpt-4o-mini\",\n", " max_retries=2,\n", " temperature=0.1,\n", " max_tokens=200,\n", ")\n", "\n", "# Please set your Redis endpoint and access key. 
{ "cell_type": "code", "execution_count": null, "id": "3995f1cc", "metadata": {}, "outputs": [], "source": [ "from dotenv import load_dotenv\n", "\n", "load_dotenv(override=True)" ] },
{ "cell_type": "code", "execution_count": null, "id": "a73d9ef9", "metadata": {}, "outputs": [], "source": [ "import time\n", "import redis\n", "from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings\n", "from langchain.cache import RedisCache, RedisSemanticCache\n", "from langchain.callbacks import get_openai_callback\n", "from langchain.globals import set_llm_cache\n", "\n", "llm = AzureChatOpenAI(\n", "    model=\"gpt-4o-mini\",\n", "    max_retries=2,\n", "    temperature=0.1,\n", "    max_tokens=200,\n", ")\n", "\n", "# Please set your Redis endpoint and access key.\n", "# It is recommended to use a secure method to store and access these credentials in production.\n", "REDIS_ENDPOINT = \"[YOUR-REDIS-ENDPOINT]\"\n", "REDIS_PASSWORD = \"[YOUR-REDIS-ACCESS-KEY]\"\n", "REDIS_URL = \"rediss://:\" + REDIS_PASSWORD + \"@\" + REDIS_ENDPOINT\n", "USE_SEMANTIC_CACHE = False\n", "\n", "# Set the LLM cache to Redis\n", "if USE_SEMANTIC_CACHE:\n", "    # Semantic cache\n", "    embeddings = AzureOpenAIEmbeddings(\n", "        model=\"text-embedding-3-large\",\n", "        chunk_size=1000,\n", "    )\n", "    semantic_cache = RedisSemanticCache(\n", "        redis_url=REDIS_URL, embedding=embeddings, score_threshold=0.05\n", "    )\n", "    set_llm_cache(semantic_cache)\n", "else:\n", "    # Standard cache. You can set up a Redis cache with a TTL (time to live) for the cache entries.\n", "    redis_client = redis.Redis.from_url(REDIS_URL)\n", "    redis_cache = RedisCache(redis_client, ttl=30)  # 30 seconds TTL\n", "    set_llm_cache(redis_cache)" ] },
{ "cell_type": "code", "execution_count": null, "id": "a476a508", "metadata": {}, "outputs": [], "source": [ "def execute_with_timing(prompt):\n", "    with get_openai_callback() as cb:\n", "        start_time = time.perf_counter()\n", "        result = llm.invoke(prompt)\n", "        end_time = time.perf_counter()\n", "    elapsed_time = end_time - start_time\n", "    return result, elapsed_time, cb\n", "\n", "\n", "# First call (not cached)\n", "prompt = \"Explain the concept of LLM in three sentences.\"\n", "result1, time1, cb1 = execute_with_timing(prompt)\n", "print(\"First call (not cached):\")\n", "print(f\"{result1}\\nTime: {time1:.2f} seconds\\n\")\n", "\n", "# Second call (should be cached)\n", "result2, time2, cb2 = execute_with_timing(prompt)\n", "print(\"Second call (cached):\")\n", "print(f\"{result2}\\nTime: {time2:.2f} seconds\\n\")\n", "\n", "print(f\"Speed improvement: {time1 / time2:.2f}x faster\")" ] },
{ "cell_type": "markdown", "id": "5d72dd3a", "metadata": {}, "source": [ "### Checking cache hit/miss" ] },
{ "cell_type": "code", "execution_count": null, "id": "994c3363", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import redis.asyncio as aioredis  # aliased to avoid shadowing the sync redis module imported earlier\n", "\n", "\n", "async def get_redis_info(redis_url):\n", "    r = await aioredis.from_url(redis_url)\n", "    info = await r.info()\n", "    # print(info)\n", "    print(\"Redis Server Information:\")\n", "    print(f\"Used Memory : {info['used_memory_human']}\")\n", "\n", "    def convert_memory_to_bytes(memory_str):\n", "        units = {\"K\": 1024, \"M\": 1024**2, \"G\": 1024**3}\n", "        if memory_str[-1] in units:\n", "            return float(memory_str[:-1]) * units[memory_str[-1]]\n", "        return float(memory_str)\n", "\n", "    # Display the hit/miss counters in a pandas DataFrame and plot them\n", "    redis_info = {\n", "        \"Metric\": [\"Cache Hits\", \"Cache Misses\", \"Evicted Keys\", \"Expired Keys\"],\n", "        \"Value\": [\n", "            info[\"keyspace_hits\"],\n", "            info[\"keyspace_misses\"],\n", "            info[\"evicted_keys\"],\n", "            info[\"expired_keys\"],\n", "        ],\n", "    }\n", "\n", "    df_redis_info = pd.DataFrame(redis_info)\n", "    df_redis_info.plot(kind=\"barh\", x=\"Metric\", y=\"Value\", legend=False)\n", "\n", "    plt.title(\"Redis Server Information\")\n", "    plt.xlabel(\"Value\")\n", "    plt.ylabel(\"Metric\")\n", "    plt.show()\n", "\n", "    await r.aclose()\n", "\n", "\n", "await get_redis_info(REDIS_URL)" ] },
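{ "cell_type": "markdown", "id": "c0ffee03", "metadata": {}, "source": [ "If you want to see what the cache actually stores, the sketch below scans a handful of keys and prints their remaining TTLs. It assumes the standard-cache path from the setup cell above (it reuses `redis_client`); with the semantic cache enabled, entries live in a RediSearch index instead." ] },
{ "cell_type": "code", "execution_count": null, "id": "c0ffee04", "metadata": {}, "outputs": [], "source": [ "# Minimal sketch: peek at the keys currently stored in Redis and their TTLs.\n", "# A TTL of -1 means the key has no expiry; -2 means the key no longer exists.\n", "if not USE_SEMANTIC_CACHE:\n", "    for i, key in enumerate(redis_client.scan_iter(count=100)):\n", "        print(f\"{key!r} (TTL: {redis_client.ttl(key)}s)\")\n", "        if i >= 9:  # only show the first 10 keys\n", "            break" ] },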
[ "if not USE_SEMATIC_CACHE:\n", " redis_cache.clear()\n", " print(\"Cache cleared\")" ] }, { "cell_type": "markdown", "id": "8d56f51d", "metadata": {}, "source": [ "<br>\n", "\n", "## 3. Implementing Semantic Caching with Redis\n", "---\n", "You do not need to implement your own caching mechanism since LangChain already provides a caching mechanism. But you can implement your own caching mechanism if you want to. Below is an example of how to implement a custom caching mechanism using Redis." ] }, { "cell_type": "markdown", "id": "e9b627f5", "metadata": {}, "source": [ "### Setting up Redis for semantic cache" ] }, { "cell_type": "code", "execution_count": null, "id": "1db9faa7", "metadata": {}, "outputs": [], "source": [ "from redisvl.utils.vectorize import AzureOpenAITextVectorizer\n", "from redisvl.extensions.llmcache import SemanticCache\n", "\n", "vectorizer = AzureOpenAITextVectorizer(\n", " model=\"text-embedding-3-large\",\n", ")\n", "## You can test the vectorizer by embedding a sample text\n", "# embedding = vectorizer.embed(\"Hello, world!\") # Test\n", "\n", "# Initializing a SemanticCache instance\n", "llmcache = SemanticCache(\n", " name=\"llmcache\", # The name of the search index in Redis\n", " vectorizer=vectorizer, # The vectorizer to use for embedding\n", " redis_url=REDIS_URL, # Connection URL for Redis\n", " distance_threshold=0.1, # Similarity threshold for cache matching\n", ")\n", "\n", "## This can be useful if you want to allow for a bit more variation in the queries that can hit the cache.\n", "# llmcache.set_threshold(0.2)\n", "print(\"=== Connected to Redis and initialized SemanticCache.===\")" ] }, { "cell_type": "markdown", "id": "6ec75cc6", "metadata": {}, "source": [ "### Invoking the LLM\n", "\n", "Semantic caching allows to return multiple matching results for a given query. " ] }, { "cell_type": "code", "execution_count": null, "id": "dd7b3483", "metadata": {}, "outputs": [], "source": [ "def invoke_llm_with_cache(llm, query, num_results=2, distance_threshold=0.2):\n", " \"\"\"\n", " Invoke the LLM with caching.\n", " If the response is found in the cache, it will be returned.\n", " If not, the LLM will be invoked, and the response will be cached.\n", " \"\"\"\n", " results = llmcache.check(\n", " prompt=query,\n", " num_results=num_results,\n", " distance_threshold=distance_threshold,\n", " return_fields=[\"prompt\", \"response\"],\n", " )\n", " if results:\n", " print(\"=== Cache Hit ===\")\n", " return [result[\"response\"] for result in results]\n", " else:\n", " response = llm.invoke(query)\n", " response_text = response.content\n", " llmcache.store(prompt=query, response=response_text)\n", " print(\"=== Cache Miss. Set cache ===\")\n", " return response_text" ] }, { "cell_type": "markdown", "id": "231f40b4", "metadata": {}, "source": [ "### Invoke (not cached)" ] }, { "cell_type": "code", "execution_count": null, "id": "655ca3e2", "metadata": {}, "outputs": [], "source": [ "print(\n", " invoke_llm_with_cache(llm, \"What is Machine Learning? Please explain in 100 words.\")\n", ")\n", "print(\n", " invoke_llm_with_cache(\n", " llm,\n", " \"What is the main difference between GenAI and Machine learning? Please explain in 100 words.\",\n", " )\n", ")\n", "print(\n", " invoke_llm_with_cache(\n", " llm, \"What is Microsoft's main products? 
{ "cell_type": "markdown", "id": "f62c3141", "metadata": {}, "source": [ "### Invoke (cached)" ] },
{ "cell_type": "code", "execution_count": null, "id": "b3fccc45", "metadata": {}, "outputs": [], "source": [ "print(\n", "    invoke_llm_with_cache(llm, \"What is Machine Learning? Please explain in 100 words.\")\n", ")\n", "print(\n", "    invoke_llm_with_cache(\n", "        llm,\n", "        \"What is the main difference between GenAI and Machine learning? Please explain in 100 words.\",\n", "    )\n", ")\n", "print(\n", "    invoke_llm_with_cache(\n", "        llm, \"What are Microsoft's main products? Please explain in 200 words.\"\n", "    )\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "py312-dev", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.2" } }, "nbformat": 4, "nbformat_minor": 5 }