gemini/long-context/intro_long_context.ipynb

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "bCIMTPB1WoTq" }, "outputs": [], "source": [ "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "7yVV6txOmNMn" }, "source": [ "# Introduction to Long Context Window with Gemini on Vertex AI\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Flong-context%2Fintro_long_context.ipynb\">\n", " <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n", " </a>\n", " </td> \n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/long-context/intro_long_context.ipynb\">\n", " <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Workbench\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://goo.gle/4jj1diI\">\n", " <img width=\"32px\" src=\"https://cdn.qwiklabs.com/assets/gcp_cloud-e3a77215f0b8bfa9b3f611c0d2208c7e8708ed31.svg\" alt=\"Google Cloud logo\"><br> Open in Cloud Skills Boost\n", " </a>\n", " </td>\n", "</table>\n", "\n", "<div style=\"clear: both;\"></div>\n", "\n", "<b>Share to:</b>\n", "\n", "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n", "</a>\n", "\n", "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" 
src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n", "</a>\n", "\n", "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n", "</a>\n", "\n", "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n", "</a>\n", "\n", "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/long-context/intro_long_context.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n", "</a> \n" ] }, { "cell_type": "markdown", "metadata": { "id": "1EExYZvij2ve" }, "source": [ "| | |\n", "|-|-|\n", "|Author(s) | [Holt Skinner](https://github.com/holtskinner) |" ] }, { "cell_type": "markdown", "metadata": { "id": "t1DnOs6rkbOy" }, "source": [ "## Overview\n", "\n", "Gemini 2.0 Flash comes standard with a 1 million token context window, and Gemini 2.0 comes with a 2 million token context window. Historically, large language models (LLMs) were significantly limited by the amount of text (or tokens) that could be passed to the model at one time. The Gemini long context window, with [near-perfect retrieval (>99%)](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf), unlocks many new use cases and developer paradigms.\n", "\n", "In practice, 1 million tokens would look like:\n", "\n", "- 50,000 lines of code (with the standard 80 characters per line)\n", "- All the text messages you have sent in the last 5 years\n", "- 8 average length English novels\n", "- Transcripts of over 200 average length podcast episodes\n", "- 1 hour of video\n", "- ~45 minutes of video with audio\n", "- 9.5 hours of audio\n", "\n", "While the standard use case for most generative models is still text input, the Gemini model family enables a new paradigm of multimodal use cases. These models can natively understand text, video, audio, and images.\n", "\n", "In this notebook, we'll explore multimodal use cases of the long context window.\n", "\n", "For more information, refer to the [Gemini documentation about long context](https://ai.google.dev/gemini-api/docs/long-context)." ] }, { "cell_type": "markdown", "metadata": { "id": "3e127df191a2" }, "source": [ "## Tokens\n", "\n", "Tokens can be single characters like `z` or whole words like `cat`. Long words\n", "are broken up into several tokens. The set of all tokens used by the model is\n", "called the vocabulary, and the process of splitting text into tokens is called\n", "_tokenization_.\n", "\n", "> **Important:** For Gemini models, a token is equivalent to about 4 characters. 
100 tokens is equal to about 60-80 English words.\n", "\n", "For multimodal input, this is how tokens are calculated regardless of display or file size:\n", "\n", "* Images: `258` tokens\n", "* Video: `263` tokens per second\n", "* Audio: `32` tokens per second" ] }, { "cell_type": "markdown", "metadata": { "id": "a8aa3bd0feda" }, "source": [ "## Why is the long context window useful?\n", "\n", "The basic way you use the Gemini models is by passing information (context)\n", "to the model, which will subsequently generate a response. An analogy for the\n", "context window is short term memory. There is a limited amount of information\n", "that can be stored in someone's short term memory, and the same is true for\n", "generative models.\n", "\n", "You can read more about how models work under the hood in our [generative models guide](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/overview).\n", "\n", "Even though the models can take in more and more context, much of the\n", "conventional wisdom about using large language models assumes this inherent\n", "limitation on the model, which as of 2024, is no longer the case.\n", "\n", "Some common strategies to handle the limitation of small context windows\n", "included:\n", "\n", "- Arbitrarily dropping old messages / text from the context window as new text\n", " comes in\n", "- Summarizing previous content and replacing it with the summary when the\n", " context window gets close to being full\n", "- Using RAG with semantic search to move data out of the context window and\n", " into a vector database\n", "- Using deterministic or generative filters to remove certain text /\n", " characters from prompts to save tokens\n", "\n", "While many of these are still relevant in certain cases, the default place to start is now just putting all of the tokens into the context window. Because Gemini models were purpose-built with a long context window, they are much more capable of in-context learning. This means that instructional materials provided in context can be highly effective for handling inputs that are not covered by the model's training data." ] }, { "cell_type": "markdown", "metadata": { "id": "61RBz8LLbxCR" }, "source": [ "## Getting Started" ] }, { "cell_type": "markdown", "metadata": { "id": "No17Cw5hgx12" }, "source": [ "### Install Google Gen AI SDK\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tFy3H3aPgx12" }, "outputs": [], "source": [ "%pip install --upgrade --quiet google-genai" ] }, { "cell_type": "markdown", "metadata": { "id": "R5Xep4W9lq-Z" }, "source": [ "### Restart runtime\n", "\n", "To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XRvKdaPDTznN" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", " import IPython\n", "\n", " app = IPython.Application.instance()\n", " app.kernel.do_shutdown(True)" ] }, { "cell_type": "markdown", "metadata": { "id": "SbmM4z7FOBpM" }, "source": [ "<div class=\"alert alert-block alert-warning\">\n", "<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. 
⚠️</b>\n", "</div>\n" ] }, { "cell_type": "markdown", "metadata": { "id": "dmWOrTJ3gx13" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "If you are running this notebook on Google Colab, run the cell below to authenticate your environment.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NyKGtVQjgx13" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "f97b5bd5f98d" }, "source": [ "### Import libraries\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e206cb5e39e2" }, "outputs": [], "source": [ "from IPython.display import Markdown, display\n", "from google import genai\n", "from google.genai.types import Part" ] }, { "cell_type": "markdown", "metadata": { "id": "66987c91b97c" }, "source": [ "### Set Google Cloud project information and create client\n", "\n", "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", "\n", "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d4a60efd521e" }, "outputs": [], "source": [ "import os\n", "\n", "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n", "\n", "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "72a96176402b" }, "outputs": [], "source": [ "client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": { "id": "BY1nfXrqRxVX" }, "source": [ "### Load the Gemini 2.0 Flash model\n", "\n", "To learn more about all [Gemini API models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "U7ExWmuLBdIA" }, "outputs": [], "source": [ "MODEL_ID = \"gemini-2.0-flash-001\" # @param {type:\"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "l9OKM0-4SQf8" }, "source": [ "## Long-form text\n", "\n", "Text has proved to be the layer of intelligence underpinning much of the momentum around LLMs. As mentioned earlier, much of the practical limitation of LLMs was because of not having a large enough context window to do certain tasks. 
This led to the rapid adoption of retrieval-augmented generation (RAG) and other techniques that dynamically provide the model with relevant\n", "contextual information.\n", "\n", "Some emerging and standard use cases for text-based long context include:\n", "\n", "- Summarizing large corpora of text\n", " - Previous summarization options with smaller context models would require\n", " a sliding window or another technique to keep state of previous sections\n", " as new tokens are passed to the model\n", "- Question answering\n", " - Historically this was only possible with RAG given the limited amount of\n", " context and models' factual recall being low\n", "- Agentic workflows\n", " - Text is the underpinning of how agents keep state of what they have done\n", " and what they need to do; not having enough information about the world\n", " and the agent's goal is a limitation on the reliability of agents" ] }, { "cell_type": "markdown", "metadata": { "id": "2dab25e392cb" }, "source": [ "[War and Peace by Leo Tolstoy](https://en.wikipedia.org/wiki/War_and_Peace) is considered one of the greatest literary works of all time; however, it is over 1,225 pages long, and the average reader will spend 37 hours and 48 minutes reading it at 250 WPM (words per minute). 😵‍💫 The text alone takes up 3.4 MB of storage space. Yet the entire novel consists of fewer than 900,000 tokens, so it will fit within the Gemini context window.\n", "\n", "We are going to pass the entire text into Gemini 2.0 and get a detailed summary of the plot. For this example, we have the text of the novel from [Project Gutenberg](https://www.gutenberg.org/ebooks/2600) stored in a public Google Cloud Storage bucket.\n", "\n", "First, we will use the `count_tokens()` method to examine the token count of the full prompt, then send the prompt to Gemini."
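] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before counting the full novel, the next cell is a minimal warm-up sketch: it counts the tokens in a short, arbitrary sentence so you can see the shape of the `count_tokens()` response and relate it to the roughly-4-characters-per-token rule of thumb from the Tokens section. The example sentence is purely illustrative." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Minimal sketch: count tokens for a short, arbitrary example sentence\n", "short_text = \"The quick brown fox jumps over the lazy dog.\"\n", "\n", "short_text_tokens = client.models.count_tokens(\n", "    model=MODEL_ID,\n", "    contents=short_text,\n", ")\n", "\n", "# Gemini tokens average roughly 4 characters each, so expect on the order of 11 tokens here\n", "print(f\"Characters: {len(short_text)}, Tokens: {short_text_tokens.total_tokens}\")"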
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "FhFxrtfdSwOP" }, "outputs": [], "source": [ "# Set contents to send to the model\n", "contents = [\n", " \"Provide a detailed summary of the following novel.\",\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/WarAndPeace.txt\",\n", " mime_type=\"text/plain\",\n", " ),\n", "]\n", "\n", "# Counts tokens\n", "print(\n", " client.models.count_tokens(\n", " model=MODEL_ID,\n", " contents=contents,\n", " )\n", ")\n", "\n", "# Prompt the model to generate content\n", "response = client.models.generate_content(model=MODEL_ID, contents=contents)\n", "\n", "# Print the model response\n", "print(f\"\\nUsage metadata:\\n{response.usage_metadata}\")\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "bedc5fe7ac33" }, "source": [ "## Long-form video\n", "\n", "Video content has been difficult to process due to constraints of the format itself.\n", "It was hard to skim the content, transcripts often failed to capture the nuance of a video, and most tools don't process images, text, and audio together.\n", "The Gemini long context window allows the ability to reason and answer questions about multimodal inputs with\n", "sustained performance.\n", "\n", "When tested on the needle in a video haystack problem with 1M tokens, Gemini 2.0 obtained >99.8% recall of the video in the context window, and Gemini 2.0 reached state of the art performance on the [Video-MME benchmark](https://video-mme.github.io/home_page.html).\n", "\n", "Some emerging and standard use cases for video long context include:\n", "\n", "- Video question and answering\n", "- Video memory, as shown with [Google's Project Astra](https://deepmind.google/technologies/gemini/project-astra/)\n", "- Video captioning\n", "- Video recommendation systems, by enriching existing metadata with new\n", " multimodal understanding\n", "- Video customization, by looking at a corpus of data and associated video\n", " metadata and then removing parts of videos that are not relevant to the\n", " viewer\n", "- Video content moderation\n", "- Real-time video processing" ] }, { "cell_type": "markdown", "metadata": { "id": "14c6c9427a81" }, "source": [ "[Google I/O](https://io.google/) is one of the major events when Google's developer tools are announced. Workshop sessions and are filled with a lot of material, so it can be difficult to keep track all that is discussed.\n", "\n", "We are going to use a video of a session from Google I/O 2024 focused on [Grounding for Gemini](https://www.youtube.com/watch?v=v4s5eU2tfd4) to calculate tokens and process the information presented. We will ask a specific question about a point in the video and ask for a general summary." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "c7890cf45808" }, "outputs": [], "source": [ "# Set contents to send to the model\n", "contents = [\n", " \"At what time in the following video is the Cymbal Starlight demo?\",\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/GoogleIOGroundingRAG.mp4\",\n", " mime_type=\"video/mp4\",\n", " ),\n", "]\n", "\n", "# Counts tokens\n", "print(\n", " client.models.count_tokens(\n", " model=MODEL_ID,\n", " contents=contents,\n", " )\n", ")\n", "\n", "# Prompt the model to generate content\n", "response = client.models.generate_content(model=MODEL_ID, contents=contents)\n", "\n", "# Print the model response\n", "print(f\"\\nUsage metadata:\\n{response.usage_metadata}\")\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "a5d0d710941c" }, "source": [ "## Long-form audio\n", "\n", "In order to process audio, developers have typically needed to string together multiple models, like a speech-to-text model and a text-to-text model, in order to process audio. This led to additional latency due to multiple round-trip requests, and the context of the audio itself could be lost.\n", "\n", "The Gemini models were the first natively multimodal large language models that could understand audio.\n", "\n", "On standard audio-haystack evaluations, Gemini 2.0 is able to find the hidden audio in 100% of the tests and Gemini 2.0 is able to find it in 98.7% [of the tests](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf). Further, on a test set of 15-minute audio clips, Gemini 2.0 archives a word error rate (WER) of ~5.5%, much lower than even specialized speech-to-text models, without the added complexity of extra input segmentation and pre-processing.\n", "\n", "The long context window accepts up to 9.5 hours of audio in a single request.\n", "\n", "Some emerging and standard use cases for audio context include:\n", "\n", "- Real-time transcription and translation\n", "- Podcast / video question and answering\n", "- Meeting transcription and summarization\n", "- Voice assistants" ] }, { "cell_type": "markdown", "metadata": { "id": "7d3bb89a87a5" }, "source": [ "Podcasts are a great way to learn about the latest news in technology, but there are so many out there that it can be difficult to follow them all. It's also challenging to find a specific episode with a given topic or a quote.\n", "\n", "In this example, we will process 9 episodes of the [Google Kubernetes Podcast](https://cloud.google.com/podcasts/kubernetespodcast) and ask specific questions about the content." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "672059078485" }, "outputs": [], "source": [ "# Set contents to send to the model\n", "contents = [\n", " \"According to the following podcasts, what can you tell me about AI/ML workloads on Kubernetes?\",\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240417-kpod223.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240430-kpod224.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240515-kpod225.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240529-kpod226.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240606-kpod227.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240611-kpod228.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240625-kpod229.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240709-kpod230.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " Part.from_uri(\n", " file_uri=\"gs://github-repo/generative-ai/gemini/long-context/20240723-kpod231.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", "]\n", "\n", "# Counts tokens\n", "print(\n", " client.models.count_tokens(\n", " model=MODEL_ID,\n", " contents=contents,\n", " )\n", ")\n", "\n", "# Prompt the model to generate content\n", "response = client.models.generate_content(model=MODEL_ID, contents=contents)\n", "\n", "# Print the model response\n", "print(f\"\\nUsage metadata:\\n{response.usage_metadata}\")\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "8291d9972287" }, "source": [ "## Code\n", "\n", "For a long context window use case involving ingesting an entire GitHub repository, check out [Analyze a codebase with Gemini on Vertex AI](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/code/analyze_codebase.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "aec8e711926b" }, "source": [ "## Context caching\n", "\n", "[Context caching](https://cloud.google.com/vertex-ai/generative-ai/docs/context-cache/context-cache-overview) allows developers to reduce the time and cost of repeated requests using the large context window.\n", "For examples on how to use Context Caching with Gemini on Vertex AI, refer to [Intro to Context Caching with Gemini on Vertex AI](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/context-caching/intro_context_caching.ipynb)" ] } ], "metadata": { "colab": { "name": "intro_long_context.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }