{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "sqi5B7V_Rjim" }, "outputs": [], "source": [ "# Copyright 2025 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "VyPmicX9RlZX" }, "source": [ "# Intro to Gemini 2.5 Pro\n", "\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_pro.ipynb\">\n", " <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fgetting-started%2Fintro_gemini_2_5_pro.ipynb\">\n", " <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/getting-started/intro_gemini_2_5_pro.ipynb\">\n", " <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_pro.ipynb\">\n", " <img width=\"32px\" src=\"https://www.svgrepo.com/download/217753/github.svg\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", "</table>\n", "\n", "<div style=\"clear: both;\"></div>\n", "\n", "<b>Share to:</b>\n", "\n", "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_pro.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n", "</a>\n", "\n", "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_pro.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n", "</a>\n", "\n", "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_pro.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" 
src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n", "</a>\n", "\n", "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_pro.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n", "</a>\n", "\n", "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_pro.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n", "</a>" ] }, { "cell_type": "markdown", "metadata": { "id": "8MqT58L6Rm_q" }, "source": [ "| Authors |\n", "| --- |\n", "| [Eric Dong](https://github.com/gericdong) |\n", "| [Holt Skinner](https://github.com/holtskinner) |" ] }, { "cell_type": "markdown", "metadata": { "id": "nVxnv1D5RoZw" }, "source": [ "## Overview\n", "\n", "**YouTube Video: Introduction to Gemini on Vertex AI**\n", "\n", "<a href=\"https://www.youtube.com/watch?v=YfiLUpNejpE&list=PLIivdWyY5sqJio2yeg1dlfILOUO2FoFRx\" target=\"_blank\">\n", " <img src=\"https://img.youtube.com/vi/YfiLUpNejpE/maxresdefault.jpg\" alt=\"Introduction to Gemini on Vertex AI\" width=\"500\">\n", "</a>\n", "\n", "[Gemini 2.5 Pro](https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/) is Google's strongest model for coding and world knowledge.\n", "\n", "With the 2.5 series, the Gemini models are now hybrid reasoning models! Gemini 2.5 Pro can apply an extended amount of thinking across tasks, and use tools in order to maximize response accuracy. 
\n", "\n", "Gemini 2.5 Pro is: \n", "\n", "- A significant improvement from previous models across capabilities including coding, reasoning, and multimodality \n", "- Industry-leading in reasoning with state of the art performance in Math & STEM benchmarks\n", "- An amazing model for code, with particularly strong web development \n", "- Particularly good for complex prompts, while still being well rounded, including #1 on LMSys" ] }, { "cell_type": "markdown", "metadata": { "id": "WfFPCBL4Hq8x" }, "source": [ "### Objectives\n", "\n", "In this tutorial, you will learn how to use the Gemini API and the Google Gen AI SDK for Python with the Gemini 2.5 Pro model.\n", "\n", "You will complete the following tasks:\n", "\n", "- Generate text from text prompts\n", " - Generate streaming text\n", " - Start multi-turn chats\n", " - Use asynchronous methods\n", "- Configure model parameters\n", "- Set system instructions\n", "- Use safety filters\n", "- Use controlled generation\n", "- Count tokens\n", "- Process multimodal (audio, code, documents, images, video) data\n", "- Use automatic and manual function calling\n", "- Code execution\n", "- Thinking mode examples" ] }, { "cell_type": "markdown", "metadata": { "id": "gPiTOAHURvTM" }, "source": [ "## Getting Started" ] }, { "cell_type": "markdown", "metadata": { "id": "CHRZUpfWSEpp" }, "source": [ "### Install Google Gen AI SDK for Python\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sG3_LKsWSD3A" }, "outputs": [], "source": [ "%pip install --upgrade --quiet google-genai" ] }, { "cell_type": "markdown", "metadata": { "id": "HlMVjiAWSMNX" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "If you are running this notebook on Google Colab, run the cell below to authenticate your environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "12fnq4V0SNV3" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "Ve4YBlDqzyj9" }, "source": [ "### Connect to a generative AI API service\n", "\n", "Google Gen AI APIs and models including Gemini are available in the following two API services:\n", "\n", "- **[Gemini Developer API](https://ai.google.dev/gemini-api/docs)**: Experiment, prototype, and deploy small projects.\n", "- **[Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview)**: Build enterprise-ready projects on Google Cloud.\n", "\n", "The Google Gen AI SDK provides a unified interface to these two API services." ] }, { "cell_type": "markdown", "metadata": { "id": "EdvJRUWRNGHE" }, "source": [ "### Import libraries\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qgdSpVmDbdQ9" }, "outputs": [], "source": [ "from IPython.display import HTML, Image, Markdown, display\n", "from google import genai\n", "from google.genai.types import (\n", " FunctionDeclaration,\n", " GenerateContentConfig,\n", " GoogleSearch,\n", " HarmBlockThreshold,\n", " HarmCategory,\n", " Part,\n", " SafetySetting,\n", " Tool,\n", " ToolCodeExecution,\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "be18ac9c5ec8" }, "source": [ "### Set up Google Cloud Project or API Key for Vertex AI\n", "\n", "You'll need to set up authentication by choosing **one** of the following methods:\n", "\n", "1. 
**Use a Google Cloud Project:** Recommended for most users, this requires enabling the Vertex AI API in your Google Cloud project.\n", " [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com)\n", " * Run the cell below to set your project ID.\n", "2. **Use a Vertex AI API Key (Express Mode):** For quick experimentation. \n", " [Get an API Key](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview)\n", " * Run the cell further below to use your API key.### Set up Google Cloud Project or API Key for Vertex AI" ] }, { "cell_type": "markdown", "metadata": { "id": "a34b28cb8d5a" }, "source": [ "#### Option 1. Use a Google Cloud Project" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "72f74f7b9786" }, "outputs": [], "source": [ "import os\n", "\n", "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n", "\n", "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n", "\n", "client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": { "id": "c173348120cf" }, "source": [ "#### Option 2. Use a Vertex AI API Key (Express Mode)\n", "\n", "Uncomment the following block to use Express Mode" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "fa3d4873034b" }, "outputs": [], "source": [ "# API_KEY = \"[your-api-key]\" # @param {type: \"string\", placeholder: \"[your-api-key]\", isTemplate: true}\n", "\n", "# if not API_KEY or API_KEY == \"[your-api-key]\":\n", "# raise Exception(\"You must provide an API key to use Vertex AI in express mode.\")\n", "\n", "# client = genai.Client(vertexai=True, api_key=API_KEY)" ] }, { "cell_type": "markdown", "metadata": { "id": "7b36ce4ac022" }, "source": [ "Verify which mode you are using." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "b55e64b8ebe4" }, "outputs": [], "source": [ "if not client.vertexai:\n", " print(\"Using Gemini Developer API.\")\n", "elif client._api_client.project:\n", " print(\n", " f\"Using Vertex AI with project: {client._api_client.project} in location: {client._api_client.location}\"\n", " )\n", "elif client._api_client.api_key:\n", " print(\n", " f\"Using Vertex AI in express mode with API key: {client._api_client.api_key[:5]}...{client._api_client.api_key[-5:]}\"\n", " )" ] }, { "cell_type": "markdown", "metadata": { "id": "n4yRkFg6BBu4" }, "source": [ "## Use the Gemini 2.5 Pro model" ] }, { "cell_type": "markdown", "metadata": { "id": "eXHJi5B6P5vd" }, "source": [ "### Load the Gemini 2.5 Pro model\n", "\n", "Learn more about all [Gemini models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-coEslfWPrxo" }, "outputs": [], "source": [ "MODEL_ID = \"gemini-2.5-pro-preview-03-25\" # @param {type: \"string\"}" ] }, { "cell_type": "markdown", "metadata": { "id": "37CH91ddY9kG" }, "source": [ "### Generate text from text prompts\n", "\n", "Use the `generate_content()` method to generate responses to your prompts.\n", "\n", "You can pass text to `generate_content()`, and use the `.text` property to get the text content of the response.\n", "\n", "By default, Gemini outputs formatted text using [Markdown](https://daringfireball.net/projects/markdown/) syntax." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xRJuHj0KZ8xz" }, "outputs": [], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID, contents=\"What's the largest planet in our solar system?\"\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "JkYQATRxAK1_" }, "source": [ "#### Example prompts\n", "\n", "- What are the biggest challenges facing the healthcare industry?\n", "- What are the latest developments in the automotive industry?\n", "- What are the biggest opportunities in retail industry?\n", "- (Try your own prompts!)\n", "\n", "For more examples of prompt engineering, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/prompts/intro_prompt_design.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "id": "6lLIxqS6_-l8" }, "source": [ "### Generate content stream\n", "\n", "By default, the model returns a response after completing the entire generation process. You can also use the `generate_content_stream` method to stream the response as it is being generated, and the model will return chunks of the response as soon as they are generated." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZiwWBhXsAMnv" }, "outputs": [], "source": [ "output_text = \"\"\n", "markdown_display_area = display(Markdown(output_text), display_id=True)\n", "\n", "for chunk in client.models.generate_content_stream(\n", " model=MODEL_ID,\n", " contents=\"Tell me a story about a lonely robot who finds friendship in a most unexpected place.\",\n", "):\n", " output_text += chunk.text\n", " markdown_display_area.update(Markdown(output_text))" ] }, { "cell_type": "markdown", "metadata": { "id": "29jFnHZZWXd7" }, "source": [ "### Start a multi-turn chat\n", "\n", "The Gemini API supports freeform multi-turn conversations across multiple turns with back-and-forth interactions.\n", "\n", "The context of the conversation is preserved between messages." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "DbM12JaLWjiF" }, "outputs": [], "source": [ "chat = client.chats.create(model=MODEL_ID)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JQem1halYDBW" }, "outputs": [], "source": [ "response = chat.send_message(\"Write a function that checks if a year is a leap year.\")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "vUJR4Pno-LGK" }, "source": [ "This follow-up prompt shows how the model responds based on the previous prompt:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6Fn69TurZ9DB" }, "outputs": [], "source": [ "response = chat.send_message(\"Write a unit test of the generated function.\")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "arLJE4wOuhh6" }, "source": [ "### Send asynchronous requests\n", "\n", "`client.aio` exposes all analogous [async](https://docs.python.org/3/library/asyncio.html) methods that are available on `client`.\n", "\n", "For example, `client.aio.models.generate_content` is the async version of `client.models.generate_content`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gSReaLazs-dP" }, "outputs": [], "source": [ "response = await client.aio.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"Compose a song about the adventures of a time-traveling squirrel.\",\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "hIJVEr0RQY8S" }, "source": [ "## Configure model parameters\n", "\n", "You can include parameter values in each call that you send to a model to control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.\n", "\n", "- Learn more about [experimenting with parameter values](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values).\n", "\n", "- See a list of all [Gemini API parameters](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#parameters).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d9NXP5N2Pmfo" }, "outputs": [], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"Tell me how the internet works, but pretend I'm a puppy who only understands squeaky toys.\",\n", " config=GenerateContentConfig(\n", " temperature=2.0,\n", " top_p=0.95,\n", " candidate_count=1,\n", " ),\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "El1lx8P9ElDq" }, "source": [ "## Set system instructions\n", "\n", "[System instructions](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/system-instruction-introduction) allow you to steer the behavior of the model. By setting the system instruction, you are giving the model additional context to understand the task, provide more customized responses, and adhere to guidelines over the user interaction." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7A-yANiyCLaO" }, "outputs": [], "source": [ "system_instruction = \"\"\"\n", " You are a helpful language translator.\n", " Your mission is to translate text in English to Spanish.\n", "\"\"\"\n", "\n", "prompt = \"\"\"\n", " User input: I like bagels.\n", " Answer:\n", "\"\"\"\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=prompt,\n", " config=GenerateContentConfig(\n", " system_instruction=system_instruction,\n", " ),\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "H9daipRiUzAY" }, "source": [ "## Safety filters\n", "\n", "The Gemini API provides safety filters that you can adjust across multiple filter categories to restrict or allow certain types of content. You can use these filters to adjust what's appropriate for your use case. See the [Configure safety filters](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-filters) page for details.\n", "\n", "When you make a request to Gemini, the content is analyzed and assigned a safety rating. You can inspect the safety ratings of the generated content by printing out the model responses.\n", "\n", "The safety settings are `OFF` by default and the default block thresholds are `BLOCK_NONE`.\n", "\n", "For more examples of safety filters, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/responsible-ai/gemini_safety_ratings.ipynb).\n", "\n", "You can use `safety_settings` to adjust the safety settings for each request you make to the API. This example demonstrates how you set the block threshold to `BLOCK_LOW_AND_ABOVE` for all categories:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yPlDRaloU59b" }, "outputs": [], "source": [ "system_instruction = \"Be as mean and hateful as possible.\"\n", "\n", "prompt = \"\"\"\n", " Write a list of 5 disrespectful things that I might say to the universe after stubbing my toe in the dark.\n", "\"\"\"\n", "\n", "safety_settings = [\n", " SafetySetting(\n", " category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,\n", " threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n", " ),\n", " SafetySetting(\n", " category=HarmCategory.HARM_CATEGORY_HARASSMENT,\n", " threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n", " ),\n", " SafetySetting(\n", " category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,\n", " threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n", " ),\n", " SafetySetting(\n", " category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,\n", " threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n", " ),\n", "]\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=prompt,\n", " config=GenerateContentConfig(\n", " system_instruction=system_instruction,\n", " safety_settings=safety_settings,\n", " ),\n", ")\n", "\n", "# Response will be `None` if it is blocked.\n", "print(response.text)\n", "# Finish Reason will be `SAFETY` if it is blocked.\n", "print(response.candidates[0].finish_reason)\n", "# Safety Ratings show the levels for each filter.\n", "for safety_rating in response.candidates[0].safety_ratings:\n", " print(safety_rating)" ] }, { "cell_type": "markdown", "metadata": { "id": "rZV2TY5Pa3Dd" }, "source": [ "## Send multimodal prompts\n", "\n", "Gemini is a multimodal model that supports multimodal prompts.\n", "\n", "You can include any of the following data types from various sources.\n", "\n", 
"<table>\n", " <thead>\n", " <tr>\n", " <th>Data type</th>\n", " <th>Source(s)</th>\n", " <th>MIME Type(s)</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <td>Text</td>\n", " <td>Inline, Local File, General URL, Google Cloud Storage</td>\n", " <td><code>text/plain</code> <code>text/html</code></td>\n", " </tr>\n", " <tr>\n", " <td>Code</td>\n", " <td>Inline, Local File, General URL, Google Cloud Storage</td>\n", " <td><code>text/plain</code></td>\n", " </tr>\n", " <tr>\n", " <td>Document</td>\n", " <td>Local File, General URL, Google Cloud Storage</td>\n", " <td><code>application/pdf</code></td>\n", " </tr>\n", " <tr>\n", " <td>Image</td>\n", " <td>Local File, General URL, Google Cloud Storage</td>\n", " <td><code>image/jpeg</code> <code>image/png</code> <code>image/webp</code></td>\n", " </tr>\n", " <tr>\n", " <td>Audio</td>\n", " <td>Local File, General URL, Google Cloud Storage</td>\n", " <td>\n", " <code>audio/aac</code> <code>audio/flac</code> <code>audio/mp3</code>\n", " <code>audio/m4a</code> <code>audio/mpeg</code> <code>audio/mpga</code>\n", " <code>audio/mp4</code> <code>audio/opus</code> <code>audio/pcm</code>\n", " <code>audio/wav</code> <code>audio/webm</code>\n", " </td>\n", " </tr>\n", " <tr>\n", " <td>Video</td>\n", " <td>Local File, General URL, Google Cloud Storage, YouTube</td>\n", " <td>\n", " <code>video/mp4</code> <code>video/mpeg</code> <code>video/x-flv</code>\n", " <code>video/quicktime</code> <code>video/mpegps</code> <code>video/mpg</code>\n", " <code>video/webm</code> <code>video/wmv</code> <code>video/3gpp</code>\n", " </td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "\n", "For more examples of multimodal use cases, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/intro_multimodal_use_cases.ipynb)." 
] }, { "cell_type": "markdown", "metadata": { "id": "w4npg1tNTYB9" }, "source": [ "### Send local image\n", "\n", "Download an image to local storage from Google Cloud Storage.\n", "\n", "For this example, we'll use this image of a meal.\n", "\n", "<img src=\"https://storage.googleapis.com/cloud-samples-data/generative-ai/image/meal.png\" alt=\"Meal\" width=\"500\">" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4avkv0Z7qUI-" }, "outputs": [], "source": [ "!wget https://storage.googleapis.com/cloud-samples-data/generative-ai/image/meal.png" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "umhZ61lrSyJh" }, "outputs": [], "source": [ "with open(\"meal.png\", \"rb\") as f:\n", " image = f.read()\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[\n", " Part.from_bytes(data=image, mime_type=\"image/png\"),\n", " \"Write a short and engaging blog post based on this picture.\",\n", " ],\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "e7b6170c9255" }, "source": [ "### Send document from Google Cloud Storage\n", "\n", "This example document is the paper [\"Attention is All You Need\"](https://arxiv.org/abs/1706.03762), created by researchers from Google and the University of Toronto.\n", "\n", "Check out this notebook for more examples of document understanding with Gemini:\n", "\n", "- [Document Processing with Gemini](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/document_processing.ipynb)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1d58b914d798" }, "outputs": [], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[\n", " Part.from_uri(\n", " file_uri=\"gs://cloud-samples-data/generative-ai/pdf/1706.03762v7.pdf\",\n", " mime_type=\"application/pdf\",\n", " ),\n", " \"Summarize the document.\",\n", " ],\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "b247a2ee0e38" }, "source": [ "### Send audio from General URL\n", "\n", "This example is audio from an episode of the [Kubernetes Podcast](https://kubernetespodcast.com/)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cbe8c9c67ba7" }, "outputs": [], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[\n", " Part.from_uri(\n", " file_uri=\"https://traffic.libsyn.com/secure/e780d51f-f115-44a6-8252-aed9216bb521/KPOD242.mp3\",\n", " mime_type=\"audio/mpeg\",\n", " ),\n", " \"Write a summary of this podcast episode.\",\n", " ],\n", " config=GenerateContentConfig(audio_timestamp=True),\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "8D3_oNUTuW2q" }, "source": [ "### Send video from YouTube URL\n", "\n", "This example is the YouTube video [Google — 25 Years in Search: The Most Searched](https://www.youtube.com/watch?v=3KtWfp0UopM).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "l7-w8G_2wAOw" }, "outputs": [], "source": [ "video = Part.from_uri(\n", " file_uri=\"https://www.youtube.com/watch?v=3KtWfp0UopM\",\n", " mime_type=\"video/mp4\",\n", ")\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[\n", " video,\n", " \"At what point in the video is Harry Potter shown?\",\n", " ],\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "df8013cfa7f7" }, "source": [ "### Send web page\n", "\n", "This example is from the [Generative AI on Vertex AI documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/overview).\n", "\n", "**NOTE:** The URL must be publicly accessible." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "337793322c91" }, "outputs": [], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[\n", " Part.from_uri(\n", " file_uri=\"https://cloud.google.com/vertex-ai/generative-ai/docs/overview\",\n", " mime_type=\"text/html\",\n", " ),\n", " \"Write a summary of this documentation.\",\n", " ],\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "rVlo0mWuZGkQ" }, "source": [ "## Control generated output\n", "\n", "[Controlled generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output) allows you to define a response schema to specify the structure of a model's output, the field names, and the expected data type for each field.\n", "\n", "The response schema is specified in the `response_schema` parameter in `config`, and the model output will strictly follow that schema.\n", "\n", "You can provide the schemas as [Pydantic](https://docs.pydantic.dev/) models or a [JSON](https://www.json.org/json-en.html) string and the model will respond as JSON or an [Enum](https://docs.python.org/3/library/enum.html) depending on the value set in `response_mime_type`.\n", "\n", "For more examples of controlled generation, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/controlled-generation/intro_controlled_generation.ipynb)." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OjSgf2cDN_bG" }, "outputs": [], "source": [ "from pydantic import BaseModel\n", "\n", "\n", "class Recipe(BaseModel):\n", " name: str\n", " description: str\n", " ingredients: list[str]\n", "\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"List a few popular cookie recipes and their ingredients.\",\n", " config=GenerateContentConfig(\n", " response_mime_type=\"application/json\",\n", " response_schema=Recipe,\n", " ),\n", ")\n", "\n", "print(response.text)" ] }, { "cell_type": "markdown", "metadata": { "id": "nKai5CP_PGQF" }, "source": [ "You can either parse the response string as JSON, or use the `parsed` field to get the response as an object or dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZeyDWbnxO-on" }, "outputs": [], "source": [ "parsed_response: Recipe = response.parsed\n", "print(parsed_response)" ] }, { "cell_type": "markdown", "metadata": { "id": "SUSLPrvlvXOc" }, "source": [ "You also can define a response schema in a Python dictionary. You can only use the supported fields as listed below. All other fields are ignored.\n", "\n", "- `enum`\n", "- `items`\n", "- `maxItems`\n", "- `nullable`\n", "- `properties`\n", "- `required`\n", "\n", "In this example, you instruct the model to analyze product review data, extract key entities, perform sentiment classification (multiple choices), provide additional explanation, and output the results in JSON format.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "F7duWOq3vMmS" }, "outputs": [], "source": [ "response_schema = {\n", " \"type\": \"ARRAY\",\n", " \"items\": {\n", " \"type\": \"ARRAY\",\n", " \"items\": {\n", " \"type\": \"OBJECT\",\n", " \"properties\": {\n", " \"rating\": {\"type\": \"INTEGER\"},\n", " \"flavor\": {\"type\": \"STRING\"},\n", " \"sentiment\": {\n", " \"type\": \"STRING\",\n", " \"enum\": [\"POSITIVE\", \"NEGATIVE\", \"NEUTRAL\"],\n", " },\n", " \"explanation\": {\"type\": \"STRING\"},\n", " },\n", " \"required\": [\"rating\", \"flavor\", \"sentiment\", \"explanation\"],\n", " },\n", " },\n", "}\n", "\n", "prompt = \"\"\"\n", " Analyze the following product reviews, output the sentiment classification, and give an explanation.\n", "\n", " - \"Absolutely loved it! 
Best ice cream I've ever had.\" Rating: 4, Flavor: Strawberry Cheesecake\n", " - \"Quite good, but a bit too sweet for my taste.\" Rating: 1, Flavor: Mango Tango\n", "\"\"\"\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=prompt,\n", " config=GenerateContentConfig(\n", " response_mime_type=\"application/json\",\n", " response_schema=response_schema,\n", " ),\n", ")\n", "\n", "response_dict = response.parsed\n", "print(response_dict)" ] }, { "cell_type": "markdown", "metadata": { "id": "gV1dR-QlTKRs" }, "source": [ "## Count tokens and compute tokens\n", "\n", "You can use the `count_tokens()` method to calculate the number of input tokens before sending a request to the Gemini API.\n", "\n", "For more information, refer to [list and count tokens](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/list-token)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Syx-fwLkV1j-" }, "source": [ "### Count tokens" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "UhNElguLRRNK" }, "outputs": [], "source": [ "response = client.models.count_tokens(\n", " model=MODEL_ID,\n", " contents=\"What's the highest mountain in Africa?\",\n", ")\n", "\n", "print(response)" ] }, { "cell_type": "markdown", "metadata": { "id": "_BsP0vXOY7hg" }, "source": [ "## Search as a tool (Grounding)\n", "\n", "[Grounding](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini) lets you connect real-world data to the Gemini model.\n", "\n", "By grounding model responses in Google Search results, the model can access information at runtime that goes beyond its training data which can produce more accurate, up-to-date, and relevant responses.\n", "\n", "Using Grounding with Google Search, you can improve the accuracy and recency of responses from the model. Starting with Gemini 2.0, Google Search is available as a tool. This means that the model can decide when to use Google Search.\n", "\n", "For more examples of Grounding, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "id": "4_M_4RRBdO_3" }, "source": [ "### Google Search\n", "\n", "You can add the `tools` keyword argument with a `Tool` including `GoogleSearch` to instruct Gemini to first perform a Google Search with the prompt, then construct an answer based on the web search results.\n", "\n", "[Dynamic Retrieval](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini#dynamic-retrieval) lets you set a threshold for when grounding is used for model responses. This is useful when the prompt doesn't require an answer grounded in Google Search and the supported models can provide an answer based on their knowledge without grounding. This helps you manage latency, quality, and cost more effectively." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yeR09J3AZT4U" }, "outputs": [], "source": [ "google_search_tool = Tool(google_search=GoogleSearch())\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"What is the current temperature in Austin, TX?\",\n", " config=GenerateContentConfig(tools=[google_search_tool]),\n", ")\n", "\n", "display(Markdown(response.text))\n", "\n", "print(response.candidates[0].grounding_metadata)\n", "\n", "HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content)" ] }, { "cell_type": "markdown", "metadata": { "id": "T0pb-Kh1xEHU" }, "source": [ "## Function calling\n", "\n", "[Function Calling](https://cloud.google.com/vertex-ai/docs/generative-ai/multimodal/function-calling) in Gemini lets developers create a description of a function in their code, then pass that description to a language model in a request.\n", "\n", "You can submit a Python function for automatic function calling, which will run the function and return the output in natural language generated by Gemini.\n", "\n", "You can also submit an [OpenAPI Specification](https://www.openapis.org/) which will respond with the name of a function that matches the description and the arguments to call it with.\n", "\n", "For more examples of Function calling with Gemini, check out this notebook: [Intro to Function Calling with Gemini](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/function-calling/intro_function_calling.ipynb)" ] }, { "cell_type": "markdown", "metadata": { "id": "mSUWWlrrlR-D" }, "source": [ "### Python Function (Automatic Function Calling)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aRR8HZhLlR-E" }, "outputs": [], "source": [ "def get_current_weather(location: str) -> str:\n", " \"\"\"Example method. Returns the current weather.\n", "\n", " Args:\n", " location: The city and state, e.g. 
San Francisco, CA\n", " \"\"\"\n", " weather_map: dict[str, str] = {\n", " \"Boston, MA\": \"snowing\",\n", " \"San Francisco, CA\": \"foggy\",\n", " \"Seattle, WA\": \"raining\",\n", " \"Austin, TX\": \"hot\",\n", " \"Chicago, IL\": \"windy\",\n", " }\n", " return weather_map.get(location, \"unknown\")\n", "\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"What is the weather like in San Francisco?\",\n", " config=GenerateContentConfig(\n", " tools=[get_current_weather],\n", " temperature=0,\n", " ),\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "h4syyLEClGcn" }, "source": [ "### OpenAPI Specification (Manual Function Calling)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2BDQPwgcxRN3" }, "outputs": [], "source": [ "get_destination = FunctionDeclaration(\n", " name=\"get_destination\",\n", " description=\"Get the destination that the user wants to go to\",\n", " parameters={\n", " \"type\": \"OBJECT\",\n", " \"properties\": {\n", " \"destination\": {\n", " \"type\": \"STRING\",\n", " \"description\": \"Destination that the user wants to go to\",\n", " },\n", " },\n", " },\n", ")\n", "\n", "destination_tool = Tool(\n", " function_declarations=[get_destination],\n", ")\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"I'd like to travel to Paris.\",\n", " config=GenerateContentConfig(\n", " tools=[destination_tool],\n", " temperature=0,\n", " ),\n", ")\n", "\n", "print(response.function_calls[0])" ] }, { "cell_type": "markdown", "metadata": { "id": "MhDs2X3o0neK" }, "source": [ "## Code Execution\n", "\n", "The Gemini API [code execution](https://ai.google.dev/gemini-api/docs/code-execution?lang=python) feature enables the model to generate and run Python code and learn iteratively from the results until it arrives at a final output. You can use this code execution capability to build applications that benefit from code-based reasoning and that produce text output. For example, you could use code execution in an application that solves equations or processes text.\n", "\n", "The Gemini API provides code execution as a tool, similar to function calling.\n", "After you add code execution as a tool, the model decides when to use it.\n", "\n", "For more examples of Code Execution, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/code-execution/intro_code_execution.ipynb)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1W-3c7sy0nyz" }, "outputs": [], "source": [ "code_execution_tool = Tool(code_execution=ToolCodeExecution())\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=\"Calculate 20th fibonacci number. 
Then find the nearest palindrome to it.\",\n", " config=GenerateContentConfig(\n", " tools=[code_execution_tool],\n", " temperature=0,\n", " ),\n", ")\n", "\n", "display(\n", " Markdown(\n", " f\"\"\"\n", "## Code\n", "\n", "```py\n", "{response.executable_code}\n", "```\n", "\n", "### Output\n", "\n", "```\n", "{response.code_execution_result}\n", "```\n", "\"\"\"\n", " )\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "d5b5adb2eb70" }, "source": [ "## Thinking mode examples\n", "\n", "The following examples are some complex tasks that require multiple rounds of strategizing and iteratively solving.\n", "\n", "### **Example 1**: Code generation\n", "\n", "Gemini 2.5 Pro excels at creating visually compelling web apps and agentic code applications, along with code transformation and editing.\n", "\n", "Let's see how the model uses its reasoning capabilities to create a video game, using executable code from a single line prompt. See the example game [here](https://www.youtube.com/watch?v=RLCBSpgos6s)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5f120dff0d16" }, "outputs": [], "source": [ "prompt = \"\"\"\n", " Make me a captivating endless runner game. Key instructions on the screen. p5js scene, no HTML. \n", " I like pixelated dinosaurs and interesting backgrounds.\n", "\"\"\"\n", "\n", "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=prompt,\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "ecf22b47bdc3" }, "source": [ "### **Example 2**: Multimodal reasoning (Geometry)\n", "\n", "This geometry problem requires complex reasoning and is also using multimodal capabilities to reason across text and image." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "60260c0ac118" }, "outputs": [], "source": [ "image_file_url = (\n", " \"https://storage.googleapis.com/generativeai-downloads/images/geometry.png\"\n", ")\n", "display(Image(url=image_file_url, width=400))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "c972334f62ff" }, "outputs": [], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[\n", " Part.from_uri(file_uri=image_file_url, mime_type=\"image/png\"),\n", " \"What's the area of the overlapping region?\",\n", " ],\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "52656e92cd69" }, "source": [ "### **Example 3**: Math and problem solving\n", "\n", "Here's another brain teaser based on an image, this time it looks like a mathematical problem, but it cannot actually be solved mathematically. If you check the thoughts of the model you'll see that it will realize it and come up with an out-of-the-box solution." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d46387bdc9e6" }, "outputs": [], "source": [ "image_file_url = \"https://storage.googleapis.com/generativeai-downloads/images/pool.png\"\n", "display(Image(url=image_file_url, width=400))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "46b694793eb0" }, "outputs": [], "source": [ "response = client.models.generate_content(\n", " model=MODEL_ID,\n", " contents=[\n", " Part.from_uri(file_uri=image_file_url, mime_type=\"image/png\"),\n", " \"How do I use three of the pool balls to sum up to 30?\",\n", " ],\n", ")\n", "\n", "display(Markdown(response.text))" ] }, { "cell_type": "markdown", "metadata": { "id": "eQwiONFdVHw5" }, "source": [ "## What's next\n", "\n", "- See the [Google Gen AI SDK reference docs](https://googleapis.github.io/python-genai/).\n", "- Explore other notebooks in the [Google Cloud Generative AI GitHub repository](https://github.com/GoogleCloudPlatform/generative-ai).\n", "- Explore AI models in [Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models)." ] } ], "metadata": { "colab": { "collapsed_sections": [ "hIJVEr0RQY8S", "rZV2TY5Pa3Dd", "hYKAzG1sH-K1", "mSUWWlrrlR-D", "h4syyLEClGcn" ], "name": "intro_gemini_2_5_pro.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }