{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bCIMTPB1WoTq"
},
"outputs": [],
"source": [
"# Copyright 2024 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7yVV6txOmNMn"
},
"source": [
"# Getting started with Gemini using Vertex AI in Express Mode\n",
"\n",
"<table align=\"left\">\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_express.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_express.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
" </a>\n",
" </td>\n",
"</table>\n",
"\n",
"<div style=\"clear: both;\"></div>\n",
"\n",
"<b>Share to:</b>\n",
"\n",
"<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_express.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_express.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_express.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_express.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_express.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
"</a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1EExYZvij2ve"
},
"source": [
"| Author |\n",
"| --- |\n",
"| [Holt Skinner](https://github.com/holtskinner) |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "t1DnOs6rkbOy"
},
"source": [
"## Overview\n",
"\n",
"**YouTube Video: Introduction to Gemini on Vertex AI**\n",
"\n",
"<a href=\"https://www.youtube.com/watch?v=YfiLUpNejpE&list=PLIivdWyY5sqJio2yeg1dlfILOUO2FoFRx\" target=\"_blank\">\n",
" <img src=\"https://img.youtube.com/vi/YfiLUpNejpE/maxresdefault.jpg\" alt=\"Introduction to Gemini on Vertex AI\" width=\"500\">\n",
"</a>\n",
"\n",
"[Vertex AI in express mode](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview) lets developers quickly try out core generative AI features that are available on Vertex AI.\n",
"\n",
"- Send API requests to the Gemini API in Vertex AI:\n",
" - Non-streaming request\n",
" - Streaming request\n",
" - Function calling request"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "61RBz8LLbxCR"
},
"source": [
"## Getting Started"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "No17Cw5hgx12"
},
"source": [
"### Install Google Gen AI SDK for Python\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tFy3H3aPgx12"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet google-genai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DF4l8DTdWgPY"
},
"source": [
"### Get an API Key and create client\n",
"\n",
"Refer to the [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview#eligibility) for how to create an API Key for Vertex AI."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Nqwi-5ufWp_B"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"API_KEY = \"[your-api-key]\" # @param {type: \"string\", placeholder: \"[your-api-key]\", isTemplate: true}\n",
"\n",
"if not API_KEY or API_KEY == \"[your-api-key]\":\n",
" API_KEY = os.environ.get(\"GOOGLE_API_KEY\")\n",
" if not API_KEY:\n",
" raise Exception(\"You must provide an API key to use Vertex AI in express mode.\")\n",
"\n",
"from google import genai\n",
"\n",
"client = genai.Client(vertexai=True, api_key=API_KEY)\n",
"\n",
"if not client._api_client.vertexai:\n",
"print(\"Using Gemini Developer API.\")\n",
"elif client._api_client.api_key:\n",
" print(\n",
" f\"Using Vertex AI in express mode with API key: {client._api_client.api_key[:5]}...{client._api_client.api_key[-5:]}\"\n",
" )\n",
"elif client._api_client.project:\n",
" print(\n",
" f\"Using Vertex AI with project: {client._api_client.project} in location: {client._api_client.location}\"\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jXHfaVS66_01"
},
"source": [
"### Import libraries\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lslYAvw37JGQ"
},
"outputs": [],
"source": [
"from IPython.display import HTML, Markdown, display\n",
"from google.genai.types import (\n",
" FunctionDeclaration,\n",
" GenerateContentConfig,\n",
" GoogleSearch,\n",
" HarmBlockThreshold,\n",
" HarmCategory,\n",
" MediaResolution,\n",
" Part,\n",
" SafetySetting,\n",
" Tool,\n",
" ToolCodeExecution,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BY1nfXrqRxVX"
},
"source": [
"### Load the Gemini 2.0 Flash model\n",
"\n",
"To learn more, see the list of [Gemini API models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "U7ExWmuLBdIA"
},
"outputs": [],
"source": [
"MODEL_ID = \"gemini-2.0-flash-001\" # @param {type: \"string\"}"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "l9OKM0-4SQf8"
},
"source": [
"## Gen AI SDK basic usage\n",
"\n",
"Below is a simple example that demonstrates how to prompt the Gemini model using the Gen AI SDK. Learn more about the [Gemini API parameters](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini).\n",
"\n",
"You can send either streaming or non-streaming requests to the API.\n",
"\n",
"- Streaming requests return the response in chunks as the request is being processed.\n",
" - To a human user, streamed responses reduce the perception of latency.\n",
"- Non-streaming requests return the response in one chunk after the request is processed."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "37CH91ddY9kG"
},
"source": [
"### Generate text from text prompts\n",
"\n",
"Use the `generate_content()` method to generate responses to your prompts.\n",
"\n",
"You can pass text to `generate_content()`, and use the `.text` property to get the text content of the response.\n",
"\n",
"By default, Gemini outputs formatted text using [Markdown](https://daringfireball.net/projects/markdown/) syntax."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xRJuHj0KZ8xz"
},
"outputs": [],
"source": [
"response = client.models.generate_content(\n",
" model=MODEL_ID, contents=\"What's the largest planet in our solar system?\"\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f45d58f79eeb"
},
"source": [
"#### Example prompts\n",
"\n",
"- What are the biggest challenges facing the healthcare industry?\n",
"- What are the latest developments in the automotive industry?\n",
"- What are the biggest opportunities in the retail industry?\n",
"- (Try your own prompts!)\n",
"\n",
"For more examples of prompt engineering, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/prompts/intro_prompt_design.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6lLIxqS6_-l8"
},
"source": [
"### Generate content stream\n",
"\n",
"By default, the model returns a response only after the entire generation process is complete. You can also use the `generate_content_stream` method to stream the response as it is being generated; the model returns chunks of the response as soon as they are available."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZiwWBhXsAMnv"
},
"outputs": [],
"source": [
"output_text = \"\"\n",
"markdown_display_area = display(Markdown(output_text), display_id=True)\n",
"\n",
"for chunk in client.models.generate_content_stream(\n",
" model=MODEL_ID,\n",
" contents=\"Tell me a story about a lonely robot who finds friendship in a most unexpected place.\",\n",
"):\n",
" output_text += chunk.text\n",
" markdown_display_area.update(Markdown(output_text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "29jFnHZZWXd7"
},
"source": [
"### Start a multi-turn chat\n",
"\n",
"The Gemini API supports freeform multi-turn conversations with back-and-forth interactions.\n",
"\n",
"The context of the conversation is preserved between messages."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DbM12JaLWjiF"
},
"outputs": [],
"source": [
"chat = client.chats.create(model=MODEL_ID)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JQem1halYDBW"
},
"outputs": [],
"source": [
"response = chat.send_message(\"Write a function that checks if a year is a leap year.\")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vUJR4Pno-LGK"
},
"source": [
"This follow-up prompt shows how the model responds based on the previous prompt:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6Fn69TurZ9DB"
},
"outputs": [],
"source": [
"response = chat.send_message(\"Write a unit test of the generated function.\")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "arLJE4wOuhh6"
},
"source": [
"### Send asynchronous requests\n",
"\n",
"`client.aio` exposes async versions of all the [async](https://docs.python.org/3/library/asyncio.html)-compatible methods available on `client`.\n",
"\n",
"For example, `client.aio.models.generate_content` is the async version of `client.models.generate_content`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gSReaLazs-dP"
},
"outputs": [],
"source": [
"response = await client.aio.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=\"Compose a song about the adventures of a time-traveling squirrel.\",\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hIJVEr0RQY8S"
},
"source": [
"## Configure model parameters\n",
"\n",
"You can include parameter values in each call that you send to a model to control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.\n",
"\n",
"- Learn more about [experimenting with parameter values](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values).\n",
"\n",
"- See a list of all [Gemini API parameters](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#parameters).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "d9NXP5N2Pmfo"
},
"outputs": [],
"source": [
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=\"Tell me how the internet works, but pretend I'm a puppy who only understands squeaky toys.\",\n",
" config=GenerateContentConfig(\n",
" temperature=0.4,\n",
" top_p=0.95,\n",
" top_k=20,\n",
" candidate_count=1,\n",
" seed=5,\n",
" max_output_tokens=100,\n",
" stop_sequences=[\"STOP!\"],\n",
" presence_penalty=0.0,\n",
" frequency_penalty=0.0,\n",
" ),\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "El1lx8P9ElDq"
},
"source": [
"## Set system instructions\n",
"\n",
"[System instructions](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/system-instruction-introduction) allow you to steer the behavior of the model. By setting the system instruction, you give the model additional context to understand the task, provide more customized responses, and adhere to guidelines throughout the user interaction."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7A-yANiyCLaO"
},
"outputs": [],
"source": [
"system_instruction = \"\"\"\n",
" You are a helpful language translator.\n",
" Your mission is to translate text in English to Spanish.\n",
"\"\"\"\n",
"\n",
"prompt = \"\"\"\n",
" User input: I like bagels.\n",
" Answer:\n",
"\"\"\"\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=prompt,\n",
" config=GenerateContentConfig(\n",
" system_instruction=system_instruction,\n",
" ),\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "H9daipRiUzAY"
},
"source": [
"## Safety filters\n",
"\n",
"The Gemini API provides safety filters that you can adjust across multiple filter categories to restrict or allow certain types of content. You can use these filters to adjust what's appropriate for your use case. See the [Configure safety filters](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-filters) page for details.\n",
"\n",
"When you make a request to Gemini, the content is analyzed and assigned a safety rating. You can inspect the safety ratings of the generated content by printing out the model responses.\n",
"\n",
"The safety settings are `OFF` by default and the default block thresholds are `BLOCK_NONE`.\n",
"\n",
"For more examples of safety filters, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/responsible-ai/gemini_safety_ratings.ipynb).\n",
"\n",
"You can use `safety_settings` to adjust the safety settings for each request you make to the API. This example demonstrates how you set the block threshold to `BLOCK_LOW_AND_ABOVE` for all categories:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yPlDRaloU59b"
},
"outputs": [],
"source": [
"system_instruction = \"Be as mean and evil as possible. Use profane language and insults. Be very rude and disrespectful.\"\n",
"\n",
"prompt = \"\"\"\n",
" Write a list of 5 disrespectful things that I might say to the universe after stubbing my toe in the dark.\n",
"\"\"\"\n",
"\n",
"safety_settings = [\n",
" SafetySetting(\n",
" category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,\n",
" threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n",
" ),\n",
" SafetySetting(\n",
" category=HarmCategory.HARM_CATEGORY_HARASSMENT,\n",
" threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n",
" ),\n",
" SafetySetting(\n",
" category=HarmCategory.HARM_CATEGORY_HATE_SPEECH,\n",
" threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n",
" ),\n",
" SafetySetting(\n",
" category=HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,\n",
" threshold=HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,\n",
" ),\n",
"]\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=prompt,\n",
" config=GenerateContentConfig(\n",
" system_instruction=system_instruction,\n",
" safety_settings=safety_settings,\n",
" ),\n",
")\n",
"\n",
"# Response will be `None` if it is blocked.\n",
"print(response.text)\n",
"# Finish Reason will be `SAFETY` if it is blocked.\n",
"print(response.candidates[0].finish_reason)\n",
"# Safety Ratings show the levels for each filter.\n",
"for safety_rating in response.candidates[0].safety_ratings:\n",
" print(safety_rating)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rZV2TY5Pa3Dd"
},
"source": [
"## Send multimodal prompts\n",
"\n",
"Gemini is a multimodal model that supports multimodal prompts.\n",
"\n",
"You can include any of the following data types from various sources.\n",
"\n",
"<table>\n",
" <thead>\n",
" <tr>\n",
" <th>Data type</th>\n",
" <th>Source(s)</th>\n",
" <th>MIME Type(s)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>Text</td>\n",
" <td>Inline, Local File, General URL, Google Cloud Storage</td>\n",
" <td><code>text/plain</code> <code>text/html</code></td>\n",
" </tr>\n",
" <tr>\n",
" <td>Code</td>\n",
" <td>Inline, Local File, General URL, Google Cloud Storage</td>\n",
" <td><code>text/plain</code></td>\n",
" </tr>\n",
" <tr>\n",
" <td>Document</td>\n",
" <td>Local File, General URL, Google Cloud Storage</td>\n",
" <td><code>application/pdf</code></td>\n",
" </tr>\n",
" <tr>\n",
" <td>Image</td>\n",
" <td>Local File, General URL, Google Cloud Storage</td>\n",
" <td><code>image/jpeg</code> <code>image/png</code> <code>image/webp</code></td>\n",
" </tr>\n",
" <tr>\n",
" <td>Audio</td>\n",
" <td>Local File, General URL, Google Cloud Storage</td>\n",
" <td>\n",
" <code>audio/aac</code> <code>audio/flac</code> <code>audio/mp3</code>\n",
" <code>audio/m4a</code> <code>audio/mpeg</code> <code>audio/mpga</code>\n",
" <code>audio/mp4</code> <code>audio/opus</code> <code>audio/pcm</code>\n",
" <code>audio/wav</code> <code>audio/webm</code>\n",
" </td>\n",
" </tr>\n",
" <tr>\n",
" <td>Video</td>\n",
" <td>Local File, General URL, Google Cloud Storage, YouTube</td>\n",
" <td>\n",
" <code>video/mp4</code> <code>video/mpeg</code> <code>video/x-flv</code>\n",
" <code>video/quicktime</code> <code>video/mpegps</code> <code>video/mpg</code>\n",
" <code>video/webm</code> <code>video/wmv</code> <code>video/3gpp</code>\n",
" </td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"\n",
"Set `config.media_resolution` to optimize for speed or quality. Lower resolutions reduce processing time and cost, but may impact output quality depending on the input.\n",
"\n",
"For more examples of multimodal use cases, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/intro_multimodal_use_cases.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "w4npg1tNTYB9"
},
"source": [
"### Send local image\n",
"\n",
"Download an image to local storage from Google Cloud Storage.\n",
"\n",
"For this example, we'll use this image of a meal.\n",
"\n",
"<img src=\"https://storage.googleapis.com/cloud-samples-data/generative-ai/image/meal.png\" alt=\"Meal\" width=\"500\">"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4avkv0Z7qUI-"
},
"outputs": [],
"source": [
"!wget https://storage.googleapis.com/cloud-samples-data/generative-ai/image/meal.png"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "umhZ61lrSyJh"
},
"outputs": [],
"source": [
"with open(\"meal.png\", \"rb\") as f:\n",
" image = f.read()\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=[\n",
" Part.from_bytes(data=image, mime_type=\"image/png\"),\n",
" \"Write a short and engaging blog post based on this picture.\",\n",
" ],\n",
" # Optional: Use the `media_resolution` parameter to specify the resolution of the input media.\n",
" config=GenerateContentConfig(\n",
" media_resolution=MediaResolution.MEDIA_RESOLUTION_LOW,\n",
" ),\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "iRQyv1DhTbnH"
},
"source": [
"### Send document from Google Cloud Storage\n",
"\n",
"This example document is the paper [\"Attention is All You Need\"](https://arxiv.org/abs/1706.03762), created by researchers from Google and the University of Toronto.\n",
"\n",
"Check out this notebook for more examples of document understanding with Gemini:\n",
"\n",
"- [Document Processing with Gemini](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/document_processing.ipynb)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pG6l1Fuka6ZJ"
},
"outputs": [],
"source": [
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=[\n",
" Part.from_uri(\n",
" file_uri=\"https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/1706.03762v7.pdf\",\n",
" mime_type=\"application/pdf\",\n",
" ),\n",
" \"Summarize the document.\",\n",
" ],\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "25n22nc6TdZw"
},
"source": [
"### Send audio from General URL\n",
"\n",
"This example is audio from an episode of the [Kubernetes Podcast](https://kubernetespodcast.com/)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "uVU9XyCCo-h2"
},
"outputs": [],
"source": [
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=[\n",
" Part.from_uri(\n",
" file_uri=\"https://traffic.libsyn.com/secure/e780d51f-f115-44a6-8252-aed9216bb521/KPOD242.mp3\",\n",
" mime_type=\"audio/mpeg\",\n",
" ),\n",
" \"Write a summary of this podcast episode.\",\n",
" ],\n",
" config=GenerateContentConfig(audio_timestamp=True),\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8D3_oNUTuW2q"
},
"source": [
"### Send video from YouTube URL\n",
"\n",
"This example is the YouTube video [Google — 25 Years in Search: The Most Searched](https://www.youtube.com/watch?v=3KtWfp0UopM).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "l7-w8G_2wAOw"
},
"outputs": [],
"source": [
"video = Part.from_uri(\n",
" file_uri=\"https://www.youtube.com/watch?v=3KtWfp0UopM\",\n",
" mime_type=\"video/mp4\",\n",
")\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=[\n",
" video,\n",
" \"At what point in the video is Harry Potter shown?\",\n",
" ],\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "df8013cfa7f7"
},
"source": [
"### Send web page\n",
"\n",
"This example is from the [Generative AI on Vertex AI documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/overview).\n",
"\n",
"**NOTE:** The URL must be publicly accessible."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "337793322c91"
},
"outputs": [],
"source": [
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=[\n",
" Part.from_uri(\n",
" file_uri=\"https://cloud.google.com/vertex-ai/generative-ai/docs/overview\",\n",
" mime_type=\"text/html\",\n",
" ),\n",
" \"Write a summary of this documentation.\",\n",
" ],\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rVlo0mWuZGkQ"
},
"source": [
"## Control generated output\n",
"\n",
"[Controlled generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output) allows you to define a response schema to specify the structure of a model's output, the field names, and the expected data type for each field.\n",
"\n",
"The response schema is specified in the `response_schema` parameter in `config`, and the model output will strictly follow that schema.\n",
"\n",
"You can provide the schemas as [Pydantic](https://docs.pydantic.dev/) models or a [JSON](https://www.json.org/json-en.html) string and the model will respond as JSON or an [Enum](https://docs.python.org/3/library/enum.html) depending on the value set in `response_mime_type`.\n",
"\n",
"For more examples of controlled generation, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/controlled-generation/intro_controlled_generation.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OjSgf2cDN_bG"
},
"outputs": [],
"source": [
"from pydantic import BaseModel\n",
"\n",
"\n",
"class Recipe(BaseModel):\n",
" name: str\n",
" description: str\n",
" ingredients: list[str]\n",
"\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=\"List a few popular cookie recipes and their ingredients.\",\n",
" config=GenerateContentConfig(\n",
" response_mime_type=\"application/json\",\n",
" response_schema=Recipe,\n",
" ),\n",
")\n",
"\n",
"print(response.text)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nKai5CP_PGQF"
},
"source": [
"You can either parse the response string as JSON, or use the `parsed` field to get the response as an object or dictionary."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZeyDWbnxO-on"
},
"outputs": [],
"source": [
"parsed_response: Recipe = response.parsed\n",
"print(parsed_response)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SUSLPrvlvXOc"
},
"source": [
"You can also define a response schema in a Python dictionary. Only the supported fields listed below can be used; all other fields are ignored.\n",
"\n",
"- `enum`\n",
"- `items`\n",
"- `maxItems`\n",
"- `nullable`\n",
"- `properties`\n",
"- `required`\n",
"\n",
"In this example, you instruct the model to analyze product review data, extract key entities, perform sentiment classification (multiple choices), provide additional explanation, and output the results in JSON format.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "F7duWOq3vMmS"
},
"outputs": [],
"source": [
"response_schema = {\n",
" \"type\": \"ARRAY\",\n",
" \"items\": {\n",
" \"type\": \"ARRAY\",\n",
" \"items\": {\n",
" \"type\": \"OBJECT\",\n",
" \"properties\": {\n",
" \"rating\": {\"type\": \"INTEGER\"},\n",
" \"flavor\": {\"type\": \"STRING\"},\n",
" \"sentiment\": {\n",
" \"type\": \"STRING\",\n",
" \"enum\": [\"POSITIVE\", \"NEGATIVE\", \"NEUTRAL\"],\n",
" },\n",
" \"explanation\": {\"type\": \"STRING\"},\n",
" },\n",
" \"required\": [\"rating\", \"flavor\", \"sentiment\", \"explanation\"],\n",
" },\n",
" },\n",
"}\n",
"\n",
"prompt = \"\"\"\n",
" Analyze the following product reviews, output the sentiment classification, and give an explanation.\n",
"\n",
" - \"Absolutely loved it! Best ice cream I've ever had.\" Rating: 4, Flavor: Strawberry Cheesecake\n",
" - \"Quite good, but a bit too sweet for my taste.\" Rating: 1, Flavor: Mango Tango\n",
"\"\"\"\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=prompt,\n",
" config=GenerateContentConfig(\n",
" response_mime_type=\"application/json\",\n",
" response_schema=response_schema,\n",
" ),\n",
")\n",
"\n",
"response_dict = response.parsed\n",
"print(response_dict)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gV1dR-QlTKRs"
},
"source": [
"## Count tokens and compute tokens\n",
"\n",
"You can use the `count_tokens()` method to calculate the number of input tokens before sending a request to the Gemini API.\n",
"\n",
"For more information, refer to [List and count tokens](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/list-token).\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Syx-fwLkV1j-"
},
"source": [
"### Count tokens"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "UhNElguLRRNK"
},
"outputs": [],
"source": [
"response = client.models.count_tokens(\n",
" model=MODEL_ID,\n",
" contents=\"What's the highest mountain in Africa?\",\n",
")\n",
"\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VS-AP7AHUQmV"
},
"source": [
"### Compute tokens\n",
"\n",
"The `compute_tokens()` method runs a local tokenizer instead of making an API call. It also provides more detailed token information, such as the `token_ids` and the `tokens` themselves.\n",
"\n",
"<div class=\"alert alert-block alert-info\">\n",
"<b>NOTE: This method is only supported in Vertex AI.</b>\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Cdhi5AX1TuH0"
},
"outputs": [],
"source": [
"response = client.models.compute_tokens(\n",
" model=MODEL_ID,\n",
" contents=\"What's the longest word in the English language?\",\n",
")\n",
"\n",
"print(response)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_BsP0vXOY7hg"
},
"source": [
"## Search as a tool (Grounding)\n",
"\n",
"[Grounding](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini) lets you connect real-world data to the Gemini model.\n",
"\n",
"Grounding model responses in Google Search results lets the model access information at runtime that goes beyond its training data, which can produce more accurate, up-to-date, and relevant responses.\n",
"\n",
"Using Grounding with Google Search, you can improve the accuracy and recency of responses from the model. Starting with Gemini 2.0, Google Search is available as a tool. This means that the model can decide when to use Google Search.\n",
"\n",
"For more examples of Grounding, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/grounding/intro-grounding-gemini.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4_M_4RRBdO_3"
},
"source": [
"### Google Search\n",
"\n",
"You can add the `tools` keyword argument with a `Tool` including `GoogleSearch` to instruct Gemini to first perform a Google Search with the prompt, then construct an answer based on the web search results.\n",
"\n",
"[Dynamic Retrieval](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/ground-gemini#dynamic-retrieval) lets you set a threshold for when grounding is used for model responses. This is useful when the prompt doesn't require an answer grounded in Google Search and the supported models can provide an answer based on their knowledge without grounding. This helps you manage latency, quality, and cost more effectively."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yeR09J3AZT4U"
},
"outputs": [],
"source": [
"google_search_tool = Tool(google_search=GoogleSearch())\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=\"When is the next total solar eclipse in the United States?\",\n",
" config=GenerateContentConfig(tools=[google_search_tool]),\n",
")\n",
"\n",
"display(Markdown(response.text))\n",
"\n",
"print(response.candidates[0].grounding_metadata)\n",
"\n",
"HTML(response.candidates[0].grounding_metadata.search_entry_point.rendered_content)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "735e061b780e"
},
"source": [
"## Function calling\n",
"\n",
"[Function Calling](https://cloud.google.com/vertex-ai/docs/generative-ai/multimodal/function-calling) in Gemini lets developers create a description of a function in their code, then pass that description to a language model in a request.\n",
"\n",
"You can submit a Python function for automatic function calling, which runs the function and returns the output in natural language generated by Gemini.\n",
"\n",
"You can also submit an [OpenAPI Specification](https://www.openapis.org/), and the model will respond with the name of a function that matches the description and the arguments to call it with.\n",
"\n",
"For more examples of Function calling with Gemini, check out this notebook: [Intro to Function Calling with Gemini](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/function-calling/intro_function_calling.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mSUWWlrrlR-D"
},
"source": [
"### Python Function (Automatic Function Calling)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "89db04dbdb53"
},
"outputs": [],
"source": [
"def get_current_weather(location: str) -> str:\n",
" \"\"\"Example method. Returns the current weather.\n",
"\n",
" Args:\n",
" location: The city and state, e.g. San Francisco, CA\n",
" \"\"\"\n",
" weather_map: dict[str, str] = {\n",
" \"Boston, MA\": \"snowing\",\n",
" \"San Francisco, CA\": \"foggy\",\n",
" \"Seattle, WA\": \"raining\",\n",
" \"Austin, TX\": \"hot\",\n",
" \"Chicago, IL\": \"windy\",\n",
" }\n",
" return weather_map.get(location, \"unknown\")\n",
"\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=\"What is the weather like in Austin?\",\n",
" config=GenerateContentConfig(\n",
" tools=[get_current_weather],\n",
" temperature=0,\n",
" ),\n",
")\n",
"\n",
"display(Markdown(response.text))"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "h4syyLEClGcn"
},
"source": [
"### OpenAPI Specification (Manual Function Calling)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2BDQPwgcxRN3"
},
"outputs": [],
"source": [
"get_destination = FunctionDeclaration(\n",
" name=\"get_destination\",\n",
" description=\"Get the destination that the user wants to go to\",\n",
" parameters={\n",
" \"type\": \"OBJECT\",\n",
" \"properties\": {\n",
" \"destination\": {\n",
" \"type\": \"STRING\",\n",
" \"description\": \"Destination that the user wants to go to\",\n",
" },\n",
" },\n",
" },\n",
")\n",
"\n",
"destination_tool = Tool(\n",
" function_declarations=[get_destination],\n",
")\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
" contents=\"I'd like to travel to Paris.\",\n",
" config=GenerateContentConfig(\n",
" tools=[destination_tool],\n",
" temperature=0,\n",
" ),\n",
")\n",
"\n",
"print(response.function_calls[0])"
]
},
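{
"cell_type": "markdown",
"metadata": {
"id": "fc-loop-md"
},
"source": [
"With manual function calling, the model returns the function name and arguments but does not execute anything itself. To complete the loop, you run your own code and send the result back with `Part.from_function_response`. The sketch below uses a hand-built placeholder result (`api_result` and its fields are made up for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fc-loop-code"
},
"outputs": [],
"source": [
"# Sketch: complete the function-calling loop (the API result here is made up).\n",
"from google.genai.types import Part\n",
"\n",
"function_call = response.function_calls[0]\n",
"\n",
"# In a real app, route on function_call.name and call your actual backend.\n",
"api_result = {\"destination\": function_call.args[\"destination\"], \"status\": \"ok\"}\n",
"\n",
"follow_up = client.models.generate_content(\n",
"    model=MODEL_ID,\n",
"    contents=[\n",
"        \"I'd like to travel to Paris.\",\n",
"        response.candidates[0].content,  # the model's function-call turn\n",
"        Part.from_function_response(\n",
"            name=function_call.name,\n",
"            response={\"result\": api_result},\n",
"        ),\n",
"    ],\n",
"    config=GenerateContentConfig(\n",
"        tools=[destination_tool],\n",
"        temperature=0,\n",
"    ),\n",
")\n",
"\n",
"display(Markdown(follow_up.text))"
]
},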
{
"cell_type": "markdown",
"metadata": {
"id": "MhDs2X3o0neK"
},
"source": [
"## Code Execution\n",
"\n",
"The Gemini API [code execution](https://ai.google.dev/gemini-api/docs/code-execution?lang=python) feature enables the model to generate and run Python code and learn iteratively from the results until it arrives at a final output. You can use this code execution capability to build applications that benefit from code-based reasoning and that produce text output. For example, you could use code execution in an application that solves equations or processes text.\n",
"\n",
"The Gemini API provides code execution as a tool, similar to function calling.\n",
"After you add code execution as a tool, the model decides when to use it.\n",
"\n",
"For more examples of Code Execution, refer to [this notebook](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/code-execution/intro_code_execution.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1W-3c7sy0nyz"
},
"outputs": [],
"source": [
"code_execution_tool = Tool(code_execution=ToolCodeExecution())\n",
"\n",
"response = client.models.generate_content(\n",
" model=MODEL_ID,\n",
"    contents=\"Calculate the 20th Fibonacci number. Then find the nearest palindrome to it.\",\n",
" config=GenerateContentConfig(\n",
" tools=[code_execution_tool],\n",
" temperature=0,\n",
" ),\n",
")\n",
"\n",
"display(\n",
" Markdown(\n",
" f\"\"\"\n",
"## Code\n",
"\n",
"```py\n",
"{response.executable_code}\n",
"```\n",
"\n",
"### Output\n",
"\n",
"```\n",
"{response.code_execution_result}\n",
"```\n",
"\"\"\"\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9d2d8fdf1d12"
},
"source": [
"## Spatial Understanding\n",
"\n",
"Gemini 2.0 includes improved spatial understanding and object detection capabilities. Check out this notebook for examples:\n",
"\n",
"- [2D spatial understanding with Gemini 2.0](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/spatial-understanding/spatial_understanding.ipynb)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "eQwiONFdVHw5"
},
"source": [
"## What's next\n",
"\n",
"- See the [Google Gen AI SDK reference docs](https://googleapis.github.io/python-genai/).\n",
"- Explore other notebooks in the [Google Cloud Generative AI GitHub repository](https://github.com/GoogleCloudPlatform/generative-ai).\n",
"- Explore AI models in [Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models)."
]
}
],
"metadata": {
"colab": {
"name": "intro_gemini_express.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}