{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2025 Google LLC." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "BuhjNPTpju5n" }, "source": [ "# Gemini API: Embeddings Quickstart" ] }, { "cell_type": "markdown", "metadata": { "id": "sUsgeyPu6ogK" }, "source": [ "<a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Embeddings.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=30/></a>" ] }, { "cell_type": "markdown", "metadata": { "id": "ddZb9-z46mM5" }, "source": [ "The Gemini API generates state-of-the-art text embeddings. An embedding is a list of floating point numbers that represent the meaning of a word, sentence, or paragraph. You can use embeddings in many downstream applications like document search.\n", "\n", "This notebook provides quick code examples that show you how to get started generating embeddings." 
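] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the idea of a vector that represents meaning a little more concrete, the next cell is a minimal sketch that compares made-up three-dimensional vectors with cosine similarity, a common way to score how closely two embeddings point in the same direction. The vectors and the `cosine_similarity` helper are illustrative assumptions, not output from or part of the Gemini API; the embeddings generated later in this notebook have 768 dimensions." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import math\n", "\n", "def cosine_similarity(a, b):\n", "    # Cosine of the angle between two equal-length, non-zero vectors\n", "    dot = sum(x * y for x, y in zip(a, b))\n", "    norm_a = math.sqrt(sum(x * x for x in a))\n", "    norm_b = math.sqrt(sum(y * y for y in b))\n", "    return dot / (norm_a * norm_b)\n", "\n", "# Made-up 3-dimensional vectors for illustration only (not API output)\n", "cat = [0.9, 0.1, 0.0]\n", "kitten = [0.8, 0.2, 0.1]\n", "invoice = [0.0, 0.1, 0.9]\n", "\n", "# Vectors pointing in similar directions score closer to 1\n", "print(cosine_similarity(cat, kitten), cosine_similarity(cat, invoice))"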
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YD6urJjWGVDf" }, "outputs": [], "source": [ "%pip install -q -U \"google-genai>=1.0.0\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "yBapI259C99C" }, "outputs": [], "source": [ "from google import genai" ] }, { "cell_type": "markdown", "metadata": { "id": "DJriBaWmkL6Z" }, "source": [ "## Configure your API key\n", "\n", "To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "Zey3UiYGDDzU" }, "outputs": [], "source": [ "from google.colab import userdata\n", "GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')\n", "client = genai.Client(api_key=GOOGLE_API_KEY)" ] }, { "cell_type": "markdown", "metadata": { "id": "gGpQ8Eg0kNXW" }, "source": [ "## Embed content\n", "\n", "Call the `embed_content` method with the `text-embedding-004` model to generate text embeddings." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "J76TNa3QDwCc" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.013168517, -0.00871193, -0.046782672, 0.0006996 ... TRIMMED]\n" ] } ], "source": [ "text = [\"Hello world\"]\n", "result = client.models.embed_content(model=\"text-embedding-004\", contents=text)\n", "[embedding] = result.embeddings\n", "\n", "# Print just a part of the embedding to keep the output manageable\n", "print(str(embedding.values)[:50], '... 
TRIMMED]')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "rU6XX33547Ll" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "768\n" ] } ], "source": [ "print(len(embedding.values)) # The embeddings have 768 dimensions" ] }, { "cell_type": "markdown", "metadata": { "id": "BUKqxF9yQuZl" }, "source": [ "## Batch embed content\n", "\n", "You can embed a list of multiple prompts with one API call for efficiency." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "Hzz-7Heuf4tV" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "values=[-0.010632273, 0.019375853, 0.020965198, 0. ... TRIMMED]\n", "values=[0.018468002, 0.0054281265, -0.017658807, 0 ... TRIMMED]\n", "values=[0.058089074, 0.020941732, -0.10872878, -0. ... TRIMMED]\n" ] } ], "source": [ "result = client.models.embed_content(\n", " model=\"text-embedding-004\",\n", " contents=[\n", " 'What is the meaning of life?',\n", " 'How much wood would a woodchuck chuck?',\n", " 'How does the brain work?'])\n", "\n", "for embedding in result.embeddings:\n", " # Print just a part of the embedding to keep the output manageable\n", " print(str(embedding)[:50], '... TRIMMED]')" ] }, { "cell_type": "markdown", "metadata": { "id": "r0r0dt958QQg" }, "source": [ "## Truncating embeddings\n", "\n", "The `text-embedding-004` model also supports lower embedding dimensions. Specify `output_dimensionality` to truncate the output." 
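] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One detail worth keeping in mind when shortening vectors: dropping components can only reduce a vector's length, so code that assumes unit-length vectors (for example, a dot product standing in for cosine similarity) may need to rescale. Whether the API rescales shortened embeddings is not stated in this notebook, so checking the norm is a cheap safeguard. The next cell is a minimal sketch on a made-up vector, not API output, and `l2_normalize` is a hypothetical helper, not part of the SDK." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import math\n", "\n", "def l2_norm(vec):\n", "    # Euclidean length of the vector\n", "    return math.sqrt(sum(x * x for x in vec))\n", "\n", "def l2_normalize(vec):\n", "    # Scale vec to unit length; a zero vector is returned unchanged\n", "    norm = l2_norm(vec)\n", "    return list(vec) if norm == 0 else [x / norm for x in vec]\n", "\n", "# Made-up unit-length vector standing in for a full embedding (not API output)\n", "full = [0.5, 0.5, 0.5, 0.5]\n", "# Slicing is only a stand-in for truncation; the API may shorten vectors differently\n", "shortened = full[:2]\n", "\n", "# Length before and after rescaling the shortened vector\n", "print(l2_norm(shortened), l2_norm(l2_normalize(shortened)))"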
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "bX_AjfMx8PvV" }, "outputs": [ { "data": { "text/plain": [ "(768, 10)" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text=[\"Hello world\"]\n", "# Not truncated\n", "result1 = client.models.embed_content(\n", " model=\"text-embedding-004\",\n", " contents=text)\n", "\n", "\n", "# Truncated\n", "from google.genai import types\n", "\n", "result2 = client.models.embed_content(\n", " model=\"text-embedding-004\",\n", " contents=text,\n", " config=types.EmbedContentConfig(output_dimensionality=10))\n", "\n", "[embedding1] = result1.embeddings\n", "[embedding2] = result2.embeddings\n", "\n", "\n", "(len(embedding1.values), len(embedding2.values))" ] }, { "cell_type": "markdown", "metadata": { "id": "sSKcLGIpo8yc" }, "source": [ "## Specify `task_type`" ] }, { "cell_type": "markdown", "metadata": { "id": "bz0zq1_shk98" }, "source": [ "For details on how to call `embed_content`, check out the [Embeddings API reference](https://ai.google.dev/api/embeddings#method:-models.embedcontent), including the section on [`task_type`](https://ai.google.dev/api/embeddings#tasktype)." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "LFjMapMV91es" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "values=[0.013168517, -0.00871193, -0.046782672, 0. ... TRIMMED]\n", "values=[0.023399517, -0.008547142, -0.052534223, - ... 
TRIMMED]\n" ] } ], "source": [ "# Notice the API returns different embeddings depending on `task_type`\n", "result1 = client.models.embed_content(\n", " model=\"text-embedding-004\",\n", " contents=[\"Hello world\"])\n", "\n", "result2 = client.models.embed_content(\n", " model=\"text-embedding-004\",\n", " contents=[\"Hello world\"],\n", " config={\"task_type\": \"retrieval_document\"})\n", "\n", "\n", "[embedding1] = result1.embeddings\n", "[embedding2] = result2.embeddings\n", "\n", "print(str(embedding1)[:50], '... TRIMMED]')\n", "print(str(embedding2)[:50], '... TRIMMED]')" ] }, { "cell_type": "markdown", "metadata": { "id": "tpBm7GIdbkdK" }, "source": [ "## Learning more\n", "\n", "Check out these examples in the Cookbook to learn more about what you can do with embeddings:\n", "\n", "* [Search Reranking](https://github.com/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb): Use embeddings from the Gemini API to rerank search results from Wikipedia.\n", "\n", "* [Anomaly detection with embeddings](https://github.com/google-gemini/cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb): Use embeddings from the Gemini API to detect potential outliers in your dataset.\n", "\n", "* [Train a text classifier](https://github.com/google-gemini/cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API to train a model that can classify different types of newsgroup posts based on the topic.\n", "\n", "* Embeddings have many applications in Vector Databases, too. 
Check out this [example with Chroma DB](https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/vectordb_with_chroma/vectordb_with_chroma.ipynb).\n", "\n", "You can learn more about embeddings in general in the [embeddings guide](https://ai.google.dev/docs/embeddings_guide) on ai.google.dev.\n", "\n", "* You can find additional code examples with the Python SDK [here](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).\n", "\n", "* You can also find more details in the API Reference for [embedContent](https://ai.google.dev/api/rest/v1/models/embedContent) and [batchEmbedContents](https://ai.google.dev/api/rest/v1/models/batchEmbedContents)." ] } ], "metadata": { "colab": { "name": "Embeddings.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }