{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Tce3stUlHN0L"
},
"source": [
"##### Copyright 2025 Google LLC."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"cellView": "form",
"id": "tuOe1ymfHZPu"
},
"outputs": [],
"source": [
"# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BuhjNPTpju5n"
},
"source": [
"# Gemini API: Embeddings Quickstart"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sUsgeyPu6ogK"
},
"source": [
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Embeddings.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=30/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ddZb9-z46mM5"
},
"source": [
"The Gemini API generates state-of-the-art text embeddings. An embedding is a list of floating-point numbers that represents the meaning of a word, sentence, or paragraph. Texts with similar meanings produce embeddings that are close to each other, so you can use embeddings in downstream applications such as document search, clustering, and classification.\n",
"\n",
"This notebook provides quick code examples that show you how to get started generating embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YD6urJjWGVDf"
},
"outputs": [],
"source": [
"%pip install -q -U \"google-genai>=1.0.0\""
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"id": "yBapI259C99C"
},
"outputs": [],
"source": [
"from google import genai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DJriBaWmkL6Z"
},
"source": [
"## Configure your API key\n",
"\n",
"To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "Zey3UiYGDDzU"
},
"outputs": [],
"source": [
"from google.colab import userdata\n",
"GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')\n",
"client = genai.Client(api_key=GOOGLE_API_KEY)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gGpQ8Eg0kNXW"
},
"source": [
"## Embed content\n",
"\n",
"Call the `embed_content` method with the `models/text-embedding-004` model to generate text embeddings."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"id": "J76TNa3QDwCc"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0.013168517, -0.00871193, -0.046782672, 0.0006996 ... TRIMMED]\n"
]
}
],
"source": [
"text = [\"Hello world\"]\n",
"result = client.models.embed_content(model=\"text-embedding-004\", contents=text)\n",
"[embedding] = result.embeddings\n",
"\n",
"# Print just a part of the embedding to keep the output manageable\n",
"print(str(embedding.values)[:50], '... TRIMMED]')"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "rU6XX33547Ll"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"768\n"
]
}
],
"source": [
"print(len(embedding.values)) # The embeddings have 768 dimensions"
]
},
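{
"cell_type": "markdown",
"metadata": {},
"source": [
"To compare two embeddings, a common choice is cosine similarity: the dot product of the two vectors divided by the product of their lengths. Higher values (up to 1) mean the vectors point in more similar directions, which for embeddings means more similar meaning.\n",
"\n",
"The following is a minimal sketch, assuming NumPy is available (it is preinstalled in Colab but not otherwise used in this notebook). `cosine_similarity` is a hypothetical helper, not part of the `google-genai` SDK, and the short toy vectors stand in for real 768-dimensional embeddings. With real data, you would pass the `values` of two embeddings returned by `embed_content`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def cosine_similarity(a, b):\n",
"    # Hypothetical helper: dot product divided by the product of the vector lengths.\n",
"    a = np.asarray(a, dtype=float)\n",
"    b = np.asarray(b, dtype=float)\n",
"    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))\n",
"\n",
"# Toy vectors standing in for real embeddings.\n",
"v1 = [1.0, 0.0, 1.0]\n",
"v2 = [2.0, 0.0, 2.0]  # same direction as v1, different length\n",
"v3 = [0.0, 1.0, 0.0]  # orthogonal to v1\n",
"\n",
"print(cosine_similarity(v1, v2))  # close to 1: same direction\n",
"print(cosine_similarity(v1, v3))  # 0: orthogonal"
]
},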
{
"cell_type": "markdown",
"metadata": {
"id": "BUKqxF9yQuZl"
},
"source": [
"## Batch embed content\n",
"\n",
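"You can embed multiple prompts in a single API call by passing a list of strings as `contents`. This is more efficient than making one call per prompt, and `result.embeddings` contains one embedding per input, in the same order."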
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "Hzz-7Heuf4tV"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"values=[-0.010632273, 0.019375853, 0.020965198, 0. ... TRIMMED]\n",
"values=[0.018468002, 0.0054281265, -0.017658807, 0 ... TRIMMED]\n",
"values=[0.058089074, 0.020941732, -0.10872878, -0. ... TRIMMED]\n"
]
}
],
"source": [
"result = client.models.embed_content(\n",
" model=\"text-embedding-004\",\n",
" contents=[\n",
" 'What is the meaning of life?',\n",
" 'How much wood would a woodchuck chuck?',\n",
" 'How does the brain work?'])\n",
"\n",
"for embedding in result.embeddings:\n",
" # Print just a part of the embedding to keep the output manageable\n",
" print(str(embedding)[:50], '... TRIMMED]')"
]
},
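{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have more texts than a single request accepts, one approach is to split them into chunks and call `embed_content` once per chunk. This is a minimal sketch under stated assumptions: the limit of 100 texts per request is an assumption (check the API reference for the current limit), and `chunked` and `embed_in_batches` are hypothetical helpers, not SDK functions. The demo at the end exercises only the chunking step, so it runs without an API call."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Assumed per-request limit; check the API reference for the actual value.\n",
"MAX_BATCH_SIZE = 100\n",
"\n",
"def chunked(items, size):\n",
"    # Hypothetical helper: yield successive slices with at most `size` items each.\n",
"    for start in range(0, len(items), size):\n",
"        yield items[start:start + size]\n",
"\n",
"def embed_in_batches(client, texts, model=\"text-embedding-004\", batch_size=MAX_BATCH_SIZE):\n",
"    # Hypothetical helper: one embed_content call per chunk, vectors kept in input order.\n",
"    vectors = []\n",
"    for batch in chunked(texts, batch_size):\n",
"        result = client.models.embed_content(model=model, contents=batch)\n",
"        vectors.extend(e.values for e in result.embeddings)\n",
"    return vectors\n",
"\n",
"# Demo of the chunking step alone (no API call).\n",
"texts = [f\"text {i}\" for i in range(250)]\n",
"print([len(batch) for batch in chunked(texts, MAX_BATCH_SIZE)])  # [100, 100, 50]"
]
},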
{
"cell_type": "markdown",
"metadata": {
"id": "r0r0dt958QQg"
},
"source": [
"## Truncating embeddings\n",
"\n",
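"The `text-embedding-004` model also supports smaller output dimensions. Set `output_dimensionality` in `EmbedContentConfig` to truncate the embedding to that many dimensions. Smaller embeddings take less storage and are cheaper to compare."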
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"id": "bX_AjfMx8PvV"
},
"outputs": [
{
"data": {
"text/plain": [
"(768, 10)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from google.genai import types\n",
"\n",
"text = [\"Hello world\"]\n",
"\n",
"# Not truncated\n",
"result1 = client.models.embed_content(\n",
" model=\"text-embedding-004\",\n",
" contents=text)\n",
"\n",
"# Truncated to 10 dimensions\n",
"result2 = client.models.embed_content(\n",
" model=\"text-embedding-004\",\n",
" contents=text,\n",
" config=types.EmbedContentConfig(output_dimensionality=10))\n",
"\n",
"[embedding1] = result1.embeddings\n",
"[embedding2] = result2.embeddings\n",
"\n",
"(len(embedding1.values), len(embedding2.values))"
]
},
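{
"cell_type": "markdown",
"metadata": {},
"source": [
"Depending on the model, a truncated embedding may not have unit length. Cosine similarity ignores vector length, but if you plan to compare truncated embeddings with a plain dot product, one option is to L2-normalize them first. This is a minimal NumPy sketch; `l2_normalize` is a hypothetical helper, and the toy vector stands in for a truncated embedding such as `embedding2.values`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def l2_normalize(vector):\n",
"    # Hypothetical helper: scale to unit Euclidean length; an all-zero vector is returned unchanged.\n",
"    v = np.asarray(vector, dtype=float)\n",
"    norm = np.linalg.norm(v)\n",
"    return v if norm == 0 else v / norm\n",
"\n",
"# Toy stand-in for a truncated embedding such as embedding2.values.\n",
"truncated = [3.0, 4.0]\n",
"unit = l2_normalize(truncated)\n",
"print(unit, np.linalg.norm(unit))"
]
},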
{
"cell_type": "markdown",
"metadata": {
"id": "sSKcLGIpo8yc"
},
"source": [
"## Specify `task_type`"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bz0zq1_shk98"
},
"source": [
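"The `task_type` parameter tells the model how you plan to use the embeddings (for example, `RETRIEVAL_QUERY` for search queries and `RETRIEVAL_DOCUMENT` for the documents being searched), so it can produce embeddings better suited to that task. For details on how to call `embed_content`, check out the [Embeddings API reference](https://ai.google.dev/api/embeddings#method:-models.embedcontent), including the section on [`task_type`](https://ai.google.dev/api/embeddings#tasktype)."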
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"id": "LFjMapMV91es"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"values=[0.013168517, -0.00871193, -0.046782672, 0. ... TRIMMED]\n",
"values=[0.023399517, -0.008547142, -0.052534223, - ... TRIMMED]\n"
]
}
],
"source": [
"# Notice the API returns different embeddings depending on `task_type`\n",
"result1 = client.models.embed_content(\n",
" model=\"text-embedding-004\",\n",
" contents=[\"Hello world\"])\n",
"\n",
"result2 = client.models.embed_content(\n",
" model=\"text-embedding-004\",\n",
" contents=[\"Hello world\"],\n",
" config={\"task_type\": \"retrieval_document\"})\n",
"\n",
"\n",
"[embedding1] = result1.embeddings\n",
"[embedding2] = result2.embeddings\n",
"\n",
"print(str(embedding1)[:50], '... TRIMMED]')\n",
"print(str(embedding2)[:50], '... TRIMMED]')"
]
},
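{
"cell_type": "markdown",
"metadata": {},
"source": [
"A typical use of `task_type` is retrieval: embed your documents with the `retrieval_document` task type and the search query with `retrieval_query`, then rank the documents by similarity to the query. The sketch below shows only the ranking step, using cosine similarity and toy vectors so it runs without an API call; `rank_documents` is a hypothetical helper, not an SDK function. With real data, you would pass the `values` of each embedding returned by `embed_content`, for example with `config={\"task_type\": \"retrieval_query\"}` for the query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"def rank_documents(query_vector, doc_vectors):\n",
"    # Hypothetical helper: return document indices sorted from most to least\n",
"    # similar to the query (cosine similarity), along with the sorted scores.\n",
"    q = np.asarray(query_vector, dtype=float)\n",
"    docs = np.asarray(doc_vectors, dtype=float)\n",
"    scores = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))\n",
"    order = np.argsort(-scores)\n",
"    return order, scores[order]\n",
"\n",
"# Toy vectors standing in for document and query embeddings.\n",
"doc_vectors = [\n",
"    [1.0, 0.0, 0.0],\n",
"    [0.0, 1.0, 0.0],\n",
"    [0.7, 0.7, 0.0],\n",
"]\n",
"query_vector = [0.9, 0.1, 0.0]\n",
"\n",
"order, scores = rank_documents(query_vector, doc_vectors)\n",
"print(order)\n",
"print(scores)"
]
},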
{
"cell_type": "markdown",
"metadata": {
"id": "tpBm7GIdbkdK"
},
"source": [
"## Learning more\n",
"\n",
"Check out these examples in the Cookbook to learn more about what you can do with embeddings:\n",
"\n",
"* [Search Reranking](https://github.com/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb): Use embeddings from the Gemini API to rerank search results from Wikipedia.\n",
"\n",
"* [Anomaly detection with embeddings](https://github.com/google-gemini/cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb): Use embeddings from the Gemini API to detect potential outliers in your dataset.\n",
"\n",
"* [Train a text classifier](https://github.com/google-gemini/cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API to train a model that can classify different types of newsgroup posts based on the topic.\n",
"\n",
"* Embeddings also have many applications in vector databases. Check out this [example with Chroma DB](https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/vectordb_with_chroma/vectordb_with_chroma.ipynb).\n",
"\n",
"You can learn more about embeddings in general in the [embeddings guide](https://ai.google.dev/docs/embeddings_guide) on ai.google.dev.\n",
"\n",
"* You can find additional code examples with the Python SDK in the [Python quickstart](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).\n",
"\n",
"* You can also find more details in the API Reference for [embedContent](https://ai.google.dev/api/rest/v1/models/embedContent) and [batchEmbedContents](https://ai.google.dev/api/rest/v1/models/batchEmbedContents)."
]
}
],
"metadata": {
"colab": {
"name": "Embeddings.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}