##### Copyright 2025 Google LLC.

In [1]:
# @title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Gemini API: Embeddings Quickstart

<a target="_blank" href="https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Embeddings.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" height=30/></a>

The Gemini API generates state-of-the-art text embeddings. An embedding is a list of floating point numbers that represent the meaning of a word, sentence, or paragraph. You can use embeddings in many downstream applications like document search.

This notebook provides quick code examples that show you how to get started generating embeddings.
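
Because an embedding is just a list of floats, comparing two texts comes down to comparing two vectors. The following is a minimal sketch, assuming cosine similarity as the comparison metric (a common choice for search, not something this notebook prescribes). The helper name `cosine_similarity` and the 3-dimensional vectors are made up for illustration and stand in for real API output.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors for illustration only; real embeddings come from `embed_content`.
query = [1.0, 0.0, 1.0]
doc_close = [2.0, 0.0, 2.0]  # points in the same direction as `query`
doc_far = [0.0, 1.0, 0.0]    # orthogonal to `query`

print(cosine_similarity(query, doc_close))
print(cosine_similarity(query, doc_far))
```

A higher score means the two vectors point in a more similar direction, which is how a document search would rank `doc_close` above `doc_far` for this query.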

In [None]:
%pip install -q -U "google-genai>=1.0.0"

In [3]:
from google import genai

## Configure your API key

To run the following cell, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you don't already have an API key, or you're not sure how to create a Colab Secret, see [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) for an example.

In [4]:
from google.colab import userdata
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
client = genai.Client(api_key=GOOGLE_API_KEY)

## Embed content

Call the `embed_content` method with the `text-embedding-004` model to generate text embeddings.

In [5]:
text = ["Hello world"]
result = client.models.embed_content(model="text-embedding-004", contents=text)
[embedding] = result.embeddings

# Print just a part of the embedding to keep the output manageable
print(str(embedding.values)[:50], '... TRIMMED]')

[0.013168517, -0.00871193, -0.046782672, 0.0006996 ... TRIMMED]


In [6]:
print(len(embedding.values))  # The embeddings have 768 dimensions

768


## Batch embed content

You can embed a list of multiple prompts with one API call for efficiency.

In [7]:
result = client.models.embed_content(
    model="text-embedding-004",
    contents=[
      'What is the meaning of life?',
      'How much wood would a woodchuck chuck?',
      'How does the brain work?'])

for embedding in result.embeddings:
    # Print just a part of the embedding to keep the output manageable
    print(str(embedding)[:50], '... TRIMMED]')

values=[-0.010632273, 0.019375853, 0.020965198, 0. ... TRIMMED]
values=[0.018468002, 0.0054281265, -0.017658807, 0 ... TRIMMED]
values=[0.058089074, 0.020941732, -0.10872878, -0. ... TRIMMED]
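
When you have more texts than you want to send in a single request, one option is to split the list into fixed-size chunks and embed each chunk with its own call. The following is a minimal sketch. The helper `chunked` and the batch size are hypothetical, chosen for illustration, and do not reflect a documented API limit.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = ["a", "b", "c", "d", "e"]
BATCH_SIZE = 2  # hypothetical value, not a documented limit

for batch in chunked(texts, BATCH_SIZE):
    # Each `batch` is a list that could be passed as `contents=` to `embed_content`.
    print(batch)
```

The last chunk may be shorter than the others, so code that consumes the results should not assume every batch has the same length.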


## Truncating embeddings

The `text-embedding-004` model also supports lower embedding dimensions. Specify `output_dimensionality` to truncate the output.

In [8]:
from google.genai import types

text = ["Hello world"]

# Not truncated
result1 = client.models.embed_content(
    model="text-embedding-004",
    contents=text)

# Truncated to 10 dimensions
result2 = client.models.embed_content(
    model="text-embedding-004",
    contents=text,
    config=types.EmbedContentConfig(output_dimensionality=10))

[embedding1] = result1.embeddings
[embedding2] = result2.embeddings


(len(embedding1.values), len(embedding2.values))

(768, 10)
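
If you compare embeddings with a dot product, vector length matters, and this notebook does not state whether truncated output is re-normalized. The following is a minimal sketch of normalizing a vector to unit length yourself. The helper `l2_normalize` is hypothetical, and slicing a toy vector only stands in for the truncation the API performs.

```python
import math


def l2_norm(vector):
    """Return the Euclidean length of `vector`."""
    return math.sqrt(sum(x * x for x in vector))


def l2_normalize(vector):
    """Return a copy of `vector` scaled to unit length."""
    norm = l2_norm(vector)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


full = [0.6, 0.0, 0.8]  # toy vector with length close to 1
truncated = full[:2]    # assumption: stand-in for a lower-dimensional embedding

print(l2_norm(truncated))                # shorter than the full vector
print(l2_norm(l2_normalize(truncated)))  # back to unit length
```

With cosine similarity this step is unnecessary, because that metric already divides by both vector lengths.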

## Specify `task_type`

For details on how to call `embed_content`, check out the [Embeddings API reference](https://ai.google.dev/api/embeddings#method:-models.embedcontent), including the section on [`task_type`](https://ai.google.dev/api/embeddings#tasktype).

In [9]:
# Notice the API returns different embeddings depending on `task_type`
result1 = client.models.embed_content(
    model="text-embedding-004",
    contents=["Hello world"])

result2 = client.models.embed_content(
    model="text-embedding-004",
    contents=["Hello world"],
    config={"task_type": "retrieval_document"})


[embedding1] = result1.embeddings
[embedding2] = result2.embeddings

print(str(embedding1)[:50], '... TRIMMED]')
print(str(embedding2)[:50], '... TRIMMED]')

values=[0.013168517, -0.00871193, -0.046782672, 0. ... TRIMMED]
values=[0.023399517, -0.008547142, -0.052534223, - ... TRIMMED]
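
The `task_type` and `output_dimensionality` options shown in this notebook can be combined in one config. The following is a minimal sketch of a hypothetical helper, `build_embed_config`, that collects only the options you set into a dict. It assumes the dict form of `config` accepts an `output_dimensionality` key in the same way it accepts `task_type` in the cell above.

```python
def build_embed_config(task_type=None, output_dimensionality=None):
    """Return a config dict containing only the options that were provided."""
    config = {}
    if task_type is not None:
        config["task_type"] = task_type
    if output_dimensionality is not None:
        if output_dimensionality < 1:
            raise ValueError("output_dimensionality must be at least 1")
        # Assumption: this key mirrors the EmbedContentConfig field name.
        config["output_dimensionality"] = output_dimensionality
    return config


config = build_embed_config(task_type="retrieval_document",
                            output_dimensionality=10)
print(config)

# Intended use, which needs the `client` created earlier in this notebook:
# result = client.models.embed_content(
#     model="text-embedding-004", contents=["Hello world"], config=config)
```

Building the config in one place keeps the embedding settings for documents and queries easy to compare, which matters because the same text can produce different vectors under different task types.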


## Learning more

Check out these examples in the Cookbook to learn more about what you can do with embeddings:

* [Search Reranking](https://github.com/google-gemini/cookbook/blob/main/examples/Search_reranking_using_embeddings.ipynb): Use embeddings from the Gemini API to rerank search results from Wikipedia.

* [Anomaly detection with embeddings](https://github.com/google-gemini/cookbook/blob/main/examples/Anomaly_detection_with_embeddings.ipynb): Use embeddings from the Gemini API to detect potential outliers in your dataset.

* [Train a text classifier](https://github.com/google-gemini/cookbook/blob/main/examples/Classify_text_with_embeddings.ipynb): Use embeddings from the Gemini API to train a model that can classify different types of newsgroup posts based on the topic.

* Embeddings have many applications in Vector Databases, too. Check out this [example with Chroma DB](https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/vectordb_with_chroma/vectordb_with_chroma.ipynb).

You can learn more about embeddings in general on ai.google.dev in the [embeddings guide](https://ai.google.dev/docs/embeddings_guide).

* You can find additional code examples with the Python SDK [here](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).

* You can also find more details in the API Reference for [embedContent](https://ai.google.dev/api/rest/v1/models/embedContent) and [batchEmbedContents](https://ai.google.dev/api/rest/v1/models/batchEmbedContents).