quickstarts/rest/Embeddings_REST.ipynb (345 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Tce3stUlHN0L" }, "source": [ "##### Copyright 2025 Google LLC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "tuOe1ymfHZPu" }, "outputs": [], "source": [ "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "agmT3hrjsffX" }, "source": [ "# Gemini API: Embedding Quickstart with REST\n", "\n", "<a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/rest/Embeddings_REST.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=30/></a>" ] }, { "cell_type": "markdown", "metadata": { "id": "JMNKdTpTGZET" }, "source": [ "This notebook provides quick code examples that show you how to get started generating embeddings using `curl`.\n", "\n", "You can run this in Google Colab, or you can copy/paste the `curl` commands into your terminal.\n", "\n", "To run this notebook, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you are running in a different environment, you can store your key in an environment variable instead. See [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) to learn more."
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "R-Vw_mOM_WD0" }, "outputs": [], "source": [ "import os\n", "from google.colab import userdata" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "wCkLTpb3oTXE" }, "outputs": [], "source": [ "os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')" ] }, { "cell_type": "markdown", "metadata": { "id": "tjGqGBZ9yARd" }, "source": [ "## Embed content\n", "\n", "Call the `embedContent` method with the `text-embedding-004` model to generate text embeddings:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "eA7I_Ww8IETn" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"embedding\": {\n", " \"values\": [\n", " 0.013168523,\n", " -0.008711934,\n", " -0.046782676,\n", " 0.00069968984,\n", " -0.009518873,\n", " -0.008720178,\n", " 0.060103577,\n" ] } ], "source": [ "%%bash\n", "\n", "curl \"https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=$GOOGLE_API_KEY\" \\\n", "-H 'Content-Type: application/json' \\\n", "-d '{\"model\": \"models/text-embedding-004\",\n", " \"content\": {\n", " \"parts\":[{\n", " \"text\": \"Hello world\"}]}}' 2> /dev/null | head" ] }, { "cell_type": "markdown", "metadata": { "id": "x7ngWdZ7yDHp" }, "source": [ "## Batch embed content\n", "\n", "You can embed multiple prompts in a single API call with the `batchEmbedContents` method, which is more efficient than sending one request per prompt.\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "Z0b35xv5Ja_d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"embeddings\": [\n", " {\n", " \"values\": [\n", " -0.010632277,\n", " 0.019375855,\n", " 0.0209652,\n", " 0.0007706424,\n", " -0.061464064,\n", "--\n", " -0.0071538696,\n", " -0.028534694\n", " ]\n", " },\n", " {\n", " \"values\": [\n", " 0.018467998,\n", " 0.0054281196,\n", " -0.017658804,\n", " 0.013859266,\n", " 0.053418662,\n", "--\n", " 0.026714385,\n", " 0.0018762538\n", " 
]\n", " },\n", " {\n", " \"values\": [\n", " 0.05808907,\n", " 0.020941721,\n", " -0.108728774,\n", " -0.04039259,\n", " -0.04440443,\n" ] } ], "source": [ "%%bash\n", "\n", "curl \"https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:batchEmbedContents?key=$GOOGLE_API_KEY\" \\\n", "-H 'Content-Type: application/json' \\\n", "-d '{\"requests\": [{\n", " \"model\": \"models/text-embedding-004\",\n", " \"content\": {\n", " \"parts\":[{\n", " \"text\": \"What is the meaning of life?\"}]}},\n", " {\n", " \"model\": \"models/text-embedding-004\",\n", " \"content\": {\n", " \"parts\":[{\n", " \"text\": \"How much wood would a woodchuck chuck?\"}]}},\n", " {\n", " \"model\": \"models/text-embedding-004\",\n", " \"content\": {\n", " \"parts\":[{\n", " \"text\": \"How does the brain work?\"}]}}]}' 2> /dev/null | grep -C 5 values" ] }, { "cell_type": "markdown", "metadata": { "id": "nPBk2k4xuql8" }, "source": [ "## Set the output dimensionality\n", "If you're using `text-embedding-004`, you can set the `output_dimensionality` parameter to create smaller embeddings.\n", "\n", "* `output_dimensionality` truncates the embedding (e.g., `[1, 3, 5]` becomes `[1, 3]` when `output_dimensionality=2`).\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "ny3bOQK1ut2_" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"embedding\": {\n", " \"values\": [\n", " 0.013168523,\n", " -0.008711934,\n", " -0.046782676,\n", " 0.00069968984,\n", " -0.009518873,\n", " -0.008720178,\n", " 0.060103577,\n" ] } ], "source": [ "%%bash\n", "\n", "curl \"https://generativelanguage.googleapis.com/v1beta/models/text-embedding-004:embedContent?key=$GOOGLE_API_KEY\" \\\n", "-H 'Content-Type: application/json' \\\n", "-d '{\"model\": \"models/text-embedding-004\",\n", " \"output_dimensionality\":256,\n", " \"content\": {\n", " \"parts\":[{\n", " \"text\": \"Hello world\"}]}}' 2> /dev/null | head" ] }, { "cell_type": 
"markdown", "metadata": { "id": "ObAdUvlk9x05" }, "source": [ "## Use `task_type` to provide a hint to the model how you'll use the embeddings\n", "\n", "Let's look at all the parameters the embed_content method takes. There are four:\n", "\n", "* `model`: Required. Must be `models/embedding-001`.\n", "* `content`: Required. The content that you would like to embed.\n", "* `task_type`: Optional. The task type for which the embeddings will be used. See below for possible values.\n", "* `title`: The given text is a document from a corpus being searched. Optionally, set the `title` parameter with the title of the document. Can only be set when `task_type` is `RETRIEVAL_DOCUMENT`.\n", "\n", "`task_type` is an optional parameter that provides a hint to the API about how you intend to use the embeddings in your application.\n", "\n", "The following task_type parameters are accepted:\n", "\n", "* `TASK_TYPE_UNSPECIFIED`: If you do not set the value, it will default to retrieval_query.\n", "* `RETRIEVAL_QUERY` : The given text is a query in a search/retrieval setting.\n", "* `RETRIEVAL_DOCUMENT`: The given text is a document from the corpus being searched.\n", "* `SEMANTIC_SIMILARITY`: The given text will be used for Semantic Textual Similarity (STS).\n", "* `CLASSIFICATION`: The given text will be classified.\n", "* `CLUSTERING`: The embeddings will be used for clustering.\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "NwzsJmRrAo-t" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"embedding\": {\n", " \"values\": [\n", " 0.060187872,\n", " -0.031515103,\n", " -0.03244149,\n", " -0.019341845,\n", " 0.057285223,\n", " 0.037159503,\n", " 0.035636507,\n" ] } ], "source": [ "%%bash\n", "\n", "curl \"https://generativelanguage.googleapis.com/v1beta/models/embedding-001:embedContent?key=$GOOGLE_API_KEY\" \\\n", "-H 'Content-Type: application/json' \\\n", "-d '{\"model\": \"models/text-embedding-004\",\n", " \"content\": 
{\n", " \"parts\":[{\n", " \"text\": \"Hello world\"}]},\n", " \"task_type\": \"RETRIEVAL_DOCUMENT\",\n", " \"title\": \"My title\"}' 2> /dev/null | head" ] }, { "cell_type": "markdown", "metadata": { "id": "jXkRYBhbB_b2" }, "source": [ "## Learning more\n", "\n", "* Learn more about `text-embedding-004` in [this blog post](https://developers.googleblog.com/2024/04/gemini-15-pro-in-public-preview-with-new-features.html).\n", "* See the [REST API reference](https://ai.google.dev/api/rest) to learn more.\n", "* Explore more examples in the cookbook.\n" ] } ], "metadata": { "colab": { "name": "Embeddings_REST.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }