quickstarts/rest/Audio

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "aqhgRiRTnkqL" }, "source": [ "##### Copyright 2025 Google LLC." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "2gYF4IEpn1VT" }, "outputs": [], "source": [ "# @title Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "yFN3UPYdVrql" }, "source": [ "# Gemini API: Audio prompting with REST" ] }, { "cell_type": "markdown", "metadata": { "id": "4v9S1Uytalb_" }, "source": [ "<a target=\"_blank\" href=\"https://colab.research.google.com/github/google-gemini/cookbook/blob/main/quickstarts/rest/Audio_REST.ipynb\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" height=30/></a>" ] }, { "cell_type": "markdown", "metadata": { "id": "xw58Kqh3asaN" }, "source": [ "This notebook provides quick code examples that show you how to prompt the Gemini API using an audio file with `curl`. In this case, you will use a 43 minute clip of a US Presidental State of the Union Address from January 30th, 1961.\n", "\n", "You can run this in Google Colab, or you can copy/paste the `curl` commands into your terminal." ] }, { "cell_type": "markdown", "metadata": { "id": "HjA5xWmabklG" }, "source": [ "## Set up the environment\n", "\n", "To run this notebook, your API key must be stored in a Colab Secret named `GOOGLE_API_KEY`. If you are running in a different environment, you can store your key in an environment variable. See [Authentication](https://github.com/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb) to learn more." ] }, { "cell_type": "markdown", "metadata": { "id": "ktbKLzkGb8Gb" }, "source": [ "### Authentication Overview\n", "\n", "**Important:** The File API uses API keys for authentication and access. Uploaded files are associated with the API key's cloud project. Unlike other Gemini APIs that use API keys, your API key also grants access data you've uploaded to the File API, so take extra care in keeping your API key secure. For best practices on securing API keys, refer to the [API console support center](https://support.google.com/googleapi/answer/6310037)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4lWKxrtfs2AM" }, "outputs": [], "source": [ "import os\n", "from google.colab import userdata\n", "os.environ['GOOGLE_API_KEY'] = userdata.get('GOOGLE_API_KEY')" ] }, { "cell_type": "markdown", "metadata": { "id": "3xaop47VcFrS" }, "source": [ "For this example you will also need to install `jq` to help with processing JSON API responses, as well as ffmpeg for manipulating audio files." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "su9rrzJT1uml" }, "outputs": [], "source": [ "!apt install -q jq\n", "!apt install ffmpeg -y" ] }, { "cell_type": "markdown", "metadata": { "id": "5EPq0IstcWEr" }, "source": [ "## Use an audio file with the Gemini API\n", "\n", "The Gemini API accepts audio file formats through the File API. The File API accepts files under 2GB in size and can store up to 20GB of files per project. Files last for 2 days and cannot be downloaded from the API. For this example, you will use the 1961 US State of the Union Address, which is available as a part of the public domain.\n", "\n", "Note: In Colab, you can also [upload your own files](https://github.com/google-gemini/cookbook/blob/main/examples/Upload_files_to_Colab.ipynb) to use." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "UU_S0kPIs4Lo" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2025-02-14 22:33:25-- https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3\n", "Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.203.207, 142.250.98.207, 142.251.107.207, ...\n", "Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.203.207|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 41762063 (40M) [audio/mpeg]\n", "Saving to: ‘sample.mp3’\n", "\n", "sample.mp3 100%[===================>] 39.83M 140MB/s in 0.3s \n", "\n", "2025-02-14 22:33:25 (140 MB/s) - ‘sample.mp3’ saved [41762063/41762063]\n", "\n" ] } ], "source": [ "!wget https://storage.googleapis.com/generativeai-downloads/data/State_of_the_Union_Address_30_January_1961.mp3 -O sample.mp3" ] }, { "cell_type": "markdown", "metadata": { "id": "ziZ__48sdSvs" }, "source": [ "With the audio file now available locally, you can store the metadata about the file that will be used in subsequent steps. This includes the mime type of the audio file, the number of bytes within that file, and the path to the file.\n", "\n", "Note: Colab doesn't allow variables to be shared between cells, so you will store them in a new file named `vars.sh` to access later." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "P9hwqbiK2fQ2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "audio/mpeg 41762063 Sample audio\n" ] } ], "source": [ "%%bash\n", "\n", "AUDIO_PATH=\"./sample.mp3\"\n", "\n", "MIME_TYPE=$(file -b --mime-type \"${AUDIO_PATH}\")\n", "NUM_BYTES=$(wc -c < \"${AUDIO_PATH}\")\n", "DISPLAY_NAME=\"Sample audio\"\n", "\n", "echo $MIME_TYPE $NUM_BYTES $DISPLAY_NAME\n", "\n", "# Colab doesn't allow sharing shell variables between cells, so save them.\n", "cat >./vars.sh <<-EOF\n", " export BASE_URL=\"https://generativelanguage.googleapis.com\"\n", " export DISPLAY_NAME=\"${DISPLAY_NAME}\"\n", " export AUDIO_PATH=${AUDIO_PATH}\n", " export MIME_TYPE=${MIME_TYPE}\n", " export NUM_BYTES=${NUM_BYTES}\n", "EOF" ] }, { "cell_type": "markdown", "metadata": { "id": "WOKshHYxgQBJ" }, "source": [ "Now that you have the necessary data, it's time to let the Gemini File API know that you want to upload a file. You can create a curl request with the following headers and some content to let it know the display name for the file you want to upload.\n", "\n", "Once you've done that, you can retrieve the destination upload URL that you will use for your file and upload the file. Finally you will retrieve the file uri and other info that will be used for later requests with the Gemini API." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4E3Qk7Bt2OGG" }, "outputs": [], "source": [ "%%bash\n", ". vars.sh\n", "\n", "tmp_header_file=upload-header.tmp\n", "\n", "# Initial resumable request defining metadata.\n", "# The upload url is in the response headers dump them to a file.\n", "curl \"${BASE_URL}/upload/v1beta/files?key=${GOOGLE_API_KEY}\" \\\n", " -D upload-header.tmp \\\n", " -H \"X-Goog-Upload-Protocol: resumable\" \\\n", " -H \"X-Goog-Upload-Command: start\" \\\n", " -H \"X-Goog-Upload-Header-Content-Length: ${NUM_BYTES}\" \\\n", " -H \"X-Goog-Upload-Header-Content-Type: ${MIME_TYPE}\" \\\n", " -H \"Content-Type: application/json\" \\\n", " -d \"{'file': {'display_name': '${DISPLAY_NAME}'}}\" 2> /dev/null\n", "\n", "upload_url=$(grep -i \"x-goog-upload-url: \" \"${tmp_header_file}\" | cut -d\" \" -f2 | tr -d \"\\r\")\n", "rm \"${tmp_header_file}\"\n", "\n", "# Upload the actual bytes.\n", "curl \"${upload_url}\" \\\n", " -H \"Content-Length: ${NUM_BYTES}\" \\\n", " -H \"X-Goog-Upload-Offset: 0\" \\\n", " -H \"X-Goog-Upload-Command: upload, finalize\" \\\n", " --data-binary \"@${AUDIO_PATH}\" 2> /dev/null > file_info.json\n", "\n", "file_uri=$(jq \".file.uri\" file_info.json)" ] }, { "cell_type": "markdown", "metadata": { "id": "kINmacLwiN9Q" }, "source": [ "### Wait for processing\n", "\n", "Once the file is uploaded, the file service will perform some pre-processing to prepare the audio file for use with the LLM. For simple media types this is typically a negligible amount of time, but if you are using a larger audio file, you may need to wait a short time before using the file with Gemini.\n", "\n", "You can use the `state` field to query if the audio file is ready for use. If you use it in a prompt prematurely you will see an error like `The File ... is not in an ACTIVE state and usage is not allowed`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "C3XyJg_WgGqL" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Audio is now ACTIVE.\n" ] } ], "source": [ "%%bash\n", "\n", "state=$(jq -r \".file.state\" file_info.json)\n", "file_uri=$(jq -r \".file.uri\" file_info.json)\n", "\n", "while [[ \"${state}\" == \"PROCESSING\" ]];\n", "do\n", " echo \"Processing audio...\"\n", " sleep 5\n", " # Get the file of interest to check state\n", " curl \"${file_uri}?key=${GOOGLE_API_KEY}\" >file_info.json 2>/dev/null\n", " state=$(jq -r \".state\" file_info.json)\n", "done\n", "\n", "echo \"Audio is now ${state}.\"" ] }, { "cell_type": "markdown", "metadata": { "id": "P9fPmbzFjQbL" }, "source": [ "### Get file info\n", "\n", "After uploading the file, you can verify that the API has successfully received the files by querying the [`files.get` endpoint](https://ai.google.dev/api/files#method:-files.get).\n", "\n", "`files.get` lets you see the file uploaded to the File API that are associated with the Cloud project your API key belongs to. Only the `name` (and by extension, the `uri`) are unique." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OJlBgRO5jVgL" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"name\": \"files/x27lu0k9zc2k\",\n", " \"displayName\": \"Sample audio\",\n", " \"mimeType\": \"audio/mpeg\",\n", " \"sizeBytes\": \"41762063\",\n", " \"createTime\": \"2025-02-14T22:33:33.377817Z\",\n", " \"updateTime\": \"2025-02-14T22:33:33.377817Z\",\n", " \"expirationTime\": \"2025-02-16T22:33:33.359672295Z\",\n", " \"sha256Hash\": \"MGU3ZmFmZTE5ODRhZWQyNGMxNWJlMDc4OWEzNWU2MGM1YWYwYzczNzNiOWVkOWYyNjMxMzE2NzQwYTRiOWVlNg==\",\n", " \"uri\": \"https://generativelanguage.googleapis.com/v1beta/files/x27lu0k9zc2k\",\n", " \"state\": \"ACTIVE\",\n", " \"source\": \"UPLOADED\"\n", "}\n" ] } ], "source": [ "%%bash\n", ". vars.sh\n", "\n", "file_uri=$(jq -r \".file.uri\" file_info.json)\n", "\n", "curl \"${file_uri}?key=${GOOGLE_API_KEY}\" 2>/dev/null" ] }, { "cell_type": "markdown", "metadata": { "id": "WyZ3PpXGjhe4" }, "source": [ "# Viewing info on all files\n", "If you have uploaded multiple files and would like to see info on each, you can query the Files API like so" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "BlSyJzy7jnmV" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"files\": [\n", " {\n", " \"name\": \"files/x27lu0k9zc2k\",\n", " \"displayName\": \"Sample audio\",\n", " \"mimeType\": \"audio/mpeg\",\n", " \"sizeBytes\": \"41762063\",\n", " \"createTime\": \"2025-02-14T22:33:33.377817Z\",\n", " \"updateTime\": \"2025-02-14T22:33:33.377817Z\",\n", " \"expirationTime\": \"2025-02-16T22:33:33.359672295Z\",\n", " \"sha256Hash\": \"MGU3ZmFmZTE5ODRhZWQyNGMxNWJlMDc4OWEzNWU2MGM1YWYwYzczNzNiOWVkOWYyNjMxMzE2NzQwYTRiOWVlNg==\",\n", " \"uri\": \"https://generativelanguage.googleapis.com/v1beta/files/x27lu0k9zc2k\",\n", " \"state\": \"ACTIVE\",\n", " \"source\": \"UPLOADED\"\n", " }\n", " ]\n", "}\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 577 0 577 0 0 1351 0 --:--:-- --:--:-- --:--:-- 1354\n" ] } ], "source": [ "%%bash\n", "\n", "curl \"https://generativelanguage.googleapis.com/v1beta/files?key=$GOOGLE_API_KEY\"" ] }, { "cell_type": "markdown", "metadata": { "id": "_bovKlC9ipBM" }, "source": [ "# Prompting with the audio file\n", "\n", "At this point your file should be uploaded and available to use with the Gemini API. It's worth noting here that your request contents will include a `file_data` object to represent the file that you have uploaded. In a later section you will learn how to directly reference a *small* audio file using the `inline_data` object." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XbiVahzAfXuV" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"parts\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[1;39m[\n", " \u001b[1;39m{\n", " \u001b[0m\u001b[34;1m\"text\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"This is an audio recording of President John F. Kennedy delivering his State of the Union address to a joint session of Congress on January 30, 1961. He discusses the state of the economy, the balance of payments, the federal budget, and the political climate. He also outlines some of the key challenges facing the nation, both domestically and internationally, and proposes steps to address them.\"\u001b[0m\u001b[1;39m\n", " \u001b[1;39m}\u001b[0m\u001b[1;39m\n", " \u001b[1;39m]\u001b[0m\u001b[1;39m,\n", " \u001b[0m\u001b[34;1m\"role\"\u001b[0m\u001b[1;39m: \u001b[0m\u001b[0;32m\"model\"\u001b[0m\u001b[1;39m\n", "\u001b[1;39m}\u001b[0m\n" ] } ], "source": [ "%%bash\n", ". vars.sh\n", "\n", "file_uri=$(jq \".file.uri\" file_info.json)\n", "\n", "curl \"${BASE_URL}/v1beta/models/gemini-2.0-flash:generateContent?key=${GOOGLE_API_KEY}\" \\\n", " -H 'Content-Type: application/json' \\\n", " -X POST \\\n", " -d '{\n", " \"contents\": [{\n", " \"parts\":[\n", " {\"text\": \"Please describe this file.\"},\n", " {\"file_data\": {\n", " \"mime_type\": \"'${MIME_TYPE}'\",\n", " \"file_uri\": '${file_uri}'}}]\n", " }]\n", " }' 2>/dev/null >response.json\n", "\n", "jq -C .candidates[].content response.json" ] }, { "cell_type": "markdown", "metadata": { "id": "dSyyHhPmjzfL" }, "source": [ "# Deleting files\n", "\n", "While files will be automatically deleted after 48 hours, you may wish to delete them entirely after use. In this case you can provide the file name to the Files API with a delete request. The following code block retrieves *all* files currently associated with your API key and sends a delete request for each." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cRh1kX3D9Gp_" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{}\n", "Deleted file: files/x27lu0k9zc2k\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 577 0 577 0 0 1026 0 --:--:-- --:--:-- --:--:-- 1026\r100 577 0 577 0 0 1026 0 --:--:-- --:--:-- --:--:-- 1026\n", " % Total % Received % Xferd Average Speed Time Time Time Current\n", " Dload Upload Total Spent Left Speed\n", "\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0\r100 3 0 3 0 0 6 0 --:--:-- --:--:-- --:--:-- 6\n" ] } ], "source": [ "# Delete all files\n", "\n", "%%bash\n", "# Fetch the list of files\n", "files_json=$(curl \"https://generativelanguage.googleapis.com/v1beta/files?key=${GOOGLE_API_KEY}\")\n", "\n", "# Extract file names using jq\n", "file_names=$(echo \"$files_json\" | jq -r '.files[].name')\n", "\n", "\n", "# Loop through each file name and delete it\n", "# File names are files/abcd, so path should not include files in it if using the file name.\n", "for file_name in $file_names; do\n", " curl --request \"DELETE\" \"https://generativelanguage.googleapis.com/v1beta/${file_name}?key=${GOOGLE_API_KEY}\"\n", " echo \"Deleted file: ${file_name}\"\n", "done" ] }, { "cell_type": "markdown", "metadata": { "id": "JuKRdKjwkQDZ" }, "source": [ "# Prompting with audio files directly\n", "\n", "If you only need to use smaller audio files, up to 20MB, one time for your application, then you can send them directly with your Gemini API prompt. You will learn how to do this over the remaining portion of this Colab example.\n", "\n", "This section will start by creating two new audio files that can be used for testing by sectioning off the first 30 seconds of the original 43 minute speech, as well as seconds 31 through 60." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "57yJqdMsAA5A" }, "outputs": [], "source": [ "%%bash\n", "\n", "ffmpeg -i sample.mp3 -t 30 -c copy sample_30s.mp3 && \\\n", "ffmpeg -ss 30 -to 60 -i sample.mp3 -c copy sample_31-60.mp3" ] }, { "cell_type": "markdown", "metadata": { "id": "EnVmf2ITk9Uy" }, "source": [ "These audio files will then need to be converted into a Base64 format for sending directly to the Gemini API. The following request will be stored in a new JSON document due to Colab restrictions, as well as so it can be easily reviewed.\n", "\n", "As noted earlier, the data object within this request is using `inline_data` instead of `file_data`, and you will use the `data` parameter instead of `file_uri`.\n", "\n", "Only one `inline_data` object can be sent at a time, but this example has provided two separate Base64 data items that you can use for testing." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "w3wpMdGSPHXQ" }, "outputs": [], "source": [ "%%bash\n", ". vars.sh\n", "\n", "data1_b64=$(base64 sample_30s.mp3)\n", "data2_b64=$(base64 sample_31-60.mp3 | base64 )\n", "\n", "echo '{\n", " \"contents\": [{\n", " \"parts\":[\n", " {\"text\": \"Summarize this clips\"},\n", " {\"inline_data\": {\n", " \"mime_type\": \"'${MIME_TYPE}'\",\n", " \"data\": \"'\"$data1_b64\"'\"\n", " }}\n", " ]\n", " }]\n", " }' > request.json" ] }, { "cell_type": "markdown", "metadata": { "id": "MJXCZLPklhxj" }, "source": [ "You can then send that request directly to the Gemini API with the following `curl` command." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-L6iiCPOMeYP" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"candidates\": [\n", " {\n", " \"content\": {\n", " \"parts\": [\n", " {\n", " \"text\": \"This clip is an audio recording of President Kennedy’s State of the Union address to Congress on January 30, 1961 in Washington D.C. He is speaking to the Vice President and Members of the Congress.\\n\"\n", " }\n", " ],\n", " \"role\": \"model\"\n", " },\n", " \"finishReason\": \"STOP\",\n", " \"avgLogprobs\": -0.681359271613919\n", " }\n", " ],\n", " \"usageMetadata\": {\n", " \"promptTokenCount\": 754,\n", " \"candidatesTokenCount\": 49,\n", " \"totalTokenCount\": 803,\n", " \"promptTokensDetails\": [\n", " {\n", " \"modality\": \"TEXT\",\n", " \"tokenCount\": 4\n", " },\n", " {\n", " \"modality\": \"AUDIO\",\n", " \"tokenCount\": 750\n", " }\n", " ],\n", " \"candidatesTokensDetails\": [\n", " {\n", " \"modality\": \"TEXT\",\n", " \"tokenCount\": 49\n", " }\n", " ]\n", " },\n", " \"modelVersion\": \"gemini-2.0-flash\"\n", "}\n" ] } ], "source": [ "%%bash\n", ". vars.sh\n", "\n", "curl \"${BASE_URL}/v1beta/models/gemini-2.0-flash:generateContent?key=${GOOGLE_API_KEY}\" \\\n", " -H 'Content-Type: application/json' \\\n", " -X POST \\\n", " -d @request.json 2>/dev/null >response.json\n", "\n", "cat response.json" ] }, { "cell_type": "markdown", "metadata": { "id": "Qnnt4SJgl1VR" }, "source": [ "## Further reading\n", "\n", "The File API lets you upload a variety of multimodal MIME types, including images, audio, and video formats. The File API handles inputs that can be used to generate content with the [content generation endpoint](https://ai.google.dev/api/generate-content).\n", "\n", "* Read the [`File API`](https://ai.google.dev/api/files) reference.\n", "\n", "* Learn more about prompting with [media files](https://ai.google.dev/tutorials/prompting_with_media) in the docs, including the supported formats and maximum length." ] } ], "metadata": { "colab": { "name": "Audio_REST.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }

quickstarts/rest/Audio_REST.ipynb (688 lines of code) (raw):