notebooks/drive_processors.ipynb (350 lines of code) (raw):

{ "cells": [ { "metadata": { "id": "ADJ_-OZlsF4_" }, "cell_type": "code", "source": [ "# Copyright 2025 DeepMind Technologies Limited. All Rights Reserved.\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ], "outputs": [], "execution_count": null }, { "metadata": { "id": "2kDCttqJrIj8" }, "cell_type": "markdown", "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google-gemini/genai-processors/blob/main/notebooks/drive_processors.ipynb)" ] }, { "metadata": { "id": "EXQIiFf9KMB8" }, "cell_type": "markdown", "source": [ "# Drive Processors 🏎\n", "\n", "The GenAI Processor library includes a suite of Google Drive processors, which makes it easy to create pipelines grounded in Docs, Sheets \u0026/or Slides.\n", "\n", "" ] }, { "metadata": { "id": "L_UGXJPfGZhk" }, "cell_type": "markdown", "source": [ "## Setup 🍳\n", "\n", "Run the hidden cell below to install `genai-processors` and import the relevant modules." ] }, { "metadata": { "id": "PTFtubO8GXse" }, "cell_type": "code", "source": [ "!pip install genai-processors\n", "\n", "from genai_processors import processor\n", "from genai_processors.core import drive\n", "from genai_processors.core import genai_model\n", "from genai_processors.core import preamble\n", "from google.colab import auth as colab_auth\n", "from google.colab import userdata\n", "from IPython.display import Markdown, clear_output, display" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "trgPsfQPb8JP" }, "cell_type": "markdown", "source": [ "\n" ] }, { "metadata": { "id": "hsi8RIw2Ggdw" }, "cell_type": "markdown", "source": [ "## Auth 🔐\n", "\n", "### Drive Auth\n", "\n", "To authenticate with Drive, you can use the `colab_auth` environment in Colab,\n", "or pass a `Credentials` object in other python environments.\n", "\n", "### API Key\n", "\n", "To use the GenAI model processors, you will need an API key. If you have not\n", "done so already, obtain your API key from Google AI Studio, and import it as a\n", "secret in Colab (recommended) or directly set it below." ] }, { "metadata": { "id": "XtMPESjcc3KN" }, "cell_type": "code", "source": [ "CREDS = None\n", "GOOGLE_API_KEY = None\n", "\n", "try:\n", " colab_auth.authenticate_user()\n", "except Exception:\n", " print('Failed to authenticate using `colab_auth`.')\n", "\n", "try:\n", " GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')\n", "except Exception:\n", " print('Failed to obtain `GOOGLE_API_KEY`.')" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "8gLNdFi2cIJ6" }, "cell_type": "markdown", "source": [ "" ] }, { "metadata": { "id": "CnXds_UmGky9" }, "cell_type": "markdown", "source": [ "## `GoogleSlides` processor 🖼\n", "\n", "We can create a pipeline with `GenaiModel` to answer questions about Google Slides." ] }, { "metadata": { "id": "ve5zaGGYGl2S" }, "cell_type": "code", "source": [ "PRESENTATION_ID = \"YOUR-PRESENTATION-ID\" # @param {type:\"string\"}\n", "USER_PROMPT = \"Describe the presentation in detail.\" # @param {type:\"string\"}" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "yDxi15mEGmzB" }, "cell_type": "code", "source": [ "p_slides = drive.Slides(creds=CREDS)\n", "\n", "p_preamble = preamble.Preamble(\n", " content=f\"\"\"You are an expert in reviewing Google Slides presentations.\n", "\n", "You have been provided with slides, and must use them to answer the user's question.\n", "\n", "User question: {USER_PROMPT}\"\"\"\n", ")\n", "\n", "p_genai = genai_model.GenaiModel(\n", " model_name='gemini-2.5-flash-preview-05-20', api_key=GOOGLE_API_KEY\n", ")\n", "\n", "pipeline = p_slides + p_preamble + p_genai\n", "\n", "req = drive.SlidesRequest(presentation_id=PRESENTATION_ID)\n", "req_part = processor.ProcessorPart.from_dataclass(dataclass=req)\n", "\n", "input_stream = processor.stream_content([req_part])\n", "\n", "streaming_text = ''\n", "async for content_part in pipeline(input_stream):\n", " streaming_text += content_part.text\n", " clear_output(wait=True)\n", " display(Markdown(streaming_text))" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "YP7ib2P7WNUf" }, "cell_type": "markdown", "source": [ "## `GoogleDocs` processor 📄\n", "\n", "We can create a pipeline with `GenaiModel` to answer questions about a Google Doc." ] }, { "metadata": { "id": "yHwbqU-3WXVG" }, "cell_type": "code", "source": [ "DOC_ID = \"YOUR-DOC-ID\" # @param {type:\"string\"}\n", "USER_PROMPT = \"Describe the document in detail.\" # @param {type:\"string\"}" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "d2f_pIjgWgq4" }, "cell_type": "code", "source": [ "p_docs = drive.Docs(creds=CREDS)\n", "\n", "p_preamble = preamble.Preamble(\n", " content=f\"\"\"You are an expert in reviewing Google Docs.\n", "\n", "You have been provided with a document, and must use it to answer the user's question.\n", "\n", "User question: {USER_PROMPT}\"\"\"\n", ")\n", "\n", "p_genai = genai_model.GenaiModel(\n", " model_name='gemini-2.5-flash-preview-05-20', api_key=GOOGLE_API_KEY\n", ")\n", "\n", "pipeline = p_docs + p_preamble + p_genai\n", "\n", "req = drive.DocsRequest(doc_id=DOC_ID)\n", "req_part = processor.ProcessorPart.from_dataclass(dataclass=req)\n", "\n", "input_stream = processor.stream_content([req_part])\n", "\n", "streaming_text = ''\n", "async for content_part in pipeline(input_stream):\n", " streaming_text += content_part.text\n", " clear_output(wait=True)\n", " display(Markdown(streaming_text))" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "0v8BBUPsufgJ" }, "cell_type": "markdown", "source": [ "## `GoogleSheets` processor 🧮\n", "\n", "The `GoogleSheets` processor exports sheet data in CSV format, which can be\n", "used to ground a `GenaiModel`." ] }, { "metadata": { "id": "VHJ6bEs_uUUk" }, "cell_type": "code", "source": [ "SPREADSHEET_ID = \"YOUR-SPREADSHEET-ID\" # @param {type:\"string\"}\n", "USER_PROMPT = \"Describe the spreadsheet in granular detail.\" # @param {type:\"string\"}" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "oIBTa1_5u28K" }, "cell_type": "code", "source": [ "p_sheets = drive.Sheets(creds=CREDS)\n", "\n", "p_preamble = preamble.Preamble(\n", " content=f\"\"\"You are an expert in reviewing Google Sheets.\n", "\n", "You have been provided with a spreadsheet, and must use it to answer the user's question.\n", "\n", "User question: {USER_PROMPT}\"\"\"\n", ")\n", "\n", "p_genai = genai_model.GenaiModel(\n", " model_name='gemini-2.5-flash-preview-05-20', api_key=GOOGLE_API_KEY\n", ")\n", "\n", "pipeline = p_sheets + p_preamble + p_genai\n", "\n", "req = drive.SheetsRequest(spreadsheet_id=SPREADSHEET_ID)\n", "req_part = processor.ProcessorPart.from_dataclass(dataclass=req)\n", "\n", "input_stream = processor.stream_content([req_part])\n", "\n", "streaming_text = ''\n", "async for content_part in pipeline(input_stream):\n", " streaming_text += content_part.text\n", " clear_output(wait=True)\n", " display(Markdown(streaming_text))" ], "outputs": [], "execution_count": null } ], "metadata": { "colab": { "last_runtime": { "build_target": "", "kind": "local" }, "private_outputs": true, "provenance": [ { "file_id": "/piper/depot/google3/third_party/py/genai_processors/notebooks/drive_processors_internal.ipynb?workspaceId=kpsawhney:fig-export-drive-processors-21540-change-5::citc", "timestamp": 1749719578564 }, { "file_id": "/piper/depot/google3/third_party/py/genai_processors/core/drive/drive_processors_internal.ipynb?workspaceId=kpsawhney::21540:citc", "timestamp": 1749591873799 }, { "file_id": "1F5_QXPgmmCqQMxycUZ30Hj379ieV9sby", "timestamp": 1749480222690 } ], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }