notebooks/research_example.ipynb (369 lines of code) (raw):

{ "cells": [ { "metadata": { "id": "Dzr0uiSQ0hSB" }, "cell_type": "code", "source": [ "# Copyright 2025 DeepMind Technologies Limited. All Rights Reserved.\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ], "outputs": [], "execution_count": null }, { "metadata": { "id": "qMYjFNCd0xP2" }, "cell_type": "markdown", "source": [ "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/genai-processors/blob/main/notebooks/research_example.ipynb)" ] }, { "metadata": { "id": "zzFUsTefL_JV" }, "cell_type": "markdown", "source": [ "# 📖 Research Agent Example\n", "\n", "This notebook demonstrates how to build a research agent using the modular components of the `genai-processors` library.\n", "\n", "You will see how a complex task, like researching a topic, can be broken down into a series of smaller, reusable processors. We will explore each component individually and then combine them into a powerful, automated research pipeline.\n", "\n", "In this notebook, we will cover:\n", "\n", "* **Setup**: Getting your environment ready and configuring an API key.\n", "* **Individual Processors**: Exploring the building blocks `TopicGenerator`, `TopicResearcher`, and `TopicVerbalizer`.\n", "* **Chaining**: Learning how to connect processors to create a seamless workflow.\n", "* **The Complete Agent**: Running the `ResearchAgent` to see how all the pieces come together for an end-to-end task.\n", "\n", "Note: You will need to authorize colab to use your GitHub account, in order to import the research example." ] }, { "metadata": { "id": "oxForuGtLv4L" }, "cell_type": "markdown", "source": [ "## 🍳 Setup" ] }, { "metadata": { "cellView": "form", "id": "u7arK-sNjEoy", "language": "python" }, "cell_type": "code", "source": [ "# {display-mode: \"form\"}\n", "# @markdown Run this cell to import libraries and perform initializations.\n", "\n", "# NOTE: we need to import the code from GitHub since it is not included in the\n", "# PyPI package.\n", "\n", "import os\n", "\n", "try:\n", " !pip install genai-processors --q\n", " !git clone --filter=blob:none --q --no-checkout --depth 1 https://github.com/google-gemini/genai-processors.git /content/temp_repo\n", " %cd /content/temp_repo\n", " !git sparse-checkout set --no-cone examples/research\n", " !git checkout\n", " %cd /content\n", "\n", " downloaded_subdir_path = \"/content/temp_repo/examples/research\"\n", " if not os.path.exists(downloaded_subdir_path):\n", " print(\"Download failed using 'git sparse-checkout\")\n", " else:\n", " print(\"✅ Successfully downloaded 'research' examples.\")\n", "\n", " !mv /content/temp_repo/examples/research /content/research_latest\n", " !rm -rf /content/temp_repo\n", "\n", " import shutil\n", " import genai_processors\n", "\n", " package_path = genai_processors.__path__[0]\n", " examples_path = os.path.join(package_path, \"examples\")\n", " installed_research_path = os.path.join(examples_path, \"research\")\n", "\n", " if os.path.exists(installed_research_path):\n", " shutil.rmtree(installed_research_path)\n", "\n", " shutil.move(\"/content/research_latest\", installed_research_path)\n", " print(\"✅ Successfully replaced 'research' examples with the latest version.\")\n", "\n", "except Exception as e:\n", " print(f\"An error occurred during import: {e}\")\n", "\n", "from genai_processors import content_api\n", "from genai_processors import processor\n", "from genai_processors import streams\n", "from genai_processors.core import jinja_template\n", "from genai_processors.examples import research\n", "from google.colab import userdata\n", "from IPython.display import Markdown, display\n", "\n", "ProcessorPart = processor.ProcessorPart\n", "\n", "\n", "def render_part(part: ProcessorPart) -\u003e None:\n", " if part.substream_name == \"status\":\n", " display(Markdown(f\"--- \\n *Status*: {part.text}\"))\n", " else:\n", " try:\n", " display(Markdown(part.text))\n", " except Exception:\n", " display(Markdown(f\" {part.text} \"))" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "0CnijoOm4W2N" }, "cell_type": "markdown", "source": [ "## 🔐 Auth\n", "\n", "To use the research processors, you will need an API key. If you have not\n", "done so already, obtain your API key from Google AI Studio, and import it as a\n", "secret in Colab (recommended) or directly set it below." ] }, { "metadata": { "id": "tMt8TdEG1ADk" }, "cell_type": "code", "source": [ "GOOGLE_API_KEY = None\n", "\n", "try:\n", " GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')\n", "except Exception:\n", " print('Failed to obtain `GOOGLE_API_KEY`.')" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "e60eyr2_1Jst" }, "cell_type": "markdown", "source": [ "" ] }, { "metadata": { "id": "XdMvGFufdeRF" }, "cell_type": "markdown", "source": [ "## 🏗 Processors" ] }, { "metadata": { "id": "DOETzhfmdeRF" }, "cell_type": "code", "source": [ "USER_PROMPT = \"Research the best things about owning dalmatians!\" # @param { \"type\": \"string\" }" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "XCYJp7H1deRF" }, "cell_type": "markdown", "source": [ "### ✍ `TopicGenerator`\n", "\n", "The `TopicGenerator` processor generates a list of research topics, given the user's content." ] }, { "metadata": { "id": "FuHD-6_OdeRF" }, "cell_type": "code", "source": [ "p_generator = research.TopicGenerator(api_key=GOOGLE_API_KEY)\n", "\n", "topic_parts = []\n", "input_stream = streams.stream_content([ProcessorPart(USER_PROMPT)])\n", "async for content_part in p_generator(input_stream):\n", " if content_part.mimetype == 'application/json; type=Topic':\n", " topic_parts.append(content_part)\n", " else:\n", " render_part(content_part)" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "QSKIdaZxdeRF" }, "cell_type": "markdown", "source": [ "### 🔍 `TopicResearcher`\n", "\n", "\n", "Next, we add `TopicResearcher` to `TopicGenerator` to generate `Topic` objects." ] }, { "metadata": { "id": "08mSMNordeRF" }, "cell_type": "code", "source": [ "topics = []\n", "p_researcher = research.TopicResearcher(api_key=GOOGLE_API_KEY)\n", "\n", "pipeline = p_generator + p_researcher\n", "\n", "input_stream = streams.stream_content([ProcessorPart(USER_PROMPT)])\n", "async for content_part in pipeline(input_stream):\n", " if content_part.mimetype == 'application/json; type=Topic':\n", " topics.append(content_part.get_dataclass(research.interfaces.Topic))\n", " elif content_part.substream_name == 'status':\n", " render_part(content_part)\n", "\n", "print(f'Pipeline produced {len(topics)} `Topic` `ProcessorParts`:\\n\\n')\n", "\n", "for t in topics:\n", " print(t)" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "4Wgln4W4deRF" }, "cell_type": "markdown", "source": [ "### 🗣 `TopicVerbalizer`\n", "\n", "A Jinja2 processor is used to convert `TopicResearch` parts into human-readable research text." ] }, { "metadata": { "id": "FqfzbDm9deRF" }, "cell_type": "code", "source": [ "p_verbalizer = jinja_template.RenderDataClass(\n", " template_str=(\n", " \"## {{ data.topic }}\\n\"\n", " \"*{{ data.relationship_to_user_content }}*\"\n", " \"{% if data.research_text|trim != '' %}\"\n", " \"\\n\\n### Research\\n\\n{{ data.research_text }}\"\n", " \"{% endif %}\"\n", " ),\n", " data_class=research.interfaces.Topic,\n", ")\n", "\n", "pipeline = p_generator + p_researcher + p_verbalizer\n", "\n", "input_stream = streams.stream_content([ProcessorPart(USER_PROMPT)])\n", "async for content_part in pipeline(input_stream):\n", " # We exclude printing status to demonstrate the verbalization.\n", " if content_part.substream_name != \"status\":\n", " render_part(content_part)" ], "outputs": [], "execution_count": null }, { "metadata": { "id": "u4-z5E6HdeRG" }, "cell_type": "markdown", "source": [ "## 🤖 Agent\n", "\n", "Now we have all our building blocks, we can chain these together inside our agent, resulting in a seamless flow of Content." ] }, { "metadata": { "id": "Nn7grwqCdeRG" }, "cell_type": "code", "source": [ "input_stream = streams.stream_content([ProcessorPart(USER_PROMPT)])\n", "\n", "output_parts = content_api.ProcessorContent()\n", "async for content_part in research.ResearchAgent(api_key=GOOGLE_API_KEY)(input_stream):\n", " if content_part.substream_name == 'status':\n", " render_part(content_part)\n", " output_parts += content_part\n", "\n", "render_part(ProcessorPart(f\"\"\"# Final synthesized research\n", "\n", "{content_api.as_text(output_parts, substream_name='')}\"\"\"))" ], "outputs": [], "execution_count": null } ], "metadata": { "colab": { "last_runtime": { "build_target": "", "kind": "local" }, "private_outputs": true, "provenance": [ { "file_id": "1TCBn83WhRZIzt-1lVRN9h-JLyLoc-nzC", "timestamp": 1750516269106 }, { "file_id": "/piper/depot/google3/third_party/py/genai_processors/examples/research/genai_research_api.ipynb?cl=759077814", "timestamp": 1750415997748 }, { "file_id": "1TR6JIxtY9TPU5EBn12Uy5DqB6lRov3rL", "timestamp": 1741296571652 } ], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }