notebooks/generative_ai/content_generation/description_from_video.ipynb (681 lines of code) (raw):

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# Copyright 2023 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "# Generate descriptions from videos" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "<table align=\"left\">\n", "\n", "<a href=\"https://github.com/GoogleCloudPlatform/ai-ml-recipes/blob/main/notebooks/generative_ai/content_generation/description_from_video.ipynb\">\n", "<img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\">\n", "View on GitHub\n", "</a>\n", "</td>\n", "<td>\n", "<a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/ai-ml-recipes/main/notebooks/generative_ai/content_generation/description_from_video.ipynb\">\n", "<img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\">\n", "Open in Vertex AI Workbench\n", "</a>\n", "</td>\n", "</td>\n", "<td style=\"text-align: center\\\">\n", "<a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fai-ml-recipes%2Fmain%2Fnotebooks%2Fgenerative_ai%2Fcontent_generation%2Fdescription_from_video.ipynb\">\n", "<img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"> <br> Open in Colab Enterprise\n", "</a>\n", "</td>\n", "</table>" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "## Overview\n", "\n", "This notebook shows how to generate descriptions of videos in a GCS bucket. \n", "It uses the [Youtube UGC dataset](https://media.withyoutube.com/) and uses the [Gemini](https://cloud.google.com/vertex-ai/docs/generative-ai/model-reference/gemini) to generate video descriptions for each video." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "#### **Steps**\n", "Using Spark,\n", "1) It reads the table [Youtube UGC dataset](https://media.withyoutube.com/) from gs://dataproc-metastore-public-binaries/youtube_ucg/\n", "2) It calls Vertex AI Gemini API vision pro to generate description from videos." ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "### Setup\n", "\n", "Make sure the service account running this notebook has the required permissions:\n", "\n", "- **Run the notebook**\n", " - AI Platform Notebooks Service Agent\n", " - Notebooks Admin\n", " - Vertex AI Administrator\n", "- **Read files from bucket**\n", " - Storage Object Viewer\n", "- **Run Dataproc jobs**\n", " - Dataproc Service Agent\n", " - Dataproc Worker\n", "- **Call Google APIs**\n", " - Service Usage Consumer" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "#### Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false }, "pycharm": { "is_executing": true } }, "outputs": [], "source": [ "import time\n", "\n", "from pyspark.sql.functions import regexp_replace, concat\n", "from pyspark.sql.functions import udf, col, lit\n", "\n", "import google.auth\n", "import google.auth.transport.requests\n", "import requests\n", "\n", "import pandas as pd\n", "pd.set_option('display.max_colwidth', None)\n", "pd.set_option('display.min_rows', 20)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# When using Dataproc Serverless, installed packages are automatically available on all nodes\n", "!pip install --upgrade google-cloud-aiplatform -q\n", "# When using a Dataproc cluster, you will need to install these packages during cluster creation: https://cloud.google.com/dataproc/docs/tutorials/python-configuration" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "#### Authentication" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "credentials, project_id = google.auth.default()\n", "auth_req = google.auth.transport.requests.Request()\n", "credentials.refresh(auth_req)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "#### Setup Spark Session" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "from pyspark.sql import SparkSession" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "spark = SparkSession.builder \\\n", " .appName(\"Video descriptions generation\") \\\n", " .enableHiveSupport() \\\n", " .getOrCreate()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "#### Read dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "BINARIES_BUCKET_PATH = \"gs://dataproc-metastore-public-binaries/youtube_ucg/\"\n", "binaries_df = spark.read.format(\"binaryFile\").option(\"recursiveFileLookup\", \"true\").load(BINARIES_BUCKET_PATH)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "# Let's select the paths of the first 5 youtube videos\n", "paths_df = binaries_df.select(\"path\").limit(5)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "source": [ "#### Define UDF and call Gemini API to generate video descriptions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "response_schema = {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"where\": {\"type\": \"string\"},\n", " \"how_many_people\": {\"type\": \"integer\"},\n", " \"task\": {\"type\": \"string\"},\n", " \"proposition\": {\"type\": \"string\"},\n", " \"description\": {\"type\": \"string\"}\n", " },\n", " \"required\": [\"where\",\"how_many_people\",\"task\",\"proposition\",\"description\"],\n", "}\n", "\n", "system_instructions = [\n", " \"\"\"Format the 5 items as attributes of a JSON object: where, how_many_people, task, proposition and description.\"\"\", \n", " \"\"\"The response should be a single valid formatted JSON object only.\"\"\"\n", "]\n", "\n", "prompt = f\"\"\"\n", " Create a short description for this video with the following questions:\n", " 1) Where is the video recorded? \n", " 2) How many people are shown in the video? \n", " 3) What the people are doing in the video? \n", " 4) Whats the proposition for the video, i.e what it is about?\n", " 5) A sumary description from the itens 1,2,3 and 4\n", " \"\"\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from vertexai.generative_models import GenerativeModel, GenerationConfig, Part, Image, Content, HarmCategory, HarmBlockThreshold\n", "\n", "def predict(uri, prompt, system_instructions=system_instructions, response_schema=response_schema, content_type=\"video/mp4\", temperature=1, model_name=\"gemini-1.5-pro\"):\n", "\n", " model = GenerativeModel(model_name=model_name, system_instruction=system_instructions)\n", " \n", " prompt_content = Content(\n", " role=\"user\",\n", " parts=[\n", " Part.from_uri(uri, content_type),\n", " Part.from_text(prompt)\n", " ]\n", " )\n", "\n", " response = model.generate_content(\n", " prompt_content,\n", " generation_config = GenerationConfig(\n", " temperature=temperature, response_mime_type=\"application/json\", response_schema=response_schema\n", " ),\n", " safety_settings={\n", " HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_ONLY_HIGH\n", " }\n", " )\n", " \n", " return response.text" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "def generate_descriptions(gcs_uri):\n", "\n", " descriptions = predict(gcs_uri, prompt)\n", " return descriptions\n", " \n", "generate_descriptions_udf = udf(generate_descriptions)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "df_descriptions = paths_df.sort(paths_df.path.asc()).withColumn(\"data\", generate_descriptions_udf(paths_df.path))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df_descriptions.cache()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " \r" ] }, { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>path</th>\n", " <th>data</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-09f8.mp4</td>\n", " <td>{\"description\": \"The video was recorded on a mobile device. It shows how to navigate to a folder and lists the videos inside it. There are no people visible in the video. \", \"how_many_people\": 0, \"proposition\": \"The video shows how to navigate to a folder on a mobile device.\", \"task\": \"The user navigates to a folder in his mobile and shows the videos it contains.\", \"where\": \"On a mobile device.\"}</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-1dba.mp4</td>\n", " <td>{\"description\": \"The video was recorded inside a room by one person. This person shows how to make a bracelet with colored rubber bands using their fingers.\", \"how_many_people\": 1, \"proposition\": \"Video tutorial showing how to make a bracelet with colored rubber bands using their fingers.\", \"task\": \"Make a bracelet with colored rubber bands\", \"where\": \"Room\"}</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-2fd5.mp4</td>\n", " <td>{\"description\": \"The video shows a person writing something on a piece of paper with a pen. The background suggests the video was recorded indoors.\", \"how_many_people\": 1, \"proposition\": \"Someone writing on a piece of paper\", \"task\": \"Writing\", \"where\": \"Indoors\"}</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-5da7.mp4</td>\n", " <td>{\"description\": \"A person is using a hot glue gun to create a design on a silicone mat. They are working on a wooden table. The video demonstrates how to use a hot glue gun to make crafts.\", \"how_many_people\": 1, \"proposition\": \"This video demonstrates how to use a hot glue gun to make crafts.\", \"task\": \"The person is using a hot glue gun to create a design on a silicone mat.\", \"where\": \"The video is recorded indoors on a wooden table.\"}</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-7fb1.mp4</td>\n", " <td>{\"description\": \"The video was recorded at home by one person showing a manga style drawing of a female character and comparing it to a comic book. The video is about drawing manga characters.\", \"how_many_people\": 1, \"proposition\": \"How to Draw Manga Characters\", \"task\": \"Drawing\", \"where\": \"Home\"}</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " path \\\n", "0 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-09f8.mp4 \n", "1 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-1dba.mp4 \n", "2 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-2fd5.mp4 \n", "3 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-5da7.mp4 \n", "4 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-7fb1.mp4 \n", "\n", " data \n", "0 {\"description\": \"The video was recorded on a mobile device. It shows how to navigate to a folder and lists the videos inside it. There are no people visible in the video. \", \"how_many_people\": 0, \"proposition\": \"The video shows how to navigate to a folder on a mobile device.\", \"task\": \"The user navigates to a folder in his mobile and shows the videos it contains.\", \"where\": \"On a mobile device.\"} \n", "1 {\"description\": \"The video was recorded inside a room by one person. This person shows how to make a bracelet with colored rubber bands using their fingers.\", \"how_many_people\": 1, \"proposition\": \"Video tutorial showing how to make a bracelet with colored rubber bands using their fingers.\", \"task\": \"Make a bracelet with colored rubber bands\", \"where\": \"Room\"} \n", "2 {\"description\": \"The video shows a person writing something on a piece of paper with a pen. The background suggests the video was recorded indoors.\", \"how_many_people\": 1, \"proposition\": \"Someone writing on a piece of paper\", \"task\": \"Writing\", \"where\": \"Indoors\"} \n", "3 {\"description\": \"A person is using a hot glue gun to create a design on a silicone mat. They are working on a wooden table. The video demonstrates how to use a hot glue gun to make crafts.\", \"how_many_people\": 1, \"proposition\": \"This video demonstrates how to use a hot glue gun to make crafts.\", \"task\": \"The person is using a hot glue gun to create a design on a silicone mat.\", \"where\": \"The video is recorded indoors on a wooden table.\"} \n", "4 {\"description\": \"The video was recorded at home by one person showing a manga style drawing of a female character and comparing it to a comic book. The video is about drawing manga characters.\", \"how_many_people\": 1, \"proposition\": \"How to Draw Manga Characters\", \"task\": \"Drawing\", \"where\": \"Home\"} " ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_descriptions.toPandas()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Extract feature from generated text" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "from pyspark.sql.functions import from_json, col\n", "from pyspark.sql.types import StructType, StructField, StringType\n", "schema = StructType(\n", " [\n", " StructField('where', StringType(), True),\n", " StructField('how_many_people', StringType(), True),\n", " StructField('proposition', StringType(), True),\n", " StructField('description', StringType(), True),\n", " StructField('task', StringType(), True)\n", " ]\n", ")\n", "df_final = df_descriptions.withColumn(\"exploded_data\", from_json(regexp_replace(regexp_replace(col(\"data\"),\"json\", \"\"),\"```\",\"\"), schema))\\\n", " .select(col('path'),col('exploded_data.*'))" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "<div>\n", "<style scoped>\n", " .dataframe tbody tr th:only-of-type {\n", " vertical-align: middle;\n", " }\n", "\n", " .dataframe tbody tr th {\n", " vertical-align: top;\n", " }\n", "\n", " .dataframe thead th {\n", " text-align: right;\n", " }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n", " <thead>\n", " <tr style=\"text-align: right;\">\n", " <th></th>\n", " <th>path</th>\n", " <th>where</th>\n", " <th>how_many_people</th>\n", " <th>proposition</th>\n", " <th>description</th>\n", " <th>task</th>\n", " </tr>\n", " </thead>\n", " <tbody>\n", " <tr>\n", " <th>0</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-09f8.mp4</td>\n", " <td>On a mobile device.</td>\n", " <td>0</td>\n", " <td>The video shows how to navigate to a folder on a mobile device.</td>\n", " <td>The video was recorded on a mobile device. It shows how to navigate to a folder and lists the videos inside it. There are no people visible in the video.</td>\n", " <td>The user navigates to a folder in his mobile and shows the videos it contains.</td>\n", " </tr>\n", " <tr>\n", " <th>1</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-1dba.mp4</td>\n", " <td>Room</td>\n", " <td>1</td>\n", " <td>Video tutorial showing how to make a bracelet with colored rubber bands using their fingers.</td>\n", " <td>The video was recorded inside a room by one person. This person shows how to make a bracelet with colored rubber bands using their fingers.</td>\n", " <td>Make a bracelet with colored rubber bands</td>\n", " </tr>\n", " <tr>\n", " <th>2</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-2fd5.mp4</td>\n", " <td>Indoors</td>\n", " <td>1</td>\n", " <td>Someone writing on a piece of paper</td>\n", " <td>The video shows a person writing something on a piece of paper with a pen. The background suggests the video was recorded indoors.</td>\n", " <td>Writing</td>\n", " </tr>\n", " <tr>\n", " <th>3</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-5da7.mp4</td>\n", " <td>The video is recorded indoors on a wooden table.</td>\n", " <td>1</td>\n", " <td>This video demonstrates how to use a hot glue gun to make crafts.</td>\n", " <td>A person is using a hot glue gun to create a design on a silicone mat. They are working on a wooden table. The video demonstrates how to use a hot glue gun to make crafts.</td>\n", " <td>The person is using a hot glue gun to create a design on a silicone mat.</td>\n", " </tr>\n", " <tr>\n", " <th>4</th>\n", " <td>gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-7fb1.mp4</td>\n", " <td>Home</td>\n", " <td>1</td>\n", " <td>How to Draw Manga Characters</td>\n", " <td>The video was recorded at home by one person showing a manga style drawing of a female character and comparing it to a comic book. The video is about drawing manga characters.</td>\n", " <td>Drawing</td>\n", " </tr>\n", " </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " path \\\n", "0 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-09f8.mp4 \n", "1 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-1dba.mp4 \n", "2 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-2fd5.mp4 \n", "3 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-5da7.mp4 \n", "4 gs://dataproc-metastore-public-binaries/youtube_ucg/original_videos/HowTo/360P/HowTo_360P-7fb1.mp4 \n", "\n", " where how_many_people \\\n", "0 On a mobile device. 0 \n", "1 Room 1 \n", "2 Indoors 1 \n", "3 The video is recorded indoors on a wooden table. 1 \n", "4 Home 1 \n", "\n", " proposition \\\n", "0 The video shows how to navigate to a folder on a mobile device. \n", "1 Video tutorial showing how to make a bracelet with colored rubber bands using their fingers. \n", "2 Someone writing on a piece of paper \n", "3 This video demonstrates how to use a hot glue gun to make crafts. \n", "4 How to Draw Manga Characters \n", "\n", " description \\\n", "0 The video was recorded on a mobile device. It shows how to navigate to a folder and lists the videos inside it. There are no people visible in the video. \n", "1 The video was recorded inside a room by one person. This person shows how to make a bracelet with colored rubber bands using their fingers. \n", "2 The video shows a person writing something on a piece of paper with a pen. The background suggests the video was recorded indoors. \n", "3 A person is using a hot glue gun to create a design on a silicone mat. They are working on a wooden table. The video demonstrates how to use a hot glue gun to make crafts. \n", "4 The video was recorded at home by one person showing a manga style drawing of a female character and comparing it to a comic book. The video is about drawing manga characters. \n", "\n", " task \n", "0 The user navigates to a folder in his mobile and shows the videos it contains. \n", "1 Make a bracelet with colored rubber bands \n", "2 Writing \n", "3 The person is using a hot glue gun to create a design on a silicone mat. \n", "4 Drawing " ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_final.toPandas()" ] } ], "metadata": { "kernelspec": { "display_name": "delta-runtime on Serverless Spark (Remote)", "language": "python", "name": "9c39b79e5d2e7072beb4bd59-delta-runtime" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.3" } }, "nbformat": 4, "nbformat_minor": 4 }