In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Video Thumbnail Generation using Gemini

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/video-thumbnail-generation/video_thumbnail_generation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fvideo-thumbnail-generation%2Fvideo_thumbnail_generation.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/video-thumbnail-generation/video_thumbnail_generation.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/video-thumbnail-generation/video_thumbnail_generation.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>

<div style="clear: both;"></div>


| | |
|-|-|
|Author(s) | [Kartik Chaudhary](https://github.com/kartikgill)|

## Objectives

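In this tutorial, you will learn how to extract meaningful thumbnail images from a video using the Gemini 2.0 Flash model: Gemini picks the timestamps and writes the captions, and MoviePy extracts the frames at those timestamps.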

You will complete the following tasks:

- Install the Google Gen AI SDK for Python
- Use the Gemini API in Vertex AI to interact with the Gemini model
    - Ask Gemini to pick thumbnail timestamps from a video and write a caption for each one
    - Use the **[`moviepy`](https://zulko.github.io/moviepy/)** Python library to extract the frame at each timestamp
    - Use a better prompt to improve the results

### Costs

This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.


## Getting Started


### Install libraries for Python

- **[Google Gen AI SDK](https://cloud.google.com/vertex-ai/generative-ai/docs/sdks/overview)**: to call the Gemini API in Vertex AI.
- **[moviepy](https://zulko.github.io/moviepy/)**: a video editing library, used here to extract frames from the video.

In [None]:
%pip install --upgrade --quiet google-genai moviepy

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).


In [None]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and create a Gen AI client

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
# Use the environment variable if the user doesn't provide Project ID.
import os

from google import genai

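PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    # Fall back to the environment; avoid str(None), which would silently yield "None"
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
if not PROJECT_ID:
    raise ValueError("Set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable.")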

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

### Import libraries


In [None]:
from google.genai.types import GenerateContentConfig, Part
import matplotlib.pyplot as plt
import moviepy
from moviepy import VideoFileClip

## Using the Gemini model

The Gemini model is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.

### Set the Gemini model ID


In [None]:
MODEL_ID = "gemini-2.0-flash"

### Sample video path in Google Cloud Storage

The sample video is stored at `gs://github-repo/generative-ai/gemini/use-cases/video-thumbnail-generation/sample_video_google_trips.webm`.

You can [watch or download this video here](https://cloud.google.com/vertex-ai/generative-ai/docs/prompt-gallery/samples/video_video_q_and_a_89?hl=en).

In [None]:
video_uri = "gs://github-repo/generative-ai/gemini/use-cases/video-thumbnail-generation/sample_video_google_trips.webm"

### Creating a local copy of the video for easy frame extraction

In [None]:
!gsutil cp {video_uri} sample_video.webm
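
If you would rather stay in Python than shell out to `gsutil`, the `google-cloud-storage` client library can download the same object. Its `Client.bucket()` and `Bucket.blob()` calls take the bucket name and the object path separately, so the sketch below splits a `gs://` URI first. The helper names `split_gcs_uri` and `download_gcs_file` are hypothetical, and the download function assumes that `google-cloud-storage` is installed and that your credentials can read the bucket.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split a gs://bucket/path/to/object URI into (bucket, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"Not a gs:// object URI: {uri!r}")
    # urlparse puts the bucket in netloc and "/path/to/object" in path
    return parsed.netloc, parsed.path.lstrip("/")


def download_gcs_file(uri: str, local_path: str) -> None:
    """Download a Cloud Storage object to a local file (assumes google-cloud-storage)."""
    # Imported lazily so that split_gcs_uri works without the library installed
    from google.cloud import storage

    bucket_name, object_name = split_gcs_uri(uri)
    storage_client = storage.Client()
    storage_client.bucket(bucket_name).blob(object_name).download_to_filename(local_path)


bucket, name = split_gcs_uri(
    "gs://github-repo/generative-ai/gemini/use-cases/video-thumbnail-generation/sample_video_google_trips.webm"
)
print(bucket)  # github-repo
```

In the notebook, `download_gcs_file(video_uri, "sample_video.webm")` would stand in for the `gsutil` cell.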

### Creating a MoviePy clip object (used to extract the frame at a given timestamp)

In [None]:
clip = VideoFileClip("sample_video.webm")
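
The notebook passes the timestamp strings from Gemini directly to `clip.get_frame`. That relies on the model returning a format MoviePy can convert to seconds, such as `"00:05"`; the model may instead write `"5s"` or a time at or past the end of the clip. The hypothetical helper below normalizes `SS`, `MM:SS`, or `HH:MM:SS` strings (with an optional trailing `s`) to seconds and can clamp the result inside the clip. The accepted formats are an assumption about what the model returns, not something the API guarantees.

```python
import re
from typing import Optional


def timestamp_to_seconds(timestamp: str, duration: Optional[float] = None) -> float:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' (optionally ending in 's') to seconds.

    If duration is given, clamp the result to [0, duration - 0.01] so the time
    falls inside the clip (the 0.01 s margin is an arbitrary safety choice).
    """
    text = timestamp.strip().lower().removesuffix("s").strip()
    if not re.fullmatch(r"\d+(\.\d+)?(:\d+(\.\d+)?){0,2}", text):
        raise ValueError(f"Unrecognized timestamp: {timestamp!r}")
    seconds = 0.0
    # Each ':'-separated field is worth 60x the field to its right
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    if duration is not None:
        seconds = min(max(seconds, 0.0), max(duration - 0.01, 0.0))
    return seconds


print(timestamp_to_seconds("1:02.5"))  # 62.5
```

With the clip loaded, a frame could then be fetched with `clip.get_frame(timestamp_to_seconds(item["timestamp"], clip.duration))`.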

### Define a function to call the Gemini API

In [None]:
def call_gemini(
    prompt: str,
    gcs_video_path: str,
) -> list[dict]:
    """Call the Gemini API with a video and a prompt; return the parsed JSON list."""
    # define fixed schema for Gemini outputs
    response_schema = {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "timestamp": {
                    "type": "string",
                },
                "caption": {
                    "type": "string",
                },
            },
            "required": ["timestamp", "caption"],
        },
    }
    # model configurations
    generation_config = GenerateContentConfig(
        temperature=1,
        top_p=0.8,
        max_output_tokens=8192,
        response_schema=response_schema,
        response_mime_type="application/json",
    )
    # creating video input for API call
    video_input = Part.from_uri(
        file_uri=gcs_video_path,
        mime_type="video/webm",
    )
    # calling Gemini API
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=[video_input, prompt],
        config=generation_config,
    )
    return response.parsed
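
The `response_schema` constrains the model to a JSON array of `{timestamp, caption}` objects, but it does not fix the number of items, and `response.parsed` can be `None` if the output cannot be parsed. The plotting code relies on each item having both fields, so a lightweight check before plotting can surface problems early. `validate_thumbnails` is a hypothetical, standard-library-only helper that re-checks the same fields the schema declares.

```python
from typing import Any


def validate_thumbnails(parsed: Any, expected_count: int = 4) -> list[dict]:
    """Check that parsed Gemini output is a list of {timestamp, caption} dicts.

    Returns at most `expected_count` items; raises ValueError on a bad structure.
    """
    if not isinstance(parsed, list):
        raise ValueError(f"Expected a JSON array, got {type(parsed).__name__}")
    for index, item in enumerate(parsed):
        if not isinstance(item, dict):
            raise ValueError(f"Item {index} is not an object: {item!r}")
        for key in ("timestamp", "caption"):
            # Both fields must be present, be strings, and not be blank
            if not isinstance(item.get(key), str) or not item[key].strip():
                raise ValueError(f"Item {index} has a missing or empty {key!r}")
    if len(parsed) < expected_count:
        print(f"Warning: expected {expected_count} thumbnails, got {len(parsed)}")
    return parsed[:expected_count]
```

For example, `response_dict = validate_thumbnails(call_gemini(prompt=..., gcs_video_path=video_uri))` would fail fast instead of partway through plotting.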

### Defining a function to parse output and display results

In [None]:
def display_results(
    json_response: list[dict],
    clip: moviepy.video.io.VideoFileClip.VideoFileClip,
) -> None:
    """Extract the frame for each timestamp in the parsed output and plot it with its caption."""

    # 2 x 2 grid for the four thumbnails requested in the prompts
    fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(12, 9))

    # Hide all axes first so that unused cells stay blank
    for axis in ax.flat:
        axis.axis("off")

    # Extract the frame at each timestamp and plot it; items beyond the 4th are ignored
    for counter, item in enumerate(json_response[:4]):
        timestamp = item["timestamp"]
        caption = item["caption"]
        frame = clip.get_frame(timestamp)
        row, col = counter // 2, counter % 2
        ax[row, col].imshow(frame)
        ax[row, col].set_title(caption, fontdict={"fontsize": 9})

    plt.tight_layout()
    plt.show()
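
`display_results` hard-codes a 2 x 2 grid, which fits the four thumbnails requested in the prompts. If you ask for a different number, the grid has to grow with it. The hypothetical `grid_positions` helper below computes the number of rows for a fixed column count and the `(row, col)` cell for each thumbnail, using the same integer-division arithmetic that `display_results` uses.

```python
import math


def grid_positions(num_images: int, ncols: int = 2) -> tuple[int, list[tuple[int, int]]]:
    """Return the number of grid rows and the (row, col) cell for each image."""
    if num_images < 1 or ncols < 1:
        raise ValueError("num_images and ncols must both be positive")
    nrows = math.ceil(num_images / ncols)
    # Fill the grid row by row: image i goes to row i // ncols, column i % ncols
    cells = [(i // ncols, i % ncols) for i in range(num_images)]
    return nrows, cells


nrows, cells = grid_positions(5)
print(nrows, cells)  # 3 [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
```

With matplotlib, passing `nrows` to `plt.subplots(nrows=nrows, ncols=2, squeeze=False)` keeps `ax[row, col]` indexing valid even when there is only one row, and `ax[row, col].axis("off")` hides any unused cells.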

## Case 1: Using a Simple Prompt

### Writing a basic prompt for thumbnail generation

In [None]:
basic_prompt = "Generate 4 thumbnail images from the given video file with short captions."

### Calling the Gemini API with the basic prompt and video

In [None]:
response_dict = call_gemini(
    prompt=basic_prompt,
    gcs_video_path=video_uri,
)

### Showing the JSON output from Gemini

In [None]:
print(response_dict)
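
`print` shows the parsed response as a Python list of dicts on a single line. For longer outputs, `json.dumps` with `indent` is easier to read, and `ensure_ascii=False` keeps any non-ASCII characters in captions readable. The sample data below is a made-up stand-in for illustration, not real model output.

```python
import json

# Illustrative stand-in for response_dict; real values come from call_gemini()
sample_response = [
    {"timestamp": "00:03", "caption": "Example caption"},
    {"timestamp": "00:12", "caption": "Café example caption"},
]

pretty = json.dumps(sample_response, indent=2, ensure_ascii=False)
print(pretty)
```

In the notebook, the equivalent call would be `print(json.dumps(response_dict, indent=2, ensure_ascii=False))`.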

### Displaying the thumbnails with captions

In [None]:
display_results(response_dict, clip)

## Case 2: Using an Advanced Prompt

### Writing an advanced prompt for better thumbnail generation

In [None]:
advanced_prompt = """You are an expert in video content creation and content marketing.
You can find the best thumbnails in a video and write short, catchy, meaningful captions for them.
Your task is to find the best 4 thumbnails in the given video, each with a short, meaningful caption that works well for marketing.
Consider the following rules while choosing thumbnails:

- Each thumbnail should focus clearly on the key objects and people, with less focus on the background
- Each thumbnail image should be bright and high quality; avoid blurry frames
- Together, each thumbnail image and its caption should tell a story
- Each caption should be suitable for marketing
"""
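
The advanced prompt follows a common structure: a role, a task statement, and a list of rules. If you want to experiment with the rules or the number of thumbnails, assembling the prompt from parts keeps those pieces separate. `build_thumbnail_prompt` is a hypothetical helper, and the text it produces is a variant of the prompt above, not the exact same string.

```python
def build_thumbnail_prompt(rules: list[str], num_thumbnails: int = 4) -> str:
    """Assemble a role + task + rules prompt for thumbnail selection."""
    role = "You are an expert in video content creation and content marketing."
    task = (
        f"Your task is to find the best {num_thumbnails} thumbnails in the given video, "
        "each with a short, meaningful caption that works well for marketing."
    )
    # One markdown-style bullet per rule
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return f"{role}\n{task}\nConsider the following rules while choosing thumbnails:\n\n{rule_lines}\n"


prompt = build_thumbnail_prompt(
    rules=[
        "Focus clearly on the key objects and people, not the background",
        "Choose bright, high-quality frames and avoid blurry ones",
    ],
    num_thumbnails=3,
)
print(prompt)
```

The result could be passed as `prompt=` to `call_gemini` in place of `advanced_prompt`.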

### Calling the Gemini API with the advanced prompt

In [None]:
response_dict_advanced = call_gemini(
    prompt=advanced_prompt,
    gcs_video_path=video_uri,
)

### Showing the JSON output from Gemini

In [None]:
print(response_dict_advanced)

### Displaying the final thumbnails with captions

In [None]:
display_results(response_dict_advanced, clip)

### Observations

#### Better prompting had the following effects on the results
- The selected frames are of higher quality
- The captions are more meaningful
- The thumbnail images and captions together tell a better story

## Conclusion

- Gemini's multimodal capabilities make it well suited to video understanding tasks such as picking thumbnail frames and writing captions for them.
- Prompts that set out clear guidelines and expectations can improve the results.