In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Patents Document Understanding with Gemini

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/patents_understanding.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fdocument-processing%2Fpatents_understanding.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/document-processing/patents_understanding.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/bigquery/import?url=https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/patents_understanding.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/bigquery/v1/32px.svg" alt="BigQuery Studio logo"><br> Open in BigQuery Studio
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/patents_understanding.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>

| | |
|-|-|
| Author(s) |  [Holt Skinner](https://github.com/holtskinner) |

## Overview

Back in 2019, I wrote a [Google Cloud Blog post](https://cloud.google.com/blog/products/ai-machine-learning/building-a-document-understanding-pipeline-with-google-cloud) in collaboration with [Michael Munn](https://github.com/munnm) and [Michael Sherman](https://github.com/michaelwsherman) that illustrates how to build a Document Understanding Pipeline using [AutoML](https://cloud.google.com/automl).

This example showed how to train custom machine learning models for the following tasks:

- [Image Classification](https://cloud.google.com/vision/automl/docs/beginners-guide)
- [Entity Extraction](https://cloud.google.com/natural-language/automl/entity-analysis/docs/)
- [Text Classification](https://cloud.google.com/natural-language/automl/docs/predict)
- [Object Detection](https://cloud.google.com/vision/automl/object-detection/docs/)

In today's world of Generative AI models like [Gemini](https://blog.google/technology/ai/google-gemini-ai/), it's possible to create the same document processing pipeline without training custom models. This significantly simplifies the process and reduces the time & resources required to automate these workflows.

In this notebook, we'll create a document understanding pipeline on a public dataset of [patent PDFs](https://console.cloud.google.com/marketplace/details/global-patents/labeled-patents) indexed in BigQuery and use [Batch Prediction for Gemini 2.0 in Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/batch-prediction-gemini) to:

- Classify the patent granter (US or EU).
- Classify the invention type (Medical Tech, Computer Vision, Cryptography, Other).
- Extract key entities like publication date, application number, etc.
- Detect bounding boxes for figures in the document.

## Get started

### Install Google Gen AI SDK for Python


In [None]:
%pip install --upgrade --quiet google-genai google-cloud-bigquery pandas-gbq

### Restart runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel.

The restart might take a minute or longer. After it's restarted, continue to the next step.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. In Colab or Colab Enterprise, you might see an error message that says "Your session crashed for an unknown reason." This is expected. Wait until it's finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you're running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Import libraries


In [None]:
from google import genai
from google.cloud import bigquery
import pandas_gbq

### Set Google Cloud project information and create client

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
import os

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
# Pass the project explicitly so BigQuery does not depend on an ambient default
bq_client = bigquery.Client(project=PROJECT_ID)
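
The fallback logic above can be made stricter so that a missing project fails early with a clear message instead of surfacing later as an API error. The following is a minimal sketch; `resolve_project_id` is a hypothetical helper, not part of any Google library.

```python
import os


def resolve_project_id(explicit: str, env_var: str = "GOOGLE_CLOUD_PROJECT") -> str:
    """Return the explicit project ID, else the environment value, else raise.

    Hypothetical helper for illustration only.
    """
    placeholder = "[your-project-id]"
    if explicit and explicit != placeholder:
        return explicit
    from_env = os.environ.get(env_var)
    if not from_env:
        raise ValueError(
            f"Set PROJECT_ID in the cell above or export {env_var} before continuing."
        )
    return from_env
```

Calling `resolve_project_id(PROJECT_ID)` before creating the clients would replace the placeholder with the environment value when one is set.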

### Load the Gemini 2.0 Flash model

Learn more about all [Gemini models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).

In [None]:
MODEL_ID = "gemini-2.0-flash-001"  # @param {type: "string"}

## The Prediction Pipeline

Now, let's build the pipeline for processing the patent documents. We will:

1. Fetch the PDF URIs from BigQuery into a Pandas DataFrame.
2. Create a detailed prompt for Gemini to extract the information.
3. Process each PDF with the prompt and save the results in a Pandas DataFrame.
4. Save the structured output to a new BigQuery table.

### Get PDF URIs from BigQuery

We'll query BigQuery to get a list of Google Cloud Storage URIs for the patent PDF files.

NOTE: This query is limited to 5 documents to save on processing time for this tutorial.

In [None]:
query = """
SELECT
  COALESCE(t1.gcs_path, t2.gcs_path, t3.gcs_path) AS gcs_path
FROM
  `bigquery-public-data.labeled_patents.extracted_data` AS t1
FULL OUTER JOIN
  `bigquery-public-data.labeled_patents.figures` AS t2
ON
  t1.gcs_path = t2.gcs_path
FULL OUTER JOIN
  `bigquery-public-data.labeled_patents.invention_types` AS t3
ON
  COALESCE(t1.gcs_path, t2.gcs_path) = t3.gcs_path
LIMIT 5
"""

df = bq_client.query(query).result().to_dataframe()
df

### Define the Gemini prompt

Here's the prompt we'll use with Gemini to extract the information we need. It's a detailed instruction that specifies the output format as JSON and includes both a template and an example of the expected object.

In [None]:
PATENTS_PROMPT = """
Given a patent document, please extract the following information and output it as a JSON object with the specified keys.

**JSON Output Format:**

```json
{
  "granter_classification": "United States or European Union",
  "invention_type": "med_tech/computer_vision/crypto/other",
  "entities": {
    "issuer": "string",
    "language": "string",
    "publication_date": "YYYY-MM-DD",
    "class_international": ["string", "string", ...],
    "class_us": ["string", "string", ...],
    "application_number": "string",
    "filing_date": "YYYY-MM-DD",
    "priority_date_eu": "YYYY-MM-DD",
    "representative_eu": ["string","string",...],
    "applicant": ["string", "string", ...],
    "inventor": ["string", "string", ...],
    "title": "string",
    "patent_number": "string"
   },
  "image_detections": [
        {
          "label": "figure_1",
          "bounding_box": [x1, y1, x2, y2]
        },
        {
          "label": "figure_2",
          "bounding_box": [x1, y1, x2, y2]
        }
     ]
}
```

**Detailed Instructions:**

1.  **Granter Classification:** Determine if the patent was granted by the "United States" or "European Union".  Populate the `granter_classification` field accordingly.

2.  **Invention Type Classification:** Classify the type of invention described in the patent as one of the following and populate the `invention_type` field with the corresponding code:
    *   Medical Tech:  `med_tech`
    *   Computer Vision: `computer_vision`
    *   Cryptography: `crypto`
    *   Other: `other`

3.  **Entity Extraction:**
    *   Extract the following entities and populate the corresponding fields within the `entities` object.
    *   If multiple entities exist for a field (e.g., multiple inventors, EU representatives, multiple US and international classes), represent them as a JSON array of strings.
    *   Use the `YYYY-MM-DD` format for dates.
    *   If a field is not present in the document, leave the value as null or an empty array as appropriate.

4.  **Image Object Detection:**
    *   Detect the bounding boxes of all image figures in the patent document.
    *   Represent each bounding box as a JSON array with the format `[x1, y1, x2, y2]` where:
        *   `x1` and `y1` are the coordinates of the top-left corner of the bounding box.
        *   `x2` and `y2` are the coordinates of the bottom-right corner of the bounding box.
    *   Include a label (e.g., "figure_1", "figure_2", etc.) for each detected bounding box.
    *   Populate the `image_detections` array with each detected bounding box. If no figures are detected, the image_detections array should be empty.

**Input:**

Provide the complete patent document text as input to this prompt.

**Output:**

The output should be a single JSON object following the specified format, containing all the extracted information.

**Example (Illustrative, Not Complete):**

```json
{
    "granter_classification": "United States",
    "invention_type": "med_tech",
    "entities": {
        "issuer": "United States Patent and Trademark Office",
        "language": "en",
        "publication_date": "2023-10-26",
        "class_international": ["A61K 31/4709", "A61K 9/00"],
         "class_us": ["514/250"],
        "application_number": "17/744,843",
        "filing_date": "2022-05-24",
        "priority_date_eu": null,
        "representative_eu": [],
        "applicant": ["ABC Pharmacy Inc"],
        "inventor": ["John Smith", "Jane Doe"],
        "title": "Novel Formulation",
        "patent_number": "US12345678"
    },
    "image_detections": [
      {
        "label": "figure_1",
        "bounding_box": [100, 50, 300, 250]
      },
       {
         "label": "figure_2",
         "bounding_box": [400, 100, 600, 300]
       }
    ]
}
```

"""

### Create BigQuery Dataset and Table for Batch Prediction

NOTE: The location of the BigQuery dataset must be the same as the location for Vertex AI. (e.g. `us-central1`, not `us`)

In [None]:
DATASET_NAME = "patents_data_batch"
TABLE_NAME = "patents_table"

In [None]:
!bq --location={LOCATION} mk --dataset "{PROJECT_ID}:{DATASET_NAME}"

Load the `GenerateContentRequest` JSON into a BigQuery table.

This request takes a Google Cloud Storage path to a PDF, processes it with the prompt above, and returns a structured dictionary using [Controlled Generation](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output).

In [None]:
import json


def create_request_json(row) -> str:
    return json.dumps(
        {
            "contents": [
                {
                    "role": "user",
                    "parts": [
                        {"text": PATENTS_PROMPT},
                        {
                            "fileData": {
                                "fileUri": row["gcs_path"],
                                "mimeType": "application/pdf",
                            }
                        },
                    ],
                }
            ],
            "systemInstruction": {
                "parts": [{"text": "You are an expert at analyzing patent documents."}]
            },
            "generationConfig": {
                "temperature": 0,
                "responseMimeType": "application/json",
                "responseSchema": {
                    "type": "OBJECT",
                    "properties": {
                        "granter_classification": {
                            "type": "STRING",
                            "enum": ["United States", "European Union"],
                        },
                        "invention_type": {
                            "type": "STRING",
                            "enum": ["med_tech", "computer_vision", "crypto", "other"],
                        },
                        "entities": {
                            "type": "OBJECT",
                            "properties": {
                                "issuer": {"type": "STRING"},
                                "language": {"type": "STRING"},
                                "publication_date": {
                                    "type": "STRING",
                                    "format": "date",
                                },
                                "class_international": {
                                    "type": "ARRAY",
                                    "items": {"type": "STRING"},
                                },
                                "class_us": {
                                    "type": "ARRAY",
                                    "items": {"type": "STRING"},
                                },
                                "application_number": {"type": "STRING"},
                                "filing_date": {"type": "STRING", "format": "date"},
                                "priority_date_eu": {
                                    "type": "STRING",
                                    "format": "date",
                                    "nullable": True,
                                },
                                "representative_eu": {
                                    "type": "ARRAY",
                                    "items": {"type": "STRING"},
                                },
                                "applicant": {
                                    "type": "ARRAY",
                                    "items": {"type": "STRING"},
                                },
                                "inventor": {
                                    "type": "ARRAY",
                                    "items": {"type": "STRING"},
                                },
                                "title": {"type": "STRING"},
                                "patent_number": {"type": "STRING"},
                            },
                            "required": [
                                "issuer",
                                "language",
                                "publication_date",
                                "class_international",
                                "class_us",
                                "application_number",
                                "filing_date",
                                "applicant",
                                "inventor",
                                "title",
                                "patent_number",
                            ],
                        },
                        "image_detections": {
                            "type": "ARRAY",
                            "items": {
                                "type": "OBJECT",
                                "properties": {
                                    "label": {"type": "STRING"},
                                    "bounding_box": {
                                        "type": "ARRAY",
                                        "items": {"type": "NUMBER"},
                                        "minItems": 4,
                                        "maxItems": 4,
                                    },
                                },
                                "required": ["label", "bounding_box"],
                            },
                        },
                    },
                    "required": [
                        "granter_classification",
                        "invention_type",
                        "entities",
                        "image_detections",
                    ],
                },
            },
        }
    )

In [None]:
df["request"] = df.apply(create_request_json, axis=1)
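
Before uploading, it can help to confirm that a serialized request points at the right file and carries the schema you expect. This is a minimal sketch; `summarize_request` is a hypothetical helper, and the sample request with its `gs://example-bucket/...` URI is made up for illustration.

```python
import json


def summarize_request(request_json: str) -> dict:
    """Pull out the fields worth eyeballing from one serialized request."""
    request = json.loads(request_json)
    parts = request["contents"][0]["parts"]
    # The PDF reference is the part that carries a "fileData" key.
    file_data = next(p["fileData"] for p in parts if "fileData" in p)
    schema = request["generationConfig"]["responseSchema"]
    return {
        "file_uri": file_data["fileUri"],
        "mime_type": file_data["mimeType"],
        "top_level_fields": sorted(schema["properties"]),
        "required_fields": sorted(schema.get("required", [])),
    }


# Hypothetical, trimmed-down request in the same shape as the one built above
sample_request = json.dumps(
    {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"text": "prompt text"},
                    {
                        "fileData": {
                            "fileUri": "gs://example-bucket/patent.pdf",
                            "mimeType": "application/pdf",
                        }
                    },
                ],
            }
        ],
        "generationConfig": {
            "responseSchema": {
                "type": "OBJECT",
                "properties": {"invention_type": {"type": "STRING"}},
                "required": ["invention_type"],
            }
        },
    }
)
print(summarize_request(sample_request))
```

In this notebook, the same helper could be applied to a real row with `summarize_request(df["request"].iloc[0])`.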

In [None]:
# Overwrite the table if this cell has been run before
pandas_gbq.to_gbq(
    df, f"{DATASET_NAME}.{TABLE_NAME}", project_id=PROJECT_ID, if_exists="replace"
)

### Batch Process Patent Documents with Gemini

Batch Processing in Vertex AI will take in the BigQuery table with the requests and return the output in a new BigQuery table in the same dataset.

In [None]:
batch_job = client.batches.create(
    model=MODEL_ID,
    src=f"bq://{PROJECT_ID}.{DATASET_NAME}.{TABLE_NAME}",
)
batch_job

Run the following block to get the latest status of the batch job.

In [None]:
batch_job = client.batches.get(name=batch_job.name)
batch_job

Run the following block to continuously poll the status of the batch job until it completes.

In [None]:
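import time

# States in which the job has stopped and will not progress on its own
COMPLETED_STATES = (
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_PAUSED",
)

# Refresh the job until it reaches a completed state
# (a new job may still be queued or pending, not yet running)
while batch_job.state not in COMPLETED_STATES:
    time.sleep(5)
    batch_job = client.batches.get(name=batch_job.name)

# Check if the job succeeded
if batch_job.state == "JOB_STATE_SUCCEEDED":
    print("Job succeeded!")
else:
    print(f"Job did not succeed ({batch_job.state}): {batch_job.error}")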
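
A polling loop of this kind can also be wrapped in a small helper that stops on any terminal state and gives up after a timeout. The sketch below is one way to do that; `wait_for_terminal_state` and its parameters are hypothetical, the list of terminal states is an assumption, and the state is treated as a plain string in the same way the cell above compares it.

```python
import time
from typing import Callable

# Assumed terminal states, written as strings
TERMINAL_STATES = (
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_PAUSED",
)


def wait_for_terminal_state(
    fetch_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_state() until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    state = fetch_state()
    while state not in TERMINAL_STATES:
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
        state = fetch_state()
    return state


# Demo with a scripted sequence of states and a no-op sleep, so nothing blocks
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final_state = wait_for_terminal_state(lambda: next(fake_states), sleep=lambda _: None)
print(final_state)
```

Against a real job, `fetch_state` could be `lambda: client.batches.get(name=batch_job.name).state`, assuming the returned state compares equal to these strings.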

Load the results from the destination BigQuery table.

In [None]:
results_df = pandas_gbq.read_gbq(
    batch_job.dest.bigquery_uri.replace("bq://", ""), project_id=PROJECT_ID
)

Extract the fields from the controlled generation response.

In [None]:
import pandas as pd


def flatten_response(response) -> dict | None:
    try:
        parsed_json = json.loads(
            response["candidates"][0]["content"]["parts"][0]["text"]
        )

        entities = parsed_json.get("entities", {})
        additional_fields = {
            "granter_classification": parsed_json.get("granter_classification"),
            "image_detections": parsed_json.get("image_detections"),
            "invention_type": parsed_json.get("invention_type"),
        }

        return {**entities, **additional_fields}
    except (KeyError, IndexError, json.JSONDecodeError) as e:
        print(f"Error processing response: {e}")
        return None


results_df = results_df.join(
    pd.json_normalize(results_df["response"].apply(flatten_response))
)

In [None]:
results_df
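
To see what the flattening step does without running a batch job, the same two steps can be traced on a hand-written response. The mock payload below is hypothetical; only its nesting (`candidates` → `content` → `parts` → `text`) mirrors what `flatten_response` reads.

```python
import json

# Hand-written stand-in for one value of the "response" column (hypothetical data)
mock_response = {
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "text": json.dumps(
                            {
                                "granter_classification": "United States",
                                "invention_type": "crypto",
                                "entities": {
                                    "title": "Example Title",
                                    "inventor": ["Jane Doe"],
                                },
                                "image_detections": [],
                            }
                        )
                    }
                ]
            }
        }
    ]
}

# Step 1: the model's JSON arrives as a string in the first candidate's first part.
parsed = json.loads(mock_response["candidates"][0]["content"]["parts"][0]["text"])

# Step 2: lift the nested entities to the top level, next to the other fields.
flat = {
    **parsed.get("entities", {}),
    "granter_classification": parsed.get("granter_classification"),
    "image_detections": parsed.get("image_detections"),
    "invention_type": parsed.get("invention_type"),
}
print(sorted(flat))
```

Each key of `flat` becomes one column after `pd.json_normalize`, which is why the entity fields appear alongside the classification columns in `results_df`.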

### Compare Results with Ground Truth

We can pull the ground truth data from BigQuery into a DataFrame to compare the results. Note that the data is in a slightly different format, so a direct comparison is not straightforward.

In [None]:
query = """
SELECT
  COALESCE(t1.gcs_path, t2.gcs_path, t3.gcs_path) AS gcs_path,
  t1.issuer,
  t1.language,
  t1.publication_date,
  t1.class_international,
  t1.class_us,
  t1.application_number,
  t1.filing_date,
  t1.priority_date_eu,
  t1.representative_line_1_eu,
  t1.applicant_line_1,
  t1.inventor_line_1,
  t1.title_line_1,
  t1.number,
  t2.x_relative_min,
  t2.y_relative_min,
  t2.x_relative_max,
  t2.y_relative_max,
  t3.invention_type
FROM
  `bigquery-public-data.labeled_patents.extracted_data` AS t1
FULL OUTER JOIN
  `bigquery-public-data.labeled_patents.figures` AS t2
ON
  t1.gcs_path = t2.gcs_path
FULL OUTER JOIN
  `bigquery-public-data.labeled_patents.invention_types` AS t3
ON
  COALESCE(t1.gcs_path, t2.gcs_path) = t3.gcs_path;
"""

# Run the query once and load the result into a DataFrame
ground_truth_df = bq_client.query(query).result().to_dataframe()
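
Because the ground truth stores only the first line of some fields and stores figure boxes as relative coordinates, a comparison needs a little glue. The sketch below shows two possible building blocks. All three helper names are hypothetical, and `to_relative` assumes you know the width and height of the coordinate space the model used, which the prompt does not fix.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def first_line_matches(predicted: Sequence[str], truth_line_1: Optional[str]) -> bool:
    """Compare the first predicted value with a ground-truth *_line_1 column."""
    if not predicted or truth_line_1 is None:
        return False
    return _normalize(predicted[0]) == _normalize(truth_line_1)


def to_relative(box: Sequence[float], page_width: float, page_height: float) -> Box:
    """Scale an absolute [x1, y1, x2, y2] box to the 0-1 range (assumed page size)."""
    x1, y1, x2, y2 = box
    return (x1 / page_width, y1 / page_height, x2 / page_width, y2 / page_height)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes in the same coordinate space."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For a figure, `iou(to_relative(box, width, height), (x_relative_min, y_relative_min, x_relative_max, y_relative_max))` gives an overlap score between 0 and 1; what threshold counts as a match is a choice left to the reader.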

## Conclusion

This notebook demonstrates how Gemini can extract structured information from complex documents, simplifying document understanding pipelines without the need to train custom models.

**Key Improvements:**

- **Comprehensive Extraction**: Unlike AutoML, which often limited field extraction to the first line (e.g., applicant, inventor, class US), Gemini accurately extracts full text and all listed values for these fields.
- **Simplified Workflow**: AutoML required four separate models and four requests per document to complete the tasks. Gemini consolidates this into a single request.
- **No Custom Model Training**: Gemini is a pre-trained model, eliminating the need for use case-specific training, saving both time and resources.