In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Vertex AI Search - Querying Blended Data Apps and Summarization with Gemini

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/search/search_data_blending_with_gemini_summarization.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fsearch%2Fsearch_data_blending_with_gemini_summarization.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/search/search_data_blending_with_gemini_summarization.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/search/search_data_blending_with_gemini_summarization.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
</table>

<div style="clear: both;"></div>



| | |
|-|-|
|Author(s) | [Shantam Gupta](https://github.com/ShantamGupta) |

## Overview

### Search
Vertex AI Search brings together deep information retrieval, state-of-the-art natural language processing, and the latest in large language model (LLM) processing to understand user intent and return the most relevant results.

With Vertex AI Search, you can create apps for searching and for making recommendations. Vertex AI Search also has special capabilities for some industries, such as media, healthcare, and retail.



### Gemini

Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. The Gemini API gives you access to the Gemini models.

### Gemini API in Vertex AI

The Gemini API in Vertex AI provides a unified interface for interacting with Gemini models.

You can interact with the Gemini API using the following methods:

- Use [Vertex AI Studio](https://cloud.google.com/generative-ai-studio) for quick testing and command generation
- Use cURL commands
- Use the Vertex AI SDK

For more information, see the [Generative AI on Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/overview) documentation.

This tutorial shows how to query a search app that blends multiple data stores, extract the snippets from the search results, and summarize them with Gemini.


### Create a Search App with Mixed Datastores

1. Follow the steps listed here to create a Search App
   - https://cloud.google.com/generative-ai-app-builder/docs/create-engine-es 
2. Create the relevant data stores (Cloud Storage, BigQuery, website)
   - https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es
3. Link the data stores to the Search App
   - https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#multi-data-stores

The example query and results are based on the data used in this tutorial:
   - https://cloud.google.com/generative-ai-app-builder/docs/try-enterprise-search

### Install the Relevant packages

In [None]:
%pip install --upgrade --user -q google-cloud-aiplatform google-cloud-discoveryengine

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).


In [None]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Define Google Cloud project information

In [10]:
PROJECT_ID = "PROJECT_ID"  # @param {type:"string"}
SEARCH_APP_LOCATION = "global"  # @param {type:"string"}
SEARCH_ENGINE_ID = "VERTEX_SEARCH_ENGINE_ID"  # @param {type:"string"}
LOCATION_GEMINI_MODEL = "us-central1"  # @param {type:"string"}

### Initialize the Vertex AI SDK

In [2]:
import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION_GEMINI_MODEL)

### Import the Relevant packages

In [45]:
import re

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine_v1alpha as discoveryengine
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
)

### Send a Request to Vertex AI Search App with Data Blending (Mixed Datastore) 

- https://cloud.google.com/generative-ai-app-builder/docs/create-data-store-es#multi-data-stores

In [82]:
search_query = "What was Google's revenue in Q4 2020?"

In [83]:
# Create a client; a regional endpoint is only needed when the location is not "global"
client = discoveryengine.SearchServiceClient(
    client_options=(
        ClientOptions(
            api_endpoint=f"{SEARCH_APP_LOCATION}-discoveryengine.googleapis.com"
        )
        if SEARCH_APP_LOCATION != "global"
        else None
    )
)

# The full resource name of the search app serving config
serving_config = f"projects/{PROJECT_ID}/locations/{SEARCH_APP_LOCATION}/collections/default_collection/engines/{SEARCH_ENGINE_ID}/servingConfigs/default_config"

response = client.search(
    discoveryengine.SearchRequest(
        serving_config=serving_config,
        query=search_query,
        page_size=10,
    )
)
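
The request above depends on two derived strings: the API endpoint and the serving config resource name. The sketch below isolates that string logic in plain Python so it can be checked without calling the API. The helper names `search_api_endpoint` and `serving_config_path` are hypothetical and not part of the `google-cloud-discoveryengine` library; the endpoint and path layout are copied from the cell above, and the sample project and engine IDs are placeholders.

```python
from typing import Optional


def search_api_endpoint(location: str) -> Optional[str]:
    """Return the regional endpoint, or None so the client keeps its default for "global"."""
    if location == "global":
        return None
    return f"{location}-discoveryengine.googleapis.com"


def serving_config_path(project_id: str, location: str, engine_id: str) -> str:
    """Build the serving config resource name in the layout used by this notebook."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/collections/default_collection/engines/{engine_id}"
        "/servingConfigs/default_config"
    )


# Placeholder values, for illustration only
print(search_api_endpoint("global"))
print(search_api_endpoint("eu"))
print(serving_config_path("my-project", "global", "my-engine"))
```

Keeping these as small pure functions makes it easy to spot a mistyped location or engine ID before the search call fails with a "not found" error.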

### Extract & clean up snippets from search results


In [None]:
retrieved_data: list[str] = []

for result in response.results:
    data = result.document.derived_struct_data
    if not data:
        continue

    # Strip HTML tags (such as <b> highlights) from each snippet
    snippets: list[str] = [
        re.sub("<[^>]*>", "", snippet_item.get("snippet", ""))
        for snippet_item in data.get("snippets", [])
        if snippet_item.get("snippet")
    ]

    # Extractive answers serve as a fallback when a result has no snippets
    extractive_answers: list[str] = [
        re.sub("<[^>]*>", "", answer_item.get("content", ""))
        for answer_item in data.get("extractive_answers", [])
        if answer_item.get("content")
    ]

    if snippets:
        title = data.get("title", "Unknown Title")
        # Join with a space so that multiple snippets do not run together
        retrieved_data.append(
            f"--- Snippets from Document {title} ---\n{' '.join(snippets)}\n"
        )
    elif extractive_answers:
        # Label the block with the document link in this branch
        link = data.get("link", "Unknown")
        retrieved_data.append(
            f"--- Snippets from Document {link} ---\n{' '.join(extractive_answers)}\n"
        )
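
The regular expression above removes HTML tags only, so entities such as `&nbsp;` still appear in the prompt printed at the end of this notebook. As a minimal sketch, assuming you also want entities decoded and whitespace normalized, the standard library's `html.unescape` can be combined with the same tag pattern. The helper name `clean_snippet` is hypothetical and is not used elsewhere in this notebook.

```python
import html
import re

TAG_PATTERN = re.compile(r"<[^>]*>")


def clean_snippet(raw: str) -> str:
    """Strip HTML tags, decode HTML entities, and collapse runs of whitespace."""
    without_tags = TAG_PATTERN.sub("", raw)
    decoded = html.unescape(without_tags)  # "&nbsp;" becomes the character "\xa0"
    # str.split() with no arguments splits on any Unicode whitespace, including "\xa0"
    return " ".join(decoded.split())


print(clean_snippet("<b>Google</b> Cloud revenues were $13.1 billion&nbsp;..."))
```

To try it in the loop above, you could replace each `re.sub("<[^>]*>", "", ...)` call with `clean_snippet(...)`.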

### Feed the Search result snippets to Gemini model and formulate a summary/response based on your original prompt

#### Model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response.

The model can generate different results for different parameter values.

You can experiment with different model parameters to see how the results change.
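
To build intuition for one of these parameters, the toy sketch below shows how a temperature value reshapes a probability distribution over candidate tokens. It is an illustration of the general softmax-with-temperature idea, not a description of how Gemini is implemented internally; the function name `token_probabilities` and the logits are made up for the example, and treating a temperature of 0 as a greedy pick is an assumption of this sketch.

```python
import math


def token_probabilities(logits: list[float], temperature: float) -> list[float]:
    """Softmax over temperature-scaled logits; temperature 0 is treated as a greedy pick."""
    if temperature == 0:
        best = max(range(len(logits)), key=logits.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(value - peak) for value in scaled]
    total = sum(weights)
    return [weight / total for weight in weights]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for temperature in (0, 0.5, 1.0, 2.0):
    probs = token_probabilities(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the highest-scoring token, which is why a low value suits factual summarization, while higher temperatures spread probability across more candidates.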


In [85]:
generation_config = GenerationConfig(
    temperature=0,
    top_p=1.0,
    max_output_tokens=2048,
)

In [86]:
# Prompt for Gemini model
PROMPT_GEMINI = f"""Provide an answer to the question based on the information in the Document snippets provided with citations.
Question: {search_query}
{''.join(retrieved_data)}
"""

model = GenerativeModel("gemini-2.0-flash")  # specify the Gemini model version


def generate(prompt: str) -> None:
    """Stream Gemini's answer to the given prompt and print it as it arrives."""
    responses = model.generate_content(
        prompt,
        generation_config=generation_config,
        safety_settings={
            HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
            HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
            HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
            HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
        },
        stream=True,
    )

    for response in responses:
        print(response.text, end="")


print(f"PROMPT:\n{PROMPT_GEMINI}")

print("Gemini Response:\n")
generate(PROMPT_GEMINI)

PROMPT:
Provide an answer to the question based on the information in the Document snippets provided with citations.
Question: What was Google's revenue in Q4 2020?
--- Snippets from Document GOOG Exhibit 99.1 Q4'20 ---
Google Cloud revenues were $13.1 billion for 2020, with significant ongoing ... Quarter Q4 2019 Q4 2020 Fiscal Year 2019 2018 2020 Revenues: $ 43,198 $ 2,614&nbsp;...
--- Snippets from Document GOOG 10-K Q4 2020 ---
... Google Maps, Google Play, Search, and YouTube. Google Services generates revenues primarily from advertising; sales of apps, in-app purchases, digital&nbsp;...
--- Snippets from Document GOOG 10-K Q4 2020 ---
Table of Contents Alphabet Inc. Google other revenues increased $4,697 million from 2019 to 2020. The growth was primarily driven by Google Play and YouTube&nbsp;...
--- Snippets from Document GOOG 10-K Q4 2020 ---
The TAC rate on Google properties revenues and the TAC rate on Google Network revenues were both substantially consistent from 2019 to 2