In [None]:
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Responsible AI with Gemini API in Vertex AI: Safety ratings and thresholds

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/responsible-ai/gemini_safety_ratings.ipynb">
      <img width="32px" src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo"><br> Run in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fresponsible-ai%2Fgemini_safety_ratings.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Run in Colab Enterprise
    </a>
  </td>    
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/responsible-ai/gemini_safety_ratings.ipynb">
      <img width="32px" src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/responsible-ai/gemini_safety_ratings.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo"><br>
      Open in Vertex AI Workbench
    </a>
  </td>                                                                                               
</table>

<div style="clear: both;"></div>

| | |
|-|-|
|Author(s) | [Hussain Chinoy](https://github.com/ghchinoy), [Holt Skinner](https://github.com/holtskinner) |

## Overview

Large language models (LLMs) can translate language, summarize text, generate creative writing, generate code, power chatbots and virtual assistants, and complement search engines and recommendation systems. The incredible versatility of LLMs is also what makes it difficult to predict exactly what kinds of unintended or unforeseen outputs they might produce.

Given these risks and complexities, the Gemini API in Vertex AI is designed with [Google's AI Principles](https://ai.google/responsibility/principles/) in mind. However, it is important for developers to understand and test their models to deploy safely and responsibly. To aid developers, Vertex AI Studio has built-in content filtering, safety ratings, and the ability to define safety filter thresholds that are right for their use cases and business.

For more information, see the [Google Cloud Generative AI documentation on Responsible AI](https://cloud.google.com/vertex-ai/docs/generative-ai/learn/responsible-ai) and [Configuring safety attributes](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes).

### Objectives

In this tutorial, you learn how to inspect the safety ratings returned from the Gemini API in Vertex AI using the Python SDK and how to set a safety threshold to filter responses from the Gemini API in Vertex AI.

The steps performed include:

- Call the Gemini API in Vertex AI and inspect safety ratings of the responses
- Define a threshold for filtering safety ratings according to your needs

### Costs

This tutorial uses billable components of Google Cloud:

- Vertex AI

Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage.


## Getting Started


### Install Google Gen AI SDK for Python


In [None]:
%pip install --upgrade --quiet google-genai

### Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>


### Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).


In [None]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information and create client

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [2]:
import os

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = str(os.environ.get("GOOGLE_CLOUD_PROJECT"))

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

In [3]:
from google import genai

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

### Import libraries


In [36]:
from IPython.display import Markdown, display
from google.genai.types import (
    GenerateContentConfig,
    GenerateContentResponse,
    SafetySetting,
)

### Helper functions

In [75]:
def print_safety_ratings(response: GenerateContentResponse) -> None:
    """Displays safety ratings and related information in Markdown format."""
    display(Markdown("### Safety Ratings\n"))

    if response.prompt_feedback:
        display(Markdown(f"**Prompt Feedback:** {response.prompt_feedback}"))

    candidate = response.candidates[0]

    table_header = (
        "| Blocked | Category | Probability | Probability Score | Severity | Severity Score |\n"
        "|---|---|---|---|---|---|\n"
    )

    table_rows = "\n".join(
        f"| {'✅' if not rating.blocked else '❌'} | `{rating.category}` | `{rating.probability}` | "
        f"`{rating.probability_score}` | `{rating.severity}` | `{rating.severity_score}` |"
        for rating in candidate.safety_ratings
    )

    display(Markdown(table_header + table_rows))

    # Display finish reason and message if they exist
    if candidate.finish_reason:
        display(Markdown(f"**Finish Reason:** `{candidate.finish_reason}`"))
    if candidate.finish_message:
        display(Markdown(f"**Finish Message:** `{candidate.finish_message}`"))

### Load the Gemini model

Learn more about all [Gemini models on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models#gemini-models).

In [72]:
MODEL_ID = "gemini-2.0-flash-001"  # @param {type: "string"}

# Set parameters to reduce variability in responses
generation_config = GenerateContentConfig(
    temperature=0,
    top_p=0.1,
    top_k=1,
    max_output_tokens=1024,
    seed=1,
    candidate_count=1,
)

## Generate text and show safety ratings

Start by generating a pleasant-sounding text response using Gemini.

In [None]:
# Call Gemini API
nice_prompt = "Say three nice things."

response = client.models.generate_content(
    model=MODEL_ID, config=generation_config, contents=nice_prompt
)

display(Markdown(response.text))

#### Inspecting the safety ratings

Look at the `safety_ratings` of the response.

In [None]:
print_safety_ratings(response)

#### Understanding the safety ratings: category and probability

Each safety rating includes a `category` type, its associated `probability` label, and a `probability_score`. Safety ratings have also been expanded to include `severity` and `severity_score`.

The `category` types include:

* Hate speech: `HARM_CATEGORY_HATE_SPEECH`
* Dangerous content: `HARM_CATEGORY_DANGEROUS_CONTENT`
* Harassment: `HARM_CATEGORY_HARASSMENT`
* Sexually explicit statements: `HARM_CATEGORY_SEXUALLY_EXPLICIT`

The `probability` labels are:

* `NEGLIGIBLE` - content has a negligible probability of being unsafe
* `LOW` - content has a low probability of being unsafe
* `MEDIUM` - content has a medium probability of being unsafe
* `HIGH` - content has a high probability of being unsafe

The `probability_score` is a confidence score between `0.0` and `1.0` that accompanies the `probability` label.

Each of the four safety attributes is also assigned a `severity` label and a `severity_score` ranging from `0.0` to `1.0`. These reflect the predicted severity of the content belonging to a given category.

#### Comparing Probability and Severity

There are two types of safety scores:

* Safety scores based on **probability** of being unsafe
* Safety scores based on **severity** of harmful content

The probability safety attribute reflects the likelihood that an input or model response is associated with the respective safety attribute. The severity safety attribute reflects the magnitude of how harmful an input or model response might be.

Content can have a low probability score and a high severity score, or a high probability score and a low severity score. For example, consider the following two sentences:

- The robot punched me.
- The robot slashed me up.

The first sentence might have a higher probability of being unsafe, while the second sentence might have a higher severity in terms of violence. Because of this, it's important to carefully test and consider the appropriate level of blocking required to support your key use cases while minimizing harm to end users.
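
To make the distinction concrete, the sketch below ranks a small list of ratings by each score separately. The `Rating` dataclass and the sample numbers are hypothetical stand-ins rather than output from the API; only the field names mirror those read by `print_safety_ratings`.

```python
from dataclasses import dataclass


@dataclass
class Rating:
    """Hypothetical stand-in for one entry of `candidate.safety_ratings`."""

    category: str
    probability_score: float
    severity_score: float


# Made-up scores for illustration only.
ratings = [
    Rating("HARM_CATEGORY_HARASSMENT", probability_score=0.62, severity_score=0.18),
    Rating("HARM_CATEGORY_DANGEROUS_CONTENT", probability_score=0.21, severity_score=0.55),
    Rating("HARM_CATEGORY_HATE_SPEECH", probability_score=0.05, severity_score=0.04),
]

# The category most likely to be unsafe is not necessarily the most severe one.
most_probable = max(ratings, key=lambda r: r.probability_score)
most_severe = max(ratings, key=lambda r: r.severity_score)

print("Highest probability:", most_probable.category)
print("Highest severity:", most_severe.category)
```

With these sample values the two rankings disagree, which is the situation the robot sentences describe.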

#### Blocked responses

If the response is blocked, the final candidate includes `blocked: True` on the safety rating that triggered the block, and its finish reason shows why generation stopped (e.g. `finish_reason: SAFETY`).

Try a prompt that might trigger one of these categories:

In [None]:
impolite_prompt = "Write a list of 5 disrespectful things that I might say to the universe after stubbing my toe in the dark. Respond using profanity."

responses = client.models.generate_content_stream(
    model=MODEL_ID, config=generation_config, contents=impolite_prompt
)

for response in responses:
    if response.text:
        print(response.text, end="")
    else:
        print_safety_ratings(response)
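
If you would rather collect the streamed text than print it chunk by chunk, a minimal sketch could look like the following. The `collect_stream` helper is hypothetical, and the `SimpleNamespace` objects are stand-ins that assume each chunk exposes `text` and `candidates[0].finish_reason` as in the loop above.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join streamed text and return it with the last finish reason seen."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        if chunk.text:
            parts.append(chunk.text)
        # Intermediate chunks may carry no finish reason; keep the latest one.
        if chunk.candidates and chunk.candidates[0].finish_reason:
            finish_reason = chunk.candidates[0].finish_reason
    return "".join(parts), finish_reason


# Hypothetical stand-ins for streamed response chunks.
fake_chunks = [
    SimpleNamespace(text="Hello, ", candidates=[SimpleNamespace(finish_reason=None)]),
    SimpleNamespace(text="world.", candidates=[SimpleNamespace(finish_reason="STOP")]),
]

text, reason = collect_stream(fake_chunks)
print(text, reason)
```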

### Defining thresholds for safety ratings

You may want to adjust the default safety filter thresholds depending on your business policies or use case. The Gemini API in Vertex AI provides you a way to pass in a threshold for each category.

The list below shows the possible threshold labels:

* `BLOCK_ONLY_HIGH` - block when high probability of unsafe content is detected
* `BLOCK_MEDIUM_AND_ABOVE` - block when medium or high probability of unsafe content is detected
* `BLOCK_LOW_AND_ABOVE` - block when low, medium, or high probability of unsafe content is detected
* `BLOCK_NONE` - always show, regardless of probability of unsafe content
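
One way to read this list is as a minimum probability label per threshold. The sketch below only restates the descriptions above in code; `would_block` is a hypothetical helper, and the actual filtering is performed by the service, which may take more than the probability label into account.

```python
# Probability labels in increasing order, as listed earlier in this notebook.
PROBABILITY_ORDER = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]

# Lowest probability label that each threshold blocks; None means never block.
THRESHOLD_MINIMUM = {
    "BLOCK_LOW_AND_ABOVE": "LOW",
    "BLOCK_MEDIUM_AND_ABOVE": "MEDIUM",
    "BLOCK_ONLY_HIGH": "HIGH",
    "BLOCK_NONE": None,
}


def would_block(probability: str, threshold: str) -> bool:
    """Illustrative only: apply a threshold description to a probability label."""
    minimum = THRESHOLD_MINIMUM[threshold]
    if minimum is None:
        return False
    return PROBABILITY_ORDER.index(probability) >= PROBABILITY_ORDER.index(minimum)


for threshold in THRESHOLD_MINIMUM:
    blocked = [p for p in PROBABILITY_ORDER if would_block(p, threshold)]
    print(f"{threshold}: blocks {blocked}")
```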

#### Set safety thresholds
Below, every category is set to the most sensitive threshold, `BLOCK_LOW_AND_ABOVE`.

In [79]:
generation_config.safety_settings = [
    SafetySetting(
        category="HARM_CATEGORY_DANGEROUS_CONTENT", threshold="BLOCK_LOW_AND_ABOVE"
    ),
    SafetySetting(
        category="HARM_CATEGORY_HATE_SPEECH", threshold="BLOCK_LOW_AND_ABOVE"
    ),
    SafetySetting(category="HARM_CATEGORY_HARASSMENT", threshold="BLOCK_LOW_AND_ABOVE"),
    SafetySetting(
        category="HARM_CATEGORY_SEXUALLY_EXPLICIT", threshold="BLOCK_LOW_AND_ABOVE"
    ),
]
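
Because all four categories share the same threshold here, the list could also be generated rather than written out. The sketch below builds plain keyword-argument dictionaries; `uniform_settings` is a hypothetical helper, and the conversion to `SafetySetting` objects is shown only as a comment because it relies on the import from earlier in this notebook.

```python
CATEGORIES = [
    "HARM_CATEGORY_DANGEROUS_CONTENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
]


def uniform_settings(threshold: str) -> list[dict[str, str]]:
    """Hypothetical helper: one category/threshold pair per harm category."""
    return [{"category": c, "threshold": threshold} for c in CATEGORIES]


setting_kwargs = uniform_settings("BLOCK_LOW_AND_ABOVE")
print(setting_kwargs)

# In the notebook, where SafetySetting is already imported, these could be
# converted with the same keyword arguments used in the cell above:
# generation_config.safety_settings = [SafetySetting(**kw) for kw in setting_kwargs]
```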

#### Test thresholds

Here you will reuse a variant of the impolite prompt from earlier, this time with the most sensitive safety threshold. The response should be blocked even when a category only reaches the `LOW` probability label.

In [None]:
impolite_prompt = "Write a list of 5 disrespectful things that I might say to the universe after stubbing my toe in the dark:"

responses = client.models.generate_content_stream(
    model=MODEL_ID, config=generation_config, contents=impolite_prompt
)

for response in responses:
    if response.text:
        print(response.text, end="")
    else:
        print_safety_ratings(response)

The next section looks at how to interpret blocked responses.

## Understanding Blocked Responses

The documentation for [`FinishReason`](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/GenerateContentResponse#finishreason) contains some more detailed explanations.

For example, the previous response was blocked with the `finish_reason: SAFETY`, indicating that

> The token generation was stopped as the response was flagged for safety reasons. NOTE: The `response.text` will be empty if content filters blocked the output.

As of this writing, the table from the `FinishReason` documentation is reproduced below, but refer to the docs for the definitive explanations.


| Finish Reason | Explanation |
|---|---|
| `FINISH_REASON_UNSPECIFIED` | The finish reason is unspecified. |
| `STOP` | Natural stop point of the model or provided stop sequence. |
| `MAX_TOKENS` | The maximum number of tokens as specified in the request was reached. |
| `SAFETY` | The token generation was stopped as the response was flagged for safety reasons. |
| `RECITATION` | The token generation was stopped as the response was flagged for unauthorized citations. |
| `OTHER` | All other reasons that stopped the token generation. |
| `BLOCKLIST` | The token generation was stopped as the response was flagged for the terms which are included from the terminology blocklist. |
| `PROHIBITED_CONTENT` | The token generation was stopped as the response was flagged for the prohibited contents. |
| `SPII` | The token generation was stopped as the response was flagged for Sensitive Personally Identifiable Information (SPII) contents. |
| `MALFORMED_FUNCTION_CALL` | The function call generated by the model is invalid. |
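
In application code it can be convenient to collapse these values into a few coarse outcomes. The grouping below is an editorial interpretation of the table, not something the API provides, and `summarize_finish_reason` is a hypothetical helper. It assumes the finish reason is available as a plain string; for an enum value that would typically be its `.name`.

```python
# Reasons that indicate the output was stopped by a content filter.
# This grouping is an interpretation of the table, not an API feature.
FILTERED = {"SAFETY", "RECITATION", "BLOCKLIST", "PROHIBITED_CONTENT", "SPII"}


def summarize_finish_reason(name: str) -> str:
    """Map a finish reason name to a coarse outcome label."""
    if name == "STOP":
        return "completed"
    if name == "MAX_TOKENS":
        return "truncated"
    if name in FILTERED:
        return "filtered"
    # Covers OTHER, MALFORMED_FUNCTION_CALL, unspecified, and unknown values.
    return "other"


for name in ["STOP", "MAX_TOKENS", "SAFETY", "MALFORMED_FUNCTION_CALL"]:
    print(name, "->", summarize_finish_reason(name))
```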