In [None]:
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Get started with Vertex AI Model Garden SDK

<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/open-models/get_started_with_model_garden_sdk.ipynb">
      <img width="32px" src="https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg" alt="Google Colaboratory logo"><br> Open in Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fopen-models%2Fget_started_with_model_garden_sdk.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo"><br> Open in Colab Enterprise
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/open-models/get_started_with_model_garden_sdk.ipynb">
      <img src="https://www.gstatic.com/images/branding/gcpiconscolors/vertexai/v1/32px.svg" alt="Vertex AI logo"><br> Open in Vertex AI Workbench
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/get_started_with_model_garden_sdk.ipynb">
      <img width="32px" src="https://www.svgrepo.com/download/217753/github.svg" alt="GitHub logo"><br> View on GitHub
    </a>
  </td>
</table>

<div style="clear: both;"></div>

| Author |
| --- |
| [Ivan Nardini](https://github.com/inardini) |

## Overview

Have you ever tried to deploy open models on Vertex AI? Deploying open models on Vertex AI can sometimes involve navigating numerous API calls and parameters, which can lead to a slower, more error-prone experience.

But what if deploying state-of-the-art open models were as simple as choosing your model and clicking the 'Deploy' button?

The new Vertex AI Model Garden CLI and SDK, powered by the new Deploy API, are designed to do just that. These tools are model-centric, providing a more consistent and fluid experience. The SDK simplifies prototyping by removing the need to specify container details. The CLI provides a precise, interactive command-line interface for managing models, offering a programmatic alternative to the UI's one-click deployment and enabling scriptable automation.

This tutorial shows how to use the Vertex AI Model Garden SDK for deploying your open models on Vertex AI.

You will learn how to:

- Find the models that you can deploy
- Deploy your first Model Garden model
- Use advanced features, including setting deployment parameters and handling errors


## Get started

### Install Vertex AI SDK and other required packages


In [None]:
%pip install --upgrade --force-reinstall --quiet 'google-cloud-aiplatform>=1.84.0' 'openai' 'google-auth' 'requests'

### Authenticate your notebook environment (Colab only)

If you're running this notebook on Google Colab, run the cell below to authenticate your environment.

In [None]:
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

### Set Google Cloud project information

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).

In [None]:
# Use the environment variable if the user doesn't provide Project ID.
import os

import vertexai

PROJECT_ID = "[your-project-id]"  # @param {type: "string", placeholder: "[your-project-id]", isTemplate: true}

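if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    if not PROJECT_ID:
        raise ValueError("Set PROJECT_ID or the GOOGLE_CLOUD_PROJECT environment variable.")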

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

vertexai.init(project=PROJECT_ID, location=LOCATION)

## Import libraries

In [None]:
import base64
import io
import os

from PIL import Image
import google.auth
import google.auth.transport.requests
import matplotlib.pyplot as plt
import openai
from vertexai.preview import model_garden

## Define helpers

In [None]:
def print_models(data_list: list[str], items_per_line: int = 2) -> None:
    """Prints the total model count, then the model IDs with a specified
    number of items per line."""
    print("🌟--- Models available ---🌟")
    print("\n")
    print(f"🔢 Total models: {len(data_list)} 🔢\n")  # Print the count here

    for i, item in enumerate(data_list):
        print(f"✨ {item} ", end="")
        if (i + 1) % items_per_line == 0:
            print()
        else:
            print(" --- ", end="")

    if len(data_list) % items_per_line != 0:
        print()


def plot_image_from_bytes(image_bytes_string: str) -> None:
    """Plots an image from a base64-encoded bytes string."""
    # Decode the base64 string
    image_bytes = base64.b64decode(image_bytes_string)

    # Create a BytesIO object
    image_stream = io.BytesIO(image_bytes)

    # Open the image using Pillow library
    image = Image.open(image_stream)

    # Plot the image using matplotlib
    plt.imshow(image)
    plt.axis("off")
    plt.show()

## Find the models that you can deploy

In Vertex AI Model Garden, you can discover and deploy a wide range of open-source models.

Many of these models are directly supported in Vertex AI Model Garden, and some are preconfigured for optimized deployment on Vertex AI. When an open model is not available in Vertex AI Model Garden, or you want to deploy a model from the Hugging Face Hub, you can use the Hugging Face gallery, which gives you access to more than 1M models.

With the Vertex AI Model Garden SDK, you can list the models that you can deploy and record the ID of the model you want to deploy. You can optionally include the supported Hugging Face models in the list and filter the results by model name.

Let's check which Gemma models are available in Vertex AI Model Garden.

In [None]:
model_garden_models = model_garden.list_deployable_models(
    model_filter="gemma", list_hf_models=False
)

In [None]:
print_models(model_garden_models, items_per_line=3)

To include Gemma models that are available through the Hugging Face gallery, set the `list_hf_models` flag to `True`.

In [None]:
deployable_models = model_garden.list_deployable_models(
    model_filter="gemma", list_hf_models=True
)

In [None]:
print_models(deployable_models)
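If the filtered list is long, it can help to group the IDs by publisher. The sketch below assumes, as `print_models` does, that `list_deployable_models()` returns a list of model ID strings. `group_by_publisher` is a hypothetical helper, and the sample IDs are illustrative (taken from model names used in this notebook), not real SDK output.

```python
from collections import defaultdict


def group_by_publisher(model_ids: list[str]) -> dict[str, list[str]]:
    """Group model IDs such as "google/gemma3@gemma-3-1b-it" by the
    publisher or organization prefix before the first "/"."""
    groups: dict[str, list[str]] = defaultdict(list)
    for model_id in model_ids:
        publisher, _, _ = model_id.partition("/")
        groups[publisher].append(model_id)
    # Sort publishers and their models for stable, readable output.
    return {publisher: sorted(ids) for publisher, ids in sorted(groups.items())}


# Illustrative IDs only, not actual list_deployable_models() output.
sample_ids = [
    "google/gemma3@gemma-3-1b-it",
    "meta/llama3_1@llama-3.1-8b-instruct",
    "stabilityai/stable-diffusion-xl-base-1.0",
    "black-forest-labs/FLUX.1-dev",
]
for publisher, ids in group_by_publisher(sample_ids).items():
    print(f"{publisher}: {', '.join(ids)}")
```

In the notebook, you could call `group_by_publisher(deployable_models)` on the list returned above.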

## Deploy your first Model Garden model

To deploy an open model, the Vertex AI Model Garden SDK provides the `OpenModel` class, which simplifies deploying these models to Vertex AI for inference.


In [None]:
model_id = "google/gemma3@gemma-3-1b-it"

gemma_model = model_garden.OpenModel(model_id)

### Check the deployment configuration

After you instantiate the model, use the `list_deploy_options()` method to discover the verified deployment configurations that the model supports.

This helps you verify that you have enough resources, such as accelerator quota, to deploy the model.

In [None]:
gemma_model.list_deploy_options()

### Deploy the model

Now that you know how the model will be deployed, use the `deploy()` method to serve the selected open model on a Vertex AI endpoint. Depending on the model, deployment can take several minutes.

> **Note**: If the model has an End User License Agreement (EULA), you can accept it by setting the `accept_eula` flag to `True`.


In [None]:
gemma_endpoint = gemma_model.deploy(accept_eula=True)

### Generate predictions

After the model is deployed, you can generate predictions with the Vertex AI SDK by calling the endpoint's `predict()` method.

In [None]:
prediction = gemma_endpoint.predict(
    instances=[{"prompt": "Tell me a joke", "temperature": 0.7, "max_tokens": 50}]
)
print(prediction.predictions[0])

You can also get predictions through the Chat Completions API of the OpenAI SDK by pointing the OpenAI client at the endpoint URL and authenticating with a Google Cloud access token.

In [None]:
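# Get an OAuth 2.0 access token from Application Default Credentials.
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)

# Use the endpoint URL as the base URL for the OpenAI client.
gemma_endpoint_url = f"https://{LOCATION}-aiplatform.googleapis.com/v1beta1/{gemma_endpoint.resource_name}"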

client = openai.OpenAI(base_url=gemma_endpoint_url, api_key=creds.token)

prediction = client.chat.completions.create(
    model="",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    temperature=0.7,
    max_tokens=50,
)

print(prediction.choices[0].message.content)
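The base URL above combines `LOCATION` with the endpoint's resource name, so the two must refer to the same region. Here is a minimal sketch of a hypothetical `build_openai_base_url` helper that reads the region from the resource name instead. It assumes the standard `projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID` format and keeps the `v1beta1` version used above.

```python
import re

# Vertex AI endpoint resource names look like
# "projects/PROJECT/locations/LOCATION/endpoints/ENDPOINT_ID".
_ENDPOINT_RE = re.compile(r"projects/([^/]+)/locations/([^/]+)/endpoints/([^/]+)")


def build_openai_base_url(endpoint_resource_name: str, api_version: str = "v1beta1") -> str:
    """Build the OpenAI client base URL, taking the region from the resource
    name so that the host and the path cannot disagree."""
    match = _ENDPOINT_RE.fullmatch(endpoint_resource_name)
    if match is None:
        raise ValueError(f"Unexpected endpoint resource name: {endpoint_resource_name!r}")
    location = match.group(2)
    return f"https://{location}-aiplatform.googleapis.com/{api_version}/{endpoint_resource_name}"


print(build_openai_base_url("projects/my-project/locations/us-central1/endpoints/1234567890"))
```

In the notebook, you would pass `gemma_endpoint.resource_name`. Access tokens from `creds.refresh()` are short-lived (typically about one hour), so a long-running session may need to refresh the credentials and recreate the client.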

## Advanced configuration and error handling

The Vertex AI Model Garden SDK provides a customizable and reliable interface to deploy your open models.

You can specify advanced deployment configurations. When a deployment fails, the SDK raises an exception with an informative message that helps you diagnose the problem.


### Specify optional deployment parameters

With the `deploy()` method, you can configure various deployment settings, including:

- Compute Resources: Machine type, number of replicas (min/max), accelerator type and count.

- Infrastructure: Spot VMs, reservation affinity, dedicated endpoints.

- Serving Container: Specify custom container images, ports, health checks, and environment variables for advanced customization.
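One way to keep these settings together is to build the keyword arguments once and unpack them into `deploy()`. The helper below is hypothetical and uses only parameter names that appear in this notebook's `sd_model.deploy(...)` call. Its checks are basic sanity checks, not the service's own validation. It treats `deploy_request_timeout` as seconds, which is what the notebook's `3 * 60 * 60` suggests.

```python
from typing import Optional


def build_deploy_kwargs(
    machine_type: str,
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    timeout_hours: Optional[float] = None,
) -> dict:
    """Collect compute settings for OpenModel.deploy() after basic sanity checks."""
    if min_replica_count > max_replica_count:
        raise ValueError("min_replica_count must not exceed max_replica_count")
    if accelerator_type and accelerator_count < 1:
        raise ValueError("accelerator_count must be at least 1 when accelerator_type is set")

    kwargs = {
        "machine_type": machine_type,
        "min_replica_count": min_replica_count,
        "max_replica_count": max_replica_count,
    }
    if accelerator_type:
        kwargs["accelerator_type"] = accelerator_type
        kwargs["accelerator_count"] = accelerator_count
    if timeout_hours is not None:
        # Convert hours to seconds, matching the notebook's 3 * 60 * 60.
        kwargs["deploy_request_timeout"] = int(timeout_hours * 60 * 60)
    return kwargs


# Mirrors the compute settings of the Stable Diffusion deployment in this notebook.
settings = build_deploy_kwargs(
    "g2-standard-4", "NVIDIA_L4", accelerator_count=1, timeout_hours=3
)
print(settings)
```

You could then call `sd_model.deploy(**settings, endpoint_display_name="sd-endpoint")`. Leaving `timeout_hours` unset omits `deploy_request_timeout`, so whatever default the SDK uses applies.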

Let's deploy a diffusion model with some additional deployment parameters.

In [None]:
sd_model = model_garden.OpenModel("stabilityai/stable-diffusion-xl-base-1.0")

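Deploy the model to a Vertex AI endpoint, explicitly setting the machine type, accelerator, replica counts, display names, and a longer deployment request timeout.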

In [None]:
sd_endpoint = sd_model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    min_replica_count=1,
    max_replica_count=1,
    endpoint_display_name="sd-endpoint",
    model_display_name="sd-model",
    deploy_request_timeout=3 * 60 * 60,
)

Generate your image.

In [None]:
prediction = sd_endpoint.predict(instances=["A cat"])
plot_image_from_bytes(prediction.predictions[0])
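To keep the generated image instead of only displaying it, you could write the decoded bytes to a file. This sketch makes the same assumption as `plot_image_from_bytes`, namely that each prediction is a base64-encoded image string. Because the image format isn't stated here, the hypothetical `save_image_from_bytes` helper picks the file extension from the PNG or JPEG file signature and falls back to `.bin`.

```python
import base64
from pathlib import Path

# Leading "magic bytes" of common image formats.
_SIGNATURES = {b"\x89PNG\r\n\x1a\n": ".png", b"\xff\xd8\xff": ".jpg"}


def save_image_from_bytes(image_bytes_string: str, stem: str = "generated_image") -> Path:
    """Decode a base64 image string and write it to `stem` plus a detected extension."""
    data = base64.b64decode(image_bytes_string)
    suffix = next(
        (ext for signature, ext in _SIGNATURES.items() if data.startswith(signature)),
        ".bin",
    )
    output_path = Path(stem).with_suffix(suffix)
    output_path.write_bytes(data)
    return output_path
```

For example, `save_image_from_bytes(prediction.predictions[0], "cat")` would write `cat.png` if the endpoint returns PNG data.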

### Error Handling

#### Model not found

You're trying to deploy a model named `google/some-model@some-version`. This string identifies the publisher (`google`), the model (`some-model`), and its version (`some-version`) in Model Garden. However, Vertex AI cannot find a model matching that identifier and version.

The `model.deploy()` call will raise a Python exception. This is crucial because it immediately signals that something went wrong.

In [None]:
try:
    model = model_garden.OpenModel("google/some-model@some-version")
    endpoint = model.deploy()
except Exception as e:
    print(f"Error: {e}")

#### Malformed model name

You're attempting to deploy a model using the identifier `publisher/google/some-model@some-version`. The problem here isn't that the model doesn't exist, but rather that the format of the model name itself is incorrect according to the Vertex AI Model Garden's expected structure.

Similar to the "Model Not Found" case, `model.deploy()` will raise a Python exception. This time, it will be an exception indicating an invalid input.

In [None]:
try:
    model = model_garden.OpenModel("publisher/google/some-model@some-version")
    endpoint = model.deploy()
except Exception as e:
    print(f"Error: {e}")
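The malformed-name and not-found cases fail for different reasons: one is a syntax problem, the other a lookup problem. A rough local check can catch the first kind before any API call. The patterns below are inferred from the model IDs used in this notebook: a Model Garden ID like `google/gemma3@gemma-3-1b-it`, a full resource name like `publishers/meta/models/llama3-2@llama-3.2-90b-vision`, and a Hugging Face ID like `stabilityai/stable-diffusion-xl-base-1.0`. They are an assumption, not the SDK's own validation rules, and `classify_model_id` is a hypothetical helper.

```python
import re
from typing import Optional

# Shapes inferred from the IDs in this notebook (an assumption, not SDK rules).
_NAME = r"[\w.-]+"
_PATTERNS = {
    "model_garden": re.compile(rf"{_NAME}/{_NAME}@{_NAME}"),
    "resource_name": re.compile(rf"publishers/{_NAME}/models/{_NAME}@{_NAME}"),
    "hugging_face": re.compile(rf"{_NAME}/{_NAME}"),
}


def classify_model_id(model_id: str) -> Optional[str]:
    """Return the first ID shape that the whole string matches, or None."""
    for kind, pattern in _PATTERNS.items():
        if pattern.fullmatch(model_id):
            return kind
    return None


for candidate in [
    "google/some-model@some-version",  # well-formed, but not found
    "publisher/google/some-model@some-version",  # malformed
]:
    print(f"{candidate} -> {classify_model_id(candidate)}")
```

A well-formed ID can still fail with a "not found" error, so this check only narrows down which of the two problems you are facing.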

#### Insufficient quota

You are trying to deploy the model `publishers/deepseek-ai/models/deepseek-r1@deepseek-r1`. This time, the model name is correctly formatted and the model exists. However, your Google Cloud project doesn't have enough quota to deploy this specific model.

> Quotas are limits that Google Cloud sets to control resource usage and prevent accidental overspending. Relevant quotas include Compute Engine quotas, which limit the number of CPUs and GPUs or the amount of RAM you can use in a given region, and Vertex AI quotas, which limit the number of model deployments, prediction requests, or other Vertex AI-specific operations.

As with the previous cases, the `model.deploy()` call will raise a Python exception. This indicates that the deployment failed due to a quota issue.

In [None]:
try:
    model = model_garden.OpenModel(
        "publishers/deepseek-ai/models/deepseek-r1@deepseek-r1"
    )
    endpoint = model.deploy()
except Exception as e:
    print(f"Error: {e}")
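A quick back-of-the-envelope check can tell you whether a deployment might exceed your accelerator quota. This simplified sketch assumes each replica needs `accelerator_count` accelerators and that the endpoint may scale out to `max_replica_count` replicas. It ignores CPU, memory, and Vertex AI-specific quotas, and the quota figure is one you would look up yourself in the Google Cloud console. The helper names and the numbers in the example are hypothetical.

```python
def peak_accelerators(accelerator_count: int, max_replica_count: int) -> int:
    """Accelerators in use if the endpoint scales out to max_replica_count replicas."""
    return accelerator_count * max_replica_count


def fits_accelerator_quota(
    accelerator_count: int, max_replica_count: int, regional_quota: int
) -> bool:
    """True if the peak accelerator need stays within the given regional quota."""
    return peak_accelerators(accelerator_count, max_replica_count) <= regional_quota


# Hypothetical numbers: 8 GPUs per replica, up to 2 replicas, quota of 8 GPUs.
print(peak_accelerators(8, 2))  # 16
print(fits_accelerator_quota(8, 2, regional_quota=8))  # False
```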

#### Organization policy

You're attempting to deploy the model `stabilityai/stable-diffusion-xl-base-1.0`. The model name is correct, the model exists, and you have sufficient quota. However, an organizational policy prevents you from deploying this specific model.

> Organizational policies are rules set by your Google Cloud organization's administrator to control which resources and services can be used within the organization. These policies are often used for security, compliance, and cost control.

The `model.deploy()` call will raise a Python exception like the one below:
```
---------------------------------------------------------------------------
_InactiveRpcError                         Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     75         try:
---> 76             return callable_(*args, **kwargs)
     77         except grpc.RpcError as exc:

6 frames
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.FAILED_PRECONDITION
	details = "Organization Policy constraint `constraints/vertexai.allowedModels` violated for `projects/your-project-id` attempting to use a disallowed Gen AI model stable-diffusion-xl-base-1.0. Please contact your organization administrator to fix this violation. For more info, see https://cloud.google.com/vertex-ai/generative-ai/docs/control-model-access."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:your-ipv4 {created_time:"2025-01-16T20:52:11.244017603+00:00", grpc_status:9, grpc_message:"Organization Policy constraint `constraints/vertexai.allowedModels` violated for `projects/your-project-id` attempting to use a disallowed Gen AI model stable-diffusion-xl-base-1.0. Please contact your organization administrator to fix this violation. For more info, see https://cloud.google.com/vertex-ai/generative-ai/docs/control-model-access."}"
>

The above exception was the direct cause of the following exception:

FailedPrecondition                        Traceback (most recent call last)
/usr/local/lib/python3.11/dist-packages/google/api_core/grpc_helpers.py in error_remapped_callable(*args, **kwargs)
     76             return callable_(*args, **kwargs)
     77         except grpc.RpcError as exc:
---> 78             raise exceptions.from_grpc_error(exc) from exc
     79
     80     return error_remapped_callable

FailedPrecondition: 400 Organization Policy constraint `constraints/vertexai.allowedModels` violated for `projects/your-project-id` attempting to use a disallowed Gen AI model stable-diffusion-xl-base-1.0. Please contact your organization administrator to fix this violation. For more info, see https://cloud.google.com/vertex-ai/generative-ai/docs/control-model-access.

```

This exception will indicate that the deployment was blocked due to a policy violation.
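Each error case in this section repeats the same `try`/`except` cell. A small wrapper can centralize that pattern and attach a hint based on the exception's class name. `FailedPrecondition` is the class shown in the traceback above. The other names are exception classes from `google.api_core.exceptions`, but which one each case actually raises is an assumption to check against your own output. `try_deploy` and the hint texts are hypothetical.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")

# Hints keyed by exception class name. Only FailedPrecondition is confirmed by
# the traceback above; the mapping for the other cases is an assumption.
_HINTS = {
    "FailedPrecondition": (
        "An organization policy (for example constraints/vertexai.allowedModels) "
        "may block this model."
    ),
    "NotFound": "Check that the model ID and version exist in Model Garden.",
    "InvalidArgument": "Check the format of the model name.",
    "ResourceExhausted": "Check your Compute Engine and Vertex AI quotas for this region.",
    "PermissionDenied": "Check IAM permissions, EULA acceptance, and access tokens.",
}


def try_deploy(deploy_fn: Callable[[], T]) -> Tuple[Optional[T], Optional[str]]:
    """Run deploy_fn; return (result, None) on success or (None, message) on failure."""
    try:
        return deploy_fn(), None
    except Exception as exc:  # Broad on purpose, like the cells in this section.
        name = type(exc).__name__
        hint = _HINTS.get(name, "See the error message for details.")
        return None, f"{name}: {exc}\nHint: {hint}"
```

In the notebook, you could write `endpoint, problem = try_deploy(lambda: model_garden.OpenModel(model_id).deploy())` and print `problem` if it is not `None`. Creating the `OpenModel` inside the lambda means errors raised at construction time are caught too.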

#### Missing EULA (End-User License Agreement)

You're trying to deploy the model `publishers/meta/models/llama3-2@llama-3.2-90b-vision`. The model name is correct, the model exists, you have sufficient quota, and no organizational policies are blocking it.

However, you haven't accepted the model's EULA. Many models, especially those from third-party publishers (like Meta in this case), require you to agree to their terms of service before you can use them.

The `model.deploy()` call will raise a Python exception. This exception will indicate that the deployment failed because the EULA hasn't been accepted.


In [None]:
try:
    model = model_garden.OpenModel(
        "publishers/meta/models/llama3-2@llama-3.2-90b-vision"
    )
    endpoint = model.deploy()
except Exception as e:
    print(f"Error: {e}")

#### Request Hugging Face gated model without access token


You are trying to deploy `black-forest-labs/FLUX.1-dev`, which is a gated model on the Hugging Face Hub. This means you need a Hugging Face account, you must accept the model's license, and you need a valid access token with read access to download the model weights. In this scenario, the access token is missing.

The `model.deploy()` call will raise a Python exception. The exception will signal that authentication with the Hugging Face Hub failed due to a missing token.

In [None]:
try:
    model = model_garden.OpenModel("black-forest-labs/FLUX.1-dev")
    endpoint = model.deploy()
except Exception as e:
    print(f"Error: {e}")

#### Request Hugging Face gated model with invalid access token

You're attempting to deploy the gated model `black-forest-labs/FLUX.1-dev`, this time explicitly providing a Hugging Face access token through the `hugging_face_access_token` argument of the `deploy()` method. However, the token `invalid-token` is, as the name suggests, not valid.

The `model.deploy()` call will raise an exception similar to the one in the missing-token case, indicating an authentication failure with the Hugging Face Hub.

In [None]:
try:
    model = model_garden.OpenModel("black-forest-labs/FLUX.1-dev")
    endpoint = model.deploy(hugging_face_access_token="invalid-token")
except Exception as e:
    print(f"Error: {e}")
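Rather than pasting a token into the notebook, you could read it from an environment variable. This sketch assumes the variable is named `HF_TOKEN` (the name the `huggingface_hub` library also reads, though any name works). `hf_token_kwargs` is a hypothetical helper that omits the argument entirely when no token is set.

```python
import os


def hf_token_kwargs(env_var: str = "HF_TOKEN") -> dict:
    """Return deploy() keyword arguments carrying a Hugging Face token, if one is set.

    An unset or blank variable yields {}, so deploy() is called without the token.
    """
    token = os.environ.get(env_var, "").strip()
    return {"hugging_face_access_token": token} if token else {}
```

You would then call `model.deploy(**hf_token_kwargs())`. A token that is present but invalid still fails as shown above.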

## Cleaning up

In [None]:
delete_endpoints = True

if delete_endpoints:
    gemma_endpoint.delete(force=True)
    sd_endpoint.delete(force=True)
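If one of the deployments above failed, its variable is never assigned, so the cleanup cell raises a `NameError` and any endpoint after that line is not deleted. A more forgiving sketch, using a hypothetical `delete_endpoints_safely` helper, skips missing endpoints and keeps going past individual errors. It relies on `delete(force=True)`, which undeploys any models from the endpoint before deleting it.

```python
from typing import Any, Dict, List


def delete_endpoints_safely(endpoints: Dict[str, Any]) -> List[str]:
    """Delete each endpoint, skipping None entries and continuing past errors.

    `endpoints` maps a label to an endpoint object, or to None when the
    deployment never produced one. Returns the labels that were deleted.
    """
    deleted = []
    for label, endpoint in endpoints.items():
        if endpoint is None:
            print(f"Skipping {label}: no endpoint to delete")
            continue
        try:
            # force=True undeploys any deployed models before deleting the endpoint.
            endpoint.delete(force=True)
            deleted.append(label)
        except Exception as exc:  # Keep cleaning up the remaining endpoints.
            print(f"Could not delete {label}: {exc}")
    return deleted
```

In the notebook, you could call `delete_endpoints_safely({"gemma": globals().get("gemma_endpoint"), "sd": globals().get("sd_endpoint")})`.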