gemini/use-cases/retail/product_image_background_generation.ipynb (494 lines of code) (raw):
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ijGzTHJJUCPY"
},
"outputs": [],
"source": [
"# Copyright 2024 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VEqbX8OhE8y9"
},
"source": [
"# Product image background generation with Gemini and Imagen\n",
"\n",
"<table align=\"left\">\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretail%2Fproduct_image_background_generation.ipynb\">\n",
" <img width=\"32px\" src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n",
" </a>\n",
" </td> \n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retail/product_image_background_generation.ipynb\">\n",
" <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
" </a>\n",
" </td>\n",
"</table>\n",
"\n",
"<div style=\"clear: both;\"></div>\n",
"\n",
"<b>Share to:</b>\n",
"\n",
"<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
"</a> \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "G1KDmM_PBAXz"
},
"source": [
"| | |\n",
"|-|-|\n",
"|Author(s) | [Katie Nguyen](https://github.com/katiemn) |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CkHPv2myT2cx"
},
"source": [
"## Overview\n",
"\n",
"### Gemini\n",
"\n",
"The Gemini model introduces a long context window of up to 1 million tokens that can seamlessly analyze large amounts of information. Additionally, it is multimodal with the ability to process text, images, audio, video, and code. Learn more about [Gemini 2.0](https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/).\n",
"\n",
"### Imagen 2\n",
"\n",
"Imagen 2 on Vertex AI brings Google's state of the art generative AI capabilities to application developers. With Imagen 2 on Vertex AI, application developers can build next-generation AI products that transform their user's imagination into high quality visual assets. Learn more about [Imagen on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview).\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DrkcqHrrwMAo"
},
"source": [
"In this tutorial, you will learn how to use the Vertex AI SDK for Python to interact with the Gemini 2.0 model and certain Imagen 2 image editing features to:\n",
"\n",
"- Upload an image for an e-commerce product listing\n",
"- Generate a product description\n",
"- Enhance the background of a product image"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "r11Gu7qNgx1p"
},
"source": [
"## Getting Started\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "No17Cw5hgx12"
},
"source": [
"### Install Vertex AI SDK for Python\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tFy3H3aPgx12"
},
"outputs": [],
"source": [
"%pip install --upgrade --user google-cloud-aiplatform"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dmWOrTJ3gx13"
},
"source": [
"### Authenticate your notebook environment (Colab only)\n",
"\n",
"If you are running this notebook on Google Colab, run the following cell to authenticate your environment.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NyKGtVQjgx13"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
" from google.colab import auth\n",
"\n",
" auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DF4l8DTdWgPY"
},
"source": [
"### Set Google Cloud project information and initialize Vertex AI SDK\n",
"\n",
"To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
"\n",
"Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Nqwi-5ufWp_B"
},
"outputs": [],
"source": [
"# Use the environment variable if the user doesn't provide Project ID.\n",
"import os\n",
"\n",
"PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
"if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
" PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
"\n",
"LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
"\n",
"import vertexai\n",
"\n",
"vertexai.init(project=PROJECT_ID, location=LOCATION)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jXHfaVS66_01"
},
"source": [
"### Import libraries\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lslYAvw37JGQ"
},
"outputs": [],
"source": [
"from vertexai.generative_models import GenerativeModel, Image, Part"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BY1nfXrqRxVX"
},
"source": [
"### Load the Gemini model\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2998506fe6d1"
},
"outputs": [],
"source": [
"model = GenerativeModel(\"gemini-2.0-flash\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Sr2Y3lFwKW1M"
},
"source": [
"### Define helper functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "r_38e5rRKB6s"
},
"outputs": [],
"source": [
"import http.client\n",
"import typing\n",
"import urllib.request\n",
"\n",
"import IPython.display\n",
"from PIL import Image as PIL_Image\n",
"from PIL import ImageOps as PIL_ImageOps\n",
"\n",
"\n",
"def display_image(\n",
" image: Image,\n",
" max_width: int = 600,\n",
" max_height: int = 350,\n",
") -> None:\n",
" pil_image = typing.cast(PIL_Image.Image, image._pil_image)\n",
" if pil_image.mode != \"RGB\":\n",
" # RGB is supported by all Jupyter environments (e.g. RGBA is not yet)\n",
" pil_image = pil_image.convert(\"RGB\")\n",
" image_width, image_height = pil_image.size\n",
" if max_width < image_width or max_height < image_height:\n",
" # Resize to display a smaller notebook image\n",
" pil_image = PIL_ImageOps.contain(pil_image, (max_width, max_height))\n",
" IPython.display.display(pil_image)\n",
"\n",
"\n",
"def get_image_bytes_from_url(image_url: str) -> bytes:\n",
" with urllib.request.urlopen(image_url) as response:\n",
" response = typing.cast(http.client.HTTPResponse, response)\n",
" image_bytes = response.read()\n",
" return image_bytes\n",
"\n",
"\n",
"def get_url_from_gcs(gcs_uri: str) -> str:\n",
" # Converts GCS uri to url for image display\n",
" url = \"https://storage.googleapis.com/\" + gcs_uri.replace(\"gs://\", \"\").replace(\n",
" \" \", \"%20\"\n",
" )\n",
" return url"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "XibcO-Ev1xWi"
},
"source": [
"### Download a sample product image from Google Cloud Storage\n",
"\n",
"Use the ```Part.from_uri``` method to load a file as the example product image.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "BXiqOd3NOeSs"
},
"outputs": [],
"source": [
"# Load an image from Google Cloud Storage\n",
"gcs_uri = \"gs://cloud-samples-data/generative-ai/image/suitcase.png\"\n",
"product_image = Part.from_uri(gcs_uri, mime_type=\"image/png\")\n",
"\n",
"# Display image\n",
"url = get_url_from_gcs(product_image.file_data.file_uri)\n",
"IPython.display.Image(url, width=350)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EAYMjHrGMPTY"
},
"source": [
"### Upload a sample product image from a local file (optional)\n",
"Use the ```Image.load_from_file``` method to use your own image that you can upload to the runtime environment.\n",
"\n",
" Uncomment the following cell, and replace \"image.png\" with the name of your image file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7IQMAZqgMa_W"
},
"outputs": [],
"source": [
"# # Load from local file\n",
"# image_name = \"image.png\" # @param {type:\"string\"}\n",
"# product_image = Image.load_from_file(image_name)\n",
"\n",
"# # Display image\n",
"# display_image(product_image)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4UYYNDtV3cOD"
},
"source": [
"### Generate image prompt\n",
"\n",
"Use Gemini to generate a text prompt based on the provided product metadata and initial image."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_pTBLdTxPmz8"
},
"outputs": [],
"source": [
"# Update the product description if using a different image\n",
"product_description = \"a blue suitcase with wheels\" # @param {type:\"string\"}\n",
"\n",
"prompt = \"\"\"\n",
"Create a description of the product image.\n",
"\n",
"The description should be an image generation prompt to create a more interesting background.\n",
"\n",
"Only return the prompt string.\n",
"\"\"\"\n",
"\n",
"contents = [product_image, product_description, prompt]\n",
"\n",
"responses = model.generate_content(contents)\n",
"description = responses.text\n",
"print(description)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mXkTNjgcJ9c_"
},
"source": [
"### Import libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jeF0VL13J_ol"
},
"outputs": [],
"source": [
"from vertexai.preview.vision_models import Image as VisionImage\n",
"from vertexai.preview.vision_models import ImageGenerationModel"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VLmwIj2RD0Fx"
},
"source": [
"### Load the image generation model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "F-gd2ypQhh7K"
},
"outputs": [],
"source": [
"generation_model = ImageGenerationModel.from_pretrained(\"imagegeneration@006\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KrpRA3C6Eq69"
},
"source": [
"### Generate a new product image\n",
"\n",
"Use the ```edit_image``` function to enhance the product image by modifying the background content while preserving the product appearance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "X5shU_xEjCuX"
},
"outputs": [],
"source": [
"output_file = \"image_enhanced.png\"\n",
"input_file = VisionImage(get_image_bytes_from_url(url))\n",
"\n",
"# Uncomment and replace image name if using your own example\n",
"# image_name = \"image.png\" # @param {type:\"string\"}\n",
"# input_file = VisionImage.load_from_file(image_name)\n",
"\n",
"gen_img = generation_model.edit_image(\n",
" base_image=input_file,\n",
" prompt=description,\n",
" edit_mode=\"product-image\",\n",
")\n",
"\n",
"display_image(gen_img.images[0])\n",
"gen_img.images[0].save(output_file)"
]
}
],
"metadata": {
"colab": {
"name": "product_image_background_generation.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}