gemini/use-cases/retail/product_image_background_generation.ipynb (494 lines of code) (raw):

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "ijGzTHJJUCPY" }, "outputs": [], "source": [ "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "VEqbX8OhE8y9" }, "source": [ "# Product image background generation with Gemini and Imagen\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretail%2Fproduct_image_background_generation.ipynb\">\n", " <img width=\"32px\" src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n", " </a>\n", " </td> \n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retail/product_image_background_generation.ipynb\">\n", " <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n", " </a>\n", " </td>\n", "</table>\n", "\n", "<div style=\"clear: both;\"></div>\n", "\n", "<b>Share to:</b>\n", "\n", "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n", "</a>\n", "\n", "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n", "</a>\n", "\n", "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n", "</a>\n", "\n", "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n", "</a>\n", "\n", "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/product_image_background_generation.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n", "</a> \n" ] }, { "cell_type": "markdown", "metadata": { "id": "G1KDmM_PBAXz" }, "source": [ "| | |\n", "|-|-|\n", "|Author(s) | [Katie Nguyen](https://github.com/katiemn) |" ] }, { "cell_type": "markdown", "metadata": { "id": "CkHPv2myT2cx" }, "source": [ "## Overview\n", "\n", "### Gemini\n", "\n", "The Gemini model introduces a long context window of up to 1 million tokens that can seamlessly analyze large amounts of information. Additionally, it is multimodal with the ability to process text, images, audio, video, and code. Learn more about [Gemini 2.0](https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/).\n", "\n", "### Imagen 2\n", "\n", "Imagen 2 on Vertex AI brings Google's state of the art generative AI capabilities to application developers. With Imagen 2 on Vertex AI, application developers can build next-generation AI products that transform their user's imagination into high quality visual assets. Learn more about [Imagen on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "DrkcqHrrwMAo" }, "source": [ "In this tutorial, you will learn how to use the Vertex AI SDK for Python to interact with the Gemini 2.0 model and certain Imagen 2 image editing features to:\n", "\n", "- Upload an image for an e-commerce product listing\n", "- Generate a product description\n", "- Enhance the background of a product image" ] }, { "cell_type": "markdown", "metadata": { "id": "r11Gu7qNgx1p" }, "source": [ "## Getting Started\n" ] }, { "cell_type": "markdown", "metadata": { "id": "No17Cw5hgx12" }, "source": [ "### Install Vertex AI SDK for Python\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tFy3H3aPgx12" }, "outputs": [], "source": [ "%pip install --upgrade --user google-cloud-aiplatform" ] }, { "cell_type": "markdown", "metadata": { "id": "dmWOrTJ3gx13" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "If you are running this notebook on Google Colab, run the following cell to authenticate your environment.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NyKGtVQjgx13" }, "outputs": [], "source": [ "import sys\n", "\n", "if \"google.colab\" in sys.modules:\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "DF4l8DTdWgPY" }, "source": [ "### Set Google Cloud project information and initialize Vertex AI SDK\n", "\n", "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n", "\n", "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Nqwi-5ufWp_B" }, "outputs": [], "source": [ "# Use the environment variable if the user doesn't provide Project ID.\n", "import os\n", "\n", "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n", "\n", "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n", "\n", "import vertexai\n", "\n", "vertexai.init(project=PROJECT_ID, location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": { "id": "jXHfaVS66_01" }, "source": [ "### Import libraries\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lslYAvw37JGQ" }, "outputs": [], "source": [ "from vertexai.generative_models import GenerativeModel, Image, Part" ] }, { "cell_type": "markdown", "metadata": { "id": "BY1nfXrqRxVX" }, "source": [ "### Load the Gemini model\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2998506fe6d1" }, "outputs": [], "source": [ "model = GenerativeModel(\"gemini-2.0-flash\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Sr2Y3lFwKW1M" }, "source": [ "### Define helper functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "r_38e5rRKB6s" }, "outputs": [], "source": [ "import http.client\n", "import typing\n", "import urllib.request\n", "\n", "import IPython.display\n", "from PIL import Image as PIL_Image\n", "from PIL import ImageOps as PIL_ImageOps\n", "\n", "\n", "def display_image(\n", " image: Image,\n", " max_width: int = 600,\n", " max_height: int = 350,\n", ") -> None:\n", " pil_image = typing.cast(PIL_Image.Image, image._pil_image)\n", " if pil_image.mode != \"RGB\":\n", " # RGB is supported by all Jupyter environments (e.g. RGBA is not yet)\n", " pil_image = pil_image.convert(\"RGB\")\n", " image_width, image_height = pil_image.size\n", " if max_width < image_width or max_height < image_height:\n", " # Resize to display a smaller notebook image\n", " pil_image = PIL_ImageOps.contain(pil_image, (max_width, max_height))\n", " IPython.display.display(pil_image)\n", "\n", "\n", "def get_image_bytes_from_url(image_url: str) -> bytes:\n", " with urllib.request.urlopen(image_url) as response:\n", " response = typing.cast(http.client.HTTPResponse, response)\n", " image_bytes = response.read()\n", " return image_bytes\n", "\n", "\n", "def get_url_from_gcs(gcs_uri: str) -> str:\n", " # Converts GCS uri to url for image display\n", " url = \"https://storage.googleapis.com/\" + gcs_uri.replace(\"gs://\", \"\").replace(\n", " \" \", \"%20\"\n", " )\n", " return url" ] }, { "cell_type": "markdown", "metadata": { "id": "XibcO-Ev1xWi" }, "source": [ "### Download a sample product image from Google Cloud Storage\n", "\n", "Use the ```Part.from_uri``` method to load a file as the example product image.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "BXiqOd3NOeSs" }, "outputs": [], "source": [ "# Load an image from Google Cloud Storage\n", "gcs_uri = \"gs://cloud-samples-data/generative-ai/image/suitcase.png\"\n", "product_image = Part.from_uri(gcs_uri, mime_type=\"image/png\")\n", "\n", "# Display image\n", "url = get_url_from_gcs(product_image.file_data.file_uri)\n", "IPython.display.Image(url, width=350)" ] }, { "cell_type": "markdown", "metadata": { "id": "EAYMjHrGMPTY" }, "source": [ "### Upload a sample product image from a local file (optional)\n", "Use the ```Image.load_from_file``` method to use your own image that you can upload to the runtime environment.\n", "\n", " Uncomment the following cell, and replace \"image.png\" with the name of your image file." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7IQMAZqgMa_W" }, "outputs": [], "source": [ "# # Load from local file\n", "# image_name = \"image.png\" # @param {type:\"string\"}\n", "# product_image = Image.load_from_file(image_name)\n", "\n", "# # Display image\n", "# display_image(product_image)" ] }, { "cell_type": "markdown", "metadata": { "id": "4UYYNDtV3cOD" }, "source": [ "### Generate image prompt\n", "\n", "Use Gemini to generate a text prompt based on the provided product metadata and initial image." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_pTBLdTxPmz8" }, "outputs": [], "source": [ "# Update the product description if using a different image\n", "product_description = \"a blue suitcase with wheels\" # @param {type:\"string\"}\n", "\n", "prompt = \"\"\"\n", "Create a description of the product image.\n", "\n", "The description should be an image generation prompt to create a more interesting background.\n", "\n", "Only return the prompt string.\n", "\"\"\"\n", "\n", "contents = [product_image, product_description, prompt]\n", "\n", "responses = model.generate_content(contents)\n", "description = responses.text\n", "print(description)" ] }, { "cell_type": "markdown", "metadata": { "id": "mXkTNjgcJ9c_" }, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jeF0VL13J_ol" }, "outputs": [], "source": [ "from vertexai.preview.vision_models import Image as VisionImage\n", "from vertexai.preview.vision_models import ImageGenerationModel" ] }, { "cell_type": "markdown", "metadata": { "id": "VLmwIj2RD0Fx" }, "source": [ "### Load the image generation model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "F-gd2ypQhh7K" }, "outputs": [], "source": [ "generation_model = ImageGenerationModel.from_pretrained(\"imagegeneration@006\")" ] }, { "cell_type": "markdown", "metadata": { "id": "KrpRA3C6Eq69" }, "source": [ "### Generate a new product image\n", "\n", "Use the ```edit_image``` function to enhance the product image by modifying the background content while preserving the product appearance." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "X5shU_xEjCuX" }, "outputs": [], "source": [ "output_file = \"image_enhanced.png\"\n", "input_file = VisionImage(get_image_bytes_from_url(url))\n", "\n", "# Uncomment and replace image name if using your own example\n", "# image_name = \"image.png\" # @param {type:\"string\"}\n", "# input_file = VisionImage.load_from_file(image_name)\n", "\n", "gen_img = generation_model.edit_image(\n", " base_image=input_file,\n", " prompt=description,\n", " edit_mode=\"product-image\",\n", ")\n", "\n", "display_image(gen_img.images[0])\n", "gen_img.images[0].save(output_file)" ] } ], "metadata": { "colab": { "name": "product_image_background_generation.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }