vision/getting-started/imagen3_image_generation.ipynb (543 lines of code) (raw):
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ijGzTHJJUCPY"
},
"outputs": [],
"source": [
"# Copyright 2024 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VEqbX8OhE8y9"
},
"source": [
"# Imagen 3 Image Generation\n",
"\n",
"<table align=\"left\">\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/imagen3_image_generation.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fvision%2Fgetting-started%2Fimagen3_image_generation.ipynb\">\n",
" <img width=\"32px\" src=\"https://cloud.google.com/ml-engine/images/colab-enterprise-logo-32px.png\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n",
" </a>\n",
" </td>\n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/vision/getting-started/imagen3_image_generation.ipynb\">\n",
" <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
" </a>\n",
" </td> \n",
" <td style=\"text-align: center\">\n",
" <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/imagen3_image_generation.ipynb\">\n",
" <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n",
" </a>\n",
" </td>\n",
"</table>\n",
"\n",
"<div style=\"clear: both;\"></div>\n",
"\n",
"<b>Share to:</b>\n",
"\n",
"<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/imagen3_image_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/imagen3_image_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/imagen3_image_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/imagen3_image_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
"</a>\n",
"\n",
"<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/getting-started/imagen3_image_generation.ipynb\" target=\"_blank\">\n",
" <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
"</a> \n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "G1KDmM_PBAXz"
},
"source": [
"| | |\n",
"|-|-|\n",
"|Author(s) | [Katie Nguyen](https://github.com/katiemn) |"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CkHPv2myT2cx"
},
"source": [
"## Overview\n",
"\n",
"### Imagen 3\n",
"\n",
"Imagen 3 on Vertex AI brings Google's state of the art generative AI capabilities to application developers. Imagen 3 is Google's highest quality text-to-image model to date. It's capable of creating images with astonishing detail. Thus, developers have more control when building next-generation AI products that transform their imagination into high quality visual assets. Learn more about [Imagen on Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview).\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DrkcqHrrwMAo"
},
"source": [
"In this tutorial, you will learn how to use the Google Gen AI SDK for Python to interact with the Imagen 3 and Imagen 3 Fast models to generate images showcasing:\n",
"\n",
"- Photorealistic scenes\n",
"- Prompt enhancement\n",
"- Text rendered within images\n",
"- Quality and latency comparisons within the two models"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "r11Gu7qNgx1p"
},
"source": [
"## Get started\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "No17Cw5hgx12"
},
"source": [
"### Install Google Gen AI SDK for Python\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tFy3H3aPgx12"
},
"outputs": [],
"source": [
"%pip install --upgrade --quiet google-genai"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dmWOrTJ3gx13"
},
"source": [
"### Authenticate your notebook environment (Colab only)\n",
"\n",
"If you are running this notebook on Google Colab, run the following cell to authenticate your environment.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NyKGtVQjgx13"
},
"outputs": [],
"source": [
"import sys\n",
"\n",
"if \"google.colab\" in sys.modules:\n",
" from google.colab import auth\n",
"\n",
" auth.authenticate_user()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ua6PDqB1iBSb"
},
"source": [
"### Import libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yTiDo0lRh6sc"
},
"outputs": [],
"source": [
"from google import genai\n",
"from google.genai import types"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DF4l8DTdWgPY"
},
"source": [
"### Set Google Cloud project information and create client\n",
"\n",
"To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
"\n",
"Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"id": "Nqwi-5ufWp_B"
},
"outputs": [],
"source": [
"import os\n",
"\n",
"PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
"if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
" PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
"\n",
"LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
"\n",
"client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Sr2Y3lFwKW1M"
},
"source": [
"### Define a helper function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "r_38e5rRKB6s"
},
"outputs": [],
"source": [
"import typing\n",
"\n",
"import IPython.display\n",
"from PIL import Image as PIL_Image\n",
"from PIL import ImageOps as PIL_ImageOps\n",
"\n",
"\n",
"def display_image(\n",
" image,\n",
" max_width: int = 600,\n",
" max_height: int = 350,\n",
") -> None:\n",
" pil_image = typing.cast(PIL_Image.Image, image._pil_image)\n",
" if pil_image.mode != \"RGB\":\n",
" # RGB is supported by all Jupyter environments (e.g. RGBA is not yet)\n",
" pil_image = pil_image.convert(\"RGB\")\n",
" image_width, image_height = pil_image.size\n",
" if max_width < image_width or max_height < image_height:\n",
" # Resize to display a smaller notebook image\n",
" pil_image = PIL_ImageOps.contain(pil_image, (max_width, max_height))\n",
" IPython.display.display(pil_image)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "VLmwIj2RD0Fx"
},
"source": [
"### Load the image generation models\n",
"\n",
"Imagen 3: `imagen-3.0-generate-002`\n",
"\n",
"Imagen 3 Fast: `imagen-3.0-fast-generate-001`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "F-gd2ypQhh7K"
},
"outputs": [],
"source": [
"generation_model = \"imagen-3.0-generate-002\"\n",
"generation_model_fast = \"imagen-3.0-fast-generate-001\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "mvH2OTrK8IRS"
},
"source": [
"### Imagen 3 & Imagen 3 Fast\n",
"\n",
"With Imagen 3, you also have the option to use Imagen 3 Fast. These two model options give you the choice to optimize for quality and latency, depending on your use case.\n",
"\n",
"**Imagen 3:** Generates high quality images with natural lighting and increased photorealism.\n",
"\n",
"**Imagen 3 Fast:** Suitable for creating brighter images with a higher contrast. Overall, you can see a 40% decrease in latency in Imagen 3 Fast compared to Imagen 2.\n",
"\n",
"With Imagen 3 and Imagen 3 Fast, you can also configure the `aspect ratio` to any of the following:\n",
"* 1:1\n",
"* 9:16\n",
"* 16:9\n",
"* 3:4\n",
"* 4:3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KJqHuGqi31rE"
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"prompt = \"\"\"\n",
"a photorealistic image of the inside of an amethyst crystal on display in a museum\n",
"\"\"\"\n",
"\n",
"# Imagen 3 image generation\n",
"image = client.models.generate_images(\n",
" model=generation_model,\n",
" prompt=prompt,\n",
" config=types.GenerateImagesConfig(\n",
" number_of_images=1,\n",
" aspect_ratio=\"3:4\",\n",
" safety_filter_level=\"BLOCK_MEDIUM_AND_ABOVE\",\n",
" person_generation=\"DONT_ALLOW\",\n",
" ),\n",
")\n",
"\n",
"# Imagen 3 Fast image generation\n",
"fast_image = client.models.generate_images(\n",
" model=generation_model_fast,\n",
" prompt=prompt,\n",
" config=types.GenerateImagesConfig(\n",
" number_of_images=1,\n",
" aspect_ratio=\"3:4\",\n",
" safety_filter_level=\"BLOCK_MEDIUM_AND_ABOVE\",\n",
" person_generation=\"DONT_ALLOW\",\n",
" ),\n",
")\n",
"\n",
"# Display generated images\n",
"fig, axis = plt.subplots(1, 2, figsize=(12, 6))\n",
"axis[0].imshow(image.generated_images[0].image._pil_image)\n",
"axis[0].set_title(\"Imagen 3\")\n",
"axis[1].imshow(fast_image.generated_images[0].image._pil_image)\n",
"axis[1].set_title(\"Imagen 3 Fast\")\n",
"for ax in axis:\n",
" ax.axis(\"off\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "34oCrJAYniTJ"
},
"source": [
"### Prompt rewriter\n",
"\n",
"The `imagen-3.0-generate-002` model offers the option to enhance your provided prompt. To utilize this feature, set `enhance_prompt` to True. A new, detailed prompt will be created from your original one to help generate higher quality images."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_RqB1mKunlPB"
},
"outputs": [],
"source": [
"prompt = \"\"\"\n",
"an art museum\n",
"\"\"\"\n",
"\n",
"image = client.models.generate_images(\n",
" model=generation_model,\n",
" prompt=prompt,\n",
" config=types.GenerateImagesConfig(\n",
" number_of_images=1,\n",
" aspect_ratio=\"1:1\",\n",
" enhance_prompt=True,\n",
" safety_filter_level=\"BLOCK_MEDIUM_AND_ABOVE\",\n",
" person_generation=\"DONT_ALLOW\",\n",
" ),\n",
")\n",
"\n",
"# Print the enhanced prompt\n",
"print(image.generated_images[0].enhanced_prompt)\n",
"display_image(image.generated_images[0].image)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UND0wrdf3mGS"
},
"source": [
"### Photorealism and prompt understanding\n",
"\n",
"**Photorealism:** Imagen 3 is capable of generating photorealistic, lifelike images with fewer distracting visual artifacts than our previous models. This increased quality is especially prevalent when generating images of multiple people, animals, and landscapes.\n",
"\n",
"**Prompt adherence:** It's also better at understanding natural language and the intent behind your prompts. Thus, they can be written in everyday language and can include specific details including camera angles, lens types, lighting, and stylistic features.\n",
"\n",
"When generating images of people you can also set the `safety_filter_level` and `person_generation` parameters accordingly:\n",
"* `person_generation`: DONT_ALLOW, ALLOW_ADULT, ALLOW_ALL\n",
"* `safety_filter_level`: BLOCK_LOW_AND_ABOVE, BLOCK_MEDIUM_AND_ABOVE, BLOCK_ONLY_HIGH, BLOCK_NONE"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7JBtrCfd1idI"
},
"outputs": [],
"source": [
"prompt = \"\"\"\n",
"A vintage, red convertible car speeds along a winding cliffside road.\n",
"The car is slightly angled, facing towards the viewer, with one wheel slightly off the edge of the road.\n",
"The setting sun casts a warm, golden light across the scene, illuminating the car and the dramatic, rocky cliffs.\n",
"In the distance, the vast ocean stretches towards the horizon.\n",
"\"\"\"\n",
"\n",
"image = client.models.generate_images(\n",
" model=generation_model,\n",
" prompt=prompt,\n",
" config=types.GenerateImagesConfig(\n",
" number_of_images=1,\n",
" aspect_ratio=\"1:1\",\n",
" safety_filter_level=\"BLOCK_MEDIUM_AND_ABOVE\",\n",
" person_generation=\"DONT_ALLOW\",\n",
" ),\n",
")\n",
"\n",
"display_image(image.generated_images[0].image)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hAeNPdyU9AlC"
},
"source": [
"### Better text rendering\n",
"\n",
"Imagen 3 also does a great job accurately rendering small words and phrases in images. This could be particularly useful for generating business cards, posters, banners, product designs, or greeting cards."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DLmdtT8Q38fn"
},
"outputs": [],
"source": [
"prompt = \"\"\"\n",
"a beige baseball cap with 'good vibes' written on top in white bubbly stitched letters that are outlined in neon green,\n",
"display it against a pool background with palm trees and pool floats\n",
"\"\"\"\n",
"\n",
"image = client.models.generate_images(\n",
" model=generation_model,\n",
" prompt=prompt,\n",
" config=types.GenerateImagesConfig(\n",
" number_of_images=1,\n",
" aspect_ratio=\"1:1\",\n",
" safety_filter_level=\"BLOCK_MEDIUM_AND_ABOVE\",\n",
" person_generation=\"DONT_ALLOW\",\n",
" ),\n",
")\n",
"\n",
"display_image(image.generated_images[0].image)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xGneutCN-mW5"
},
"source": [
"### Add image watermark\n",
"\n",
"By default, a digital watermark, or [SynthID](https://deepmind.google/technologies/synthid/), is added to Imagen 3 images. If you would like to explicitly set the watermark to True, you can do so with the `add_watermark` parameter. You can also [verify a watermarked image](https://cloud.google.com/vertex-ai/generative-ai/docs/image/generate-images#watermark)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XtyEx1tYhB8h"
},
"outputs": [],
"source": [
"prompt = \"\"\"\n",
"a whimsical candy store\n",
"\"\"\"\n",
"\n",
"image = client.models.generate_images(\n",
" model=generation_model,\n",
" prompt=prompt,\n",
" config=types.GenerateImagesConfig(\n",
" number_of_images=1,\n",
" aspect_ratio=\"1:1\",\n",
" enhance_prompt=True,\n",
" safety_filter_level=\"BLOCK_MEDIUM_AND_ABOVE\",\n",
" person_generation=\"DONT_ALLOW\",\n",
" add_watermark=True,\n",
" ),\n",
")\n",
"\n",
"# Print the enhanced prompt\n",
"print(image.generated_images[0].enhanced_prompt)\n",
"display_image(image.generated_images[0].image)"
]
}
],
"metadata": {
"colab": {
"name": "imagen3_image_generation.ipynb",
"toc_visible": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}