gemini/use-cases/retail/multimodal_retail

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "id": "ijGzTHJJUCPY" }, "outputs": [], "source": [ "# Copyright 2023 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "id": "VEqbX8OhE8y9" }, "source": [ "# Multimodal retail recommendation: using Gemini to recommend items based on images and image reasoning\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/multimodal_retail_recommendations.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fgemini%2Fuse-cases%2Fretail%2Fmultimodal_retail_recommendations.ipynb\">\n", " <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/gemini/use-cases/retail/multimodal_retail_recommendations.ipynb\">\n", " <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/multimodal_retail_recommendations.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://goo.gle/4fX5Ige\">\n", " <img width=\"32px\" src=\"https://cdn.qwiklabs.com/assets/gcp_cloud-e3a77215f0b8bfa9b3f611c0d2208c7e8708ed31.svg\" alt=\"Google Cloud logo\"><br> Open in Cloud Skills Boost\n", " </a>\n", " </td>\n", "</table>\n", "\n", "<div style=\"clear: both;\"></div>\n", "\n", "<b>Share to:</b>\n", "\n", "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/multimodal_retail_recommendations.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n", "</a>\n", "\n", "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/multimodal_retail_recommendations.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n", "</a>\n", "\n", "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/multimodal_retail_recommendations.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n", "</a>\n", "\n", "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/multimodal_retail_recommendations.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n", "</a>\n", "\n", "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/retail/multimodal_retail_recommendations.ipynb\" target=\"_blank\">\n", " <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n", "</a> \n" ] }, { "cell_type": "markdown", "metadata": { "id": "048a84091064" }, "source": [ "| | |\n", "|-|-|\n", "|Author(s) | [Thu Ya Kyaw](https://github.com/iamthuya) |" ] }, { "cell_type": "markdown", "metadata": { "id": "VK1Q5ZYdVL4Y" }, "source": [ "## Overview\n", "\n", "For retail companies, recommendation systems improve customer experience and thus can increase sales.\n", "\n", "This notebook shows how you can use the multimodal capabilities of Gemini 2.0 model to rapidly create a multimodal recommendation system out-of-the-box." ] }, { "cell_type": "markdown", "metadata": { "id": "3zLxCZkKEAhC" }, "source": [ "## Scenario\n", "\n", "The customer shows you their living room:\n", "\n", "|Customer photo |\n", "|:-----:|\n", "|<img src=\"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/rooms/spacejoy-c0JoR_-2x3E-unsplash.jpg\" width=\"80%\"> |\n", "\n", "\n", "\n", "Below are four chair options that the customer is trying to decide between:\n", "\n", "|Chair 1| Chair 2 | Chair 3 | Chair 4 |\n", "|:-----:|:----:|:-----:|:----:|\n", "| <img src=\"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/cesar-couto-OB2F6CsMva8-unsplash.jpg\" width=\"80%\">|<img src=\"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/daniil-silantev-1P6AnKDw6S8-unsplash.jpg\" width=\"80%\">|<img src=\"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/ruslan-bardash-4kTbAMRAHtQ-unsplash.jpg\" width=\"80%\">|<img src=\"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/scopic-ltd-NLlWwR4d3qU-unsplash.jpg\" width=\"80%\">|\n", "\n", "\n", "How can you use Gemini, a multimodal model, to help the customer choose the best option, and also explain why?" ] }, { "cell_type": "markdown", "metadata": { "id": "RQT500QqVPIb" }, "source": [ "### Objectives\n", "\n", "Your main objective is to learn how to create a recommendation system that can provide both recommendations and explanations using a multimodal model: Gemini 2.0.\n", "\n", "In this notebook, you will begin with a scene (e.g. a living room) and use the Gemini model to perform visual understanding. You will also investigate how the Gemini model can be used to recommend an item (e.g. a chair) from a list of furniture items as input.\n", "\n", "By going through this notebook, you will learn:\n", "- how to use the Gemini model to perform visual understanding\n", "- how to take multimodality into consideration in prompting for the Gemini model\n", "- how the Gemini model can be used to create retail recommendation applications out-of-the-box" ] }, { "cell_type": "markdown", "metadata": { "id": "1y6_3dTwV2fI" }, "source": [ "### Costs\n", "This tutorial uses billable components of Google Cloud:\n", "\n", "- Vertex AI\n", "\n", "Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing) and use the [Pricing Calculator](https://cloud.google.com/products/calculator/) to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "id": "QDU0XJ1xRDlL" }, "source": [ "## Getting Started" ] }, { "cell_type": "markdown", "metadata": { "id": "N5afkyDMSBW5" }, "source": [ "### Install Gen AI SDK for Python" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "V_-ThW_ZUYRV" }, "outputs": [], "source": [ "%pip install --upgrade google-genai" ] }, { "cell_type": "markdown", "metadata": { "id": "6Fom0ZkMSBW6" }, "source": [ "### Authenticate your notebook environment (Colab only)\n", "\n", "If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LCaCx6PLSBW6" }, "outputs": [], "source": [ "import sys\n", "\n", "# Additional authentication is required for Google Colab\n", "if \"google.colab\" in sys.modules:\n", " # Authenticate user to Google Cloud\n", " from google.colab import auth\n", "\n", " auth.authenticate_user()" ] }, { "cell_type": "markdown", "metadata": { "id": "BuQwwRiniVFG" }, "source": [ "### Define Google Cloud project information and initialize Vertex AI\n", "\n", "Initialize the Vertex AI SDK for Python for your project:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "rtMowvm-yQ97" }, "outputs": [], "source": [ "# Use the environment variable if the user doesn't provide Project ID.\n", "import os\n", "\n", "from google import genai\n", "\n", "PROJECT_ID = \"[your-project-id]\" # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n", "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n", " PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n", "\n", "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n", "\n", "client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)" ] }, { "cell_type": "markdown", "metadata": { "id": "4437b7608c8e" }, "source": [ "### Import libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yZ75ioBU9EwM" }, "outputs": [], "source": [ "from google.genai.types import GenerateContentConfig, Part" ] }, { "cell_type": "markdown", "metadata": { "id": "hhcZv4HFUYRW" }, "source": [ "## Using Gemini model\n", "\n", "Gemini is a multimodal model that supports adding image and video in text or chat prompts for a text response." ] }, { "cell_type": "markdown", "metadata": { "id": "ExdEEGUqUYRW" }, "source": [ "### Load Gemini model" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "pxlvLmncUYRW" }, "outputs": [], "source": [ "MODEL_ID = \"gemini-2.0-flash\"" ] }, { "cell_type": "markdown", "metadata": { "id": "MONqa2K8IjSz" }, "source": [ "### Visual understanding with Gemini\n", "\n", "Here you will ask the Gemini model to describe a room in details from its image. To do that you have to **combine text and image in a single prompt**." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "26wP-epVFBBK" }, "outputs": [], "source": [ "# urls for room images\n", "room_image_url = \"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/rooms/spacejoy-c0JoR_-2x3E-unsplash.jpg\"\n", "room_image = Part.from_uri(file_uri=room_image_url, mime_type=\"image/jpeg\")\n", "\n", "prompt = \"Describe what's visible in this room and the overall atmosphere:\"\n", "contents = [\n", " room_image,\n", " prompt,\n", "]\n", "\n", "responses = client.models.generate_content_stream(model=MODEL_ID, contents=contents)\n", "\n", "print(\"\\n-------Response--------\")\n", "for response in responses:\n", " print(response.text, end=\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "64I8-GxSbkTv" }, "source": [ "### Generating open recommendations based on built-in knowledge\n", "\n", "Using the same image, you can ask the model to recommend **a piece of furniture** that would fit in it alongside with the description of the room.\n", "\n", "Note that the model can choose **any furniture** to recommend in this case, and can do so from its only built-in knowledge." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "UaRDl1WvypT8" }, "outputs": [], "source": [ "prompt1 = \"Recommend a new piece of furniture for this room:\"\n", "prompt2 = \"and explain the reason in detail\"\n", "contents = [prompt1, room_image, prompt2]\n", "\n", "responses = client.models.generate_content_stream(model=MODEL_ID, contents=contents)\n", "\n", "print(\"\\n-------Response--------\")\n", "for response in responses:\n", " print(response.text, end=\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "lzdaG4iIdZM8" }, "source": [ "In the next cell, you will ask the model to recommend **a type of chair** that would fit in it alongside with the description of the room.\n", "\n", "Note that the model can choose **any type of chair** to recommend in this case." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "h7xKDMFLyQuD" }, "outputs": [], "source": [ "prompt1 = \"Describe this room:\"\n", "prompt2 = \"and recommend a type of chair that would fit in it\"\n", "contents = [prompt1, room_image, prompt2]\n", "\n", "responses = client.models.generate_content_stream(model=MODEL_ID, contents=contents)\n", "\n", "print(\"\\n-------Response--------\")\n", "for response in responses:\n", " print(response.text, end=\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "idEB0JFcbznD" }, "source": [ "### Generating recommendations based on provided images\n", "\n", "Instead of keeping the recommendation open, you can also provide a list of items for the model to choose from. Here you will download a few chair images and set them as options for the Gemini model to recommend from. This is particularly useful for retail companies who want to provide recommendations to users based on the kind of room they have, and the available items that the store offers." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "f_jMmwRiFcPF" }, "outputs": [], "source": [ "# Download and display sample chairs\n", "furniture_image_urls = [\n", " \"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/cesar-couto-OB2F6CsMva8-unsplash.jpg\",\n", " \"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/daniil-silantev-1P6AnKDw6S8-unsplash.jpg\",\n", " \"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/ruslan-bardash-4kTbAMRAHtQ-unsplash.jpg\",\n", " \"https://storage.googleapis.com/github-repo/img/gemini/retail-recommendations/furnitures/scopic-ltd-NLlWwR4d3qU-unsplash.jpg\",\n", "]\n", "\n", "# Load furniture images as Part Objects\n", "furniture_images = [\n", " Part.from_uri(file_uri=url, mime_type=\"image/jpeg\") for url in furniture_image_urls\n", "]\n", "\n", "# To recommend an item from a selection, you will need to label the item number within the prompt.\n", "# That way you are providing the model with a way to reference each image as you pose a question.\n", "# Labelling images within your prompt also help to reduce hallucinations and overall produce better results.\n", "contents = [\n", " \"Consider the following chairs:\",\n", " \"chair 1:\",\n", " furniture_images[0],\n", " \"chair 2:\",\n", " furniture_images[1],\n", " \"chair 3:\",\n", " furniture_images[2],\n", " \"chair 4:\",\n", " furniture_images[3],\n", " \"room:\",\n", " room_image,\n", " \"You are an interior designer. For each chair, explain whether it would be appropriate for the style of the room:\",\n", "]\n", "\n", "responses = client.models.generate_content_stream(model=MODEL_ID, contents=contents)\n", "\n", "print(\"\\n-------Response--------\")\n", "for response in responses:\n", " print(response.text, end=\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Mkw-DixOUYRc" }, "source": [ "You can also return the responses in JSON format, to make it easier to plug recommendations into a recommendation system:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "MDBQufRYUYRc" }, "outputs": [], "source": [ "contents = [\n", " \"Consider the following chairs:\",\n", " \"chair 1:\",\n", " furniture_images[0],\n", " \"chair 2:\",\n", " furniture_images[1],\n", " \"chair 3:\",\n", " furniture_images[2],\n", " \"chair 4:\",\n", " furniture_images[3],\n", " \"room:\",\n", " room_image,\n", " \"You are an interior designer. Return in JSON, for each chair, whether it would fit in the room, with an explanation:\",\n", "]\n", "\n", "responses = client.models.generate_content_stream(\n", " model=MODEL_ID,\n", " contents=contents,\n", " config=GenerateContentConfig(response_mime_type=\"application/json\"),\n", ")\n", "\n", "print(\"\\n-------Response--------\")\n", "for response in responses:\n", " print(response.text, end=\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "0_uhGdZERToX" }, "source": [ "## Conclusion\n", "\n", "This notebook showed how you can easily build a multimodal recommendation system using Gemini for furniture, but you can also use the similar approach in:\n", "\n", "- recommending clothes based on an occasion or an image of the venue\n", "- recommending wallpaper based on the room and settings\n", "\n", "You may also want to explore how you can build a RAG (retrieval-augmented generation) system where you retrieve relevant images from your store inventory to users who can they use Gemini to help identify the most ideal choice from the various options provided, and also explain the rationale to users." ] } ], "metadata": { "colab": { "name": "multimodal_retail_recommendations.ipynb", "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }

gemini/use-cases/retail/multimodal_retail_recommendations.ipynb (538 lines of code) (raw):