colab-enterprise/Create-Campaign-Multimodal-Segment-Recommendations.ipynb (1,561 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "MokqWo8Oiykt" }, "source": [ "## <font color='#4285f4'>Overview</font>" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook uses Gemini to analyze marketing campaigns and identify ideal customer segments for Chocolate AI. You can start with either a campaign description or a video ad, and Gemini will recommend customer segments to target based one 1/ a predefined set of segments, and 2/ a dynamic segment that doesn’t already exist. The notebook shows how you can easily query complex segment data using JSON, and it demonstrates how to find matches for a net new segment using vector search in BigQuery.\n", "\n", "Process Flow: \n", "1. Retrieve pre-defined customer segments from a BigQuery table (${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments)\n", "2. Text-based Analysis:\n", " - Fetch the details of the latest generated marketing campaign from another BigQuery table (${project_id}.${bigquery_chocolate_ai_dataset}.campaign)\n", " - Construct a detailed prompt for Gemini LLM, providing campaign information and asking it to identify relevant pre-defined and novel customer segments along with explanations.\n", " - Process the LLM response and extract the suggested segments and reasoning.\n", " - Query BigQuery to count customers matching the recommended pre-defined segments.\n", "3. Video-based Analysis:\n", " - Upload a sample advertisement video to a Google Cloud Storage bucket.\n", " - Call multimodal Gemini, providing the video URI and a prompt to identify customer segments likely receptive to the ad.\n", " - Extract and displays the recommended pre-defined and novel segments, along with explanations.\n", "4. 
Increase reach with dynamic segments powered by vector search\n", " - Extract keywords from the LLM-generated ideal segment description.\n", " - Perform vector search on customer profiles using these keywords to retrieve a list of matching customers (i.e. top 100).\n", "\n", "Author: Paul Ramsey" ] }, { "cell_type": "markdown", "metadata": { "id": "HMsUvoF4BP7Y" }, "source": [ "## <font color='#4285f4'>License</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "jQgQkbOvj55d" }, "source": [ "```\n", "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "UmyL-Rg4Dr_f" }, "source": [ "## <font color='#4285f4'>Initialize</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "pZcs1uLnI6U2" }, "source": [ "### Pip Installs and Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xOYsEVSXp6IP" }, "outputs": [], "source": [ "from IPython.display import HTML\n", "from functools import reduce\n", "import google.auth\n", "import requests\n", "import json\n", "import markdown\n", "\n", "import logging\n", "from tenacity import retry, wait_exponential, stop_after_attempt, before_sleep_log, retry_if_exception\n", "from google.cloud import bigquery\n", "client = bigquery.Client()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wMlHl3bnkFPZ" }, "outputs": [], "source": [ "# 
Update these variables to match your environment\n", "location=\"us-central1\" # Your region\n", "bigquery_location = \"${bigquery_location}\" # Must be \"us\" or \"eu\"\n", "\n", "### Do not change the values in this cell below this line ###\n", "project_id = !(gcloud config get-value project)\n", "user = !(gcloud auth list --filter=status:ACTIVE --format=\"value(account)\")\n", "\n", "if len(project_id) != 1:\n", " raise RuntimeError(f\"project_id is not set: {project_id}\")\n", "project_id = project_id[0]\n", "\n", "if len(user) != 1:\n", " raise RuntimeError(f\"user is not set: {user}\")\n", "user = user[0]\n", "\n", "bucket_name = \"${chocolate_ai_bucket}\"\n", "\n", "print(f\"project_id = {project_id}\")\n", "print(f\"user = {user}\")\n", "print(f\"location = {location}\")\n", "print(f\"bigquery_location = {bigquery_location}\")\n", "print(f\"bucket_name = {bucket_name}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "sZ6m_wGrK0YG" }, "source": [ "### Helper Methods" ] }, { "cell_type": "markdown", "metadata": { "id": "g57ZpqGRgg51" }, "source": [ "#### RetryCondition(error)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "lT9geulqggwr" }, "outputs": [], "source": [ "def RetryCondition(error):\n", " error_string = str(error)\n", " print(error_string)\n", "\n", " retry_errors = [\n", " \"RESOURCE_EXHAUSTED\",\n", " \"No content in candidate\",\n", " # Add more error messages here as needed\n", " ]\n", "\n", " for retry_error in retry_errors:\n", " if retry_error in error_string:\n", " print(\"Retrying...\")\n", " return True\n", "\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "id": "JbOjdSP1kN9T" }, "source": [ "#### restAPIHelper\n", "Calls the Google Cloud REST API using the current users credentials." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "40wlwnY4kM11" }, "outputs": [], "source": [ "def restAPIHelper(url: str, http_verb: str, request_body: str) -> str:\n", " \"\"\"Calls the Google Cloud REST API passing in the current users credentials\"\"\"\n", "\n", " import requests\n", " import google.auth\n", " import json\n", "\n", " # Get an access token based upon the current user\n", " creds, project = google.auth.default()\n", " auth_req = google.auth.transport.requests.Request()\n", " creds.refresh(auth_req)\n", " access_token=creds.token\n", "\n", " headers = {\n", " \"Content-Type\" : \"application/json\",\n", " \"Authorization\" : \"Bearer \" + access_token\n", " }\n", "\n", " if http_verb == \"GET\":\n", " response = requests.get(url, headers=headers)\n", " elif http_verb == \"POST\":\n", " response = requests.post(url, json=request_body, headers=headers)\n", " elif http_verb == \"PUT\":\n", " response = requests.put(url, json=request_body, headers=headers)\n", " elif http_verb == \"PATCH\":\n", " response = requests.patch(url, json=request_body, headers=headers)\n", " elif http_verb == \"DELETE\":\n", " response = requests.delete(url, headers=headers)\n", " else:\n", " raise RuntimeError(f\"Unknown HTTP verb: {http_verb}\")\n", "\n", " if response.status_code == 200:\n", " return json.loads(response.content)\n", " #image_data = json.loads(response.content)[\"predictions\"][0][\"bytesBase64Encoded\"]\n", " else:\n", " error = f\"Error restAPIHelper -> ' Status: '{response.status_code}' Text: '{response.text}'\"\n", " raise RuntimeError(error)" ] }, { "cell_type": "markdown", "metadata": { "id": "4VMFSz-R3BW2" }, "source": [ "#### Gemini LLM (Pro 1.0 , Pro 1.5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wnf6nhAc3JwL" }, "outputs": [], "source": [ "@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), 
before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))\n", "def GeminiLLM(prompt, model = \"gemini-2.0-flash\", response_schema = None,\n", " temperature = 1, topP = 1, topK = 32):\n", "\n", " # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models\n", " # model = \"gemini-2.0-flash\"\n", "\n", " llm_response = None\n", " if temperature < 0:\n", " temperature = 0\n", "\n", " creds, project = google.auth.default()\n", " auth_req = google.auth.transport.requests.Request() # required to refresh the access token\n", " creds.refresh(auth_req)\n", " access_token=creds.token\n", "\n", " headers = {\n", " \"Content-Type\" : \"application/json\",\n", " \"Authorization\" : \"Bearer \" + access_token\n", " }\n", "\n", " # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference\n", " url = f\"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent\"\n", "\n", " generation_config = {\n", " \"temperature\": temperature,\n", " \"topP\": topP,\n", " \"maxOutputTokens\": 8192,\n", " \"candidateCount\": 1,\n", " \"responseMimeType\": \"application/json\",\n", " }\n", "\n", " # Add in the response schema when it is provided\n", " if response_schema is not None:\n", " generation_config[\"responseSchema\"] = response_schema\n", "\n", " if model == \"gemini-2.0-flash\":\n", " generation_config[\"topK\"] = topK\n", "\n", " payload = {\n", " \"contents\": {\n", " \"role\": \"user\",\n", " \"parts\": {\n", " \"text\": prompt\n", " },\n", " },\n", " \"generation_config\": {\n", " **generation_config\n", " },\n", " \"safety_settings\": {\n", " \"category\": \"HARM_CATEGORY_SEXUALLY_EXPLICIT\",\n", " \"threshold\": \"BLOCK_LOW_AND_ABOVE\"\n", " }\n", " }\n", "\n", " response = requests.post(url, json=payload, headers=headers)\n", "\n", " if response.status_code == 200:\n", " try:\n", " json_response = json.loads(response.content)\n", " 
except Exception as error:\n", " raise RuntimeError(f\"An error occurred parsing the JSON: {error}\")\n", "\n", " if \"candidates\" in json_response:\n", " candidates = json_response[\"candidates\"]\n", " if len(candidates) > 0:\n", " candidate = candidates[0]\n", " if \"content\" in candidate:\n", " content = candidate[\"content\"]\n", " if \"parts\" in content:\n", " parts = content[\"parts\"]\n", " if len(parts):\n", " part = parts[0]\n", " if \"text\" in part:\n", " text = part[\"text\"]\n", " llm_response = text\n", " else:\n", " raise RuntimeError(f\"No text in part: {response.content}\")\n", " else:\n", " raise RuntimeError(f\"No parts in content: {response.content}\")\n", " else:\n", " raise RuntimeError(f\"No parts in content: {response.content}\")\n", " else:\n", " raise RuntimeError(f\"No content in candidate: {response.content}\")\n", " else:\n", " raise RuntimeError(f\"No candidates: {response.content}\")\n", " else:\n", " raise RuntimeError(f\"No candidates: {response.content}\")\n", "\n", " # Remove some typical response characters (if asking for a JSON reply)\n", " llm_response = llm_response.replace(\"```json\",\"\")\n", " llm_response = llm_response.replace(\"```\",\"\")\n", " llm_response = llm_response.replace(\"\\n\",\"\")\n", "\n", " return llm_response\n", "\n", " else:\n", " raise RuntimeError(f\"Error with prompt:'{prompt}' Status:'{response.status_code}' Text:'{response.text}'\")" ] }, { "cell_type": "markdown", "metadata": { "id": "CbBALqHr3OsE" }, "source": [ "#### Gemini LLM - Multimodal" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "n6LMtgHX3SXU" }, "outputs": [], "source": [ "@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))\n", "def GeminiLLM_Multimodal(\n", " text_prompt, video_uri, model=\"gemini-2.0-flash\", response_schema=None, temperature=1, topP=1, topK=32\n", 
"):\n", " \"\"\"\n", " Calls the Gemini API with a text prompt and an image prompt.\n", "\n", " Args:\n", " text_prompt: The text prompt.\n", " image_data: A BytesIO object containing the image data.\n", " model: The Gemini model to use.\n", " response_schema: Optional response schema.\n", " temperature: Temperature parameter for the model.\n", " topP: Top-p parameter for the model.\n", " topK: Top-k parameter for the model.\n", "\n", " Returns:\n", " The Gemini response as a string.\n", " \"\"\"\n", "\n", " creds, project = google.auth.default()\n", " auth_req = google.auth.transport.requests.Request()\n", " creds.refresh(auth_req)\n", " access_token = creds.token\n", "\n", " headers = {\n", " \"Content-Type\": \"application/json\",\n", " \"Authorization\": \"Bearer \" + access_token\n", " }\n", "\n", " url = f\"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent\"\n", "\n", "\n", " generation_config = {\n", " \"temperature\": temperature,\n", " \"topP\": topP,\n", " \"maxOutputTokens\": 8192,\n", " \"candidateCount\": 1,\n", " \"responseMimeType\": \"application/json\" #Invalid for multi-modal responses\n", " }\n", "\n", " if response_schema is not None:\n", " generation_config[\"responseSchema\"] = response_schema\n", "\n", " if model in [\"gemini-2.0-flash\"]:\n", " generation_config[\"topK\"] = topK\n", "\n", " # Construct the payload with the image URI\n", " payload = {\n", " \"contents\": [{\n", " \"role\": \"user\",\n", " \"parts\": [\n", " {\n", " \"text\": text_prompt\n", " },\n", " {\n", " \"fileData\": {\n", " \"fileUri\": video_uri,\n", " \"mimeType\": \"video/mp4\"\n", " }\n", " }\n", " ]\n", " }],\n", " \"generation_config\": generation_config,\n", " \"safety_settings\": {\n", " \"category\": \"HARM_CATEGORY_SEXUALLY_EXPLICIT\",\n", " \"threshold\": \"BLOCK_ONLY_HIGH\"\n", " }\n", " }\n", "\n", " response = requests.post(url, json=payload, headers=headers)\n", 
"\n", " if response.status_code == 200:\n", " return json.loads(response.text)\n", " else:\n", " raise RuntimeError(\n", " f\"Error with prompt:'{text_prompt}' Status:'{response.status_code}' Text:'{response.text}'\"\n", " )\n" ] }, { "cell_type": "markdown", "metadata": { "id": "FumgN4cWV5An" }, "source": [ "#### Run Query()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "FrHOgq0sV45p" }, "outputs": [], "source": [ "def RunQuery(sql, job_config = None):\n", " import time\n", "\n", " if (sql.startswith(\"SELECT\") or sql.startswith(\"WITH\")):\n", " df_result = client.query(sql).to_dataframe()\n", " return df_result\n", " else:\n", " if job_config == None:\n", " job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)\n", " query_job = client.query(sql, job_config=job_config)\n", "\n", " # Check on the progress by getting the job's updated state.\n", " query_job = client.get_job(\n", " query_job.job_id, location=query_job.location\n", " )\n", " print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", " while query_job.state != \"DONE\":\n", " time.sleep(2)\n", " query_job = client.get_job(\n", " query_job.job_id, location=query_job.location\n", " )\n", " print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", " if query_job.error_result == None:\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "id": "apiQ6P-4gZUh" }, "source": [ "#### PrettyPrintJson(json_string)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aUrsncqA3lqK" }, "outputs": [], "source": [ "def PrettyPrintJson(json_string):\n", " json_object = json.loads(json_string)\n", " json_formatted_str = json.dumps(json_object, indent=2)\n", " #print(json_formatted_str)\n", " return json.dumps(json_object)" ] }, { 
"cell_type": "markdown", "metadata": { "id": "6iSUuDYYExOz" }, "source": [ "#### DisplayMarkdown()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8aZDTHvNExIK" }, "outputs": [], "source": [ "def DisplayMarkdown(text):\n", " \"\"\"\n", " Displays text in markdown/HTML format in a Colab notebook.\n", "\n", " Args:\n", " text: The text to display. Can be plain text or Markdown.\n", " \"\"\"\n", "\n", " formatted_text = markdown.markdown(text) # Convert to HTML if necessary\n", " display(HTML(formatted_text))\n" ] }, { "cell_type": "markdown", "metadata": { "id": "axOnALYKCtHl" }, "source": [ "### Campaign Methods" ] }, { "cell_type": "markdown", "metadata": { "id": "97MGoAdc_1Vh" }, "source": [ "#### GetPreDefinedSegments()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OwiODKR-6XGM" }, "outputs": [], "source": [ "def GetPreDefinedSegments():\n", " sql = f\"\"\"SELECT JSON_OBJECT(\n", " 'pre_defined_customer_segments', JSON_ARRAY (\n", " (\n", " --benefits_sought\n", " SELECT JSON_OBJECT('benefits_sought', ARRAY_AGG(DISTINCT segment))\n", " FROM (\n", " SELECT DISTINCT benefits_sought AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", " (\n", " --browsing_behavior\n", " SELECT JSON_OBJECT('browsing_behavior', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT browsing_behavior AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", " (\n", " --occasion_timing\n", " SELECT JSON_OBJECT('occasion_timing', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT occasion_timing AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " 
),\n", " (\n", " --purchase_history\n", " SELECT JSON_OBJECT('purchase_history', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT purchase_history AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", " (\n", " --spending_habits\n", " SELECT JSON_OBJECT('spending_habits', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT spending_habits AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --usage_frequency\n", " SELECT JSON_OBJECT('usage_frequency', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT usage_frequency AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --user_status\n", " SELECT JSON_OBJECT('user_status', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT user_status AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --age\n", " SELECT JSON_OBJECT('age', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT age AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --education\n", " SELECT JSON_OBJECT('education', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT education AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " 
--ethnicity\n", " SELECT JSON_OBJECT('ethnicity', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT ethnicity AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --family_size\n", " SELECT JSON_OBJECT('family_size', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT family_size AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --gender\n", " SELECT JSON_OBJECT('gender', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT gender AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --generation\n", " SELECT JSON_OBJECT('generation', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT generation AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --income\n", " SELECT JSON_OBJECT('income', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT income AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " WHERE income NOT LIKE 'Unknown%'\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --language\n", " SELECT JSON_OBJECT('language', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT language AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --marital_status\n", " SELECT 
JSON_OBJECT('marital_status', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT marital_status AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --occupation\n", " SELECT JSON_OBJECT('occupation', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT occupation AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --challenges\n", " SELECT JSON_OBJECT('challenges', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT challenges AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --goals\n", " SELECT JSON_OBJECT('goals', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT goals AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --pain_points\n", " SELECT JSON_OBJECT('pain_points', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT pain_points AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --priorities\n", " SELECT JSON_OBJECT('priorities', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT priorities AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --specific_needs\n", " SELECT JSON_OBJECT('specific_needs', 
ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT specific_needs AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --interests\n", " SELECT JSON_OBJECT('interests', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT interests AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --lifestyle\n", " SELECT JSON_OBJECT('lifestyle', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT lifestyle AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --motivations\n", " SELECT JSON_OBJECT('motivations', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT motivations AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --personality\n", " SELECT JSON_OBJECT('personality', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT personality AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --customer_values\n", " SELECT JSON_OBJECT('customer_values', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT customer_values AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --social_media_platforms\n", " SELECT JSON_OBJECT('social_media_platforms', 
ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT social_media_platforms AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " ),\n", "\n", " (\n", " --tech_savviness\n", " SELECT JSON_OBJECT('tech_savviness', ARRAY_AGG(DISTINCT TRIM(segment, \" \")))\n", " FROM (\n", " SELECT DISTINCT tech_savviness AS segments\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " ), UNNEST(SPLIT(segments, ',')) AS segment\n", " )\n", " )\n", " )\"\"\"\n", "\n", " result = RunQuery(sql)\n", " return result" ] }, { "cell_type": "markdown", "metadata": { "id": "YVx2ex_1BtqR" }, "source": [ "#### GetMenuItem(menu_id)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OYh4A4jHBt0a" }, "outputs": [], "source": [ "def GetMenuItem(menu_id):\n", " sql = f\"\"\"SELECT *\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.menu`\n", " WHERE menu_id = {menu_id}\n", " \"\"\"\n", " result = RunQuery(sql)\n", " return result" ] }, { "cell_type": "markdown", "metadata": { "id": "PexYQ-Lb_4vp" }, "source": [ "#### GetSegmentsFromCampaign()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KR4NXhRHWfnM" }, "outputs": [], "source": [ "def GetSegmentsFromCampaign(campaign):\n", "\n", " # Extract campaign variables from the campaign object\n", " campaign_id = campaign['campaign_id']\n", " menu_id = campaign['menu_id']\n", " campaign_name = campaign['campaign_name']\n", " campaign_description = campaign['campaign_description']\n", " campaign_goal = campaign['campaign_goal']\n", " target_audience = campaign['target_audience']\n", " marketing_channels = campaign['marketing_channels']\n", " budget = campaign['budget']\n", " explanation = campaign['explanation']\n", " campaign_outcomes = campaign['campaign_outcomes']\n", " campaign_start_date = 
campaign['campaign_start_date']\n", " campaign_end_date = campaign['campaign_end_date']\n", " campaign_created_date = campaign['campaign_created_date']\n", "\n", " # Get menu item details\n", " menu_item = GetMenuItem(menu_id)\n", " menu_item = menu_item.iloc[0]\n", " menu_item_name = menu_item['menu_name']\n", " menu_item_description = menu_item['menu_description']\n", " menu_item_price = menu_item['menu_price']\n", "\n", " # Get pre-defined segments from the customer_marketing_profile table\n", " pre_defined_segments = GetPreDefinedSegments()\n", " pre_defined_segments = pre_defined_segments.iloc[0,0]\n", "\n", " # Define prompt\n", " prompt = f\"\"\"You are a Marketing Analyst, and you are reviewing Chocolate AI's latest\n", " marketing campaign called: {campaign_name}. Your task is to determine the ideal\n", " customer segments that should be targeted with the new campaign. You should choose\n", " at least two segments from the following set of pre-defined customer segments:\n", "\n", " {pre_defined_segments}\n", "\n", " You should also define one segment that doesn't exist in the pre-defined set of\n", " segments that describes the perfect target audience for the new campaign.\n", "\n", " Here are the details of the campaign:\n", "\n", " Campaign Name: {campaign_name}\n", " Campaign Description: {campaign_description}\n", " Menu Item being Promoted: {menu_item_name} ({menu_item_description}), priced at {menu_item_price}\n", " Campaign Goal: {campaign_goal}\n", " Target Audience: {target_audience}\n", " Marketing Channels: {marketing_channels}\n", " Budget: {budget}\n", " Explanation: {explanation}\n", " Campaign Outcomes: {campaign_outcomes}\n", " Campaign Start Date: {campaign_start_date}\n", " Campaign End Date: {campaign_end_date}\n", "\n", " Now choose at least two segments from the set of pre-defined customer segments,\n", " and come up with one segment that doesn't exist in the pre-defined set of segments\n", " that describes the perfect target audience for the new campaign.\n", "\n", " Think step by step and explain your 
reasoning.\"\"\"\n", "\n", " response_schema = {\n", " \"type\": \"object\",\n", " \"required\": [\n", " \"pre_defined_segments\", \"novel_segment\", \"explanation\"\n", " ],\n", " \"properties\": {\n", " \"pre_defined_segments\": {\n", " \"type\": \"array\",\n", " \"description\": \"An array of customer segments chosen from a pre-defined list.\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"A segment that is chosen from a pre-defined list.\"\n", " }\n", " },\n", " \"novel_segment\": {\n", " \"type\": \"string\",\n", " \"description\": \"A segment that is created to fit the ideal audience and does not exist in the pre-defined list.\"\n", " },\n", " \"explanation\": {\n", " \"type\": \"string\",\n", " \"description\": \"An explanation for your reasoning on why you picked these segments.\"\n", " }\n", " }\n", " }\n", "\n", " result = GeminiLLM(prompt, response_schema=response_schema)\n", " return result" ] }, { "cell_type": "markdown", "metadata": { "id": "IYiVTw90dTTX" }, "source": [ "#### GetSegmentsFromVideo(campaign, uri)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Mcz8Ut7ucSAV" }, "outputs": [], "source": [ "def GetSegmentsFromVideo(video_uri):\n", "\n", " # Get pre-defined segments from the customer_marketing_profile table\n", " pre_defined_segments = GetPreDefinedSegments()\n", " pre_defined_segments = pre_defined_segments.iloc[0,0]\n", "\n", " # Define prompt\n", " text_prompt = f\"\"\"You are a marketing analyst, and you are reviewing a new video\n", " ad created by your Content & Creatives team. 
Your task is to watch the new ad\n", " and determine the customer segments that would be most receptive to the new ad\n", " and also the most likely to make a purchase after seeing the ad.\n", "\n", " You should choose at least two segments from the following set of pre-defined\n", " customer segments:\n", "\n", " {pre_defined_segments}\n", "\n", " You should also define one segment that doesn't exist in the pre-defined set of\n", " segments that describes the perfect target audience for the new campaign.\n", "\n", " Now watch the supplied video, then choose at least two segments from the set of\n", " pre-defined customer segments, and come up with one segment that doesn't exist\n", " in the pre-defined set of segments that describes the perfect target audience\n", " for the new ad.\n", "\n", " Think step by step and explain your reasoning.\"\"\"\n", "\n", " response_schema = {\n", " \"type\": \"object\",\n", " \"required\": [\n", " \"pre_defined_segments\", \"novel_segment\", \"explanation\"\n", " ],\n", " \"properties\": {\n", " \"pre_defined_segments\": {\n", " \"type\": \"array\",\n", " \"description\": \"An array of customer segments chosen from a pre-defined list.\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"A segment that is chosen from a pre-defined list.\"\n", " }\n", " },\n", " \"novel_segment\": {\n", " \"type\": \"string\",\n", " \"description\": \"A segment that is created to fit the ideal audience and does not exist in the pre-defined list.\"\n", " },\n", " \"explanation\": {\n", " \"type\": \"string\",\n", " \"description\": \"An explanation for your reasoning on why you picked these segments.\"\n", " }\n", " }\n", " }\n", "\n", " response = GeminiLLM_Multimodal(text_prompt, video_uri, response_schema=response_schema)\n", "\n", " # Display the results\n", " #response_text = response['candidates'][0]['content']['parts'][0]['text']\n", "\n", " #result = GeminiLLM(prompt, response_schema=response_schema)\n", " return 
response" ] }, { "cell_type": "markdown", "metadata": { "id": "EYRHDPdVKBzd" }, "source": [ "## <font color='#4285f4'>Get Segments from Campaign Description</font>" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Kcqp37O39uSr" }, "outputs": [], "source": [ "# Get latest generated campaign\n", "sql = \"\"\"SELECT * FROM `${project_id}.${bigquery_chocolate_ai_dataset}.campaign`\n", " ORDER BY campaign_created_date DESC\n", " LIMIT 1;\"\"\"\n", "\n", "campaign = RunQuery(sql)\n", "campaign = campaign.iloc[0]\n", "\n", "recommended_segments = GetSegmentsFromCampaign(campaign)\n", "recommended_segments = json.loads(recommended_segments)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_pt6bshvDKqZ" }, "outputs": [], "source": [ "result_string = f\"\"\"\n", "## Suggested Customer Segments\n", "\n", "**Campaign Name:**\n", "{campaign['campaign_name']}\n", "\n", "**Campaign Description:**\n", "{campaign['campaign_description']}\n", "\n", "**Suggested customer segments (existing):**\n", "{recommended_segments[\"pre_defined_segments\"]}\n", "\n", "**Ideal customer segment (new segment):**\n", "{recommended_segments[\"novel_segment\"]}\n", "\n", "**Reasoning for segment choices:**\n", "{recommended_segments['explanation']}\n", "\"\"\"\n", "\n", "DisplayMarkdown(result_string)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Gysuhni0G2hH" }, "source": [ "Let's assume that Gemini chose the following values:\n", "\n", "- Existing customer segments to target: `{\"purchase_history\": \"High Spender\", \"social_media_platforms\": [\"Instagram\", \"Facebook\"]}`\n", "- Ideal customer segment: \"Luxury Foodie: Individuals with discerning palates who seek unique and high-quality culinary experiences. 
They are interested in the story behind their food, appreciate artisanal craftsmanship, and are willing to pay a premium for exceptional taste and ingredients.\"\n", "\n", "We can search through our customer data to get a list of customers we should target for this campaign based on these criteria." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "bFqD5NxRDfBl" }, "outputs": [], "source": [ "sql = \"\"\"SELECT COUNT(*) AS prospect_count\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " WHERE purchase_history = 'High Spender'\n", " OR purchase_history = 'Frequent Buyer';\"\"\"\n", "\n", "prospect_count = RunQuery(sql)\n", "prospect_count = prospect_count.iloc[0]\n", "\n", "print(f\"There are {prospect_count['prospect_count']} customers in the recommended pre-defined segments.\")\n", "\n", "sql = \"\"\"SELECT *\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " WHERE purchase_history = 'High Spender'\n", " OR purchase_history = 'Frequent Buyer';\"\"\"\n", "\n", "prospect_list = RunQuery(sql)\n", "prospect_list" ] }, { "cell_type": "markdown", "metadata": { "id": "tS1IwLSNIW8i" }, "source": [ "We can also use Gemini to extract key words from the description of the new ideal customer segment, and we can use vector search on our customer_marketing_profile data to get a list of customers who match these criteria." 
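, "\n",
"\n",
"A note on reading the results: the `distance` column returned by `VECTOR_SEARCH` in the next cell measures how far each customer's embedding is from the query embedding (cosine or Euclidean, depending on the configured `distance_type`), so smaller values mean closer matches. A minimal pure-Python sketch of cosine distance, using made-up two-dimensional vectors purely for illustration:\n",
"\n",
"```python\n",
"import math\n",
"\n",
"def cosine_distance(u, v):\n",
"    # 1 - cosine similarity: 0.0 for identical directions, larger means less similar\n",
"    dot = sum(x * y for x, y in zip(u, v))\n",
"    norm_u = math.sqrt(sum(x * x for x in u))\n",
"    norm_v = math.sqrt(sum(x * x for x in v))\n",
"    return 1.0 - dot / (norm_u * norm_v)\n",
"\n",
"print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0\n",
"print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0\n",
"```"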
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "dEs51PpKIWMR" }, "outputs": [], "source": [ "prompt = f\"\"\"Extract up to 7 key words from the following customer segment description:\n", "\n", "{recommended_segments[\"novel_segment\"]}\n", "\"\"\"\n", "\n", "response_schema = {\n", " \"type\": \"object\",\n", " \"required\": [\n", " \"key_words\"\n", " ],\n", " \"properties\": {\n", " \"key_words\": {\n", " \"type\": \"array\",\n", " \"description\": \"An array of key words extracted from the supplied text.\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"A key term extracted from the supplied text.\"\n", " }\n", " }\n", " }\n", " }\n", "\n", "key_words = GeminiLLM(prompt, response_schema=response_schema)\n", "key_words = json.loads(key_words)\n", "key_words_string = \" \".join(key_words['key_words'])\n", "print(f\"Key words: {key_words_string}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Wuk4iwsGJ-RW" }, "outputs": [], "source": [ "# Get the top 1000 customers that match the target customer segment\n", "row_limit = 1000\n", "sql = f\"\"\"WITH target_customers AS (\n", " SELECT\n", " distance,\n", " base.customer_id,\n", " base.customer_marketing_insights,\n", " base.customer_profile_data,\n", " base.customer_loyalty_data,\n", " base.customer_segmentation_data\n", " --base.customer_marketing_insights_embedding\n", " FROM VECTOR_SEARCH(\n", " -- base table or subquery\n", " TABLE `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile`,\n", "\n", " -- embedding column to search in base table - must be of type ARRAY<FLOAT64>\n", " 'customer_marketing_insights_embedding',\n", "\n", " -- query table or subquery - this is where you generate the search embedding\n", " (\n", " SELECT ml_generate_embedding_result, content AS query\n", " FROM ML.GENERATE_EMBEDDING(\n", " MODEL `${project_id}.${bigquery_chocolate_ai_dataset}.textembedding_model`,\n", " (\n", " -- Search 
term\n", " SELECT \"{key_words_string}\" AS content\n", " ),\n", " STRUCT(\n", " TRUE AS flatten_json_output,\n", " 'SEMANTIC_SIMILARITY' as task_type,\n", " 768 AS output_dimensionality\n", " )\n", " )\n", " ),\n", " top_k => {row_limit}\n", " ) )\n", " SELECT distance,\n", " target_customers.customer_id,\n", " cus.customer_name,\n", " cus.customer_email,\n", " cus.customer_yob,\n", " target_customers.customer_marketing_insights,\n", " target_customers.customer_profile_data,\n", " target_customers.customer_loyalty_data,\n", " target_customers.customer_segmentation_data\n", " FROM target_customers\n", " JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.customer` cus\n", " ON target_customers.customer_id = cus.customer_id\n", " ORDER BY distance;\n", " \"\"\"\n", "\n", "result = RunQuery(sql)\n", "result" ] }, { "cell_type": "markdown", "metadata": { "id": "Kq1b6W0YKAzm" }, "source": [ "## <font color='#4285f4'>Get Segments from Video</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "D-Jv3zKwf3KA" }, "source": [ "### Copy Video to Local Bucket" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "vLIjv5AaJqTL" }, "outputs": [], "source": [ "# Copy the sample ad video to your local bucket\n", "source_video_uri = \"gs://data-analytics-golden-demo/chocolate-ai/v1/Campaign-Assets-Text-to-Video-01/story-01/full-video-with-audio-en-GB.mp4\"\n", "target_video_uri = f\"gs://{bucket_name}/chocolate_ai/videos/chocolate-ai_story-HITL-01_full-video-with-audio-en-GB.mp4\"\n", "\n", "!gsutil -m cp {source_video_uri} {target_video_uri}\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Ixu4Ptebf5wo" }, "source": [ "### Watch the New Marketing Video" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "EYu_0qI7emJJ" }, "outputs": [], "source": [ "# View the video\n", "uri_index = target_video_uri.index(bucket_name)\n", "video_uri = 'https://storage.cloud.google.com/' + target_video_uri[uri_index:]\n", "\n", 
"HTML(f\"\"\"\n", "<video width=600 height=337 controls autoplay>\n", " <source src=\"{video_uri}\" type=\"video/mp4\">\n", "</video>\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "I-BVMQfogbt1" }, "source": [ "### Get Recommended Segments for Ad from Gemini" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nnomvE_EpMh-" }, "outputs": [], "source": [ "# Have Gemini watch the ad and suggest target segments\n", "result = GetSegmentsFromVideo(target_video_uri)\n", "\n", "json_result = json.loads(result['candidates'][0]['content']['parts'][0]['text'])\n", "pre_defined_segments = json_result['pre_defined_segments']\n", "novel_segment = json_result['novel_segment']\n", "explanation = json_result['explanation']\n", "\n", "result_string = f\"\"\"\n", "## Suggested Customer Segments for Video\n", "\n", "**Suggested customer segments (existing):**\n", "{json_result[\"pre_defined_segments\"]}\n", "\n", "**Ideal customer segment (new segment):**\n", "{json_result[\"novel_segment\"]}\n", "\n", "**Reasoning for segment choices:**\n", "{json_result['explanation']}\n", "\"\"\"\n", "\n", "DisplayMarkdown(result_string)" ] }, { "cell_type": "markdown", "metadata": { "id": "2jJTNc92sIY0" }, "source": [ "Let's assume that Gemini chose the following values:\n", "\n", "- Existing customer segments to target: ['Special Occasions', 'High Spender']\n", "- Ideal customer segment: \"Luxury Dessert Enthusiast\"\n", "\n", "We can search through our customer data to get a list of customers we should target for this campaign based on these criteria." 
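, "\n",
"\n",
"Because `GetSegmentsFromVideo` passes a `response_schema`, the model's reply is constrained to valid JSON containing the three required keys, which is what makes the direct `json.loads` call and key lookups in the cell above safe. A small sketch with a hypothetical response string (the values are illustrative, not real model output):\n",
"\n",
"```python\n",
"import json\n",
"\n",
"# Hypothetical response text, shaped by the response_schema used above\n",
"raw = ('{\"pre_defined_segments\": [\"Special Occasions\", \"High Spender\"], '\n",
"       '\"novel_segment\": \"Luxury Dessert Enthusiast\", '\n",
"       '\"explanation\": \"Both segments match the premium tone of the ad.\"}')\n",
"parsed = json.loads(raw)\n",
"\n",
"# All three keys are marked required in the schema, so direct indexing is safe\n",
"for key in ('pre_defined_segments', 'novel_segment', 'explanation'):\n",
"    assert key in parsed\n",
"print(parsed['novel_segment'])  # Luxury Dessert Enthusiast\n",
"```"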
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "MshapYjXsudH" }, "outputs": [], "source": [ "sql = \"\"\"SELECT COUNT(*) AS prospect_count\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " WHERE purchase_history = 'High Spender'\n", " OR occasion_timing = 'Special Occasions';\"\"\"\n", "\n", "prospect_count = RunQuery(sql)\n", "prospect_count = prospect_count.iloc[0]\n", "\n", "print(f\"There are {prospect_count['prospect_count']} customers in the recommended pre-defined segments.\")\n", "\n", "sql = \"\"\"SELECT *\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile_segments`\n", " WHERE purchase_history = 'High Spender'\n", " OR occasion_timing = 'Special Occasions';\"\"\"\n", "\n", "prospect_list = RunQuery(sql)\n", "prospect_list" ] }, { "cell_type": "markdown", "metadata": { "id": "swjNwrTqsudI" }, "source": [ "We can also use Gemini to generate key words related to the new ideal customer segment, and we can use vector search on our customer_marketing_profile data to get a list of customers who match these criteria." 
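, "\n",
"\n",
"One practical detail before building the search string: the generated keyword list can contain near-duplicates, and because the space-joined string becomes the embedding query, deduplicating it first keeps any single term from being over-weighted. A small sketch with hypothetical keywords (the final join mirrors the `key_words_string` logic in the next cell):\n",
"\n",
"```python\n",
"# Hypothetical keyword list, shaped like the key_words response in the next cell\n",
"key_words = {'key_words': ['Luxury', 'dessert', 'gourmet', 'indulgent', 'luxury']}\n",
"\n",
"# Deduplicate case-insensitively while preserving order, then join into one query string\n",
"seen = set()\n",
"unique = []\n",
"for word in key_words['key_words']:\n",
"    if word.lower() not in seen:\n",
"        seen.add(word.lower())\n",
"        unique.append(word)\n",
"\n",
"key_words_string = ' '.join(unique)\n",
"print(key_words_string)  # Luxury dessert gourmet indulgent\n",
"```"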
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hvVJvVO4sudI" }, "outputs": [], "source": [ "prompt = f\"\"\"Generate up to 7 key words that describe a customer who fits into the following customer segment:\n", "\n", "{novel_segment}\n", "\"\"\"\n", "\n", "response_schema = {\n", " \"type\": \"object\",\n", " \"required\": [\n", " \"key_words\"\n", " ],\n", " \"properties\": {\n", " \"key_words\": {\n", " \"type\": \"array\",\n", " \"description\": \"An array of key words extracted from the supplied text.\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"A key term extracted from the supplied text.\"\n", " }\n", " }\n", " }\n", " }\n", "\n", "key_words = GeminiLLM(prompt, response_schema=response_schema)\n", "key_words = json.loads(key_words)\n", "key_words_string = \" \".join(key_words['key_words'])\n", "print(f\"Key words: {key_words_string}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Niq8YYlesudJ" }, "outputs": [], "source": [ "# Get the top 1000 customers that match the target customer segment\n", "row_limit = 1000\n", "sql = f\"\"\"WITH target_customers AS (\n", " SELECT\n", " distance,\n", " base.customer_id,\n", " base.customer_marketing_insights,\n", " base.customer_profile_data,\n", " base.customer_loyalty_data,\n", " base.customer_segmentation_data\n", " --base.customer_marketing_insights_embedding\n", " FROM VECTOR_SEARCH(\n", " -- base table or subquery\n", " TABLE `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile`,\n", "\n", " -- embedding column to search in base table - must be of type ARRAY<FLOAT64>\n", " 'customer_marketing_insights_embedding',\n", "\n", " -- query table or subquery - this is where you generate the search embedding\n", " (\n", " SELECT ml_generate_embedding_result, content AS query\n", " FROM ML.GENERATE_EMBEDDING(\n", " MODEL `${project_id}.${bigquery_chocolate_ai_dataset}.textembedding_model`,\n", " (\n", " -- Search 
term\n", " SELECT \"{key_words_string}\" AS content\n", " ),\n", " STRUCT(\n", " TRUE AS flatten_json_output,\n", " 'SEMANTIC_SIMILARITY' as task_type,\n", " 768 AS output_dimensionality\n", " )\n", " )\n", " ),\n", " top_k => {row_limit}\n", " ) )\n", " SELECT distance,\n", " target_customers.customer_id,\n", " cus.customer_name,\n", " cus.customer_email,\n", " cus.customer_yob,\n", " target_customers.customer_marketing_insights,\n", " target_customers.customer_profile_data,\n", " target_customers.customer_loyalty_data,\n", " target_customers.customer_segmentation_data\n", " FROM target_customers\n", " JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.customer` cus\n", " ON target_customers.customer_id = cus.customer_id\n", " ORDER BY distance ASC;\n", " \"\"\"\n", "\n", "result = RunQuery(sql)\n", "result" ] }, { "cell_type": "markdown", "metadata": { "id": "ASQ2BPisXDA0" }, "source": [ "## <font color='#4285f4'>Reference Links</font>\n" ] }, { "cell_type": "markdown", "metadata": { "id": "D2TgWfsR9uSr" }, "source": [ "- [Generate multimodal content with the Gemini Enterprise API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference)\n", "- [BigQuery JSON Functions](https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions)\n", "- [Controlled Generation with Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output)\n", "- [Working with JSON Data in BigQuery](https://cloud.google.com/bigquery/docs/json-data)\n", "- [Generate Embeddings in BigQuery with ML.GENERATE_EMBEDDING](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-generate-embedding)\n", "- [BigQuery Vector Indexes](https://cloud.google.com/bigquery/docs/vector-index)\n", "- [BigQuery VECTOR_SEARCH() Function](https://cloud.google.com/bigquery/docs/reference/standard-sql/search_functions#vector_search)" ] } ], "metadata": { "colab": { "collapsed_sections": [ "42IxhtRRrvR-" ], "name": 
"DB-GMA-Create-Campaign-Customer-Segmentation", "private_outputs": true, "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }