colab-enterprise/Synthetic-Data-Generation-Customers-Marketing-Profile.ipynb (1,692 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "SauhRkiN_Rq1" }, "source": [ "## <font color='#4285f4'>Overview</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "FTcqjwUs-pGC" }, "source": [ "High-quality datasets are essential but often difficult to find. This demo showcases how Gemini can generate the data needed, removing this common problem. The data includes raw such as demographics data, social media data, hobbies, etc. It then queries the customers past purchases and create a chocolate-ai profile. Gemini is then used to create “Marketing Insights” about the customer based upon the raw data and chocolate-ai profile. Finally, an English readable customer summary is created.\n", "\n", "Process Flow: \n", "1. Define Functions to Generate and Derive Profile Data:\n", " - GenerateCustomerMarketingProfile: Uses Gemini to create detailed customer profiles including fictional occupations, interests, social media activity, and purchasing habits.\n", " - GenerateCustomerSegments: Analyzes the generated profile data and assigns relevant customer segments based on demographic, geographic, psychographic, behavioral, and other segmentation categories.\n", " - GenerateMarketingInsights: Summarizes the key marketing insights for each customer based on the generated profile and segmentation data.\n", " - DeriveCustomerLoyalty: Queries the existing order and review data in BigQuery to derive customer loyalty metrics like total amount spent, average order value, favorite items, and review sentiment.\n", "2. Run these functions for each customer in the customer table, populating the customer_marketing_profile table with the generated and derived data.\n", "\n", "Authors:\n", "* Paul Ramsey\n", "* Adam Paternostro" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## <font color='#4285f4'>License</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "tfKslF-JBchh" }, "source": [ "```\n", "##################################################################################\n", "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "###################################################################################\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "CJMI6O0PBrR0" }, "source": [ "## <font color='#4285f4'>Initialize</font>\n", "\n", "Update the values of the variables below to match your environment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5kEbeA3APzI7" }, "outputs": [], "source": [ "# Update these variables to match your environment\n", "location=\"us-central1\" # Your region\n", "bigquery_location = \"${bigquery_location}\" # Must be \"us\" or \"eu\"\n", "\n", "### Do not change the values in this cell below this line ###\n", "project_id = !(gcloud config get-value project)\n", "user = !(gcloud auth list --filter=status:ACTIVE --format=\"value(account)\")\n", "\n", "if len(project_id) != 1:\n", " raise RuntimeError(f\"project_id is not set: {project_id}\")\n", "project_id = project_id[0]\n", "\n", "if len(user) != 1:\n", " raise RuntimeError(f\"user is not set: {user}\")\n", "user = user[0]\n", "\n", "print(f\"project_id = {project_id}\")\n", "print(f\"user = {user}\")\n", "print(f\"location = {location}\")\n", "print(f\"bigquery_location = {bigquery_location}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-sE7DTmMB645" }, "outputs": [], "source": [ "import google.auth\n", "import requests\n", "import json\n", "from tenacity import retry, wait_exponential, stop_after_attempt, before_sleep_log, retry_if_exception\n", "import logging\n", "\n", "from google.cloud import bigquery\n", "client = bigquery.Client()" ] }, { "cell_type": "markdown", "metadata": { "id": "dyU1rxRHB80C" }, "source": [ "## <font color='#4285f4'>Helper Functions</font>\n", "\n", "Run the cells below to initialize helper functions which are utilized throughout the notebook." ] }, { "cell_type": "markdown", "metadata": { "id": "mS7yiRQUCcUm" }, "source": [ "#### GeminiLLM (Pro 1.0 , Pro 1.5 and Pro Vision 1.0)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wbDCCgD6Ctoe" }, "outputs": [], "source": [ "def GeminiLLM(prompt, model = \"gemini-2.0-flash\", response_schema = None,\n", " temperature = 1, topP = 1, topK = 32):\n", "\n", " # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models\n", " # model = \"gemini-2.0-flash\"\n", "\n", " llm_response = None\n", " if temperature < 0:\n", " temperature = 0\n", "\n", " creds, project = google.auth.default()\n", " auth_req = google.auth.transport.requests.Request() # required to acess access token\n", " creds.refresh(auth_req)\n", " access_token=creds.token\n", "\n", " headers = {\n", " \"Content-Type\" : \"application/json\",\n", " \"Authorization\" : \"Bearer \" + access_token\n", " }\n", "\n", " # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference\n", " url = f\"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent\"\n", "\n", " generation_config = {\n", " \"temperature\": temperature,\n", " \"topP\": topP,\n", " \"maxOutputTokens\": 8192,\n", " \"candidateCount\": 1,\n", " \"responseMimeType\": \"application/json\",\n", " }\n", "\n", " # Add inthe response schema for when it is provided\n", " if response_schema is not None:\n", " generation_config[\"responseSchema\"] = response_schema\n", "\n", " if model == \"gemini-2.0-flash\":\n", " generation_config[\"topK\"] = topK\n", "\n", " payload = {\n", " \"contents\": {\n", " \"role\": \"user\",\n", " \"parts\": {\n", " \"text\": prompt\n", " },\n", " },\n", " \"generation_config\": {\n", " **generation_config\n", " },\n", " \"safety_settings\": {\n", " \"category\": \"HARM_CATEGORY_SEXUALLY_EXPLICIT\",\n", " \"threshold\": \"BLOCK_LOW_AND_ABOVE\"\n", " }\n", " }\n", "\n", " response = requests.post(url, json=payload, headers=headers)\n", "\n", " if response.status_code == 200:\n", " try:\n", " json_response = json.loads(response.content)\n", " except Exception as error:\n", " raise RuntimeError(f\"An error occurred parsing the JSON: {error}\")\n", "\n", " if \"candidates\" in json_response:\n", " candidates = json_response[\"candidates\"]\n", " if len(candidates) > 0:\n", " candidate = candidates[0]\n", " if \"content\" in candidate:\n", " content = candidate[\"content\"]\n", " if \"parts\" in content:\n", " parts = content[\"parts\"]\n", " if len(parts):\n", " part = parts[0]\n", " if \"text\" in part:\n", " text = part[\"text\"]\n", " llm_response = text\n", " else:\n", " raise RuntimeError(\"No text in part: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No parts in content: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No parts in content: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No content in candidate: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No candidates: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No candidates: {response.content}\")\n", "\n", " # Remove some typically response characters (if asking for a JSON reply)\n", " llm_response = llm_response.replace(\"```json\",\"\")\n", " llm_response = llm_response.replace(\"```\",\"\")\n", " llm_response = llm_response.replace(\"\\n\",\"\")\n", "\n", " return llm_response\n", "\n", " else:\n", " raise RuntimeError(f\"Error with prompt:'{prompt}' Status:'{response.status_code}' Text:'{response.text}'\")" ] }, { "cell_type": "markdown", "metadata": { "id": "H1Smj0TQCSHb" }, "source": [ "#### RunQuery(sql)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YjeJLVb93Kdo" }, "outputs": [], "source": [ "def RunQuery(sql, job_config = None):\n", " import time\n", "\n", " if (sql.startswith(\"SELECT\") or sql.startswith(\"WITH\")):\n", " df_result = client.query(sql).to_dataframe()\n", " return df_result\n", " else:\n", " if job_config == None:\n", " job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)\n", " query_job = client.query(sql, job_config=job_config)\n", "\n", " # Check on the progress by getting the job's updated state.\n", " query_job = client.get_job(\n", " query_job.job_id, location=query_job.location\n", " )\n", " print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", " while query_job.state != \"DONE\":\n", " time.sleep(2)\n", " query_job = client.get_job(\n", " query_job.job_id, location=query_job.location\n", " )\n", " print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", " if query_job.error_result == None:\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "id": "CXfSLPJFcppR" }, "source": [ "#### PrettyPrintJson(json_string)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8L2QPmLx4ed-" }, "outputs": [], "source": [ "def PrettyPrintJson(json_string):\n", " json_object = json.loads(json_string)\n", " json_formatted_str = json.dumps(json_object, indent=2)\n", " print(json_formatted_str)\n", " return json.dumps(json_object)" ] }, { "cell_type": "markdown", "metadata": { "id": "of5C2daT9eG0" }, "source": [ "#### RetryCondition(error)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "MDSFROqz9h-5" }, "outputs": [], "source": [ "def RetryCondition(error):\n", " error_string = str(error)\n", " print(error_string)\n", "\n", " retry_errors = [\n", " \"RESOURCE_EXHAUSTED\",\n", " \"No content in candidate\",\n", " # Add more error messages here as needed\n", " ]\n", "\n", " for retry_error in retry_errors:\n", " if retry_error in error_string:\n", " print(\"Retrying...\")\n", " return True\n", "\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "id": "gOE4wpX1LYWV" }, "source": [ "#### restAPIHelper()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "U5WwcRAyvkN7" }, "outputs": [], "source": [ "# Rest API Helper Function\n", "def restAPIHelper(url: str, http_verb: str, request_body: str) -> str:\n", " \"\"\"Calls the Google Cloud REST API passing in the current users credentials\"\"\"\n", "\n", " import requests\n", " import google.auth\n", " import json\n", "\n", " # Get an access token based upon the current user\n", " creds, project = google.auth.default()\n", " auth_req = google.auth.transport.requests.Request()\n", " creds.refresh(auth_req)\n", " access_token=creds.token\n", "\n", " headers = {\n", " \"Content-Type\" : \"application/json\",\n", " \"Authorization\" : \"Bearer \" + access_token\n", " }\n", "\n", " if http_verb == \"GET\":\n", " response = requests.get(url, headers=headers)\n", " elif http_verb == \"POST\":\n", " response = requests.post(url, json=request_body, headers=headers)\n", " elif http_verb == \"PUT\":\n", " response = requests.put(url, json=request_body, headers=headers)\n", " elif http_verb == \"PATCH\":\n", " response = requests.patch(url, json=request_body, headers=headers)\n", " elif http_verb == \"DELETE\":\n", " response = requests.delete(url, headers=headers)\n", " else:\n", " raise RuntimeError(f\"Unknown HTTP verb: {http_verb}\")\n", "\n", " if response.status_code == 200:\n", " return json.loads(response.content)\n", " #image_data = json.loads(response.content)[\"predictions\"][0][\"bytesBase64Encoded\"]\n", " else:\n", " error = f\"Error restAPIHelper -> ' Status: '{response.status_code}' Text: '{response.text}'\"\n", " raise RuntimeError(error)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "cwEZEq0luJKl" }, "source": [ "# <font color='#4285f4'>Generate Customer Marketing Profile Data</font>\n", "\n", "Run the cells below to start generative and deriving the data that will be stored in the `customer_marketing_profile` table. The notebook generates and derives customer profile data for each customer in the `customer` table, which can be used to better understand their preferences, ordering history, general sentiment, and other information useful to marketers when building marketing campaigns. The code stores this data in the `customer_marketing_profile` table, which is defined as follows:\n", "\n", "```sql\n", "CREATE TABLE IF NOT EXISTS `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile`\n", "(\n", " customer_id INTEGER NOT NULL OPTIONS(description=\"Primary key.\"),\n", " customer_profile_data JSON OPTIONS(description=\"The raw data we know about a customer.\"),\n", " customer_profile_data_embedding ARRAY<FLOAT64> OPTIONS(description=\"Vector embedding of the customer_profile_data column.\"),\n", " customer_loyalty_data JSON OPTIONS(description=\"Data about the customer's purchases and reviews in JSON format.\"),\n", " customer_loyalty_data_embedding ARRAY<FLOAT64> OPTIONS(description=\"Vector embedding of the customer_loyalty_data column.\"),\n", " customer_segmentation_data JSON OPTIONS(description=\"The generated customer segmentation data in JSON format.\"),\n", " customer_segmentation_data_embedding ARRAY<FLOAT64> OPTIONS(description=\"Vector embedding of the customer_segmentation_data column.\"),\n", " customer_marketing_insights STRING OPTIONS(description=\"Generated text summary of customer_profile_data, customer_loyalty_data, and customer_segmentation_data.\"),\n", " customer_marketing_insights_embedding ARRAY<FLOAT64> OPTIONS(description=\"Vector embedding of the customer_marketing_insights column.\")\n", ")\n", "CLUSTER BY customer_id;\n", "```\n" ] }, { "cell_type": "markdown", "metadata": { "id": "7LjVBgIFdbcg" }, "source": [ "### Step 1: Create customer_marketing_profile table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "bDcZCQ3Vz2x4" }, "outputs": [], "source": [ "%%bigquery\n", "\n", "--DROP TABLE `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile`;\n", "\n", "CREATE TABLE IF NOT EXISTS `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile`\n", "(\n", " customer_id INTEGER NOT NULL OPTIONS(description=\"Primary key.\"),\n", " customer_profile_data JSON OPTIONS(description=\"The raw data we know about a customer.\"),\n", " customer_profile_data_embedding ARRAY<FLOAT64> OPTIONS(description=\"Vector embedding of the customer_profile_data column.\"),\n", " customer_loyalty_data JSON OPTIONS(description=\"Data about the customer's purchases and reviews in JSON format.\"),\n", " customer_loyalty_data_embedding ARRAY<FLOAT64> OPTIONS(description=\"Vector embedding of the customer_loyalty_data column.\"),\n", " customer_segmentation_data JSON OPTIONS(description=\"The generated customer segmentation data in JSON format.\"),\n", " customer_segmentation_data_embedding ARRAY<FLOAT64> OPTIONS(description=\"Vector embedding of the customer_segmentation_data column.\"),\n", " customer_marketing_insights STRING OPTIONS(description=\"Generated text summary of customer_profile_data, customer_loyalty_data, and customer_segmentation_data.\"),\n", " customer_marketing_insights_embedding ARRAY<FLOAT64> OPTIONS(description=\"Vector embedding of the customer_marketing_insights column.\")\n", ")\n", "CLUSTER BY customer_id;" ] }, { "cell_type": "markdown", "metadata": { "id": "v9EttoRjdn5d" }, "source": [ "### Step 2: Define Generator and Derivation Functions for Profile Data" ] }, { "cell_type": "markdown", "metadata": { "id": "H2OrC22mdyAy" }, "source": [ "#### Generator Function for `customer_profile_data`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "--zcku90GEpn" }, "outputs": [], "source": [ "@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))\n", "def GenerateCustomerMarketingProfile(customer_id, customer_name, customer_yob, customer_inception_date, country_code):\n", "\n", " import random\n", " import datetime\n", "\n", " # In the United States, estimates suggest that around 6% of adults and nearly 8% of children (or 1 in 13 children) have food allergies\n", " dietary_preferences_random_number = random.randint(1, 100)\n", " customer_age = datetime.datetime.now().year - customer_yob\n", "\n", " employeement_status_array = [\"Full-time\", \"Part-time\", \"Contract Worker\", \"Freelancer\", \"Gig Worker\"]\n", " job_skill_level_array = [\"Professional\", \"Paraprofessional\", \"Skilled Trades\", \"Unskilled Labor\", \"Blue Collar\", \"White Collar\"]\n", " other_categories_array = [\"Management\", \"Sales\", \"Executive\", \"Entrepreneur\", \"Volunteer\", \"Professor\", \"Doctor\"]\n", "\n", " # Generate random job profile\n", " job_profile = random.choice(employeement_status_array) + \", \" + random.choice(job_skill_level_array)\n", "\n", " if random.randint(1, 100) < 10:\n", " job_profile = job_profile + \", \" + random.choice(other_categories_array)\n", "\n", " print (f\"customer_id: {customer_id}\")\n", " print (f\"customer_name: {customer_name}\")\n", " print (f\"customer_yob: {customer_yob}\")\n", " print (f\"customer_inception_date: {customer_inception_date}\")\n", " print (f\"country_code: {country_code}\")\n", " print (f\"allergy_randomdietary_preferences_random_number_number: {dietary_preferences_random_number}\")\n", " print (f\"customer_age: {customer_age}\")\n", " print (f\"job_profile: {job_profile}\")\n", "\n", " prompt = f\"\"\"You are an inventive storyteller, able to weave captivating narratives from data. Imagine you're writing a character sketch for a \"Chocolate AI\" chocolate enthusiast, breathing life into their profile.\n", "\n", " Here's the basic framework:\n", " Customer ID: {customer_id}\n", " Customer Name: {customer_name}\n", " Customer Age: {customer_age}\n", " Customer Inception Date: {customer_inception_date}\n", " Customer Country Code: {country_code}\n", " Customer Job Profile: {job_profile}\n", "\n", " Now, let your imagination run wild! Fill in the following, making them unexpected yet plausible:\n", "\n", " Occupation: Anything from an enigmatic \"none\" to a passionate student or a seasoned professional in any field. It should relate to their job profile of \"{job_profile}\".\n", " Chocolate Preferences: The chocolate they savor, the time of day they indulge. Think beyond the ordinary - perhaps they're a connoisseur of rare cacao beans, a devotee of single-origin bars, or a master of creating unique chocolate pairings. Do they have a favorite chocolatier or a preferred level of cacao intensity?\n", " Education: This is the highest level of education they've attained. It's a testament to their dedication to learning and their ability to excel. It should relate to their job profile of \"{job_profile}\".\n", " Marital Status: One of these: Single, Married, Divorced, Widowed, Separated, Civil union, Domestic partnership, Common-law marriage.\n", " Interests: Hobbies and passions that reveal their personality. Let their occupation, education, and marital status guide you, but surprise us with unexpected twists. Perhaps they collect antique chocolate molds, create chocolate-themed art, or are involved in fair-trade cacao initiatives.\n", " Lifestyle: Paint a picture of their life with a few vivid words. Are they a jet-setting gourmand, a homebody baker, or a social media influencer sharing their chocolate creations?\n", " Dietary Preferences: Any food allergies or dietary restrictions they might have, adding a touch of realism to their profile. Only generate if this number is less than 10: {dietary_preferences_random_number}\n", " Content Interaction: Engagement with social media posts, email clicks, video views (shows content preferences and brand affinity, particularly towards chocolate brands, recipes, and trends).\n", " Customer Service Interactions: Contact reasons (e.g., inquiries about chocolate origins, delivery issues, feedback on new flavors), sentiment analysis of interactions, resolution time (highlights pain points and customer satisfaction levels).\n", " Sports: Any sports they've played in or sporting teams they like to follow. Pick sports that make sense based upon the country: {country_code}.\n", " Chocolate Buying Habits: When do they buy chocolate? Do they buy it for special occasions, everyday, as gifts, or to satisfy a craving? Do they have a preferred method of purchase (online, local chocolatier, supermarket)?\n", " Children: Do they have kids? Yes or No.\n", " Twitter Handle: Generate a Twitter handle that is unique to this customer, potentially reflecting their love of chocolate.\n", " Twitter Bio: A concise and engaging bio text (maximum character limit of 250) that reflects the user's interests and passions, including their love for chocolate.\n", " Twitter Engagement: (e.g., \"Active,\" \"Passive,\" \"Rarely Uses\") Are they influenced by chocolate trends or online deals?\n", " LinkedIn Handle: Generate a LinkedIn handle that is unique to this customer.\n", " LinkedIn Bio: A concise and engaging bio text (maximum character limit of 1000) that reflects the user's interests and passions, potentially mentioning their love of chocolate in a professional context.\n", " LinkedIn Engagement: (e.g., \"Active,\" \"Passive,\" \"Rarely Uses\")\n", " Facebook Handle: Generate a Facebook handle that is unique to this customer.\n", " Facebook Bio: A concise and engaging bio text (maximum character limit of 1000) that reflects the user's interests and passions, including their love for chocolate.\n", " Facebook Engagement: (e.g., \"Active,\" \"Passive,\" \"Rarely Uses\") Are they members of any chocolate-related groups?\n", " Instagram Handle: Generate an Instagram handle that is unique to this customer, likely related to chocolate.\n", " Instagram Bio: A concise and engaging bio text (maximum character limit of 500) that reflects the user's interests and passions, showcasing their love for chocolate.\n", " Instagram Engagement: (e.g., \"Active,\" \"Passive,\" \"Rarely Uses\") Do they actively post about their chocolate experiences?\n", " TikTok Handle: Generate a TikTok handle that is unique to this customer.\n", " TikTok Bio: A concise and engaging bio text (maximum character limit of 250) that reflects the user's interests and passions, potentially including their love of chocolate.\n", " TikTok Engagement: (e.g., \"Active,\" \"Passive,\" \"Rarely Uses\") Do they create or engage with chocolate-related content?\n", " YouTube Handle: Generate a YouTube handle that is unique to this customer.\n", " YouTube Bio: A concise and engaging bio text (maximum character limit of 500) that reflects the user's interests and passions, potentially mentioning their love of chocolate.\n", " YouTube Engagement: (e.g., \"Active,\" \"Passive,\" \"Rarely Uses\") Do they watch videos about chocolate making, reviews, or recipes?\n", "\n", " Remember, every customer is unique. Craft profiles that are rich in detail, hinting at hidden depths and untold stories. Let's make these \"Chocolate AI\" customers come alive!\n", " Think outside the box.\n", "\n", " Other instructions:\n", " You can leave array's empty instead of \"null\".\n", " You can leave string's empty instead of \"null\".\n", " You can leave number's 0 instead of \"null\".\n", " There is no need to populate all the Twitter, LinkedIn, Facebook, Instagram, TikTok, and YouTube fields.\n", " Realistically we might only have a few of the social media data.\n", " The social media data does not have to pertain to chocolate, it can be anything. Craft a profile rich in detail and let the imagination run wild.\n", " Base the social media data on the customer ({customer_age}) and the country ({country_code}).\n", " If you have a handle for a social media account you should then have the bio data.\n", " - Example: If you have a handle for a Twitter account, you should have the bio data.\n", " - Example: If you have a handle for a LinkedIn account, you should have the bio data.\n", " If you have a handle for a social media account you should then have the engagement data.\n", " - Example: If you have a handle for a Twitter account, you should have the engagement data.\n", " - Example: If you have a handle for a LinkedIn account, you should have the engagement data.\n", " \"\"\"\n", "\n", " # Python (not REST API): https://ai.google.dev/gemini-api/docs/json-mode?lang=python\n", " # REST API (no pip installs): https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output#generativeaionvertexai_gemini_controlled_generation_response_schema-drest\n", " # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#request\n", " # https://cloud.google.com/vertex-ai/docs/reference/rest/v1/GenerationConfig\n", " # https://cloud.google.com/vertex-ai/docs/reference/rest/v1/Schema\n", " # NOTE: No need to type this by hand, use gemini and prompt it.\n", " # PROMPT: Write me the json in OpenAPI 3.0 schema object for this object:\n", " \"\"\"\n", " Write me the json in  OpenAPI 3.0 schema object for the below object.\n", " Make all fields required.\n", " {\n", " \"customer_id\" : 1,\n", " \"customer_age\" : 2,\n", " \"occupation\" : \"string\",\n", " \"chocolate_preferences\" : [\"string\"],\n", " \"education\" : \"string\",\n", " \"martial_status\" : \"string\",\n", " \"interests\" : [\"string\"],\n", " \"lifestyle\" : [\"string\"],\n", " \"dietary_preferences\" : [\"string\"],\n", " \"content_interaction\" : [\"string\"],\n", " \"customer_service_interactions\": [\n", " {\n", " \"contact_reason\": \"Order status inquiry\",\n", " \"sentiment_analysis\": \"Neutral\",\n", " \"resolution_time\": \"hours\"\n", " }\n", " ],\n", " \"sports_teams\": [\"string\"],\n", " \"solicated_buying_habits\": [\"string\"],\n", " \"children\": \"string\",\n", " \"twitter_handle\": \"string\",\n", " \"twitter_bio\": \"string\",\n", " \"twitter_engagement\": \"string\",\n", " \"linkedin_handle\": \"string\",\n", " \"linkedin_bio\": \"string\",\n", " \"linkedin_engagement\": \"string\",\n", " \"facebook_handle\": \"string\",\n", " \"facebook_bio\": \"string\",\n", " \"facebook_engagement\": \"string\",\n", " \"instagram_handle\": \"string\",\n", " \"instagram_bio\": \"string\",\n", " \"instagram_engagement\": \"string\",\n", " \"tiktok_handle\": \"string\",\n", " \"tiktok_bio\": \"string\",\n", " \"tiktok_engagement\": \"string\",\n", " \"youtube_handle\": \"string\",\n", " \"youtube_bio\": \"string\",\n", " \"youtube_engagement\": \"string\"\n", " }\n", " \"\"\"\n", " response_schema = {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"customer_id\": {\n", " \"type\": \"integer\",\n", " \"description\": \"Unique identifier for the customer\"\n", " },\n", " \"customer_age\": {\n", " \"type\": \"integer\",\n", " \"description\": \"Age of the customer\"\n", " },\n", " \"occupation\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's occupation\"\n", " },\n", " \"chocolate_preferences\": {\n", " \"type\": \"array\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"choclate preference\"\n", " },\n", " \"description\": \"List of customer's chocolate preferences\"\n", " },\n", " \"education\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's highest level of education (e.g., degree)\"\n", " },\n", " \"martial_status\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's marital status (e.g., married, single)\"\n", " },\n", " \"interests\": {\n", " \"type\": \"array\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's interest (e.g., hobbies, activities)\"\n", " },\n", " \"description\": \"List of customer's interests\"\n", " },\n", " \"lifestyle\": {\n", " \"type\": \"array\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's lifestyle category (e.g., profession, travel habits)\"\n", " },\n", " \"description\": \"List of customer's lifestyle categories\"\n", " },\n", " \"dietary_preferences\": {\n", " \"type\": \"array\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's dietary preferences (e.g., food allergies)\"\n", " },\n", " \"description\": \"List of customer's dietary preferences\"\n", " },\n", " \"content_interaction\": {\n", " \"type\": \"array\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's content interaction details (e.g., engagement, email behavior, video viewing)\"\n", " },\n", " \"description\": \"Details about customer's content interaction\"\n", " },\n", " \"customer_service_interactions\": {\n", " \"type\": \"array\",\n", " \"items\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"contact_reason\": {\n", " \"type\": \"string\",\n", " \"description\": \"Reason for contacting customer service\"\n", " },\n", " \"sentiment_analysis\": {\n", " \"type\": \"string\",\n", " \"description\": \"Sentiment analysis of the customer service interaction (e.g., positive, neutral, negative)\"\n", " },\n", " \"resolution_time\": {\n", " \"type\": \"string\",\n", " \"description\": \"Time taken to resolve the customer service interaction\"\n", " }\n", " },\n", " \"required\": [\n", " \"contact_reason\",\n", " \"sentiment_analysis\",\n", " \"resolution_time\"\n", " ]\n", " },\n", " \"description\": \"Customer's past interactions with customer service\"\n", " },\n", " \"sports\": {\n", " \"type\": \"array\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's favorite sports teams\"\n", " },\n", " \"description\": \"Details about what sports they follow or like.\"\n", " },\n", " \"solicated_buying_habits\": {\n", " \"type\": \"array\",\n", " \"items\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's favorite reasons for purchaing chocolate\"\n", " },\n", " \"description\": \"Details about when they like to buy chocolate\"\n", " },\n", " \"children\": {\n", " \"type\": \"string\",\n", " \"description\": \"Does the customer have children?\"\n", " },\n", " \"twitter_handle\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's Twitter handle\"\n", " },\n", " \"twitter_bio\": {\n", " \"type\": \"string\",\n", " \"description\": \"A generated Twitter bio\"\n", " },\n", " \"twitter_engagement\": {\n", " \"type\": \"string\",\n", " \"description\": \"Active, Passive, or Rarely Uses\"\n", " },\n", " \"linkedin_handle\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's LinkedIn handle\"\n", " },\n", " \"linkedin_bio\": {\n", " \"type\": \"string\",\n", " \"description\": \"A generated LinkedIn bio\"\n", " },\n", " \"linkedin_engagement\": {\n", " \"type\": \"string\",\n", " \"description\": \"Active, Passive, or Rarely Uses\"\n", " },\n", " \"facebook_handle\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's Facebook handle\"\n", " },\n", " \"facebook_bio\": {\n", " \"type\": \"string\",\n", " \"description\": \"A generated Facebook bio\"\n", " },\n", " \"facebook_engagement\": {\n", " \"type\": \"string\",\n", " \"description\": \"Active, Passive, or Rarely Uses\"\n", " },\n", " \"instagram_handle\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's Instagram handle\"\n", " },\n", " \"instagram_bio\": {\n", " \"type\": \"string\",\n", " \"description\": \"A generated Instagram bio\"\n", " },\n", " \"instagram_engagement\": {\n", " \"type\": \"string\",\n", " \"description\": \"Active, Passive, or Rarely Uses\"\n", " },\n", " \"tiktok_handle\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's TikTok handle\"\n", " },\n", " \"tiktok_bio\": {\n", " \"type\": \"string\",\n", " \"description\": \"A generated TikTok bio\"\n", " },\n", " \"tiktok_engagement\": {\n", " \"type\": \"string\",\n", " \"description\": \"Active, Passive, or Rarely Uses\"\n", " },\n", " \"youtube_handle\": {\n", " \"type\": \"string\",\n", " \"description\": \"Customer's YouTube handle\"\n", " },\n", " \"youtube_bio\": {\n", " \"type\": \"string\",\n", " \"description\": \"A generated YouTube bio\"\n", " },\n", " \"youtube_engagement\": {\n", " \"type\": \"string\",\n", " \"description\": \"Active, Passive, or Rarely Uses\"\n", " }\n", " },\n", " \"required\": [\n", " \"customer_id\",\n", " \"customer_age\",\n", " \"occupation\",\n", " \"chocolate_preferences\",\n", " \"education\",\n", " \"martial_status\",\n", " \"interests\",\n", " \"lifestyle\",\n", " \"dietary_preferences\",\n", " \"content_interaction\",\n", " \"customer_service_interactions\",\n", " \"sports\",\n", " \"solicated_buying_habits\",\n", " \"children\",\n", " \"twitter_handle\",\n", " \"twitter_bio\",\n", " \"twitter_engagement\",\n", " \"linkedin_handle\",\n", " \"linkedin_bio\",\n", " \"facebook_handle\",\n", " \"facebook_bio\",\n", " \"instagram_handle\",\n", " \"instagram_bio\",\n", " \"tiktok_handle\",\n", " \"tiktok_bio\",\n", " \"youtube_handle\",\n", " \"youtube_bio\"\n", " ]\n", " }\n", "\n", " result = GeminiLLM(prompt, response_schema=response_schema, temperature=1)\n", " return result\n" ] }, { "cell_type": "markdown", "metadata": { "id": "27d6d81rgt0C" }, "source": [ "#### Generator Function for `customer_segmentation_data`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "UpYBCGuzvRzh" }, "outputs": [], "source": [ "# Using tenacity to retry in case of resource exhausted errors (429)\n", "@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))\n", "def GenerateCustomerSegments(customer_id, customer_profile_data, customer_loyalty_data):\n", " # For each row in customer_marketing_profile_fix, send this prompt:\n", " prompt = \"\"\"\n", " You will be given two JSON objects containing customer profile information:\n", " customer_profile_data and customer_loyalty_data. Your task is to analyze\n", " this data and assign relevant customer segments to the customer based on the\n", " following JSON schema:\n", "\n", " {\n", " \"customer_segments\": {\n", " \"demographic_segmentation\": {\n", " \"subsegments\": {\n", " \"Age\": [\"0-17\", \"18-24\", \"25-34\", \"35-44\", \"45-54\", \"55-64\", \"65+\"],\n", " \"Gender\": [\"Male\", \"Female\", \"Non-binary\", \"Prefer not to say\"],\n", " \"Income\": [\"<$25k\", \"$25k-$50k\", \"$50k-$75k\", \"$75k-$100k\", \"$100k-$150k\", \"$150k+\"],\n", " \"Education\": [\"Less than High School\", \"High School Graduate\", \"Some College\", \"Associate's Degree\", \"Bachelor's Degree\", \"Master's Degree\", \"Doctorate\"],\n", " \"Marital Status\": [\"Single\", \"Married\", \"Divorced\", \"Widowed\", \"Domestic Partnership\"],\n", " \"Occupation\": [\"Professional\", \"Managerial\", \"Technical\", \"Sales\", \"Service\", \"Administrative\", \"Production\", \"Retired\", \"Student\", \"Unemployed\"],\n", " \"Family Size\": [\"1\", \"2\", \"3\", \"4\", \"5\", \"6+\"],\n", " \"Ethnicity\": [\"White\", \"Black or African American\", \"Hispanic or Latino\", \"Asian\", \"Native American or Alaska Native\", \"Native Hawaiian or Other Pacific Islander\", \"Two or More Races\"],\n", " \"Generation\": [\"Baby Boomer\", \"Generation X\", \"Millennial\", \"Generation Z\"],\n", " \"Language\": [\"English\", \"Spanish\", \"Chinese\", \"French\", \"German\", \"Arabic\", \"Hindi\", \"Portuguese\", \"Russian\", \"Japanese\"]\n", " }\n", " },\n", " \"geographic_segmentation\": {\n", " \"subsegments\": {\n", " \"Country\": [\"United States\", \"Canada\", \"Mexico\", \"United Kingdom\", \"Germany\", \"France\", \"China\", \"India\", \"Brazil\", \"Japan\"],\n", " \"Region\": [\"Northeast\", \"Midwest\", \"South\", \"West\"],\n", " \"City\": [\"New York\", \"Los Angeles\", \"Chicago\", \"Houston\", \"Philadelphia\", \"Phoenix\", \"San Antonio\", \"San Diego\", \"Dallas\", \"San Jose\"],\n", " \"Climate\": [\"Tropical\", \"Subtropical\", \"Temperate\", \"Continental\", \"Polar\"],\n", " \"Population Density\": [\"High\", \"Medium\", \"Low\"],\n", " \"Urban/Rural\": [\"Urban\", \"Suburban\", \"Rural\"],\n", " \"Time Zone\": [\"EST\", \"CST\", \"MST\", \"PST\"]\n", " }\n", " },\n", " \"psychographic_segmentation\": {\n", " \"subsegments\": {\n", " \"Personality\": [\"Openness\", \"Conscientiousness\", \"Extraversion\", \"Agreeableness\", \"Neuroticism\"],\n", " \"Lifestyle\": [\"Active\", \"Sedentary\", \"Family-oriented\", \"Career-driven\", \"Social\"],\n", " \"Interests\": [\"Sports\", \"Travel\", \"Fashion\", \"Technology\", \"Food\", \"Music\", \"Art\", \"Reading\", \"Gaming\", \"Outdoor Activities\"],\n", " \"Values\": [\"Achievement\", \"Security\", \"Self-Direction\", \"Stimulation\", \"Hedonism\", \"Power\", \"Tradition\", \"Conformity\", \"Benevolence\", \"Universalism\"],\n", " \"Attitudes\": [\"Positive\", \"Negative\", \"Neutral\"],\n", " \"Hobbies\": [\"Cooking\", \"Gardening\", \"Photography\", \"DIY\", \"Collecting\", \"Volunteering\", \"Writing\", \"Music\", \"Sports\", \"Gaming\"],\n", " \"Social Class\": [\"Upper Class\", \"Upper Middle Class\", \"Lower Middle Class\", \"Working Class\", \"Lower Class\"],\n", " \"Motivations\": [\"Intrinsic\", \"Extrinsic\"]\n", " }\n", " },\n", " \"behavioral_segmentation\": {\n", " \"subsegments\": {\n", " \"Purchase History\": [\"High Spender\", \"Medium Spender\", \"Low Spender\", \"Frequent Buyer\", \"Occasional Buyer\", \"New Buyer\", \"Repeat Buyer\", \"Loyal Customer\", \"Price Sensitive\", \"Brand Loyal\"],\n", " \"Browsing Behavior\": [\"Long Sessions\", \"Short Sessions\", \"Frequent Visits\", \"Occasional Visits\", \"Mobile User\", \"Desktop User\", \"High Engagement\", \"Low Engagement\", \"Product Focused\", \"Content Focused\"],\n", " \"Loyalty Status\": [\"Loyal\", \"Repeat\", \"New\", \"At-Risk\", \"Inactive\"],\n", " \"Usage Frequency\": [\"Heavy User\", \"Moderate User\", \"Light User\"],\n", " \"Benefits Sought\": [\"Quality\", \"Price\", \"Convenience\", \"Service\", \"Style\", \"Status\", \"Innovation\", \"Sustainability\", \"Experience\", \"Community\"],\n", " \"Occasion/Timing\": [\"Holiday\", \"Birthday\", \"Anniversary\", \"Seasonal\", \"Weekend\", \"Weekday\", \"Morning\", \"Afternoon\", \"Evening\"],\n", " \"User Status\": [\"Active\", \"Inactive\", \"Lapsed\", \"Potential\", \"New\"],\n", " \"Spending Habits\": [\"High\", \"Medium\", \"Low\"]\n", " }\n", " },\n", " \"technographic_segmentation\": {\n", " \"subsegments\": {\n", " \"Devices\": [\"Smartphone\", \"Tablet\", \"Laptop\", \"Desktop\", \"Smart TV\", \"Wearable\", \"Gaming Console\", \"Smart Home Device\"],\n", " \"Operating Systems\": [\"iOS\", \"Android\", \"Windows\", \"macOS\", \"Linux\"],\n", " \"Browsers\": [\"Chrome\", \"Safari\", \"Firefox\", \"Edge\", \"Opera\"],\n", " \"Software\": [\"Microsoft Office\", \"Adobe Creative Cloud\", \"Antivirus\", \"Productivity Apps\", \"Gaming Software\"],\n", " \"Social Media Platforms\": [\"Facebook\", \"Instagram\", \"Twitter\", \"LinkedIn\", \"TikTok\", \"Snapchat\", \"Pinterest\", \"YouTube\"],\n", " \"Internet Connectivity\": [\"Broadband\", \"Mobile\", \"Dial-up\", \"Satellite\"],\n", " \"Tech Savviness\": [\"Early Adopter\", \"Mainstream\", \"Laggard\"],\n", " \"Adoption Rate\": [\"High\", \"Medium\", \"Low\"]\n", " }\n", " },\n", " \"needs_based_segmentation\": {\n", " \"subsegments\": {\n", " \"Specific Needs\": [\"Quick & Convenient Chocolate\", \"Variety of Chocolate Options\", \"Fresh Baked Chocolate\", \"Specialty Chocolates\", \"Pastries & Snacks\", \"Mobile Ordering & Payment\", \"Loyalty Program\", \"Outdoor Seating\", \"Catering Services\"],\n", " \"Pain Points\": [\"Long Lines at Traditional Chocolate Shops\", \"Limited Chocolate Options\", \"Inconsistent Chocolate Quality\", \"High Prices\", \"Limited Accessibility\", \"Inconvenient Locations\", \"Lack of Mobile Ordering\", \"No Loyalty Program\"],\n", " \"Challenges\": [\"Finding Quality Chocolate and Coffee on the Go\", \"Limited Time in the Morning\", \"Dietary Restrictions\", \"Budget Constraints\", \"Lack of Nearby Chocolate Options\"],\n", " \"Goals\": [\"Enjoy Delicious Chocolate Anytime, Anywhere\", \"Save Time & Money\", \"Discover New Chocolate Flavors\", \"Support Local Businesses\", \"Socialize & Connect with Others\"],\n", " \"Priorities\": [\"Convenience\", \"Quality\", \"Affordability\", \"Variety\", \"Sustainability\"]\n", " }\n", " },\n", " \"value_based_segmentation\": {\n", " \"subsegments\": {\n", " \"Perceived Value\": [\"High\", \"Medium\", \"Low\"],\n", " \"Price Sensitivity\": [\"High\", \"Medium\", \"Low\"],\n", " \"Willingness to Pay\": [\"High\", \"Medium\", \"Low\"],\n", " \"Cost-Benefit Analysis\": [\"Value-Driven\", \"Price-Driven\"]\n", " }\n", " },\n", " \"customer_lifecycle_segmentation\": {\n", " \"subsegments\": {\n", " \"New Leads\": [\"Subscribed to Newsletter\", \"Followed on Social Media\", \"Visited Website\", \"Downloaded App\"],\n", " \"Potential Customers\": [\"Inquired About Services\", \"Requested a Quote\", \"Visited Mobile Shop Location\"],\n", " \"First-Time Customers\": [\"Placed First Order\", \"Tried One Product\"],\n", " \"Repeat Customers\": [\"Placed Multiple Orders\", \"Tried Multiple Products\"],\n", " \"Loyal Advocates\": [\"Regularly Purchases\", \"Refers Friends & Family\", \"Leaves Positive Reviews\", \"Engages on Social Media\"],\n", " \"At-Risk Customers\": [\"Decreased Purchase Frequency\", \"Expressed Dissatisfaction\", \"Unengaged with Brand\"],\n", " \"Former Customers\": [\"Stopped Purchasing\", \"Switched to Competitor\", \"Unsubscribed from Communications\"],\n", " \"Inactive Customers\": [\"Hasn't Purchased in a While\", \"Doesn't Engage with Brand\"]\n", " }\n", " }\n", " }\n", " }\n", "\n", "\n", " \"\"\"\n", "\n", " prompt = prompt + f\"\"\"\n", " Here is the customer's customer_profile_data:\n", " {customer_profile_data}\n", "\n", " Here is the customer's customer_loyalty_data:\n", " {customer_loyalty_data}\n", "\n", "\n", " Additional instructions:\n", " - Do not omit any segments or subsegments from your response.\n", " - Avoid assigning null values as much as possible.\n", " - Results can be creative, but they must be plausible.\n", "\n", " Now assign relevant customer segments to the customer. Think step by step and explain your reasoning.\n", "\n", " \"\"\"\n", "\n", " # Ref: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/Schema\n", " response_schema = {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"customer_segments\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"demographic_segmentation\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"Age\": { \"type\": \"string\", \"description\": \"Age of the customer\", \"enum\": [\"0-17\", \"18-24\", \"25-34\", \"35-44\", \"45-54\", \"55-64\", \"65+\"] },\n", " \"Gender\": { \"type\": \"string\"},\n", " \"Income\": { \"type\": \"string\", \"description\": \"Estimated income range of the customer\", \"enum\": [\"<$25k\", \"$25k-$50k\", \"$50k-$75k\", \"$75k-$100k\", \"$100k-$150k\", \"$150k+\"] },\n", " \"Education\": { \"type\": \"string\" },\n", " \"Marital Status\": { \"type\": \"string\" },\n", " \"Occupation\": { \"type\": \"string\" },\n", " \"Family Size\": { \"type\": \"string\" },\n", " \"Ethnicity\": { \"type\": \"string\" },\n", " \"Generation\": { \"type\": \"string\" },\n", " \"Language\": { \"type\": \"string\" }\n", " },\n", " \"required\": [\"Age\", \"Gender\", \"Income\", \"Education\", \"Marital Status\", \"Occupation\", \"Family Size\", \"Ethnicity\", \"Generation\", \"Language\"]\n", " },\n", " \"geographic_segmentation\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"Country\": { \"type\": \"string\" },\n", " \"Region\": { \"type\": \"string\" },\n", " \"City\": { \"type\": \"string\" },\n", " \"Climate\": { \"type\": \"string\" },\n", " \"Population Density\": { \"type\": \"string\" },\n", " \"Urban/Rural\": { \"type\": \"string\" },\n", " \"Time Zone\": { \"type\": \"string\" }\n", " },\n", " \"required\": [\"Country\", \"Region\", \"City\", \"Climate\", \"Population Density\", \"Urban/Rural\", \"Time Zone\"]\n", " },\n", " \"psychographic_segmentation\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"Personality\": { \"type\": \"string\" },\n", " \"Lifestyle\": { \"type\": \"string\" },\n", " \"Interests\": { \"type\": \"string\" },\n", " \"Values\": { \"type\": \"string\" },\n", " \"Attitudes\": { \"type\": \"string\" },\n", " \"Hobbies\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } },\n", " \"Social Class\": { \"type\": \"string\" },\n", " \"Motivations\": { \"type\": \"string\" }\n", " },\n", " \"required\": [\"Personality\", \"Lifestyle\", \"Interests\", \"Values\", \"Attitudes\", \"Hobbies\", \"Social Class\", \"Motivations\"]\n", " },\n", " \"behavioral_segmentation\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"Purchase History\": { \"type\": \"string\" },\n", " \"Browsing Behavior\": { \"type\": \"string\" },\n", " \"Loyalty Status\": { \"type\": \"string\" },\n", " \"Usage Frequency\": { \"type\": \"string\" },\n", " \"Benefits Sought\": { \"type\": \"string\" },\n", " \"Occasion/Timing\": { \"type\": \"string\" },\n", " \"User Status\": { \"type\": \"string\" },\n", " \"Spending Habits\": { \"type\": \"string\" }\n", " },\n", " \"required\": [\"Purchase History\", \"Browsing Behavior\", \"Loyalty Status\", \"Usage Frequency\", \"Benefits Sought\", \"Occasion/Timing\", \"User Status\", \"Spending Habits\"]\n", " },\n", " \"technographic_segmentation\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"Devices\": { \"type\": \"string\" },\n", " \"Operating Systems\": { \"type\": \"string\" },\n", " \"Browsers\": { \"type\": \"string\" },\n", " \"Software\": { \"type\": \"string\" },\n", " \"Social Media Platforms\": { \"type\": \"string\" },\n", " \"Internet Connectivity\": { \"type\": \"string\" },\n", " \"Tech Savviness\": { \"type\": \"string\" },\n", " \"Adoption Rate\": { \"type\": \"string\" }\n", " },\n", " \"required\": [\"Devices\", \"Operating Systems\", \"Browsers\", \"Software\", \"Social Media Platforms\", \"Internet Connectivity\", \"Tech Savviness\", \"Adoption Rate\"]\n", " },\n", " \"needs_based_segmentation\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"Specific Needs\": { \"type\": \"string\" },\n", " \"Pain Points\": { \"type\": \"string\" },\n", " \"Challenges\": { \"type\": \"string\" },\n", " \"Goals\": { \"type\": \"string\" },\n", " \"Priorities\": { \"type\": \"string\" }\n", " },\n", " \"required\": [\"Specific Needs\", \"Pain Points\", \"Challenges\", \"Goals\", \"Priorities\"]\n", " },\n", " \"value_based_segmentation\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"Perceived Value\": { \"type\": \"string\" },\n", " \"Price Sensitivity\": { \"type\": \"string\" },\n", " \"Willingness to Pay\": { \"type\": \"string\" },\n", " \"Cost-Benefit Analysis\": { \"type\": \"string\" }\n", " },\n", " \"required\": [\"Perceived Value\", \"Price Sensitivity\", \"Willingness to Pay\", \"Cost-Benefit Analysis\"]\n", " },\n", " \"customer_lifecycle_segmentation\": {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"New Leads\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } },\n", " \"Potential Customers\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } },\n", " \"First-Time Customers\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } },\n", " \"Repeat Customers\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } },\n", " \"Loyal Advocates\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } },\n", " \"At-Risk Customers\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } },\n", " \"Former Customers\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } },\n", " \"Inactive Customers\": { \"type\": \"array\", \"items\": { \"type\": \"string\" } }\n", " },\n", " \"required\": [\"New Leads\", \"Potential Customers\", \"First-Time Customers\", \"Repeat Customers\", \"Loyal Advocates\", \"At-Risk Customers\", \"Former Customers\", \"Inactive Customers\"]\n", " }\n", " },\n", " \"required\": [\"demographic_segmentation\", \"geographic_segmentation\", \"psychographic_segmentation\", \"behavioral_segmentation\", \"technographic_segmentation\", \"needs_based_segmentation\", \"value_based_segmentation\", \"customer_lifecycle_segmentation\"]\n", " }\n", " }\n", " }\n", "\n", " result = GeminiLLM(prompt, response_schema=response_schema)\n", " return result\n" ] }, { "cell_type": "markdown", "metadata": { "id": "AxoUPDCvhASa" }, "source": [ "#### Generator Function for `customer_marketing_insights`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Gd6GYUtVI1Xw" }, "outputs": [], "source": [ "# Using tenacity to retry in case of resource exhausted errors (429)\n", "@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))\n", "def GenerateMarketingInsights(customer_id, customer_profile_data, customer_loyalty_data, customer_segmentation_data):\n", " # For each row in customer_marketing_profile, send this prompt:\n", " prompt = f\"\"\"\n", " You will be given three JSON objects with data about a customer: customer_profile_data, customer_loyalty_data, customer_segmentation_data.\n", "\n", " Your task is to generate a summary for this customer, highlighting the most important things a Marketing Analyst should know in order to successfully sell to the customer.\n", "\n", " Here is the customer's customer_profile_data:\n", " {customer_profile_data}\n", "\n", " Here is the customer's customer_loyalty_data:\n", " {customer_loyalty_data}\n", "\n", " Here is the customer's customer_segmentation_data:\n", " {customer_segmentation_data}\n", "\n", " Format your output as a text string, not JSON.\n", "\n", " Now think step by step and generate the summary.\n", "\n", " \"\"\"\n", "\n", " response_schema = {\n", " \"type\": \"object\",\n", " \"properties\": {\n", " \"output_string\": {\n", " \"type\": \"string\"\n", " }\n", " },\n", " \"required\": \"output_string\"\n", " }\n", "\n", "\n", " result = GeminiLLM(prompt, response_schema=response_schema)\n", " return result\n" ] }, { "cell_type": "markdown", "metadata": { "id": "KD5r5Pw-fI3j" }, "source": [ "#### Define Derivation Function for `customer_loyalty_data`" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qbtUxxBTR00N" }, "outputs": [], "source": [ "@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))\n", "def DeriveCustomerLoyalty(customer_id):\n", " sql=f\"\"\"SELECT PARSE_JSON(chocolate_ai_profile.customer_data, wide_number_mode=>'round') AS customer_loyalty_data\n", " FROM (\n", " WITH\n", " customer_lifetime_spend AS (\n", " SELECT\n", " c.customer_id,\n", " SUM(oi.item_total) AS total_amount_spent,\n", " AVG(oi.item_total) AS average_amount_spent_per_order,\n", " COUNT(DISTINCT o.order_id) AS total_orders,\n", " MAX(order_datetime) AS last_order_date\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS c\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o ON c.customer_id = o.customer_id\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order_item` AS oi ON o.order_id = oi.order_id\n", " WHERE CAST(o.order_datetime AS DATETIME) < CURRENT_DATETIME()\n", " AND c.customer_id = {customer_id}\n", " GROUP BY 1\n", " ),\n", "\n", " purchase_locations AS (\n", " SELECT\n", " t1.customer_id,\n", " ARRAY_AGG(DISTINCT t2.store_id) AS purchase_locations,\n", " ARRAY_AGG(store_id ORDER BY cnt DESC LIMIT 1)[SAFE_OFFSET(0)] AS most_frequent_purchase_location\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS t1\n", " INNER JOIN (\n", " SELECT\n", " customer_id,\n", " store_id,\n", " COUNT(*) AS cnt\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o\n", " WHERE CAST(o.order_datetime AS DATETIME) < CURRENT_DATETIME()\n", " AND o.customer_id = {customer_id}\n", " GROUP BY 1, 2\n", " ) AS t2 ON t1.customer_id = t2.customer_id\n", " WHERE t1.customer_id = {customer_id}\n", " GROUP BY 1\n", " ),\n", "\n", " favorite_menu_items AS (\n", " WITH CustomerOrderItemCounts AS (\n", " SELECT\n", " c.customer_id,\n", " oi.menu_id,\n", " COUNT(*) AS item_count\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS c\n", " INNER JOIN\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o\n", " ON c.customer_id = o.customer_id\n", " INNER JOIN\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.order_item` AS oi\n", " ON o.order_id = oi.order_id\n", " WHERE\n", " CAST(o.order_datetime AS DATETIME) < CURRENT_DATETIME()\n", " AND c.customer_id = {customer_id}\n", " GROUP BY\n", " c.customer_id, oi.menu_id\n", " ),\n", " RankedCustomerOrderItemCounts AS (\n", " SELECT\n", " customer_id,\n", " menu_id,\n", " item_count,\n", " ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY item_count DESC) AS rn\n", " FROM\n", " `CustomerOrderItemCounts`\n", " WHERE\n", " customer_id = {customer_id}\n", " )\n", " SELECT\n", " customer_id,\n", " ARRAY_AGG(menu_id ORDER BY rn) AS top_3_favorite_menu_items\n", " FROM\n", " `RankedCustomerOrderItemCounts`\n", " WHERE\n", " rn <= 3\n", " and customer_id = {customer_id}\n", " GROUP BY\n", " customer_id\n", " ),\n", "\n", " review_data AS (\n", " WITH CustomerReviewCounts AS (\n", " SELECT\n", " customer_id,\n", " COUNT(customer_review_id) AS total_reviews,\n", " COUNTIF(review_sentiment = 'Positive') AS positive_reviews,\n", " COUNTIF(review_sentiment = 'Negative') AS negative_reviews,\n", " COUNTIF(review_sentiment = 'Neutral') AS neutral_reviews\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer_review`\n", " WHERE CAST(review_datetime AS DATETIME) < CURRENT_DATETIME()\n", " AND customer_id = {customer_id}\n", " GROUP BY\n", " customer_id\n", " ),\n", "\n", " CustomerLatestReview AS (\n", " SELECT\n", " customer_id,\n", " ARRAY_AGG(review_sentiment ORDER BY review_datetime DESC LIMIT 1)[SAFE_OFFSET(0)] AS latest_review_sentiment\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer_review`\n", " WHERE CAST(review_datetime AS DATETIME) < CURRENT_DATETIME()\n", " AND customer_id = {customer_id}\n", " GROUP BY\n", " customer_id\n", " )\n", "\n", " SELECT\n", " c.customer_id,\n", " COALESCE(crc.total_reviews, 0) AS total_reviews,\n", " clr.latest_review_sentiment,\n", " SAFE_DIVIDE(crc.positive_reviews * 100, crc.total_reviews) AS positive_review_percentage,\n", " SAFE_DIVIDE(crc.negative_reviews * 100, crc.total_reviews) AS negative_review_percentage,\n", " SAFE_DIVIDE(crc.neutral_reviews * 100, crc.total_reviews) AS neutral_review_percentage\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS c\n", " LEFT JOIN\n", " `CustomerReviewCounts` AS crc ON c.customer_id = crc.customer_id\n", " LEFT JOIN\n", " `CustomerLatestReview` AS clr ON c.customer_id = clr.customer_id\n", " WHERE c.customer_id = {customer_id}\n", " ORDER BY\n", " c.customer_id\n", " ),\n", "\n", " profile_cte AS (\n", " SELECT\n", " clv.customer_id,\n", " TO_JSON_STRING(STRUCT(\n", " clv.total_amount_spent,\n", " clv.average_amount_spent_per_order,\n", " clv.total_orders,\n", " clv.last_order_date,\n", " pl.purchase_locations,\n", " pl.most_frequent_purchase_location,\n", " fmi.top_3_favorite_menu_items,\n", " rd.total_reviews,\n", " rd.latest_review_sentiment,\n", " rd.positive_review_percentage,\n", " rd.negative_review_percentage,\n", " rd.neutral_review_percentage\n", " )) AS customer_data\n", " FROM\n", " customer_lifetime_spend AS clv\n", " JOIN purchase_locations AS pl ON clv.customer_id = pl.customer_id\n", " JOIN review_data AS rd ON clv.customer_id = rd.customer_id\n", " JOIN favorite_menu_items AS fmi ON clv.customer_id = fmi.customer_id\n", " WHERE clv.customer_id = {customer_id}\n", " )\n", " SELECT customer_id, customer_data FROM profile_cte\n", " ) AS chocolate_ai_profile\n", " WHERE chocolate_ai_profile.customer_id = {customer_id};\n", " \"\"\"\n", "\n", " result = RunQuery(sql)\n", " return result.iloc[0,0]" ] }, { "cell_type": "markdown", "metadata": { "id": "cbYsy6-zd1yv" }, "source": [ "##### READ MORE\n", "\n", "This section is provided for informational purposes. It defines the individual queries that were combined in the larger update query used by the Derivation Function for `customer_loyalty_data` above." ] }, { "cell_type": "markdown", "metadata": { "id": "YqGjw0zuF35N" }, "source": [ "###### Derivation Query Definitions\n", "\n", "Get Customer Lifetime Spend and Last Order\n", "\n", "```sql\n", "# For each customer in the `customer` table, get the total amount spent by the customer.\n", "# The amount a customer has spent is the sum of `item_total` in the `order_item` table.\n", "# Find which order_item rows apply to each customer by joining the following tables: `customer`, `order`, `order_item`.\n", "SELECT\n", " c.customer_id,\n", " SUM(oi.item_total) AS total_amount_spent,\n", " AVG(oi.item_total) AS average_amount_spent_per_order,\n", " COUNT(DISTINCT o.order_id) AS total_orders,\n", " MAX(order_datetime) AS last_order_date\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS c\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o ON c.customer_id = o.customer_id\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order_item` AS oi ON o.order_id = oi.order_id\n", " WHERE CAST(o.order_datetime AS DATETIME) < CURRENT_DATETIME()\n", " GROUP BY 1\n", " ```\n", "\n", "###### Get Average Spent Per Order\n", "\n", "```sql\n", "# For each customer in the `customer` table, get the average amount spent by the customer per order.\n", "# The amount a customer has spent is the sum of `item_total` in the `order_item` table.\n", "# Find which order_item rows apply to each customer by joining the following tables: `customer`, `order`, `order_item`.\n", "SELECT\n", " c.customer_id,\n", " avg(oi.item_total) AS average_amount_spent_per_order\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS c\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o ON c.customer_id = o.customer_id\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order_item` AS oi ON o.order_id = oi.order_id\n", " WHERE CAST(o.order_datetime AS DATETIME) < CURRENT_DATETIME()\n", " GROUP BY 1;\n", "```\n", "\n", "###### Get Total Number of Orders\n", "\n", "```sql\n", "# For each customer in the `customer` table, get the total number of orders.\n", "# Find which orders apply to each customer by joining the following tables: `customer`, `order`.\n", "SELECT\n", " c.customer_id,\n", " COUNT(o.order_id) AS total_orders\n", "FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS c\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o ON c.customer_id = o.customer_id\n", "WHERE CAST(o.order_datetime AS DATETIME) < CURRENT_DATETIME()\n", "GROUP BY c.customer_id\n", "ORDER BY total_orders DESC;\n", "```\n", "\n", "###### Get Locations Where Customer Has Purchased\n", "\n", "```sql\n", "# For each customer in the `customer` table, get the list of locations where they have purchased.\n", "# Output the purchase locations as an array.\n", "# The location of a purchase is listed as `store_id` in the `order` table.\n", "# Orders are associated with customers via the `customer_id` column in the `order` table.\n", "SELECT\n", " t1.customer_id,\n", " array_agg(DISTINCT t2.store_id) AS purchase_locations\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS t1\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS t2 ON t1.customer_id = t2.customer_id\n", " WHERE CAST(t2.order_datetime AS DATETIME) < CURRENT_DATETIME()\n", " GROUP BY 1;\n", "```\n", "\n", "###### Get Most Frequent Purchase Location\n", "\n", "```sql\n", "# For each customer in the `customer` table, get the location where they most frequently purchase.\n", "# Output the purchase locations as an array.\n", "# The location of a purchase is listed as `store_id` in the `order` table.\n", "# Orders are associated with customers via the `customer_id` column in the `order` table.\n", "SELECT\n", " t1.customer_id,\n", " ARRAY_AGG(store_id ORDER BY cnt DESC LIMIT 1) AS most_frequent_purchase_location\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS t1\n", " INNER JOIN (\n", " SELECT\n", " customer_id,\n", " store_id,\n", " count(*) AS cnt\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.order`\n", " GROUP BY 1, 2\n", " ) AS t2 ON t1.customer_id = t2.customer_id\n", " GROUP BY 1;\n", "```\n", "\n", "###### Get Favorite Menu Items\n", "\n", "```sql\n", "# For each customer in the `customer` table, get the top 3 most commonly purchased items.\n", "# The items a customer has purchased is shown as `menu_id` in the `order_item` table.\n", "# Find which order_item rows apply to each customer by joining the following tables: `customer`, `order`, `order_item`.\n", "# Output the top 3 items as an ordered array per customer\n", "WITH CustomerOrderItemCounts AS (\n", " SELECT\n", " c.customer_id,\n", " oi.menu_id,\n", " COUNT(*) AS item_count\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS c\n", " INNER JOIN\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o\n", " ON c.customer_id = o.customer_id\n", " INNER JOIN\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.order_item` AS oi\n", " ON o.order_id = oi.order_id\n", " GROUP BY\n", " c.customer_id, oi.menu_id\n", "),\n", "RankedCustomerOrderItemCounts AS (\n", " SELECT\n", " customer_id,\n", " menu_id,\n", " item_count,\n", " ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY item_count DESC) AS rn\n", " FROM\n", " `CustomerOrderItemCounts`\n", ")\n", "SELECT\n", " customer_id,\n", " ARRAY_AGG(menu_id ORDER BY rn) AS top_3_favorite_menu_items\n", "FROM\n", " `RankedCustomerOrderItemCounts`\n", "WHERE\n", " rn <= 3\n", "GROUP BY\n", " customer_id;\n", "```\n", "\n", "###### Get Review Counts and Sentiment\n", "\n", "```sql\n", "# For each customer in the `customer` table, get the number of reviews in the `customer_review` table, the sentiment of their most recent review, the percentage of their positive reviews, the percentage of their negative reviews, and the percentage of their netural reviews. Sentiment is defined by `review_sentiment` and the most common sentiment of their reviews.\n", "WITH CustomerReviewCounts AS (\n", " SELECT\n", " customer_id,\n", " COUNT(customer_review_id) AS total_reviews,\n", " COUNTIF(review_sentiment = 'Positive') AS positive_reviews,\n", " COUNTIF(review_sentiment = 'Negative') AS negative_reviews,\n", " COUNTIF(review_sentiment = 'Neutral') AS neutral_reviews\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer_review`\n", " GROUP BY\n", " customer_id\n", "),\n", "\n", "CustomerLatestReview AS (\n", " SELECT\n", " customer_id,\n", " ARRAY_AGG(review_sentiment ORDER BY review_datetime DESC LIMIT 1)[SAFE_OFFSET(0)] AS latest_review_sentiment\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer_review`\n", " GROUP BY\n", " customer_id\n", ")\n", "\n", "SELECT\n", " c.customer_id,\n", " COALESCE(crc.total_reviews, 0) AS total_reviews,\n", " clr.latest_review_sentiment,\n", " SAFE_DIVIDE(crc.positive_reviews * 100, crc.total_reviews) AS positive_review_percentage,\n", " SAFE_DIVIDE(crc.negative_reviews * 100, crc.total_reviews) AS negative_review_percentage,\n", " SAFE_DIVIDE(crc.neutral_reviews * 100, crc.total_reviews) AS neutral_review_percentage\n", "FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.customer` AS c\n", "LEFT JOIN\n", " `CustomerReviewCounts` AS crc ON c.customer_id = crc.customer_id\n", "LEFT JOIN\n", " `CustomerLatestReview` AS clr ON c.customer_id = clr.customer_id\n", "ORDER BY\n", " c.customer_id;\n", "```\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "E7gF377g9sV8" }, "source": [ "### Step 3: Run Generator and Derivation Functions for Profile Data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Q_FG3LRfdYlP" }, "outputs": [], "source": [ "# Generate customer data\n", "sql = \"\"\"\n", "SELECT\n", " customer_id,\n", " customer_name,\n", " customer_yob,\n", " customer_inception_date,\n", " country_code\n", "FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer`\n", "WHERE customer_id NOT IN\n", " (\n", " SELECT customer_id\n", " FROM `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile`\n", " )\n", "LIMIT 50000;\n", "\"\"\"\n", "\n", "df_process = client.query(sql).to_dataframe()\n", "\n", "for row in df_process.itertuples():\n", " customer_id = row.customer_id\n", " customer_name = row.customer_name\n", " customer_yob = row.customer_yob\n", " customer_inception_date = row.customer_inception_date\n", " country_code = row.country_code\n", "\n", " # Generate and derive profile data\n", " mp_result = GenerateCustomerMarketingProfile(customer_id, customer_name, customer_yob, customer_inception_date, country_code)\n", " print(f\"Generated marketing profile data for customer: {customer_id}\")\n", " ld_result = DeriveCustomerLoyalty(customer_id)\n", " print(f\"Derived loyalty data for customer: {customer_id}\")\n", " sd_result = GenerateCustomerSegments(customer_id, mp_result, ld_result)\n", " print(f\"Generated customer segment data for customer: {customer_id}\")\n", " mi_result = GenerateMarketingInsights(customer_id, mp_result, ld_result, sd_result)\n", " print(f\"Generated marketing insights for customer: {customer_id}\")\n", "\n", " # Load the results as JSON for proper formatting and escaping\n", " mp_result = json.loads(mp_result)\n", " ld_result = json.loads(ld_result)\n", " sd_result = json.loads(sd_result)\n", " mi_result = json.loads(mi_result)\n", "\n", " # Save profile to database\n", " job_config = bigquery.QueryJobConfig(\n", " query_parameters=[\n", " bigquery.ScalarQueryParameter(\"mp_result\", \"JSON\", mp_result),\n", " bigquery.ScalarQueryParameter(\"ld_result\", \"JSON\", ld_result),\n", " bigquery.ScalarQueryParameter(\"sd_result\", \"JSON\", sd_result),\n", " bigquery.ScalarQueryParameter(\"mi_result\", \"STRING\", mi_result[\"output_string\"])\n", " ],\n", " priority=bigquery.QueryPriority.INTERACTIVE\n", " )\n", "\n", " sql=f\"\"\"INSERT INTO `${project_id}.${bigquery_chocolate_ai_dataset}.customer_marketing_profile`\n", " (customer_id, customer_profile_data, customer_loyalty_data, customer_segmentation_data, customer_marketing_insights)\n", " VALUES({customer_id}, @mp_result, @ld_result, @sd_result, @mi_result);\"\"\"\n", "\n", " RunQuery(sql, job_config)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "B2VYgNTxuBXQ" }, "source": [ "## Step 4: Generate Embeddings and Explore Segment Data\n", "\n", "Now that we have our source data generated, navigate to the notebook titled `Create-Campaign-Customer-Segmentation` to generate embeddings and explore the customer segmentation data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## <font color='#4285f4'>Reference Links</font> " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Generate content with the Gemini Enterprise API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference)\n", "- [Controlled Generation with Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output)" ] } ], "metadata": { "colab": { "collapsed_sections": [ "saCadgPsBnMS" ], "name": "Synthetic-Data-Generation-Customers-Marketing-Profile.ipynb", "private_outputs": true, "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }