colab-enterprise/Synthetic-Data-Generation-Campaigns.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "2eJaZU5YAQUf" }, "source": [ "### Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This process generates synthetic data for marketing campaigns, including campaign details, performance metrics, content, and recommendations. It simulates historical campaigns with defined budgets, target audiences, expected outcomes, and performance data, providing a comprehensive dataset for analysis and testing.\n", "\n", "Process Flow:\n", "\n", "1. Creates the BigQuery table: chocolate_ai.campaign\n", "2. Creates the BigQuery table: chocolate_ai.campaign_performance\n", "3. Generates Campaign Recommendations: name, description, target audience, expected outcome and explanation\n", "4. Get the Ranked Products Monthly\n", "5. Create Campaign Budgets\n", "6. Generate Historical Campaigns\n", "7. Set the dates\n", "8. Simulate Campaign Performance\n", "\n", "\n", "Cost:\n", "* Low: Gemini, BigQuery\n", "* Medium: Remember to stop your Colab Enterprise Notebook Runtime\n", "\n", "Author: \n", "* Paul Ramsey" ] }, { "cell_type": "markdown", "metadata": { "id": "HMsUvoF4BP7Y" }, "source": [ "### License" ] }, { "cell_type": "markdown", "metadata": { "id": "jQgQkbOvj55d" }, "source": [ "```\n", "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "UmyL-Rg4Dr_f" }, "source": [ "### Pip Installs" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# PIP Installs\n", "import sys\n", "\n", "# https://PLACEHOLDER.com/index.html\n", "\n", "# For better performance and production, deploy to Vertex AI endpoint with GPU\n", "# This takes about 5 minutes to install and you will need to reset your runtime\n", "#!{sys.executable} -m pip install timesfm" ] }, { "cell_type": "markdown", "metadata": { "id": "m65vp54BUFRi" }, "source": [ "### Initalize" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xOYsEVSXp6IP" }, "outputs": [], "source": [ "import google.auth\n", "import requests\n", "import json\n", "import random\n", "from datetime import datetime, timedelta\n", "import numpy as np\n", "import logging\n", "\n", "from tenacity import retry, wait_exponential, stop_after_attempt, before_sleep_log, retry_if_exception\n", "from google.cloud import bigquery\n", "client = bigquery.Client()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wMlHl3bnkFPZ" }, "outputs": [], "source": [ "# Update these variables to match your environment\n", "location=\"us-central1\" # Your region\n", "bigquery_location = \"${bigquery_location}\" # Must be \"us\" or \"eu\"\n", "\n", "### Do not change the values in this cell below this line ###\n", "project_id = !(gcloud config get-value project)\n", "user = !(gcloud auth list --filter=status:ACTIVE --format=\"value(account)\")\n", "\n", "if len(project_id) != 1:\n", " raise RuntimeError(f\"project_id is not set: {project_id}\")\n", "project_id = project_id[0]\n", "\n", "if len(user) != 1:\n", " raise RuntimeError(f\"user is not set: {user}\")\n", "user = user[0]\n", "\n", "print(f\"project_id = {project_id}\")\n", "print(f\"user = {user}\")\n", "print(f\"location = {location}\")\n", "print(f\"bigquery_location = {bigquery_location}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "sZ6m_wGrK0YG" }, "source": [ "### Helper Methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### RetryCondition(error)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def RetryCondition(error):\n", " error_string = str(error)\n", " print(error_string)\n", "\n", " retry_errors = [\n", " \"RESOURCE_EXHAUSTED\",\n", " \"No content in candidate\",\n", " # Add more error messages here as needed\n", " ]\n", "\n", " for retry_error in retry_errors:\n", " if retry_error in error_string:\n", " print(\"Retrying...\")\n", " return True\n", "\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "id": "JbOjdSP1kN9T" }, "source": [ "#### restAPIHelper()\n", "Calls the Google Cloud REST API using the current users credentials." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "40wlwnY4kM11" }, "outputs": [], "source": [ "def restAPIHelper(url: str, http_verb: str, request_body: str) -> str:\n", " \"\"\"Calls the Google Cloud REST API passing in the current users credentials\"\"\"\n", "\n", " import requests\n", " import google.auth\n", " import json\n", "\n", " # Get an access token based upon the current user\n", " creds, project = google.auth.default()\n", " auth_req = google.auth.transport.requests.Request()\n", " creds.refresh(auth_req)\n", " access_token=creds.token\n", "\n", " headers = {\n", " \"Content-Type\" : \"application/json\",\n", " \"Authorization\" : \"Bearer \" + access_token\n", " }\n", "\n", " if http_verb == \"GET\":\n", " response = requests.get(url, headers=headers)\n", " elif http_verb == \"POST\":\n", " response = requests.post(url, json=request_body, headers=headers)\n", " elif http_verb == \"PUT\":\n", " response = requests.put(url, json=request_body, headers=headers)\n", " elif http_verb == \"PATCH\":\n", " response = requests.patch(url, json=request_body, headers=headers)\n", " elif http_verb == \"DELETE\":\n", " response = requests.delete(url, headers=headers)\n", " else:\n", " raise RuntimeError(f\"Unknown HTTP verb: {http_verb}\")\n", "\n", " if response.status_code == 200:\n", " return json.loads(response.content)\n", " #image_data = json.loads(response.content)[\"predictions\"][0][\"bytesBase64Encoded\"]\n", " else:\n", " error = f\"Error restAPIHelper -> ' Status: '{response.status_code}' Text: '{response.text}'\"\n", " raise RuntimeError(error)" ] }, { "cell_type": "markdown", "metadata": { "id": "tZSpPUde3FRX" }, "source": [ "#### GeminiLLM (Pro 1.0 , Pro 1.5)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "h-PgzuF53LoS" }, "outputs": [], "source": [ "@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))\n", "def GeminiLLM(prompt, model = \"gemini-2.0-flash\", response_schema = None,\n", " temperature = 1, topP = 1, topK = 32):\n", "\n", " # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference#supported_models\n", " # model = \"gemini-2.0-flash\"\n", "\n", "\n", " llm_response = None\n", " if temperature < 0:\n", " temperature = 0\n", "\n", " creds, project = google.auth.default()\n", " auth_req = google.auth.transport.requests.Request() # required to acess access token\n", " creds.refresh(auth_req)\n", " access_token=creds.token\n", "\n", " headers = {\n", " \"Content-Type\" : \"application/json\",\n", " \"Authorization\" : \"Bearer \" + access_token\n", " }\n", "\n", " # https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference\n", " url = f\"https://{location}-aiplatform.googleapis.com/v1/projects/{project_id}/locations/{location}/publishers/google/models/{model}:generateContent\"\n", "\n", " generation_config = {\n", " \"temperature\": temperature,\n", " \"topP\": topP,\n", " \"maxOutputTokens\": 8192,\n", " \"candidateCount\": 1,\n", " \"responseMimeType\": \"application/json\",\n", " }\n", "\n", " # Add inthe response schema for when it is provided\n", " if response_schema is not None:\n", " generation_config[\"responseSchema\"] = response_schema\n", "\n", " if model == \"gemini-2.0-flash\":\n", " generation_config[\"topK\"] = topK\n", "\n", " payload = {\n", " \"contents\": {\n", " \"role\": \"user\",\n", " \"parts\": {\n", " \"text\": prompt\n", " },\n", " },\n", " \"generation_config\": {\n", " **generation_config\n", " },\n", " \"safety_settings\": {\n", " \"category\": \"HARM_CATEGORY_SEXUALLY_EXPLICIT\",\n", " \"threshold\": \"BLOCK_LOW_AND_ABOVE\"\n", " }\n", " }\n", "\n", " response = requests.post(url, json=payload, headers=headers)\n", "\n", " if response.status_code == 200:\n", " try:\n", " json_response = json.loads(response.content)\n", " except Exception as error:\n", " raise RuntimeError(f\"An error occurred parsing the JSON: {error}\")\n", "\n", " if \"candidates\" in json_response:\n", " candidates = json_response[\"candidates\"]\n", " if len(candidates) > 0:\n", " candidate = candidates[0]\n", " if \"content\" in candidate:\n", " content = candidate[\"content\"]\n", " if \"parts\" in content:\n", " parts = content[\"parts\"]\n", " if len(parts):\n", " part = parts[0]\n", " if \"text\" in part:\n", " text = part[\"text\"]\n", " llm_response = text\n", " else:\n", " raise RuntimeError(\"No text in part: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No parts in content: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No parts in content: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No content in candidate: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No candidates: {response.content}\")\n", " else:\n", " raise RuntimeError(\"No candidates: {response.content}\")\n", "\n", " # Remove some typically response characters (if asking for a JSON reply)\n", " llm_response = llm_response.replace(\"```json\",\"\")\n", " llm_response = llm_response.replace(\"```\",\"\")\n", " llm_response = llm_response.replace(\"\\n\",\"\")\n", "\n", " return llm_response\n", "\n", " else:\n", " raise RuntimeError(f\"Error with prompt:'{prompt}' Status:'{response.status_code}' Text:'{response.text}'\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Wp6zIq-3M86P" }, "source": [ "#### RunQuery(sql)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "BbDTEwYWM4Vv" }, "outputs": [], "source": [ "def RunQuery(sql, job_config = None):\n", " import time\n", "\n", " if (sql.startswith(\"SELECT\") or sql.startswith(\"WITH\")):\n", " df_result = client.query(sql).to_dataframe()\n", " return df_result\n", " else:\n", " if job_config == None:\n", " job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)\n", " query_job = client.query(sql, job_config=job_config)\n", "\n", " # Check on the progress by getting the job's updated state.\n", " query_job = client.get_job(\n", " query_job.job_id, location=query_job.location\n", " )\n", " print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", " while query_job.state != \"DONE\":\n", " time.sleep(2)\n", " query_job = client.get_job(\n", " query_job.job_id, location=query_job.location\n", " )\n", " print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", " if query_job.error_result == None:\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "markdown", "metadata": { "id": "QLXgLpdWE0MA" }, "source": [ "#### PrettyPrintJson(json_string)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9VMElV2A3m56" }, "outputs": [], "source": [ "def PrettyPrintJson(json_string):\n", " json_object = json.loads(json_string)\n", " json_formatted_str = json.dumps(json_object, indent=2)\n", " print(json_formatted_str)\n", " return json.dumps(json_object)" ] }, { "cell_type": "markdown", "metadata": { "id": "LnoeM8d1oEj-" }, "source": [ "### Create Tables" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "GVDMuhE5kug4" }, "outputs": [], "source": [ "# Create the campaign table\n", "\n", "%%bigquery\n", "\n", "CREATE TABLE IF NOT EXISTS `${project_id}.${bigquery_chocolate_ai_dataset}.campaign` (\n", " campaign_id STRING NOT NULL OPTIONS(description=\"Unique identifier for each campaign (e.g., 'SummerSale2024', 'HolidayPromoDec')\"),\n", " menu_id INT64 OPTIONS(description=\"Id of the menu item being promoted in the campaign\"),\n", " campaign_name STRING OPTIONS(description=\"A descriptive name for the campaign.\"),\n", " campaign_description STRING OPTIONS(description=\"A detailed description of the campaign.\"),\n", " campaign_goal STRING OPTIONS(description=\"The objective of the campaign (e.g., 'Increase brand awareness', 'Drive website traffic', 'Generate leads')\"),\n", " target_audience STRING OPTIONS(description=\"The intended audience for the campaign (e.g., 'Women aged 25-35', 'Small business owners in Texas')\"),\n", " marketing_channels STRING OPTIONS(description=\"The marketing channels used in the campaign (e.g., 'Social Media', 'Email', 'SEO', 'Paid Ads')\"),\n", " budget FLOAT64 OPTIONS(description=\"The overall budget allocated for the campaign.\"),\n", " explanation STRING OPTIONS(description=\"The model reasoning for creating the campaign.\"),\n", " campaign_outcomes STRING OPTIONS(description=\"Expected outcomes of the campaign.\"),\n", " campaign_start_date DATE OPTIONS(description=\"The date the campaign started.\"),\n", " campaign_end_date DATE OPTIONS(description=\"The date the campaign ended.\"),\n", " campaign_created_date DATETIME OPTIONS(description=\"The date the campaign was created.\")\n", ") CLUSTER BY campaign_id;" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Km19l2ngmBOE" }, "outputs": [], "source": [ "# Create the campaign_performance table\n", "\n", "%%bigquery\n", "\n", "CREATE TABLE IF NOT EXISTS `${project_id}.${bigquery_chocolate_ai_dataset}.campaign_performance` (\n", " campaign_id STRING NOT NULL OPTIONS(description=\"Foreign key referencing the campaign table. Unique identifier for the campaign.\"),\n", " impressions INT64 OPTIONS(description=\"Number of times the campaign was displayed.\"),\n", " reach INT64 OPTIONS(description=\"Number of unique individuals exposed to the campaign.\"),\n", " website_traffic INT64 OPTIONS(description=\"Visits to your website attributed to the campaign.\"),\n", " leads_generated INT64 OPTIONS(description=\"Number of new leads captured (e.g., email sign-ups, contact form submissions).\"),\n", " conversions INT64 OPTIONS(description=\"Number of desired actions taken (e.g., purchases, downloads).\"),\n", " cost_per_click FLOAT64 OPTIONS(description=\"Average cost per click on your ads (if applicable).\"),\n", " cost_per_acquisition FLOAT64 OPTIONS(description=\"Average cost to acquire a new customer.\"),\n", " return_on_investment FLOAT64 OPTIONS(description=\"Return on Investment calculated as (Revenue - Cost) / Cost.\")\n", ") CLUSTER BY campaign_id;" ] }, { "cell_type": "markdown", "metadata": { "id": "SVC7WZCdGHMj" }, "source": [ "### Define Campaign Functions" ] }, { "cell_type": "markdown", "metadata": { "id": "TRNE1sRcFNgc" }, "source": [ "#### GenerateCampaignRecommendations()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YHSA5g1JpXJU" }, "outputs": [], "source": [ "# Use tenacity to retry in case of resource exhausted errors (429)\n", "@retry(wait=wait_exponential(multiplier=1, min=1, max=60), stop=stop_after_attempt(10), retry=retry_if_exception(RetryCondition), before_sleep=before_sleep_log(logging.getLogger(), logging.INFO))\n", "def GenerateCampaignRecommendations(campaign_goal, product_name, product_description, budget):\n", "\n", " # For each row in customer_marketing_profile, send this prompt:\n", " prompt = f\"\"\"\n", " You are an expert Marketing Analyst, and you are creating a new marketing campaign for a\n", " French chocolatier and bakery called 'Chocolate AI'.\n", "\n", " Your task is to recommend a campaign for the product that will achieve the campaign goals within budget.\n", "\n", " You should include the following information in your response:\n", " - A creative campaign name.\n", " - A detailed description of the campaign.\n", " - The ideal target audience.\n", " - Ideal marketing channels. This should be optimized for the target audience.\n", " - Expected campaign outcomes. This should map directly to the campaign goals.\n", " - An explanation/justification for the reasoning behind choosing this campaign.\n", "\n", " Your total budget for the campaign is: {budget}\n", "\n", " Here is the goal of the campaign:\n", " {campaign_goal}\n", "\n", " Here is the product you are promoting:\n", " {product_name}: {product_description}\n", "\n", " Additional instructions:\n", " - Results can be creative, but they must be plausible.\n", "\n", " Now generate the campaign recommendation. Think step by step and explain your reasoning.\n", " \"\"\"\n", "\n", " # Ref: https://cloud.google.com/vertex-ai/docs/reference/rest/v1/Schema\n", " response_schema = {}\n", "\n", " response_schema = {\n", " \"type\": \"object\",\n", " \"required\": [\"campaign_name\", \"campaign_description\", \"target_audience\", \"marketing_channels\", \"campaign_outcomes\", \"explanation\"],\n", " \"properties\": {\n", " \"campaign_name\": {\n", " \"type\": \"string\",\n", " \"description\": \"A descriptive and creative name for the campaign.\"\n", " },\n", " \"campaign_description\": {\n", " \"type\": \"string\",\n", " \"description\": \"A detailed description of the campaign.\"\n", " },\n", " \"target_audience\": {\n", " \"type\": \"string\",\n", " \"description\": \"The intended audience of the campaign.\"\n", " },\n", " \"marketing_channels\": {\n", " \"type\": \"string\",\n", " \"description\": \"The recommended marketing channels to be used in the campaign.\"\n", " },\n", " \"campaign_outcomes\": {\n", " \"type\": \"string\",\n", " \"description\": \"The expected outcomes of the campaign.\"\n", " },\n", " \"explanation\": {\n", " \"type\": \"string\",\n", " \"description\": \"An explanation for the reasoning behind choose this campaign.\"\n", " }\n", " }\n", " }\n", "\n", " result = GeminiLLM(prompt, response_schema=response_schema)\n", " return result\n" ] }, { "cell_type": "markdown", "metadata": { "id": "4D2EQRlYFRQQ" }, "source": [ "#### GetRankedProductsMonthly()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qLWgC201NUXo" }, "outputs": [], "source": [ "# Get lowest-performing products based on month over month sales data\n", "def GetRankedProductsMonthly(count, top_bottom = 'top'):\n", " sort_order = 'DESC' if top_bottom == 'top' else 'ASC'\n", "\n", " sql=f\"\"\"WITH current_month_sales AS (\n", " SELECT\n", " m.menu_name,\n", " m.menu_id,\n", " m.menu_description,\n", " SUM(oi.item_total) AS cumulative_sales\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.order_item` AS oi\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o ON o.order_id = oi.order_id\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.menu` AS m ON m.menu_id = oi.menu_id\n", " WHERE o.order_datetime >= TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))\n", " GROUP BY menu_name, m.menu_id, m.menu_description\n", " ORDER BY cumulative_sales\n", " ), prior_month_sales AS (\n", " SELECT\n", " m.menu_name,\n", " m.menu_id,\n", " m.menu_description,\n", " SUM(oi.item_total) AS cumulative_sales\n", " FROM\n", " `${project_id}.${bigquery_chocolate_ai_dataset}.order_item` AS oi\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.order` AS o ON o.order_id = oi.order_id\n", " INNER JOIN `${project_id}.${bigquery_chocolate_ai_dataset}.menu` AS m ON m.menu_id = oi.menu_id\n", " WHERE o.order_datetime BETWEEN TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY)) AND TIMESTAMP(DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))\n", " GROUP BY menu_name, m.menu_id, m.menu_description\n", " ORDER BY cumulative_sales\n", " ) SELECT current_month_sales.menu_id,\n", " current_month_sales.menu_name,\n", " current_month_sales.menu_description,\n", " FORMAT('%s €', CAST(ROUND(current_month_sales.cumulative_sales, 0) as STRING)) AS current_month_cumulative_sales,\n", " FORMAT('%s €', CAST(ROUND(prior_month_sales.cumulative_sales, 0) as STRING)) AS prior_month_cumulative_sales,\n", " FORMAT('%s €', CAST(ROUND(current_month_sales.cumulative_sales - prior_month_sales.cumulative_sales, 0) as STRING)) AS change,\n", " ROUND((current_month_sales.cumulative_sales - prior_month_sales.cumulative_sales) / prior_month_sales.cumulative_sales * 100,0) AS change_percent\n", " FROM current_month_sales\n", " JOIN prior_month_sales ON current_month_sales.menu_name = prior_month_sales.menu_name\n", " ORDER BY change_percent {sort_order}\n", " LIMIT {count};\n", " \"\"\"\n", "\n", " return RunQuery(sql)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "htX9elRZFxNF" }, "source": [ "#### GenerateCampaignPerformance(budget)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tPZfGc8DYQkN" }, "outputs": [], "source": [ "def GenerateCampaignPerformance(budget):\n", " \"\"\"\n", " Generates randomized campaign performance data, which is influence by budget.\n", "\n", " Args:\n", " campaign_id: The ID of the campaign.\n", " budget: The budget for the campaign.\n", "\n", " Returns:\n", " A dictionary containing the performance data.\n", "\n", " Example:\n", " budget = 5000\n", " performance_data = GenerateCampaignPerformance(budget)\n", " print(performance_data)\n", " \"\"\"\n", "\n", " # Generate random scale factors within a range for non-linear relationship\n", " impressions_factor = random.uniform(0.6, 0.8)\n", " reach_factor = random.uniform(0.5, 0.7)\n", " traffic_factor = random.uniform(0.4, 0.6)\n", " leads_factor = random.uniform(0.3, 0.5)\n", " conversions_factor = random.uniform(0.2, 0.4)\n", "\n", " # Generate data points with randomness and scaling\n", " impressions = int(random.uniform(0.8, 1.2) * (budget ** impressions_factor) * 100)\n", " reach = int(random.uniform(0.7, 1.1) * (budget ** reach_factor) * 100)\n", " website_traffic = int(random.uniform(0.6, 1.0) * (budget ** traffic_factor) * 10)\n", " leads_generated = int(random.uniform(0.5, 0.9) * (budget ** leads_factor) * 5)\n", " conversions = int(random.uniform(0.4, 0.8) * (budget ** conversions_factor) * 2)\n", "\n", "\n", " # Cost per click and acquisition (with some randomness)\n", " cost_per_click = round(random.uniform(0.2, 1.5), 2)\n", " cost_per_acquisition = round(random.uniform(5, 50) * (15000 / budget), 2)\n", "\n", " # ROI (mostly positive, but some negative)\n", " roi_sign = random.choices([-1, 1], weights=[0.2, 0.8])[0] # 20% chance of negative ROI\n", " return_on_investment = round(roi_sign * random.uniform(0.5, 10.0), 2)\n", "\n", " return {\n", " \"impressions\": impressions,\n", " \"reach\": reach,\n", " \"website_traffic\": website_traffic,\n", " \"leads_generated\": leads_generated,\n", " \"conversions\": conversions,\n", " \"cost_per_click\": cost_per_click,\n", " \"cost_per_acquisition\": cost_per_acquisition,\n", " \"return_on_investment\": return_on_investment\n", " }" ] }, { "cell_type": "markdown", "metadata": { "id": "pON1Nd6LOLwf" }, "source": [ "### Generate Historical Campaigns" ] }, { "cell_type": "markdown", "metadata": { "id": "p8atiEx6C4zY" }, "source": [ "#### Build Campaigns" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "SYdgrzyUutdb" }, "outputs": [], "source": [ "# Get 52 products to promote for 1-year history\n", "products_to_promote = GetRankedProductsMonthly(52, 'bottom')\n", "\n", "# Define goal array\n", "campaign_goal_array = [\n", " \"Increase sales of the product by 15%.\",\n", " \"Drive 10,000 unique visitors to the product landing page.\",\n", " \"Generate 500 qualified leads for the sales team.\",\n", " \"Boost brand awareness by 20% among the target audience.\",\n", " \"Achieve a 5% click-through rate on online ads.\",\n", " \"Increase social media engagement by 30%.\",\n", " \"Improve customer satisfaction with the product by 10%.\",\n", " \"Gather 200 customer testimonials for marketing materials.\",\n", " \"Increase email sign-up rate by 5%.\",\n", " \"Drive 1,000 app downloads.\",\n", " \"Increase website traffic from organic search by 25%.\",\n", " \"Reduce customer churn rate by 10%.\",\n", " \"Increase average order value by 5%.\",\n", " \"Improve customer lifetime value by 15%.\",\n", " \"Generate 100 product reviews with an average rating of 4.5 stars.\",\n", " \"Increase the number of repeat purchases by 20%.\",\n", " \"Increase conversion rate from leads to customers by 10%.\",\n", " \"Expand into a new target market segment.\",\n", " \"Launch a successful new product line.\",\n", " \"Increase customer loyalty and brand advocacy.\",\n", " \"Improve brand reputation and sentiment.\",\n", " \"Increase share of voice in the industry.\",\n", " \"Become a thought leader in the industry.\",\n", " \"Build a strong online community around the brand.\",\n", " \"Increase customer engagement with email marketing.\",\n", " \"Improve the effectiveness of social media advertising.\",\n", " \"Optimize the customer journey for better conversion rates.\",\n", " \"Increase customer retention through personalized marketing.\",\n", " \"Drive sales through influencer marketing campaigns.\",\n", " \"Increase brand visibility through public relations efforts.\",\n", " \"Generate leads through content marketing initiatives.\",\n", " \"Improve customer experience across all touchpoints.\",\n", " \"Increase customer satisfaction with customer service.\",\n", " \"Reduce customer acquisition costs by 10%.\",\n", " \"Increase return on investment (ROI) for marketing campaigns.\",\n", " \"Improve the efficiency of marketing operations.\",\n", " \"Increase the speed and agility of marketing campaigns.\",\n", " \"Leverage data and analytics to improve marketing decisions.\",\n", " \"Adopt new marketing technologies to enhance performance.\",\n", " \"Increase the effectiveness of marketing automation.\",\n", " \"Improve lead nurturing processes to increase conversion rates.\",\n", " \"Increase sales through affiliate marketing programs.\",\n", " \"Drive traffic and sales through online contests and giveaways.\",\n", " \"Generate leads through webinars and online events.\",\n", " \"Increase brand awareness through sponsorships and partnerships.\",\n", " \"Improve customer communication through personalized messaging.\",\n", " \"Increase customer engagement through interactive content.\",\n", " \"Drive sales through retargeting campaigns.\",\n", " \"Increase customer lifetime value through loyalty programs.\",\n", " \"Build a strong brand identity and messaging.\"\n", "]\n", "\n", "# Define budget array\n", "budget_array = np.linspace(500, 15000, num=52, dtype=int).tolist()\n", "\n", "# Generate campaign recommendations for low-performing products\n", "for row in products_to_promote.itertuples():\n", " campaign_goal = random.choice(campaign_goal_array)\n", " budget = random.choice(budget_array)\n", "\n", "\n", " product_name\t= row.menu_name\n", " product_description = row.menu_description\n", " menu_id = row.menu_id\n", "\n", " result = GenerateCampaignRecommendations(campaign_goal, product_name, product_description, budget)\n", " print(result)\n", "\n", " # Save to database\n", " # Using json.dumps() to avoid unexpected quoting errors on insert\n", " json_result = json.loads(result)\n", " campaign_name = json.dumps(json_result[\"campaign_name\"])\n", " campaign_description = json.dumps(json_result[\"campaign_description\"])\n", " campaign_outcomes = json.dumps(json_result[\"campaign_outcomes\"])\n", " target_audience = json.dumps(json_result[\"target_audience\"])\n", " marketing_channels = json.dumps(json_result[\"marketing_channels\"])\n", " explanation = json.dumps(json_result[\"explanation\"])\n", " campaign_goal = json.dumps(campaign_goal)\n", " budget = budget\n", "\n", " sql=f\"\"\"\n", " INSERT INTO `${project_id}.${bigquery_chocolate_ai_dataset}.campaign`\n", " (campaign_id, campaign_name, campaign_description, campaign_goal, target_audience, marketing_channels, budget, explanation, campaign_outcomes, menu_id)\n", " VALUES\n", " (GENERATE_UUID(), {campaign_name}, {campaign_description}, {campaign_goal}, {target_audience}, {marketing_channels}, {budget}, {explanation}, {campaign_outcomes}, {menu_id});\n", " \"\"\"\n", "\n", " RunQuery(sql)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "TuixAfTMC8n0" }, "source": [ "#### Set Campaign Dates" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "krGo5hbDA4sU" }, "outputs": [], "source": [ "# Set historical campaign dates\n", "sql=\"\"\"SELECT campaign_id, campaign_name, campaign_start_date, campaign_end_date\n", "FROM `${project_id}.${bigquery_chocolate_ai_dataset}.campaign`;\n", "\"\"\"\n", "\n", "campaign_table = RunQuery(sql)\n", "\n", "# Set seed date\n", "seed_start_date = '2024-09-09'\n", "seed_end_date = '2024-09-15'\n", "\n", "# Convert the string dates to datetime objects\n", "new_start_date_dt = datetime.strptime(seed_start_date, '%Y-%m-%d')\n", "new_end_date_dt = datetime.strptime(seed_end_date, '%Y-%m-%d')\n", "\n", "for row in campaign_table.itertuples():\n", " campaign_id = row.campaign_id\n", "\n", " # Update database\n", " sql=f\"\"\"UPDATE `${project_id}.${bigquery_chocolate_ai_dataset}.campaign`\n", " SET campaign_start_date = '{new_start_date_dt.strftime('%Y-%m-%d')}',\n", " campaign_end_date = '{new_end_date_dt.strftime('%Y-%m-%d')}',\n", " campaign_created_date = '{new_start_date_dt.strftime('%Y-%m-%d')}'\n", " WHERE campaign_id = '{campaign_id}';\n", " \"\"\"\n", "\n", " print(f\"Processing row {row.Index} for campaign {campaign_id}\")\n", " result = RunQuery(sql)\n", "\n", " # Subtract one week (7 days) for next iteration\n", " new_start_date_dt = new_start_date_dt - timedelta(weeks=1)\n", " new_end_date_dt = new_end_date_dt - timedelta(weeks=1)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "KkDASQJVDBIP" }, "source": [ "#### Simulate Campaign Performance" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_hqF97i1adqe" }, "outputs": [], "source": [ "# Generate historical campaign performance\n", "sql=\"\"\"SELECT campaign_id, budget\n", "FROM `${project_id}.${bigquery_chocolate_ai_dataset}.campaign`;\n", "\"\"\"\n", "\n", "campaign_table = RunQuery(sql)\n", "\n", "values = \"\"\n", "gen_campaign_performance = [(row.campaign_id, *[v for v in GenerateCampaignPerformance(row.budget).values()]) for row in campaign_table.itertuples()]\n", "\n", "for row in gen_campaign_performance:\n", " values = values + f\"{row},\\n\"\n", "\n", "values = values[:-2]\n", "\n", "sql=f\"\"\"INSERT INTO `${project_id}.${bigquery_chocolate_ai_dataset}.campaign_performance` (\n", " campaign_id, impressions, reach, website_traffic, leads_generated, conversions, cost_per_click, cost_per_acquisition, return_on_investment)\n", " VALUES {values}\n", "\"\"\"\n", "\n", "result = RunQuery(sql)\n", "print(result)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "ASQ2BPisXDA0" }, "source": [ "## Reference Links \n" ] }, { "cell_type": "markdown", "metadata": { "id": "MNr3FTda-Hgl" }, "source": [ "- [Generate content with the Gemini Enterprise API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/inference)\n", "- [Controlled Generation with Gemini](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output)" ] } ], "metadata": { "colab": { "collapsed_sections": [ "42IxhtRRrvR-" ], "name": "DB-GMA-Synthetic-Data-Generation-Campaigns.ipynb", "private_outputs": true, "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }

colab-enterprise/Synthetic-Data-Generation-Campaigns.ipynb (986 lines of code) (raw):