colab-enterprise/gen-ai-demo/Event-Generate-Insight-GenAI.ipynb:
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "k6eIqerFOzyj"
},
"source": [
"## <img src=\"https://lh3.googleusercontent.com/mUTbNK32c_DTSNrhqETT5aQJYFKok2HB1G2nk2MZHvG5bSs0v_lmDm_ArW7rgd6SDGHXo0Ak2uFFU96X6Xd0GQ=w160-h128\" width=\"45\" valign=\"top\" alt=\"BigQuery\"> Using GenAI to postion our trucks based upon what's happening in the city"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### License"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"##################################################################################\n",
"# Copyright 2024 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"# \n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"# \n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License.\n",
"###################################################################################"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Notebook Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- This notebook will use GenAI to determine the best places to locate our coffee trucks in each city based upon the events happening in the city. The LLM can rank the events from the events that will have the highest impact on sales to the events with the lowest impact.\n",
"\n",
"- Notebook Logic:\n",
" 1. Create a row, per city, in the event_gen_ai_insight table. \n",
" - For each city:\n",
" - Query BigQuery for the current days events.\n",
" - BigQuery will return the results as JSON.\n",
" - Our LLM prompt will be created:\n",
" - The prompt will ask for insights to be created\n",
" - The prompt will pass in knowledge about past event types and the effect on profits\n",
" - The prompt will ask the LLM to compute the latitude and longitude of the event address\n",
" - The prompt will ask the LLM convert date that a human would only undertand to a BigQuery datetime.\n",
" - The prompt will ask to rank the events from 1 to {x}.\n",
" - The prompt will ask the LLM to provide an explaination of its thought process.\n",
" - The prompt will provide example JSON results so the LLM knows the output format to generate the results.\n",
" - The prompt will include the events from BigQuery as the context.\n",
"\n",
" - A row will be inserted into the event_gen_ai_insight table.\n",
"\n",
" 2. An update statement will be run on the event_gen_ai_insight table which will call GENERATE_TEXT (text-bison) to execute the LLM prompt. \n",
" - We save our prompt in the table so we can later inspect it or reprocess the prompt.\n",
"\n",
" 3. The results of the GENERATE_TEXT are JSON so the inner JSON is then parsed into the generated_insight_text and generated_insight_json fields.\n",
"\n",
" 4. The data is then displayed."
]
},
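{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell is an optional, minimal sketch (not part of the original pipeline) showing how the insight JSON requested by the prompt could be validated in plain Python. The sample payload is hypothetical; the field names (rank, event_id, latitude, longitude, explanation) match the ones the prompt asks the LLM to return."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: validate the insight JSON shape that the prompt asks the LLM to return.\n",
"# The sample payload below is hypothetical; the field names match the prompt's instructions.\n",
"import json\n",
"\n",
"sample_llm_output = '[ { \"rank\" : 1, \"event_id\" : 4, \"latitude\" : 40.736874, \"longitude\" : -73.985394, \"explanation\" : \"Family event that should increase foot traffic.\" } ]'\n",
"\n",
"required_fields = {\"rank\", \"event_id\", \"latitude\", \"longitude\", \"explanation\"}\n",
"\n",
"for insight in json.loads(sample_llm_output):\n",
"    missing = required_fields - insight.keys()\n",
"    if missing:\n",
"        print(f\"event_id {insight.get('event_id')} is missing fields: {missing}\")\n",
"    else:\n",
"        print(f\"Rank {insight['rank']}: event {insight['event_id']} at ({insight['latitude']}, {insight['longitude']})\")"
]
},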
{
"cell_type": "markdown",
"metadata": {
"id": "2FHvD3ffG52M"
},
"source": [
"## Initialize Python"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "h-3IL5aFP6Dl"
},
"outputs": [],
"source": [
"# No need to set these\n",
"city_names=[\"New York City\", \"London\", \"Tokyo\", \"San Francisco\"]\n",
"city_ids=[1,2,3,4]\n",
"city_timezones=[\"US/Eastern\",\"Europe/London\",\"Asia/Tokyo\",\"US/Pacific\"]\n",
"number_of_coffee_trucks = \"4\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HTNQKUNaRsyO"
},
"outputs": [],
"source": [
"from google.cloud import bigquery\n",
"client = bigquery.Client()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Create the GenAI Insights table"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jrWuWfUCSZUv"
},
"outputs": [],
"source": [
"%%bigquery\n",
"CREATE TABLE IF NOT EXISTS `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight`\n",
"--CREATE OR REPLACE TABLE `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight` -- only use this to start over\n",
"(\n",
" event_gen_ai_insight_id INTEGER NOT NULL OPTIONS(description=\"Primary key.\"),\n",
" event_gen_ai_insight_type STRING NOT NULL OPTIONS(description=\"The type of insight: WEATHER, EVENT, CAMPAIGN, SOCIAL, WAIT-TIME, SALES, VIDEO-AI\"),\n",
" insight_datetime TIMESTAMP NOT NULL OPTIONS(description=\"The datetime of the GenAI insight.\"),\n",
" applies_to_entity_type STRING NOT NULL OPTIONS(description=\"The type of entity to which the insight applies: Company, City, Truck, Customer\"),\n",
" applies_to_entity_id INTEGER NOT NULL OPTIONS(description=\"The id (primary key) of entity to which the insight applies.\"),\n",
" applies_to_entity_name STRING NOT NULL OPTIONS(description=\"The name to which the insight applies.\"),\n",
" llm_prompt STRING NOT NULL OPTIONS(description=\"The LLM prompt.\"),\n",
" ml_generate_json_result JSON OPTIONS(description=\"The raw JSON output of the LLM.\"),\n",
" generated_insight_text STRING OPTIONS(description=\"The generated insight in text\"),\n",
" generated_insight_json JSON OPTIONS(description=\"The generated insight in JSON\")\n",
")\n",
"CLUSTER BY event_gen_ai_insight_id;"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fHzPNwSMrBNA"
},
"outputs": [],
"source": [
"%%bigquery\n",
"\n",
"-- remove the current days events (in case you re-run this notebook)\n",
"DELETE\n",
" FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight` AS event_gen_ai_insight\n",
" WHERE CAST(insight_datetime AS DATE) = (SELECT CAST(MAX(insight_datetime) AS DATE) FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight`);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Supporting Functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GZ18slUFRpjG"
},
"outputs": [],
"source": [
"def RunQuery(sql):\n",
" import time\n",
"\n",
" if (sql.startswith(\"SELECT\") or sql.startswith(\"WITH\")):\n",
" df_result = client.query(sql).to_dataframe()\n",
" return df_result\n",
" else:\n",
" job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)\n",
" query_job = client.query(sql, job_config=job_config)\n",
"\n",
" # Check on the progress by getting the job's updated state.\n",
" query_job = client.get_job(\n",
" query_job.job_id, location=query_job.location\n",
" )\n",
" print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n",
"\n",
" while query_job.state != \"DONE\":\n",
" time.sleep(2)\n",
" query_job = client.get_job(\n",
" query_job.job_id, location=query_job.location\n",
" )\n",
" print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n",
"\n",
" if query_job.error_result == None:\n",
" return True\n",
" else:\n",
" return False"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qNUW0MFiUOFy"
},
"outputs": [],
"source": [
"def GetNextPrimaryKey(fully_qualified_table_name, field_name):\n",
" sql = f\"\"\"\n",
" SELECT IFNULL(MAX({field_name}),0) AS result\n",
" FROM `{fully_qualified_table_name}`\n",
" \"\"\"\n",
" # print(sql)\n",
" df_result = client.query(sql).to_dataframe()\n",
" # display(df_result)\n",
" return df_result['result'].iloc[0] + 1"
]
},
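{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell is an optional sanity check of the two helper functions: RunQuery returns a dataframe for statements starting with SELECT or WITH and a True/False success flag for everything else, while GetNextPrimaryKey returns the current MAX of the key column plus 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check of the helper functions (read-only; safe to skip).\n",
"# Assumes the event_gen_ai_insight table created above exists.\n",
"next_id = GetNextPrimaryKey(\"${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight\", \"event_gen_ai_insight_id\")\n",
"print(f\"The next event_gen_ai_insight_id would be: {next_id}\")\n",
"\n",
"# SELECT statements are returned as a dataframe\n",
"df_row_count = RunQuery(\"SELECT COUNT(*) AS row_count FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight`\")\n",
"display(df_row_count)"
]
},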
{
"cell_type": "markdown",
"metadata": {
"id": "ZrQ_gOebR2vX"
},
"source": [
"## Create the LLM prompt for each city and insert into or GenAI Insights table."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zlsrcCAmAGqY"
},
"outputs": [],
"source": [
"# Get the next primary key\n",
"event_gen_ai_insight_id = GetNextPrimaryKey(\"${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight\",\"event_gen_ai_insight_id\")\n",
"\n",
"# Loop for each city\n",
"for city_index in range(0, 4):\n",
" print(f\"Processing city: {city_ids[city_index]}\")\n",
"\n",
" sql=f\"\"\"WITH event_data AS\n",
" (\n",
" SELECT DISTINCT event_id, event_title, event_venue, event_address, event_description\n",
" FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event`\n",
" WHERE city_id = {city_ids[city_index]}\n",
" AND event_date = (SELECT MAX(event_date) FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event`)\n",
" )\n",
" SELECT TO_JSON(t) AS event_json\n",
" FROM event_data AS t\n",
" ORDER BY GENERATE_UUID()\n",
" \"\"\"\n",
"\n",
" # Run the SQL to get the events for this city\n",
" print(f\"Gathering events for: {city_names[city_index]}\")\n",
" event_df = RunQuery(sql)\n",
"\n",
" # Loop through the dataframe (we want a JSON string for our prompt)\n",
" event_json = \"\"\n",
" for index, row in event_df.iterrows():\n",
" event_json = event_json + row['event_json'] + \"\\n\"\n",
"\n",
" # Create the LLM prompt\n",
"\n",
" # Example return JSON (without the python f\"\" formatting)\n",
" json_example_data='[ { \"rank\" : 1, \"event_id\" : 4, \"latitude\" : 40.736874, \"longitude\": -73.985394, ' + \\\n",
" '\"explanation\" : \"This will increase traffic since this is a family event.\" }, ' + \\\n",
" '{ \"rank\" : 2, \"event_id\" : 6, \"latitude\" : 41.723124, \"longitude\": -72.142231, ' + \\\n",
" '\"explanation\" : \"This will increase traffic since this is a family event.\" } ]'\n",
"\n",
" llm_prompt=f\"\"\"You run a fleet of {number_of_coffee_trucks} coffee trucks in {city_names[city_index]} and want to maximize your profits.\n",
" The below is a list of events that are happening tonight.\n",
" Think outside the box and develop a unique and unexpected solution for each location.\n",
" Encourage unconventional ideas and fresh perspectives in your recommendations.\" \n",
" Return the results in JSON with no special characters or formatting.\n",
" Use the field \"event_id\" from the events and place the result in the field \"event_id\".\n",
" Rank the events from the best to the lowest and place in the field \"rank\".\n",
" Compute the latitude of the field \"event_address\" and place the result in the field \"latitude\".\n",
" Compute the longitude of the field \"event_address\" and place the result in the field \"longitude\".\n",
" Explain your logic and place the result in the field \"explanation\".\n",
"\n",
" Example Return Data:\n",
" {json_example_data}\n",
"\n",
" Events:\n",
" {event_json}\n",
" \"\"\"\n",
"\n",
" print(f\"Created LLM Prompt for: {city_names[city_index]}\")\n",
"\n",
" # Clean illegal SQL characters from Google Events API\n",
" prompt=llm_prompt.replace(\"\\\\_\",\"\")\n",
" prompt=llm_prompt.replace(\"\\\\m\",\"\")\n",
"\n",
" # Insert new row\n",
" sql=f\"\"\"INSERT INTO `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight`\n",
" (event_gen_ai_insight_id, event_gen_ai_insight_type, insight_datetime, applies_to_entity_type, applies_to_entity_id, applies_to_entity_name, llm_prompt)\n",
" VALUES({event_gen_ai_insight_id}, 'EVENT', CURRENT_TIMESTAMP(), 'CITY', {city_ids[city_index]}, '{city_names[city_index]}',\\\"\\\"\\\"{prompt}\\\"\\\"\\\")\n",
" \"\"\".replace(\" \",\"\")\n",
"\n",
" RunQuery(sql)\n",
"\n",
" # Get the next primary key\n",
" event_gen_ai_insight_id = event_gen_ai_insight_id + 1\n",
"\n",
" print(f\"Inserted prompt into event_gen_ai_insight with event_gen_ai_insight_id = {event_gen_ai_insight_id}\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PampTfgdRfqe"
},
"source": [
"## Process the LLM prompt with BigQuery"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tbVKnddVSPKH"
},
"outputs": [],
"source": [
"# Run the LLM\n",
"temperature=.75\n",
"max_output_tokens=8192\n",
"top_p=.8\n",
"top_k=30\n",
"\n",
"sql=f\"\"\"UPDATE `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight` AS event_gen_ai_insight\n",
" SET insight_datetime = CURRENT_TIMESTAMP(),\n",
" ml_generate_json_result = llm_query.ml_generate_text_result\n",
" FROM (SELECT *\n",
" FROM ML.GENERATE_TEXT(MODEL`${project_id}.${bigquery_data_beans_curated_dataset}.gemini_model`,\n",
" (SELECT event_gen_ai_insight_id,\n",
" llm_prompt AS prompt\n",
" FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight`\n",
" WHERE (ml_generate_json_result IS NULL\n",
" OR\n",
" JSON_VALUE(ml_generate_json_result, '$.candidates[0].content') IS NULL\n",
" )\n",
" ),\n",
" STRUCT(\n",
" {temperature} AS temperature,\n",
" {max_output_tokens} AS max_output_tokens,\n",
" {top_p} AS top_p,\n",
" {top_k} AS top_k\n",
" ))\n",
" ) AS llm_query\n",
" WHERE event_gen_ai_insight.event_gen_ai_insight_id = llm_query.event_gen_ai_insight_id;\n",
"\"\"\"\n",
"RunQuery(sql)"
]
},
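{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, before parsing, you can inspect the raw Gemini response that the UPDATE above stored in ml_generate_json_result. The next cell is a minimal, read-only sketch that renders the JSON column as text for the most recent insights (one row per city in this demo)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional: inspect the raw LLM output stored by the UPDATE above (read-only).\n",
"sql = \"\"\"SELECT event_gen_ai_insight_id,\n",
"                applies_to_entity_name,\n",
"                TO_JSON_STRING(ml_generate_json_result) AS raw_llm_json\n",
"           FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight`\n",
"          WHERE ml_generate_json_result IS NOT NULL\n",
"          ORDER BY insight_datetime DESC\n",
"          LIMIT 4\"\"\"\n",
"\n",
"df_raw = client.query(sql).to_dataframe()\n",
"display(df_raw)"
]
},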
{
"cell_type": "markdown",
"metadata": {
"id": "qAaHGZKye7LV"
},
"source": [
"## Parse and view the results"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "sKYAOE6XDMHY"
},
"outputs": [],
"source": [
"%%bigquery\n",
"\n",
"# Parse the LLM results into json fields\n",
"UPDATE `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight`\n",
" SET generated_insight_text = `${project_id}.${bigquery_data_beans_curated_dataset}.gemini_model_result_as_string`(ml_generate_json_result),\n",
" generated_insight_json = `${project_id}.${bigquery_data_beans_curated_dataset}.gemini_model_result_as_json`(ml_generate_json_result)\n",
" WHERE generated_insight_text IS NULL\n",
" --AND event_gen_ai_insight_id = 3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Qad7nxKrcqgE"
},
"outputs": [],
"source": [
"%%bigquery\n",
"\n",
"# Display the results\n",
"SELECT event.event_id, event.event_date, event.event_title, event.event_venue, event.event_time_string,\n",
" event_gen_ai_insight.applies_to_entity_name, json_data.rank,\n",
" json_data.latitude, json_data.longitude, CAST(JSON_VALUE(json_data.explanation) AS STRING) AS explanation\n",
" FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event_gen_ai_insight` AS event_gen_ai_insight\n",
" CROSS JOIN UNNEST(JSON_EXTRACT_ARRAY(event_gen_ai_insight.generated_insight_json)) AS json_data\n",
" INNER JOIN `${project_id}.${bigquery_data_beans_curated_dataset}.event` AS event\n",
" ON CAST(JSON_VALUE(json_data,'$.event_id') AS INT64) = event.event_id\n",
" WHERE event.event_date = (SELECT MAX(event_date) FROM `${project_id}.${bigquery_data_beans_curated_dataset}.event`)\n",
"ORDER BY applies_to_entity_name, CAST(JSON_VALUE(json_data.rank) AS INT64);"
]
}
],
"metadata": {
"colab": {
"collapsed_sections": [
"k6eIqerFOzyj",
"2FHvD3ffG52M",
"KOXpog83RlPG"
],
"name": "BigQuery table",
"private_outputs": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}