qwiklabs/colab-enterprise/gen-ai-demo/Customer-Reviews-Generate-Insight-GenAI.ipynb (517 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "k6eIqerFOzyj" }, "source": [ "## <img src=\"https://lh3.googleusercontent.com/mUTbNK32c_DTSNrhqETT5aQJYFKok2HB1G2nk2MZHvG5bSs0v_lmDm_ArW7rgd6SDGHXo0Ak2uFFU96X6Xd0GQ=w160-h128\" width=\"45\" valign=\"top\" alt=\"BigQuery\"> Generating Business Insights for our Customer Review data using GenAI\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### License" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "##################################################################################\n", "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "# \n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "# \n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "###################################################################################" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Notebook Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- This notebook will use GenAI to generate insights from our Customer Review data. The customer reviews have had their sentiment determined and the theme(s) (\"Good value\", \"Long Wait Time\") extracted. This data will be aggregated for the past 7 days and then passed to the LLM for analysis. The LLM can provide insights and ideas on how to improve our business. We are using the LLM like a consultant as well as to generate fresh ideas.\n", "\n", "- Notebook Logic:\n", " 1. Create a row, per city, in the customer_review_gen_ai_insight table. \n", " - For each city:\n", " - The sentiment and themes will be aggregated for the past 7 days by calling BigQuery.\n", " - BigQuery will return the results as JSON.\n", " - Our LLM prompt will be created:\n", " - The prompt will ask for insights to be created.\n", " - The prompt will ask for unconventional ideas.\n", " - The prompt will ask the LLM to provide an explaination of its thought process.\n", " - The prompt will provide example JSON results so the LLM knows the output format to generate the results.\n", " - The prompt will include the results from BigQuery as the context.\n", " - The prompt will contain an explaination of the BigQuery data.\n", " - A row will be inserted into the customer_review_gen_ai_insight table.\n", "\n", " 2. An update statement will be run on the customer_review_gen_ai_insight table which will call GENERATE_TEXT (gemini-pro) to execute the LLM prompt. \n", " - We save our prompt in the table so we can later inspect it or reprocess the prompt.\n", "\n", " 3. The results of the GENERATE_TEXT are JSON so the inner JSON is then parsed into the generated_insight_text and generated_insight_json fields.\n", "\n", " 4. The data is then displayed." ] }, { "cell_type": "markdown", "metadata": { "id": "2FHvD3ffG52M" }, "source": [ "## Initialize Python" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "h-3IL5aFP6Dl" }, "outputs": [], "source": [ "# The four cities we want to process.\n", "city_names=[\"New York City\", \"London\", \"Tokyo\", \"San Francisco\"]\n", "city_ids=[1,2,3,4]\n", "number_of_coffee_trucks = \"4\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HTNQKUNaRsyO" }, "outputs": [], "source": [ "from google.cloud import bigquery\n", "client = bigquery.Client()" ] }, { "cell_type": "markdown", "metadata": { "id": "KOXpog83RlPG" }, "source": [ "## Create the GenAI Insights table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jrWuWfUCSZUv" }, "outputs": [], "source": [ "%%bigquery\n", "CREATE TABLE IF NOT EXISTS `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight`\n", "--CREATE OR REPLACE TABLE `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight` -- only use to replace the entire table\n", "(\n", " customer_review_gen_ai_insight_id INTEGER NOT NULL OPTIONS(description=\"Primary key.\"),\n", " customer_review_gen_ai_insight_type STRING NOT NULL OPTIONS(description=\"The type of insight: WEATHER, EVENT, CAMPAIGN, SOCIAL, WAIT-TIME, SALES, VIDEO-AI\"),\n", " insight_datetime TIMESTAMP NOT NULL OPTIONS(description=\"The datetime of the GenAI insight.\"),\n", " applies_to_entity_type STRING NOT NULL OPTIONS(description=\"The type of entity to which the insight applies: Company, City, Truck, Customer\"),\n", " applies_to_entity_id INTEGER NOT NULL OPTIONS(description=\"The id (primary key) of entity to which the insight applies.\"),\n", " applies_to_entity_name STRING NOT NULL OPTIONS(description=\"The name to which the insight applies.\"),\n", " llm_prompt STRING NOT NULL OPTIONS(description=\"The LLM prompt.\"),\n", " ml_generate_json_result JSON OPTIONS(description=\"The raw JSON output of the LLM.\"),\n", " generated_insight_text STRING OPTIONS(description=\"The generated insight in text\"),\n", " generated_insight_json JSON OPTIONS(description=\"The generated insight in JSON\")\n", ")\n", "CLUSTER BY customer_review_gen_ai_insight_id;" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "O2ESzvG8uuUq" }, "outputs": [], "source": [ "%%bigquery\n", "\n", "-- Remove the current days items (so you can re-run)\n", "DELETE\n", " FROM `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight`\n", " WHERE CAST(insight_datetime AS DATE) = CURRENT_DATE()\n", " AND customer_review_gen_ai_insight_type = 'SOCIAL-CREATIVE';" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Supporting Functions" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "GZ18slUFRpjG" }, "outputs": [], "source": [ "# Runs a query against BigQuery and waits for it to complete\n", "def RunQuery(sql):\n", " import time\n", "\n", " if (sql.startswith(\"SELECT\") or sql.startswith(\"WITH\")):\n", " df_result = client.query(sql).to_dataframe()\n", " return df_result\n", " else:\n", " job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)\n", " query_job = client.query(sql, job_config=job_config)\n", "\n", " # Check on the progress by getting the job's updated state.\n", " query_job = client.get_job(\n", " query_job.job_id, location=query_job.location\n", " )\n", " print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", " while query_job.state != \"DONE\":\n", " time.sleep(2)\n", " query_job = client.get_job(\n", " query_job.job_id, location=query_job.location\n", " )\n", " print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", " if query_job.error_result == None:\n", " return True\n", " else:\n", " return False" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "qNUW0MFiUOFy" }, "outputs": [], "source": [ "# Since our Primary keys are INTs we get the next available value\n", "def GetNextPrimaryKey(fully_qualified_table_name, field_name):\n", " sql = f\"\"\"\n", " SELECT IFNULL(MAX({field_name}),0) AS result\n", " FROM `{fully_qualified_table_name}`\n", " \"\"\"\n", " # print(sql)\n", " df_result = client.query(sql).to_dataframe()\n", " # display(df_result)\n", " return df_result['result'].iloc[0] + 1" ] }, { "cell_type": "markdown", "metadata": { "id": "ZrQ_gOebR2vX" }, "source": [ "## Create the LLM prompt for each city and insert into or GenAI Insights table." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zlsrcCAmAGqY" }, "outputs": [], "source": [ "# Get the next primary key\n", "customer_review_gen_ai_insight_id = GetNextPrimaryKey(\"${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight\",\"customer_review_gen_ai_insight_id\")\n", "\n", "# Loop for each city\n", "for city_index in range(0, 4):\n", " print(f\"Processing city: {city_ids[city_index]}\")\n", "\n", " ##############################################################################\n", " # Get items to process\n", " ##############################################################################\n", " sql=f\"\"\"WITH max_data AS\n", " (\n", " SELECT MAX(review_datetime) AS max_review_datetime\n", " FROM `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review` AS customer_review\n", " INNER JOIN `${project_id}.${bigquery_data_beans_curated_dataset}.city` AS city\n", " ON city.city_id = {city_ids[city_index]}\n", " )\n", " , theme_data AS\n", " (\n", " SELECT customer_review.location_id,\n", " location_history.address,\n", " CAST(JSON_VALUE(themes,'$.theme') AS STRING) AS review_theme,\n", " customer_review.review_sentiment,\n", " COUNT(*) AS theme_count\n", " FROM `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review` AS customer_review\n", " CROSS JOIN UNNEST(JSON_QUERY_ARRAY(llm_detected_theme,'$')) AS themes\n", " CROSS JOIN max_data\n", " INNER JOIN `${project_id}.${bigquery_data_beans_curated_dataset}.city` AS city\n", " ON city.city_id = {city_ids[city_index]}\n", " INNER JOIN `${project_id}.${bigquery_data_beans_curated_dataset}.location_history` AS location_history\n", " ON customer_review.location_id = location_history.location_id\n", " AND customer_review.review_datetime BETWEEN location_history.start_datetime AND location_history.stop_datetime\n", " WHERE customer_review.review_datetime BETWEEN TIMESTAMP_SUB(max_data.max_review_datetime, INTERVAL 7 DAY)\n", " AND max_data.max_review_datetime\n", " GROUP BY 1,2,3,4\n", " )\n", " SELECT TO_JSON(theme_data) AS customer_review_json\n", " FROM theme_data\n", " ORDER BY location_id;\n", " \"\"\"\n", "\n", " # Run the SQL to get the customer reviews\n", " print(f\"Gathering data for: {city_names[city_index]}\")\n", " customer_reviews_df = RunQuery(sql)\n", "\n", " # Loop through the dataframe (we want a JSON string for our prompt)\n", " customer_reviews_json = \"\"\n", " for index, row in customer_reviews_df.iterrows():\n", " customer_reviews_json = customer_reviews_json + row['customer_review_json'] + \"\\n\"\n", "\n", " ##############################################################################\n", " # Create the LLM prompt\n", " ##############################################################################\n", "\n", " # Example return JSON (without the python f\"\" formatting)\n", " json_example_data='[ { \"location_id\" : 1, \"recommended_action\": \"Move the truck to the following address.\", \"address\" : \"9031 Spring St. New York, NY 10033\", \"explanation\" : \"This location has lots of seating.\" }, { \"location_id\" : 2, \"recommended_action\": \"Train the employees.\", \"address\" : \"1 Queen Street London E62 8YT\", \"explanation\" : \"The lack of service means we need additional training.\" } ]'\n", "\n", " # Use this prompt for Creative responses.\n", " CREATIVE_llm_prompt=f\"\"\"You run a fleet of 4 coffee trucks in {city_names[city_index]}.\n", "Below is a list of customer reviews.\n", "Based upon the customer reviews determine the recommended action for each location.\n", "Mutate the recommended action to be as unconventional and surprising as possible.\n", "Think outside the box and develop a unique and unexpected solution for each location.\n", "Encourage unconventional ideas and fresh perspectives in your recommendations.\"\n", "Embrace unconventional ideas and mutate the recommended action in a way that surprises and inspires unique variations.\n", "The recommendation should be up to 600 words.\n", "Return the results in JSON with no special characters or formatting.\n", "Place the location_id in the field \"location_id\".\n", "Place the address in the field \"address\".\n", "Compute the recommended action for each location and place the result in the field \"recommended_action\".\n", "Explain your logic and place the result in the field \"explanation\".\n", "\n", "Example Return Data:\n", "{json_example_data}\n", "\n", "Context:\n", "- For the provided Customer Reviews here are what each of the fields mean:\n", " - \"address\" - The location of the coffee truck at the time of the review.\n", " - \"location_id\" - This identifies each truck.\n", " - \"review_theme\" - This is the review theme extracted from the customer review text.\n", " - \"theme_count\" - This is the number of reviews with the same theme.\n", " - \"review_sentiment\" - This will be sentiment of Postive, Neutral or Negative.\n", "\n", "Customer Reviews:\n", "{customer_reviews_json}\n", "\"\"\"\n", " \n", " # Use this prompt for more factual responses.\n", " FACTUAL_llm_prompt=f\"\"\"You run a fleet of 4 coffee trucks in {city_names[city_index]}.\n", "Below is a list of customer reviews.\n", "Based upon the customer reviews determine the recommended action for each location.\n", "Be very specific about the actions you recommend.\n", "Actions should improve an negative reviews and reenforce positive review behaviour.\n", "The recommendation should be up to 600 words.\n", "Return the results in JSON with no special characters or formatting.\n", "Place the location_id in the field \"location_id\".\n", "Place the address in the field \"address\".\n", "Compute the recommended action for each location and place the result in the field \"recommended_action\".\n", "Explain your logic and place the result in the field \"explanation\".\n", "\n", "Example Return Data:\n", "{json_example_data}\n", "\n", "Context:\n", "- For the provided Customer Reviews here are what each of the fields mean:\n", " - \"address\" - The location of the coffee truck at the time of the review.\n", " - \"location_id\" - This identifies each truck.\n", " - \"review_theme\" - This is the review theme extracted from the customer review text.\n", " - \"theme_count\" - This is the number of reviews with the same theme.\n", " - \"review_sentiment\" - This will be sentiment of Postive, Neutral or Negative.\n", "\n", "Customer Reviews:\n", "{customer_reviews_json}\n", "\"\"\"\n", "\n", "\n", " print(f\"Created LLM Prompt for: {city_names[city_index]}\")\n", "\n", " # Insert new row\n", " sql=f\"\"\"INSERT INTO `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight`\n", " (customer_review_gen_ai_insight_id, customer_review_gen_ai_insight_type, insight_datetime, applies_to_entity_type, applies_to_entity_id, applies_to_entity_name, llm_prompt)\n", " VALUES({customer_review_gen_ai_insight_id}, 'SOCIAL-CREATIVE', CURRENT_TIMESTAMP(), 'CITY', {city_ids[city_index]}, '{city_names[city_index]}',\\\"\\\"\\\"{CREATIVE_llm_prompt}\\\"\\\"\\\")\n", " \"\"\".replace(\" \",\"\")\n", "\n", " RunQuery(sql)\n", "\n", " # Get the next primary key\n", " customer_review_gen_ai_insight_id = customer_review_gen_ai_insight_id + 1\n", "\n", " print(f\"Inserted prompt into customer_review_gen_ai_insight with customer_review_gen_ai_insight_id = {customer_review_gen_ai_insight_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "PampTfgdRfqe" }, "source": [ "## Process the LLM prompt with BigQuery" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tbVKnddVSPKH" }, "outputs": [], "source": [ "# Run the LLM\n", "temperature=.75\n", "max_output_tokens=8192\n", "top_p=.8\n", "top_k=30\n", "\n", "sql=f\"\"\"UPDATE `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight` AS customer_review_gen_ai_insight\n", " SET insight_datetime = CURRENT_TIMESTAMP(),\n", " ml_generate_json_result = llm_query.ml_generate_text_result\n", " FROM (SELECT *\n", " FROM ML.GENERATE_TEXT(MODEL`${project_id}.${bigquery_data_beans_curated_dataset}.gemini_model`,\n", " (SELECT customer_review_gen_ai_insight_id,\n", " llm_prompt AS prompt\n", " FROM `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight`\n", " WHERE (ml_generate_json_result IS NULL\n", " OR\n", " JSON_VALUE(ml_generate_json_result, '$.candidates[0].content') IS NULL\n", " )\n", " AND customer_review_gen_ai_insight_type = 'SOCIAL-CREATIVE'\n", " ),\n", " STRUCT(\n", " {temperature} AS temperature,\n", " {max_output_tokens} AS max_output_tokens,\n", " {top_p} AS top_p,\n", " {top_k} AS top_k\n", " ))\n", " ) AS llm_query\n", " WHERE customer_review_gen_ai_insight.customer_review_gen_ai_insight_id = llm_query.customer_review_gen_ai_insight_id;\n", "\"\"\"\n", "RunQuery(sql)" ] }, { "cell_type": "markdown", "metadata": { "id": "qAaHGZKye7LV" }, "source": [ "## Parse and view the results" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "sKYAOE6XDMHY" }, "outputs": [], "source": [ "%%bigquery\n", "\n", "# Parse the LLM results into json fields\n", "UPDATE `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight`\n", " SET generated_insight_text = `${project_id}.${bigquery_data_beans_curated_dataset}.gemini_model_result_as_string`(ml_generate_json_result),\n", " generated_insight_json = `${project_id}.${bigquery_data_beans_curated_dataset}.gemini_model_result_as_json`(ml_generate_json_result)\n", " WHERE generated_insight_text IS NULL\n", " AND customer_review_gen_ai_insight_type = 'SOCIAL-CREATIVE'\n", " --AND customer_review_gen_ai_insight_id = 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "RCCogPjFSPHO" }, "outputs": [], "source": [ "%%bigquery\n", "\n", "# Display the results\n", "SELECT customer_review_gen_ai_insight_id, generated_insight_text, generated_insight_json\n", " FROM `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight`\n", " WHERE CAST(insight_datetime AS DATE) = CURRENT_DATE()\n", " AND customer_review_gen_ai_insight_type = 'SOCIAL-CREATIVE'\n", " ORDER BY customer_review_gen_ai_insight_id" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Qad7nxKrcqgE" }, "outputs": [], "source": [ "%%bigquery\n", "\n", "-- Display the results (unnest each location)\n", "SELECT applies_to_entity_name AS city_name,\n", " CAST(JSON_VALUE(json_data.location_id) AS INT) AS location_id,\n", " CAST(JSON_VALUE(json_data.address) AS STRING) AS address,\n", " CAST(JSON_VALUE(json_data.recommended_action) AS STRING) AS recommended_action,\n", " CAST(JSON_VALUE(json_data.explanation) AS STRING) AS explanation\n", " FROM `${project_id}.${bigquery_data_beans_curated_dataset}.customer_review_gen_ai_insight` AS customer_review_gen_ai_insight\n", " CROSS JOIN UNNEST(JSON_EXTRACT_ARRAY(customer_review_gen_ai_insight.generated_insight_json)) AS json_data\n", " WHERE CAST(insight_datetime AS DATE) = CURRENT_DATE()\n", " AND customer_review_gen_ai_insight_type = 'SOCIAL-CREATIVE'\n", " ORDER BY applies_to_entity_name, location_id, address" ] } ], "metadata": { "colab": { "collapsed_sections": [ "k6eIqerFOzyj", "2FHvD3ffG52M" ], "name": "BigQuery table", "private_outputs": true, "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }