retail/recommendation-system/bqml/bqml_retail_recommendation

{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "ur8xi4C7S06n" }, "outputs": [], "source": [ "# Copyright 2020 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "eHLV0D7Y5jtU" }, "source": [ "<table align=\"left\">\n", " <td>\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/analytics-componentized-patterns/blob/master/notebooks/bqml_matrix_factorization_retail_ecommerce.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/colab-logo-32px.png\" alt=\"Colab logo\"> Run in Colab\n", " </a>\n", " </td>\n", " <td>\n", " <a href=\"https://console.cloud.google.com/ai-platform/notebooks/deploy-notebook?name=BigQuery%20ML%20-%20Retail%20Recommendation%20System&download_url=https%3A%2F%2Fgithub.com%2FGoogleCloudPlatform%2Fanalytics-componentized-patterns%2Fraw%2Fmaster%2Fretail%2Frecommendation-system%2Fbqml%2Fbqml_retail_recommendation_system.ipynb&url=https%3A%2F%2Fgithub.com%2FGoogleCloudPlatform%2Fanalytics-componentized-patterns%2Fblob%2Fmaster%2Fretail%2Frecommendation-system%2Fbqml%2Fbqml_retail_recommendation_system.ipynb\">\n", " <img src=\"https://cloud.google.com/images/products/ai/ai-solutions-icon.svg\" alt=\"AI Platform Notebooks\">Run on AI Platform Notebooks</a>\n", " </td>\n", " <td>\n", " <a href=\"https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/blob/master/notebooks/bqml_matrix_factorization_retail_ecommerce.ipynb\">\n", " <img src=\"https://cloud.google.com/ml-engine/images/github-logo-32px.png\" alt=\"GitHub logo\">\n", " View on GitHub\n", " </a>\n", " </td>\n", "</table>" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "tvgnzT1CKxrO" }, "source": [ "# Overview\n", "In this notebook, you’ll learn how to build a product recommendation system in a retail scenario using matrix factorization, and how to use the predicted recommendations to drive marketing activation.\n", "\n", "#### Why are recommendation systems so important?\n", "\n", "The majority of consumers today expect personalization — to see products and services relevant to their interests. Naturally, they can help businesses too. By learning from user behaviours and preferences, businesses can deliver their recommendations in a variety of ways, including personalized coupons, marketing emails, and search results, or targeted ads. Ultimately, this enables businesses to attract more customer spending with targeted cross-selling or upselling, while reducing unnecessary costs by marketing irrelevant products.\n", "\n", ">>\n", "_\"Companies that fail to show customers they know them and their buying preferences risk losing business to competitors who are more attuned to what their customers want.\"_ \n", ">_Harvard Business Review. “The Age of Personalization”. September 2018_\n", "\n", "\n", "#### How does matrix factorization work?\n", "Based on user preferences, matrix factorization (collaborative filtering) is one of the most common and effective methods of creating recommendation systems. For more information about how they work, see [this introduction to recommendation systems here](https://developers.google.com/machine-learning/recommendation/collaborative/matrix). \n", "\n", "#### What is BigQuery ML?\n", "[BigQuery ML](https://cloud.google.com/bigquery-ml/docs/bigqueryml-intro) enables users to create and execute machine learning models in BigQuery by using standard SQL queries. This means, if your data is already in BigQuery, you don’t need to export your data to train and deploy machine learning models — by training, you’re also deploying in the same step. Combined with BigQuery’s auto-scaling of compute resources, you won’t have to worry about spinning up a cluster or building a model training and deployment pipeline. This means you’ll be saving time building your machine learning pipeline, enabling your business to focus more on the value of machine learning instead of spending time setting up the infrastructure.\n", "\n", "You may have also heard of [Recommendations AI](https://cloud.google.com/recommendations), a Google Cloud product purpose-built for real-time recommendations on a website using state-of-the-art deep learning models. Matrix factorization with BigQuery ML, on the other hand, is a more generic ML algorithm that can be used for offline and online recommendations (e.g. personalized e-mail campaigns).\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "5Sym0_-GsDkd" }, "source": [ "## Scope of this notebook\n", "### Dataset\n", "\n", "The [Google Analytics Sample](https://console.cloud.google.com/marketplace/details/obfuscated-ga360-data/obfuscated-ga360-data?filter=solution-type:dataset) dataset, which is hosted publicly on BigQuery, is a dataset that provides 12 months (August 2016 to August 2017) of obfuscated Google Analytics 360 data from the [Google Merchandise Store](https://www.googlemerchandisestore.com/), a real e-commerce store that sells Google-branded merchandise.\n", "\n", "### Objective\n", "\n", "By the end of this notebook, you will know how to:\n", "* _pre-process data_ into the correct format needed to create a recommender system using BigQuery ML\n", "* _train (and deploy) the matrix factorization model_ in BigQuery ML\n", "* _evaluate the model_\n", "* _make predictions using the model_\n", "* _take action on the predicted recommendations:_\n", " * for activation via Google Ads, Display & Video 360 and Search Ads 360\n", " * for activation via emails\n", " * export predictions to a pandas dataframe\n", " * export predictions into Google Cloud Storage\n", " \n", "\n", "### Costs \n", "\n", "This tutorial uses billable components of Google Cloud Platform (GCP):\n", "\n", "* BigQuery\n", "* BigQuery ML\n", "\n", "Learn about [BigQuery pricing](https://cloud.google.com/bigquery/pricing), [BigQuery ML\n", "pricing](https://cloud.google.com/bigquery-ml/pricing) and use the [Pricing\n", "Calculator](https://cloud.google.com/products/calculator/)\n", "to generate a cost estimate based on your projected usage." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "i7EUnXsZhAGF" }, "source": [ "## PIP Install Packages and dependencies" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "wyy5Lbnzg5fi" }, "outputs": [], "source": [ "!pip install google-cloud-bigquery" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "9luQpONrzPb6" }, "outputs": [], "source": [ "# Automatically restart kernel after installs\n", "import IPython\n", "app = IPython.Application.instance()\n", "app.kernel.do_shutdown(True) " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "BF1j6f9HApxa" }, "source": [ "### Set up your GCP project\n", "\n", "_The following steps are required, regardless of your notebook environment._\n", "\n", "1. [Select or create a GCP project.](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.\n", "\n", "1. [Make sure that billing is enabled for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)\n", "\n", "1. Enter your project ID and region in the cell below. Then run the cell to make sure the\n", "Cloud SDK uses the right project for all the commands in this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "dgWHxHS-sDkl" }, "outputs": [], "source": [ "PROJECT_ID = \"your_project_id\"\n", "REGION = 'US'" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "XoEqT2Y4DJmf" }, "source": [ "### Import libraries and define constants" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "pRUOFELefqf1" }, "outputs": [], "source": [ "from google.cloud import bigquery\n", "import time\n", "import pandas as pd\n", "\n", "pd.set_option('display.float_format', lambda x: '%.3f' % x)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "znIaF2QGsDkr" }, "source": [ "### Creating a BigQuery dataset" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "H1waFXGXsDkr" }, "source": [ "In this notebook, you will need to create a dataset in your project called `bqml`. To create it, run the following cell:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "pIBm-lrVsDks" }, "outputs": [], "source": [ "!bq mk --location=$REGION --dataset $PROJECT_ID:bqml" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "QB7wXYDpsDkv" }, "source": [ "## Raw data" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MsxSG7yfsDkv" }, "source": [ "Before beginning, take a look at the raw data:" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Z3G-0P1vsDkv" }, "source": [ "_Note_: Jupyter runs cells starting with %%bigquery as SQL queries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "YbCrRUyFsDkw" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "## follows the Google Analytics schema: \n", "#https://support.google.com/analytics/answer/3437719?hl=en\n", "\n", "SELECT \n", " CONCAT(fullVisitorID,'-',CAST(visitNumber AS STRING)) AS visitorId,\n", " hitNumber,\n", " time,\n", " page.pageTitle,\n", " type,\n", " productSKU,\n", " v2ProductName,\n", " v2ProductCategory,\n", " productPrice/1000000 as productPrice_USD\n", "\n", "FROM \n", " `bigquery-public-data.google_analytics_sample.ga_sessions_20160801`, \n", " UNNEST(hits) AS hits,\n", " UNNEST(hits.product) AS hits_product\n", "LIMIT 5" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9dMO82QnsDky" }, "source": [ "## Pre-process the data" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "iA90j6LqsDkz" }, "source": [ "With collaborative filtering (matrix factorization), the dataset must indicate a user's preference for a product, like a rating between 1 and 5 stars. However, in the retail industry, there is usually no or insufficient explicit feedback on how much a user liked a product. Thus, other behavioral metrics need to be used to infer their implicit \"rating\". One way to infer user interest in a product is to look at the total time spent on a product detail page (e.g., session duration).\n", "\n", "With matrix factorization, in order to train the model, you will need a table with `userId`, `itemId`, and the `rating`. In this notebook example, session duration will be used as the implicit rating. If you have other metrics (e.g., frequency of pageviews), you can simply combine the metrics together using a weighted sum to compute a rating value.\n", "\n", "|userId|itemId|rating|\n", "|-|-|-|\n", "|visitor1|productSKU_1|3000|\n", "|visitor1|productSKU_4|15000|\n", "|visitor1|productSKU_9|920|\n", "|visitor2|productSKU_12|0|\n", "\n", "Notice how every row is a unique combination of userId and itemId, along with the (implicit) rating." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "YetLiwPgsDkz" }, "source": [ "The query below will pre-process the data by calculating the total pageview duration per product per user, and materialize the data in a new table, `aggregate_web_stats`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "E6ppE7imft-y" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "## follows schema from https://support.google.com/analytics/answer/3437719?hl=en&ref_topic=3416089\n", "CREATE OR REPLACE TABLE bqml.aggregate_web_stats AS (\n", " WITH\n", " durations AS (\n", " --calculate pageview durations\n", " SELECT\n", " CONCAT(fullVisitorID,'-', \n", " CAST(visitNumber AS STRING),'-', \n", " CAST(hitNumber AS STRING) ) AS visitorId_session_hit,\n", " LEAD(time, 1) OVER (\n", " PARTITION BY CONCAT(fullVisitorID,'-',CAST(visitNumber AS STRING))\n", " ORDER BY\n", " time ASC ) - time AS pageview_duration\n", " FROM\n", " `bigquery-public-data.google_analytics_sample.ga_sessions_2016*`,\n", " UNNEST(hits) AS hit \n", " ),\n", " \n", " prodview_durations AS (\n", " --filter for product detail pages only\n", " SELECT\n", " CONCAT(fullVisitorID,'-',CAST(visitNumber AS STRING)) AS visitorId,\n", " productSKU AS itemId,\n", " IFNULL(dur.pageview_duration,\n", " 1) AS pageview_duration,\n", " FROM\n", " `bigquery-public-data.google_analytics_sample.ga_sessions_2016*` t,\n", " UNNEST(hits) AS hits,\n", " UNNEST(hits.product) AS hits_product\n", " JOIN\n", " durations dur\n", " ON\n", " CONCAT(fullVisitorID,'-',\n", " CAST(visitNumber AS STRING),'-',\n", " CAST(hitNumber AS STRING)) = dur.visitorId_session_hit\n", " WHERE\n", " #action_type: Product detail views = 2\n", " eCommerceAction.action_type = \"2\" \n", " ),\n", " \n", " aggregate_web_stats AS(\n", " --sum pageview durations by visitorId, itemId\n", " SELECT\n", " visitorId,\n", " itemId,\n", " SUM(pageview_duration) AS session_duration\n", " FROM\n", " prodview_durations\n", " GROUP BY\n", " visitorId,\n", " itemId )\n", " SELECT\n", " *\n", " FROM\n", " aggregate_web_stats\n", ");\n", "-- Show table\n", "SELECT\n", " *\n", "FROM\n", " bqml.aggregate_web_stats\n", "LIMIT\n", " 10" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "0UmMwUzfsDk0" }, "source": [ "### The training data" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "xenhAS06sDk0" }, "source": [ "With the data stored in an output table in the correct format for matrix factorization, the data is now ready for training a matrix factorization model." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "_WyrXCRUsDk1" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "\n", "SELECT\n", " *\n", "FROM\n", " bqml.aggregate_web_stats\n", "LIMIT \n", " 10" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "WRUv_AVosDk4" }, "source": [ "## Train the matrix factorization model" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "r8TSRjl5sDk4" }, "source": [ "To train the matrix factorization model (with implicit feedback), you will need to set the options:\n", "* `model_type`: `'matrix_factorization'` \n", "* `user_col`: \\<user column name>\n", "* `item_col`: \\<item column name>\n", "* `rating_col`: \\<rating column name>\n", "* `feedback_type`: `'implicit'` (default is 'explicit')\n", "\n", "To learn more about the parameters when training a model, read the [documentation on the CREATE MODEL statement for Matrix Factorization](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-matrix-factorization)." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "RbaQgPZQ70Uu" }, "source": [ "_Note: You may need to setup slot reservations. For more information, you can read up on how to set up flex slots [programmatically](https://medium.com/google-cloud/optimize-bigquery-costs-with-flex-slots-e06ec5e4aa90) or via the [BigQuery UI](https://cloud.google.com/bigquery/docs/reservations-workload-management#getting-started-with-bigquery-reservations)._" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "SOgEf1OasDk6" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "\n", "CREATE OR REPLACE MODEL bqml.retail_recommender\n", "OPTIONS(model_type='matrix_factorization', \n", " user_col='visitorId', \n", " item_col='itemId',\n", " rating_col='session_duration',\n", " feedback_type='implicit'\n", " )\n", "AS\n", "SELECT * FROM bqml.aggregate_web_stats" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "vsG6GXlnsDlK" }, "source": [ "## Model Evaluation" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "BQDuG8GPsDlK" }, "source": [ "Inspect the resulting metrics from model evaluation.\n", "\n", "For more information on these metrics, read the [ML.EVALUATE documentation here](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-evaluate)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "q_Qi2PmbsDlL" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "\n", "SELECT\n", " *\n", "FROM\n", " ML.EVALUATE(MODEL bqml.retail_recommender)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "RWNXE406sDlN" }, "source": [ "## Hyperparameter Tuning" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Pl1J2024sDlN" }, "source": [ "If you want to improve your model, some of the hyperparameters you can tune are:\n", "* `NUM_FACTORS`: Specifies the number of latent factors to use for matrix factorization models (int64_value)\n", "* `L2_REG`: The amount of L2 regularization applied (float64_value)\n", "* `WALS_ALPHA`: A hyperparameter for 'IMPLICIT' matrix factorization model (float64_value)\n", "\n", "See the official documentation on [CREATE MODEL (matrix factorization)](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create-matrix-factorization) for more information on hyperparameter tuning." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "eCQ_-wyMsDlN" }, "source": [ "## Make predictions" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "jxQ4HIeasDlO" }, "source": [ "### Inspect the predicted recommendations for a single user" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "xaoYyCHSsDlO" }, "source": [ "#### What are the top 5 items you could recommend to a specific visitorId?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "0kl0uSKmsDlO" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "\n", "#check for a single visitor\n", "DECLARE MY_VISITORID STRING DEFAULT \"0824461277962362623-1\";\n", "\n", "SELECT\n", " *\n", "FROM\n", " ML.RECOMMEND(MODEL `bqml.retail_recommender`,\n", " (SELECT MY_VISITORID as visitorID)\n", " )\n", "ORDER BY predicted_session_duration_confidence DESC\n", "LIMIT 5" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "eWbAurz4sDlQ" }, "source": [ "What are the names of the recommended products? Discover the product names by joining the resulting productSKU recommendations back with the product names:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "WaalmFivsDlQ" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "\n", "DECLARE MY_VISITORID STRING DEFAULT \"6499749315992064304-2\";\n", "\n", "WITH product_details AS(\n", " SELECT \n", " productSKU,\n", " v2ProductName,\n", " FROM \n", " `bigquery-public-data.google_analytics_sample.ga_sessions_2016*`,\n", " UNNEST(hits) AS hits,\n", " UNNEST(hits.product) AS hits_product\n", " GROUP BY 2,1 \n", ")\n", "\n", "SELECT\n", " r.*,\n", " d.v2ProductName\n", "FROM\n", " ML.RECOMMEND(MODEL `bqml.retail_recommender`,\n", " (\n", " SELECT\n", " MY_VISITORID as visitorId)) r\n", "JOIN\n", " product_details d\n", "ON\n", " r.itemId = d.productSKU\n", "ORDER BY predicted_session_duration_confidence DESC\n", "LIMIT 5" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "znIO7dFVsDlS" }, "source": [ "### Batch predictions for all users" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "9BYe_15fsDlT" }, "source": [ "To retrieve the top 5 recommended products for all existing users, run the following query. As the result can be large (num_users * num_products * top N), this also outputs the recommendations to a separate table." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "2gCkde_gsDlT" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "\n", "-- Create output table\n", "CREATE OR REPLACE TABLE bqml.prod_recommendations AS (\n", "WITH predictions AS (\n", " SELECT \n", " visitorId, \n", " ARRAY_AGG(STRUCT(itemId, \n", " predicted_session_duration_confidence)\n", " ORDER BY \n", " predicted_session_duration_confidence DESC\n", " LIMIT 5) as recommended\n", " FROM ML.RECOMMEND(MODEL bqml.retail_recommender)\n", " GROUP BY visitorId\n", ")\n", " \n", "\n", "SELECT\n", " visitorId,\n", " itemId,\n", " predicted_session_duration_confidence\n", "FROM\n", " predictions p,\n", " UNNEST(recommended)\n", ");\n", "\n", "-- Show table\n", "SELECT\n", " *\n", "FROM\n", " bqml.prod_recommendations\n", "ORDER BY \n", " visitorId\n", "LIMIT\n", " 20" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "NfffZ2FksDlV" }, "source": [ "## Using the predicted recommendations in production" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "2-ddW8pasDlV" }, "source": [ "Once you have the recommendations, plugging into your production pipeline will depend on your use case. \n", "\n", "Here are a few possible ways to help you get started:\n", "1. [Export recommendations for marketing activation:](#export_ga360)\n", " 1. For activation via Google Ads, Display & Video 360 and Search Ads 360\n", " 1. For activation via emails\n", "1. [Other ways to export recommendations from BigQuery](#export_other)\n", " 1. BigQuery to pandas dataframes\n", " 1. Export the predictions to Google Cloud Storage" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "uR_p3Jt2sDlW" }, "source": [ "<a id=\"export_ga360\"></a>\n", "### 1-1. Export recommendations to Google Analytics 360 (Google Marketing Platform)" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "54zMANFfsDlW" }, "source": [ "By exporting the resulting predictions from BigQuery ML back to Google Analytics, you will be able to generate custom remarketing audiences and target customers more effectively with ads, search, or email activation." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "RsfdKkA5sDlW" }, "source": [ "##### Formatting the data for Google Analytics 360" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "MON2xqDYsDlX" }, "source": [ "You may need to format the data output into something that Google Analytics, for example:\n", "\n", "|clientId | LikelyToBuyProductA | \n", "|-|-| \n", "| 123 | 0.70 | \n", "| 345 | 0.90 | " ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "kLPaIJil8gQ8" }, "source": [ "Here's a sample query for an itemId \"GGOEYOLR018699\", that normalizes the confidence scores between 0 and 1, using [ML.MIN_MAX_SCALER](https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-preprocessing-functions#mlmin_max_scaler):\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "Dlna9gd78iQ_" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "WITH predictions AS (\n", " SELECT \n", " visitorId, \n", " ARRAY_AGG(STRUCT(itemId, \n", " predicted_session_duration_confidence)\n", " ORDER BY \n", " predicted_session_duration_confidence) as recommended\n", " FROM ML.RECOMMEND(MODEL bqml.retail_recommender)\n", " WHERE itemId = \"GGOEYOLR018699\"\n", " GROUP BY visitorId\n", ")\n", " \n", "SELECT\n", " visitorId,\n", " ML.MIN_MAX_SCALER(\n", " predicted_session_duration_confidence\n", " ) OVER() as GGOEYOLR018699\n", "FROM\n", " predictions p,\n", " UNNEST(recommended)\n", "ORDER BY GGOEYOLR018699 DESC" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "k9xvYYTOsDlX" }, "source": [ "To create a column per product, you can use the pivot() function as described in [this blogpost](https://towardsdatascience.com/easy-pivot-in-bigquery-one-step-5a1f13c6c710).\n", "\n", "For Google Analytics Data Import, it's recommended that you use `clientId` as the key, along with individual columns that show some propensity score. In other words, you may need to create a new column for each product that you are interested in recommending, and create a custom dimension in Google Analytics that can be then used to build your audiences. It's also likely best to ensure that you have one row per `clientId`. If you know you will be exporting predictions to Google Analytics, it's recommended that you train your models using `clientId` directly instead of `visitorId`.\n" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "RQyG7oTosDlX" }, "source": [ "##### Exporting the data from BigQuery into Google Analytics 360" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "q4PLyv_AsDlY" }, "source": [ "The easiest way to export your BigQuery ML predictions from a BigQuery table to Google Analytics 360 is to use the [MoDeM (Model Deployment for Marketing)](https://github.com/google/modem) reference implementation. MoDeM helps you load data into Google Analytics for eventual activation in Google Ads, Display & Video 360 and Search Ads 360. \n", "\n", "To export to Google Analytics 360 from BigQuery:\n", "- Follow the [step-by-step instructions here](https://github.com/google/modem/tree/master/bqml) to build your ETL pipeline from BigQuery ML to Google Analytics using MoDeM. You can also view the interactive instructions in this [notebook](https://colab.research.google.com/github/google/modem/blob/master/bqml/utils/BQML_Deployment_Template_Cloud_Function.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "B1Lb8eKAsDlY" }, "source": [ "### 1-2. Email activation using Salesforce Marketing Cloud" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "6ezUUPtnsDlY" }, "source": [ "As Google Analytics does not contain email addresses, you may need to integrate with a 3rd-party platform like Salesforce Marketing Cloud for email activations.\n", "\n", "Google Analytics 360 customers can activate their Analytics 360 audiences in Marketing Cloud on Salesforce direct marketing channels (email and SMS). This enables your marketing team to build audiences based on online web behavior and engage with those customers via emails and SMS.\n", "\n", "Follow the [step-by-step instructions here](https://support.google.com/analytics/answer/9250031?hl=en) to integrate Google Analytics 360 with Salesforce Marketing Cloud, or learn more about [Audience Activation through Salesforce Trailhead](https://trailhead.salesforce.com/content/learn/modules/google-analytics-360-integration-for-marketing-cloud)." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "56KmAjV8sDlZ" }, "source": [ "<a id=\"export_other\"></a>\n", "### 2. Other ways to export recommendations from BigQuery\n", "If you want to use the predicted recommendations in other services, two ways to leverage the results are to export the data from BigQuery as a pandas dataframe, or if you want to store the result on Google Cloud Storage, you can also export the table directly as a CSV file." ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "wTeO6BYSsDlZ" }, "source": [ "#### 2-1. Read from the predictions directly from BigQuery" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Gr091SfdsDlZ" }, "source": [ "With the predictions stored in a separate table, you can export the data into a Pandas dataframe using the BigQuery Storage API (see [documentation and code samples](https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas#download_table_data_using_the_client_library)). You can also use other [BigQuery client libraries](https://cloud.google.com/bigquery/docs/reference/libraries). \n", "\n", "Alternatively you can also export directly into pandas in a notebook using the `%%bigquery <variable name>` as in:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "yrarIhXy9pP-" }, "outputs": [], "source": [ "%%bigquery df --project $PROJECT_ID\n", "SELECT\n", " *\n", "FROM \n", " bqml.prod_recommendations\n", "LIMIT 100" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "dkCf_l-v953l" }, "outputs": [], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "ENDbm3HksDlb" }, "source": [ "#### 2-2. Export predictions table to Google Cloud Storage" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "Ms9dYeutsDlc" }, "source": [ "There are several ways to export the predictions table to Google Cloud Storage (GCS), so that you can use them in a separate service. Perhaps the easiest way is to export directly to GCS using SQL ([documentation](https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#export_data_statement))." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": {}, "colab_type": "code", "id": "wmBRzUdY9aKB" }, "outputs": [], "source": [ "%%bigquery --project $PROJECT_ID\n", "\n", "EXPORT DATA OPTIONS (\n", "uri=\"gs://mybucket/myfile/recommendations_*.csv\", \n", " format=CSV\n", ") AS \n", "SELECT\n", " *\n", "FROM \n", " bqml.prod_recommendations" ] }, { "source": [ "## Questions? Feedback?\n", "If you have any questions or feedback, please open up a [new issue](https://github.com/GoogleCloudPlatform/analytics-componentized-patterns/issues)." ], "cell_type": "markdown", "metadata": {} } ], "metadata": { "colab": { "collapsed_sections": [], "name": "bqml_matrix_factorization_retail_ecommerce.ipynb", "provenance": [], "toc_visible": true }, "environment": { "name": "common-cpu.m55", "type": "gcloud", "uri": "gcr.io/deeplearning-platform-release/base-cpu:m55" }, "kernelspec": { "name": "python3", "display_name": "Python 3.6.8 64-bit" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" }, "interpreter": { "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" } }, "nbformat": 4, "nbformat_minor": 4 }

retail/recommendation-system/bqml/bqml_retail_recommendation_system.ipynb (1,097 lines of code) (raw):