dashboards/geospatial/bigquery_geospatial_visualization.ipynb

{ "cells": [ { "cell_type": "code", "source": [ "# Copyright 2025 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License." ], "metadata": { "id": "wPkADnrmJFrG" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# BigQuery Geospatial Visualization in Colab\n", "\n", "\n", "<table align=\"left\">\n", " <td style=\"text-align: center\">\n", " <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/bigquery-utils/blob/master/dashboards/geospatial/bigquery_geospatial_visualization.ipynb\">\n", " <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Open in Colab\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fbigquery-utils%2Fmaster%2Fdashboards%2Fgeospatial%2Fbigquery_geospatial_visualization.ipynb\">\n", " <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Open in Colab Enterprise\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://console.cloud.google.com/bigquery/import?url=https://github.com/GoogleCloudPlatform/bigquery-utils/blob/master/dashboards/geospatial/bigquery_geospatial_visualization.ipynb\">\n", " <img src=\"https://www.gstatic.com/images/branding/gcpiconscolors/bigquery/v1/32px.svg\" alt=\"BigQuery Studio logo\"><br> Open in BigQuery Studio\n", " </a>\n", " </td>\n", " <td style=\"text-align: center\">\n", " <a href=\"https://github.com/GoogleCloudPlatform/bigquery-utils/blob/master/dashboards/geospatial/bigquery_geospatial_visualization.ipynb\">\n", " <img width=\"32px\" src=\"https://upload.wikimedia.org/wikipedia/commons/9/91/Octicons-mark-github.svg\" alt=\"GitHub logo\"><br> View on GitHub\n", " </a>\n", " </td>\n", "</table>\n", "\n", "<div style=\"clear: both;\"></div>\n", "\n" ], "metadata": { "id": "cijTxNXeJQPk" } }, { "cell_type": "markdown", "source": [ "| | |\n", "|-|-|\n", "| Author(s) | [Bijan Vakili](https://github.com/bijanvakili/) and [Alicia Williams](https://github.com/aliciawilliams/)" ], "metadata": { "id": "veFFJYrAJa11" } }, { "cell_type": "markdown", "metadata": { "id": "UK9RTdsz3_ft" }, "source": [ "# Introduction\n", "---\n", "\n", "The goal of this notebook is to walk through the steps for visualizing geospatial data in a [Colab](https://colab.research.google.com/) notebook. 
It will source geospatial data from [Google BigQuery](https://cloud.google.com/bigquery).\n", "\n", "We will go through:\n", "* Authenticating to Google Cloud\n", "* Reading data from BigQuery into Colab\n", "* Using Python data science tools to perform transformations and analysis\n", "* Using `geemap` and `pydeck` to render scatter plots, polygons, choropleths, and heatmaps" ] }, { "cell_type": "markdown", "source": [ "## Datasets and Analysis Goals\n", "\n", "The datasets we'll be working with include:\n", "\n", "* [San Francisco Ford GoBike Share](https://console.cloud.google.com/bigquery(cameo:product/san-francisco-public-data/sf-bike-share))\n", "* [San Francisco Neighborhoods](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1sbigquery-public-data!2ssan_francisco_neighborhoods)\n", "* [San Francisco Police Department (SFPD) Reports](https://console.cloud.google.com/bigquery(cameo:product/san-francisco-public-data/sfpd-reports))\n", "\n", "These [public datasets](https://cloud.google.com/bigquery/public-data) are hosted in Google BigQuery and are included in BigQuery's 1 TB/mo of free tier processing. This means that each user receives 1 TB of free BigQuery processing every month, which can be used to run queries on these public datasets. Read this [documentation page](https://cloud.google.com/bigquery/docs/sandbox) to learn how to get started quickly using BigQuery to access public datasets.\n", "\n", "Our goals are to query and plot the following information in San Francisco:\n", "* All bike share stations using `ScatterplotLayer`\n", "* All neighborhood polygons using a `GeoJSONLayer`\n", "* Choropleth (discrete intensity) of available bike share stations per neighborhood using a `PolygonLayer`\n", "* Continuous density of SFPD incidents using a `HeatmapLayer`" ], "metadata": { "id": "6M1py-WrJGCV" } }, { "cell_type": "markdown", "source": [ "## Related Data Science Tools/Techniques\n", "\n", "In each step of this walkthrough, we show how BigQuery and Python data science and plotting libraries (via Colab) can work together to enable this type of analysis at scale. Some key tools and techniques we'll employ:\n", "* [colabtools](https://github.com/googlecolab/colabtools) (`google.colab` Python modules)\n", "* [Google BigQuery](https://cloud.google.com/bigquery/what-is-bigquery)\n", "* Python Data Science Libraries: [pandas](https://pandas.pydata.org/)\n", "* Python Geospatial Libraries:\n", "  * [geopandas](https://geopandas.org/en/stable/index.html) to extend the datatypes used by `pandas` to allow spatial operations on geometric types\n", "  * [shapely](https://shapely.readthedocs.io/en/stable/index.html) for manipulation and analysis of individual planar geometric objects\n", "* [branca](https://python-visualization.github.io/branca/) to generate HTML + JS colormaps\n", "* [pydeck](https://deckgl.readthedocs.io/en/latest/) for chart visualization, powered by [deck.gl](https://deck.gl/#/)\n", "* [geemap](https://geemap.org/) for visualization with `pydeck` and `earthengine-api`\n", "  * [geemap.deck](https://geemap.org/deck/) module\n", "  * [GitHub source](https://github.com/gee-community/geemap)\n", "* [h3](https://uber.github.io/h3-py/intro.html) for the H3 hierarchical geospatial indexing system\n" ], "metadata": { "id": "i_Azv1m6I_nZ" } }, { "cell_type": "markdown", "source": [ "## BigQuery Pricing and Cost of Running This Notebook\n", "If you don’t already have a Google Cloud project, there are 2 no-cost options available for accessing this data:\n", "\n",
"1. Sign up for [BigQuery sandbox](https://cloud.google.com/bigquery/docs/sandbox) to try it without enabling billing.\n", "2. If you want to experiment with multiple Google Cloud products, activate the [free trial](https://cloud.google.com/free/) ($300 credit for up to 90 days).\n", "\n", "You can use [BigQuery's free tier](https://cloud.google.com/bigquery/pricing#free-tier) to store and analyze a certain amount of data at no cost (1 TB of queries and 10 GB of storage per month), even after any free trial period ends.\n", "\n", "Running this notebook in its entirety, with the default values, should fall within the free usage tier and shouldn't cost you anything. However, changing the default settings and/or running it multiple times may incur additional charges. See the [BigQuery pricing guide](https://cloud.google.com/bigquery/pricing) for more information." ], "metadata": { "id": "9EDjG3SiwF2V" } }, { "cell_type": "markdown", "source": [ "This is not an official Google product; it is sample code provided for educational purposes." ], "metadata": { "id": "trYL6TIJ-zNZ" } }, { "cell_type": "markdown", "source": [ "## Help Google improve Tutorials\n", "\n", "* File issues [here](https://github.com/GoogleCloudPlatform/bigquery-utils/issues) on GitHub.\n", "* Submit feedback [here](https://docs.google.com/forms/d/e/1FAIpQLSdNSDbBM2rohqtqWtIPgKD_14oLc8TPqqdtKD11oqsrtA8_NQ/viewform) via a Google Form.\n", "\n" ], "metadata": { "id": "SL4oVWJ3X02c" } }, { "cell_type": "markdown", "metadata": { "id": "FdzVBp6XBlH8" }, "source": [ "#Setup" ] }, { "cell_type": "markdown", "source": [ "## Install latest version of BigQuery Magics" ], "metadata": { "id": "RL3PkpB1dDvA" } }, { "cell_type": "markdown", "source": [ "Install the latest [IPython Magics for BigQuery](https://googleapis.dev/python/bigquery-magics/latest/) package (`bigquery-magics`), which is required to execute this notebook." ], "metadata": { "id": "4L6y9c_Vdtbh" } }, { "cell_type": "code", "source": [ "!pip install --upgrade --quiet bigquery-magics" ], "metadata": { "id": "DHCRqbFEc_IJ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "To use the newly installed package in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which restarts the current kernel." ], "metadata": { "id": "FG6Dpw8pdxOD" } }, { "cell_type": "code", "source": [ "# Automatically restart the kernel after installs so that the environment can access the new packages\n", "import IPython\n", "\n", "app = IPython.Application.instance()\n", "app.kernel.do_shutdown(True)" ], "metadata": { "id": "uc8J1Jfddpts" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## GCP Authentication\n", "\n", "Next, we authenticate this Colab runtime with the appropriate Google Cloud project (`GCP_PROJECT_ID`). This closely follows the instructions in the [\"Getting started with BigQuery\"](https://colab.sandbox.google.com/notebooks/bigquery.ipynb#scrollTo=SeTJb51SKs_W) example Colab.\n", "\n", "On the first run, the authentication step in the next cell requires manually clicking through a few pop-up screens and copying an authentication code from another window back into the cell; later runs may complete automatically."
], "metadata": { "id": "eypcd2MyYn2j" } }, { "cell_type": "code", "source": [ "# REQUIRED: authenticate with GCP project\n", "GCP_PROJECT_ID = \"\" #@param {type:\"string\"}\n", "\n", "from google.colab import auth\n", "from google.colab import userdata\n", "\n", "auth.authenticate_user(project_id=GCP_PROJECT_ID)\n", "\n", "# Default GMP_API_KEY to None; the optional GMP step below may set it\n", "GMP_API_KEY = None" ], "metadata": { "id": "YWxjUUvNhs4w" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## (Optional) GMP Authentication\n", "\n", "If you wish to use Google Maps Platform (GMP) as the map provider for base maps, you must provide a [Google Maps Platform API key](https://developers.google.com/maps/documentation/javascript/get-api-key). This notebook can retrieve it from your [Colab Secrets](https://colab.sandbox.google.com/github/google-gemini/cookbook/blob/main/quickstarts/Authentication.ipynb#scrollTo=dEoigYI9Jw_K) if you provide a key name.\n", "\n", "If no key name is provided, `pydeck` will default to using the `carto` map provider." ], "metadata": { "id": "7TRL5KKRYuD2" } }, { "cell_type": "code", "source": [ "GMP_API_SECRET_KEY_NAME = \"\" #@param {type:\"string\"}\n", "\n", "if GMP_API_SECRET_KEY_NAME:\n", "    GMP_API_KEY = userdata.get(GMP_API_SECRET_KEY_NAME)\n", "else:\n", "    GMP_API_KEY = None" ], "metadata": { "id": "sXDTkCKpYkQ8" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "##Enable BigQuery API" ], "metadata": { "id": "i2V3arTCjjmp" } }, { "cell_type": "code", "source": [ "!gcloud services enable \\\n", "  --project $GCP_PROJECT_ID \\\n", "  bigquery.googleapis.com" ], "metadata": { "id": "2rImtm8xMPWA" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "##(Optional) Enable Google Maps JavaScript API" ], "metadata": { "id": "OkNoiTixjt9F" } }, { "cell_type": "code", "source": [ "!gcloud services enable \\\n", "  --project $GCP_PROJECT_ID \\\n", "  \"maps-backend.googleapis.com\"" ], "metadata": { "id": "LK4gaoU0INDv" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Install Packages" ], "metadata": { "id": "s7_S0DklRQip" } }, { "cell_type": "code", "source": [ "!pip install \"pydeck>=0.9\" \"h3>=4.2\"" ], "metadata": { "id": "DpeceuOPRSP_" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "##Import libraries" ], "metadata": { "id": "ycaYw0I-j8P1" } }, { "cell_type": "code", "source": [ "import branca\n", "import geemap.deck as gmdk\n", "import h3\n", "import pydeck as pdk\n", "import geopandas as gpd\n", "import shapely" ], "metadata": { "id": "kSrLzumZq2Wj" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Enable interactive tables for `pandas` DataFrames in Colab.\n", "\n", "> See [this article](https://colab.google/articles/alive) for more details." ], "metadata": { "id": "iB0Hd0JBNWHC" } }, { "cell_type": "code", "source": [ "# Enable displaying pandas data frames as interactive tables by default\n", "from google.colab import data_table\n", "data_table.enable_dataframe_formatter()" ], "metadata": { "id": "fa0BY4cqpEUt" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Shared Routines\n", "\n", "We create `display_pydeck_map` here to render charts using `pydeck`. It creates a map instance (`geemap.deck.Map`) and then adds each layer (`pydeck.Layer`) to that map."
], "metadata": { "id": "wathhRxeKFA0" } }, { "cell_type": "code", "source": [ "MAP_PROVIDER_GOOGLE = pdk.bindings.base_map_provider.BaseMapProvider.GOOGLE_MAPS.value\n", "\n", "# Shared routine for rendering layers on a map using geemap.deck\n", "def display_pydeck_map(layers, view_state, **kwargs):\n", "    deck_kwargs = kwargs.copy()\n", "\n", "    # use Google Maps as the base map only if the API key is provided\n", "    if GMP_API_KEY:\n", "        deck_kwargs.update({\n", "            \"map_provider\": MAP_PROVIDER_GOOGLE,\n", "            \"map_style\": pdk.bindings.map_styles.GOOGLE_ROAD,\n", "            \"api_keys\": {MAP_PROVIDER_GOOGLE: GMP_API_KEY},\n", "        })\n", "\n", "    m = gmdk.Map(initial_view_state=view_state, ee_initialize=False, **deck_kwargs)\n", "\n", "    for layer in layers:\n", "        m.add_layer(layer)\n", "    return m" ], "metadata": { "id": "OpkP0-i4cOna" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Create a scatter plot with `ScatterplotLayer`\n", "\n", "Scatter plots are most useful when you need to review a subsample of individual points. This is sometimes referred to as \"spot checking\".\n", "\n", "In this section, we'll create a scatter plot of all bike share stations listed in the [San Francisco Ford GoBike Share](https://console.cloud.google.com/bigquery(cameo:product/san-francisco-public-data/sf-bike-share)) public dataset." ], "metadata": { "id": "LUlhMtkiKKq7" } }, { "cell_type": "markdown", "source": [ "## Import dataset from BigQuery: Ford GoBike stations in San Francisco\n", "\n", "We use the `%%bigquery` magic to run a query and return the results in a `geopandas.GeoDataFrame`." ], "metadata": { "id": "uIOaRfugKM5U" } }, { "cell_type": "code", "source": [ "%%bigquery gdf_sf_bikestations --project {GCP_PROJECT_ID} --use_geodataframe station_geom\n", "\n", "# NOTE: For the purposes of this tutorial, we ignore the denormalized 'lat' and 'lon' columns,\n", "# which are decomposed components of the geometry\n", "SELECT\n", "  station_id,\n", "  name,\n", "  short_name,\n", "  station_geom\n", "FROM\n", "  `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`" ], "metadata": { "id": "1cn3dOwzliS8" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "## Rendering a Scatter plot\n", "\n", "This example demonstrates how to use [pydeck.Layer](https://deckgl.readthedocs.io/en/latest/layer.html#pydeck.bindings.layer.Layer) and the [ScatterplotLayer](https://deck.gl/docs/api-reference/layers/scatterplot-layer) to render individual points as circles. This requires extracting the longitude and latitude as x and y coordinates respectively from the `station_geom` column." ], "metadata": { "id": "pVhXzB-NKTwr" } }, { "cell_type": "code", "source": [ "gdf_sf_bikestations.info()" ], "metadata": { "id": "h5RkDjMjlkFs" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "gdf_sf_bikestations.head()" ], "metadata": { "id": "j6iqr8kGKGmK" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Since `gdf_sf_bikestations` is a `geopandas.GeoDataFrame`, coordinates can be accessed directly from its `station_geom` geometry column. The code retrieves the longitude using the column's `.x` attribute and the latitude using its `.y` attribute, storing them in new `longitude` and `latitude` columns."
], "metadata": { "id": "GaIC5aZyAGsT" } }, { "cell_type": "code", "source": [ "# Extract the longitude (x) and latitude (y)\n", "gdf_sf_bikestations[\"longitude\"] = gdf_sf_bikestations[\"station_geom\"].x\n", "gdf_sf_bikestations[\"latitude\"] = gdf_sf_bikestations[\"station_geom\"].y" ], "metadata": { "id": "Dst4ZuUtJ7If" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The cell below renders a scatter plot.\n", "\n", "If you wish to customize it, please review the [pydeck.Layer](https://deckgl.readthedocs.io/en/latest/layer.html#pydeck.bindings.layer.Layer) API docs and the [ScatterPlotLayer](https://deck.gl/docs/api-reference/layers/scatterplot-layer) docs to discover other parameters." ], "metadata": { "id": "vBWzFz8yHYVm" } }, { "cell_type": "code", "source": [ "# render a scatter plot using pydeck with the extracted longitude and latitude columns\n", "# in the gdf_sf_bikestations geopandas.GeoDataFrame.\n", "scatterplot_layer = pdk.Layer(\n", " \"ScatterplotLayer\",\n", " id=\"bike_stations_scatterplot\",\n", " data=gdf_sf_bikestations,\n", " get_position=['longitude', 'latitude'],\n", " get_radius=100,\n", " get_fill_color=[255, 0, 0, 140], # Adjust color as desired\n", " pickable=True,\n", ")\n", "\n", "view_state = pdk.ViewState(latitude=37.77613, longitude=-122.42284, zoom=12)\n", "display_pydeck_map([scatterplot_layer], view_state)" ], "metadata": { "id": "VNrTcs_imOEZ" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Visualize polygons with a `GeoJSONLayer`\n", "\n", "BigQuery's [GEOGRAPHY](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#geography_type) data type can represent a geometry value or geometry collection. Examples of shapes include:\n", "* points\n", "* lines\n", "* polygons\n", "* multi-polygons\n", "\n", "\n", "Sometimes you are provided geospatial data without knowing in advance its expected shapes. \n", "\n", "In this case, visualizing the data can help you discover the shapes enabling further analysis. This visualization can be done by converting the geographic data to GeoJSON and then visualizing it with a `GeoJSONLayer`.\n", "\n", "As an example, we'll import geographic data from the [San Francisco Neighborhoods](https://console.cloud.google.com/bigquery?ws=!1m4!1m3!3m2!1sbigquery-public-data!2ssan_francisco_neighborhoods) dataset and then visualize it." 
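, "\n", "\n", "To make the GeoJSON conversion concrete before loading the real data, the next cell is a minimal, self-contained sketch (not part of the original analysis): it builds a small, made-up `shapely` polygon and prints its GeoJSON representation using `shapely.geometry.mapping`, which is the same kind of structure `pydeck` derives from each geometry when rendering a `GeoJsonLayer`." ], "metadata": { "id": "geojsonSketchIntro01" } }, { "cell_type": "code", "source": [ "# Minimal illustration with made-up coordinates: convert a shapely geometry to GeoJSON.\n", "# This toy polygon is only here to show the structure pydeck works with.\n", "import json\n", "\n", "from shapely.geometry import Polygon, mapping\n", "\n", "toy_polygon = Polygon([(-122.43, 37.77), (-122.42, 37.77), (-122.42, 37.78), (-122.43, 37.78)])\n", "print(json.dumps(mapping(toy_polygon), indent=2))  # {\"type\": \"Polygon\", \"coordinates\": [[...]]}" ], "metadata": { "id": "geojsonSketchCode01" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "A `geopandas.GeoDataFrame` can also serialize all of its rows at once with its `to_json()` method, which is handy if you want to inspect the GeoJSON outside of `pydeck`."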
], "metadata": { "id": "CiNuk-hZKeCE" } }, { "cell_type": "markdown", "source": [ "## Import dataset from BigQuery: San Francisco Neighborhoods" ], "metadata": { "id": "CpwS4pYUKg5q" } }, { "cell_type": "code", "source": [ "%%bigquery gdf_sanfrancisco_neighborhoods --project {GCP_PROJECT_ID} --use_geodataframe geometry\n", "\n", "SELECT\n", "  neighborhood,\n", "  neighborhood_geom AS geometry\n", "FROM\n", "  `bigquery-public-data.san_francisco_neighborhoods.boundaries`" ], "metadata": { "id": "sghEl1_ow-cw" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "gdf_sanfrancisco_neighborhoods.info()" ], "metadata": { "id": "AspV9xguuO29" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# peek at the first item\n", "gdf_sanfrancisco_neighborhoods.head(1)" ], "metadata": { "id": "ftEWN1wwJnRm" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "We see above that each geometry in this data is a `POLYGON`.\n", "\n", "The cell below renders the data by relying on `pydeck` to convert each `shapely` object instance in the `geometry` column to the [GeoJSON format](https://datatracker.ietf.org/doc/html/rfc7946).\n", "\n", "If you wish to customize the visualization, please review the [pydeck.Layer](https://deckgl.readthedocs.io/en/latest/layer.html#pydeck.bindings.layer.Layer) API docs and the [GeoJsonLayer](https://deck.gl/docs/api-reference/layers/geojson-layer) docs to discover other parameters." ], "metadata": { "id": "TB9JsPgXKSd8" } }, { "cell_type": "code", "source": [ "geojson_layer = pdk.Layer(\n", "    'GeoJsonLayer',\n", "    id=\"sf_neighborhoods\",\n", "    data=gdf_sanfrancisco_neighborhoods,\n", "    get_line_color=[127, 0, 127, 255],\n", "    get_fill_color=[60, 60, 60, 50],\n", "    get_line_width=100,\n", "    pickable=True,\n", "    stroked=True,\n", "    filled=True,\n", ")\n", "view_state = pdk.ViewState(latitude=37.77613, longitude=-122.42284, zoom=12)\n", "display_pydeck_map([geojson_layer], view_state)" ], "metadata": { "id": "ATWmDkFJzhgE" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "#Render a choropleth using `PolygonLayer`\n", "\n", "If you are exploring polygon data that is not easily converted to GeoJSON, consider using a [PolygonLayer](https://deck.gl/docs/api-reference/layers/polygon-layer) instead. The `PolygonLayer` can process input data of [specific types](https://deck.gl/docs/api-reference/layers/polygon-layer#getpolygon), such as an array of points.\n", "\n", "We'll show how to use a `PolygonLayer` to render an array of points and take it a step further to produce a choropleth."
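, "\n", "\n", "If it helps to see what \"an array of points\" looks like before working with the real neighborhood data, the next cell is a small, self-contained sketch using a made-up triangle; the real data is prepared immediately after it." ], "metadata": { "id": "polygonPointsSketchIntro01" } }, { "cell_type": "code", "source": [ "# Toy example (made-up coordinates): a polygon's exterior ring as a plain list of (lon, lat) points,\n", "# which is the same transformation applied to the real neighborhood geometries in the next cell.\n", "from shapely.geometry import Polygon\n", "\n", "toy_triangle = Polygon([(-122.44, 37.76), (-122.42, 37.76), (-122.43, 37.78)])\n", "print(list(toy_triangle.exterior.coords))" ], "metadata": { "id": "polygonPointsSketchCode01" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Note that `shapely` closes the ring, so the first point is repeated at the end of the list; this is the closed form that the `polygon` column created below feeds to the `PolygonLayer`."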
], "metadata": { "id": "KmmOJVxTKwfW" } }, { "cell_type": "code", "source": [ "# aggregate and count the number of stations per neighborhood\n", "gdf_count_stations = gdf_sanfrancisco_neighborhoods.sjoin(gdf_sf_bikestations, how='left', predicate='contains')\n", "gdf_count_stations = gdf_count_stations.groupby(by='neighborhood')['station_id'].count().rename('num_stations')\n", "gdf_stations_x_neighborhood = gdf_sanfrancisco_neighborhoods.join(gdf_count_stations, on='neighborhood', how='inner')\n", "\n", "# to simulate non-GeoJSON input data, create a 'polygon' column that contains an array of points, using the pandas.Series.map method\n", "gdf_stations_x_neighborhood['polygon'] = gdf_stations_x_neighborhood['geometry'].map(lambda g: list(g.exterior.coords))" ], "metadata": { "id": "JAm-ze1coZ3C" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# add a 'fill_color' column for each of the polygons. This is done by first creating a color map gradient using the branca library.\n", "colormap = branca.colormap.LinearColormap(\n", "    colors=[\"lightblue\", \"darkred\"],\n", "    vmin=0,\n", "    vmax=gdf_stations_x_neighborhood['num_stations'].max(),\n", ")\n", "gdf_stations_x_neighborhood['fill_color'] = gdf_stations_x_neighborhood['num_stations'] \\\n", "    .map(lambda c: list(colormap.rgba_bytes_tuple(c)[:3]) + [0.7 * 255]) # force opacity of 0.7" ], "metadata": { "id": "BO1opvJyOZP7" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The cell below renders a polygon layer.\n", "\n", "If you wish to customize it, please review the [pydeck.Layer](https://deckgl.readthedocs.io/en/latest/layer.html#pydeck.bindings.layer.Layer) API docs and the [PolygonLayer](https://deck.gl/docs/api-reference/layers/polygon-layer) docs to discover other parameters." ], "metadata": { "id": "5R5siL0VOnV9" } }, { "cell_type": "code", "source": [ "polygon_layer = pdk.Layer(\n", "    'PolygonLayer',\n", "    id=\"bike_stations_choropleth\",\n", "    data=gdf_stations_x_neighborhood,\n", "    get_polygon='polygon',\n", "    get_fill_color='fill_color',\n", "    get_line_color=[0, 0, 0, 255],\n", "    get_line_width=50,\n", "    pickable=True,\n", "    stroked=True,\n", "    filled=True,\n", ")\n", "view_state = pdk.ViewState(latitude=37.77613, longitude=-122.42284, zoom=12)\n", "display_pydeck_map([polygon_layer], view_state)" ], "metadata": { "id": "CccZqXb3euS0" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Create a heatmap with `HeatmapLayer`\n", "\n", "Choropleths are useful when you have meaningful boundaries known in advance. However, when you have lots of data with no known meaningful boundaries, consider using a `HeatmapLayer` to render its continuous density.\n", "\n", "For this example, we will use the [San Francisco Police Department (SFPD) Reports](https://console.cloud.google.com/bigquery(cameo:product/san-francisco-public-data/sfpd-reports)) dataset to query and visualize the distribution of incidents in 2015.\n", "\n", "For heatmaps, it is recommended to quantize and aggregate the data before rendering. This example does so using the [H3](https://h3geo.org/) spatial indexing system."
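, "\n", "\n", "To see what this quantization step does before running the full pipeline, the next cell is a small sketch with a single made-up coordinate: it maps the point to an H3 cell index at resolution `9` and then recovers the hexagon's center, which are the two operations applied to the incident data below." ], "metadata": { "id": "h3SketchIntro01" } }, { "cell_type": "code", "source": [ "# Toy example (made-up coordinate): quantize one point to an H3 cell at resolution 9,\n", "# then recover the hexagon's center. Every point inside the same hexagon maps to the\n", "# same cell index, which is what lets us aggregate nearby incidents together.\n", "import h3\n", "\n", "toy_cell = h3.latlng_to_cell(37.77613, -122.42284, 9)\n", "print(toy_cell)                     # H3 cell index (a hex string)\n", "print(h3.cell_to_latlng(toy_cell))  # (lat, lng) center of that hexagon" ], "metadata": { "id": "h3SketchCode01" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Grouping by the cell index, as the cells below do, is therefore equivalent to counting incidents per hexagon."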
], "metadata": { "id": "vAz0FlkqDb7T" } }, { "cell_type": "markdown", "source": [ "## Import dataset from BigQuery: San Francisco Police Department Incidents" ], "metadata": { "id": "VkjqXbbjLC7d" } }, { "cell_type": "code", "source": [ "%%bigquery gdf_incidents --project {GCP_PROJECT_ID} --use_geodataframe location_geography\n", "\n", "SELECT\n", "  unique_key,\n", "  location_geography\n", "FROM (\n", "  SELECT\n", "    unique_key,\n", "    SAFE.ST_GEOGFROMTEXT(location) AS location_geography, # WKT string to GEOGRAPHY\n", "    EXTRACT(YEAR FROM timestamp) AS year\n", "  FROM `bigquery-public-data.san_francisco_sfpd_incidents.sfpd_incidents` incidents\n", ")\n", "WHERE year = 2015" ], "metadata": { "id": "0Srvc7yVaUxe" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Quantizing can be done using the [h3](https://github.com/uber/h3-py) Python library, which we use to aggregate the incident points into hexagons.\n", "\n", "Specifically, we use:\n", "* [h3.latlng_to_cell](https://uber.github.io/h3-py/api_verbose.html#h3.latlng_to_cell) to map each incident's position (latitude and longitude) to an H3 cell index. An H3 [resolution](https://h3geo.org/docs/core-library/restable/) of `9` provides a suitable level of aggregation for the heatmap.\n", "* [h3.cell_to_latlng](https://uber.github.io/h3-py/api_verbose.html#h3.cell_to_latlng) to determine the center of each hexagon.\n", "\n", "> Please note that you can also use Carto's [H3 functions](https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/h3) in BigQuery SQL to perform a similar conversion.\n" ], "metadata": { "id": "Keg6YkECQGTH" } }, { "cell_type": "code", "source": [ "# compute the cell for each incident's latitude and longitude\n", "H3_RESOLUTION = 9\n", "gdf_incidents['h3_cell'] = gdf_incidents.geometry.apply(\n", "    lambda geom: h3.latlng_to_cell(geom.y, geom.x, H3_RESOLUTION)\n", ")\n", "\n", "# aggregate the incidents for each hexagon cell\n", "count_incidents = gdf_incidents.groupby(by='h3_cell')['unique_key'].count().rename('num_incidents')\n", "\n", "# construct a new geopandas.GeoDataFrame with the aggregate results\n", "# and add the center of each hexagon for the HeatmapLayer to render\n", "gdf_incidents_x_cell = gpd.GeoDataFrame(data=count_incidents).reset_index()\n", "gdf_incidents_x_cell['h3_center'] = gdf_incidents_x_cell['h3_cell'].apply(h3.cell_to_latlng)\n", "gdf_incidents_x_cell.info()" ], "metadata": { "id": "XoOtNy3mbXxo" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "gdf_incidents_x_cell.head()" ], "metadata": { "id": "32ASEitfTBx5" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# convert to a JSON format recognized by the HeatmapLayer\n", "def _make_heatmap_datum(row) -> dict:\n", "    return {\n", "        \"latitude\": row['h3_center'][0],\n", "        \"longitude\": row['h3_center'][1],\n", "        \"weight\": float(row['num_incidents']),\n", "    }\n", "\n", "heatmap_data = gdf_incidents_x_cell.apply(_make_heatmap_datum, axis='columns').values.tolist()" ], "metadata": { "id": "W0zFasPQotvH" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "The cell below renders a heatmap layer.\n", "\n", "If you wish to customize it, please review the [pydeck.Layer](https://deckgl.readthedocs.io/en/latest/layer.html#pydeck.bindings.layer.Layer) API docs and the [HeatmapLayer](https://deck.gl/docs/api-reference/aggregation-layers/heatmap-layer) docs to discover other parameters."
], "metadata": { "id": "huVU_44fTLRL" } }, { "cell_type": "code", "source": [ "heatmap_layer = pdk.Layer(\n", "    \"HeatmapLayer\",\n", "    id=\"sfpd_heatmap\",\n", "    data=heatmap_data,\n", "    get_position=['longitude', 'latitude'],\n", "    get_weight='weight',\n", "    opacity=0.7,\n", "    radius_pixels=99,  # radius of influence in pixels; large values can smooth out local detail\n", "    aggregation='MEAN',\n", ")\n", "view_state = pdk.ViewState(latitude=37.77613, longitude=-122.42284, zoom=12)\n", "display_pydeck_map([heatmap_layer], view_state)" ], "metadata": { "id": "cak45p4WEtmW" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Further Reading\n", "\n", "You've learned how to visualize geospatial data stored in BigQuery using `pydeck`.\n", "\n", "If you want to learn more about `pydeck` and other `deck.gl` chart types, please see the `pydeck` [Gallery](https://deckgl.readthedocs.io/en/latest/) for examples. You can also review the `deck.gl` [Layer Catalog](https://deck.gl/docs/api-reference/layers) and [GitHub source](https://github.com/visgl/deck.gl) for other chart types.\n", "\n", "If you want to read more about geospatial data, please read the GeoPandas [Getting Started](https://geopandas.org/en/stable/getting_started.html) page and the [User guide](https://geopandas.org/en/stable/docs/user_guide.html). For geometric object manipulation, please check out the Shapely [user manual](https://shapely.readthedocs.io/en/stable/manual.html).\n", "\n", "You can read more in the BigQuery documentation about [geospatial analytics](https://cloud.google.com/bigquery/docs/geospatial-intro) and [geospatial data visualization with BigQuery](https://cloud.google.com/bigquery/docs/geospatial-visualize).\n", "\n", "To explore using Earth Engine data with BigQuery for visualization, please see Google Earth Engine [Exporting to BigQuery](https://developers.google.com/earth-engine/guides/exporting_to_bigquery). For more details on interactive geospatial analysis and visualization with Google Earth Engine, please see [geemap.org](https://geemap.org/)." ], "metadata": { "id": "nx3E5fczJzOm" } }, { "cell_type": "markdown", "source": [ "## Cleaning up\n", "\n", "If you created a Google Cloud project specifically for this tutorial, you can [delete the project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects).\n", "\n", "If you created a Google Maps Platform API key for this tutorial, it is recommended to [delete the key](https://developers.google.com/maps/api-security-best-practices#deleting-unused-apikeys)." ], "metadata": { "id": "j0GUsZj4oqSk" } }, { "cell_type": "markdown", "source": [ "## Help Google improve Tutorials\n", "\n", "File issues [here](https://github.com/GoogleCloudPlatform/bigquery-utils/issues) on GitHub.\n", "\n", "Submit feedback [here](https://docs.google.com/forms/d/e/1FAIpQLSdNSDbBM2rohqtqWtIPgKD_14oLc8TPqqdtKD11oqsrtA8_NQ/viewform) via a Google Form." ], "metadata": { "id": "cfoFiYdpunng" } } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }