data-analytics-demos/bigquery-data-governance/colab-enterprise/02-Data-Profile.ipynb

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "8rRxIQAxABNK" }, "source": [ "### Overview" ] }, { "cell_type": "markdown", "metadata": { "id": "IauvRjuUABNL" }, "source": [ "**Overview**: Data profiling lets you identify common statistical characteristics of the columns in your BigQuery tables. This information helps you to understand and analyze your data more effectively.\n", "\n", "This notebook gathers all the tables in the raw, enriched, and curated zones, then creates and runs a data profile scan on each table.\n", "\n", "**Process Flow**:\n", "\n", "1. **Select all the tables** in the raw, enriched, and curated datasets.\n", "\n", "2. **Create a list of dictionaries** to hold the table details, plus additional fields for the scan job name and scan state.\n", "\n", "3. **Set the concurrency level.** We will run up to 5 scans at once.\n", "\n", "4. **While any scan has not completed:**\n", "\n", "    * **Count the number of scans** currently in the Pending, Running, Unspecified, and Canceling states.\n", "    * **If fewer scans are active than the concurrency level (5), start more scans.**\n", "        * When starting a scan, save the scan job name and set the state to Unspecified.\n", "    * **Refresh the state** of each scan that is in the Pending, Running, Unspecified, or Canceling state.\n", "    * **If no scans are active, exit the loop.**\n", "\n", "5. **For each successful scan:**\n", "\n", "    * **Update the labels** on the associated BigQuery table so the scan shows in the Google Cloud console user interface.\n", "\n", "Notes:\n", "* This notebook runs the scans manually. 
Typically, you would put each scan on a schedule instead of orchestrating the runs yourself.\n", "\n", "Cost:\n", "* Approximate cost: Less than a dollar\n", "\n", "Author:\n", "* Adam Paternostro" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Uynoj7r5ABNL" }, "outputs": [], "source": [ "# Architecture Diagram\n", "from IPython.display import Image\n", "Image(url='https://storage.googleapis.com/data-analytics-golden-demo/colab-diagrams/BigQuery-Data-Governance-Data-Profiling.png', width=1200)" ] }, { "cell_type": "markdown", "metadata": { "id": "S7zKuqTpABNL" }, "source": [ "### Video Walkthrough" ] }, { "cell_type": "markdown", "metadata": { "id": "8qRI86K6ABNL" }, "source": [ "[Video](https://storage.googleapis.com/data-analytics-golden-demo/colab-videos/Data-Profile.mp4)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Fc0k4fw4ABNL" }, "outputs": [], "source": [ "from IPython.display import HTML\n", "\n", "HTML(\"\"\"\n", "<video width=\"800\" height=\"600\" controls>\n", "  <source src=\"https://storage.googleapis.com/data-analytics-golden-demo/colab-videos/Data-Profile.mp4\" type=\"video/mp4\">\n", "  Your browser does not support the video tag.\n", "</video>\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "HMsUvoF4BP7Y" }, "source": [ "### License" ] }, { "cell_type": "markdown", "metadata": { "id": "jQgQkbOvj55d" }, "source": [ "```\n", "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific 
language governing permissions and\n", "# limitations under the License.\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "m65vp54BUFRi" }, "source": [ "### Pip installs" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5MaWM6H5i6rX" }, "outputs": [], "source": [ "# NOTE: Calls in this notebook are made via REST APIs (RunQuery uses the BigQuery client library)\n", "\n", "# PIP Installs (if necessary)\n", "import sys\n", "\n", "# !{sys.executable} -m pip install REPLACE-ME" ] }, { "cell_type": "markdown", "metadata": { "id": "UmyL-Rg4Dr_f" }, "source": [ "### Initialize" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xOYsEVSXp6IP" }, "outputs": [], "source": [ "from PIL import Image\n", "from IPython.display import HTML\n", "import IPython.display\n", "import google.auth\n", "import google.auth.transport.requests\n", "import requests\n", "import json\n", "import uuid\n", "import base64\n", "import os\n", "import cv2\n", "import random\n", "import time\n", "import datetime\n", "\n", "import logging\n", "from tenacity import retry, wait_exponential, stop_after_attempt, before_sleep_log, retry_if_exception" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wMlHl3bnkFPZ" }, "outputs": [], "source": [ "# Set these (run this cell to verify the output)\n", "\n", "bigquery_location = \"${bigquery_location}\"\n", "dataplex_region = \"${dataplex_region}\"\n", "\n", "# Get the current date and time\n", "now = datetime.datetime.now()\n", "\n", "# Format the date and time as desired\n", "formatted_date = now.strftime(\"%Y-%m-%d-%H-%M\")\n", "\n", "# Get some values using gcloud\n", "project_id = os.environ[\"GOOGLE_CLOUD_PROJECT\"]\n", "user = !(gcloud auth list --filter=status:ACTIVE --format=\"value(account)\")\n", "\n", "if len(user) != 1:\n", "  raise RuntimeError(f\"user is not set: {user}\")\n", "user = user[0]\n", "\n", "print(f\"project_id = {project_id}\")\n", "print(f\"user = {user}\")" ] }, { "cell_type": "markdown", 
"metadata": { "id": "sZ6m_wGrK0YG" }, "source": [ "### Helper Methods" ] }, { "cell_type": "markdown", "metadata": { "id": "JbOjdSP1kN9T" }, "source": [ "#### restAPIHelper\n", "Calls the Google Cloud REST API using the current user's credentials." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "40wlwnY4kM11" }, "outputs": [], "source": [ "def restAPIHelper(url: str, http_verb: str, request_body: dict) -> dict:\n", "  \"\"\"Calls the Google Cloud REST API passing in the current user's credentials\"\"\"\n", "\n", "  import requests\n", "  import google.auth\n", "  import google.auth.transport.requests  # needed for the Request() object below\n", "  import json\n", "\n", "  # Get an access token based upon the current user\n", "  creds, project = google.auth.default()\n", "  auth_req = google.auth.transport.requests.Request()\n", "  creds.refresh(auth_req)\n", "  access_token = creds.token\n", "\n", "  headers = {\n", "    \"Content-Type\" : \"application/json\",\n", "    \"Authorization\" : \"Bearer \" + access_token\n", "  }\n", "\n", "  if http_verb == \"GET\":\n", "    response = requests.get(url, headers=headers)\n", "  elif http_verb == \"POST\":\n", "    response = requests.post(url, json=request_body, headers=headers)\n", "  elif http_verb == \"PUT\":\n", "    response = requests.put(url, json=request_body, headers=headers)\n", "  elif http_verb == \"PATCH\":\n", "    response = requests.patch(url, json=request_body, headers=headers)\n", "  elif http_verb == \"DELETE\":\n", "    response = requests.delete(url, headers=headers)\n", "  else:\n", "    raise RuntimeError(f\"Unknown HTTP verb: {http_verb}\")\n", "\n", "  if response.status_code == 200:\n", "    return json.loads(response.content)\n", "  else:\n", "    error = f\"Error restAPIHelper -> Status: '{response.status_code}' Text: '{response.text}'\"\n", "    raise RuntimeError(error)" ] }, { "cell_type": "markdown", "metadata": { "id": "ILcRKf-zgsP5" }, "source": [ "#### RunQuery\n", "Runs a BigQuery query." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HCo0Mr6zgsyn" }, "outputs": [], "source": [ "def RunQuery(sql):\n", "  import time\n", "  from google.cloud import bigquery\n", "  client = bigquery.Client()\n", "\n", "  # Queries that return rows come back as a DataFrame; other statements are polled until done\n", "  if sql.lstrip().upper().startswith((\"SELECT\", \"WITH\")):\n", "    df_result = client.query(sql).to_dataframe()\n", "    return df_result\n", "  else:\n", "    job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)\n", "    query_job = client.query(sql, job_config=job_config)\n", "\n", "    # Check on the progress by getting the job's updated state.\n", "    query_job = client.get_job(\n", "        query_job.job_id, location=query_job.location\n", "    )\n", "    print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", "    while query_job.state != \"DONE\":\n", "      time.sleep(2)\n", "      query_job = client.get_job(\n", "          query_job.job_id, location=query_job.location\n", "      )\n", "      print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", "    if query_job.error_result is None:\n", "      return True\n", "    else:\n", "      raise Exception(query_job.error_result)" ] }, { "cell_type": "markdown", "metadata": { "id": "ZQYL-OEvAkZ7" }, "source": [ "### Data Profile Scan - Helper Methods" ] }, { "cell_type": "markdown", "metadata": { "id": "BVXA0swqCbLn" }, "source": [ "#### existsDataProfileScan\n", "- Tests to see if a Data Profile Scan exists\n", "- Returns True/False" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XEqmygbrCaYn" }, "outputs": [], "source": [ "def existsDataProfileScan(project_id, dataplex_region, data_profile_scan_name):\n", "  \"\"\"Returns True if the data profile scan already exists.\"\"\"\n", "\n", "  # Gather existing data scans\n", "  # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.dataScans/list\n", "\n", "  url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{dataplex_region}/dataScans\"\n", "\n", "  # Call the list API\n", "  json_result = restAPIHelper(url, \"GET\", None)\n", "  print(f\"existsDataProfileScan (GET) json_result: {json_result}\")\n", "\n", "  # Test to see if data scan exists, if so return\n", "  if \"dataScans\" in json_result:\n", "    for item in json_result[\"dataScans\"]:\n", "      # print(f\"Scan names: {item['name']}\")\n", "      if item[\"name\"] == f\"projects/{project_id}/locations/{dataplex_region}/dataScans/{data_profile_scan_name}\":\n", "        print(f\"Data Profile Scan {data_profile_scan_name} already exists\")\n", "        return True\n", "\n", "  return False" ] }, { "cell_type": "markdown", "metadata": { "id": "D-k_P6NwAvT5" }, "source": [ "#### createDataProfileScan\n", "- Creates a scan, but does not run it\n", "- If the scan exists, then does nothing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "GCfNKo1TAkBE" }, "outputs": [], "source": [ "def createDataProfileScan(project_id, dataplex_region, data_profile_scan_name, data_profile_display_name, bigquery_dataset_name, bigquery_table_name):\n", "  \"\"\"Creates the data profile scan if it does not already exist.\"\"\"\n", "\n", "  if not existsDataProfileScan(project_id, dataplex_region, data_profile_scan_name):\n", "    # Create a new scan\n", "    # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.dataScans/create\n", "    print(\"Creating Data Profile Scan\")\n", "\n", "    url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{dataplex_region}/dataScans?dataScanId={data_profile_scan_name}\"\n", "\n", "    request_body = {\n", "        \"dataProfileSpec\": { \"samplingPercent\": 25 },\n", "        \"data\": { \"resource\": f\"//bigquery.googleapis.com/projects/{project_id}/datasets/{bigquery_dataset_name}/tables/{bigquery_table_name}\" },\n", "        \"description\": data_profile_display_name,\n", "        \"displayName\": data_profile_display_name\n", "    }\n", "\n", "    json_result = restAPIHelper(url, \"POST\", request_body)\n", "\n", "    name = json_result[\"metadata\"][\"target\"]\n", "    print(f\"Data Profile Scan created: {name}\")\n", "  else:\n", "    print(f\"Data Profile Scan exists: projects/{project_id}/locations/{dataplex_region}/dataScans/{data_profile_scan_name}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "q7LBwnD6A2pM" }, "source": [ "#### startDataProfileScan\n", "- Starts a data profile scan (async)\n", "- Returns the \"job name\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "C7Xdx8bKAj-a" }, "outputs": [], "source": [ "def startDataProfileScan(project_id, dataplex_region, data_profile_scan_name):\n", "  \"\"\"Starts the data profile scan job (async) and returns the job name.\"\"\"\n", "\n", "  # Run an existing scan\n", "  # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.dataScans/run\n", "  print(\"Running Data Profile Scan\")\n", "\n", "  url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{dataplex_region}/dataScans/{data_profile_scan_name}:run\"\n", "\n", "  request_body = { }\n", "\n", "  json_result = restAPIHelper(url, \"POST\", request_body)\n", "  job_name = json_result[\"job\"][\"name\"]\n", "  job_state = json_result[\"job\"][\"state\"]\n", "  print(f\"Data Profile Scan Run created: {job_name} - State: {job_state}\")\n", "\n", "  return job_name\n" ] }, { "cell_type": "markdown", "metadata": { "id": "X8db9qIHBOO8" }, "source": [ "#### getStateDataProfileScan\n", "- Gets the state of a scan (to see if it is done)\n", "- Returns the \"state\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JhHrg0lcAj7r" }, "outputs": [], "source": [ "def getStateDataProfileScan(project_id, dataplex_region, data_profile_scan_job_name):\n", "  \"\"\"Returns the current state of a data profile scan job.\"\"\"\n", "\n", "  # Gets the \"state\" of a scan job (the job name is the full resource path)\n", "  url = f\"https://dataplex.googleapis.com/v1/{data_profile_scan_job_name}\"\n", "  json_result = restAPIHelper(url, \"GET\", None)\n", "  return json_result[\"state\"]\n" ] }, { "cell_type": "markdown", "metadata": { "id": "Kd51jAhcZBUm" }, "source": [ "#### updateBigQueryTableDataplexLabels\n", "- Patches the BigQuery table labels to associate a Dataplex item with the table so that you see it in the UI\n", "- Returns nothing" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "q0D8cTWCZBpM" }, "outputs": [], "source": [ "def updateBigQueryTableDataplexLabels(project_id, dataplex_region, dataplex_asset_type, dataplex_asset_scan_name, bigquery_dataset_name, bigquery_table_name):\n", "  \"\"\"Sets the labels on the BigQuery table so users can see the data profile in the Console.\"\"\"\n", "\n", "  # Patch BigQuery\n", "  # https://cloud.google.com/bigquery/docs/reference/rest/v2/tables/patch\n", "  print(\"Patching BigQuery Dataplex Labels\")\n", "\n", "  url = f\"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/datasets/{bigquery_dataset_name}/tables/{bigquery_table_name}\"\n", "\n", "  request_body = {}\n", "  if dataplex_asset_type == \"DATA-PROFILE-SCAN\":\n", "    request_body = {\n", "        \"labels\" : {\n", "            \"dataplex-dp-published-project\" : project_id,\n", "            \"dataplex-dp-published-location\" : dataplex_region,\n", "            \"dataplex-dp-published-scan\" : dataplex_asset_scan_name,\n", "        }\n", "    }\n", "  elif dataplex_asset_type == \"DATA-INSIGHTS-SCAN\":\n", "    request_body = {\n", "        \"labels\" : {\n", "            \"dataplex-data-documentation-project\" : project_id,\n", "            \"dataplex-data-documentation-location\" : dataplex_region,\n", "            \"dataplex-data-documentation-scan\" : dataplex_asset_scan_name,\n", "        }\n", "    }\n", "  else:\n", "    raise Exception(f\"Unknown dataplex_asset_type of {dataplex_asset_type}\")\n", 
"\n", "  json_result = restAPIHelper(url, \"PATCH\", request_body)\n", "  print(json_result)" ] }, { "cell_type": "markdown", "metadata": { "id": "QCFEfpdmEydA" }, "source": [ "### Run Data Profile Scan - Algorithm" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "q702fsShe5vH" }, "outputs": [], "source": [ "# Get all the tables we want to scan for each dataset\n", "scans_to_perform = []\n", "dataset_list = [\"${bigquery_governed_data_raw_dataset}\",\"${bigquery_governed_data_enriched_dataset}\",\"${bigquery_governed_data_curated_dataset}\"]\n", "\n", "# Build one query that lists the tables of every dataset.\n", "# (join avoids a trailing UNION ALL; str.rstrip strips a character set, not a suffix)\n", "sql = \" UNION ALL \".join(\n", "  f\"SELECT table_schema, table_name, table_type from `{dataset_name}.INFORMATION_SCHEMA.TABLES`\"\n", "  for dataset_name in dataset_list\n", ")\n", "\n", "result_df = RunQuery(sql)\n", "table_list = []\n", "\n", "# data_profile_scan_name: \"Field data_scan_id must contain only lowercase letters, numbers, and/or hyphens\"\n", "for index, row in result_df.iterrows():\n", "  item = {\n", "    \"project_id\": project_id,\n", "    \"dataplex_region\": dataplex_region,\n", "    \"data_profile_scan_name\": f\"{row['table_schema']}-{row['table_name']}-profile-scan\".lower().replace(\"_\",\"-\"),\n", "    \"data_profile_display_name\": f\"{row['table_schema']}-{row['table_name']} profile scan\",\n", "    \"bigquery_dataset_name\": row['table_schema'],\n", "    \"bigquery_table_name\": row['table_name'],\n", "\n", "    # Used by below loop for processing\n", "    \"data_profile_scan_state\": \"\",\n", "    \"data_profile_scan_job_name\": \"\"\n", "  }\n", "  scans_to_perform.append(item)\n", "\n", "  print(f\"item: {item}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ljbISsPbDk4l" }, "outputs": [], "source": [ "# Run the scans (up to a certain concurrency level)\n", "numberOfScansToRunConcurrently = 5\n", "\n", "while True:\n", "  # Count the number of scans that are running\n", "  concurrentScanCount = 
0\n", "  for item in scans_to_perform:\n", "    if item[\"data_profile_scan_state\"] == \"PENDING\" or \\\n", "       item[\"data_profile_scan_state\"] == \"STATE_UNSPECIFIED\" or \\\n", "       item[\"data_profile_scan_state\"] == \"RUNNING\" or \\\n", "       item[\"data_profile_scan_state\"] == \"CANCELING\":\n", "      # Update our count\n", "      print(f\"Concurrent Scan Count: {item['bigquery_dataset_name']}.{item['bigquery_table_name']} -> {item['data_profile_scan_state']}\")\n", "      concurrentScanCount += 1\n", "    else:\n", "      print(f\"Concurrent Scan Count: {item['bigquery_dataset_name']}.{item['bigquery_table_name']} -> {item['data_profile_scan_state']}\")\n", "\n", "\n", "  print(f\"concurrentScanCount: {concurrentScanCount}\")\n", "\n", "  # Start new scans under our concurrency count\n", "  scansStarted = -1\n", "  while concurrentScanCount < numberOfScansToRunConcurrently and scansStarted != 0:\n", "    # Start new scans up to the concurrency limit\n", "    scansStarted = 0\n", "    for item in scans_to_perform:\n", "      if concurrentScanCount < numberOfScansToRunConcurrently and \\\n", "         item[\"data_profile_scan_state\"] == \"\":\n", "        # start a new scan\n", "        createDataProfileScan(item[\"project_id\"], item[\"dataplex_region\"],\n", "                              item[\"data_profile_scan_name\"], item[\"data_profile_display_name\"],\n", "                              item[\"bigquery_dataset_name\"], item[\"bigquery_table_name\"])\n", "\n", "        started = False\n", "        item[\"data_profile_scan_job_name\"] = \"\"\n", "        while not started:\n", "          try:\n", "            item[\"data_profile_scan_job_name\"] = startDataProfileScan(item[\"project_id\"], item[\"dataplex_region\"], item[\"data_profile_scan_name\"])\n", "            item[\"data_profile_scan_state\"] = \"STATE_UNSPECIFIED\"\n", "            started = True\n", "            scansStarted += 1\n", "            concurrentScanCount += 1\n", "          except Exception as e:\n", "            # A newly created scan may not be available to run yet; retry only for that error\n", "            scan_full_name = f'projects/{item[\"project_id\"]}/locations/{item[\"dataplex_region\"]}/dataScans/{item[\"data_profile_scan_name\"]}'\n", "            message = f\"Provided DataScan '{scan_full_name}' does not exist.\"\n", "            if message in str(e):\n", "              print(\"Data scan is not available to start. Waiting...\")\n", "              time.sleep(5)\n", "            else:\n", "              raise # Re-raise the exception for other errors\n", "\n", "\n", "  # Update the status for the scans that are processing\n", "  for item in scans_to_perform:\n", "    if item[\"data_profile_scan_state\"] == \"PENDING\" or \\\n", "       item[\"data_profile_scan_state\"] == \"STATE_UNSPECIFIED\" or \\\n", "       item[\"data_profile_scan_state\"] == \"RUNNING\" or \\\n", "       item[\"data_profile_scan_state\"] == \"CANCELING\":\n", "      # Get the latest state\n", "      item[\"data_profile_scan_state\"] = getStateDataProfileScan(item[\"project_id\"], item[\"dataplex_region\"], item[\"data_profile_scan_job_name\"])\n", "\n", "  if concurrentScanCount == 0:\n", "    # nothing processing - exit\n", "    break\n", "  else:\n", "    # wait for processing\n", "    print(f\"concurrentScanCount: {concurrentScanCount}\")\n", "    time.sleep(10)\n", "\n", 
"# Update the BigQuery labels so our scans show in the Console UI\n", "for item in scans_to_perform:\n", "  if item[\"data_profile_scan_state\"] == \"SUCCEEDED\":\n", "    # skip CANCELLED or FAILED states\n", "    updateBigQueryTableDataplexLabels(item[\"project_id\"], item[\"dataplex_region\"],\n", "                                      \"DATA-PROFILE-SCAN\", item[\"data_profile_scan_name\"],\n", "                                      item[\"bigquery_dataset_name\"], item[\"bigquery_table_name\"])\n", "\n", "    print(f\"Scan for table {item['bigquery_dataset_name']}.{item['bigquery_table_name']} is now associated with the BigQuery Console UI.\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "42IxhtRRrvR-" }, "source": [ "### Clean Up" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6lF2Z7skFbvf" }, "outputs": [], "source": [ "# Placeholder (you would need to un-patch the BigQuery tables and delete the scans)\n", "print(f\"You can delete scans here: https://console.cloud.google.com/dataplex/govern/profile?project={project_id}\")" ] }, { "cell_type": 
"markdown", "metadata": { "id": "ASQ2BPisXDA0" }, "source": [ "### Reference Links\n" ] }, { "cell_type": "markdown", "metadata": { "id": "rTY6xJdZ3ul8" }, "source": [ "- [REPLACE-ME](https://REPLACE-ME)" ] } ], "metadata": { "colab": { "collapsed_sections": [ "8rRxIQAxABNK", "S7zKuqTpABNL", "HMsUvoF4BP7Y", "m65vp54BUFRi", "UmyL-Rg4Dr_f", "sZ6m_wGrK0YG", "JbOjdSP1kN9T", "42IxhtRRrvR-", "ASQ2BPisXDA0" ], "name": "01-Data-Profile", "private_outputs": true, "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }

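The scheduling loop in the notebook's "Run Data Profile Scan - Algorithm" cells follows a general pattern: keep at most N jobs in flight, poll the in-flight jobs, and top up as they finish. The block below is a minimal sketch of that pattern only, not the notebook's code. The function `run_with_concurrency_limit`, its `start_scan` and `get_state` callables, and the shortened `state` / `job_name` keys are hypothetical names introduced for illustration; in the notebook the corresponding work is done by `startDataProfileScan` and `getStateDataProfileScan`. The fake backend at the bottom is an in-memory stand-in so the sketch can be run without any Google Cloud access.

```python
import time
from typing import Callable, Dict, List

# States treated as "still in flight", mirroring the notebook's polling loop.
ACTIVE_STATES = {"STATE_UNSPECIFIED", "PENDING", "RUNNING", "CANCELING"}


def run_with_concurrency_limit(
    items: List[Dict[str, str]],
    start_scan: Callable[[Dict[str, str]], str],
    get_state: Callable[[str], str],
    max_concurrent: int = 5,
    poll_seconds: float = 10.0,
) -> List[Dict[str, str]]:
    """Start a job for every item, keeping at most max_concurrent in flight."""
    for item in items:
        item.setdefault("state", "")      # "" means "not started yet"
        item.setdefault("job_name", "")

    while True:
        active = [i for i in items if i["state"] in ACTIVE_STATES]

        # Top up to the concurrency limit with items that have not started.
        for item in items:
            if len(active) >= max_concurrent:
                break
            if item["state"] == "":
                item["job_name"] = start_scan(item)
                item["state"] = "STATE_UNSPECIFIED"
                active.append(item)

        if not active:
            return items  # nothing in flight and nothing left to start

        # Refresh the state of every in-flight job, then wait before looping.
        for item in active:
            item["state"] = get_state(item["job_name"])
        time.sleep(poll_seconds)


# Hypothetical in-memory stand-ins for the REST helpers (dry run only).
poll_counts: Dict[str, int] = {}


def fake_start(item: Dict[str, str]) -> str:
    job = f"jobs/{item['table']}"
    poll_counts[job] = 0
    return job


def fake_state(job: str) -> str:
    # Each fake job reports RUNNING once, then SUCCEEDED.
    poll_counts[job] += 1
    return "SUCCEEDED" if poll_counts[job] >= 2 else "RUNNING"


tables = [{"table": f"t{n}"} for n in range(7)]
done = run_with_concurrency_limit(
    tables, fake_start, fake_state, max_concurrent=3, poll_seconds=0
)
```

Passing the start and poll functions in as parameters is a design choice of this sketch: it keeps the scheduling logic separate from the REST calls, so the loop can be exercised with fakes before it is pointed at real scans.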