data-analytics-demos/bigquery-data-governance/colab-enterprise/04-Data-Governance.ipynb (3,513 lines of code) (raw):

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "B3uS1mXQQiRS" }, "source": [ "### <font color='#4285f4'>Overview</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "8eHuX1ihQiRS" }, "source": [ "**Overview**: This notebook walks you step by step through creating Entry Groups, Aspect Types, Entry Types, and Entries. You will then set up governance on a series of entries, both custom and system (BigQuery), by specifying the values for your aspects.\n", "\n", "You can then search for tables, Pub/Sub, Analytics Hub, Vertex models, and more by the different aspect types.\n", "\n", "**Process Flow**:\n", "1. **Create helper methods:**\n", "   * A method to check for the existence of an item (to avoid recreation on re-runs).\n", "   * A method to create each artifact.\n", "\n", "2. **Create custom artifacts:**\n", "   * Entry Group\n", "   * Aspect Type\n", "   * Entry Type (containing Aspect Type(s))\n", "   * Entry (containing an Entry Type, placed within an Entry Group)\n", "\n", "3. **Associate an Entry Type** with a BigQuery table.\n", "\n", "4. **Update the table overview and contacts (roles)** on a BigQuery table.\n", "\n", "5. **Create Aspect Types and Entry Types** to associate with each table and column in our BigQuery tables.\n", "\n", "6. **Apply to all zones:**\n", "   * Raw\n", "   * Enriched\n", "   * Curated\n", "\n", "7. **Update the BigQuery overview and contacts.**\n", "\n", "Notes:\n", "* This notebook uses REST API calls to create the Entry Groups, Aspect Types, Entry Types, and Entries. You can also do this in Terraform. 
Please see the sample code provided in the notebook.\n", "* If the PATCH call in `updateDataplexSystemEntry_BigQueryTable` fails with the error below (write access to BigQuery denied), you might need to wait several hours before the cell works on a new project.\n", "  ```\n", "  {\n", "    \"error\": {\n", "      \"code\": 403,\n", "      \"message\": \"Write access to project 'xxx' was denied: If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.\",\n", "      \"status\": \"PERMISSION_DENIED\"\n", "    }\n", "  }\n", "  ```\n", "\n", "Cost:\n", "* Approximate cost: Less than a dollar\n", "\n", "Author:\n", "* Adam Paternostro" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kRk6Ns_YQiRS" }, "outputs": [], "source": [ "# Architecture Diagram\n", "from IPython.display import Image\n", "Image(url='https://storage.googleapis.com/data-analytics-golden-demo/colab-diagrams/BigQuery-Data-Governance-Data-Governance.png', width=1200)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Architecture Diagram\n", "from IPython.display import Image\n", "Image(url='https://storage.googleapis.com/data-analytics-golden-demo/colab-diagrams/BigQuery-Data-Governance-Data-Governance-Arch.png', width=1200)" ] }, { "cell_type": "markdown", "metadata": { "id": "1kLXxGaxQiRT" }, "source": [ "### <font color='#4285f4'>Video Walkthrough</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "7DfZ4jnQQiRT" }, "source": [ "[Video](https://storage.googleapis.com/data-analytics-golden-demo/colab-videos/Data-Governance.mp4)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "o_vBdpR7QiRT" }, "outputs": [], "source": [ "from IPython.display import HTML\n", "\n", "HTML(\"\"\"\n", "<video width=\"800\" height=\"600\" controls>\n", "  <source src=\"https://storage.googleapis.com/data-analytics-golden-demo/colab-videos/Data-Governance.mp4\" type=\"video/mp4\">\n", "  Your browser 
does not support the video tag.\n", "</video>\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "HMsUvoF4BP7Y" }, "source": [ "### <font color='#4285f4'>License</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "jQgQkbOvj55d" }, "source": [ "```\n", "# Copyright 2024 Google LLC\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# https://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "m65vp54BUFRi" }, "source": [ "### <font color='#4285f4'>Pip installs</font>" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "5MaWM6H5i6rX" }, "outputs": [], "source": [ "# PIP Installs (if necessary)\n", "import sys\n", "\n", "# !{sys.executable} -m pip install REPLACE-ME" ] }, { "cell_type": "markdown", "metadata": { "id": "UmyL-Rg4Dr_f" }, "source": [ "### <font color='#4285f4'>Initialize</font>" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xOYsEVSXp6IP" }, "outputs": [], "source": [ "from PIL import Image\n", "from IPython.display import HTML\n", "import IPython.display\n", "import google.auth\n", "import google.auth.transport.requests\n", "import requests\n", "import json\n", "import uuid\n", "import base64\n", "import os\n", "import cv2\n", "import random\n", "import time\n", "import datetime\n", "\n", "import logging\n", "from tenacity import retry, wait_exponential, stop_after_attempt, before_sleep_log, retry_if_exception" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "id": "wMlHl3bnkFPZ" }, "outputs": [], "source": [ "# Set these (run this cell to verify the output)\n", "\n", "bigquery_location = \"${bigquery_location}\"\n", "region = \"${dataplex_region}\"\n", "location = \"${location}\"\n", "random_extension=\"${random_extension}\"\n", "\n", "# Get the current date and time\n", "now = datetime.datetime.now()\n", "\n", "# Format the date and time as desired\n", "formatted_date = now.strftime(\"%Y-%m-%d-%H-%M\")\n", "\n", "# Get some values using gcloud\n", "project_id = os.environ[\"GOOGLE_CLOUD_PROJECT\"]\n", "user = !(gcloud auth list --filter=status:ACTIVE --format=\"value(account)\")\n", "\n", "if len(user) != 1:\n", "  raise RuntimeError(f\"user is not set: {user}\")\n", "user = user[0]\n", "\n", "print(f\"project_id = {project_id}\")\n", "print(f\"user = {user}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "sZ6m_wGrK0YG" }, "source": [ "### <font color='#4285f4'>Helper Methods</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "JbOjdSP1kN9T" }, "source": [ "#### restAPIHelper\n", "Calls the Google Cloud REST API using the current user's credentials." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "40wlwnY4kM11" }, "outputs": [], "source": [ "def restAPIHelper(url: str, http_verb: str, request_body: dict) -> dict:\n", "  \"\"\"Calls the Google Cloud REST API passing in the current user's credentials\"\"\"\n", "\n", "  import requests\n", "  import google.auth\n", "  import google.auth.transport.requests  # Needed for Request(); 'import google.auth' alone does not load this submodule\n", "  import json\n", "\n", "  # Get an access token based upon the current user\n", "  creds, project = google.auth.default()\n", "  auth_req = google.auth.transport.requests.Request()\n", "  creds.refresh(auth_req)\n", "  access_token=creds.token\n", "\n", "  headers = {\n", "    \"Content-Type\" : \"application/json\",\n", "    \"Authorization\" : \"Bearer \" + access_token\n", "  }\n", "\n", "  if http_verb == \"GET\":\n", "    response = requests.get(url, headers=headers)\n", "  elif http_verb == \"POST\":\n", "    response = requests.post(url, json=request_body, headers=headers)\n", "  elif http_verb == \"PUT\":\n", "    response = requests.put(url, json=request_body, headers=headers)\n", "  elif http_verb == \"PATCH\":\n", "    response = requests.patch(url, json=request_body, headers=headers)\n", "  elif http_verb == \"DELETE\":\n", "    response = requests.delete(url, headers=headers)\n", "  else:\n", "    raise RuntimeError(f\"Unknown HTTP verb: {http_verb}\")\n", "\n", "  if response.status_code == 200:\n", "    return json.loads(response.content)\n", "  else:\n", "    error = f\"Error restAPIHelper -> Status: '{response.status_code}' Text: '{response.text}'\"\n", "    raise RuntimeError(error)" ] }, { "cell_type": "markdown", "metadata": { "id": "bI-KJELZ1jgt" }, "source": [ "#### RunQuery\n", "Runs a query in BigQuery" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "pmnCwYvA1kZv" }, "outputs": [], "source": [ "def RunQuery(sql):\n", "  import time\n", "  from google.cloud import bigquery\n", "  client = bigquery.Client()\n", "\n", "  if 
(sql.startswith(\"SELECT\") or sql.startswith(\"WITH\")):\n", "    df_result = client.query(sql).to_dataframe()\n", "    return df_result\n", "  else:\n", "    job_config = bigquery.QueryJobConfig(priority=bigquery.QueryPriority.INTERACTIVE)\n", "    query_job = client.query(sql, job_config=job_config)\n", "\n", "    # Check on the progress by getting the job's updated state.\n", "    query_job = client.get_job(\n", "        query_job.job_id, location=query_job.location\n", "    )\n", "    print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", "    while query_job.state != \"DONE\":\n", "      time.sleep(2)\n", "      query_job = client.get_job(\n", "          query_job.job_id, location=query_job.location\n", "      )\n", "      print(\"Job {} is currently in state {} with error result of {}\".format(query_job.job_id, query_job.state, query_job.error_result))\n", "\n", "    if query_job.error_result is None:\n", "      return True\n", "    else:\n", "      raise Exception(query_job.error_result)" ] }, { "cell_type": "markdown", "metadata": { "id": "c51M89g0Ejmz" }, "source": [ "### <font color='#4285f4'>Entry Group - Helper Methods</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "BX5YIUdQXnFo" }, "source": [ "#### existsEntryGroup\n", "- Tests to see if an Entry Group exists\n", "- Returns True/False" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "U5DzEgSSVgX2" }, "outputs": [], "source": [ "def existsEntryGroup(project_id, entryGroupId, entryGroupLocation):\n", "  \"\"\"Checks to see if an Entry Group already exists\"\"\"\n", "\n", "  # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryGroups/list\n", "  url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{entryGroupLocation}/entryGroups\"\n", "\n", "  json_result = restAPIHelper(url, \"GET\", None)\n", "  print(f\"existsEntryGroup (GET) json_result: {json_result}\")\n", "\n", "  # Test to see if exists, if so return\n", 
"  if \"entryGroups\" in json_result:\n", "    for item in json_result[\"entryGroups\"]:\n", "      # print(f\"Name: {item['name']}\")\n", "      if item[\"name\"] == f\"projects/{project_id}/locations/{entryGroupLocation}/entryGroups/{entryGroupId}\":\n", "        # print(f\"Entry Group {entryGroupId} already exists\")\n", "        return True\n", "\n", "  return False" ] }, { "cell_type": "markdown", "metadata": { "id": "jzJpXz7RXn_d" }, "source": [ "#### createEntryGroup\n", "- Creates an Entry Group if it does not exist" ] }, { "cell_type": "markdown", "metadata": { "id": "yCVwcsfioCwh" }, "source": [ "##### Sample Terraform" ] }, { "cell_type": "markdown", "metadata": { "id": "Qr4yLXSCnXn-" }, "source": [ "```\n", "resource \"google_dataplex_entry_group\" \"my_entry_group\" {\n", "  project = var.project_id\n", "  entry_group_id = \"my-entry-group\"\n", "  location = \"global\"\n", "\n", "  labels = { \"tag\": \"test-tf\" }\n", "  display_name = \"My Entry Group (entry group)\"\n", "  description = \"Entry group used for testing\"\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "cNEY08D8oELb" }, "source": [ "##### Sample REST API Code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VyrxLDN3ThgI" }, "outputs": [], "source": [ "def createEntryGroup(project_id, entryGroupId, entryGroupLocation, entryGroupName, entryGroupDescription):\n", "  \"\"\"Creates an Entry Group if it does not exist\"\"\"\n", "\n", "  if existsEntryGroup(project_id, entryGroupId, entryGroupLocation) == False:\n", "    # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryGroups/create\n", "    url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{entryGroupLocation}/entryGroups?entryGroupId={entryGroupId}\"\n", "\n", "    data = {\n", "      \"displayName\": entryGroupName,\n", "      \"description\": entryGroupDescription\n", "    }\n", "\n", "    json_result = restAPIHelper(url, \"POST\", data)\n", "    print(f\"createEntryGroup (POST) json_result: 
{json_result}\")\n", "  else:\n", "    print(f\"createEntryGroup (POST) Entry Group {entryGroupId} already exists\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "nDgg1kttoOQf" }, "source": [ "### <font color='#4285f4'>Aspect Type - Helper Methods</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "toHr6QPYba6v" }, "source": [ "#### existsAspectType\n", "- Tests to see if an Aspect Type exists\n", "- Returns True/False" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9Eb-wvOKbaWi" }, "outputs": [], "source": [ "def existsAspectType(project_id, aspectTypeId, aspectTypeLocation):\n", "  \"\"\"Checks to see if an Aspect Type already exists\"\"\"\n", "\n", "  # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.aspectTypes/list\n", "  url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{aspectTypeLocation}/aspectTypes\"\n", "\n", "  json_result = restAPIHelper(url, \"GET\", None)\n", "  print(f\"existsAspectType (GET) json_result: {json_result}\")\n", "\n", "  # Test to see if exists, if so return\n", "  if \"aspectTypes\" in json_result:\n", "    for item in json_result[\"aspectTypes\"]:\n", "      print(f\"Name: {item['name']}\")\n", "      if item[\"name\"] == f\"projects/{project_id}/locations/{aspectTypeLocation}/aspectTypes/{aspectTypeId}\":\n", "        # print(f\"Aspect Type {aspectTypeId} already exists\")\n", "        return True\n", "\n", "  return False" ] }, { "cell_type": "markdown", "metadata": { "id": "4kGKhAEikUwP" }, "source": [ "#### createAspectType\n", "- Creates an Aspect Type if it does not exist" ] }, { "cell_type": "markdown", "metadata": { "id": "D0H6CgTZntFy" }, "source": [ "##### Sample Terraform" ] }, { "cell_type": "markdown", "metadata": { "id": "1Ntm2xl2nk4u" }, "source": [ "```\n", "resource \"google_dataplex_aspect_type\" \"my_aspect_type\" {\n", "  project = var.project_id\n", "  aspect_type_id = \"my-aspect-type\"\n", "  location = \"us\"\n", "\n", "  labels = { \"tag\": \"test-tf\" 
}\n", "  display_name = \"My Aspect Type (aspect type)\"\n", "  description = \"PII data aspect type\"\n", "  metadata_template = <<EOF\n", "{\n", "  \"name\": \"tf-test-template\",\n", "  \"type\": \"record\",\n", "  \"recordFields\": [\n", "    {\n", "      \"name\": \"type\",\n", "      \"type\": \"enum\",\n", "      \"annotations\": {\n", "        \"displayName\": \"Type\",\n", "        \"description\": \"Specifies the type of view represented by the entry.\"\n", "      },\n", "      \"index\": 1,\n", "      \"constraints\": {\n", "        \"required\": true\n", "      },\n", "      \"enumValues\": [\n", "        {\n", "          \"name\": \"FILE\",\n", "          \"index\": 1\n", "        }\n", "      ]\n", "    }\n", "  ]\n", "}\n", "EOF\n", "}\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "dJArMlOjn5AF" }, "source": [ "##### REST API Code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iG2PU5sDkXpq" }, "outputs": [], "source": [ "def createAspectType(project_id, aspectTypeId, aspectTypeLocation, aspectTypeName, aspectTypeDescription, metadataTemplate):\n", "  \"\"\"Creates an Aspect Type if it does not exist\"\"\"\n", "\n", "  if existsAspectType(project_id, aspectTypeId, aspectTypeLocation) == False:\n", "    # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.aspectTypes/create\n", "    url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{aspectTypeLocation}/aspectTypes?aspectTypeId={aspectTypeId}\"\n", "\n", "    data = {\n", "      \"displayName\": aspectTypeName,\n", "      \"description\": aspectTypeDescription,\n", "      \"metadataTemplate\": metadataTemplate\n", "    }\n", "\n", "    json_result = restAPIHelper(url, \"POST\", data)\n", "    print(f\"createAspectType (POST) json_result: {json_result}\")\n", "  else:\n", "    print(f\"createAspectType (POST) Aspect Type {aspectTypeId} already exists\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "s8ZF1ccNoiB_" }, "source": [ "### <font color='#4285f4'>Entry Type - Helper Methods</font>" ] }, { "cell_type": "markdown", "metadata": { "id": 
"kEUh1istoiB_" }, "source": [ "#### existsEntryType\n", "- Tests to see if an Entry Type exists\n", "- Returns True/False" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "SD_9IMaAoiB_" }, "outputs": [], "source": [ "def existsEntryType(project_id, entryTypeId, entryTypeLocation):\n", "  \"\"\"Checks to see if an Entry Type already exists\"\"\"\n", "\n", "  # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryTypes/list\n", "  url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{entryTypeLocation}/entryTypes\"\n", "\n", "  json_result = restAPIHelper(url, \"GET\", None)\n", "  print(f\"existsEntryType (GET) json_result: {json_result}\")\n", "\n", "  # Test to see if exists, if so return\n", "  if \"entryTypes\" in json_result:\n", "    for item in json_result[\"entryTypes\"]:\n", "      print(f\"Name: {item['name']}\")\n", "      if item[\"name\"] == f\"projects/{project_id}/locations/{entryTypeLocation}/entryTypes/{entryTypeId}\":\n", "        # print(f\"Entry Type {entryTypeId} already exists\")\n", "        return True\n", "\n", "  return False" ] }, { "cell_type": "markdown", "metadata": { "id": "DsdRV9-3oiCA" }, "source": [ "#### createEntryType\n", "- Creates an Entry Type if it does not exist" ] }, { "cell_type": "markdown", "metadata": { "id": "eyYp2SbMoiCA" }, "source": [ "##### Sample Terraform" ] }, { "cell_type": "markdown", "metadata": { "id": "VSLi6GsloiCA" }, "source": [ "```\n", "resource \"google_dataplex_entry_type\" \"my_entry_type\" {\n", "  project = var.project_id\n", "  entry_type_id = \"my-entry-type\"\n", "  location = \"us\"\n", "\n", "  labels = { \"tag\": \"test-tf\" }\n", "  display_name = \"My Entry Type (entry type)\"\n", "  description = \"My Entry Type entry type\"\n", "\n", "  type_aliases = [\"TABLE\", \"DATABASE\"]\n", "  platform = \"GCS\"\n", "  system = \"BigQuery\"\n", "\n", "  required_aspects {\n", "    type = google_dataplex_aspect_type.my_aspect_type.name\n", "  }\n", "\n", "  depends_on = 
[google_dataplex_aspect_type.my_aspect_type]\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "_bAPoQe4oiCA" }, "source": [ "##### REST API Code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "V2XLxQzfoiCA" }, "outputs": [], "source": [ "def createEntryType(project_id, entryTypeId, entryTypeLocation, entryTypeName, entryTypeDescription, type_aliases, platform, system, requiredAspects):\n", "  \"\"\"Creates an Entry Type if it does not exist\"\"\"\n", "\n", "  if existsEntryType(project_id, entryTypeId, entryTypeLocation) == False:\n", "    # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryTypes/create\n", "    url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{entryTypeLocation}/entryTypes?entryTypeId={entryTypeId}\"\n", "\n", "    data = {\n", "      \"displayName\": entryTypeName,\n", "      \"description\": entryTypeDescription,\n", "      \"typeAliases\": type_aliases,\n", "      \"platform\": platform,\n", "      \"system\": system,\n", "      \"requiredAspects\": requiredAspects,\n", "    }\n", "\n", "    json_result = restAPIHelper(url, \"POST\", data)\n", "    print(f\"createEntryType (POST) json_result: {json_result}\")\n", "  else:\n", "    print(f\"createEntryType (POST) Entry Type {entryTypeId} already exists\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "3v7xlNiur1Wo" }, "source": [ "### <font color='#4285f4'>Entry (**Custom**) - Helper Methods</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "0gDcerBWr1Wo" }, "source": [ "#### existsEntry\n", "- Tests to see if an Entry exists\n", "- Returns True/False" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tOBEISUyr1Wp" }, "outputs": [], "source": [ "def existsEntry(project_id, entryGroupId, entryGroupLocation, entryId):\n", "  \"\"\"Checks to see if an Entry already exists\"\"\"\n", "\n", "  # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryGroups.entries/list\n", "  url = 
f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/{entryGroupId}/entries\"\n", "\n", "  json_result = restAPIHelper(url, \"GET\", None)\n", "  print(f\"existsEntry (GET) json_result: {json_result}\")\n", "\n", "  # Test to see if exists, if so return\n", "  if \"entries\" in json_result:\n", "    for item in json_result[\"entries\"]:\n", "      print(f\"Name: {item['name']}\")\n", "      if item[\"name\"] == f\"projects/{project_id}/locations/{entryGroupLocation}/entryGroups/{entryGroupId}/entries/{entryId}\":\n", "        print(f\"Entry {entryId} already exists in Entry Group {entryGroupId}\")\n", "        return True\n", "\n", "  return False" ] }, { "cell_type": "markdown", "metadata": { "id": "yU0KlcFhr1Wp" }, "source": [ "#### createEntry\n", "- Creates an Entry if it does not exist" ] }, { "cell_type": "markdown", "metadata": { "id": "ZEhXhXxlr1Wp" }, "source": [ "##### Sample Terraform" ] }, { "cell_type": "markdown", "metadata": { "id": "9GDQ-gMVr1Wp" }, "source": [ "```\n", "Terraform currently not available\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "vRBXTRxPr1Wp" }, "source": [ "##### REST API Code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "C5eG4PCZr1Wp" }, "outputs": [], "source": [ "def createEntry(project_id,\n", "    entryGroupId, entryGroupLocation,\n", "    entryTypeId, entryTypeLocation,\n", "    entryId, entryName, entryDescription, system, aspects):\n", "  \"\"\"Creates an Entry (custom) if it does not exist\"\"\"\n", "\n", "  if existsEntry(project_id, entryGroupId, entryGroupLocation, entryId) == False:\n", "    # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryGroups.entries/create\n", "    url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/{entryGroupId}/entries?entryId={entryId}\"\n", "\n", "    data = {\n", "      \"entrySource\": {\n", "        \"displayName\": entryName,\n", "        \"description\": 
entryDescription,\n", "        \"system\": system # This allows us to then search for \"Custom\" Entries\n", "      },\n", "      \"entryType\": f\"projects/{project_id}/locations/{entryTypeLocation}/entryTypes/{entryTypeId}\",\n", "      \"aspects\": aspects,\n", "    }\n", "\n", "    json_result = restAPIHelper(url, \"POST\", data)\n", "    print(f\"createEntry (POST) json_result: {json_result}\")\n", "  else:\n", "    print(f\"createEntry (POST) Entry {entryId} already exists in Entry Group {entryGroupId}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "0YMx2nKC8tHb" }, "source": [ "### <font color='#4285f4'>Entry - Update Dataplex Metadata on BigQuery Table / Column (**System Entry**) - Helper Methods</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "4aQLNJXR8tHd" }, "source": [ "#### updateDataplexSystemEntry_BigQueryTable\n", "- Adds an entry type and aspect to a built-in system type (e.g. BigQuery Table)" ] }, { "cell_type": "markdown", "metadata": { "id": "w6wIM6QY8tHd" }, "source": [ "##### Sample Terraform" ] }, { "cell_type": "markdown", "metadata": { "id": "DqHlrPxu8tHd" }, "source": [ "```\n", "Terraform currently not available\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "M0VrYiaq8tHd" }, "source": [ "##### REST API Code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Apv1E3tC8tHd" }, "outputs": [], "source": [ "def updateDataplexSystemEntry_BigQueryTable(project_id,\n", "    entryGroupLocation,\n", "    bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", "    entryTypeId, entryTypeLocation,\n", "    aspects):\n", "  \"\"\"Associates an Entry Type and Aspect Type with a BigQuery table (Dataplex System Entry Group)\"\"\"\n", "\n", "  # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryGroups.entries/patch\n", "  url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/\" + \\\n", "    
f\"@bigquery/entries/bigquery.googleapis.com/projects/{bigqueryProjectId}/datasets/{bigqueryDataset}/tables/{bigqueryTable}?update_mask=aspects\"\n", "\n", "  # IMPORTANT NOTE:\n", "  # If you uncomment the entryType below, you will replace the System Entry Type of \"BigQuery Table\", which is not a recommended pattern.\n", "  # You should just add the aspects as \"optional\" in order to avoid replacing the default System Entry Type.\n", "  data = {\n", "    # \"entryType\": f\"projects/{project_id}/locations/{entryTypeLocation}/entryTypes/{entryTypeId}\",\n", "    \"aspects\": aspects\n", "  }\n", "\n", "  json_result = restAPIHelper(url, \"PATCH\", data)\n", "  print(f\"updateDataplexSystemEntry_BigQueryTable (PATCH) json_result: {json_result}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "_z84yma-NRf7" }, "source": [ "#### updateDataplexMetatdata_BigQueryTable\n", "- Updates the overview and roles Dataplex metadata on a BigQuery table" ] }, { "cell_type": "markdown", "metadata": { "id": "_nTyYn1ANRf7" }, "source": [ "##### Sample Terraform" ] }, { "cell_type": "markdown", "metadata": { "id": "5KSQWTlfNRf7" }, "source": [ "```\n", "Terraform currently not available\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "pbqj58w2NRf7" }, "source": [ "##### REST API Code" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "d638IaM6NRf7" }, "outputs": [], "source": [ "def updateDataplexMetatdata_BigQueryTable(project_id,\n", "    entryGroupLocation,\n", "    bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", "    entryTypeId, entryTypeLocation,\n", "    overviewText, roleList):\n", "  \"\"\"Updates the Overview text and the Roles (replaces them)\"\"\"\n", "\n", "  # https://cloud.google.com/dataplex/docs/reference/rest/v1/projects.locations.entryGroups.entries/patch\n", "  url = f\"https://dataplex.googleapis.com/v1/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/\" + \\\n", "    
f\"@bigquery/entries/bigquery.googleapis.com/projects/{bigqueryProjectId}/datasets/{bigqueryDataset}/tables/{bigqueryTable}?update_mask=aspects\"\n", "\n", "  data = {\n", "    \"aspects\": {\n", "      \"dataplex-types.global.overview\": {\n", "        \"data\": {\n", "          \"content\": overviewText\n", "        }\n", "      },\n", "      \"dataplex-types.global.contacts\": {\n", "        \"data\": {\n", "          \"identities\": []\n", "        }\n", "      }\n", "    }\n", "  }\n", "\n", "  for item in roleList:\n", "    data[\"aspects\"][\"dataplex-types.global.contacts\"][\"data\"][\"identities\"].append( {\"role\": item[\"role\"], \"name\": item[\"name\"] } )\n", "\n", "  json_result = restAPIHelper(url, \"PATCH\", data)\n", "  print(f\"updateDataplexMetatdata_BigQueryTable (PATCH) json_result: {json_result}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "2ICU0zj8Xy7u" }, "source": [ "### <font color='#4285f4'>Example: Create Data Governance Structure</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "tZBrW9T2rcGN" }, "source": [ "##### Entry Group" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "VPUUEOqIV63N" }, "outputs": [], "source": [ "myEntryGroupId = \"my-entry-group\"\n", "myEntryGroupLocation = \"global\"\n", "myEntryGroupName = \"My Entry Group\" + f\" ({random_extension})\"\n", "myEntryGroupDescription = \"A test entry group\"\n", "\n", "existsEntryGroup(project_id, myEntryGroupId, myEntryGroupLocation)\n", "\n", "createEntryGroup(project_id, myEntryGroupId, myEntryGroupLocation, myEntryGroupName, myEntryGroupDescription)\n", "\n", "print(f\"To view Entry Groups: https://console.cloud.google.com/dataplex/catalog/entry-groups?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "gxXrrnxIrc0q" }, "source": [ "##### Aspect Type" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OVRsKROjlAIP" }, "outputs": [], "source": [ "myAspectTypeId = \"my-aspect-type\"\n", "myAspectTypeName = \"My Aspect Type\" + f\" 
({random_extension})\"\n", "myAspectTypeDescription = \"Test aspect type\"\n", "myAspectTypeLocation = \"global\"\n", "myAspectTypeTemplateName = \"my-metadataTemplate-template\"\n", "\n", "existsAspectType(project_id, myAspectTypeId, myAspectTypeLocation)\n", "\n", "metadataTemplate = {\n", " \"name\": myAspectTypeTemplateName,\n", " \"type\": \"record\",\n", " \"recordFields\": [\n", " {\n", " \"name\": \"type\",\n", " \"type\": \"enum\",\n", " \"annotations\": {\n", " \"displayName\": \"Type\",\n", " \"description\": \"Specifies the type of view represented by the entry.\"\n", " },\n", " \"index\": 1,\n", " \"constraints\": {\n", " \"required\": True\n", " },\n", " \"enumValues\": [\n", " {\n", " \"name\": \"FILE\",\n", " \"index\": 1\n", " }\n", " ]\n", " }\n", " ]\n", "}\n", "createAspectType(project_id, myAspectTypeId, myAspectTypeLocation, myAspectTypeName, myAspectTypeDescription, metadataTemplate)\n", "print(f\"To view Aspect Types: https://console.cloud.google.com/dataplex/catalog/aspect-types?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "WfZb8t4grWqx" }, "source": [ "##### Entry Type" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "63iPhCsPo7bM" }, "outputs": [], "source": [ "myEntryGroupId = \"my-entry-group\"\n", "myEntryGroupLocation = \"global\"\n", "\n", "myEntryTypeId = \"my-entry-type\"\n", "myEntryTypeName = \"My Entry Type\" + f\" ({random_extension})\"\n", "myEntryTypeDescription = \"Test entry type\"\n", "myEntryTypeLocation = \"global\"\n", "myEntryTypeTypeAliases = [\"LISTING\"]\n", "myEntryTypePlatform = \"GCS\"\n", "myEntryTypeSystem = \"Custom\"\n", "\n", "myAspectId = \"my-aspect-type\"\n", "myAspectLocation = \"global\"\n", "\n", "existsEntryType(project_id, myEntryTypeId, myEntryTypeLocation)\n", "\n", "requiredAspects = [\n", " {\n", " \"type\": f\"projects/{project_id}/locations/{myAspectLocation}/aspectTypes/{myAspectId}\"\n", " }\n", "]\n", "createEntryType(project_id, 
myEntryTypeId, myEntryTypeLocation, myEntryTypeName, myEntryTypeDescription, myEntryTypeTypeAliases, myEntryTypePlatform, myEntryTypeSystem, requiredAspects)\n", "print(f\"To view Entry Types: https://console.cloud.google.com/dataplex/catalog/entry-types?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "fRlFgl9vri70" }, "source": [ "##### Entry (Custom Entry)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "AdaaV9V8tHGC" }, "outputs": [], "source": [ "myEntryGroupId = \"my-entry-group\"\n", "myEntryGroupLocation = \"global\"\n", "\n", "myEntryTypeId = \"my-entry-type\"\n", "myEntryTypeLocation = \"global\"\n", "\n", "myAspectId = \"my-aspect-type\"\n", "myAspectLocation = \"global\"\n", "\n", "myEntryId = \"my-entry\"\n", "myEntryName = \"My Entry\" + f\" ({random_extension})\"\n", "myEntryDescription = \"Test entry (custom)\"\n", "myEntrySystem = \"Custom\" # This will show as a custom object\n", "\n", "existsEntry(project_id, myEntryGroupId, myEntryGroupLocation, myEntryId)\n", "\n", "aspects = {\n", "  f\"{project_id}.{myAspectLocation}.{myAspectId}\": {\n", "    \"data\": {\"type\": \"FILE\"}\n", "  }\n", "}\n", "createEntry(project_id,\n", "    myEntryGroupId, myEntryGroupLocation,\n", "    myEntryTypeId, myEntryTypeLocation,\n", "    myEntryId, myEntryName, myEntryDescription, myEntrySystem, aspects)\n", "print(f\"To view Entry: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{myEntryGroupLocation}/entryGroups/{myEntryGroupId}/entries/{myEntryId}?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "d_BQqD2nrm3o" }, "source": [ "##### Entry - Update Dataplex Metadata on BigQuery Table (System Entry)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7A1GBw75rpkK" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", 
"bigqueryDataset = \"${bigquery_governed_data_raw_dataset}\"\n", "bigqueryTable = \"customer\"\n", "\n", "myEntryTypeId = \"my-entry-type\"\n", "myEntryTypeLocation = \"global\"\n", "\n", "myAspectId = \"my-aspect-type\"\n", "myAspectLocation = \"global\"\n", "\n", "aspects = {\n", "  f\"{project_id}.{myAspectLocation}.{myAspectId}\": {\n", "    \"data\": {\"type\": \"FILE\"}\n", "  }\n", "}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", "    entryGroupLocation,\n", "    bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", "    myEntryTypeId, myEntryTypeLocation,\n", "    aspects)\n", "\n", "overviewText = \"This is a test description for a BigQuery table.\"\n", "roleList = [\n", "  {\n", "    \"role\" : \"Project Manager\",\n", "    \"name\" : \"Bugs Bunny\"\n", "  },\n", "  {\n", "    \"role\" : \"Owner\",\n", "    \"name\" : \"Google\"\n", "  }\n", "]\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", "    entryGroupLocation,\n", "    bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", "    myEntryTypeId, myEntryTypeLocation,\n", "    overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?project={project_id}\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "cTzSMyAvrd_Q" }, "source": [ "### <font color='#4285f4'>BigQuery Data Governance Use Case: Create Data Governance Structure</font>" ] }, { "cell_type": "markdown", "metadata": { "id": "EC1D8KO3r2ae" }, "source": [ "##### **Create Aspect Types**\n", "- We will decorate our BigQuery tables with Aspect Types. Because BigQuery tables are system entries, they already belong to an Entry Group (`@bigquery`)." 
] }, { "cell_type": "markdown", "metadata": { "id": "yqqs2qpqOzZa" }, "source": [ "###### Aspect Type: Data Domain" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xX8FGYuarkyt" }, "outputs": [], "source": [ "dataDomainAspectId = \"data-domain-aspect-type\"\n", "dataDomainAspectTypeName = \"Data Domain\" + f\" ({random_extension})\"\n", "dataDomainAspectTypeDescription = \"This classification shows how much the data has been processed and prepared for use. Think of it as the 'maturity' level of the data. Raw data is just as it was initially collected. Enriched data has been cleaned and may have additional information added to it. Curated data is the most refined and is ready for business analysis and reporting.\"\n", "dataDomainAspectLocation = \"global\"\n", "dataDomainAspectTemplateName = f\"{dataDomainAspectId}-metadataTemplate\"\n", "\n", "metadataTemplate = {\n", " \"name\": dataDomainAspectTemplateName,\n", " \"type\": \"record\",\n", " \"recordFields\": [\n", " {\n", " \"name\": \"zone\",\n", " \"type\": \"enum\",\n", " \"annotations\": {\n", " \"displayName\": \"Zone\",\n", " \"description\": \"Indicates the level of processing the data has undergone (Raw, Enriched, or Curated).\"\n", " },\n", " \"index\": 1,\n", " \"constraints\": {\n", " \"required\": True\n", " },\n", " \"enumValues\": [\n", " {\n", " \"name\": \"Raw\",\n", " \"index\": 1\n", " },\n", " {\n", " \"name\": \"Enriched\",\n", " \"index\": 2\n", " },\n", " {\n", " \"name\": \"Curated\",\n", " \"index\": 3\n", " },\n", " ]\n", " }\n", " ]\n", "}\n", "createAspectType(project_id, dataDomainAspectId, dataDomainAspectLocation, dataDomainAspectTypeName, dataDomainAspectTypeDescription, metadataTemplate)\n", "print(f\"To view Aspect Types: https://console.cloud.google.com/dataplex/catalog/aspect-types?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "3SjORUnZO14j" }, "source": [ "###### Aspect Type: Data Retention" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": { "id": "jB2XIMwSxnMw" }, "outputs": [], "source": [ "dataRetentionAspectId = \"data-retention-aspect-type\"\n", "dataRetentionAspectTypeName = \"Data Retention\" + f\" ({random_extension})\"\n", "dataRetentionAspectTypeDescription = \"This aspect type defines how long a data asset should be retained, along with the relevant policies.\"\n", "dataRetentionAspectLocation = \"global\"\n", "dataRetentionAspectTemplateName = f\"{dataRetentionAspectId}-metadataTemplate\"\n", "\n", "metadataTemplate = {\n", " \"name\": dataRetentionAspectTemplateName,\n", " \"type\": \"record\",\n", " \"recordFields\": [\n", " {\n", " \"name\": \"retention-days\",\n", " \"type\": \"int\",\n", " \"annotations\": {\n", " \"displayName\": \"Retention Days\",\n", " \"description\": \"The number of days this data asset should be retained before deletion or archiving.\"\n", " },\n", " \"index\": 1,\n", " \"constraints\": {\n", " \"required\": True\n", " }\n", " },\n", " {\n", " \"name\": \"retention-policy\",\n", " \"type\": \"string\",\n", " \"annotations\": {\n", " \"displayName\": \"Retention Policy\",\n", " \"description\": \"A URL or reference to the policy that governs the data retention for this asset.\"\n", " },\n", " \"index\": 2,\n", " \"constraints\": {\n", " \"required\": False\n", " }\n", " }\n", " ]\n", "}\n", "\n", "createAspectType(project_id, dataRetentionAspectId, dataRetentionAspectLocation, dataRetentionAspectTypeName, dataRetentionAspectTypeDescription, metadataTemplate)\n", "print(f\"To view Aspect Types: https://console.cloud.google.com/dataplex/catalog/aspect-types?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "N68TOhISO6o_" }, "source": [ "###### Aspect Type: Data Goverance" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "61tt_iofxnJ3" }, "outputs": [], "source": [ "dataGovernanceAspectId = \"data-governance-aspect-type\"\n", "dataGovernanceAspectTypeName = \"Data Governance\" 
+ f\" ({random_extension})\"\n", "dataGovernanceAspectTypeDescription = \"This aspect type defines if a table contains Personally Identifiable Information (PII) and provides table level governance information.\"\n", "dataGovernanceAspectLocation = \"global\"\n", "dataGovernanceAspectTemplateName = f\"{dataGovernanceAspectId}-metadataTemplate\"\n", "\n", "\n", "metadataTemplate = {\n", " \"name\": dataGovernanceAspectTemplateName,\n", " \"type\": \"record\",\n", " \"recordFields\": [\n", " {\n", " \"name\": \"data-steward\",\n", " \"type\": \"string\",\n", " \"annotations\": {\n", " \"displayName\": \"Data Steward\",\n", " \"description\": \"The name or ID of the data steward responsible for this table.\"\n", " },\n", " \"index\": 1,\n", " \"constraints\": {\n", " \"required\": True\n", " }\n", " },\n", " {\n", " \"name\": \"owner-group\",\n", " \"type\": \"string\",\n", " \"annotations\": {\n", " \"displayName\": \"Owner Group\",\n", " \"description\": \"The IAM group or team responsible for this table.\"\n", " },\n", " \"index\": 2,\n", " \"constraints\": {\n", " \"required\": True\n", " }\n", " },\n", " {\n", " \"name\": \"business-owner\",\n", " \"type\": \"string\",\n", " \"annotations\": {\n", " \"displayName\": \"Business Owner\",\n", " \"description\": \"Name of the owner or contact for the data asset\"\n", " },\n", " \"index\": 3,\n", " \"constraints\": {\n", " \"required\": True\n", " }\n", " },\n", " {\n", " \"name\": \"documentation-url\",\n", " \"type\": \"string\",\n", " \"annotations\": {\n", " \"displayName\": \"Documentation URL\",\n", " \"description\": \"URL to documentation about the table, including access, usage, etc.\"\n", " },\n", " \"index\": 4,\n", " \"constraints\": {\n", " \"required\": False\n", " }\n", " },\n", " {\n", " \"name\": \"data-lifecycle\",\n", " \"type\": \"enum\",\n", " \"annotations\": {\n", " \"displayName\": \"Data Lifecycle\",\n", " \"description\": \"The lifecycle stage of the asset (Dev, Test, QA, Production, 
Deprecated)\"\n", " },\n", " \"index\": 5,\n", " \"constraints\": {\n", " \"required\": True\n", " },\n", " \"enumValues\": [\n", " {\n", " \"name\": \"Dev\",\n", " \"index\": 1\n", " },\n", " {\n", " \"name\": \"Test\",\n", " \"index\": 2\n", " },\n", " {\n", " \"name\": \"QA\",\n", " \"index\": 3\n", " },\n", " {\n", " \"name\": \"Production\",\n", " \"index\": 4\n", " },\n", " {\n", " \"name\": \"Deprecated\",\n", " \"index\": 6\n", " },\n", " ]\n", " },\n", " {\n", " \"name\": \"classification-level\",\n", " \"type\": \"enum\",\n", " \"annotations\": {\n", " \"displayName\": \"Classification Level\",\n", " \"description\": \"Indicates the sensitivity and access restrictions for this data asset (Public, Internal, Confidential, Restricted).\"\n", " },\n", " \"index\": 6,\n", " \"constraints\": {\n", " \"required\": True\n", " },\n", " \"enumValues\": [\n", " {\n", " \"name\": \"Public\",\n", " \"index\": 1\n", " },\n", " {\n", " \"name\": \"Internal\",\n", " \"index\": 2\n", " },\n", " {\n", " \"name\": \"Confidential\",\n", " \"index\": 3\n", " },\n", " {\n", " \"name\": \"Restricted\",\n", " \"index\": 4\n", " },\n", " ]\n", " },\n", " {\n", " \"name\": \"data-sensitivity-level\",\n", " \"type\": \"enum\",\n", " \"annotations\": {\n", " \"displayName\": \"Data Sensitivity Level\",\n", " \"description\": \"The general sensitivity classification of the table. 
(Low, Medium, High, Critical)\"\n", " },\n", " \"index\": 7,\n", " \"constraints\": {\n", " \"required\": True\n", " },\n", " \"enumValues\": [\n", " {\n", " \"name\": \"Low\",\n", " \"index\": 1\n", " },\n", " {\n", " \"name\": \"Medium\",\n", " \"index\": 2\n", " },\n", " {\n", " \"name\": \"High\",\n", " \"index\": 3\n", " },\n", " {\n", " \"name\": \"Critical\",\n", " \"index\": 4\n", " },\n", " ]\n", " },\n", " {\n", " \"name\": \"contains-pii\",\n", " \"type\": \"bool\",\n", " \"annotations\": {\n", " \"displayName\": \"Contains PII\",\n", " \"description\": \"Indicates if this table contains any Personally Identifiable Information (PII).\"\n", " },\n", " \"index\": 8,\n", " \"constraints\": {\n", " \"required\": True\n", " }\n", " }\n", " ]\n", "}\n", "\n", "createAspectType(project_id, dataGovernanceAspectId, dataGovernanceAspectLocation, dataGovernanceAspectTypeName, dataGovernanceAspectTypeDescription, metadataTemplate)\n", "print(f\"To view Aspect Types: https://console.cloud.google.com/dataplex/catalog/aspect-types?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "ybb7jHhFO-tP" }, "source": [ "###### Aspect Type: Data Sensitivity" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "00Kp7a_M_5QY" }, "outputs": [], "source": [ "dataSensitivityAspectId = \"data-sensitivity-aspect-type\"\n", "dataSensitivityAspectTypeName = \"Column Data Sensitivity\" + f\" ({random_extension})\"\n", "dataSensitivityAspectTypeDescription = \"This aspect type defines if a column contains Personally Identifiable Information (PII) and provides column level governance information.\"\n", "dataSensitivityAspectLocation = \"global\"\n", "dataSensitivityAspectTemplateName = f\"{dataSensitivityAspectId}-metadataTemplate\"\n", "\n", "metadataTemplate = {\n", " \"name\": dataSensitivityAspectTemplateName,\n", " \"type\": \"record\",\n", " \"recordFields\": [\n", " {\n", " \"name\": \"contains-pii\",\n", " \"type\": \"bool\",\n", " 
\"annotations\": {\n", " \"displayName\": \"Contains PII\",\n", " \"description\": \"Indicates if this column contains any Personally Identifiable Information (PII).\"\n", " },\n", " \"index\": 1,\n", " \"constraints\": {\n", " \"required\": True\n", " }\n", " },\n", " {\n", " \"name\": \"pii-type\",\n", " \"type\": \"string\",\n", " \"annotations\": {\n", " \"displayName\": \"PII Type\",\n", " \"description\": \"The type of PII contained within this column (e.g., Name, Email, Phone Number, etc.).\"\n", " },\n", " \"index\": 2,\n", " \"constraints\": {\n", " \"required\": False\n", " }\n", " },\n", " {\n", " \"name\": \"data-sensitivity-level\",\n", " \"type\": \"enum\",\n", " \"annotations\": {\n", " \"displayName\": \"Data Sensitivity Level\",\n", " \"description\": \"The sensitivity level of the data for data masking or other protection needs. (Low, Medium, High, Critical)\"\n", " },\n", " \"index\": 3,\n", " \"constraints\": {\n", " \"required\": True\n", " },\n", " \"enumValues\": [\n", " {\n", " \"name\": \"Low\",\n", " \"index\": 1\n", " },\n", " {\n", " \"name\": \"Medium\",\n", " \"index\": 2\n", " },\n", " {\n", " \"name\": \"High\",\n", " \"index\": 3\n", " },\n", " {\n", " \"name\": \"Critical\",\n", " \"index\": 4\n", " },\n", " ]\n", " },\n", " {\n", " \"name\": \"compliance-requirements\",\n", " \"type\": \"array\",\n", " \"arrayItems\":\n", " {\n", " \"name\": \"compliance-requirements-metadata-template\",\n", " \"type\": \"string\"\n", " },\n", " \"annotations\": {\n", " \"displayName\": \"Compliance Requirements\",\n", " \"description\": \"List of regulations that are relevant to this column (e.g., GDPR, CCPA, HIPAA).\"\n", " },\n", " \"index\": 4,\n", " \"constraints\": {\n", " \"required\": False\n", " }\n", " }\n", " ]\n", "}\n", "\n", "createAspectType(project_id, dataSensitivityAspectId, dataSensitivityAspectLocation, dataSensitivityAspectTypeName, dataSensitivityAspectTypeDescription, metadataTemplate)\n", "print(f\"To view Aspect Types: 
https://console.cloud.google.com/dataplex/catalog/aspect-types?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "FL-NdzPwHvrY" }, "source": [ "##### **Create Entry Types**\n", "- Create the templates that hold the aspect types we just created." ] }, { "cell_type": "markdown", "metadata": { "id": "4QfhdvfCPH8a" }, "source": [ "###### Entry Type: Governed Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CO80WrODxnFJ" }, "outputs": [], "source": [ "# Governed Table Entry Type\n", "governedTableEntryTypeId = \"governed-table\"\n", "governedTableEntryTypeName = \"Governed Table\" + f\" ({random_extension})\"\n", "governedTableEntryTypeLocation = \"global\"\n", "governedTableDescription = \"A table, either physical or logical, that is actively managed under data governance principles.\"\n", "governedTableLocation = \"global\"\n", "\n", "governedTableRequiredAspects = [\n", " {\n", " \"type\": f\"projects/{project_id}/locations/{governedTableLocation}/aspectTypes/{dataDomainAspectId}\"\n", " },\n", " {\n", " \"type\": f\"projects/{project_id}/locations/{governedTableLocation}/aspectTypes/{dataRetentionAspectId}\"\n", " },\n", " {\n", " \"type\": f\"projects/{project_id}/locations/{governedTableLocation}/aspectTypes/{dataGovernanceAspectId}\"\n", " }\n", "]\n", "\n", "createEntryType(project_id, governedTableEntryTypeId, governedTableEntryTypeLocation, governedTableEntryTypeName, governedTableDescription, [\"TABLE\"], \"GCS\", \"BigQuery\", governedTableRequiredAspects)\n", "print(f\"To view Entry Types: https://console.cloud.google.com/dataplex/catalog/entry-types?project={project_id}\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "1CW0aWq6PLop" }, "source": [ "###### Entry Type: Governed Column" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XW62KnTYxnCg" }, "outputs": [], "source": [ "# Governed Column Entry Type\n", "governedColumnEntryTypeId = \"governed-column\"\n", 
"governedColumnEntryTypeName = \"Governed Column\" + f\" ({random_extension})\"\n", "governedColumnEntryTypeLocation = \"global\"\n", "governedColumnDescription = \"A column, either physical or logical, that is actively managed under data governance principles.\"\n", "governedColumnLocation = \"global\"\n", "\n", "governedColumnRequiredAspects = [\n", " {\n", " \"type\": f\"projects/{project_id}/locations/{governedColumnLocation}/aspectTypes/{dataSensitivityAspectId}\"\n", " }\n", "]\n", "\n", "# The set of allowed type aliases are: [BUCKET, CLUSTER, CODE_ASSET, CONNECTION, DASHBOARD, DASHBOARD_ELEMENT, DATABASE, DATABASE_SCHEMA,\n", "# DATASET, DATA_EXCHANGE, DATA_SOURCE_CONNECTION, DATA_STREAM, EXPLORE, FEATURE_GROUP,\n", "# FEATURE_ONLINE_STORE, FEATURE_VIEW, FILESET, FOLDER, FUNCTION, GLOSSARY, GLOSSARY_CATEGORY,\n", "# GLOSSARY_TERM, LISTING, LOOK, MODEL, POLICY, REPOSITORY, RESOURCE, ROUTINE, SERVICE, TABLE,\n", "# TOPIC, VIEW]\"\n", "\n", "createEntryType(project_id, governedColumnEntryTypeId, governedColumnEntryTypeLocation,governedColumnEntryTypeName, governedColumnDescription, [\"DATABASE_SCHEMA\"], \"GCS\", \"BigQuery\", governedColumnRequiredAspects)\n", "print(f\"To view Entry Types: https://console.cloud.google.com/dataplex/catalog/entry-types?project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Y_UhCyT7KJiO" }, "source": [ "##### **Assign Aspect Types to BigQuery Tables (and columns)**\n", "- Assign the aspect type to each table and specify the values." 
] }, { "cell_type": "markdown", "metadata": { "id": "TShOxIa5K4OI" }, "source": [ "###### Raw Zone: Customer Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "JUXYMK9nVrZ2" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_raw_dataset}\"\n", "bigqueryTable = \"customer\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Raw\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/customer-table-documentation\",\n", " \"data-lifecycle\": \"Dev\",\n", " \"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"High\",\n", " \"contains-pii\": True\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", " \"ssn\": {\"pii_type\": \"ssn\"},\n", " \"first_name\": {\"pii_type\": \"Name\"},\n", " \"last_name\": {\"pii_type\": \"Name\"},\n", " \"email\": {\"pii_type\": \"Email\"},\n", " \"phone\": {\"pii_type\": \"Phone Number\"},\n", " \"ip_address\": {\"pii_type\": \"IP Address\"},\n", " \"address\" : {\"pii_type\": \"Street Address\"}\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " 
\"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"customer_id\", \"gender\", \"city\", \"state\", \"zip\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is customer master data and contains PII.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "fqSywLgaOYcb" }, "source": [ "###### Raw Zone: Customer Transaction Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cDKDzR4Kxm_Z" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_raw_dataset}\"\n", "bigqueryTable = \"customer_transaction\"\n", "\n", "aspects 
= {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Raw\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/customer-transaction-table-documentation\",\n", " \"data-lifecycle\": \"Dev\",\n", " \"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"transaction_id\", \"customer_id\", \"order_date\", \"order_time\", \"transaction_type\", \"region\", \"quantity\", \"product\", \"product_category\", \"price\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = 
\"This is the customer transaction table and contains order details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Orw-Jye_P7CC" }, "source": [ "###### Raw Zone: Product Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ZwUUx5zxP84z" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_raw_dataset}\"\n", "bigqueryTable = \"product\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Raw\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/product-table-documentation\",\n", " \"data-lifecycle\": \"Dev\",\n", " 
\"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"description\", \"product\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the product table and contains product details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": 
"AjUdAfrPRFNi" }, "source": [ "###### Raw Zone: Product Category Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NUrSD0mARFkl" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_raw_dataset}\"\n", "bigqueryTable = \"product_category\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Raw\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"JohnSmith\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"JohnSmith\",\n", " \"documentation-url\": \"http://yourcompany.com/product-category-table-documentation\",\n", " \"data-lifecycle\": \"Dev\",\n", " \"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"product_category\", \"description\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " 
\"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the product category table and contains product category details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"John Smith\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "elpHMFymRlt1" }, "source": [ "###### Enriched Zone: Customer Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Zg4M81G9Rmrp" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_enriched_dataset}\"\n", "bigqueryTable = \"customer\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Enriched\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " 
f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/customer-table-documentation\",\n", " \"data-lifecycle\": \"QA\",\n", " \"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"High\",\n", " \"contains-pii\": True\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", " \"ssn\": {\"pii_type\": \"ssn\"},\n", " \"first_name\": {\"pii_type\": \"Name\"},\n", " \"last_name\": {\"pii_type\": \"Name\"},\n", " \"email\": {\"pii_type\": \"Email\"},\n", " \"phone\": {\"pii_type\": \"Phone Number\"},\n", " \"ip_address\": {\"pii_type\": \"IP Address\"},\n", " \"address\" : {\"pii_type\": \"Street Address\"},\n", " \"credit_card_number\" : {\"pii_type\": \"Credit Card Number\"}\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"customer_id\", \"gender\", \"city\", \"state\", \"zip\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is customer master data and contains PII.\"\n", "roleList 
= [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "KvymeWd_RpqE" }, "source": [ "###### Enriched Zone: Order Detail Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CnxNCIF6RqHr" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_enriched_dataset}\"\n", "bigqueryTable = \"order_detail\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Enriched\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"John Smith\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"John Smith\",\n", " \"documentation-url\": \"http://yourcompany.com/order-detail-table-documentation\",\n", " \"data-lifecycle\": \"QA\",\n", " \"classification-level\": \"Restricted\",\n", " 
\"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"order_id\", \"product_id\", \"quantity\", \"price\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the order detail table and contains individual order item details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"John Smith\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": 
"s6ce_7XIRsrQ" }, "source": [ "###### Enriched Zone: Order Header Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cpAWysdhRtHD" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_enriched_dataset}\"\n", "bigqueryTable = \"order_header\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Enriched\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/order-header-table-documentation\",\n", " \"data-lifecycle\": \"QA\",\n", " \"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"customer_id\", \"order_id\", \"region\", \"order_datetime\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", 
" \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the order header table and contains the overall order details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "9DZKq-z_Rw81" }, "source": [ "###### Enriched Zone: Product Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "1IKvYhgWRxaN" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_enriched_dataset}\"\n", "bigqueryTable = \"product\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Enriched\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " 
}},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"John Smith\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"John Smith\",\n", " \"documentation-url\": \"http://yourcompany.com/product-table-documentation\",\n", " \"data-lifecycle\": \"QA\",\n", " \"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"product_id\", \"product_name\", \"product_description\", \"product_category_id\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the product table and contains enriched product details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"John Smith\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, 
governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "uOGsB68YRzv_" }, "source": [ "###### Enriched Zone: Product Category Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "2xEP1ZDaR0CF" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_enriched_dataset}\"\n", "bigqueryTable = \"product_category\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Enriched\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/product-category-table-documentation\",\n", " \"data-lifecycle\": \"QA\",\n", " \"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " 
\"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"product_category_id\", \"product_category_name\", \"product_category_description\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the product category table and contains enriched product category details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "d8j7Ce68UKNk" }, "source": [ "###### Curated Zone: Customer Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "z80CqpepUN9W" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = 
project_id\n", "bigqueryDataset = \"${bigquery_governed_data_curated_dataset}\"\n", "bigqueryTable = \"customer\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Curated\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/customer-table-documentation\",\n", " \"data-lifecycle\": \"Production\",\n", " \"classification-level\": \"Restricted\",\n", " \"data-sensitivity-level\": \"High\",\n", " \"contains-pii\": True\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", " \"ssn\": {\"pii_type\": \"ssn\"},\n", " \"first_name\": {\"pii_type\": \"Name\"},\n", " \"last_name\": {\"pii_type\": \"Name\"},\n", " \"email\": {\"pii_type\": \"Email\"},\n", " \"phone\": {\"pii_type\": \"Phone Number\"},\n", " \"ip_address\": {\"pii_type\": \"IP Address\"},\n", " \"address\" : {\"pii_type\": \"Street Address\"},\n", " \"credit_card_number\" : {\"pii_type\": \"Credit Card Number\"}\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"customer_id\", \"gender\", \"city\", \"state\", \"zip\"]\n", "\n", "for column_name in non_pii_columns:\n", " 
aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is customer master data and contains PII.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "2PPV01fxUQH_" }, "source": [ "###### Curated Zone: Customer Training Data Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gkZQSXinURzf" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_curated_dataset}\"\n", "bigqueryTable = \"customer_training_data\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Curated\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " 
\"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"ml-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/customer-training-data-table-documentation\",\n", " \"data-lifecycle\": \"Production\",\n", " \"classification-level\": \"Internal\",\n", " \"data-sensitivity-level\": \"Medium\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"customer_id\", \"total_spent\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This table is for training ML models and contains customer spending information. 
This data is private.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"ML Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "z_GIiEBuUSO4" }, "source": [ "###### Curated Zone: Order Detail Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "RBBLu8dyUVfk" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_curated_dataset}\"\n", "bigqueryTable = \"order_detail\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Curated\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"John Smith\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"John Smith\",\n", " \"documentation-url\": \"http://yourcompany.com/order-detail-table-documentation\",\n", " \"data-lifecycle\": \"Production\",\n", " 
\"classification-level\": \"Internal\",\n", " \"data-sensitivity-level\": \"Medium\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"order_id\", \"product_id\", \"quantity\", \"price\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the order detail table and contains individual order item details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"John Smith\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { 
"cell_type": "markdown", "metadata": { "id": "retg8N00UWNF" }, "source": [ "###### Curated Zone: Order Header Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Do7zSJAWUZCY" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_curated_dataset}\"\n", "bigqueryTable = \"order_header\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Curated\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/order-header-table-documentation\",\n", " \"data-lifecycle\": \"Production\",\n", " \"classification-level\": \"Internal\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"customer_id\", \"order_id\", \"region\", \"order_datetime\"]\n", "\n", "for column_name in non_pii_columns:\n", " 
aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the order header table and contains the overall order details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "4tP8lp_fUZaE" }, "source": [ "###### Curated Zone: Product Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KGo08DlRXQwE" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_curated_dataset}\"\n", "bigqueryTable = \"product\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Curated\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " 
\"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"John Smith\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"John Smith\",\n", " \"documentation-url\": \"http://yourcompany.com/product-table-documentation\",\n", " \"data-lifecycle\": \"Production\",\n", " \"classification-level\": \"Public\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"product_id\", \"product_name\", \"product_description\", \"product_category_id\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the product table and contains product details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"John Smith\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " 
entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "M8jLU1jJUcH_" }, "source": [ "###### Curated Zone: Product Category Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "l0RDJqJVUcZK" }, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_curated_dataset}\"\n", "bigqueryTable = \"product_category\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Curated\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 365,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Jane Doe\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Jane Doe\",\n", " \"documentation-url\": \"http://yourcompany.com/product-category-table-documentation\",\n", " \"data-lifecycle\": \"Production\",\n", " \"classification-level\": \"Public\",\n", " \"data-sensitivity-level\": \"Low\",\n", " \"contains-pii\": False\n", " }}\n", "}\n", "\n", "\n", "# PII Columns\n", "pii_columns = {\n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " 
aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\"product_category_id\", \"product_category_name\", \"product_category_description\"]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the product category table and contains product category details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###### Curated Zone: Sales Table" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "entryGroupLocation = \"us\" # This has to be \"us\" 
since our tables are US multi-region\n", "\n", "bigqueryProjectId = project_id\n", "bigqueryDataset = \"${bigquery_governed_data_curated_dataset}\"\n", "bigqueryTable = \"sales\"\n", "\n", "aspects = {\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataDomainAspectId}\": {\n", " \"data\": {\n", " \"zone\": \"Curated\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataRetentionAspectId}\": {\n", " \"data\": {\n", " \"retention-days\": 180,\n", " \"retention-policy\": \"http://yourcompany.com/retention-policy\"\n", " }},\n", " f\"{project_id}.{governedTableEntryTypeLocation}.{dataGovernanceAspectId}\": {\n", " \"data\": {\n", " \"data-steward\": \"Sales Team\",\n", " \"owner-group\": \"data-governance-team\",\n", " \"business-owner\": \"Sales Team\",\n", " \"documentation-url\": \"http://yourcompany.com/sales-table-documentation\",\n", " \"data-lifecycle\": \"Production\",\n", " \"classification-level\": \"Public\",\n", " \"data-sensitivity-level\": \"High\",\n", " \"contains-pii\": True\n", " }}\n", "}\n", "\n", "# PII Columns\n", "pii_columns = { \n", " \"ssn\": {\"pii_type\": \"ssn\"},\n", " \"first_name\": {\"pii_type\": \"Name\"},\n", " \"last_name\": {\"pii_type\": \"Name\"},\n", " \"email\": {\"pii_type\": \"Email\"},\n", " \"phone\": {\"pii_type\": \"Phone Number\"},\n", " \"ip_address\": {\"pii_type\": \"IP Address\"},\n", " \"address\" : {\"pii_type\": \"Street Address\"},\n", " \"credit_card_number\" : {\"pii_type\": \"Credit Card Number\"} \n", "}\n", "\n", "for column_name, pii_info in pii_columns.items():\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": True,\n", " \"pii-type\": pii_info[\"pii_type\"],\n", " \"data-sensitivity-level\": \"High\",\n", " \"compliance-requirements\": [\"GDPR\", \"CCPA\"]\n", " }}\n", "\n", "non_pii_columns = [\n", " \"product_name\", \n", " \"product_description\", \n", " 
\"product_category_name\", \n", " \"product_category_description\", \n", " \"region\", \n", " \"order_datetime\", \n", " \"price\", \n", " \"quantity\", \n", " \"customer_id\", \n", " \"gender\", \n", " \"city\", \n", " \"state\", \n", " \"zip\"\n", "]\n", "\n", "for column_name in non_pii_columns:\n", " aspects[f\"{project_id}.{governedColumnEntryTypeLocation}.{dataSensitivityAspectId}@Schema.{column_name}\"] = {\n", " \"data\": {\n", " \"contains-pii\": False,\n", " \"data-sensitivity-level\": \"Low\"\n", " }}\n", "\n", "updateDataplexSystemEntry_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " aspects)\n", "\n", "overviewText = \"This is the sales table and contains sales order details.\"\n", "roleList = [\n", " {\n", " \"role\" : \"Data Steward\",\n", " \"name\" : \"Jane Doe\"\n", " },\n", " {\n", " \"role\" : \"Owner\",\n", " \"name\" : \"Data Team\"\n", " }\n", " ]\n", "\n", "updateDataplexMetatdata_BigQueryTable(project_id,\n", " entryGroupLocation,\n", " bigqueryProjectId, bigqueryDataset, bigqueryTable,\n", " governedTableEntryTypeId, governedTableEntryTypeLocation,\n", " overviewText, roleList)\n", "\n", "print(f\"To view Table: https://console.cloud.google.com/dataplex/dp-entries/projects/{project_id}/locations/{entryGroupLocation}/entryGroups/@bigquery/entries/bigquery.googleapis.com%2Fprojects%2F{bigqueryProjectId}%2Fdatasets%2F{bigqueryDataset}%2Ftables%2F{bigqueryTable}?&project={project_id}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "42IxhtRRrvR-" }, "source": [ "### <font color='#4285f4'>Clean Up</font>" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6lF2Z7skFbvf" }, "outputs": [], "source": [ "# Placeholder" ] }, { "cell_type": "markdown", "metadata": { "id": "ASQ2BPisXDA0" }, "source": [ "### <font color='#4285f4'>Reference Links</font>\n" ] }, { "cell_type": 
"markdown", "metadata": { "id": "rTY6xJdZ3ul8" }, "source": [ "- [REPLACE-ME](https://REPLACE-ME)" ] } ], "metadata": { "colab": { "collapsed_sections": [ "B3uS1mXQQiRS", "1kLXxGaxQiRT", "HMsUvoF4BP7Y", "m65vp54BUFRi", "UmyL-Rg4Dr_f", "sZ6m_wGrK0YG", "JbOjdSP1kN9T", "bI-KJELZ1jgt", "c51M89g0Ejmz", "BX5YIUdQXnFo", "jzJpXz7RXn_d", "yCVwcsfioCwh", "cNEY08D8oELb", "nDgg1kttoOQf", "toHr6QPYba6v", "D0H6CgTZntFy", "dJArMlOjn5AF", "s8ZF1ccNoiB_", "kEUh1istoiB_", "DsdRV9-3oiCA", "eyYp2SbMoiCA", "_bAPoQe4oiCA", "3v7xlNiur1Wo", "0gDcerBWr1Wo", "yU0KlcFhr1Wp", "ZEhXhXxlr1Wp", "vRBXTRxPr1Wp", "0YMx2nKC8tHb", "4aQLNJXR8tHd", "w6wIM6QY8tHd", "_z84yma-NRf7", "_nTyYn1ANRf7", "2ICU0zj8Xy7u", "tZBrW9T2rcGN", "gxXrrnxIrc0q", "WfZb8t4grWqx", "fRlFgl9vri70", "d_BQqD2nrm3o", "yqqs2qpqOzZa", "3SjORUnZO14j", "N68TOhISO6o_", "ybb7jHhFO-tP", "4QfhdvfCPH8a", "1CW0aWq6PLop", "TShOxIa5K4OI", "fqSywLgaOYcb", "Orw-Jye_P7CC", "AjUdAfrPRFNi", "elpHMFymRlt1", "KvymeWd_RpqE", "s6ce_7XIRsrQ", "9DZKq-z_Rw81", "uOGsB68YRzv_", "d8j7Ce68UKNk", "z_GIiEBuUSO4", "retg8N00UWNF", "4tP8lp_fUZaE", "M8jLU1jJUcH_", "42IxhtRRrvR-", "ASQ2BPisXDA0" ], "name": "04-Data-Governance", "private_outputs": true, "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }