sql-scripts/taxi_dataset/sp_demo_ingest_data.sql (10 lines of code) (raw):
/*##################################################################################
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
###################################################################################*/
/*
Use Cases:
- Quickly load files from a data lake into BigQuery
- Show creating tables from the BigQuery UI (load from GCS or direct upload, etc.)
- Show loading tables via SQL commands
- Show loading tables via the CLI
- Show extracting data back to a data lake
- Extract data from BigQuery for providing customer extracts and/or external applications
Description:
- BigQuery has Free data loading!
- Ingest data into BigQuery using the UI and Load command in formats: Parquet, ORC, Avro, CSV and JSON
- Export data for consumption by data application.
Reference:
- SQL LOAD command: https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#load_data_statement
- Command Line: https://cloud.google.com/bigquery/docs/bq-command-line-tool#loading_data
Clean up / Reset script:
DROP TABLE IF EXISTS `${project_id}.${bigquery_taxi_dataset}.load_taxi_ui`;
DROP TABLE IF EXISTS `${project_id}.${bigquery_taxi_dataset}.load_taxi_sql`;
DROP TABLE IF EXISTS `${project_id}.${bigquery_taxi_dataset}.load_taxi_cli`;
*/
-- APPROACH 1 (Using the BigQuery User Interface)
-- Open BigQuery UI
-- Click on ${bigquery_taxi_dataset} (...) and select Create Table
-- Select
-- CREATE TABLE FROM: GCS
-- GCS BUCKET PATTERN: ${processed_bucket_name}/processed/taxi-data/green/trips_table/csv/year=2021/month=12/*.csv'
-- FILE FORMAT: CSV (also show the others)
-- TABLE (name): load_taxi_ui
-- SCEHMA: Auto Dectect
-- Under Advanced
-- HEADER ROWS TO SKIP: 1
-- Click "Create Table"
-- APPROACH 2
-- Using SQL commands
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#load_data_statement
LOAD DATA OVERWRITE ${bigquery_taxi_dataset}.load_taxi_sql
FROM FILES (
format = 'CSV',
skip_leading_rows = 1,
field_delimiter=',',
null_marker='',
uris = ['gs://${processed_bucket_name}/processed/taxi-data/green/trips_table/csv/year=2021/month=12/*.csv']);
SELECT * FROM ${bigquery_taxi_dataset}.load_taxi_sql;
-- APPROACH 3
-- https://cloud.google.com/bigquery/docs/bq-command-line-tool#loading_data
-- Using CLI via command line
-- Open a Cloud Shell and paste each command below
/*
bq load \
--location=us \
--skip_leading_rows=1 \
--autodetect \
--replace \
--source_format='CSV' \
--field_delimiter=',' \
--null_marker='' \
"${project_id}:${bigquery_taxi_dataset}.load_taxi_cli" \
"gs://${processed_bucket_name}/processed/taxi-data/green/trips_table/csv/year=2021/month=12/*.csv"
bq query 'SELECT * FROM ${bigquery_taxi_dataset}.load_taxi_cli'
SELECT * FROM ${bigquery_taxi_dataset}.load_taxi_cli;
*/
-- EXPORTING DATA 3
EXPORT DATA
OPTIONS (
uri = 'gs://${raw_bucket_name}/taxi-data/green/trips_table/csv/*.csv',
format = 'CSV',
overwrite = true,
header = true,
field_delimiter = '|')
AS (
SELECT *
FROM ${bigquery_taxi_dataset}.load_taxi_sql
);