markdown/tutorial/gettingstarted/introhawqtbls.html.md.erb (158 lines of code) (raw):
---
title: Lesson 5 - HAWQ Tables
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
HAWQ writes data to, and reads data from, HDFS natively. HAWQ tables are similar to tables in any relational database, except that table rows (data) are distributed across the different segments in the cluster.
In this exercise, you will run scripts that use the SQL `CREATE TABLE` command to create HAWQ tables. You will load the Retail demo fact data into the HAWQ tables using the SQL `COPY` command. You will then perform simple and complex queries on the data.
## <a id="tut_introhawqtblprereq"></a>Prerequisites
Ensure that you have:
- [Set Up your HAWQ Runtime Environment](introhawqenv.html#tut_runtime_setup)
- [Created the HAWQ Tutorial Database](basicdbadmin.html#tut_ex_createdb)
- [Downloaded the Retail Data and Script Files](dataandscripts.html#tut_exdownloadfilessteps)
- [Created the Retail Demo HAWQ Schema](dataandscripts.html#tut_dsschema_ex)
- Started your HAWQ cluster.
## <a id="tut_excreatehawqtblsteps"></a>Exercise: Create, Add Data to, and Query HAWQ Retail Demo Tables
Perform the following steps to create and load HAWQ tables from the sample Retail demo data set.
1. Navigate to the HAWQ script directory:
``` shell
gpadmin@master$ cd $HAWQGSBASE/tutorials/getstart/hawq
```
2. Create tables for the Retail demo fact data using the script provided:
``` shell
gpadmin@master$ psql -f ./create_hawq_tables.sql
psql:./create_hawq_tables.sql:2: NOTICE: table "order_lineitems_hawq" does not exist, skipping
DROP TABLE
CREATE TABLE
psql:./create_hawq_tables.sql:41: NOTICE: table "orders_hawq" does not exist, skipping
DROP TABLE
CREATE TABLE
```
**Note**: The `create_hawq_tables.sql` script deletes each table before attempting to create it. If this is your first time performing this exercise, you can safely ignore the `psql` "table does not exist, skipping" messages.)
3. Let's take a look at the `create_hawq_tables.sql` script; for example:
``` shell
gpadmin@master$ vi create_hawq_tables.sql
```
Notice the use of the `retail_demo.` schema name prefix to the `order_lineitems_hawq` table name:
``` sql
DROP TABLE IF EXISTS retail_demo.order_lineitems_hawq;
CREATE TABLE retail_demo.order_lineitems_hawq
(
order_id TEXT,
order_item_id TEXT,
product_id TEXT,
product_name TEXT,
customer_id TEXT,
store_id TEXT,
item_shipment_status_code TEXT,
order_datetime TEXT,
ship_datetime TEXT,
item_return_datetime TEXT,
item_refund_datetime TEXT,
product_category_id TEXT,
product_category_name TEXT,
payment_method_code TEXT,
tax_amount TEXT,
item_quantity TEXT,
item_price TEXT,
discount_amount TEXT,
coupon_code TEXT,
coupon_amount TEXT,
ship_address_line1 TEXT,
ship_address_line2 TEXT,
ship_address_line3 TEXT,
ship_address_city TEXT,
ship_address_state TEXT,
ship_address_postal_code TEXT,
ship_address_country TEXT,
ship_phone_number TEXT,
ship_customer_name TEXT,
ship_customer_email_address TEXT,
ordering_session_id TEXT,
website_url TEXT
)
WITH (appendonly=true, compresstype=zlib) DISTRIBUTED RANDOMLY;
```
The `CREATE TABLE` statement above creates a table named `order_lineitems_hawq` in the `retail_demo` schema. `order_lineitems_hawq` has several columns. `order_id` and `customer_id` provide keys into the orders fact and customers dimension tables. The data in `order_lineitems_hawq` is distributed randomly and is compressed using the `zlib` compression algorithm.
The `create_hawq_tables.sql` script also creates the `orders_hawq` fact table.
6. Take a look at the `load_hawq_tables.sh` script:
``` shell
gpadmin@master$ vi load_hawq_tables.sh
```
Again, notice the use of the `retail_demo.` schema name prefix to the table names.
Examine the `psql -c` `COPY` commands:
``` shell
zcat $DATADIR/order_lineitems.tsv.gz | psql -d hawqgsdb -c "COPY retail_demo.order_lineitems_hawq FROM STDIN DELIMITER E'\t' NULL E'';"
zcat $DATADIR/orders.tsv.gz | psql -d hawqgsdb -c "COPY retail_demo.orders_hawq FROM STDIN DELIMITER E'\t' NULL E'';"
```
The `load_hawq_tables.sh` shell script uses the `zcat` command to uncompress the `.tsv.gz` data files. The SQL `COPY` command copies `STDIN` (i.e. the output of the `zcat` command) to the HAWQ table. The `COPY` command also identifies the `DELIMITER` used in the file (tab) and the `NULL` string ('').
6. Use the `load_hawq_tables.sh` script to load the Retail demo fact data into the newly-created tables. This process may take some time to complete.
``` shell
gpadmin@master$ ./load_hawq_tables.sh
```
6. Use the provided script to verify that the Retail demo fact tables were loaded successfully:
``` shell
gpadmin@master$ ./verify_load_hawq_tables.sh
```
The output of the `verify_load_hawq_tables.sh` script should match the following:
``` shell
Table Name | Count
------------------------------+------------------------
order_lineitems_hawq | 744196
orders_hawq | 512071
------------------------------+------------------------
```
7. Run a query on the `order_lineitems_hawq` table that returns the `product_id`, `item_quantity`, `item_price`, and `coupon_amount` for all order line items associated with order id `8467975147`:
``` shell
gpadmin@master$ psql
hawqgsdb=# SELECT product_id, item_quantity, item_price, coupon_amount
FROM retail_demo.order_lineitems_hawq
WHERE order_id='8467975147' ORDER BY item_price;
product_id | item_quantity | item_price | coupon_amount
------------+---------------+------------+---------------
1611429 | 1 | 11.38 | 0.00000
1035114 | 1 | 12.95 | 0.15000
1382850 | 1 | 17.56 | 0.50000
1562908 | 1 | 18.50 | 0.00000
1248913 | 1 | 34.99 | 0.50000
741706 | 1 | 45.99 | 0.00000
(6 rows)
```
The `ORDER BY` clause identifies the sort column, `item_price`. If you do not specify an `ORDER BY` column(s), the rows are returned in the order in which they were added to the table.
7. Determine the top three postal codes by order revenue by running the following query on the `orders_hawq` table:
``` sql
hawqgsdb=# SELECT billing_address_postal_code,
sum(total_paid_amount::float8) AS total,
sum(total_tax_amount::float8) AS tax
FROM retail_demo.orders_hawq
GROUP BY billing_address_postal_code
ORDER BY total DESC LIMIT 3;
```
Notice the use of the `sum()` aggregate function to add the order totals (`total_amount_paid`) and tax totals (`total_tax_paid`) for all orders. These totals are grouped/summed for each `billing_address_postal_code`.
Compare your output to the following:
``` pre
billing_address_postal_code | total | tax
----------------------------+-----------+-----------
48001 | 111868.32 | 6712.0992
15329 | 107958.24 | 6477.4944
42714 | 103244.58 | 6194.6748
(3 rows)
```
10. Run the following query on the `orders_hawq` and `order_lineitems_hawq` tables to display the `product_id`, `item_quantity`, and `item_price` for all line items identifying a `product_id` of `1869831`:
``` sql
hawqgsdb=# SELECT retail_demo.order_lineitems_hawq.order_id, product_id, item_quantity, item_price
FROM retail_demo.order_lineitems_hawq, retail_demo.orders_hawq
WHERE retail_demo.order_lineitems_hawq.order_id=retail_demo.orders_hawq.order_id AND retail_demo.order_lineitems_hawq.product_id=1869831
ORDER BY retail_demo.order_lineitems_hawq.order_id, product_id;
order_id | product_id | item_quantity | item_price
------------+------------+---------------+------------
4831097728 | 1869831 | 1 | 11.87
6734073469 | 1869831 | 1 | 11.87
(2 rows)
```
11. Exit the `psql` subsystem:
``` sql
hawqgsdb=# \q
```
## <a id="tut_introhawqtbl_summary"></a>Summary
In this lesson, you created and loaded Retail order and order line item data into HAWQ fact tables. You also queried these tables, learning how to filter the data to your needs.
In Lesson 6, you use PXF external tables to similarly access dimension data stored in HDFS.
**Lesson 6**: [HAWQ Extension Framework (PXF)](intropxfhdfs.html)