markdown/reference/cli/admin_utilities/analyzedb.html.md.erb (101 lines of code) (raw):

--- title: analyzedb --- <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> A utility that performs `ANALYZE` operations on tables incrementally and concurrently. ## <a id="topic1__section2"></a>Synopsis ``` pre analyzedb -d <dbname> -s <schema> [ --full ] [ -l | --list ] [ -p <parallel-level> ] [ -v | --verbose ] [ -a ] analyzedb -d <dbname> -t <schema>.<table> [ -i col1[, col2, ...] | -x col1[, col2, ...] ] [ --full ] [ -l | --list ] [ -p <parallel-level> ] [ -v | --verbose ] [ -a ] analyzedb -d <dbname> -f <config-file> | --file <config-file> [ --full ] [ -l | --list ] [ -p <parallel-level> ] [ -v | --verbose ] [ -a ] analyzedb -d <dbname> --clean_last | --clean_all analyzedb --version analyzedb -? | -h | --help ``` ## <a id="topic1__section3"></a>Description The `analyzedb` utility updates statistics on table data for the specified tables in a HAWQ database incrementally and concurrently. While performing `ANALYZE` operations, `analyzedb` creates a snapshot of the table metadata and stores it on disk on the master host. An `ANALYZE` operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, `analyzedb` automatically skips the table or partition because it already contains up-to-date statistics. For a partitioned table `analyzedb` analyzes only those partitions that have no statistics, or that have stale statistics. `analyzedb` also refreshes the statistics on the root partition. By default, `analyzedb` creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, `analyzedb` issues an `ANALYZE` command to the database and specifies different table names. The `-p` option controls the maximum number of concurrent sessions. ## <a id="topic1__section4"></a>Notes The utility determines if a table has been modified by comparing catalog metadata of tables with the snapshot of metadata taken during a previous `analyzedb` operation. The snapshots of table metadata are stored as state files in the directory `db_analyze` in the HAWQ master data directory. You can specify the `--clean_last` or `--clean_all` option to remove state files generated by `analyzedb`. If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all system catalog tables and user-defined tables in the database. External tables are not affected by `analyzedb`. Table names that contain spaces are not supported. ## <a id="topic1__section5"></a>Arguments <dt>-d \<dbname\> </dt> <dd>Specifies the name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the environment variable `PGDATABASE`. If `PGDATABASE` is not set, the user name specified for the connection is used.</dd> <dt>-s \<schema\> </dt> <dd>Specify a schema to analyze. All tables in the schema will be analyzed. Only a single schema name can be specified on the command line. Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd> <dt>-t \<schema\>.\<table\> </dt> <dd>Collect statistics only on \<schema\>.\<table\>. The table name must be qualified with a schema name. Only a single table name can be specified on the command line. You can specify the `-f` option to specify multiple tables in a file or the `-s` option to specify all the tables in a schema. Only one of these options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`.</dd> <dt>-f, -\\\-file \<config-file\> </dt> <dd>Text file that contains a list of tables to be analyzed. A relative file path from current directory can be specified. The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using the `-i` or `-x`. No other options are allowed in the file. Other options such as `--full` must be specified on the command line. Only one of the options can be used to specify the files to be analyzed: `-f` or `--file`, `-t` , or `-s`. When performing `ANALYZE` operations on multiple tables, `analyzedb` creates concurrent sessions to analyze tables in parallel. The `-p` option controls the maximum number of concurrent sessions. In the following example, the first line performs an `ANALYZE` operation on the table `public.nation`, the second line performs an `ANALYZE` operation only on the columns `l_shipdate` and `l_receiptdate` in the table `public.lineitem`. ``` pre public.nation public.lineitem -i l_shipdate, l_receiptdate ``` </dd> ## <a id="topic1__section5"></a>Options <dt>-x \<col1\>, \<col2\>, ... </dt> <dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, exclude statistics collection for the specified columns. Statistics are collected only on the columns that are not listed. Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd> <dt>-i \<col1\>, \<col2\>, ... </dt> <dd>Optional. Must be specified with the `-t` option. For the table specified with the `-t` option, collect statistics only for the specified columns. Only `-i`, or `-x` can be specified. Both options cannot be specified.</dd> <dt>-\\\-full </dt> <dd>Perform an `ANALYZE` operation on all the specified tables. The operation is performed even if the statistics are up to date.</dd> <dt>-l, -\\\-list </dt> <dd>Lists the tables that would have been analyzed with the specified options. The `ANALYZE` operations are not performed.</dd> <dt>-p \<parallel-level\> </dt> <dd>The number of tables that are analyzed in parallel. The value for <parallel-level> can be an integer between 1 and 10, inclusive. Default value is 5.</dd> <dt>-a </dt> <dd>Quiet mode. Do not prompt for user confirmation.</dd> <dt> -v, -\\\-verbose </dt> <dd>If specified, sets the logging level to verbose. Additional log information is written to the log file and the command line during command execution.</dd> <dt>-\\\-clean\_last </dt> <dd>Remove the state files generated by last `analyzedb` operation. All other options except `-d` are ignored.</dd> <dt>-\\\-clean\_all </dt> <dd>Remove all the state files generated by `analyzedb`. All other options except` -d` are ignored.</dd> <dt>-h, -?, -\\\-help </dt> <dd>Displays the online help.</dd> <dt>-\\\-version </dt> <dd>Displays the version of this utility.</dd> ## <a id="topic1__section6"></a>Examples An example that collects statistics only on a set of table columns. In the database `mytest`, collect statistics on the columns `shipdate` and `receiptdate` in the table `public.orders`: ``` shell $ analyzedb -d mytest -t public.orders -i shipdate, receiptdate ``` An example that collects statistics on a table and exclude a set of columns. In the database `mytest`, collect statistics on the table `public.foo`, and do not collect statistics on the columns `bar` and `test2`. ``` shell $ analyzedb -d mytest -t public.foo -x bar, test2 ``` An example that specifies a file that contains a list of tables. This command collect statistics on the tables listed in the file `analyze-tables` in the database named `mytest`. ``` shell $ analyzedb -d mytest -f analyze-tables ``` If you do not specify a table, set of tables, or schema, the `analyzedb` utility collects the statistics as needed on all catalog tables and user-defined tables in the specified database. This command refreshes table statistics on the system catalog tables and user-defined tables in the database `mytest`. ``` shell $ analyzedb -d mytest ```