release-notes-1-0-0.html [355:721]: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Getting TM error 97 when tables split or get moved

Defect: 1274651

Symptom: A transaction commit fails with error 97 after an HBase region is split or moved by load balancing.

Cause: As part of an HBase environment’s ongoing operations (and based on the policies configured for the HBase environment), an HBase region can either get split (into two daughter regions) or moved to a different region server. (Please see the blog: http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/.) If that happens when a Trafodion transaction is active (and operates on rows within the region that is being split or load-balanced), then a subsequent transaction commit operation by the application might encounter an error 97. Please note that under such conditions the Trafodion Transaction Manager will abort the transaction and will preserve the integrity of the database.

Solution: To minimize disruptions when this happens, we suggest that you use one or more of the following approaches:

  1. Enhance your JDBC application logic to retry when an error 97 is returned for a commit operation. Because the Transaction Manager has already aborted the transaction, the retry must rerun the whole transaction, not only the commit.
  2. Update the HBase configuration to reduce how often such disruptions happen. This involves changing properties that can be set in hbase-site.xml (or via the manageability interface of your Hadoop distribution).
  3. Disable HBase Region Load Balancing. Use the HBase shell command balance_switch false to disable the movement of a region from one server to another.

    Example

    hbase shell
    hbase(main):002:0> balance_switch false
    true  -- Output will be the last setting of the balance_switch value
    0 row(s) in 0.0080 seconds
            
  4. Pre-split the table into multiple regions by using the SALT USING n PARTITIONS clause when creating the table. The number of partitions that you specify could be a function of the number of region servers present in the HBase cluster. Here is a simple example in which the table INVENTORY is pre-split into four regions when created:
    CREATE TABLE INVENTORY
      (
        ITEM_ID       INT UNSIGNED NO DEFAULT NOT NULL
      , ITEM_TYPE     INT UNSIGNED NO DEFAULT NOT NULL
      , ITEM_COUNT    INT UNSIGNED NO DEFAULT NOT NULL
      , PRIMARY KEY (ITEM_ID ASC)
      )  SALT USING 4 PARTITIONS
      ;    
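
Approach 1 above can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not part of the Trafodion or JDBC API: the class CommitRetry, the interface TransactionBody, and the methods runWithRetry and looksLikeError97 are hypothetical names. How error 97 surfaces in a SQLException is not described in these notes, so the message-text check below is an assumption to verify against your driver. Because the Transaction Manager aborts the transaction, the sketch reruns the whole unit of work rather than only the commit.

```java
import java.sql.SQLException;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class CommitRetry {

    /** One complete transaction: all of its statements followed by the commit. */
    @FunctionalInterface
    public interface TransactionBody {
        void run() throws SQLException;
    }

    // ASSUMPTION: error 97 is recognizable from the exception message text.
    // Replace this check with whatever your driver actually reports.
    private static final Pattern ERROR_97 =
            Pattern.compile("\\berror 97\\b", Pattern.CASE_INSENSITIVE);

    public static boolean looksLikeError97(SQLException e) {
        String msg = e.getMessage();
        return msg != null && ERROR_97.matcher(msg).find();
    }

    /**
     * Runs the transaction body, rerunning it from the start when it fails
     * with a retryable error. Returns the number of attempts that were made.
     */
    public static int runWithRetry(TransactionBody body,
                                   Predicate<SQLException> retryable,
                                   int maxAttempts,
                                   long backoffMillis) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                body.run();
                return attempt;
            } catch (SQLException e) {
                // Give up on non-retryable errors or when attempts are exhausted.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
                try {
                    // Linear backoff gives the split or move time to finish.
                    Thread.sleep(backoffMillis * attempt);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

A caller would pass a lambda that executes its statements on a java.sql.Connection with auto-commit turned off and then calls commit(), together with CommitRetry::looksLikeError97 as the retry test. Keeping the retry test as a parameter means only that one method changes if your driver reports the error differently.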

EXECUTE.BATCH update creates core-file

Defect: 1274962

Symptom: EXECUTE.BATCH hangs for a long time doing updates, and the update creates a core file.

Cause: To be determined.

Solution: No fix is available yet. Batch updates and ODBC row arrays do not currently work.

Random update statistics failures with HBase OutOfOrderScannerNextException

Defect: 1391271

Symptom: While running update statistics commands, you see HBase OutOfOrderScannerNextException errors.

Cause: The default hbase.rpc.timeout and hbase.client.scanner.timeout.period values might be too low given the size of the tables. Sampling in update statistics is implemented using the HBase RandomRowFilter. For very large tables with several billion rows, the sampling ratio required to get a sample of 1 million rows is very small. A RegionServer may then return no rows for an extended period of time, which can cause HBase client connection timeout errors.

Solution: Increase the hbase.rpc.timeout and hbase.client.scanner.timeout.period values. We have found that increasing those values to 600 seconds (10 minutes) can prevent many timeout-related errors. Both properties are specified in milliseconds, so 600 seconds is entered as 600000. For more information, see the HBase Configuration and Fine Tuning Recommendations.

If increasing the hbase.rpc.timeout and hbase.client.scanner.timeout.period values does not work, try increasing the sample size. For large tables, choose a sampling percentage that yields more rows than the default sample of 1 million rows. For example, suppose table T has one billion rows. The following UPDATE STATISTICS statement samples a million rows, or approximately one-tenth of one percent of the total rows:

update statistics for table T on every column sample;

To sample one percent of the rows, regardless of the table size, you must explicitly state the sampling rate as follows:

update statistics for table T on every column sample random 1 percent;

Following update statistics, stats do not take effect immediately

Defect: 1409937

Symptom: Immediately following an update statistics operation, the generated query plan does not seem to reflect the existence of statistics. For example, in a session, you create and populate a table, run update statistics on the table, prepare a query, and exit. A serial plan is generated and the estimated cardinality is 100 for both tables. In a new session, you prepare the same query, and a parallel plan is generated where the estimated cardinality reflects the statistics.

Cause: This is a day-one issue (present since the initial release).

Solution: Retry the query after two minutes. Set CQD HIST_NO_STATS_REFRESH_INTERVAL to '0'. Run an UPDATE STATISTICS statement. Perform DML operations in a different session.



Disclaimer: Apache Trafodion is an effort undergoing incubation at the Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

Apache, Apache Maven, Apache Maven Fluido Skin, the Apache feather logo, the Apache Maven project logo and the Apache Incubator project logo are trademarks of The Apache Software Foundation.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - release-notes-1-0-1.html [351:717]: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Getting TM error 97 when tables split or get moved

Defect: 1274651

Symptom: A transaction commit fails with error 97 after an HBase region is split or moved by load balancing.

Cause: As part of an HBase environment’s ongoing operations (and based on the policies configured for the HBase environment), an HBase region can either get split (into two daughter regions) or moved to a different region server. (Please see the blog: http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/.) If that happens when a Trafodion transaction is active (and operates on rows within the region that is being split or load-balanced), then a subsequent transaction commit operation by the application might encounter an error 97. Please note that under such conditions the Trafodion Transaction Manager will abort the transaction and will preserve the integrity of the database.

Solution: To minimize disruptions when this happens, we suggest that you use one or more of the following approaches:

  1. Enhance your JDBC application logic to retry when an error 97 is returned for a commit operation. Because the Transaction Manager has already aborted the transaction, the retry must rerun the whole transaction, not only the commit.
  2. Update the HBase configuration to reduce how often such disruptions happen. This involves changing properties that can be set in hbase-site.xml (or via the manageability interface of your Hadoop distribution).
  3. Disable HBase Region Load Balancing. Use the HBase shell command balance_switch false to disable the movement of a region from one server to another.

    Example

    hbase shell
    hbase(main):002:0> balance_switch false
    true  -- Output will be the last setting of the balance_switch value
    0 row(s) in 0.0080 seconds
            
  4. Pre-split the table into multiple regions by using the SALT USING n PARTITIONS clause when creating the table. The number of partitions that you specify could be a function of the number of region servers present in the HBase cluster. Here is a simple example in which the table INVENTORY is pre-split into four regions when created:
    CREATE TABLE INVENTORY
      (
        ITEM_ID       INT UNSIGNED NO DEFAULT NOT NULL
      , ITEM_TYPE     INT UNSIGNED NO DEFAULT NOT NULL
      , ITEM_COUNT    INT UNSIGNED NO DEFAULT NOT NULL
      , PRIMARY KEY (ITEM_ID ASC)
      )  SALT USING 4 PARTITIONS
      ;    

EXECUTE.BATCH update creates core-file

Defect: 1274962

Symptom: EXECUTE.BATCH hangs for a long time doing updates, and the update creates a core file.

Cause: To be determined.

Solution: No fix is available yet. Batch updates and ODBC row arrays do not currently work.

Random update statistics failures with HBase OutOfOrderScannerNextException

Defect: 1391271

Symptom: While running update statistics commands, you see HBase OutOfOrderScannerNextException errors.

Cause: The default hbase.rpc.timeout and hbase.client.scanner.timeout.period values might be too low given the size of the tables. Sampling in update statistics is implemented using the HBase RandomRowFilter. For very large tables with several billion rows, the sampling ratio required to get a sample of 1 million rows is very small. A RegionServer may then return no rows for an extended period of time, which can cause HBase client connection timeout errors.

Solution: Increase the hbase.rpc.timeout and hbase.client.scanner.timeout.period values. We have found that increasing those values to 600 seconds (10 minutes) can prevent many timeout-related errors. Both properties are specified in milliseconds, so 600 seconds is entered as 600000. For more information, see the HBase Configuration and Fine Tuning Recommendations.

If increasing the hbase.rpc.timeout and hbase.client.scanner.timeout.period values does not work, try increasing the sample size. For large tables, choose a sampling percentage that yields more rows than the default sample of 1 million rows. For example, suppose table T has one billion rows. The following UPDATE STATISTICS statement samples a million rows, or approximately one-tenth of one percent of the total rows:

update statistics for table T on every column sample;

To sample one percent of the rows, regardless of the table size, you must explicitly state the sampling rate as follows:

update statistics for table T on every column sample random 1 percent;

Following update statistics, stats do not take effect immediately

Defect: 1409937

Symptom: Immediately following an update statistics operation, the generated query plan does not seem to reflect the existence of statistics. For example, in a session, you create and populate a table, run update statistics on the table, prepare a query, and exit. A serial plan is generated and the estimated cardinality is 100 for both tables. In a new session, you prepare the same query, and a parallel plan is generated where the estimated cardinality reflects the statistics.

Cause: This is a day-one issue (present since the initial release).

Solution: Retry the query after two minutes. Set CQD HIST_NO_STATS_REFRESH_INTERVAL to '0'. Run an UPDATE STATISTICS statement. Perform DML operations in a different session.




- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -