---
layout: post
status: PUBLISHED
published: true
title: 'HDFS HSM and HBase: Tuning (Part 3 of 7)'
id: 855e5449-b803-4888-b2da-e326c554b161
date: '2016-04-22 23:49:14 -0400'
categories: hbase
tags:
- evaluation
- performance
- part3of7
- tuning
permalink: hbase/entry/hdfs_hsm_and_hbase_tuning
---
This is part 3 of a 7-part report by HBase Contributor Jingcheng Du and HDFS Contributor Wei Zhou (Jingcheng and Wei are both Software Engineers at Intel).
To perform the study, we made a set of enhancements to the software stack:

HDFS:

- Support a new storage type, RAMDISK
- Add file-level mover support, so that a user can move the blocks of a single file without scanning all of the metadata in the NameNode

HBase:

- The WAL, flushed HFiles, HFiles generated by compactions, and archived HFiles can each be stored in a different storage type
- When an HFile is renamed across storage types, the blocks of that file are moved to the target storage asynchronously
The goal of this step is to find the best configurations for HDFS and HBase. These are the key performance factors in HBase:
WAL: the write-ahead log guarantees the durability and consistency of the data. Every record inserted into HBase must first be written to the WAL, which can slow down user operations, so it is latency-sensitive.

Memstore and Flush: records inserted into HBase are cached in the memstore, and when the memstore reaches a size threshold it is flushed to a store file. Slow flushes can lead to long GC (Garbage Collection) pauses and push memory usage up to the region and region server thresholds, which blocks user operations.

Compaction and Number of Store Files: HBase compaction merges small store files into larger ones, which reduces the number of store files and speeds up reads, but it generates heavy I/O and consumes disk bandwidth while it runs. Less compaction speeds up writes but leaves too many store files, which slows down reads. When there are too many store files, memstore flushes can also be delayed, leading to a large memstore that further slows user operations.
Based on this understanding, the following are the tuned parameters we finally used.
| Property | Value |
| --- | --- |
| dfs.datanode.handler.count | 64 |
| dfs.namenode.handler.count | 100 |

Table 8. HDFS configuration
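For reference, these values would be applied in hdfs-site.xml. Below is a minimal sketch; the property names are the standard HDFS ones and the values are simply those from Table 8.

```xml
<!-- hdfs-site.xml (sketch): handler thread counts used in this study -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>64</value>   <!-- DataNode server threads -->
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>  <!-- NameNode RPC handler threads -->
</property>
```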
| Property | Value |
| --- | --- |
| hbase.regionserver.thread.compaction.small | 3 for non-SSD test cases; 8 for all SSD-related test cases |
| hbase.hstore.flusher.count | 5 for non-SSD test cases; 15 for all SSD-related test cases |
| hbase.wal.regiongrouping.numgroups | 4 |
| hbase.wal.provider | multiwal |
| hbase.hstore.blockingStoreFiles | 15 |
| hbase.regionserver.handler.count | 200 |
| hbase.hregion.memstore.chunkpool.maxsize | 1 |
| hbase.hregion.memstore.chunkpool.initialsize | 0.5 |

Table 9. HBase configuration
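Likewise, these values would be set in hbase-site.xml. The sketch below shows the non-SSD values; for the SSD-related test cases, hbase.regionserver.thread.compaction.small would be raised to 8 and hbase.hstore.flusher.count to 15, as listed in Table 9.

```xml
<!-- hbase-site.xml (sketch): tuned values for the non-SSD test cases -->
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>3</value>            <!-- small-compaction threads; 8 for SSD cases -->
</property>
<property>
  <name>hbase.hstore.flusher.count</name>
  <value>5</value>            <!-- memstore flusher threads; 15 for SSD cases -->
</property>
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>     <!-- multiple WALs per region server -->
</property>
<property>
  <name>hbase.wal.regiongrouping.numgroups</name>
  <value>4</value>            <!-- number of WAL groups per region server -->
</property>
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>15</value>
</property>
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>200</value>
</property>
<property>
  <name>hbase.hregion.memstore.chunkpool.maxsize</name>
  <value>1</value>            <!-- fraction of the memstore limit the MSLAB chunk pool may retain -->
</property>
<property>
  <name>hbase.hregion.memstore.chunkpool.initialsize</name>
  <value>0.5</value>
</property>
```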
Go to part 4, Experiment