14.4. HBase Configurations

See Section 2.6.2, “Recommended Configurations”.

14.4.1. Managing Compactions

For larger systems, consider managing compactions and splits.

14.4.2. hbase.regionserver.handler.count

See hbase.regionserver.handler.count.

14.4.3. hfile.block.cache.size

See hfile.block.cache.size. This is a memory setting for the RegionServer process.

14.4.4. Prefetch Option for Blockcache

HBASE-9857 adds an option to prefetch HFile contents when the blockcache is opened, if a column family or RegionServer property is set. The option is available in HBase 0.98.3 and later. Its purpose is to warm the blockcache as quickly as possible after the cache is opened, using in-memory table data, without counting the prefetch reads as cache misses. This benefits fast reads, but is not a good idea if the data to be preloaded does not fit into the blockcache. The option lets you trade the I/O impact of prefetching against the time it takes before all data blocks are in cache.

To enable prefetching on a given column family, you can use HBase Shell or use the API.

Example 14.1. Enable Prefetch Using HBase Shell

hbase> create 'MyTable', { NAME => 'myCF', PREFETCH_BLOCKS_ON_OPEN => 'true' }

Example 14.2. Enable Prefetch Using the API

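// ...
HTableDescriptor tableDesc = new HTableDescriptor("myTable");
HColumnDescriptor cfDesc = new HColumnDescriptor("myCF");
// Prefetch this column family's blocks into the blockcache when its HFiles are opened.
cfDesc.setPrefetchBlocksOnOpen(true);
tableDesc.addFamily(cfDesc);
// ...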

See the API documentation for CacheConfig.
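The caveat that prefetching is a poor fit when the data will not fit into the blockcache can be checked with rough arithmetic. The sketch below assumes the blockcache capacity is approximately the RegionServer maximum heap multiplied by hfile.block.cache.size, which is expressed as a fraction of the heap. The class and method names are hypothetical, and the estimate ignores cache overhead and blocks cached for other tables. The data size should be the size the blocks occupy once cached, which may exceed the on-disk size when compression is enabled.

```java
/**
 * Rough estimate only; not part of HBase. All names here are hypothetical.
 */
final class PrefetchFitSketch {

    /** Approximate blockcache capacity: maximum heap times the configured fraction. */
    static long blockCacheBytes(long maxHeapBytes, double blockCacheFraction) {
        return (long) (maxHeapBytes * blockCacheFraction);
    }

    /** True if the data to be prefetched is no larger than the estimated capacity. */
    static boolean fitsInBlockCache(long dataBytes, long maxHeapBytes, double blockCacheFraction) {
        return dataBytes <= blockCacheBytes(maxHeapBytes, blockCacheFraction);
    }

    public static void main(String[] args) {
        long heap = 8L << 30;        // hypothetical 8 GiB RegionServer heap
        long familyData = 3L << 30;  // hypothetical 3 GiB of column family data
        // An assumed fraction of 0.25 leaves about 2 GiB of cache, so this prints false.
        System.out.println(fitsInBlockCache(familyData, heap, 0.25));
    }
}
```

If the check fails, enabling PREFETCH_BLOCKS_ON_OPEN for that column family is likely to evict blocks as fast as it loads them.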

14.4.5. hbase.regionserver.global.memstore.size

See hbase.regionserver.global.memstore.size. This memory setting is often adjusted for the RegionServer process depending on needs.

14.4.6. hbase.regionserver.global.memstore.size.lower.limit

See hbase.regionserver.global.memstore.size.lower.limit. This memory setting is often adjusted for the RegionServer process depending on needs.

14.4.7. hbase.hstore.blockingStoreFiles

See hbase.hstore.blockingStoreFiles. If the RegionServer logs report blocking, increasing this value can help.

14.4.8. hbase.hregion.memstore.block.multiplier

See hbase.hregion.memstore.block.multiplier. If there is enough RAM, increasing this can help.

14.4.9. hbase.regionserver.checksum.verify

Have HBase write the checksum into the data block, which saves a separate checksum seek on every read.

See hbase.regionserver.checksum.verify, hbase.hstore.bytes.per.checksum and hbase.hstore.checksum.algorithm. For more information, see the release note on HBASE-5074, "support checksums in HBase block cache".

14.4.10. Tuning callQueue Options

HBASE-11355 introduces several callQueue tuning mechanisms which can increase performance. See the JIRA for some benchmarking information.

  • To increase the number of callqueues, set hbase.ipc.server.num.callqueue to a value greater than 1.

  • To split the callqueue into separate read and write queues, set hbase.ipc.server.callqueue.read.ratio to a value between 0 and 1. This factor weights the queues toward writes (if below .5) or reads (if above .5). Another way to say this is that the factor determines what percentage of the split queues are used for reads. The following examples illustrate some of the possibilities. Note that you always have at least one write queue, no matter what setting you use.

    • The default value of 0 does not split the queue.

    • A value of .3 uses 30% of the queues for reading and 70% for writing. Given a value of 10 for hbase.ipc.server.num.callqueue, 3 queues would be used for reads and 7 for writes.

    • A value of .5 uses the same number of read queues and write queues. Given a value of 10 for hbase.ipc.server.num.callqueue, 5 queues would be used for reads and 5 for writes.

    • A value of .6 uses 60% of the queues for reading and 40% for writing. Given a value of 10 for hbase.ipc.server.num.callqueue, 6 queues would be used for reads and 4 for writes.

    • A value of 1.0 uses one queue to process write requests, and all other queues process read requests. A value higher than 1.0 has the same effect as a value of 1.0. Given a value of 10 for hbase.ipc.server.num.callqueue, 9 queues would be used for reads and 1 for writes.

  • You can also split the read queues so that separate queues are used for short reads (from Get operations) and long reads (from Scan operations), by setting the hbase.ipc.server.callqueue.scan.ratio option. This option is a factor between 0 and 1, which determines the ratio of read queues used for Gets and Scans. More queues are used for Gets if the value is below .5 and more are used for Scans if the value is above .5. No matter what setting you use, at least one read queue is used for Get operations.

    • A value of 0 does not split the read queue.

    • A value of .3 uses 70% of the read queues for Gets and 30% for Scans. Given a value of 20 for hbase.ipc.server.num.callqueue and a value of .5 for hbase.ipc.server.callqueue.read.ratio, 10 queues would be used for reads. Out of those 10, 7 would be used for Gets and 3 for Scans.

    • A value of .5 uses half the read queues for Gets and half for Scans. Given a value of 20 for hbase.ipc.server.num.callqueue and a value of .5 for hbase.ipc.server.callqueue.read.ratio, 10 queues would be used for reads, out of those 10, 5 would be used for Gets and 5 for Scans.

    • A value of .6 uses 40% of the read queues for Gets and 60% for Scans. Given a value of 20 for hbase.ipc.server.num.callqueue and a value of .5 for hbase.ipc.server.callqueue.read.ratio, 10 queues would be used for reads. Out of those 10, 4 would be used for Gets and 6 for Scans.

    • A value of 1.0 uses all but one of the read queues for Scans. Given a value of 20 for hbase.ipc.server.num.callqueue and a value of .5 for hbase.ipc.server.callqueue.read.ratio, 10 queues would be used for reads, out of those 10, 1 would be used for Gets and 9 for Scans.

  • You can use the new option hbase.ipc.server.callqueue.handler.factor to programmatically tune the number of queues:

    • A value of 0 uses a single shared queue between all the handlers.

    • A value of 1 uses a separate queue for each handler.

    • A value between 0 and 1 tunes the number of queues against the number of handlers. For instance, a value of .5 shares one queue between every two handlers.

    Having more queues, such as in a situation where you have one queue per handler, reduces contention when adding a task to a queue or selecting it from a queue. The trade-off is that if you have some queues with long-running tasks, a handler may end up waiting to execute from that queue rather than processing another queue which has waiting tasks.

For these values to take effect on a given RegionServer, the RegionServer must be restarted. These parameters are intended for testing purposes and should be used carefully.
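The arithmetic behind the three callQueue options can be sketched in a few lines of Java. This is a hedged illustration of the behavior described in the list above, not HBase's own implementation: the class name, the method names, and the choice to round to the nearest queue are all assumptions. It does follow the stated guarantees of at least one write queue and at least one Get queue.

```java
/**
 * Sketch of the queue-split arithmetic described in the text.
 * Not HBase code: names and rounding behavior are assumptions.
 */
final class CallQueueSketch {

    /** Queues produced by hbase.ipc.server.callqueue.handler.factor for a handler count. */
    static int queueCount(int handlerCount, double handlerFactor) {
        // A factor of 0 gives one shared queue; a factor of 1 gives one queue per handler.
        double clamped = Math.max(0.0, Math.min(1.0, handlerFactor));
        return Math.max(1, (int) Math.round(handlerCount * clamped));
    }

    /** Queues reserved for reads; 0 means the queues are not split. */
    static int readQueues(int numQueues, double readRatio) {
        if (readRatio <= 0.0 || numQueues < 2) {
            return 0; // default: reads and writes share every queue
        }
        int wanted = (int) Math.round(numQueues * Math.min(1.0, readRatio));
        // Keep at least one read queue and at least one write queue.
        return Math.max(1, Math.min(numQueues - 1, wanted));
    }

    /** Read queues reserved for Scans; 0 means the read queues are not split. */
    static int scanQueues(int numReadQueues, double scanRatio) {
        if (scanRatio <= 0.0 || numReadQueues < 2) {
            return 0;
        }
        int wanted = (int) Math.round(numReadQueues * Math.min(1.0, scanRatio));
        // Keep at least one read queue for Gets.
        return Math.min(numReadQueues - 1, wanted);
    }

    public static void main(String[] args) {
        int total = 20;                      // hbase.ipc.server.num.callqueue
        int reads = readQueues(total, 0.5);  // hbase.ipc.server.callqueue.read.ratio
        int scans = scanQueues(reads, 0.3);  // hbase.ipc.server.callqueue.scan.ratio
        System.out.printf("write=%d get=%d scan=%d%n", total - reads, reads - scans, scans);
    }
}
```

With 20 queues, a read ratio of .5 and a scan ratio of .3, the main method prints write=10 get=7 scan=3, which matches the 7 Get and 3 Scan queues in the worked example for the scan ratio.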
