The Apache HBase™ Reference Guide

Question

A.9.1.

What can go where?

Answer 1

There is often confusion about which child elements are valid in a given context. When in doubt, DocBook: The Definitive Guide is the best resource. It has an appendix that is indexed by element and lists the valid child and parent elements of any given element. If you edit DocBook often, a schema-aware XML editor makes things easier.

Answer 2

It is a common pattern, and it is technically valid, to put an admonition such as a <note> inside a <para> element. Because admonitions render as block-level elements (they take the whole width of the page), it is better to mark them up as siblings to the paragraphs around them, like this:

<para>This is the paragraph.</para>
<note>
    <para>This is an admonition which occurs after the paragraph.</para>
</note>

Answer 3

Because the contents of a <listitem> (an element in an itemized, ordered, or variable list) or an <entry> (a cell in a table) can consist of things other than plain text, they need to be wrapped in some element. If they are plain text, they need to be enclosed in <para> tags. This is tedious but necessary for validity.

<itemizedlist>
    <listitem>
        <para>This is a paragraph.</para>
    </listitem>
    <listitem>
        <screen>This is screen output.</screen>
    </listitem>
</itemizedlist>
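
The same wrapping rule can be sketched for table cells. This is a minimal illustration, not markup taken from the guide: the table title and cell text are invented, and the layout assumes a plain two-column DocBook table.

```xml
<table>
  <title>Hypothetical example table</title>
  <tgroup cols="2">
    <thead>
      <row>
        <entry><para>Description</para></entry>
        <entry><para>Example</para></entry>
      </row>
    </thead>
    <tbody>
      <row>
        <!-- Plain text in a cell is wrapped in <para>. -->
        <entry><para>This is a paragraph inside a table cell.</para></entry>
        <!-- Content other than plain text uses its own block element. -->
        <entry><screen>This is screen output.</screen></entry>
      </row>
    </tbody>
  </tgroup>
</table>
```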

Answer 4

<command> and <code> are inline tags, which can occur within the flow of paragraphs or titles. <screen> and <programlisting> are block elements.

Use <command> to mention a command such as hbase shell in the flow of a sentence. Use <code> for other inline text referring to code. Incidentally, use <literal> to specify literal strings that should be typed or entered exactly as shown. Within a <screen> listing, it can be helpful to use the <userinput> and <computeroutput> elements to mark up the text further.

Use <screen> to display input and output as the user would see it on the screen, in a log file, etc. Use <programlisting> only for blocks of code that occur within a file, such as Java or XML code, or a Bash shell script.
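
The distinctions above can be sketched in one short fragment. This is an illustration only: the sentence text, the table name `test`, and the placeholder output are invented, and the fragment is assumed to sit inside an existing section.

```xml
<!-- Inline elements appear in the flow of a sentence. -->
<para>Start the shell with <command>hbase shell</command>. Client code
  typically builds a <code>Put</code> object. Type the table name
  <literal>test</literal> exactly as shown.</para>

<!-- Block element: what the user sees on the screen. The content starts
     right after the opening tag and is not indented. -->
<screen>$ <userinput>hbase shell</userinput>
<computeroutput>(shell banner and prompt appear here)</computeroutput></screen>
```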

Answer 5

For one-off instances or short in-line mentions, use the &lt; and &gt; encoded characters. For longer mentions, or blocks of code, enclose the text in <![CDATA[]]>, which is much easier to maintain and parse in the source files.
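
A minimal sketch of both approaches follows. The sentence text and the `foo`/`bar` elements are placeholders for illustration.

```xml
<!-- Short inline mention: escape the angle brackets individually. -->
<para>Wrap plain text in a &lt;para&gt; element.</para>

<!-- Longer block: inside CDATA the markup can be written unescaped. -->
<programlisting language="xml"><![CDATA[<foo>bar</foo>
<bar>foo</bar>]]></programlisting>
```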

Answer 6

Text within <screen> and <programlisting> elements is shown exactly as it appears in the source, including indentation, tabs, and line wrap.

Indent the starting and closing XML elements, but do not indent the content. Also, to avoid having an extra blank line at the beginning of the programlisting output, do not put the CDATA element on its own line. For example:
```
        <programlisting>
case $1 in
  --cleanZk|--cleanHdfs|--cleanAll)
    matches="yes" ;;
  *) ;;
esac
        </programlisting>
```
After pasting code into a programlisting, fix the indentation manually, using two spaces per desired indentation. For screen output, be sure to include line breaks so that the text is no longer than 100 characters.
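
In the source file, the advice about CDATA placement might look like the following sketch. It assumes the listing is wrapped in a CDATA section and uses the `bourne` highlighting type; the opening tag is indented, the CDATA marker shares its line, and the content starts at column zero.

```xml
        <programlisting language="bourne"><![CDATA[case $1 in
  --cleanZk|--cleanHdfs|--cleanAll)
    matches="yes" ;;
  *) ;;
esac]]></programlisting>
```

Keeping `<![CDATA[` on the same line as the opening tag, and `]]>` directly after the last line of content, avoids stray blank lines at the start and end of the rendered listing.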

Answer 7

Be careful with pretty-printing or re-formatting an entire XML file, even if the formatting has degraded over time. If you need to reformat a file, do that in a separate JIRA where you do not change any content. Be careful because some XML editors do a bulk-reformat when you open a new file, especially if you use GUI mode in the editor.

Answer 8

The HBase Reference Guide uses the XSLT Syntax Highlighting Maven module for syntax highlighting. To enable syntax highlighting for a given <programlisting> or <screen> (or possibly other elements), add the attribute language=LANGUAGE_OF_CHOICE to the element, as in the following example:

<programlisting language="xml">
    <foo>bar</foo>
    <bar>foo</bar>
</programlisting>

Several syntax types are supported. The most interesting ones for the HBase Reference Guide are java, xml, sql, and bourne (for BASH shell output or Linux command-line examples).
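
As a further sketch, the same attribute can be applied with the other syntax types. The command and the Java line are illustrative choices and are not taken from this section.

```xml
<!-- Shell command, highlighted as Bourne shell. -->
<screen language="bourne">$ bin/start-hbase.sh</screen>

<!-- One line of Java, highlighted as Java. -->
<programlisting language="java"><![CDATA[Configuration conf = HBaseConfiguration.create();]]></programlisting>
```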

	HBase-0.92.x	HBase-0.94.x	HBase-0.96.x	HBase-0.98.x (Support for Hadoop 1.x is deprecated.)	HBase-1.0.x (Hadoop 1.x is NOT supported)
Hadoop-0.20.205	S	X	X	X	X
Hadoop-0.22.x	S	X	X	X	X
Hadoop-1.0.0-1.0.2 (HBase requires hadoop 1.0.3 at a minimum; there is an issue where we cannot find KerberosUtil compiling against earlier versions of Hadoop.)	X	X	X	X	X
Hadoop-1.0.3+	S	S	S	X	X
Hadoop-1.1.x	NT	S	S	X	X
Hadoop-0.23.x	X	S	NT	X	X
Hadoop-2.0.x-alpha	X	NT	X	X	X
Hadoop-2.1.0-beta	X	NT	S	X	X
Hadoop-2.2.0	X	NT (see the note after this table)	S	S	NT
Hadoop-2.3.x	X	NT	S	S	NT
Hadoop-2.4.x	X	NT	S	S	S
Hadoop-2.5.x	X	NT	S	S	S

Note on HBase-0.94.x with Hadoop-2.2.0: To get 0.94.x to run on Hadoop 2.2.0, you need to change the Hadoop 2 and Protobuf versions in the `pom.xml`. Here is a diff with the pom.xml changes:

$ svn diff pom.xml
Index: pom.xml
===================================================================
--- pom.xml	(revision 1545157)
+++ pom.xml	(working copy)
@@ -1034,7 +1034,7 @@
     <slf4j.version>1.4.3</slf4j.version>
     <log4j.version>1.2.16</log4j.version>
     <mockito-all.version>1.8.5</mockito-all.version>
-    <protobuf.version>2.4.0a</protobuf.version>
+    <protobuf.version>2.5.0</protobuf.version>
     <stax-api.version>1.0.1</stax-api.version>
     <thrift.version>0.8.0</thrift.version>
     <zookeeper.version>3.4.5</zookeeper.version>
@@ -2241,7 +2241,7 @@
         </property>
       </activation>
       <properties>
-        <hadoop.version>2.0.0-alpha</hadoop.version>
+        <hadoop.version>2.2.0</hadoop.version>
         <slf4j.version>1.6.1</slf4j.version>
       </properties>
       <dependencies>

The next step is to regenerate the Protobuf files, assuming that Protobuf has been installed. Go to the HBase root folder on the command line and type the following commands:

$ protoc -Isrc/main/protobuf --java_out=src/main/java src/main/protobuf/hbase.proto
$ protoc -Isrc/main/protobuf --java_out=src/main/java src/main/protobuf/ErrorHandling.proto

Then build against the hadoop 2 profile by running something like the following command:

$ mvn clean install assembly:single -Dhadoop.profile=2.0 -DskipTests

Row Key	Time Stamp	ColumnFamily `contents`	ColumnFamily `anchor`	ColumnFamily `people`
"com.cnn.www"	t9		`anchor:cnnsi.com` = "CNN"
"com.cnn.www"	t8		`anchor:my.look.ca` = "CNN.com"
"com.cnn.www"	t6	`contents:html` = "<html>..."
"com.cnn.www"	t5	`contents:html` = "<html>..."
"com.cnn.www"	t3	`contents:html` = "<html>..."
"com.example.www"	t5	`contents:html` = "<html>..."		people:author = "John Doe"

Row Key	Time Stamp	Column Family `anchor`
"com.cnn.www"	t9	`anchor:cnnsi.com` = "CNN"
"com.cnn.www"	t8	`anchor:my.look.ca` = "CNN.com"

Row Key	Time Stamp	ColumnFamily "contents:"
"com.cnn.www"	t6	`contents:html` = "<html>..."
"com.cnn.www"	t5	`contents:html` = "<html>..."
"com.cnn.www"	t3	`contents:html` = "<html>..."

Permission	Operation
Read	Get
	Exists
	Scan
Write	Put
	Delete
	IncrementColumnValue
	CheckAndDelete/Put
Create	Create
	Alter
	Drop
	Bulk Load
Admin	Enable/Disable
	Snapshot/Restore/Clone
	Split
	Flush
	Compact
	Major Compact
	Roll HLog
	Grant
	Revoke
	Shutdown
Execute	Execute coprocessor endpoints

Node Name	Master	ZooKeeper	RegionServer
node-a.example.com	yes	yes	no
node-b.example.com	backup	yes	yes
node-c.example.com	no	yes	yes
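
The layout in this table could be expressed in `hbase-site.xml` roughly as follows. This is a sketch under assumptions: the table itself names no properties, so `hbase.cluster.distributed` and `hbase.zookeeper.quorum` are supplied here for illustration. The RegionServer and backup Master hosts are assumed to be listed separately, in `conf/regionservers` and `conf/backup-masters`.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Run in distributed mode rather than standalone. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- All three nodes run ZooKeeper, per the table above. -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node-a.example.com,node-b.example.com,node-c.example.com</value>
  </property>
</configuration>
```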

HBase Version	JDK 6	JDK 7	JDK 8
1.0	Not Supported	yes	Running with JDK 8 will work but is not well tested.
0.98	yes	yes	Running with JDK 8 works but is not well tested. Building with JDK 8 would require removal of the deprecated remove() method of the PoolMap class and is under consideration. See HBASE-7608 for more information about JDK 8 support.
0.96	yes	yes
0.94	yes	yes

Expression	Interpretation
fulltime	Allow access to users associated with the `fulltime` label.
!public	Allow access to users not associated with the `public` label.
( secret | topsecret ) & !probationary	Allow access to users associated with either the `secret` or `topsecret` label and not associated with the `probationary` label.

Parameter	Description	Default
hbase.hstore.compaction.min	The minimum number of StoreFiles which must be eligible for compaction before compaction can run. The goal of tuning `hbase.hstore.compaction.min` is to avoid ending up with too many tiny StoreFiles to compact. Setting this value to `2` would cause a minor compaction each time you have two StoreFiles in a Store, and this is probably not appropriate. If you set this value too high, all the other values will need to be adjusted accordingly. For most cases, the default value is appropriate. In previous versions of HBase, the parameter `hbase.hstore.compaction.min` was called `hbase.hstore.compactionThreshold`.	3
hbase.hstore.compaction.max	The maximum number of StoreFiles which will be selected for a single minor compaction, regardless of the number of eligible StoreFiles. Effectively, the value of `hbase.hstore.compaction.max` controls the length of time it takes a single compaction to complete. Setting it larger means that more StoreFiles are included in a compaction. For most cases, the default value is appropriate.	10
hbase.hstore.compaction.min.size	A StoreFile smaller than this size will always be eligible for minor compaction. StoreFiles this size or larger are evaluated by `hbase.hstore.compaction.ratio` to determine if they are eligible. Because this limit represents the "automatic include" limit for all StoreFiles smaller than this value, this value may need to be reduced in write-heavy environments where many files in the 1-2 MB range are being flushed, because every StoreFile will be targeted for compaction and the resulting StoreFiles may still be under the minimum size and require further compaction. If this parameter is lowered, the ratio check is triggered more quickly. This addressed some issues seen in earlier versions of HBase but changing this parameter is no longer necessary in most situations.	128 MB
hbase.hstore.compaction.max.size	A StoreFile larger than this size will be excluded from compaction. The effect of raising `hbase.hstore.compaction.max.size` is fewer, larger StoreFiles that do not get compacted often. If you feel that compaction is happening too often without much benefit, you can try raising this value.	Long.MAX_VALUE
hbase.hstore.compaction.ratio	For minor compaction, this ratio is used to determine whether a given StoreFile which is larger than `hbase.hstore.compaction.min.size` is eligible for compaction. Its effect is to limit compaction of large StoreFiles. The value of `hbase.hstore.compaction.ratio` is expressed as a floating-point decimal. A large ratio, such as `10`, will produce a single giant StoreFile. Conversely, a value of `.25` will produce behavior similar to the BigTable compaction algorithm, producing four StoreFiles. A moderate value of between 1.0 and 1.4 is recommended. When tuning this value, you are balancing write costs with read costs. Raising the value (to something like 1.4) will have more write costs, because you will compact larger StoreFiles. However, during reads, HBase will need to seek through fewer StoreFiles to accomplish the read. Consider this approach if you cannot take advantage of Section 14.6.4, “Bloom Filters”. Alternatively, you can lower this value to something like 1.0 to reduce the background cost of writes, and use Section 14.6.4, “Bloom Filters” to limit the number of StoreFiles touched during reads. For most cases, the default value is appropriate.	1.2F
hbase.hstore.compaction.ratio.offpeak	The compaction ratio used during off-peak compactions, if off-peak hours are also configured (see below). Expressed as a floating-point decimal. This allows for more aggressive (or less aggressive, if you set it lower than `hbase.hstore.compaction.ratio`) compaction during a set time period. Ignored if off-peak is disabled (default). This works the same as `hbase.hstore.compaction.ratio`.	5.0F
hbase.offpeak.start.hour	The start of off-peak hours, expressed as an integer between 0 and 23, inclusive. Set to `-1` to disable off-peak.	-1 (disabled)
hbase.offpeak.end.hour	The end of off-peak hours, expressed as an integer between 0 and 23, inclusive. Set to `-1` to disable off-peak.	-1 (disabled)
hbase.regionserver.thread.compaction.throttle	There are two different thread pools for compactions, one for large compactions and the other for small compactions. This helps to keep compaction of lean tables (such as `hbase:meta`) fast. If a compaction is larger than this threshold, it goes into the large compaction pool. In most cases, the default value is appropriate.	2 x hbase.hstore.compaction.max x hbase.hregion.memstore.flush.size (which defaults to 128)
hbase.hregion.majorcompaction	Time between major compactions, expressed in milliseconds. Set to 0 to disable time-based automatic major compactions. User-requested and size-based major compactions will still run. This value is multiplied by `hbase.hregion.majorcompaction.jitter` to cause compaction to start at a somewhat-random time during a given window of time.	7 days (604800000 milliseconds)
hbase.hregion.majorcompaction.jitter	A multiplier applied to `hbase.hregion.majorcompaction` to cause compaction to occur a given amount of time either side of `hbase.hregion.majorcompaction`. The smaller the number, the closer the compactions will happen to the `hbase.hregion.majorcompaction` interval. Expressed as a floating-point decimal.	.50F
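
As a sketch of how the off-peak parameters in this table fit together, the following `hbase-site.xml` fragment enables an off-peak window. The window of 00:00 to 06:00 is an assumed, illustrative choice. The two ratios are the defaults from the table, written without the Java `F` suffix.

```xml
<configuration>
  <!-- Off-peak window: any value from 0 to 23 enables it; -1 disables it. -->
  <property>
    <name>hbase.offpeak.start.hour</name>
    <value>0</value>
  </property>
  <property>
    <name>hbase.offpeak.end.hour</name>
    <value>6</value>
  </property>
  <!-- Ratio applied inside the off-peak window. -->
  <property>
    <name>hbase.hstore.compaction.ratio.offpeak</name>
    <value>5.0</value>
  </property>
  <!-- Ratio applied the rest of the time. -->
  <property>
    <name>hbase.hstore.compaction.ratio</name>
    <value>1.2</value>
  </property>
</configuration>
```

With these values, compaction selection is more aggressive during the configured window and falls back to the regular ratio outside it.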

Setting	Notes
`hbase.store.stripe.initialStripeCount`	The number of stripes to create when stripe compaction is enabled. You can use it as follows: For relatively uniform row keys, if you know the approximate target number of stripes from the above, you can avoid some splitting overhead by starting with several stripes (2, 5, 10...). If the early data is not representative of overall row key distribution, this will not be as efficient. For existing tables with a large amount of data, this setting will effectively pre-split your stripes. For keys such as hash-prefixed sequential keys, with more than one hash prefix per region, pre-splitting may make sense.
`hbase.store.stripe.sizeToSplit`	The maximum size a stripe grows before splitting. Use this in conjunction with `hbase.store.stripe.splitPartCount` to control the target stripe size (sizeToSplit = splitPartsCount * target stripe size), according to the above sizing considerations.
`hbase.store.stripe.splitPartCount`	The number of new stripes to create when splitting a stripe. The default is 2, which is appropriate for most cases. For non-uniform row keys, you can experiment with increasing the number to 3 or 4, to isolate the arriving updates into a narrower slice of the region without additional splits being required.

Parameter	Default	Description
`io.hfile.bloom.enabled`	`yes`	Set to `no` to kill bloom filters server-wide if something goes wrong
`io.hfile.bloom.error.rate`	`.01`	The average false positive rate for bloom filters. Folding is used to maintain the false positive rate. Expressed as a decimal representation of a percentage.
`io.hfile.bloom.max.fold`	`7`	The guaranteed maximum fold rate. Changing this setting should not be necessary and is not recommended.
`io.storefile.bloom.max.keys`	`128000000`	For default (single-block) Bloom filters, this specifies the maximum number of keys.
`io.storefile.delete.family.bloom.enabled`	`true`	Master switch to enable Delete Family Bloom filters and store them in the StoreFile.
`io.storefile.bloom.block.size`	`65536`	Target Bloom block size. Bloom filter blocks of approximately this size are interleaved with data blocks.
`hfile.block.bloom.cacheonwrite`	`false`	Enables cache-on-write for inline blocks of a compound Bloom filter.
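
A minimal sketch of overriding one of these parameters in `hbase-site.xml` follows. The value `0.005` (that is, 0.5%) is an arbitrary illustration and not a recommendation.

```xml
<configuration>
  <!-- Target average false positive rate; the default in the table is .01 (1%). -->
  <property>
    <name>io.hfile.bloom.error.rate</name>
    <value>0.005</value>
  </property>
</configuration>
```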

Pre-0.98.x	0.98.x and Newer
`ipc.server.listen.queue.size`	`hbase.ipc.server.listen.queue.size`
`ipc.server.max.callqueue.size`	`hbase.ipc.server.max.callqueue.size`
`ipc.server.callqueue.handler.factor`	`hbase.ipc.server.callqueue.handler.factor`
`ipc.server.callqueue.read.share`	`hbase.ipc.server.callqueue.read.share`
`ipc.server.callqueue.type`	`hbase.ipc.server.callqueue.type`
`ipc.server.queue.max.call.delay`	`hbase.ipc.server.queue.max.call.delay`
`ipc.server.max.callqueue.length`	`hbase.ipc.server.max.callqueue.length`
`ipc.server.read.threadpool.size`	`hbase.ipc.server.read.threadpool.size`
`ipc.server.tcpkeepalive`	`hbase.ipc.server.tcpkeepalive`
`ipc.server.tcpnodelay`	`hbase.ipc.server.tcpnodelay`
`ipc.client.call.purge.timeout`	`hbase.ipc.client.call.purge.timeout`
`ipc.client.connection.maxidletime`	`hbase.ipc.client.connection.maxidletime`
`ipc.client.idlethreshold`	`hbase.ipc.client.idlethreshold`
`ipc.client.kill.max`	`hbase.ipc.client.kill.max`
`ipc.server.scan.vtime.weight`	`hbase.ipc.server.scan.vtime.weight`
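
When migrating a configuration file, the rename might look like this sketch. The value `true` is an assumed, illustrative setting.

```xml
<configuration>
  <!-- Pre-0.98.x name: ipc.server.tcpnodelay -->
  <property>
    <name>hbase.ipc.server.tcpnodelay</name>
    <value>true</value>
  </property>
</configuration>
```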

Option	Description	Default
`zookeeper.znode.parent`	The name of the base ZooKeeper znode used for HBase	`/hbase`
`zookeeper.znode.replication`	The name of the base znode used for replication	`replication`
`zookeeper.znode.replication.peers`	The name of the `peer` znode	`peers`
`zookeeper.znode.replication.peers.state`	The name of `peer-state` znode	`peer-state`
`zookeeper.znode.replication.rs`	The name of the `rs` znode	`rs`
`hbase.replication`	Whether replication is enabled or disabled on a given cluster	`false`
`replication.sleep.before.failover`	How many milliseconds a worker should sleep before attempting to replicate a dead region server's WAL queues.
`replication.executor.workers`	The number of region servers a given region server should attempt to failover simultaneously.	`1`
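
As a sketch, enabling replication on a cluster with these options could look like the following `hbase-site.xml` fragment. Only `hbase.replication` differs from the defaults in the table; the worker count is repeated to show where it would go.

```xml
<configuration>
  <!-- Replication is disabled by default (false). -->
  <property>
    <name>hbase.replication</name>
    <value>true</value>
  </property>
  <!-- Default shown in the table: one failover at a time. -->
  <property>
    <name>replication.executor.workers</name>
    <value>1</value>
  </property>
</configuration>
```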

B.1. General
Q:	When should I use HBase?
A:	See Section 9.1, “Overview” in the Architecture chapter.
Q:	Are there other HBase FAQs?
A:	See the FAQ on the wiki, HBase Wiki FAQ.
Q:	Does HBase support SQL?
A:	Not really. SQL-ish support for HBase via Hive is in development; however, Hive is based on MapReduce, which is not generally suitable for low-latency requests. See Chapter 5, Data Model for examples of the HBase client.
Q:	How can I find examples of NoSQL/HBase?
A:	See the link to the BigTable paper, as well as the other papers, in Appendix I, Other Information About HBase.
Q:	What is the history of HBase?
A:	See Appendix J, HBase History.
B.2. Upgrading
Q:	How do I upgrade Maven-managed projects from HBase 0.94 to HBase 0.96+?
A:	In HBase 0.96, the project moved to a modular structure. Adjust your project's dependencies to rely upon the `hbase-client` module or another module as appropriate, rather than a single JAR. You can model your Maven dependency after one of the following, depending on your targeted version of HBase. See Section 3.5, “Upgrading from 0.94.x to 0.96.x” or Section 3.3, “Upgrading from 0.96.x to 0.98.x” for more information.

Example B.1. Maven Dependency for HBase 0.98

<dependency>
	<groupId>org.apache.hbase</groupId>
	<artifactId>hbase-client</artifactId>
	<version>0.98.5-hadoop2</version>
</dependency>

Example B.2. Maven Dependency for HBase 0.96

<dependency>
	<groupId>org.apache.hbase</groupId>
	<artifactId>hbase-client</artifactId>
	<version>0.96.2-hadoop2</version>
</dependency>

Example B.3. Maven Dependency for HBase 0.94

<dependency>
	<groupId>org.apache.hbase</groupId>
	<artifactId>hbase</artifactId>
	<version>0.94.3</version>
</dependency>
B.3. Architecture
Q:	How does HBase handle Region-RegionServer assignment and locality?
A:	See Section 9.7, “Regions”.
B.4. Configuration
Q:	How can I get started with my first cluster?
A:	See Section 1.2, “Quick Start - Standalone HBase”.
Q:	Where can I learn about the rest of the configuration options?
A:	See Chapter 2, Apache HBase Configuration.
B.5. Schema Design / Data Access
Q:	How should I design my schema in HBase?
A:	See Chapter 5, Data Model and Chapter 6, HBase and Schema Design.
Q:	How can I store (fill in the blank) in HBase?
A:	See Section 6.5, “Supported Datatypes”.
Q:	How can I handle secondary indexes in HBase?
A:	See Section 6.9, “Secondary Indexes and Alternate Query Paths”.
Q:	Can I change a table's rowkeys?
A:	This is a very common question. You can't. See Section 6.3.6, “Immutability of Rowkeys”.
Q:	What APIs does HBase support?
A:	See Chapter 5, Data Model, Section 9.3, “Client” and Section 11.1, “Non-Java Languages Talking to the JVM”.
B.6. MapReduce
Q:	How can I use MapReduce with HBase?
A:	See Chapter 7, HBase and MapReduce.
B.7. Performance and Troubleshooting
Q:	How can I improve HBase cluster performance?
A:	See Chapter 14, Apache HBase Performance Tuning.
Q:	How can I troubleshoot my HBase cluster?
A:	See Chapter 15, Troubleshooting and Debugging Apache HBase.
B.8. Amazon EC2
Q:	I am running HBase on Amazon EC2 and...
A:	EC2 issues are a special case. See Section 15.12, “Amazon EC2” in the Troubleshooting chapter and Section 14.12, “Amazon EC2” in the Performance chapter.
B.9. Operations
Q:	How do I manage my HBase cluster?
A:	See Chapter 17, Apache HBase Operational Management.
Q:	How do I back up my HBase cluster?
A:	See Section 17.7, “HBase Backup”.
B.10. HBase in Action
Q:	Where can I find interesting videos and presentations on HBase?
A:	See Appendix I, Other Information About HBase.

Interface	Operation	Minimum Scope	Minimum Permission
Master	createTable	Global	C
	modifyTable	Table	A|C
	deleteTable	Table	A|C
	truncateTable	Table	A|C
	addColumn	Table	A|C
	modifyColumn	Table	A|C
	deleteColumn	Table	A|C
	disableTable	Table	A|C
	disableAclTable	None	Not allowed
	enableTable	Table	A|C
	move	Global	A
	assign	Global	A
	unassign	Global	A
	regionOffline	Global	A
	balance	Global	A
	balanceSwitch	Global	A
	shutdown	Global	A
	stopMaster	Global	A
	snapshot	Global	A
	clone	Global	A
	restore	Global	A
	deleteSnapshot	Global	A
	createNamespace	Global	A
	deleteNamespace	Namespace	A
	modifyNamespace	Namespace	A
	flushTable	Table	A|C
	getTableDescriptors	Global|Table	A
	mergeRegions	Global	A
Region	open	Global	A
	openRegion	Global	A
	close	Global	A
	closeRegion	Global	A
	stopRegionServer	Global	A
	rollHLog	Global	A
	mergeRegions	Global	A
	append	Table|CF|CQ	W
	delete	Table|CF|CQ|Cell (if the user has write permission for all cells)	W
	exists	Table|CF|CQ	R
	get	Table|CF|CQ	R
	getClosestRowBefore	Table|CF|CQ	R
	increment	Table|CF|CQ	W
	put	Table|CF|CQ	W
	flush	Global|Table	A|C
	split	Global|Table	A
	compact	Global|Table	A|C
	bulkLoadHFile	Table	W
	prepareBulkLoad	Table	C
	cleanupBulkLoad	Table	W
	checkAndDelete	Table|CF|CQ	RW
	checkAndPut	Table|CF|CQ	RW
	incrementColumnValue	Table|CF|CQ	RW
	scannerClose	Table	R
	scannerNext	Table	R
	scannerOpen	Table|CQ|CF	R
Endpoint	invoke	Endpoint	X
AccessController	grant	Global|Table|NS	A
	revoke	Global|Table|NS	A
	getUserPermissions	Global|Table|NS	A
	checkPermissions	Global|Table|NS	A

RegionServer	test-01	test-02
rs1	r1	r2
rs2	r2
rs3	r2	r1

Release	Release Manager
0.98	Andrew Purtell
1.0	Enis Soztutar

Version 1	Version 2
File info offset (long)
Data index offset (long)	loadOnOpenOffset (long) The offset of the section that we need to load when opening the file.
Number of data index entries (int)
metaIndexOffset (long) This field is not being used by the version 1 reader, so we removed it from version 2.	uncompressedDataIndexSize (long) The total uncompressed size of the whole data block index, including root-level, intermediate-level, and leaf-level blocks.
Number of meta index entries (int)
Total uncompressed bytes (long)
numEntries (int)	numEntries (long)
Compression codec: 0 = LZO, 1 = GZ, 2 = NONE (int)
	The number of levels in the data block index (int)
	firstDataBlockOffset (long) The offset of the first data block. Used when scanning.
	lastDataBlockEnd (long) The offset of the first byte after the last key/value data block. We don't need to go beyond this offset when scanning.
Version: 1 (int)	Version: 2 (int)

Version 1 & 2, Version 3 without MAX_TAGS_LEN	Version 3 with MAX_TAGS_LEN
Key Length (4 bytes)
Value Length (4 bytes)
Key bytes (variable)
Value bytes (variable)
	Tags Length (2 bytes)
	Tags bytes (variable)

hfile.LASTKEY	The last key of the file (byte array)
hfile.AVG_KEY_LEN	The average key length in the file (int)
hfile.AVG_VALUE_LEN	The average value length in the file (int)

Preface

Heads-up if this is your first foray into the world of distributed computing...

Chapter 1. Getting Started

1.1. Introduction

1.2. Quick Start - Standalone HBase

Local Filesystem and Durability

Loopback IP - HBase 0.94.x and earlier

1.2.1. JDK Version Requirements

1.2.2. Get Started with HBase

Note

Note

1.2.3. Intermediate - Pseudo-Distributed Local Install

Hadoop Configuration

Note

1.2.4. Advanced - Fully Distributed

Note

ZooKeeper Process Name

Web UI Port Changes

1.2.5. Where to go next

Chapter 2. Apache HBase Configuration

Note

Checking XML Validity

Keep Configuration In Sync Across the Cluster

2.1. Basic Prerequisites

Note

2.1.1. Hadoop

Hadoop 2.x is recommended.

Replace the Hadoop Bundled With HBase!

2.1.1.1. Apache HBase 0.92 and 0.94

2.1.1.2. Apache HBase 0.96

2.1.1.3. Hadoop versions 0.20.x - 1.x

2.1.1.4. Apache HBase on Secure Hadoop

2.1.1.5. dfs.datanode.max.transfer.threads

2.2. HBase run modes: Standalone and Distributed

2.2.1. Standalone HBase

2.2.2. Distributed

2.2.2.1. Pseudo-distributed

Pseudo-Distributed Quickstart

2.2.3. Fully-distributed

2.3. Running and Confirming Your Installation

2.4. Configuration Files

2.4.1. hbase-site.xml and hbase-default.xml

HBase Default Configuration

2.4.2. hbase-env.sh

2.4.3. log4j.properties

2.4.4. Client configuration and dependencies connecting to an HBase cluster

2.4.4.1. Java client configuration

2.5. Example Configurations

2.5.1. Basic Distributed HBase Install

2.5.1.1. hbase-site.xml

2.5.1.2. regionservers

2.5.1.3. hbase-env.sh

2.6. The Important Configurations

2.6.1. Required Configurations

2.6.1.1. Big Cluster Configurations

2.6.1.2. If a backup Master, making primary Master fail fast

2.6.2. Recommended Configurations

2.6.2.1. ZooKeeper Configuration

2.6.2.1.1. zookeeper.session.timeout

2.6.2.1.2. Number of ZooKeeper Instances

2.6.2.2. HDFS Configurations

2.6.2.2.1. dfs.datanode.failed.volumes.tolerated

2.6.2.3. hbase.regionserver.handler.count

2.6.2.4. Configuration for large memory machines

2.6.2.5. Compression

2.6.2.6. Configuring the size and number of WAL files

2.6.2.7. Managed Splitting

Automatic Splitting Is Recommended

2.6.2.8. Managed Compactions

Do Not Disable Major Compactions

2.6.2.9. Speculative Execution

2.6.3. Other Configurations

2.6.3.1. Balancer

2.6.3.2. Disabling Blockcache

2.6.3.3. Nagle's or the small package problem

2.6.3.4. Better Mean Time to Recover (MTTR)

2.6.3.5. JMX

Chapter 3. Upgrading

Note

3.5.1.4. Upgrading `META` to use Protocol Buffers (Protobuf)

4.6.2. `irbrc`