Developers, at a minimum, should familiarize themselves with the unit test detail; unit tests in HBase have a character not usually seen in other projects.
This information is about unit tests for HBase itself. For developing unit tests for your HBase applications, see Chapter 19, Unit Testing HBase Applications.
As of 0.96, Apache HBase is split into multiple modules. This creates
"interesting" rules for how and where tests are written. If you are writing code for
hbase-server, see Section 18.9.2, “Unit Tests” for
how to write your tests. These tests can spin up a minicluster and will need to be
categorized. For any other module, for example hbase-common,
the tests must be strict unit tests and just test the class under test - no use of
the HBaseTestingUtility or minicluster is allowed (or even possible given the
dependency tree).
The HBase shell and its tests are predominantly written in JRuby. In order to make these
tests run as a part of the standard build, there is a single JUnit test,
TestShell, that takes care of loading the JRuby-implemented tests and
running them. You can run all of these tests from the top level with:
mvn clean test -Dtest=TestShell
Alternatively, you may limit the shell tests that run using the system property
shell.test. This value may specify a particular test case by name. For
example, the tests that cover the shell commands for altering tables are contained in the test
case AdminAlterTableTest, and you can run them with:
mvn clean test -Dtest=TestShell -Dshell.test=AdminAlterTableTest
You may also use a Ruby regular expression
literal (in the /pattern/ style) to select a set of test cases.
You can run all of the HBase admin related tests, including both the normal administration and
the security administration, with the command:
mvn clean test -Dtest=TestShell -Dshell.test=/.*Admin.*Test/
In the event of a test failure, you can see details by examining the XML version of the surefire report results:
vim hbase-shell/target/surefire-reports/TEST-org.apache.hadoop.hbase.client.TestShell.xml
If the module you are developing in has no other dependencies on other HBase modules, then you can cd into that module and just run:
mvn test
which will just run the tests IN THAT MODULE. If the module depends on other HBase modules, then you will have to run the command from the ROOT HBASE DIRECTORY. This will run the tests in the other modules as well, unless you specify to skip the tests in that module. For instance, to skip the tests in the hbase-server module, you would run:
mvn clean test -PskipServerTests
from the top level directory to run all the tests in modules other than hbase-server. Note that you
can specify to skip tests in multiple modules as well as just for a single module. For example, to skip
the tests in hbase-server and hbase-common, you would run:
mvn clean test -PskipServerTests -PskipCommonTests
Also, keep in mind that if you are running tests in the hbase-server module you will need to
apply the maven profiles discussed in Section 18.9.3, “Running tests” to get the tests to run properly.
Apache HBase unit tests are subdivided into four categories: small, medium, large, and
integration with corresponding JUnit categories:
SmallTests, MediumTests, LargeTests, IntegrationTests.
JUnit categories are denoted using Java annotations and look like this in your unit test code:
...
@Category(SmallTests.class)
public class TestHRegionInfo {
  @Test
  public void testCreateHRegionInfoName() throws Exception {
    // ...
  }
}
The above example shows how to mark a unit test as belonging to the
small category. All unit tests in HBase have a
categorization.
The first three categories, small, medium, and large, are for tests run when you
type $ mvn test. In other words, these three categorizations are for HBase unit
tests. The integration category is not for unit tests, but for
integration tests. These are run when you invoke $ mvn verify.
Integration tests are described in Section 18.9.5, “Integration Tests”.
HBase uses a patched maven surefire plugin and maven profiles to implement its unit test categorizations.
Keep reading to figure out which annotation of the set small, medium, and large to put on your new HBase unit test.
Categorizing Tests
Small tests are executed in a shared JVM. We put in this category all the tests that can be executed quickly in a shared JVM. The maximum execution time for a small test is 15 seconds, and small tests should not use a (mini)cluster.
Medium tests represent tests that must be executed before proposing a patch. They are designed to run in less than 30 minutes altogether, and are quite stable in their results. They are designed to last less than 50 seconds individually. They can use a cluster, and each of them is executed in a separate JVM.
Large tests are everything else. They are typically large-scale tests, regression tests for specific bugs, timeout tests, performance tests. They are executed before a commit on the pre-integration machines. They can be run on the developer machine as well.
Integration tests are system level tests. See Section 18.9.5, “Integration Tests” for more info.
Running mvn test will
execute all small tests in a single JVM (no fork) and then medium tests in a
separate JVM for each test instance. Medium tests are NOT executed if there is
an error in a small test. Large tests are NOT executed. There is one report for
small tests, and one report for medium tests if they are executed.
Running mvn test -P runAllTests will
execute small tests in a single JVM then medium and large tests in a separate
JVM for each test. Medium and large tests are NOT executed if there is an error
in a small test. Large tests are NOT executed if there is an error in a small or
medium test. There is one report for small tests, and one report for medium and
large tests if they are executed.
To run an individual test, e.g. MyTest, run:
mvn test -Dtest=MyTest
You can also pass multiple individual tests as a comma-delimited list:
mvn test -Dtest=MyTest1,MyTest2,MyTest3
You can also pass a package, which will run all tests under the package:
mvn test '-Dtest=org.apache.hadoop.hbase.client.*'
When -Dtest is specified, the localTests profile
will be used. It will use the official release of maven surefire, rather than
our custom surefire plugin, and the old connector (the HBase build uses a
patched version of the maven surefire plugin). Each JUnit test is executed in a
separate JVM (a fork per test class). There is no parallelization when tests are
running in this mode. You will see a new message at the end of the report:
"[INFO] Tests are skipped". It's harmless. However, you
need to make sure the sum of Tests run: in the Results:
section of the test reports matches the number of tests you specified,
because no error will be reported when a non-existent test case is specified.
Running mvn test -P runSmallTests will execute "small" tests only, using a single JVM.
Running mvn test -P runMediumTests will execute "medium" tests only, launching a new JVM for each test-class.
Running mvn test -P runLargeTests will execute "large" tests only, launching a new JVM for each test-class.
For convenience, you can run mvn test -P runDevTests to execute both small and medium tests, using a single JVM.
By default, $ mvn test -P runAllTests runs 5 tests in parallel.
It can be increased on a developer's machine. Allowing that you can have 2 tests
in parallel per core, and you need about 2GB of memory per test (at the
extreme), if you have an 8 core, 24GB box, you can have 16 tests in parallel,
but the memory available limits it to 12 (24/2). To run all tests with 12 tests
in parallel, do this:
mvn test -P runAllTests -Dsurefire.secondPartForkCount=12
If using a version earlier than 2.0, do:
mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12
To increase the speed, you can also use a ramdisk. You will need 2GB
of memory to run all tests. You will also need to delete the files between two
test runs. The typical way to configure a ramdisk on Linux is:
$ sudo mkdir /ram2G
sudo mount -t tmpfs -o size=2048M tmpfs /ram2G
You can then use it to run all HBase tests on 2.0 with the command:
mvn test -P runAllTests -Dsurefire.secondPartForkCount=12 -Dtest.build.data.basedirectory=/ram2G
On earlier versions, use:
mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirectory=/ram2G
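The sizing rule described above (two tests per core, roughly 2GB of memory per test) can be sketched in a few lines of Java. This is only an illustration of the arithmetic; the class and method names are made up for this example and are not part of the HBase build.

```java
// Hypothetical helper illustrating the sizing rule from the text:
// at most 2 parallel tests per core, and about 2GB of memory per test.
class ForkCountEstimate {

    static int maxParallelTests(int cores, int memoryGb) {
        int byCpu = 2 * cores;        // 2 tests in parallel per core
        int byMemory = memoryGb / 2;  // about 2GB of memory per test
        // The tighter of the two limits wins; never go below one test.
        return Math.max(1, Math.min(byCpu, byMemory));
    }

    public static void main(String[] args) {
        // The example box from the text: 8 cores, 24GB of memory.
        System.out.println(maxParallelTests(8, 24)); // prints 12
    }
}
```

The result is the value you would pass to -Dsurefire.secondPartForkCount (or -Dsurefire.secondPartThreadCount on versions earlier than 2.0).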
It's also possible to use the script hbasetests.sh. This
script runs the medium and large tests in parallel with two maven instances, and
provides a single report. This script does not use the hbase version of surefire
so no parallelization is being done other than the two maven instances the
script sets up. It must be executed from the directory which contains the
pom.xml.
For example, running ./dev-support/hbasetests.sh will execute small and medium tests. Running ./dev-support/hbasetests.sh runAllTests will execute all tests. Running ./dev-support/hbasetests.sh replayFailed will rerun the failed tests a second time, in a separate JVM and without parallelization.
A custom Maven SureFire plugin listener checks a number of resources before
and after each HBase unit test runs and logs its findings at the end of the test
output files, which can be found in target/surefire-reports
per Maven module. (Tests write test reports named for the test class into this
directory; check the *-out.txt files.) The resources
counted are the number of threads, the number of file descriptors, etc. If the
number has increased, it adds a LEAK? comment in the logs.
As you can have an HBase instance running in the background, some threads can be
deleted/created without any specific action in the test. However, if the test
does not work as expected, or if the test should not impact these resources,
it's worth checking these log lines:
...hbase.ResourceChecker(157): before...
and ...hbase.ResourceChecker(157): after....
For example:
2012-09-26 09:22:15,315 INFO [pool-1-thread-1] hbase.ResourceChecker(157): after: regionserver.TestColumnSeeking#testReseeking Thread=65 (was 65), OpenFileDescriptor=107 (was 107), MaxFileDescriptor=10240 (was 10240), ConnectionCount=1 (was 1)
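When scanning many such "after" lines by hand becomes tedious, a small parser can flag the counters that grew. The sketch below assumes only the "Name=value (was value)" layout visible in the sample line above; the class and method names are hypothetical and this is not the ResourceChecker implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds counters that increased in a ResourceChecker "after" line.
class ResourceLineCheck {

    // Matches fragments such as "Thread=65 (was 65)".
    private static final Pattern COUNTER =
        Pattern.compile("(\\w+)=(\\d+) \\(was (\\d+)\\)");

    static List<String> increasedCounters(String afterLine) {
        List<String> increased = new ArrayList<>();
        Matcher m = COUNTER.matcher(afterLine);
        while (m.find()) {
            long now = Long.parseLong(m.group(2));
            long was = Long.parseLong(m.group(3));
            if (now > was) {
                increased.add(m.group(1)); // counter name, e.g. "Thread"
            }
        }
        return increased;
    }

    public static void main(String[] args) {
        String unchanged = "hbase.ResourceChecker(157): after: regionserver.TestColumnSeeking#testReseeking "
            + "Thread=65 (was 65), OpenFileDescriptor=107 (was 107)";
        String grew = "hbase.ResourceChecker(157): after: regionserver.TestColumnSeeking#testReseeking "
            + "Thread=70 (was 65), OpenFileDescriptor=107 (was 107)";
        System.out.println(increasedCounters(unchanged)); // prints []
        System.out.println(increasedCounters(grew));      // prints [Thread]
    }
}
```

An increase is only a hint, as noted above: background activity can create or remove threads without the test being at fault.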
As much as possible, tests should be written as category small tests.
All tests must be written to support parallel execution on the same machine, hence they should not use shared resources such as fixed ports or fixed file names.
Tests should not overlog. More than 100 lines/second makes the logs complex to read and uses I/O that is then not available for the other tests.
Tests can be written with HBaseTestingUtility.
This class offers helper functions to create a temp directory and do the
cleanup, or to start a cluster.
All tests must be categorized; if not, they could be skipped.
All tests should be written to be as fast as possible.
Small category tests should last less than 15 seconds, and must not have any side effect.
Medium category tests should last less than 50 seconds.
Large category tests should last less than 3 minutes. This should ensure a good parallelization for people using it, and ease the analysis when the test fails.
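As an illustration of the rule above about avoiding fixed ports and fixed file names, the following sketch uses only the JDK: binding to port 0 lets the operating system pick a free port, and Files.createTempFile generates a unique file name. The class and method names are hypothetical; in HBase tests, HBaseTestingUtility offers its own helpers for temporary directories.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: acquire ports and files in a way that is safe
// when several tests run in parallel on the same machine.
class ParallelSafeResources {

    // Returns true if two sockets and two scratch files obtained this way never collide.
    static boolean portsAndFilesAreDistinct() {
        // Port 0 asks the OS for any currently free port instead of a fixed one.
        try (ServerSocket first = new ServerSocket(0);
             ServerSocket second = new ServerSocket(0)) {
            // createTempFile picks a unique name, so parallel tests cannot clash.
            Path fileA = Files.createTempFile("hbase-test-", ".tmp");
            Path fileB = Files.createTempFile("hbase-test-", ".tmp");
            try {
                return first.getLocalPort() != second.getLocalPort()
                    && !fileA.equals(fileB);
            } finally {
                // Clean up so the test leaves no side effects behind.
                Files.deleteIfExists(fileA);
                Files.deleteIfExists(fileB);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(portsAndFilesAreDistinct()); // prints true
    }
}
```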
Whenever possible, tests should not use Thread.sleep,
but rather wait for the real event they need. This is faster and clearer for
the reader. Tests should not do a Thread.sleep without
testing an ending condition. This allows understanding what the test is waiting
for. Moreover, the test will work whatever the machine performance is. Sleep
should be minimal to be as fast as possible. Waiting for a variable should be
done in a 40ms sleep loop. Waiting for a socket operation should be done in a
200 ms sleep loop.
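A sleep loop with an explicit ending condition might look like the sketch below. It is a plain-JDK illustration; the waitFor helper and its signature are hypothetical and written for this example only.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical helper: polls a condition instead of sleeping for a fixed time.
class WaitForCondition {

    // Returns true as soon as the condition holds, false if the timeout expires first.
    static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out; the caller can fail the test with a clear message
            }
            try {
                Thread.sleep(intervalMs); // short sleep between checks
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicBoolean done = new AtomicBoolean(false);
        // Simulates the "real event" the test is waiting for.
        new Thread(() -> done.set(true)).start();
        // Waiting on a variable: 40ms loop, as suggested in the text.
        System.out.println(waitFor(5000, 40, done::get)); // prints true
    }
}
```

The test finishes as soon as the event occurs on a fast machine, and still passes on a slow one as long as the event arrives before the timeout.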
Tests using a HRegion do not have to start a cluster: a region can use the
local file system. Starting and stopping a cluster costs around 10 seconds. Clusters should
not be started per test method but per test class. A started cluster must be
shut down using HBaseTestingUtility#shutdownMiniCluster,
which cleans the directories. As much as possible, tests should use the default
settings for the cluster. When they don't, they should document it. This will
allow sharing the cluster later.
HBase integration/system tests are tests that are beyond HBase unit tests. They are generally long-lasting, sizeable (the test can be asked to use 1M rows or 1B rows), targetable (they can take configuration that will point them at the ready-made cluster they are to run against; integration tests do not include cluster start/stop code), and, in verifying success, rely on public APIs only; they do not attempt to examine server internals to assert success or failure. Integration tests are what you would run when you need more elaborate proofing of a release candidate beyond what unit tests can do. They are not generally run on the Apache Continuous Integration build server; however, some sites opt to run integration tests as a part of their continuous testing on an actual cluster.
Integration tests currently live under the src/test
directory in the hbase-it submodule and will match the pattern
**/IntegrationTest*.java. All integration tests are also
annotated with @Category(IntegrationTests.class).
Integration tests can be run in two modes: using a mini cluster, or against an
actual distributed cluster. Maven failsafe is used to run the tests using the mini
cluster. The IntegrationTestsDriver class is used for executing the tests against a
distributed cluster. Integration tests SHOULD NOT assume that they are running
against a mini cluster, and SHOULD NOT use private APIs to access cluster state. To
interact with the distributed or mini cluster uniformly, the
IntegrationTestingUtility and HBaseCluster classes
and public client APIs can be used.
On a distributed cluster, integration tests that use ChaosMonkey or otherwise
manipulate services through the cluster manager (e.g. restart regionservers) use SSH to do
it. To run these, the test process should be able to run commands on the remote end, so ssh
should be configured accordingly (for example, if HBase runs under the hbase user in
your cluster, you can set up passwordless ssh for that user and run the test also
under it). To facilitate that, the hbase.it.clustermanager.ssh.user,
hbase.it.clustermanager.ssh.opts and
hbase.it.clustermanager.ssh.cmd
configuration settings can be used.
"User" is the remote user that cluster manager should use to perform ssh commands.
"Opts" contains additional options that are passed to SSH (for example, "-i
/tmp/my-key"). Finally, if you have some custom environment setup, "cmd" is the
override format for the entire tunnel (ssh) command. The default string is
/usr/bin/ssh %1$s %2$s%3$s%4$s "%5$s"
and is a good starting
point. This is a standard Java format string with 5 arguments that is used to
execute the remote command. Argument 1 (%1$s) is the SSH options set via the opts
setting or via an environment variable, 2 is the SSH user name, 3 is "@" if the username is set
or "" otherwise, 4 is the target host name, and 5 is the logical command to execute
(that may include single quotes, so don't use them). For example, if you run the
tests under a non-hbase user and want to ssh as that user and change to hbase on the
remote machine, you can use:
/usr/bin/ssh %1$s %2$s%3$s%4$s "su hbase - -c \"%5$s\""
That way, to kill an RS (for example), integration tests may run:
/usr/bin/ssh some-hostname "su hbase - -c \"ps aux | ... | kill ...\""
The command is logged in the test logs, so you can verify it is
correct for your environment.
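To see how the five arguments expand, the format string can be fed to String.format directly. The sketch below is a standalone illustration, not the cluster manager's code; the host name rs1.example.com and the command "ps aux" are placeholder values chosen for this example.

```java
// Hypothetical illustration of how the "cmd" format string expands.
class SshCommandFormat {

    // The default format string quoted in the text.
    static final String DEFAULT_FORMAT = "/usr/bin/ssh %1$s %2$s%3$s%4$s \"%5$s\"";

    // The "su hbase" override quoted in the text.
    static final String SU_FORMAT =
        "/usr/bin/ssh %1$s %2$s%3$s%4$s \"su hbase - -c \\\"%5$s\\\"\"";

    static String build(String format, String opts, String user, String host, String cmd) {
        // Argument 3 is "@" only when a user name is set.
        String at = user.isEmpty() ? "" : "@";
        return String.format(format, opts, user, at, host, cmd);
    }

    public static void main(String[] args) {
        System.out.println(build(DEFAULT_FORMAT, "-i /tmp/my-key", "hbase", "rs1.example.com", "ps aux"));
        // prints: /usr/bin/ssh -i /tmp/my-key hbase@rs1.example.com "ps aux"
        System.out.println(build(SU_FORMAT, "-i /tmp/my-key", "", "rs1.example.com", "ps aux"));
        // prints: /usr/bin/ssh -i /tmp/my-key rs1.example.com "su hbase - -c \"ps aux\""
    }
}
```

Printing the expanded string this way is a quick check before putting a custom value into hbase.it.clustermanager.ssh.cmd.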
HBase 0.92 added a verify maven target. Invoking it, for
example by doing mvn verify, will run all the phases up to and
including the verify phase via the maven failsafe
plugin, running all the above mentioned HBase unit tests as well as
tests that are in the HBase integration test group. After you have completed
mvn install -DskipTests, you can run just the integration
tests by invoking:
cd hbase-it
mvn verify
If you just want to run the integration tests from the top level, you need to run two commands. First:
mvn failsafe:integration-test
This actually runs ALL the integration tests.
This command will always output BUILD SUCCESS even if there
are test failures.
At this point, you could grep the output by hand looking for failed tests. However, maven will do this for us; just use:
mvn failsafe:verify
The above command basically looks at all the test results (so don't remove the 'target' directory) for test failures and reports the results.
Running a subset of the integration tests is very similar to how you specify running a subset of unit tests
(see above), but use the property it.test instead of
test. To just run IntegrationTestClassXYZ.java, use:
mvn failsafe:integration-test -Dit.test=IntegrationTestClassXYZ
The next thing you might want to do is run groups of integration tests, say
all integration tests that are named IntegrationTestClassX*.java:
mvn failsafe:integration-test -Dit.test=*ClassX*
This runs everything that is an integration test that matches *ClassX*. This
means anything matching: "**/IntegrationTest*ClassX*". You can also run
multiple groups of integration tests using comma-delimited lists (similar to
unit tests). Using a list of matches still supports full regex matching for
each of the groups. This would look something like:
mvn failsafe:integration-test -Dit.test=*ClassX*,*ClassY
If you have an already-setup HBase cluster, you can launch the integration
tests by invoking the class IntegrationTestsDriver. You may have to
run test-compile first. The configuration will be picked by the bin/hbase
script.
mvn test-compile
Then launch the tests with:
bin/hbase [--config config_dir] org.apache.hadoop.hbase.IntegrationTestsDriver
Pass -h to get usage on this sweet tool. Running the
IntegrationTestsDriver without any argument will launch tests found under
hbase-it/src/test, having the
@Category(IntegrationTests.class) annotation, and a name
starting with IntegrationTest. See the usage, by passing -h, to
see how to filter test classes. You can pass a regex which is checked against
the full class name; so, part of the class name can be used. IntegrationTestsDriver
uses JUnit to run the tests. Currently there is no support for running
integration tests against a distributed cluster using maven (see HBASE-6201).
The tests interact with the distributed cluster by using the methods in the
DistributedHBaseCluster (implementing HBaseCluster) class, which in turn uses a pluggable
ClusterManager. Concrete implementations provide actual
functionality for carrying out deployment-specific and environment-dependent
tasks (SSH, etc). The default ClusterManager is
HBaseClusterManager, which uses SSH to remotely execute
start/stop/kill/signal commands, and assumes some posix commands (ps, etc). It also
assumes the user running the test has enough "power" to start/stop servers on
the remote machines. By default, it picks up HBASE_SSH_OPTS, HBASE_HOME,
HBASE_CONF_DIR from the env, and uses
bin/hbase-daemon.sh to carry out the actions. Currently tarball
deployments, deployments which use hbase-daemons.sh, and Apache Ambari
deployments are supported. /etc/init.d/ scripts are not supported for now, but
support can be easily added. For other deployment options, a ClusterManager can be
implemented and plugged in.
In 0.96, a tool named ChaosMonkey has been introduced. It is
modeled after the same-named tool by Netflix. Some of the tests use ChaosMonkey to
simulate faults in the running cluster in the way of killing random servers,
disconnecting servers, etc. ChaosMonkey can also be used as a stand-alone tool
to run a (misbehaving) policy while you are running other tests.
ChaosMonkey defines Actions and Policies. Actions are sequences of events. We have at least the following actions:
Restart active master (sleep 5 sec)
Restart random regionserver (sleep 5 sec)
Restart random regionserver (sleep 60 sec)
Restart META regionserver (sleep 5 sec)
Restart ROOT regionserver (sleep 5 sec)
Batch restart of 50% of regionservers (sleep 5 sec)
Rolling restart of 100% of regionservers (sleep 5 sec)
Policies on the other hand are responsible for executing the actions based on a strategy. The default policy is to execute a random action every minute based on predefined action weights. ChaosMonkey executes predefined named policies until it is stopped. More than one policy can be active at any time.
To run ChaosMonkey as a standalone tool, deploy your HBase cluster as usual. ChaosMonkey uses the configuration from the bin/hbase script, thus no extra configuration needs to be done. You can invoke ChaosMonkey by running:
bin/hbase org.apache.hadoop.hbase.util.ChaosMonkey
This will output something like:
12/11/19 23:21:57 INFO util.ChaosMonkey: Using ChaosMonkey Policy: class org.apache.hadoop.hbase.util.ChaosMonkey$PeriodicRandomActionPolicy, period:60000
12/11/19 23:21:57 INFO util.ChaosMonkey: Sleeping for 26953 to add jitter
12/11/19 23:22:24 INFO util.ChaosMonkey: Performing action: Restart active master
12/11/19 23:22:24 INFO util.ChaosMonkey: Killing master:master.example.com,60000,1353367210440
12/11/19 23:22:24 INFO hbase.HBaseCluster: Aborting Master: master.example.com,60000,1353367210440
12/11/19 23:22:24 INFO hbase.ClusterManager: Executing remote command: ps aux | grep master | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:master.example.com
12/11/19 23:22:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/19 23:22:25 INFO hbase.HBaseCluster: Waiting service:master to stop: master.example.com,60000,1353367210440
12/11/19 23:22:25 INFO hbase.ClusterManager: Executing remote command: ps aux | grep master | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:master.example.com
12/11/19 23:22:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/19 23:22:25 INFO util.ChaosMonkey: Killed master server:master.example.com,60000,1353367210440
12/11/19 23:22:25 INFO util.ChaosMonkey: Sleeping for:5000
12/11/19 23:22:30 INFO util.ChaosMonkey: Starting master:master.example.com
12/11/19 23:22:30 INFO hbase.HBaseCluster: Starting Master on: master.example.com
12/11/19 23:22:30 INFO hbase.ClusterManager: Executing remote command: /homes/enis/code/hbase-0.94/bin/../bin/hbase-daemon.sh --config /homes/enis/code/hbase-0.94/bin/../conf start master , hostname:master.example.com
12/11/19 23:22:31 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting master, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-master-master.example.com.out
....
12/11/19 23:22:33 INFO util.ChaosMonkey: Started master: master.example.com,60000,1353367210440
12/11/19 23:22:33 INFO util.ChaosMonkey: Sleeping for:51321
12/11/19 23:23:24 INFO util.ChaosMonkey: Performing action: Restart random region server
12/11/19 23:23:24 INFO util.ChaosMonkey: Killing region server:rs3.example.com,60020,1353367027826
12/11/19 23:23:24 INFO hbase.HBaseCluster: Aborting RS: rs3.example.com,60020,1353367027826
12/11/19 23:23:24 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 | xargs kill -s SIGKILL , hostname:rs3.example.com
12/11/19 23:23:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/19 23:23:25 INFO hbase.HBaseCluster: Waiting service:regionserver to stop: rs3.example.com,60020,1353367027826
12/11/19 23:23:25 INFO hbase.ClusterManager: Executing remote command: ps aux | grep regionserver | grep -v grep | tr -s ' ' | cut -d ' ' -f2 , hostname:rs3.example.com
12/11/19 23:23:25 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:
12/11/19 23:23:25 INFO util.ChaosMonkey: Killed region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
12/11/19 23:23:25 INFO util.ChaosMonkey: Sleeping for:60000
12/11/19 23:24:25 INFO util.ChaosMonkey: Starting region server:rs3.example.com
12/11/19 23:24:25 INFO hbase.HBaseCluster: Starting RS on: rs3.example.com
12/11/19 23:24:25 INFO hbase.ClusterManager: Executing remote command: /homes/enis/code/hbase-0.94/bin/../bin/hbase-daemon.sh --config /homes/enis/code/hbase-0.94/bin/../conf start regionserver , hostname:rs3.example.com
12/11/19 23:24:26 INFO hbase.ClusterManager: Executed remote command, exit code:0 , output:starting regionserver, logging to /homes/enis/code/hbase-0.94/bin/../logs/hbase-enis-regionserver-rs3.example.com.out
12/11/19 23:24:27 INFO util.ChaosMonkey: Started region server:rs3.example.com,60020,1353367027826. Reported num of rs:6
As you can see from the log, ChaosMonkey started the default PeriodicRandomActionPolicy, which is configured with all the available actions, and ran RestartActiveMaster and RestartRandomRs actions. The ChaosMonkey tool, if run from the command line, will keep on running until the process is killed.
Since HBase version 1.0.0 (HBASE-11348), the chaos monkey used to run integration tests can
be configured per test run. Users can create a Java properties file with timing
configurations and pass it to the chaos monkey. The properties file needs
to be in the HBase classpath. The various properties that can be configured and
their default values can be found listed in the
org.apache.hadoop.hbase.chaos.factories.MonkeyConstants
class. If any chaos monkey configuration is missing from the property file, then
the default values are assumed. For example:
$ bin/hbase org.apache.hadoop.hbase.IntegrationTestIngest -m slowDeterministic -monkeyProps monkey.properties
The above command will start the integration tests and chaos monkey passing
the properties file monkey.properties. Here is an example
chaos monkey file:
sdm.action1.period=120000
sdm.action2.period=40000
move.regions.sleep.time=80000
move.regions.max.time=1000000
batch.restart.rs.ratio=0.4f
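The "missing keys fall back to defaults" behaviour described above can be illustrated with java.util.Properties. This is a sketch under stated assumptions, not the chaos monkey's own loading code: the class and method names are hypothetical, and the fallback value 60000 is a placeholder rather than a real default from MonkeyConstants.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: read timing properties and fall back to a default when a key is absent.
class MonkeyPropsSketch {

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static long getLong(Properties props, String key, long fallback) {
        String raw = props.getProperty(key);
        // A missing key means "use the default".
        return raw == null ? fallback : Long.parseLong(raw.trim());
    }

    public static void main(String[] args) {
        // Two of the entries from the example file above.
        Properties props = parse("sdm.action1.period=120000\nmove.regions.max.time=1000000\n");
        long placeholderDefault = 60000L; // assumption for illustration only
        System.out.println(getLong(props, "sdm.action1.period", placeholderDefault)); // prints 120000
        System.out.println(getLong(props, "sdm.action2.period", placeholderDefault)); // prints 60000
    }
}
```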