Do not deploy 0.96.x. Deploy at least 0.98.x. See EOL 0.96.
You will have to stop your old 0.94.x cluster completely to upgrade. If you are replicating between clusters, both clusters will have to go down to upgrade. Make sure it is a clean shutdown. The fewer WAL files around, the faster the upgrade will run (the upgrade splits any log files it finds in the filesystem as part of the upgrade process). All clients must be upgraded to 0.96 too.
The API has changed. You will need to recompile your code against 0.96, and you may need to adjust applications to use the new APIs (TODO: List of changes).
HDFS and ZooKeeper should be up and running during the upgrade process.
hbase-0.96.0 comes with an upgrade script. Run
$ bin/hbase upgrade
to see its usage. The script has two main modes: -check, and -execute.
The check step is run against a running 0.94 cluster. Run it from a downloaded 0.96.x binary. The check step looks for the presence of HFileV1 files. These are unsupported in hbase-0.96.0. To purge them (that is, to have them rewritten as HFileV2), you must run a compaction.
The check step prints stats at the end of its run (grep for “Result:” in the log): the absolute paths of the tables it scanned, any HFileV1 files found, the regions containing those files (the regions that need to be major compacted to purge the HFileV1s), and any corrupted files found. A corrupt file is unreadable, so its version is undefined (neither HFileV1 nor HFileV2).
To run the check step, run $ bin/hbase upgrade -check. Here is sample output:
Tables Processed:
hdfs://localhost:41020/myHBase/.META.
hdfs://localhost:41020/myHBase/usertable
hdfs://localhost:41020/myHBase/TestTable
hdfs://localhost:41020/myHBase/t

Count of HFileV1: 2
HFileV1:
hdfs://localhost:41020/myHBase/usertable /fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
hdfs://localhost:41020/myHBase/usertable /ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512

Count of corrupted files: 1
Corrupted Files:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1

Count of Regions with HFileV1: 2
Regions to Major Compact:
hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af

There are some HFileV1, or corrupt files (files with incorrect major version)
In the above sample output, there are two HFileV1 files in two regions, and one corrupt file. Corrupt files should probably be removed. The regions that have HFileV1s need to be major compacted. To major compact, start up the hbase shell and review how to compact an individual region. After the major compaction is done, rerun the check step; the HFileV1s should be gone, replaced by HFileV2 instances.
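The list of regions to compact can be pulled out of saved check output rather than copied by hand. The following is a minimal sketch, assuming the output was saved to a file with one path per line as in the sample above and with no log prefix in front of each path; `regions_to_compact` and `region_names` are hypothetical helper names, not part of HBase.

```shell
# Hypothetical helper: print the region directories listed under
# "Regions to Major Compact:" in saved `bin/hbase upgrade -check` output.
# Assumes one path per line; the list ends at the first line that is not a URI.
regions_to_compact() {
  awk '
    /Regions to Major Compact:/ { in_list = 1; next }
    in_list && /^[[:space:]]*[a-z]+:\/\// { print $1; next }
    in_list { in_list = 0 }
  ' "$@"
}

# Hypothetical helper: reduce each region directory to its last path
# component, which is the encoded region name.
region_names() {
  regions_to_compact "$@" | sed 's|.*/||'
}
```

Run it as `region_names check.log`, where `check.log` is a hypothetical file holding the saved output. How the resulting names are passed to the shell's compaction command should be confirmed against the shell help for your version.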
By default, the check step scans the hbase root directory (defined as hbase.rootdir in the configuration). To scan a specific directory only, pass the -dir option.
$ bin/hbase upgrade -check -dir /myHBase/testTable
The above command would detect HFileV1s in the /myHBase/testTable directory.
Once the check step reports all the HFileV1 files have been rewritten, it is safe to proceed with the upgrade.
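The decision to proceed can be scripted against saved check output. This is a sketch only: it assumes a clean run still prints the two "Count of ..." lines with a value of 0, which the sample above does not show, and it treats a missing count as "not clean". The names `count_of` and `check_is_clean` are hypothetical.

```shell
# Hypothetical helper: print the first number that follows the given label.
# usage: count_of "Count of HFileV1" < check.log
count_of() {
  sed -n "s/.*$1: *\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# Succeeds only when both counts are present and zero.
# Assumption: a clean check run reports the counts as 0.
check_is_clean() {
  log=$1
  v1=$(count_of "Count of HFileV1" < "$log")
  bad=$(count_of "Count of corrupted files" < "$log")
  [ "$v1" = "0" ] && [ "$bad" = "0" ]
}
```

A script could then guard the next step with `check_is_clean check.log || exit 1`. The full output is still worth reading, since this only inspects two lines of it.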
Next is the execute step. You must shut down your 0.94.x cluster before you can run the execute step. The execute step will not run if it detects running HBase masters or regionservers.
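Before invoking the execute step, it can help to confirm on each host that no HBase daemon is still running. The sketch below filters `jps -l` output for the master and regionserver main classes. It only sees JVMs on the local host, so it supplements the upgrade tool's own detection and does not replace it. The function names are hypothetical.

```shell
# Filter `jps -l` style output (PID followed by main class) down to HBase
# master and regionserver JVMs. Reads stdin so it can be tried on captured text.
hbase_daemons() {
  grep -E 'org\.apache\.hadoop\.hbase\.(master\.HMaster|regionserver\.HRegionServer)$'
}

# Succeeds only when jps is available and lists no such JVM on this host.
# Any matching processes are printed for inspection.
no_local_hbase_daemons() {
  command -v jps > /dev/null || { echo "jps not found" >&2; return 2; }
  if jps -l | hbase_daemons; then
    return 1
  fi
  return 0
}
```

A ZooKeeper process is deliberately not matched, since ZooKeeper is expected to stay up during the upgrade.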
HDFS and ZooKeeper should be up and running during the upgrade process. If ZooKeeper is managed by HBase, you can start it so that it is available to the upgrade by running
$ ./hbase/bin/hbase-daemon.sh start zookeeper
The execute upgrade step is made of three substeps.
Namespaces: HBase 0.96.0 has support for namespaces. The upgrade needs to reorder directories in the filesystem for namespaces to work.
ZNodes: All znodes are purged so that new ones can be written in their place using a new protobuf'ed format, and a few are migrated in place (e.g. replication and table state znodes).
WAL Log Splitting: If the 0.94.x cluster shutdown was not clean, we'll split WAL logs as part of migration before we start up on 0.96.0. This WAL splitting runs slower than the native distributed WAL splitting because it all runs inside the single upgrade process (so try to get a clean shutdown of the 0.94.x cluster if you can).
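To gauge how much log splitting the upgrade will have to do, the number of WAL files left behind after shutdown can be counted with `hadoop fs -count`. This is a sketch under two assumptions: that the 0.94 cluster keeps its live WALs under `.logs` in the HBase root directory, and that `ROOTDIR` (a hypothetical variable) holds the value of hbase.rootdir. The function names are hypothetical.

```shell
# Extract FILE_COUNT (second column) from `hadoop fs -count` output, whose
# columns are DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
file_count() {
  awk '{ print $2; exit }'
}

# Assumption: 0.94 keeps live WALs under <hbase.rootdir>/.logs.
# ROOTDIR is a hypothetical variable, e.g. hdfs://localhost:41020/myHBase
leftover_wals() {
  hadoop fs -count "${ROOTDIR:?set ROOTDIR to hbase.rootdir}/.logs" | file_count
}
```

A result of 0 suggests the shutdown was clean. Any larger number means the upgrade process will spend time splitting those files.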
To run the execute step, first make sure that you have copied the hbase-0.96.0 binaries to all servers and clients. Make sure the 0.94.x cluster is down. Then do as follows:
$ bin/hbase upgrade -execute
Here is some sample output.
Starting Namespace upgrade
Created version file at hdfs://localhost:41020/myHBase with version=7
Migrating table testTable to hdfs://localhost:41020/myHBase/.data/default/testTable
…..
Created version file at hdfs://localhost:41020/myHBase with version=8
Successfully completed NameSpace upgrade.
Starting Znode upgrade
….
Successfully completed Znode upgrade

Starting Log splitting
…
Successfully completed Log splitting
If the output from the execute step looks good, stop the zookeeper instance you started to do the upgrade:
$ ./hbase/bin/hbase-daemon.sh stop zookeeper
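For a cluster whose ZooKeeper is managed by HBase, the start, execute, and stop commands above can be wrapped in one function. This is a sketch, not a supported tool: `HBASE_HOME` is a hypothetical variable pointing at the unpacked 0.96 install, the function name is made up, and it assumes the upgrade command exits nonzero on failure. Unlike the text, it stops ZooKeeper even when the upgrade fails, so ZooKeeper must be started again before a retry.

```shell
# Sketch of the execute sequence when ZooKeeper is managed by HBase.
# HBASE_HOME is a hypothetical variable naming the 0.96 install directory.
run_execute_step() {
  hbase_home=${HBASE_HOME:?set HBASE_HOME to the 0.96 install directory}

  # Make ZooKeeper available to the upgrade.
  "$hbase_home/bin/hbase-daemon.sh" start zookeeper || return 1

  # Run the three substeps: namespaces, znodes, WAL splitting.
  "$hbase_home/bin/hbase" upgrade -execute
  status=$?

  # Stop the temporary ZooKeeper, then report the upgrade's exit status.
  "$hbase_home/bin/hbase-daemon.sh" stop zookeeper
  return "$status"
}
```

The printed output should still be reviewed by hand before starting hbase-0.96.0, since the exit status alone may not reflect every problem.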
Now start up hbase-0.96.0.
An old (pre-0.96) client that connects to the upgraded cluster will fail with an exception like the one below. Upgrade the client.
17:22:15 Exception in thread "main" java.lang.IllegalArgumentException: Not a host:port pair: PBUF
17:22:15 *
17:22:15 api-compat-8.ent.cloudera.com
17:22:15   at org.apache.hadoop.hbase.util.Addressing.parseHostname(Addressing.java:60)
17:22:15   at org.apache.hadoop.hbase.ServerName.<init>(ServerName.java:101)
17:22:15   at org.apache.hadoop.hbase.ServerName.parseVersionedServerName(ServerName.java:283)
17:22:15   at org.apache.hadoop.hbase.MasterAddressTracker.bytesToServerName(MasterAddressTracker.java:77)
17:22:15   at org.apache.hadoop.hbase.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:61)
17:22:15   at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:703)
17:22:15   at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:126)
17:22:15   at Client_4_3_0.setup(Client_4_3_0.java:716)
17:22:15   at Client_4_3_0.main(Client_4_3_0.java:63)
When you upgrade from versions prior to 0.96, META needs to be converted to use protocol buffers. This is controlled by the configuration option hbase.MetaMigrationConvertingToPB, which is set to true by default. Therefore, by default, no action is required on your part.
The migration is a one-time event. However, every time your cluster starts, META is scanned to ensure that it does not need to be converted. If you have a very large number of regions, this scan can take a long time. Starting in 0.98.5, you can set hbase.MetaMigrationConvertingToPB to false in hbase-site.xml to disable this start-up scan. This should be considered an expert-level setting.
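As an illustration, the setting would be expressed in hbase-site.xml with a property entry like the one below. This is a config fragment only, shown on the assumption that the cluster runs 0.98.5 or later and that the one-time META conversion has already completed.

```xml
<!-- hbase-site.xml fragment: disables the start-up META scan.
     Assumes 0.98.5 or later and an already-converted META. -->
<property>
  <name>hbase.MetaMigrationConvertingToPB</name>
  <value>false</value>
</property>
```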