15.15. HBase and HDFS

General configuration guidance for Apache HDFS is out of the scope of this guide. Refer to the documentation available at http://hadoop.apache.org/ for extensive information about configuring HDFS. This section deals with HDFS in terms of HBase.

In most cases, HBase stores its data in Apache HDFS. This includes the HFiles containing the data, as well as the write-ahead logs (WALs) which store data before it is written to the HFiles and protect against RegionServer crashes. HDFS provides reliability and protection to data in HBase because it is distributed. To operate with the most efficiency, HBase needs data to be available locally. Therefore, it is a good practice to run an HDFS datanode on each RegionServer.

Important Information and Guidelines for HBase and HDFS

HBase is a client of HDFS.

HBase is an HDFS client, using the HDFS DFSClient class, and references to this class appear in HBase logs with other HDFS client log messages.

Configuration is necessary in multiple places.

Some HDFS configurations relating to HBase need to be done at the HDFS (server) side. Others must be done within HBase (at the client side). Other settings need to be set at both the server and client side.
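
For illustration only, here is a minimal sketch of the client-side half of this split, using the standard Hadoop Configuration API. It assumes HBase's usual behavior of building its client configuration via HBaseConfiguration.create(), which loads hbase-site.xml; client-side HDFS properties placed there are typically visible to the DFSClient that HBase uses. Server-side properties still have to be changed in hdfs-site.xml on the namenode or datanodes and the daemons restarted.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientSideHdfsSettings {
        public static void main(String[] args) {
            // Loads hbase-default.xml and hbase-site.xml; HBase passes this
            // configuration to the HDFS client code it uses.
            Configuration conf = HBaseConfiguration.create();

            // Client-side override (the value is the documented default, shown
            // only as an example of where such a setting would live).
            conf.setInt("dfs.client.socket-timeout", 60000);

            // Server-side settings such as dfs.datanode.address cannot be changed
            // from here; they must be set in hdfs-site.xml on the HDFS daemons.
            System.out.println("dfs.client.socket-timeout = "
                    + conf.getInt("dfs.client.socket-timeout", -1));
        }
    }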

Write errors which affect HBase may be logged in the HDFS logs rather than HBase logs.

When writing, HDFS pipelines communications from one datanode to another. HBase communicates to both the HDFS namenode and datanode, using the HDFS client classes. Communication problems between datanodes are logged in the HDFS logs, not the HBase logs.

HDFS writes are always local when possible. HBase RegionServers should not experience many write errors, because they write to the local datanode. If the datanode cannot replicate the blocks, the errors are logged in HDFS, not in the HBase RegionServer logs.

HBase communicates with HDFS using two different ports.

HBase communicates with datanodes using the ipc.Client interface and the DataNode class. References to these will appear in HBase logs. Each of these communication channels uses a different port (50010 and 50020 by default). The ports are configured in the HDFS configuration, via the dfs.datanode.address and dfs.datanode.ipc.address parameters.
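
As a sketch only, the following shows the two properties in code form, reading them with fallbacks equal to the defaults named above. In a real deployment they are set in hdfs-site.xml on each datanode; the snippet is just a convenient way to see the property names and default ports together.

    import org.apache.hadoop.conf.Configuration;

    public class DatanodePorts {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Data transfer port, used via the DataNode class (default 50010).
            String dataAddr = conf.get("dfs.datanode.address", "0.0.0.0:50010");
            // IPC port, used via the ipc.Client interface (default 50020).
            String ipcAddr = conf.get("dfs.datanode.ipc.address", "0.0.0.0:50020");
            System.out.println("dfs.datanode.address     = " + dataAddr);
            System.out.println("dfs.datanode.ipc.address = " + ipcAddr);
        }
    }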

Errors may be logged in HBase, HDFS, or both.

When troubleshooting HDFS issues in HBase, check logs in both places for errors.

HDFS takes a while to mark a node as dead. You can configure HDFS to avoid using stale datanodes.

By default, HDFS does not mark a node as dead until it is unreachable for 630 seconds. In Hadoop 1.1 and Hadoop 2.x, this can be alleviated by enabling checks for stale datanodes, though this check is disabled by default. You can enable the check for reads and writes separately, via dfs.namenode.avoid.read.stale.datanode and dfs.namenode.avoid.write.stale.datanode settings. A stale datanode is one that has not been reachable for dfs.namenode.stale.datanode.interval (default is 30 seconds). Stale datanodes are avoided, and marked as the last possible target for a read or write operation. For configuration details, see the HDFS documentation.
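
A minimal sketch of what enabling these checks looks like, using the property names above and the 30-second stale interval. These are namenode-side settings, so in practice they belong in hdfs-site.xml on the namenode; the code form is shown only to make the property names and types concrete.

    import org.apache.hadoop.conf.Configuration;

    public class StaleDatanodeChecks {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Avoid stale datanodes for reads and writes (both disabled by default).
            conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
            conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
            // A datanode with no heartbeat for this many milliseconds is considered stale.
            conf.setLong("dfs.namenode.stale.datanode.interval", 30000L);
        }
    }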

Settings for HDFS retries and timeouts are important to HBase.

You can configure settings for various retries and timeouts. Always refer to the HDFS documentation for current recommendations and defaults. Some of the settings important to HBase are listed here. Defaults are current as of Hadoop 2.3. Check the Hadoop documentation for the most current values and recommendations.

Retries

ipc.client.connect.max.retries (default: 10)

The number of times a client will attempt to establish a connection with the server. This value sometimes needs to be increased. You can specify a different setting for the maximum number of retries if a timeout occurs (see ipc.client.connect.max.retries.on.timeouts below). For SASL connections, the number of retries is hard-coded at 15 and cannot be configured.

ipc.client.connect.max.retries.on.timeouts (default: 45)

The number of times a client will attempt to establish a connection with the server in the event of a timeout. If some retries are due to timeouts and some are due to other reasons, this counter is added to ipc.client.connect.max.retries, so the maximum number of retries for all reasons could be the combined value.

dfs.client.block.write.retries (default: 3)

How many times the client attempts to write to the datanode. After the number of retries is reached, the client reconnects to the namenode to get a new set of datanode locations. You can try increasing this value.
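
The three retry settings above are client-side, so for HBase they can be carried in the configuration that HBase loads. The sketch below is illustrative only; the values are placeholders, not tuning recommendations.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HdfsRetrySettings {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // Connection attempts for failures other than timeouts (default 10).
            conf.setInt("ipc.client.connect.max.retries", 20);
            // Connection attempts when the failure is a timeout (default 45).
            conf.setInt("ipc.client.connect.max.retries.on.timeouts", 45);
            // Block write attempts before asking the namenode for a new datanode (default 3).
            conf.setInt("dfs.client.block.write.retries", 5);
        }
    }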

HDFS Heartbeats

HDFS heartbeats are entirely on the HDFS side, between the namenode and datanodes.

dfs.heartbeat.interval (default: 3)

The interval, in seconds, at which a datanode sends a heartbeat to the namenode.

dfs.namenode.heartbeat.recheck-interval (default: 300000)

The interval of time, in milliseconds, between the namenode's heartbeat checks. The total time before a node is marked as dead is determined by the following formula, which with the defaults works out to 2 * 300000 + 10 * 1000 * 3 = 630000 milliseconds, or 10 minutes and 30 seconds:

 2 * (dfs.namenode.heartbeat.recheck-interval) + 10 * 1000 * (dfs.heartbeat.interval)

dfs.namenode.stale.datanode.interval (default: 30000)

How long (in milliseconds) a node can go without a heartbeat before the namenode considers it stale, if the stale-datanode checks described above are enabled (they are off by default).
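
To make the formula above concrete, the following sketch reads the two heartbeat settings, falling back to the defaults listed in this section, and computes the interval after which a datanode is marked as dead.

    import org.apache.hadoop.conf.Configuration;

    public class DeadNodeTimeout {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Recheck interval is in milliseconds; heartbeat interval is in seconds.
            long recheckMs = conf.getLong("dfs.namenode.heartbeat.recheck-interval", 300000L);
            long heartbeatSec = conf.getLong("dfs.heartbeat.interval", 3L);
            long deadAfterMs = 2 * recheckMs + 10 * 1000 * heartbeatSec;
            // With the defaults: 2 * 300000 + 10 * 1000 * 3 = 630000 ms = 10 minutes 30 seconds.
            System.out.println("Datanode marked dead after " + deadAfterMs + " ms");
        }
    }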

Connection Timeouts

Connection timeouts occur between the client (HBase) and the HDFS datanode. They may occur when establishing a connection, attempting to read, or attempting to write. The two settings below are used in combination, and affect connections between the DFSClient and the datanode, the ipc.Client and the datanode, and communication between two datanodes.

dfs.client.socket-timeout (default: 60000)

The amount of time before a client connection times out when establishing a connection or reading. The value is expressed in milliseconds, so the default is 60 seconds.

dfs.datanode.socket.write.timeout (default: 480000)

The amount of time before a write operation times out. The default is 8 minutes, expressed as milliseconds.
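
A short sketch of where these two timeouts could be overridden on the HBase (client) side; note that dfs.datanode.socket.write.timeout also applies on the datanodes themselves, as described above. The values shown are simply the documented defaults, included as placeholders rather than recommendations.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HdfsTimeoutSettings {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // Connect/read timeout in milliseconds (default 60000 = 60 seconds).
            conf.setInt("dfs.client.socket-timeout", 60000);
            // Write timeout in milliseconds (default 480000 = 8 minutes).
            conf.setInt("dfs.datanode.socket.write.timeout", 480000);
        }
    }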

Typical Error Logs

The following types of errors are often seen in the logs.

INFO HDFS.DFSClient: Failed to connect to /xxx:50010, add to deadNodes and continue java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/region-server-1:50010]

This error occurs when all datanodes for a block are dead and recovery is not possible. Here is the sequence of events that leads to the error:

  • The client attempts to connect to a dead datanode.

  • The connection fails, so the client moves down the list of datanodes and tries the next one. It also fails.

  • When the client exhausts its entire list, it sleeps for 3 seconds and requests a new list. It is very likely to receive the exact same list as before, in which case the error occurs again.

INFO org.apache.hadoop.HDFS.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/xxx:50010]

This type of error indicates a write issue. In this case, the master wants to split the log. It does not have a local datanode so it tries to connect to a remote datanode, but the datanode is dead.

In this situation, there will be three retries (by default). If all retries fail, a message like the following is logged:

WARN HDFS.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block

If the operation was an attempt to split the log, the following type of message may also appear:

FATAL wal.HLogSplitter: WriterThread-xxx Got while writing log entry to log