General configuration guidance for Apache HDFS is out of the scope of this guide. Refer to the documentation available at http://hadoop.apache.org/ for extensive information about configuring HDFS. This section deals with HDFS in terms of HBase.
In most cases, HBase stores its data in Apache HDFS. This includes the HFiles containing the data, as well as the write-ahead logs (WALs) which store data before it is written to the HFiles and protect against RegionServer crashes. Because HDFS is distributed and replicates its blocks across nodes, it provides reliability and protection for the data HBase stores in it. To operate with the most efficiency, HBase needs data to be available locally. Therefore, it is a good practice to run an HDFS datanode on each RegionServer.
Important Information and Guidelines for HBase and HDFS
HBase is an HDFS client, using the HDFS DFSClient class, and references to this class appear in HBase logs with other HDFS client log messages.
Some HDFS configurations relating to HBase need to be done at the HDFS (server) side. Others must be done within HBase (at the client side). Other settings need to be set at both the server and client side.
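As an illustrative sketch only (using two properties that are described later in this section), a server-side setting belongs in the hdfs-site.xml read by the namenode and datanodes, while a client-side setting must be visible to HBase, for example in an hdfs-site.xml (or hbase-site.xml) on the HBase classpath so that the DFSClient embedded in HBase picks it up:
<!-- Server side: set in the hdfs-site.xml used by the namenode and datanodes. -->
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<!-- Client side: set in an hdfs-site.xml (or hbase-site.xml) on the HBase classpath. -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>60000</value>
</property>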
When writing, HDFS pipelines communications from one datanode to another. HBase communicates to both the HDFS namenode and datanode, using the HDFS client classes. Communication problems between datanodes are logged in the HDFS logs, not the HBase logs.
HDFS writes are always local when possible. HBase RegionServers should not experience many write errors, because they write to the local datanode. If the datanode cannot replicate the blocks, the errors are logged in HDFS, not in the HBase RegionServer logs.
HBase communicates with datanodes using the ipc.Client interface and the DataNode class. References to these will appear in HBase logs. Each of these communication channels uses a different port (50010 and 50020 by default). The ports are configured in the HDFS configuration, via the dfs.datanode.address and dfs.datanode.ipc.address parameters.
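As a sketch, the two port settings appear in the server-side hdfs-site.xml as follows. The values shown are the defaults mentioned above; the 0.0.0.0 bind address is the usual Hadoop default.
<!-- hdfs-site.xml (server side): datanode data-transfer and IPC ports that the HBase DFSClient connects to. -->
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:50010</value>
</property>
<property>
  <name>dfs.datanode.ipc.address</name>
  <value>0.0.0.0:50020</value>
</property>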
When troubleshooting HDFS issues in HBase, check logs in both places for errors.
By default, HDFS does not mark a node as dead until it is unreachable for 630 seconds. In Hadoop 1.1 and Hadoop 2.x, this can be alleviated by enabling checks for stale datanodes, though this check is disabled by default. You can enable the check for reads and writes separately, via the dfs.namenode.avoid.read.stale.datanode and dfs.namenode.avoid.write.stale.datanode settings. A stale datanode is one that has not been reachable for dfs.namenode.stale.datanode.interval (default is 30 seconds). Stale datanodes are avoided, and marked as the last possible target for a read or write operation. For configuration details, see the HDFS documentation.
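As a sketch, enabling both checks in the namenode's hdfs-site.xml might look like the following; the interval shown is the default described above.
<!-- Avoid stale datanodes for reads and for writes (both checks disabled by default). -->
<property>
  <name>dfs.namenode.avoid.read.stale.datanode</name>
  <value>true</value>
</property>
<property>
  <name>dfs.namenode.avoid.write.stale.datanode</name>
  <value>true</value>
</property>
<!-- Milliseconds without a heartbeat before a datanode is considered stale (default: 30000). -->
<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value>
</property>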
You can configure settings for various retries and timeouts. Some of the settings important to HBase are listed here. Defaults are current as of Hadoop 2.3; always check the Hadoop documentation for the most current values and recommendations.
Retries
ipc.client.connect.max.retries (default: 10)
The number of times a client will attempt to establish a connection with the server. This value sometimes needs to be increased. You can specify a different setting for the maximum number of retries if a timeout occurs. For SASL connections, the number of retries is hard-coded at 15 and cannot be configured.
ipc.client.connect.max.retries.on.timeouts (default: 45)
The number of times a client will attempt to establish a connection with the server in the event of a timeout. If some retries are due to timeouts and some are due to other reasons, this counter is added to ipc.client.connect.max.retries, so the maximum number of retries for all reasons could be the combined value.
dfs.client.block.write.retries (default: 3)
How many times the client attempts to write to the datanode. After the number of retries is reached, the client reconnects to the namenode to get a new location of a datanode. You can try increasing this value; a sample configuration sketch follows this list.
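As an illustrative sketch only (not a tuning recommendation), the retry settings above could be set on the client side as follows. The ipc.client.* properties normally live in core-site.xml and the dfs.client.* property in hdfs-site.xml; the two ipc.client.* values shown are the Hadoop 2.3 defaults, and dfs.client.block.write.retries is raised from its default of 3 purely as an example.
<!-- core-site.xml (client side): defaults shown. -->
<property>
  <name>ipc.client.connect.max.retries</name>
  <value>10</value>
</property>
<property>
  <name>ipc.client.connect.max.retries.on.timeouts</name>
  <value>45</value>
</property>
<!-- hdfs-site.xml (client side): raised from the default of 3 as an example only. -->
<property>
  <name>dfs.client.block.write.retries</name>
  <value>5</value>
</property>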
HDFS Heartbeats
HDFS heartbeats are entirely on the HDFS side, between the namenode and datanodes.
dfs.heartbeat.interval (default: 3)
The interval, in seconds, at which a datanode sends heartbeats to the namenode.
dfs.namenode.heartbeat.recheck-interval (default: 300000)
The interval of time, in milliseconds, between heartbeat rechecks. The total time before a node is marked as dead is determined by the following formula, which works out to 10 minutes and 30 seconds with the default values (the arithmetic is worked out after this list):
2 * (dfs.namenode.heartbeat.recheck-interval) + 10 * 1000 * (dfs.heartbeat.interval)
dfs.namenode.stale.datanode.interval (default: 30000)
How long (in milliseconds) a node can go without a heartbeat before it is determined to be stale, if the stale datanode checks described above are enabled (they are off by default).
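Plugging the default values into the formula above shows where the 630-second dead-node figure comes from:
2 * 300000 ms + 10 * 1000 * 3
  = 600000 ms + 30000 ms
  = 630000 ms (630 seconds, or 10 minutes and 30 seconds)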
Connection Timeouts
Connection timeouts occur between the client (HBase) and the HDFS datanode. They may occur when establishing a connection, attempting to read, or attempting to write. The two settings below are used in combination, and affect connections between the DFSClient and the datanode, between the ipc.Client and the datanode, and between two datanodes. A configuration sketch follows the two settings.
dfs.client.socket-timeout (default: 60000)
The amount of time before a client connection times out when establishing a connection or reading. The value is expressed in milliseconds, so the default is 60 seconds.
dfs.datanode.socket.write.timeout (default: 480000)
The amount of time before a write operation times out. The default is 8 minutes, expressed as milliseconds.
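As a sketch, the two timeout settings with their default values look like this in hdfs-site.xml:
<!-- Connection/read and write timeouts, in milliseconds (Hadoop 2.3 defaults). -->
<property>
  <name>dfs.client.socket-timeout</name>
  <value>60000</value>   <!-- 60 seconds -->
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>480000</value>  <!-- 8 minutes -->
</property>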
Typical Error Logs
The following types of errors are often seen in the logs.
INFO HDFS.DFSClient: Failed to connect to /xxx:50010, add to deadNodes and continue
java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/region-server-1:50010]
All datanodes for a block are dead, and recovery is not possible. Here is the sequence of events that leads to this error:
The client attempts to connect to a dead datanode.
The connection fails, so the client moves down the list of datanodes and tries the next one. It also fails.
When the client exhausts its entire list, it sleeps for 3 seconds and requests a new list. It is very likely to receive the exact same list as before, in which case the error occurs again.
INFO org.apache.hadoop.HDFS.DFSClient: Exception in createBlockOutputStream
java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/xxx:50010]
This type of error indicates a write issue. In this case, the master wants to split the log. It does not have a local datanode so it tries to connect to a remote datanode, but the datanode is dead.
In this situation, there will be three retries (by default). If all retries fail, a message like the following is logged:
WARN HDFS.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block
If the operation was an attempt to split the log, the following type of message may also appear:
FATAL wal.HLogSplitter: WriterThread-xxx Got while writing log entry to log