public class Balancer extends Object implements Tool
The balancer is a tool that balances disk space usage on an HDFS cluster when some datanodes become full or when new empty nodes join the cluster. The tool is deployed as an application program that can be run by the cluster administrator on a live HDFS cluster while applications adding and deleting files.
SYNOPSIS
To start: bin/start-balancer.sh [-threshold] Example: bin/ start-balancer.sh start the balancer with a default threshold of 10% bin/ start-balancer.sh -threshold 5 start the balancer with a threshold of 5% To stop: bin/ stop-balancer.sh
DESCRIPTION
The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently.
The tool moves blocks from highly utilized datanodes to poorly utilized datanodes iteratively. In each iteration a datanode moves or receives no more than the lesser of 10G bytes or the threshold fraction of its capacity. Each iteration runs no more than 20 minutes. At the end of each iteration, the balancer obtains updated datanodes information from the namenode.
A system property that limits the balancer's use of bandwidth is defined in the default configuration file:
dfs.balance.bandwidthPerSec 1048576 Specifies the maximum bandwidth that each datanode can utilize for the balancing purpose in term of the number of bytes per second.
This property determines the maximum speed at which a block will be moved from one datanode to another. The default value is 1MB/s. The higher the bandwidth, the faster a cluster can reach the balanced state, but with greater competition with application processes. If an administrator changes the value of this property in the configuration file, the change is observed when HDFS is next restarted.
MONITERING BALANCER PROGRESS
After the balancer is started, an output file name where the balancer progress will be recorded is printed on the screen. The administrator can monitor the running of the balancer by reading the output file. The output shows the balancer's status iteration by iteration. In each iteration it prints the starting time, the iteration number, the total number of bytes that have been moved in the previous iterations, the total number of bytes that are left to move in order for the cluster to be balanced, and the number of bytes that are being moved in this iteration. Normally "Bytes Already Moved" is increasing while "Bytes Left To Move" is decreasing.
Running multiple instances of the balancer in an HDFS cluster is prohibited by the tool.
The balancer automatically exits when any of the following five conditions is satisfied:
Upon exit, a balancer returns an exit code and prints one of the following messages to the output file in corresponding to the above exit reasons:
The administrator can interrupt the execution of the balancer at any time by running the command "stop-balancer.sh" on the machine where the balancer is running.
限定符和类型 | 字段和说明 |
---|---|
static int |
ALREADY_RUNNING |
static int |
ILLEGAL_ARGS |
static int |
IO_EXCEPTION |
static int |
MAX_NO_PENDING_BLOCK_ITERATIONS |
static int |
MAX_NUM_CONCURRENT_MOVES
The maximum number of concurrent blocks moves for
balancing purpose at a datanode
|
static int |
NO_MOVE_BLOCK |
static int |
NO_MOVE_PROGRESS |
static int |
SUCCESS |
限定符和类型 | 方法和说明 |
---|---|
Configuration |
getConf()
return this balancer's configuration
|
static void |
main(String[] args)
Run a balancer
|
int |
run(String[] args)
main method of Balancer
|
void |
setConf(Configuration conf)
set this balancer's configuration
|
public static final int MAX_NUM_CONCURRENT_MOVES
public static final int MAX_NO_PENDING_BLOCK_ITERATIONS
public static final int SUCCESS
public static final int ALREADY_RUNNING
public static final int NO_MOVE_BLOCK
public static final int NO_MOVE_PROGRESS
public static final int IO_EXCEPTION
public static final int ILLEGAL_ARGS
public static void main(String[] args)
args
- public Configuration getConf()
getConf
在接口中 Configurable
public void setConf(Configuration conf)
setConf
在接口中 Configurable
Copyright © 2009 The Apache Software Foundation