For more information on the NameNode, see Section 9.9, “HDFS”.
To determine how much space HBase is using on HDFS, use the hadoop shell commands from the NameNode. For example...
hadoop fs -dus /hbase/
...returns the summarized disk utilization for all HBase objects.
hadoop fs -dus /hbase/myTable
...returns the summarized disk utilization for the HBase table 'myTable'.
hadoop fs -du /hbase/myTable
...returns a list of the regions under the HBase table 'myTable' and their disk utilization.
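When a table has many regions, the per-region listing from hadoop fs -du can be long. The function below is a minimal sketch for keeping only the N largest entries; top_entries is a hypothetical helper name, not part of Hadoop or HBase. It assumes each data line starts with the size in bytes and ends with the path, so confirm that against the -du output of your Hadoop version. Header lines such as "Found 3 items" are skipped because their first field is not a number.

```shell
# top_entries (hypothetical helper): print the N largest entries from
# `hadoop fs -du` output read on stdin, largest first.
# Assumption: the first field of each data line is the size in bytes and the
# last field is the path; lines whose first field is not a number
# (e.g. a "Found 3 items" header) are skipped.
top_entries() {
  local n="${1:-10}"   # how many entries to keep (default 10)
  awk '$1 ~ /^[0-9]+$/ { print $1, $NF }' | sort -k1,1nr | head -n "$n"
}
```

For example, hadoop fs -du /hbase/myTable | top_entries 5 would list the five largest regions of 'myTable'. On Hadoop 2 and later, -dus is deprecated in favor of hadoop fs -du -s.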
For more information on HDFS shell commands, see the HDFS FileSystem Shell documentation.
Sometimes it will be necessary to explore the HBase objects that exist on HDFS. These objects could include the WALs (Write Ahead Logs), tables, regions, StoreFiles, etc. The easiest way to do this is with the NameNode web application, which runs on port 50070 by default. The NameNode web application provides links to all the DataNodes in the cluster so that they can be browsed seamlessly.
The HDFS directory structure of HBase tables in the cluster is:
/hbase
    /<Table>                  (Tables in the cluster)
        /<Region>             (Regions for the table)
            /<ColumnFamily>   (ColumnFamilies for the Region for the table)
                /<StoreFile>  (StoreFiles for the ColumnFamily for the Regions for the table)
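Scripts that inspect these directories often need to map a path back to the table, region and ColumnFamily it belongs to. The function below is a minimal sketch that splits a path according to the layout above; describe_storefile is a hypothetical helper name. It assumes the HBase root directory is /hbase, as in the examples in this section, and accepts a different root (for instance, if hbase.rootdir points elsewhere) as an optional second argument.

```shell
# describe_storefile (hypothetical helper): split a path laid out as
#   <root>/<Table>/<Region>/<ColumnFamily>/<StoreFile>
# into its components. The root defaults to /hbase (an assumption).
describe_storefile() {
  local path="$1" root="${2:-/hbase}"
  local rel="${path#"$root"/}"   # strip the "<root>/" prefix
  if [ "$rel" = "$path" ]; then
    echo "not under $root: $path" >&2
    return 1
  fi
  local table region family storefile extra
  # Split on "/"; anything past the fourth component lands in "extra".
  IFS=/ read -r table region family storefile extra <<< "$rel"
  if [ -z "$storefile" ] || [ -n "$extra" ]; then
    echo "not at StoreFile depth: $path" >&2
    return 1
  fi
  printf 'table=%s region=%s family=%s storefile=%s\n' \
    "$table" "$region" "$family" "$storefile"
}
```

Paths that are not exactly at StoreFile depth (a region directory, for example) are rejected with a non-zero exit status rather than guessed at.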
The HDFS directory structure of the HBase WAL is:
/hbase
    /.logs
        /<RegionServer>   (RegionServers)
            /<HLog>       (WAL HLog files for the RegionServer)
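Using the WAL layout above, one quick check is how many HLog files each RegionServer currently has. The sketch below counts files exactly one level below each RegionServer directory; count_wals is a hypothetical helper name. It reads HDFS paths one per line on stdin and assumes the WAL directory is /hbase/.logs as shown, so the .logs and RegionServer directory entries themselves are not counted.

```shell
# count_wals (hypothetical helper): count HLog files per RegionServer from a
# list of HDFS paths (one per line on stdin).
# With "/" as the separator, "/hbase/.logs/<RegionServer>/<HLog>" splits into
# $1="" $2="hbase" $3=".logs" $4=<RegionServer> $5=<HLog>, so NF == 5.
count_wals() {
  awk -F/ '$2 == "hbase" && $3 == ".logs" && NF == 5 { n[$4]++ }
           END { for (rs in n) print rs, n[rs] }' | sort
}
```

On Hadoop 2 and later the paths could come from hadoop fs -ls -R /hbase/.logs | awk '{print $NF}' (use hadoop fs -lsr on Hadoop 1.x); taking the last field assumes the paths contain no spaces.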
See the HDFS User Guide for other non-shell diagnostic utilities like fsck.
Problem: when getting a listing of all the files in a region server's .logs directory, one file has a size of 0 but it contains data.
Answer: It's an HDFS quirk. A file that is currently being written to will appear to have a size of 0, but once it is closed it will show its true size.
Two common use cases for querying HDFS for HBase objects are researching the degree of uncompaction of a table and checking StoreFile sizes after a major compaction. If there are a large number of StoreFiles for each ColumnFamily, it could indicate the need for a major compaction. Additionally, if the resulting StoreFile is "small" after a major compaction, it could indicate the need for a reduction of ColumnFamilies for the table.
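The StoreFile count described above can be scripted. The following minimal sketch counts StoreFiles per Table/Region/ColumnFamily from a list of paths and prints only the families at or above a threshold you choose; storefiles_per_family is a hypothetical helper name. This section does not define how many StoreFiles is "a large number", so the threshold is left to you. The sketch assumes the /hbase root, and it skips names beginning with "." (such as .tmp) and recovered.edits directories; treating those as non-ColumnFamily entries is an assumption about what else may appear under a region directory.

```shell
# storefiles_per_family (hypothetical helper): count StoreFiles per
# <Table>/<Region>/<ColumnFamily> from HDFS paths (one per line on stdin)
# laid out as /hbase/<Table>/<Region>/<ColumnFamily>/<StoreFile>, and print
# only families holding at least MIN files (default 1).
storefiles_per_family() {
  local min="${1:-1}"
  # "/hbase/T/R/CF/SF" splits on "/" into $1="" $2="hbase" $3=T $4=R $5=CF $6=SF.
  # Assumption: dot-prefixed names and recovered.edits are not ColumnFamilies.
  awk -F/ -v min="$min" '
    $2 == "hbase" && NF == 6 && $3 !~ /^\./ && $4 !~ /^\./ && $5 !~ /^\./ &&
    $5 != "recovered.edits" { n[$3 "/" $4 "/" $5]++ }
    END { for (k in n) if (n[k] >= min) print k, n[k] }' | sort
}
```

For example, hadoop fs -ls -R /hbase/myTable | awk '{print $NF}' | storefiles_per_family 10 would list the ColumnFamilies of 'myTable' holding ten or more StoreFiles in some region (use hadoop fs -lsr on Hadoop 1.x).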