For more information on the HBase client, see Section 9.3, “Client”.
This is thrown if the time between RPC calls from the client to the RegionServer exceeds the scan timeout. For example, if Scan.setCaching is set to 500, then there will be an RPC call to fetch the next batch of rows every 500 .next() calls on the ResultScanner, because data is transferred to the client in blocks of 500 rows. Reducing the setCaching value may be an option, but setting it too low makes processing inefficient when many rows are returned.
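As a hedged illustration of the caching trade-off, the sketch below sets a smaller caching value so that each RPC batch is fetched well within the scanner lease; the table name and column family are hypothetical, and 100 is an example value, not a recommendation.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanCachingExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");   // hypothetical table name
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf"));          // hypothetical column family
    scan.setCaching(100);  // fewer rows per RPC, so each batch arrives within the scanner lease
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result result : scanner) {
        // Slow per-row processing here is what lets the scan timeout expire;
        // a smaller caching value shortens the gap between RPC calls.
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}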
In some situations clients that fetch data from a RegionServer get a LeaseException instead of the usual Section 15.5.1, “ScannerTimeoutException or UnknownScannerException”. Usually the source of the exception is org.apache.hadoop.hbase.regionserver.Leases.removeLease(Leases.java:230) (line number may vary). It tends to happen in the context of a slow/freezing RegionServer#next call. It can be prevented by ensuring that hbase.rpc.timeout is greater than hbase.regionserver.lease.period. Harsh J investigated the issue as part of the mailing list thread HBase, mail # user - Lease does not exist exceptions.
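A minimal sketch of expressing that relationship on the client side, reusing the HBase client imports from the caching example above; the 60000 ms lease period is an assumption, so verify your cluster's actual value, and these properties can equally be set in hbase-site.xml.

// Sketch only: 60000 ms is the assumed lease period; check your cluster's setting.
Configuration conf = HBaseConfiguration.create();
conf.setLong("hbase.regionserver.lease.period", 60000L);
conf.setLong("hbase.rpc.timeout", 120000L);   // keep hbase.rpc.timeout above the lease period
HTable table = new HTable(conf, "mytable");   // hypothetical table name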
Since 0.20.0 the default log level for org.apache.hadoop.hbase.* is DEBUG. On your clients, edit $HBASE_HOME/conf/log4j.properties and change this:

log4j.logger.org.apache.hadoop.hbase=DEBUG

to this:

log4j.logger.org.apache.hadoop.hbase=INFO

or even this:

log4j.logger.org.apache.hadoop.hbase=WARN
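If editing log4j.properties on every client is impractical, the same effect can be obtained at runtime; this is only a sketch using the log4j 1.x API that HBase clients of this era bundle, not something the section above prescribes.

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

// Sketch only: quiets HBase client logging for this JVM at runtime.
Logger.getLogger("org.apache.hadoop.hbase").setLevel(Level.INFO);
// Logger.getLogger("org.apache.hadoop.hbase").setLevel(Level.WARN);  // even quieter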
This is a fairly frequent question on the Apache HBase dist-list. The scenario is that a client is typically inserting a lot of data into a relatively un-optimized HBase cluster. Compression can exacerbate the pauses, although it is not the source of the problem.
See Section 14.8.2, “Table Creation: Pre-Creating Regions” for the pattern for pre-creating regions, and confirm that the table isn't starting with a single region.
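As a hedged illustration of pre-creating regions (the table name, column family, and split points below are hypothetical; choose splits that match your own key distribution):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
  public static void main(String[] args) throws Exception {
    // Sketch only: pre-split "mytable" into four regions at hypothetical key boundaries.
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));
    byte[][] splits = new byte[][] {
      Bytes.toBytes("row-25000000"),
      Bytes.toBytes("row-50000000"),
      Bytes.toBytes("row-75000000"),
    };
    admin.createTable(desc, splits);   // table starts with four regions instead of one
    admin.close();
  }
}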
See Section 14.4, “HBase Configurations” for cluster configuration, particularly hbase.hstore.blockingStoreFiles, hbase.hregion.memstore.block.multiplier, MAX_FILESIZE (region size), and MEMSTORE_FLUSHSIZE.
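The two table-level attributes can also be set when the table is created; a hedged sketch, reusing the admin and imports from the pre-splitting example above (the sizes are placeholders, not tuning recommendations):

// Sketch only: values are placeholders, not tuning advice.
HTableDescriptor desc = new HTableDescriptor("mytable");   // hypothetical table name
desc.addFamily(new HColumnDescriptor("cf"));
desc.setMaxFileSize(10L * 1024 * 1024 * 1024);    // MAX_FILESIZE: 10 GB region size
desc.setMemStoreFlushSize(128L * 1024 * 1024);    // MEMSTORE_FLUSHSIZE: 128 MB flush threshold
admin.createTable(desc);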
A slightly longer explanation of why pauses can happen is as follows: Puts are sometimes blocked on the MemStores, which are blocked by the flusher thread, which is blocked because there are too many files to compact, because the compactor is given too many small files to compact and has to compact the same data repeatedly. This situation can occur even with minor compactions. Compounding this situation, Apache HBase doesn't compress data in memory. Thus, the 64MB that lives in the MemStore could become a 6MB file after compression, which results in a smaller StoreFile. The upside is that more data is packed into the same region, but performance is achieved by being able to write larger files, which is why HBase waits until the flush size is reached before writing a new StoreFile. And smaller StoreFiles become targets for compaction. Without compression the files are much bigger and don't need as much compaction; however, this is at the expense of I/O.
For additional information, see this thread on Long client pauses with compression.
You may encounter the following error:
Secure Client Connect ([Caused by GSSException: No valid credentials provided (Mechanism level: Request is a replay (34) V PROCESS_TGS)])
This issue is caused by bugs in the MIT Kerberos replay_cache component, #1201 and #5924. These bugs caused the old version of krb5-server to erroneously block subsequent requests sent from a Principal. This caused krb5-server to block the connections sent from one Client (one HTable instance with multi-threading connection instances for each RegionServer). Messages such as Request is a replay (34) are logged in the client log. You can ignore the messages, because HTable will retry 5 * 10 (50) times for each failed connection by default. HTable will throw IOException if any connection to the RegionServer fails after the retries, so that the user client code for the HTable instance can handle it further.
Alternatively, update krb5-server to a version which solves these issues, such as krb5-server-1.10.3. See JIRA HBASE-10379 for more details.
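A hedged sketch of the handling path described above, in which client code only sees the IOException after HTable has exhausted its retries; it reuses the conf and imports from the earlier sketches, and the table name, row key, and values are hypothetical.

// Sketch only: illustrates where the post-retry IOException can be handled.
HTable table = new HTable(conf, "mytable");        // hypothetical table name
try {
  Put put = new Put(Bytes.toBytes("row1"));
  put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
  table.put(put);
} catch (IOException e) {
  // Reached only after the default retries are exhausted; the transient
  // "Request is a replay (34)" messages before this point can be ignored.
  // Handle or rethrow according to the application's needs.
  throw e;
} finally {
  table.close();
}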
Errors like this...
11/07/05 11:26:41 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11/07/05 11:26:43 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
11/07/05 11:26:44 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
11/07/05 11:26:45 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
... are either due to ZooKeeper being down, or unreachable due to network issues.
The utility Section 15.4.1.3, “zkcli” may help investigate ZooKeeper issues.
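If ZooKeeper itself is healthy, it is also worth confirming that the client is actually pointed at the right quorum rather than falling back to localhost; a hedged sketch of checking and overriding the setting, reusing the client imports from the earlier sketches (the host names are placeholders):

// Sketch only: host names are placeholders for your actual ZooKeeper quorum.
Configuration conf = HBaseConfiguration.create();
System.out.println("hbase.zookeeper.quorum = " + conf.get("hbase.zookeeper.quorum"));
System.out.println("hbase.zookeeper.property.clientPort = "
    + conf.get("hbase.zookeeper.property.clientPort"));
// Override explicitly if the client is defaulting to localhost:
conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");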
You are likely running into the issue that is described and worked through in the mail thread HBase, mail # user - Suspected memory leak and continued over in HBase, mail # dev - FeedbackRe: Suspected memory leak. A workaround is passing your client-side JVM a reasonable value for -XX:MaxDirectMemorySize. By default, MaxDirectMemorySize is equal to your -Xmx max heapsize setting (if -Xmx is set). Try setting it to something smaller (for example, one user had success setting it to 1g when they had a client-side heap of 12g). If you set it too small, it will bring on FullGCs, so keep it a bit hefty. You want to make this setting client-side only, especially if you are running the new experimental server-side off-heap cache, since this feature depends on being able to use big direct buffers (you may have to keep separate client-side and server-side config dirs).
This is a client issue fixed by HBASE-5073 in 0.90.6. There was a ZooKeeper leak in the client and the client was getting pummeled by ZooKeeper events with each additional invocation of the admin API.
There can be several causes that produce this symptom.
First, check that you have a valid Kerberos ticket. One is required in order to set up communication with a secure Apache HBase cluster. Examine the ticket currently in the credential cache, if any, by running the klist command line utility. If no ticket is listed, you must obtain a ticket by running the kinit command with either a keytab specified, or by interactively entering a password for the desired principal.
Then, consult the Java Security Guide troubleshooting section. The most common problem addressed there is resolved by setting the javax.security.auth.useSubjectCredsOnly system property to false.
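That property can be supplied on the JVM command line as -Djavax.security.auth.useSubjectCredsOnly=false, or set early in the client's own code; a minimal sketch of the latter:

// Sketch only: equivalent to passing -Djavax.security.auth.useSubjectCredsOnly=false to the JVM.
// Must run before any Kerberos/GSSAPI login is attempted.
System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");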
Because of a change in the format in which MIT Kerberos writes its credentials cache, there is a bug in the Oracle JDK 6 Update 26 and earlier that causes Java to be unable to read the Kerberos credentials cache created by versions of MIT Kerberos 1.8.1 or higher. If you have this problematic combination of components in your environment, to work around this problem, first log in with kinit and then immediately refresh the credential cache with kinit -R. The refresh will rewrite the credential cache without the problematic formatting.
Finally, depending on your Kerberos configuration, you may need to install the Java Cryptography Extension, or JCE. Ensure the JCE jars are on the classpath on both server and client systems.
You may also need to download the unlimited strength JCE policy files. Uncompress and extract the downloaded file, and install the policy jars into <java-home>/lib/security.
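One way to verify from the client JVM itself whether the unlimited strength policy files are in effect is to check the maximum allowed AES key length; this is a generic JCE check offered as a sketch, not a step the section above mandates.

import javax.crypto.Cipher;

public class JcePolicyCheck {
  public static void main(String[] args) throws Exception {
    // Sketch only: with the default (limited) policy this typically prints 128;
    // with the unlimited strength policy files installed it prints a much larger value.
    System.out.println("Max allowed AES key length: " + Cipher.getMaxAllowedKeyLength("AES"));
  }
}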