Chapter 13. Apache HBase Coprocessors

Table of Contents

13.1. Coprocessor Framework
13.2. Examples
13.3. Building A Coprocessor
13.3.1. Load from Configuration
13.3.2. Load from the HBase Shell
13.4. Check the Status of a Coprocessor
13.5. Monitor Time Spent in Coprocessors
13.6. Status of Coprocessors in HBase

HBase coprocessors are modeled after the coprocessors that are part of Google's BigTable (http://www.scribd.com/doc/21631448/Dean-Keynote-Ladis2009, pages 66-67). Coprocessors function in a similar way to Linux kernel modules: they provide a way to run server-level code against locally stored data. The functionality they provide is powerful, but it also carries significant risk, because a faulty coprocessor can have adverse effects on the system, down to the level of the operating system. The information in this chapter is primarily sourced and heavily reused from Mingjie Lai's blog post at https://blogs.apache.org/hbase/entry/coprocessor_introduction.

Coprocessors are not designed to be used by end users of HBase, but by HBase developers who need to add specialized functionality to HBase. One example of the use of coprocessors is pluggable compaction and scan policies, which are provided as coprocessors in HBASE-6427.

13.1. Coprocessor Framework

The implementation of HBase coprocessors diverges from the BigTable implementation. The HBase framework provides a library and runtime environment for executing user code within the HBase region server and master processes.

The framework API is provided in the coprocessor package.

Two different types of coprocessors are provided by the framework, based on their scope.

Types of Coprocessors

System Coprocessors

System coprocessors are loaded globally on all tables and regions hosted by a region server.

Table Coprocessors

Table coprocessors are specified on a per-table basis and are loaded on all regions of that table.

The framework also provides two kinds of extension: observers and endpoints.

Observers

Observers are analogous to triggers in conventional databases. They allow you to insert user code by overriding upcall methods provided by the coprocessor framework. Callback functions are executed from core HBase code when events occur. Callbacks are handled by the framework, and the coprocessor itself only needs to insert the extended or alternate functionality.

Provided Observer Interfaces

RegionObserver

A RegionObserver provides hooks for data manipulation events, such as Get, Put, Delete, and Scan. An instance of a RegionObserver coprocessor exists for each table region. The scope of the observations a RegionObserver can make is constrained to that region.

RegionServerObserver

A RegionServerObserver provides hooks for operations related to the RegionServer, such as stopping the RegionServer and performing operations before or after merges, commits, or rollbacks.

WALObserver

A WALObserver provides hooks for operations related to the write-ahead log (WAL). You can observe or intercept WAL writing and reconstruction events. A WALObserver runs in the context of WAL processing. A single WALObserver exists on a single region server.

MasterObserver

A MasterObserver provides hooks for DDL-type operations, such as creating, deleting, or modifying a table. The MasterObserver runs within the context of the HBase master.

More than one observer of a given type can be loaded at once. Multiple observers are chained to execute sequentially by order of assigned priority. Nothing prevents a coprocessor implementor from communicating internally among its installed observers.

An observer of a higher priority can preempt lower-priority observers by throwing an IOException or a subclass of IOException.
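The chaining and preemption behavior described above can be illustrated with a small, self-contained model. This is a sketch only: `PutObserver`, `Registered`, and `ObserverChainSketch` are hypothetical names invented for illustration and are not HBase classes, and the sketch assumes that a lower numeric value means a higher priority.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified stand-in for an observer hook; NOT the HBase RegionObserver interface.
interface PutObserver {
    void prePut(String row, List<String> log) throws IOException;
}

// Hypothetical holder pairing an observer with its priority.
// Assumption: a lower value means a higher priority and therefore runs first.
final class Registered {
    final int priority;
    final PutObserver observer;

    Registered(int priority, PutObserver observer) {
        this.priority = priority;
        this.observer = observer;
    }
}

final class ObserverChainSketch {
    private final List<Registered> chain = new ArrayList<>();

    // Adds an observer and keeps the chain sorted by priority.
    void register(int priority, PutObserver observer) {
        chain.add(new Registered(priority, observer));
        chain.sort(Comparator.comparingInt(r -> r.priority));
    }

    // Runs every prePut hook in priority order, then performs the "put".
    // An IOException from any hook skips the remaining hooks and the put itself.
    boolean put(String row, List<String> log) {
        try {
            for (Registered r : chain) {
                r.observer.prePut(row, log);
            }
        } catch (IOException e) {
            log.add("aborted: " + e.getMessage());
            return false;
        }
        log.add("put " + row);
        return true;
    }

    // Builds a two-observer chain and returns the event log for a single put.
    static List<String> demo(String row) {
        ObserverChainSketch region = new ObserverChainSketch();
        // Registered first, but with the larger priority value, so it runs second.
        region.register(20, (r, log) -> log.add("audit " + r));
        region.register(10, (r, log) -> {
            if (r.startsWith("forbidden")) {
                throw new IOException("row rejected");
            }
            log.add("checked " + r);
        });
        List<String> log = new ArrayList<>();
        region.put(row, log);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo("row1"));
        System.out.println(demo("forbidden-row"));
    }
}
```

In this model, the check registered with priority 10 runs before the audit observer, and when it throws, neither the audit observer nor the put is executed.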

Endpoints (HBase 0.96.x and later)

The implementation of endpoints changed significantly in HBase 0.96.x due to the introduction of protocol buffers (protobufs) (HBASE-5488). If you created endpoints before 0.96.x, you will need to rewrite them. Endpoints are now defined and callable as protobuf services, rather than as endpoint invocations passed through as Writable blobs.

Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any time from the client. When it is invoked, it is executed remotely at the target region or regions, and results of the executions are returned to the client.

The endpoint implementation is installed on the server and is invoked using HBase RPC. The client library provides convenience methods for invoking these dynamic interfaces.

An endpoint, like an observer, can communicate with any installed observers. This allows you to plug new features into HBase without modifying or recompiling HBase itself.

Steps to Implement an Endpoint

  • Define the coprocessor service and related messages in a .proto file

  • Run the protoc command to generate the code.

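  • Write code to implement the following:

    • The generated protobuf Service interface

    • The new org.apache.hadoop.hbase.coprocessor.CoprocessorService interface (required for the RegionCoprocessorHost to register the exposed service)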

  • The client calls the new HTable.coprocessorService() methods to perform the endpoint RPCs.

For more information and examples, refer to the API documentation for the coprocessor package, as well as the included RowCount example in the /hbase-examples/src/test/java/org/apache/hadoop/hbase/coprocessor/example/ directory of the HBase source.
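The general shape of an endpoint call, in which code runs once per region against local data and the client combines the partial results, can be modeled without any HBase dependency. The following is a minimal in-process sketch in the spirit of a row count. `RegionSketch` and `EndpointSketch` are hypothetical names invented for illustration. The sketch does not use protobuf, HBase RPC, or the real `coprocessorService()` methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical in-memory stand-in for one region: a sorted slice of a table's rows.
final class RegionSketch {
    final TreeMap<String, String> rows = new TreeMap<>();
}

final class EndpointSketch {
    // "Server side": the endpoint body runs once per region, against local rows only.
    static long countRows(RegionSketch region) {
        return region.rows.size();
    }

    // "Client side": invoke the endpoint on every region and collect one result per region.
    static <R> List<R> invokeOnAll(List<RegionSketch> regions,
                                   Function<RegionSketch, R> endpoint) {
        List<R> results = new ArrayList<>();
        for (RegionSketch region : regions) {
            results.add(endpoint.apply(region));
        }
        return results;
    }

    // Combine the per-region partial counts into a table-wide total.
    static long totalRows(List<RegionSketch> regions) {
        long total = 0;
        for (long partial : invokeOnAll(regions, EndpointSketch::countRows)) {
            total += partial;
        }
        return total;
    }

    // Builds a sample "table" split into two regions holding 2 and 3 rows.
    static List<RegionSketch> sampleTable() {
        RegionSketch first = new RegionSketch();
        first.rows.put("apple", "1");
        first.rows.put("banana", "2");
        RegionSketch second = new RegionSketch();
        second.rows.put("mango", "3");
        second.rows.put("peach", "4");
        second.rows.put("plum", "5");
        List<RegionSketch> regions = new ArrayList<>();
        regions.add(first);
        regions.add(second);
        return regions;
    }

    public static void main(String[] args) {
        System.out.println("total rows = " + totalRows(sampleTable()));
    }
}
```

The point of the design is that only one small partial result per region travels back to the caller, rather than the rows themselves.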

Endpoints (HBase 0.94.x and earlier)

Dynamic RPC endpoints resemble stored procedures. An endpoint can be invoked at any time from the client. When it is invoked, it is executed remotely at the target region or regions, and results of the executions are returned to the client.

The endpoint implementation is installed on the server and is invoked using HBase RPC. The client library provides convenience methods for invoking these dynamic interfaces.

An endpoint, like an observer, can communicate with any installed observers. This allows you to plug new features into HBase without modifying or recompiling HBase itself.

Steps to Implement an Endpoint

  • Server-Side

    • Create a new protocol interface that extends CoprocessorProtocol.

    • Implement the Endpoint interface. The implementation will be loaded into and executed from the region context.

    • Extend the abstract class BaseEndpointCoprocessor. This convenience class hides some internal details that the implementer does not need to be concerned about, such as coprocessor framework class loading.

  • Client-Side

    An endpoint can be invoked using two new HBase client APIs:

    • HTableInterface.coprocessorProxy(Class<T> protocol, byte[] row) for executing against a single region

    • HTableInterface.coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable) for executing over a range of regions
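The difference between targeting a single region by row key and targeting a range of regions by start and end key can be pictured with a small lookup model. This is a sketch under stated assumptions, not the HBase client implementation: `RegionRoutingSketch` and the region names are hypothetical, regions are identified only by their start keys, the end key is treated as inclusive, and the caller is assumed to pass a start key that is not greater than the end key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class RegionRoutingSketch {
    // Maps each region's start key to a hypothetical region name.
    // The empty string is the start key of the first region.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String name) {
        regionsByStartKey.put(startKey, name);
    }

    // Single-row targeting: pick the region with the greatest start key <= row.
    String regionFor(String row) {
        return regionsByStartKey.floorEntry(row).getValue();
    }

    // Range targeting: every region overlapping [startKey, endKey], in key order.
    // Assumes startKey <= endKey.
    List<String> regionsFor(String startKey, String endKey) {
        String first = regionsByStartKey.floorKey(startKey);
        return new ArrayList<>(
                regionsByStartKey.subMap(first, true, endKey, true).values());
    }

    // Builds a sample table with three regions starting at "", "g", and "p".
    static RegionRoutingSketch sample() {
        RegionRoutingSketch table = new RegionRoutingSketch();
        table.addRegion("", "region-1");
        table.addRegion("g", "region-2");
        table.addRegion("p", "region-3");
        return table;
    }

    public static void main(String[] args) {
        RegionRoutingSketch table = sample();
        System.out.println(table.regionFor("hbase"));
        System.out.println(table.regionsFor("c", "k"));
    }
}
```

In this model, a row key always resolves to exactly one region, while a key range may resolve to several, each of which would run the endpoint and return its own result.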
