8.3. Securing Access To Your Data

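After you have configured secure authentication between HBase client and server processes and gateways, you need to consider the security of your data itself. HBase provides several strategies for securing your data:

  • Access control (ACLs), which determines which users or groups can read, write, create, administer, or execute against a given resource. See Section 8.3.2.

  • Visibility labels, which let you label cells and restrict access to labelled cells. See Section 8.3.3.

  • Transparent encryption of data at rest, in HFiles and the WAL. See Section 8.3.4.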

Server-side configuration, administration, and implementation details of each of these features are discussed below, along with any performance trade-offs. An example security configuration is given at the end, to show these features all used together, as they might be in a real-world scenario.

Caution

All aspects of security in HBase are in active development and evolving rapidly. Any strategy you employ for security of your data should be thoroughly tested. In addition, some of these features are still in the experimental stage of development. To take advantage of many of these features, you must be running HBase 0.98+ and using the HFile v3 file format.

Protecting Sensitive Files

Several procedures in this section require you to copy files between cluster nodes. When copying keys, configuration files, or other files containing sensitive strings, use a secure method, such as ssh, to avoid leaking sensitive data.

Procedure 8.1. Basic Server-Side Configuration

  1. Enable HFile v3, by setting hfile.format.version to 3 in hbase-site.xml. This is the default for HBase 1.0 and newer.

    <property>
      <name>hfile.format.version</name>
      <value>3</value>
    </property>
              
  2. Enable SASL and Kerberos authentication for RPC and ZooKeeper, as described in Section 8.1.1, “Prerequisites” and Section 20.2, “SASL Authentication with ZooKeeper”.

8.3.1. Tags

Tags are a feature of HFile v3. A tag is a piece of metadata which is part of a cell, separate from the key, value, and version. Tags are an implementation detail which provides a foundation for other security-related features such as cell-level ACLs and visibility labels. Tags are stored in the HFiles themselves. It is possible that in the future, tags will be used to implement other HBase features. You don't need to know a lot about tags in order to use the security features they enable.

8.3.1.1. Implementation Details

Every cell can have zero or more tags. Every tag has a type and the actual tag byte array.

Just as row keys, column families, qualifiers and values can be encoded (see Data Block Encoding Types), tags can also be encoded. You can enable or disable tag encoding at the level of the column family, and it is enabled by default. Use the HColumnDescriptor#setCompressionTags(boolean compressTags) method to manage encoding settings on a column family. You also need to enable the DataBlockEncoder for the column family, for encoding of tags to take effect.

You can enable compression of each tag in the WAL, if WAL compression is also enabled, by setting the value of hbase.regionserver.wal.tags.enablecompression to true in hbase-site.xml. Tag compression uses dictionary encoding.

Tag compression is not supported when using WAL encryption.
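To make the idea of dictionary encoding concrete, here is a minimal sketch using only the Java standard library. It is not HBase's WAL format or code: the class name TagDictionarySketch, the string representation of tags, and the "L:" and "D:" token prefixes are all assumptions made for illustration. The first time a tag is seen it is written as a literal and added to the dictionary; later occurrences are written as a short dictionary index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of dictionary encoding; not an HBase class.
class TagDictionarySketch {

    // First occurrence of a tag -> literal token "L:<tag>", added to the dictionary.
    // Repeated occurrence       -> reference token "D:<index>".
    static List<String> encode(List<String> tags) {
        Map<String, Integer> dictionary = new HashMap<>();
        List<String> tokens = new ArrayList<>();
        for (String tag : tags) {
            Integer index = dictionary.get(tag);
            if (index == null) {
                dictionary.put(tag, dictionary.size());
                tokens.add("L:" + tag);
            } else {
                tokens.add("D:" + index);
            }
        }
        return tokens;
    }

    // The decoder rebuilds the same dictionary from the literals it reads.
    static List<String> decode(List<String> tokens) {
        List<String> dictionary = new ArrayList<>();
        List<String> tags = new ArrayList<>();
        for (String token : tokens) {
            String payload = token.substring(2);
            if (token.startsWith("L:")) {
                dictionary.add(payload);
                tags.add(payload);
            } else {
                tags.add(dictionary.get(Integer.parseInt(payload)));
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("acl:alice=R", "vis:secret", "acl:alice=R");
        List<String> tokens = encode(tags);
        System.out.println(tokens);          // [L:acl:alice=R, L:vis:secret, D:0]
        System.out.println(decode(tokens));  // [acl:alice=R, vis:secret, acl:alice=R]
    }
}
```

The benefit grows with repetition: when many cells carry the same tag, most occurrences shrink to a small index.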

8.3.2. Access Control Lists (ACLs)

8.3.2.1. How It Works

ACLs in HBase are based upon a user's membership in or exclusion from groups, and a given group's permissions to access a given resource. ACLs are implemented as a coprocessor called AccessController.

HBase does not maintain a private group mapping, but relies on a Hadoop group mapper, which maps between entities in a directory such as LDAP or Active Directory, and HBase users. Any supported Hadoop group mapper will work. Users are then granted specific permissions (Read, Write, Execute, Create, Admin) against resources (global, namespaces, tables, cells, or endpoints).

Note

With Kerberos and Access Control enabled, client access to HBase is authenticated and user data is private unless access has been explicitly granted.

HBase has a simpler security model than relational databases, especially in terms of client operations. No distinction is made between an insert (new record) and update (of existing record), for example, as both collapse down into a Put. Accordingly, the important operations condense to four permissions: READ, WRITE, CREATE, and ADMIN. A fifth permission, EXECUTE, applies only to coprocessor endpoints.

Table 8.1. Operation To Permission Mapping

Permission  Operation
Read        Get
            Exists
            Scan
Write       Put
            Delete
            IncrementColumnValue
            CheckAndDelete/Put
Create      Create
            Alter
            Drop
            Bulk Load
Admin       Enable/Disable
            Snapshot/Restore/Clone
            Split
            Flush
            Compact
            Major Compact
            Roll HLog
            Grant
            Revoke
            Shutdown
Execute     Execute coprocessor endpoints
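
Table 8.1 can be read as a lookup from an operation to the permission it requires. The following sketch models that lookup with plain Java enums. The names PermissionMappingSketch, Permission, Operation, and isAllowed are hypothetical and are not the HBase classes of similar name; the sketch only restates the table and does not reproduce how the AccessController performs its checks.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of Table 8.1; these enums are not HBase API types.
class PermissionMappingSketch {

    enum Permission { READ, WRITE, CREATE, ADMIN, EXECUTE }

    // Each operation carries the single permission that Table 8.1 lists for it.
    enum Operation {
        GET(Permission.READ), EXISTS(Permission.READ), SCAN(Permission.READ),
        PUT(Permission.WRITE), DELETE(Permission.WRITE),
        INCREMENT_COLUMN_VALUE(Permission.WRITE), CHECK_AND_DELETE_OR_PUT(Permission.WRITE),
        CREATE(Permission.CREATE), ALTER(Permission.CREATE),
        DROP(Permission.CREATE), BULK_LOAD(Permission.CREATE),
        ENABLE_DISABLE(Permission.ADMIN), SNAPSHOT_RESTORE_CLONE(Permission.ADMIN),
        SPLIT(Permission.ADMIN), FLUSH(Permission.ADMIN), COMPACT(Permission.ADMIN),
        MAJOR_COMPACT(Permission.ADMIN), ROLL_HLOG(Permission.ADMIN),
        GRANT(Permission.ADMIN), REVOKE(Permission.ADMIN), SHUTDOWN(Permission.ADMIN),
        EXECUTE_ENDPOINT(Permission.EXECUTE);

        private final Permission required;

        Operation(Permission required) {
            this.required = required;
        }

        Permission required() {
            return required;
        }
    }

    // An operation is allowed when the granted set contains its required permission.
    static boolean isAllowed(Set<Permission> granted, Operation operation) {
        return granted.contains(operation.required());
    }

    public static void main(String[] args) {
        Set<Permission> granted = EnumSet.of(Permission.READ, Permission.WRITE);
        System.out.println(isAllowed(granted, Operation.PUT));    // true
        System.out.println(isAllowed(granted, Operation.FLUSH));  // false
    }
}
```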

Permissions can be granted in any of the following scopes, though CREATE and ADMIN permissions are effective only at table, namespace, and global scopes.

Namespace
  • Read: User can read any table in the namespace.

  • Write: User can write to any table in the namespace.

  • Create: User can create tables in the namespace.

  • Admin: User can alter table attributes; add, alter, or drop column families; and enable, disable, or drop the table. User can also trigger region (re)assignments or relocation.

Table
  • Read: User can read from any column family in the table.

  • Write: User can write to any column family in the table.

  • Create: User can alter table attributes; add, alter, or drop column families; and drop the table.

  • Admin: User can alter table attributes; add, alter, or drop column families; and enable, disable, or drop the table. User can also trigger region (re)assignments or relocation.

Column Family / Column Qualifier / Cell
  • Read: User can read at the specified scope.

  • Write: User can write at the specified scope.

Coprocessor Endpoint
  • Execute: User can execute the coprocessor endpoint.

Global

Superusers are specified as a comma-separated list of users and groups, in the hbase.superuser option in hbase-site.xml. The superuser is equivalent to the root user in a UNIX environment. At a minimum, the superuser list should include the principal used to run the HMaster process. Global admin privileges, which are implicitly granted to the superuser, are required to create namespaces, switch the balancer on and off, or take other actions with global consequences. The superuser can also grant all permissions to all resources.

ACL Matrix. For more details on how ACLs map to specific HBase operations and tasks, see Appendix D, Access Control Matrix.

Cell-level ACLs are implemented using tags (see Section 8.3.1, “Tags”). In order to use cell-level ACLs, you must be using HFile v3 and HBase 0.98 or newer.

ACL Implementation Caveats

  1. Files created by HBase are owned by the operating system user running the HBase process. To interact with HBase files, you should use the API or bulk load facility.

  2. HBase does not model "roles" internally. Instead, group names can be granted permissions. This allows external modeling of roles via group membership. Groups are created and manipulated externally to HBase, via the Hadoop group mapping service.

8.3.2.2. Server-Side Configuration

  1. As a prerequisite, perform the steps in Procedure 8.1, “Basic Server-Side Configuration”.

  2. Install and configure the AccessController coprocessor, by setting the following properties in hbase-site.xml. These properties take a list of classes.

    Note

    If you use the AccessController along with the VisibilityController, the AccessController must come first in the list, because with both components active, the VisibilityController will delegate access control on its system tables to the AccessController. For an example of using both together, see Section 8.4, “Security Configuration Example”.

    <property>
      <name>hbase.coprocessor.region.classes</name>
      <value>org.apache.hadoop.hbase.security.access.AccessController, org.apache.hadoop.hbase.security.token.TokenProvider</value>
    </property>
    <property>
      <name>hbase.coprocessor.master.classes</name>
      <value>org.apache.hadoop.hbase.security.access.AccessController</value>
    </property>
    <property>
      <name>hbase.coprocessor.regionserver.classes</name>
      <value>org.apache.hadoop.hbase.security.access.AccessController</value>
    </property>
    <property>
      <name>hbase.security.exec.permission.checks</name>
      <value>true</value>
    </property>
              

    Optionally, you can enable transport security, by setting hbase.rpc.protection to auth-conf. This requires HBase 0.98.4 or newer.

  3. Set up the Hadoop group mapper in the Hadoop namenode's core-site.xml. This is a Hadoop file, not an HBase file. Customize it to your site's needs. Following is an example.

    <property>
      <name>hadoop.security.group.mapping</name>
      <value>org.apache.hadoop.security.LdapGroupsMapping</value>
    </property>
    
    <property>
      <name>hadoop.security.group.mapping.ldap.url</name>
      <value>ldap://server</value>
    </property>
    
    <property>
      <name>hadoop.security.group.mapping.ldap.bind.user</name>
      <value>Administrator@example-ad.local</value>
    </property>
    
    <property>
      <name>hadoop.security.group.mapping.ldap.bind.password</name>
      <value>****</value>
    </property>
    
    <property>
      <name>hadoop.security.group.mapping.ldap.base</name>
      <value>dc=example-ad,dc=local</value>
    </property>
    
    <property>
      <name>hadoop.security.group.mapping.ldap.search.filter.user</name>
      <value>(&amp;(objectClass=user)(sAMAccountName={0}))</value>
    </property>
    
    <property>
      <name>hadoop.security.group.mapping.ldap.search.filter.group</name>
      <value>(objectClass=group)</value>
    </property>
    
    <property>
      <name>hadoop.security.group.mapping.ldap.search.attr.member</name>
      <value>member</value>
    </property>
    
    <property>
      <name>hadoop.security.group.mapping.ldap.search.attr.group.name</name>
      <value>cn</value>
    </property>
                
  4. Optionally, enable the early-out evaluation strategy. Prior to HBase 0.98.0, if a user was not granted access to a column family, or at least a column qualifier, an AccessDeniedException would be thrown. HBase 0.98.0 removed this exception in order to allow cell-level exceptional grants. To restore the old behavior in HBase 0.98.0-0.98.6, set hbase.security.access.early_out to true in hbase-site.xml. In HBase 0.98.6, the default has been returned to true.

  5. Distribute your configuration and restart your cluster for changes to take effect.

  6. To test your configuration, log into HBase Shell as a given user and use the whoami command to report the groups your user is part of. In this example, the user is reported as being a member of the services group.

    hbase> whoami
    service (auth:KERBEROS)
        groups: services
                

8.3.2.3. Administration

Administration tasks can be performed from HBase Shell or via an API.

API Examples

Many of the API examples below are taken from source files hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java and hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/SecureTestUtil.java.

Neither the examples, nor the source files they are taken from, are part of the public HBase API, and are provided for illustration only. Refer to the official API for usage instructions.

  1. User and Group Administration

    Users and groups are maintained external to HBase, in your directory.

  2. Granting Access To A Namespace, Table, Column Family, or Cell

    There are a few different types of syntax for grant statements. The first, and most familiar, is as follows, with the table, column family, and column qualifier being optional:

    grant 'user', 'RWXCA', 'TABLE', 'CF', 'CQ'

    Groups and users are granted access in the same way, but groups are prefixed with an @ symbol. Likewise, tables and namespaces are specified in the same way, but namespaces are prefixed with an @ symbol.

    It is also possible to grant multiple permissions against the same resource in a single statement. The cell-level syntax in Example 8.1 uses this form: the first sub-clause maps users to ACLs and the second sub-clause specifies the resource.

    Note

    HBase Shell support for granting and revoking access at the cell level is for testing and verification support, and should not be employed for production use because it won't apply the permissions to cells that don't exist yet. The correct way to apply cell level permissions is to do so in the application code when storing the values.

    ACL Granularity and Evaluation Order. ACLs are evaluated from least granular to most granular, and when an ACL is reached that grants permission, evaluation stops. This means that cell ACLs do not override ACLs at less granularity.

    Example 8.1. HBase Shell

    • Global:

      hbase> grant '@admins', 'RWXCA'
    • Namespace:

      hbase> grant 'service', 'RWXCA', '@test-NS'
    • Table:

      hbase> grant 'service', 'RWXCA', 'user'
    • Column Family:

      hbase> grant '@developers', 'RW', 'user', 'i'
    • Column Qualifier:

      hbase> grant 'service', 'RW', 'user', 'i', 'foo'
    • Cell:

      Cell ACLs are granted using the following syntax:

      grant <table>, \
        { '<user-or-group>' => \
          '<permissions>', ... }, \
        { <scanner-specification> }
      • <user-or-group> is the user or group name, prefixed with @ in the case of a group.

      • <permissions> is a string containing any or all of "RWXCA", though only R and W are meaningful at cell scope.

      • <scanner-specification> is the scanner specification syntax and conventions used by the 'scan' shell command. For some examples of scanner specifications, issue the following HBase Shell command.

        hbase> help "scan"

      This example grants read access to the 'testuser' user and read/write access to the 'developers' group, on cells in the 'pii' column which match the filter.

      hbase> grant 'user', \
        { '@developers' => 'RW', 'testuser' => 'R' }, \
        { COLUMNS => 'pii', FILTER => "(PrefixFilter ('test'))" }

      The shell will run a scanner with the given criteria, rewrite the found cells with new ACLs, and store them back to their exact coordinates.


    Example 8.2. API

    The following example shows how to grant access at the table level.

    public static void grantOnTable(final HBaseTestingUtility util, final String user,
        final TableName table, final byte[] family, final byte[] qualifier,
        final Permission.Action... actions) throws Exception {
      SecureTestUtil.updateACLs(util, new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          HTable acl = new HTable(util.getConfiguration(), AccessControlLists.ACL_TABLE_NAME);
          try {
            BlockingRpcChannel service = acl.coprocessorService(HConstants.EMPTY_START_ROW);
            AccessControlService.BlockingInterface protocol =
                AccessControlService.newBlockingStub(service);
            ProtobufUtil.grant(protocol, user, table, family, qualifier, actions);
          } finally {
            acl.close();
          }
          return null;
        }
      });
    }               
                  

    To grant permissions at the cell level, you can use the Mutation.setACL method:

    Mutation.setACL(String user, Permission perms)
    Mutation.setACL(Map<String, Permission> perms)
        
                  

    Specifically, this example provides read permission to a user called user1 on any cells contained in a particular Put operation:

    put.setACL("user1", new Permission(Permission.Action.READ));
        

  3. Revoking Access Control From a Namespace, Table, Column Family, or Cell

    The revoke command and API are twins of the grant command and API, and the syntax is exactly the same. The only exception is that you cannot revoke permissions at the cell level. You can only revoke access that has previously been granted, and a revoke statement is not the same thing as explicit denial to a resource.

    Note

    HBase Shell support for granting and revoking access at the cell level is for testing and verification support, and should not be employed for production use because it won't apply the permissions to cells that don't exist yet. The correct way to apply cell-level permissions is to do so in the application code when storing the values.

    Example 8.3. Revoking Access To a Table

    public static void revokeFromTable(final HBaseTestingUtility util, final String user,
        final TableName table, final byte[] family, final byte[] qualifier,
        final Permission.Action... actions) throws Exception {
      SecureTestUtil.updateACLs(util, new Callable<Void>() {
        @Override
        public Void call() throws Exception {
          HTable acl = new HTable(util.getConfiguration(), AccessControlLists.ACL_TABLE_NAME);
          try {
            BlockingRpcChannel service = acl.coprocessorService(HConstants.EMPTY_START_ROW);
            AccessControlService.BlockingInterface protocol =
                AccessControlService.newBlockingStub(service);
            ProtobufUtil.revoke(protocol, user, table, family, qualifier, actions);
          } finally {
            acl.close();
          }
          return null;
        }
      });
    } 
                  

  4. Showing a User's Effective Permissions

    Example 8.4. HBase Shell

    hbase> user_permission 'user'
    hbase> user_permission '.*'
    hbase> user_permission JAVA_REGEX

    Example 8.5. API

    public static void verifyAllowed(User user, AccessTestAction action, int count) throws Exception {
      try {
        Object obj = user.runAs(action);
        if (obj != null && obj instanceof List<?>) {
          List<?> results = (List<?>) obj;
          if (results != null && results.isEmpty()) {
            fail("Empty non null results from action for user '" + user.getShortName() + "'");
          }
          assertEquals(count, results.size());
        }
      } catch (AccessDeniedException ade) {
        fail("Expected action to pass for user '" + user.getShortName() + "' but was denied");
      }
    }
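
The rule stated earlier under ACL Granularity and Evaluation Order (evaluate from least granular to most granular, and stop at the first ACL that grants permission) can be sketched with the Java standard library alone. This is a simplified model, not the AccessController's code: the class AclEvaluationSketch, its Scope and Action enums, and the flat map of grants per scope are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of ACL evaluation order; not an HBase class.
class AclEvaluationSketch {

    // Declared from least granular to most granular, so values() yields that order.
    enum Scope { GLOBAL, NAMESPACE, TABLE, COLUMN_FAMILY, COLUMN_QUALIFIER, CELL }

    enum Action { READ, WRITE, EXEC, CREATE, ADMIN }

    // Returns the first (least granular) scope whose grants include the wanted
    // action, or empty if no scope grants it. Evaluation stops at the first grant.
    static Optional<Scope> firstGrantingScope(Map<Scope, Set<Action>> grants, Action wanted) {
        for (Scope scope : Scope.values()) {
            Set<Action> grantedHere = grants.get(scope);
            if (grantedHere != null && grantedHere.contains(wanted)) {
                return Optional.of(scope);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<Scope, Set<Action>> grants = new EnumMap<>(Scope.class);
        grants.put(Scope.TABLE, EnumSet.of(Action.READ));
        grants.put(Scope.CELL, EnumSet.of(Action.READ, Action.WRITE));

        System.out.println(firstGrantingScope(grants, Action.READ));   // Optional[TABLE]
        System.out.println(firstGrantingScope(grants, Action.WRITE));  // Optional[CELL]
        System.out.println(firstGrantingScope(grants, Action.ADMIN));  // Optional.empty
    }
}
```

In the READ case the table-level grant is found first and the cell ACL is never consulted, which is why a cell ACL cannot override a grant at a less granular scope.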
                  

8.3.3. Visibility Labels

Visibility labels can be used to permit only users or principals associated with a given label to read or access cells with that label. For instance, you might label a cell top-secret, and only grant access to that label to the managers group. Visibility labels are implemented using tags, which are a feature of HFile v3 and allow you to store metadata on a per-cell basis. A label is a string, and labels can be combined into expressions by using logical operators (&, |, or !) and parentheses for grouping. HBase does not validate expressions beyond basic well-formedness. Visibility labels have no meaning on their own, and may be used to denote sensitivity level, privilege level, or any other arbitrary semantic meaning.

If a user's labels do not match a cell's label or expression, the user is denied access to the cell.
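
To illustrate how a label expression is matched against a user's labels, the following sketch evaluates an expression with a small recursive-descent parser. It is not HBase's expression parser. The class name VisibilityExpressionSketch, the permits method, the set of characters accepted in a label (letters, digits, underscore, hyphen), and the operator precedence (! binds tighter than &, which binds tighter than |) are all assumptions of this sketch; use parentheses in real expressions rather than relying on precedence.

```java
import java.util.Set;

// Hypothetical evaluator for label expressions; not an HBase class.
class VisibilityExpressionSketch {
    private final String text;
    private final Set<String> userLabels;
    private int pos;

    private VisibilityExpressionSketch(String text, Set<String> userLabels) {
        this.text = text;
        this.userLabels = userLabels;
    }

    // True if a user holding userLabels satisfies the expression.
    static boolean permits(String expression, Set<String> userLabels) {
        VisibilityExpressionSketch parser = new VisibilityExpressionSketch(expression, userLabels);
        boolean result = parser.parseOr();
        if (parser.peek() != '\0') {
            throw new IllegalArgumentException("unexpected input at offset " + parser.pos);
        }
        return result;
    }

    // or := and ('|' and)*
    private boolean parseOr() {
        boolean value = parseAnd();
        while (peek() == '|') {
            pos++;
            boolean right = parseAnd();  // always parse the operand so input is consumed
            value = value | right;
        }
        return value;
    }

    // and := unary ('&' unary)*
    private boolean parseAnd() {
        boolean value = parseUnary();
        while (peek() == '&') {
            pos++;
            boolean right = parseUnary();
            value = value & right;
        }
        return value;
    }

    // unary := '!' unary | '(' or ')' | label
    private boolean parseUnary() {
        char c = peek();
        if (c == '!') {
            pos++;
            return !parseUnary();
        }
        if (c == '(') {
            pos++;
            boolean value = parseOr();
            if (peek() != ')') {
                throw new IllegalArgumentException("missing ')' at offset " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < text.length() && isLabelChar(text.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("label expected at offset " + pos);
        }
        // A bare label is satisfied when the user is associated with it.
        return userLabels.contains(text.substring(start, pos));
    }

    private static boolean isLabelChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-';
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        String expression = "(secret | topsecret) & !probationary";
        System.out.println(permits(expression, Set.of("secret")));                     // true
        System.out.println(permits(expression, Set.of("topsecret", "probationary")));  // false
        System.out.println(permits("!public", Set.of("fulltime")));                    // true
    }
}
```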

In HBase 0.98.6 and newer, UTF-8 encoding is supported for visibility labels and expressions. When creating labels using the addLabels(conf, labels) method provided by the org.apache.hadoop.hbase.security.visibility.VisibilityClient class and passing labels in Authorizations via Scan or Get, labels can contain UTF-8 characters, as well as the logical operators normally used in visibility labels, with normal Java notations, without needing any escaping method. However, when you pass a CellVisibility expression via a Mutation, you must enclose the expression with the CellVisibility.quote() method if you use UTF-8 characters or logical operators. See TestExpressionParser and the source file hbase-client/src/test/java/org/apache/hadoop/hbase/client/TestScan.java.

A user adds visibility expressions to a cell during a Put operation. In the default configuration, the user does not need access to a label in order to label cells with it. This behavior is controlled by the configuration option hbase.security.visibility.mutations.checkauths. If you set this option to true, the labels the user is modifying as part of the mutation must be associated with the user, or the mutation will fail. Whether a user is authorized to read a labelled cell is determined during a Get or Scan, and results which the user is not allowed to read are filtered out. This incurs the same I/O penalty as if the results were returned, but reduces load on the network.

Visibility labels can also be specified during Delete operations. For details about visibility labels and Deletes, see HBASE-10885.

The user's effective label set is built in the RPC context when a request is first received by the RegionServer. The way that users are associated with labels is pluggable. The default plugin passes through labels specified in Authorizations added to the Get or Scan and checks those against the calling user's authenticated labels list. When the client passes labels for which the user is not authenticated, the default plugin drops them. You can pass a subset of user authenticated labels via the Get#setAuthorizations(Authorizations(String,...)) and Scan#setAuthorizations(Authorizations(String,...)) methods.
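
The dropping behavior of the default plugin amounts to intersecting the labels requested on the Get or Scan with the labels the user is authenticated for. The sketch below shows only that intersection, using the Java standard library. The class EffectiveLabelsSketch and its effectiveLabels method are hypothetical names, and the sketch does not cover any other behavior of the real plugin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical model of label filtering; not an HBase class.
class EffectiveLabelsSketch {

    // Keeps the requested labels the user is authenticated for, in request
    // order, and silently drops the rest.
    static List<String> effectiveLabels(List<String> requested, Set<String> authenticated) {
        List<String> kept = new ArrayList<>();
        for (String label : requested) {
            if (authenticated.contains(label)) {
                kept.add(label);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> requested = Arrays.asList("secret", "topsecret");
        Set<String> authenticated = Set.of("secret", "confidential");
        System.out.println(effectiveLabels(requested, authenticated));  // [secret]
    }
}
```

Note that the request is not rejected when it names a label the user lacks; the unauthenticated label simply plays no part in the read.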

Visibility label access checking is performed by the VisibilityController coprocessor. You can implement the VisibilityLabelService interface to provide a custom implementation and/or control the way that visibility labels are stored with cells. See the source file hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsWithCustomVisLabService.java for one example.

Visibility labels can be used in conjunction with ACLs.

Table 8.2. Examples of Visibility Expressions

Expression                              Interpretation
fulltime                                Allow access to users associated with the fulltime label.
!public                                 Allow access to users not associated with the public label.
( secret | topsecret ) & !probationary  Allow access to users associated with either the secret or topsecret label and not associated with the probationary label.

8.3.3.1. Server-Side Configuration

  1. As a prerequisite, perform the steps in Procedure 8.1, “Basic Server-Side Configuration”.

  2. Install and configure the VisibilityController coprocessor by setting the following properties in hbase-site.xml. These properties take a list of class names.

    <property>
      <name>hbase.coprocessor.region.classes</name>
      <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
    </property>
    <property>
      <name>hbase.coprocessor.master.classes</name>
      <value>org.apache.hadoop.hbase.security.visibility.VisibilityController</value>
    </property>
              

    Note

    If you use the AccessController and VisibilityController coprocessors together, the AccessController must come first in the list, because with both components active, the VisibilityController will delegate access control on its system tables to the AccessController.

  3. Adjust Configuration

    By default, users can label cells with any label, including labels they are not associated with, which means that a user can Put data that they cannot read. For example, a user could label a cell with the (hypothetical) 'topsecret' label even if the user is not associated with that label. If you only want users to be able to label cells with labels they are associated with, set hbase.security.visibility.mutations.checkauths to true. In that case, the mutation will fail if it makes use of labels the user is not associated with.

  4. Distribute your configuration and restart your cluster for changes to take effect.

8.3.3.2. Administration

Administration tasks can be performed using the HBase Shell or the Java API. For defining the list of visibility labels and associating labels with users, the HBase Shell is probably simpler.

API Examples

Many of the Java API examples in this section are taken from the source file hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabels.java. Refer to that file or the API documentation for more context.

Neither these examples, nor the source file they were taken from, are part of the public HBase API, and are provided for illustration only. Refer to the official API for usage instructions.

  1. Define the List of Visibility Labels

    Example 8.6. HBase Shell

    hbase> add_labels [ 'admin', 'service', 'developer', 'test' ]

    Example 8.7. Java API

    public static void addLabels() throws Exception {
      PrivilegedExceptionAction<VisibilityLabelsResponse> action =
          new PrivilegedExceptionAction<VisibilityLabelsResponse>() {
        public VisibilityLabelsResponse run() throws Exception {
          String[] labels = { SECRET, TOPSECRET, CONFIDENTIAL, PUBLIC, PRIVATE, COPYRIGHT, ACCENT,
              UNICODE_VIS_TAG, UC1, UC2 };
          try {
            VisibilityClient.addLabels(conf, labels);
          } catch (Throwable t) {
            throw new IOException(t);
          }
          return null;
        }
      };
      SUPERUSER.runAs(action);
    }
                    

  2. Associate Labels with Users

    Example 8.8. HBase Shell

    hbase> set_auths 'service', [ 'service' ]
    hbase> set_auths 'testuser', [ 'test' ]
    hbase> set_auths 'qa', [ 'test', 'developer' ]

    Example 8.9. Java API

    public void testSetAndGetUserAuths() throws Throwable {
      final String user = "user1";
      PrivilegedExceptionAction<Void> action = new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
          String[] auths = { SECRET, CONFIDENTIAL };
          try {
            VisibilityClient.setAuths(conf, auths, user);
          } catch (Throwable e) {
          }
          return null;
        }
        ...
                    

  3. Clear Labels From Users

    Example 8.10. HBase Shell

    hbase> clear_auths 'service', [ 'service' ]
    hbase> clear_auths 'testuser', [ 'test' ]
    hbase> clear_auths 'qa', [ 'test', 'developer' ]

    Example 8.11. Java API

    ...
    auths = new String[] { SECRET, PUBLIC, CONFIDENTIAL };
    VisibilityLabelsResponse response = null;
    try {
      response = VisibilityClient.clearAuths(conf, auths, user);
    } catch (Throwable e) {
      fail("Should not have failed");
    ...
                    

  4. Apply a Label or Expression to a Cell

    The label is only applied when data is written. The label is associated with a given version of the cell.

    Example 8.12. HBase Shell

    hbase> set_visibility 'user', 'admin|service|developer', \
      { COLUMNS => 'i' }
    hbase> set_visibility 'user', 'admin|service', \
      { COLUMNS => 'pii' }
    hbase> set_visibility 'user', 'test', \
      { COLUMNS => [ 'i', 'pii' ], \
        FILTER => "(PrefixFilter ('test'))" }

    Note

    HBase Shell support for applying labels or permissions to cells is for testing and verification support, and should not be employed for production use because it won't apply the labels to cells that don't exist yet. The correct way to apply cell level labels is to do so in the application code when storing the values.

    Example 8.13. Java API

    static HTable createTableAndWriteDataWithLabels(TableName tableName, String... labelExps)
        throws Exception {
      HTable table = null;
      try {
        table = TEST_UTIL.createTable(tableName, fam);
        int i = 1;
        List<Put> puts = new ArrayList<Put>();
        for (String labelExp : labelExps) {
          Put put = new Put(Bytes.toBytes("row" + i));
          put.add(fam, qual, HConstants.LATEST_TIMESTAMP, value);
          put.setCellVisibility(new CellVisibility(labelExp));
          puts.add(put);
          i++;
        }
        table.put(puts);
      } finally {
        if (table != null) {
          table.flushCommits();
        }
      }
      return table;
    }
                    

8.3.3.3. Implementing Your Own Visibility Label Algorithm

Interpreting the labels authenticated for a given get/scan request is a pluggable algorithm. You can specify a custom plugin by using the property hbase.regionserver.scan.visibility.label.generator.class. The default implementation class is org.apache.hadoop.hbase.security.visibility.DefaultScanLabelGenerator. You can also configure a set of ScanLabelGenerators to be used by the system, as a comma-separated list.

8.3.4. Transparent Encryption of Data At Rest

HBase provides a mechanism for protecting your data at rest, in HFiles and the WAL, which reside within HDFS or another distributed filesystem. A two-tier architecture is used for flexible and non-intrusive key rotation. "Transparent" means that no implementation changes are needed on the client side. When data is written, it is encrypted. When it is read, it is decrypted on demand.

8.3.4.1. How It Works

The administrator provisions a master key for the cluster, which is stored in a key provider accessible to every trusted HBase process, including the HMaster, RegionServers, and clients (such as HBase Shell) on administrative workstations. The default key provider is integrated with the Java KeyStore API and any key management systems with support for it. Other custom key provider implementations are possible. The key retrieval mechanism is configured in the hbase-site.xml configuration file. The master key may be stored on the cluster servers, protected by a secure KeyStore file, or on an external keyserver, or in a hardware security module. This master key is resolved as needed by HBase processes through the configured key provider.

Next, encryption use can be specified in the schema, per column family, by creating or modifying a column descriptor to include two additional attributes: the name of the encryption algorithm to use (currently only "AES" is supported), and optionally, a data key wrapped (encrypted) with the cluster master key. If a data key is not explicitly configured for a ColumnFamily, HBase will create a random data key per HFile. This provides an incremental improvement in security over sharing one explicitly configured data key across all of the column family's HFiles. Unless you need to supply an explicit data key, such as in a case where you are generating encrypted HFiles for bulk import with a given data key, only specify the encryption algorithm in the ColumnFamily schema metadata and let HBase create data keys on demand. Per Column Family keys facilitate low impact incremental key rotation and reduce the scope of any external leak of key material. The wrapped data key is stored in the ColumnFamily schema metadata, and in each HFile for the Column Family, encrypted with the cluster master key. After the Column Family is configured for encryption, any new HFiles will be written encrypted. To ensure encryption of all HFiles, trigger a major compaction after enabling this feature.

When the HFile is opened, the data key is extracted from the HFile, decrypted with the cluster master key, and used for decryption of the remainder of the HFile. The HFile will be unreadable if the master key is not available. If a remote user somehow acquires access to the HFile data because of some lapse in HDFS permissions, or from inappropriately discarded media, it will not be possible to decrypt either the data key or the file data.
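
The two-tier arrangement described above, in which a data key is wrapped with the master key and unwrapped when a file is opened, can be sketched with the standard javax.crypto API. This is an illustration of key wrapping in general, not HBase's implementation or on-disk format: the class TwoTierKeySketch and its methods are hypothetical, and the JCE "AESWrap" key-wrap algorithm is used here as a stand-in because the wrapping scheme HBase uses is not described in this section.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical illustration of master-key / data-key wrapping; not an HBase class.
class TwoTierKeySketch {

    // Generates a random 128-bit AES key (used for both master and data keys here).
    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts (wraps) the data key under the master key for storage next to the data.
    static byte[] wrap(SecretKey masterKey, SecretKey dataKey) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, masterKey);
            return cipher.wrap(dataKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recovers the data key; fails if the master key is not the one used to wrap it.
    static SecretKey unwrap(SecretKey masterKey, byte[] wrappedDataKey) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, masterKey);
            return (SecretKey) cipher.unwrap(wrappedDataKey, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey oldMaster = newAesKey();
        SecretKey dataKey = newAesKey();
        byte[] stored = wrap(oldMaster, dataKey);

        // Rotating the master key: unwrap with the old key, re-wrap with the new one.
        SecretKey newMaster = newAesKey();
        byte[] rotated = wrap(newMaster, unwrap(oldMaster, stored));

        SecretKey recovered = unwrap(newMaster, rotated);
        System.out.println(Arrays.equals(dataKey.getEncoded(), recovered.getEncoded()));  // true
    }
}
```

In this model, rotating the master key touches only the small wrapped data key; data encrypted under the data key does not need to be re-encrypted. Without the master key, the wrapped data key cannot be recovered, which mirrors the statement that the HFile is unreadable when the master key is unavailable.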

It is also possible to encrypt the WAL. Even though WALs are transient, it is necessary to encrypt the WALEdits to avoid circumventing HFile protections for encrypted column families, in the event that the underlying filesystem is compromised. When WAL encryption is enabled, all WALs are encrypted, regardless of whether the relevant HFiles are encrypted.

8.3.4.2. Server-Side Configuration

This procedure assumes you are using the default Java keystore implementation. If you are using a custom implementation, check its documentation and adjust accordingly.

  1. Create a secret key of appropriate length for AES encryption, using the keytool utility.

    $ keytool -keystore /path/to/hbase/conf/hbase.jks \
      -storetype jceks -storepass **** \
      -genseckey -keyalg AES -keysize 128 \
      -alias <alias>

    Replace **** with the password for the keystore file and <alias> with the username of the HBase service account, or an arbitrary string. If you use an arbitrary string, you will need to configure HBase to use it, and that is covered below. Specify a keysize that is appropriate. Do not specify a separate password for the key, but press Return when prompted.

  2. Set appropriate permissions on the keyfile and distribute it to all the HBase servers.

    The previous command created a file called hbase.jks in the HBase conf/ directory. Set the permissions and ownership on this file such that only the HBase service account user can read the file, and securely distribute the key to all HBase servers.

  3. Configure the HBase daemons.

    Set the following properties in hbase-site.xml on the region servers, to configure HBase daemons to use a key provider backed by the KeyStore file for retrieving the cluster master key. In the example below, replace **** with the password.

    <property>
        <name>hbase.crypto.keyprovider</name>
        <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
    </property>
    <property>
        <name>hbase.crypto.keyprovider.parameters</name>
        <value>jceks:///path/to/hbase/conf/hbase.jks?password=****</value>
    </property>
                

    By default, the HBase service account name will be used to resolve the cluster master key. However, you can store it with an arbitrary alias (in the keytool command). In that case, set the following property to the alias you used.

    <property>
        <name>hbase.crypto.master.key.name</name>
        <value>my-alias</value>
    </property>
                

    You also need to be sure your HFiles use HFile v3, in order to use transparent encryption. This is the default configuration for HBase 1.0 onward. For previous versions, set the following property in your hbase-site.xml file.

    <property>
        <name>hfile.format.version</name>
        <value>3</value>
    </property>
                

    Optionally, you can use a different cipher provider, either a Java Cryptography Extension (JCE) algorithm provider or a custom HBase cipher implementation.

    1. JCE:

      • Install a signed JCE provider (supporting “AES/CTR/NoPadding” mode with 128 bit keys)

      • Add it with highest preference to the JCE site configuration file $JAVA_HOME/lib/security/java.security.

      • Update hbase.crypto.algorithm.aes.provider and hbase.crypto.algorithm.rng.provider options in hbase-site.xml.

    2. Custom HBase Cipher:

      • Implement org.apache.hadoop.hbase.io.crypto.CipherProvider.

      • Add the implementation to the server classpath.

      • Update hbase.crypto.cipherprovider in hbase-site.xml.

  4. Configure WAL encryption.

    Configure WAL encryption in every RegionServer's hbase-site.xml, by setting the following properties. You can include these in the HMaster's hbase-site.xml as well, but the HMaster does not have a WAL and will not use them.

    <property>
        <name>hbase.regionserver.hlog.reader.impl</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogReader</value>
    </property>
    <property>
        <name>hbase.regionserver.hlog.writer.impl</name>
        <value>org.apache.hadoop.hbase.regionserver.wal.SecureProtobufLogWriter</value>
    </property>
    <property>
        <name>hbase.regionserver.wal.encryption</name>
        <value>true</value>
    </property>
                  
  5. Configure permissions on the hbase-site.xml file.

    Because the keystore password is stored in the hbase-site.xml, you need to ensure that only the HBase user can read the hbase-site.xml file, using file ownership and permissions.

  6. Restart your cluster.

    Distribute the new configuration file to all nodes and restart your cluster.
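
For readers who prefer to see what the keystore in step 1 contains, the following sketch builds an equivalent JCEKS keystore programmatically and reads the key back by alias. It uses only the standard java.security.KeyStore API and keeps the keystore in memory. The class name MasterKeyStoreSketch, the alias "hbase", and the sample password are assumptions for illustration. It does not show how HBase's KeyStoreKeyProvider is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.Key;
import java.security.KeyStore;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical illustration class; not part of HBase.
final class MasterKeyStoreSketch {

  /** Builds an in-memory JCEKS keystore holding one new AES key under the given alias. */
  static byte[] createKeyStore(String alias, char[] password, int keyBits) throws Exception {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(keyBits);
    SecretKey masterKey = gen.generateKey();

    KeyStore ks = KeyStore.getInstance("JCEKS");
    ks.load(null, password); // initialise an empty keystore
    // The entry reuses the store password, mirroring "no separate key password" in step 1.
    ks.setEntry(alias, new KeyStore.SecretKeyEntry(masterKey),
        new KeyStore.PasswordProtection(password));

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ks.store(out, password);
    return out.toByteArray();
  }

  /** Loads the key stored under the alias, or returns null if the alias is absent. */
  static Key loadKey(byte[] keyStoreBytes, String alias, char[] password) throws Exception {
    KeyStore ks = KeyStore.getInstance("JCEKS");
    ks.load(new ByteArrayInputStream(keyStoreBytes), password);
    return ks.getKey(alias, password);
  }

  public static void main(String[] args) throws Exception {
    char[] password = "changeit".toCharArray(); // sample password, an assumption
    byte[] store = createKeyStore("hbase", password, 128);
    Key key = loadKey(store, "hbase", password);
    System.out.println(key.getAlgorithm() + " key, " + key.getEncoded().length * 8 + " bits");
  }
}
```

In a real deployment the keystore lives in a file readable only by the HBase service account, as steps 2 and 5 require. The in-memory form here only keeps the example self-contained.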

8.3.4.3. Administration

Administrative tasks can be performed in HBase Shell or the Java API.

Java API

Java API examples in this section are taken from the source file hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsckEncryption.java.

Neither these examples, nor the source files they are taken from, are part of the public HBase API, and are provided for illustration only. Refer to the official API for usage instructions.

Enable Encryption on a Column Family

To enable encryption on a column family, you can either use HBase Shell or the Java API. After enabling encryption, trigger a major compaction. When the major compaction completes, the HFiles will be encrypted.

Example 8.14. HBase Shell

hbase> disable 'mytable'
hbase> alter 'mytable', {NAME => 'mycf', ENCRYPTION => 'AES'}
hbase> enable 'mytable'
                

Example 8.15. Java API

You can use the HBaseAdmin#modifyColumn API to modify the ENCRYPTION attribute on a Column Family. Additionally, you can specify the data key to use, wrapped with the cluster master key, by setting the ENCRYPTION_KEY attribute. This is only possible via the Java API, and not the HBase Shell. If you do not specify an ENCRYPTION_KEY for a column family, the default behavior is to generate a random data key for each HFile of the encrypted column family. This provides additional defense in the (unlikely, but theoretically possible) occurrence of storing the same data in multiple HFiles with exactly the same block layout, the same data key, and the same randomly-generated initialization vector.

This example shows how to programmatically configure transparent encryption both in the server configuration and on the column family, as part of a test which uses the Minicluster configuration.

@Before
public void setUp() throws Exception {
  conf = TEST_UTIL.getConfiguration();
  conf.setInt("hfile.format.version", 3);
  conf.set(HConstants.CRYPTO_KEYPROVIDER_CONF_KEY, KeyProviderForTesting.class.getName());
  conf.set(HConstants.CRYPTO_MASTERKEY_NAME_CONF_KEY, "hbase");

  // Create the test encryption key
  SecureRandom rng = new SecureRandom();
  byte[] keyBytes = new byte[AES.KEY_LENGTH];
  rng.nextBytes(keyBytes);
  cfKey = new SecretKeySpec(keyBytes, "AES");

  // Start the minicluster
  TEST_UTIL.startMiniCluster(3);

  // Create the table
  htd = new HTableDescriptor(TableName.valueOf("default", "TestHBaseFsckEncryption"));
  HColumnDescriptor hcd = new HColumnDescriptor("cf");
  hcd.setEncryptionType("AES");
  hcd.setEncryptionKey(EncryptionUtil.wrapKey(conf,
    conf.get(HConstants.CRYPTO_MASTERKEY_NAME_CONF_KEY, User.getCurrent().getShortName()),
    cfKey));
  htd.addFamily(hcd);
  TEST_UTIL.getHBaseAdmin().createTable(htd);
  TEST_UTIL.waitTableAvailable(htd.getName(), 5000);
}
                

Rotate the Data Key

To rotate the data key, first change the ColumnFamily key in the column descriptor, then trigger a major compaction. When compaction is complete, all HFiles will be re-encrypted using the new data key. Until the compaction completes, the old HFiles will still be readable using the old key.

If you rely on HBase's default behavior of generating a random key for each HFile, there is no need to rotate data keys. A major compaction will re-encrypt the HFile with a new key.

Switching Between Using a Random Data Key and Specifying A Key

If you configured a column family to use a specific key and you want to return to the default behavior of using a randomly-generated key for that column family, use the Java API to alter the HColumnDescriptor so that no value is sent with the key ENCRYPTION_KEY.

Rotate the Master Key

To rotate the master key, first generate and distribute the new key. Then update the KeyStore to contain a new master key, and keep the old master key in the KeyStore using a different alias. Next, configure fallback to the old master key in the hbase-site.xml file.

<property>
  <name>hbase.crypto.master.alternate.key.name</name>
  <value>hbase.old</value>
</property>
                

Perform a rolling restart of your cluster for this change to take effect, then trigger a major compaction on each table. At the end of the major compaction, all HFiles will be re-encrypted with data keys wrapped by the new cluster key. At this point, you can remove the old master key from the KeyStore, remove the configuration for the fallback master key from the hbase-site.xml, and perform a second rolling restart at some point. This second rolling restart is not time-sensitive.
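
The role of the fallback key during rotation can be sketched as follows. This is a conceptual model only, assuming JDK AES key wrap: the class MasterKeyRotationSketch and its methods are hypothetical names, and HBase's own key-wrapping format differs. An unwrap is first attempted with the current master key and retried with the alternate (old) key, after which the data key can be re-wrapped under the current key.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical illustration class; not part of HBase.
final class MasterKeyRotationSketch {

  /** Generates a random 128-bit AES key. */
  static SecretKey newAesKey() throws GeneralSecurityException {
    KeyGenerator gen = KeyGenerator.getInstance("AES");
    gen.init(128);
    return gen.generateKey();
  }

  /** Wraps the data key under the given master key. */
  static byte[] wrap(SecretKey masterKey, SecretKey dataKey) throws GeneralSecurityException {
    Cipher c = Cipher.getInstance("AESWrap");
    c.init(Cipher.WRAP_MODE, masterKey);
    return c.wrap(dataKey);
  }

  /** Unwraps the data key; fails with an exception if masterKey did not wrap it. */
  static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) throws GeneralSecurityException {
    Cipher c = Cipher.getInstance("AESWrap");
    c.init(Cipher.UNWRAP_MODE, masterKey);
    return (SecretKey) c.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
  }

  /** Tries the current master key first, then the alternate key if one is configured. */
  static SecretKey unwrapWithFallback(byte[] wrapped, SecretKey current, SecretKey alternate)
      throws GeneralSecurityException {
    try {
      return unwrap(current, wrapped);
    } catch (GeneralSecurityException e) {
      if (alternate == null) {
        throw e; // no fallback configured: the data key is unrecoverable
      }
      return unwrap(alternate, wrapped);
    }
  }

  /** Models what a rewrite achieves: the data key ends up wrapped by the current key. */
  static byte[] rewrap(byte[] wrapped, SecretKey current, SecretKey alternate)
      throws GeneralSecurityException {
    return wrap(current, unwrapWithFallback(wrapped, current, alternate));
  }

  public static void main(String[] args) throws Exception {
    SecretKey oldMaster = newAesKey();
    SecretKey newMaster = newAesKey();
    SecretKey dataKey = newAesKey();

    byte[] wrappedByOld = wrap(oldMaster, dataKey);
    byte[] wrappedByNew = rewrap(wrappedByOld, newMaster, oldMaster);

    // After re-wrapping, the old master key is no longer needed.
    SecretKey recovered = unwrapWithFallback(wrappedByNew, newMaster, null);
    System.out.println(Arrays.equals(recovered.getEncoded(), dataKey.getEncoded()));
  }
}
```

The sketch also shows why the old key must stay available until every file has been rewritten: any data key still wrapped by the old master key can only be recovered through the fallback path.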

8.3.5. Secure Bulk Load

Bulk loading in secure mode is a bit more involved than the normal setup, since the client has to transfer ownership of the files generated by the MapReduce job to HBase. Secure bulk loading is implemented by a coprocessor named SecureBulkLoadEndpoint, which uses a staging directory configured by the property hbase.bulkload.staging.dir. This property defaults to /tmp/hbase-staging/.

Secure Bulk Load Algorithm

  • One time only, create a staging directory which is world-traversable and owned by the user which runs HBase (mode 711, or rwx--x--x). A listing of this directory will look similar to the following:

    $ ls -ld /tmp/hbase-staging
    drwx--x--x  2 hbase  hbase  68  3 Sep 14:54 /tmp/hbase-staging
              
  • A user writes out data to a secure output directory owned by that user. For example, /user/foo/data.

  • Internally, HBase creates a secret staging directory which is globally readable/writable (drwxrwxrwx, 777). For example, /tmp/hbase-staging/averylongandrandomdirectoryname. The name and location of this directory are not exposed to the user. HBase manages creation and deletion of this directory.

  • The user makes the data world-readable and world-writable, moves it into the random staging directory, then calls the SecureBulkLoadClient#bulkLoadHFiles method.

The strength of the security lies in the length and randomness of the secret directory.
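
The directory layout in the algorithm above can be modelled on a local POSIX filesystem as a rough sketch. This is not how SecureBulkLoadEndpoint is implemented and it does not touch HDFS: the class StagingDirSketch, the 32-byte hex naming scheme, and the use of java.nio.file are assumptions chosen to show why a traverse-only parent (711) combined with an unguessable child name keeps the world-writable child (777) effectively private.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;
import java.security.SecureRandom;

// Hypothetical illustration class; not part of HBase. Requires a POSIX filesystem.
final class StagingDirSketch {
  private static final SecureRandom RNG = new SecureRandom();

  /** Returns an unguessable name: numBytes random bytes, hex-encoded. */
  static String randomName(int numBytes) {
    byte[] raw = new byte[numBytes];
    RNG.nextBytes(raw);
    StringBuilder sb = new StringBuilder(numBytes * 2);
    for (byte b : raw) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  /** Creates the staging root with mode 711: others may traverse it but not list it. */
  static Path createStagingRoot(Path root) throws IOException {
    Files.createDirectories(root);
    Files.setPosixFilePermissions(root, PosixFilePermissions.fromString("rwx--x--x"));
    return root;
  }

  /** Creates a world-writable (777) child whose only protection is its random name. */
  static Path createSecretDir(Path root) throws IOException {
    Path dir = Files.createDirectory(root.resolve(randomName(32)));
    Files.setPosixFilePermissions(dir, PosixFilePermissions.fromString("rwxrwxrwx"));
    return dir;
  }

  public static void main(String[] args) throws IOException {
    // A temporary directory stands in for /tmp/hbase-staging in this sketch.
    Path root = createStagingRoot(Files.createTempDirectory("hbase-staging"));
    Path secret = createSecretDir(root);
    System.out.println(secret);
  }
}
```

Because the parent cannot be listed by other users, reaching the child requires knowing its full name. With 32 random bytes that name has 256 bits of entropy.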

To enable secure bulk load, add the following properties to hbase-site.xml.

<property>
  <name>hbase.bulkload.staging.dir</name>
  <value>/tmp/hbase-staging</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,
  org.apache.hadoop.hbase.security.access.AccessController</value>
</property>
<property>
  <name>hbase.coprocessor.regionserver.classes</name>
  <value>org.apache.hadoop.hbase.security.token.TokenProvider,
  org.apache.hadoop.hbase.security.access.AccessController,org.apache.hadoop.hbase.security.access.SecureBulkLoadEndpoint</value>
</property>
    