HiveAccumuloTableInputFormat (Hive 1.2.2 API)

java.lang.Object
- org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat

All Implemented Interfaces:

org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,AccumuloHiveRow>
```
public class HiveAccumuloTableInputFormat
extends Object
implements org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,AccumuloHiveRow>
```
Wraps the older InputFormat (the `org.apache.hadoop.mapred` API) for use with Hive. Configures the input scan with the proper ranges, iterators, and columns, based on the SerDe properties of the Hive table.

Field Summary

Fields
| Modifier and Type | Field and Description |
| --- | --- |
| `protected org.apache.accumulo.core.client.mapred.AccumuloRowInputFormat` | `accumuloInputFormat` |
| `protected HiveAccumuloHelper` | `helper` |
| `protected AccumuloPredicateHandler` | `predicateHandler` |

Constructor Summary

Constructors
Constructor and Description
`HiveAccumuloTableInputFormat()`

Method Summary

Methods
| Modifier and Type | Method and Description |
| --- | --- |
| `protected void` | `addIterators(org.apache.hadoop.mapred.JobConf conf, List<org.apache.accumulo.core.client.IteratorSetting> iterators)` |
| `protected void` | `configure(org.apache.hadoop.mapred.JobConf conf, org.apache.accumulo.core.client.Instance instance, org.apache.accumulo.core.client.Connector connector, AccumuloConnectionParameters accumuloParams, ColumnMapper columnMapper, List<org.apache.accumulo.core.client.IteratorSetting> iterators, Collection<org.apache.accumulo.core.data.Range> ranges)` Configure the underlying AccumuloInputFormat |
| `protected void` | `fetchColumns(org.apache.hadoop.mapred.JobConf conf, Set<org.apache.accumulo.core.util.Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> cfCqPairs)` |
| `protected ColumnMapper` | `getColumnMapper(org.apache.hadoop.conf.Configuration conf)` |
| `protected HashSet<org.apache.accumulo.core.util.Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>>` | `getPairCollection(List<ColumnMapping> columnMappings)` Create col fam/qual pairs from pipe separated values, usually from config object. |
| `org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,AccumuloHiveRow>` | `getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit, org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.mapred.Reporter reporter)` Setup accumulo input format from conf properties. |
| `org.apache.hadoop.mapred.InputSplit[]` | `getSplits(org.apache.hadoop.mapred.JobConf jobConf, int numSplits)` |
| `protected String` | `getTableName(org.apache.accumulo.core.client.mapred.RangeInputSplit split)` Reflection to work around Accumulo 1.5 and 1.6 incompatibilities. |
| `protected void` | `setConnectorInfo(org.apache.hadoop.mapred.JobConf conf, String user, org.apache.accumulo.core.client.security.tokens.AuthenticationToken token)` |
| `protected void` | `setInputTableName(org.apache.hadoop.mapred.JobConf conf, String tableName)` |
| `protected void` | `setMockInstance(org.apache.hadoop.mapred.JobConf conf, String instanceName)` |
| `protected void` | `setRanges(org.apache.hadoop.mapred.JobConf conf, Collection<org.apache.accumulo.core.data.Range> ranges)` |
| `protected void` | `setScanAuthorizations(org.apache.hadoop.mapred.JobConf conf, org.apache.accumulo.core.security.Authorizations auths)` |
| `protected void` | `setTableName(org.apache.accumulo.core.client.mapred.RangeInputSplit split, String tableName)` Sets the table name on a RangeInputSplit, accounting for change in method name. |
| `protected void` | `setZooKeeperInstance(org.apache.hadoop.mapred.JobConf conf, String instanceName, String zkHosts, boolean isSasl)` |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

accumuloInputFormat

protected org.apache.accumulo.core.client.mapred.AccumuloRowInputFormat accumuloInputFormat

predicateHandler

protected AccumuloPredicateHandler predicateHandler

helper
```
protected HiveAccumuloHelper helper
```

Constructor Detail
HiveAccumuloTableInputFormat
```
public HiveAccumuloTableInputFormat()
```

Method Detail

getSplits

public org.apache.hadoop.mapred.InputSplit[] getSplits(org.apache.hadoop.mapred.JobConf jobConf,
                                              int numSplits)
                                                throws IOException

Specified by:
getSplits in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,AccumuloHiveRow>

Throws:
IOException

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.Text,AccumuloHiveRow> getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit,
                                                                                               org.apache.hadoop.mapred.JobConf jobConf,
                                                                                               org.apache.hadoop.mapred.Reporter reporter)
                                                                                                 throws IOException

Sets up the Accumulo input format from the configuration properties, then delegates to the final RecordReader from the mapred package.

Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.Text,AccumuloHiveRow>

Parameters:
inputSplit
jobConf
reporter

Returns:
RecordReader

Throws:
IOException

getColumnMapper

protected ColumnMapper getColumnMapper(org.apache.hadoop.conf.Configuration conf)
                                throws IOException,
                                       TooManyAccumuloColumnsException

Throws:
IOException
TooManyAccumuloColumnsException

configure

protected void configure(org.apache.hadoop.mapred.JobConf conf,
             org.apache.accumulo.core.client.Instance instance,
             org.apache.accumulo.core.client.Connector connector,
             AccumuloConnectionParameters accumuloParams,
             ColumnMapper columnMapper,
             List<org.apache.accumulo.core.client.IteratorSetting> iterators,
             Collection<org.apache.accumulo.core.data.Range> ranges)
                  throws org.apache.accumulo.core.client.AccumuloSecurityException,
                         org.apache.accumulo.core.client.AccumuloException,
                         SerDeException,
                         IOException

Configures the underlying AccumuloInputFormat.

Parameters:
conf - Job configuration
instance - Accumulo instance
connector - Accumulo connector
accumuloParams - Connection information for the Accumulo instance
columnMapper - Configuration of Hive to Accumulo columns
iterators - Any iterators to be configured server-side
ranges - Accumulo ranges for the query

Throws:
org.apache.accumulo.core.client.AccumuloSecurityException
org.apache.accumulo.core.client.AccumuloException
SerDeException
IOException

setMockInstance

protected void setMockInstance(org.apache.hadoop.mapred.JobConf conf,
                   String instanceName)

setZooKeeperInstance

protected void setZooKeeperInstance(org.apache.hadoop.mapred.JobConf conf,
                        String instanceName,
                        String zkHosts,
                        boolean isSasl)
                             throws IOException

Throws:
IOException

setConnectorInfo

protected void setConnectorInfo(org.apache.hadoop.mapred.JobConf conf,
                    String user,
                    org.apache.accumulo.core.client.security.tokens.AuthenticationToken token)
                         throws org.apache.accumulo.core.client.AccumuloSecurityException

Throws:
org.apache.accumulo.core.client.AccumuloSecurityException

setInputTableName

protected void setInputTableName(org.apache.hadoop.mapred.JobConf conf,
                     String tableName)

setScanAuthorizations

protected void setScanAuthorizations(org.apache.hadoop.mapred.JobConf conf,
                         org.apache.accumulo.core.security.Authorizations auths)

addIterators

protected void addIterators(org.apache.hadoop.mapred.JobConf conf,
                List<org.apache.accumulo.core.client.IteratorSetting> iterators)

setRanges

protected void setRanges(org.apache.hadoop.mapred.JobConf conf,
             Collection<org.apache.accumulo.core.data.Range> ranges)

fetchColumns

protected void fetchColumns(org.apache.hadoop.mapred.JobConf conf,
                Set<org.apache.accumulo.core.util.Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> cfCqPairs)

getPairCollection
```
protected HashSet<org.apache.accumulo.core.util.Pair<org.apache.hadoop.io.Text,org.apache.hadoop.io.Text>> getPairCollection(List<ColumnMapping> columnMappings)
```
Creates column family/qualifier pairs from pipe-separated values, usually taken from the config object. Ignores rowID.

Parameters:
columnMappings - The list of ColumnMappings for the given query

Returns:
a Set of Pairs of colfams and colquals
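
The pair-building step described for getPairCollection can be illustrated with a minimal sketch that uses only JDK types. Everything named here is an assumption made for the example: `SketchMapping` is a hypothetical stand-in for `ColumnMapping`, `String` stands in for `org.apache.hadoop.io.Text`, and `Map.Entry` stands in for Accumulo's `Pair`. It is not the Hive implementation.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ColumnPairSketch {

    // Hypothetical stand-in for a Hive-to-Accumulo column mapping (not the real ColumnMapping).
    static final class SketchMapping {
        final String family;    // null for the row ID mapping
        final String qualifier; // null for the row ID mapping
        final boolean rowId;

        private SketchMapping(String family, String qualifier, boolean rowId) {
            this.family = family;
            this.qualifier = qualifier;
            this.rowId = rowId;
        }

        static SketchMapping rowId() {
            return new SketchMapping(null, null, true);
        }

        static SketchMapping column(String family, String qualifier) {
            return new SketchMapping(family, qualifier, false);
        }
    }

    // Collects (family, qualifier) pairs and skips the row ID mapping,
    // since the row ID is not a column to fetch.
    static Set<Map.Entry<String, String>> pairsFor(List<SketchMapping> mappings) {
        Set<Map.Entry<String, String>> pairs = new HashSet<>();
        for (SketchMapping mapping : mappings) {
            if (mapping.rowId) {
                continue;
            }
            pairs.add(new SimpleImmutableEntry<>(mapping.family, mapping.qualifier));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<SketchMapping> mappings = Arrays.asList(
                SketchMapping.rowId(),
                SketchMapping.column("person", "name"),
                SketchMapping.column("person", "age"),
                SketchMapping.column("person", "name")); // duplicate collapses in the set
        System.out.println(pairsFor(mappings).size()); // prints 2
    }
}
```

Returning a set means a column mapped more than once is only requested once, which matches the `HashSet` return type in the signature above.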

getTableName
```
protected String getTableName(org.apache.accumulo.core.client.mapred.RangeInputSplit split)
                       throws IOException
```
Uses reflection to work around incompatibilities between Accumulo 1.5 and 1.6. Any reflection-related exception is rethrown as an IOException.

Parameters:
split - A RangeInputSplit

Returns:
The name of the table from the split

Throws:

IOException
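
The reflection approach described for getTableName can be sketched in plain Java as follows. This is an illustration under stated assumptions rather than the Hive code: the two split classes are hypothetical stand-ins for different versions of `RangeInputSplit`, and the accessor names `getTableName` and `getTable` are assumed for the example.

```java
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class SplitTableNameSketch {

    // Hypothetical split that exposes its table through getTable().
    public static class SplitWithGetTable {
        private final String table;
        public SplitWithGetTable(String table) { this.table = table; }
        public String getTable() { return table; }
    }

    // Hypothetical split that exposes its table through getTableName().
    public static class SplitWithGetTableName {
        private final String table;
        public SplitWithGetTableName(String table) { this.table = table; }
        public String getTableName() { return table; }
    }

    // Tries each candidate accessor in turn; reflection failures surface as IOException.
    static String tableNameOf(Object split) throws IOException {
        for (String name : new String[] {"getTableName", "getTable"}) {
            Method accessor;
            try {
                accessor = split.getClass().getMethod(name);
            } catch (NoSuchMethodException e) {
                continue; // this class uses the other method name
            }
            try {
                return (String) accessor.invoke(split);
            } catch (IllegalAccessException | InvocationTargetException e) {
                throw new IOException("Could not invoke " + name, e);
            }
        }
        throw new IOException("No table name accessor on " + split.getClass().getName());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tableNameOf(new SplitWithGetTable("t1")));     // prints t1
        System.out.println(tableNameOf(new SplitWithGetTableName("t2"))); // prints t2
    }
}
```

Looking the method up by name at run time lets one compiled class work against either variant, at the cost of losing compile-time checking. Wrapping the failure in an IOException keeps callers on a single checked exception type.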

setTableName
```
protected void setTableName(org.apache.accumulo.core.client.mapred.RangeInputSplit split,
                String tableName)
                     throws IOException
```
Sets the table name on a RangeInputSplit, accounting for the change in method name between Accumulo versions. Any reflection-related exception is wrapped in an IOException.

Parameters:
split - The RangeInputSplit to operate on
tableName - The name of the table to set

Throws:

IOException
