@InterfaceAudience.Public @InterfaceStability.Stable public abstract class TableInputFormatBase extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
A base for TableInputFormats. Receives an HTable and a Scan instance
that defines the input columns, etc. Subclasses may use other
TableRecordReader implementations.
An example of a subclass:
  class ExampleTIF extends TableInputFormatBase implements JobConfigurable {

    public void configure(JobConf job) {
      HTable exampleTable = new HTable(HBaseConfiguration.create(job),
        Bytes.toBytes("exampleTable"));
      // mandatory
      setHTable(exampleTable);
      byte[][] inputColumns = new byte[][] { Bytes.toBytes("cf1:columnA"),
        Bytes.toBytes("cf2") };
      // mandatory
      setInputColumns(inputColumns);
      RowFilterInterface exampleFilter = new RegExpRowFilter("keyPrefix.*");
      // optional
      setRowFilter(exampleFilter);
    }

    public void validateInput(JobConf job) throws IOException {
    }
  }
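The strings passed to setInputColumns above follow HBase's "family:qualifier" naming convention; a bare family name (like "cf2") selects the whole column family. A minimal pure-Java sketch of how such a specifier splits into family and qualifier (the class and helper names here are hypothetical, shown only to illustrate the format, not HBase API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ColumnSpec {
    // Hypothetical helper illustrating the "family:qualifier" convention:
    // returns {family, qualifier}; qualifier is null when only a family
    // (e.g. "cf2") is given, meaning the whole column family is selected.
    static byte[][] parseColumn(byte[] spec) {
        for (int i = 0; i < spec.length; i++) {
            if (spec[i] == ':') {
                return new byte[][] {
                    Arrays.copyOfRange(spec, 0, i),
                    Arrays.copyOfRange(spec, i + 1, spec.length)
                };
            }
        }
        return new byte[][] { spec.clone(), null };
    }

    public static void main(String[] args) {
        byte[][] fq = parseColumn("cf1:columnA".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(fq[0], StandardCharsets.UTF_8)); // cf1
        System.out.println(new String(fq[1], StandardCharsets.UTF_8)); // columnA
        byte[][] familyOnly = parseColumn("cf2".getBytes(StandardCharsets.UTF_8));
        System.out.println(familyOnly[1] == null); // true
    }
}
```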
| Constructor and Description |
|---|
| TableInputFormatBase() |

| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) Builds a TableRecordReader. |
| protected HTable | getHTable() Allows subclasses to get the HTable. |
| Scan | getScan() Gets the scan defining the actual details like columns etc. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context) Calculates the splits that will serve as input for the map tasks. |
| protected boolean | includeRegionInSplit(byte[] startKey, byte[] endKey) Test if the given region is to be included in the InputSplit while splitting the regions of a table. |
| String | reverseDNS(InetAddress ipAddress) |
| protected void | setHTable(HTable table) Allows subclasses to set the HTable. |
| void | setScan(Scan scan) Sets the scan defining the actual details like columns etc. |
| protected void | setTableRecordReader(TableRecordReader tableRecordReader) Allows subclasses to set the TableRecordReader. |
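getSplits(JobContext) typically produces one split per table region that overlaps the scan's row range (an assumption about the default behavior, not stated on this page). The sketch below shows that range intersection in plain Java; SplitSketch and computeSplits are hypothetical names, and empty byte arrays stand for unbounded keys, as in HBase region boundaries:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Unsigned lexicographic comparison, matching HBase row-key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Hypothetical sketch of the per-region split calculation: keep the
    // part of each region [regionStart, regionEnd) that overlaps the scan
    // range [scanStart, scanStop). Not the real HBase implementation.
    static List<byte[][]> computeSplits(List<byte[][]> regions,
                                        byte[] scanStart, byte[] scanStop) {
        List<byte[][]> splits = new ArrayList<>();
        for (byte[][] r : regions) {
            byte[] rStart = r[0], rEnd = r[1];
            boolean startsBeforeStop = scanStop.length == 0
                || compare(rStart, scanStop) < 0;
            boolean endsAfterStart = rEnd.length == 0
                || compare(rEnd, scanStart) > 0;
            if (!(startsBeforeStop && endsAfterStart)) continue;
            // Clamp the split to the scan range.
            byte[] splitStart = compare(scanStart, rStart) > 0 ? scanStart : rStart;
            byte[] splitStop = (rEnd.length != 0
                && (scanStop.length == 0 || compare(rEnd, scanStop) < 0))
                ? rEnd : scanStop;
            splits.add(new byte[][] { splitStart, splitStop });
        }
        return splits;
    }

    public static void main(String[] args) {
        List<byte[][]> regions = new ArrayList<>();
        regions.add(new byte[][] { new byte[0], "b".getBytes() });
        regions.add(new byte[][] { "b".getBytes(), "d".getBytes() });
        regions.add(new byte[][] { "d".getBytes(), new byte[0] });
        // Scan ["a", "c") overlaps the first two regions only.
        for (byte[][] s : computeSplits(regions, "a".getBytes(), "c".getBytes())) {
            System.out.println(new String(s[0]) + " .. " + new String(s[1]));
        }
    }
}
```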
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException

Builds a TableRecordReader.

Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
split - The split to work with.
context - The current context.
Throws:
IOException - When creating the reader fails.
See Also: InputFormat.createRecordReader(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException

Calculates the splits that will serve as input for the map tasks.

Specified by: getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
context - The current job context.
Throws:
IOException - When creating the list of splits fails.
See Also: InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)

public String reverseDNS(InetAddress ipAddress) throws NamingException, UnknownHostException

Throws:
NamingException
UnknownHostException

protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey)
Test if the given region is to be included in the InputSplit while splitting the regions of a table.

This optimization is effective when there is a specific reason to exclude an entire region from the M-R job (and hence not contribute an InputSplit), given the start and end keys of that region. It is useful, for example, when you need to remember the last-processed top record and continuously revisit the [last, current) interval for M-R processing. Besides reducing the number of InputSplits, it also reduces the load on the region server, due to the ordering of the keys.

Note: it is possible that endKey.length() == 0, for the last (most recent) region.

Override this method if you want to exclude regions from M-R in bulk. By default, no region is excluded (i.e. all regions are included).
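The key comparisons behind such an exclusion predicate can be sketched in plain Java. RegionExcluder, shouldInclude, and the cutoff parameter are hypothetical names, not HBase API; the sketch only assumes that HBase orders row keys as unsigned byte arrays and that an empty end key marks the last region:

```java
public class RegionExcluder {
    // Unsigned lexicographic comparison, matching HBase row-key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Hypothetical predicate in the spirit of includeRegionInSplit: keep
    // only regions that may still hold rows at or above the cutoff (e.g.
    // the last-processed record), i.e. regions whose end key is above it.
    // An empty endKey means the last region, which is always kept.
    static boolean shouldInclude(byte[] startKey, byte[] endKey, byte[] cutoff) {
        return endKey.length == 0 || compare(endKey, cutoff) > 0;
    }

    public static void main(String[] args) {
        byte[] cutoff = "row-500".getBytes();
        // Region entirely below the cutoff: excluded.
        System.out.println(shouldInclude("row-100".getBytes(), "row-200".getBytes(), cutoff)); // false
        // Region straddling the cutoff: included.
        System.out.println(shouldInclude("row-400".getBytes(), "row-600".getBytes(), cutoff)); // true
        // Last region (empty end key): included.
        System.out.println(shouldInclude("row-900".getBytes(), new byte[0], cutoff)); // true
    }
}
```

An override of includeRegionInSplit would delegate to such a predicate, returning false for regions the job can safely skip.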
Parameters:
startKey - Start key of the region
endKey - End key of the region

protected void setHTable(HTable table)

Allows subclasses to set the HTable.

Parameters:
table - The table to get the data from.

public Scan getScan()

Gets the scan defining the actual details like columns etc.

public void setScan(Scan scan)

Sets the scan defining the actual details like columns etc.

Parameters:
scan - The scan to set.

protected void setTableRecordReader(TableRecordReader tableRecordReader)

Allows subclasses to set the TableRecordReader.

Parameters:
tableRecordReader - A different TableRecordReader implementation.

Copyright © 2014 The Apache Software Foundation. All rights reserved.