@InterfaceAudience.Public @InterfaceStability.Stable public abstract class TableInputFormatBase extends org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
A base for TableInputFormats. Receives an HTable and a Scan instance
that defines the input columns, etc. Subclasses may use other
TableRecordReader implementations.
An example of a subclass:
  class ExampleTIF extends TableInputFormatBase implements JobConfigurable {

    public void configure(JobConf job) {
      HTable exampleTable = new HTable(HBaseConfiguration.create(job),
        Bytes.toBytes("exampleTable"));
      // mandatory
      setHTable(exampleTable);
      byte[][] inputColumns = new byte[][] { Bytes.toBytes("cf1:columnA"),
        Bytes.toBytes("cf2") };
      // mandatory
      setInputColumns(inputColumns);
      RowFilterInterface exampleFilter = new RegExpRowFilter("keyPrefix.*");
      // optional
      setRowFilter(exampleFilter);
    }

    public void validateInput(JobConf job) throws IOException {
    }
  }
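The strings passed to setInputColumns above follow HBase's "family:qualifier" naming convention; a bare family name (like "cf2") selects the whole column family. A minimal pure-Java sketch of how such a specifier splits into family and qualifier (the class and helper names here are hypothetical, shown only to illustrate the format, not HBase API):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ColumnSpec {
    // Hypothetical helper illustrating the "family:qualifier" convention:
    // returns {family, qualifier}; qualifier is null when only a family
    // (e.g. "cf2") is given, meaning the whole column family is selected.
    static byte[][] parseColumn(byte[] spec) {
        for (int i = 0; i < spec.length; i++) {
            if (spec[i] == ':') {
                return new byte[][] {
                    Arrays.copyOfRange(spec, 0, i),
                    Arrays.copyOfRange(spec, i + 1, spec.length)
                };
            }
        }
        return new byte[][] { spec.clone(), null };
    }

    public static void main(String[] args) {
        byte[][] fq = parseColumn("cf1:columnA".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(fq[0], StandardCharsets.UTF_8)); // cf1
        System.out.println(new String(fq[1], StandardCharsets.UTF_8)); // columnA
        byte[][] familyOnly = parseColumn("cf2".getBytes(StandardCharsets.UTF_8));
        System.out.println(familyOnly[1] == null); // true
    }
}
```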
| Constructor and Description |
|---|
| TableInputFormatBase() |

| Modifier and Type | Method and Description |
|---|---|
| org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> | createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) Builds a TableRecordReader. |
| protected HTable | getHTable() Allows subclasses to get the HTable. |
| Scan | getScan() Gets the scan defining the actual details like columns etc. |
| List<org.apache.hadoop.mapreduce.InputSplit> | getSplits(org.apache.hadoop.mapreduce.JobContext context) Calculates the splits that will serve as input for the map tasks. |
| protected boolean | includeRegionInSplit(byte[] startKey, byte[] endKey) Test if the given region is to be included in the InputSplit while splitting the regions of a table. |
| String | reverseDNS(InetAddress ipAddress) |
| protected void | setHTable(HTable table) Allows subclasses to set the HTable. |
| void | setScan(Scan scan) Sets the scan defining the actual details like columns etc. |
| protected void | setTableRecordReader(TableRecordReader tableRecordReader) Allows subclasses to set the TableRecordReader. |
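getSplits(JobContext) typically produces one split per table region that overlaps the scan's row range (an assumption about the default behavior, not stated on this page). The sketch below shows that range intersection in plain Java; SplitSketch and computeSplits are hypothetical names, and empty byte arrays stand for unbounded keys, as in HBase region boundaries:

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Unsigned lexicographic comparison, matching HBase row-key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Hypothetical sketch of the per-region split calculation: keep the
    // part of each region [regionStart, regionEnd) that overlaps the scan
    // range [scanStart, scanStop). Not the real HBase implementation.
    static List<byte[][]> computeSplits(List<byte[][]> regions,
                                        byte[] scanStart, byte[] scanStop) {
        List<byte[][]> splits = new ArrayList<>();
        for (byte[][] r : regions) {
            byte[] rStart = r[0], rEnd = r[1];
            boolean startsBeforeStop = scanStop.length == 0
                || compare(rStart, scanStop) < 0;
            boolean endsAfterStart = rEnd.length == 0
                || compare(rEnd, scanStart) > 0;
            if (!(startsBeforeStop && endsAfterStart)) continue;
            // Clamp the split to the scan range.
            byte[] splitStart = compare(scanStart, rStart) > 0 ? scanStart : rStart;
            byte[] splitStop = (rEnd.length != 0
                && (scanStop.length == 0 || compare(rEnd, scanStop) < 0))
                ? rEnd : scanStop;
            splits.add(new byte[][] { splitStart, splitStop });
        }
        return splits;
    }

    public static void main(String[] args) {
        List<byte[][]> regions = new ArrayList<>();
        regions.add(new byte[][] { new byte[0], "b".getBytes() });
        regions.add(new byte[][] { "b".getBytes(), "d".getBytes() });
        regions.add(new byte[][] { "d".getBytes(), new byte[0] });
        // Scan ["a", "c") overlaps the first two regions only.
        for (byte[][] s : computeSplits(regions, "a".getBytes(), "c".getBytes())) {
            System.out.println(new String(s[0]) + " .. " + new String(s[1]));
        }
    }
}
```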
public org.apache.hadoop.mapreduce.RecordReader<ImmutableBytesWritable,Result> createRecordReader(org.apache.hadoop.mapreduce.InputSplit split, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException

Builds a TableRecordReader.

Specified by: createRecordReader in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
split - The split to work with.
context - The current context.
Throws:
IOException - When creating the reader fails.
See Also: InputFormat.createRecordReader(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext)

public List<org.apache.hadoop.mapreduce.InputSplit> getSplits(org.apache.hadoop.mapreduce.JobContext context) throws IOException

Calculates the splits that will serve as input for the map tasks.

Specified by: getSplits in class org.apache.hadoop.mapreduce.InputFormat<ImmutableBytesWritable,Result>
Parameters:
context - The current job context.
Throws:
IOException - When creating the list of splits fails.
See Also: InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext)

public String reverseDNS(InetAddress ipAddress) throws NamingException, UnknownHostException

Throws:
NamingException
UnknownHostException

protected boolean includeRegionInSplit(byte[] startKey, byte[] endKey)
Test if the given region is to be included in the InputSplit while splitting the regions of a table.

This optimization is effective when there is a specific reason to exclude an entire region from the M-R job (and hence not contribute an InputSplit), given the start and end keys of that region. It is useful, for example, when you need to remember the last-processed top record and continuously revisit the [last, current) interval for M-R processing. Besides reducing the number of InputSplits, it also reduces the load on the region server, due to the ordering of the keys.

Note: it is possible that endKey.length() == 0, for the last (most recent) region.

Override this method if you want to exclude regions from M-R in bulk. By default, no region is excluded (i.e. all regions are included).
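The key comparisons behind such an exclusion predicate can be sketched in plain Java. RegionExcluder, shouldInclude, and the cutoff parameter are hypothetical names, not HBase API; the sketch only assumes that HBase orders row keys as unsigned byte arrays and that an empty end key marks the last region:

```java
public class RegionExcluder {
    // Unsigned lexicographic comparison, matching HBase row-key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // Hypothetical predicate in the spirit of includeRegionInSplit: keep
    // only regions that may still hold rows at or above the cutoff (e.g.
    // the last-processed record), i.e. regions whose end key is above it.
    // An empty endKey means the last region, which is always kept.
    static boolean shouldInclude(byte[] startKey, byte[] endKey, byte[] cutoff) {
        return endKey.length == 0 || compare(endKey, cutoff) > 0;
    }

    public static void main(String[] args) {
        byte[] cutoff = "row-500".getBytes();
        // Region entirely below the cutoff: excluded.
        System.out.println(shouldInclude("row-100".getBytes(), "row-200".getBytes(), cutoff)); // false
        // Region straddling the cutoff: included.
        System.out.println(shouldInclude("row-400".getBytes(), "row-600".getBytes(), cutoff)); // true
        // Last region (empty end key): included.
        System.out.println(shouldInclude("row-900".getBytes(), new byte[0], cutoff)); // true
    }
}
```

An override of includeRegionInSplit would delegate to such a predicate, returning false for regions the job can safely skip.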
Parameters:
startKey - Start key of the region
endKey - End key of the region

protected void setHTable(HTable table)

Allows subclasses to set the HTable.

Parameters:
table - The table to get the data from.

public Scan getScan()

Gets the scan defining the actual details like columns etc.

public void setScan(Scan scan)

Sets the scan defining the actual details like columns etc.

Parameters:
scan - The scan to set.

protected void setTableRecordReader(TableRecordReader tableRecordReader)

Allows subclasses to set the TableRecordReader.

Parameters:
tableRecordReader - A different TableRecordReader implementation.

Copyright © 2014 The Apache Software Foundation. All rights reserved.