org.apache.hadoop.examples.terasort
Class TeraInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<Text,Text>
      extended by org.apache.hadoop.examples.terasort.TeraInputFormat
All Implemented Interfaces:
InputFormat<Text,Text>

public class TeraInputFormat
extends FileInputFormat<Text,Text>

An input format that reads the first 10 characters of each line as the key and the rest of the line as the value. Both key and value are represented as Text.
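
The key/value layout described above can be illustrated without Hadoop. The following is a minimal sketch, not the class's actual reader: the class name TeraLineSplitSketch and the method split are hypothetical, plain String stands in for Text, and the handling of lines shorter than 10 characters is an assumption rather than documented behavior.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class TeraLineSplitSketch {
    // Key width taken from the class description above.
    static final int KEY_LENGTH = 10;

    /** Splits one line into (key, value): the first 10 characters, then the rest. */
    static Map.Entry<String, String> split(String line) {
        // Assumption: a line shorter than 10 characters becomes a key with an empty value.
        int cut = Math.min(KEY_LENGTH, line.length());
        return new SimpleEntry<>(line.substring(0, cut), line.substring(cut));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = split("AAAAAAAAAAsome payload");
        // prints "AAAAAAAAAA | some payload"
        System.out.println(kv.getKey() + " | " + kv.getValue());
    }
}
```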


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
FileInputFormat.Counter
 
Field Summary
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
TeraInputFormat()
           
 
Method Summary
 RecordReader<Text,Text> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Get the RecordReader for the given InputSplit.
 InputSplit[] getSplits(JobConf conf, int splits)
          Splits files returned by FileInputFormat.listStatus(JobConf) when they're too big.
static void writePartitionFile(JobConf conf, Path partFile)
          Use the input splits to take samples of the input and generate sample keys.
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TeraInputFormat

public TeraInputFormat()
Method Detail

writePartitionFile

public static void writePartitionFile(JobConf conf,
                                      Path partFile)
                               throws IOException
Use the input splits to take samples of the input and generate sample keys. By default, this reads 100,000 keys from 10 locations in the input, sorts them, and picks N-1 keys as the boundaries between N roughly equally sized partitions.

Parameters:
conf - the job to sample
partFile - where to write the output file to
Throws:
IOException - if the input cannot be sampled or the partition file cannot be written
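
The selection step described for writePartitionFile (sort the sampled keys, then pick N-1 of them as boundaries) can be sketched as follows. This is an illustration under stated assumptions, not the method's source: the class SplitPointSketch and the method pickSplitPoints are hypothetical, keys are plain Strings instead of Text, and picking evenly spaced indices is one reasonable reading of "equally sized partitions".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SplitPointSketch {
    /** Picks (partitions - 1) evenly spaced keys from the sorted samples as boundaries. */
    static List<String> pickSplitPoints(List<String> samples, int partitions) {
        if (partitions < 1 || samples.size() < partitions) {
            throw new IllegalArgumentException("need at least one sample per partition");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        // Assumption: boundaries sit at evenly spaced positions in the sorted samples.
        double step = sorted.size() / (double) partitions;
        List<String> splitPoints = new ArrayList<>();
        for (int i = 1; i < partitions; i++) {
            splitPoints.add(sorted.get((int) Math.round(step * i)));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("h", "c", "a", "f", "b", "g", "d", "e");
        // prints "[c, e, g]": three boundaries for four partitions
        System.out.println(pickSplitPoints(samples, 4));
    }
}
```

With 8 samples and 4 partitions, each boundary lands two samples after the previous one, so every partition receives about the same share of the sampled keys.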

getRecordReader

public RecordReader<Text,Text> getRecordReader(InputSplit split,
                                               JobConf job,
                                               Reporter reporter)
                                        throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.

It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.

Specified by:
getRecordReader in interface InputFormat<Text,Text>
Specified by:
getRecordReader in class FileInputFormat<Text,Text>
Parameters:
split - the InputSplit
job - the job that this split belongs to
reporter - facility for reporting progress
Returns:
a RecordReader
Throws:
IOException
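
One common way for a line-oriented reader to respect record boundaries is to let each split own exactly the lines whose first byte falls inside it. The sketch below shows that rule on an in-memory byte array. It is an assumption-laden illustration, not the reader returned by getRecordReader: SplitBoundarySketch and readSplit are hypothetical names, and the ownership rule is chosen for clarity.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitBoundarySketch {
    /** Returns the lines whose first byte lies in [start, end). */
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Skip the tail of a line that began in the previous split.
            while (pos < data.length && data[pos - 1] != '\n') {
                pos++;
            }
        }
        List<String> lines = new ArrayList<>();
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            // A line that starts inside the split is read to its end, even past 'end'.
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.US_ASCII));
            pos = lineEnd + 1;
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbbb\nccc\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readSplit(data, 0, 5));   // prints "[aaa, bbb]"
        System.out.println(readSplit(data, 5, 12));  // prints "[ccc]"
    }
}
```

The split boundary at byte 5 falls in the middle of "bbb", yet each line is returned exactly once: the first split reads past its end to finish the line, and the second split skips ahead to the next line start.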

getSplits

public InputSplit[] getSplits(JobConf conf,
                              int splits)
                       throws IOException
Description copied from class: FileInputFormat
Splits files returned by FileInputFormat.listStatus(JobConf) when they're too big.

Specified by:
getSplits in interface InputFormat<Text,Text>
Overrides:
getSplits in class FileInputFormat<Text,Text>
Parameters:
conf - job configuration.
splits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException
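
To make "splits files ... when they're too big" concrete, the sketch below cuts a file of a given length into consecutive ranges of at most a chosen split size. It is a simplified illustration only: FileSplitSketch and its split method are hypothetical, the split size is passed in directly rather than derived from the job configuration, and the real method returns InputSplit objects rather than offset/length pairs.

```java
import java.util.ArrayList;
import java.util.List;

public class FileSplitSketch {
    /** Returns {offset, length} pairs that cover a file of the given length. */
    static List<long[]> split(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<long[]> ranges = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last range may be shorter than splitSize.
            ranges.add(new long[] {offset, Math.min(splitSize, fileLength - offset)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // prints "0+100", "100+100", "200+50" on separate lines
        for (long[] r : split(250, 100)) {
            System.out.println(r[0] + "+" + r[1]);
        }
    }
}
```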


Copyright © 2009 The Apache Software Foundation