TeraInputFormat (Hadoop 1.2.1 API)

org.apache.hadoop.examples.terasort
Class TeraInputFormat

java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<Text,Text>
      org.apache.hadoop.examples.terasort.TeraInputFormat

All Implemented Interfaces: InputFormat<Text,Text>

public class TeraInputFormat
extends FileInputFormat<Text,Text>

An input format that reads the first 10 characters of each line as the key and the rest of the line as the value. Both key and value are represented as Text.
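
The key/value rule above can be illustrated in plain Java. This is a minimal sketch, not the class's actual record reader: `TeraRecordSketch` and `split` are hypothetical names, it works on `String` characters rather than the bytes of a `Text`, and the handling of lines shorter than 10 characters (whole line becomes the key, value is empty) is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class TeraRecordSketch {
    // Key width taken from the class description: the first 10 characters.
    public static final int KEY_LENGTH = 10;

    /** Splits one input line into (key, value). Hypothetical helper, not a Hadoop API. */
    public static Map.Entry<String, String> split(String line) {
        if (line.length() < KEY_LENGTH) {
            // Assumption: a short line yields a key-only record with an empty value.
            return new SimpleEntry<>(line, "");
        }
        // First KEY_LENGTH characters are the key; everything after is the value.
        return new SimpleEntry<>(line.substring(0, KEY_LENGTH), line.substring(KEY_LENGTH));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> record = split("AAAAAAAAAA  payload bytes");
        System.out.println("key=" + record.getKey() + " value=" + record.getValue());
    }
}
```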

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
`FileInputFormat.Counter`

Field Summary

Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
`LOG`

Constructor Summary
`TeraInputFormat()`

Method Summary
`RecordReader<Text,Text>`	`getRecordReader(InputSplit split, JobConf job, Reporter reporter)` Get the `RecordReader` for the given `InputSplit`.
`InputSplit[]`	`getSplits(JobConf conf, int splits)` Splits files returned by `FileInputFormat.listStatus(JobConf)` when they're too big.
`static void`	`writePartitionFile(JobConf conf, Path partFile)` Use the input splits to take samples of the input and generate sample keys.

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
`addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize`

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

TeraInputFormat

public TeraInputFormat()

Method Detail

writePartitionFile

public static void writePartitionFile(JobConf conf,
                                      Path partFile)
                               throws IOException

Use the input splits to take samples of the input and generate sample keys. By default, this method reads 100,000 keys from 10 locations in the input, sorts them, and picks N-1 keys to generate N equally sized partitions.

Parameters: conf - the job to sample; partFile - where to write the output file to
Throws: IOException - if something goes wrong
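
The sort-and-pick step can be sketched in plain Java. This is an illustration under stated assumptions, not the method's implementation: `SplitPointSketch` and `pickSplitPoints` are hypothetical names, keys are compared as `String` values rather than as `Text` bytes, the evenly spaced rounding rule is an assumption, and no attempt is made to skip duplicate keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SplitPointSketch {
    /**
     * Sorts the sampled keys and returns partitions-1 evenly spaced keys,
     * which divide the key space into `partitions` ranges.
     * Hypothetical helper, not a Hadoop API.
     */
    public static List<String> pickSplitPoints(List<String> samples, int partitions) {
        if (partitions < 1 || samples.size() < partitions) {
            throw new IllegalArgumentException("need at least one sample per partition");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);

        List<String> splitPoints = new ArrayList<>();
        // Assumption: split points sit at multiples of size / partitions.
        float step = sorted.size() / (float) partitions;
        for (int i = 1; i < partitions; i++) {
            splitPoints.add(sorted.get(Math.round(step * i)));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        List<String> samples = Arrays.asList("h", "c", "a", "f", "b", "g", "d", "e");
        // Four partitions need three split points.
        System.out.println(pickSplitPoints(samples, 4)); // prints [c, e, g]
    }
}
```

With these split points, a partitioner would send keys below `c` to partition 0, keys from `c` up to (but not including) `e` to partition 1, and so on.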

getRecordReader

public RecordReader<Text,Text> getRecordReader(InputSplit split,
                                               JobConf job,
                                               Reporter reporter)
                                        throws IOException

Description copied from interface: InputFormat

Get the RecordReader for the given InputSplit.

It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.

Specified by: getRecordReader in interface InputFormat<Text,Text>
Specified by: getRecordReader in class FileInputFormat<Text,Text>

Parameters: split - the InputSplit; job - the job that this split belongs to
Returns: a RecordReader
Throws: IOException

getSplits

public InputSplit[] getSplits(JobConf conf,
                              int splits)
                       throws IOException

Description copied from class: FileInputFormat

Splits files returned by FileInputFormat.listStatus(JobConf) when they're too big.

Specified by: getSplits in interface InputFormat<Text,Text>
Overrides: getSplits in class FileInputFormat<Text,Text>

Parameters: conf - job configuration.; splits - the desired number of splits, a hint.
Returns: an array of InputSplits for the job.
Throws: IOException
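
Why the `splits` argument is only a hint can be shown with a small calculation. The sketch below assumes the sizing rule of the inherited `computeSplitSize` method is `max(minSize, min(goalSize, blockSize))`, with the goal size being the total input size divided by the requested number of splits; that rule is an assumption here, and `SplitSizeSketch` and `goalSize` are hypothetical names.

```java
public class SplitSizeSketch {
    /** Assumed sizing rule: never below minSize, never above the block size. */
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Target bytes per split for the requested split count. Hypothetical helper. */
    public static long goalSize(long totalBytes, int requestedSplits) {
        return totalBytes / (requestedSplits == 0 ? 1 : requestedSplits);
    }

    public static void main(String[] args) {
        long totalBytes = 1L << 30;  // 1 GiB of input
        long blockSize = 64L << 20;  // 64 MiB blocks
        long minSize = 1;

        // Asking for 4 splits gives a 256 MiB goal, which the block size caps at 64 MiB,
        // so the job ends up with more splits than requested.
        long splitSize = computeSplitSize(goalSize(totalBytes, 4), minSize, blockSize);
        System.out.println(splitSize + " bytes per split, about "
                + (totalBytes / splitSize) + " splits");
    }
}
```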