CombineFileInputFormat (Hadoop 1.2.2-SNAPSHOT API)

java.lang.Object
- org.apache.hadoop.mapred.FileInputFormat<K,V>
- - org.apache.hadoop.mapred.lib.CombineFileInputFormat<K,V>

All Implemented Interfaces:

InputFormat<K,V>
```
public abstract class CombineFileInputFormat<K,V>
extends FileInputFormat<K,V>
```
An abstract InputFormat that returns CombineFileSplits from the InputFormat.getSplits(JobConf, int) method. Splits are constructed from the files under the input paths. A split cannot contain files from different pools, but each split may contain blocks from different files. If a maxSplitSize is specified, blocks on the same node are combined to form a single split, and leftover blocks are then combined with other blocks in the same rack. If maxSplitSize is not specified, blocks from the same rack are combined into a single split, and no attempt is made to create node-local splits. If maxSplitSize equals the block size, this class behaves similarly to the default splitting behaviour in Hadoop: each block is a locally processed split. Subclasses implement InputFormat.getRecordReader(InputSplit, JobConf, Reporter) to construct RecordReaders for CombineFileSplits.
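
The node-then-rack combining described above can be modeled in plain Java. This is a minimal sketch and not the Hadoop implementation: `CombineSketch`, `Block`, `packByKey` and `combine` are hypothetical names, each block is assumed to have a single replica, `maxSplitSize` is assumed to be positive, pools and minimum sizes are ignored, and whatever remains after the rack pass is assumed to form one final split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CombineSketch {
    /** One block of a file plus the node and rack hosting it (hypothetical model type). */
    static final class Block {
        final String file;
        final long length;
        final String node;
        final String rack;

        Block(String file, long length, String node, String rack) {
            this.file = file;
            this.length = length;
            this.node = node;
            this.rack = rack;
        }
    }

    /**
     * Groups blocks by node (or by rack) and closes a split each time the
     * accumulated size reaches maxSplitSize. Blocks that do not fill a split
     * are appended to leftovers.
     */
    static void packByKey(List<Block> blocks, boolean byNode, long maxSplitSize,
                          List<List<Block>> splits, List<Block> leftovers) {
        Map<String, List<Block>> groups = new LinkedHashMap<>();
        for (Block b : blocks) {
            String key = byNode ? b.node : b.rack;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(b);
        }
        for (List<Block> group : groups.values()) {
            List<Block> current = new ArrayList<>();
            long size = 0;
            for (Block b : group) {
                current.add(b);
                size += b.length;
                if (size >= maxSplitSize) {
                    splits.add(current);
                    current = new ArrayList<>();
                    size = 0;
                }
            }
            leftovers.addAll(current);
        }
    }

    /** Node-local pass first, then a rack-local pass over the node leftovers. */
    static List<List<Block>> combine(List<Block> blocks, long maxSplitSize) {
        List<List<Block>> splits = new ArrayList<>();
        List<Block> nodeLeftovers = new ArrayList<>();
        packByKey(blocks, true, maxSplitSize, splits, nodeLeftovers);
        List<Block> rackLeftovers = new ArrayList<>();
        packByKey(nodeLeftovers, false, maxSplitSize, splits, rackLeftovers);
        if (!rackLeftovers.isEmpty()) {
            splits.add(rackLeftovers); // assumption: the remainder becomes one last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<Block> blocks = new ArrayList<>();
        blocks.add(new Block("a.log", 64 * mb, "node1", "rack1"));
        blocks.add(new Block("b.log", 64 * mb, "node1", "rack1"));
        blocks.add(new Block("c.log", 64 * mb, "node1", "rack1"));
        blocks.add(new Block("d.log", 64 * mb, "node2", "rack1"));
        blocks.add(new Block("e.log", 64 * mb, "node3", "rack2"));
        for (List<Block> split : combine(blocks, 128 * mb)) {
            List<String> names = new ArrayList<>();
            for (Block b : split) {
                names.add(b.file + "@" + b.node);
            }
            System.out.println(names);
        }
    }
}
```

With a 128 MB maximum, the two first blocks on node1 form a node-local split, the leftover blocks on node1 and node2 are combined at rack level, and the single block on rack2 ends up in a split of its own.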

See Also:
CombineFileSplit

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
  FileInputFormat.Counter

Field Summary
- Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
  LOG

Constructor Summary

Constructors
Constructor and Description

CombineFileInputFormat()
default constructor


Method Summary

Methods
Modifier and Type	Method and Description
`protected void`	`createPool(JobConf conf, List<PathFilter> filters)` Create a new pool and add the filters to it.
`protected void`	`createPool(JobConf conf, PathFilter... filters)` Create a new pool and add the filters to it.
`abstract RecordReader<K,V>`	`getRecordReader(InputSplit split, JobConf job, Reporter reporter)` Implemented by subclasses to construct a RecordReader for a CombineFileSplit.
`InputSplit[]`	`getSplits(JobConf job, int numSplits)` Splits files returned by `FileInputFormat.listStatus(JobConf)` when they're too big.
`protected void`	`setMaxSplitSize(long maxSplitSize)` Specify the maximum size (in bytes) of each split.
`protected void`	`setMinSplitSizeNode(long minSplitSizeNode)` Specify the minimum size (in bytes) of each split per node.
`protected void`	`setMinSplitSizeRack(long minSplitSizeRack)` Specify the minimum size (in bytes) of each split per rack.

Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, isSplitable, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - CombineFileInputFormat
```
public CombineFileInputFormat()
```
    default constructor
- Method Detail
  - setMaxSplitSize
```
protected void setMaxSplitSize(long maxSplitSize)
```
    Specify the maximum size (in bytes) of each split. Each split is approximately equal to the specified size.
  - setMinSplitSizeNode
```
protected void setMinSplitSizeNode(long minSplitSizeNode)
```
    Specify the minimum size (in bytes) of each split per node. This applies to data that is left over after combining data on a single node into splits that are of maximum size specified by maxSplitSize. This leftover data will be combined into its own split if its size exceeds minSplitSizeNode.
  - setMinSplitSizeRack
```
protected void setMinSplitSizeRack(long minSplitSizeRack)
```
    Specify the minimum size (in bytes) of each split per rack. This applies to data that is left over after combining data on a single rack into splits that are of maximum size specified by maxSplitSize. This leftover data will be combined into its own split if its size exceeds minSplitSizeRack.
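
    The leftover rule shared by setMinSplitSizeNode and setMinSplitSizeRack can be illustrated with a little arithmetic. This is a hedged sketch only: `LeftoverSketch`, `leftover` and `formsOwnSplit` are hypothetical names, data is treated as a byte count rather than whole blocks, `maxSplitSize` is assumed to be positive, and the strict comparison follows the word "exceeds" above rather than a verified reading of the class's boundary behaviour.

```java
class LeftoverSketch {
    /** Bytes remaining at one level after carving out full maxSplitSize splits. */
    static long leftover(long totalBytes, long maxSplitSize) {
        return totalBytes % maxSplitSize;
    }

    /** True if the leftover is large enough to become its own split at this level. */
    static boolean formsOwnSplit(long leftoverBytes, long minSplitSize) {
        return leftoverBytes > minSplitSize; // "exceeds" read as strictly greater (assumption)
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long rest = leftover(320 * mb, 128 * mb); // two full splits, 64 MB left over
        System.out.println("leftover MB: " + rest / mb);
        System.out.println("own split with min 32 MB: " + formsOwnSplit(rest, 32 * mb));
        System.out.println("own split with min 100 MB: " + formsOwnSplit(rest, 100 * mb));
    }
}
```

    In this model, 320 MB on one node with a 128 MB maximum leaves 64 MB over. With minSplitSizeNode at 32 MB the leftover becomes its own node-level split, whereas with 100 MB it is deferred to the rack-level pass.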
  - createPool
```
protected void createPool(JobConf conf,
              List<PathFilter> filters)
```
    Create a new pool and add the filters to it. A split cannot have files from different pools.
  - createPool
```
protected void createPool(JobConf conf,
              PathFilter... filters)
```
    Create a new pool and add the filters to it. A pathname can satisfy any one of the specified filters. A split cannot have files from different pools.
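
    The pool semantics above (a pathname belongs to a pool if any one of the pool's filters accepts it, and files from different pools are never mixed) can be sketched without Hadoop on the classpath. Everything here is a stand-in: `PoolSketch`, `Pool` and `partition` are hypothetical names, `Predicate<String>` plays the role of PathFilter, and the choices to use the first matching pool and to collect unmatched paths in a trailing group are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class PoolSketch {
    /** A pool accepts a path when any one of its filters accepts it. */
    static final class Pool {
        final List<Predicate<String>> filters;

        Pool(List<Predicate<String>> filters) {
            this.filters = filters;
        }

        boolean accepts(String path) {
            for (Predicate<String> filter : filters) {
                if (filter.test(path)) {
                    return true;
                }
            }
            return false;
        }
    }

    /**
     * Assigns each path to the first pool that accepts it. The result holds one
     * list per pool, followed by one extra list for paths that match no pool.
     * Splits would then be built separately inside each list.
     */
    static List<List<String>> partition(List<Pool> pools, List<String> paths) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i <= pools.size(); i++) {
            groups.add(new ArrayList<>());
        }
        for (String path : paths) {
            int target = pools.size(); // default: the trailing "no pool" group
            for (int i = 0; i < pools.size(); i++) {
                if (pools.get(i).accepts(path)) {
                    target = i;
                    break;
                }
            }
            groups.get(target).add(path);
        }
        return groups;
    }

    public static void main(String[] args) {
        Predicate<String> isLog = p -> p.endsWith(".log");
        Predicate<String> isTxt = p -> p.endsWith(".txt");
        Predicate<String> isCsv = p -> p.endsWith(".csv");
        List<Pool> pools = Arrays.asList(
            new Pool(Arrays.asList(isLog, isTxt)),
            new Pool(Arrays.asList(isCsv)));
        List<String> paths = Arrays.asList("/in/a.log", "/in/b.csv", "/in/c.txt", "/in/d.seq");
        System.out.println(partition(pools, paths));
    }
}
```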
  - getSplits
```
public InputSplit[] getSplits(JobConf job,
                     int numSplits)
                       throws IOException
```
    Description copied from class: FileInputFormat
    
    Splits files returned by FileInputFormat.listStatus(JobConf) when they're too big.
    
    Specified by:
    
    getSplits in interface InputFormat<K,V>
    
    Overrides:
    
    getSplits in class FileInputFormat<K,V>
    
    Parameters:
    job - job configuration.
    numSplits - the desired number of splits, a hint.
    
    Returns:
    an array of InputSplits for the job.
    
    Throws:
    
    IOException
  - getRecordReader
```
public abstract RecordReader<K,V> getRecordReader(InputSplit split,
                                JobConf job,
                                Reporter reporter)
                                           throws IOException
```
    Not implemented in this class. Subclasses implement this method to construct a RecordReader for a CombineFileSplit.
    
    Specified by:
    
    getRecordReader in interface InputFormat<K,V>
    
    Specified by:
    
    getRecordReader in class FileInputFormat<K,V>
    
    Parameters:
    split - the InputSplit
    job - the job that this split belongs to
    
    Returns:
    a RecordReader
    
    Throws:
    
    IOException
