CombineFileInputFormat (Hadoop 1.2.2-SNAPSHOT API)

java.lang.Object
- org.apache.hadoop.mapreduce.InputFormat<K,V>
  - org.apache.hadoop.mapreduce.lib.input.FileInputFormat<K,V>
    - org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat<K,V>

```
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class CombineFileInputFormat<K,V>
extends FileInputFormat<K,V>
```
An abstract InputFormat that returns CombineFileSplits from its InputFormat.getSplits(JobContext) method. Splits are constructed from the files under the input paths. A split cannot contain files from different pools, but each split may contain blocks from different files. If a maxSplitSize is specified, blocks on the same node are combined to form a single split, and blocks that are left over are then combined with other blocks in the same rack. If maxSplitSize is not specified, blocks from the same rack are combined into a single split and no attempt is made to create node-local splits. If maxSplitSize is equal to the block size, this class behaves like the default splitting in Hadoop: each block is a locally processed split. Subclasses implement InputFormat.createRecordReader(InputSplit, TaskAttemptContext) to construct RecordReaders for CombineFileSplits.
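
The node-level combining described above can be pictured with a small stand-alone model. This is a minimal sketch and not Hadoop's implementation: `CombineSketch`, `Block`, and `combinePerNode` are hypothetical names, each block is assumed to live on exactly one node (replication is ignored), and a split is assumed to close as soon as its accumulated size reaches maxSplitSize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of node-level combining; it does not use Hadoop classes.
final class CombineSketch {

    /** A block, reduced to the node that hosts it and its length in bytes. */
    static final class Block {
        final String node;
        final long length;

        Block(String node, long length) {
            this.node = node;
            this.length = length;
        }
    }

    /**
     * Groups blocks by node and closes a split whenever the accumulated
     * size reaches maxSplitSize (an assumed rule). Blocks that do not fill
     * a split are appended to leftovers, where a rack-level pass could
     * pick them up.
     */
    static List<List<Block>> combinePerNode(List<Block> blocks,
                                            long maxSplitSize,
                                            List<Block> leftovers) {
        Map<String, List<Block>> byNode = new LinkedHashMap<>();
        for (Block b : blocks) {
            byNode.computeIfAbsent(b.node, k -> new ArrayList<>()).add(b);
        }

        List<List<Block>> splits = new ArrayList<>();
        for (List<Block> nodeBlocks : byNode.values()) {
            List<Block> current = new ArrayList<>();
            long size = 0;
            for (Block b : nodeBlocks) {
                current.add(b);
                size += b.length;
                if (size >= maxSplitSize) {
                    splits.add(current);
                    current = new ArrayList<>();
                    size = 0;
                }
            }
            // Whatever remains on this node did not reach maxSplitSize.
            leftovers.addAll(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block("node1", 64), new Block("node1", 64),
                new Block("node1", 32), new Block("node2", 64));
        List<Block> leftovers = new ArrayList<>();
        List<List<Block>> splits = combinePerNode(blocks, 128, leftovers);
        System.out.println("node-local splits: " + splits.size());
        System.out.println("leftover blocks:   " + leftovers.size());
    }
}
```

In this model, setting maxSplitSize to the block size makes every full block its own split, which mirrors the "each block is a locally processed split" case in the description.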

See Also:
CombineFileSplit

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
  FileInputFormat.Counter

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`SPLIT_MINSIZE_PERNODE`
`static String`	`SPLIT_MINSIZE_PERRACK`

Constructor Summary

Constructors
Constructor and Description
`CombineFileInputFormat()` Default constructor.

Method Summary

Methods
Modifier and Type	Method and Description
`protected void`	`createPool(List<PathFilter> filters)` Create a new pool and add the filters to it.
`protected void`	`createPool(PathFilter... filters)` Create a new pool and add the filters to it.
`abstract RecordReader<K,V>`	`createRecordReader(InputSplit split, TaskAttemptContext context)` This is not implemented yet.
`protected BlockLocation[]`	`getFileBlockLocations(FileSystem fs, FileStatus stat)`
`List<InputSplit>`	`getSplits(JobContext job)` Generate the list of files and make them into FileSplits.
`protected boolean`	`isSplitable(JobContext context, Path file)` Is the given filename splitable?
`protected void`	`setMaxSplitSize(long maxSplitSize)` Specify the maximum size (in bytes) of each split.
`protected void`	`setMinSplitSizeNode(long minSplitSizeNode)` Specify the minimum size (in bytes) of each split per node.
`protected void`	`setMinSplitSizeRack(long minSplitSizeRack)` Specify the minimum size (in bytes) of each split per rack.

Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
addInputPath, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, listStatus, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - SPLIT_MINSIZE_PERNODE
```
public static final String SPLIT_MINSIZE_PERNODE
```
    See Also:
    Constant Field Values
  - SPLIT_MINSIZE_PERRACK
```
public static final String SPLIT_MINSIZE_PERRACK
```
    See Also:
    Constant Field Values
- Constructor Detail
  - CombineFileInputFormat
```
public CombineFileInputFormat()
```
    Default constructor.
- Method Detail
  - setMaxSplitSize
```
protected void setMaxSplitSize(long maxSplitSize)
```
    Specify the maximum size (in bytes) of each split. Each split is approximately equal to the specified size.
  - setMinSplitSizeNode
```
protected void setMinSplitSizeNode(long minSplitSizeNode)
```
    Specify the minimum size (in bytes) of each split per node. This applies to data that is left over after combining data on a single node into splits that are of maximum size specified by maxSplitSize. This leftover data will be combined into its own split if its size exceeds minSplitSizeNode.
  - setMinSplitSizeRack
```
protected void setMinSplitSizeRack(long minSplitSizeRack)
```
    Specify the minimum size (in bytes) of each split per rack. This applies to data that is left over after combining data on a single rack into splits that are of maximum size specified by maxSplitSize. This leftover data will be combined into its own split if its size exceeds minSplitSizeRack.
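
    The leftover rule shared by setMinSplitSizeNode and setMinSplitSizeRack can be written down as a single comparison. The sketch below is a hypothetical model (`LeftoverRule`, `Outcome`, and `decide` are not Hadoop names) that follows the wording above literally: leftover data forms its own split only if its size exceeds the minimum. How the real implementation treats a leftover exactly equal to the minimum is not stated here, so the strict comparison is an assumption.

```java
// Hypothetical model of the documented leftover rule; not Hadoop code.
final class LeftoverRule {

    enum Outcome { OWN_SPLIT, CARRIED_TO_NEXT_LEVEL }

    /**
     * Decides what happens to data left over after maximum-size splits
     * have been formed on one node (or one rack).
     *
     * @param leftoverBytes size of the leftover data in bytes
     * @param minSplitSize  minSplitSizeNode or minSplitSizeRack in bytes
     */
    static Outcome decide(long leftoverBytes, long minSplitSize) {
        // "Exceeds" is read as strictly greater than (an assumption).
        return leftoverBytes > minSplitSize
                ? Outcome.OWN_SPLIT
                : Outcome.CARRIED_TO_NEXT_LEVEL;
    }

    public static void main(String[] args) {
        long minSplitSizeNode = 32L * 1024 * 1024;
        System.out.println(decide(48L * 1024 * 1024, minSplitSizeNode));
        System.out.println(decide(16L * 1024 * 1024, minSplitSizeNode));
    }
}
```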
  - createPool
```
protected void createPool(List<PathFilter> filters)
```
    Create a new pool and add the filters to it. A split cannot have files from different pools.
  - createPool
```
protected void createPool(PathFilter... filters)
```
    Create a new pool and add the filters to it. A pathname can satisfy any one of the specified filters. A split cannot have files from different pools.
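
    The pool semantics above (a pathname belongs to a pool if it satisfies any one of that pool's filters) can be illustrated without Hadoop types. In this sketch `PoolSketch`, `Pool`, and `assign` are hypothetical names, `java.util.function.Predicate<String>` stands in for PathFilter, and the "first matching pool wins" rule is an assumption made only to keep the model deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of pools and filters; Predicate<String> stands in for PathFilter.
final class PoolSketch {

    static final class Pool {
        final List<Predicate<String>> filters;
        final List<String> paths = new ArrayList<>();

        @SafeVarargs
        Pool(Predicate<String>... filters) {
            this.filters = Arrays.asList(filters);
        }

        /** A path is accepted if it satisfies any one of the pool's filters. */
        boolean accepts(String path) {
            for (Predicate<String> filter : filters) {
                if (filter.test(path)) {
                    return true;
                }
            }
            return false;
        }
    }

    /**
     * Places each path into the first pool that accepts it (an assumed
     * tie-break) and returns the paths that no pool accepted.
     */
    static List<String> assign(List<Pool> pools, List<String> paths) {
        List<String> unpooled = new ArrayList<>();
        for (String path : paths) {
            boolean placed = false;
            for (Pool pool : pools) {
                if (pool.accepts(path)) {
                    pool.paths.add(path);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                unpooled.add(path);
            }
        }
        return unpooled;
    }

    public static void main(String[] args) {
        Pool logs = new Pool(p -> p.endsWith(".log"), p -> p.endsWith(".err"));
        Pool data = new Pool(p -> p.endsWith(".csv"));
        List<String> rest = assign(Arrays.asList(logs, data),
                Arrays.asList("a.log", "b.csv", "c.err", "d.txt"));
        System.out.println("logs pool: " + logs.paths);
        System.out.println("data pool: " + data.paths);
        System.out.println("unpooled:  " + rest);
    }
}
```

    Because a split cannot have files from different pools, any split-building step in this model would run once per pool rather than over all paths together.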
  - isSplitable
```
protected boolean isSplitable(JobContext context,
                  Path file)
```
    Description copied from class: FileInputFormat
    
    Is the given filename splitable? Usually true, but if the file is stream compressed, it will not be. FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
    
    Overrides:
    
    isSplitable in class FileInputFormat<K,V>
    
    Parameters:
    context - the job context
    file - the file name to check
    
    Returns:
    is this file splitable?
  - getSplits
```
public List<InputSplit> getSplits(JobContext job)
                           throws IOException
```
    Description copied from class: FileInputFormat
    
    Generate the list of files and make them into FileSplits.
    
    Overrides:
    
    getSplits in class FileInputFormat<K,V>
    
    Parameters:
    job - job configuration.
    
    Returns:
    a list of InputSplits for the job.
    
    Throws:
    
    IOException
  - createRecordReader
```
public abstract RecordReader<K,V> createRecordReader(InputSplit split,
                                   TaskAttemptContext context)
                                              throws IOException
```
    This is not implemented yet.
    
    Specified by:
    
    createRecordReader in class InputFormat<K,V>
    
    Parameters:
    split - the split to be read
    context - the information about the task
    
    Returns:
    a new record reader
    
    Throws:
    
    IOException
  - getFileBlockLocations
```
protected BlockLocation[] getFileBlockLocations(FileSystem fs,
                                    FileStatus stat)
                                         throws IOException
```
    Throws:
    
    IOException
