org.apache.hadoop.mapred.lib.db
Class DBInputFormat<T extends DBWritable>

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.db.DBInputFormat<T>
All Implemented Interfaces:
InputFormat<LongWritable,T>, JobConfigurable

public class DBInputFormat<T extends DBWritable>
extends Object
implements InputFormat<LongWritable,T>, JobConfigurable

An InputFormat that reads input data from an SQL table.

DBInputFormat emits LongWritables containing the record number as key and DBWritables as value. The SQL query and input class can be set using one of the two setInput methods.


Nested Class Summary
protected static class DBInputFormat.DBInputSplit
          An InputSplit that spans a set of rows.
protected  class DBInputFormat.DBRecordReader
          A RecordReader that reads records from an SQL table.
static class DBInputFormat.NullDBWritable
          A class that does nothing, implementing DBWritable.
 
Constructor Summary
DBInputFormat()
           
 
Method Summary
 void configure(JobConf job)
          Initializes a new instance from a JobConf.
protected  String getCountQuery()
          Returns the query for getting the total number of rows. Subclasses can override this for custom behaviour.
 RecordReader<LongWritable,T> getRecordReader(InputSplit split, JobConf job, Reporter reporter)
          Get the RecordReader for the given InputSplit.
 InputSplit[] getSplits(JobConf job, int chunks)
          Logically split the set of input files for the job.
static void setInput(JobConf job, Class<? extends DBWritable> inputClass, String inputQuery, String inputCountQuery)
          Initializes the map-part of the job with the appropriate input settings.
static void setInput(JobConf job, Class<? extends DBWritable> inputClass, String tableName, String conditions, String orderBy, String... fieldNames)
          Initializes the map-part of the job with the appropriate input settings.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DBInputFormat

public DBInputFormat()

Method Detail

configure

public void configure(JobConf job)
Initializes a new instance from a JobConf.

Specified by:
configure in interface JobConfigurable
Parameters:
job - the configuration

getRecordReader

public RecordReader<LongWritable,T> getRecordReader(InputSplit split,
                                                    JobConf job,
                                                    Reporter reporter)
                                                                throws IOException
Get the RecordReader for the given InputSplit.

It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.

Specified by:
getRecordReader in interface InputFormat<LongWritable,T extends DBWritable>
Parameters:
split - the InputSplit
job - the job that this split belongs to
Returns:
a RecordReader
Throws:
IOException
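
For a database source, "respecting record boundaries" amounts to reading only the rows that belong to the split. The following is a minimal sketch of how a reader could restrict a query to one split's window of rows. It is not taken from the DBRecordReader source: the class SplitQuerySketch and the method forSplit are hypothetical, and the LIMIT/OFFSET syntax is an assumption that holds for databases such as MySQL and PostgreSQL but not for every SQL dialect.

```java
/** Hypothetical helper, not part of Hadoop: restricts a query to one split's rows. */
final class SplitQuerySketch {

    /**
     * Appends a row window to the given query.
     *
     * @param inputQuery a SELECT statement, ideally with an ORDER BY clause
     * @param start      index of the first row of the split (0-based)
     * @param length     number of rows in the split
     */
    static String forSplit(String inputQuery, long start, long length) {
        if (start < 0 || length < 0) {
            throw new IllegalArgumentException("start and length must be non-negative");
        }
        // Assumption: the database understands LIMIT ... OFFSET ...
        return inputQuery + " LIMIT " + length + " OFFSET " + start;
    }

    public static void main(String[] args) {
        String query = "SELECT f1, f2, f3 FROM Mytable ORDER BY f1";
        // prints: SELECT f1, f2, f3 FROM Mytable ORDER BY f1 LIMIT 4 OFFSET 6
        System.out.println(forSplit(query, 6, 4));
    }
}
```

A windowed read of this kind only partitions the rows cleanly when the query has a stable ordering, which is one reason to include an ORDER BY clause in the input query.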

getSplits

public InputSplit[] getSplits(JobConf job,
                              int chunks)
                       throws IOException
Logically split the set of input files for the job.

Each InputSplit is then assigned to an individual Mapper for processing.

Note: The split is a logical split of the inputs; the input files are not physically split into chunks. For example, a split could be an <input-file-path, start, offset> tuple.

Specified by:
getSplits in interface InputFormat<LongWritable,T extends DBWritable>
Parameters:
job - job configuration.
chunks - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException
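
For a table or query result, a logical split is more naturally a range of rows than a region of a file. The sketch below shows one way a total row count could be divided into the requested number of chunks. It assumes the row count has already been obtained (for example from the count query) and that each split covers a contiguous, half-open range of rows. The class RowRangeSketch and its members are hypothetical and are not part of this API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: divides a row count into contiguous ranges. */
final class RowRangeSketch {

    /** Half-open row range [start, end). */
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        long length() {
            return end - start;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Splits rowCount rows into exactly 'chunks' ranges; the last range absorbs the remainder. */
    static List<Range> split(long rowCount, int chunks) {
        if (rowCount < 0 || chunks < 1) {
            throw new IllegalArgumentException("rowCount must be >= 0 and chunks >= 1");
        }
        List<Range> ranges = new ArrayList<>();
        long chunkSize = rowCount / chunks;
        for (int i = 0; i < chunks; i++) {
            long start = i * chunkSize;
            // The final chunk runs to the end so that no rows are dropped.
            long end = (i == chunks - 1) ? rowCount : start + chunkSize;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // prints: [[0, 3), [3, 6), [6, 10)]
        System.out.println(split(10, 3));
    }
}
```

With this scheme the last range may be larger than the others, and ranges may be empty when there are fewer rows than chunks. Other balancing strategies are equally possible.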

getCountQuery

protected String getCountQuery()
Returns the query for getting the total number of rows. Subclasses can override this for custom behaviour.
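
The exact SQL returned is not documented here. As an illustration only, a count query for table-based input could be derived from the table name and the conditions, so that it counts the same rows the select query returns. The class CountQuerySketch and the method buildCount are hypothetical names introduced for this sketch.

```java
/** Hypothetical helper, not part of Hadoop: builds a row-count query for a table. */
final class CountQuerySketch {

    /**
     * @param tableName  the table to count rows in
     * @param conditions optional WHERE conditions; may be null or empty
     */
    static String buildCount(String tableName, String conditions) {
        StringBuilder sql = new StringBuilder("SELECT COUNT(*) FROM ").append(tableName);
        // Values are concatenated verbatim, so they should come from trusted configuration.
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE ").append(conditions);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // prints: SELECT COUNT(*) FROM Mytable WHERE (length > 0)
        System.out.println(buildCount("Mytable", "(length > 0)"));
    }
}
```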


setInput

public static void setInput(JobConf job,
                            Class<? extends DBWritable> inputClass,
                            String tableName,
                            String conditions,
                            String orderBy,
                            String... fieldNames)
Initializes the map-part of the job with the appropriate input settings.

Parameters:
job - The job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
tableName - The table to read data from
conditions - The conditions with which to select data, e.g. '(updated > 20070101 AND length > 0)'
orderBy - The field names in the ORDER BY clause.
fieldNames - The field names in the table
See Also:
setInput(JobConf, Class, String, String)
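
The table-based parameters map onto the parts of a SELECT statement. The sketch below shows one plausible way they could be combined. It is an illustration under stated assumptions, not the SQL that DBInputFormat is guaranteed to generate, and the class SelectQuerySketch and the method buildSelect are hypothetical.

```java
/** Hypothetical helper, not part of Hadoop: assembles a SELECT from table-style parameters. */
final class SelectQuerySketch {

    /**
     * @param tableName  the table to read from
     * @param conditions optional WHERE conditions; may be null or empty
     * @param orderBy    optional ORDER BY field list; may be null or empty
     * @param fieldNames the columns to select; all columns if none are given
     */
    static String buildSelect(String tableName, String conditions,
                              String orderBy, String... fieldNames) {
        StringBuilder sql = new StringBuilder("SELECT ");
        if (fieldNames == null || fieldNames.length == 0) {
            sql.append("*");
        } else {
            sql.append(String.join(", ", fieldNames));
        }
        sql.append(" FROM ").append(tableName);
        if (conditions != null && !conditions.isEmpty()) {
            sql.append(" WHERE ").append(conditions);
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            sql.append(" ORDER BY ").append(orderBy);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // prints: SELECT f1, f2, f3 FROM Mytable WHERE (updated > 20070101 AND length > 0) ORDER BY f1
        System.out.println(buildSelect("Mytable",
                "(updated > 20070101 AND length > 0)", "f1", "f1", "f2", "f3"));
    }
}
```

Seen this way, the table-based overload is a convenience over the query-based overload, which accepts the finished SELECT and COUNT statements directly.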

setInput

public static void setInput(JobConf job,
                            Class<? extends DBWritable> inputClass,
                            String inputQuery,
                            String inputCountQuery)
Initializes the map-part of the job with the appropriate input settings.

Parameters:
job - The job
inputClass - the class object implementing DBWritable, which is the Java object holding tuple fields.
inputQuery - the input query to select fields. Example: "SELECT f1, f2, f3 FROM Mytable ORDER BY f1"
inputCountQuery - the input query that returns the number of records in the table. Example: "SELECT COUNT(f1) FROM Mytable"
See Also:
setInput(JobConf, Class, String, String, String, String...)


Copyright © 2009 The Apache Software Foundation