java.lang.Object
org.apache.pig.impl.plan.Operator<PhyPlanVisitor>
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit
public class POSplit
The MapReduce Split operator.
The assumption here is that the logical-to-physical translation creates this dummy operator with just the filename under which the input branch will be stored and from which it will later be loaded. The translation should also make sure that the appropriate filter operators are configured as outputs of this operator, using the conditions specified in the LOSplit. So LOSplit will be converted into:

       |        |             |
    Filter1  Filter2  ...  Filter3
       |        |     ...     |
       |        |     ...     |
       ---- POSplit -... ----

This differs from the existing implementation, where the POSplit writes to side files after filtering and then loads the appropriate file.
The approach followed here is as good as the old approach, if not better, in many cases because attachInput is available. One optimization that can follow: if multiple loads read the same file, they can be merged into a single load, and the operators that take input from that load can be recorded. When the mapPlan executes, the file is then read only once and each resulting tuple is attached as input to all the operators that take input from this load. In some cases, where the conditions are mutually exclusive and some outputs are ignored, this approach can be worse. However, it makes the Split easier to manage and allows the data stored by the split job to be reused whenever necessary.
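The read-once, fan-out behaviour described above can be illustrated with a minimal sketch in plain Java. It is not Pig code: the class `SplitSketch`, the method `split`, and the use of `Predicate` as a stand-in for a filter operator are all hypothetical names chosen for illustration, and the sketch assumes each branch is just a condition applied to every tuple.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical illustration only; none of these names belong to the Pig API. */
public class SplitSketch {

    /**
     * Makes a single pass over the input and offers every tuple to every
     * branch condition. A tuple may end up in several branches or in none.
     */
    public static <T> Map<String, List<T>> split(List<T> input,
                                                 Map<String, Predicate<T>> branches) {
        Map<String, List<T>> out = new LinkedHashMap<>();
        for (String name : branches.keySet()) {
            out.put(name, new ArrayList<>());
        }
        for (T tuple : input) {                    // the stored data is read once
            for (Map.Entry<String, Predicate<T>> branch : branches.entrySet()) {
                if (branch.getValue().test(tuple)) {   // stand-in for a filter
                    out.get(branch.getKey()).add(tuple);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Predicate<Integer>> branches = new LinkedHashMap<>();
        branches.put("small", x -> x < 5);
        branches.put("even", x -> x % 2 == 0);
        // prints {small=[1, 2], even=[2, 6]}
        System.out.println(split(List.of(1, 2, 6, 7), branches));
    }
}
```

Note that the tuple 2 satisfies both conditions and appears in both outputs, while 7 satisfies neither and is dropped. This mirrors the trade-off mentioned above: every tuple is tested against every branch, which costs more when the conditions are mutually exclusive.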
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator |
|---|
PhysicalOperator.OriginalLocation |
| Field Summary |
|---|
| Fields inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator |
|---|
alias, illustrator, input, inputAttached, inputs, lineageTracer, outputs, parentPlan, pigLogger, requestedParallelism, res, resultType |
| Fields inherited from class org.apache.pig.impl.plan.Operator |
|---|
mKey |
| Constructor Summary | |
|---|---|
| POSplit(OperatorKey k) | Constructs an operator with the specified key |
| POSplit(OperatorKey k, int rp) | Constructs an operator with the specified key and degree of parallelism |
| POSplit(OperatorKey k, int rp, List<PhysicalOperator> inp) | Constructs an operator with the specified key, degree of parallelism and inputs |
| POSplit(OperatorKey k, List<PhysicalOperator> inp) | Constructs an operator with the specified key and inputs |
| Method Summary | |
|---|---|
| void | addPlan(PhysicalPlan inPlan): Appends the specified plan to the end of the nested input plan list |
| Result | getNextTuple() |
| List<PhysicalPlan> | getPlans(): Returns the list of nested plans. |
| FileSpec | getSplitStore(): Returns the name of the file associated with this operator |
| Tuple | illustratorMarkup(Object in, Object out, int eqClassIndex): input tuple mark up to be illustrate-able |
| String | name() |
| void | removePlan(PhysicalPlan plan): Removes plan from the nested input plan list |
| void | setSplitStore(FileSpec splitStore): Sets the name of the file associated with this operator |
| boolean | supportsMultipleInputs(): Indicates whether this operator supports multiple inputs. |
| boolean | supportsMultipleOutputs(): Indicates whether this operator supports multiple outputs. |
| void | visit(PhyPlanVisitor v): Visit this node with the provided visitor. |
| Methods inherited from class org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator |
|---|
addOriginalLocation, addOriginalLocation, attachInput, clone, cloneHelper, detachInput, getAlias, getAliasString, getIllustrator, getInputs, getLogger, getNext, getNextBigDecimal, getNextBigInteger, getNextBoolean, getNextDataBag, getNextDataByteArray, getNextDateTime, getNextDouble, getNextFloat, getNextInteger, getNextLong, getNextMap, getNextString, getOriginalLocations, getPigLogger, getReporter, getRequestedParallelism, getResultType, isAccumStarted, isAccumulative, isBlocking, isInputAttached, processInput, reset, setAccumEnd, setAccumStart, setAccumulative, setIllustrator, setInputs, setParentPlan, setPigLogger, setReporter, setRequestedParallelism, setResultType |
| Methods inherited from class org.apache.pig.impl.plan.Operator |
|---|
compareTo, equals, getOperatorKey, getProjectionMap, hashCode, regenerateProjectionMap, rewire, toString, unsetProjectionMap |
| Methods inherited from class java.lang.Object |
|---|
finalize, getClass, notify, notifyAll, wait, wait, wait |
| Constructor Detail |
|---|
public POSplit(OperatorKey k)
k - the operator key
public POSplit(OperatorKey k,
int rp)
k - the operator key
rp - the degree of parallelism requested
public POSplit(OperatorKey k,
List<PhysicalOperator> inp)
k - the operator key
inp - the inputs that this operator will read data from
public POSplit(OperatorKey k,
int rp,
List<PhysicalOperator> inp)
k - the operator key
rp - the degree of parallelism requested
inp - the inputs that this operator will read data from
| Method Detail |
|---|
public void visit(PhyPlanVisitor v)
throws VisitorException
Operator
visit in class PhysicalOperator
v - Visitor to visit with.
VisitorException - if the visitor has a problem.
public String name()
name in class Operator<PhyPlanVisitor>
public boolean supportsMultipleInputs()
Operator
supportsMultipleInputs in class Operator<PhyPlanVisitor>
public boolean supportsMultipleOutputs()
Operator
supportsMultipleOutputs in class Operator<PhyPlanVisitor>
public FileSpec getSplitStore()
public void setSplitStore(FileSpec splitStore)
splitStore - the FileSpec used to store the data
public List<PhysicalPlan> getPlans()
PlanPrinter
public void addPlan(PhysicalPlan inPlan)
inPlan - plan to be appended to the list
public void removePlan(PhysicalPlan plan)
plan - plan to be removed
public Result getNextTuple()
throws ExecException
getNextTuple in class PhysicalOperator
ExecException
public Tuple illustratorMarkup(Object in,
Object out,
int eqClassIndex)
Illustrable
in - input tuple
out - output tuple before wrapped in ExampleTuple
eqClassIndex - index into equivalence classes in illustrator