org.apache.hadoop.mapred.lib
Class InputSampler.RandomSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.InputSampler.RandomSampler<K,V>
All Implemented Interfaces:
InputSampler.Sampler<K,V>
Enclosing class:
InputSampler<K,V>

public static class InputSampler.RandomSampler<K,V>
extends Object
implements InputSampler.Sampler<K,V>

Samples keys from random points in the input. This is the general-purpose sampler: it takes numSamples / maxSplitsSampled keys from each split it examines.
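
As a rough illustration of the per-split quota mentioned above, the arithmetic can be sketched as follows. The class and method names (`SplitQuota`, `samplesPerSplit`) are hypothetical and not part of Hadoop. The use of integer division, and the capping of the divisor at the number of splits actually available, are assumptions rather than documented behavior.

```java
/** Hypothetical helper, not part of Hadoop: models the per-split quota only. */
final class SplitQuota {

    /**
     * Returns how many keys would be taken from each examined split,
     * assuming integer division and assuming that no more splits are
     * examined than actually exist.
     */
    static int samplesPerSplit(int numSamples, int maxSplitsSampled, int totalSplits) {
        int splitsToSample = Math.min(maxSplitsSampled, totalSplits);
        if (splitsToSample <= 0) {
            throw new IllegalArgumentException("at least one split must be sampled");
        }
        return numSamples / splitsToSample;
    }

    public static void main(String[] args) {
        // 1000 samples spread over at most 10 of 40 splits.
        System.out.println(samplesPerSplit(1000, 10, 40));
        // Only 4 splits exist, so each one contributes more keys.
        System.out.println(samplesPerSplit(1000, 10, 4));
    }
}
```

Under these assumptions, a smaller maxSplitsSampled means fewer splits are read at the client, but each one contributes a larger share of the sample.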


Constructor Summary
InputSampler.RandomSampler(double freq, int numSamples)
          Create a new RandomSampler sampling all splits.
InputSampler.RandomSampler(double freq, int numSamples, int maxSplitsSampled)
          Create a new RandomSampler.
 
Method Summary
 K[] getSample(InputFormat<K,V> inf, JobConf job)
          Randomizes the split order, then takes the specified number of keys from each sampled split. Each key is selected with the specified probability; once the quota of keys for a split is satisfied, a subsequently selected key may replace a previously selected one.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InputSampler.RandomSampler

public InputSampler.RandomSampler(double freq,
                                  int numSamples)
Create a new RandomSampler sampling all splits. This will read every split at the client, which is very expensive.

Parameters:
freq - Probability with which a key will be chosen.
numSamples - Total number of samples to obtain from all selected splits.

InputSampler.RandomSampler

public InputSampler.RandomSampler(double freq,
                                  int numSamples,
                                  int maxSplitsSampled)
Create a new RandomSampler.

Parameters:
freq - Probability with which a key will be chosen.
numSamples - Total number of samples to obtain from all selected splits.
maxSplitsSampled - The maximum number of splits to examine.

Method Detail

getSample

public K[] getSample(InputFormat<K,V> inf,
                     JobConf job)
              throws IOException
Randomizes the split order, then takes the specified number of keys from each sampled split. Each key is selected with the specified probability; once the quota of keys for a split is satisfied, a subsequently selected key may replace a previously selected one.

Specified by:
getSample in interface InputSampler.Sampler<K,V>
Throws:
IOException
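
The sampling policy described for getSample can be modeled without Hadoop. The following is a minimal sketch under stated assumptions, not the Hadoop source: each inner list stands in for the keys of one input split, the class and method names (`RandomSamplerSketch`, `sample`) are hypothetical, the per-split quota is assumed to be numSamples divided by the number of splits examined, and the key that gets replaced is assumed to be chosen uniformly at random.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Stand-alone model of the documented sampling policy; not the Hadoop implementation. */
final class RandomSamplerSketch {

    /**
     * @param splits           each inner list stands in for the keys of one split
     * @param freq             probability with which a key is selected
     * @param numSamples       total number of samples wanted from all examined splits
     * @param maxSplitsSampled maximum number of splits to examine
     * @param rnd              source of randomness (seed it for reproducible runs)
     */
    static <K> List<K> sample(List<List<K>> splits, double freq,
                              int numSamples, int maxSplitsSampled, Random rnd) {
        // Randomize the split order on a copy so the caller's list is untouched.
        List<List<K>> shuffled = new ArrayList<>(splits);
        Collections.shuffle(shuffled, rnd);

        List<K> result = new ArrayList<>();
        int splitsToSample = Math.min(maxSplitsSampled, shuffled.size());
        if (splitsToSample <= 0) {
            return result;
        }
        // Assumed quota: integer division across the examined splits.
        int quota = numSamples / splitsToSample;

        for (List<K> split : shuffled.subList(0, splitsToSample)) {
            List<K> kept = new ArrayList<>();
            for (K key : split) {
                if (rnd.nextDouble() > freq) {
                    continue; // key not selected
                }
                if (kept.size() < quota) {
                    kept.add(key); // quota not yet satisfied
                } else if (quota > 0) {
                    // Quota satisfied: the new key replaces an earlier pick.
                    kept.set(rnd.nextInt(quota), key);
                }
            }
            result.addAll(kept);
        }
        return result;
    }
}
```

Because later keys can displace earlier ones, the keys kept from a split are not simply the first ones selected. The sketch returns fewer than numSamples keys when the examined splits do not yield enough selected keys.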


Copyright © 2009 The Apache Software Foundation