java.lang.Object
  org.apache.pig.EvalFunc<T>
    org.apache.pig.builtin.BuildBloomBase<DataByteArray>
      org.apache.pig.builtin.BuildBloom
public class BuildBloom
Build a bloom filter for use later in Bloom. This UDF is intended to run
in a group all job. For example:
define bb BuildBloom('jenkins', '100', '0.1');
A = load 'foo' as (x, y);
B = group A all;
C = foreach B generate bb(A.x);
store C into 'mybloom';
The bloom filter can be built on multiple keys by passing more than one
field (or the entire bag) to BuildBloom.
The resulting file can then be used in a Bloom filter as:
define bloom Bloom('mybloom');
A = load 'foo' as (x, y);
B = load 'bar' as (z);
C = filter B by bloom(z);
D = join C by z, A by x;
The filter itself is Hadoop's BloomFilter (org.apache.hadoop.util.bloom.BloomFilter).
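The filter-then-join pattern above is safe because a Bloom filter can report false positives but never false negatives: rows that survive the filter by accident are discarded by the join anyway. The following is a minimal, self-contained sketch of that property. It does not use Pig or Hadoop; the class name, the 128-bit size, and the hashing scheme are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; not Pig's or Hadoop's implementation.
final class BloomPrefilterSketch {
    static final int BITS = 128; // hypothetical filter size in bits

    // Two bit positions per key, derived from String.hashCode() (an assumption).
    static int[] positions(String key) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 15) ^ 0x9E3779B9;
        return new int[] { Math.floorMod(h1, BITS), Math.floorMod(h1 + h2, BITS) };
    }

    // Plays the role of BuildBloom: set the bits for every key.
    static BitSet build(List<String> keys) {
        BitSet bits = new BitSet(BITS);
        for (String key : keys) {
            for (int p : positions(key)) {
                bits.set(p);
            }
        }
        return bits;
    }

    // Plays the role of Bloom: true if the key may be present, false if it is definitely absent.
    static boolean mightContain(BitSet bits, String key) {
        for (int p : positions(key)) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }

    // Keep only the rows whose key passes the filter (the "filter B by bloom(z)" step).
    static List<String> prefilter(BitSet bits, List<String> rows) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (mightContain(bits, row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    // Exact inner join on the key, keeping the order of the left input.
    static List<String> join(List<String> left, List<String> right) {
        Set<String> rightKeys = new HashSet<>(right);
        List<String> out = new ArrayList<>();
        for (String l : left) {
            if (rightKeys.contains(l)) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> aKeys = Arrays.asList("1", "2", "3");           // stands in for A.x
        List<String> bKeys = Arrays.asList("2", "3", "4", "5", "6"); // stands in for B.z
        List<String> c = prefilter(build(aKeys), bKeys);
        // The join result is the same with or without the prefilter.
        System.out.println(join(c, aKeys).equals(join(bKeys, aKeys)));
    }
}
```

The prefilter only reduces how many rows reach the join; it never changes the join's result.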
| Nested Class Summary | |
|---|---|
| static class | BuildBloom.Final |
| static class | BuildBloom.Initial |
| static class | BuildBloom.Intermediate |
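The three nested classes correspond to the stages of an algebraic UDF: an initial stage applied to individual records, an intermediate stage that combines partial results, and a final stage that produces the output. A Bloom filter suits this pattern because partial filters of the same size can be merged with a bitwise OR. The sketch below illustrates the idea with java.util.BitSet. It is not Pig's implementation; the class name, the 64-bit size, and the hashing scheme are hypothetical.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Illustrative sketch only; not the code behind BuildBloom.Initial/Intermediate/Final.
final class AlgebraicBloomSketch {
    static final int VECTOR_SIZE = 64; // hypothetical filter size in bits
    static final int NUM_HASH = 2;     // hypothetical number of hash functions

    // Set NUM_HASH bit positions for one key (hashing scheme is an assumption).
    static void add(BitSet bits, String key) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 15) ^ 0x9E3779B9;
        for (int i = 0; i < NUM_HASH; i++) {
            bits.set(Math.floorMod(h1 + i * h2, VECTOR_SIZE));
        }
    }

    // Initial stage: turn a single key into a partial filter.
    static BitSet initial(String key) {
        BitSet bits = new BitSet(VECTOR_SIZE);
        add(bits, key);
        return bits;
    }

    // Intermediate and final stages: OR the partial filters together.
    static BitSet merge(List<BitSet> partials) {
        BitSet out = new BitSet(VECTOR_SIZE);
        for (BitSet partial : partials) {
            out.or(partial);
        }
        return out;
    }

    // Non-algebraic path: build one filter from all keys at once.
    static BitSet buildAll(List<String> keys) {
        BitSet bits = new BitSet(VECTOR_SIZE);
        for (String key : keys) {
            add(bits, key);
        }
        return bits;
    }

    public static void main(String[] args) {
        // Two "combiners" each see half of the keys; one "reducer" merges them.
        BitSet combiner1 = merge(Arrays.asList(initial("a"), initial("b")));
        BitSet combiner2 = merge(Arrays.asList(initial("c"), initial("d")));
        BitSet merged = merge(Arrays.asList(combiner1, combiner2));
        BitSet direct = buildAll(Arrays.asList("a", "b", "c", "d"));
        System.out.println(merged.equals(direct)); // prints "true"
    }
}
```

Because OR is associative and commutative, the merged result does not depend on how the input is split across combiners.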
| Nested classes/interfaces inherited from class org.apache.pig.EvalFunc |
|---|
| EvalFunc.SchemaType |

| Field Summary |
|---|

| Fields inherited from class org.apache.pig.builtin.BuildBloomBase |
|---|
| filter, hType, numHash, vSize |

| Fields inherited from class org.apache.pig.EvalFunc |
|---|
| log, pigLogger, reporter, returnType |
| Constructor Summary | |
|---|---|
| BuildBloom(String hashType, String numElements, String desiredFalsePositive) | Construct a Bloom filter based on expected number of elements and desired accuracy. |
| BuildBloom(String hashType, String mode, String vectorSize, String nbHash) | Build a bloom filter of fixed size and number of hash functions. |
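The two constructors describe the same filter in different terms: one takes the vector size and hash count directly, the other derives them from an expected element count and a target false-positive rate. The sketch below shows the standard textbook sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. Whether BuildBloom applies exactly these formulas or rounds the same way is an assumption; the class and method names are hypothetical.

```java
// Illustrative sketch only: textbook Bloom filter sizing, not taken from Pig's source.
final class BloomSizing {

    // Bits needed for n expected elements at false-positive rate p:
    // m = -n * ln(p) / (ln 2)^2, rounded up.
    static int vectorSize(long numElements, double falsePositiveRate) {
        if (numElements <= 0 || falsePositiveRate <= 0.0 || falsePositiveRate >= 1.0) {
            throw new IllegalArgumentException("need numElements > 0 and 0 < rate < 1");
        }
        double ln2 = Math.log(2.0);
        return (int) Math.ceil(-numElements * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    // Hash count that minimizes false positives for m bits and n elements:
    // k = (m / n) * ln 2, rounded to the nearest integer and at least 1.
    static int numHashes(int vectorSize, long numElements) {
        double k = (double) vectorSize / numElements * Math.log(2.0);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        // Same inputs as the example define: 100 elements, 10% false positives.
        int m = vectorSize(100, 0.1);
        int k = numHashes(m, 100);
        System.out.println("vectorSize=" + m + " numHash=" + k);
    }
}
```

Under these formulas, lowering the acceptable false-positive rate increases the vector size, which is why a rate of 1.0 would make the filter useless.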
| Method Summary | |
|---|---|
| DataByteArray | exec(Tuple input) - This callback method must be implemented by all subclasses. |
| String | getFinal() - Get the final function. |
| String | getInitial() - Get the initial function. |
| String | getIntermed() - Get the intermediate function. |
| Schema | outputSchema(Schema input) - Report the schema of the output of this UDF. |
| Methods inherited from class org.apache.pig.builtin.BuildBloomBase |
|---|
| bloomIn, bloomOr, bloomOut |

| Methods inherited from class org.apache.pig.EvalFunc |
|---|
| finish, getArgToFuncMapping, getCacheFiles, getInputSchema, getLogger, getPigLogger, getReporter, getReturnType, getSchemaName, getSchemaType, isAsynchronous, progress, setInputSchema, setPigLogger, setReporter, setUDFContextSignature, warn |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public BuildBloom(String hashType,
                  String mode,
                  String vectorSize,
                  String nbHash)
Build a bloom filter of fixed size and number of hash functions.
Parameters:
hashType - type of the hashing function (see Hash).
mode - Will be ignored, though by convention it should be "fixed" or "fixedsize".
vectorSize - The vector size of this filter.
nbHash - The number of hash functions to consider.

public BuildBloom(String hashType,
                  String numElements,
                  String desiredFalsePositive)
Construct a Bloom filter based on expected number of elements and desired accuracy.
Parameters:
hashType - type of the hashing function (see Hash).
numElements - The number of distinct elements expected to be placed in this filter.
desiredFalsePositive - the acceptable rate of false positives. This should be a floating point value between 0 and 1.0, where 1.0 would be 100% (i.e., a totally useless filter).

| Method Detail |
|---|
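public DataByteArray exec(Tuple input)
                   throws IOException
Description copied from class: EvalFunc
This callback method must be implemented by all subclasses.
Specified by: exec in class EvalFunc<DataByteArray>
Parameters: input - the Tuple to be processed.
Throws: IOException

public String getInitial()
Description copied from interface: Algebraic
Get the initial function.
Specified by: getInitial in interface Algebraic

public String getIntermed()
Description copied from interface: Algebraic
Get the intermediate function.
Specified by: getIntermed in interface Algebraic

public String getFinal()
Description copied from interface: Algebraic
Get the final function.
Specified by: getFinal in interface Algebraic

public Schema outputSchema(Schema input)
Description copied from class: EvalFunc
Report the schema of the output of this UDF. The default implementation interprets the OutputSchema annotation, if one is present. Otherwise, it returns null (no known output schema).
Overrides: outputSchema in class EvalFunc<DataByteArray>
Parameters: input - Schema of the input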