PathPartitionHelper (Pig 0.13.0 API)

org.apache.pig.piggybank.storage.partition
Class PathPartitionHelper

java.lang.Object
  org.apache.pig.piggybank.storage.partition.PathPartitionHelper

public class PathPartitionHelper
extends Object

Implements the logic for:

Listing the partition keys and values used in an HDFS path
Filtering partitions based on a Pig filter operator expression

Restrictions
Function calls are not supported by this partition helper, and it can only handle String values.
This is normally not a problem, because partition values are part of the HDFS folder path and are given
fixed values that do not need parsing by any external process.

Field Summary
`static String`	`PARITITION_FILTER_EXPRESSION`
`static String`	`PARTITION_COLUMNS`

Constructor Summary
`PathPartitionHelper()`

Method Summary
`Set<String>`	`getPartitionKeys(String location, org.apache.hadoop.conf.Configuration conf)` Returns the partition keys for a location. The work is delegated to the PathPartitioner class.
`Map<String,String>`	`getPathPartitionKeyValues(String location)` Returns the partition keys and each key's value for a single location. That is, the location must be something like mytable/partition1=a/partition2=b/myfile. This method will return a map with [partition1='a', partition2='b']. The work is delegated to the PathPartitioner class.
`List<org.apache.hadoop.fs.FileStatus>`	`listStatus(org.apache.hadoop.mapreduce.JobContext ctx, Class<? extends LoadFunc> loaderClass, String signature)` This method is called by the FileInputFormat to find the input paths for which splits should be calculated. If applyDateRanges == true, the HiveRCDateSplitter is used to filter the input files; otherwise the default FileInputFormat listStatus method is used.
`void`	`setPartitionFilterExpression(String partitionFilterExpression, Class<? extends LoadFunc> loaderClass, String signature)` Sets the PARITITION_FILTER_EXPRESSION property in the UDFContext identified by the loaderClass.
`void`	`setPartitionKeys(String location, org.apache.hadoop.conf.Configuration conf, Class<? extends LoadFunc> loaderClass, String signature)` Reads the partition keys from the location, i.e. the base directory.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Field Detail

PARTITION_COLUMNS

public static final String PARTITION_COLUMNS

PARITITION_FILTER_EXPRESSION

public static final String PARITITION_FILTER_EXPRESSION

Constructor Detail

PathPartitionHelper

public PathPartitionHelper()

Method Detail

getPathPartitionKeyValues

public Map<String,String> getPathPartitionKeyValues(String location)
                                             throws IOException

Returns the partition keys and each key's value for a single location.
That is, the location must be something like mytable/partition1=a/partition2=b/myfile.
This method will return a map with [partition1='a', partition2='b'].
The work is delegated to the PathPartitioner class.

Parameters: location - String path of a single location, e.g. mytable/partition1=a/partition2=b/myfile
Returns: Map of String, String
Throws: IOException
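
The mapping described above can be illustrated with a minimal, standalone sketch. It does not call PathPartitioner and is not the Piggybank implementation; the class name PathPartitionSketch and the method parsePartitions are hypothetical, and the sketch assumes that every path segment of the form key=value is a partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Pig or Piggybank.
final class PathPartitionSketch {

    /** Collects key=value path segments in the order they appear in the location. */
    static Map<String, String> parsePartitions(String location) {
        Map<String, String> partitions = new LinkedHashMap<>();
        for (String segment : location.split("/")) {
            int eq = segment.indexOf('=');
            // Assumption: a segment is a partition only if it has a non-empty key before '='.
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Uses the example location from the method description.
        System.out.println(parsePartitions("mytable/partition1=a/partition2=b/myfile"));
    }
}
```

Note that the values stay plain Strings, which matches the restriction stated for this class.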

getPartitionKeys

public Set<String> getPartitionKeys(String location,
                                    org.apache.hadoop.conf.Configuration conf)
                             throws IOException

Returns the partition keys for a location.
The work is delegated to the PathPartitioner class.

Parameters: location - String, must be the base directory for the partitions; conf - Hadoop Configuration
Returns: Set of partition key names
Throws: IOException
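
One way to picture how partition keys could be discovered from a base directory is sketched below. This is an assumption-laden illustration, not the PathPartitioner logic: it walks the local filesystem with java.nio.file instead of HDFS, and it assumes that following the first key=value subdirectory at each level is enough to find all keys. The names PartitionKeyWalkSketch and partitionKeys are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; uses the local filesystem, not HDFS.
final class PartitionKeyWalkSketch {

    /** Descends from baseDir through key=value directories and records each key. */
    static Set<String> partitionKeys(Path baseDir) throws IOException {
        Set<String> keys = new LinkedHashSet<>();
        Path current = baseDir;
        while (current != null) {
            List<Path> children;
            try (Stream<Path> stream = Files.list(current)) {
                // Sort so the choice of "first" child is deterministic.
                children = stream.sorted().collect(Collectors.toList());
            }
            Path next = null;
            for (Path child : children) {
                String name = child.getFileName().toString();
                int eq = name.indexOf('=');
                if (Files.isDirectory(child) && eq > 0) {
                    keys.add(name.substring(0, eq));
                    next = child; // assumption: sibling directories share the same key
                    break;
                }
            }
            current = next; // stops when no key=value subdirectory remains
        }
        return keys;
    }
}
```

For a layout such as base/year=2014/month=01/part-0, the sketch yields the keys year and month in that order.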

setPartitionFilterExpression

public void setPartitionFilterExpression(String partitionFilterExpression,
                                         Class<? extends LoadFunc> loaderClass,
                                         String signature)
                                  throws IOException

Sets the PARITITION_FILTER_EXPRESSION property in the UDFContext identified by the loaderClass.

Parameters: partitionFilterExpression - the filter expression to store; loaderClass - the LoadFunc subclass that identifies the UDFContext properties
Throws: IOException

setPartitionKeys

public void setPartitionKeys(String location,
                             org.apache.hadoop.conf.Configuration conf,
                             Class<? extends LoadFunc> loaderClass,
                             String signature)
                      throws IOException

Reads the partition keys from the location, i.e. the base directory.

Parameters: location - String, must be the base directory for the partitions; conf - Hadoop Configuration; loaderClass - the LoadFunc subclass
Throws: IOException

listStatus

public List<org.apache.hadoop.fs.FileStatus> listStatus(org.apache.hadoop.mapreduce.JobContext ctx,
                                                        Class<? extends LoadFunc> loaderClass,
                                                        String signature)
                                                 throws IOException

This method is called by the FileInputFormat to find the input paths for which splits should be calculated.
If applyDateRanges == true, the HiveRCDateSplitter is used to filter the input files.
Otherwise, the default FileInputFormat listStatus method is used.

Parameters: ctx - JobContext; loaderClass - this is chosen to be a subclass of LoadFunc to maintain some consistency.
Throws: IOException
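
The general idea of pruning input files by their partition values can be sketched as follows. This is not how listStatus is implemented: the sketch works on plain path strings rather than FileStatus objects, and it substitutes a java.util.function.Predicate for the stored Pig filter expression. The names PartitionFilterSketch, partitionsOf and filterPaths are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; a Predicate stands in for the partition filter expression.
final class PartitionFilterSketch {

    /** Reads key=value pairs from the directory segments of a file path (the file name is skipped). */
    static Map<String, String> partitionsOf(String filePath) {
        Map<String, String> partitions = new LinkedHashMap<>();
        String[] segments = filePath.split("/");
        for (int i = 0; i < segments.length - 1; i++) {
            int eq = segments[i].indexOf('=');
            if (eq > 0) {
                partitions.put(segments[i].substring(0, eq), segments[i].substring(eq + 1));
            }
        }
        return partitions;
    }

    /** Keeps only the paths whose partition values satisfy the filter. */
    static List<String> filterPaths(List<String> paths, Predicate<Map<String, String>> filter) {
        List<String> kept = new ArrayList<>();
        for (String path : paths) {
            if (filter.test(partitionsOf(path))) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> paths = List.of(
                "logs/year=2013/month=12/part-0",
                "logs/year=2014/month=01/part-0",
                "logs/year=2014/month=02/part-0");
        // Partition values are compared as Strings, in line with the class restrictions.
        System.out.println(filterPaths(paths, p -> "2014".equals(p.get("year"))));
    }
}
```

Only files under matching partitions are kept, so splits would be calculated for fewer input paths.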