Interface | Description |
---|---|
ClusterStory | ClusterStory represents all configurations of a MapReduce cluster, including nodes, network topology, and slot configurations. |
DeepCompare | Classes that implement this interface can deep-compare (for equality only, not order) with another instance. |
HistoryEvent | |
InputDemuxer | InputDemuxer demultiplexes the input files into individual input streams. |
JobHistoryParser | JobHistoryParser defines the interface of a Job History file parser. |
JobStory | JobStory represents the runtime information available for a completed Map-Reduce job. |
JobStoryProducer | JobStoryProducer produces a sequence of JobStory instances. |
Outputter<T> | Interface to output a sequence of objects of type T. |
Class | Description |
---|---|
AbstractClusterStory | AbstractClusterStory provides a partial implementation of ClusterStory by parsing the topology tree. |
CDFPiecewiseLinearRandomGenerator | |
CDFRandomGenerator | An instance of this class generates random values that conform to the embedded LoggedDiscreteCDF. |
ClusterTopologyReader | Reads a JSON-encoded cluster topology and produces the parsed LoggedNetworkTopology object. |
DefaultInputDemuxer | DefaultInputDemuxer acts as a pass-through demuxer. |
DefaultOutputter<T> | The default Outputter that outputs to a plain file. |
DeskewedJobTraceReader | |
Hadoop20JHParser | JobHistoryParser to parse job histories for Hadoop 0.20 (META=1). |
HadoopLogsAnalyzer | Deprecated |
JhCounter | |
JhCounterGroup | |
JhCounters | |
Job20LineHistoryEventEmitter | |
JobBuilder | JobBuilder builds one job. |
JobConfigurationParser | JobConfigurationParser parses the job configuration XML file and extracts configuration properties. |
JobFinishedEvent | Event to record the successful completion of a job. |
JobHistoryParserFactory | JobHistoryParserFactory is a singleton class that attempts to determine the version of a job history file and return a proper parser. |
JobHistoryUtils | |
JobInfoChangeEvent | Event to record changes in the submit and launch time of a job. |
JobInitedEvent | Event to record the initialization of a job. |
JobPriorityChangeEvent | Event to record the change of priority of a job. |
JobStatusChangedEvent | Event to record the change of status for a job. |
JobSubmittedEvent | Event to record the submission of a job. |
JobTraceReader | Reads JSON-encoded job traces and produces LoggedJob instances. |
JobUnsuccessfulCompletionEvent | Event to record the Failed or Killed completion of a job. |
JsonObjectMapperWriter<T> | Simple wrapper around JsonGenerator to write objects in JSON format. |
LoggedDiscreteCDF | A LoggedDiscreteCDF is a discrete approximation of a cumulative distribution function, with this class set up to meet the requirements of the Jackson JSON parser/generator. |
LoggedJob | A LoggedJob is a representation of a Hadoop job, with the details of this class set up to meet the requirements of the Jackson JSON parser/generator. |
LoggedLocation | A LoggedLocation is a representation of a point in a hierarchical network, represented as a series of membership names, broadest first. |
LoggedNetworkTopology | A LoggedNetworkTopology represents a tree that in turn represents a hierarchy of hosts. |
LoggedSingleRelativeRanking | A LoggedSingleRelativeRanking represents an X-Y coordinate of a single point in a discrete CDF. |
LoggedTask | A LoggedTask represents a Hadoop task that is part of a Hadoop job. |
LoggedTaskAttempt | A LoggedTaskAttempt represents an attempt to run a Hadoop task in a Hadoop job. |
MachineNode | MachineNode represents the configuration of a cluster node. |
MachineNode.Builder | Builder for a MachineNode object. |
MapAttempt20LineHistoryEventEmitter | |
MapAttemptFinishedEvent | Event to record the successful completion of a map attempt. |
MapTaskAttemptInfo | MapTaskAttemptInfo represents the information with regard to a map task attempt. |
Node | Node represents a node in the cluster topology. |
ParsedJob | This is a wrapper class around LoggedJob. |
ParsedTask | This is a wrapper class around LoggedTask. |
ParsedTaskAttempt | This is a wrapper class around LoggedTaskAttempt. |
Pre21JobHistoryConstants | |
RackNode | RackNode represents a rack node in the cluster topology. |
RandomSeedGenerator | The purpose of this class is to generate new random seeds from a master seed. |
ReduceAttempt20LineHistoryEventEmitter | |
ReduceAttemptFinishedEvent | Event to record the successful completion of a reduce attempt. |
ReduceTaskAttemptInfo | ReduceTaskAttemptInfo represents the information with regard to a reduce task attempt. |
ResourceUsageMetrics | Captures the resource usage metrics. |
RewindableInputStream | A simple wrapper class to make any input stream "rewindable". |
Task20LineHistoryEventEmitter | |
TaskAttempt20LineEventEmitter | |
TaskAttemptFinishedEvent | Event to record successful task completion. |
TaskAttemptInfo | TaskAttemptInfo is a collection of statistics about a particular task attempt, gleaned from the job history of the job. |
TaskAttemptStartedEvent | Event to record the start of a task attempt. |
TaskAttemptUnsuccessfulCompletionEvent | Event to record the unsuccessful (Killed/Failed) completion of task attempts. |
TaskFailedEvent | Event to record the failure of a task. |
TaskFinishedEvent | Event to record the successful completion of a task. |
TaskInfo | |
TaskStartedEvent | Event to record the start of a task. |
TaskUpdatedEvent | Event to record updates to a task. |
TopologyBuilder | Builds the cluster topology. |
TraceBuilder | The main driver of the Rumen parser. |
TreePath | This describes a path from a node to the root. |
ZombieCluster | ZombieCluster rebuilds the cluster topology using the information obtained from job history logs. |
ZombieJob | |
ZombieJobProducer | Produces JobStory instances from a job trace. |
Enum | Description |
---|---|
EventType | |
JobConfPropertyNames | |
JobHistoryParserFactory.VersionDetector | |
LoggedJob.JobPriority | |
LoggedJob.JobType | |
Pre21JobHistoryConstants.Values | This enum contains some of the values commonly used by history log events. |
Exception | Description |
---|---|
DeepInequalityException | We use this exception class in the unit tests, and we do a deep comparison when we run the unit tests. |
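DeepInequalityException pairs with the DeepCompare interface listed above: the exception reports the first point at which two object trees differ. The following is a minimal sketch (not part of the original examples) comparing two parsed LoggedJob instances; the "<root>" label for the starting TreePath is an arbitrary choice.
// a sketch: comparing two LoggedJob object trees for deep equality
LoggedJob job1 = .. // assume two parsed LoggedJob instances here
LoggedJob job2 = ..
try {
  job1.deepCompare(job2, new TreePath(null, "<root>"));
  // reached only if the two jobs are deep-equal
} catch (DeepInequalityException e) {
  // e describes the first point of inequality found
}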
JobConfigurationParser
// An example to parse and filter out the job name
String conf_filename = .. // assume the job configuration filename here

// construct a list of interesting properties
List<String> interestedProperties = new ArrayList<String>();
interestedProperties.add("mapreduce.job.name");

JobConfigurationParser jcp =
  new JobConfigurationParser(interestedProperties);

InputStream in = new FileInputStream(conf_filename);
Properties parsedProperties = jcp.parse(in);
Some of the commonly used interesting properties are enumerated in JobConfPropertyNames. Note that a single JobConfigurationParser instance can be used to parse multiple job configuration files, as sketched below.
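A minimal sketch reusing the jcp parser from the example above; the configuration filenames here are hypothetical.
// a sketch: parsing several job configuration files with one parser
// (the filenames below are hypothetical)
for (String f : new String[] {"job1_conf.xml", "job2_conf.xml"}) {
  InputStream in = new FileInputStream(f);
  try {
    Properties props = jcp.parse(in);
    // ... use the extracted properties of each job
  } finally {
    in.close();
  }
}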
JobHistoryParser
JobHistoryParser defines the interface of a Job History file parser. A concrete parser can be obtained via JobHistoryParserFactory. Note that RewindableInputStream wraps an InputStream to make the input stream rewindable.
// An example to parse a current job history file, i.e. a job history
// file for which the version is known
String filename = .. // assume the job history filename here
InputStream in = new FileInputStream(filename);
HistoryEvent event = null;
JobHistoryParser parser = new CurrentJHParser(in);
event = parser.nextEvent();
// process all the events
while (event != null) {
// ... process the event
event = parser.nextEvent();
}
// close the parser and the underlying stream
parser.close();
JobHistoryParserFactory
provides a
JobHistoryParserFactory.getParser(org.apache.hadoop.tools.rumen.RewindableInputStream)
API to get a parser for parsing the job history file. Note that this
API can be used when the job history version is not known beforehand.
// An example to parse a job history file for which the version is not
// known, i.e. using JobHistoryParserFactory.getParser()
String filename = .. // assume the job history filename here
InputStream in = new FileInputStream(filename);
RewindableInputStream ris = new RewindableInputStream(in);
// JobHistoryParserFactory will check and return a parser that can
// parse the file
JobHistoryParser parser = JobHistoryParserFactory.getParser(ris);
// now use the parser to parse the events
HistoryEvent event = parser.nextEvent();
while (event != null) {
// ... process the event
event = parser.nextEvent();
}
parser.close();
Note:
Create one instance to parse a job history log and close it after use.
TopologyBuilder
TopologyBuilder builds the cluster topology from job history events of type org.apache.hadoop.mapreduce.jobhistory.HistoryEvent. These events can be passed to TopologyBuilder using org.apache.hadoop.tools.rumen.TopologyBuilder#process(org.apache.hadoop.mapreduce.jobhistory.HistoryEvent). A cluster topology can be represented using LoggedNetworkTopology. Once all the job history events are processed, the cluster topology can be obtained using TopologyBuilder.build().
// Building topology for a job history file represented using
// 'filename' and the corresponding configuration file represented
// using 'conf_filename'
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
TopologyBuilder tb = new TopologyBuilder();
// construct a list of interesting properties
List<String> interestingProperties = new ArrayList<String>();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
tb.process(jcp.parse(jobConfInputStream));
// read the job history file and pass it to the
// TopologyBuilder.
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;
// read and process all the job history events
while ((e = parser.nextEvent()) != null) {
tb.process(e);
}
LoggedNetworkTopology topology = tb.build();
JobBuilder
JobBuilder builds one job from job history events. TraceBuilder provides the TraceBuilder.extractJobID(String) API for extracting the job id from job history or job configuration filenames, which can be used for instantiating JobBuilder. JobBuilder generates a LoggedJob object via JobBuilder.build(). See LoggedJob for more details.
// An example to summarize a current job history file 'filename'
// and the corresponding configuration file 'conf_filename'
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
String jobID = TraceBuilder.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);
// construct a list of interesting properties
List<String> interestingProperties = new ArrayList<String>();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
jb.process(jcp.parse(jobConfInputStream));
// parse the job history file
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
try {
HistoryEvent e;
// read and process all the job history events
while ((e = parser.nextEvent()) != null) {
jb.process(e);
}
} finally {
parser.close();
}
LoggedJob job = jb.build();
Note:
The order in which the job configuration file and the job history file are parsed is not important. Create one JobBuilder instance per job to parse both the history file and the job configuration.
DefaultOutputter
DefaultOutputter implements Outputter and writes JSON objects in text format to the output file. DefaultOutputter can be initialized with the output filename.
// An example to summarize a current job history file represented by
// 'filename' and the configuration filename represented using
// 'conf_filename'. Also output the job summary to 'out.json' along
// with the cluster topology to 'topology.json'.
String filename = .. // assume the job history filename here
String conf_filename = .. // assume the job configuration filename here
Configuration conf = new Configuration();
DefaultOutputter outputter = new DefaultOutputter();
outputter.init(new Path("out.json"), conf);
InputStream jobConfInputStream = new FileInputStream(conf_filename);
InputStream jobHistoryInputStream = new FileInputStream(filename);
// extract the job-id from the filename
String jobID = TraceBuilder.extractJobID(filename);
JobBuilder jb = new JobBuilder(jobID);
TopologyBuilder tb = new TopologyBuilder();
// construct a list of interesting properties
List<String> interestingProperties = new ArrayList<String>();
// add the interesting properties here
interestingProperties.add("mapreduce.job.name");
JobConfigurationParser jcp =
new JobConfigurationParser(interestingProperties);
// parse the configuration file
tb.process(jcp.parse(jobConfInputStream));
// read the job history file and pass it to the
// TopologyBuilder.
JobHistoryParser parser = new CurrentJHParser(jobHistoryInputStream);
HistoryEvent e;
while ((e = parser.nextEvent()) != null) {
jb.process(e);
tb.process(e);
}
LoggedJob j = jb.build();
// serialize the job summary in json (text) format
outputter.output(j);
// close
outputter.close();
outputter.init(new Path("topology.json"), conf);
// get the cluster topology using TopologyBuilder
LoggedNetworkTopology topology = tb.build();
// serialize the cluster topology in json (text) format
outputter.output(topology);
// close
outputter.close();
JobTraceReader
JobTraceReader reads LoggedJob instances serialized using DefaultOutputter. LoggedJob provides various APIs for extracting job details. The following are the most commonly used ones:

  * LoggedJob.getMapTasks() : Get the map tasks
  * LoggedJob.getReduceTasks() : Get the reduce tasks
  * LoggedJob.getOtherTasks() : Get the setup/cleanup tasks
  * LoggedJob.getOutcome() : Get the job's outcome
  * LoggedJob.getSubmitTime() : Get the job's submit time
  * LoggedJob.getFinishTime() : Get the job's finish time
// An example to read job summaries from a trace file 'out.json'.
Configuration conf = new Configuration();
JobTraceReader reader = new JobTraceReader(new Path("out.json"), conf);
LoggedJob job = reader.getNext();
while (job != null) {
// .... process job level information
for (LoggedTask task : job.getMapTasks()) {
// process all the map tasks in the job
for (LoggedTaskAttempt attempt : task.getAttempts()) {
// process all the map task attempts in the job
}
}
// get the next job
job = reader.getNext();
}
reader.close();
ClusterTopologyReader
ClusterTopologyReader reads the LoggedNetworkTopology serialized using DefaultOutputter. ClusterTopologyReader can be initialized using the serialized topology filename. ClusterTopologyReader.get() can be used to get the LoggedNetworkTopology.
// An example to read the cluster topology from a topology output file
// 'topology.json'
Configuration conf = new Configuration();
ClusterTopologyReader reader =
  new ClusterTopologyReader(new Path("topology.json"), conf);
LoggedNetworkTopology topology = reader.get();
for (LoggedNetworkTopology t : topology.getChildren()) {
  // process the cluster topology
}
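The serialized trace and topology can also be replayed as JobStory objects using ZombieCluster and ZombieJobProducer from the class table above. A hedged sketch, assuming the ZombieCluster(Path, MachineNode, Configuration) and ZombieJobProducer(Path, ZombieCluster, Configuration) constructors; the default MachineNode name, level, and slot counts below are illustrative values.
// a sketch: producing JobStory objects from the trace 'out.json'
// and the topology 'topology.json'
Configuration conf = new Configuration();
// default node used for hosts missing from the topology
MachineNode defaultNode = new MachineNode.Builder("default", 2)
    .setMapSlots(2).setReduceSlots(1).build();
ZombieCluster cluster =
  new ZombieCluster(new Path("topology.json"), defaultNode, conf);
JobStoryProducer producer =
  new ZombieJobProducer(new Path("out.json"), cluster, conf);
JobStory job;
while ((job = producer.getNextJob()) != null) {
  // ... use job.getName(), job.getNumberMaps(), etc.
}
producer.close();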
Copyright © 2009 The Apache Software Foundation