19 Using the XML Pipeline Processor for Java
This chapter explains how to use the Extensible Markup Language (XML) Pipeline processor for Java.
Introduction to the XML Pipeline Processor
Topics here include prerequisites, standards and specifications, multistage processing, and customized pipeline processing.
Prerequisites for Using the XML Pipeline Processor for Java
Prerequisites for using the XML Pipeline processor are listed.
This chapter assumes that you are familiar with these topics:
-
XML Pipeline Definition Language. This XML vocabulary enables you to describe the processing relations between XML resources. For a more thorough introduction to the Pipeline Definition Language, consult the XML resources listed in Related Documents.
-
Document Object Model (DOM). DOM is an in-memory tree representation of the structure of an XML document.
-
Simple API for XML (SAX). SAX is a standard for event-based XML parsing.
-
XML Schema language. See Using the XML Schema Processor for Java for an overview and links to suggested reading.
Standards and Specifications for the XML Pipeline Processor for Java
The Oracle XML Pipeline processor is based on the World Wide Web Consortium (W3C) XML Pipeline Definition Language Version 1.0 Note. The W3C Note defines an XML vocabulary rather than an application programming interface (API).
Pipeline Definition Language Standard for XDK for Java describes the differences between the W3C Note and the Oracle XML Developer's Kit (XDK) implementation of the Oracle XML Pipeline processor.
Multistage XML Processing
The Oracle XML Pipeline processor is built on the XML Pipeline Definition Language. The processor can take an input XML pipeline document and execute pipeline processes according to derived dependencies.
A pipeline document, which is written in XML, specifies the processes to be executed in a declarative manner. You can associate Java classes with processes by using the <processdef/> element in the pipeline document.
Use the Pipeline processor for multistage processing, which occurs when you process XML components sequentially or in parallel. The output of one stage of processing can become the input of another stage of processing. You can write a pipeline document that defines the inputs and outputs of the processes. Figure 19-1 shows a possible pipeline sequence.
In addition to the XML Pipeline processor itself, XDK provides an API for processes that you can pipe together in a pipeline document. Table 19-2 summarizes the classes provided in the oracle.xml.pipeline.processes package.
The typical stages of processing XML in a pipeline are:
-
Parse the input XML documents. The oracle.xml.pipeline.processes package includes DOMParserProcess for DOM parsing and SAXParserProcess for SAX parsing.
-
Validate the input XML documents.
-
Serialize or transform the input documents. The Pipeline processor does not enable you to connect the SAX parser to the Extensible Stylesheet Language Transformations (XSLT) processor, which requires a DOM.
In multistage processing, SAX is ideal for filtering and searching large XML documents. Use DOM to change or access XML content efficiently and dynamically.
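To illustrate why SAX suits filtering and searching, here is a minimal sketch that uses the JDK's standard JAXP SAX API (javax.xml.parsers.SAXParserFactory), not the Oracle SAXParserProcess class. The handler collects the text of every element with a given name as events stream past, without ever building a tree. The element name title and the inline document are made-up examples.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitleFilter {

    /** Returns the trimmed text of every element whose name equals target, in document order. */
    static List<String> findText(String xml, String target) {
        List<String> hits = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals(target)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length); // text may arrive in several chunks
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(target) && current != null) {
                    hits.add(current.toString().trim());
                    current = null;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("SAX parsing failed", e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>XML Basics</title></book>"
                   + "<book><title>Pipelines</title></book></books>";
        System.out.println(findText(xml, "title")); // prints [XML Basics, Pipelines]
    }
}
```

Memory use stays proportional to the matches rather than to the document, which is the property that makes SAX attractive for large inputs.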
See Also:
Processing XML in a Pipeline to learn how to write a pipeline document that provides the input for a pipeline application
Customized Pipeline Processes
Class oracle.xml.pipeline.controller.Process is the base class for all pipeline process definitions. The classes in package oracle.xml.pipeline.processes extend this base class. To create a customized pipeline process, you must create a class that extends class Process.
At a minimum, every custom process must override the do-nothing initialize() and execute() methods of the Process class. If the customized process accepts SAX events as input, then it should override the SAXContentHandler() method to return the appropriate ContentHandler that handles incoming SAX events. It should also override the SAXErrorHandler() method to return the appropriate ErrorHandler. Table 19-1 provides further descriptions of the preceding methods.
Table 19-1 Methods in Class oracle.xml.pipeline.controller.Process
Method | Description |
---|---|
initialize() | Initializes the process before execution. |
execute() | Executes the process. |
SAXContentHandler() | Returns the SAX ContentHandler. If dependencies from other processes are not available, then return |
SAXErrorHandler() | Returns the SAX ErrorHandler. If you do not override this method, then the Pipeline processor uses the default error handler implemented by this class to handle SAX errors. |
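The contract in Table 19-1 can be made concrete with a self-contained sketch. Because the XDK classes are not assumed to be on the classpath, the code defines a hypothetical abstract class, SketchProcess, that only borrows the four method names from Table 19-1. It is not oracle.xml.pipeline.controller.Process, and the real signatures may differ (consult the Oracle Database XML Java API Reference). The hypothetical ElementCounterProcess subclass accepts SAX events and counts elements. To stay self-contained, its execute() parses an inline string itself, whereas in a real pipeline an upstream process would feed events to the handler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in for the Process base class; only the method names follow Table 19-1. */
abstract class SketchProcess {
    public void initialize() { }                                       // do-nothing default
    public void execute() { }                                          // do-nothing default
    public ContentHandler SAXContentHandler() { return null; }         // hypothetical: no SAX input
    public ErrorHandler SAXErrorHandler() { return new DefaultHandler(); } // hypothetical default handler
}

public class ElementCounterProcess extends SketchProcess {
    private final String xml;
    private int count;

    public ElementCounterProcess(String xml) { this.xml = xml; }

    @Override
    public void initialize() {
        count = 0; // reset state before each run
    }

    @Override
    public void execute() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(SAXContentHandler());
            reader.setErrorHandler(SAXErrorHandler());
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("process failed", e);
        }
    }

    @Override
    public ContentHandler SAXContentHandler() {
        return new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count++; // one increment per start-element event
            }
        };
    }

    @Override
    public ErrorHandler SAXErrorHandler() {
        return new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // this sketch treats recoverable errors as fatal
            }
        };
    }

    public int getCount() { return count; }

    public static void main(String[] args) {
        ElementCounterProcess p = new ElementCounterProcess("<a><b/><c><d/></c></a>");
        p.initialize();
        p.execute();
        System.out.println(p.getCount()); // prints 4
    }
}
```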
See Also:
Oracle Database XML Java API Reference for information about package oracle.xml.pipeline.processes
Using the XML Pipeline Processor for Java: Overview
Topics here include the basic process, running the demo programs, and using the XML pipeline processor command-line utility.
Using the XML Pipeline Processor for Java: Basic Process
The basic process of the XML Pipeline Processor for Java is described.
The XML Pipeline processor for Java is accessible through these packages:
-
oracle.xml.pipeline.controller, which provides an XML Pipeline controller that executes XML processes in a pipeline based on dependencies.
-
oracle.xml.pipeline.processes, which provides wrapper classes for XML processes that can be executed by the XML Pipeline controller.
The oracle.xml.pipeline.processes package contains the classes that you can use to design a pipeline application framework. Each class extends the oracle.xml.pipeline.controller.Process class. Table 19-2 lists the components in the package. You can connect these components and processes through a combination of the XML Pipeline processor and a pipeline document.
Table 19-2 Classes in oracle.xml.pipeline.processes
Class | Description |
---|---|
|
Receives compressed XML and outputs parsed XML. |
|
Receives XML parsed with DOM or SAX and outputs compressed XML. |
|
Parses incoming XML and outputs a DOM tree. |
|
Parses incoming XML and outputs SAX events. |
|
Accepts a DOM as input, uses an XPath pattern to select one or more nodes from an XML |
|
Parses an XML schema and outputs a schema object for validation. This process is built into the XML Pipeline processor and builds schema objects used for validating XML documents. |
|
Validates against a local schema, analyzes the results, and reports errors if necessary. |
|
Accepts DOM as input, applies an XSL stylesheet, and outputs the result of the transformation. |
|
Receives an XSL stylesheet as a stream or DOM and creates an |
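Two rows of Table 19-2 describe a schema stage (parse an XML schema and output a schema object) and a validation stage (validate against a local schema). The standard JAXP validation API in the JDK has the same two-step split, so a sketch with javax.xml.validation may help show the idea. It does not use the Oracle classes in Table 19-2, and the schema and documents are made-up examples.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SchemaStages {

    /** Stage 1: parse an XML schema into a reusable Schema object. */
    static Schema buildSchema(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("invalid schema", e);
        }
    }

    /** Stage 2: validate a document; reports the first error and returns false if invalid. */
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.err.println("Validation error: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = buildSchema(
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='qty' type='xs:positiveInteger'/></xs:schema>");
        System.out.println(isValid(schema, "<qty>3</qty>"));    // prints true
        System.out.println(isValid(schema, "<qty>zero</qty>")); // prints false
    }
}
```

Building the Schema once and reusing it for many documents mirrors the reason a pipeline would keep schema building and validation as separate processes.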
Figure 19-2 shows how to pass a pipeline document to a Java application that uses the XML Pipeline processor, configure the processor, and execute the pipeline.
Figure 19-2 Using the Pipeline Processor for Java
The basic steps are:
Running the XML Pipeline Processor Demo Programs
Demo programs for the XML Pipeline processor are included in $ORACLE_HOME/xdk/demo/java/pipeline.
Table 19-4 describes the XML files and Java source files that you can use to test the utility.
Table 19-4 Pipeline Processor Sample Files
File | Description |
---|---|
|
A text file that describes how to set up the Pipeline processor demos. |
|
A sample Pipeline processor application. The program takes |
|
A sample program to create an error handler used by |
|
A sample XML document that describes a series of books. This document is specified as an input by |
|
An XSLT stylesheet that transforms the list of books in |
|
An XSLT stylesheet specified as an input by the |
|
An XSLT stylesheet specified as an input by the |
|
An XML schema document specified as an input by the |
|
A pipeline document. This document specifies that process p1 must parse |
|
A pipeline document. This document specifies that process p1 must parse |
|
A pipeline document. This document specifies that a process p5 must parse |
|
A pipeline document. This document specifies that process p1 must parse |
|
A sample XML document that describes a purchase order. This document is specified as an input by |
Documentation for how to compile and run the sample programs is located in the README. The basic steps are:
Using the XML Pipeline Processor Command-Line Utility
The command-line interface for the XML Pipeline processor is named orapipe. The Pipeline processor is packaged with Oracle Database. By default, the Oracle Universal Installer installs the utility on disk in $ORACLE_HOME/bin.
Before running the utility for the first time, ensure that your environment variables are set as described in Setting Up the XDK for Java Environment. Run orapipe at the operating system command line with this syntax:
orapipe options pipedoc
The pipedoc is the pipeline document, which is required. Table 19-5 describes the available options for the orapipe utility.
Table 19-5 orapipe Command-Line Options
Option | Purpose |
---|---|
|
Prints the help message |
|
Writes errors and messages to the specified log file. The default is |
|
Does not log informational items. The default is on. |
|
Does not log warnings. The default is on. |
|
Validates the input |
|
Prints the release version. |
|
Executes the pipeline in sequential mode. The default is parallel. |
|
Executes the pipeline even if the target is up to date. By default, no force is specified. |
|
Sets the value of |
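Table 19-5 notes that orapipe runs a pipeline in parallel unless sequential mode is requested. The sketch below models that difference for the three processes of Example 19-1 using java.util.concurrent.CompletableFuture. It is an illustration only and says nothing about how orapipe actually schedules work: each hypothetical stage() call merely records its process ID and returns a label.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutionModes {

    /** Hypothetical stage: records that process id ran and returns its output label. */
    static String stage(List<String> trace, String id, String output) {
        trace.add(id);
        return output;
    }

    /** Sequential mode: one valid order (p1, p2, p3) on the calling thread. */
    static String runSequential(List<String> trace) {
        String xmldoc = stage(trace, "p1", "xmldoc");
        String xslstyle = stage(trace, "p2", "xslstyle");
        return stage(trace, "p3", "myresult.html(" + xmldoc + "," + xslstyle + ")");
    }

    /** Parallel mode: p1 and p2 may overlap; p3 runs only after both have completed. */
    static String runParallel(List<String> trace) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> p1 = CompletableFuture.supplyAsync(() -> stage(trace, "p1", "xmldoc"), pool);
            CompletableFuture<String> p2 = CompletableFuture.supplyAsync(() -> stage(trace, "p2", "xslstyle"), pool);
            return p1.thenCombine(p2, (doc, style) ->
                    stage(trace, "p3", "myresult.html(" + doc + "," + style + ")")).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> trace = new CopyOnWriteArrayList<>(); // thread-safe: stages may run concurrently
        System.out.println(runParallel(trace)); // prints myresult.html(xmldoc,xslstyle)
        System.out.println(trace);              // p1 and p2 in either order, then p3
    }
}
```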
Processing XML in a Pipeline
Topics here include creating a pipeline document, writing a pipeline processor application, and writing a pipeline error handler.
Creating a Pipeline Document
To use the Oracle XML Pipeline processor, you must create an XML document according to the rules of the Pipeline Definition Language specified in the W3C Note. The W3C specification defines the XML processing components and the inputs and outputs for these processes.
The XML Pipeline processor includes support for these XDK components:
-
XML parser
-
XML compressor
-
XML Schema validator
-
XSLT processor
Example of a Pipeline Document
The XML Pipeline processor executes a sequence of XML processes according to the rules in the pipeline document and returns a result. Example 19-1 shows the sample pipeline document that is included in the demo directory.
Example 19-1 pipedoc.xml
```xml
<pipeline xmlns="http://www.w3.org/2002/02/xml-pipeline"
          xml:base="http://example.org/">

  <param name="target" select="myresult.html"/>

  <processdef name="domparser.p"
              definition="oracle.xml.pipeline.processes.DOMParserProcess"/>
  <processdef name="xslstylesheet.p"
              definition="oracle.xml.pipeline.processes.XSLStylesheetProcess"/>
  <processdef name="xslprocess.p"
              definition="oracle.xml.pipeline.processes.XSLProcess"/>

  <process id="p2" type="xslstylesheet.p" ignore-errors="false">
    <input name="xsl" label="book.xsl"/>
    <outparam name="stylesheet" label="xslstyle"/>
  </process>

  <process id="p3" type="xslprocess.p" ignore-errors="false">
    <param name="stylesheet" label="xslstyle"/>
    <input name="document" label="xmldoc"/>
    <output name="result" label="myresult.html"/>
  </process>

  <process id="p1" type="domparser.p" ignore-errors="true">
    <input name="xmlsource" label="book.xml"/>
    <output name="dom" label="xmldoc"/>
    <param name="preserveWhitespace" select="true"></param>
    <error name="dom">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <head>
          <title>DOMParser Failure!</title>
        </head>
        <body>
          <h1>Error parsing document</h1>
        </body>
      </html>
    </error>
  </process>

</pipeline>
```
Processes Specified in the Pipeline Document
The processes specified in the pipeline document are described.
In Example 19-1, the pipeline document uses processdef elements to associate three process types with Java classes in the oracle.xml.pipeline.processes package:
-
domparser.p is associated with the DOMParserProcess class
-
xslstylesheet.p is associated with the XSLStylesheetProcess class
-
xslprocess.p is associated with the XSLProcess class
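Because these associations are plain XML, any namespace-aware DOM parser can read them. As a sketch using the JDK's JAXP DocumentBuilder (not the Oracle parser), the code below lists each processdef name with its definition class. The trimmed pipeline document in main() is an excerpt of Example 19-1; the helper name readProcessDefs is hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProcessDefReader {
    static final String PIPELINE_NS = "http://www.w3.org/2002/02/xml-pipeline";

    /** Returns processdef name -> Java class name, in document order. */
    static Map<String, String> readProcessDefs(String pipelineXml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // processdef lives in the pipeline namespace
            Document doc = f.newDocumentBuilder().parse(new InputSource(new StringReader(pipelineXml)));
            NodeList defs = doc.getElementsByTagNameNS(PIPELINE_NS, "processdef");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < defs.getLength(); i++) {
                Element def = (Element) defs.item(i);
                result.put(def.getAttribute("name"), def.getAttribute("definition"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read pipeline document", e);
        }
    }

    public static void main(String[] args) {
        String pipedoc = "<pipeline xmlns='" + PIPELINE_NS + "'>"
            + "<processdef name='domparser.p' definition='oracle.xml.pipeline.processes.DOMParserProcess'/>"
            + "<processdef name='xslprocess.p' definition='oracle.xml.pipeline.processes.XSLProcess'/>"
            + "</pipeline>";
        readProcessDefs(pipedoc).forEach((name, cls) -> System.out.println(name + " -> " + cls));
    }
}
```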
Processing Architecture Specified in the Pipeline Document
The basic design of and the processing architecture of the pipeline are described.
The PipelineSample program accepts the pipedoc.xml document shown in Example 19-1 as input along with XML documents book.xml and book.xsl. The basic design of the pipeline is:
-
Parse the incoming book.xml document and generate a DOM tree. This task is performed by DOMParserProcess.
-
Parse book.xsl as a stream and generate an XSLStylesheet object. This task is performed by XSLStylesheetProcess.
-
Receive the DOM of book.xml as input, apply the stylesheet object, and write the result to myresult.html. This task is performed by XSLProcess.
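The same three-stage design can be reproduced with the standard JAXP APIs in the JDK, which may help clarify what each Oracle process contributes. In this sketch, which does not use the Oracle classes, DocumentBuilder plays the role of DOMParserProcess, a compiled javax.xml.transform.Templates object stands in for the XSLStylesheet object, and a Transformer plays the role of XSLProcess. The inline book.xml and book.xsl contents are made-up stand-ins for the demo files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ThreeStageSketch {

    /** Like p1: parse XML text into a DOM tree. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // XSLT expects a namespace-aware DOM
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("parse failed", e);
        }
    }

    /** Like p2: compile the stylesheet once into a reusable Templates object. */
    static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (Exception e) {
            throw new IllegalArgumentException("stylesheet compilation failed", e);
        }
    }

    /** Like p3: apply the compiled stylesheet to the DOM and return the result as text. */
    static String transform(Templates stylesheet, Document dom) {
        try {
            StringWriter out = new StringWriter();
            stylesheet.newTransformer().transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String bookXml = "<books><book><title>XML Basics</title></book></books>";
        String bookXsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:value-of select='books/book/title'/></xsl:template>"
            + "</xsl:stylesheet>";
        Document xmldoc = parse(bookXml);      // independent of the stylesheet stage
        Templates xslstyle = compile(bookXsl); // independent of the parsing stage
        System.out.println(transform(xslstyle, xmldoc)); // prints XML Basics
    }
}
```

The variable names xmldoc and xslstyle echo the labels in Example 19-1: the first two calls do not depend on each other, while the transform needs both results.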
Note these aspects of the processing architecture used in the pipeline document:
-
The target information set, http://example.org/myresult.html, is inferred from the default value of the target parameter and the xml:base setting.
-
The process p2 has an input of book.xsl and an output parameter with the label xslstyle, so it must run to produce the input for p3.
-
The p3 process depends on input parameter xslstyle and document xmldoc.
-
The p3 process has an output with the label myresult.html, which matches the target, so it must run to produce the target.
-
The process p1 depends on input document book.xml and outputs xmldoc, so it must run to produce the input for p3.
In Example 19-1, more than one order of processing can satisfy all of the dependencies. Given the rules, the XML Pipeline processor must process p3 last but can process p1 and p2 in either order or process them in parallel.
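The ordering argument above can be checked mechanically. The sketch below is a simplified model, not the Oracle processor's scheduling algorithm: it treats labels as plain strings, ignores xml:base resolution and up-to-date checks, and runs a level-by-level topological sort in which every process whose inputs are available joins the current "wave". Processes in the same wave have no dependency on each other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DependencyOrder {

    /** A process as the scheduler sees it: the labels it consumes and the labels it produces. */
    record Proc(String id, Set<String> inputs, Set<String> outputs) { }

    /** Groups processes into waves; a process runs once all of its input labels are available. */
    static List<List<String>> waves(List<Proc> procs, Set<String> externalInputs) {
        Set<String> available = new HashSet<>(externalInputs);
        List<Proc> pending = new ArrayList<>(procs);
        List<List<String>> result = new ArrayList<>();
        while (!pending.isEmpty()) {
            List<Proc> ready = new ArrayList<>();
            for (Proc p : pending) {
                if (available.containsAll(p.inputs())) {
                    ready.add(p);
                }
            }
            if (ready.isEmpty()) {
                throw new IllegalStateException("unsatisfiable or cyclic dependencies: " + pending);
            }
            List<String> wave = new ArrayList<>();
            for (Proc p : ready) {
                wave.add(p.id());
                available.addAll(p.outputs()); // outputs become inputs for the next wave
            }
            pending.removeAll(ready);
            result.add(wave);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Proc> procs = List.of(
            new Proc("p2", Set.of("book.xsl"), Set.of("xslstyle")),
            new Proc("p3", Set.of("xslstyle", "xmldoc"), Set.of("myresult.html")),
            new Proc("p1", Set.of("book.xml"), Set.of("xmldoc")));
        System.out.println(waves(procs, Set.of("book.xml", "book.xsl"))); // prints [[p2, p1], [p3]]
    }
}
```

The first wave contains p1 and p2 (so either order, or parallel execution, is valid) and the second contains only p3, matching the analysis of Example 19-1.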
Writing a Pipeline Processor Application
The PipelineSample.java source file shows a basic pipeline application.
You can use the application with any of the pipeline documents in Table 19-4 to parse and transform an input XML document.
The basic steps of the program are:
Writing a Pipeline Error Handler
An application invoking the XML Pipeline processor must implement the PipelineErrorHandler interface to handle errors received from the processor. Set the error handler in the processor by invoking setErrorHandler(). When writing the error handler, you can choose to throw an exception for different types of errors.
The oracle.xml.pipeline.controller.PipelineErrorHandler interface declares the methods shown in Table 19-6, all of which return void.
Table 19-6 PipelineErrorHandler Methods
Method | Description |
---|---|
|
Handles |
|
Handles fatal |
|
Handles |
|
Prints optional, additional information about errors. |
The first three methods in Table 19-6 receive a reference to an oracle.xml.pipeline.controller.PipelineException object. These methods of the PipelineException class are especially useful:
-
getExceptionType(), which gets the type of exception thrown
-
getProcessId(), which gets the process ID where the exception occurred
-
getMessage(), which returns the message string of this Throwable error
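To show how these accessors can drive error-handling decisions, here is a self-contained sketch built on hypothetical stand-ins rather than the XDK types. SketchPipelineException only mimics the three accessors listed above, with String types assumed for illustration. The SketchErrorHandler interface uses placeholder method names (error, fatalError, warning, info) chosen to match the row descriptions in Table 19-6; they are not copied from the Oracle API, so check the API reference for the real signatures. The handler logs to an in-memory list and throws only for fatal errors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for PipelineException; only the accessor names follow the text. */
class SketchPipelineException extends Exception {
    private final String exceptionType;
    private final String processId;

    SketchPipelineException(String exceptionType, String processId, String message) {
        super(message);
        this.exceptionType = exceptionType;
        this.processId = processId;
    }

    String getExceptionType() { return exceptionType; }
    String getProcessId() { return processId; }
}

/** Hypothetical handler contract; method names are placeholders for the rows of Table 19-6. */
interface SketchErrorHandler {
    void error(String msg, SketchPipelineException e) throws SketchPipelineException;
    void fatalError(String msg, SketchPipelineException e) throws SketchPipelineException;
    void warning(String msg, SketchPipelineException e) throws SketchPipelineException;
    void info(String msg);
}

public class LoggingErrorHandler implements SketchErrorHandler {
    private final List<String> log = new ArrayList<>();

    private void record(String level, String msg, SketchPipelineException e) {
        log.add(level + " [" + e.getProcessId() + "/" + e.getExceptionType() + "] "
                + msg + ": " + e.getMessage());
    }

    @Override
    public void error(String msg, SketchPipelineException e) { record("ERROR", msg, e); }

    @Override
    public void fatalError(String msg, SketchPipelineException e) throws SketchPipelineException {
        record("FATAL", msg, e);
        throw e; // stop the pipeline on fatal errors
    }

    @Override
    public void warning(String msg, SketchPipelineException e) { record("WARNING", msg, e); }

    @Override
    public void info(String msg) { log.add("INFO " + msg); }

    List<String> getLog() { return log; }

    public static void main(String[] args) {
        LoggingErrorHandler h = new LoggingErrorHandler();
        h.warning("slow input", new SketchPipelineException("IO", "p1", "book.xml is large"));
        try {
            h.fatalError("parse failed", new SketchPipelineException("PARSE", "p1", "unexpected end of file"));
        } catch (SketchPipelineException e) {
            h.info("pipeline stopped at process " + e.getProcessId());
        }
        h.getLog().forEach(System.out::println);
    }
}
```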
The PipelineSampleErrHdler.java source file implements a basic error handler for use with the PipelineSample program. The basic steps are: