dom4j

CVS

Fixed bug in BeanElement which prevented proper execution of the bean samples (contributed by Wonne Keysers).

STAXEventWriter now uses XMLEventConsumer instead of XMLEventWriter (contributed by Christian Niles).

Fixed bug in SAXReader that caused problems parsing files in OSX (reported by Paul Libbrecht).

Fixed bug in XMLWriter that caused whitespace to be added between successive calls of the characters(...) method (reported by Paul Libbrecht).

Improved performance of NamespaceCache in multithreaded environments (contributed by Brett Finnell).

1.5 beta 2

Added flag to OutputFormat that suppresses the newline after the XML declaration.

Upgraded dependencies to their latest version on ibiblio.

Added method to DocumentHelper that allows the user to specify the encoding when parsing an XML String (contributed by Todd Wolff).

Fixed a ClassCastException bug in BeanElement.

Fixed a bug in SAXContentHandler which caused a NullPointerException in some situations.

Fixed bug which prevented an element's namespace prefix from being registered for use in XPath expressions (contributed by Todd Wolff).

Fixed bug in XMLWriter that caused duplication of the default namespace declaration (reported by Todd Wolff).

Added a bunch of patches to make the dom4j DOM classes more DOM compliant (contributed by Curt Arnold).

Fixed bug in DispatchHandler which made the handler not reusable (reported by Ricardo Leon).

Fixed bug in SAXContentHandler that caused incorrect CDATA section parsing (contributed by Todd Wolff).

Fixed bug in SAXContentHandler that caused incorrect entity handling.

Fixed bug in XMLWriter causing padding to be disabled, even if enabled in the specified outputformat (reported by Bo Gundersen).

Added initial support for StAX streams (contributed by Christian Niles).

1.5 beta 1

Fixed encoding bug in Document.asXML() and DocumentHelper.parseText().

Fixed bug in SAXReader that caused problems resolving relative URIs when parsing java.io.File Objects (reported by Kohsuke Kawaguchi).

The iterators returned by the Element.elementIterator(...) methods now support remove().

DOMWriter now writes DOM Level 2 attributes and elements (reported by Geert Dendoncker and Joury Gokel).

Use latest implementation of the Aelfred parser.

Fixed some problems with internal/external DTD declarations (reported by Bryan Thompson).

Upgraded to Jaxen 1.1 beta 2.

Ignore attribute order when comparing Elements in NodeComparator.

Fixed bug in XMLWriter where namespace declarations were duplicated.

Fixed bug in parsing a Processing Instruction (reported by Vladimir Kralik).

Added support for Stylesheet modes (reported by Mark Diggory).

Don't escape " and ' characters in attribute values if it's not necessary (contributed by Christian Niles).

Fixed some DOMNodeHelper issues (reported by Henner Kollmann).

Fixed some datatype issues (reported by Thomas Draier).

Fixed a bug where the EntityResolver was not set on the XMLReader.

Fixed multithreaded access on DefaultElement.

Fixed problem parsing XML Files (reported by Geoffrey Vlassaks).

Added xml:space attribute support based on XML Specification 1.0.

Maven build of dom4j is now nearly complete. Maven is now used for the website generation.

Fixed some bugs in BackedList (contributed by Alessandro Vernet).

1.4 Release

Added patch supplied by Dan Jacobs that fixes some entity encoding problems in XMLWriter - cheers Dan!

Patched the DOMElement replaceChild method to return the correct Node and to throw a DOMException when trying to replace a non-existing child.

Added patch, kindly supplied by Andy Yang, for a bug in BackedList that could cause IndexOutOfBoundsExceptions to be thrown - thanks Andy!

Update of Cookbook containing a chapter about rule API.

Patched SAXWriter to not pass in null System or Public IDs which can cause problems in Saxon.

Patched dom4j to work against Jaxen 1.0 RC1 or later build.

Applied patch to bug found by Tom Oehser that XPath expressions using elements or attributes whose name starts with '_' were not being handled correctly. It turns out this was a SAXPath issue.

Applied patch to bug found by Soumanjoy Das that creating a new DOMDocument then calling createElement() would generate a ClassCastException.

Applied patch supplied by James Dodd that fixes a MIME encoding issue in the embedded Aelfred parser.

Applied patch to fix bug found by David Frankson. Adding attributes with null values causes problems in XSLT engines, so adding a null valued attribute is now equivalent to removing the attribute; null attribute values never appear on the element. e.g.

Element element = ...;
element.addAttribute( "foo", "123" );
...
Attribute attribute = element.attribute( "foo" );
assertTrue( attribute != null );
...
element.addAttribute( "foo", null );
attribute = element.attribute( "foo" );
assertTrue( attribute == null );

1.3 Release

Patches

Applied patch to bug found by Mike Skells that was causing XPath.matches() to return true for absolute XPaths which returned different nodes to the node provided to the XPath.

Applied patch provided by Stefan that fixes an IndexOutOfBoundsException thrown when using the evaluate() method in DefaultXPath on an empty result set. Also added a test case to org.dom4j.TestXPathBug called testStefan().

Applied patch suggested by Frank Walinsky so that XPath objects are now Serializable.

Applied patch provided by Bill Burton that fixes union pattern matching.

1.2 Release

New Swing TableModel for displaying XML

Added a new Swing TableModel for displaying XML data in a Swing user interface. It uses an XPath based model to define the rows and column values. A table definition can be specified using a simple XML format and then loaded in a small amount of code. e.g. here's an example of a table that will list the servlets used in a web.xml document

<table select="/web-app/servlet">
  <column select="servlet-name">Name</column>
  <column select="servlet-class">Class</column>
  <column select="../servlet-mapping[servlet-name=$Name]/url-pattern">Mapping</column>
</table>

Notice the use of the $Name XPath variable to access other cells on the row. Here's the pseudo code to display a table for an XML document.

Document tableDefinition = ...;
Document source = ...;
TableModel tableModel = new XMLTableModel( tableDefinition, source );
JTable table = new JTable( tableModel );

There is a sample program in samples/swing/JTableTool which will display any table definition for a given source XML document. There is an example table definition for the periodic table in xml/swing/tableForAtoms.xml.

Registering Namespace URIs for XPath

Added a new helper method to make it easier to create namespace contexts for doing namespace aware XPath expressions. The new setNamespaceURIs(Map) method on XPath makes it easier to pass in the prefixes and URIs you wish to use in an XPath expression. Here's an example of it in action

Map uris = new HashMap();
uris.put( "SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/" );
uris.put( "m", "urn:xmethodsBabelFish" );        
XPath xpath = document.createXPath( "/SOAP-ENV:Envelope/SOAP-ENV:Body/m:BabelFish" );
xpath.setNamespaceURIs( uris );        
Node element = xpath.selectSingleNode( document );

In addition DocumentFactory has a setXPathNamespaceURIs(Map) method so that common namespace URIs can be associated with a DocumentFactory so namespace prefixes can be used across many XPath expressions in an easy way. e.g.

// register prefixes with my factory
Map uris = new HashMap();
uris.put( "SOAP-ENV", "http://schemas.xmlsoap.org/soap/envelope/" );
uris.put( "m", "urn:xmethodsBabelFish" );        

DocumentFactory factory = new DocumentFactory();
factory.setXPathNamespaceURIs( uris );

// now parse a document using my factory
SAXReader reader = new SAXReader();
reader.setDocumentFactory( factory );
Document doc = reader.read( "soap.xml" );

// now lets use the prefixes
Node element = doc.selectSingleNode( "/SOAP-ENV:Envelope/SOAP-ENV:Body/m:BabelFish" );

Whitespace handling

There is a new mergeAdjacentText option available on SAXReader to concatenate adjacent text nodes into a single Text node.

In addition there is a new stripWhitespaceText option to strip text which occurs between start/end tags which only consists of whitespace.

For example, parsing the following XML with the stripWhitespaceText option enabled and the mergeAdjacentText option enabled will result in a single child node of the parent element, rather than 3 (2 text nodes containing whitespace and one element).

<parent>
  <child>foo</child>
</parent>

Note that this option will not break most mixed content markup such as the following, since it's only whitespace between tag start/ends that gets removed; non-whitespace strings are not trimmed.

<p>hello <b>James</b> how are you?</p>
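
The effect on the two documents above can be sketched with the JDK's own DOM parser. This is only an illustration of the rule (drop text nodes that are purely whitespace, leave all other text untouched); it is not how SAXReader implements stripWhitespaceText, and the class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, JDK DOM only: shows which child nodes survive
// when whitespace-only text nodes are removed.
class WhitespaceStripping {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of child nodes exactly as the parser delivered them.
    static int childCount(String xml) {
        return parseRoot(xml).getChildNodes().getLength();
    }

    // Number of child nodes once whitespace-only text nodes are dropped.
    static int childCountAfterStripping(String xml) {
        Element root = parseRoot(xml);
        NodeList children = root.getChildNodes();
        // iterate backwards because the NodeList is live
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                root.removeChild(child);
            }
        }
        return root.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        String data = "<parent>\n  <child>foo</child>\n</parent>";
        String mixed = "<p>hello <b>James</b> how are you?</p>";
        System.out.println(childCount(data) + " -> " + childCountAfterStripping(data));
        System.out.println(childCount(mixed) + " -> " + childCountAfterStripping(mixed));
    }
}
```

The data-centric document drops from three child nodes to one, while the mixed content paragraph keeps all three because neither of its text nodes is whitespace only.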

Both these options together can improve the parsing performance by around 10-12% depending on the structure of the document. However, the whitespace layout of the XML document will be lost, so only use these modes in data-centric applications like XML messaging and SOAP.

So a typical SOAP or XML messaging developer, who may care more about performance than preserving exact whitespace layout, may use the following code to make the SAX parsing more optimal.

SAXReader reader = new SAXReader();
reader.setMergeAdjacentText( true );
reader.setStripWhitespaceText( true );
Document doc = reader.read( "soap.xml" );

Patches

Applied patch to HTMLWriter to fix bug found by Dominik Deimling whereby CDATA sections were not being output correctly.

Patched the setName() method on Element so that elements can be renamed. Also added a new setQName() to the Element interface so that elements can be renamed in a namespace aware manner. Thanks to Robert Lebowitz for this.

Applied fix to bug found by Manfred Lotz that XMLWriter in whitespace trimming mode would sometimes not correctly insert a space when text is separated by newlines. The test case testWhitespaceBug() in org.dom4j.TestXMLWriter reproduces the bug that has now been fixed.

Applied patches supplied by Stefan Graeber that enhance the datatype support to support included schemata and derived element types.

Applied patches suggested by Omer van der Horst Jansen to enable dom4j to fully work properly on JDK1.1 platforms. There were some uses of java.util.Stack which have been changed to ArrayList.

Applied patches supplied by Maarten Coene that fix some issues with using the correct DocumentFactory when using the DOM implementation.

Updated the MSV support to comply with the latest MSV version, 1.12 (Nov 01 2001). In addition the MSVDemo.java in dom4j/src/samples/validate has been replaced by JARVDemo.java which now uses the JARV API to validate a dom4j document using the MSV implementation. This demo can validate any XML document against any DTD, XML Schema, Relax NG, Relax or Trex schema - thanks to the excellent JARV API and MSV library.

Applied patches supplied by Steen Lehmann that fix handling of external DTD entities in SAXContentHandler and fix the XML output of the ExternalEntityDecl.

Applied patch to bug found by Steen Lehmann that XPath expressions on the root element were not handling namespaces correctly. The test case is demonstrated in dom4j/src/test/org/dom4j/xpath/TestSelectSingleNode.java

Added patch for bug found by Howard Moore when using XTags that XPath string values which contained strings with entities, such as the use of & in a text, would result in redundant spaces occurring, breaking URLs.

1.1 Release

New features

Added a new package, org.dom4j.dtd, which contains some DTD declaration classes which are added to the DocumentType interface's List of declarations. This is useful for finding out details of the attribute or element declarations inside either the internal or external DTD subset of a document.

To expand internal or external DTD subsets when parsing with SAXReader use the 2 properties on SAXReader (and SAXContentHandler).

SAXReader reader = new SAXReader();
reader.setIncludeInternalDTDDeclarations( true );
reader.setIncludeExternalDTDDeclarations( true );
Document doc = reader.read( "foo.xml" );

DocumentType docType = doc.getDocType();
List internalDecls = docType.getInternalDeclarations();
List externalDecls = docType.getExternalDeclarations();

This new feature means that XML documents which use internal DTD subsets, external DTDs or a mixture of internal and external DTD subsets can now be properly round tripped.

Note that there appears to be a bug in Crimson 1.1.3 which does not properly differentiate between internal and external DTD subsets. Refer to the startDTD() method of LexicalHandler for details of how startEntity/endEntity is meant to demarcate external DTD subsets.

It's our intention to expand internal DTD subsets by default (so that documents can be properly round tripped by default) but require external DTD subsets to be explicitly enabled via the property on the SAXReader (or SAXContentHandler). This bug in Crimson causes all DTD declarations to appear as internal DTD subsets, which is both a performance overhead and breaks round tripping of documents which just use external DTD declarations. So until this matter is resolved neither internal nor external declarations are expanded by default.

Note that the code works perfectly against Xerces.

Patches

Applied patch submitted by Yuxin Ruan which fixes some issues with XML Schema Data Type support.

Followed Dennis Sosnoski's suggestion: adding a null text String to an Element now throws an IllegalArgumentException. To ensure that the IllegalArgumentException is not thrown it's advisable to check for null first. For example...

Element element = ...;
String text = ...;

// might throw IllegalArgumentException
// if text == null
element.addText( text );

// safer to do this
if ( text != null ) {
  element.addText( text );
}

Fixed problem found by Kesav Kumar Kolla whereby a deserialized Document could have problems when adding new elements. The problem was that DocumentFactory was not deserializing itself properly.

Fixed problem found by David Hooker with Ant build file for the binary and source distribution that was not including the manifest file in the distribution.

Applied patch submitted by Lari Hotari that fixes a bug which was causing the XMLWriter to fail when used as a SAX XMLFilter or ContentHandler to turn SAX events into XML text. Thanks Lari!

Fixed bug found by Kohsuke Kawaguchi that there was a problem in XMLWriter during its serialization of a document which redeclared the default namespace prefix. It turned out to be a bug in org.dom4j.tree.NamespaceStack where redeclarations of namespace prefixes were not being handled properly during serialization. The test cases in org.dom4j.TestXMLWriter and org.dom4j.TestNamespaces have been improved to test these features more rigorously.

Fixed bug found by Toby that was causing a security exception in applets when using a DocumentFactory.

Implemented the suggestion by Kesav Kumar, that the detach() method now returns the node (this) so that moving nodes from one part of a document to another can now be one line of code. Here's an example of it in use.

Document doc1 = ...;
Document doc2 = ...;
Element destination = doc2.getRootElement();
// selectSingleNode() returns a Node, so cast it to Element
Element source = (Element) doc1.selectSingleNode( "//foo[@style='bar']" );

// lets move the source to the destination
destination.add( source.detach() );

Added better checking in selectSingleNode() implementation so that XPath expressions which do not return a Node throw a meaningful exception (not ClassCastException) informing the user of why the XPath expression did not succeed.

Added patch for bug found by Kesav Kumar that a document containing null Strings would cause a NullPointerException to be thrown if it was passed into SAXWriter (used by the JAXP - XSLT code). Now the SAXWriter will quietly ignore null Strings, as will XMLWriter.

1.0 release

New features

Added helper method setXMLFilter() to SAXReader making it easier to install SAX filters to filter or preprocess SAX events before a dom4j Document is created. Also added a new sample program called sax.FilterDemo that demonstrates how to use a SAX filter with dom4j.

Added full support for Jaxen function, namespace and variable context interfaces. This allows the XPath engine to be fully customized. e.g.

XPath xpath = document.createXPath( "//foo[@code='123']" );

// customize function, namespace and variable contexts
xpath.setFunctionContext( myFunctionContext );
xpath.setNamespaceContext( myNamespaceContext );
xpath.setVariableContext( myVariableContext );

List nodes = xpath.selectNodes( document );

Added new helper class org.dom4j.util.XMLErrorHandler which turns SAX ErrorHandler callbacks into XML that can then be output in a JAXM or SOAP message or styled via XSLT or whatever.

Added new helper method DocumentHelper.makeElement(doc, "a/b/c") which will navigate from a document or element to the given simple path, creating new elements along the way if need be. This allows elements to be found or created using a simple path expression mechanism.

Added helper method getQName(String qualifiedName) to Element so that easier element name matching can be done. Here are some examples of it in use.

// find all elements with a local name of "foo"
// in any namespace
List anyNamespace = element.elements( "foo" );

// find all elements with a local name "foo"
// and the default namespace URI
List defaultNamespace = element.elements( element.getQName( "foo" ) );

// find all elements which match the local name "foo"
// and the namespace URI mapped to the "x" prefix
List xNamespace = element.elements( element.getQName( "x:foo" ) );

Added helper method on org.dom4j.DocumentFactory called getQNames that returns a List of all the QNames that were used to parse the documents.

Added an EntityResolver property to SAXReader to make it easier to configure a specific EntityResolver.

Patches

Added patch so that patterns such as @id='123' and name()='foo' are now working properly again. Also patterns such as not(@id='123') work now too.

Patched the dynamic loading of classes to fix some ClassLoader issues found with some application servers.

Ported the data type support to work with the latest MSV library from Sun.

Fixed bug spotted by Stefan Graeber that was causing a DocumentException to be thrown with Xerces when turning validation mode on.

Patched bug in QName which was using the qualified name rather than the local name along with the namespace URI to determine equality.
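
The equality rule described above (namespace URI plus local name, with the prefix playing no part) is the same one the JDK's javax.xml.namespace.QName follows, so it can be illustrated without dom4j. This sketch uses the JDK class as a stand-in; it is not org.dom4j.QName, and the helper name is made up.

```java
import javax.xml.namespace.QName;

// Illustration with the JDK QName, which compares namespace URI and
// local part only. Two names that differ just in prefix are equal.
class QNameEquality {

    static boolean equalIgnoringPrefix(String uri, String localName,
                                       String prefixA, String prefixB) {
        QName a = new QName(uri, localName, prefixA);
        QName b = new QName(uri, localName, prefixB);
        return a.equals(b) && a.hashCode() == b.hashCode();
    }

    static boolean equalAcrossNamespaces(String uriA, String uriB, String localName) {
        return new QName(uriA, localName).equals(new QName(uriB, localName));
    }

    public static void main(String[] args) {
        // same URI and local name, different prefixes
        System.out.println(equalIgnoringPrefix("urn:example", "foo", "x", "y"));
        // same local name, different URIs
        System.out.println(equalAcrossNamespaces("urn:one", "urn:two", "foo"));
    }
}
```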

Added patch kindly supplied by Michal Palicka that SAXReader was passing in the wrong name for the SAX string-interning feature. Thanks Michal!

Fixed the behaviour of DocumentFactory.createXPathFilter() to use XPath filtering rather than XSLT style patterns. One of the major differences is that an XSLT pattern (used in the <xsl:template match="pattern"/> element in XSLT) works slightly differently. An element <foo> would match an XSLT pattern "foo" whereas an element <bar> could match an XPath filter "foo" if it contained a child <foo> element.
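
The difference can be sketched with the JDK's javax.xml.xpath API. Filter semantics are modelled here by evaluating the expression with the node as context and converting the result to a boolean. The JDK has no XSLT pattern matcher, so the simple pattern "foo" is approximated by the expression self::foo; that approximation, and all names below, are assumptions made for the illustration rather than dom4j code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FilterVersusPattern {

    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // XPath filter: true if the expression, evaluated relative to the
    // node, yields a true value (for example a non-empty node set).
    static boolean filterAccepts(String expression, Node node) {
        try {
            return (Boolean) XPathFactory.newInstance().newXPath()
                .evaluate(expression, node, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Rough stand-in for the XSLT pattern "name": the node itself must
    // be an element with that name.
    static boolean simplePatternMatches(String name, Node node) {
        return filterAccepts("self::" + name, node);
    }

    public static void main(String[] args) {
        Element bar = parseRoot("<bar><foo/></bar>");
        Node foo = bar.getFirstChild();
        System.out.println(filterAccepts("foo", bar));        // <bar> has a <foo> child
        System.out.println(simplePatternMatches("foo", bar)); // <bar> is not a <foo>
        System.out.println(simplePatternMatches("foo", foo)); // <foo> is a <foo>
    }
}
```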

Patched the behaviour of Node.matches(String xpathExpression) so that it uses XPath filters now rather than XSLT patterns.

Patched bug in XRule implementation in org.dom4j.rule that was causing ordering problems when using stylesheets - the Rule precedence order was not being correctly used.

Backed out a previous patch added to 0.9 such that attributes with no namespace prefix are in no namespace. An attribute does not inherit the default namespace - the only way to put an attribute into a namespace is via a namespace prefix.
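
The rule is easy to check with any namespace aware parser. The following sketch uses the JDK's DOM implementation rather than dom4j, and the document and class name are invented for the example: the element picks up the default namespace, while its unprefixed attribute reports no namespace at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class UnprefixedAttributes {

    static final String XML = "<root xmlns=\"urn:example\" id=\"123\"/>";

    static Element parseRoot() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Namespace URI of the element: the default namespace applies.
    static String elementNamespace() {
        return parseRoot().getNamespaceURI();
    }

    // Namespace URI of the unprefixed attribute: null, i.e. no namespace.
    static String attributeNamespace() {
        return parseRoot().getAttributeNode("id").getNamespaceURI();
    }

    public static void main(String[] args) {
        System.out.println("element:   " + elementNamespace());
        System.out.println("attribute: " + attributeNamespace());
    }
}
```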

Patched XMLWriter so that a flush() is not required when using an OutputStream and the various sub-document write() methods are called such as write(Element), write(Attribute), write(Node), write(Namespace) etc.

Fixed bug in SAXReader that setEntityResolver() was not always behaving properly. Also the default entity resolver used to locate XML Schemas seems to work properly now.

Moved the XML Schema Data Type supporting classes in org.dom4j.schema.Schema* to org.dom4j.datatype.Datatype*. This should avoid confusion and better describe the intent of the classes, to implement Data typing, rather than schema validation. We hope to use the MSV library for all of our schema validation requirements.

0.9 release

Full support for the Jaxen XPath engine

The XPath engine in dom4j has been migrated to using Jaxen. This single XPath engine can be plugged into any model such that Jaxen will support DOM, dom4j, EXML and JDOM. Hopefully we'll get Jaxen working on Java Beans too.

In general this will mean a much better, more compliant and more bug-free XPath engine for dom4j as it will be used extensively across XML object models.

Already numerous irregularities have been fixed in the XPath support in dom4j. We have donated the dom4j XPath test harness to Jaxen so that we now have a large rigorous test harness to ensure correct XPath behaviour - this test harness is run against all 4 current XML object models to ensure consistent behaviour and valid XPath compliance.

We are also in the process of migrating over our XPath extension functions as well as adding additional XPath functions such as those defined in XSLT and XPointer.

New features

New class org.dom4j.io.XMLResult which is-a JAXP Result which uses the same org.dom4j.io.OutputFormat object to provide its formatting options to allow XML output from JAXP (such as via XSLT) to be pretty printed.

XMLWriter now implements the SAX XMLFilter interface so that it can be added to a SAX parsing filter chain to output the XML being parsed in a simple way. Many thanks to Joseph Bowbeer for his help in this area.

Added setProperty() and setFeature() methods to SAXReader to allow the easy configuration of custom parser properties via SAXReader, such as being able to specify the location of schema or DTD resources.

Added new method OutputFormat.createCompactFormat() for those wishing to output their XML in a compact format, such as in messaging systems.

Patches and bug fixes

Fixed bug in getNamespaceForPrefix() where if the prefix is null or "" and there is a default namespace defined, this method was returning a namespace instance with the incorrect URI.

Patched DOM writer so that it uses JAXP if it is available on the CLASSPATH using namespace aware mode by default.

Fixed a number of issues relating to namespaces and the redefinition of namespace prefixes. We now have a quite aggressive JUnit test harness to ensure that we handle namespace URIs correctly when prefixes are mapped and unmapped.

Applied patch from Andrew Wason for HTMLWriter to support the full HTML 4.01 DTD elements which do not require proper XML element closes. The new elements are PARAM, AREA, LINK, COL, BASE and META.

Fixed bug found by Dennis Sosnoski that SAX warnings were causing exceptions to be thrown by the SAXReader. Now warnings are silently ignored. If you want to detect warnings then an ErrorHandler should be registered with the SAXReader.
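
A minimal ErrorHandler that records warnings instead of dropping them might look like the sketch below. It relies only on the standard org.xml.sax interfaces; the class name is made up, and the way it gets registered with SAXReader (presumably through its error handler property) is left out because the block is kept free of dom4j dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical handler: keeps warnings for later inspection and lets
// errors and fatal errors abort the parse.
class WarningCollector implements ErrorHandler {

    final List<SAXParseException> warnings = new ArrayList<SAXParseException>();

    public void warning(SAXParseException e) {
        warnings.add(e);
    }

    public void error(SAXParseException e) throws SAXException {
        throw e;
    }

    public void fatalError(SAXParseException e) throws SAXException {
        throw e;
    }

    // Simulates a parser reporting one warning to the handler.
    static int demo() {
        WarningCollector collector = new WarningCollector();
        collector.warning(new SAXParseException("example warning", null));
        return collector.warnings.size();
    }

    public static void main(String[] args) {
        System.out.println("warnings recorded: " + demo());
    }
}
```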

Patched bug that was also found by Jonathan Doughty for the non-standard behaviour of the FilterIterator. Also added Jonathan's JUnit test case to the distribution so that this problem should not come back.

Fixed bug that when round tripping into JAXP and back again, sometimes additional namespace attributes were appearing. Now the TestRoundTrip JUnit test case includes JAXP round tripping.

Fixed bug so that attributes without a namespace prefix which are inside an element with a default namespace declaration of the form xmlns="theURI" now correctly inherit the namespace URI.

Applied patch found by Stefan Graeber that the UserDataFactory was not correctly creating UserDataAttribute instances.

Fixed bug that SAXWriter and DocumentSource were not correctly producing lexical events such as entities, comments and DOCTYPE declarations. Many thanks to Joseph Bowbeer for his help in this area.

0.8 release

New methods

hasContent()

has been added to the Node interface so that it is easy to decide if a node is a leaf node or not. This method was suggested by Dane Foster. This method returns true if the node is a Branch (i.e. an Element or Document) which contains at least one node.

getPath(Element context)

getUniquePath(Element context)

These new methods allow paths and unique paths to be created relatively. Previously both getPath() and getUniquePath() would create absolute XPath expressions. These new methods allow relative path expressions to be created by providing an ancestor Element from which to make the path expression. This method was suggested by Chris Nokleberg.

Patches and bug fixes

Fixed bug found by Chris Nokleberg when using the UserDataElement that the clone() and createCopy() methods were not correctly copying the user data object. A JUnit test case has been added that tests this fix (org.dom4j.TestUserData). If any deep copying of user data objects is required then UserDataElement now has a method getCopyOfUserData() which can be overridden to perform a deep copy of any user data objects.

Minor patch for dom4j implementors wishing to create their own QName implementations. Previously the DocumentFactory class was hardwired to use QNameCache internally which was hard wired to only create QName instances. Now some factory methods have been added such that you can derive your own DocumentFactory which uses your own QNameCache which creates your own QName classes.

If JAXP can be found in the CLASSPATH then it is now used first by the SAXReader to find the correct SAX parser class. We have found that sometimes (e.g. Tomcat 4.0 beta 6) the value of the org.xml.sax.driver system property is set to a class which is not in the CLASSPATH but a valid JAXP parser is present. So now we try JAXP first, then the SAX system property then if all else fails we use the bundled Aelfred SAX parser.
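
The lookup order described above can be sketched with JDK classes alone. This is an illustration of the sequence, not SAXReader's actual code: the class and method names are invented, and the final fallback to the bundled Aelfred parser is represented only by an exception because Aelfred is not part of the JDK.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

class ParserLookup {

    // Step 1: ask JAXP for a parser.
    static XMLReader fromJaxp() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception | FactoryConfigurationError e) {
            return null;
        }
    }

    // Step 2: honour the org.xml.sax.driver system property, tolerating
    // a value that names a class which is not on the CLASSPATH.
    static XMLReader fromSystemProperty() {
        String className = System.getProperty("org.xml.sax.driver");
        if (className == null) {
            return null;
        }
        try {
            return (XMLReader) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (Exception | LinkageError e) {
            return null;
        }
    }

    static XMLReader createReader() {
        XMLReader reader = fromJaxp();
        if (reader == null) {
            reader = fromSystemProperty();
        }
        if (reader == null) {
            // Step 3: dom4j would use its bundled Aelfred parser here.
            throw new IllegalStateException("no SAX parser available");
        }
        return reader;
    }

    public static void main(String[] args) {
        System.out.println(createReader().getClass().getName());
    }
}
```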

Fixed XPath bug found by James Elson that the path /foo[@a and @b] or /foo[@a='1' and @b='2'] was no longer working correctly. This is now fixed and many tests of this nature have been added to the JUnit test harness.

Fixed some namespace related bugs found by Steen Lehmann. It appears that for a document of:-

<a xmlns="dummyNamespace">
  <b>
    <c>Hello</c>
  </b>
</a>

Then the path /a/b/c will not find anything - this is correct according to the XPath spec. Instead the path /*[name()='a']/*[name()='b']/*[name()='c'] is required. These changes have been applied to getPath() and getUniquePath() such that these methods now work, irrespective of the namespaces used in a document. Finally many new test cases have been added to validate a variety of XPath expressions with various uses of namespaces.
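
The behaviour is defined by the XPath specification, so it can be reproduced with the JDK's own XPath engine as well as with dom4j. The sketch below parses the document above in namespace aware mode and counts the nodes each path selects; the class and method names are made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultNamespacePaths {

    static final String XML =
        "<a xmlns=\"dummyNamespace\"><b><c>Hello</c></b></a>";

    // Counts the nodes that the given XPath expression selects.
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // unprefixed names only match elements in no namespace
        System.out.println(count("/a/b/c"));
        // name() compares the qualified name, so the namespace is ignored
        System.out.println(count("/*[name()='a']/*[name()='b']/*[name()='c']"));
    }
}
```

The first path selects nothing and the second selects the single c element.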

SAXWriter now fully supports the SAX feature "http://xml.org/sax/features/namespace-prefixes". Failure to support this feature properly was causing problems when outputting a dom4j Document using JAXP - the namespace declarations often did not appear correctly.

Patched bug in XMLWriter which caused multiple duplicate namespace declarations to sometimes appear.

0.7 release

Integration with SAXPath

The SAXPath project is a Simple API for XPath parsing. It's analogous to SAX in that the API abstracts away the details of parsing XPath expressions and provides a simple event based callback interface.

Originally dom4j was using a parser generated via the Antlr tool which resulted in a considerably larger code base. Now dom4j uses SAXPath for its XPath parsing which results in faster XPath parsing and a much smaller code base.

The dom4j.jar is now about 100 Kb smaller! Also several XPath related bugs are now fixed. For example the numeric paths like '2 + count(//foo)' are now working.

Patches and bug fixes

Fixed bug found by Tobias Rademacher that XML Schema Data Type support wasn't working correctly when the XSD document used a namespace prefix. The bug was hidden by a further bug in the JUnit test case that was not correctly testing this case. Both these bugs are now fixed.

Fixed bug found by Piero de Salvia that some invalid XPath expressions were not correctly throwing exceptions. Now any attempt to use any invalid XPath expressions should result in an InvalidXPathException being thrown.

Applied patch submitted by Theodor Schwarzinger that fixes the preceding-sibling and preceding axes.

Fixed bug found by James Elson that the normalize() method was being quite aggressive and removing all text nodes! New JUnit test case added to ensure this doesn't break again.

Improved the setContent() semantics on Branch (and so Element and Document) such that the parent and document relationships are correctly removed for old content and added for new content. As a helper method, the setContent() method will clone any content nodes which are already part of an existing document. So for example the following code will clone the content of a document.

    Document doc1 = ...;
    Document doc2 = DocumentHelper.createDocument();  
    doc2.setContent( doc1.content() );

Though this behaviour is much more useful when used with elements...

    Element sourceElement = ...;
    Element destElement = ...;

    // copy the content of sourceElement
    destElement.setContent( sourceElement.content() );

0.6 release

Serialization support added

Support has been added for Java Serialization so dom4j documents can be serialized over RMI or EJB calls. Note that currently Serialization is much slower (by a factor of 2-5 times) than using the textual format of XML so we recommend sending XML text over RMI rather than serialization if possible. Over time we will tune the serialization implementation to be at least as fast as using the text format (even if that means under the covers we just use the text format).

Patches and bug fixes

Fixed bug in XPath engine found by Christophe Ponsard for paths of the form /* which were not finding anything. Now we have an extensible XPath test harness (in src/test/org/dom4j/TestXPathExamples.java) which contains some test cases for these kinds of paths. We can extend these cases to test other XPath expressions easily.

Fixed bug in elementByID() method found by Thomas Nichols that was resulting in the element not being found correctly.

Fixed bug in IndexedElement reported by Kerstin Grï¿½nefeld that was causing a null pointer exception when using XPath on an IndexedElement.

Applied the patch supplied by Mike Skells that fixes problems with the getUniquePath() method not returning properly indexed elements.

Applied a fix to the problem found by Dane Foster when using dom4j with JTidy. JTidy returns null for getLocalName() so DOMReader has been patched to handle nulls returned from either getLocalName() or getName().

Fixed bug, reported anonymously on the SourceForge site, that explicitly creating a Document from an existing Element could cause problems when using XMLWriter.

Assorted performance tunings of SAX parsing, avoiding unnecessary repeated code paths.

Tidied factory and construction of Element code such that there are no longer dependencies on the SAX Attributes class. This was originally added as a performance enhancement, but after further refactoring this is now no longer needed. This makes the process of creating new Element derivations or DocumentFactory implementations easier.

0.5 release

NodeComparator available

For those wishing to do value based comparisons of Nodes, Elements, Attributes, Documents or Document fragments there is a new NodeComparator class which implements the Comparator interface from the Java Collections Framework.

New helper method DocumentHelper.parseText()

A new helper method has been added for parsing text. For example:-

    Document document = DocumentHelper.parseText(
        "<team> <author>James</author> </team>"
    );

New Branch.normalize() method

The Branch interface (and so Document and Element interfaces) has a new normalize() method that has the same semantics as the same method in the DOM API to remove empty Text nodes and merge adjacent Text nodes.
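
Since the semantics are borrowed from the DOM API, the DOM's own normalize() shows what to expect. The sketch below uses org.w3c.dom from the JDK, not dom4j's Branch; the class name and sample text are invented. It builds an element with three Text children, one of them empty, and normalizes it.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeDemo {

    // Builds <team> with three Text children: "Hello", "" and " world".
    static Element build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("team");
            doc.appendChild(root);
            root.appendChild(doc.createTextNode("Hello"));
            root.appendChild(doc.createTextNode(""));
            root.appendChild(doc.createTextNode(" world"));
            return root;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static int childCountBefore() {
        return build().getChildNodes().getLength();
    }

    // Returns "<child count>:<text of first child>" after normalize().
    static String summaryAfter() {
        Element root = build();
        root.normalize();
        return root.getChildNodes().getLength() + ":"
            + root.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(childCountBefore());
        System.out.println(summaryAfter());
    }
}
```

The empty Text node is removed and the two remaining ones are merged into a single Text node.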

Easier document building methods

A document can be constructed more easily now that the addXXX() methods return a reference to the Document or Element which created them. An example is shown below:

import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

public class Foo {

    public Document createDocument() {
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement( "root" );

        Element author1 = root.addElement( "author" )
            .addAttribute( "name", "James" )
            .addAttribute( "location", "UK" )
            .addText( "James Strachan" );
        
        Element author2 = root.addElement( "author" )
            .addAttribute( "name", "Bob" )
            .addAttribute( "location", "Canada" )
            .addText( "Bob McWhirter" );

        return document;
    }
}

Note that the addElement() method returns the new child element, not the parent element.

To promote consistency, the Element.setAttributeValue() method is now deprecated and should be replaced with Element.addAttribute().

Patches and bug fixes

Applied Theo's patch so that Documents are cloned correctly, together with JUnit test cases to ensure this keeps working.

Applied Rob Wilson's patch for the NullPointerException that was thrown when a Document was output with the XMLWriter and an attribute value was null.

Fixed problem found by Nicolas Fonrose that XPath expressions using namespace prefixes were not working correctly.

Fixed problem found by Thomas Nichols whereby default namespaces with no prefix were not being processed correctly. As a result of finding this bug we now have a rigorous JUnit round trip test harness in place, which highlighted a number of issues with namespaces when round tripping from dom4j to SAX to DOM to Text and back again. These issues have now been fixed and hopefully should not show up again.

Fixed some detach() bugs that were found with Attributes.

Default encoding is now "UTF-8" rather than "UTF8". Thanks to Thomas Nichols for spotting that one. Also, the default line separator when using XMLWriter is now "\n" rather than "\r\n".

If an XMLWriter is used with an OutputStream then an explicit call to flush() is no longer required after calling write(Document).

Some housekeeping was performed in the naming of some implementation classes. The old XPathXXX.java classes in the org.dom4j.tree package, where XXX is one of Attribute, CDATA, Comment, Entity, ProcessingInstruction and Text, have been renamed to DefaultXXX, and the corresponding DefaultXXX has been renamed to FlyweightXXX. This makes the purpose of these implementation classes clearer. The default implementations of the leaf nodes are mutable but cannot be shared across elements. The FlyweightXXX implementations are immutable and can be shared across nodes and documents.

0.4 release

Enhanced event notification mechanism

A new enhanced event notification mechanism has been implemented by David White. Now you can register multiple ElementHandler instances with a SAXReader object before you parse a document, such that the different handlers are notified when different paths are reached.

The ElementHandler interface now has both onStart() and onEnd() methods, allowing more fine-grained control over when you are called and the ability to perform actions before or after the content for an Element is populated. The methods also take a reference to an ElementPath to allow more optimised and powerful access to the path to the specified document.
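The registration described above might look roughly like the following sketch. The class name HandlerSketch, the method collectEvents and the sample path /team/author are assumptions made for illustration; the sketch relies on SAXReader.addHandler(), ElementHandler and ElementPath.getCurrent(), and assumes a dom4j jar is on the classpath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.dom4j.DocumentException;
import org.dom4j.ElementHandler;
import org.dom4j.ElementPath;
import org.dom4j.io.SAXReader;

public class HandlerSketch {

    /** Parses the XML text and records the events fired for /team/author. */
    static List<String> collectEvents( String xml ) {
        final List<String> events = new ArrayList<String>();

        SAXReader reader = new SAXReader();
        reader.addHandler( "/team/author", new ElementHandler() {
            public void onStart( ElementPath path ) {
                // Called when the start tag is seen; content is not yet populated.
                events.add( "start " + path.getCurrent().getName() );
            }

            public void onEnd( ElementPath path ) {
                // Called when the end tag is seen; the element text is available.
                events.add( "end " + path.getCurrent().getText() );
            }
        } );

        try {
            reader.read( new StringReader( xml ) );
        }
        catch ( DocumentException e ) {
            throw new IllegalStateException( e );
        }
        return events;
    }

    public static void main( String[] args ) {
        System.out.println( collectEvents(
            "<team><author>James</author><author>Bob</author></team>" ) );
    }
}
```

Further handlers can be registered for other paths on the same reader before read() is called.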

Early alpha release of XML Schema Data Type support

This release contains an alpha release of XML Schema Data Type support. The main class in question is the XML Schema Data Type aware DatatypeDocumentFactory which will create an XML Schema Data Type aware XML object model.

The getData() and setData(Object) methods on Attribute and Element allow access to the concrete data types such as Dates and Numbers.

Patches and bug fixes

Applied Theo's patch for the XPath substring function that was causing the incorrect string indexes to be returned. The substring now returns the correct answer.

Applied Theo's patch for incorrect escaping of element text.

Fixed bug in the XPath engine for absolute path expressions which now work correctly when applied to leaf nodes.

Fixed bug in the name() and local-name() functions such that the following expressions now work correctly: local-name(..) and name(parent::*).

A variety of minor performance tuning optimisations have been made.

0.3 release

The org.dom4j.io.OutputFormat class now has a new helper method to make it easier to create pretty print formatting objects. The new method is OutputFormat.createPrettyPrint(). So to pretty print some XML (trimming all whitespace and indenting nicely) the following code should do the job...

    OutputFormat format = OutputFormat.createPrettyPrint();
    XMLWriter writer = new XMLWriter( out, format );
    writer.write( document );
    writer.close();

SAXReader.read(String url) can now accept either a URL or a file name which makes things a little easier. The logic uses the existence of a ':' in the url String to determine if it should be treated as a URL or a File name.

For more explicit control over whether documents are treated as Files or URLs, call SAXReader.read(File file) or SAXReader.read(URL url).
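The ':' heuristic can be pictured with the small sketch below. It is not the SAXReader source; SystemIdSketch, looksLikeUrl and describe are hypothetical names used only to illustrate the rule as stated here.

```java
public class SystemIdSketch {

    /** Hypothetical helper: true when the text contains a ':' and so is treated as a URL. */
    static boolean looksLikeUrl( String systemId ) {
        return systemId.indexOf( ':' ) >= 0;
    }

    /** Names the kind of source the text would be treated as under this rule. */
    static String describe( String systemId ) {
        return looksLikeUrl( systemId ) ? "URL" : "File";
    }

    public static void main( String[] args ) {
        System.out.println( describe( "http://example.org/customers.xml" ) );
        System.out.println( describe( "customers.xml" ) );
        // A Windows path with a drive letter also contains ':'.
        System.out.println( describe( "C:\\data\\customers.xml" ) );
    }
}
```

Under a rule like this, a file name that includes a drive letter would be classified as a URL, which is one reason to prefer the explicit File and URL overloads when the kind of source is known.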

A new extension function, matrix-concat(), was submitted by James Pereira. By default, the concat() function in XPath takes the 'string-value' of each argument. So for a document:-

<root project="dom4j">
    <contributor>James Pereira</contributor>
    <contributor>Bob McWhirter</contributor>
</root>

Then the XPath

concat( 'thanks ', /root/contributor )

would return

"thanks James Pereira"

as the /root/contributor expression matches a node set of 2 elements, but the "string-value" takes the first element's text. In contrast, matrix-concat will do a cartesian product of all the arguments and then do the concatenation of each combination of nodes. So

matrix-concat( 'thanks ', /root/contributor )

will produce

"thanks James Pereira"
"thanks Bob McWhirter"

The cartesian product is done such that multiple paths can be used.

matrix-concat( 'thanks ', /root/contributor, ' for working on ', /root/@project )

will produce

"thanks James Pereira for working on dom4j"
"thanks Bob McWhirter for working on dom4j"
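The cartesian product can be sketched in plain Java as follows. This is not the extension function's implementation: MatrixConcatSketch and matrixConcat are hypothetical names, and each argument is modelled simply as the list of string values it evaluates to.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MatrixConcatSketch {

    /** Concatenates every combination of one value taken from each argument, in order. */
    static List<String> matrixConcat( List<List<String>> arguments ) {
        List<String> results = new ArrayList<String>();
        results.add( "" );

        for ( List<String> values : arguments ) {
            List<String> next = new ArrayList<String>();
            // Extend every partial result with every value of this argument.
            for ( String prefix : results ) {
                for ( String value : values ) {
                    next.add( prefix + value );
                }
            }
            results = next;
        }
        return results;
    }

    public static void main( String[] args ) {
        List<List<String>> arguments = new ArrayList<List<String>>();
        arguments.add( Arrays.asList( "thanks " ) );
        arguments.add( Arrays.asList( "James Pereira", "Bob McWhirter" ) );
        arguments.add( Arrays.asList( " for working on " ) );
        arguments.add( Arrays.asList( "dom4j" ) );

        for ( String line : matrixConcat( arguments ) ) {
            System.out.println( line );
        }
    }
}
```

A literal argument contributes a single value, so it does not multiply the number of results; a node set with n members multiplies it by n.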

Fixed bug where XMLWriter.write(Object) was not correctly writing a Document instance.

Finally, a couple of small issues with the build process have been fixed. The dom4j.jar no longer contains any SAX or DOM classes (they are all in dom4j-full.jar), and the Antlr grammar files for the XPath parser are now correctly included in the binary distribution.

0.2 release

The following new features were added:-

Clean integration with XSLT via JAXP / TrAX API.
New SAXValidator to allow validation on prebuilt Document instances.
XMLWriter and HTMLWriter rewritten so that they work at either the SAX level or the dom4j level. API much improved and more like Reader and Writer in the JDK.
API modified to avoid clashes with W3C DOM such that a dual implementation of dom4j and DOM is now possible. An early alpha release of a DOM implementation of dom4j is available.
New sorting method added to Node for easier selections of nodes which are sorted via an XPath expression. The following code sorts all CUSTOMER elements by their name attributes and removes duplicates:-
```
Document document
  = new SAXReader().read( new File( "customers.xml" ) );

List customers
  = document.selectNodes( "//CUSTOMER", "@name", true );
```
The getText() and getStringValue() methods of Element now return the textual values of CDATA, Entity and Text nodes. The previous version only returned Text node values.
Refactored code and removed XPathEngine, XPathHelper and all the static newXXX() methods in DocumentFactory. Added equivalent methods to DocumentHelper and DocumentFactory.

This release also includes full XPath source code.

0.1 release

Initial release which comes complete with DOM, JAXP and SAX support and integrated XPath.

To Do List

The internal subset does not pass through DOMReader and DOMWriter. This needs patching!
We should add support for the Xerces XNI API via an XNIReader and XNIWriter. This would also allow dom4j users to make good use of the NekoHTML parser that is layered on top of XNI.
Better documentation and user guides
A lazy parser; implement a special Element implementation (or probably a special List) which allows the XPP (XML Pull Parser) to parse the document as it is navigated rather than all up front.
Build a dom4j validator on top of Sun's MSV library
Ensure that the optional DOM implementation passes the DOM compliance tests
Implement a ValidatingDocumentFactory and an EncodingDocumentFactory which can be used by developers where invalid strings may be specified allowing validation or encoding of names or text values to be done in one place for use across parsers or application code. This would avoid any performance hit by making this kind of validation the default behaviour.
Implement a canonical XML processor
Implement XML Signature
Implement XPointer, XLink and XInclude
Build a version of XMLC which uses the dom4j API rather than DOM which could also make use of XPath, XSLT and Java 2 Collections support.

Consider adding support for Java Generics such that typesafe Iterators can be used. For example:

Iterator<Node> iter = element.nodeIterator();
while ( iter.hasNext() ) {
    Node node = iter.next();
}

Iterator<Element> iter2 = element.elementIterator( "foo" );
while ( iter2.hasNext() ) {
    Element foo = iter2.next();
}

Implement XSLT engine on top of dom4j?
XML Query implementation on top of dom4j?

Known problems

The following functions are not yet fully supported in the inbuilt XPath engine:

id()
generate-id()
format-number()

The optional W3C DOM implementation of the dom4j API is not yet at full DOM compliance

Contributors

The following people have contributed to the dom4j project. Many thanks to you all!

James Strachan
Bob McWhirter
James Dodd
James Elson
Jakob Jenkov
James Pereira
David White
Tobias Rademacher
Rashmi Mathew
Jonathan Doughty
Joseph Bowbeer
Michal Palicka
Yuxin Ruan
Steen Lehmann
Maarten Coene
Stefan Graeber
