Dom4J performance versus Xerces / Xalan

Martin Böhm, Jean-Jacques Dubray

Eigner Precision Lifecycle Management

www.eigner.com

Introduction

We have created a simple test bed to evaluate the performance of dom4j versus Xerces/Xalan. These results are intended to give a rough idea rather than to serve as an exhaustive test suite. In particular, we focus our study on XML documents that look like database result sets. Performance results may vary greatly depending on the topology of your XML.

The test was designed with two topologies in mind: 

a) Elements only, where each element name is unique in the whole document.

<?xml version="1.0" encoding="UTF-8"?>

<ItemResultSet>

<Item>

<Attr0x0>123456789</Attr0x0>

<Attr1x0>123456789</Attr1x0>

<Attr2x0>123456789</Attr2x0>

<Attr3x0>123456789</Attr3x0>

<Attr4x0>123456789</Attr4x0>

<Attr5x0>123456789</Attr5x0>

<Attr6x0>123456789</Attr6x0>

<Attr7x0>123456789</Attr7x0>

<Attr8x0>123456789</Attr8x0>

<Attr9x0>123456789</Attr9x0>

<Attr10x0>123456789</Attr10x0>

<Attr11x0>123456789</Attr11x0>

<Attr12x0>123456789</Attr12x0>

<Attr13x0>123456789</Attr13x0>

...

</Item>

<Item>

<Attr0x1>123456789</Attr0x1>

<Attr1x1>123456789</Attr1x1>

<Attr2x1>123456789</Attr2x1>

...

</ItemResultSet>

b) Attributes only.

<?xml version="1.0" encoding="UTF-8"?>

<ItemResultSet>

<Item guid="0" Attr0="123456789" Attr1="123456789" .../>

<Item guid="1" Attr0="123456789" Attr1="123456789" .../>

</ItemResultSet>
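The two topologies can be generated programmatically. The following is a minimal sketch that uses the JDK's built-in W3C DOM API rather than dom4j, so it is not the authors' PerfDOM4J code. The class name `ItemDocs` and the `columns` parameter are hypothetical; the number of values per item is elided in the samples above, so it is left as a parameter here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class ItemDocs {
    private static Document emptyDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Topology a): one child element per value, named Attr<column>x<row>,
    // so every element name below <Item> is unique in the document.
    static Document elementsOnly(int items, int columns) {
        Document doc = emptyDocument();
        Element root = doc.createElement("ItemResultSet");
        doc.appendChild(root);
        for (int row = 0; row < items; row++) {
            Element item = doc.createElement("Item");
            root.appendChild(item);
            for (int col = 0; col < columns; col++) {
                Element value = doc.createElement("Attr" + col + "x" + row);
                value.setTextContent("123456789");
                item.appendChild(value);
            }
        }
        return doc;
    }

    // Topology b): one empty <Item> per row, with the values stored as attributes.
    static Document attributesOnly(int items, int columns) {
        Document doc = emptyDocument();
        Element root = doc.createElement("ItemResultSet");
        doc.appendChild(root);
        for (int row = 0; row < items; row++) {
            Element item = doc.createElement("Item");
            item.setAttribute("guid", Integer.toString(row));
            for (int col = 0; col < columns; col++) {
                item.setAttribute("Attr" + col, "123456789");
            }
            root.appendChild(item);
        }
        return doc;
    }

    public static void main(String[] args) {
        // 14 values per item is an assumption; the sample only lists Attr0..Attr13 before "...".
        Document a = elementsOnly(1000, 14);
        Document b = attributesOnly(1000, 14);
        System.out.println("elements-only items:   " + a.getElementsByTagName("Item").getLength());
        System.out.println("attributes-only items: " + b.getElementsByTagName("Item").getLength());
    }
}
```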

 

 

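For documents of 1000, 100, 10 and 1 items, we measured the time it takes to create the document, write it to disk, reparse it from disk, evaluate XPath expressions with selectSingleNode() and selectNodes(), and run an XSLT transformation.

All tests were run on a laptop (PIII, 500 MHz, 512 MB). We allocate a heap size of 256 MB when we start the test.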
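A create / write / reparse measurement of this kind could be sketched as follows. This is an illustration using only the JDK's JAXP classes, not the authors' test code; the class name `RoundTripTimer`, the `columns` parameter and the temporary file are assumptions. It times a single run per document size, so the numbers it prints are noisy and include JIT warm-up.

```java
import java.io.File;
import java.io.OutputStream;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class RoundTripTimer {
    // Builds the elements-only topology; 'columns' is a hypothetical parameter.
    static Document create(DocumentBuilder builder, int items, int columns) {
        Document doc = builder.newDocument();
        Element root = doc.createElement("ItemResultSet");
        doc.appendChild(root);
        for (int row = 0; row < items; row++) {
            Element item = doc.createElement("Item");
            root.appendChild(item);
            for (int col = 0; col < columns; col++) {
                Element value = doc.createElement("Attr" + col + "x" + row);
                value.setTextContent("123456789");
                item.appendChild(value);
            }
        }
        return doc;
    }

    // Returns {createMs, writeMs, reparseMs, number of <Item> elements after reparsing}.
    static double[] measure(int items, int columns) {
        try {
            // Factory setup happens before the first timestamp so it is not timed.
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            File file = File.createTempFile("items", ".xml");
            file.deleteOnExit();

            long t0 = System.nanoTime();
            Document doc = create(builder, items, columns);
            long t1 = System.nanoTime();
            try (OutputStream out = Files.newOutputStream(file.toPath())) {
                serializer.transform(new DOMSource(doc), new StreamResult(out));
            }
            long t2 = System.nanoTime();
            Document reparsed = builder.parse(file);
            long t3 = System.nanoTime();

            int itemCount = reparsed.getElementsByTagName("Item").getLength();
            return new double[] {(t1 - t0) / 1e6, (t2 - t1) / 1e6, (t3 - t2) / 1e6, itemCount};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        for (int items : new int[] {1000, 100, 10, 1}) {
            double[] r = measure(items, 14); // 14 values per item is an assumption
            System.out.printf("%5d items: create %.2f ms, write %.2f ms, reparse %.2f ms%n",
                    items, r[0], r[1], r[2]);
        }
    }
}
```

Running it with `java -Xmx256m RoundTripTimer` caps the heap at 256 MB, matching the heap size described above.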

 

Results

Case a): comparison between dom4j/Jaxen and Xerces/Xalan

All times in ms

         Create Document      Write Document to disk    Reparse the document from disk
Items    dom4j     Xalan      dom4j     Xalan           dom4j     Xalan
1000     641.0     571.0      531       852             2020      2664
100      9.0       20.0       60        61              62.99     68.6
10       0.7       1.0        10        10              11.92     14.62
1        0.1       0.0        10        10              8.01      8.31

 

The most surprising result comes from executing XPath statements. Xalan does warn us in the JavaDoc that things could be a little slow.
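To make the two kinds of query concrete: a "single node" query stops at the first match, while a "nodes" query returns every match. In dom4j these are the `selectSingleNode` and `selectNodes` methods, and Xalan's `XPathAPI` class offers comparable static helpers. The sketch below deliberately uses neither library; it shows the same distinction with the JDK's `javax.xml.xpath` package as a stand-in, and the class name `XPathQueries` is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathQueries {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // First match only, analogous to a selectSingleNode-style call.
    static Node single(Document doc, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Every match, analogous to a selectNodes-style call.
    static NodeList all(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<ItemResultSet>"
                + "<Item><Attr0x0>123456789</Attr0x0><Attr1x0>123456789</Attr1x0></Item>"
                + "<Item><Attr0x1>123456789</Attr0x1><Attr1x1>123456789</Attr1x1></Item>"
                + "</ItemResultSet>");
        System.out.println("single: " + single(doc, "/*/*/Attr1x1").getNodeName());
        System.out.println("all:    " + all(doc, "/*/Item").getLength() + " items");
    }
}
```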

selectSingleNode()

All times in ms

1000 Items in the document
Expression        dom4j    Xalan
/*/*/Attr1x1      127      10
/*/*/Attr1x500    20       661
/*/*/Attr1x999    23       1662

100 Items in the document
Expression        dom4j    Xalan
/*/*/Attr1x1      2        3.0
/*/*/Attr1x50     3        13.0
/*/*/Attr1x99     2        55.1

 

selectNodes()

All times in ms

1000 Items in the document
Expression        dom4j    Xalan
/*/*/Attr1x1      16.6     1633
/*/*/Attr1x500    20       1772
/*/*/Attr1x999    20.0     1733
/*/Item           2.0      1742

100 Items in the document
Expression        dom4j    Xalan
/*/*/Attr1x1      1.0      35.0
/*/*/Attr1x50     2.0      36.1
/*/*/Attr1x99     1.0      49.0
/*/Item           0.2      49.0

 

selectNodes()

All times in ms, expression /*/*/Attr1x500

Items    dom4j    Xalan
1000     20.0     1793
100      2.0      36.1
10       0.1      11.72
1        0.1      4.3

 

Case b): we use dom4j/Jaxen and compare how it behaves with a document that contains only elements versus a document that models the same data as attributes.

All times in ms (dom4j in all columns)

         Create Document        Write Document to disk    Reparse the document from disk
Items    elements    attrs      elements    attrs         elements    attrs
1000     641.0       100        531         140           2020        207
100      9.0         8.0        60          20            62.99       24
10       0.7         0.9        10          10            11.92       8.31
1        0.1         0.1        10          10            8.01        6.81

 


selectSingleNode()

All times in ms

1000 Items in the document
Expression        dom4j - elements    dom4j - attrs
/*/*/Attr1x1      127                 36
/*/*/Attr1x500    20                  33
/*/*/Attr1x999    23                  37

100 Items in the document
Expression        dom4j - elements    dom4j - attrs
/*/*/Attr1x1      2                   4
/*/*/Attr1x50     3                   4
/*/*/Attr1x99     2                   4

 

selectNodes()

All times in ms

1000 Items in the document
Expression        dom4j - elements    dom4j - attrs
/*/*/Attr1x1      16.6                36.6
/*/*/Attr1x500    20                  36.6
/*/*/Attr1x999    20.0                36.6
/*/Item           2.0                 1.7

100 Items in the document
Expression        dom4j - elements    dom4j - attrs
/*/*/Attr1x1      1.0                 3.0
/*/*/Attr1x50     2.0                 4.1
/*/*/Attr1x99     1.0                 4
/*/Item           0.2                 0.2

 

selectNodes()

All times in ms, expression /*/*/Attr1x500

Items    dom4j - elements    dom4j - attrs
1000     20.0                36.6
100      2.0                 4.1
10       0.1                 0.4
1        0.1                 0.1

 

c) We also ran a simple XSLT test that took the first XML format (elements) and transformed it to the second format (attributes), and conversely.

All times in ms

Items    dom4j el -> attr    dom4j attr -> el    Xalan el -> attr
10000    12558               10044               12338
1000     1181                874                 1913
100      98                  83                  123
10       12                  11                  20
1        3                   4                   10
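For readers who want a feel for the elements-to-attributes transformation, here is a minimal sketch. The stylesheet embedded in it is a hypothetical reconstruction, not the authors' item.xslt: it assumes that the attribute name is the part of the element name before the "x", and that guid is the zero-based position of the item. It runs on the XSLT processor bundled with the JDK through `javax.xml.transform`, so it does not reproduce the dom4j or Xalan setups that were measured.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ElementsToAttributes {
    // Hypothetical stylesheet: <AttrNxM>v</AttrNxM> children become AttrN="v" attributes.
    static final String XSLT =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
          + "<xsl:template match='/ItemResultSet'>"
          + "<ItemResultSet><xsl:apply-templates select='Item'/></ItemResultSet>"
          + "</xsl:template>"
          + "<xsl:template match='Item'>"
          + "<Item guid='{position() - 1}'>"
          + "<xsl:for-each select='*'>"
          + "<xsl:attribute name='{substring-before(name(), \"x\")}'>"
          + "<xsl:value-of select='.'/>"
          + "</xsl:attribute>"
          + "</xsl:for-each>"
          + "</Item>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String input = "<ItemResultSet>"
                + "<Item><Attr0x0>123456789</Attr0x0><Attr1x0>123456789</Attr1x0></Item>"
                + "<Item><Attr0x1>123456789</Attr0x1><Attr1x1>123456789</Attr1x1></Item>"
                + "</ItemResultSet>";
        System.out.println(transform(input));
    }
}
```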

Conclusion

These numbers suggest that one should use the XPathAPI class of Xalan with great caution, if at all.

The syntax of XPath statements must be chosen carefully. Contrary to some beliefs, and given the topology of our XML format, using /*/* or // was more efficient than the absolute path /ItemResultSet/Item.

It appears more efficient to use selectNodes() with dom4j even if one needs a single node.

With dom4j, running XPath against a document that contains elements is about twice as fast as running it against one that contains attributes.

In our case, we found that dom4j is faster than Xalan for XSLT transformations. We do not claim this is a general result, but rather a data point.
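The point about XPath syntax can be checked on any engine: for this document shape the wildcard form, the descendant form and the absolute path all select the same node, so only their cost differs. The sketch below is an illustration on the JDK's `javax.xml.xpath` engine, not on dom4j/Jaxen or Xalan, so its timings say nothing about the figures reported here. The class name `PathForms` and the choice of two values per item are assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PathForms {
    // Elements-only test document; two values per item is an arbitrary choice.
    static Document build(int items) {
        StringBuilder xml = new StringBuilder("<ItemResultSet>");
        for (int row = 0; row < items; row++) {
            xml.append("<Item>");
            for (int col = 0; col < 2; col++) {
                String name = "Attr" + col + "x" + row;
                xml.append('<').append(name).append(">123456789</").append(name).append('>');
            }
            xml.append("</Item>");
        }
        xml.append("</ItemResultSet>");
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml.toString())));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Number of nodes selected by the expression.
    static int count(Document doc, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = build(1000);
        String target = "Attr1x500";
        String[] expressions = {"/*/*/" + target, "//" + target, "/ItemResultSet/Item/" + target};
        for (String expression : expressions) {
            long start = System.nanoTime();
            int matches = count(doc, expression);
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("%-32s %d match(es), %.2f ms%n", expression, matches, ms);
        }
    }
}
```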

Resources

Here is the source code and data for these tests. Try them for yourself.

PerfDOM4J.java
PerfDOM4JAttr.java
PerfW3C.java
item.xslt
w3c_100.xml