Search Apache POI

Apache POI - Component Overview

Apache POI Project Components

POIFS for OLE 2 Documents

POIFS is the oldest and most stable part of the project. It is our port of the OLE 2 Compound Document Format to pure Java. It supports both read and write functionality. All of our components ultimately rely on it by definition. Please see the POIFS project page for more information.

HSSF and XSSF for Excel Documents

HSSF is our port of the Microsoft Excel 97(-2007) file format (BIFF8) to pure Java. XSSF is our port of the Microsoft Excel XML (2007+) file format (OOXML) to pure Java. SS is a package that provides common support for both formats with a common API. They both support read and write capability. Please see the HSSF+XSSF project page for more information.

HWPF and XWPF for Word Documents

HWPF is our port of the Microsoft Word 97 (-2003) file format to pure Java. It supports read, and limited write capabilities. It also provides simple text extraction support for the older Word 6 and Word 95 formats. Please see the HWPF project page for more information. This component remains in early stages of development. It can already read and write simple files.

We are also working on the XWPF for the WordprocessingML (2007+) format from the OOXML specification. This provides read and write support for simpler files, along with text extraction capabilities.

HSLF and XSLF for PowerPoint Documents

HSLF is our port of the Microsoft PowerPoint 97(-2003) file format to pure Java. It supports read and write capabilities. Please see the HSLF project page for more information.

We are also working on the XSLF for the PresentationML (2007+) format from the OOXML specification.

HPSF for OLE 2 Document Properties

HPSF is our port of the OLE 2 property set format to pure Java. Property sets are mostly use to store a document's properties (title, author, date of last modification etc.), but they can be used for application-specific purposes as well.

HPSF supports both reading and writing of properties.

Please see the HPSF project page for more information.

HDGF for Visio Documents

HDGF is our port of the Microsoft Viso 97(-2003) file format to pure Java. It currently only supports reading at a very low level, and simple text extraction. Please see the HDGF project page for more information.

HPBF for Publisher Documents

HPBF is our port of the Microsoft Publisher 98(-2007) file format to pure Java. It currently only supports reading at a low level for around half of the file parts, and simple text extraction. Please see the HPBF project page for more information.

HMEF for TNEF (winmail.dat) Outlook Attachments

HMEF is our port of the Microsoft TNEF (Transport Neutral Encoding Format) file format to pure Java. TNEF is sometimes used by Outlook for encoding the message, and will typically come through as winmail.dat. HMEF currently only supports reading at a low level, but we hope to add text and attachment extraction. Please see the HMEF project page for more information.

HSMF for Outlook Messages

HSMF is our port of the Microsoft Outlook message file format to pure Java. It currently only some of the textual content of MSG files, and some attachments. Further support and documentation is coming in slowly. For now, users are advised to consult the unit tests for example use. Please see the HSMF project page for more information.

Microsoft has recently added the Outlook file format to its OSP. More information is now available making implementing this API an easier task.

What is it?

The Apache POI project is the master project for developing pure Java ports of file formats based on Microsoft's OLE 2 Compound Document Format. OLE 2 Compound Document Format is used by Microsoft Office Documents, as well as by programs using MFC property sets to serialize their document objects.

Apache POI is also the master project for developing pure Java ports of file formats based on Office Open XML (ooxml.) OOXML is part of an ECMA / ISO standardisation effort. This documentation is quite large, but you can normally find the bit you need without too much effort! ECMA-376 standard is here, and is also under the Microsoft OSP.

Component Map

The Apache POI distribution consists of support for many document file formats. This support is provided in several Jar files. Not all of the Jars are needed for every format. The following tables show the relationships between POI components, Maven repository tags, and the project's Jar files.

Component Application type Maven artifactId Notes
POIFS OLE2 Filesystem poi Required to work with OLE2 / POIFS based files
HPSF OLE2 Property Sets poi  
HSSF Excel XLS poi For HSSF only, if common SS is needed see below
HSLF PowerPoint PPT poi-scratchpad  
HWPF Word DOC poi-scratchpad  
HDGF Visio VSD poi-scratchpad  
HPBF Publisher PUB poi-scratchpad  
HSMF Outlook MSG poi-scratchpad  
OpenXML4J OOXML poi-ooxml plus either poi-ooxml-schemas or
ooxml-schemas and ooxml-security
see below for differences
XSSF Excel XLSX poi-ooxml  
XSLF PowerPoint PPTX poi-ooxml  
XWPF Word DOCX poi-ooxml  
Common SS Excel XLS and XLSX poi-ooxml WorkbookFactory and friends all require poi-ooxml, not just core poi


This table maps artifacts into the jar file name. "version-yyyymmdd" is the POI version stamp. For the latest stable release it is 3.12-20150511. For the latest beta release it is 3.12-beta1-20150228.

Maven artifactId Prerequisites JAR
poi commons-logging, commons-codec, log4j poi-version-yyyymmdd.jar
poi-scratchpad poi poi-scratchpad-version-yyyymmdd.jar
poi-ooxml poi, poi-ooxml-schemas poi-ooxml-version-yyyymmdd.jar
poi-ooxml-schemas xmlbeans poi-ooxml-schemas-version-yyyymmdd.jar
poi-examples poi, poi-scratchpad, poi-ooxml poi-examples-version-yyyymmdd.jar
ooxml-schemas xmlbeans ooxml-schemas-1.1.jar
ooxml-security xmlbeans
For signing: bcpkix-jdk15on, bcprov-jdk15on, xmlsec, slf4j-api
ooxml-security-1.0.jar

 

poi-ooxml requires poi-ooxml-schemas. This is a substantially smaller version of the ooxml-schemas jar (ooxml-schemas-1.1.jar for POI 3.7 or later, ooxml-scheams-1.0.jar for POI 3.5 and 3.6). The larger ooxml-schemas jar is normally only required for development. Similar to the ooxml-schemas jar, the ooxml-security jar contains all generated classes for the encryption and signing schemas and is subsetted in poi-ooxml-schemas.

The OOXML jars require a stax implementation, but now that Apache POI requires Java 6, that dependency is provided by the JRE and no additional stax jars are required. The OOXML jars used to require DOM4J, but the code has now been changed to use JAXP and no additional dom4j jars are required. By the way, look at this FAQ if you have problems when using a non-Oracle JDK.

The ooxml schemas jars are compiled with Apache XMLBeans 2.3, and so can be used at runtime with any version of XMLBeans from 2.3 or newer. Wherever possible though, we recommend that you use XMLBeans 2.6.0 with Apache POI, and that is the version now shipped in the binary release packages.

Examples

Small sample programs using the POI API are available in the src/examples directory of the source distribution.

POI Browser

The POI Browser is a very simple Swing GUI tool that displays the internal structure of a Microsoft Office file and especially the property set streams. Further information and instructions how to execute it can be found in the POI source code.

All of the examples are inclided in POI distributions as a poi-examples artifact.

Contributed Software

Besides the "official" components outlined above there is some further software distributed with POI. This is called "contributed" software. It is not explicitly recommended or even maintained by the POI team, but it might still be useful to you.

See POI Ruby Bindings and other code in the poi-contrib module

by Andrew C. Oliver, Rainer Klute, David Fisher