In addition to the usual bug fixes, the new version of the library contains a large number of changes related to parsing and processing MS Word files.
A number of bugs (47287, 47563, 47731, 49933, 51604) were closed, and several features were added or improved:
- added support for reading notes (footnotes and endnotes);
- added support for reading internal links (bookmarks);
- added support for images saved as OfficeDrawing (vector images);
- fixed processing of nested tables;
- extended support for character and paragraph properties.
These smaller changes were groundwork for the main new functionality: three converter classes which, judging by the mailing list, have already proved useful to some library users:
- Word-to-HTML Converter, which converts a Word document to HTML, optionally with pictures;
- Word-to-Text Converter, a replacement for the older WordExtractor that correctly handles embedded OLE documents, paragraph breaks, and field codes (including hyperlinks);
- Word-to-FO Converter, which converts a Word document to an XSL-FO file, also optionally with pictures. This file can then be passed to Apache FOP to produce a PDF from the Word document.
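
As a rough illustration of how the Word-to-HTML Converter might be called, here is a minimal sketch. It assumes the POI scratchpad classes `HWPFDocument` and `WordToHtmlConverter` are on the classpath. The class name `WordToHtmlSketch` and its `convert` and `serialize` helpers are made up for this example and are not part of the library. The converter fills a W3C DOM document, so the sketch serializes that DOM with the standard JAXP `Transformer`. Picture handling is left out.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.w3c.dom.Document;

// Hypothetical helper class for illustration; not part of Apache POI.
public class WordToHtmlSketch {

    /** Reads a binary .doc stream and returns its HTML rendering as a string. */
    public static String convert(InputStream docStream) throws Exception {
        HWPFDocument word = new HWPFDocument(docStream);

        // The converter writes its output into an empty DOM document.
        Document target = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        WordToHtmlConverter converter = new WordToHtmlConverter(target);
        converter.processDocument(word);

        return serialize(converter.getDocument());
    }

    /** Serializes a DOM tree to a string using the JAXP "html" output method. */
    public static String serialize(Document dom) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.METHOD, "html");
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(dom), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: WordToHtmlSketch <file.doc>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(convert(in));
        }
    }
}
```

The Word-to-FO Converter is used in much the same way, with an XSL-FO DOM as the result, so the same serialization step should apply there with the `xml` output method.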
As a reminder, all of this is done in pure Java, without additional packages such as OpenOffice / LibreOffice, JOD Converter, or the like.
Where conversion to HTML with JOD Converter takes 2-3 seconds, the new converters can do it in tens of milliseconds.
The new version also includes an Excel-to-HTML Converter, and the upcoming beta5 will add an Excel-to-FO Converter. If you have ideas or additions (patches), or you just want to share how your company uses this library, go to the home page:
http://poi.apache.org/ . You can also download the latest version there.