Releasing version 5.4.0 of iText, XML Worker and RUPS

Thirteen years ago, on February 14th of the magical year 2000, I published version 0.30 of a library I had been writing in my spare time. This library allowed developers to enhance their applications with simple PDF generation functionality without having to know anything about PDF syntax.
Being a fan of Donald Knuth, I was looking for a name that sounded like TeX, but that was different enough for people not to confuse it with TeX. As the first versions of my library could only process text (images weren't supported until the summer of 2000), I experimented with variations on the words TeX and TeXt. At that time, everything was "e-": e-mail, e-marketing, e-Business, and so on. My first idea was to call my library "eTeXt", but I didn't like the sound of that word, so I changed it to iText.
People often ask me whether I was inspired by Apple's product line, but I've never been a Mac user, so I didn't think of the iMac (1998), and the other devices that made the "i-" prefix popular came much later: the iPod (2001), the iPhone (2007) and the iPad (2010).

Tomorrow, exactly 13 years after the first use of the name iText and the first official iText release, I'm releasing iText 5.4.0.

What's new in this release?

We've made a great effort to support PDF/UA. UA stands for Universal Accessibility: PDF/UA is about making your documents accessible to blind and visually impaired users. There's still some work to do, but with tomorrow's release, you'll be able to create a PDF/UA-compliant file "out of the box" by following these instructions:

One of the main requirements of PDF/UA is that the PDF must be Tagged. You achieve this by calling the PdfWriter.setTagged() method before opening the Document. This sets a flag that instructs iText to preserve the order of the content of all the high-level objects that are added to the Document. At the same time, iText creates an appropriate structure tree, in which the type of each high-level object (Paragraph, PdfPTable, List,...) determines the "role" of the corresponding structure element. You can programmatically change this default role in every implementation of the IAccessibleElement interface by using the setRole() method. As long as you stick to using high-level objects, your document will be tagged correctly.
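
A minimal sketch of tagging with iText 5, assuming the 5.4.0 jar is on the classpath and that Paragraph implements IAccessibleElement (as the paragraph above implies). The class and method names TaggedExample and createTagged are made up for illustration; the role is created with new PdfName("H1") rather than relying on a predefined constant.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.ByteArrayOutputStream;

// Hypothetical helper class: builds a small Tagged PDF in memory.
class TaggedExample {
    static byte[] createTagged() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, baos);
        // Set the tagged flag BEFORE opening the Document, so iText
        // keeps the reading order and builds a structure tree.
        writer.setTagged();
        document.open();
        // Override the role iText would derive from the object type:
        // this Paragraph becomes a level-1 heading.
        Paragraph heading = new Paragraph("What's new in iText 5.4.0");
        heading.setRole(new PdfName("H1"));
        document.add(heading);
        // This Paragraph keeps the default role for its type.
        document.add(new Paragraph("Support for Tagged PDF and PDF/UA."));
        document.close();
        return baos.toByteArray();
    }
}
```

Writing to a ByteArrayOutputStream keeps the sketch free of file paths; in a real application you would typically pass a FileOutputStream instead.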

For the PDF to be compliant with ISO 14289 (the PDF/UA standard), you also need to use the following methods:

  • Document.addTitle(): gives the document a title,
  • PdfWriter.setViewerPreferences(PdfWriter.DisplayDocTitle): makes sure the viewer shows the document title rather than the file name,
  • Document.addLanguage(): indicates which language is used in the document,
  • PdfWriter.createXmpMetadata(): creates an XMP stream and adds it to the document as document-level metadata.
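
Put together, the steps above might look like the following sketch. It assumes iText 5.4.0 on the classpath; PdfUaExample, createPdfUa, the title string and the "en-US" language tag are illustrative choices, not part of the release. One caveat: the sketch uses the default Helvetica font, which iText does not embed, and PDF/UA requires embedded fonts, so a real document would most likely also need an embedded font.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.ByteArrayOutputStream;

// Hypothetical helper class: the "minimum" PDF/UA settings, in order.
class PdfUaExample {
    static byte[] createPdfUa() throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, baos);
        // 1. Tagged PDF (structure tree + reading order).
        writer.setTagged();
        // 2. Show the document title in the viewer's title bar.
        writer.setViewerPreferences(PdfWriter.DisplayDocTitle);
        // 3. Natural language of the document (written as /Lang).
        document.addLanguage("en-US");
        // 4. Document title.
        document.addTitle("iText 5.4.0 release notes");
        // 5. Add the metadata as an XMP stream.
        writer.createXmpMetadata();
        // All of the above happens before the Document is opened.
        document.open();
        document.add(new Paragraph("Hello PDF/UA"));
        document.close();
        return baos.toByteArray();
    }
}
```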

Now when you run the resulting PDF through an accessibility checker, you'll see that it conforms to the PDF/UA standard.

This is what we call a "minimum implementation" of PDF/UA. In the future, we'll try to improve the API. For instance, in PDF 2.0, metadata will no longer be stored in an Info dictionary: that concept will be deprecated in favor of an XMP stream. So it probably makes sense for us to always create an XMP stream instead of keeping the createXmpMetadata() call optional. We may also choose to set the DisplayDocTitle viewer preference by default for Tagged PDFs, and to throw an exception if you create a Tagged PDF without setting a title or defining a language.
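
To make the last idea concrete, here is a hypothetical sketch of such a check. Nothing like this exists in iText 5.4.0; TaggedMetadataCheck and its methods are invented names that only illustrate "fail fast if a Tagged PDF lacks a title or a language".

```java
// Hypothetical: NOT part of iText. Records the metadata a Tagged PDF
// needs and refuses to proceed when something is missing.
class TaggedMetadataCheck {
    private final boolean tagged;
    private String title;
    private String language;

    TaggedMetadataCheck(boolean tagged) {
        this.tagged = tagged;
    }

    void setTitle(String title) {
        this.title = title;
    }

    void setLanguage(String language) {
        this.language = language;
    }

    // Would run when the document is opened; untagged documents are not checked.
    void validate() {
        if (!tagged) {
            return;
        }
        if (title == null || title.isEmpty()) {
            throw new IllegalStateException("A Tagged PDF needs a title");
        }
        if (language == null || language.isEmpty()) {
            throw new IllegalStateException("A Tagged PDF needs a language");
        }
    }
}
```

Throwing at open time, rather than when the file is closed, would surface the mistake before any content is written.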

Other PDF/UA functionality includes the manipulation of existing documents: we can already split and merge Tagged PDF documents without breaking the structure. In the near future, we'll take a closer look at filling out PDF/UA forms while preserving their PDF/UA status. We've also been improving iText's text extraction capabilities.

Whereas the 5.3.x releases mainly brought new digital signature functionality, the 5.4.x releases will bring more functionality for structured PDF. This includes Level A support for PDF/A. Better support for structured and unstructured documents is one of the main goals on our technical roadmap for 2013.

This doesn't mean we've stopped working on digital signatures, which were one of our major goals for 2012 (resulting in a 150-page book about PDF and digital signatures). In iText 5.4.0, we've switched from BouncyCastle 1.47 to BouncyCastle 1.48, we've fixed a problem with the creation of OCSP responses for use in a Document Security Store, we've added a missing OID for an RSA algorithm, and so on.

Important: Upgrading from iText 5.3.5 to iText 5.4.0 is a must, because the major I/O changes that introduced a new package caused a problem in multi-threaded applications: embedding fonts no longer worked correctly when using multiple threads. We've fixed this problem.

Surprisingly, we've also discovered some bugs that have been present in iText for a long time, but that surfaced recently:

  • After an AcroForm template was prefilled, Adobe Reader would ask end users whether they wanted to save the document, even if they hadn't changed anything.
  • Pre-filled fields were disappearing when filling out an AcroForm template.
  • Changing the font in an AcroForm template didn't work in case the optional /DR entry was missing in the form.
  • PdfSmartCopy caused an OutOfMemoryError when it encountered a PDF with circular references (an object A referring to an object B that refers back to object A).
  • A Chunk object could change the size of an Image, which caused strange side effects when the same Image was used in a different context.
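
Why can a circular reference exhaust memory? Code that follows references from one object to the next, without remembering where it has been, walks A, B, A, B, ... forever. The usual remedy is a "visited" set, sketched below in plain Java. This is a generic illustration, not PdfSmartCopy's actual code; RefNode is a hypothetical stand-in for a PDF object that refers to other objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a PDF object with references to other objects.
class RefNode {
    final String name;
    final List<RefNode> refs = new ArrayList<>();

    RefNode(String name) {
        this.name = name;
    }
}

class CycleSafeWalker {
    // Visits every object reachable from root exactly once and returns the count.
    // The identity-based visited set is what makes cycles such as A -> B -> A terminate.
    static int countReachable(RefNode root) {
        Set<RefNode> visited =
                Collections.newSetFromMap(new IdentityHashMap<RefNode, Boolean>());
        Deque<RefNode> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            RefNode current = pending.pop();
            if (!visited.add(current)) {
                continue; // already seen: following it again would loop forever
            }
            for (RefNode ref : current.refs) {
                pending.push(ref);
            }
        }
        return visited.size();
    }
}
```

For A referring to B and B referring back to A, countReachable(a) returns 2 instead of looping.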

The fact that these old bugs (some of which date from almost 10 years ago) are only surfacing now probably means that more and more people are using iText in ways I never imagined. Back in 2000, I had no idea I would be announcing exciting new iText functionality on the library's 13th birthday!

Looking forward: what will the next release bring us? Apart from further PDF/UA improvements, we're also working on better support for ligatures. This would finally allow us to create documents in Hindi and other Indic languages. Rest assured: we won't wait until iText's 14th birthday to release this interesting functionality!
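
A small illustration of why Indic scripts need ligature support: in Devanagari, several Unicode code points are often displayed as a single conjunct glyph, so mapping one character to one glyph is not enough. The plain-Java sketch below (the class name DevanagariConjunct is hypothetical) lists the code points behind the conjunct "क्ष" (ksha).

```java
import java.util.ArrayList;
import java.util.List;

class DevanagariConjunct {
    // Returns the Unicode name of every code point in the string.
    static List<String> codePointNames(String s) {
        List<String> names = new ArrayList<>();
        s.codePoints().forEach(cp -> names.add(Character.getName(cp)));
        return names;
    }

    public static void main(String[] args) {
        // "क्ष" (ksha): three code points, normally rendered as one conjunct glyph.
        String ksha = "\u0915\u094D\u0937";
        // prints [DEVANAGARI LETTER KA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER SSA]
        System.out.println(codePointNames(ksha));
    }
}
```

A renderer that simply draws one glyph per code point would show KA, a visible virama and SSA side by side; the font's substitution (ligature) rules are what turn them into the single conjunct.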


See also Bruno's profile on Google+