Archiving your mailbox

Have you ever thought it would be a good idea to archive your mailbox in the form of a PDF file? I have. There are always plenty of mails I want to save for future reference, but if I leave them on the server, I risk ‘mailbox quota exceeded’ problems; if I download them to a local folder using a mail client, I need a user-friendly way to browse through these mails when I change to a new PC, another operating system, or another mail client.
Wouldn’t it be great to have a PDF file containing the mails I want to archive? Of course, it wouldn’t be much help if I put all these mails in a traditional PDF file, with the mails one after the other, sorted by Date: (maybe with the most recent one first) or alphabetically by From:. We’d need a PDF in which the mails can be sorted, searched, and printed more or less the way you do it in a mail client.
Well, this is possible if you use portable collections.

Your mailbox as a portable collection

Before you look at an example, I have an important remark: the mailbox I use in this example is not your average mailbox. It has some plain text mails, some mails with attachments, and some mails in HTML. All these mails can be converted to PDF using iText. However, if you take your own mailbox, you will find out that iText has problems with some specific mails. The first issue that comes to mind is mails using HTML tags or attributes that aren’t supported in iText. These could cause a RuntimeException to be thrown. I hope people will find this example interesting enough to help us with the further development of classes such as HTMLWorker, so that we can build a full-blown mailbox parser.
With this in mind, let’s have a look at the mailbox archive I made from one of my mailboxes. Each mail is added to this collection as an embedded file. The overview lists the Date:, From:, To:, and Subject: taken from the mail header.
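To give an idea of where these overview fields come from, here is a minimal sketch in plain Java (no iText involved) that pulls the header values out of a raw mail. The class MailHeaderSketch and its method parseHeaders are hypothetical names used for illustration; this is not the code of the example's MailParser class, and it assumes a simple header block that ends at the first empty line.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of com.lowagie.collections.mail.
final class MailHeaderSketch {

    // Parses the header block of a raw mail (everything before the first
    // empty line) into a map keyed by lower-cased header name.
    static Map<String, String> parseHeaders(String rawMail) {
        Map<String, String> headers = new LinkedHashMap<>();
        String current = null; // name of the header a folded line belongs to
        for (String line : rawMail.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // an empty line separates the header from the body
            }
            char first = line.charAt(0);
            if (first == ' ' || first == '\t') {
                // Folded header: append the continuation to the previous value.
                if (current != null) {
                    headers.put(current, headers.get(current) + " " + line.trim());
                }
                continue;
            }
            int colon = line.indexOf(':');
            String name = colon > 0 ? line.substring(0, colon) : "";
            if (name.isEmpty() || name.indexOf(' ') >= 0 || name.indexOf('\t') >= 0) {
                current = null; // not a header line (e.g. an mbox "From " separator)
                continue;
            }
            name = name.toLowerCase(Locale.ROOT);
            if (headers.containsKey(name)) {
                current = null; // keep the first occurrence of a repeated header
                continue;
            }
            headers.put(name, line.substring(colon + 1).trim());
            current = name;
        }
        return headers;
    }

    public static void main(String[] args) {
        String mail = "Date: Wed, 11 Apr 2007 14:12:00 +0200\n"
                + "From: alice@example.com\n"
                + "To: bob@example.com\n"
                + "Subject: A long subject line\n"
                + " that continues on the next line\n"
                + "\n"
                + "Body text\n";
        Map<String, String> headers = parseHeaders(mail);
        for (String key : new String[] {"date", "from", "to", "subject"}) {
            System.out.println(key + " = " + headers.get(key));
        }
    }
}
```

The four values looked up in main are the ones that would end up as collection items in the overview. Encoded header words (such as =?UTF-8?...?= subjects) are not decoded in this sketch.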

Cover Sheet

The cover sheet of this collection can be a single page with some metadata about the archive (the name of the mailbox, the date, the theme of the mails, ...). In this example, you chose to make a PDF document with one table per mail. This table lists the Date:, From:, To:, and Subject: headers.
[Figure: cover sheet]
The downside of such a cover page is that you can’t change the order in which the mails are listed. This makes it difficult to search for a specific mail. That’s why it’s handy to have the most important metadata listed in the overview. In this example, the mails are sorted by Date:, but with a single click in the collection items pane, you can sort them alphabetically by Subject:, or by any other of the collection items in the overview.
[Figure: collection items pane]
There are two ways to access an individual mail:

  1. by clicking on one of the ‘Read this mail’ links on the cover page. To construct this link, you used a GoToE action.
  2. by clicking on a row in the collection overview pane.

In this example, different types of mail messages were added: plain text mails, HTML text mails, and mails with multipart data. Let’s have a look at the differences.

Different Content Types

The simplest type of mail is the plain text mail (MIME type text/plain). The plain text is rendered to PDF in a very straightforward way.
[Figure: plain text mail rendered as PDF]
Note that each PDF starts with a table with some reduced metadata. Next to this metadata, you added a little pushpin. By clicking on this pushpin, the end user can open a text file (for instance in Notepad) with the complete mail header:
[Figure: complete mail header]
Another fairly simple MIME type is text/html:
[Figure: HTML mail]
Unfortunately, iText only supports a limited set of HTML tags, so converting an HTML mail to PDF won’t work for most of these mails.
Finally, you added a mail with multipart data:
[Figure: multipart mail]
The mail attachments are added as file attachments to the PDF. You can open them by clicking on the pushpin. This is the text/plain attachment of the PDF representing the multipart mail:
[Figure: text/plain attachment]
I won’t discuss this example in as much detail as I did with the Kubrick Collection; I’ll simply give you the source code and the resource used to produce mailbox.pdf.
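The three cases above all hinge on the Content-Type header of the mail. As a rough sketch of how a parser could tell them apart, the following plain Java class extracts the MIME type and, for multipart mails, the boundary parameter. ContentTypeSketch and its methods are hypothetical names used for illustration, not the example's actual code, and the parameter parsing is deliberately naive (it assumes the boundary value contains no semicolon).

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of com.lowagie.collections.mail.
final class ContentTypeSketch {

    // Returns the lower-cased MIME type, e.g. "text/plain".
    // A missing header is treated as text/plain, the default for mail.
    static String mimeType(String contentType) {
        if (contentType == null || contentType.trim().isEmpty()) {
            return "text/plain";
        }
        int semicolon = contentType.indexOf(';');
        String type = semicolon < 0 ? contentType : contentType.substring(0, semicolon);
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the value of the boundary parameter, or null if there is none.
    static String boundary(String contentType) {
        if (contentType == null) {
            return null;
        }
        String key = "boundary=";
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.regionMatches(true, 0, key, 0, key.length())) {
                String value = p.substring(key.length()).trim();
                // Strip the optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    // Describes what a converter would have to do for each kind of mail.
    static String describe(String contentType) {
        String type = mimeType(contentType);
        if (type.equals("text/plain")) {
            return "render the text as paragraphs";
        }
        if (type.equals("text/html")) {
            return "convert the HTML (only a limited set of tags is supported)";
        }
        if (type.startsWith("multipart/")) {
            return "split the body on boundary " + boundary(contentType);
        }
        return "unsupported content type " + type;
    }

    public static void main(String[] args) {
        System.out.println(describe("text/plain; charset=us-ascii"));
        System.out.println(describe("Text/HTML; charset=UTF-8"));
        System.out.println(describe("multipart/mixed; boundary=\"abc123\""));
    }
}
```

In a multipart mail, each part between two boundary lines has its own small header block with its own Content-Type, so the same dispatch can be applied to every part.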

Parsing your mailbox

If you want to reproduce this example, you’ll need a mailbox file, more specifically a mailbox file that doesn’t have any ‘special’ mails that can’t be parsed using my very simple mailbox parser. This is the mailbox I used: box.
The source code can be found in the package com.lowagie.collections.mail.
MailboxParser is the main class. In method createMailCollection, you recognize a lot of the functionality that was discussed in the Kubrick example: PdfCollection, PdfCollectionSchema, PdfCollectionField, PdfCollectionItem, and so forth. Instead of reading data from a database file, we read the mailbox file. The actual parsing is done by class MailParser. Every time MailboxParser encounters a new mail, a new instance of MailParser is created. The code in class MailParser is very specific, and it goes beyond the topic of this Wiki page about portable collections. If you’re a mail specialist, you are always welcome to improve this code.
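The "one parser instance per mail" idea can be sketched as follows. I'm assuming here that the mailbox file is in the common mbox format, where every mail is preceded by a separator line starting with "From " (with a trailing space); the page doesn't state the format, so treat that as an assumption. MboxSplitSketch and splitMessages are hypothetical names, not the actual MailboxParser code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of com.lowagie.collections.mail.
final class MboxSplitSketch {

    // Splits the lines of an mbox-style file into raw mails, one string per mail.
    // Assumption: every mail starts with a separator line beginning with "From ".
    // Real mbox files escape body lines that start with "From " as ">From ".
    static List<String> splitMessages(List<String> lines) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (line.startsWith("From ")) {
                // Separator: close the previous mail and start a new one.
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
                continue; // the separator itself is not part of the mail
            }
            if (current != null) {
                current.append(line).append('\n');
            }
            // Lines before the first separator are ignored.
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "From alice@example.com Wed Apr 11 14:12:00 2007",
                "Subject: first mail",
                "",
                "Hello",
                "From bob@example.com Wed Apr 11 14:13:00 2007",
                "Subject: second mail",
                "",
                "Bye");
        List<String> mails = splitMessages(lines);
        System.out.println(mails.size() + " mails found");
    }
}
```

For a real file, the lines could come from java.nio.file.Files.readAllLines; each returned string would then be handed to a fresh parser object that turns it into one embedded PDF.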
Observe that there’s also the class Base64. It was written by Robert Harder, who placed it in the public domain. This code is used to decode Base64-encoded mail attachments.
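On a recent JDK (Java 8 or later), the same decoding step can also be done with the standard java.util.Base64 class. The sketch below is an alternative to the bundled Base64 class, not a description of it; AttachmentDecodeSketch and decodeAttachment are hypothetical names used for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper for illustration; not part of com.lowagie.collections.mail.
final class AttachmentDecodeSketch {

    // Decodes the Base64 body of a mail part. The MIME decoder is used because
    // mail bodies are wrapped over several lines; it skips the line breaks.
    static byte[] decodeAttachment(String encodedBody) {
        return Base64.getMimeDecoder().decode(encodedBody);
    }

    public static void main(String[] args) {
        // "Hello, world!" encoded and wrapped over three lines.
        String body = "SGVsbG8s\r\nIHdvcmxk\r\nIQ==";
        byte[] bytes = decodeAttachment(body);
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

The resulting byte array is what would be attached to the PDF as a file attachment.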

Conclusion

This example demonstrates an interesting and highly useful application of portable collections. I really would like to have a tool that is able to parse my mailbox into a portable collection. Unfortunately, this example is only a first hint at a full-blown mail parser. There’s still a lot of work to be done if you want to use this in a production environment. I hope it inspires some developers to enhance this example and integrate it into online mail services and other mail applications.

 
collections/mail.txt · Last modified: 2007/04/11 14:12 by root