Parsing email in Java

For my latest project, about 80% of which is written in Java, I had to add a module that parses every message passing through the mail server. The reasons for needing such a module are rather odd, but I would like to share some of the details.

Given:

A Postfix mail server with Dovecot as the delivery agent, running on CentOS. And, of course, a JVM.

Message structure

What an email is, its constituent parts, their approximate structure, headers and MIME types are all described in readable form on Wikipedia.
More interesting is the structure of a message's file name on the server. An example of the name of a new message (not yet read or requested by a client):

1348142977.M852516P31269.mail.example.com,S=3309,W=3371

The name consists of comma-separated fields. When a new message is created, the name records where and when the message arrived, and its size.

Two message sizes are given: the usual size, denoted by "S", and the virtual size (Vsize), denoted by "W", which is RFC822.SIZE (the question "What is RFC822.SIZE?" has been answered elsewhere).
The time is in Unix format, in seconds.
The field that follows the time after a dot can contain "P", the process ID, and "M", a microsecond counter added to make the name unique (other attributes are possible; more in the notes).
The host name is that of the final server on which the message is stored, not of a relay server in case the message was forwarded.
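
The layout described above can be sketched in code. This is only a minimal illustration and not part of the original module: the class MaildirName and its field names are hypothetical, and it assumes the name always has the form time.unique.host followed by comma-separated key=value attributes, as in the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: splits a message file name into its parts. */
final class MaildirName {
    final long deliveredAtSeconds;        // Unix time, the first dot-separated field
    final String uniquePart;              // e.g. "M852516P31269"
    final String host;                    // server that stores the message
    final Map<String, String> attributes; // e.g. S=3309, W=3371

    private MaildirName(long deliveredAtSeconds, String uniquePart,
                        String host, Map<String, String> attributes) {
        this.deliveredAtSeconds = deliveredAtSeconds;
        this.uniquePart = uniquePart;
        this.host = host;
        this.attributes = attributes;
    }

    static MaildirName parse(String fileName) {
        // Client flags, if present, start at ':' and are ignored here
        int colon = fileName.indexOf(':');
        String base = colon >= 0 ? fileName.substring(0, colon) : fileName;

        // First field is "time.unique.host", the rest are key=value attributes
        String[] fields = base.split(",");
        // The host contains dots itself, so split into at most three pieces
        String[] head = fields[0].split("\\.", 3);
        if (head.length < 3) {
            throw new IllegalArgumentException("Unexpected name: " + fileName);
        }

        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 1; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq > 0) {
                attrs.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
            }
        }
        return new MaildirName(Long.parseLong(head[0]), head[1], head[2], attrs);
    }
}
```

For the example name above, MaildirName.parse(...) gives deliveredAtSeconds = 1348142977, host = mail.example.com and the attributes S=3309 and W=3371.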

Of all this, only the creation time (the first ten digits) was useful to me. However, this time often differs from the time in the message header, so I used the time in the name only to filter messages in the directory.
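
Filtering by the time in the name could look roughly like the sketch below. It is an assumption about how such a filter might be written, not the code of the original module: the class MaildirTimeFilter and its methods are made up for illustration, and names without a leading numeric field are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: selects message files by the time in their name. */
final class MaildirTimeFilter {

    /** Returns the leading Unix timestamp of a file name, or -1 if there is none. */
    static long timestampOf(String fileName) {
        int dot = fileName.indexOf('.');
        if (dot <= 0) {
            return -1;
        }
        try {
            return Long.parseLong(fileName.substring(0, dot));
        } catch (NumberFormatException e) {
            return -1; // not a message file name
        }
    }

    /** Keeps only the names whose timestamp is at or after sinceSeconds. */
    static List<String> deliveredSince(List<String> fileNames, long sinceSeconds) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            if (timestampOf(name) >= sinceSeconds) {
                result.add(name);
            }
        }
        return result;
    }
}
```

The list of names can come, for example, from java.io.File.list() on the mailbox directory. Since the time in the name may differ from the time in the header, this is only a coarse pre-filter.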

Additional / client flags

The client mail interface (hereinafter, the client) can add flags to the message name. The start of the client flags is marked by the ":" character.

As soon as the client requests new messages from the server, each requested message is moved to the "read" directory, and an info marker (one of two) is appended to its name, separated from the flags that follow it by a comma:

"1": according to the documentation, "a flag carrying experimental semantics".
"2": what I saw in practice in 100% of cases. It means that each character after the comma is a separate flag.

Even though the message is already in the "read" folder on the server, the user still sees it as new, because clients look at the flags, not at the location of the message.
That is, a message becomes visibly "read" only when the user opens it (or performs some other action on it) and the "S" (Seen) flag is added to its name. Other actions on a message, as one would expect, add their own flags; see the notes.

Example:
A new message arrives on the server for our mailbox; its name will look something like this:

 1348142977.M852516P31269.mail.example.com,S=3309,W=3371

We launch Outlook, which requests the list of new messages and has them moved on the server to the "read" directory, adding the marker:

 1348142977.M852516P31269.mail.example.com,S=3309,W=3371:2,

Next, in Outlook we click on the new message, which adds the S flag:

 1348142977.M852516P31269.mail.example.com,S=3309,W=3371:2,S

Then we reply to it and delete it:

 1348142977.M852516P31269.mail.example.com,S=3309,W=3371:2,SRT

As we can see, flags are listed without separators.

Notes: some clients can be configured whether or not to move messages to the "read" folder. Clients also sometimes add flags that are not in the documentation, for their own needs; I did not pay much attention to those.
More useful information about flags: cr.yp.to/proto/maildir.html
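
Reading the client flags back from a name can be sketched as follows. This is an illustration only: the class MaildirFlags is hypothetical, it handles only the "2" marker, and the flag meanings in describe() follow the maildir page linked above (the article itself shows only S, R and T).

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: extracts client flags from a message file name. */
final class MaildirFlags {
    private static final String INFO_MARKER = ":2,";

    /** Returns the flags after ":2," in order of appearance; empty if there are none. */
    static Set<Character> parse(String fileName) {
        Set<Character> flags = new LinkedHashSet<>();
        int marker = fileName.indexOf(INFO_MARKER);
        if (marker < 0) {
            return flags; // new message, or the experimental "1" marker
        }
        // Flags are listed without separators: one character, one flag
        String tail = fileName.substring(marker + INFO_MARKER.length());
        for (int i = 0; i < tail.length(); i++) {
            flags.add(tail.charAt(i));
        }
        return flags;
    }

    /** Meaning of the standard flags; anything else is reported as unknown. */
    static String describe(char flag) {
        switch (flag) {
            case 'P': return "passed";
            case 'R': return "replied";
            case 'S': return "seen";
            case 'T': return "trashed";
            case 'D': return "draft";
            case 'F': return "flagged";
            default:  return "unknown";
        }
    }
}
```

For a name ending in ":2,SRT", parse(...) returns the flags S, R and T, so a check such as parse(name).contains('S') tells whether the user has opened the message.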

And a little Java

I used javax.mail to work with messages. It kindly provides the abstract class javax.mail.Message, although in this case I limited myself to javax.mail.internet.MimeMessage.
The module runs on the server itself, so we access the messages locally (checks and exception handling are omitted from the code):

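    import java.io.FileInputStream;
    import javax.mail.Session;
    import javax.mail.internet.MimeMessage;

    // System properties are enough to create the session
    Session session = Session.getDefaultInstance(System.getProperties());
    FileInputStream fis = new FileInputStream(pathToMessage);
    MimeMessage mimeMessage = new MimeMessage(session, fis);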

Now we can read the message headers, which are expected to be in ASCII. If a header is not found, null is returned. For example:

    String messageSubject = mimeMessage.getSubject();
    String messageId = mimeMessage.getMessageID();

To get the list of recipients there is the getRecipients method, which takes a Message.RecipientType as an argument. The method returns an array of Address objects. For example, let's list the recipients of the message:

    for (Address recipient : mimeMessage.getRecipients(Message.RecipientType.TO)) {
        System.out.println(recipient.toString());
    }

To find the sender(s) of the message there is the getFrom method, which also returns an array of Address objects. The method reads the "From" header; if it is absent, it reads the "Sender" header; if "Sender" is absent too, it returns null.

    for (Address sender : mimeMessage.getFrom()) {
        System.out.println(sender.toString());
    }

Next we parse the message body (in most cases we need the text and the attachments). The body can be composite (a MIME multipart message), or it can consist of a single text/plain block. If the body consists only of an attachment (with no text), it is still marked as a multipart message. According to the MIME specification (RFC 2045), the format of the message body (and of each of its parts) is given in the Content-Type header.

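    // If the message body is composite
    if (mimeMessage.isMimeType("multipart/mixed")) {
        // getContent() returns the content as an Object;
        // for a multipart body it can be cast to Multipart
        Multipart multipart = (Multipart) mimeMessage.getContent();
        // Walk through the parts of the body
        for (int i = 0; i < multipart.getCount(); i++) {
            BodyPart part = multipart.getBodyPart(i);
            // The text may come as "text/plain" or "text/html" (if the message
            // was written as HTML); in this case we only need the plain text:
            if (part.isMimeType("text/plain")) {
                System.out.println(part.getContent().toString());
            }
            // If the part is an attachment
            else if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
                // Get the file name. It may be encoded, so decode it
                String fileName = MimeUtility.decodeText(part.getFileName());
                // Get the InputStream
                InputStream is = part.getInputStream();
                // From here on, do whatever is needed with the stream ....
            }
        }
    }
    // If the body is a single plain-text block
    else if (mimeMessage.isMimeType("text/plain")) {
        System.out.println(mimeMessage.getContent().toString());
    }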

That's all. I hope the material proves useful.
There is also a useful FAQ on javax.mail at oracle.com.

UPD: As pointed out in the first comment, parts of the message body can be nested inside one another. The comments there give two ways to traverse them.

Source: https://habr.com/ru/post/153415/
