In my latest project, which is about 80% Java, I had to add a module: a parser for every message passing through the server. The motives behind the module are rather strange, but I would like to share some of the details.
What we have:
A Postfix mail server with Dovecot as the delivery service, on CentOS. And, of course, a JVM.
Message structure
What an e-mail is, what parts it consists of, their approximate structure, headers and MIME types are all described in plain language on Wikipedia.
More interesting is the structure of the message file name on the server. An example of the name of a new message (not yet read or requested by a client):
1348142977.M852516P31269.mail.example.com,S=3309,W=3371
The name consists of flags separated by commas. When a new message is created, they indicate when and where the message arrived and its sizes.
- Two sizes are given: the usual size, denoted by "S", and Vsize, denoted by "W", which is RFC822.SIZE, that is, the size counted with CRLF line endings (see the answer to the question “What is RFC822.SIZE?”).
- The time is specified in Unix format, in seconds.
- The same dot-separated flag that holds the time may also contain "P", the process ID, and "M", a microsecond counter added to make the name unique (other attributes are possible too, see the notes).
- The server name is that of the final host, the one where the message is stored, not of a relay server in case the message was forwarded.
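The layout described above can be sketched in a few lines of plain Java. This is only an illustration: the class MaildirName and its field names are hypothetical (they come from neither Dovecot nor javax.mail), and the sketch assumes the name has the time.unique.host form shown in the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of any library: splits a Maildir file name into its fields. */
final class MaildirName {
    final long deliveredAtSeconds;         // Unix time, the first dot-separated field
    final String uniquePart;               // e.g. "M852516P31269"
    final String host;                     // e.g. "mail.example.com"
    final Map<String, String> attributes;  // e.g. S -> 3309, W -> 3371
    final String clientInfo;               // text after ':' or "" when there is none

    private MaildirName(long deliveredAtSeconds, String uniquePart, String host,
                        Map<String, String> attributes, String clientInfo) {
        this.deliveredAtSeconds = deliveredAtSeconds;
        this.uniquePart = uniquePart;
        this.host = host;
        this.attributes = attributes;
        this.clientInfo = clientInfo;
    }

    static MaildirName parse(String fileName) {
        // Client flags, if present, start at the first ':'
        int colon = fileName.indexOf(':');
        String base = colon < 0 ? fileName : fileName.substring(0, colon);
        String info = colon < 0 ? "" : fileName.substring(colon + 1);

        String[] commaParts = base.split(",");
        // Split into at most three pieces so that the dots inside the host name survive
        String[] head = commaParts[0].split("\\.", 3);
        if (head.length < 3) {
            throw new IllegalArgumentException("Unexpected Maildir name: " + fileName);
        }
        long seconds = Long.parseLong(head[0]);

        // Everything after the first comma is a KEY=VALUE attribute such as S=3309
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 1; i < commaParts.length; i++) {
            int eq = commaParts[i].indexOf('=');
            if (eq > 0) {
                attrs.put(commaParts[i].substring(0, eq), commaParts[i].substring(eq + 1));
            }
        }
        return new MaildirName(seconds, head[1], head[2], attrs, info);
    }

    public static void main(String[] args) {
        MaildirName n = parse("1348142977.M852516P31269.mail.example.com,S=3309,W=3371");
        System.out.println(n.deliveredAtSeconds + " " + n.host + " " + n.attributes);
    }
}
```

Names that do not follow this layout are rejected with an exception rather than guessed at, which seems the safer choice for a sketch.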
Of all this, only the creation time of the message (the first ten digits) was useful to me. However, it often differs from the time in the message header, so I used the time in the name only to filter messages in the directory.
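Filtering by the time in the name might look roughly like this. It is a sketch, not the code of the module: the class MaildirTimeFilter and its methods are hypothetical, and files whose names do not start with a number followed by a dot are simply skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: selects message files by the Unix time at the start of their names. */
final class MaildirTimeFilter {

    /** Returns the time encoded in the name, or -1 if the name does not start with "number." */
    static long timestampOf(String fileName) {
        int dot = fileName.indexOf('.');
        if (dot <= 0) {
            return -1;
        }
        try {
            return Long.parseLong(fileName.substring(0, dot));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Lists regular files in dir whose name time is at or after sinceSeconds. */
    static List<Path> newerThan(Path dir, long sinceSeconds) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> timestampOf(p.getFileName().toString()) >= sinceSeconds)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Because the comparison uses only the file name, no message has to be opened or parsed at this stage.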
Additional / client flags
The mail client (hereinafter, the client) can add flags to the message name. The start of the client flags is marked by the ":" character.
As soon as the client requests new messages from the server, a request is sent to move each of the requested messages to the “read” directory (cur in Maildir terms) and to append to its name an information flag (one of two), separated from the subsequent flags by a comma:
- "1" - according to the documentation, a flag with experimental semantics.
- "2" - what I saw in practice in 100% of cases. It means that each character after the comma is a separate flag.
Even though the message is already in the “read” directory on the server, the user will still see it as new, because clients look at the flags, not at the location of the message.
That is, the message becomes visually “read” only when the user opens it (or does something else with it) and the “S” (Seen) flag is added to its name. Other actions on the message, as one would expect, add their own flags; see the notes.
Example:
A new message arrives on the server for our mailbox. Its name will look something like this:
1348142977.M852516P31269.mail.example.com,S=3309,W=3371
We open, heaven forbid, Outlook, which requests the list of new messages and has them moved to the "read" directory on the server, adding the flag:
1348142977.M852516P31269.mail.example.com,S=3309,W=3371:2,
Next, in Outlook we click on the new message, which adds the S flag:
1348142977.M852516P31269.mail.example.com,S=3309,W=3371:2,S
Then we reply to it and delete it, which adds the R and T flags:
1348142977.M852516P31269.mail.example.com,S=3309,W=3371:2,SRT
As we can see, flags are listed without separators.
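Reading the client flags back from a name can be sketched as follows. The class MaildirFlags is hypothetical, and the sketch only understands the ":2," form, since that is the only one seen in practice here; flags a client adds for its own needs are returned as they are.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper: extracts the single-character client flags that follow ":2,". */
final class MaildirFlags {

    /** Returns the flags in the order they appear, or an empty set if there is no ":2," part. */
    static Set<Character> parse(String fileName) {
        Set<Character> flags = new LinkedHashSet<>();
        int marker = fileName.lastIndexOf(":2,");
        if (marker < 0) {
            return flags; // message not yet touched by a client, or an unknown info format
        }
        // With info version "2", every character after the comma is an independent flag
        for (char c : fileName.substring(marker + 3).toCharArray()) {
            flags.add(c);
        }
        return flags;
    }

    /** True when the "S" (Seen) flag is present. */
    static boolean isSeen(String fileName) {
        return parse(fileName).contains('S');
    }

    public static void main(String[] args) {
        String name = "1348142977.M852516P31269.mail.example.com,S=3309,W=3371:2,SRT";
        System.out.println(parse(name) + " seen=" + isSeen(name));
    }
}
```

Note that the "S" in "S=3309" is not mistaken for a flag, because only the text after ":2," is inspected.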
Notes: some clients can be configured to move (or not move) the message to the "read" directory. Also, clients sometimes add flags not listed in the documentation “for their own needs”; I did not pay much attention to those.
More useful information about flags:
cr.yp.to/proto/maildir.html
And a little Java
I used javax.mail to work with the messages. It kindly provides the abstract class javax.mail.Message, although in this case I limited myself to javax.mail.internet.MimeMessage.
The module runs on the server itself, so we access the messages locally. Checks and exception handling are omitted in the code.
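Reading a message file from the local disk into a MimeMessage could look roughly like the following. This is an assumption about how such loading is typically done with javax.mail, not the original code of the module; the class LocalMessageLoader is hypothetical, and the javax.mail library must be on the classpath.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;
import javax.mail.MessagingException;
import javax.mail.Session;
import javax.mail.internet.MimeMessage;

/** Hypothetical helper: parses a message file stored on the local disk. */
final class LocalMessageLoader {

    static MimeMessage load(Path file) throws IOException, MessagingException {
        // No connection to a mail server is made, so empty properties are enough
        Session session = Session.getInstance(new Properties());
        try (InputStream in = Files.newInputStream(file)) {
            // This constructor reads and parses the stream as a MIME message
            return new MimeMessage(session, in);
        }
    }
}
```

The variable mimeMessage used in the snippets that follow would then be the result of a call such as LocalMessageLoader.load(path).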
Now we can read the message headers, which are expected to be in ASCII. If a header is not found, null is returned. For example:
String messageSubject = mimeMessage.getSubject();
String messageId = mimeMessage.getMessageID();
To get the list of recipients, there is the getRecipients method, which takes a Message.RecipientType as its argument. The method returns an array of objects of type Address (or null if there are no recipients of that type). For example, we list the recipients of the message:
for (Address recipient : mimeMessage.getRecipients(Message.RecipientType.TO)) {
    System.out.println(recipient.toString());
}
To find the sender(s) of the message, there is the getFrom method, which also returns an array of Address objects. The method reads the “From” header; if it is absent, it reads the “Sender” header; if “Sender” is absent too, it returns null.
for (Address sender : mimeMessage.getFrom()) {
    System.out.println(sender.toString());
}
Next we analyze the message body (in most cases we need the text and the attachments). It can be composite (a MIME multipart message), or it can contain a single text/plain block. If the body consists only of an attachment (without text), it is still marked as a multipart message. The format of the message body (and of each of its parts) is specified in the Content-Type header, as defined by MIME (RFC 2045).
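One possible way to walk such a body is a small recursive method that looks at the Content-Type of every part. This is a sketch under assumptions, not the code of the module: the class BodyWalker is hypothetical, only text/plain parts and parts with the "attachment" disposition are collected, and everything else (HTML, inline images, nested messages) is ignored.

```java
import java.io.IOException;
import java.util.List;
import javax.mail.MessagingException;
import javax.mail.Multipart;
import javax.mail.Part;

/** Hypothetical helper: collects plain text and attachment names from a message body. */
final class BodyWalker {

    static void walk(Part part, StringBuilder text, List<String> attachmentNames)
            throws MessagingException, IOException {
        if (part.isMimeType("multipart/*")) {
            // A composite part: descend into each child, which may itself be multipart
            Multipart multipart = (Multipart) part.getContent();
            for (int i = 0; i < multipart.getCount(); i++) {
                walk(multipart.getBodyPart(i), text, attachmentNames);
            }
        } else if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
            // getFileName() may return null when the sender gave no name
            attachmentNames.add(part.getFileName());
        } else if (part.isMimeType("text/plain")) {
            Object content = part.getContent();
            if (content instanceof String) {
                text.append((String) content);
            }
        }
    }
}
```

A MimeMessage is itself a Part, so the walk can start from the message: BodyWalker.walk(mimeMessage, new StringBuilder(), new ArrayList<>()).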
That's all. I hope the material proves useful.
There is also a useful FAQ on javax.mail at oracle.com.
UPD: As pointed out in the first comment, parts of the message body can be nested inside each other. The comments there also show two ways of traversing them.