
On reinventing the wheel in the field of e-mail

As fate would have it, I look after a mail server. It is small, about 20 users. It runs stably, and changing the software is undesirable. Nor would it have been necessary, but one day the backup logs hinted unambiguously: carry on like this and a full backup will take all night. The cause is the volume of the user mailboxes.


The problem is identified, so it has to be solved. The head-on approach, buying more powerful hardware, is not to my taste, and the budget is not made of rubber. The obvious option is quotas, but in practice they do not help much. Sworn assurances of "I cleaned everything out" turn, on closer inspection, into cats, funny pictures and family photo archives (in corporate mail, yes). And the number of screams of "urgent, on fire, nothing works, fix it immediately" grows by an order of magnitude. At this rate one could lose faith in people.

Fortunately, I am not a psychologist, a coach or a mentor. My job is technology, so let's approach this from the technical side.
The first thing that came to mind was self-destructing messages. Roughly speaking, everything without an “important” mark is deleted after N days. To my taste, this should be built into the e-mail standards. But so far it is not, and implementing it myself seemed too large a project.

The second thought was about copies (CC). You know, those messages where you are not the main addressee and which reach you just for information. Some of them could be deleted automatically. But here, unexpectedly, the users split into two camps: “I need all of them, what are you doing” and “what is this anyway”. I could not come up with an automatic sorting algorithm under such conditions.

Well, if not delete, then deduplicate! Take all the copies and replace them with symbolic links. A quick analysis showed that handling only FULL duplicates this way saves a THIRD of the storage. But, but, but. Unfortunately, this path is a dead end because of many technical limitations.

Details for those interested (spoiler):
- not all archivers understand symlinks;
- the server software goes crazy;
- there are organizational complications and access-rights issues.
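
For illustration, the "quick analysis" of full duplicates could be done roughly like this. It is a minimal Python sketch, not the script actually used: it assumes the mail store is a directory tree with one file per letter, and the names find_full_duplicates and reclaimable_bytes are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_full_duplicates(root):
    """Group files under root by SHA-256 of their content.

    Returns only the groups that contain two or more identical files.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


def reclaimable_bytes(duplicate_groups):
    """Bytes that would be freed if every group kept a single copy."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicate_groups
    )
```

Dividing reclaimable_bytes(...) by the total size of the store gives the share that deduplication alone could save.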

By the way, my mail server has very meager settings both for general backups and for per-user archive storage, so there was little room for maneuver.

What remains? I looked sadly at the cats and was already wondering about a simple neural network that would clean up the mail for the user. And then... Excuse me, but what are the cats doing inside the letter? I recall that a letter with an attachment weighs almost a third more than the attachment alone! What if the attachment were moved out?..
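
The "almost a third" comes from Base64: every 3 bytes of the file become 4 characters in the letter, plus line breaks. A small Python sketch to check the figure (the 3 MB size is just an assumed example):

```python
import base64


def mime_base64_size(raw_size):
    """Size in bytes of raw_size bytes after MIME-style Base64 encoding."""
    # encodebytes() emits 76-character lines, each ending in a newline,
    # which is close to what ends up inside an .eml file.
    return len(base64.encodebytes(bytes(raw_size)))


raw = 3_000_000  # a hypothetical 3 MB photo
print(mime_base64_size(raw) / raw)  # growth factor of the encoded attachment
```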

Thus began a path with "many wonderful discoveries" along it. Had I known... Well, you understand. A drop of ignorance and courage leads us to victories!

So: we are going to store attachments separately from letters.

The main mistake one can make here is to open an eml file in a text editor and decide that it is plain text. That is what I did, and I was delighted: I'll just write a batch file right now. There are command-line utilities for extracting whole attachments, offhand: github.com/erikvdv1/eml-attachments or github.com/maiken2051/uudeview. They have problems with encodings, but that is not the main thing.
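
For comparison, here is what plain extraction looks like with a MIME parser instead of a text editor. This is a minimal sketch using Python's standard email package, not one of the utilities named above; extract_attachments and the fallback file name are hypothetical.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def extract_attachments(eml_bytes, out_dir):
    """Save every attachment of a raw .eml message into out_dir."""
    msg = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, part in enumerate(msg.iter_attachments()):
        payload = part.get_payload(decode=True)  # undoes Base64 / quoted-printable
        if payload is None:  # e.g. an attached message, skipped in this sketch
            continue
        # Keep only the last path component so a crafted name cannot escape out_dir.
        name = Path(part.get_filename() or f"attachment-{index}.bin").name
        target = out_dir / name
        target.write_bytes(payload)
        saved.append(target)
    return saved
```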

The main thing: taking the file out and creating a link to it is the trivial part. But pushing that link back into the original letter... Because what is in there is not text. It is MIME.

The experienced reader is, of course, now laughing at the hapless author. The author, for his part, discovered the delights of the "standard". The most important thing I learned: you do not need toadstools to go berserk.

Examples and swearing, under the spoiler:

charset = UTF-8
charset = "UTF-8"
charset = "UTF-8"
charset = UTF-8;
charset = "UTF-8";
charset = "UTF-8";
To them, these are all one and the same thing.
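
A parser has to fold all of these spellings into one value. As a minimal sketch, Python's standard email package already does this; charset_of is a hypothetical helper name and the sample header values are assumptions modelled on the variants above.

```python
import email


def charset_of(content_type_value):
    """Return the normalized charset named in a Content-Type header value."""
    msg = email.message_from_string(f"Content-Type: {content_type_value}\n\nbody")
    # get_content_charset() strips quotes and lower-cases the name.
    return msg.get_content_charset()


variants = [
    'text/plain; charset=UTF-8',
    'text/plain; charset="UTF-8"',
    'text/plain; charset = UTF-8;',
    'text/plain;charset="utf-8";',
]
print({charset_of(v) for v in variants})
```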

Line breaks in the middle of the Base64 stream. Where they come from is still a mystery to me.

And vice versa: the absence of \r\n\r\n after the header part.

In the header, the fields come in whatever order somebody's left heel desired.

Old letters allow a line length of no more than 80 characters, including service characters.

There may be line breaks in file names (in the body of the letter, and not in the name itself).

In general, line breaks can be anywhere, even though the standard declares a line break to be the end of the current parameter.

The text of the letter itself is encoded. Exactly how it is encoded is left to the conscience of the particular server, and there are a lot of (foul-smelling) options.

Ah yes, and a letter almost always has an HTML part. That is, if you send "Hello" and there is a br or p tag in it, the letter will always contain TWO sections: one with just the text and one with the tags. And the text is duplicated. And these are the people who were "saving" computing power... A menagerie crossed with Frankenstein.
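
The duplication is easy to reproduce. A minimal Python sketch that builds such a "Hello" letter with the standard email package and lists its sections (build_hello and section_types are hypothetical names):

```python
from email.message import EmailMessage


def build_hello(text, html):
    """Build a letter the way most clients do: a plain part plus an HTML part."""
    msg = EmailMessage()
    msg.set_content(text)                      # text/plain section
    msg.add_alternative(html, subtype="html")  # text/html section, same words again
    return msg


def section_types(msg):
    """Content types of the top-level sections of a multipart letter."""
    return [part.get_content_type() for part in msg.iter_parts()]


hello = build_hello("Hello\n", "<p>Hello</p>\n")
print(hello.get_content_type(), section_types(hello))
```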

A file name may look like this: filename="=?encoding?type?...?=" and it may look like this: filename*0*=encoding''... (WHAT??!!). The second is the newer standard, RFC 2231 (RFC 5987 is its counterpart for HTTP headers). The standard explicitly states that filename*0*=ENC and filename="=? are the same thing. At this point I was finally convinced that they are mocking us. How to handle this properly, I do not know.
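
For what it is worth, both spellings can be decoded with Python's standard email package. This is only a sketch under assumptions: the sample headers are made up, attachment_filename is a hypothetical helper, and real letters will find ways to break it. The first form is an RFC 2047 encoded word, the second an RFC 2231 continuation.

```python
import email
from email.header import decode_header, make_header


def attachment_filename(part_headers):
    """Decode the filename from the headers of one MIME part."""
    msg = email.message_from_string(part_headers + "\n\n")
    raw = msg.get_filename()  # joins and decodes filename*0*=... segments
    if raw is None:
        return None
    # Handles the filename="=?charset?B?...?=" spelling; plain names pass through.
    return str(make_header(decode_header(raw)))


old_style = 'Content-Disposition: attachment; filename="=?UTF-8?B?0LrQvtGCLnR4dA==?="'
new_style = ("Content-Disposition: attachment; "
             "filename*0*=UTF-8''%D0%BA%D0%BE; filename*1*=%D1%82.txt")
print(attachment_filename(old_style) == attachment_filename(new_style))
```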

Apple, as usual, distinguished itself separately. They have a standard of their own. Looking ahead: long attempts to process their output led to the only correct solution: “Error: Apple mail is not supported.”

Then there is Thunderbird. Out of desperation I dug into its sources, but could not find the relevant part in one and a half gigabytes of code written in a mixture of Python and assorted dialects. I went to their IRC, where people kindly suggested where to look, but I still could not find it.

But I did not lose heart. Don't read the documentation @ write the code, and done. No, seriously, I had to do something to bring the end of MIME closer.

The batch script never happened. The result is a command-line utility written in C# on .NET.

The utility has two modes of operation.
First: it simply extracts attachments. It handles encodings correctly under Windows.

Second: this is where the real fun begins. Now we can keep mail attachments separate from the mail itself! The utility creates a new letter to replace the old one: the attachment is cut out and the letter is reformatted as plain HTML in UTF encoding, with no limit on line length. The text/plain section is taken as the basis. If the HTML section contains tables, they are carried over with the formatting inside the table preserved, though this works so-so. At the end of the text of the current letter (if it is a reply or a forward) links to network resources are inserted, with the path to the extracted files in file:/// and ftp:// formats.
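
The idea of the second mode can be sketched in a few lines of Python. This is not the C# utility: it ignores tables and reply detection, simply appends the links at the end, and assumes the files were already extracted. strip_attachments and the link_for callback (file name to URL) are hypothetical.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from html import escape


def strip_attachments(eml_bytes, link_for):
    """Return a new single-part HTML letter with attachments replaced by links."""
    old = BytesParser(policy=policy.default).parsebytes(eml_bytes)
    new = EmailMessage()
    # Carry over the headers a mail client needs to display the letter.
    for name in ("From", "To", "Cc", "Subject", "Date", "Message-ID"):
        if old[name] is not None:
            new[name] = str(old[name])
    # The text/plain section is taken as the basis.
    body = old.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    names = [part.get_filename() or "unnamed" for part in old.iter_attachments()]
    links = "".join(
        f'<br><a href="{escape(link_for(n))}">{escape(n)}</a>' for n in names
    )
    html = f"<html><body><pre>{escape(text)}</pre>{links}</body></html>"
    new.set_content(html, subtype="html", charset="utf-8")
    return new
```

The result of new.as_bytes() would then be written in place of the original eml file; whether the mail server tolerates that is a separate question.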


The system was tested on 10,000+ letters and deployed on the existing infrastructure.

Advantages found:
+ Backup time. Before:
Backup
was started at 01:00:08
and successfully completed 03:26:32

After:
Backup
was started at 01:00:09
and successfully completed 01:40:36
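
To put a number on the difference, the two log excerpts can be compared with a few lines of Python (a throwaway calculation; duration_seconds is a hypothetical helper):

```python
from datetime import datetime


def duration_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day timestamps taken from the backup log."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


before = duration_seconds("01:00:08", "03:26:32")  # backup with attachments inside letters
after = duration_seconds("01:00:09", "01:40:36")   # backup after moving attachments out
print(before, after, round(before / after, 1))
```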

+ 30+% of the storage was saved: files leave heavy Base64 and its ilk for the normal format of the file system, plus many duplicates were found even within individual mailboxes.

+ The server and the mail programs process mailboxes faster.

+ No more "I opened a file straight from the mail, edited it for 10 hours and it was not saved".

+ Quotas can be dropped.

+ An attachment can still be found through the mail, unlike with a simple move to file storage.

+ The end of MIME draws nearer. Repent, authors!

Drawbacks of the solution:

- Some letters (but not attachments) still come out broken. Mostly not in the file itself, but when viewed in some clients;
- some gremlins keep trying to break into the ftp;
- not all email clients support opening links via file:///

Debatable points:

? Apple mail is not supported. As far as I am concerned, Buddha be with it;
? Letters with complex formatting break. These are usually Booker flyers or advertisements;
? If the ftp server is on a non-standard port, there may be access problems. Solved by a mail bot.

So the problem was solved by a thorny path.

Thanks for your attention!

Source: https://habr.com/ru/post/420371/

