The hard life of anti-spam, or how it actually works

This post was prompted by the recent major changes to the anti-spam mechanism in our mail service. We want to share the news, but not in the form of a dry press release. So we decided to describe how anti-spam works in Mail@Mail.Ru and, of course, we will be happy to answer your questions. So…

Mail.Ru anti-spam architecture

Mail.Ru has had its own anti-spam system for many years. The desire to build our own product is understandable: at a certain stage of the project, the requirements for the quality and scalability of the anti-spam mechanism became too demanding for even heavily customized third-party products to satisfy. Of course, we still use some services and components from independent vendors (for example, to scan emails for viruses), but their role is no longer decisive.

The requirements for our own anti-spam were clear and logical: maximum speed and accuracy of response. Of course, there is no limit to perfection, and the relationship between spammers and their opponents is the eternal struggle of sword and shield. But we can now say with confidence that we have made serious progress toward our cherished goal and continue to pick up momentum.

So, how does the modern anti-spam of Mail@Mail.Ru look and work?
First of all, even before mail reaches the mail servers, every sender is checked against a database of IP addresses seen in spam mailings. The database is updated dynamically in real time: some IPs are “whitewashed”, others are put on the blacklist. Accordingly, emails from IP addresses with a tarnished reputation are not accepted, and this is how we manage to cut off most botnets.
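As a minimal sketch of this first check, assuming a simple in-memory set in place of the real reputation database (the class and method names here are hypothetical and are not taken from Mail.Ru's system):

```python
import ipaddress


class IpReputation:
    """Toy in-memory reputation store; an assumption, not the real database."""

    def __init__(self):
        self._blacklist = set()

    def blacklist(self, ip):
        # An IP seen in a spam mailing goes onto the blacklist.
        self._blacklist.add(ipaddress.ip_address(ip))

    def whitewash(self, ip):
        # An IP whose reputation has recovered is removed again.
        self._blacklist.discard(ipaddress.ip_address(ip))

    def accepts(self, ip):
        # The connection is accepted only if the IP is not blacklisted.
        return ipaddress.ip_address(ip) not in self._blacklist


reputation = IpReputation()
reputation.blacklist("203.0.113.7")
```

Parsing addresses with `ipaddress.ip_address` means that different spellings of the same address compare equal, which matters mostly for IPv6.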

If the sender's IP is not on the blacklist, the email is accepted by the server and checked by two anti-spam systems: Kaspersky Anti-Spam (KAS for short) and Mail.Ru Anti-Spam (MRAS), the spam filtering system developed by Mail.Ru. These two systems always work in parallel.

The name MRAS appears, in particular, in the service headers of almost every email that passes through Mail.Ru. For example, the header “X-Mras: Ok” means that no spam signatures were found in the email.

When choosing the MRAS architecture, we took the most common approach: collect samples of spam emails, analyze them, and generate signatures. Put simply, a signature is a piece of meaningful information in an email: a phone number, a link, a characteristic phrase or keyword, and so on. MRAS evaluates an email against signatures using simple logic: if the email contains signatures characteristic of spam mailings, it is most likely spam.
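The signature idea can be illustrated roughly as follows. This is only a sketch under our own assumptions: the regular expressions, the normalization and the function names are hypothetical, and real signature extraction is certainly more involved.

```python
import re

# Hypothetical, deliberately simple patterns for two kinds of signatures.
URL_RE = re.compile(r'https?://[^\s<>"]+')
PHONE_RE = re.compile(r"\+?\d[\d\s()-]{7,}\d")


def extract_signatures(text):
    """Return the set of candidate signatures (links and phone numbers) in text."""
    signatures = set()
    for url in URL_RE.findall(text):
        # Normalize case and drop trailing punctuation picked up from the sentence.
        signatures.add(("url", url.lower().rstrip(".,;")))
    for phone in PHONE_RE.findall(text):
        # Keep digits only, so different spellings of one number coincide.
        signatures.add(("phone", re.sub(r"\D", "", phone)))
    return signatures


def spam_hits(text, known_spam_signatures):
    """Signatures of this text that are already known from spam mailings."""
    return extract_signatures(text) & known_spam_signatures
```

A non-empty result from `spam_hits` corresponds to the "most likely spam" case described above.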

The recognition of image spam deserves a separate mention. Every image in an email is analyzed and also broken down into signatures that take part in the decision. For example, the anti-spam reliably detects phone numbers and site addresses rendered as images, and the algorithm works even with distorted and noisy images.

In addition to signatures, MRAS has so-called rules that describe more complex logic. Rules make it possible to create filters that take into account multiple characteristics of a message, including service headers, image parameters, format or link patterns, frequency and reputation characteristics of any entity in the message, and so on.

When we were choosing an engine for implementing the rules, we discussed various options. The main requirements were high performance, flexible syntax and easy extensibility. We found that the embeddable interpreted language Lua meets these conditions. As a result, we got a powerful and flexible tool that proved useful for more than just rules. A significant part of the business logic in MRAS is now implemented in Lua scripts, for example the mechanisms for parsing images and frequency shingles, and various reputation mechanisms.
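The real rules are written in Lua and are not shown here. Purely as an illustration of the idea of a rule as a predicate over several message characteristics, here is a sketch in Python; the `Message` fields, the thresholds and both rules are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    headers: dict
    links: list = field(default_factory=list)
    image_count: int = 0
    sender_reputation: float = 0.5  # hypothetical score: 0.0 = bad, 1.0 = good


def rule_images_without_links_from_bad_sender(msg):
    # Combines image parameters, link patterns and a reputation characteristic.
    return msg.image_count >= 1 and not msg.links and msg.sender_reputation < 0.3


def rule_missing_message_id(msg):
    # Looks only at service headers.
    return "Message-ID" not in msg.headers


RULES = [rule_images_without_links_from_bad_sender, rule_missing_message_id]


def matched_rules(msg):
    """Names of all rules that fire for this message, in declaration order."""
    return [rule.__name__ for rule in RULES if rule(msg)]
```

Keeping each rule as a small independent function is what makes a scripting engine attractive here: rules can be added or changed without touching the core.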

How does MRAS learn about spam emails?

There are several sources of spam samples for MRAS. The main source is complaints from users who click the “This is spam” button in the mail web interface. The complaints are grouped, automatically filtered and then fed into the decision-making system.

Another important source is trap mailboxes: specially registered addresses deliberately exposed on the Internet, which receive nothing but spam. Outwardly they look like the mailboxes of ordinary users: they may belong to accounts in My World and other social networks, appear in posts on forums and guest books, and so on. Unscrupulous mailers who harvest address databases from the Internet are very likely to pick up several traps, and any email a trap receives will almost certainly serve as the basis for spam signatures.

Finally, there is a group of analysts at Mail@Mail.Ru who work in 24x7 mode, analyzing in real time both the complaints from users about possible spam and the contents of the trap mailboxes.

Next: what happens to an email at the output of MRAS? After processing the significant signatures of an email, MRAS gives it a final verdict, which can take one of three values:

the email is not spam
the email is probably spam
the email is definitely spam.

KAS issues the same verdicts. If both anti-spam systems consider an email good, it goes to the Inbox folder. If one or both systems have marked it as possible spam, it goes to the Spam folder. If at least one of the systems is sure that the email is spam, it does not reach the user at all, and the sender receives a bounce message.
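The combination of the two verdicts can be written down as a small function. This is a sketch of the routing as described in the text, with names of our own choosing; we assume that a "definitely spam" verdict from either system takes precedence over everything else.

```python
from enum import IntEnum


class Verdict(IntEnum):
    NOT_SPAM = 0
    PROBABLY_SPAM = 1
    DEFINITELY_SPAM = 2


def route(mras, kas):
    """Decide what happens to an email given the MRAS and KAS verdicts."""
    worst = max(mras, kas)  # the stricter of the two verdicts decides
    if worst == Verdict.DEFINITELY_SPAM:
        return "reject"  # not delivered; the sender gets a bounce
    if worst == Verdict.PROBABLY_SPAM:
        return "spam_folder"
    return "inbox"
```

Because the verdicts are ordered by severity, taking the maximum is enough to express "one or both" and "at least one".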

It is important to note that the same system also processes mail going out from Mail.Ru servers. So if a user tries to send a spam email, they receive a notification that the message cannot be sent.

Interestingly, MRAS checks an email not only on arrival but also some time after it has landed in the user's mailbox, because new data on spam mailings may change the picture and, accordingly, the opinion of the system. So if an email was not detected as spam when MRAS first processed it but is identified as spam a few minutes later, MRAS moves it from the Inbox to the Spam folder. Naturally, this happens only if the user has not yet opened the Inbox and seen the email.
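The delayed re-check boils down to one condition. In the sketch below the mailbox record and its field names are hypothetical; in particular, `seen_by_user` is our stand-in for "the user has already opened the Inbox and seen the email".

```python
def recheck(entry, spam_now):
    """Move a delivered email to Spam if it is now judged spam and still unseen.

    `entry` is a hypothetical mailbox record:
    {"folder": str, "seen_by_user": bool}.
    Returns True if the email was moved.
    """
    if entry["folder"] == "Inbox" and spam_now and not entry["seen_by_user"]:
        entry["folder"] = "Spam"
        return True
    return False
```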

Everything described above is the automatic spam filtering system that works for all users. However, different users have different preferences, so we recently introduced an individual (personal) component of spam filtering.

What's new?

It is no secret that with the mass spread of social networks, online games, online stores and other services that actively communicate with their audience by email, mountains of notifications have begun to pile up in users' mailboxes. Our research shows that for today's users, spam is far from being just mass mailings about “printing business cards”, “green cards” or “self-enlargement”. People consider any unwanted email to be spam, be it a boring newsletter with an obscure unsubscribe procedure or an Internet service they lost interest in long ago that lands in the Inbox with enviable regularity.

According to Mail.Ru internal statistics, users receive several dozen newsletters a day from social networks, stores and Internet services. An advanced user easily keeps mailings from piling up in the Inbox by using filters or blacklists. To make life easier for everyone else, we have implemented personal anti-spam.

Now any user can, in one click, get rid once and for all of mailings from an annoying Internet service, social network or store, that is, from perfectly legitimate services. It is enough to select one unwanted email and click the “This is spam” button, after which all emails from this sender will go to the Spam folder. Of course, this does not affect the delivery of emails to other users: it is purely individual tuning of the anti-spam mechanism “for yourself”.

By the way, the “This is spam” button has a counterpart, without which the mechanism of personal anti-spam would not be complete. The “This is not spam” button, available for emails in the Spam folder, lets you move an email that landed in Spam by mistake back to the Inbox and “whitewash” the sender's address. From then on, all emails from this sender will go to the Inbox.

Of course, in reality everything is somewhat more complicated. When forming individual black and white lists, we take into account not only the sender's address but also other parameters of the email. Otherwise we would be making life far too easy for spammers who forge the From header ;)
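A rough sketch of per-user lists follows. Which "other parameters" are actually used is not disclosed here, so the key below, the From address paired with an authenticated sending domain, is purely a hypothetical choice, as are all the names.

```python
def sender_key(from_address, authenticated_domain):
    # Hypothetical key: the From address alone is easy to forge, so it is
    # paired with a second parameter of the email.
    return (from_address.lower(), authenticated_domain.lower())


class PersonalLists:
    """Per-user black and white lists kept as one mapping."""

    def __init__(self):
        self._choice = {}  # (user, sender key) -> "Spam" or "Inbox"

    def mark_spam(self, user, key):
        # "This is spam" was clicked.
        self._choice[(user, key)] = "Spam"

    def mark_not_spam(self, user, key):
        # "This is not spam" was clicked.
        self._choice[(user, key)] = "Inbox"

    def folder_for(self, user, key, default_folder):
        # The personal choice, if any, overrides the general decision.
        return self._choice.get((user, key), default_folder)
```

Because the user is part of the dictionary key, one user's click never changes delivery for anyone else.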

And of course, besides updating individual spam filters, clicks on the “This is spam” and “This is not spam” buttons are also used to train the general anti-spam. So by clicking these buttons, users improve things not only for themselves but for all other users as well.

Interesting facts and figures

The very first days after launch showed that personal anti-spam greatly simplifies users' lives. By the end of its first week, more than 1,000,000 emails per day were being sent to spam, and, of course, most of these were social network notifications.

We were curious to analyze what else users send to the Spam folder. Here is the distribution:

By the way, as one would expect, users click the “This is not spam” button 10–20 times less frequently than “This is spam”.

And finally ... how to send mail;)

Many of you are directly involved in web development and, one way or another, send emails to your users. To help your emails get delivered safely, we have formulated recommendations for senders. Following them is, of course, not strictly required, but it makes the world a better place ;) General recommendations are at http://help.mail.ru/mail-help/rules/general , and more specific technical requirements are at http://help.mail.ru/mail-help/rules/technical .

The main task of a mail service is to deliver emails to its users. So we work diligently on anti-spam false positives when they occur. If your emails are not reaching users, write to abuse@corp.mail.ru. To help us investigate the problem, attach a full copy of the email you sent (with all service headers) as well as the non-delivery report (also in full).

I would like to draw special attention to the mechanisms designed to compensate for the shortcomings of email transmission protocols. This means publishing correct SPF records and, especially, signing every message with DKIM, which has been written about on Habr more than once.

In fact, if all honest senders adopt these approaches, the spam situation worldwide will improve drastically. So we urge you to introduce these technologies as soon as possible, especially since they are quite simple to configure (see, for example, the documentation for configuring DKIM in Exim or one of the DKIM implementations for Postfix).
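For readers who have never looked at an SPF record: it is a DNS TXT record that starts with `v=spf1` and lists who may send mail for the domain. The sketch below only splits such a record into terms as a quick sanity check. It is not a validator, the function name is our own, and the example record uses documentation addresses rather than any real configuration.

```python
def parse_spf(txt_record):
    """Split an SPF TXT record into (qualifier, term) pairs.

    Returns None if the record is not an SPF version 1 record. Modifiers
    such as "redirect=..." are returned as-is with the default qualifier.
    """
    tokens = txt_record.split()
    if not tokens or tokens[0].lower() != "v=spf1":
        return None
    terms = []
    for token in tokens[1:]:
        qualifier = "+"  # "+" (pass) is the default when none is written
        if token[0] in "+-~?":
            qualifier, token = token[0], token[1:]
        terms.append((qualifier, token))
    return terms


example = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.com -all"
```

Here `parse_spf(example)` ends with `("-", "all")`, meaning that mail from any source not listed earlier in the record should fail the check.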

Sergey Martynov,
Head of Mail@Mail.Ru

Source: https://habr.com/ru/post/120389/
