Waging war against spam

Every mail system faces spam, and Hotmail is no exception. Our weapon against it, the SmartScreen spam filter, is one of the most effective in the industry. This post explains how we use SmartScreen to combat spam and how you can help.

Spam is war

Why do people send spam? It's simple: it makes money. Spam is a very big business. Most of it is illegal, but that does not stop people from sending it.

Believe me, spam is a battle. Spammers are very smart, and they will never disappear. They are constantly inventing new ways to use our mail for their own purposes. But we do not give up.

Various studies, including the monthly Symantec spam report, show that more than 90% of all email sent over the Internet is spam. As a result, most of the mail sent to Hotmail, as to other mail providers, is spam. With an active base of 350 million users, Hotmail is a large target, and we receive several billion spam messages per day.

But Hotmail removes 98% of all spam before it can reach your inbox. Let's talk about how we achieve this, and how we strive to improve it.

Know the enemy

First, some definitions. Spam is a widely used term for unsolicited commercial email sent without legitimate reason to a large number of recipients. Nobody wants to receive spam.

It should be noted that not everything that looks like spam is spam. For example, you may receive newsletters or purchase offers because you registered on a completely legal, bona fide site. You may or may not want to see them in your inbox, but these messages are legitimate: you subscribed to them yourself! We call such messages gray mail because it is not clear whether you want them or not. They are neither "white" nor "black", hence the name. (Translator's note, added after a reader comment: apparently this is why the user interfaces of Microsoft's mail programs and services say Junk rather than Spam.)

Our goal is to eliminate as much spam as possible. But we must avoid mistakenly marking good email as spam. We call this type of error a false positive.

So the real trick is to eliminate as much spam as possible, ideally all of it, while keeping the number of false positives to a minimum, ideally zero. In a sense, these two goals pull against each other.
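To make the tension between these two goals concrete, here is a minimal sketch. It assumes a filter that gives each message a score between 0 and 1 and flags everything at or above a threshold. The scores and the `rates` helper are invented for illustration and are not taken from SmartScreen.

```python
# Hypothetical filter scores in [0, 1]; higher means "more spam-like".
# These numbers are made up for illustration only.
spam_scores = [0.95, 0.90, 0.80, 0.65, 0.40]
good_scores = [0.05, 0.10, 0.30, 0.55, 0.70]


def rates(threshold):
    """Return (share of spam caught, share of good mail wrongly flagged)."""
    caught = sum(s >= threshold for s in spam_scores) / len(spam_scores)
    false_positives = sum(s >= threshold for s in good_scores) / len(good_scores)
    return caught, false_positives


# Lowering the threshold catches more spam but also flags more good mail.
for threshold in (0.9, 0.6, 0.3):
    caught, false_positives = rates(threshold)
    print(f"threshold={threshold}: caught={caught:.0%}, "
          f"false positives={false_positives:.0%}")
```

With these made-up scores, no single threshold catches all the spam without also flagging some good mail, which is the trade-off described above.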

All this in numbers

Conventional engineering wisdom says: what we cannot measure, we cannot improve.

At Hotmail, we track several closely related indicators. Every day we monitor SITI ("spam in the inbox"), and we also monitor the percentage of SITI that is true spam rather than gray mail. We also monitor how often we make the mistake of placing good mail in the Junk Mail folder.

In addition to automated measurements, we use feedback from customers. If you notice a message in your inbox that you think is spam, you can mark it as junk. Likewise, if you find a perfectly good message in the Junk Mail folder, you can mark it as not junk or simply drag it out of that folder.

Most of the messages that users mark as junk, about 75%, are actually gray mail: legitimate messages that users simply do not want to see. A good example is newsletters or notifications that you subscribed to while shopping on a site but that do not really interest you.
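The indicators above can be sketched as simple ratios. The exact formulas Hotmail uses are not given here, so the definitions below are an assumption: SITI is taken as the share of inbox messages reported as junk, and the true-spam share as the part of those reports that is not gray mail. The function name and the counts are hypothetical.

```python
def inbox_metrics(inbox_total, reported_as_junk, reported_gray):
    """Return (SITI, share of SITI that is true spam) as fractions.

    Assumed definitions, not Hotmail's published formulas:
      SITI            = messages reported as junk / all inbox messages
      true-spam share = (reported - gray) / reported
    """
    siti = reported_as_junk / inbox_total
    true_spam_share = (reported_as_junk - reported_gray) / reported_as_junk
    return siti, true_spam_share


# Hypothetical sample: 1,000 inbox messages, 40 reported as junk,
# 30 of which are gray mail (75%, matching the figure in the text).
siti, true_spam_share = inbox_metrics(1000, 40, 30)
print(f"SITI: {siti:.1%}, true spam within SITI: {true_spam_share:.0%}")
```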

So how are we doing? Let's go back to 2006, when we had some problems with spam. The share of spam was approaching 35%, which means that roughly every third email in your inbox was spam. Since then we have made tremendous progress, lowering the share of spam below 5% and keeping it there. The following graph shows the trend in the share of spam over the past few years across the entire Internet and in Hotmail. The green triangles on the Hotmail chart mark the introduction of new anti-spam technologies.

You can see that while the share of spam on the Internet as a whole was growing, the investments made in Hotmail really paid off. We are now seeing not only an all-time low in the share of spam, but also our best false positive rate.

SmartScreen: our anti-spam weapon

We have achieved these results by making huge investments in our SmartScreen technology. Let's talk about some of its component parts.

Connection-time filtering. This is our first line of defense. At any given time, our system has a picture of the reputation of mail senders around the world, as well as the latest trends in email content, based on various sources. A sender's reputation is mostly tied to an IP address or address range. Based on this data, we set a limit on the number of messages that a particular sender can deliver to Hotmail. Setting the limit to zero blocks all mail from that sender. For good senders, we set the limit so that it does not interfere with normal mail delivery while minimizing the potential for abuse if the sender's computer is compromised. We use several sources to assess a sender's reputation:

IPs of bots. We track individual computers that have been used to send spam. Often these are malware-infected computers that are part of a botnet.
Dynamic IPs. We know that computers with dynamically assigned IP addresses should not be sending mail directly, so we immediately block mail sent from them.
Known spam entities. We use additional information, such as autonomous system numbers and IP address registration data, to track the address ranges that have been used to send spam.
Third-party sources. We partner with third parties to make use of the best data available in the industry.
Content filters. We pass incoming mail through many filters that analyze its content and can identify a message as spam. This is not as simple as searching for "watch replica". SmartScreen uses machine learning to adapt to the trends and techniques used by spammers. The filtering system applies tailored policies, content filters, and reputation data based on the class of the sender. Filters detect spam with a certain degree of confidence. When we are absolutely sure that a message is spam, we delete it. Otherwise, we place it in the Junk Mail folder. Our content filters remove approximately 1 billion messages per day.
Your preferences. You control spam too! You can set up block lists, safe lists, and rules that we use to filter your email further.
Time-traveling filters. Yes, you read that correctly. We can travel in time. Well, our filters can. It's pretty simple. We cannot always learn about a new spam source as soon as it appears. But as soon as we identify a spammer, we can go back and remove their spam before you notice it in your inbox. We call these time-traveling filters because, in a sense, we are able to go back and get rid of spam even after we missed it. (Of course, if you have already seen the spam, we cannot remove it. That would create a time travel paradox that could break our brains.)
Malware detection. We check email attachments for known malware and viruses.
Tools in the Hotmail UI. Finally, we provide powerful anti-spam tools right in the mail interface. We display a safety bar whenever you read a potentially dangerous email. Links and images are disabled by default for unknown and untrusted senders to protect you from bad links and web beacons. You can help us by marking bad email as junk or by dragging it into the Junk Mail folder. You can also help reduce false positives by moving good mail out of the Junk Mail folder. Whenever you mark a message as junk or not junk, our system becomes smarter.
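The layers described above can be pictured as a small pipeline. The sketch below is only an illustration under stated assumptions: the reputation classes, the per-sender limits, the two score thresholds, and all function names are hypothetical and do not describe SmartScreen's actual implementation.

```python
# All names, classes, limits, and thresholds below are hypothetical.
# A limit of 0 blocks the sender entirely, as the text describes.
LIMITS = {"known_bot": 0, "dynamic_ip": 0, "unknown": 100, "good": 10_000}

DELETE_THRESHOLD = 0.99  # assumed: filter is "absolutely sure" it is spam
JUNK_THRESHOLD = 0.50    # assumed: suspicious enough for Junk Mail


def connection_limit(reputation):
    """Connection-time filtering: message limit for a reputation class."""
    return LIMITS.get(reputation, LIMITS["unknown"])


def content_verdict(spam_score):
    """Content filtering: delete when certain, otherwise junk or inbox."""
    if spam_score >= DELETE_THRESHOLD:
        return "delete"
    if spam_score >= JUNK_THRESHOLD:
        return "junk"
    return "inbox"


def handle(reputation, already_delivered, spam_score):
    """Reject at connection time if over the limit, else judge the content."""
    if already_delivered >= connection_limit(reputation):
        return "reject"
    return content_verdict(spam_score)


def retroactive_cleanup(inbox, spam_senders):
    """Time-traveling filter: drop unread mail from senders found out later."""
    return [m for m in inbox if m["read"] or m["sender"] not in spam_senders]


print(handle("known_bot", 0, 0.10))  # limit is zero, so nothing gets through
print(handle("good", 5, 0.995))      # certain spam is deleted
print(handle("unknown", 5, 0.70))    # uncertain spam goes to Junk Mail

inbox = [
    {"sender": "a@example.com", "read": False},
    {"sender": "b@example.com", "read": True},
    {"sender": "c@example.com", "read": False},
]
print(retroactive_cleanup(inbox, {"a@example.com", "b@example.com"}))
```

Note that the already-read message from a later-identified spammer stays in place, mirroring the remark that spam you have already seen cannot be removed.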

How can you help us?

Our system is only part of the solution. We rely on user feedback in the fight against spam. Here are a few ways you can make our system smarter and contribute to the health of the email ecosystem:

Give feedback based on your experience. There are three ways to give feedback: you can mark a message as not junk, as junk, or as fraudulent, and each of these makes our filter smarter. By marking a message as not junk, you help us identify false positives so that we do not repeat the mistake in the future.
Participate in the feedback program. From time to time, we invite some users to participate in our feedback program. The program works as follows: every so often we send you a message and ask whether it is junk. The way you categorize that message helps tune our spam filters. Please agree to participate if you are invited.
Do not buy anything from spammers. Only a very small number of people follow the links in spam messages, but spammers make money because of the enormous number of messages they send. A typical spammer can make a very good profit even if users respond to only 50 messages out of every million sent.
Check your computer for malware. Make sure that your own computer is not a spam bot! You can use free antivirus software such as Microsoft Security Essentials (MSE) for this.
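The arithmetic behind the advice not to buy from spammers can be sketched as follows. Only the figure of 50 responses per million messages comes from the text above. The revenue per response and the sending cost are hypothetical placeholders chosen to show the shape of the calculation, not real data.

```python
RESPONSES_PER_MILLION = 50  # figure quoted in the text


def campaign_profit(messages_sent, revenue_per_response, cost_per_million):
    """Profit of a spam campaign under the assumed (hypothetical) prices."""
    responses = messages_sent * RESPONSES_PER_MILLION / 1_000_000
    revenue = responses * revenue_per_response
    cost = messages_sent / 1_000_000 * cost_per_million
    return revenue - cost


response_rate = RESPONSES_PER_MILLION / 1_000_000
print(f"Response rate: {response_rate:.3%}")

# Hypothetical prices: 20 per response earned, 100 per million messages sent.
print(campaign_profit(1_000_000, revenue_per_response=20, cost_per_million=100))
```

Because sending is so cheap relative to even a tiny response rate, the only lever ordinary users control is the number of responses, which is why not buying matters.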

A look into the future

Over the past few years, the Hotmail team has invested heavily in developing SmartScreen, not only to solve our spam problems but also to be the best in the email services industry.
In the next post I will talk a little about the problem of gray mail, look deeper into the filtering mechanisms, and give some tips to those who still have problems with spam.

Until then, I hope you continue to use Hotmail and leave your feedback in the comments.
Dick Craddock
Group Program Manager, Windows Live Hotmail

Source: https://habr.com/ru/post/82217/
