Building an effective anti-spam system

Any system that lets users communicate will eventually have a spam problem. Let's build an anti-spam system using comments on a blog entry as the example. We will not use forced registration or CAPTCHA; instead we will use a points system.

The problem can be solved with forced registration, where a new comment cannot be added without first registering on the site. You can make this stricter by requiring confirmation of the email address.

The second most common method is bot verification (CAPTCHA). What's wrong with CAPTCHA? It may well save the blog from spam, but it severely discourages new comments. If you spend 10 seconds writing a very useful comment and then another 10 seconds guessing what the CAPTCHA shows, you may simply not bother, or you may get the digits wrong.

There are several other anti-spam (anti-bot) options: build a complicated form with bells, whistles and hashes, or hide the form and show it on request with JavaScript. There are other techniques that make life difficult for spammers as well.

Points System

Let's try to analyze messages on the server and decide whether each one is spam or not.

The approach is based on the article How I built an effective blog comment spam blocker, which I found on the advice of outcoldman. It is a set of rules under which points are added or deducted.

For everything the system likes about a comment, the comment gains points. For everything the system does not like, it loses points. If, after all the checks, the total is 1 or more, the message is published. If it is 0, it is marked as spam and published. If it is less than 0, you can safely destroy it.
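
The decision step can be sketched as a small function. This is only an illustration of the thresholds described above; the name classify and the verdict labels are assumptions, not taken from the original class.

```python
def classify(score: int) -> str:
    """Map a final score to a verdict using the thresholds from the text.

    The function name and the returned labels are hypothetical.
    """
    if score >= 1:
        return "publish"        # 1 or more points: the message is published
    if score == 0:
        return "mark_as_spam"   # exactly 0: the message is marked as spam
    return "delete"             # below 0: the message can be destroyed
```

All rules contribute to a single integer, so the thresholds stay in one place and new rules can be added without touching the decision itself.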

Types of spam

Spam is either automatic or manual.
Automatic spam (spam bots) is the easiest to detect: several factors mark a message as spam.

Manual spam is harder. A person manually enters "correct" data in the form and submits it. However, you can still analyze the messages and make the right decision.

Rules

On my website I solved the problem of mass automatic spam in comments, reviews and other forms. My rules differ somewhat from the rules in the article: they are adapted to my local tasks, and several other rules have been added. If other ideas for analyzing messages come up, I will definitely add them to the table.

Rule	Condition	Points
Number of links in the message	≥ 2	-1 point for each link
Number of links in the message	< 2	+1 point
Message length	> 20 characters and no links	+2 points
Message length	< 20 characters	-1 point
Keywords	Viagra, casino and other dictionary words	-1 point for each word
Analysis of links in the message	The link's domain is in the .de, .pl or .cn zone (there may be others)	-1 point
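
The borrowed rules in the table can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the author's class: the word list, the link pattern and the function name score_borrowed_rules are hypothetical, and applying the zone penalty once per matching link is a guess, since the table does not say.

```python
import re
from urllib.parse import urlparse

# Hypothetical configuration: the article names only "Viagra, casino and other"
# words and the .de, .pl and .cn zones as examples.
SPAM_WORDS = {"viagra", "casino"}
SUSPICIOUS_ZONES = (".de", ".pl", ".cn")
LINK_RE = re.compile(r"https?://[^\s\[\]<>\"']+", re.IGNORECASE)


def score_borrowed_rules(text: str) -> int:
    """Apply the link, length, keyword and zone rules from the table."""
    score = 0
    links = LINK_RE.findall(text)

    # Number of links: -1 per link when there are two or more, otherwise +1.
    if len(links) >= 2:
        score -= len(links)
    else:
        score += 1

    # Length: +2 for more than 20 characters with no links, -1 for fewer than 20.
    if len(text) > 20 and not links:
        score += 2
    elif len(text) < 20:
        score -= 1

    # Keywords: -1 for each occurrence of a listed word.
    words = re.findall(r"\w+", text.lower())
    score -= sum(1 for word in words if word in SPAM_WORDS)

    # Link zones: -1 for each link whose host ends in a listed zone
    # (assumption: the penalty applies per link).
    for link in links:
        host = urlparse(link).hostname or ""
        if host.endswith(SUSPICIOUS_ZONES):
            score -= 1

    return score
```

For example, a short "ok" gets +1 for having fewer than two links and -1 for its length, ending at 0.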

That is the end of the rules I borrowed. What follows are adapted or new rules.

UrlReferrer. The UrlReferrer of the form submission is compared with the value it should have. Do not give this parameter too much weight	Differs from the expected value	-2 points
Russian text. The site's audience is Russian-speaking, so the name and other fields should not be in other languages. I check the percentage of Russian characters	< 10%	-2 points; otherwise +1 point
BB tags. Some messages contained BB tags in the body. The site engine does not support them, so such messages go straight to the trash	[url] and [link]	-2 points for each tag
Analysis of previous messages from the same sender (email, IP and others)	The sender has already been marked as spam	-1 point
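
The adapted rules could look roughly like this. It is a sketch under assumptions, not the original class: the function and parameter names are hypothetical, "Russian characters" is approximated by the Unicode Cyrillic block, the percentage is taken over letters only, and only opening BB tags are counted.

```python
import re

# Matches opening [url...] and [link...] tags only (assumption).
BB_TAG_RE = re.compile(r"\[(?:url|link)\b", re.IGNORECASE)


def cyrillic_share(text: str) -> float:
    """Fraction of letters that fall in the Cyrillic block (0.0 if no letters)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return cyrillic / len(letters)


def score_adapted_rules(text: str, referrer: str, expected_referrer: str,
                        sender_flagged_before: bool) -> int:
    """Apply the referrer, language, BB-tag and history rules from the table."""
    score = 0

    # UrlReferrer: -2 if it differs from the expected value.
    if referrer != expected_referrer:
        score -= 2

    # Russian text: -2 if under 10% of the letters are Cyrillic, otherwise +1.
    score += -2 if cyrillic_share(text) < 0.10 else 1

    # BB tags: -2 for each [url] or [link] tag found in the body.
    score -= 2 * len(BB_TAG_RE.findall(text))

    # History: -1 if this sender was already marked as spam.
    if sender_flagged_before:
        score -= 1

    return score
```

Looking up whether a sender was flagged before (by email, IP or other attributes) is left to the caller, since it depends on how messages are stored.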

Further development

You can develop the system as it runs by analyzing spam messages and adding new rules. There are a few ideas that are not yet implemented:

Analysis of meaningless text, where a message contains random characters rather than real words. The difficulty is in determining that these are random characters and not normal words.
There is an article, We define “wrong” words in the fight against spam, which can help to catch strings that only look like “normal” words.

Next, I would like to collect other possible rules and see how the system performs. It now catches up to 500 spam messages every day. They all come from automatic bots and do no harm; I watch the counter of caught spam grow and know that the system works. Look at the class here

Source: https://habr.com/ru/post/105366/
