A look at modern spam protection systems for web forms

What this article is about

The fight between people who want to post their links or advertise something and people who do not want to see unwanted "left" advertising (and sometimes even the "right" kind) in their comments or on their forums will probably never end.

As someone who spent a lot of time in the past developing web form spam tools, I would like to dwell on the points that many authors miss when they advocate one protection method or another.

Similar topics have been raised repeatedly on various sites, but all the articles I came across were written by people on the other side of the barricades.

A small historical excursion

Many years have passed since botmaster released his famous XRumer. At the time it was a real revolution in spam technology: spam moved to an industrial scale.

Automatic recognition of captchas (at first the simplest ones, later quite complex ones), activation of accounts by email, the ability to hold a dialogue with itself, the Hrefer parser, which lets you quickly collect the forum databases you need, work in hundreds of threads: all this justified the rather high price of the product. Moderators of forums, guestbooks, and later blogs cleaned out tons of spam, and sometimes even disabled the registration of new users ...

I am not going to advertise XRumer, but it was a truly revolutionary and unique product in its class (the word "was" is not entirely appropriate, since it remains relevant today).

Black-hat SEO in those days mainly consisted of competently running unpromoted sites through the right databases of forums and guestbooks. Very often, such simple actions led to amazing results.

Protection of forums, guestbooks and blogs at that time was rather primitive: at best it was a simple captcha, and often there was no protection at all ...

The community responded by developing methods to combat this malicious software. Of course, spam software and countermeasures existed before XRumer, but it was with the arrival of this program that the problem became particularly pressing.

Modern methods of combating web form spam

Captcha pictures - there is not much to say here: everyone has seen various kinds of captcha pictures. Many people also know the universal reCAPTCHA service, which provides some of the hardest captchas to recognize.
Text captchas - captchas built on a question-answer pair that ask the user to type the answer to a question. This also includes captchas that ask the user to do some arithmetic and enter the result in an input field.
Interactive captchas - a fairly new and so far uncommon kind of captcha, based on the user interacting with some objects. There are several implementations of such captchas for specific CMSs (mostly WP), as well as the universal KeyCAPTCHA service, which, like reCAPTCHA, can be integrated into any CMS.
Captcha-free spam filtering - this approach is actively promoted in many articles on spam protection as the one that "traumatizes the psyche" of website visitors the least.
This class of protection includes:
- All sorts of JS tricks, such as building forms "on the fly" or adding fields that the web server later checks for correctness.
- Web server traps:
  - Invisible sections of the site that only robots reach, which are then banned by IP
  - Checks on how quickly the form was filled out
  - Filtering of anonymous proxies
  - Any other kinds of traps, of varying sophistication depending on the webmaster's imagination
- The Akismet service
- The Disqus service: although it is only for blogs, it can conditionally be placed in this category as well.
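
Two of the web server traps listed above, a hidden field and a fill-time check, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `website` honeypot field, the `token` hidden field, the secret key and the 3-second threshold are all hypothetical names and values, not taken from any particular CMS.

```python
import hashlib
import hmac
import time

# Hypothetical values: adjust the key, field name and thresholds per site.
SECRET_KEY = b"change-me"
HONEYPOT_FIELD = "website"      # hidden with CSS; humans leave it empty
MIN_FILL_SECONDS = 3.0          # faster submissions are treated as bots
MAX_FORM_AGE_SECONDS = 3600.0   # stale tokens are rejected


def _sign(issued):
    """HMAC-SHA256 signature of the issue timestamp (a string)."""
    return hmac.new(SECRET_KEY, issued.encode(), hashlib.sha256).hexdigest()


def issue_token(now=None):
    """Return 'timestamp:signature' to embed in a hidden form field."""
    issued = str(int(time.time() if now is None else now))
    return f"{issued}:{_sign(issued)}"


def check_submission(form, now=None):
    """Return (accepted, reason) for a submitted form given as a dict."""
    now = time.time() if now is None else now

    # Trap 1: a field that is invisible to humans was filled in.
    if form.get(HONEYPOT_FIELD, "").strip():
        return False, "honeypot filled"

    # Trap 2: the token must be present and carry a valid signature,
    # so the client cannot forge the time at which the form was issued.
    issued, _, sig = form.get("token", "").partition(":")
    valid = hmac.compare_digest(sig.encode(), _sign(issued).encode())
    if not issued.isdigit() or not valid:
        return False, "bad token"

    # Trap 3: the form came back too quickly, or the token is too old.
    elapsed = now - int(issued)
    if elapsed < MIN_FILL_SECONDS:
        return False, "filled too fast"
    if elapsed > MAX_FORM_AGE_SECONDS:
        return False, "token expired"

    return True, "ok"
```

A page handler would call `issue_token()` when rendering the form and `check_submission()` when it is posted. Checks like these only raise the bar for simple bots: a bot that waits a few seconds and leaves invisible fields alone passes all three.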

How resources protected by the methods described above are spammed

Captcha pictures - if they cannot be solved with OCR, they are recognized by humans at a price of about $1 per 1000 captcha pictures of ANY kind. This is a reasonable price that almost any serious spammer is willing to pay.

Several services for this "shameful" occupation are already in active operation. A script sends the image taken from the web page to the service, where it is recognized by a person (usually schoolchildren, students and workers in China). The service returns the answer to the spammer as text, and the spam program inserts it into the form it wants to spam.

Of course, not all Antigate workers (we will call them that in honor of the most popular such service) work in good faith, so these services have a feedback system and "quality control". Bad workers you complain about are penalized financially. Integration with such services is already built into modern spam software.

Therefore, I personally do not quite understand the persistence of those who keep making captcha pictures more complicated: no matter how complex the picture is, it is still recognized with the help of Antigate-style services. This applies especially to the protection of registration forms: paying $1 to register 1000 accounts is, in my opinion, not a problem even for a completely "green" spammer.
Text captchas - here, on the one hand, everything is much simpler, and on the other, much more complicated. If you have a very rich imagination and a not very popular resource, such protection can save you, provided you configure it by hand. Otherwise, developers of spam software constantly replenish their database of questions and answers and spam through such protection quite successfully.
Interactive captchas - this type of captcha is still not very common. At the moment I know of several non-universal implementations for WP, as well as one universal service that can be integrated into any CMS: KeyCAPTCHA. So far I personally do not know of any method for spamming forms protected by KeyCAPTCHA with a bot, and it seems to me that over the next few years forms protected this way will be the least vulnerable to spam bots.
Captcha-free protection - this method of protection is particularly interesting and requires a more detailed analysis.
- Protection based on all sorts of JS tricks - most spam programs can no longer cope here,
  since this requires real HTML rendering with full processing of JS and all page events.
  That would be great, except that there is already more than one program that fully emulates a browser, or rather, that simply is a completely genuine IE under external control. Such a browser is completely controlled by the spammer's scripts. That is, it can reproduce the behavior of a real browser 100%, following any algorithm the spammer writes in PHP or any other scripting language.
  
  Such browsers can change proxies, capture images directly from the screen, emulate mouse clicks on checkboxes and links, and see any dynamically updated styles. In general, they can do whatever you want. Moreover, there is software for building a spam machine that runs dozens of such controlled browsers simultaneously.
- Protection based on web server tricks in the form of all sorts of traps for bots - Easily bypassed with the managed browser described above, since it (the managed browser) follows only visible links, unless of course it is programmed to do otherwise.
- Protection based on the Akismet spam filtering service - This service works by extracting signs of spam from the message text, and possibly from IP addresses and browser cookies. As a result, it cannot be used on arbitrary web forms: for example, Akismet will most likely not reliably protect a registration form if the spammer uses IP addresses that are not yet "burnt". And as practice shows, spam messages regularly appear even on a blog with average traffic, which means experienced spammers do bypass Akismet's content-based protection. It is simply a matter of preparing the spam messages competently.
- Protection based on Disqus - As I wrote above, this solution is not universal either. It completely removes comments from your site's content and moves them to the Disqus servers, after which they are loaded for the visitor using JS. On the one hand this is effective, but it is not without flaws, since comments cease to be part of your site. Spamming through this protection is still possible with the same managed browser, and "when I worked for Megafon ..." and its variations still show up in your comments.
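
Returning to text captchas: the manually configured, site-specific question-answer dictionary mentioned above might look roughly like this. It is a minimal sketch; the questions, the `QUESTIONS` structure and the function names are hypothetical, and real questions should be unique to your site so that they are not already in a spammer's answer database.

```python
import secrets
import unicodedata

# Hypothetical, site-specific questions with their accepted answers.
QUESTIONS = {
    "q1": ("What colour is the logo of this forum?", {"green"}),
    "q2": ("How many wheels does a bicycle have?", {"2", "two"}),
}


def normalize(answer):
    """Fold Unicode forms, trim whitespace and ignore letter case."""
    return unicodedata.normalize("NFKC", answer).strip().casefold()


def pick_question():
    """Return (question_id, question_text) chosen at random."""
    qid = secrets.choice(sorted(QUESTIONS))
    return qid, QUESTIONS[qid][0]


def check_answer(qid, answer):
    """Return True if the answer matches an accepted answer for qid."""
    entry = QUESTIONS.get(qid)
    if entry is None:
        return False
    return normalize(answer) in entry[1]
```

The question id would travel with the form (ideally signed or kept in the session) so that the server knows which answer set to check. Accepting several spellings per question reduces the risk that a human fails the check.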

Summary

Finally, I would like to sum up all of the above in a comparison table.

Method or service	Advantages	Disadvantages	Possibility of bypassing
Captcha picture	Easy to install: most CMSs have several built-in types of captcha pictures, plus many plugins for the reCAPTCHA service.	Many modern captcha pictures are hard to read even if you are not a robot.	Recognized using OCR or special services such as Antigate.
Text captcha	Implementations exist for many CMSs and it is not hard to write your own; you can build your own question-answer dictionary.	No particular shortcomings, apart from the risk that a human also does not know the answer to your question.	Bypassed using the regularly updated answer database, compiled for popular resources, that ships with XRumer. Recognition can also be outsourced to third parties.
JS tricks	Most modern spam programs cannot bypass such protection.	No particular drawbacks, but testing in different browsers is necessary, since browsers execute some JS differently.	Easily bypassed with a managed browser.
Server-side traps	You can implement your own cunning algorithm.	The effectiveness of the protection against a correctly behaving bot is questionable.	Easily bypassed with a managed browser.
Akismet	Able to "transparently" catch a significant part of the spam in comments.	Unable to protect registration forms or any other arbitrary forms.	With a managed browser and not very aggressive bot behavior, spamming is possible.
Disqus	Able to "transparently" catch a significant part of the spam in comments.	Not universal, suitable only for blog comments. Readers' comments cease to be part of your site's content from a search engine's point of view.	With a managed browser and not very aggressive bot behavior, advertising messages can be spammed.
KeyCAPTCHA	Very fun for users. Currently provides the strongest protection against bots.	Quite large in size. So far there are few CMS plugins, and the class for universal integration exists only in PHP.	Having studied how it works, I think automatic recognition or handing the task to third parties would be very difficult.

Source: https://habr.com/ru/post/107286/
