
This article contains nothing fundamentally new, but some readers may find the view of an experienced (now former) spammer interesting. The idea of writing for Habr came to me long ago, after reading this article, which only made me smile.
It so happened that a long time ago, back at university, I was looking for remote work: it is hard to survive on a stipend, let alone afford any entertainment. A friend invited me to the company where he worked. The job was simple: leave links to certain sites wherever possible, attaching a message to them. In other words, ordinary spam. As you can see, no captcha, cookie check, or JavaScript check will save the day here, since a real person registers through a regular browser. The situation with automated spam is similar when the scripts drive real browser engines. However, there are ways to make spammers' lives harder and, if not eliminate the flow of spam to your site, then significantly reduce it.
When I wrote that I am a former spammer, I did not mean that I completely changed my line of work. These days I am the author of one of the browser automation systems, which I will not name so as not to be accused of advertising. Some of the situations below will be connected with this program in one way or another, since most protection-circumvention methods are solved at the engine level, after which such protection ceases to be protection once and for all (I apologize for the tautology).
Protecting your site from ending up in a resource database
A resource database is a huge dump containing a list of sites where a link can potentially be placed. Such databases fuel all the spamming programs that try to do their dirty work on them. The success rate is not high, roughly 5-10% depending on the quality of the database, but with a million sites in it (and that is not even a large database) you can get 50,000-100,000 links, which is already a lot. Naturally, this is not done in one go, since the site would instantly land in the search engines' penalty box, but it lets you gradually build links for more than one site at a time. That, however, is a separate question; back to the topic.

The most popular and logical way to find sites for a database is clever use of the very same search engines. Sites are located by the footprints (distinctive features) of their platforms. Say a spammer once managed to leave a link on a site running WordPress. He decides to add as many similar sites as possible to his resource database, opens Google, and enters something like "inurl:wp-login.php", after which the results can safely go into his database (all of this is automated, of course). If he needs a specific niche, for example real estate, the query becomes "real estate inurl:wp-login.php". Adding such keywords also helps sidestep the search engine's cap on the maximum number of returned results.

Hence the logical conclusion: get rid of footprints. For popular CMSs there are plenty of plugins that make "pretty links": use them. Also remember to remove "Powered by blah-blah" from the site footer, and if you want to give the developers credit, change the wording or replace it with an image.
Of course, spammers have more sophisticated ways of compiling databases, but reviewing them would take an article of its own. If readers show interest in the topic, I may write about all of this in much more detail.
Protection from linkbuilders

Linkbuilders are people whose job is to produce quality spam. Their distinguishing feature is that they get through practically everywhere that, in theory, only a human should get through. A linkbuilder's productivity is much lower than a program's, about 40 links per day. They try to answer on-topic questions in the comments, which often makes them hard to identify.
The most radical step against them is to introduce message moderation. It can be softened with rules such as "moderate only messages containing links" or "moderate users with fewer than 5 messages". I would also add moderation by registration date: a spammer is unlikely to remember your site after 5 days or so.
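To make the triggers concrete, here is a minimal sketch of such rules; the `User`/`Message` shapes and the thresholds are placeholders of mine, not a prescription:

```typescript
// Hypothetical shapes: adapt to whatever your forum actually stores.
interface Author {
  registeredAt: Date;
  approvedMessageCount: number;
}

interface Message {
  author: Author;
  text: string;
}

const LINK_RE = /https?:\/\/|www\./i;
const MIN_APPROVED_MESSAGES = 5;
const MODERATED_ACCOUNT_AGE_DAYS = 5;

// True if the message should be held for moderation
// instead of being published immediately.
function needsModeration(msg: Message, now = new Date()): boolean {
  const ageDays =
    (now.getTime() - msg.author.registeredAt.getTime()) / 86_400_000;
  return (
    LINK_RE.test(msg.text) ||                             // contains a link
    msg.author.approvedMessageCount < MIN_APPROVED_MESSAGES ||
    ageDays <= MODERATED_ACCOUNT_AGE_DAYS                 // freshly registered
  );
}
```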
If any moderation at all is unacceptable to you, you can often stop the spammer at the registration stage. Most spammers use so-called temporary (disposable) mail services. The most famous and popular of them is Mailinator. Its popularity stems from the thousands of domains pointing at it that are not listed in public sources, and you can also add your own. That is, the spammer buys (or registers for free) a domain, points it at Mailinator, and gets his own mail server, which he can use for spam until the domain lands in spam databases. Such domains can be detected, however, because their MX record points at the disposable-mail service, whereas the service's own native domains have MX records that do not point at it directly but redirect to it when a GET request is made. It does not hurt to check whether a mail domain belongs to such a service. If you do not want extra load on your servers, do these checks in JavaScript, although that lowers the level of protection; then again, linkbuilders are rarely bright, and it is already something if they even know the word.
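Here is a minimal server-side sketch of the MX check using Node's resolver; the disposable-host list is a tiny illustrative sample, nowhere near complete:

```typescript
import { resolveMx } from 'node:dns/promises';

// Illustrative sample only; real lists contain thousands of entries.
const DISPOSABLE_MX_HOSTS = ['mailinator.com', 'mailinator2.com'];

async function usesDisposableMail(email: string): Promise<boolean> {
  const domain = email.split('@')[1];
  if (!domain) return true; // malformed address: reject
  try {
    const records = await resolveMx(domain);
    return records.some((r) =>
      DISPOSABLE_MX_HOSTS.some((bad) =>
        r.exchange.toLowerCase().endsWith(bad),
      ),
    );
  } catch {
    return true; // no MX record at all: nothing legitimate to deliver to
  }
}

// usage: if (await usesDisposableMail(form.email)) reject the registration
```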
And, of course, I must not forget to mention "blacklist" services. They check IPs and email addresses against their spam databases, connect via a simple API, and provide good basic protection.
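StopForumSpam is one well-known example of such a service; here is a sketch of a lookup against its public API (the response shape follows its documentation as I recall it, so verify before relying on it):

```typescript
// Fields returned per checked value, per the StopForumSpam docs.
interface SfsEntry {
  appears: number;   // 0 or 1
  frequency: number; // how often it has been reported
}

async function isListedSpammer(ip: string, email: string): Promise<boolean> {
  const url =
    `https://api.stopforumspam.org/api?json` +
    `&ip=${encodeURIComponent(ip)}&email=${encodeURIComponent(email)}`;
  const res = await fetch(url);
  const data = (await res.json()) as {
    success: number;
    ip?: SfsEntry;
    email?: SfsEntry;
  };
  if (!data.success) return false; // service error: fail open
  return Boolean(data.ip?.appears || data.email?.appears);
}
```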
It is also quite important not to show the spammer that you know he is a spammer. If your triggers have identified one, there is no need to announce in big red letters what a bad thing he is doing: he will simply change the proxy / mail and try again. A reasonable solution is to display a pseudo service message, for example "Maintenance in progress, registration is temporarily disabled", or to simulate some kind of 500 error (Internal Server Error).
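A sketch of the idea for an Express-style endpoint, where `isSuspicious` stands in for whatever trigger checks you wired up above:

```typescript
import express, { Request } from 'express';

const app = express();
app.use(express.urlencoded({ extended: false }));

// Stub standing in for the trigger checks described above.
async function isSuspicious(req: Request): Promise<boolean> {
  return false; // wire blacklist / MX / moderation triggers in here
}

app.post('/register', async (req, res) => {
  if (await isSuspicious(req)) {
    // Say nothing about spam: pretend the site is simply broken.
    res.status(500).send('Internal Server Error');
    return;
  }
  // ...the normal registration flow goes here...
  res.send('Registered');
});

app.listen(8080);
```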
Spam protection
This topic splits into two likely scenarios: your site is merely one entry in the spammer's vast database, or your site is the specific target. In the first case everything is quite simple: just remove the footprints, tweak the registration page a little (more on that later), and most likely nobody will get through. Protecting against mass registration, when the spammer needs many accounts on your site in particular, is much harder.

The article I referred to at the beginning contains a grain of common sense, but only a grain. It has long been known that a captcha is no guarantee of protection against bots: there are plenty of services where all your captchas are solved by cheap human workers for pennies. But removing the captcha from the registration page is still a very poor decision, since registration then becomes not only completely free for the spammer but also incredibly fast (a human-solved captcha takes about 30 seconds on average). As for hidden fields, my program, for example, has long accounted for them: in automatic mode it only fills in visible fields with non-zero height and width. So even spammers who know nothing about the trick bypass this "protection" with ease.

Besides the methods described earlier, and besides phone validation (which I do not consider at all, for obvious reasons), I can recommend only one approach that is genuinely hard to work around: a captcha that requires performing an action, such as assembling a simple puzzle. A good example of such a captcha is PlayThru.
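To illustrate the hidden-field point above: all an automation engine needs in order to step around invisible honeypot fields is a visibility check like this (a sketch of the idea, not a quote from any real engine):

```typescript
// What a browser automation engine typically does before typing:
// skip any input that a human visitor could not actually see.
function isFillable(el: HTMLInputElement): boolean {
  const style = getComputedStyle(el);
  return (
    el.type !== 'hidden' &&
    el.offsetWidth > 0 &&       // zero width => invisible honeypot
    el.offsetHeight > 0 &&      // zero height => invisible honeypot
    style.display !== 'none' &&
    style.visibility !== 'hidden'
  );
}
```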
It also does not hurt to customize the registration page itself. An ordinary user is unlikely to read the page source, so why give the fields (I mean the "name" and "id" attributes) such obvious names as "username" or "email"? Any self-respecting program will handle a form like that fully automatically. The desirable minimum is to make these attributes unreadable, or to deliberately swap them (for example, name the login field "email"). But you can, and should, go further and make the attributes dynamic. How, you ask, will the server handle dynamic parameter names? I suggest a hidden input carrying a key, with an encryption algorithm known only to you. The sequence is as follows:
The server generates a random key -> the server encrypts the field names with it -> the user receives a page whose inputs carry encrypted name attributes plus a hidden input with the key -> on form submission, the server decrypts the attributes and determines which field is responsible for what.
But for this approach to work 100%, the fields must be of identical size and type and displayed in random order on the page, and preferably you should add one or more honeypot fields ("leave this field blank"). The finishing touch is to use images of identical size with dynamic addresses instead of text labels.
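A minimal stateless sketch of the whole scheme, using an HMAC as the "encryption" of the names; the secret, the field list, and the hidden input name `k` are all placeholders of mine:

```typescript
import { createHmac, randomBytes } from 'node:crypto';

// Placeholders: a real secret must be long, random, and kept server-side.
const SERVER_SECRET = 'replace-with-a-long-random-secret';
const REAL_FIELDS = ['username', 'email', 'password'];

// Derive an opaque, per-request name for each real field.
function fieldName(key: string, real: string): string {
  return (
    'f_' +
    createHmac('sha256', SERVER_SECRET + key)
      .update(real)
      .digest('hex')
      .slice(0, 16)
  );
}

// When rendering the form: one fresh key per page load.
function renderNames(): { key: string; names: Record<string, string> } {
  const key = randomBytes(8).toString('hex');
  const names = Object.fromEntries(
    REAL_FIELDS.map((f) => [f, fieldName(key, f)]),
  );
  // Put `key` into a hidden input named "k"; use `names` as the
  // name attributes of the visible inputs (rendered in random order).
  return { key, names };
}

// When handling the submission: recover which input was which.
function decodeForm(body: Record<string, string>): Record<string, string> {
  const key = body['k'] ?? '';
  const result: Record<string, string> = {};
  for (const real of REAL_FIELDS) {
    result[real] = body[fieldName(key, real)] ?? '';
  }
  return result;
}
```

Because the mapping is recomputed from the key, the server does not need to store per-session state for the form.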
You can go even further and not put the key value into the hidden input directly, but have it produced by a JavaScript function dynamically generated on the server. This nullifies any attempt to register using programs that blindly send POST requests.
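For example, the server could emit a small throwaway script instead of a static hidden value. A deliberately minimal sketch:

```typescript
// Server side: split the key into parts and emit a script that
// reassembles it in the browser and writes it into the hidden input.
function keyScript(key: string): string {
  const mid = Math.floor(key.length / 2);
  const a = JSON.stringify(key.slice(0, mid));
  const b = JSON.stringify(key.slice(mid));
  return `<script>
    document.getElementById('k').value = ${a} + ${b};
  </script>`;
}
// A bot that only replays POST requests never runs this script,
// so it submits an empty key and the field mapping fails.
```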
Of course, all of this is complicated, but it will protect your site well with little or no inconvenience to real users.
What about the login form?
It all depends on the site. Many people do not understand why Habr makes you enter a captcha every single time you log in. I agree that it is bad form, but it is one of the only two ways I know to protect user accounts. Such measures belong wherever an account can carry privileges that should not be available to everyone by default: on Habr it is invites, on other sites it is money or linked digital purchases.

Any account can be hijacked, no matter how complex your password is. This is done by brute force driven by purchased databases of email:password pairs, and such databases sell by the millions of lines for ridiculous money. As far as I know, they are obtained by dumping various forums through SQL injection vulnerabilities. Think how many accounts you can crack with a database of real email / login and password combinations. To protect yourself, you should ideally use a different password on every site; then, if one site's database leaks, nothing else of yours is at risk. To remember them all you can, for example, append a couple of letters from the site's name to a single master password. But that is a different topic entirely; back to ours.

Brute forcing works by sending POST requests to the login endpoint, and if there is no obstacle like a captcha, it goes very fast and, moreover, costs nothing. Add a captcha, and it takes ten times longer, and running through a database of 1 million lines costs about $1,400. But, as I wrote above, there is another captcha-free way to protect the site, and oddly enough it seems more reliable to me: bind the account to a specific browser and/or IP, and when the user logs in from another computer, ask him a secret question. I have already seen such implementations, and perhaps the Habr developers should build something similar. As an extra layer you can apply the same encryption-key trick, after which writing a brute force program for the site is theoretically possible, but extremely difficult.
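A sketch of the binding idea; the fingerprint inputs and the `knownDevices` storage are placeholders, and a real implementation would hash more signals than IP and user-agent:

```typescript
import { createHash } from 'node:crypto';

// Placeholder: combine whatever stable signals you collect at login.
function deviceFingerprint(ip: string, userAgent: string): string {
  return createHash('sha256').update(ip + '|' + userAgent).digest('hex');
}

// knownDevices would live in your user table, one list per account.
function loginCheck(
  knownDevices: string[],
  ip: string,
  userAgent: string,
): 'ok' | 'ask-secret-question' {
  const fp = deviceFingerprint(ip, userAgent);
  return knownDevices.includes(fp) ? 'ok' : 'ask-secret-question';
}
```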
In closing
I have very rarely seen sites that take protection truly seriously, despite their loud claims. Accounts at giants like Twitter, Skype, Sony, and PayPal are stolen very easily, as the extensive market for selling them attests. Hijacking accounts at, say, VK or Google is much harder. If a site has no mandatory mobile phone confirmation, writing an auto-registration bot is a matter of a couple of hours; nobody goes further than adding hidden fields. And in vain: I can only imagine the terrifying sums they spend on manual cleanup.
UPD: As user evocatus suggested, there is one more way to detect brute forcing regardless of proxy and user-agent: check and compare the installed plugins and fonts.
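A rough client-side sketch of what such a fingerprint might collect (navigator.plugins is frozen or empty in today's browsers, so treat this as an illustration of the idea):

```typescript
// Runs in the browser; the result is sent along with the login request
// and compared server-side against the fingerprint seen at registration.
function browserFingerprint(): string {
  const plugins = Array.from(navigator.plugins).map((p) => p.name);

  // Crude font probe: render the same string in several fonts and record
  // the widths. Which fonts are installed (and how they render) varies
  // between machines, so the widths differ too.
  const probe = document.createElement('span');
  probe.textContent = 'mmmmmmmmmmlli';
  probe.style.cssText = 'position:absolute;visibility:hidden;font-size:72px';
  document.body.appendChild(probe);
  const widths: Record<string, number> = {};
  for (const font of ['Arial', 'Courier New', 'Times New Roman', 'Georgia']) {
    probe.style.fontFamily = `'${font}', monospace`;
    widths[font] = probe.offsetWidth;
  }
  probe.remove();

  return JSON.stringify({ plugins, widths });
}
```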