
How it works: CAPTCHA

For as long as Habr has existed, posts about captcha have appeared on it regularly: a script for generating the image, a new captcha idea involving cats, and so on. The most recent example comes from an author who does not quite understand how a captcha should work (see the text of that post and its latest comments), yet shares those mistakes with the community. One gets the impression that captcha is terra incognita for most developers, both for those who simply bolt it onto the next form in the hope that it will work out of the box, and for those who invent captchas where you have to pick the picture with a cat from several photos.



This article contains useful information for those who run a captcha on their own server instead of relying on a third-party service such as reCaptcha.



And as a teaser: if you think that a captcha check like this will work:

if($_POST['captcha'] == $_SESSION['captcha']) return true; (example from practice)

then you are deeply mistaken.




Captcha



By definition, a captcha is a completely automated public Turing test: a test that a person can pass but a computer cannot. In this article I look at the properties of a captcha using its most common form, text in an image, as the example, although almost everything written here applies equally to any kind of captcha.



The two main properties of captcha



Any captcha should have two properties, without which it will not work:



Resistance to recognition protects a captcha from being solved by an algorithm, for example by a text recognition system. It ensures that a person can read the text in the image but a computer cannot.

Anti-example: the standard captcha of phpBB 2.x forums lacked this property. Because it was relatively easy to recognize, scripts appeared that spammed every such forum in sight, forcing webmasters to switch to more resistant captchas.



Resistance to guessing means that the captcha value cannot be guessed in a small number of attempts (fewer than 1000). If the set of possible captcha values is small, a program can easily pass the captcha by trying values instead of recognizing it.

Anti-example: an arithmetic captcha like “1 + 2” (trying the numbers from 1 to 20 quickly yields the answer).

Anti-example: choosing, from several pictures, the one that shows a cat (a random pick succeeds about once in as many tries as there are pictures).
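To put numbers on this property, here is a minimal sketch in Python. It assumes that every attempt faces a fresh captcha whose answer is equally likely to be any of the possible values and that the bot guesses at random; the function name and the sample sizes are illustrative, not taken from any library.

```python
def guess_success_probability(num_values: int, attempts: int) -> float:
    """Chance that at least one of `attempts` random guesses is correct,
    assuming a fresh captcha with `num_values` equally likely answers each time."""
    if num_values < 1 or attempts < 0:
        raise ValueError("num_values must be >= 1 and attempts >= 0")
    return 1.0 - (1.0 - 1.0 / num_values) ** attempts


if __name__ == "__main__":
    # Illustrative answer-space sizes (assumptions, not measurements).
    cases = {
        "pick the cat out of 6 pictures": 6,
        "arithmetic answer between 1 and 20": 20,
        "5 letters from a 26-letter alphabet": 26 ** 5,
    }
    for name, size in cases.items():
        p = guess_success_probability(size, 1000)
        print(f"{name}: {p:.6f}")
```

With 1000 attempts the first two cases are passed almost surely, while the five-letter text is still very unlikely to be guessed.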



Captcha Check



The value used for verification must be stored on the server and not sent to the browser along with the image. To match a visitor with the correct captcha value, use some key that is sent along with the captcha (session ID, captcha number, etc.).

Anti-example: if you send both the captcha and the value used to check it (even encrypted), a person only needs to solve such a captcha once and can then replay the pair “answer” / “check value” in a script (the post linked at the beginning of this article is exactly such a case).



Before checking the answer, make sure that it is not empty. Otherwise an attacker can skip loading the image, or delete the current session ID, then submit an empty value and pass the captcha, because two empty values will be compared (in PHP a missing array element evaluates to null, and the loose == comparison treats null and an empty string as equal).

Anti-example: the code already quoted above: if($_POST['captcha'] == $_SESSION['captcha']) return true;

Moreover, this code was written by an experienced programmer.
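The same trap can be reproduced outside PHP. The sketch below is in Python, with plain dictionaries standing in for the submitted form and the session (an assumption made for illustration); `naive_check` mirrors the flawed comparison and `safe_check` adds the emptiness test.

```python
def naive_check(form: dict, session: dict) -> bool:
    # Same flaw as the anti-example: two missing values compare equal.
    return form.get("captcha") == session.get("captcha")


def safe_check(form: dict, session: dict) -> bool:
    submitted = form.get("captcha")
    expected = session.get("captcha")
    # Reject outright if either side is missing or empty.
    if not submitted or not expected:
        return False
    return submitted == expected
```

With an empty form and an empty session, `naive_check` accepts the request while `safe_check` rejects it.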



After verification, the stored captcha value must be deleted. If you do not delete it, an attacker can reuse this value an unlimited number of times. Yes, the captcha is regenerated when the page with the form is reloaded (either when the form or when the image is generated), but a script does not have to load the form again. (This does not apply if the site uses one-time CSRF tokens for its forms.)

Anti-example: a hypothetical login form where it is enough to enter the captcha correctly once and then brute-force the password with a script that avoids triggering captcha regeneration on the server.
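A minimal Python sketch of one-time use, again assuming a plain dictionary in place of the session. The hypothetical `check_once` removes the stored value before comparing, so the same captcha cannot be checked twice whether the first answer was right or wrong.

```python
def check_once(form: dict, session: dict) -> bool:
    # pop() deletes the stored value whether or not the answer is right,
    # so every issued captcha can be checked at most once.
    expected = session.pop("captcha", None)
    submitted = form.get("captcha")
    if not submitted or not expected:
        return False
    return submitted == expected
```

Deleting on failure as well as on success also means a script gets only one guess per generated captcha.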



Bulletproof captcha



Protection against brute force. If your captcha is resistant to recognition but not very resistant to guessing (for example, it contains only 3-4 digits), it is advisable to limit the number of incorrect answers “per IP” / “per login” / etc. Such limits must be checked BEFORE the captcha itself (that is, while a limit is in effect, even a correctly entered captcha must not count as passed); otherwise they will not stop brute forcing.
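The order of the two checks can be sketched as follows in Python. The limit of 5 failures, the in-memory counter and the name `verify_with_limit` are assumptions for illustration; a real implementation would also expire the counters after some time.

```python
from collections import defaultdict

MAX_FAILURES = 5  # assumed limit, tune per site

# Hypothetical in-memory counter: client key (IP, login, ...) -> failed attempts.
_failures = defaultdict(int)


def verify_with_limit(client_key: str, submitted: str, expected: str) -> bool:
    # The limit is checked BEFORE the captcha: once a client is blocked,
    # even a correct answer is rejected.
    if _failures[client_key] >= MAX_FAILURES:
        return False
    if submitted and expected and submitted == expected:
        return True
    _failures[client_key] += 1
    return False
```

If the captcha were compared first, a script could keep guessing past the limit and its eventual correct guess would still be accepted.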



DoS protection. When you generate captchas on your own server, keep in mind that this is a convenient vector for DoS attacks (which, unlike DDoS, any schoolkid can mount). To protect yourself, you can limit the number of captchas generated per IP, cache captchas, and so on.
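One possible shape of the per-IP limit is a sliding-window counter, sketched below in Python. The class name `GenerationLimiter`, the default of 10 generations per 60 seconds and the in-memory storage are assumptions; the clock is injectable only so that the behaviour can be tested.

```python
import time
from collections import defaultdict, deque


class GenerationLimiter:
    """Allow at most `limit` captcha generations per client per `window` seconds."""

    def __init__(self, limit: int = 10, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client IP -> timestamps of recent requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Forget requests that have left the time window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller should skip the expensive image rendering
        hits.append(now)
        return True
```

The image generator would call `allow(ip)` first and render a new captcha only when it returns True.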



Protection against recognition. If you are choosing a captcha, or are about to write one yourself, it helps to understand which captchas are better protected against recognition. There are ready-made universal captcha recognition scripts that work on the OCR principle, and if your site is attractive to spammers, there is a risk that they will adapt or write a script specifically for your captcha. The latter, admittedly, applies mostly to sites on the scale of Yandex or VK, but it is advisable to at least provide protection against trivial OCR.



Protection against Antigate. Formally speaking, a captcha as a Turing test is not obliged to protect you from services like Antigate, because there it is a person who recognizes the captcha. From a practical point of view, though, the question is highly relevant and you have to defend yourself somehow.

There is no “gold standard” and there cannot be one (because such services would then simply add support for it), so you are free to supplement the captcha with any tricks that make solving it through Antigate impossible. For example:

- a non-standard captcha (assembling a puzzle, rotating an image, clicking an area in a photo, etc.);

- a Cyrillic captcha: the simplest solution, but with a number of drawbacks: it only suits projects with a Russian-speaking audience, and some such services already support Cyrillic;

- a virtual keyboard next to the captcha for entering non-standard characters or shapes (this may be inconvenient for mobile users).



Usability



Do not ask for a captcha again if you have already established that you are dealing with a person. Be careful, however, that a script cannot then use the form an unlimited number of times after a person has entered the captcha once.

Example: a registration form. If I am registering somewhere and forgot to fill in the “zip code” field but entered the captcha correctly, there is no need to show me a new one. Spend ten minutes on storing, somewhere on your side, the fact that a live person is currently filling in this particular form.
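One way to store that fact without opening the form to scripts is to bind it to the specific form, give it a lifetime and cap the number of captcha-free submissions. The Python sketch below assumes a dictionary in place of the session; the names `mark_human` and `may_skip_captcha` and both limits are hypothetical.

```python
import time
from typing import Optional

HUMAN_TTL = 600.0  # assumed: remember the person for 10 minutes
MAX_RESUBMITS = 5  # assumed cap on captcha-free submissions


def mark_human(session: dict, form_name: str, now: Optional[float] = None) -> None:
    """Call after a correct captcha: a person is filling in this form."""
    now = time.time() if now is None else now
    session["human:" + form_name] = {"until": now + HUMAN_TTL, "left": MAX_RESUBMITS}


def may_skip_captcha(session: dict, form_name: str, now: Optional[float] = None) -> bool:
    """True if this submission may go through without a new captcha."""
    now = time.time() if now is None else now
    key = "human:" + form_name
    mark = session.get(key)
    if not mark or now > mark["until"] or mark["left"] <= 0:
        session.pop(key, None)  # expired or used up: require a captcha again
        return False
    mark["left"] -= 1  # every captcha-free submission uses up one credit
    return True
```

Once the form has been submitted successfully, the mark should be removed as well, so that the next registration starts with a captcha again.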



To make recognition easier for people: do not mix letters and digits in the same captcha, do not mix uppercase and lowercase letters, and exclude characters that look alike.
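These rules translate directly into the choice of alphabet. In the Python sketch below, the set of look-alike letters is an assumption that depends on the font and distortion in use, and `make_captcha_text` is a hypothetical helper.

```python
import secrets
import string

# Assumed set of look-alike letters to exclude; adjust it to the font in use.
CONFUSABLE = set("ilo")
# Lowercase letters only: no digits and no mixed case.
ALPHABET = "".join(c for c in string.ascii_lowercase if c not in CONFUSABLE)


def make_captcha_text(length: int = 6) -> str:
    """Random captcha text drawn from the reduced alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

Even with the reduced alphabet, six characters leave far more possible values than the 1000-attempt threshold mentioned earlier, so readability does not have to cost resistance to guessing.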



Doing without captcha



The best captcha is no captcha. Wherever you can do without it, you should. You may need to implement additional limits and checks, but users will thank you.

But here you must be very careful. For example: a registration form without a captcha, with an email field to which an activation email is sent. Without additional protection, a script can flood such a form with bogus addresses, and your site will end up on the blacklists of email providers. In this case you can do without a captcha, but only if you have another line of defense, such as a per-IP limit.



To some, the information in this post will seem obvious, but if I had not run into a lack of understanding of these simple principles in real life, including among experienced fellow developers, I would not have spent time writing this text.

Source: https://habr.com/ru/post/175461/


