DIY CAPTCHA Development

Today, thanks to spam bots and the people behind them, it is practically impossible to do anything on the web without typing in the characters from a generated image, a CAPTCHA (hereinafter simply "captcha"). Its job is to keep a script from performing an action without human involvement. In this post I will show how to build such a captcha in PHP, and I will also refer to a very useful post by another Habr user that helps when designing the captcha image.

I should warn you right away that this post will probably interest only novice developers: in essence I am reinventing the wheel, but with my own hands.

Fundamental rules


When developing a captcha it is necessary to observe several basic rules:

1. A captcha is made for people.
It should be readable at a glance, but not at the expense of resistance to recognition. A vivid example of a captcha that breaks this rule is the image on the right.

2. The captcha generator should be strictly limited in the characters it uses.
A good example is the image at the beginning of the post. Of course, reCAPTCHA is a wonderful invention, but sometimes it asks for characters that are hard to find even in a character map. By the way, if the captcha uses Cyrillic characters, the generator should never use the letter "ё". I personally know a lot of people who have some system action bound to that key (tilde / ё).

3. A captcha must be resistant to recognition.
... but not at the expense of readability. This is the hardest part of the whole development. You have to find a middle ground: the captcha should be readable by people and, as far as possible, unreadable by bots. You also have to take into account the resource where the captcha will be used and its audience. If it is, say, a forum for book-loving housewives over forty, you can all but ignore resistance, because nobody will bother attacking it. If it is, for example, an imageboard, you need a truly eye-gouging captcha.

Design


As the theoretical target for our captcha we will take a spherical forum in a vacuum, with moderately aggressive, moderately intelligent and generally moderate users. A very useful post by the Habr user Pastafarianist will help us design it. Specifically, I will look at the strengths and weaknesses he lists for the captchas he examined.

So, let us go through, in order, what we can use:

1. The image should use at least several colors, preferably different ones every time.


The image above shows how this looks in practice. In fact, this is not a very reliable option, because the text contrasts strongly with the background. We will deal with the colors later.

2. There must be noise


A self-evident truth. Almost every captcha contains noise, most often a set of lines of different lengths and angles that cross the text.

3. Letters should be a short distance apart.


The main thing is not to overdo it: pushing the characters too close together badly hurts human readability. In the example above the letters stick together, which gets in the bot's way when it tries to segment the image.

4. The size of the characters must be different


If we use this trick, we must remember what the obstacle actually is: the bot can no longer segment the captcha with a fixed grid. So if we vary the character size, the size of each character must be chosen at random every time.

5. An ugly font


A very useful technique. Serifs, italics and decorative styling are excellent traps for a bot. A thin font also works very well in combination with line noise. If we set aside the first rule of captcha generation, we can use many fonts at once, for example a separate font for each character.

6. Characters at random angles.



A very effective way to protect yourself from bots. Again, it complicates segmentation, though not by much. It is best to choose a small range of angles, otherwise readability suffers badly (the letters start to overlap).

7. Dynamic distortion



Mankind has not yet invented anything worse. Distortion often makes a captcha much harder for people to read. It is certainly effective against bots, but it is just as effective against humans. The main thing is not to overdo it: the distortion should be slight.

So, here is what we will do:
- A contrasting background with noise
- Lines behind the text and lines over the text
- Text at a random position
- A random number of characters, from 4 to 7
- A random size for each character
- A random text color every time
- Characters that touch lightly
- Each character at a small random angle

Development


Let us define the tasks:
- Noise generation
- Text generation
- A form with the ability to refresh the captcha
- A handler for the submitted code
While writing this article I realized that distortions are completely out of place here. For those who still need them, there is a link at the end of the post to a tutorial on creating distortion.

Writing the form

<form action="go.php" method="post" enctype="multipart/form-data"> <!-- Send the form to go.php via POST -->
    <img src='captcha.php' id='capcha-image'> <!-- The captcha image -->
    <a href="javascript:void(0);" onclick="document.getElementById('capcha-image').src='captcha.php?rid=' + Math.random();">Refresh</a>
    <!-- Refreshes the captcha. The random rid makes the browser request captcha.php again. -->
    <span>Enter the code:</span>
    <input type="text" name="code">
    <input type="submit" name="go" value="Submit"> <!-- Submit the form to go.php -->
</form>

It is all quite clear, but just in case I commented some of the lines. Note that I did not set the maxlength attribute on the input field. Developers usually set it to the length of the captcha. First, that is a rather generous hint to a bot; second, our number of characters will be dynamic. That is it for the interface; time to write the generation script.

Writing the captcha code generator (random.php)

<?php
// Captcha code generator
function generate_code()
{
    $chars = 'abdefhknrstyz23456789'; // Allowed characters; change them to taste
    $length = rand(4, 7);             // Code length, random, from 4 to 7
    $numChars = strlen($chars);       // How many characters we can choose from
    $str = '';
    for ($i = 0; $i < $length; $i++) {
        $str .= substr($chars, rand(1, $numChars) - 1, 1);
    }

    // Shuffle the result, just in case
    $array_mix = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);
    srand((int)((float)microtime() * 1000000));
    shuffle($array_mix);

    // Return the finished code
    return implode("", $array_mix);
}
?>

Everything here is limited only by your imagination. I once saw a different approach: the current minute, hour and month were taken and multiplied together, 10 random characters were picked from the result, that was hashed with MD5 twice, 6 random characters were taken from the hash, and then everything was shuffled. By the way, note the characters I chose: I excluded i, l, 1 and 0, o, c, because they look too much alike and a user could confuse them in some situations. I named the generator random.php (the other scripts will include it).
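For readers on PHP 7 or later, the same idea can be sketched with random_int(), which draws from the system's cryptographically secure source instead of rand(). This is only an illustrative variant and not the code used in this post; the function name generate_code_secure() and its parameters are hypothetical.

```php
<?php
// Hypothetical variant of generate_code(); the name and the parameters are
// assumptions made for illustration and are not part of random.php.
function generate_code_secure(
    string $chars = 'abdefhknrstyz23456789',
    int $minLength = 4,
    int $maxLength = 7
): string {
    $length = random_int($minLength, $maxLength); // random code length
    $lastIndex = strlen($chars) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() is inclusive on both ends, so every index is valid
        $code .= $chars[random_int(0, $lastIndex)];
    }
    return $code;
}

echo generate_code_secure(), PHP_EOL; // prints a random code of 4 to 7 characters
```

No separate shuffle step is needed here, because every character is already drawn independently.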

Writing the image generator (captcha.php)

<?php
// Define img_dir, the folder with the backgrounds and fonts (absolute path)
define('DOCUMENT_ROOT', dirname(__FILE__));
define("img_dir", DOCUMENT_ROOT . "/captcha/img/");
// The root can also be taken from $HTTP_SERVER_VARS; in that case use the line below
// define("img_dir", "/captcha/img/");

// Include the code generator
include("random.php");
$captcha = generate_code();

// Session variant (if you use it, uncomment the matching lines in go.php)
// session_start();
// $_SESSION['captcha'] = $captcha;
// session_destroy();

// Put the hash of the code into a cookie that lives for 120 seconds
$cookie = md5($captcha);
$cookietime = time() + 120; // cookie lifetime
setcookie("captcha", $cookie, $cookietime);

// Image generation function
function img_code($code) // $code is the text that will be drawn on the image
{
    // Send the headers: forbid caching and declare a PNG image
    header("Expires: Mon, 26 Jul 1997 05:00:00 GMT");
    header("Last-Modified: " . gmdate("D, d M Y H:i:s", 10000) . " GMT");
    header("Cache-Control: no-store, no-cache, must-revalidate");
    header("Cache-Control: post-check=0, pre-check=0", false);
    header("Pragma: no-cache");
    header("Content-Type:image/png");

    // Number of lines. They are drawn twice (behind the text and over it). Random, from 3 to 7.
    $linenum = rand(3, 7);

    // Background images, stored in /img. Recommended size: 150x70. One is picked at random.
    $img_arr = array(
        "1.png"
    );

    // Fonts. Add as many as you like; one is picked at random.
    $font_arr = array();
    $font_arr[0]["fname"] = "DroidSans.ttf"; // Font file name; I use Droid Sans
    $font_arr[0]["size"] = rand(20, 30);     // Size in pt

    // Pick a random font and a random background
    $n = rand(0, sizeof($font_arr) - 1);
    $img_fn = $img_arr[rand(0, sizeof($img_arr) - 1)];
    $im = imagecreatefrompng(img_dir . $img_fn);

    // Draw the lines behind the text
    for ($i = 0; $i < $linenum; $i++) {
        $color = imagecolorallocate($im, rand(0, 150), rand(0, 100), rand(0, 150)); // random line color
        imageline($im, rand(0, 20), rand(1, 50), rand(150, 180), rand(1, 50), $color);
    }

    $color = imagecolorallocate($im, rand(0, 200), 0, rand(0, 200)); // Text color, random every time

    // Draw the text, one character at a time
    $x = rand(0, 35);
    for ($i = 0; $i < strlen($code); $i++) {
        $x += 15;
        $letter = substr($code, $i, 1);
        imagettftext($im, $font_arr[$n]["size"], rand(2, 4), $x, rand(50, 55), $color, img_dir . $font_arr[$n]["fname"], $letter);
    }

    // Draw more lines, this time over the text
    for ($i = 0; $i < $linenum; $i++) {
        $color = imagecolorallocate($im, rand(0, 255), rand(0, 200), rand(0, 255));
        imageline($im, rand(0, 20), rand(1, 50), rand(150, 180), rand(1, 50), $color);
    }

    // Output the image and free the memory
    ImagePNG($im);
    ImageDestroy($im);
}

img_code($captcha); // Generate and output the image
?>

The code is commented as thoroughly as possible. So that the handler can check the entered captcha, we store its hash in a cookie. I made arrays for the font and the background so that I could put a dozen fonts and a dozen backgrounds in them and have a random one picked each time. I chose Droid Sans as the font: it is thin and hard to spot among the noise. By the way, in this case I picked a terrible, ugly, high-contrast background; the demonstration is at the end of the post. To my surprise, this did not hurt human readability: the text stands out quite well, even though its color is chosen at random.
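The design list called for a random size for each character, while img_code() picks one size for the whole string. A minimal sketch of per-character parameters might look like the following. The helper plan_glyphs() is hypothetical, and the angle range and the 0.6 spacing factor are my own assumptions; the size and baseline ranges repeat the numbers used in captcha.php.

```php
<?php
// Hypothetical helper: plans size, angle and position for every character
// separately, so that each glyph can be drawn with its own parameters.
function plan_glyphs(string $code, int $startX = 15): array
{
    $plan = [];
    $x = $startX;
    for ($i = 0; $i < strlen($code); $i++) {
        $size = rand(20, 30);         // size in pt, same range as in captcha.php
        $plan[] = [
            'char'  => $code[$i],
            'size'  => $size,
            'angle' => rand(-10, 10), // assumed range; a small angle keeps the text readable
            'x'     => $x,
            'y'     => rand(50, 55),  // baseline, same range as in captcha.php
        ];
        // Assumed spacing: advance by about 0.6 of the font size so that
        // neighbouring characters end up close to each other
        $x += (int) round($size * 0.6);
    }
    return $plan;
}

// Each entry maps directly onto one imagettftext() call:
// imagettftext($im, $g['size'], $g['angle'], $g['x'], $g['y'], $color, $font, $g['char']);
print_r(plan_glyphs('abd2'));
```

Keeping the planning apart from the drawing makes it easy to tune the ranges without touching the GD calls.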

Writing the handler (go.php)

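<META http-equiv=content-type content="text/html; charset=UTF-8">
<?php
include("random.php");
// MD5 hash of the code, stored in the cookie by captcha.php
$cap = isset($_COOKIE["captcha"]) ? $_COOKIE["captcha"] : '';

// Captcha check function
function check_code($code, $cap)
{
    $code = trim($code); // just in case, strip the spaces
    $code = md5($code);  // the cookie holds a hash, so hash the input too

    // Session variant: if you enabled it in captcha.php, uncomment these lines
    // session_start();
    // $cap = $_SESSION['captcha'];
    // $cap = md5($cap);
    // session_destroy();

    // TRUE if the hashes match, FALSE otherwise
    return $code === $cap;
}

// Form handler
if (isset($_POST['go'])) { // a little insurance: go on only if the submit button was pressed
    // If the code was not entered (the POST field 'code' is empty)...
    if (!isset($_POST['code']) || $_POST['code'] == '') {
        exit("Error: enter the captcha!"); // ...stop with an error
    }
    // If the code is correct (the function returned TRUE)...
    if (check_code($_POST['code'], $cap)) {
        echo "The captcha was entered correctly."; // ...report success
    } else {
        // If the code is wrong...
        exit("Error: wrong captcha!"); // ...stop with an error
    }
} else {
    // If the script was opened directly, without the form...
    exit("Access denied"); // ...refuse to work
}
?>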

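It is all quite clear. We take the cookie written earlier by captcha.php, take the entered code, or rather its hash, and compare the two. Attention! If you plan to use this code, do not forget to change the verification algorithm: the cookie is controlled by the client, so a bot can send any code together with that code's MD5 hash and pass the check without ever looking at the image.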
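One way to change the check, sketched under my own assumptions: keep the expected code on the server (for example in $_SESSION, as in the commented-out session lines), allow a single attempt per generated code, and compare with hash_equals(). The function verify_captcha() and the array key 'captcha' are hypothetical; the storage is passed in by reference so that the sketch can be tried without a real session.

```php
<?php
// Hypothetical server-side check. $store stands in for $_SESSION.
function verify_captcha(array &$store, string $input): bool
{
    if (!isset($store['captcha']) || !is_string($store['captcha'])) {
        return false; // no captcha was issued, or it has already been used
    }
    $expected = $store['captcha'];
    unset($store['captcha']); // one attempt per code, right or wrong

    // hash_equals() compares in constant time and does no type juggling.
    // The input is lowercased because the alphabet has only lowercase letters.
    return hash_equals($expected, strtolower(trim($input)));
}

$store = ['captcha' => 'abd23'];
var_dump(verify_captcha($store, ' ABD23 ')); // bool(true)
var_dump(verify_captcha($store, 'abd23'));   // bool(false), the code was already used
```

In go.php this could be called as verify_captcha($_SESSION, $_POST['code']) after session_start(), provided captcha.php stores the plain code in the session and does not destroy it.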

Result



The result turned out quite well and fully met my expectations (please note that I did not aim to make an impenetrable captcha). You can test its resilience right here (I hope my server survives the Habr effect).
In fact, my code is just a template that you can tear apart as you please. Strictly speaking, none of this is necessary, since reCAPTCHA exists, but I personally enjoyed writing this script and read a mountain of documentation along the way. I hope someone learns something from this post :)

Footer


As promised, here is a very useful piece of code for distorting text, in case anyone finds it handy.

Please do not throw tomatoes :) This is my first technical post on Habr.
And yes, I have no idea how modern captchas work; I simply did not look for any information on the subject. If my methods look like amateur code to you, please tell me right away in a private message or in the comments, I will read it with pleasure :) The only drawback of my method that I can see is that the captcha cannot be passed if cookies are disabled in the browser.

Source: https://habr.com/ru/post/120615/

