Preface:
A friend of mine runs an online store, and now and then I take on rather unusual programming tasks for him. It all started when he decided that his customers would find it very convenient to be told where their parcel currently is (he ships goods from the store by post across Russia). Fortunately, the Russian Post website has a tracking feature.

On that page you simply enter the tracking number, and the information about the parcel appears on the screen as a neat table. Without a second thought, I armed myself with curl and within a couple of minutes had a lightweight script that parsed this information and extracted the parcel's latest location. When the status read “Arrived at the delivery point” or “Presentation to the addressee”, the script sent the buyer a text message saying the package could be picked up.
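The notification rule boils down to "look at the newest row of the tracking table and text the buyer if it has one of two statuses". A minimal Python sketch of that rule follows; it is not the author's PHP script. It assumes the table has already been parsed into (date, status) rows ordered oldest first, and the helper names and sample rows are hypothetical.

```python
# Statuses that should trigger an SMS to the buyer (as named in the post).
NOTIFY_STATUSES = {
    "Arrived at the delivery point",
    "Presentation to the addressee",
}


def latest_status(rows):
    """Return the status of the newest row, or None if the table is empty.

    Assumes `rows` is a list of (date, status) tuples, oldest first
    (a hypothetical shape for the parsed tracking table).
    """
    if not rows:
        return None
    return rows[-1][1]


def should_notify(rows):
    """True if the parcel's latest status warrants a text message."""
    return latest_status(rows) in NOTIFY_STATUSES


# Hypothetical parsed history, for illustration only.
history = [
    ("day 1", "Accepted for shipment"),
    ("day 5", "Arrived at the delivery point"),
]
print(should_notify(history))  # True
```

Only the latest row matters here, so an earlier "Arrived" entry followed by a later status does not trigger a message.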
I hadn't even had time to drink away the fee for the script when strange things started happening at the Post. My script stopped working because the Russian Post website had put up a tricky block: when the session was empty, it redirected the page in such a way that my script got stuck in a loop. Incidentally, even an ordinary visitor can't get into their site on the first try.
The solution was to make the script follow the site's redirect (CURLOPT_FOLLOWLOCATION); for credibility, I also filled in CURLOPT_REFERER and CURLOPT_USERAGENT. After the first connection, the request could be re-sent, and the script went back to fetching tracking information as normal. For these not-so-tricky manipulations I got a bonus and moved on to other projects with peace of mind.
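For readers who want to try the same trick outside PHP, here is a rough Python equivalent of those curl settings using only the standard library: `build_opener` follows redirects by default (like CURLOPT_FOLLOWLOCATION), and the Referer and User-Agent headers are set explicitly. It also keeps cookies in a `CookieJar`, which is my own assumption: the post doesn't say how the session survived between the first and second request. The function names and header values are hypothetical placeholders.

```python
import http.cookiejar
import urllib.request


def make_session(referer, user_agent):
    """Build an opener that follows redirects, keeps cookies and sends
    Referer/User-Agent headers (roughly CURLOPT_FOLLOWLOCATION,
    CURLOPT_REFERER and CURLOPT_USERAGENT)."""
    jar = http.cookiejar.CookieJar()
    # build_opener() installs HTTPRedirectHandler by default, so 3xx
    # responses are followed automatically; the cookie processor keeps
    # whatever session cookie the site hands out along the way.
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent), ("Referer", referer)]
    return opener, jar


def fetch_with_retry(opener, url, timeout=10):
    """Request the page once to establish the session, then request it
    again and return the second response body as text."""
    with opener.open(url, timeout=timeout) as first:
        first.read()  # discarded: this round trip only sets up the session
    with opener.open(url, timeout=timeout) as second:
        return second.read().decode("utf-8", errors="replace")
```

Usage would be `opener, jar = make_session(referer_url, ua_string)` followed by `fetch_with_retry(opener, tracking_url)`; the URLs are placeholders, since the post doesn't give the tracking address.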
Chapter 1 - A Stab in the Back
A month after I handed over the script, the postal workers struck back by putting a simple captcha on that very form. Once again I was asked for help.
By then I knew that PHP can read an image pixel by pixel, which means a script can be taught to see and, more importantly, to understand what a picture shows. Unfortunately, I had never done this before, but the task was clearly defined and I had grown attached to the script. Besides, the script had cut order refusals by 60%, which is very good money, and it would be silly, to say the least, to give up such a feature.
Chapter 2 - Preparing for Battle
First of all, I examined the script that serves this captcha.

I noticed that $_GET['Id'] contains some strange numbers. Unfortunately, I couldn't find any pattern in them, but I did find out that the same picture stays available at the same address for only 2 minutes.
Never mind: the captcha is quite simple, with no noise and in a single color.
To start, I saved about 20 different captchas (with different digits). It turned out that the script that renders them varies not only the digits' x and y coordinates but also their size (from 1 to 4 pixels), so I had to teach my script to recognize about 40 different digit shapes.
Now, having sized up the amount of work, let's start coding.

Our captcha is 70 px wide and 23 px high. We loop over the entire picture, determine the color of each pixel (white = 0, anything else = 1) and store the result in an array. Then, to check that everything is correct, in the next loop I call a function that draws a table and paints each cell the corresponding color.
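A small Python sketch of this step, under my own assumptions: the image has already been decoded into rows of (r, g, b) tuples (in PHP that decoding would typically go through GD's `imagecolorat`; it is left out here to stay dependency-free), and instead of an HTML table the grid is printed as text, which serves the same debugging purpose. The function names are hypothetical.

```python
WHITE = (255, 255, 255)


def binarize(pixels, white=WHITE):
    """Turn rows of (r, g, b) tuples into rows of 0/1: white = 0, anything else = 1."""
    return [[0 if px == white else 1 for px in row] for row in pixels]


def render(grid):
    """Text stand-in for the debugging table: '#' for 1, '.' for 0."""
    return "\n".join("".join("#" if v else "." for v in row) for row in grid)


# A tiny 3x3 example instead of the real 70x23 picture.
B = (0, 0, 0)
sample = [
    [WHITE, B, WHITE],
    [B, B, WHITE],
    [WHITE, B, WHITE],
]
grid = binarize(sample)
print(render(grid))
# .#.
# ##.
# .#.
```

Comparing against pure white works here only because the captcha is single-colored with no noise; an anti-aliased image would need a brightness threshold instead.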

Let's check.

As you can see, everything seems to work. Now I had to figure out how the script would recognize the digits in the array and interpret it as a picture. Perhaps I was reinventing the wheel, but I found it more interesting to come up with the logic myself, without drawing on other sources, which could only have confused me.
After a few cups of coffee, I decided to give the script a reference point and have it work out which digit is drawn from which of the nearby pixels are filled in.

So, taking a single pivot point (here, the top of the digit 1), I counted off several pixels along the X and Y axes, and if they were black, the script declared the digit a 1. When I ran the test, I saw that the script also called 3, 4, 7 and 9 a 1, so clearly more checks were needed. I added 9 test points for each digit, and 3 hours later I launched the script on a captcha showing the number 70039.
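Here is a minimal Python sketch of the pivot-and-test-points idea. The post doesn't show the actual PHP, so the details are my assumptions: the input is the 0/1 array described above containing a single digit, the pivot is the topmost black pixel (leftmost on ties), and each digit's template is a list of (dx, dy) offsets from the pivot that must all be black. The names `find_pivot`, `matches` and `recognize` are hypothetical.

```python
def find_pivot(grid):
    """Return (x, y) of the topmost black pixel (leftmost on ties), or None."""
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            if v:
                return x, y
    return None


def matches(grid, pivot, offsets):
    """True if every (dx, dy) offset from the pivot lands on a black pixel."""
    px, py = pivot
    h, w = len(grid), len(grid[0])
    for dx, dy in offsets:
        x, y = px + dx, py + dy
        if not (0 <= x < w and 0 <= y < h) or not grid[y][x]:
            return False
    return True


def recognize(grid, templates):
    """Return the first digit whose test points all match, or None."""
    pivot = find_pivot(grid)
    if pivot is None:
        return None
    for digit, offsets in templates.items():
        if matches(grid, pivot, offsets):
            return digit
    return None


# Hypothetical template for "1": the stem below the top plus the flag and base.
templates = {"1": [(0, 1), (0, 2), (0, 3), (0, 4), (-1, 1), (-1, 4), (1, 4)]}
one = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]
print(recognize(one, templates))  # 1
```

With only two or three offsets, a template like this would also fit other digits that share a vertical stroke, which is exactly the 3/4/7/9 problem described above; more test points make each template more specific.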

And lo and behold, a miracle! The script confidently solved the first captcha (knowing only the digits 7, 0, 3 and 9). To be sure, I loaded another captcha with the same digits, but unfortunately the script failed because the digits differed in height. Glancing at the clock, I decided that I needed more reference points and some way to automate the training.
Since I know JavaScript as well as PHP, I wrote a function that adds a cell's coordinates to an array when I click on it, which let me mark as many reference points for checking as I wanted.
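The author's click-to-mark tool was written in JavaScript and its code isn't shown. To keep all examples in one language, here is a Python sketch of just the bookkeeping part: converting the cells you clicked into offsets relative to the pivot, so the same template works wherever the digit appears. Using relative offsets and the topmost black pixel as the pivot are my assumptions, and the function names are hypothetical.

```python
def topmost_black(grid):
    """(x, y) of the first black pixel, scanning rows top to bottom, left to right."""
    for y, row in enumerate(grid):
        for x, v in enumerate(row):
            if v:
                return x, y
    raise ValueError("grid contains no black pixels")


def offsets_from_clicks(grid, clicks):
    """Turn clicked (x, y) cells into a sorted list of unique (dx, dy)
    offsets from the pivot. White cells are skipped, since a reference
    point is meant to be black; repeated clicks are counted once."""
    px, py = topmost_black(grid)
    picked = {(x - px, y - py) for x, y in clicks if grid[y][x]}
    return sorted(picked)


grid = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
]
clicks = [(1, 1), (0, 1), (2, 2), (1, 2), (1, 1)]
print(offsets_from_clicks(grid, clicks))  # [(-1, 1), (0, 1), (0, 2)]
```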

Things went faster. Teaching the script one digit took less than a minute, and within an hour it knew every digit Russian Post used to generate its captcha.


The reference points for each digit are stored neatly in a separate file, which can be extended if needed.
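The post doesn't say what format that file uses. As one possibility, here is a Python sketch that keeps the templates in a JSON file (the layout and function names are hypothetical), storing several templates per digit so that each rendered size can get its own, and appending new ones as training continues.

```python
import json
from pathlib import Path


def load_templates(path):
    """Load {digit: [template, ...]}, where each template is a list of (dx, dy).

    Returns an empty dict if the file doesn't exist yet. JSON object keys
    are always strings, so digits are stored as "0".."9".
    """
    p = Path(path)
    if not p.exists():
        return {}
    raw = json.loads(p.read_text(encoding="utf-8"))
    return {d: [[tuple(pt) for pt in t] for t in ts] for d, ts in raw.items()}


def add_template(path, digit, offsets):
    """Append one more template for `digit` and write the whole file back."""
    templates = load_templates(path)
    templates.setdefault(str(digit), []).append([tuple(pt) for pt in offsets])
    # json.dumps writes tuples as lists; load_templates turns them back.
    Path(path).write_text(json.dumps(templates), encoding="utf-8")
    return templates
```

Rewriting the whole file on every addition is fine at this scale (a few dozen templates); it also keeps the file human-readable if a point ever needs fixing by hand.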
Chapter 3 - Strike Back
Going back to the Russian Post website and downloading a few more captchas to check, I confirmed that the script guessed them correctly with 100% accuracy. Not bad for a first attempt!

Even I was less attentive than my script.


The result was a 45 KB PHP script that took a captcha id from the Russian Post website and returned the code shown in the picture. I easily hooked my anti-captcha up to the earlier script (the parser), and everything worked again!
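Putting the pieces together: the 70 by 23 picture holds several digits, so they have to be read left to right. The post doesn't describe how the author located each digit, so the Python sketch below makes an explicit assumption of its own: digits are separated by at least one empty column, which lets it cut the picture into vertical slices and apply the pivot-and-test-points check to each slice. When several templates fit, it keeps the one with the most test points, because a small template also fits larger digits. All names are hypothetical, and `templates` maps each digit to a list of (dx, dy) templates.

```python
def split_columns(grid):
    """Split the 0/1 grid into (start, end) column ranges separated by
    empty columns. Assumes digits never touch each other."""
    width = len(grid[0])
    filled = [any(row[x] for row in grid) for x in range(width)]
    ranges, start = [], None
    for x, f in enumerate(filled):
        if f and start is None:
            start = x
        elif not f and start is not None:
            ranges.append((start, x))
            start = None
    if start is not None:
        ranges.append((start, width))
    return ranges


def pivot_in(grid, x0, x1):
    """Topmost (then leftmost) black pixel within columns x0..x1-1."""
    for y, row in enumerate(grid):
        for x in range(x0, x1):
            if row[x]:
                return x, y
    return None


def fits(grid, pivot, offsets):
    """True if every offset from the pivot is inside the grid and black."""
    px, py = pivot
    h, w = len(grid), len(grid[0])
    return all(0 <= px + dx < w and 0 <= py + dy < h and grid[py + dy][px + dx]
               for dx, dy in offsets)


def solve(grid, templates):
    """Read digits left to right; '?' marks a slice no template fitted."""
    out = []
    for x0, x1 in split_columns(grid):
        pivot = pivot_in(grid, x0, x1)
        best, best_len = "?", -1
        for digit, tmpls in templates.items():
            for t in tmpls:
                # Prefer the most specific (largest) template that fits.
                if len(t) > best_len and fits(grid, pivot, t):
                    best, best_len = digit, len(t)
        out.append(best)
    return "".join(out)
```

Emitting "?" instead of guessing lets the calling script simply fetch a fresh captcha when recognition fails.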
The whole thing took about 8 hours and 10 cups of coffee. My friend was extremely pleased and once again paid me a bonus for it.
Epilogue
I am sure that soon “Russian Post” will again answer me with a new challenge, which I will gladly accept.
The screenshots of the work in progress were made with a special program that inserts its own logo into them; please ignore it.
Let me remind you once again that I didn't use anyone else's work for this script: I was more interested in writing it completely from scratch, in a way of my own choosing. So comments like “There's plenty of software...” or “Why reinvent the wheel?” will be taken as a sign of not reading the post carefully.