Cracking the Matan Captcha in C# is easy!

In this post I want to talk about cracking the so-called "Matan-Captcha", an example of which was presented in the recent post "Matanova Captcha in PHP - it's easy!".
After reading the author's article about this wonderful captcha, I wanted to write a program that recognizes it, just for fun, as they say ;)


Let's start with the standard preparatory step: looking for vulnerabilities. Vulnerabilities marked in italics are not used in recognition.

Weaknesses:
  1. Black symbols on a white background.
  2. No noise or other artifacts (for example, lines).
  3. Symbols never intersect.
  4. The font is always the same.
  5. The integrand always has 4 terms.
  6. Exponents and coefficients are single digits.
  7. Exponents and coefficients range from 2 to 5.
Strengths:
  1. Nonlinear distortion is present.
  2. The exponent or the coefficient of x may be missing.
  3. Sometimes d and x stick together into a single character.
  4. The width of the captcha varies.
The recognition algorithm:
  1. Get a bitmap of the captcha image.
  2. Number all characters.
  3. Find the coordinates of the upper and lower limits.
  4. Recognize the characters that make up the limits using a neural network.
  5. Find the first character of the integrand.
  6. Recognize the characters one by one, moving right to the last character.
  7. Solve the resulting integral.

Source captcha




Bitmap image


We treat a pixel as 1 if its brightness in the HSB color model is < 0.8, and as 0 if it is >= 0.8. After that, the captcha takes the following form:
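The thresholding rule can be sketched as follows. This is only an illustration under assumptions: the `Binarizer` class, the packed 0xRRGGBB input array and the definition of HSB brightness as the largest RGB channel scaled to [0, 1] are mine, not taken from the original program.

```csharp
using System;

public static class Binarizer
{
    // Assumption: HSB brightness is the largest RGB channel scaled to [0, 1].
    public static double Brightness(byte r, byte g, byte b)
    {
        return Math.Max(r, Math.Max(g, b)) / 255.0;
    }

    // rgb[x, y] holds a packed 0xRRGGBB color (hypothetical input layout).
    // Returns 1 for dark pixels (brightness < threshold) and 0 otherwise.
    public static int[,] ToBitmap(int[,] rgb, double threshold = 0.8)
    {
        int width = rgb.GetLength(0);
        int height = rgb.GetLength(1);
        var bits = new int[width, height];
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                int c = rgb[x, y];
                byte r = (byte)((c >> 16) & 0xFF);
                byte g = (byte)((c >> 8) & 0xFF);
                byte b = (byte)(c & 0xFF);
                bits[x, y] = Brightness(r, g, b) < threshold ? 1 : 0;
            }
        }
        return bits;
    }
}
```

With this sketch a black or dark gray pixel maps to 1 and a white pixel maps to 0.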


Character numbering


To number all the symbols, we use the simplest recursive flood fill algorithm, which labels connected regions in 8 directions.

    // Labels the 8-connected region that contains (x, y) with num.
    // Returns num + 1 if a region was labeled, otherwise num unchanged.
    public int FloodFill(int[,] source, int num, int x, int y)
    {
        // Ignore coordinates outside the image.
        if (x < 0 || y < 0 || x >= source.GetLength(0) || y >= source.GetLength(1))
            return num;
        // -1 marks a foreground pixel that has not been numbered yet.
        if (source[x, y] != -1)
            return num;

        source[x, y] = num;
        // Spread to all 8 neighbours.
        for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
                if (dx != 0 || dy != 0)
                    FloodFill(source, num, x + dx, y + dy);
        return num + 1;
    }

    ...

    int num = 1;
    for (int x = 0; x < CaptchaWidth; x++)
        for (int y = 0; y < CaptchaHeight; y++)
            num = FloodFill(bit, num, x, y);
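Once the regions are numbered, each character has to be cut out before it can be recognized. The article does not show this step, so the following is only a guess at how it might look: the hypothetical `GlyphExtractor.Crop` helper returns the tight bounding box of one numbered region as a 0/1 array.

```csharp
public static class GlyphExtractor
{
    // labels[x, y] holds the region number of each pixel (as produced by a
    // flood fill). Returns the tight bounding-box crop of the region `label`
    // as a 0/1 array, or null if the label does not occur.
    public static int[,] Crop(int[,] labels, int label)
    {
        int width = labels.GetLength(0);
        int height = labels.GetLength(1);
        int minX = width, minY = height, maxX = -1, maxY = -1;

        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                if (labels[x, y] != label) continue;
                if (x < minX) minX = x;
                if (x > maxX) maxX = x;
                if (y < minY) minY = y;
                if (y > maxY) maxY = y;
            }
        }
        if (maxX < 0) return null;

        var glyph = new int[maxX - minX + 1, maxY - minY + 1];
        for (int x = minX; x <= maxX; x++)
            for (int y = minY; y <= maxY; y++)
                glyph[x - minX, y - minY] = labels[x, y] == label ? 1 : 0;
        return glyph;
    }
}
```

Pixels of other regions that happen to fall inside the bounding box are written as 0, so neighbouring characters do not leak into the crop.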

Character Recognition


Given the nonlinear distortion and the varying character sizes, the best option for recognizing the characters is an artificial neural network. To make things a little easier, I used the free Fast Artificial Neural Network Library. It is written in C++, and its advantage is that it has bindings for almost all popular programming languages, including C#.

Before recognition, all characters are scaled to the same size: in this case, 16 x 21 px.
Thus, an array of 336 elements (16 * 21) is fed to the input of the neural network. Each element encodes the color of the corresponding pixel as an integer: 0 or 1.
The middle layer consists of 130 neurons. The output is an array of 14 elements with real values from 0 to 1, corresponding to the digits 0 to 9, the +, d and x signs, and the stuck-together dx.
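The input and output encoding described above could look roughly like this. It is a sketch under assumptions, not the original code: the `SymbolCodec` class, nearest-neighbour scaling, row-by-row flattening and the order of the 14 output symbols are all my own choices, since the article does not specify them.

```csharp
using System;

public static class SymbolCodec
{
    public const int Width = 16;
    public const int Height = 21;

    // Assumed output order; the article lists the symbols but not their indices.
    private static readonly string[] Symbols =
        { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "+", "d", "x", "dx" };

    // Nearest-neighbour resample of a 0/1 glyph (indexed [x, y]) to 16x21,
    // flattened row by row into the 336 network inputs.
    public static double[] ToInput(int[,] glyph)
    {
        int w = glyph.GetLength(0);
        int h = glyph.GetLength(1);
        var input = new double[Width * Height];
        for (int y = 0; y < Height; y++)
        {
            for (int x = 0; x < Width; x++)
            {
                int sx = x * w / Width;   // source column for this target column
                int sy = y * h / Height;  // source row for this target row
                input[y * Width + x] = glyph[sx, sy];
            }
        }
        return input;
    }

    // Picks the symbol whose output neuron has the highest activation.
    public static string Decode(double[] outputs)
    {
        if (outputs.Length != Symbols.Length)
            throw new ArgumentException("Expected 14 output values.");
        int best = 0;
        for (int i = 1; i < outputs.Length; i++)
            if (outputs[i] > outputs[best]) best = i;
        return Symbols[best];
    }
}
```

Taking the strongest output is the simplest decoding rule; a real program might also reject answers whose best activation is too low.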

Training used 3090 samples, among which the character "x" occurs most often and "d" least often. Even so, training on my Core 2 Duo E6750 took only 40 seconds.
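Code for training the neural network:

    static void Main(string[] args)
    {
        NeuralNet net = new NeuralNet();

        // Network layout: 336 inputs, 130 hidden neurons, 14 outputs.
        uint[] layers = { 336, 130, 14 };
        net.CreateStandardArray(layers);

        // Start from small random weights.
        net.RandomizeWeights(-0.1, 0.1);
        net.SetLearningRate(0.7f);

        // Load the training samples.
        TrainingData data = new TrainingData();
        data.ReadTrainFromFile("train.tr");

        // Train for at most 1000 epochs, without progress reports,
        // until the error drops to 0.001.
        net.TrainOnData(data, 1000, 0, 0.001f);

        // Save the trained network.
        net.Save("skynet.ann");
    }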

Structure of the text file train.tr:

    num_train_data num_input num_output
    inputdata separated by space
    outputdata separated by space
    ...
    inputdata separated by space
    outputdata separated by space
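For illustration, a file with this layout could be produced as in the sketch below. The `TrainingFile` class and its `Build` method are hypothetical helpers of mine and are not part of the FANN library or of the original program.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TrainingFile
{
    // Builds the text of a training file: a header line with the sample count,
    // input count and output count, then one input line and one output line
    // per sample, with values separated by spaces.
    public static string Build(double[][] inputs, double[][] outputs)
    {
        if (inputs.Length == 0 || inputs.Length != outputs.Length)
            throw new ArgumentException("Need the same non-zero number of inputs and outputs.");

        var sb = new StringBuilder();
        sb.Append(inputs.Length).Append(' ')
          .Append(inputs[0].Length).Append(' ')
          .Append(outputs[0].Length).Append('\n');

        for (int i = 0; i < inputs.Length; i++)
        {
            sb.Append(Join(inputs[i])).Append('\n');
            sb.Append(Join(outputs[i])).Append('\n');
        }
        return sb.ToString();
    }

    // Invariant culture keeps "." as the decimal separator on any system.
    private static string Join(double[] values)
    {
        return string.Join(" ",
            Array.ConvertAll(values, v => v.ToString(CultureInfo.InvariantCulture)));
    }
}
```

For the network in this article, each input line would hold 336 values and each output line 14 values, with a 1 at the position of the correct symbol.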

Examples of characters passed to recognition:


Integral recognition


First we find and recognize the limits. The upper limit consists of 1 or 2 digits, with a maximum value of 10. The lower limit lies in the range from 0 to -10.
To make the neural network's job easier:
We find the integral sign by simply moving right from the left border along the row located half the captcha height below the top border. Everything to the right of the sign is the integrand.
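The scan for the integral sign might be sketched like this. It assumes a labeled image in which 0 is background and every positive value is a region number; the `IntegralSignFinder` class and that convention are illustrative assumptions rather than the original code.

```csharp
public static class IntegralSignFinder
{
    // labels[x, y]: 0 for background, a positive region number otherwise
    // (assumed convention). Walks right along the middle row and returns the
    // number of the first region it meets, or 0 if the row is empty.
    public static int Find(int[,] labels)
    {
        int width = labels.GetLength(0);
        int middleRow = labels.GetLength(1) / 2;
        for (int x = 0; x < width; x++)
        {
            if (labels[x, middleRow] > 0)
                return labels[x, middleRow];
        }
        return 0;
    }
}
```

The limits sit above and below the middle row, so the first region met on that row should be the integral sign itself.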
Knowing the number of the integral sign, the number of objects in the captcha, and the fact that these objects are numbered from left to right, we recognize the remaining characters one after another and get a string roughly like "5x5+x5-4x2+4x5dx". To simplify the work of the expression parser, we first bring it to the form "+5x5+x5-4x2+4x5".
A few patterns make it easy to extract the data needed to solve the integral.
The output is 2 arrays with the coefficients and the powers of x. After that, solving the integral presents no difficulty.
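As a rough sketch of this last step, the recognized string can be split into the two arrays with a regular expression and then integrated term by term using the power rule. The `IntegralSolver` class and the regular expression are my own assumptions; the sketch also assumes that every term contains x and that a missing coefficient or exponent means 1.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IntegralSolver
{
    // Splits a recognized integrand such as "5x5+x5-4x2+4x5dx" into
    // parallel arrays of coefficients and exponents.
    public static void Parse(string recognized, out int[] coefficients, out int[] exponents)
    {
        string s = recognized.Replace(" ", "");
        if (s.EndsWith("dx", StringComparison.Ordinal))
            s = s.Substring(0, s.Length - 2);
        if (s.Length > 0 && s[0] != '+' && s[0] != '-')
            s = "+" + s; // every term now starts with an explicit sign

        var c = new List<int>();
        var p = new List<int>();
        // sign, optional one-digit coefficient, x, optional one-digit exponent
        foreach (Match m in Regex.Matches(s, @"([+-])(\d?)x(\d?)"))
        {
            int coefficient = m.Groups[2].Value == "" ? 1 : int.Parse(m.Groups[2].Value);
            if (m.Groups[1].Value == "-") coefficient = -coefficient;
            int exponent = m.Groups[3].Value == "" ? 1 : int.Parse(m.Groups[3].Value);
            c.Add(coefficient);
            p.Add(exponent);
        }
        coefficients = c.ToArray();
        exponents = p.ToArray();
    }

    // Definite integral of the sum of c[i] * x^p[i] from lower to upper:
    // each term contributes c * (upper^(p+1) - lower^(p+1)) / (p + 1).
    public static double Integrate(int[] coefficients, int[] exponents, double lower, double upper)
    {
        double sum = 0;
        for (int i = 0; i < coefficients.Length; i++)
        {
            int n = exponents[i] + 1;
            sum += coefficients[i] * (Math.Pow(upper, n) - Math.Pow(lower, n)) / n;
        }
        return sum;
    }
}
```

For the example string above, `Parse` yields the coefficients 5, 1, -4, 4 and the exponents 5, 5, 2, 5, which can then be passed to `Integrate` together with the recognized limits.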

In lieu of a conclusion


The recognition rate for the captcha as a whole is impressive: 93.8% on a sample of 500 captchas!



The finished program, which demonstrates solving the integral, can be downloaded here . Or, together with the source code (plus a program for training the neural network and the training set), here .
I hope neither my server nor the server of Habr user grishkaa collapses under the Habr effect ;)

Source: https://habr.com/ru/post/121032/

