
Breaking the SilkRoad 2.0 Captcha



This article is a continuation of my previous one. You asked, so I'm publishing.

To begin with: I was extremely surprised that the code from the first article really did defeat SilkRoad's captcha. People have become seriously interested in the dark web, and, as you know, SilkRoad 2.0 appeared after its predecessor was shut down (although the second one was also closed recently). We'll talk about breaking its captcha below the cut.

The Silk Road vs. SilkRoad 2.0


Silk Road 2 does not require a captcha to log in; it is needed only for registration. It may be required somewhere else as well, but I only looked at it on that one page.

I learned that the captcha we broke in the last article was generated by a PHP CMS called ExpressionEngine.

SilkRoad 2.0 uses a Rails plugin called simple-captcha. Its original (?) branch has not been maintained since 2008, but some forks have advanced considerably since then. I'm not sure which one the site uses, but this is the version I chose for our tests.

Let's say it up front: the SR and SR2 captchas are not alike, but the SR2 variant is also trivial. SR2 can also likely be solved with a high success rate (99%+) without machine learning, since all the operations used to generate it are reversible.

First look


Captcha looks quite good.







Some facts:

  1. No background;
  2. 5 characters, /\A[A-Z]{5}\z/;
  3. Not a “word,” no dictionary tricks;
  4. One line of information
  5. Difference from SR: the characters are not just rotated or reflected, but also skewed.


All the distortions look alike, so let's take a closer look.

Warp, right?


Judging by the names of the images, the script was called something like “simple_captcha”. I went and got its source code, so the solution took a couple of hours rather than weeks. Since 90% of the transformations are stock ImageMagick distortions, it would be irrational to reverse-engineer the captcha's algorithm from samples alone: with many examples but no knowledge of how they were generated, the task becomes much harder.

So let's take a quick look at the source, where the ImageMagick operations are immediately visible:

    params = ImageHelpers.image_params(SimpleCaptcha.image_style).dup
    params << "-size #{SimpleCaptcha.image_size}"
    params << "-wave #{amplitude}x#{frequency}"
    params << "-gravity \"Center\""
    params << "-pointsize #{SimpleCaptcha.point_size}"
    params << "-implode 0.2"


As you can see, this operation can be rolled back completely. They use -implode 0.2? Let's apply -implode -0.2!

 for i in * ; do convert "$i" -implode -0.2 "$i-exploded.png"; done 


And take a look at the results of the work done:

Original | Rollback


Even if implode were applied with a random parameter, we could try several values and use a binary search to determine which one fits best.

Riding a wave


Now the only distortion left is the wave, and it displaces the text only along the y axis.

I could stop right here and say that the code from my first article would already solve a lot of these captchas, but let's change a few things to get as close to 100% as possible.

Look here:

    def distortion(key='low')
      key = key == 'random' ?
        DISTORTIONS[rand(DISTORTIONS.length)] :
        DISTORTIONS.include?(key) ? key : 'low'
      case key.to_s
      when 'low' then return [0 + rand(2), 80 + rand(20)]
      when 'medium' then return [2 + rand(2), 50 + rand(20)]
      when 'high' then return [4 + rand(2), 30 + rand(20)]
      end
    end


Two randomly generated parameters are passed to the -wave operator as amplitude and frequency (ImageMagick's own documentation calls the second value the wavelength). Judging by the ImageMagick documentation, the wave always starts at zero along the x axis.

With a search over these two parameters, we can line the letters up like soldiers standing in formation.

Since searching for two numbers is quite simple, I will skip that part and move right on.
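For readers who want to see the skipped step anyway, here is a minimal sketch of one way it could be done. It is not the author's code. It assumes the wave displaces column x by amplitude * sin(2 * PI * x / wavelength), works on a hypothetical bitmap stored as an array of rows of booleans, and uses a plain exhaustive search over the ranges produced by the distortion function above rather than a binary search. The scoring rule (fewer occupied rows means straighter text) is also an assumption.

```ruby
# Sketch only: a bitmap is an array of rows of booleans (true = black).

# Shift column x vertically by sign * amplitude * sin(2*PI*x / wavelength).
# The displacement formula is an assumption about how -wave behaves.
def shift_columns(bitmap, amplitude, wavelength, sign)
  height = bitmap.length
  width = bitmap.first.length
  out = Array.new(height) { Array.new(width, false) }
  width.times do |x|
    dy = (sign * amplitude * Math.sin(2 * Math::PI * x / wavelength)).round
    height.times do |y|
      src = y - dy
      out[y][x] = bitmap[src][x] if src.between?(0, height - 1)
    end
  end
  out
end

# Score: number of rows that contain at least one black pixel.
# Text that has been straightened occupies fewer rows.
def occupied_rows(bitmap)
  bitmap.count { |row| row.any? }
end

# Try every (amplitude, wavelength) pair and keep the one whose inverse
# shift gives the flattest result.
def estimate_wave(bitmap, amplitudes, wavelengths)
  best = nil
  amplitudes.each do |a|
    wavelengths.each do |w|
      score = occupied_rows(shift_columns(bitmap, a, w, -1))
      best = [score, a, w] if best.nil? || score < best[0]
    end
  end
  best[1, 2]
end

# Usage on synthetic data: a flat bar, waved with "high" distortion values.
flat = Array.new(40) { |y| Array.new(200, y.between?(18, 21)) }
waved = shift_columns(flat, 5, 40, 1)
amplitude, wavelength = estimate_wave(waved, 0..5, 30..99)
straightened = shift_columns(waved, amplitude, wavelength, -1)
```

The search space is only 6 x 70 combinations, so brute force is cheap here; on real glyphs a different score (for example, the height of the bounding box) may work better.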

Improved segmentation (object extraction)


Note that this captcha differs from the SR1 captcha in that the spacing between its characters is not uniform. It feels as if kerning is involved. Take a look at the gap between T and J in this example, XCUTJ:



The method we used in the first article would fail here, since it only looks for vertical gaps. We would get the wrong answer in about 50% of all cases. A more precise algorithm is required.

First up: Marching squares


This algorithm can separate our objects. (Here are examples in Ruby and C++ that I wrote a long time ago.)

Philippe Spiess created the best and most fitting example I have ever seen. I took his animation:



The gist is that the square walks along the outline of the first object it finds and returns an array of the points it visited. Combine this with something like the Douglas-Peucker algorithm and you get a polygon. (And here is an application of this algorithm in another project.)
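As an illustration of the simplification step, here is a minimal sketch of the Douglas-Peucker algorithm in plain Ruby. It is not taken from the linked projects. Points are assumed to be [x, y] pairs, and epsilon (the maximum allowed deviation, in pixels) is a value you would have to tune. It handles an open polyline; a closed contour would first need to be split into two halves.

```ruby
# Perpendicular distance from point p to the line through a and b.
def point_line_distance(p, a, b)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  length = Math.hypot(dx, dy)
  return Math.hypot(p[0] - a[0], p[1] - a[1]) if length.zero?
  (dy * (p[0] - a[0]) - dx * (p[1] - a[1])).abs / length
end

# Keep only the points that deviate more than epsilon from the
# straight line between the endpoints, recursively.
def douglas_peucker(points, epsilon)
  return points if points.length < 3

  first = points.first
  last = points.last
  index = nil
  max = 0.0
  (1...points.length - 1).each do |i|
    d = point_line_distance(points[i], first, last)
    if d > max
      max = d
      index = i
    end
  end

  # Everything is close enough to the chord: drop the interior points.
  return [first, last] if index.nil? || max <= epsilon

  left = douglas_peucker(points[0..index], epsilon)
  right = douglas_peucker(points[index..-1], epsilon)
  left[0...-1] + right # the split point appears in both halves
end

# Usage: an L-shaped run of traced pixels collapses to its three corners.
outline = [[0, 0], [1, 0], [2, 0], [3, 0], [3, 1], [3, 2], [3, 3]]
polygon = douglas_peucker(outline, 0.5)
# polygon == [[0, 0], [3, 0], [3, 3]]
```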

The problem is that each character has to be removed as soon as it has been found, without using any other methods.

So we want to erase the symbol, so that when the algorithm is restarted it finds the next symbol after the one just removed. Alternatively, we could record the coordinates of the found object and start the next search from there, which is harder to implement.

This is rather difficult without a library. By the way, pixel-by-pixel operations are very (very, very) slow in Ruby. Let's look for a simpler way.

Flood fill method


This way is smarter, faster, and easier.

  1. Duplicate the image;
  2. Find the first black pixel, which will be “inside” a character;
  3. Flood-fill from it with white to make the character invisible in the working image;
  4. Find the differences between the original and the image with the symbol removed;
  5. Repeat until all characters are detected.


It looks something like this:

    def each_extracted_object(im)
      return enum_for(__method__, im) unless block_given?

      loop do
        xy = first_black_pixel(im)
        break if xy.nil?

        # Save the original
        copy = im.clone

        # Erase it from our working image
        im = im.color_floodfill(xy[0], xy[1], 'white')

        # Exclusion to get the difference, trim and yield
        copy.composite!(im, 0, 0, Magick::ExclusionCompositeOp)
        copy = copy.negate.trim('white')

        # This stuff creates a bit of garbage
        GC.start

        yield copy
      end
    end
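The same idea can be sketched without any image library, which may make the mechanics easier to follow. This is an illustration under assumptions, not the author's implementation: the bitmap is a hypothetical array of rows of booleans (true = black), first_black_pixel here is a stand-in that scans column by column from the left, and the fill uses 4-connectivity, which may differ from what color_floodfill does.

```ruby
# Stand-in helper: leftmost black pixel, scanning column by column.
# Returns [x, y], or nil when the bitmap is empty.
def first_black_pixel(bitmap)
  bitmap.first.length.times do |x|
    bitmap.length.times do |y|
      return [x, y] if bitmap[y][x]
    end
  end
  nil
end

# Erase the 4-connected black region containing (x, y) from the bitmap
# and return the coordinates of the erased pixels.
def flood_erase(bitmap, x, y)
  height = bitmap.length
  width = bitmap.first.length
  stack = [[x, y]]
  region = []
  until stack.empty?
    cx, cy = stack.pop
    next unless cx.between?(0, width - 1) && cy.between?(0, height - 1)
    next unless bitmap[cy][cx]
    bitmap[cy][cx] = false
    region << [cx, cy]
    stack.push([cx + 1, cy], [cx - 1, cy], [cx, cy + 1], [cx, cy - 1])
  end
  region
end

# Yield one region (a list of [x, y] pixels) per connected object,
# from left to right. The caller's bitmap is left untouched.
def each_extracted_region(bitmap)
  return enum_for(__method__, bitmap) unless block_given?

  work = bitmap.map(&:dup)
  while (xy = first_black_pixel(work))
    yield flood_erase(work, *xy)
  end
end

# Usage: two small blobs in a 5x3 image.
rows = ["#..##",
        "#...#",
        "....."]
bitmap = rows.map { |r| r.chars.map { |c| c == "#" } }
regions = each_extracted_region(bitmap).to_a
```

Because the erased pixels are returned directly, the "difference" step from the list above is not needed in this variant; the region itself is the isolated symbol.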


Consider step-by-step visual transformations:

  1. Original image (monochrome for simplicity);
  2. Flood-fill with white from the first pixel found on the left;
  3. Find the differences between the original and the image with the erased symbol;
  4. Invert the colors back (negate) and trim.


The result of the second step is an image with which we will continue to work, and the result of the fourth is an isolated symbol.

All this is done at relatively good speed.

Note that this method will not work if there are two objects to fill on the same straight line along the y axis. For an example, see the image with T and J above.

Picture matching


On the SR1 captcha we had to apply filters to separate the characters from the background.

With this captcha we get a set of clean letters right away. Having collected samples from 40 captchas, we ended up with this set:


It was obtained by, say, taking every example of the letter M and compositing them on top of one another, each with an opacity of 1 / number_of_m_examples.
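Compositing every sample with an opacity of 1 / n amounts to averaging them pixel by pixel. A minimal sketch of that averaging, assuming each sample is a hypothetical grid (array of rows) of 0/1 values with 1 meaning black and all grids sharing the same size:

```ruby
# Average a list of same-sized glyph grids pixel by pixel.
# Each sample contributes with a weight of 1 / number of samples.
def average_glyph(samples)
  weight = 1.0 / samples.length
  height = samples.first.length
  width = samples.first.first.length
  Array.new(height) do |y|
    Array.new(width) do |x|
      samples.sum { |sample| sample[y][x] * weight }
    end
  end
end

# Turn the averaged grid into a black/white template.
# The 0.5 threshold is an assumption, not a value from the article.
def binarize(grid, threshold = 0.5)
  grid.map { |row| row.map { |value| value >= threshold } }
end

# Usage: two tiny 1x2 "samples" of the same letter.
samples = [[[1, 0]], [[1, 1]]]
template = average_glyph(samples) # => [[1.0, 0.5]]
```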

Instead of using a neural network, we simply pick the symbol from the set obtained earlier that has the highest number of matching pixels (also taking the wave into account).

    def font_match(im, candidate)
      score = 0
      (0...FONT_HEIGHT).each do |y|
        (0...FONT_WIDTH).each do |x|
          if black?(im.pixel_color(x, y)) == black?(candidate.pixel_color(x, y))
            score += 1
          end
        end
      end
      return score.to_f / (FONT_WIDTH * FONT_HEIGHT)
    end
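The selection step around that score is not shown in the article. A minimal sketch of what it might look like, using plain boolean grids instead of RMagick images; match_score and best_letter are hypothetical names, and the templates hash (letter => grid) is assumed to have been built beforehand:

```ruby
# Fraction of pixels on which a glyph and a template agree.
# Both are same-sized arrays of rows of booleans (true = black).
def match_score(glyph, template)
  total = glyph.length * glyph.first.length
  same = 0
  glyph.each_with_index do |row, y|
    row.each_with_index do |pixel, x|
      same += 1 if pixel == template[y][x]
    end
  end
  same.to_f / total
end

# Return the letter whose template agrees with the glyph most often.
def best_letter(glyph, templates)
  templates.max_by { |_letter, template| match_score(glyph, template) }.first
end

# Usage with two toy 3x2 templates.
templates = {
  "I" => [[false, true, false], [false, true, false]],
  "L" => [[true, false, false], [true, true, true]]
}
glyph = [[false, true, false], [false, true, true]]
guess = best_letter(glyph, templates) # => "I"
```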


With roughly 96% accuracy per character, 0.96 ** 5 means approximately 81% of captchas are solved.
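The arithmetic can be checked directly, and inverted to see what per-character accuracy a given overall target would need. This small sketch assumes that errors on the five characters are independent, which is a simplification; the function names are made up for the example.

```ruby
# Probability of solving a whole captcha when every character is
# recognized independently with the same accuracy.
def captcha_success_rate(per_char, length = 5)
  per_char**length
end

# Per-character accuracy needed to reach a target whole-captcha rate.
def required_per_char(target, length = 5)
  target**(1.0 / length)
end

whole = captcha_success_rate(0.96) # about 0.815
needed = required_per_char(0.95)   # about 0.990
```

So reaching 95% on whole captchas would take roughly 99% accuracy on each individual character.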

For comparison, with 40 examples and 3 hours of training, the neural network reached only 45%.

Let's sum up



Solving bad, low-quality captchas is easy. Using some snippets from the SR1 article, this captcha was defeated in 3 hours. I am sure that with smarter work the solve rate would exceed 95%.

I still do not want to publish the full script, since it could be used to attack other applications, namely those that use the simple_captcha gem for Ruby.

I am also curious about how useful captchas are in 2014. I have heard that on the Tor network you can have up to a thousand captchas solved for $1.

Despite this, I learned so much about captcha that I would have enough of this knowledge for the rest of my life :)

Thanks to ilusha_sergeevich for the idea for this series of articles.

Source: https://habr.com/ru/post/241263/

