Why blurring is a bad way to hide sensitive information

Surely everyone has seen photos on TV and on the Internet in which people's faces are deliberately blurred to hide their identity. For example, Bill Gates:

For the most part this works, because there is no convenient way to turn the blur back into a photo detailed enough to recognize the face. So with faces everything is fine. However, many people also resort to blurring confidential numbers and text. I'll show you why this is a bad idea.

Suppose someone posted a photo of their check or credit card online for some terrible reason (proving on a forum that they made a million dollars, showing something funny, comparing the size of something with a credit card, etc.) and blurred the image with the all-too-common mosaic effect to hide the numbers:

It seems safe, because no one can read the numbers, right? WRONG. Here is an attack on this scheme:

Step 1. Get a blank check image

There are two ways to do this. You can either remove the numbers from the published image in a graphics editor, or open an account at the same bank, photograph your own check or card from the same angle, and match the white balance and contrast. Then remove the numbers in a graphics editor (this is easier to do in a high-resolution photo).

In our examples, of course, this is easily done:

Step 2. Iterations

Use a script to iterate through all possible account numbers and render a check image for each, handling groups of digits separately. For example, on VISA cards the digits are grouped by 4, so each group can be processed on its own. This requires only 4 × 10,000 = 40,000 images, which a script generates easily.
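The count above can be checked with a short sketch. The helper name `group_candidates` and the constants are illustrative assumptions, not part of the original script; rendering each string onto the blank image is left out.

```python
# Hypothetical helper: enumerate every digit group once and reuse the
# same list for each group position on the card.
GROUP_LENGTH = 4   # digits per printed group (the text says VISA groups by 4)
GROUP_COUNT = 4    # groups on a 16-digit card

def group_candidates(length=GROUP_LENGTH):
    """Return all zero-padded digit strings of the given length."""
    return [f"{n:0{length}d}" for n in range(10 ** length)]

candidates = group_candidates()
images_needed = GROUP_COUNT * len(candidates)
print(len(candidates), images_needed)  # 10000 40000
```

Treating each group independently is what keeps the work at 40,000 images instead of 10^16.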

Step 3. Blur each image identically to the original

Determine the exact size and offset in pixels of the tiles used to blur the original image (easy), and then apply the same mosaic to each of your generated images. In this case, we see that the blurred image consists of 8x8 pixel mosaic tiles, and the offset is determined by counting from the upper border of the image (not shown):

Now we iterate over all the images, blurring them just like the original one, and we get something like this:
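A minimal sketch of this blurring step, assuming the image is a grayscale grid stored as a list of rows of 0-255 integers (a real script would load pixels with an imaging library). The function name `mosaic` and its parameters are illustrative, not taken from the original script.

```python
def mosaic(image, tile=8, offset_x=0, offset_y=0):
    """Pixelate a grayscale image given as a list of rows of 0-255 ints.

    A tile boundary falls at (offset_x, offset_y). Tiles cut off by the
    image border are averaged over the pixels that remain.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    # Start one tile before the offset so partial tiles at the top/left are covered.
    for y0 in range(offset_y % tile - tile, height, tile):
        for x0 in range(offset_x % tile - tile, width, tile):
            ys = range(max(y0, 0), min(y0 + tile, height))
            xs = range(max(x0, 0), min(x0 + tile, width))
            if not ys or not xs:
                continue  # this tile lies entirely outside the image
            total = sum(image[y][x] for y in ys for x in xs)
            mean = round(total / (len(ys) * len(xs)))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out

# Tiny 4x4 example with 2x2 tiles.
sample = [[0, 2, 10, 10],
          [4, 6, 10, 10],
          [1, 1, 5, 7],
          [1, 1, 9, 11]]
blurred = mosaic(sample, tile=2)
```

The tile size and offset must match the published image exactly, otherwise the tile averages of the generated images will not line up with those of the original.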

Step 4. Determine the brightness vector of the mosaic of each blurred image.

What does that mean? Well, let's take the mosaic version of 0000001 (enlarged):

... and determine the brightness level (0-255) of each mosaic tile, labeling them in some consistent order as $a = [a_1, a_2, \ldots, a_n]$:

In this case, account number 0000001 produces the mosaic brightness vector $a(0000001) = [213, 201, 190, \ldots]$. We find the mosaic brightness vector for every account number in the same way, using a script to blur each image and read off the brightness values. Let $a(x)$ be this vector as a function of the account number $x$. Then $a(x)_i$ denotes the $i$-th value of the mosaic brightness vector $a$ obtained from account number $x$. Above, $a(0000001)_1 = 213$.

Now we do the same for the original check image, which we found on the Internet or elsewhere, obtaining a vector we will call $z = [z_1, z_2, \ldots, z_n]$:
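Reading off the brightness vector can be sketched as follows, again assuming a grayscale image stored as a list of rows. The name `brightness_vector` and the row-major ordering of tiles are assumptions; any ordering works as long as it is the same for every image.

```python
def brightness_vector(image, tile=8, offset_x=0, offset_y=0):
    """Return the mean brightness of each mosaic tile, in row-major order."""
    height, width = len(image), len(image[0])
    vector = []
    for y0 in range(offset_y % tile - tile, height, tile):
        for x0 in range(offset_x % tile - tile, width, tile):
            ys = range(max(y0, 0), min(y0 + tile, height))
            xs = range(max(x0, 0), min(x0 + tile, width))
            if ys and xs:  # skip tiles that fall outside the image
                total = sum(image[y][x] for y in ys for x in xs)
                vector.append(total / (len(ys) * len(xs)))
    return vector

# Made-up 2x4 image with 2x2 tiles, echoing the first two values quoted above.
tiny = [[213, 213, 201, 201],
        [213, 213, 201, 201]]
a = brightness_vector(tiny, tile=2)  # [213.0, 201.0]
```

The same function is applied to every generated image and to the published one, so the resulting vectors are directly comparable.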

Step 5. Find the one closest to the original image.

Determine the mosaic brightness vector of the original image, call it $z = [z_1, z_2, \ldots, z_n]$, and then simply calculate the distance between it and the mosaic brightness vector of each account number (denoted by $x$), normalizing each vector first:

$d(x) = \sqrt{(a(x)_1 / N(a(x)) - z_1 / N(z))^2 + (a(x)_2 / N(a(x)) - z_2 / N(z))^2 + \ldots}$

where $N(a(x))$ and $N(z)$ are the normalization constants given by

$N(a(x)) = \sqrt{a(x)_1^2 + a(x)_2^2 + \ldots}$

$N(z) = \sqrt{z_1^2 + z_2^2 + \ldots}$

Now just find the smallest $d(x)$. For credit cards, only a small fraction of the possible numbers are hypothetically valid credit card numbers, so there is nothing complicated here either.
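The comparison can be sketched in a few lines, taking N as the Euclidean length of a vector (square root of the sum of squares), as the worked numbers in this article suggest. The function names and the sample vectors are made up for illustration; the sketch also assumes no vector is all zeros.

```python
import math

def norm(vector):
    """Euclidean length N(v) = sqrt(v_1^2 + v_2^2 + ...)."""
    return math.sqrt(sum(v * v for v in vector))

def distance(a, z):
    """Distance between two brightness vectors after normalizing each."""
    na, nz = norm(a), norm(z)
    return math.sqrt(sum((ai / na - zi / nz) ** 2 for ai, zi in zip(a, z)))

def best_match(vectors, z):
    """Return the account number whose vector is closest to z."""
    return min(vectors, key=lambda x: distance(vectors[x], z))

# Hypothetical brightness vectors keyed by account number.
vectors = {
    "0000001": [213, 201, 190],
    "0000002": [210, 190, 205],
    "1124588": [206, 211, 120],
}
z = [206, 211, 120]
print(best_match(vectors, z))  # 1124588
```

Normalizing first means a uniformly brighter or darker copy of the same mosaic still gives a distance close to zero.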

For example, in our case we calculate

$N(z) = \sqrt{206^2 + 211^2 + \ldots} = 844.7845$

$N(a(0000001)) = 907.47837$

$N(a(0000002)) = 909.20647$

and then proceed to the calculation of distances:

$d(0000001) = 1.9363$

$d(0000002) = 1.937$

$\ldots$

$d(1124587) = 0.12566$

$d(1124588) = 0.00000$

$\ldots$

Could the account number behind the mosaic be 1124588?

“But you used your own image, which is easy to decipher!”

In the real world we deal with real photos, not dummy examples made in Photoshop. There are text distortions due to camera angle, imperfect alignment, and so on. But this does not prevent a person from working out the type of distortion and writing a script to match! In any case, the several smallest distances can be treated as candidates, especially in the world of credit cards, where numbers are neatly divided into groups of 4 and only 1 in 10 numbers is actually valid, which makes it easy to choose among the most likely candidates.
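The article does not name the validity check. Card numbers carry a Luhn check digit, which is presumably what the "1 in 10" figure refers to, so the following sketch filters candidates on that assumption. The function name `luhn_valid` is illustrative.

```python
def luhn_valid(number):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9      # same as adding the two digits of the product
        total += d
    return total % 10 == 0

# For a fixed 15-digit prefix, exactly one final digit passes the check.
prefix = "411111111111111"
valid = [prefix + str(d) for d in range(10) if luhn_valid(prefix + str(d))]
print(valid)  # ['4111111111111111']
```

Applied to the handful of closest matches from the distance step, such a filter discards about nine in ten of them.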

To make this work on real photos, you would need to improve the distance algorithm. For example, you can rewrite the distance formula above to normalize the standard deviation in addition to the mean. You can also process the RGB or HSV values of each mosaic tile independently, and use scripts to shift the text by a few pixels in each direction and compare (which still leaves a fairly limited number of comparisons for a fast PC). Techniques similar to existing nearest-neighbor algorithms can make the matching more reliable on real photos.
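One possible reading of the first suggestion is to standardize each vector (subtract its mean, divide by its standard deviation) before comparing. This is a sketch of that interpretation only, not the author's code; the function names are assumptions, and a vector with zero variance would need special handling.

```python
import math

def standardize(vector):
    """Shift a vector to zero mean and scale it to unit standard deviation."""
    mean = sum(vector) / len(vector)
    std = math.sqrt(sum((v - mean) ** 2 for v in vector) / len(vector))
    return [(v - mean) / std for v in vector]

def standardized_distance(a, z):
    """Euclidean distance between the standardized forms of a and z."""
    sa, sz = standardize(a), standardize(z)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sz)))

# A brightness offset or a contrast change no longer affects the distance.
same_offset = standardized_distance([10, 20, 30], [110, 120, 130])
same_scaled = standardized_distance([10, 20, 30], [20, 40, 60])
different = standardized_distance([10, 20, 30], [30, 20, 10])
```

Here `same_offset` and `same_scaled` come out at essentially zero while `different` does not, which is the tolerance to exposure and contrast differences that a real photo would require.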

So yes, I used my own image and tailored it to this case. The algorithm could certainly be improved for real-world use, but I have neither the time nor the desire to improve it, because I am not after your information. One thing is certain, though: this is a very simple situation to exploit. Do not use a simple mosaic to blur your image. All you do is reduce the amount of information in an image that contains only $\log(10^N) / \log(2)$ effective bits of account data. When you distribute such images, you want to eliminate the personal information, not merely obstruct access to it by reducing the amount of visual information.

Imagine a 100 × 100 image. Suppose that I simply averaged the pixels and replaced each of them with the average value (i.e., turned the image into a one-pixel “mosaic”). I have just created a function that hashes 256^(10,000) possible images down to 256 values. Obviously, with the resulting 8 bits you cannot restore the original image. But if you know that there are only 10 possible original images, then those 8 bits let you easily determine which one was used.
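This thought experiment is small enough to run. The sketch below builds ten made-up 100 × 100 candidate images (a growing white band on a black background, purely for illustration), reduces each to a single averaged value, and recovers which one was "published" from that value alone.

```python
def one_pixel_mosaic(image):
    """Average every pixel of a grayscale image into a single 0-255 value."""
    pixels = [p for row in image for p in row]
    return round(sum(pixels) / len(pixels))

def make_candidate(white_rows, size=100):
    """Hypothetical test image: `white_rows` white rows on a black background."""
    return [[255] * size if y < white_rows else [0] * size for y in range(size)]

# Ten possible originals, numbered 0..9.
candidates = {k: make_candidate(k + 1) for k in range(10)}

# Precompute the one-value "hash" of every candidate.
lookup = {one_pixel_mosaic(img): k for k, img in candidates.items()}

published = one_pixel_mosaic(candidates[7])  # all the attacker gets to see
recovered = lookup[published]
print(recovered)  # 7
```

The lookup works only because the ten averages happen to be distinct; with a single value, collisions become likely as the candidate set grows, which is why a real mosaic with many tiles leaks far more.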

Analogy with dictionary attack

Most UNIX/Linux system administrators know that passwords in /etc/passwd or /etc/shadow are stored after passing through a one-way function, such as salted crypt or MD5. This is fairly secure, since no one can recover the password by looking at its hashed form. Authentication works by applying the same one-way function to the password the user enters at login and comparing the result with the stored hash. If they match, the user has authenticated successfully.

It is well known that a one-way scheme breaks easily when a user chooses a dictionary word as a password. All an attacker needs to do is hash the entire English dictionary, compare the hash of each word with the hash stored in /etc/passwd, and pick out the matching word as the password. This is why users are generally advised to choose more complex passwords that are not words. A dictionary attack can be illustrated as follows:
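In code, the attack is only a loop. In this sketch SHA-256 from Python's `hashlib` stands in for whatever one-way function the system actually uses, and the salt, word list, and function names are made up for illustration.

```python
import hashlib

def one_way(password, salt):
    """Stand-in one-way function: SHA-256 over salt + password."""
    return hashlib.sha256((salt + password).encode("utf-8")).hexdigest()

def dictionary_attack(stored_hash, salt, wordlist):
    """Hash every word and return the one matching the stored hash, if any."""
    for word in wordlist:
        if one_way(word, salt) == stored_hash:
            return word
    return None

wordlist = ["apple", "sunshine", "zebra"]      # hypothetical dictionary
stored = one_way("sunshine", "ab")             # what the password file holds
print(dictionary_attack(stored, "ab", wordlist))  # sunshine
```

The one-way function is never inverted; the attacker only needs the set of likely inputs to be small enough to enumerate.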

Similarly, blurring an image is a one-way scheme. You convert the image you have into another image intended for publication. But since account numbers usually run only into the millions, we can build a “dictionary” of possible numbers, for example all numbers from 0000001 to 9999999. Then run an automated process that renders each number onto a photo of a blank check and blurs the result. All that remains is to compare the blurred pixels and see which candidates most closely match the original.

Solution

The solution is simple: do not blur your images! Instead, just paint over the sensitive parts:

Remember that you want to completely remove the information, and not to reduce its amount, as in a blurred photo.

Source: https://habr.com/ru/post/449608/
