
Wavelet compression explained simply: practice


In a previous post, we looked at the theoretical foundations of image compression with the discrete wavelet transform. Although many important issues were left out, the results we obtained are enough to try something in practice.

Enough words. Let's write a program that compresses images! (Lots of pictures under the cut!)



Tools



We will write in Python. Yes, I know that many would prefer to see the code in [insert your favorite programming language], but Python seems to me the best choice because of its extreme simplicity (it is, in effect, executable pseudo-code).

We will need an interpreter and the PIL (Python Imaging Library) library. There are plenty of installation guides online, so I will not dwell on that.

The one thing I do want to recommend for development is the IPython notebook: a wonderful tool that provides a web interface reminiscent of Mathematica, but programmable in Python.

Summary of the previous series



So, we have already found out that real images contain a huge amount of redundant information. This redundancy comes from the fact that the brightness values of neighboring pixels change quite smoothly. We also learned that this redundancy can be removed with ordinary linear transformations, that is, multiplications by a matrix.

The first transformation we considered, the Haar transform, works with pairs of brightness values. The transformation is a multiplication by the matrix


The second transformation, the Daubechies D4 transform, works with groups of four values that are shifted relative to each other by two. The matrix is therefore bigger (we denote the cumbersome coefficients by c_i):


We found the transformation coefficients by solving a system of nonlinear equations. The second row is simply the same coefficients written in reverse order with alternating signs.

Multiplying this matrix by a column vector of 4 brightness values, we get only two values (a low-frequency and a high-frequency coefficient). Because of this imbalance, the fours must be taken not one after another but overlapping, with a step of 2.

For example, if we have the values [1, 2, 3, 4], we need to take two fours: [1, 2, 3, 4] (surprise, surprise!) and [3, 4, 1, 2]. The latter looks strange because we ran out of values on the right and had to wrap around to the beginning.
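The overlapping, wrap-around fours are easy to demonstrate with a tiny helper (the function name is my own, not from the article):

```python
def windows(data, size=4, step=2):
    # Collect overlapping windows of the given size, stepping by `step`
    # and wrapping around the end of the list.
    M = len(data)
    return [[data[(k + i) % M] for i in range(size)]
            for k in range(0, M, step)]

print(windows([1, 2, 3, 4]))  # [[1, 2, 3, 4], [3, 4, 1, 2]]
```

The second window wraps past the end of the list, exactly as described above.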

By the way, the matrix of the entire transformation in this case will be:


Similarly, one can write out transformation matrices of higher orders: D6, D8, and so on. (The numbers are always even. Think about why.)

Getting started!



Suppose we have chosen the D4 wavelet transform to program. To avoid recomputing them every time, let's put the coefficients into a list.

from math import sqrt

CL = [(1 + sqrt(3)) / (4 * sqrt(2)),
      (3 + sqrt(3)) / (4 * sqrt(2)),
      (3 - sqrt(3)) / (4 * sqrt(2)),
      (1 - sqrt(3)) / (4 * sqrt(2))]


The letter L indicates that these are the coefficients of the first row, the ones belonging to the low-pass (L) filter.

This is a fairly universal approach: if we want to implement a different transform, we simply replace the list.

We will obtain the coefficients of the high-pass filter (HPF) with a separate function that writes the list out in reverse order with alternating signs.

def hpf_coeffs(CL):
    N = len(CL)                       # number of coefficients
    CH = [(-1)**k * CL[N - k - 1]     # reverse order, alternating signs
          for k in range(N)]
    return CH


The expression in brackets, for those who do not know, is a list comprehension. A very handy thing.
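A quick sanity check (my own, not from the article): the low-pass and high-pass filters built this way are orthogonal to each other, so their dot product vanishes up to rounding:

```python
from math import sqrt

def hpf_coeffs(CL):
    # High-pass coefficients: the low-pass list reversed, signs alternating.
    N = len(CL)
    return [(-1)**k * CL[N - k - 1] for k in range(N)]

CL = [(1 + sqrt(3)) / (4 * sqrt(2)), (3 + sqrt(3)) / (4 * sqrt(2)),
      (3 - sqrt(3)) / (4 * sqrt(2)), (1 - sqrt(3)) / (4 * sqrt(2))]
CH = hpf_coeffs(CL)

# The products cancel pairwise, so the sum is zero up to rounding noise.
dot = sum(l * h for l, h in zip(CL, CH))
print(abs(dot) < 1e-12)  # True
```

This is one of the orthogonality conditions that makes inversion by transposition work later on.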

Direct one-dimensional transform



Now that we have the coefficients, we can perform the transform itself. Its result is a list of weighted sums with coefficients taken alternately from CL and CH.

Weighted sums (sums of pairwise products) are the products of matrix rows with a column vector. The even rows of the matrix hold the low-pass filter and the odd rows the high-pass filter; hence the alternation.

Lists of weighted sums computed "along" another list are called convolutions in mathematics. There are efficient algorithms for computing them based on the Fourier transform. (Surprising, right? The Fourier transform is about sines and cosines, and suddenly it is used for summation!) But we are fans of simplicity and will not engage in premature optimization. We will compute head-on; that way it is clearer. Optimizations and clever hacks are left to the reader as an exercise.

The function is called pconv: "p" from the word pair, and "conv" from convolution.

def pconv(data, CL, CH, delta=0):
    assert len(CL) == len(CH)         # the filters must have equal length
    N = len(CL)
    M = len(data)
    out = []                          # result of the transform
    for k in range(0, M, 2):          # steps of 0, 2, 4, ...
        sL = 0                        # low-frequency coefficient
        sH = 0                        # high-frequency coefficient
        for i in range(N):            # weighted sums over the window
            sL += data[(k + i - delta) % M] * CL[i]
            sH += data[(k + i - delta) % M] * CH[i]
        out.append(sL)                # append the pair of coefficients
        out.append(sH)
    return out


It may be unclear what the delta parameter is for, which defaults to zero. It lets us start the convolution not from the first element but from an arbitrary one. This will come in handy a little later.

Taking the remainder (the "% M" expression) in the weighted sums is necessary to avoid running past the end of the list. If M = 4 and an index suddenly comes out as 5, we wrap back to the element with index 1, since 5 % 4 equals 1.

Let's test our function on the unnormalized Haar transform (the one with half-sums and half-differences).

>>> C = [0.5, 0.5]
>>> pconv([1, 2, 3, 4], C, hpf_coeffs(C))
[1.5, -0.5, 3.5, -0.5]


Works!

And back?



As the reader remembers, we deliberately constructed our matrices so that the inverse transform would be as simple as possible. Thanks to the orthogonality of the H and D4 matrices, their inverses can be found by simple transposition.

Consider once again the transformation matrix D4, but for a column vector of arbitrary length:


Multiplying such a matrix by a column vector of pixel brightness values, we get the decorrelated values. True, the matrix is very large, which is why we used a convolution instead of an explicit multiplication.

The inverse matrix can be obtained by transposition:


It looks very much like the original matrix, but the square block in the lower-left corner is missing. However, if you "cut off" the last two columns and "glue" them onto the beginning, you get a matrix of the same form (albeit with different coefficients), and we can reuse the ready-made pconv function. Wait a moment: pconv has the delta parameter, which sets an offset into the source data. But that is exactly this cutting and gluing. The mystery of the delta parameter is solved!

For D4 you need to "cut off" the last two columns, for D6 four, and so on.

All that remains is to determine the order of the coefficients in the rows of the inverse matrix. For D4 the order is obvious:


But what about the case of D6, D8 and others? Let's look at the D4 transformation matrix. We are interested in the rows of the transposed matrix, that is, the columns of the original.

Our vectors are the third and fourth columns. Regardless of the order of the transform, the rows are always shifted by two elements, so the pattern is as follows.

First vector:
[the second-to-last element of the first row, the second-to-last element of the second row, the first element of the first row (which coincides with the third), the first element of the second row].

Second vector:
[the last element of the first row, the last element of the second row, the second element of the first row (which coincides with the third), the second element of the second row].

That is, the coefficients are picked alternately from CL and CH with a step of 2, starting from the second-to-last element for the first row of the inverse matrix, or from the last element for the second row.

This rule holds for D4, for D6, and even for the Haar transform (which is, in fact, the same thing as D2).

In Python, the second-to-last element of a list can be accessed with index -2 and the last with index -1. We will put this handy feature to use below.

Let's write a function that produces the coefficient lists of the inverse matrix according to the algorithm described.

def icoeffs(CL, CH):
    assert len(CL) == len(CH)         # the filters must have equal length
    iCL = []                          # inverse low-pass coefficients
    iCH = []                          # inverse high-pass coefficients
    for k in range(0, len(CL), 2):
        iCL.extend([CL[k - 2], CH[k - 2]])
        iCH.extend([CL[k - 1], CH[k - 1]])
    return (iCL, iCH)


Let's check that it works:
>>> C = [0, 1, 2, 3]
>>> icoeffs(C, hpf_coeffs(C))
([2, 1, 0, 3], [3, 0, 1, -2])


It appears that the coefficients are reordered correctly.

You can test the functions by transforming a list (say, [0, 1, 2, 3]) and then applying the inverse transform.

>>> CL = [(1 + sqrt(3)) / (4 * sqrt(2)),
...       (3 + sqrt(3)) / (4 * sqrt(2)),
...       (3 - sqrt(3)) / (4 * sqrt(2)),
...       (1 - sqrt(3)) / (4 * sqrt(2))]
>>> X = [0, 1, 2, 3]
>>> CH = hpf_coeffs(CL)
>>> iCL, iCH = icoeffs(CL, CH)
>>> Y = pconv(X, CL, CH)
>>> X2 = pconv(Y, iCL, iCH, len(CL) - 2)
>>> X2
[2.5077929123346285e-16, 1.0, 1.9999999999999991, 2.9999999999999991]


What nonsense! It should have been [0, 1, 2, 3] again! Wait, false alarm. The first number is about 2.5·10⁻¹⁶, which is practically zero, and the remaining numbers are also close to the originals. The thing is that we are working with irrational numbers, so rounding errors are inevitable.
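This forward-and-back check can be automated with numpy.allclose, which tolerates exactly this kind of rounding noise. A self-contained sketch, restating the functions defined above:

```python
from math import sqrt
import numpy as np

def hpf_coeffs(CL):
    N = len(CL)
    return [(-1)**k * CL[N - k - 1] for k in range(N)]

def icoeffs(CL, CH):
    iCL, iCH = [], []
    for k in range(0, len(CL), 2):
        iCL.extend([CL[k - 2], CH[k - 2]])
        iCH.extend([CL[k - 1], CH[k - 1]])
    return iCL, iCH

def pconv(data, CL, CH, delta=0):
    # One step of the periodic wavelet transform, as in the article.
    N, M = len(CL), len(data)
    out = []
    for k in range(0, M, 2):
        out.append(sum(data[(k + i - delta) % M] * CL[i] for i in range(N)))
        out.append(sum(data[(k + i - delta) % M] * CH[i] for i in range(N)))
    return out

CL = [(1 + sqrt(3)) / (4 * sqrt(2)), (3 + sqrt(3)) / (4 * sqrt(2)),
      (3 - sqrt(3)) / (4 * sqrt(2)), (1 - sqrt(3)) / (4 * sqrt(2))]
CH = hpf_coeffs(CL)
iCL, iCH = icoeffs(CL, CH)

X = [0, 1, 2, 3]
X2 = pconv(pconv(X, CL, CH), iCL, iCH, delta=len(CL) - 2)
print(np.allclose(X2, X))  # True: only rounding noise remains
```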

2D transformation



So, we have programmed the transforms. But they are one-dimensional, and the pictures we plan to compress are decidedly two-dimensional! Fortunately, this is easily solved: a two-dimensional wavelet transform is performed as a one-dimensional transform along all rows and then along all columns (or vice versa, to taste).

We will load images with PIL and keep them as numpy arrays, which provide a convenient way to address elements of two-dimensional data.

If the pixel brightness values of our image are stored in the variable image, then we can get, for example, the brightness values of row 4 like this:
 image[4, :] 


And the brightness values of column 3 like this:
 image[:, 3] 


Take some image for the experiments. It is important that its dimensions be powers of two! (Strictly speaking, they only need to be even, but we are going to apply the transform repeatedly.)

For example, you can take this one (since everyone is tired of Lena):


It is 512 × 512, so it suits us.
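If a picture does not have power-of-two dimensions, one simple workaround (a hypothetical helper, not from the article) is to crop it down to the nearest powers of two before transforming:

```python
import numpy as np

def crop_pow2(img):
    # Crop a 2-D array to the largest power-of-two dimensions that fit.
    h, w = img.shape
    h2 = 2 ** int(np.log2(h))
    w2 = 2 ** int(np.log2(w))
    return img[:h2, :w2]

print(crop_pow2(np.zeros((600, 513))).shape)  # (512, 512)
```

Cropping throws pixels away, of course; padding to the next power of two is a gentler alternative.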

Load it into memory (converting it to grayscale along the way and forming the brightness matrix):

import PIL.Image as Image
from numpy import array
from matplotlib.pyplot import imshow, cm

image = Image.open('/tmp/boat.png').convert('L')
image = array(image) / 255.0          # brightness range: [0, 1]
imshow(image, cmap=cm.gray)           # display the image


Something should appear on the screen:


So, the variable image is an array of brightness.

Let's write the function for the two-dimensional wavelet transform. Since the interleaved low- and high-frequency coefficients look ugly, we will reorder them: low-frequency to the left and top, high-frequency to the right and bottom.

Since the transform is applied both to rows and to columns, the low-frequency coefficients, which form a reduced copy of the original picture, end up grouped in the upper-left corner.

def dwt2(image, CL):
    CH = hpf_coeffs(CL)               # high-pass filter coefficients
    h, w = image.shape                # image dimensions
    imageT = image.copy()             # working copy for the transform
    for i in range(h):                # transform the rows
        imageT[i, :] = pconv(imageT[i, :], CL, CH)
    for i in range(w):                # transform the columns
        imageT[:, i] = pconv(imageT[:, i], CL, CH)

    # Reorder: low-frequency coefficients to the top-left corner
    data = imageT.copy()
    data[0:h//2, 0:w//2] = imageT[0:h:2, 0:w:2]
    data[h//2:h, 0:w//2] = imageT[1:h:2, 0:w:2]
    data[0:h//2, w//2:w] = imageT[0:h:2, 1:w:2]
    data[h//2:h, w//2:w] = imageT[1:h:2, 1:w:2]
    return data


Let's test our function with the D4 transform, whose coefficients we have stored in CL:

data2 = dwt2(image, CL)
imshow(data2, cmap=cm.gray)


A warning right away: since we have not optimized anything, it runs slowly. On my humble netbook the computation took a dozen or so seconds.

In the end, you should get this picture.


Everything is as we expected: the reduced image is in the corner, and the rest is occupied by coefficients that are small in magnitude (as the black color indicates).

And here is a plot of the values along row 50 of the transformed image.


It can be seen that in the first half, which corresponds to the reduced copy, the range of variation is large, while the high-frequency coefficients are close to zero.
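The same effect is easy to measure in one dimension (my own illustration, not from the article): apply pconv to a smooth signal and compare the energy of the low-frequency and high-frequency coefficients.

```python
from math import sqrt, cos, pi

def hpf_coeffs(CL):
    N = len(CL)
    return [(-1)**k * CL[N - k - 1] for k in range(N)]

def pconv(data, CL, CH, delta=0):
    # One-dimensional periodic wavelet step, as in the article.
    N, M = len(CL), len(data)
    out = []
    for k in range(0, M, 2):
        out.append(sum(data[(k + i - delta) % M] * CL[i] for i in range(N)))
        out.append(sum(data[(k + i - delta) % M] * CH[i] for i in range(N)))
    return out

CL = [(1 + sqrt(3)) / (4 * sqrt(2)), (3 + sqrt(3)) / (4 * sqrt(2)),
      (3 - sqrt(3)) / (4 * sqrt(2)), (1 - sqrt(3)) / (4 * sqrt(2))]

signal = [cos(2 * pi * k / 64) for k in range(64)]   # smooth, periodic
coeffs = pconv(signal, CL, hpf_coeffs(CL))
eL = sum(c * c for c in coeffs[0::2])                # low-frequency energy
eH = sum(c * c for c in coeffs[1::2])                # high-frequency energy
print(eH < 0.01 * eL)  # True: almost all energy sits in the low band
```

For a smooth signal, virtually all of the energy lands in the low-frequency half, which is exactly why the high-frequency part of the picture is nearly black.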

You can try with the Haar wavelet:
C = [1 / sqrt(2), 1 / sqrt(2)]
data3 = dwt2(image, C)
imshow(data3, cmap=cm.gray)




With a sharp eye and a good monitor, you can see that more detail remains in the high-frequency part. This is understandable: the Haar filter removes only the constant component, while D4 also removes the linear one.

Or take the D6 coefficients from Wikipedia (just do not forget to divide them by sqrt(2): there they are normalized to sum to 2). The bravest can even derive the coefficients analytically.

The higher the order of the transform, the darker the high-frequency areas will be.

Inverse two-dimensional transform



The fact that we can do a two-dimensional transformation is great, but this skill is useless without the inverse transformation.

And it is easy to do: we perform the same steps as in the forward transform, but in reverse order. Just do not forget about the different coefficients and the non-zero delta in pconv.

def idwt2(data, CL):
    h, w = data.shape                 # dimensions
    # Interleave the coefficients back
    imageT = data.copy()
    imageT[0:h:2, 0:w:2] = data[0:h//2, 0:w//2]
    imageT[1:h:2, 0:w:2] = data[h//2:h, 0:w//2]
    imageT[0:h:2, 1:w:2] = data[0:h//2, w//2:w]
    imageT[1:h:2, 1:w:2] = data[h//2:h, w//2:w]

    CH = hpf_coeffs(CL)
    iCL, iCH = icoeffs(CL, CH)
    image = imageT.copy()             # working copy for the inverse transform
    for i in range(w):                # inverse transform of the columns
        image[:, i] = pconv(image[:, i], iCL, iCH, delta=len(iCL) - 2)
    for i in range(h):                # inverse transform of the rows
        image[i, :] = pconv(image[i, :], iCL, iCH, delta=len(iCL) - 2)
    return image


To verify, let's apply the inverse transform to the previously obtained data2:

>>> image_rec = idwt2(data2, CL)
>>> imshow(image_rec, cmap=cm.gray)




We got the original image back (or something very, very close to it), so everything works!

Do not stop there!



Well, we got a lot of near-zero coefficients. But a quarter of the picture is still occupied by low-frequency data. What if we transform that as well? This can be done in a loop, transforming the upper-left corner until it becomes too small.

For the D4 wavelet the size limit of the "corner" is 4 × 4: a smaller block can no longer be transformed, because there are not enough values to multiply by the 4 matrix coefficients.
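The number of possible recursion levels is easy to compute (a hypothetical helper, not from the article):

```python
def num_levels(size, filter_len):
    # How many times the top-left corner can be transformed before it
    # becomes smaller than the filter.
    levels = 0
    while size >= filter_len:
        levels += 1
        size //= 2
    return levels

print(num_levels(512, 4))  # 8 levels for D4 on a 512x512 image
print(num_levels(512, 6))  # only 7 levels for D6
```

Note how a longer filter forces the recursion to stop one level earlier.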

For example, for the D4 transform such a recursive scheme can be programmed as follows:

data = image.copy()
h, w = data.shape
while w >= len(CL) and h >= len(CL):
    data[0:h, 0:w] = dwt2(data[0:h, 0:w], CL)
    w //= 2
    h //= 2
imshow(data, cmap=cm.gray)


The picture is not very interesting.


Let's see what range the values span. Here is a plot of the values along row zero.


The first few values are indeed quite large, but the rest are small.

Removing the unnecessary



And now let's do what this was all for: get lots of zeros!

This can be done in different ways. For example, one could round the values to a given number of significant digits (a special case of quantization). But we will do something simpler: replace with zero all coefficients whose absolute value is below some threshold.

As the threshold let's take 0.05. The larger it is, the more zeros, but also the greater the losses!

from numpy import abs
threshold = 0.05
data[abs(data) < threshold] = 0


After that, the plot becomes noticeably smoother.


Let's count the zeros:

>>> (data == 0).sum()
224000


Not bad! Considering that the image has only 512 × 512 = 262144 pixels, it turns out we have simply discarded 85.4% of the coefficients!

Is it bad for the image?

Let's perform the inverse transform. (Exactly how is an exercise for the reader!)
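For readers who want to check their solution, here is one possible sketch of the full multilevel round trip on a random 16 × 16 array, restating the functions from the article:

```python
from math import sqrt
import numpy as np

def hpf_coeffs(CL):
    N = len(CL)
    return [(-1)**k * CL[N - k - 1] for k in range(N)]

def icoeffs(CL, CH):
    iCL, iCH = [], []
    for k in range(0, len(CL), 2):
        iCL.extend([CL[k - 2], CH[k - 2]])
        iCH.extend([CL[k - 1], CH[k - 1]])
    return iCL, iCH

def pconv(data, CL, CH, delta=0):
    N, M = len(CL), len(data)
    out = []
    for k in range(0, M, 2):
        out.append(sum(data[(k + i - delta) % M] * CL[i] for i in range(N)))
        out.append(sum(data[(k + i - delta) % M] * CH[i] for i in range(N)))
    return out

def dwt2(image, CL):
    CH = hpf_coeffs(CL)
    h, w = image.shape
    imageT = image.copy()
    for i in range(h):                          # rows, then columns
        imageT[i, :] = pconv(imageT[i, :], CL, CH)
    for i in range(w):
        imageT[:, i] = pconv(imageT[:, i], CL, CH)
    data = imageT.copy()                        # gather LL to the top-left
    data[0:h//2, 0:w//2] = imageT[0:h:2, 0:w:2]
    data[h//2:h, 0:w//2] = imageT[1:h:2, 0:w:2]
    data[0:h//2, w//2:w] = imageT[0:h:2, 1:w:2]
    data[h//2:h, w//2:w] = imageT[1:h:2, 1:w:2]
    return data

def idwt2(data, CL):
    h, w = data.shape
    imageT = data.copy()                        # interleave back
    imageT[0:h:2, 0:w:2] = data[0:h//2, 0:w//2]
    imageT[1:h:2, 0:w:2] = data[h//2:h, 0:w//2]
    imageT[0:h:2, 1:w:2] = data[0:h//2, w//2:w]
    imageT[1:h:2, 1:w:2] = data[h//2:h, w//2:w]
    iCL, iCH = icoeffs(CL, hpf_coeffs(CL))
    image = imageT.copy()
    for i in range(w):                          # columns, then rows
        image[:, i] = pconv(image[:, i], iCL, iCH, delta=len(iCL) - 2)
    for i in range(h):
        image[i, :] = pconv(image[i, :], iCL, iCH, delta=len(iCL) - 2)
    return image

CL = [(1 + sqrt(3)) / (4 * sqrt(2)), (3 + sqrt(3)) / (4 * sqrt(2)),
      (3 - sqrt(3)) / (4 * sqrt(2)), (1 - sqrt(3)) / (4 * sqrt(2))]

image = np.random.default_rng(0).random((16, 16))

# Forward: transform the shrinking top-left corner.
data = image.copy()
w = h = 16
while w >= len(CL) and h >= len(CL):
    data[0:h, 0:w] = dwt2(data[0:h, 0:w], CL)
    w //= 2
    h //= 2

# Inverse: undo the levels from the smallest corner outward.
w *= 2
h *= 2
while w <= 16 and h <= 16:
    data[0:h, 0:w] = idwt2(data[0:h, 0:w], CL)
    w *= 2
    h *= 2

print(np.allclose(data, image))  # True
```

The key point is the order: the inverse must start from the smallest transformed corner and work outward.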

In the small figure the distortions cannot be seen at all, so I will show full-size pictures: the original and the reconstructed image.



If you can see the differences, well done! I found them only with difficulty.

And if we take 0.1 (92.9% of coefficients are reset)?

It's not bad too!

Let us now take a threshold equal to 0.2 (96.9% of the rejected coefficients).

Serious distortions are already noticeable.

Finally, let's try to drop everything that is smaller in modulus than 0.5 (99.2% is dropped).

Terribly blurry! But the squares are not as harsh as in over-compressed JPEGs. We used D4, so the blur is smooth, with gradients inside the blocks.

For comparison, we will try to compress the image with the same threshold (that is, 0.5), but already using the Haar transform.


Horror, horror! The distortions are much harsher, and this with the same 99.2% of coefficients discarded. Clearly worse than D4.

Conclusions


So.

1. After the D4 transform we can safely discard 90% of the coefficients by simple zeroing and lose almost no quality. With advanced quantization techniques, the result can be improved further.

2. D4 is better than D2 (the Haar transform), D6 will be better than D4, and so on. But there is a catch: the higher the order, the sooner we have to stop the recursive transformation of the upper-left corner. So moderation is needed in everything.

What's next?


Actually, we are not done yet. Yes, we have discarded 90% of the coefficients, but the remaining ones are real numbers occupying 8 bytes each! The compression would therefore be very modest. So, when saving to a file, the coefficients are rounded and stored with a limited number of bits. Besides, for better packing the coefficients can be reordered in a special way. One could write more than one article about all that.

But even if we simply save the coefficient array as half-precision floating-point numbers (float16), then after compression with the xz archiver we get a file of 43604 bytes.

Saving can be done like this:

from numpy import save, float16
save("boat", data.astype(float16))
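The same save-and-xz pipeline can be sketched in memory with Python's lzma module (my own sketch on synthetic data; the sizes will differ from the article's file-based result):

```python
import lzma
import numpy as np

# Synthetic stand-in for the thresholded coefficient array.
rng = np.random.default_rng(1)
data = rng.random((64, 64))
data[np.abs(data) < 0.5] = 0                  # imitate heavy thresholding

# Pack as float16 and compress the raw bytes with xz (LZMA).
packed = lzma.compress(data.astype(np.float16).tobytes())

# The round trip restores the float16 values exactly.
restored = np.frombuffer(lzma.decompress(packed),
                         dtype=np.float16).reshape(64, 64)
print(np.array_equal(restored, data.astype(np.float16)))  # True
```

The many zeros produced by thresholding are precisely what makes the byte stream compress well.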


Despite the significant loss of precision in the float64 → float16 conversion and the head-on compression without any special algorithms, we get a pretty good result:


This gives a compression ratio of (512 × 512) / 43604 ≈ 6, or about 1.33 bits per pixel. Nothing spectacular, but then we were not trying very hard. Proper quantization and a good compression algorithm could improve this result considerably, so there is room to grow! But that is already beyond the scope of our "project for an evening". Maybe some other time. ;)

Homework


1. Try to combine the code fragments above into a single script. Let it take the image file name, the threshold value, and the transform type as parameters, compress the given image, and save the result to a file. Provide two modes of operation: compression and decompression.
2. Experiment with encoding and decoding different images.
3. Try other transformations.
4. Consider how to more economically compress coefficients.
5. Try to speed up the program.
6. Implement the compression of color images.
7. Remove the restriction on the image size (for now the dimensions must be powers of two).
8. ???
9. PROFIT !!!

Another JPEG killer is ready! ;)

Thanks to all! I hope it was interesting!

Source: https://habr.com/ru/post/169615/

