
How modern image codecs compress sound. JPEG2000 vs MP3

In this experiment, the popular JPEG2000 image compression format will be used for an unusual task, storing a sound file.

In general, sound and images are very similar. If we represent sound as a waveform, we get the change of the sound signal over time. Similarly, if we take one row of image pixels, we get the change of brightness over distance.

The greater the amplitude of the sound signal's oscillations over time, the louder the sound. The analogue for an image is higher contrast.
The faster the sound signal changes, the higher the frequencies it contains. Similarly, rapid changes of brightness along a row of pixels indicate a lot of fine detail in the image.

Moreover, both the sound signal and the brightness of the pixels in a row change smoothly enough for the codec to exploit this property.

One small problem remains. Sound is a one-dimensional signal, and an image is two-dimensional. You can think of a sound file as one long row of pixels, and of an image as many rows of pixels. In an image, however, neighboring rows of pixels are very similar to each other.

The analogue for a sound wave is its fundamental frequency. On top of it there is a bunch of harmonics that fit exactly into the length of the fundamental wave. If you cut the sound signal into pieces one fundamental wavelength long and stack them, adjacent pieces will be similar to each other.
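The stacking idea can be illustrated with a minimal sketch. It uses a synthetic tone rather than the article's recording, and the names `fold`, `mean_abs_diff` and `PERIOD` are made up for this example. It compares how similar neighboring rows are when the row length matches the period and when it does not.

```python
import math


def fold(samples, period):
    """Cut a 1-D signal into consecutive rows of `period` samples each."""
    n_rows = len(samples) // period  # the incomplete tail is dropped
    return [samples[r * period:(r + 1) * period] for r in range(n_rows)]


def mean_abs_diff(a, b):
    """Average absolute difference between two equally long rows."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Synthetic tone: a fundamental with a period of 100 samples plus two
# harmonics that fit exactly into that period.
PERIOD = 100
signal = [
    math.sin(2 * math.pi * n / PERIOD)
    + 0.5 * math.sin(2 * math.pi * 2 * n / PERIOD)
    + 0.25 * math.sin(2 * math.pi * 3 * n / PERIOD)
    for n in range(PERIOD * 20)
]

good = fold(signal, PERIOD)      # row length equals the period
bad = fold(signal, PERIOD + 7)   # row length is off by 7 samples

diff_good = mean_abs_diff(good[0], good[1])
diff_bad = mean_abs_diff(bad[0], bad[1])
print("matched period:   ", diff_good)
print("mismatched period:", diff_bad)
```

With a matched row length the neighboring rows are practically identical, which is exactly the kind of vertical redundancy an image codec can exploit. Real music is not perfectly periodic, so on an actual recording the rows would only be roughly similar.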

For the experiment, a half-minute sound file was prepared from my favorite song, Ame Caleen - A demi-nue. Stored as 16-bit mono, the recording takes 2570 KB.

The fundamental frequency of this file was determined experimentally. Then, as described above, the recording was cut into pieces one period of this wave long, and the pieces were stacked as rows. The result is an image file. The pixel format is 16-bit grayscale, which exactly matches the format of the sound samples. The image size is 909x1448 pixels.
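As a sanity check, the image dimensions can be tied back to the file size with a bit of arithmetic. The sketch below assumes a sample rate of 44100 Hz, which the article does not state, and treats 909 as the width (the row length), as in ImageMagick's `-size` option.

```python
WIDTH, HEIGHT = 909, 1448   # image size given in the article
BYTES_PER_SAMPLE = 2        # 16-bit mono
SAMPLE_RATE = 44100         # assumption: not stated in the article

samples = WIDTH * HEIGHT                      # one pixel per audio sample
raw_kb = samples * BYTES_PER_SAMPLE / 1024    # uncompressed size in KB
duration_s = samples / SAMPLE_RATE            # length of the recording
row_rate_hz = SAMPLE_RATE / WIDTH             # how many rows fit in one second

print(f"samples:  {samples}")
print(f"raw size: {raw_kb:.1f} KB")
print(f"duration: {duration_s:.2f} s")
print(f"row rate: {row_rate_hz:.2f} rows per second")
```

The raw size comes out at about 2570 KB, which matches the size of the 16-bit mono recording, and under the 44100 Hz assumption the duration is just under 30 seconds, consistent with a half-minute clip.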



Conveniently, JPEG2000 supports 16-bit-per-pixel grayscale. ImageMagick was used for the JPEG2000 compression. It lets you compress the image lightly or heavily, which in turn affects the quality of the resulting sound recording. As the rival, the regular MP3 codec from the Adobe Audition package was chosen.

The essence of the experiment was to select the codec parameters, get a jp2 file of the same size as mp3, and compare the quality of the resulting sound files.

I wanted to evaluate how badly the quality suffers under medium and strong compression. By tuning the codec parameters, the source file was compressed down to 32 KB for strong compression and to 400 KB for medium.
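To get a feel for these targets, the sketch below converts them into a compression factor and into an equivalent audio bitrate. It assumes that the codec's rate setting is simply compressed size divided by uncompressed size, and that the recording is sampled at 44100 Hz; neither is stated in the article, and the helper names are made up for this example.

```python
RAW_BYTES = 909 * 1448 * 2   # 16-bit samples, one per pixel
SAMPLE_RATE = 44100          # assumption: not stated in the article
DURATION_S = 909 * 1448 / SAMPLE_RATE


def rate_for(target_kb, raw_bytes=RAW_BYTES):
    """Compressed size as a fraction of the uncompressed size."""
    return target_kb * 1024 / raw_bytes


def equivalent_kbps(target_kb, duration_s=DURATION_S):
    """Average bitrate (kbit/s) of a file of `target_kb` KB."""
    return target_kb * 1024 * 8 / duration_s / 1000


for target_kb in (400, 32):
    print(
        f"{target_kb} KB: rate {rate_for(target_kb):.4f}, "
        f"about {equivalent_kbps(target_kb):.1f} kbit/s"
    )
```

For the 400 KB target this gives a factor of roughly 0.156, close to the 0.1565 used in the ImageMagick commands later in the article. Under the same assumptions, 400 KB corresponds to roughly 110 kbit/s and 32 KB to roughly 9 kbit/s.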

With medium compression, JPEG2000 adds clearly audible noise to the sound. Otherwise the sound is very close to the original. With strong compression, JPEG2000 produces a lot of distortion: clicks, a dull sound, and awful lows and highs. Interestingly, though, unlike MP3 under similar conditions, the singer's voice comes through all the distortion much more clearly.

For strong JPEG2000 compression, an additional image transformation was applied to squeeze out a bit more sound quality: the image was downsized. Reducing the width of the image is like reducing the sample rate of the sound, and reducing the height of the image is something like speeding the sound up.
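The effect of the downsizing can be put into numbers with a short sketch. It uses the 454x924 target size that appears in the commands later in the article and again assumes an original sample rate of 44100 Hz, which the article does not state.

```python
ORIG_W, ORIG_H = 909, 1448     # original image size
SMALL_W, SMALL_H = 454, 924    # downsized image used for strong compression
SAMPLE_RATE = 44100            # assumption: not stated in the article

# Fewer pixels per row: each period is described by fewer samples,
# which behaves like a lower sample rate.
effective_rate = SAMPLE_RATE * SMALL_W / ORIG_W

# Fewer rows: fewer stored periods for the same stretch of music.
row_ratio = SMALL_H / ORIG_H

# Overall share of samples the codec still has to encode.
sample_ratio = (SMALL_W * SMALL_H) / (ORIG_W * ORIG_H)

print(f"effective sample rate: {effective_rate:.0f} Hz")
print(f"rows kept:             {row_ratio:.1%}")
print(f"samples kept:          {sample_ratio:.1%}")
```

Under these assumptions the codec only has to encode about a third of the original samples, so more bits are available per remaining sample. The price is that the missing detail has to be interpolated when the image is scaled back up.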

Lossless JPEG2000 compression was also tested (i.e., compression with no distortion at all). The jp2 file shrank to 71% of the uncompressed size. Not bad, given that specialized lossless audio codecs (such as FLAC and APE) reach around 40-50%.
Another result: JPEG XR lossless compression reached 81%.

Below are the ImageMagick launch commands.

Example for 400KB compression:
convert -depth 16 -size 909x1448 wav.txt.gray -depth 16 -type Grayscale -define jp2:rate=0.1565 tn.jp2
convert tn.jp2 -type Grayscale tn3.gray

For 32K compression:
convert -depth 16 -size 909x1448 wav.txt.gray -depth 16 -type Grayscale -resize -454x924 -define jp2:rate=0.0325 tn.jp2
convert tn.jp2 -type Grayscale -resize -909x1448 tn3.gray
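The commands above read and write raw 16-bit gray files (wav.txt.gray and tn3.gray), but the article does not show how these were produced from the sound file or turned back into sound. Below is a minimal sketch of one possible way to do it with the Python standard library. It is not the author's tool: the function names, the little-endian byte order and the shift from signed samples to unsigned pixel values are all assumptions, and the byte order has to match whatever the image tool is told to expect.

```python
import array
import sys
import wave

OFFSET = 32768  # assumption: shift signed 16-bit audio into unsigned pixels


def _to_little_endian(values):
    """Return the array's contents as little-endian bytes."""
    if sys.byteorder == "big":
        values = array.array(values.typecode, values)  # copy before swapping
        values.byteswap()
    return values.tobytes()


def _from_little_endian(typecode, data):
    """Build an array from little-endian bytes."""
    values = array.array(typecode)
    values.frombytes(data)
    if sys.byteorder == "big":
        values.byteswap()
    return values


def wav_to_gray(wav_path, gray_path, width):
    """Write a 16-bit mono WAV as raw gray rows; return the number of rows."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    height = len(samples) // width  # an incomplete last row is dropped
    pixels = array.array("H", (s + OFFSET for s in samples[:width * height]))
    with open(gray_path, "wb") as out:
        out.write(_to_little_endian(pixels))
    return height


def gray_to_wav(gray_path, wav_path, sample_rate):
    """Turn a raw 16-bit gray file back into a 16-bit mono WAV."""
    with open(gray_path, "rb") as src:
        pixels = _from_little_endian("H", src.read())
    samples = array.array("h", (p - OFFSET for p in pixels))
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(samples.tobytes())
```

A hypothetical call such as `wav_to_gray("song.wav", "wav.txt.gray", 909)` would produce the input for the first command, and `gray_to_wav("tn3.gray", "result.wav", 44100)` would turn the decoded image back into sound. The file names song.wav and result.wav and the 44100 Hz rate are placeholders.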

Below is a link to the file with the results. It contains the sound files produced by the JPEG2000 and MP3 codecs, and an example of a “picture with sound”.
http://depositfiles.com/files/jmd4yfdf5 .

Source: https://habr.com/ru/post/125528/

