JPEG images exist not only as standalone files with the .jpg extension; they can also be found embedded inside PDF and TIFF files.
JPEG technology stakeholders can probably be divided into the following groups:
The author of this article belongs mainly to the last group, and is certainly not one of the artistic photographers. This inevitably introduces a certain bias into the narrative, which is nevertheless useful for illustrating possible trajectories through the solution space.
Briefly, the process of turning an original image into a JPEG looks like this:
The only operation in which data loss occurs is quantization. All other operations occur without loss.
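To make this concrete, here is a minimal sketch (assuming numpy and scipy are available) of what happens to a single 8×8 block: the DCT round trip alone is reversible up to floating-point precision, and information is discarded only when the coefficients are divided by the quantization matrix and rounded. The block values are random placeholders, and the table is the standard luminance matrix from Annex K of the JPEG specification.

```python
# Quantization is the only lossy step: the DCT itself round-trips almost
# exactly, but dividing by the quantization table and rounding does not.
import numpy as np
from scipy.fft import dctn, idctn

# Standard luminance quantization table (JPEG spec, Annex K).
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128  # level shift

# DCT alone: lossless up to floating-point precision.
roundtrip = idctn(dctn(block, norm='ortho'), norm='ortho')
print('DCT only, max error:', np.abs(block - roundtrip).max())

# DCT + quantization: the rounding is where information is lost.
quantized = np.round(dctn(block, norm='ortho') / Q_LUMA)
restored = idctn(quantized * Q_LUMA, norm='ortho')
print('with quantization, max error:', np.abs(block - restored).max())
```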
Different stakeholders have different production cycles, in which the following phases vary:
Staying within the standard JPEG format, we can vary the following:
Going beyond the standard file format, but preserving the basic JPEG encoding model, we can use a non-standard compression scheme, saving storage space (and delivery as well, if we control the end device).
If the final format is a standard PDF and our source is scanned book pages, then we can use the technique of separating the image into black-and-white and color parts. If we control the end device, we can use yet another non-standard "format".
Key point: many of the techniques described here help reduce file size. It is important to understand that this equally provides the opportunity to improve picture quality while staying within roughly the same target file size.
Let us consider each dimension of the space of possible solutions separately.
JPEG compression is based on a psycho-visual model: the human eye tolerates the loss of high-frequency image detail, and people perceive differences in brightness better than differences in color. There are standardized metrics for how faithfully the original image is reproduced, based on the SSIM (Structural Similarity) formula; they can be computed automatically, and target values can be set for them.
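As an illustration, here is a small sketch of such an automated check, using the SSIM implementation from scikit-image: the source image is compressed to JPEG in memory and accepted only if the similarity score stays above a threshold. The file name, quality setting, and the 0.95 threshold are illustrative assumptions, not values from the article.

```python
# Compare an original image with its JPEG version using SSIM and check the
# result against a target value.
import io
import numpy as np
from PIL import Image
from skimage.metrics import structural_similarity

original = np.asarray(Image.open('source.png').convert('L'))  # placeholder file

buf = io.BytesIO()
Image.fromarray(original).save(buf, format='JPEG', quality=75)
buf.seek(0)
compressed = np.asarray(Image.open(buf).convert('L'))

score = structural_similarity(original, compressed, data_range=255)
print('SSIM:', score, 'acceptable:', score >= 0.95)  # illustrative threshold
```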
If we are allowed to vary the original pixels while staying within the target values of reproduction fidelity, then in principle we can choose pixel values that compress better after quantization than the unmodified ones would. We do not know of any software that uses such a technique.
Quantization matrices are the main mechanism for achieving acceptable reproduction quality at an acceptable compression ratio. There are several standard default matrices, including one created more than 25 years ago from the analysis of a corpus of test images assembled at the same time. If you use the standard matrices and do not want to bother further, the results will be more or less fine.
Generally speaking, it would be optimal to generate a separate quantization matrix for each image by analyzing its content. Camera makers use their own patented generation methods. Adobe Photoshop is reportedly able to analyze an image and select a suitable matrix.
The JPEG compression quality setting is simply a numeric factor applied to the elements of the standard quantization matrices; as the quality is lowered, more and more information is discarded.
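As a sketch of how this works in practice, the mapping below follows the quality-to-scale-factor formula used by IJG libjpeg (jpeg_quality_scaling): the quality value becomes a percentage by which the base table is multiplied, clamped to the range valid for baseline JPEG. The base table here repeats the Annex K luminance matrix from the earlier sketch.

```python
# Turn a "quality" setting into a concrete quantization table, IJG-style.
import numpy as np

# Annex K luminance table (same as Q_LUMA in the earlier sketch).
Q_LUMA = np.array([
    16, 11, 10, 16, 24, 40, 51, 61,     12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,     14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,   24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101, 72, 92, 95, 98, 112, 100, 103, 99,
]).reshape(8, 8)

def scale_quant_table(base, quality):
    quality = min(max(int(quality), 1), 100)
    scale = 5000 // quality if quality < 50 else 200 - quality * 2
    return np.clip((base * scale + 50) // 100, 1, 255)  # keep baseline-valid

# Lower quality -> larger divisors -> more information discarded.
print(scale_quant_table(Q_LUMA, 90))
print(scale_quant_table(Q_LUMA, 30))
```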
An interesting fact: quantization matrices can be used in computer forensics to identify the device with which the photo was taken.
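For example, the tables embedded in a particular file can be inspected directly and compared against tables known to be produced by specific cameras or encoders. A quick sketch using Pillow, which exposes them on opened JPEG files (the file name is a placeholder):

```python
# Print the quantization tables stored in a JPEG file.
from PIL import Image

with Image.open('photo.jpg') as im:
    for table_id, values in im.quantization.items():
        print('table', table_id, ':', values)
```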
Often, JPEG files are de facto the source material, because storing images in uncompressed formats is impractical. Sometimes we also simply have no access to the paper original; for example, all we have is a PDF file of unknown origin and dubious quality. Therefore, for many applications it makes sense to treat the DCT coefficients as first-class objects and work with them carefully, applying lossless transformations whenever possible.
Working at the level of DCT coefficient matrices allows several transformations to be performed with minimal loss (or no loss at all). Operations that require fully re-encoding the pixels are not considered here.
Transformations with minimal losses:
Since we know that the operation following quantization is compression, we can, in principle, slightly vary the coefficients so that they compress better while still decoding to approximately the same pixels as the original image. It is not known whether any implementations use this approach.
Standard JPEG supports Huffman coefficient compression as well as arithmetic coding.
Huffman compression is often implemented in the simplest and most naive way: the encoder compresses the coefficients as if they were arbitrary data. If we take into account that these are the coefficients of a two-dimensional matrix encoding an image, we can get significantly better results. Compression of the coefficients at this level is lossless.
Another way to improve compression is to encode the file as progressive JPEG. In this case, coefficients of the same order are grouped together, which usually reduces the file size. This, too, is a lossless operation. In addition, a progressive JPEG starts rendering sooner while the file is still downloading, which can be an additional advantage on the web.
The second standard compression method is arithmetic coding. Unfortunately, it is supported only by specialized tools; in particular, most common browsers do not support it. This is due to the (formerly) questionable patent status of arithmetic coding. Simply recompressing images with arithmetic coding yields significant size savings with no loss. Naturally, this method is only usable if we control the implementation of the client device, or if the pictures are used only for storage.
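Both kinds of lossless recompression can be tried with the jpegtran utility listed in the links below; here is a sketch that wraps it from Python with placeholder file names. The '-optimize' switch rebuilds the Huffman tables, '-progressive' reorders the coefficients into a progressive scan, and '-arithmetic' switches to arithmetic coding (readable only by tools that support it); in all cases the DCT coefficients themselves are untouched.

```python
# Losslessly recompress a JPEG by changing only its entropy coding.
import subprocess

def recompress(src, dst, arithmetic=False):
    args = ['jpegtran', '-copy', 'all']            # keep metadata
    if arithmetic:
        args += ['-arithmetic']                    # arithmetic coding
    else:
        args += ['-optimize', '-progressive']      # optimized Huffman, progressive scan
    args += ['-outfile', dst, src]
    subprocess.run(args, check=True)

recompress('input.jpg', 'optimized.jpg')               # Huffman + progressive
recompress('input.jpg', 'arith.jpg', arithmetic=True)  # arithmetic coding
```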
If we fully control the client device, or only want to store the pictures, we can use non-standard compression schemes that make better use of the fact that the data being compressed is an image. For example, the interesting PackJPG program compresses JPEG losslessly by about 20%, but stores the result in a non-standard format.
For some images, especially scanned printed pages, the image can be split into a black-and-white (or other limited-palette) layer and a full-color background layer that refines it. Both layers can then be compressed separately in their optimal formats: for PDF, the black-and-white layer can be compressed with JBIG2, and the color layer with JPEG2000, JPEG, or PNG.
A similar technique is used in the DjVu format.
If we control the client device, then we can do similar transformations, for example by overlaying pictures in HTML.
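A rough sketch of such a separation using Pillow is shown below: the page is split into a 1-bit mask for text and line art (which compresses well with JBIG2 or CCITT) and a heavily downsampled color background (which compresses well with JPEG). The fixed threshold of 128 and the 3x downsampling factor are arbitrary illustrative choices; real pipelines such as the pdfbeads script in the links segment the page much more carefully.

```python
# Split a scanned page into a bitonal foreground mask and a low-resolution
# color background layer, to be compressed separately.
from PIL import Image

page = Image.open('scan.png').convert('RGB')  # placeholder file

# Foreground: binarize the luminance channel.
mask = page.convert('L').point(lambda v: 0 if v < 128 else 255, mode='1')
mask.save('foreground.png')

# Background: heavily downsampled color layer.
w, h = page.size
background = page.resize((max(w // 3, 1), max(h // 3, 1)), Image.LANCZOS)
background.save('background.jpg', quality=50)
```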
https://en.wikipedia.org/wiki/JPEG - a very good overview; some paragraphs become clear only after studying the other references.
http://www.ijg.org/ - the reference implementation of the format. Written in highly portable C, so it is not heavily optimized.
http://www.libjpeg-turbo.org/ - an optimized JPEG implementation
http://www.libjpeg-turbo.org/About/Jpeg-9 - criticism of IJG libjpeg v9
http://www.libjpeg-turbo.org/About/SmartScale - criticism of the SmartScale extension, whose novelty is questionable
http://www.libjpeg-turbo.org/About/Mozjpeg - very good technical analysis of Mozjpeg
https://github.com/mozilla/mozjpeg - JPEG implementation optimized for specific common use-cases
http://jpegclub.org/jpegtran/ - a utility for converting JPEG files with minimal loss
https://linux.die.net/man/1/exiftran - a utility for working with JPEG files rotated via the EXIF orientation tag
https://github.com/ifad/pdfbeads - a script for generating PDF files using layer separation
https://en.wikipedia.org/wiki/DjVu#Compression - description of the separation of layers in DjVu
https://github.com/packjpg - a set of libraries and utilities for low-level manipulation of JPEG files
http://code.flickr.net/2017/01/05/a-year-without-a-byte/ - a story about how Flickr optimized storage and distribution of images
Source: https://habr.com/ru/post/322554/