I'm going to take a look at the history of the science of human perception that led to the creation of modern video standards, and I will try to explain the most frequently used terminology along the way. I will also briefly explain why the typical game production process will, over time, come to resemble the process used in the film industry more and more.
Pioneers of color perception studies
Today we know that the retina of the human eye contains three different types of photoreceptor cells called cones. Each of the three types contains a protein from the opsin family that absorbs light in a different part of the spectrum:
Absorption of light by opsins
The cones correspond to the red, green, and blue parts of the spectrum and are commonly called long (L), medium (M), and short (S), after the wavelengths to which they are most sensitive.
One of the first scientific papers on the interaction of light and the retina was Isaac Newton's treatise “Hypothesis Concerning Light and Colors”, written between 1670 and 1675. Newton theorized that light of different wavelengths caused the retina to resonate at matching frequencies, and that these vibrations were then transmitted through the optic nerve to the “sensorium”.
“Rays of light, falling to the bottom of the eye, excite vibrations in the retina, which spread through the fibers of the optic nerves to the brain, creating the sense of vision. Different types of rays create vibrations of different strengths, which, according to their strength, excite sensations of different colors...” (I recommend reading Newton's scanned drafts on the website of the University of Cambridge. I am, of course, stating the obvious, but what a genius he was!)
More than a hundred years later, Thomas Young came to the conclusion that, since the resonance frequency is a property of the resonating system, the retina would need an infinite number of different resonant systems to absorb light of every possible frequency. Young considered this unlikely and reasoned that the number is limited to one system each for red, yellow, and blue, the colors traditionally used in subtractive paint mixing.
In his own words:
Since, for the reasons given by Newton, it is probable that the motion of the retina is of a vibratory rather than an undulatory nature, the frequency of the vibrations must depend on the constitution of its substance. As it is almost impossible to conceive that each sensitive point of the retina contains an infinite number of particles, each capable of vibrating in perfect unison with every possible undulation, it becomes necessary to suppose the number limited, for instance, to the three principal colors: red, yellow, and blue...
Young's assumption about the mechanism of the retina was wrong, but his conclusion was correct: the eye contains a finite number of receptor types.
In 1850, Hermann Helmholtz was the first to obtain experimental proof of Young's theory. Helmholtz asked subjects to match the colors of various sample light sources by adjusting the brightness of several monochromatic light sources. He concluded that three light sources, in the red, green, and blue parts of the spectrum, are necessary and sufficient to match all samples.
The birth of modern colorimetry
Fast forward to the early 1930s. By that time, the scientific community had a fairly good understanding of the inner workings of the eye. (Although it took another 20 years for George Wald to experimentally confirm the presence and function of the opsins in retinal cones, a discovery that earned him a share of the 1967 Nobel Prize in Physiology or Medicine.) The Commission Internationale de l'Éclairage (International Commission on Illumination, CIE) set out to create an exhaustive quantitative description of human color perception. The quantification was based on experimental data collected by William David Wright and John Guild, with a setup similar to the one first chosen by Hermann Helmholtz. The base wavelengths were 435.8 nm for blue, 546.1 nm for green, and 700 nm for red.
John Guild's experimental setup; three knobs adjust the primary colors
Due to the significant overlap between the sensitivities of the M and L cones, it was impossible to match some wavelengths in the blue-green part of the spectrum. To “match” these colors, a small amount of the red primary had to be added to the test color:
$$C + r\,\mathbf{R} = g\,\mathbf{G} + b\,\mathbf{B}$$
If we imagine for a moment that primaries are allowed to make negative contributions, the equation can be rewritten as:
$$C = -r\,\mathbf{R} + g\,\mathbf{G} + b\,\mathbf{B}$$
The result of the experiments was a table of RGB triples for each wavelength, which can be plotted as follows:
CIE 1931 RGB color matching functions
Of course, colors with a negative red component cannot be displayed using the CIE primaries.
Now we can find the trichromatic coefficients for light with a spectral power distribution S as the following inner products:
$$R = \int_0^{\infty} S(\lambda)\,\bar{r}(\lambda)\,d\lambda \qquad G = \int_0^{\infty} S(\lambda)\,\bar{g}(\lambda)\,d\lambda \qquad B = \int_0^{\infty} S(\lambda)\,\bar{b}(\lambda)\,d\lambda$$
It may seem obvious that sensitivities to different wavelengths can be integrated in this way, but it actually rests on a physical property of the eye: its response is linear in the light it receives. This was empirically confirmed in 1853 by Hermann Grassmann, and the integrals above, in their modern formulation, are known to us as Grassmann's law.
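To make Grassmann's law concrete, here is a minimal numerical sketch in Python. The spectrum and color matching functions below are made-up placeholders (Gaussian bumps that ignore the negative red lobe); real code would use the tabulated CIE 1931 data instead.

```python
import numpy as np

wavelengths = np.arange(380.0, 781.0, 5.0)      # nm, 5 nm steps

def bump(mu, sigma):
    """Placeholder curve standing in for a tabulated color matching function."""
    return np.exp(-0.5 * ((wavelengths - mu) / sigma) ** 2)

# Rough stand-ins for the CIE 1931 r-bar, g-bar, b-bar curves (the real r-bar
# has a negative lobe that these placeholders cannot reproduce).
r_bar, g_bar, b_bar = bump(600, 35), bump(550, 35), bump(450, 25)

S = np.ones_like(wavelengths)                   # test spectrum: flat

def tristimulus(S, cmf, dlam=5.0):
    """Riemann-sum approximation of the inner product integrals above."""
    return np.sum(S * cmf) * dlam

R, G, B = (tristimulus(S, c) for c in (r_bar, g_bar, b_bar))
print(R, G, B)
```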
The term “color space” arose because the primary colors (red, green, and blue) can be considered the basis of a vector space. In this space, the various colors perceived by a human are represented by rays emanating from the origin. The modern definition of a vector space was introduced in 1888 by Giuseppe Peano, but more than 30 years before that, James Clerk Maxwell was already using the then-emerging theories of what would later become linear algebra to formally describe the trichromatic color system.
CIE decided that, to simplify calculations, it would be more convenient to work with a color space in which the coefficients of the primaries are always positive. The new XYZ coordinates were expressed in terms of the RGB color space as follows:
$$\begin{bmatrix}X\\Y\\Z\end{bmatrix} = \frac{1}{0.17697}\begin{bmatrix}0.49 & 0.31 & 0.20\\ 0.17697 & 0.81240 & 0.01063\\ 0.00 & 0.01 & 0.99\end{bmatrix}\begin{bmatrix}R\\G\\B\end{bmatrix}$$
This new set of primaries cannot be realized in the physical world. It is purely a mathematical tool that makes working with the color space more convenient. In addition to making the coefficients of the primaries always positive, the new space is arranged so that the Y coefficient corresponds to the perceived brightness of the color. This component is known as the CIE luminance (for more information about it, see the excellent Color FAQ by Charles Poynton).
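As an illustration, here is that transformation applied in code; the matrix is the standard CIE 1931 RGB-to-XYZ matrix quoted above.

```python
import numpy as np

# The CIE 1931 RGB -> XYZ matrix from the equation above.
M_RGB_TO_XYZ = (1.0 / 0.17697) * np.array([
    [0.49000, 0.31000, 0.20000],
    [0.17697, 0.81240, 0.01063],
    [0.00000, 0.01000, 0.99000],
])

def cie_rgb_to_xyz(rgb):
    """Convert CIE RGB tristimulus values to XYZ."""
    return M_RGB_TO_XYZ @ np.asarray(rgb, dtype=float)

print(cie_rgb_to_xyz([1.0, 0.0, 0.0]))   # the red primary expressed in XYZ
```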
To make the resulting color space easier to visualize, we will perform one last conversion. Dividing each component by the sum of all components gives a dimensionless description of a color that does not depend on its brightness:
$$x = \frac{X}{X+Y+Z} \qquad y = \frac{Y}{X+Y+Z} \qquad z = \frac{Z}{X+Y+Z} = 1 - x - y$$
The x and y coordinates are known as chromaticity coordinates and, together with the CIE luminance Y, they make up the CIE xyY color space. If we plot the chromaticity coordinates of all colors of a given luminance, we get the following diagram, which is probably familiar to you:
The CIE 1931 xyY chromaticity diagram
The last thing we need to know is what counts as the white color of a color space. In a display system, white is defined by the x and y chromaticity coordinates obtained when all three RGB primary coefficients are equal.
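Continuing the sketch from above (same matrix), converting XYZ to xyY lets us verify that equal coefficients of the CIE RGB primaries land at x = y = 1/3, the equal-energy white point:

```python
import numpy as np

M_RGB_TO_XYZ = (1.0 / 0.17697) * np.array([
    [0.49000, 0.31000, 0.20000],
    [0.17697, 0.81240, 0.01063],
    [0.00000, 0.01000, 0.99000],
])

def xyz_to_xyY(xyz):
    X, Y, Z = xyz
    s = X + Y + Z
    if s == 0.0:                     # chromaticity is undefined for pure black
        return (0.0, 0.0, 0.0)
    return (X / s, Y / s, Y)

white = M_RGB_TO_XYZ @ np.array([1.0, 1.0, 1.0])
print(xyz_to_xyY(white))             # x = y = 1/3: the equal-energy white point
```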
Over time, several new color spaces appeared that improved on the CIE 1931 spaces in various respects. Despite this, CIE xyY remains the most popular color space for describing the properties of display devices.
Transfer functions
Before considering video standards, two more concepts need to be introduced and explained.
Opto-electronic transfer function
The opto-electronic transfer function (OETF) determines how the linear light captured by a device (a camera) is encoded in the signal, i.e. it is a function of the form:
$$V = \mathrm{OETF}(L)$$
Previously, V was an analog signal, but now, of course, it is encoded digitally. Game developers rarely encounter the OETF. One example where it does matter: compositing recorded video with computer graphics in a game. In that case, you need to know which OETF the video was recorded with in order to recover the linear light and mix it correctly with the rendered image.
Electro-optical transfer function
The electro-optical transfer function (EOTF) performs the task opposite to the OETF's, i.e. it determines how the signal is converted into linear light:
$$L = \mathrm{EOTF}(V)$$
This function matters more to game developers, because it determines how the content they create will be displayed on users' TVs and monitors.
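As a toy example, here is what an OETF/EOTF pair looks like in code. The pure power law and the gamma value are illustrative assumptions only; real standards use piecewise curves, as we will see below.

```python
GAMMA = 2.2   # illustrative value, not taken from any particular standard

def oetf(L):
    """Scene linear light in [0, 1] -> encoded signal V."""
    return L ** (1.0 / GAMMA)

def eotf(V):
    """Encoded signal V -> display linear light in [0, 1]."""
    return V ** GAMMA

assert abs(eotf(oetf(0.5)) - 0.5) < 1e-12   # inverses of each other by construction
```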
The relationship between EOTF and OETF
The concepts of EOTF and OETF, although interrelated, serve different purposes. The OETF is needed to get a representation of the captured scene from which we can later reconstruct the original linear light (conceptually, this representation is the HDR (High Dynamic Range) frame buffer of an ordinary game). What happens during the production of a typical film:
- Capture scene data
- Invert the OETF to recover linear light values
- Color grading
- Mastering for the various target formats (DCI-P3, Rec. 709, HDR10, Dolby Vision, etc.):
- Reducing the dynamic range of the material to the dynamic range of the target format (tone mapping)
- Converting to the color space of the target format
- Inverting the EOTF of the target format (so that applying the EOTF in the display device restores the intended image)
A detailed discussion of this process is outside the scope of this article, but I recommend studying the detailed, formalized description of the ACES (Academy Color Encoding System) workflow.
Until now, the standard technical process of a game has looked like this (a minimal sketch follows the list):
- Rendering
- HDR frame buffer
- Tone mapping
- Inverting the EOTF of the intended display device (usually sRGB)
- Color grading
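Here is a sketch of those steps, with a Reinhard curve standing in for the tone mapper (engines use more elaborate curves) and the standard sRGB encoding as the inverse-EOTF step:

```python
import numpy as np

def tonemap_reinhard(c):
    """Stand-in tone mapper: compress scene-referred HDR into [0, 1]."""
    return c / (1.0 + c)

def linear_to_srgb(c):
    """Inverse-EOTF step: the piecewise sRGB encoding (IEC 61966-2-1)."""
    c = np.clip(c, 0.0, 1.0)
    return np.where(c <= 0.0031308,
                    12.92 * c,
                    1.055 * np.power(c, 1.0 / 2.4) - 0.055)

hdr_frame = np.random.rand(4, 4, 3) * 8.0   # stand-in HDR frame buffer
sdr_frame = linear_to_srgb(tonemap_reinhard(hdr_frame))
# Color grading (e.g. the LUT approach discussed next) would be applied here.
```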
Most game engines use the color grading approach popularized by Naty Hoffman's presentation “Color Enhancement for Videogames” at Siggraph 2010. The method was very practical when only SDR (Standard Dynamic Range) output existed, because it allowed artists to grade with software already installed on most of their computers, such as Adobe Photoshop.
Standard SDR color grading workflow (image courtesy of Jonathan Blow)
After the introduction of HDR, most games began moving toward a process similar to the one used in film production. Even without HDR output, a film-like process has its advantages: performing color grading in HDR means the full dynamic range of the scene is available, and some effects that were previously out of reach become possible.
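The core of that workflow is a 3D lookup table. Below is a minimal sketch under simplifying assumptions: nearest-neighbor lookup (real engines filter trilinearly), and the identity table is what gets baked into the screenshot for the artist to grade.

```python
import numpy as np

N = 16   # LUT resolution per axis; 16 or 32 is typical

# Identity LUT: lut[r, g, b] = (r, g, b). After an artist grades a screenshot
# containing this table, the edited strip is read back and replaces it.
grid = np.linspace(0.0, 1.0, N)
lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)

def apply_lut(image, lut):
    """Grade an image (H, W, 3 in [0, 1]) by nearest-neighbor LUT lookup."""
    n = lut.shape[0]
    idx = np.clip(np.round(image * (n - 1)).astype(int), 0, n - 1)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

graded = apply_lut(np.random.rand(4, 4, 3), lut)   # identity LUT: ~unchanged
```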
Now we are ready to consider the various standards currently used to describe TV formats.
Video standards
Rec. 709
Most of the standards related to video broadcasting are issued by the International Telecommunication Union (ITU), the UN agency for information and communication technologies.
Recommendation ITU-R BT.709, often referred to as Rec. 709, is the standard that describes the properties of HDTV. The first version of the standard was released in 1990, the latest in June 2015. The standard describes parameters such as aspect ratio, resolution, and frame rate. Most people are familiar with those characteristics, so I will skip them and focus on the sections of the standard concerning color and luminance reproduction.
The standard describes the chromaticity limits in terms of the CIE xyY color space. The red, green, and blue emitters of a standard-compliant display must have the following individual chromaticity coordinates:
Red: x = 0.640, y = 0.330
Green: x = 0.300, y = 0.600
Blue: x = 0.150, y = 0.060
Their relative intensities should be adjusted so that the white point has the chromaticity:
x = 0.3127, y = 0.3290
(This white point, also known as CIE Standard Illuminant D65, is meant to capture the chromaticity coordinates of the spectral power distribution of average daylight.)
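These chromaticities are enough to build the RGB-to-XYZ matrix of a Rec. 709 display. The derivation sketched below is the standard one: each primary is scaled so that equal RGB coefficients sum to the D65 white point.

```python
import numpy as np

def rgb_to_xyz_matrix(xy_r, xy_g, xy_b, xy_white):
    """Build an RGB -> XYZ matrix from primary and white point chromaticities."""
    def xyz_of(xy):
        x, y = xy
        return np.array([x / y, 1.0, (1.0 - x - y) / y])   # xyY -> XYZ with Y = 1

    primaries = np.column_stack([xyz_of(xy_r), xyz_of(xy_g), xyz_of(xy_b)])
    scale = np.linalg.solve(primaries, xyz_of(xy_white))    # make white sum correctly
    return primaries * scale                                # scale each column

M_709 = rgb_to_xyz_matrix((0.640, 0.330), (0.300, 0.600),
                          (0.150, 0.060), (0.3127, 0.3290))
print(M_709)
```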
The chromaticity properties can be visually represented as follows:
The Rec. 709 gamut
The region of the chromaticity diagram bounded by the triangle formed by the primaries of a given display system is called its gamut.
Now we come to the luminance part of the standard, and here things get a bit more complicated. The standard states that the “overall opto-electronic transfer characteristic at source” is:
$$V = \begin{cases} 4.500\,L & 0 \le L < 0.018 \\ 1.099\,L^{0.45} - 0.099 & 0.018 \le L \le 1 \end{cases}$$
There are two problems here:
- There is no specification of what physical luminance corresponds to L = 1
- Even though this is a video broadcasting standard, it does not specify an EOTF
This happened for historical reasons: it was assumed that the display device, i.e. the consumer TV set, is the EOTF. In practice, this meant adjusting the captured luminance range with the OETF above so that the image looked good on a reference monitor with the following EOTF:
$$L = V^{2.4}$$
where L = 1 corresponds to a luminance of about 100 cd/m² (in this industry the unit cd/m² is called a “nit”). The ITU confirms this in the latest versions of the standard with the following comment:
In standard production practice, the encoding function of image sources is adjusted so that the final image has the desired appearance, as viewed on a reference monitor. The decoding function of Recommendation ITU-R BT.1886 is taken as the reference. The reference viewing environment is specified in Recommendation ITU-R BT.2035.
Rec. 1886 is the result of an effort to document the characteristics of CRT monitors (the standard was published in 2011), i.e. it is a formalization of existing practice.
An elephant graveyard of CRTs
The nonlinear dependence of luminance on applied voltage is a consequence of the physical construction of CRT monitors. By pure coincidence, this nonlinearity is (very) approximately the inverse of the nonlinearity of human brightness perception. When we moved to digital signal representation, this had the happy effect of spreading the quantization error roughly uniformly across the brightness range.
Rec. 709 allows 8-bit or 10-bit encoding, and most content uses 8 bits. For 8-bit content, the standard states that the luminance range of the signal must be mapped to code values 16-235.
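Putting the two formulas together, here is a sketch of the full encoding path from scene-linear values to "legal range" 8-bit codes:

```python
import numpy as np

def rec709_oetf(L):
    """Rec. 709 OETF for scene-linear L in [0, 1], as quoted above."""
    L = np.asarray(L, dtype=float)
    return np.where(L < 0.018, 4.5 * L, 1.099 * np.power(L, 0.45) - 0.099)

def encode_8bit_legal(V):
    """Map V in [0, 1] to the legal 8-bit range, codes 16..235."""
    return np.round(16.0 + V * (235.0 - 16.0)).astype(np.uint8)

codes = encode_8bit_legal(rec709_oetf(np.linspace(0.0, 1.0, 9)))
print(codes)   # 16 ... 235
```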
HDR10
In HDR video there are two main contenders: Dolby Vision and HDR10. In this article I will focus on HDR10, because it is an open standard that quickly became popular, and it is the standard chosen for the Xbox One S and PS4.
We will again begin with the color space used by HDR10, which is defined in Recommendation ITU-R BT.2020 (UHDTV). It specifies the following chromaticity coordinates for the primaries:
Red: x = 0.708, y = 0.292
Green: x = 0.170, y = 0.797
Blue: x = 0.131, y = 0.046
Once again, D65 is used as the white point. Plotted in xy, Rec. 2020 looks like this:
The Rec. 2020 gamut
Obviously, the gamut of this color space is much larger than that of Rec. 709.
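The "much larger" claim is easy to quantify: the areas of the two gamut triangles in xy can be compared directly with the shoelace formula.

```python
def triangle_area(p1, p2, p3):
    """Area of the gamut triangle spanned by three (x, y) chromaticities."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)) / 2.0

rec709  = triangle_area((0.640, 0.330), (0.300, 0.600), (0.150, 0.060))
rec2020 = triangle_area((0.708, 0.292), (0.170, 0.797), (0.131, 0.046))
print(rec2020 / rec709)   # ~1.9: Rec. 2020 spans nearly twice the chromaticity area
```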
Now we turn to the luminance section of the standard, and here, once again, things get more interesting. In his 1999 PhD thesis “Contrast sensitivity of the human eye and its effects on image quality”, Peter Barten presented a slightly frightening equation:
$$S(u) = \frac{M_{\mathrm{opt}}(u)/k}{\sqrt{\dfrac{2}{T}\left(\dfrac{1}{X_0^2}+\dfrac{1}{X_{\max}^2}+\dfrac{u^2}{N_{\max}^2}\right)\left(\dfrac{1}{\eta p E}+\dfrac{\Phi_0}{1-e^{-(u/u_0)^2}}\right)}}$$
(Many of the variables in this equation are themselves complex equations; luminance, for example, is hidden inside the equations for E and M_opt.)
The equation determines how sensitive the eye is to changes in contrast at different luminances, with the various parameters describing the viewing conditions and some properties of the observer.
The “Just Noticeable Difference” (JND) is the inverse of Barten's equation, so for an EOTF quantization to be free of banding under any viewing conditions, the following must hold:
$$\frac{L_{k+1} - L_k}{L_k} < \mathrm{JND}(L_k) \quad \text{for all } k$$
The Society of Motion Picture and Television Engineers (SMPTE) decided that Barten's equation would be a good basis for a new EOTF. The result is what we now call SMPTE ST 2084, or the Perceptual Quantizer (PQ).
PQ was created by choosing conservative values for the parameters of Barten's equation, i.e. values matching typical consumer viewing conditions. PQ was then defined as the quantization that, for a given luminance range and number of samples, most closely follows Barten's equation with the chosen parameters.
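The closed-form curve that came out of this fitting process is published in ST 2084. Here is a sketch of the resulting EOTF and its inverse, using the constants from the standard:

```python
import numpy as np

# Constants from SMPTE ST 2084.
M1 = 2610.0 / 16384.0
M2 = 2523.0 / 4096.0 * 128.0
C1 = 3424.0 / 4096.0
C2 = 2413.0 / 4096.0 * 32.0
C3 = 2392.0 / 4096.0 * 32.0
L_PEAK = 10000.0   # nits

def pq_eotf(V):
    """Non-linear PQ signal V in [0, 1] -> absolute luminance in nits."""
    p = np.power(np.asarray(V, dtype=float), 1.0 / M2)
    return L_PEAK * np.power(np.maximum(p - C1, 0.0) / (C2 - C3 * p), 1.0 / M1)

def pq_inv_eotf(L):
    """Absolute luminance in nits -> non-linear PQ signal V in [0, 1]."""
    y = np.power(np.asarray(L, dtype=float) / L_PEAK, M1)
    return np.power((C1 + C2 * y) / (1.0 + C3 * y), M2)

print(pq_eotf(1.0))   # 10000.0: the top code maps to the peak luminance
```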
The quantized EOTF values can be found from the following recurrence, by searching for the k < 1 at which the last sample equals the required maximum luminance:
$$L_{i+1} = L_i + k \cdot \mathrm{JND}(L_i)$$
For a maximum luminance of 10,000 nits and 12-bit quantization (which Dolby Vision uses), the result looks like this:
The PQ EOTF
As you can see, the quantization step stays below one JND across the entire luminance range.
HDR10 also uses the PQ EOTF, but with 10-bit quantization. That is not enough to stay below one JND over the whole range of 0 to 10,000 nits. The 10-bit PQ quantization looks like this:
The HDR10 EOTF
As you can see, the quantization rises above one JND in two regions:
- in the darkest part of the range,
- in the brightest part of the range, where the steps keep growing with luminance.
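The effect of dropping from 12 to 10 bits is easy to see numerically. Reusing pq_eotf from the sketch above, the snippet below compares the relative luminance step between adjacent code values near a given luminance (it is this step that must stay below one JND):

```python
import numpy as np

def pq_relative_step(bits, near_nits):
    """Relative step (L[i+1] - L[i]) / L[i] of a PQ signal near a luminance."""
    L = pq_eotf(np.linspace(0.0, 1.0, 2 ** bits))   # pq_eotf: see sketch above
    i = min(int(np.argmin(np.abs(L - near_nits))), len(L) - 2)
    return (L[i + 1] - L[i]) / L[i]

# 10-bit steps are roughly four times coarser than 12-bit steps at the same
# luminance, which is what pushes parts of the curve past the threshold.
print(pq_relative_step(10, 100.0), pq_relative_step(12, 100.0))
```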
Displays with HDR10 support available today typically provide a peak luminance of 1,000-1,500 nits, which is still far from 10,000 nits. A signal that requests luminances the display cannot reproduce has to be adapted by the display itself. How exactly that is done is not standardized and differs between manufacturers, so the final result is hard to predict.
For comparison, here is the 8-bit quantization of Rec. 709 with a maximum luminance of 100 nits:
The Rec. 709 EOTF (codes 16-235)
As you can see, the quantization sits above one JND over much of the range, and since modern TVs are much brighter than 100 nits (typically 250-400 nits), Rec. 709 content is all the more prone to visible banding.
In conclusion
Rec. 709 and the HDR standards differ in far more than peak brightness: the gamut, the transfer functions, and the quantization are all different. Even if you are not adding HDR support right now, understanding where these standards come from helps in everyday rendering work. And once you do target HDR, this foundation becomes essential.
If you have the chance to see HDR content on a good HDR display, take it. HDR displays surpass Rec. 709 devices in both brightness and color, and no screenshot or diagram in an article like this can convey the difference. HDR is not just a marketing label, and for games HDR opens genuinely new possibilities: brighter highlights, richer colors, and more expressive imagery.