This article is intended, first of all, for people who have not worked with color before. It describes the nuances, interesting details and pitfalls I ran into when I first started working with color recognition (tasks like comparing the colors of two objects, a robot finding an object requested by a person, and so on).

Consider the original system that came up with the concept of color: a human. A computer that recognizes colors has to imitate the human perception system.
So here is the main question: how does a person see colors?
For now, let us set aside how the material of the objects in a person's field of view interacts with light and consider only the waves that have already entered the eye. We cannot measure the frequency of light directly; we can only compare the radiation intensity against expected values at three base frequencies (RGB). These are the three basic colors, and the eye has a detector for each of them.
Then the main thing happens: the values from all the cones and rods have to be converted into a color. Moreover, a person compares them not in absolute terms (the way someone with perfect pitch compares a note against its frequency), but in relative ones, accounting for the shadows, glare and so on that fall on the object. After that, the brain lumps groups of tones and (this is important!) shades together under a single tag, for example: "this is yellow".
→ Additionally, you can read about the relativity of color perception in principle, and about how, exactly, and in what order colors got their names.

Now about the computer.
A signal hits the camera, the sensor registers the same three primary colors, and then... And then the data is compressed, sometimes basic lens-aberration correction is applied, and the stream is sent to the computer. The computer can decode the stream with one of many codecs and get the video. But this is not the video we are looking for.
Let us leave resolution and geometric distortion aside for now and focus on color. If we show the camera a monochromatic, evenly lit rectangle, the pixels that make up its image in the frame will not be identical. We can even go a little further and insert a block of pixels directly at one of the stages of this preprocessing: the same test rectangle, except that now we do not have to worry about lighting or surface texture.
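As an illustration of this test, here is a minimal sketch; the ROI position, the reference color and the camera index are arbitrary assumptions, not values from the article:

```cpp
// A test patch drawn into the frame before the rest of the pipeline runs;
// afterwards we measure how far its pixels drifted from the reference color.
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::VideoCapture cap(0);                    // any camera or video file
    cv::Mat frame;
    cap >> frame;
    if (frame.empty()) return 1;

    const cv::Rect roi(10, 10, 40, 40);         // where the patch is drawn
    const cv::Scalar ref(30, 60, 250);          // reference color, BGR

    cv::rectangle(frame, roi, ref, cv::FILLED); // uniform test rectangle

    // ... encoding / preprocessing would happen here ...

    cv::Scalar mean = cv::mean(frame(roi));     // average color after processing
    std::cout << "patch drift (BGR): " << mean - ref << std::endl;
    return 0;
}
```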
Below you can see the result of such an experiment. A tiny rectangle is drawn into the frame, then the scary algorithms run over the image, and at the output we get almost the same rectangle. For clarity, I highlighted the changes by removing the green channel of the image and raising the contrast.
Now this picture (garbage and all) has to be fed into the program that will process it. I used the OpenCV library for this, and it has its own quirks related to how the picture is stored and passed around. There are several reading modes. The essence is simple: some three-channel pictures are transmitted by devices as four-channel ones, even if there is no meaningful fourth channel. The library's read function counts bytes expecting the structure {26,54,250} {40,47,245} {30,26,255} ...
While the stream file has the format {26,54,250,111} {40,47,245,110} {30,26,255,112} ...
Reading three bytes at a time will then give the following result:
{26,54,250} {111,40,47} {245,110,30} {26,255,112} ...
so every pixel after the first is shifted by one byte.

This error is easy to recognize when you display the image: an object of a single color turns into alternating red, green, blue and gray pixels, four of each, cycling. If you make the opposite mistake, any processing code will most likely fail, since there will be a memory access outside the allocated block. In other words, with results like these it makes sense to change the reading mode.
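A hedged way to avoid this pitfall in OpenCV is to read the file "as is" and convert explicitly; this is a sketch, not the exact reading mode used in the article:

```cpp
// If the source is actually four-channel (BGRA) but is read as three bytes per
// pixel, the channels drift by one byte every pixel. Reading with
// IMREAD_UNCHANGED keeps the fourth channel, and the conversion drops it.
#include <opencv2/opencv.hpp>
#include <iostream>

cv::Mat loadAsBgr(const std::string& path) {
    cv::Mat img = cv::imread(path, cv::IMREAD_UNCHANGED);
    if (img.empty()) return img;
    if (img.channels() == 4)
        cv::cvtColor(img, img, cv::COLOR_BGRA2BGR);   // drop the alpha channel
    return img;
}

int main() {
    cv::Mat img = loadAsBgr("frame.png");             // example path
    if (img.empty()) return 1;
    std::cout << "channels: " << img.channels() << std::endl;
    return 0;
}
```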
Finally we got the right file. But how to pick out the color in it is still unclear. If the object were really represented by pixels of a single color, there would be no problem: just tally the pixels and pick whichever value occurs most often. But there are two problems. The first is lighting. The second is the processing done by the camera's own board before the image reaches us.

You could introduce a difference threshold below which similar pixels are considered the same, which is equivalent to lowering the color resolution down to our own. Unfortunately, there are two reasons not to do this. First, pixels of the same perceived color can differ a lot in their RGB values. Second, the volume that each color (as people have named them) occupies in RGB space is different. Fine, then we could pick a box for each color, specify six bounding coordinates, and have a network determine them precisely... But again no: the color regions in RGB have complex shapes, they are not parallelepipeds. We would have to add parameters, and as we reduce the step (it does not have to go down to 1/255, but it would still have to shrink), the number of network inputs grows, and unfortunately the garbage data grows faster than the useful data. We need preprocessing. Looking at the pixels of an object, you can notice a great thing: the spread of the hue parameter in HLS space is smaller than the spread of any parameter in RGB.

This is what it looks like in HLS. I converted all the pixels to HLS (just in case, I wrote the conversion function by hand: there are different conversion conventions, so I did not use the built-in library function but imitated the one in my graphics editor), and then set basic boundary values separating the colors. You can take them, for example, by looking carefully at this diagram:
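For reference, here is a minimal sketch of a hand-written BGR→HLS conversion using the standard formulas; the article's own function may differ in details, and OpenCV also offers cv::cvtColor with COLOR_BGR2HLS, which the author deliberately avoided:

```cpp
// Manual BGR -> HLS conversion (standard formulas, an assumption about the
// author's version). H is returned in degrees [0, 360), L and S in [0, 1].
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>

struct Hls { float h, l, s; };

Hls bgrToHls(const cv::Vec3b& px) {
    float b = px[0] / 255.0f, g = px[1] / 255.0f, r = px[2] / 255.0f;
    float mx = std::max({r, g, b}), mn = std::min({r, g, b});
    float d = mx - mn;
    Hls out{0.0f, (mx + mn) / 2.0f, 0.0f};
    if (d < 1e-6f) return out;                          // achromatic pixel
    out.s = (out.l > 0.5f) ? d / (2.0f - mx - mn) : d / (mx + mn);
    if (mx == r)      out.h = 60.0f * ((g - b) / d);
    else if (mx == g) out.h = 60.0f * ((b - r) / d + 2.0f);
    else              out.h = 60.0f * ((r - g) / d + 4.0f);
    if (out.h < 0.0f) out.h += 360.0f;
    return out;
}
```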

Now for the hard part. We still have not gotten rid of the other parameters. White, black, gray and brown are quite basic colors in human perception, but they have a rather weak relation to hue: you cannot identify them without lightness and saturation. So we add the same kind of basic thresholds, set by eye, and see how the system behaves.
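A sketch of such a tagging function; the boundary values are placeholders to tune on your own data, not the thresholds from the article:

```cpp
// Map an HLS pixel to one of the basic color names. Lightness and saturation
// gates handle white/black/gray/brown; hue boundaries handle the rest.
// Hls is the struct from the conversion sketch above.
#include <string>

std::string classify(const Hls& c) {
    if (c.l > 0.92f) return "white";
    if (c.l < 0.08f) return "black";
    if (c.s < 0.10f) return "gray";
    if (c.h >= 20.0f && c.h < 45.0f && c.l < 0.35f) return "brown"; // dark orange
    if (c.h < 15.0f || c.h >= 345.0f) return "red";
    if (c.h < 45.0f)  return "orange";
    if (c.h < 70.0f)  return "yellow";
    if (c.h < 160.0f) return "green";
    if (c.h < 200.0f) return "cyan";
    if (c.h < 260.0f) return "blue";
    if (c.h < 300.0f) return "purple";
    return "pink";
}
```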
I defined and labeled 15 primary colors, then began testing. I used Google's standard image-search tools to filter results by color, and to start with I did not take photos with an object against a background, but rather more uniform pictures (for example, grass or yellow leaves). Looking through the dataset yourself and filtering out unsuitable images is a mandatory part of the training. There was no trained network as such; I tuned the values manually.

Then, having received hundreds of correctly identified colors in a row, I moved on to objects against a background (already taken from the working environment, the one the algorithm will be applied to). There is a trick here: when pixels vote for a certain color, each separate cluster of pixels needs its own branch, its own candidate. The search still runs over all pixels, so it is easy to create several entries for one color. For example, in the picture below, a line-by-line search will detect that the sets of red pixels are not connected and will create two records of the form "red" - 0. The first will be incremented by the pixels of one red ball, the second will collect the votes of the other.
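A possible sketch of this per-cluster voting, using cv::connectedComponentsWithStats to split each color mask into separate candidates; classify() and bgrToHls() are the sketches above, and the whole thing is my assumption about the implementation, not the article's exact code:

```cpp
// Pixels of the same color name vote, but disconnected clusters of that color
// get separate candidate entries (e.g. two red balls -> two "red" records).
#include <opencv2/opencv.hpp>
#include <functional>
#include <map>
#include <string>

std::multimap<int, std::string, std::greater<int>>
voteByCluster(const cv::Mat& bgr) {
    std::multimap<int, std::string, std::greater<int>> votes;   // count -> name
    std::map<std::string, cv::Mat> masks;                       // one mask per color
    for (int y = 0; y < bgr.rows; ++y)
        for (int x = 0; x < bgr.cols; ++x) {
            std::string name = classify(bgrToHls(bgr.at<cv::Vec3b>(y, x)));
            auto it = masks.find(name);
            if (it == masks.end())
                it = masks.emplace(name, cv::Mat::zeros(bgr.size(), CV_8U)).first;
            it->second.at<uchar>(y, x) = 255;
        }
    // Each connected component of a mask becomes its own candidate.
    for (auto& [name, mask] : masks) {
        cv::Mat labels, stats, centroids;
        int n = cv::connectedComponentsWithStats(mask, labels, stats, centroids);
        for (int i = 1; i < n; ++i)                              // label 0 is background
            votes.emplace(stats.at<int>(i, cv::CC_STAT_AREA), name);
    }
    return votes;   // iterate from the top to get first and second place
}
```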

Just in case, the function returns not only the record with the most votes, but also the runner-up (in case first place was taken by the object's background) and the first place among pure tones, without shades. (I remind you that the volume of gray and black in the color space is much larger than the volume of saturated bright colors, which means it is impossible in principle to tune the filters so that unsaturated colors produce no false positives.) The latter is especially noticeable when searching for an object on a snowy street, when there is a lot of white around. So one of the three results is chosen based on additional context.
In a bit more detail, in my case I do the following (see the sketch after this list):
- calculate the ratio between the results, and if it is large (2.5 times or more), output the first result;
- estimate glare and snow; to do this, perform color correction and repeat the calculation;
- use information about the expected position of the object in the picture to reduce the weight assigned to the background color.
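A tiny sketch of the first rule from the list, with a hypothetical Candidate record holding the vote count:

```cpp
// Accept the leading candidate only if it dominates the runner-up by a factor
// (2.5 in the article); otherwise fall back to the extra context checks.
#include <string>

struct Candidate { std::string name; int votes; };

// Returns the winner, or an empty name if color correction / position checks
// should decide instead.
std::string pickByRatio(const Candidate& first, const Candidate& second,
                        double ratio = 2.5) {
    if (second.votes == 0 || first.votes >= ratio * second.votes)
        return first.name;
    return "";
}
```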
An example of the “wrong” picture:

The difference in illumination still interferes heavily with identifying an object of a single color. In some cases color correction helps, but sometimes it is helpless: gray brightens to white, black to gray or white.
For this, we use information about the object itself (which we singled out by color in one of the previous frames, and we track how a shadow or a highlight falls onto it). We take all the pixels that voted for the identified color and compute their average value. That gives us the base color of the object, from which we build a palette of its possible changes.
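A minimal sketch of extracting that base color, assuming we kept the binary mask of the pixels that voted for the winning color:

```cpp
// Average the voting pixels to get the object's base color.
#include <opencv2/opencv.hpp>

cv::Vec3b baseColor(const cv::Mat& bgr, const cv::Mat& mask) {
    cv::Scalar m = cv::mean(bgr, mask);   // cv::mean ignores pixels where mask is zero
    return cv::Vec3b(cv::saturate_cast<uchar>(m[0]),
                     cv::saturate_cast<uchar>(m[1]),
                     cv::saturate_cast<uchar>(m[2]));
}
```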
A very important point: the monitor reproduces color so as to match human perception; therefore the stored color data, following the playback devices themselves, is optimized for the user's eyes, and the RGB values do not correspond to the real share of those components in the light we perceive (looking at the broad spectra of reflected light, you can convince yourself that all three base colors are present in them).
Therefore, when mixing the base color with others and testing highlights, you need to square the RGB values before all the procedures and take the square root at the end. After that I converted to HLS and checked how much the color had changed. I took the base color and mixed it in turn with black, white, red, blue, green, and the color from the main set that the first algorithm had determined. Then I repeated the procedure several times to get several gradations of each option. For comparisons at frame rates of 1/30 and 1/24 of a second you can take just two iterations; that is more than enough (to be safe you can even delete the farthest values, since they will obviously coincide).
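A sketch of the palette construction under these assumptions; the mixing weight and the number of gradations are placeholders, not the article's values:

```cpp
// Mix the base color with a few reference colors on squared values
// (approximating linear light, since stored RGB is roughly its square root),
// then take the square root again at the end.
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

cv::Vec3b mixLinearish(const cv::Vec3b& a, const cv::Vec3b& b, float t) {
    cv::Vec3b out;
    for (int c = 0; c < 3; ++c) {
        float av = a[c] / 255.0f, bv = b[c] / 255.0f;
        float mixed = (1.0f - t) * av * av + t * bv * bv;   // mix the squares
        out[c] = cv::saturate_cast<uchar>(std::sqrt(mixed) * 255.0f);
    }
    return out;
}

std::vector<cv::Vec3b> buildPalette(const cv::Vec3b& base) {
    // black, white, red, blue, green in BGR; the color from the main set
    // determined by the first algorithm would be appended here as well.
    const std::vector<cv::Vec3b> refs = {
        {0, 0, 0}, {255, 255, 255}, {0, 0, 255}, {255, 0, 0}, {0, 255, 0}};
    std::vector<cv::Vec3b> palette = {base};
    for (const auto& r : refs) {
        cv::Vec3b step = base;
        for (int i = 0; i < 2; ++i) {            // two gradations per reference
            step = mixLinearish(step, r, 0.25f);
            palette.push_back(step);
        }
    }
    return palette;
}
```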
Then something like a hash function is computed: the distances between the colors in the palette are taken (I used the distance |L1 - L2| + |G1 - G2|, since this odd metric performed well in the experiment; in general it is worth trying any combination of RGB and HLS differences) and placed into a list sorted in descending order.
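A sketch of building that sorted list, interpreting L as HLS lightness and G as the green channel; this reading of the metric is my assumption:

```cpp
// Pairwise |dL| + |dG| distances between palette colors, sorted descending.
// bgrToHls() is the conversion sketch above.
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

std::vector<float> paletteSignature(const std::vector<cv::Vec3b>& palette) {
    std::vector<float> dists;
    for (size_t i = 0; i < palette.size(); ++i)
        for (size_t j = i + 1; j < palette.size(); ++j) {
            float dL = bgrToHls(palette[i]).l - bgrToHls(palette[j]).l;
            float dG = (palette[i][1] - palette[j][1]) / 255.0f;
            dists.push_back(std::fabs(dL) + std::fabs(dG));
        }
    std::sort(dists.begin(), dists.end(), std::greater<float>());
    return dists;
}
```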
Examples of automatically generated palettes (for a system of objects of different colors, all palettes were merged into one for the sake of optimization, but each series is still computed only from the colors belonging to a single object):

On a new frame the same procedure is performed, and we look for distances that coincide within a threshold. The length of the longest chain of coinciding differences in the list then indicates whether this is the same color under a sudden highlight or shadow, or a different one. As the three primary colors for mixing the palette you can take not pure RGB but slightly purple instead of blue and yellow instead of green, matching the common colors of artificial light sources.
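And a sketch of the comparison itself, counting the longest run of differences that agree within a threshold; the threshold value is a placeholder to be calibrated, as noted below:

```cpp
// Compare two signatures: the longest run of consecutive entries whose
// difference stays below the threshold suggests the same object under a new
// highlight or shadow.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

std::size_t longestMatchingChain(const std::vector<float>& a,
                                 const std::vector<float>& b,
                                 float threshold = 0.02f) {
    std::size_t best = 0, run = 0;
    std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(a[i] - b[i]) < threshold) best = std::max(best, ++run);
        else run = 0;
    }
    return best;
}
```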
After that, the threshold values need to be calibrated for the specific task.
There are no pictures from the final tests, but at the end I fed that dress into the algorithm and got "white" and "yellow".
So what is the result?
- To determine a color, I recommend three steps: take the initial coefficients from studies of how people perceive colors;
- run a set of images from Google search through the system and do a coarse adjustment;
- run the set of test images of the kind the program will actually work with and do a fine adjustment.
- To reduce the number of errors, study the context of the frame and perform additional color correction.
- To tell illuminated objects apart, build a palette of the expected colors after a highlight/shadow and compare them in some quick way.
Moments that are worth paying attention to:
- the frame from the camera comes with distortions in the RGB values, which are different for each pixel;
- sometimes an extra, fourth channel arrives, which you did not write but which gets into the picture from other programs;
- the values on the monitor are correct, but what is stored is not them; the stored coordinates are distorted, roughly proportional to the square root of the real values.
Now, about where you can apply such a thing and play around with it:
- in Person Recognition systems, where, in addition to the person's face, you can add information about their build, bag and the color of their favorite T-shirt to the database;
- in trackers, as a replacement for coarse initial filters (such as those based on a diffusion value);
- in machinery, for finding different kinds of objects for a manipulator to grasp;
- for segmentation of thermal and vibration images already mapped to a certain range of colors;
- for anything! The computer distinguishes colors the way a person does. For convenience, you can return the computed label along with the average color and some service codes. I used Vec3b as the output format and customized the type slightly to also carry the recognition error and the degree of confidence in the answer.
The picture with the color labels is taken from here.