In the general case, the problem of pattern recognition has not yet been solved. So let's first talk about the methods that do, in some cases, let you find individual objects in an image, and then speculate about the future.
The human eye has several blocks that recognize image properties. A person quickly picks out sets of objects from the surrounding scene and classifies them. What criteria does the eye follow? There are not that many:
- Objects of artificial origin quickly catch the eye. They are marked by an unnatural geometric regularity: straight or smooth lines, surfaces with a smooth color gradient or with a more complex texture whose information content is still much lower than that of natural objects.
- With a more in-depth analysis of the scene, regions with uniform, repeating properties are identified, along with deviations from the norm inside them: a clearing in a uniformly mixed forest, darker green foliage against a light green background, small leaves against a large leaf, fast-swaying branches against slowly oscillating trunks, etc.
- This analysis is carried out at several scales at once (large, medium, small), and at each scale the most characteristic details are noticed.
Basic pattern recognition algorithms, in general, copy the same blocks:
- contour selection
- search for specified colors, textures
- linear filters that respond to some specific elements of the image
- highlighting areas with high / low information saturation, detail clarity
Etc.
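As an illustration of the first block, contour selection, here is a minimal sketch in plain Python. It assumes a grayscale image stored as a list of rows and uses a simple central-difference gradient; the function name edge_map and the threshold value are chosen for this example only, and real systems typically use more robust operators.

```python
def edge_map(image, threshold):
    """Mark pixels whose brightness gradient magnitude exceeds the threshold.

    image: list of rows of grayscale values; returns a same-size grid of 0/1.
    Border pixels are left at 0 because they lack a full neighbourhood.
    """
    h, w = len(image), len(image[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal central difference
            gy = image[y + 1][x] - image[y - 1][x]  # vertical central difference
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                edges[y][x] = 1
    return edges


# A dark left half and a bright right half: the contour is the vertical boundary.
image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edges = edge_map(image, threshold=100)
```

In this toy image only the two columns on either side of the brightness jump are marked, which is the "contour" of the bright region.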
For example, the classical problem of recognizing human skin in an image in the first approximation is solved as follows:
A) Colors that could potentially be human skin are selected.
B) The texture is checked.
C) The connectivity and sufficient size of the selected area are checked.
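The three steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the RGB thresholds in is_skin_color are a common rule of thumb rather than anything taken from this article, the texture check is a crude "brightness varies little in a 3x3 window" test, and all function names and parameters are hypothetical.

```python
from collections import deque


def is_skin_color(r, g, b):
    # Step A. Assumed rule of thumb: reddish, reasonably bright pixels.
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b and r - min(g, b) > 15)


def is_smooth(gray, y, x, max_range=30):
    # Step B. Crude texture check: brightness in the 3x3 window varies little.
    h, w = len(gray), len(gray[0])
    vals = [gray[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]
    return max(vals) - min(vals) <= max_range


def skin_regions(rgb, min_area=4):
    h, w = len(rgb), len(rgb[0])
    gray = [[sum(px) // 3 for px in row] for row in rgb]
    mask = [[is_skin_color(*rgb[y][x]) and is_smooth(gray, y, x)
             for x in range(w)] for y in range(h)]
    # Step C. Keep only 4-connected components that are large enough.
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, component = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_area:
                regions.append(component)
    return regions


S, B = (200, 140, 120), (150, 150, 160)  # skin-like tone, bluish-grey background
rgb = [[B] * 5 for _ in range(5)]
for y, x in [(1, 1), (1, 2), (2, 1), (2, 2), (4, 4)]:
    rgb[y][x] = S
regions = skin_regions(rgb)  # the lone pixel at (4, 4) is too small to count
```

The 2x2 patch survives all three checks, while the isolated skin-colored pixel is discarded at the size check. Note that this texture test also rejects skin pixels next to a much brighter or darker background, so a practical version would need something less blunt.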
Face recognition usually adds the response to a linear filter whose pattern is a schematic face.
Incidentally, experiments suggest that a newborn baby picks out a face in the same way: two dark round objects with an elongated dark object beneath them.
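A linear filter of this kind can be sketched as a small kernel that is slid over the image. The pattern below (two dark "eyes" above a wide dark "mouth"), the integer weights, and the function names are assumptions made for this example, not the actual filter used in any particular face detector.

```python
# Hypothetical schematic-face pattern: '#' marks cells expected to be dark.
PATTERN = [
    ".......",
    ".#...#.",
    ".......",
    ".#####.",
    ".......",
]
# 7 dark cells get weight -4 and 28 light cells get +1, so the weights sum to zero
# and a uniformly lit area produces no response at all.
KERNEL = [[-4 if c == "#" else 1 for c in row] for row in PATTERN]


def filter_response(image, kernel, top, left):
    """Weighted sum of the image pixels under the kernel placed at (top, left)."""
    return sum(kernel[j][i] * image[top + j][left + i]
               for j in range(len(kernel)) for i in range(len(kernel[0])))


def best_match(image, kernel):
    """Position (top, left) where the filter responds most strongly."""
    kh, kw = len(kernel), len(kernel[0])
    positions = [(y, x)
                 for y in range(len(image) - kh + 1)
                 for x in range(len(image[0]) - kw + 1)]
    return max(positions, key=lambda p: filter_response(image, kernel, *p))


# Bright 8x10 image with a dark schematic face drawn at row 2, column 1.
image = [[200] * 10 for _ in range(8)]
for j, row in enumerate(PATTERN):
    for i, c in enumerate(row):
        if c == "#":
            image[2 + j][1 + i] = 30
position = best_match(image, KERNEL)
```

The response peaks where the dark cells of the kernel line up with the dark pixels of the image; a real detector would combine such a response with the color and texture checks described earlier.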
In summary, existing image recognition methods usually look for a few characteristic details of the object that can be described algorithmically and used as “hooks” when searching for objects of that kind.
Now let's talk about the future. An important ability of the human brain is establishing associative links between images, i.e. it can determine that they LOOK ALIKE. When comparing two images, a person pays attention to the elements we talked about earlier: spots of a certain color, the geometry of shapes and lines, information saturation, textures, and the response to simple filters. Since childhood, each of us has carried a large database of tagged images in our head. When a new frame appears, we very quickly find a suitable image in that database and “recognize” it.
However, it would be wrong to assume that the search involves a detailed comparison of images by elementary features: given the size of the database stored in the brain, even its enormous computational resources would not be enough for that. Therefore, we keep in our head a short, symbolic, semantic description of every object. “A stick, a stick, a cucumber, and a little man came out”, “a table is a large rectangular board with four legs,” etc. A playing card that shows a black heart is recognized by most people as a spade.
That is, the process of human image recognition looks like this:
A) The scene is described in some symbolic form (it may be just a few sentences; that is, a 100-megapixel image is converted into a phrase of, say, 1000 bytes).
B) The symbolic description is compared with the existing symbolic descriptions of images in the database.
C) The most appropriate match "flashes" in our minds.
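Steps B and C can be sketched as a lookup over short symbolic descriptions. Everything here is an assumption made for illustration: descriptions are reduced to sets of tags, similarity is measured with the Jaccard index, and the tiny database is invented. Step A, producing the description from pixels, is the unsolved part and is not attempted.

```python
def jaccard(a, b):
    """Similarity of two tag sets: shared tags divided by all distinct tags."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical database of symbolic descriptions.
DATABASE = {
    "table": {"large", "rectangular", "board", "four legs"},
    "little man": {"stick", "cucumber"},
    "house": {"square", "triangular roof", "door"},
}


def recognize(description, database=DATABASE):
    """Return the (label, score) of the stored description closest to the new one."""
    return max(((label, jaccard(description, tags))
                for label, tags in database.items()),
               key=lambda pair: pair[1])


# Step A is assumed to have already produced this description of a new scene.
scene = {"rectangular", "board", "four legs", "wooden"}
label, score = recognize(scene)
```

Because only a handful of tags are compared instead of millions of pixels, the search stays cheap even for a large database, which is the point the steps above make.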
This ability to describe a scene as a set of semantic constructions is exactly what pattern recognition algorithms still lack, although very interesting work is now under way in this direction. In a number of special cases it is possible to obtain quite good results, but I have not yet seen a clear, complete solution.
Finally, it is probably worth mentioning the once sensational method of neural networks. Academia has great faith in it, but it is rarely used in practical projects. As it turned out, neural networks are quite difficult to tune and, moreover, they model the processes of the human brain poorly, which had been the great hope. Today, more and more tasks previously solved with neural networks (such as forecasting, handwriting recognition, face recognition, etc.) yield to effective solutions using other, more classical mathematical methods.