
Building features and comparing images: local features. Lectures from Yandex

Today we are publishing the fifth lecture from the course “Image and Video Analysis” given by Natalia Vasilyeva at the Computer Science Center in St. Petersburg, created on the joint initiative of the Yandex Data Analysis School, JetBrains, and the CS Club. The program comprises nine lectures in total, four of which have already been published:

  1. Introduction to the course "Image and video analysis".
  2. Basics of spatial and frequency image processing.
  3. Morphological image processing.
  4. Building features and comparing images: global features.



Under the cut you will find the lecture plan, slides, and a detailed transcript.


Image features:

How to match fragments:

Comparison of images using local features, the main steps:

Harris Detector:

Local features:

Conclusion
Full text transcript of the lecture
Last time, if you remember, we talked about how to compare images: how they can be represented as feature vectors, i.e., as sets of numbers, and how those vectors can be compared with each other. We said that images can be described using so-called global features, where we describe the entire picture, for example by computing a histogram over the whole image, and those were the features we considered. I also mentioned that, while global features are quite popular, for a number of tasks it is necessary to compute so-called local features, which describe not the whole picture but only some neighborhood within it. So when do global features fail? Here are some examples. First, when the scale changes significantly: these two fragments obviously match, but if we try to compute global features, say a texture descriptor over the whole of each picture, they will come out different. The same goes for histograms, shape, anything you like. Yet we clearly need to be able to match this fragment with that one, i.e., to compute a feature that describes only a fragment of the image. Second, a change of the viewpoint from which we look at the object: the scale here is roughly the same, but what we see through the window again produces a significant change in any global feature built from this picture compared with that one. The two images will not map onto each other if the matching threshold is at all strict.

Changes in lighting. Strictly speaking, this is less a matter of global versus local features than of invariance to lighting changes, but in practice almost all local features have this invariance; in any case, the ones we will discuss today are invariant to this kind of lighting change. And occlusion. If part of the scene is obscured by some interference, then by considering a large number of small fragments and comparing them with small fragments from the other image, we can still establish that this fragment corresponds to that one. But if we compute features over this whole square, or over the entire image, they will turn out different.

In fact, in this pair, when we try to match these two objects, the object itself is not occluded; in the other pair, the occlusion falls precisely on the object we are trying to find.

What can we propose to overcome all these troubles? Try to compute local features for some image fragments. The idea is that even though a numerical characteristic built over each whole picture will differ between these two images, we can find fragments in the first and second pictures that are comparable with each other: we can say that this fragment clearly corresponds to this one, and that fragment to that one. Two questions then arise: how do we choose these fragments, and how do we describe them? The intuitive, head-on approach to choosing and comparing fragments is to scan the pair of images exhaustively. We fix a window of some size in one picture and slide it over the second: no match, no match, again no match, somewhat similar... hooray, a match. On the one hand this approach is appealing because we enumerate all possible options and are guaranteed to miss nothing. On the other hand, it is computationally very expensive: it is unclear how large the window should be, and we must enumerate all possible offsets of the window in both pictures. A related problem is that we end up with far too many matching pairs: if this window was initially matched with this fragment, then shifting the window by one pixel to the right or left will barely change the feature vector, so we get a mass of overlapping matches and then have to sort out which of them is the best.
This, in general, is something we would rather avoid. Another option, which in principle is more pleasant and acceptable, is not to enumerate all possibilities; the question is then which options to enumerate. It would be good if we could find certain distinguished, special points in the image that are the most informative, i.e., that carry the most information about what the image depicts.
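The brute-force scan just described can be sketched directly. A minimal illustration, assuming grayscale images stored as NumPy arrays and sum of squared differences as the dissimilarity measure (function names and the toy data are invented for the example):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two equal-sized patches."""
    d = a.astype(float) - b.astype(float)
    return float(np.sum(d * d))

def best_match(patch, image, step=1):
    """Exhaustively slide `patch` over `image` and return the offset
    with the smallest SSD. Cost is O(positions * patch size) -- exactly
    the computational burden the lecture warns about."""
    ph, pw = patch.shape
    h, w = image.shape
    best_pos, best_score = None, np.inf
    for y in range(0, h - ph + 1, step):
        for x in range(0, w - pw + 1, step):
            score = ssd(patch, image[y:y+ph, x:x+pw])
            if score < best_score:
                best_pos, best_score = (y, x), score
    return best_pos, best_score

# Tiny demo: plant a patch at (5, 7) and recover its position.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))
tpl = img[5:13, 7:15].copy()
pos, score = best_match(tpl, img)
print(pos, score)  # exact copy -> SSD of 0 at (5, 7)
```

Note also the second problem from the text: every offset within a pixel or two of (5, 7) will have a small (though nonzero) score, so near-duplicate matches pile up around the true one.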

Here, by "features" we mean simply a description, some characteristics: around such a point we can then build a feature vector, or compute a single "feature" that describes the fragment built around it. In the Russian literature they are usually called either "point features" or "special points", or translated from English however one pleases: "key points", "points of interest", "points of attention"; all sorts of variants occur. The idea of looking for such points also matches our psychology. Surely many of you have heard of eye-tracking studies: to understand which area of a web page is most interesting to you, a webcam can be set up to track the movement of your pupil, and the page coordinates on which your gaze lingers longest are presumed to be the most interesting to you, i.e., what your vision focuses on when trying to understand the content of the page. If you look at such a map of gaze fixations and movements, it is clear that the pupil moves far from evenly across the screen: we focus on particular special points, on places that stand out and attract our attention.

It would be good to find the same kind of points in images. The benefits are clear: significantly fewer comparisons, and most likely we can ensure that the points are reasonably isolated from one another, so we avoid the problem of intersecting fragments from the previous approach. At the same time, of course, we have no guarantee of finding absolutely all comparable pairs. The main question remains: how do we look for these key points? Let us think about what requirements we can place on these special points and fragments. On the one hand, we want there to be not too many of them, substantially fewer than there are pixels in the image, so that the search is not exhaustive and the fragments are well separated and do not intersect in large numbers. On the other hand, we want each fragment to be representative enough that it is easy to match a fragment in one image with its counterpart in the other. If, for example, we have a completely homogeneous background, we could choose several fragments that are indistinguishable from one another, and given the very same fragment in the second picture it would be unclear which of them to match it with. So the points must be distinctive in some way. They must also be repeatable, in the sense that if we have another image of the same object, or the same picture somehow modified, cropped, or rescaled, we want the same fragments and the same point features to appear in both pictures. If we take a person's face, the eyes can serve as special points: having found the eyes in one image, we want to find the eyes in the other.
It would be no use if in one image we extracted features of the eyes and in the second features of the mouth, because again we could not compare them. Finally, the fragments built around the singular points should be small, to be robust to partial occlusion. Coming back to the earlier example: if we had taken a large fragment here, the feature vector built from it would again differ significantly from the one built there. The assumption is that, generally speaking, within a small neighborhood of a particular point the changes are not significant enough to make the feature vectors differ.

How, then, do we search for these special points, and what will the comparison of two images look like using local features? First, we need to find the singular points in some way and build a special fragment as a neighborhood around each of them. It would be good if the singular points were invariant to a change in lighting and to the various geometric and photometric transformations of the picture: if we stretch or compress the image, if the object moves somewhere in the frame, if we change the brightness level, the point should still stay in the same place; and the fragment should also be invariant to rotation and to a change of scale. Next, we build a feature vector for each such fragment and match the sets of local features between the pictures. For example, here we have the same cat, rotated. We found the point features, determined a neighborhood for each one, built a feature vector for each neighborhood, and now we want to compare them and then judge how similar the two pictures are by how many of these feature vectors match. It also makes sense to look at the relative positions of the matched fragments. Say we have established that this point corresponds to this one and that point to that one, plus a feature here and a feature there: we compared, selected all these point features, built fragments and feature vectors around them, and realized that this fragment corresponds to this one, that one to that one, and the knee to the knee. Then we can additionally use information about the relative locations of these fragments in the two images.
That is, how they are located relative to each other within the same image is assumed not to change much.
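The matching step in this pipeline can be sketched independently of the detector. A minimal illustration, assuming descriptors are plain vectors and using mutual nearest neighbours as one possible matching strategy (the function name and the toy data are invented for the example):

```python
import numpy as np

def match_descriptors(desc1, desc2):
    """Mutual nearest-neighbour matching between two descriptor sets.
    desc1: (n1, d) array, desc2: (n2, d) array.
    Returns a list of (i, j) index pairs."""
    # Pairwise squared Euclidean distances, shape (n1, n2).
    d = ((desc1[:, None, :] - desc2[None, :, :]) ** 2).sum(axis=2)
    nn12 = d.argmin(axis=1)  # best partner in set 2 for each of set 1
    nn21 = d.argmin(axis=0)  # best partner in set 1 for each of set 2
    # Keep a pair only if the choice is mutual -- this discards
    # ambiguous matches, e.g. on a homogeneous background.
    return [(i, j) for i, j in enumerate(nn12) if nn21[j] == i]

# Toy example: the second set is a shuffled, slightly noisy copy.
a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.02, 0.98], [0.01, -0.01], [1.01, 0.0]])
print(match_descriptors(a, b))  # -> [(0, 1), (1, 2), (2, 0)]
```

A check on the relative geometry of the matched pairs (as the lecture suggests) would come after this step, filtering out pairs inconsistent with the others.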

All of this should, again, be robust to changes; even allowing for a changed point of view, there will still be some constraints on the relative locations of these points.
Once more about the required properties of these singular points. We want them to be repeatable, because if, for example, in these two images the special points on this side were highlighted in white and on this side in red, then we have no matching points and, accordingly, no way to match the images.
What matters is that, on the one hand, these sets of singular points should intersect, i.e., at least some of them should coincide, and, on the other hand, we should be able to find them independently: we look for the singular points in one picture independently of the second, and expect that if the two images are modified copies of one another, the sets of singular points will intersect.

These special fragments should also be informative, which is exactly what I said before: if a fragment is uninformative, then the special point around which it was built is also uninformative. In this picture we could build a large number of identical fragments, all of which would be comparable to this one, and it would be unclear which of them suits us. And, as I have already said, invariance to all the geometric and photometric transformations is highly desirable. A geometric transformation is a rotation, a change of scale, both together, or an affine transformation; photometric transformations are affine transformations of intensity, i.e., changes in the brightness of the picture.

Today I will again speak about grayscale images, simply because it is easier.
It turns out that a quite suitable candidate for a special point is a corner point: a point where the gradient has several directions, where the brightness changes. This immediately discards the uninformative fragments built in completely uniform, monotonous areas; at a corner we obviously have some kind of transition, and, generally speaking, the human eye, too, clings to such brightness differences when looking at an image. Moreover, it is desirable that the difference occur in several directions, i.e., we need a corner. Corners can be detected using a small window, which also satisfies our requirement of a small neighborhood. How can we find one? One sign of a corner is the following: if we take a small window and try to shift it in different directions, then over a uniform area the shift produces no change in intensity, i.e., the shifted fragment looks like the original. Over a unidirectional edge there is no change along at least one direction, the direction of the edge: if we move the window up and down along the edge, the fragments look like twin brothers. But if the window sits over a corner, moving it in any direction changes the brightness pattern beneath it. The very first detector of corner points was based on this property: the Moravec algorithm, which simply computed the intensity differences for shifts of the window and took the smallest sum of squared intensity differences as the strength of the corner response. That is, we take the window and compute the sum of squared intensity differences with its shifted neighbors.
If they coincide, i.e., if this value is minimal over some neighborhood, there is no corner; if changes occur in many directions, there most likely is one.
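The Moravec response just described can be sketched as follows; a minimal version assuming grayscale NumPy arrays and four shift directions:

```python
import numpy as np

def moravec_response(img, y, x, win=3):
    """Moravec corner response at (y, x): the MINIMUM over four shift
    directions of the sum of squared intensity differences between the
    window and its shifted copy. Small in flat areas AND along edges
    (shifting along the edge changes nothing); large only at corners."""
    img = img.astype(float)
    r = win // 2
    w = img[y-r:y+r+1, x-r:x+r+1]
    scores = []
    for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        s = img[y-r+dy:y+r+1+dy, x-r+dx:x+r+1+dx]
        scores.append(np.sum((w - s) ** 2))
    return min(scores)

# A synthetic corner: a bright quadrant in a dark image.
img = np.zeros((16, 16))
img[8:, 8:] = 255.0

flat = moravec_response(img, 3, 3)    # uniform area
edge = moravec_response(img, 12, 8)   # on the vertical edge
corner = moravec_response(img, 8, 8)  # at the corner
print(flat, edge, corner)  # flat and edge give 0, the corner a large value
```

Note that taking the minimum over directions is what suppresses edges: along the edge at least one shift leaves the window unchanged, so the response there is zero just as in the flat area.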

Probably the most widely used corner detector is the so-called Harris detector; it looks at the directions of the gradients at each point, and we will now discuss it in more detail. Consider a picture like this, a wonderful pyramid, and pull out three fragments. One fragment is completely homogeneous, the second contains a single unidirectional contour, and the third contains two clearly defined contours, i.e., two distinct gradient directions. Construct the following visualization: for each point inside the window, compute the gradient along x and along y, and plot these pairs on a coordinate grid, with the horizontal axis the gradient along x and the vertical axis the gradient along y. For the homogeneous fragment, for obvious reasons the gradients of almost all points cluster at zero, because there is no change in brightness: the derivative is zero everywhere. Of course, it is not exactly zero, since in a real image not all pixels are identical; there is some slight brightness variation, so the points scatter a little around zero. If we do the same for the second fragment, we see a much wider spread, and one direction can be traced on the diagram, corresponding to an edge at 45°: both the derivative with respect to x and the derivative with respect to y take nonzero values.
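The structure-tensor idea behind the Harris detector can be sketched as follows; a bare-bones version using the standard response R = det(M) − k·tr(M)², with simple finite-difference gradients and a 3×3 box filter standing in for the usual Gaussian weighting:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2, where M is
    the second-moment (structure) matrix of the image gradients averaged
    over a small neighbourhood. Large positive R -> corner, negative R
    -> edge, near-zero R -> flat region."""
    img = img.astype(float)
    iy, ix = np.gradient(img)              # derivatives along y and x
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def box(a):
        # Average over a 3x3 window (box filter instead of Gaussian).
        p = np.pad(a, 1, mode='edge')
        return sum(p[dy:dy+a.shape[0], dx:dx+a.shape[1]]
                   for dy in range(3) for dx in range(3)) / 9.0

    sxx, syy, sxy = box(ixx), box(iyy), box(ixy)
    det = sxx * syy - sxy * sxy
    tr = sxx + syy
    return det - k * tr * tr

img = np.zeros((16, 16))
img[8:, 8:] = 1.0                      # bright quadrant -> one corner
R = harris_response(img)
print(R[3, 3], R[12, 8], R[8, 8])      # flat ~ 0, edge < 0, corner > 0
```

The sign pattern mirrors the gradient-cloud picture from the lecture: a flat patch has both eigenvalues of M near zero, an edge has one large eigenvalue, and only a corner, with two strong gradient directions, has both large.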
[The remainder of the transcript, which continued with the derivation of the Harris corner measure for a window shift [u, v], scale-invariant detection, the SIFT descriptor (2004), and the concluding Q&A, survives in the source only as punctuation fragments and is not recoverable.]

Source: https://habr.com/ru/post/255985/

