
Evaluating the variability of search results

Once, on a quiet summer night, in the course of solving urgent analytical problems, a question arose: how should we measure the degree of variation in search results? Searching for an answer, I managed to find only a single study on the topic: Koksharov, 2012.

But it left me unsatisfied and raised even more questions. Using the Oliver and Levenshtein algorithms merely because the corresponding functions exist in PHP seemed unjustified, and the justification of methods based on position differences was unconvincing.

Why this way and not another? Why an array or a string, and not an ordered set or a tuple? Where can the assumptions made lead? And finally, is there a single best, most correct, most "final" way?
As a result, I had to reinvent my own wheel: that is, to sort everything out, at least for myself. But still with the hope that it will interest not only me.

A measure of rating variability


A search for a ready-made mathematical apparatus also yielded nothing. An ordered set? A string? An array? None of these quite fit. The closest is a tuple / vector, but the distance measures used there do not reflect the essence of a rating. Either I am missing something, or too many years have passed since my student days. I hope those who practice math more often will correct me, or at least suggest where to look. For now, let us try to introduce our own definitions, staying within the terms of the subject area.

To refer to all the familiar Top3, Top10, Top100, etc., we introduce the concept of a "rating of length N" as an ordered sequence of length N containing the identifiers of the objects being ranked:

R^N = (r_1, r_2, \ldots, r_N),   (1)

where by an object identifier we understand the link (URL) to the document being ranked.

The simplest and most natural assumption is that the measure of variation should somehow be related to the change in the positions of objects in the ratings: the greater the difference (distance) between the new and old positions of a particular object, and the more objects that have changed their position, the greater the difference between the two ratings should be.
In this formulation, we will call the distance between two ratings the sum of the differences in the positions of all objects included in them. Let us try to express this definition more formally.

Let two ratings P^N and Q^N be given. Their elements may coincide completely, in part, or not at all.
Let C be the set of objects included in either of the two compared ratings (their union). The power of this set (the number of elements in it) varies from N (when the objects in both ratings completely coincide and the ratings differ only by a permutation) to 2N (when the elements of the two ratings are completely different).

The same object may occupy different positions in the two ratings, or the same position, or it may be absent from one of them altogether.
Let p_i denote the position of an object i in the rating P^N, and q_i the position of the same object in the rating Q^N. Then the distance between the positions of the i-th object is the absolute value of their difference:

d_i = |p_i - q_i|.   (2)

Summing the position differences over each element of the set C, we obtain the following expression for the absolute distance between two ratings:

D = \sum_{i \in C} |p_i - q_i|.   (3)

It is easy to calculate this distance when an object is present in both ratings. But what if an object of one rating is absent from the other, that is, lies outside it? In this case it seems quite reasonable to assign the missing object the position N + 1: the nearest position outside the rating.
Of course, in real life a site can drop, say, from the Top10 to far beyond 11th place. The accuracy of the variability estimate can be improved by considering ratings of greater length: 30, 50, 100, 1000. It is likely that for large N this assumption plays a smaller role. For now, the question of the optimal rating length remains open, and we have to be content with the statement that variability estimates obtained under this assumption are estimates of the minimum difference, in the sense that the true distance between the ratings is no less than the estimate obtained.
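As a sketch, the distance (3) together with the N + 1 convention for missing objects can be computed as follows (Python; the function name and the list-of-URLs representation are my own assumptions, not from the original):

```python
def rating_distance(p, q):
    """Absolute distance (3) between two ratings given as lists of object IDs.

    An object absent from one of the ratings is assigned position N + 1,
    the nearest position outside that rating.
    """
    n = len(p)  # both ratings are assumed to have the same length N
    pos_p = {obj: i + 1 for i, obj in enumerate(p)}  # ID -> 1-based position
    pos_q = {obj: i + 1 for i, obj in enumerate(q)}
    union = set(p) | set(q)  # the set C of objects from either rating
    return sum(abs(pos_p.get(o, n + 1) - pos_q.get(o, n + 1)) for o in union)
```

Identical ratings give a distance of 0; completely disjoint ones give the maximum N(N + 1) derived below.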

Estimates of the absolute difference between ratings are difficult to interpret and compare. For convenience, they should be reduced to a relative form. As the normalizing value we need the maximum possible distance between ratings. Clearly, it corresponds to the case when the ratings are completely different in composition: all objects of the rating P^N ended up outside it, and all objects of the rating Q^N appeared from outside it. In other words, each object of the first rating moved from its position i to position N + 1, and each object of the second rating, conversely, moved from position N + 1 to its position i.

Then for the rating P^N the maximum possible sum of distances will be

D_{max}^{P} = \sum_{i=1}^{N} (N + 1 - i) = N + (N - 1) + \ldots + 1 = \frac{N(N + 1)}{2}.

That is, we have obtained the sum of an arithmetic progression with first term N, common difference -1, and last term 1.
Accordingly, for the second rating, when each of its objects moved from position N + 1 to its position i, we get a similar arithmetic progression with first term 1, common difference 1, and last term N, whose sum is given by the same expression.
As a result, the total distance traveled by the objects of the first and second ratings is

D_{max} = 2 \cdot \frac{N(N + 1)}{2} = N(N + 1).   (4)

Thus, for a relative assessment of rating variability we obtain the expression

V = \frac{D}{D_{max}} = \frac{\sum_{i \in C} |p_i - q_i|}{N(N + 1)}.   (5)
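A minimal self-contained Python sketch of expression (5) (the function name is my own; ratings are assumed to be equal-length lists of object IDs, with missing objects placed at position N + 1):

```python
def relative_distance(p, q):
    """Relative rating variability (5): the absolute distance (3), with
    missing objects placed at position N + 1, normalized by N * (N + 1)."""
    n = len(p)
    pos_p = {obj: i + 1 for i, obj in enumerate(p)}
    pos_q = {obj: i + 1 for i, obj in enumerate(q)}
    d = sum(abs(pos_p.get(o, n + 1) - pos_q.get(o, n + 1))
            for o in set(p) | set(q))
    return d / (n * (n + 1))
```

By construction the result lies between 0 (identical ratings) and 1 (completely disjoint ratings).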

Those interested can work through this in more detail with a small example.

Sample for Top5
Let, for example, N = 5, P = (a, b, c, d, e) and Q = (e, a, b, f, c).
Then C = {a, b, c, d, e, f}, and the position differences (2) are:

d_a = |1 - 2| = 1
d_b = |2 - 3| = 1
d_c = |3 - 5| = 2
d_d = |4 - 6| = 2   (d is absent from Q, so q_d = N + 1 = 6)
d_e = |5 - 1| = 4
d_f = |6 - 4| = 2   (f is absent from P, so p_f = N + 1 = 6)

From here the absolute distance between the ratings is D = 1 + 1 + 2 + 2 + 4 + 2 = 12.
The maximum possible distance is D_{max} = N(N + 1) = 5 \cdot 6 = 30.
So we get a relative distance of V = 12 / 30 = 0.4, or 40%.
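The arithmetic can be checked mechanically. A small Python sketch (the concrete ratings P and Q here are illustrative, chosen to be consistent with the 40% figure):

```python
# Two Top5 ratings; objects missing from a rating sit at position N + 1 = 6.
P = ["a", "b", "c", "d", "e"]
Q = ["e", "a", "b", "f", "c"]
N = len(P)

pos_p = {obj: i + 1 for i, obj in enumerate(P)}
pos_q = {obj: i + 1 for i, obj in enumerate(Q)}

# Sum position differences over the union of both ratings, expression (3).
D = sum(abs(pos_p.get(o, N + 1) - pos_q.get(o, N + 1)) for o in set(P) | set(Q))

D_max = N * (N + 1)         # expression (4): 5 * 6 = 30
print(D, D_max, D / D_max)  # 12 30 0.4
```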


A weighted measure of rating variability


An attentive reader may notice that the estimates of rating change obtained from expressions (3) or (5) are weakly sensitive to local changes in general and to transpositions in particular (a transposition is when two elements simply swap places). Whether the first two elements are swapped or the last two, we get the same difference: in a Top5, transposing 1st and 2nd places or 4th and 5th places both give D = 2, i.e. a relative distance of 2/30, or about 6.7%.
Perhaps from the point of view of the search engine and its ranking function such changes really are insignificant. But I, as a practicing marketer, am primarily interested in the consequences for the sites in my care. And these consequences, even for local changes, can be very significant. This is primarily because the click-through rate of search results depends strongly on the position occupied in the rating (in the SERP), and consequently the organic traffic received by sites in the area of local changes also varies quite significantly.

Thus, it would be desirable to account for the fact that the difference between 1st and 2nd place in search results is much larger than the difference between 4th and 5th. To do this, we need to introduce a weight function over places in the rating. The natural choice for such a function, reflecting the change in search traffic, is the dependence of the click-through rate of search results on the position occupied.

In general, the choice of a "good" approximating function for SERP click-through statistics is a topic for a separate study. Ideally, it depends on a great many parameters: the search engine, the type of keyword, the quality of the snippet, and the composition of the sites, to name a few. But for our purposes, when we are interested in relative (place-to-place) rather than absolute estimates, practically any of the known approximations can be used. I am used to the following dependency, given in Samuilov, 2014, which demonstrates quite good approximating ability:

CTR(p) = \frac{a}{p},   (6)

where p is the position in the rating and a is a coefficient that depends on the search engine (the values of a for individual search engines, and its average over all of them, are estimated in the cited source).

Taking (6) into account, the distance (2) between the positions of the i-th object becomes

d_i^{w} = a \left| \frac{1}{p_i} - \frac{1}{q_i} \right|.   (7)

And the absolute weighted distance between the ratings, respectively, will be

D^{w} = a \sum_{i \in C} \left| \frac{1}{p_i} - \frac{1}{q_i} \right|.   (8)

The maximum weighted distance between the ratings is determined by the expression

D_{max}^{w} = 2a \sum_{i=1}^{N} \left( \frac{1}{i} - \frac{1}{N + 1} \right).   (9)

Then the weighted relative distance is determined by the expression

V^{w} = \frac{D^{w}}{D_{max}^{w}} = \frac{\sum_{i \in C} \left| 1/p_i - 1/q_i \right|}{2 \sum_{i=1}^{N} \left( 1/i - 1/(N + 1) \right)}.   (10)

It should be noted that, as a result, the weighted relative distance does not depend on the coefficient a, that is, on the search engine.

For the example above, the weighted distance is 61%; that is, the weighted measure is more sensitive to the replacement of the rating leader.
It is also much more sensitive to local changes: in a Top5 rating, a 1-2 transposition yields 34%, while a 4-5 transposition yields only 3.4%.
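As a sketch, expressions (7)-(10) can be implemented as follows (Python; the function name is my own, and the coefficient a is omitted because it cancels in the ratio (10)):

```python
def weighted_relative_distance(p, q):
    """Weighted relative distance (10): positions are weighted by the
    click-through model CTR(p) ~ 1/p, so changes near the top of the
    rating count for more. Missing objects are placed at position N + 1.
    The search-engine coefficient cancels in the ratio and is omitted."""
    n = len(p)
    pos_p = {obj: i + 1 for i, obj in enumerate(p)}
    pos_q = {obj: i + 1 for i, obj in enumerate(q)}
    d = sum(abs(1 / pos_p.get(o, n + 1) - 1 / pos_q.get(o, n + 1))
            for o in set(p) | set(q))
    d_max = 2 * sum(1 / i - 1 / (n + 1) for i in range(1, n + 1))
    return d / d_max
```

For a Top5 rating this reproduces the figures above: a 1-2 transposition gives about 0.34 and a 4-5 transposition about 0.034.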

Variability of profile ratings


The obtained measures can be used for various tasks in analyzing fluctuations of search results. These tasks define specific analysis profiles: the composition of search queries (by type, subject, length, frequency), the search scope (by region; web / news / images / blogs), etc.

Analysis of search engine updates. This has become a classic task in analyzing the variability of search results. The more representative the set of keywords, the better the assessment of global changes to the algorithm or database will be.

Reputation management tasks. Here the keyword set consists of branded queries related to your company or products. By analyzing fluctuations in news results, you can detect increased activity in the profile of interest.

Analysis of competition in a niche. Increased diversity of search results for thematic queries can be interpreted as an indicator of low competition, when definitive leaders have not yet emerged.

In conclusion


How do we determine which method of analyzing search result variability is "the most final"? You can call your methods "correct", "accurate", "the most accurate"... But no matter how many times you say "halva", your mouth will not taste any sweeter.

The only option is a comparative analysis of the various methods on historical samples and an assessment of their sensitivity to already-known changes in the ranking functions of search engines. Unfortunately, I do not have such statistics, but I would be glad to collaborate with those who do.



[UPD 1] Use case for evaluating the competitiveness of search queries

Source: https://habr.com/ru/post/239797/

