How we updated search suggestions in Yandex and found the right metric for them

Search suggestions have existed in Yandex for almost 10 years. At first glance they seem to be a rather simple feature: many people are still sure that suggest only takes into account how often people enter particular queries. Several years ago we described on Habr how much complex mathematics it takes to find the right next word and help a person formulate a question. Back then we even calculated that search suggestions save people about 60 years.

To some extent, suggest was even ahead of its time: now that search is increasingly used from mobile devices, the speed with which a person enters a query and gets an answer has become a critical factor. The value of suggestions has grown in this changed world, and to keep making users happy we too need to keep moving forward.

I took up search suggestions at Yandex in early 2016. The goal facing the suggest team at the time sounded very ambitious: “Make the best suggest on mobile”, no more and no less!

Looking back at what has been done since then in search suggestions at Yandex, it is hard to shake off two feelings. First: how much we have done! Second: did none of this really exist before? Indeed, a lot has been done, but these things often seem so simple and obvious that it is hard to believe they did not exist earlier.

Below is a gripping drama about how the technology, design and product changed, and how we looked for a metric to steer by. It is an instructive story: if you feel that the product is well made but the metrics say the opposite, then something is wrong with the metrics, not with you.

1. Word-by-word suggest

By the beginning of 2016, Yandex search on mobile had arrived at the so-called tap-ahead version of suggest. On the desktop we showed, and still show, the usual “line-by-line” version, in which clicking a line with a suggestion immediately submits the query. The tap-ahead version is more complicated.

The idea is this. Desktop keyboards are comfortable, and users generally type on them fairly quickly. On mobile the situation is different, so if the suggestions lack the option the user needs, it is much harder to simply type the missing part of the query. So the first word of each suggestion was accompanied by a plus sign and specially highlighted. Tapping this highlighted part appended the word to the text already typed, without going to the search results. That way the needed query could be assembled word by word.

The problem with tap-ahead suggest is that users do not understand it. Frankly, I do not understand it either, even after a year and a half of working on suggest. Rather than memorizing which places can be tapped and which cannot, it is easier just to type the whole query. And that is what users did.

It would be much easier for them if elements with different functions were visually separated. This is how word-by-word suggest appeared, which has been working in Yandex mobile search since February 2016. At the time it looked like this:

Word-by-word suggest

In this version, the ability to submit a long query in one tap disappeared completely. One could say this was by design: there is very little space on mobile screens, and the ends of long queries were inevitably cut off. As a result, users did not fully understand what they were submitting, and often had to edit the text of a query they had already entered.

2. Metrics

On the desktop, suggest metrics are more or less clear: the more often users click on suggestions, the better. So the main metrics were:

Offline: coverage. What percentage of the queries submitted on a given day could users have found in suggest?
Online: what share of queries is entered with the help of suggest?

On mobile the situation was more complicated. Since we assume that a query can be entered “in parts”, it is important to know not only the whole query but also all its parts. What counts as a “query entered using suggest” is also unclear: the user might tap a word-by-word suggestion once along the way, or five times. The latter is presumably somewhat better.

So for mobile we had to devise new metrics by which we could accept changes to search suggestions.

2.1. Offline

Offline metrics are metrics for which real users are not required. They can be calculated using logs, assessors, or simply mathematical formulas.

In the case of suggest, the idea of building a kind of “user model” comes to mind immediately. Suppose a user has typed the first few characters of a query. In response, Yandex shows some word-by-word suggestions. Some of them continue the query correctly. We assume that in this situation the user acts greedily and immediately taps the matching suggestion, or keeps typing the query letter by letter if none of the suggestions fit. This continues until the query is fully entered. At the end we count the total number of actions (keystrokes and taps on suggestions) the user performed. This is our metric, which we call ExpectedActionsCount (EAC).
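The greedy user model can be sketched in a few lines of Python. This is only an illustration under assumptions that the description above does not settle: a tap inserts the word together with the following space, a hand-typed word costs one extra keystroke for the space, and the final "submit" action is not counted. The names `actions_for_query`, `expected_actions_count` and `toy_suggest` are hypothetical, and `toy_suggest` is a hard-coded stand-in for the real suggest backend.

```python
from typing import Callable, Iterable, List

# A suggest function maps the text typed so far to a list of suggested words.
Suggest = Callable[[str], List[str]]


def actions_for_query(query: str, suggest: Suggest) -> int:
    """Count keystrokes and taps a greedy user needs to enter `query`."""
    words = query.split()
    done: List[str] = []   # words that are already fully entered
    actions = 0
    for index, word in enumerate(words):
        typed = ""         # part of the current word typed by hand
        tapped = False
        while typed != word:
            prefix = " ".join(done + [typed])
            if word in suggest(prefix):
                actions += 1       # greedy: tap the matching suggestion
                tapped = True
                break
            typed += word[len(typed)]
            actions += 1           # otherwise type one more character
        is_last = index == len(words) - 1
        if not tapped and not is_last:
            actions += 1           # assumption: a space is typed by hand
        done.append(word)
    return actions


def expected_actions_count(queries: Iterable[str], suggest: Suggest) -> float:
    """Average number of actions over a set of queries."""
    counts = [actions_for_query(q, suggest) for q in queries]
    return sum(counts) / len(counts) if counts else 0.0


def toy_suggest(prefix: str) -> List[str]:
    """Hypothetical suggest source with three hard-coded answers."""
    table = {
        "e": ["eiffel"],
        "eiffel ": ["tower"],
        "eiffel tower h": ["height"],
    }
    return table.get(prefix, [])


with_suggest = actions_for_query("eiffel tower height", toy_suggest)
without_suggest = actions_for_query("eiffel tower height", lambda prefix: [])
```

With suggestions switched off, the count in this sketch equals the length of the query in characters, which agrees with how EAC behaves when suggest is disabled.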

One important aspect can be singled out of this metric: how often does the user not have to start typing the next word at all, because it is already among the word-by-word suggestions? Take the number of correctly predicted next words and divide it by the total number of words: that gives the GuessProbability metric.
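A minimal sketch of GuessProbability, again under assumptions of our own: every word of the query is counted, including the first one (whose context is the empty string), and a word counts as guessed if it is suggested before any of its characters are typed. The names `guess_probability` and `toy_suggest` are hypothetical.

```python
from typing import Callable, Iterable, List

Suggest = Callable[[str], List[str]]


def guess_probability(queries: Iterable[str], suggest: Suggest) -> float:
    """Share of words that are suggested before the user starts typing them."""
    guessed = 0
    total = 0
    for query in queries:
        words = query.split()
        for i, word in enumerate(words):
            # Context is everything entered so far plus a trailing space.
            context = " ".join(words[:i] + [""])
            total += 1
            if word in suggest(context):
                guessed += 1
    return guessed / total if total else 0.0


def toy_suggest(prefix: str) -> List[str]:
    """Hypothetical suggest source: only knows what follows 'eiffel '."""
    return {"eiffel ": ["tower"]}.get(prefix, [])


probability = guess_probability(["eiffel tower height"], toy_suggest)
```

In this toy example only "tower" is predicted in advance, so one word out of three is guessed.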

As a rule, the two metrics move together: a decrease in ExpectedActionsCount is usually accompanied by an increase in GuessProbability.

With these metrics we made the first noticeable change to word-by-word suggest: we rolled out the ability to combine pairs of words into bigrams. Few users will want to enter the query “Eiffel height”; they are more likely to want “Eiffel Tower height”. The criteria for showing bigrams are easy to tune with offline metrics: that is exactly what they are for, to sift through a large number of variants so that only the best of them are tested on users.

Interestingly, EAC with suggest disabled is about 19: this is simply the average length in characters of a query to Yandex. With suggest, EAC was initially 13.5, i.e. it saved almost a third of the actions needed to enter a query. Currently EAC is about 11.5.
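The savings behind these figures are simple arithmetic; the helper name `relative_saving` below is ours, and the inputs are the EAC values quoted above.

```python
def relative_saving(baseline: float, with_suggest: float) -> float:
    """Fraction of input actions saved relative to typing everything by hand."""
    return (baseline - with_suggest) / baseline


initial_saving = relative_saving(19, 13.5)   # about 0.29, "almost a third"
current_saving = relative_saving(19, 11.5)   # about 0.39
```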

2.2. Online

However, offline metrics will never tell the full truth about user behavior, since any model is inaccurate.

It seemed clear to us from the start that the main goal of suggest is to make query input convenient, so the speed of entering queries should be the main quality criterion. So we quickly came up with simple metrics: the percentage of queries entered in less than X seconds, for various X.
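A metric of this kind can be sketched as follows. The function name and the sample durations are made up for illustration; how input time is actually measured in the logs is not described here.

```python
from typing import Sequence


def share_entered_faster_than(durations: Sequence[float], threshold: float) -> float:
    """Share of queries whose input took less than `threshold` seconds."""
    if not durations:
        return 0.0
    fast = sum(1 for seconds in durations if seconds < threshold)
    return fast / len(durations)


# Hypothetical input times in seconds, one per query.
sample_durations = [4.0, 9.5, 14.9, 15.0, 22.0, 31.0]

share_under_15 = share_entered_faster_than(sample_durations, 15)
share_under_10 = share_entered_faster_than(sample_durations, 10)
```

Computing the share for several thresholds at once gives a family of curves that can be tracked over time.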

Having plotted these metrics from the beginning of the year, we were horrified: they had deteriorated sharply when word-by-word suggest was introduced! Immediately after it went to production, users began to type more slowly. Then, however, input speed gradually grew, and by August it exceeded the February figures. Here, for example, is the graph of the share of queries entered in less than 15 seconds, from January to July 2016. The graph is normalized so that the value at the very start of the observations is taken as one.
Change in the proportion of quickly entered queries

It turns out that at the time of implementation, the share of quickly entered queries fell by more than six percent!
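For reference, normalizing a series to its first value and reading off the relative drop looks roughly like this. The numbers are invented purely to show the calculation and are not the real data.

```python
from typing import List, Sequence


def normalize_to_first(series: Sequence[float]) -> List[float]:
    """Scale a series so that its first value becomes 1."""
    first = series[0]
    return [value / first for value in series]


# Hypothetical shares of quickly entered queries: before, at and after a launch.
raw_shares = [0.50, 0.47, 0.49]

normalized = normalize_to_first(raw_shares)
relative_drop = 1 - min(normalized)   # drop relative to the starting level
```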

During the period in question we had only one major rollout, that same combining of words into bigrams, and the corresponding shift in May is clearly visible. Surprisingly, input speed also grew at times when we launched nothing new. In other words, users gradually get used to the new input method and input speed grows by itself!

A more detailed study of the graphs revealed some other patterns:

on weekends, users type faster than on weekdays;
in summer, the variability of input speed disappears.

Not much time had passed since the new metrics were introduced, but autumn was already approaching, a time when many indicators are strongly affected by seasonality. Input speed was no exception: the share of quickly entered queries fell sharply!
Change in the proportion of quickly entered queries

At first it was very scary: input had slowed down a lot, so surely we had broken something and not noticed! The answer, however, was much simpler. With the onset of autumn, users began to submit much longer queries. Here is a graph of the share of queries longer than seven words: it grew by more than a third!
Change in the proportion of long queries

It became clear that the “share of quickly entered queries” is a good but seasonally sensitive metric. For a long time we looked for a metric that would register our releases and at the same time be robust to seasonality. That metric turned out to be “input time per query character”. Here is its graph for the period in question:
Change in the proportion of quickly entered queries
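The following sketch shows why dividing by query length helps. It assumes the metric is computed as total input time divided by total number of characters (the exact aggregation is not specified above), and both sets of sessions are invented: the "autumn" one contains a much longer query typed at the same speed.

```python
from typing import Sequence, Tuple

Session = Tuple[str, float]   # (query text, input time in seconds)


def input_time_per_char(sessions: Sequence[Session]) -> float:
    """Total input time divided by total number of query characters."""
    total_time = sum(seconds for _, seconds in sessions)
    total_chars = sum(len(query) for query, _ in sessions)
    return total_time / total_chars


def share_faster_than(sessions: Sequence[Session], threshold: float) -> float:
    """Share of queries entered in less than `threshold` seconds."""
    fast = sum(1 for _, seconds in sessions if seconds < threshold)
    return fast / len(sessions)


# Hypothetical data: the same typing speed, but longer queries in autumn.
summer = [("weather", 3.5), ("pizza near me", 6.5)]
autumn = [("weather forecast for this weekend", 16.5), ("pizza", 2.5)]

summer_per_char = input_time_per_char(summer)
autumn_per_char = input_time_per_char(autumn)
summer_fast_share = share_faster_than(summer, 15)
autumn_fast_share = share_faster_than(autumn, 15)
```

In this toy data the share of queries entered in under 15 seconds halves in autumn, while the time per character stays the same.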

Thus, six months after word-by-word suggest was introduced, users were entering queries about 5% faster than at the beginning of the year, when tap-ahead was in use. It remains to add that from January 2016 to October 2017 users of Yandex mobile search came to enter queries 17% faster: an impressive demonstration of the effectiveness of our efforts!

It is worth mentioning that the spontaneous growth in input speed continues to this day. Every day our users enter queries faster and faster. We, of course, help them along with our releases.

3. Word-by-word and full-text suggest

Once the metrics have been devised and validated, there is room for imagination. Let's think: how could input be sped up? Well, we can always improve the data and choose query continuations better and better. But what can we do with the purely visual part?

The answer is quite simple: let's combine the old and the new approaches. In addition to word-by-word suggestions, we will also show the old line-by-line, or full-text, ones. Then, if the needed query is already on the screen, the user can select it right away, saving several actions and some time. The difference between word-by-word suggest and the combined word-by-word and full-text suggest is easy to see in the following illustration:

You can see that the changes here are somewhat more substantial than one might expect. For example, the search results page is now not visible at all. On the other hand, are the results needed if the user has already decided to enter a new query?

It turned out they are not really needed. Users reacted to this rollout as expected: input speed increased, as did the number of users who use suggest at all.

Suggest has come an interesting and winding way. Once we had only full-text suggestions, then only word-by-word ones, and in the end we arrived at a combined version. But this combined version differs greatly from the earlier one (tap-ahead) in that elements with different behavior are now separated in the interface. This made it much easier to understand the purpose of each element and to use suggest more effectively.

4. Other design experiments

When it became clear that design changes in search suggestions can significantly affect the metrics, there was no stopping our imagination.

To begin with, let's try to strengthen the effect of full-text suggestions. If all the word-by-word suggestions are laid out in a single line (with scrolling), there will be more room for full-text suggestions and, possibly, input will become even faster.
Scrolling for word-by-word suggestions

On the other hand, we can try something really strange. Let's always show possible continuations of the most likely next word in a separate column! Thus was born the variant we call “suggest as a graph”:
Suggest as a graph

Suggest as a graph was a hit in every UX study. Everyone who saw it for the first time said literally the following: “Oooh, finally they are helping me enter a query!” Those who had not suspected that search has suggestions at all finally noticed them. Those who knew about them began to use them more often.

In addition, we tried changing the size and color of the buttons in word-by-word suggestions. The overall idea was clear: we need to somehow make suggestions more noticeable, because they are useful and should be used more often!

However, when tested online, both hypotheses were rejected. The difference between the metrics “share of suggest usage” and “input speed” became apparent right away. Unfortunately, suggestions that are too noticeable harm users: their eyes jump between suggest and the keyboard too often, and as a result they type more slowly. This was also one of the rare cases in which complete success in UX research was accompanied by an equally complete failure online.

5. Network stories

We already understood that input speed depends on data quality and presentation quality. It turned out, however, that there is one more aspect to the problem: the network.

Historically, the search suggestions backend lived on suggest.yandex.net, to which the search page made asynchronous requests as the user typed.

By the end of summer 2016 it had become clear that this scheme was outdated. Many services already lived behind the “single domain” yandex.ru: for example, images at yandex.ru/images, video at yandex.ru/video, and so on. Why? To save on network round trips. We have one single balancer for all services available on the yandex.ru domain. This means that as long as the user stays on this domain, a network connection has to be established only once. For suggest this was not the case: to reach suggest from the yandex.ru domain, a new network connection had to be established, which on a 2G connection sometimes took several seconds.

Another interesting point is the behavior of ad blockers. It turned out that some of them block cross-domain requests. In our case this meant that for some users suggest was completely non-functional for several days!

So we decided to run an experiment in which suggest is moved behind the single balancer for Yandex services.

It is worth noting how suggest differs from other services. Each search query requires roughly as many requests to the suggest backend as the query has characters. So it is not surprising that the typical RPS of the suggest backend is orders of magnitude greater than the RPS of other services, including the main search. 100k RPS is the norm; suggest is one of the most heavily loaded Yandex services that interact directly with users (some internal services handle millions of RPS).
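The relationship between search traffic and suggest traffic can be estimated with one multiplication. The search rate below is a made-up round number used only to show the calculation, not a real Yandex figure; the average query length of 19 characters is the one mentioned in the metrics section.

```python
def estimated_suggest_rps(search_queries_per_second: float,
                          avg_query_length_chars: float) -> float:
    """Rough estimate: about one suggest request per typed character."""
    return search_queries_per_second * avg_query_length_chars


# Hypothetical search load of 1000 queries per second.
suggest_rps = estimated_suggest_rps(1000, 19)
```

The estimate ignores taps on suggestions, which reduce the number of typed characters, so it should be read as an upper bound for this simple model.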

For the single balancer this means a very significant increase in load, so to meet the needs of search suggestions we had to invest substantially in hardware, and we did not want to do that without confirming the hypothesis of user benefit.

In the end, the experiment was one of the most successful of all time. Its success showed even in the fact that users began to submit queries to the search engine more often, not to mention the growth of suggest usage and input speed by several percent.

Future

At this historic moment we realized that fully developing input on mobile involves many different aspects: data quality, the interface, and the network. Further development has to proceed in all these areas and, possibly, in some new ones that we have not yet noticed.

Besides, suggestions are present not only in search but also in our apps, in the browser, and in services other than Search, and something had to be done about them too. Moreover, input does not have to be text only.

And we did all of that in 2017. We will talk about what came of it in the next article.

Source: https://habr.com/ru/post/340552/
