What else can be done in search? A Yandex report
Yandex has a search component development service that builds the search base on MapReduce, supplies the layout with data for rendering, devises algorithms and data structures, and works on ML-driven quality improvements. Alexey Shlyunkin, head of one of the groups within this service, explains what search runtime consists of and how it is managed.
Want to dig into ML? Dig in. Only MapReduce? Fine. Runtime? You get runtime.
- What is search today? Yandex began by building search and has been developing it ever since; that was 20 years ago. Today we have a search base of hundreds of billions of documents.
By "document" we mean any page on the Internet, but in fact not just the page itself: also its contents, various statistics about which users like to visit it and how many of them do, plus data we have computed ourselves.
Search is also tens of thousands of instances that process this data in response to every query, searching for something and enriching the search response. Some instances search for images, some for plain text documents, some for videos, and so on. That is, tens of thousands of machines spring into action for each of your queries, all trying to find something and improve the results shown to you. These tens of thousands of machines serve thousands of queries per second, and they are combined into hundreds of services, each designed to solve its own problem.
There is the web search service, a video search service, and so on. Accordingly, there is a component that combines the answers from the different searches and tries to choose what to show the user, and in what order. For a music query it is probably better to show Yandex.Music first and then, say, a page about the band. This component is called the blender. There are hundreds of such services, each also doing something for every query and trying to help the user. And, of course, machine learning of every kind is used throughout, from simple statistics and linear models to gradient boosting, neural networks, and so on.
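To make the blender's role a bit more concrete, here is a toy Python sketch. The verticals, scores, and boosting rule are purely illustrative assumptions, not how the real blender's ML ranker works:

```python
from dataclasses import dataclass

@dataclass
class Block:
    vertical: str      # e.g. "web", "video", "music"
    content: str
    base_score: float  # score the vertical itself assigned

def blend(blocks, query_is_musical):
    # Stand-in for the real ML ranker: boost the music vertical
    # for music-like queries, then order all blocks by score.
    def score(b):
        boost = 1.5 if (query_is_musical and b.vertical == "music") else 1.0
        return b.base_score * boost
    return sorted(blocks, key=score, reverse=True)

results = blend(
    [Block("web", "Band fan page", 0.7),
     Block("music", "Yandex.Music: band playlist", 0.6),
     Block("video", "Concert video", 0.5)],
    query_is_musical=True,
)
```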
I will talk about both the infrastructure and the ML.
My group is called the new runtime development team; it is part of the search component development service. To give you an idea, I'll tell you a little about what our service does.
In short: everything. If you submit a search query, we have had a hand in nearly every stage of handling it, starting with building the search base. We have MapReduce; we collect all the data about documents there, process it, and build various data structures so that we can compute things efficiently at query time. We work from the very bottom, from the moment a document first reaches us and the first stage where documents are fetched and ranked, all the way to the top, where the layout receives a JSON response and renders it with all the pictures and polish. From the bottom to the top of the entire stack, we develop something at every level.
But we don't just write infrastructure code. We also train neural networks, CatBoost models, and any other ML component you can imagine. And since we deal with heavy loads and big data, we naturally care about algorithms and data structures and never hesitate to put them into practice. For example, in several places we use segment trees, and we have our own index compression scheme that builds a trie and runs dynamic programming over it to decide how best to construct the dictionaries.
Working on something as large as search, we have long since had our fill of simple tasks. So naturally we love anything complex and new, anything that challenges us, where you can't just go and write the usual ten lines of code. You need to think and run experiments. The tasks we set ourselves often border on fantasy. Sometimes you think: this is probably impossible. But then you experiment, and experiments can take a whole year, and in the end something works. Then we start putting it into production and reworking things.
And beyond the projects and skills, we are one of the most ambitious and fastest-growing teams in Yandex. For example, when I joined two years ago, I was the ninth person in our service. Now the service has almost 60 people, interns included, so we have grown four times over in two years. That should give you a sense of what our service does.
Now I want to talk a little about our tasks and about a direction that I think will become more and more relevant in the near future. But first I need to briefly describe how the lowest, most basic search layer works.
In general terms, everything works very simply. We take the search base with all the documents and divide the documents more or less evenly into N pieces, called shards. On top of each shard runs a program called basic search. Its job is to search within its piece of the Internet: it knows how to search that piece and knows nothing about the rest. We have N such shards with basic searches running on top of them, and above those there is a meta-search. The user's query lands in the meta-search, which simply forwards it to all the shards; each shard performs its search and returns a result, and the meta-search merges them and produces the answer.
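Here is a minimal sketch of that scheme. The `Doc`, `BasicSearch`, and `meta_search` names and the toy substring relevance are illustrative assumptions, not the real components:

```python
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor

@dataclass
class Doc:
    text: str
    score: float

class BasicSearch:
    def __init__(self, shard_docs):
        self.shard_docs = shard_docs  # this shard's slice of the search base

    def search(self, query, top=10):
        # Toy relevance: documents containing the query text, best-first.
        hits = [d for d in self.shard_docs if query in d.text]
        return sorted(hits, key=lambda d: d.score, reverse=True)[:top]

def meta_search(query, shards, top=10):
    # Fan the query out to every basic search, then merge per-shard results.
    with ThreadPoolExecutor() as pool:
        partial = pool.map(lambda s: s.search(query, top), shards)
        merged = [doc for results in partial for doc in results]
    return sorted(merged, key=lambda d: d.score, reverse=True)[:top]
```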
Search worked roughly this way for almost all of its 20 years, and for a long time everyone thought it would stay that way and nothing better could be done. But everything changes: new technologies appear, and machine learning now makes it possible not only to improve quality but also to solve infrastructure problems. Recently, projects at the junction of infrastructure and machine learning have really taken off in our search. When two such mastodons meet, very interesting results emerge.
Take neural networks. We have the text of a query and the text of a document. We want to map the query to a vector of numbers and the document to a vector of numbers so that their dot product predicts the value we care about. For example, we can train the model so that the dot product predicts the probability that a user will click on the document. An understandable setup.
It is arranged like this, very roughly: at the bottom there is an input layer of words, followed by several network layers. Each layer takes a vector as input. The bottom layer is a sparse vector in which each component corresponds to a word of the query. A layer multiplies its input by a matrix, obtains a vector, and applies a nonlinearity to each component, and this is repeated several times. The output of the last layer is the query vector itself: we took the query, passed it through the layers, and the final layer's output is the query's embedding. The document side is processed the same way.
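A minimal NumPy sketch of this two-tower idea, with assumed sizes and a ReLU nonlinearity standing in for whatever the real networks use:

```python
import numpy as np

VOCAB, HIDDEN, EMBED = 10_000, 256, 64
rng = np.random.default_rng(0)

def make_tower():
    # Two weight matrices: sparse word input -> hidden -> embedding.
    return [rng.normal(0.0, 0.01, (VOCAB, HIDDEN)),
            rng.normal(0.0, 0.01, (HIDDEN, EMBED))]

def embed(word_ids, tower):
    # Bottom sparse layer: sum the rows for the words that are present,
    # then dense layers with a nonlinearity, as described above.
    x = tower[0][word_ids].sum(axis=0)
    x = np.maximum(x, 0.0)
    x = np.maximum(x @ tower[1], 0.0)
    return x / (np.linalg.norm(x) + 1e-9)  # normalize for a stable dot product

query_tower, doc_tower = make_tower(), make_tower()
q = embed([12, 407, 3055], query_tower)    # toy query word ids
d = embed([12, 98, 5641], doc_tower)       # toy document word ids

# Training (not shown) would push sigmoid(q . d) toward the click label.
click_probability = 1.0 / (1.0 + np.exp(-(q @ d)))
```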
These neural networks have been actively rolled out into search over the last few years and have brought large quality gains. But they have one problem: the targets we can predict with them are useful, yet rather coarse. Because the bottom layer is huge, with a vocabulary of tens of millions of words, training such a network requires billions of examples.
For example, we can train on user clicks and similar signals. But the signal considered most important in our search is manual labeling by trained assessors: a person takes a query and a document, reads the document, judges how well it fits the query, and assigns a rating. For a long time we could not train neural networks to predict this signal, because we only have millions of assessments; hiring the entire planet to label everything around the clock would be far too expensive. So we came up with a hack.
A neural network of neural networks. Over recent years we have accumulated many networks that predict useful signals, just somewhat coarser ones than assessor ratings. So we decided to feed the ready-made output vectors of these networks into the bottom layer of a new network, and then train that network on the smaller dataset to predict our search relevance.
The resulting model turned out very well. It maps a query and a document to vectors whose dot product directly predicts real relevance, the thing we had long wanted to predict.
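Here is a hedged sketch of that stacking scheme. `PretrainedNet`, all shapes, and the tiny top model are my assumptions; only the structure mirrors what was described above:

```python
import numpy as np

rng = np.random.default_rng(1)
N_NETS, EMBED, HIDDEN = 5, 64, 128

class PretrainedNet:
    # Stand-in for one of the accumulated click/behavior networks;
    # its internals don't matter here, only the vectors it emits.
    def embed_query(self, query):
        return rng.normal(0.0, 1.0, EMBED)
    def embed_document(self, document):
        return rng.normal(0.0, 1.0, EMBED)

def stacked_features(query, document, nets):
    # Bottom layer of the new model: ready-made vectors from each
    # pretrained network, concatenated into one dense input.
    parts = []
    for net in nets:
        parts.append(net.embed_query(query))
        parts.append(net.embed_document(document))
    return np.concatenate(parts)            # shape: (2 * N_NETS * EMBED,)

# Small top network: millions of assessor labels are enough to fit it,
# since the heavy representation learning already happened below.
W1 = rng.normal(0.0, 0.01, (2 * N_NETS * EMBED, HIDDEN))
W2 = rng.normal(0.0, 0.01, (HIDDEN, 1))

def predict_relevance(query, document, nets):
    h = np.maximum(stacked_features(query, document, nets) @ W1, 0.0)
    return float(h @ W2)

nets = [PretrainedNet() for _ in range(N_NETS)]
relevance = predict_relevance("query text", "document text", nets)
```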
Then we had an idea for reworking the search a bit. The project is called KNN-base (from k-nearest neighbors).
The basic idea is this. We have a query vector and document vectors, and we need to find the documents nearest to the query. Let's pick N clusters that characterize the entire document space, where N is far smaller than the number of documents. Roughly speaking, the clusters characterize topics: in simple terms, there is a cluster about cats, a cluster about cars, a cluster about programming, and so on.
Now, instead of scattering documents across shards at random, as before, we place each document on the shard whose centroid is closest to it. A shard thus holds documents grouped by topic.
And for a query we no longer need to go to all the shards, only to a small subset whose centroids are closest to the query.
In the old scheme, the meta-search queried every shard. Now it needs to visit far fewer of them, and we still find the nearest documents.
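A short sketch of the whole KNN-base scheme under stated assumptions: the centroids might come from something like k-means over document vectors (here they are random for illustration), and the fanout value is an arbitrary example:

```python
import numpy as np

def nearest(centroids, x, top=1):
    # Indices of the `top` centroids closest to x (Euclidean distance).
    d = np.linalg.norm(centroids - x, axis=1)
    return np.argsort(d)[:top]

def assign_to_shards(doc_vectors, centroids):
    # Placement: each document goes to the shard of its nearest centroid,
    # so a shard ends up holding documents grouped by topic.
    shards = [[] for _ in centroids]
    for doc_id, v in enumerate(doc_vectors):
        shards[nearest(centroids, v)[0]].append(doc_id)
    return shards

def shards_for_query(query_vector, centroids, fanout=3):
    # Routing: instead of all N shards, visit only the `fanout` nearest.
    return nearest(centroids, query_vector, top=fanout)

rng = np.random.default_rng(2)
centroids = rng.normal(size=(100, 64))   # "topic" centroids, N = 100
docs = rng.normal(size=(10_000, 64))     # document vectors
shards = assign_to_shards(docs, centroids)
to_visit = shards_for_query(rng.normal(size=64), centroids)
```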
What do we actually gain from this design? It dramatically reduces the consumption of computing resources, simply because we visit far fewer shards. As I said, I consider this one of the highlights of our service: a combination of infrastructure and machine learning that yields results no one could have imagined before.
And in the end it is simply fun: you trained some models, then went and reworked the whole search, churned through petabytes of data, and now your search works while burning ten times fewer resources. You saved the company a billion dollars, everyone is happy.
I have described one of our search projects, one that is by now essentially implemented, with all the experiments done. Other typical tasks of ours are to double the search base yet again, because the Internet keeps growing and we want to keep up and find every page on it, and to speed up the base layer, which has the most instances and the most hardware. For example, speeding up basic search by one percent saves about a million dollars.
Search also works as an incubator for new projects. Let me explain. Search has been built over 20 years. We have done a great many things; many times we hit what seemed like a dead end and thought nothing more could be done, then a long series of experiments followed and we broke through. Along the way we have accumulated a great deal of expertise in building big, impressive systems. So most new directions at Yandex now start in search, because people in search already know how to do all this, and it is natural to ask them at the very least to design a new system, and at most to go and build it themselves.
Now, I hope, you have some idea of our work. Let me quickly get to the final part of my story: the interns in our service. We love them very much and have a lot of them; last summer there were 20 interns in my group alone, and I think that is a good thing. When you take one to three interns, they feel a little lonely and are sometimes afraid to ask their senior colleagues questions. When there are many of them, they talk to each other like companions in misfortune: if they are afraid to ask the developers something, they go and whisper it over in a corner. This atmosphere helps everyone work effectively.
We have a million tasks and not a very big team, so our interns are loaded to the full. We don't make an intern spend all their time digging through logs, writing tests, and refactoring code; we immediately hand them a real, complicated production task: speed up the search, improve index compression. Of course, we help. We know it all pays off, so we gladly share our expertise. Since our field is broad, everyone can find a task to their taste. Want to dig into ML? Dig in. Only MapReduce? Fine. Runtime? You get runtime. There is everything.
What do you need to join us? We do almost everything in C++ and Python. You don't have to know both; knowing one is enough. We welcome knowledge of algorithms: it shapes a certain style of thinking and helps a lot. But it isn't mandatory either; again, we are ready to teach everything and to invest our time, because we know it pays off. The most important requirement, our motto even, is to fear nothing and to get a lot done. Don't be afraid of breaking production, don't be afraid of starting something hard. We need people who fear nothing and are ready to move mountains. Thank you very much.