
ScienceHub # 04: Theory of Random Graphs

The post-science crew, led by the chief editor, went not just anywhere, but to Yandex, to see what practical importance fundamental science has in the world of modern technology. We met with Andrei Raygorodsky, a doctor of physical and mathematical sciences, head of the department of theoretical and applied research at Yandex, and a professor at Moscow State University and MIPT.







Introduction




There is a suspicion that retelling Raygorodsky’s talk in my own words here on Habr would be sheer insolence and absurdity. Habr readers probably understand this without being told. Therefore, rather than rehashing it, I will quote Andrei directly, and you can add your own thoughts in the comments.



Andrei Raygorodsky: “For a person who doesn’t know at all what a graph is, it’s best to speak in very concrete terms. Imagine that you have a large spatial molecule consisting of rigid balls; it doesn’t matter what they are made of (cardboard, metal). These balls are linked to each other, connected by wires, by straight-line segments. The result is a rigid structure, which in mathematics is called a graph. This is a purely visual interpretation, in the form of a molecule. But I think it is convenient for anyone trying to imagine what a graph is.



Abstractly, a graph is structured this way: it has a set of vertices and a set of edges. The balls that we see in space are the vertices of the graph, and the wires that connect them are its edges. Speaking abstractly, the vertices are simply some finite set of elements.



It does not matter what the nature of these elements is. They can be people, for example, between whom there is some kind of relationship. They can be natural numbers, sites on the Internet, or biological objects such as proteins. Or they may be completely abstract elements.



The edges are simply pairs of these elements: we say that two elements are connected by an edge, and that’s all. For every two elements we can say whether or not they are connected by an edge. How exactly depends on how we define it.

In an abstract setting, we can define the edges however we please: we simply take a finite set of elements and connect some pairs of them by edges. The result is a construction that can be placed in space or, for example, on a plane. However you draw it, so it is.”
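The abstract definition Andrei gives translates into code almost word for word: a finite set of vertices plus a set of unordered pairs. A minimal Python sketch (the names here are mine, not from the interview):

```python
# A graph in the abstract sense: a finite vertex set plus a set of
# edges, where each edge is an unordered pair of vertices.
vertices = {1, 2, 3, 4}
edges = {frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})}

def are_connected(u, v):
    """Return True if vertices u and v are joined by an edge."""
    return frozenset({u, v}) in edges

print(are_connected(1, 2))  # True: 1 and 2 are joined by an edge
print(are_connected(1, 4))  # False: no edge between 1 and 4
```

Using `frozenset` for edges captures the point that an edge is an unordered pair: the edge {1, 2} is the same as {2, 1}.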







“Back in the 19th century, there was a classic problem concerning the coloring of maps. You have a map of the world with countries drawn on it. Naturally, you want to paint each country in some color so that countries that border each other get different colors. If they were the same color, they would simply merge and you could not tell them apart. The question is: how many colors are enough to color an arbitrary map in this way?



The hypothesis was that four colors always suffice, provided some natural restrictions on the properties of the borders of the countries and their relative positions are met. If there are no enclaves of the Kaliningrad type, then, roughly speaking, the map is considered proper, and we want to color it in four colors. This is naturally connected with graph theory. We simply assign an abstract vertex to each country (if you like, its capital). The vertices of the graph are then the capitals of these countries.



Next, we connect by an edge (if you like, a road) two capitals whose countries border each other. We get a graph. Its vertices correspond to the countries, and its edges are the pairs of vertices corresponding to the capitals of neighboring countries. And we want to color the vertices of this graph (that is, the countries) in the minimum number of colors so that vertices connected by an edge get different colors. The four color problem is a great classical problem that has still not been completely solved. It has been solved with the help of computer enumeration, but there is still no manual, purely analytical proof that four colors are always enough.”
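The four color theorem itself required computer enumeration, but the underlying notion of a proper coloring is easy to demonstrate. Below is a simple greedy coloring in Python; it is illustrative only, is not optimal in general, and has nothing to do with the actual four-color proof (the toy map and all names are mine):

```python
def greedy_coloring(adjacency):
    """Assign each vertex the smallest color index not used by its
    already-colored neighbors. Always produces a proper coloring,
    though not necessarily with the minimum number of colors."""
    colors = {}
    for v in adjacency:
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 0
        while c in used:
            c += 1
        colors[v] = c
    return colors

# Toy "map": capitals as vertices, shared borders as edges.
borders = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
coloring = greedy_coloring(borders)
# Proper coloring: no two neighboring countries share a color.
assert all(coloring[v] != coloring[u] for v in borders for u in borders[v])
print(coloring)
```

On this toy map the greedy algorithm happens to use three colors; on an adversarial vertex ordering it can use far more than the true chromatic number.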



Randomness





The theory of random graphs is one of the most interesting directions in mathematics right now. It is connected with the Erdős–Rényi theory: these Hungarian mathematicians suggested introducing randomness on the set of graphs.



A.R.: “They said this: ‘Let’s consider some fixed set of n vertices; it does not matter what they consist of. Let them be n abstract objects, for example the natural numbers 1, 2, ..., n. We want to draw edges between them at random.’

It is clear how many edges there could be: every two vertices could potentially be connected by an edge. Let’s take some two vertices and toss a coin, but a coin with a shifted center of gravity. Statistically, it lands heads more often than tails. When you toss it onto the table, the probability that it lands heads up is not the same as the probability that it lands tails up. If the coin were made of a perfectly homogeneous material, this would of course not happen: on average it would land tails up half the time and heads up half the time, which is quite natural to expect. But we have a skewed, bad coin. The probability that it lands tails up is some number p from the interval from 0 to 1, and the probability that it lands heads up is, accordingly, 1 − p, because we assume, of course, that the coin does not land on its side, does not stand on its edge, and does not freeze in the air. That is, there are only two possibilities: either it lands heads up or it lands tails up.



Let us agree that if the coin lands tails up, which happens with probability p, we draw an edge between the two vertices we fixed. And if we are unlucky and it lands heads up, the edge does not appear. We do this with every pair of vertices at our disposal. We have n vertices, and we test each pair with the help of such a coin. In probability theory this is called a scheme of Bernoulli trials: we toss a coin many times and each time record whether it came up heads or tails. Depending on the outcome, we either draw another edge, a connection between the vertices, or decide not to draw it. As a result we obtain, of course, a random graph, that is, a graph whose edges have arisen as a result of coin tosses, by chance. Such graphs are called classical, or Erdős–Rényi, graphs.”
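The coin-tossing procedure Andrei describes is easy to sketch in code. Here is a minimal Python version of the classical G(n, p) model (the function name and parameters are my own, not from the interview):

```python
import random

def erdos_renyi(n, p, seed=None):
    """Classical Erdos-Renyi G(n, p): for every pair of vertices,
    toss a biased coin and draw the edge with probability p."""
    rng = random.Random(seed)
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # the "tails" outcome of the biased coin
                edges.add((u, v))
    return edges

g = erdos_renyi(100, 0.1, seed=42)
# With n = 100 there are 4950 pairs, so the expected number of
# edges is 0.1 * 4950 = 495; the actual count fluctuates around that.
print(len(g))
```

Each pair is tested exactly once and independently, which is precisely the Bernoulli trial scheme from the quote.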







Power law



“The problems in the theory of random graphs are vast. You can talk about the same chromatic number of a random graph: there is a very beautiful and interesting theory of how the chromatic number of a random graph behaves in the Erdős–Rényi model. Problems of a practical nature, which are very interesting from the point of view of random graphs, are the problems of constructing adequate models of random graphs that describe the behavior of real networks.

By real networks one can understand, for example, the Internet, whose vertices are (depending on the interpretation) either sites, or pages, or some other structural units, and whose edges are, as a rule, hyperlinks. The result is an enormous object that changes over time. The graph somehow evolves: new sites appear, and new links appear between them.

If we return to the spatial interpretation, we have a molecule that grows over time, while parts of it, on the contrary, disappear and collapse. Links disappear and new ones appear.



It turns out that a molecule describing, for example, the Internet, or biological networks, transport networks, or networks arising in sociology, does not change completely at random, but obeys some interesting statistical patterns. On the one hand, one can observe in practice how these patterns are arranged; on the other hand, one can try to come up with adequate models of random graphs, different from the classical model (because the classical model does not describe this), that would with high probability have the same properties that we observe in practice.



For example, it is known that the degree distribution of the vertices of the Internet behaves in a very specific way. Roughly speaking, this is the probability that a randomly selected vertex of the Internet graph has a predetermined degree, that is, a given number of edges attached to it. This distribution follows what is called a power law.”
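Andrei does not name a specific model here, but a standard mechanism that produces power-law degree distributions is preferential attachment (the Barabási–Albert idea): each new vertex links to existing vertices with probability proportional to their degree. A rough Python sketch (all names and parameters are mine):

```python
import random
from collections import Counter

def preferential_attachment(n, m=2, seed=0):
    """Grow a graph where each new vertex attaches to m existing
    vertices chosen with probability proportional to degree.
    'stubs' lists every vertex once per incident edge, so uniform
    sampling from it is exactly degree-biased sampling."""
    rng = random.Random(seed)
    edges = [(0, 1), (1, 2), (0, 2)]  # small starting clique
    stubs = [0, 0, 1, 1, 2, 2]
    for v in range(3, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(stubs))
        for u in chosen:
            edges.append((u, v))
            stubs += [u, v]
    return edges

degrees = Counter()
for u, v in preferential_attachment(2000):
    degrees[u] += 1
    degrees[v] += 1

# Tabulate how many vertices have each degree: in a power-law network,
# low degrees are common while a few "hub" vertices have huge degree.
hist = Counter(degrees.values())
print(sorted(hist.items())[:5])
```

Contrast this with the Erdős–Rényi model, where degrees concentrate tightly around their mean and such hubs are vanishingly unlikely.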







What to do with spam





“Imagine that you have some random graph model that really does describe the Internet adequately and well, in the sense that its probabilistic characteristics fit perfectly the empirical data we have collected from the Internet. That is, we know that the web graph develops according to certain laws, and, oh miracle, we came up with a model of a random graph in which all these patterns hold with high probability. This is a difficult task, but it has been partially solved: we already have a first approximation to how to model the behavior of the Internet correctly. Now imagine that it has been solved completely.



For example, we have an ideal model that predicts how the Internet should behave if there are no wrong external influences on it. Now let’s imagine that we need to catch some spam, of the kind known as link spam. Link spam is organized as follows: there is some structure built by malicious optimizers.



The structure is designed, naturally, to ensure that the sites inside it come up more easily in search. That is, you type a search query and want an adequate response to it, while spammers try to make sure that the answer consists of the sites that interest them. They artificially inflate these sites in the search results. In particular, an important ranking factor is, of course, the number of links to a site, the number of citations from those who link to you, and so on.

It is important to look at the local structure of the graph around the construction that the spammers may have built. As a rule, they build constructions that are very densely linked, in order to deceive the search engine. The search engine will think: “Oh, these are cool sites. They really are highly cited. Therefore they should be raised to the top of the results.” And so the spammers would win.



What can we do? With the help of some algorithm (naturally, I am not going to say which one) we can catch many, many structures that could potentially turn out to be bad, that is, rather dense, heavily linked. And then we can check: in this ideal model of the world that we built using graph theory and probability theory, does the appearance of each particular structure (among those found by the algorithm) have a high probability or a low probability?



If, from the point of view of the random graph model designed to describe the behavior of a reasonable Internet, the probability of such a structure is high, then it probably is not a spammer construction, and there is no need to adjust its ranking weights so that it drops out of the results. But if its probability within our ideal model is extremely low, then we should somehow treat it as suspect. Maybe we set a flag in the ranking so that it appears, perhaps, not at the highest position. Or maybe we ban it; we have eyes to look with, and if it really is bad, let’s ban it. There are not many such structures, so why not. But all of this can be done automatically, by exploiting the discrepancy between what we expect within a good model and what we see in reality, that is, the actual number of edges (links) and so on. I am not revealing any secrets here; these are very natural things.”
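Andrei deliberately does not reveal the actual algorithm, so the following is only an illustrative Python sketch of the general idea: take the classical G(n, p) as the “ideal model” and flag vertex subsets whose internal edge count is wildly improbable under it. The z-score threshold, the planted-clique example, and all names are my own invention:

```python
import math

def edge_count(subset, edges):
    """Count edges with both endpoints inside the subset."""
    sub = set(subset)
    return sum(1 for u, v in edges if u in sub and v in sub)

def suspiciously_dense(subset, edges, p, z_threshold=5.0):
    """Flag a vertex subset whose internal edge count is far above
    what the null model G(n, p) predicts. Under the model the count
    is Binomial(k*(k-1)/2, p); we use a simple z-score test."""
    k = len(subset)
    pairs = k * (k - 1) // 2
    observed = edge_count(subset, edges)
    mean = pairs * p
    std = math.sqrt(pairs * p * (1 - p)) or 1.0
    return (observed - mean) / std > z_threshold

# A fully linked group of 10 sites (a planted clique) in a web where
# the background link probability is only 0.01.
clique = list(range(10))
edges = [(u, v) for i, u in enumerate(clique) for v in clique[i + 1:]]
print(suspiciously_dense(clique, edges, p=0.01))  # True: far too dense
```

With 45 internal edges against an expectation of about 0.45, the z-score is enormous, so the structure is flagged; an ordinary sparse neighborhood would pass unnoticed.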







Future spam catchers: who are they?



To solve problems in graph theory, as well as problems related to the Internet and new technologies, we need scientists who work on combinatorics and graph theory, probability theory, and random processes. Laboratories are opening all over the world, and universities are putting a lot of emphasis on this area. Bioinformatics is also connected with it: we have already spoken about the problem of storing and processing the data obtained by biologists, or rather, the microbiologist Konstantin Severinov told us about it.

Such specialists are needed not only on the Internet, and not only in connection with controlling and disseminating information, but also in banking structures and in research on financial crises, which is also related to the theory of random graphs.



Andrei Raygorodsky: “I will briefly say what exactly my department at Yandex does. It is a fairly large department, which now employs more than thirty researchers and developers. It is organized so that a significant part of the department is built into the product side of Yandex: it receives some of its tasks from the developers themselves, from those who work on search and build services, tasks that are interesting to Yandex as a search engine. These people are engaged in research that is not purely mathematical in nature, but rather the opposite: attempts to apply mathematical ideas to the implementation of meaningful projects.



And there is a part of the department that, in fact, generates such ideas. Everything is built quite naturally: there is a part that works directly on the product, and a part that generates ideas and produces scientific publications, both at industry conferences and in purely mathematical journals. What are the main areas of research we pursue? Of course, we are very interested in graphs, and we build models of various graphs that adequately describe the web. We probably have not yet managed to build an ideal model, but we have made pretty good progress in this direction: we obtain models that really do describe the behavior of the web adequately, and we manage to put them into practice, as I said.

There is a direction associated with machine learning, and a direction connected with problems of the abstract theory of random graphs.”



You can listen to Andrei and take a look at the Yandex offices by following the link.

Source: https://habr.com/ru/post/201416/


