
How to build a probabilistic microscope


If you believe the rumors, 20th Century Fox will release a remake of the 1966 sci-fi film Fantastic Voyage within the next couple of years. In the story, the protagonists are shrunk and injected into a human body, through which they travel in a microscopic submarine. At that scale, blood flow turns into dangerous turbulence, white blood cells can engulf the ship, and the surface tension of a drop becomes an insurmountable barrier.

Changing scale destroys our intuition about what matters, what is safe and what is dangerous. To survive, you have to recalibrate that intuition: of two effects that can both be neglected on familiar scales, the slightly less negligible one can become overwhelmingly important on unfamiliar ones.

How do we figure out what might matter on unfamiliar scales? It turns out there is a mathematical theory of large deviations that does to probabilities what the shrinking ray did to the crew of Fantastic Voyage. Where classical probability theory deals with the probabilities of ordinary events, the theory of large deviations specializes in extremely rare events that arise from the confluence of several merely unusual ones. It lets us zoom in with a probabilistic microscope and pick out the least unlikely ways in which an extremely unlikely event can occur.

Since the theory was formulated some 50 years ago by the mathematician S. R. Srinivasa Varadhan, it has been carefully studied and developed. It shows how the average behavior of a random system can differ from its typical behavior. By carefully weighing all the rare possibilities, we can see how badly we underestimate the likelihood of unusual events when we restrict our attention to the ordinary ways they can occur.

Let's set off on a journey, microscope in hand.

The high-frequency trader


A high-frequency trader executes a long sequence of trades. On each one, his holdings, starting at $1,000,000, go up by half a percent or down by half a percent, each with probability ½. How much money is he most likely to have after a million trades?

He might reason like this: each trade moves him up or down by the same amount, so on average nothing changes, and he should end up with his $1 million.

But here is another argument: when he wins, his holdings are multiplied by 1.005; when he loses, by 0.995. A win followed by a loss multiplies them by 1.005 × 0.995 = 0.999975. Over a million trades, roughly 500,000 of each will occur, so the original million turns into $1,000,000 × (0.999975)^500,000, which is approximately $3.73.
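A quick check of that arithmetic in Python:

```python
# a win-loss pair multiplies the stake by 1.005 * 0.995 = 0.999975
final = 1_000_000 * (1.005 * 0.995) ** 500_000
print(round(final, 2))   # about 3.73
```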

Which argument is right? Oddly enough, both, but the second matters more. Most likely the trader will end up with almost nothing; yet if we magnify the set of unlikely outcomes in which he comes out ahead, we see scenarios in which he wins enormously. The key object here is I(x), the rate function, which describes how fast the probability of obtaining the outcome x decays as the number of trades grows. Here x is a number, but depending on the problem it could be a random trajectory, a random network structure, or a random geometry of the universe. I(x) = 0 corresponds to the typical case, the one whose probability is not exponentially small; in our example that is the scenario in which the trader's holdings shrink at an exponential rate. Larger values of I(x) correspond to exponentially less probable values of x.

The mean value is determined by the trade-off between the exponentially decreasing probability and the exponentially growing holdings. Some values of x are enormous despite their tiny probability. Working out this trade-off confirms the naive intuition that the average trading result is $1 million, even though almost every trader is virtually certain to lose almost everything. If a million traders each make a million trades starting with $1 million, the average result really will be about $1 million, but that average will be determined by one or two traders whose accounts hold hundreds of billions of dollars. Most of the money ends up in the accounts of a small number of lucky traders, and most traders lose everything.
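These claims can be checked directly from the binomial distribution of the number of winning trades. A minimal sketch (variable names are my own):

```python
import numpy as np
from scipy.stats import binom

N, start = 1_000_000, 1_000_000.0
up, down = np.log(1.005), np.log(0.995)

w = np.arange(N + 1)                          # number of winning trades
log_wealth = np.log(start) + w * up + (N - w) * down
log_pmf = binom.logpmf(w, N, 0.5)

# exact mean: sum over w of P(W = w) * wealth(w)
mean = np.exp(np.logaddexp.reduce(log_pmf + log_wealth))
# median outcome: exactly half wins, half losses
median = np.exp(log_wealth[N // 2])
# chance of ending with at least the starting $1,000,000
w_even = np.searchsorted(log_wealth, np.log(start))
p_even = binom.sf(w_even - 1, N, 0.5)

print(f"mean final wealth:   ${mean:,.0f}")       # about $1,000,000
print(f"median final wealth: ${median:,.2f}")     # a few dollars
print(f"P(break even or better): {p_even:.4f}")   # under 1 in 100

# the mean is dominated by outcomes near the maximum of pmf * wealth
w_star = int(np.argmax(log_pmf + log_wealth))
print(f"dominant outcome: {w_star:,} wins -> ${np.exp(log_wealth[w_star]):,.0f}")
```

The dominant outcome comes out around 502,500 wins, a result worth hundreds of billions of dollars yet so unlikely that only a trader or two out of a million would be expected to reach it.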

The chance of ending up ahead, or even just breaking even, is less than 1 in 100.

The telephone exchange


A central problem for communication networks is estimating the probability of overload. The capacity of a telephone exchange or an Internet data buffer may be sufficient for the average load, but not for an unusual number of simultaneous requests.

The mathematicians Alan Weiss and Adam Shwartz of Bell Labs laid out the application of the theory of large deviations to communication networks in 1995. According to the theory, the probability of a rare event decreases exponentially with the size of the system. In mathematical language, the probability scales as e^(−n·I(x)), where n is the size, x is the path to the rare event, and I is the rate function, which gives the relative improbability of that path. Rare events typically happen in a predictable way, the one that minimizes the rate function, and they occur in clusters separated by long stretches of time.

In any given problem, the difficulty lies in determining (and correctly interpreting) the rate function. It assigns a relative likelihood to every sequence of loads, from which one can pick out the combinations that lead to overload while having the smallest value of the rate function, that is, the highest probability. These combinations determine how often overloads occur and what they look like: how many sources will be active, which sources they will be, and how quickly the overload will clear.

As a simple example, consider a telephone network in which each of a large number of users, say a million, connects at random times, so that on average each is on the line 1% of the time. (We assume they make calls independently of one another and are equally likely to call at any time of day.) The network needs 10,000 lines to meet the average demand. Using large deviations, the company calculates that with 10,500 lines installed it will be overloaded for about 2 minutes per year.
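The order of magnitude can be illustrated with a static snapshot. This ignores the dynamics of calls starting and ending, which the full Weiss and Shwartz analysis handles, so it will not reproduce the "2 minutes per year" figure; it only shows how steeply the tail falls and how the large-deviation estimate e^(−n·I(x)) captures the exponential part:

```python
import numpy as np
from scipy.stats import binom

n, p = 1_000_000, 0.01          # users, and the fraction of time each is on the line
capacity = 10_500

# exact probability that more than `capacity` users are active at a given instant
p_exact = binom.sf(capacity, n, p)

# Cramer / large-deviation rate function for a fraction x of active users
def I(x, p=p):
    return x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))

x = capacity / n
p_ld = np.exp(-n * I(x))        # leading exponential order, no prefactor

print(f"exact tail probability:        {p_exact:.2e}")
print(f"large-deviation approximation: {p_ld:.2e}")
print(f"rate function I({x}) = {I(x):.2e}")
```

The approximation overshoots by a polynomial prefactor, but the exponent n·I(x) is what governs how quickly a little spare capacity buys enormous reliability.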

Now imagine that, in addition, half a million gamers start using consoles that are on the line only a small fraction of the time but need much more bandwidth: they take 5 lines each, so the new users also require 10,000 lines on average. The company therefore decides to double its capacity to 21,000 lines. But the result is a network that is overloaded for several minutes a week. Analysis of the rate function shows that the gamers, who on average use the same network capacity as the other users, use over 8% more lines than usual during an overload, and that an extra 250 lines would restore uninterrupted operation. If we plot the network load in the seconds before an overload, it almost always follows a particular pattern, bending gently upward before slamming into the ceiling, and that curve, too, can be computed as the minimizer of the rate function.
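The over-representation of the heavy users can be seen in a static sketch of the same kind, not Weiss and Shwartz's dynamic model. The parameters below are assumptions chosen to match the averages quoted above (in particular, the consoles are taken to be active about 0.4 percent of the time, so that at five lines each they need 10,000 lines on average). The most likely mix of active users at the moment of overload is the one that minimizes the combined rate function subject to demand exceeding capacity:

```python
import numpy as np
from scipy.optimize import minimize

n1, p1, w1 = 1_000_000, 0.01, 1      # ordinary callers, one line each
n2, p2, w2 = 500_000, 0.004, 5       # game consoles, five lines each (assumed rate)
capacity = 21_000

def I(x, p):
    """Bernoulli rate function: exponential cost per user of an active fraction x."""
    return x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))

def cost(xs):
    x1, x2 = xs
    return n1 * I(x1, p1) + n2 * I(x2, p2)

# the most likely way to exceed capacity minimizes the total rate function
overload = {"type": "ineq",
            "fun": lambda xs: n1 * w1 * xs[0] + n2 * w2 * xs[1] - capacity}
res = minimize(cost, x0=[0.011, 0.0045], constraints=[overload],
               bounds=[(1e-6, 0.5), (1e-6, 0.5)], method="SLSQP")

x1, x2 = res.x
print(f"callers  active during overload: {x1:.4%}  ({x1 / p1 - 1:+.1%} vs. baseline)")
print(f"consoles active during overload: {x2:.4%}  ({x2 / p2 - 1:+.1%} vs. baseline)")
```

With these assumed numbers the consoles come out roughly 8 percent more active than their baseline during an overload, while the ordinary callers are only slightly elevated, which is the pattern described above.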

In modern decentralized packet-switched networks, the rate function can help detect botnets, networks of virus-infected computers that criminal hackers use to send spam and attack systems. The idea is to identify a botnet-controlling computer that communicates with an unusually large number of other computers, and then to confirm the identification by finding unusual correlations among the computers it communicates with. For this, researchers at Boston University used a rate function that could single out, among all the ways an improbably large set of unrelated computers might end up communicating with the same remote server, which pattern of correlations in their communications would be the most likely. (Wang, J. & Paschalidis, I.C. Botnet detection based on anomaly and community detection. IEEE Transactions on Control of Network Systems (2016). DOI: 10.1109/TCNS.2016.2532804.)

The dormant seed


Diapause is a delay in biological development, often occurring at an early life stage. Many plant species produce seeds that do not start growing right away but lie dormant for a long time, forming a persistent seed bank. Given that the struggle for survival usually comes down to who gets there first and in greater numbers, a random delay in development is a small mystery of ecology.

To understand the situation, Shripad Tuljapurkar and I considered a simple model: a species with a two-year life cycle that spends its first year growing from seed to adult and its second year producing seeds. (Steinsaltz, D. & Tuljapurkar, S. Stochastic growth rates for life histories with rare migration or diapause. arXiv:1505.00116 (2015).) We asked the following question: how is the population's growth rate affected if a small fraction of the seeds lie dormant for a year?

When growth, survival, and seed production stay constant from year to year, the answer is obvious: delaying individuals' development slows the growth of the population. But when environmental conditions vary, everything changes: even a small chance of delay can lead to sharply faster population growth.
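A minimal numerical sketch of this effect, using a one-stage annual model with made-up numbers rather than the two-year life cycle analyzed in the paper: a fraction d of seeds stays dormant each year (surviving with probability 0.9), while the rest germinate and produce a random number of seeds that depends on whether the year is good or bad.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200_000                                   # years to simulate

# hypothetical environment: mostly good years, occasional disastrous ones
seeds_per_plant = rng.choice([2.0, 0.05], size=T, p=[0.8, 0.2])
dormant_survival = 0.9                        # dormant seeds surviving the year

def log_growth_rate(d):
    """Long-run stochastic growth rate when a fraction d of seeds stays dormant."""
    yearly_factor = (1 - d) * seeds_per_plant + d * dormant_survival
    return np.log(yearly_factor).mean()

for d in (0.0, 0.01, 0.05, 0.2):
    print(f"dormant fraction {d:4.2f}: log growth rate {log_growth_rate(d):+.3f}")
```

With these arbitrary numbers, a population with no dormancy shrinks in the long run, while one in which a modest fraction of seeds sits out each year grows; the benefit comes entirely from the rare bad years, exactly the trajectories that the rate function singles out.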


If 1% of the seeds wait an extra year, you would expect a typical genealogical trajectory to experience about one delay per 100 years, and to encounter typical environmental conditions as it grows. But among the descendants there will also be very rare trajectories that delay more often, and in which those delays fall precisely in the worst years, when growing would mean near-certain death or failure to produce seeds. These trajectories are large deviations, exponentially rare, but over time they produce exponentially many more descendants. The growth rate of the population ends up being determined by these unlikely trajectories. In other words, if you trace back the lineage of an individual alive today, it will look like a string of lucky accidents.

The same mathematics also applies to migration, supporting an important principle of habitat protection: a species benefits from being able to move between two equally good areas whose weather varies randomly from year to year. Tracing its family history, each individual will find ancestors who, by coincidence, fled one place just before a catastrophe struck, or arrived in another just when food was plentiful. This is a special case of a commonplace fact of evolution: most living organisms die without leaving offspring, yet you can trace your own ancestors back billions of generations without meeting a single such loser. Lucky you!

The long-lived


Past a certain age, which is younger than most people think, since the probability of surviving another year peaks at about age 12, your physical condition and your chance of living another year decline steadily, even if you manage short-lived improvements. Theoretical demographers have studied models of aging in which an individual's "vitality" is a random quantity that changes in small steps, more likely downward than upward, and in which the probability of death rises the further vitality falls.

Unsurprisingly, in such a model one can calculate that the average vitality of the population declines with age... up to a point. For the part of the population that survives to extreme old age is made up of exceptional individuals. Perhaps they were lucky in the genetic lottery. Perhaps the random knocks of life happened to push them in a relatively favorable direction.

Either way, the model predicts that the vitality of the survivors gradually stops declining. Each individual's vitality still goes down, but those whose vitality has fallen furthest are carried off by the Grim Reaper. The overall vitality of the survivors settles into an equilibrium, called a "quasi-stationary distribution", between individual trajectories drifting downward and the weeding-out of individuals at the bottom of the vitality distribution.
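A toy version of such a model can be iterated exactly, with no simulation. The numbers below are purely illustrative: vitality is an integer from 0 to 10 that takes a downward-biased random step each year and determines the chance of dying that year. The age-specific death rate first rises and then approaches the plateau set by the quasi-stationary distribution, which is 1 minus the largest eigenvalue of the "killed" chain.

```python
import numpy as np

K = 10
v = np.arange(K + 1)
q = 1.0 / (1.0 + np.exp(v - 2.0))        # annual death probability in state v

P = np.zeros((K + 1, K + 1))             # vitality random walk, reflecting at 0 and K
for s in v:
    P[s, max(s - 1, 0)] += 0.6           # step down (more likely)
    P[s, min(s + 1, K)] += 0.4           # step up

M = np.diag(1.0 - q) @ P                 # killed chain: survive the year, then move

dist = np.zeros(K + 1)
dist[K] = 1.0                            # the whole cohort starts at full vitality
for age in range(1, 201):
    rate = (dist @ q) / dist.sum()       # death rate among those alive at this age
    if age % 25 == 0:
        print(f"age {age:3d}: mortality rate {rate:.3f}")
    dist = dist @ M

# the plateau is 1 - rho, where rho is the Perron eigenvalue of the killed chain
rho = np.linalg.eigvals(M).real.max()
print(f"quasi-stationary plateau: {1 - rho:.3f}")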

In the language of large deviations, there is a rate function I(x), where x is a lifetime record of vitality, which equals zero for trajectories that stay close to the average. Trajectories that deviate strongly from the average have a positive rate function, meaning their probability is exponentially smaller. In a typical model one finds that, among all life histories lasting an uncharacteristically long time, the most likely are those that happened to maintain vitality at an unusually high level, rather than those that followed the usual downward trajectory and simply happened not to die.

It follows that the death rate, the probability that an individual of a given age dies within the next year, rises through adulthood and then levels off at a very advanced age. This pattern, the "mortality plateau", is clearly seen in organisms such as fruit flies and nematodes when they are observed in large numbers under identical laboratory conditions; in the most common laboratory fruit fly, Drosophila melanogaster, the mortality rate levels off by about 4 weeks of age. (Vaupel, J.W., et al. Biodemographic trajectories of longevity. Science 280, 855-860 (1998).)

The mortality plateau in humans did not become visible until populations grew and health care improved enough that many people could live to 100 or beyond. On average, a person's death rate doubles every 8 years between the ages of thirty-something and ninety-something. In a cohort of Americans born in 1900, the mortality rate at age 90 was about 0.16, meaning 16% of those alive at 90 died within the year. It more than doubles by age 98, but never doubles again. The highest death rate ever recorded is 0.62, at age 108. Beyond that there is very little data, but a careful analysis of people over 110 from around the world shows fairly convincingly that under current conditions the rate levels off somewhere between 0.4 and 0.7. (Vaupel, J.W. & Robine, J.M. Emergence of supercentenarians in low-mortality countries. North American Actuarial Journal 6, 54-63 (2002).)
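A back-of-the-envelope check shows why that doubling cannot continue, since an annual death probability can never exceed 1:

```python
# extrapolate the "doubles every 8 years" pattern from the rate of 0.16 at age 90
rate = 0.16
for age in range(90, 131, 8):
    print(f"age {age}: extrapolated annual death rate {rate:.2f}")
    rate *= 2
```

By age 114 the extrapolated rate already exceeds 1, an impossibility, so the doubling has to break down somewhere; the data on supercentenarians suggest it levels off well below that, between 0.4 and 0.7.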

Source: https://habr.com/ru/post/401517/

