In his book, Nate Silver gives this example: suppose you want to invest in several enterprises, each of which can go bankrupt with probability 0.05. You are required to assess the risks. The higher the probability of bankruptcy, the less we will invest. Conversely, if the probability of bankruptcy tends to zero, we can invest without restrictions.
If there are two independent companies, then the probability that both go bankrupt and we lose all our investments is $0.05 \cdot 0.05 = 0.0025$. This is what standard probability theory teaches. But what happens if the businesses are connected, and the bankruptcy of one leads to the bankruptcy of the other?
The extreme case is when the enterprises are completely dependent. Then the probability of a double bankruptcy is P(bankrupt1 & bankrupt2) = P(bankrupt1), so the probability of losing all investments is 0.05. The risk estimate thus has a wide spread, from 0.0025 to 0.05, and the true value depends on how correctly we estimate the connectedness of the two events.
When assessing an investment in $n$ enterprises, the range runs from $0.05^n$ up to $0.05$. That is, the maximum possible probability of losing everything remains large, 0.05, and the old adage "don't put all your eggs in one basket" will not help if the counter holding all the baskets falls at once.
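A minimal Python sketch of this arithmetic (the 0.05 bankruptcy probability comes from the example above; the choice of $n$ is mine, for illustration):

```python
# Two ventures, each with a 5% bankruptcy probability. How much we can
# lose depends on how strongly the bankruptcies are linked.
p = 0.05

independent = p * p  # fully independent: 0.05 * 0.05 = 0.0025
dependent = p        # fully dependent: one failure implies the other, 0.05

print(f"independent: {independent}, fully dependent: {dependent}")

# With n ventures the spread only widens: from p**n up to p.
n = 5
print(f"n = {n}: from {p**n:.2e} to {p}")
```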
Thus, our estimates have a huge spread, and how much to invest remains an open question that must be considered carefully before investing. Nate Silver argues that analysts' ignorance of these simple laws contributed to the stock market crash of 2008: the US rating agencies assessed the risks, but did not assess the connectedness of the risks. This ultimately led to a domino effect, when one major player collapsed first and dragged the others down with it.
Let's try to take this problem apart by solving a simple math problem. We will solve a simplified version in order to learn how to estimate the connectedness of two events with the Bayesian method, using a simple example of two coins. I will try to walk through the math step by step so that it becomes completely clear.
Let there be two coins, each of which shows 0 or 1 when tossed, so one of four combinations is possible per throw: 00, 01, 10, 11.
Here the first digit refers to the first coin and the second digit to the second coin. I introduced this notation for ease of presentation.
By the conditions of the problem, let the first coin be independent, i.e. $P(1) = 0.5$ and $P(0) = 0.5$. The second coin may be dependent, but we do not know how much; i.e. the second coin depends on the first.
Maybe there is some kind of magnet that attracts the coins, or the person tossing them is a cheat and a swindler, of whom there are a dime a dozen. We will express our ignorance in the form of a probability.
To assess the connectedness, we need factual material and a model whose parameters we will estimate. Let's use the simplest assumptions to get a feel for the topic and build a model.
If the coins are not connected, then all four combinations will be equally probable. Here an amendment from standard probability theory arises: such a result is achieved only with an infinite number of throws. In practice the number of throws is finite, so we can observe deviations from the mean. For example, you may happen to get a run of three or five heads while tossing coins, although on average, with endless tossing, there will be exactly 50% heads and exactly 50% tails. A deviation can be interpreted as a manifestation of connectedness, or it can be interpreted as an ordinary statistical fluctuation around the mean. The smaller the sample, the larger the possible deviation, so the two explanations are easy to confuse.
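A quick simulation sketch of this effect (the sample sizes and the seed are arbitrary choices of mine): with few tosses the observed share of heads wanders far from 50%, with many it converges.

```python
import random

random.seed(1)

# With a finite number of tosses, the observed share of heads can deviate
# noticeably from the ideal 50%; with many tosses it converges to it.
for n in (5, 50, 100_000):
    tosses = [random.randint(0, 1) for _ in range(n)]
    print(n, sum(tosses) / n)
```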
This is where Bayesian theory comes to the rescue: it makes it possible to estimate the probability of a particular hypothesis from a finite set of data. Bayes runs the process we are used to in probability theory in reverse: it estimates the probability that our conjectures match the real state of affairs, rather than the probability of outcomes.
We turn to the creative process of building a model. There is one requirement for a connectedness model: at a minimum, it should cover the possible options. In our case the extreme variants are complete connectedness and complete independence of the coins. So the model must have at least one parameter $k$ describing the connectedness.
We describe it in the form of a coefficient $k$. If the second coin always coincides with the first one, then $k = 1$. If the second coin always takes the opposite value, then $k = 0$. If the coins are unconnected, then $k = 0.5$. Not bad: we describe a whole range of options with one number. Moreover, the meaning of this variable is simply the probability that the second coin coincides with the first.
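Here is one way this model could be coded, as a sketch (the function name `toss_pair` is mine, not from the post): coin 1 is fair, and coin 2 copies coin 1 with probability $k$.

```python
import random

def toss_pair(k: float) -> tuple[int, int]:
    """One toss of the pair: coin 1 is fair; coin 2 copies coin 1
    with probability k and takes the opposite value otherwise."""
    c1 = random.randint(0, 1)
    c2 = c1 if random.random() < k else 1 - c1
    return c1, c2

random.seed(0)
for k in (0.0, 0.5, 1.0):
    pairs = [toss_pair(k) for _ in range(10_000)]
    matches = sum(c1 == c2 for c1, c2 in pairs)
    print(f"k = {k}: observed match rate = {matches / len(pairs):.3f}")
```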
Let's try to estimate this number from the actual data.
Let there be a specific data set $D$ consisting of 5 outcomes: four throws where the coins coincide and one where they differ, for example 11, 00, 11, 10, 00.
At first glance this tells us nothing. The number of possible combinations of five throws is $4^5 = 1024$, and ours is just one of them.
Let us slowly take apart what the Bayes formula means: $P(k \mid D) = \dfrac{P(D \mid k)\,P(k)}{P(D)}$. We use the standard notation, where the sign $P(A \mid B)$ means the probability of event $A$ given that it is already known that event $B$ has occurred.
In this case we have a combination of continuous and discrete distributions: $P(k \mid D)$ and $P(k)$ are continuous, but $P(D \mid k)$ and $P(D)$ are discrete. Such a combination is possible in the Bayesian formula. To save time, I won't spell out all the details.
If we know this probability, then we can find the value of $k$ at which the probability of our hypothesis is maximal, i.e. find the most likely value of $k$.
On the right-hand side we have three terms that need to be evaluated. Let's analyze them.
1) $P(D \mid k)$: we need to know or calculate the probability of obtaining such data under a particular hypothesis $k$. After all, even if the coins are unconnected ($k = 0.5$), getting such a series is possible, although unlikely. It is much more likely to get such a combination if the coins are connected ($k$ close to 1). This is the most important term, and below we will explain how to calculate it.
2) We need to know $P(k)$. Here we run into a delicate moment of model construction: we do not know this function, so we will make assumptions. In the absence of additional knowledge, we assume that all values of $k$ are equally probable in the range from 0 to 1. If we had insider information, we would know more about the connectedness and could build a more accurate prior. Since such information is not available, we set $P(k) = 1$. Since this quantity does not depend on $k$, it will not matter when searching for the maximum.
3) $P(D)$ is the probability of obtaining such a data set under all possible hypotheses. We can obtain this set under different $k$ with different probabilities, so all possible ways of obtaining the set are considered: $P(D) = \int_0^1 P(D \mid k)\,P(k)\,dk$. Since at this stage the value of $k$ is still unknown, we have to integrate over $k$. To understand this better, it helps to solve elementary problems in which a Bayesian graph is constructed, and then pass from the sum to the integral. The search for the maximum will not be affected, since this value does not depend on $k$.
Let us analyze how to calculate $P(D \mid k)$. Remember that the first coin is independent and the second is dependent. Therefore, for the first coin the probability of a value looks like this: $P(0) = P(1) = 0.5$. For the second coin, the probability of coinciding with the first coin is $k$, and the probability of a mismatch is $1 - k$.
Let us analyze the possible cases for one outcome:

- $P(11 \mid k) = 0.5 \cdot k$
- $P(00 \mid k) = 0.5 \cdot k$
- $P(10 \mid k) = 0.5 \cdot (1 - k)$
- $P(01 \mid k) = 0.5 \cdot (1 - k)$
As a check, we add up the probabilities; we should get one: $0.5k + 0.5k + 0.5(1-k) + 0.5(1-k) = 1$. This makes me happy.
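The same check in a few lines of Python (a sketch; `outcome_prob` is a name I made up):

```python
def outcome_prob(outcome: str, k: float) -> float:
    """P of a single two-coin outcome ('00', '01', '10' or '11'):
    0.5 for the fair first coin, times k (match) or 1 - k (mismatch)."""
    match = k if outcome[0] == outcome[1] else 1 - k
    return 0.5 * match

k = 0.7  # any value in [0, 1] works
print(sum(outcome_prob(o, k) for o in ("00", "01", "10", "11")))  # 1.0
```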
Now we can search for the most likely value of $k$ on the fictional data set already given above: 11, 00, 11, 10, 00.
The probability of obtaining such a set is $P(D \mid k) = P(11)\,P(00)\,P(11)\,P(10)\,P(00)$. Expanding: $P(D \mid k) = (0.5k)(0.5k)(0.5k)\bigl(0.5(1-k)\bigr)(0.5k)$; simplified: $P(D \mid k) = 0.5^5\,k^4\,(1-k)$.
To generalize to an arbitrary data set, denote the number of matches $M$ and the number of mismatches $L$.
We get the generalized formula $P(D \mid k) = 0.5^{M+L}\,k^M\,(1-k)^L$.
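As a sketch, the generalized likelihood and the evidence integral $P(D)$ from step 3 can be checked numerically (using `scipy`, an assumed dependency; the closed form uses the Beta function):

```python
from scipy.integrate import quad
from scipy.special import beta

def likelihood(k: float, M: int, L: int) -> float:
    """P(D|k) = 0.5^(M+L) * k^M * (1-k)^L for M matches and L mismatches."""
    return 0.5 ** (M + L) * k ** M * (1 - k) ** L

M, L = 4, 1  # the data set in the post: four matches, one mismatch

# The evidence P(D) = integral of P(D|k) * P(k) dk with the uniform prior P(k) = 1 ...
numeric, _ = quad(lambda k: likelihood(k, M, L), 0, 1)
# ... has a closed form via the Beta function: 0.5^(M+L) * B(M+1, L+1).
closed = 0.5 ** (M + L) * beta(M + 1, L + 1)

print(numeric, closed)  # both ~0.00104
```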
Those who wish can play with the plot by plugging in different exponents: a link to wolframalpha.
Since in this example $P(k)$ and $P(D)$ do not depend on $k$, we work directly with $P(D \mid k)$.
To find the maximum, we differentiate and set the derivative to zero (the constant factor $0.5^{M+L}$ does not affect the location of the maximum): $\frac{d}{dk}\bigl(k^M (1-k)^L\bigr) = k^{M-1}(1-k)^{L-1}\bigl(M(1-k) - Lk\bigr) = 0$.
For the product to equal zero, one of the factors must be zero. We are not interested in $k = 0$ and $k = 1$, since there is no local maximum at those points; the third factor gives the local maximum: $M(1-k) - Lk = 0$, therefore $k = \frac{M}{M+L}$.
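A quick numeric sanity check of this derivation (a brute-force grid search, nothing more):

```python
# Compare the analytic maximum k* = M / (M + L) with a grid search over
# the likelihood k^M * (1 - k)^L (the 0.5^(M+L) factor is a constant).
M, L = 4, 1
grid = [i / 1000 for i in range(1001)]
k_numeric = max(grid, key=lambda k: k ** M * (1 - k) ** L)
print(k_numeric, M / (M + L))  # 0.8 and 0.8
```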
We obtain a formula that can be used for predictions. That is, after an additional throw of the first coin, we try to predict the behavior of the second coin (outcome 6) through $k = \frac{M}{M+L}$.
When new data arrives, the formula is adjusted and refined. If we obtain insider data, we will have a more accurate $P(k)$, and the whole chain of calculations can be refined further.
Since we are calculating a probability for $k$, it is desirable to analyze its mean and variance. The mean can be calculated by the standard formula. As for the variance, we can say that as the amount of data grows, the peak on the graph (link above) becomes sharper, which means a more unambiguous estimate of the value of $k$.
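Under the uniform prior, the posterior $P(k \mid D) \propto k^M (1-k)^L$ is a Beta$(M+1, L+1)$ distribution, so the mean and variance follow from standard formulas; a sketch with `scipy` (an assumed dependency):

```python
from scipy.stats import beta

M, L = 4, 1
posterior = beta(M + 1, L + 1)  # Beta(5, 2): posterior of k under a uniform prior

print(posterior.mean())  # (M+1)/(M+L+2) = 5/7, about 0.714
print(posterior.var())   # about 0.0255

# Ten times more data in the same 4:1 proportion -> a much sharper peak:
print(beta(41, 11).var())  # about 0.0031
```

Note that the posterior mean (about 0.71) differs slightly from the maximum at 0.8; both are legitimate summaries of the same distribution.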
In the data set above we have four matches and one mismatch, therefore $k = 4/5 = 0.8$. On an additional sixth throw, the second coin will coincide with the first with a probability of 80%. Suppose we got 1 on the first coin; then with probability 80% outcome 6 will be "11", and with the remaining 20% outcome 6 will be "10". After each throw we adjust the formula and predict the probabilities one step further ahead.
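A sketch of that step-by-step refinement (the three new throws below are made up purely for illustration):

```python
# Refine the MAP estimate k = M / (M + L) as new throws arrive.
M, L = 4, 1  # counts from the data set in the post

for c1, c2 in [(1, 1), (0, 1), (1, 1)]:  # hypothetical new throws
    if c1 == c2:
        M += 1
    else:
        L += 1
    print(f"after throw {c1}{c2}: k = {M / (M + L):.3f}")
```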
That is where I would like to end my post. I will be glad to see your comments.
P.S. This example is meant to demonstrate the algorithm; much of what happens in reality is not taken into account here. For example, when analyzing events from the real world, it would be necessary to analyze time intervals, conduct factor analysis and much more. That is the concern of professionals. It should also be noted, philosophically, that everything in this world is interconnected; these connections just sometimes manifest themselves and sometimes do not. So it is impossible to take everything fully into account: we would have to include every object in this world in the formula, even those we do not know about, and process an enormous amount of factual material.