Probability theory has never ceased to amaze me, from the moment I first encountered it to this day. At different times and to different degrees I was struck by what I'll call "wow effects": jolts to the cerebellum that opened a kind of third eye, after which the world was never the same again.
I experienced the first "wow effect" from the Central Limit Theorem. Take a bunch of random variables, send their number to infinity, and you get a normal distribution. It doesn't matter how these variables are distributed, whether they come from coin tosses or raindrops on a window, solar flares or coffee grounds: their sum always tends to normality. All you need is independence and a finite variance (I later learned there is a theorem even for extremely heavy-tailed distributions with infinite variance). Back then this paradox kept me awake for a long time.
At some point at university, subjects such as discrete mathematics and functional analysis merged together and resurfaced under the guise of the expression "almost surely." A standard example: you pick a number uniformly at random between 0 and 1. What is the probability that you hit a rational number (hello, Dirichlet function)? Spoiler: 0. Zero, Karl! A countable set has measure zero, even though it is infinite. You have infinitely many options, yet you will not pick any of them. You will not pick 0, or 1, or 1/2, or 1/4. And you will not pick 3/2 either.
Yes, yes, whether you aim for 1/2 or for 3/2, the probability is zero. But you will certainly not hit 3/2, since it lies outside the interval, while 1/2 you will not hit... almost surely. The notions of "almost everywhere" and "almost surely" amuse mathematicians and make the layman twirl a finger at his temple. Many break their brains trying to classify the different kinds of zero, but the result is worth it.
The third "wow effect" in order, but not in strength, overtook me already at the advanced level, while reading books on stochastic calculus. The culprit was Itô's lemma. Ever since school, when our virgin eyes were first shown a derivative, we have never doubted the correctness of the formula
$$dX^2 = 2X \cdot dX.$$
And it is true. But only if $X$ is not a random process. A hellish mixture of the properties of the normal distribution and "almost surely" shows that otherwise this formula is, in general, wrong. A volume of mathematical analysis with solutions of ordinary differential equations can now be thrown into the furnace. People in the know giggle softly; the rest eagerly leaf through the wiki articles on Itô calculus.
But most recently I experienced the fourth "wow effect". It is not a single fact but a whole theory, which I am going to present in a series of articles. And if the previous feints of probability theory no longer surprise you, then welcome under the cut (I know, you are already here).
Hermite Polynomials
Let's start with ordinary algebra and define the "probabilistic" Hermite polynomials (they differ slightly from the "physical" ones):
$$H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n}\left(e^{-x^2/2}\right), \quad n \in \mathbb{N}_0.$$
The first few polynomials are $H_0(x) = 1$, $H_1(x) = x$, $H_2(x) = x^2 - 1$, $H_3(x) = x^3 - 3x$, $\dots$
Hermite polynomials have the following properties:
$$H_n'(x) = n H_{n-1}(x),$$
$$H_n(-x) = (-1)^n H_n(x),$$
$$H_n(x) = x H_{n-1}(x) - (n-1) H_{n-2}(x).$$
The last relation will help us compute the first $n$ Hermite polynomials for a given $x$. We will program in Haskell, because it lets mathematicians express themselves in their usual language: Haskell is pure, rigorous and beautiful, like mathematics itself.
-- | 'hermite' is an infinite list of Hermite polynomials for given x
hermite :: (Enum a, Num a) => a -> [a]
hermite x = s
  where s@(_:ts) = 1 : x : zipWith3 (\hn2 hn1 n1 -> x * hn1 - n1 * hn2) s ts [1..]
The hermite function takes a parameter $x$ as input and outputs an infinite list of polynomials $H_n(x)$ for $n = 0, 1, \dots$ If you are not familiar with the concept of lazy evaluation, I strongly advise you to read up on it. For those who know the concept but are not yet fully at home with functional programming: what happens here? Imagine that we already have an infinite list with all the values of the Hermite polynomials:
s = [1, x, x^2-1, x^3-3x, x^4-6x^2+3, ... ]
The tail of this list (everything but the first element):
ts = [x, x^2-1, x^3-3x, x^4-6x^2+3, ... ]
Then we take another list, of natural numbers:
[1, 2, 3, ... ]
The zipWith3 function combines these three lists element by element using the operator given to it:
[x*x - 1*1, x*(x^2-1) - 2*x, x*(x^3-3x) - 3*(x^2-1), ... ] = [x^2-1, x^3-3x, x^4-6x^2+3, ... ]
Prepend 1 and x, and we get the full list of Hermite polynomials. In other words, we obtained the list of polynomial values by using the very list of those values, that is, the list we are trying to build. Rumor has it that a full awareness of the beauty and power of FP is akin to the ability to look into your own ear.
Let's check: the first 6 values for $x = 1$:
Prelude> take 6 (hermite 1)
[1,1,0,-2,-2,6]
What we expected to see.
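As a small sanity check of the parity property $H_n(-x) = (-1)^n H_n(x)$ listed above, here is a sketch of mine (not part of the original exposition; the test point and the tolerance are arbitrary) that reuses the hermite list:

-- Sanity check (my own addition): verify H_n(-x) == (-1)^n * H_n(x)
-- for the first n polynomials at a point x.
parityHolds :: Double -> Int -> Bool
parityHolds x n = and $ take n $
    zipWith3 (\hMinus hPlus sign -> abs (hMinus - sign * hPlus) < 1e-9)
             (hermite (-x)) (hermite x) (iterate negate 1)
-- e.g. parityHolds 1.5 20 should evaluate to True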
Hilbert space
Let's wander off into slightly different territory and recall the definition of a Hilbert space. In scientific terms, it is a complete metric linear space with a scalar product $\langle X, Y\rangle$ defined on it. In this space, each element is assigned a real number, called the norm and equal to
$$\|X\| = \sqrt{\langle X, X\rangle}.$$
Nothing extraordinary. When I try to picture a Hilbert space, I start from the simple and gradually work my way toward the complex.
The simplest example is the space of real numbers: $H = \mathbb{R}$. In this case, for the scalar product of two numbers $X$ and $Y$ we have
$$\langle X, Y\rangle = XY.$$
Next I move on to Euclidean space $H = \mathbb{R}^n$. Now
$$\langle X, Y\rangle = \sum_{i=1}^{n} X_i Y_i.$$
This space can be extended to the space of complex vectors $H = \mathbb{C}^n$, for which the scalar product is
$$\langle X, Y\rangle = \sum_{i=1}^{n} X_i \overline{Y_i}$$
(the top bar denotes complex conjugation).
Well, finally I arrive at the grown-up space, a space of infinite dimension. In our case it will be the space of square-integrable functions defined on some set $\Omega$ with a given measure $\mu$. We denote it $H = L^2(\Omega, \mu)$. The scalar product on it is defined as follows:
$$\langle X, Y\rangle = \int_\Omega (X \cdot Y)\, d\mu.$$
Usually the set $\Omega$ is taken to be an interval $[a, b]$, and the measure $\mu$ to be the uniform (Lebesgue) measure, i.e. $d\mu = \mu(d\omega) = d\omega$. Then the scalar product is written as the ordinary Lebesgue integral
$$\int_a^b X(\omega) Y(\omega)\, d\omega.$$
If we think in terms of probability theory, then $\Omega$ is a space of elementary events, $X = X(\omega)$ and $Y = Y(\omega)$ are random variables, and $\mu$ is a probability measure. Such a measure has a density function $\rho$, which may differ from a constant, so that $d\mu = \rho(\omega)\, d\omega$, and the scalar product coincides with the expectation:
$$\langle X, Y\rangle = \int_\Omega X(\omega) Y(\omega) \rho(\omega)\, d\omega = \mathbb{E}[XY].$$
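Just to make the definition tangible, here is a tiny numerical sketch of mine (the midpoint rule and the helper name innerProductL2 are my own choices, not part of the article's code): it approximates $\langle X, Y\rangle = \int_a^b X(\omega) Y(\omega)\, d\omega$ on an interval.

-- A rough approximation of the L2 inner product on [a, b] with the midpoint rule
-- (illustrative sketch, my addition; not used in the rest of the article).
innerProductL2 :: (Double -> Double) -> (Double -> Double) -> Double -> Double -> Int -> Double
innerProductL2 f g a b n = sum [ f w * g w * dw | k <- [0 .. n - 1]
                               , let w = a + (fromIntegral k + 0.5) * dw ]
  where dw = (b - a) / fromIntegral n
-- e.g. innerProductL2 sin cos 0 pi 10000 should be close to 0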
It is time to introduce an element of chance into our considerations. Suppose we have a Hilbert space $H$. We will call $\{W(h)\}_{h \in H}$ an (isonormal) Gaussian process if
the random vector $(W(h_1), \dots, W(h_n))$ is normally distributed with zero mean for any $h_1, \dots, h_n \in H$, and
for any $h, g \in H$
$$\mathbb{E}[W(h) \cdot W(g)] = \langle h, g\rangle.$$
In its mathematical essence, $W(h)$ is a mapping from one Hilbert space to another: from some $H$ into $L^2(\Omega, \mathcal{F}, \mathbb{P})$, the probability space of random variables with finite variance given by the triple $\Omega$ (the set of elementary events), $\mathcal{F}$ (a sigma-algebra) and $\mathbb{P}$ (a probability measure). It is easy to show that this mapping is linear:
$$W(a h + b g) = a\, W(h) + b\, W(g)$$
(in the sense of equality "almost surely", hello "wow effect" #2).
Example. Let $H = L^2((0, \infty), \lambda)$, where $\lambda$ is the uniform (Lebesgue) measure. The dot product on it is
$$\langle f, g\rangle = \int (f \cdot g)\, d\lambda.$$
Let $h(s) = \mathbf{1}_{[0,t]}(s)$ be the indicator function of the interval $[0, t]$. Then $\|h\|^2 = \int \mathbf{1}_{[0,t]}(s)\, ds = t$, and
$$B(t) = W(\mathbf{1}_{[0,t]}) \sim \mathcal{N}(0, t)$$
is none other than Brownian motion (the Wiener process). Moreover,
$$\int_0^t f(s)\, dB(s) = W(\mathbf{1}_{[0,t]} f)$$
is called the Itô integral of the function $f$ with respect to $B$.
To implement the Gaussian process, I will use packages that noble people have already written for us.
import Data.Random.Distribution.Normal
import Numeric.LinearAlgebra.HMatrix as H

-- | 'gaussianProcess' samples from Gaussian process
gaussianProcess :: Seed                    -- random state
                -> Int                     -- number of samples m
                -> Int                     -- number of dimensions n
                -> ((Int, Int) -> Double)  -- function that maps indices of the matrix into dot products of its elements
                -> [Vector Double]         -- m n-dimensional samples of the Gaussian process
gaussianProcess seed m n dotProducts = toRows $ gaussianSample seed m mean cov_matrix
  where
    mean       = vector (replicate n 0)
    cov_matrix = H.sym $ (n><n) $ map (\i -> dotProducts (quot i n, rem i n)) [0..]
The gaussianProcess function takes a seed parameter (standard for generators), nSamples, the sample size, dim, the dimension of the vector $(h_1, \dots, h_n)^T$, and dotProducts, a function that accepts an index $(i, j)$ of the covariance matrix and returns the corresponding scalar product $\langle h_i, h_j\rangle$. On output, gaussianProcess produces nSamples vectors $(W(h_1), \dots, W(h_n))$.
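As a usage example, here is a sketch of mine that ties back to the Brownian-motion example above (the uniform grid on $[0, 1]$ and the helper name brownianPaths are my assumptions, not part of the original code): since $B(t) = W(\mathbf{1}_{[0,t]})$ and $\langle \mathbf{1}_{[0,t_i]}, \mathbf{1}_{[0,t_j]}\rangle = \min(t_i, t_j)$, feeding gaussianProcess the covariance function min yields discretized Brownian paths.

-- Sampling discretized Brownian motion paths on a uniform grid of [0, 1]
-- by passing <h_i, h_j> = min(t_i, t_j) to 'gaussianProcess' (sketch, my addition).
brownianPaths :: Seed -> Int -> Int -> [Vector Double]
brownianPaths seed nPaths nSteps = gaussianProcess seed nPaths nSteps cov
  where
    dt         = 1 / fromIntegral nSteps    -- grid step
    t i        = fromIntegral (i + 1) * dt  -- grid point t_i
    cov (i, j) = min (t i) (t j)            -- covariance of Brownian motion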
It is time to put together all the knowledge we have gained. But before that, it is worth mentioning one useful property of Hermite polynomials and the normal distribution taken together. Let $F(t, x) = \exp(tx - t^2/2)$. Then, using the Taylor expansion,
$$F(t, x) = \sum_{n=0}^{\infty} H_n(x) \frac{t^n}{n!}.$$
Take $X, Y \sim \mathcal{N}(0, 1)$, two standard normally distributed random variables. Via this generating function and the normal distribution, we can pull out the following relation:
$$\mathbb{E}[F(s, X) \cdot F(t, Y)] = \exp(st\, \mathbb{E}[XY]).$$
Take the $(n+m)$-th partial derivative $\frac{\partial^{n+m}}{\partial s^n \partial t^m}$ of both sides of the equation above, set $s = t = 0$, and we get
$$\mathbb{E}[H_n(X) \cdot H_m(Y)] = \begin{cases} n!\,(\mathbb{E}[XY])^n, & n = m, \\ 0, & n \neq m. \end{cases}$$
What does this tell us? First, we got the norm $\|H_n(X)\|^2 = n!$ for $X \sim \mathcal{N}(0, 1)$, and second, we now know that distinct Hermite polynomials of normal random variables are orthogonal to each other. Now we are ready to build something bigger.
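Before moving on, here is a rough Monte Carlo check of this orthogonality relation, a sketch of my own that reuses gaussianProcess and hermite from above (the seed and sample size are arbitrary):

-- Estimate E[H_n(X) * H_m(X)] for X ~ N(0,1); it should be close to n! when
-- n == m and close to 0 otherwise (sketch, my addition).
hermiteMoment :: Seed -> Int -> Int -> Int -> Double
hermiteMoment seed nSamples n m = sum products / fromIntegral nSamples
  where
    xs       = concatMap toList (gaussianProcess seed nSamples 1 (\_ -> 1))
    products = [ hermite x !! n * hermite x !! m | x <- xs ]
-- e.g. hermiteMoment 42 100000 3 3 should land near 3! = 6,
--      hermiteMoment 42 100000 3 2 near 0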
Decomposing the space into chaos
Let $\mathcal{H}_n = \overline{\operatorname{span}}\big\{\, H_n(W(h)) \;\big|\; \|h\| = 1 \,\big\}$ be the $n$-th Wiener chaos. Then
$$L^2(\Omega, \mathcal{F}, \mathbb{P}) = \bigoplus_{n=0}^{\infty} \mathcal{H}_n,$$
where $\mathcal{F}$ is the sigma-algebra generated by $\{W(h)\}_{h \in H}$.
Whoa, whoa, easy! Let's take this decomposition theorem apart piece by piece and translate it from mathematical into human. We will not go into much detail, only explain intuitively what it is about. The symbol $\operatorname{span}(X)$ denotes the linear span of a subset $X$ of a Hilbert space $H$: the intersection of all subspaces of $H$ containing $X$. Simply put, it is the set of all linear combinations of elements of $X$. The bar over $\operatorname{span}$ denotes the closure of the set. If $\overline{\operatorname{span}}(X) = H$, then $X$ is called a complete set (roughly, "$X$ is dense in $H$"). Consequently, $\overline{\operatorname{span}}\{ H_n(W(h)) \mid \|h\| = 1 \}$ is the closure of the linear span of Hermite polynomials of the Gaussian process over the unit hypersphere.
That more or less settles the notation. Now, what is Wiener chaos? Let's start simple: $\mathcal{H}_0$ contains all linear combinations of Hermite polynomials of degree 0, that is, various multiples $a \cdot 1$, that is, the whole space of real numbers. Consequently, $\mathcal{H}_0 = \mathbb{R}$. Moving on, it is easy to see that $\mathcal{H}_1 = \{ W(h) \mid h \in H \}$, that is, the space made up of the Gaussian process itself. It turns out that all centered normal random variables belong to $\mathcal{H}_1$. If we also add $\mathcal{H}_0$, they are joined by the remaining normal random variables, those with non-zero expectation. The further sets $\mathcal{H}_n$ already deal with $n$-th powers of $W(h)$.
Example. Let $H = L^2((0, \infty), \lambda)$ and $X = B(t)^2$, the square of Brownian motion. Then
$$B(t)^2 = t\, H_2\!\left(\frac{B(t)}{\sqrt{t}}\right) + t.$$
The first term belongs to $\mathcal{H}_2$, the second to $\mathcal{H}_0$. This is called the Wiener chaos decomposition.
We showed earlier that $\mathcal{H}_n \perp \mathcal{H}_m$ for $n \neq m$. The decomposition theorem states that these sets are not only orthogonal to each other but also form a complete system in $L^2(\Omega, \mathcal{F}, \mathbb{P})$. What does this mean in practice? It means that any random variable $X$ with finite variance can be approximated by a polynomial function of normally distributed random variables.
In fact, such a decomposition is useful when the distribution of $X$ is, in some sense, close to the normal distribution, for example if we are dealing with Brownian motion or with a lognormal distribution. And it was not for nothing that we mentioned that $\mathcal{F}$ is generated by $W(h)$: this is a very important condition. Indeed, the density of the normal distribution,
$$\rho(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$
looks very much like the definition of the Hermite polynomial.
If the distribution of $X$ is far from Gaussian, you can try other orthogonal polynomials. For example, the density of the gamma distribution is
$$\rho(x) = \frac{x^{n-1} e^{-x}}{\Gamma(n)}.$$
Does it remind you of anything? Right, the Laguerre polynomials:
$$L_n(x) = \frac{e^x}{n!} \frac{d^n}{dx^n}\left(x^n e^{-x}\right).$$
The Legendre polynomials correspond to the uniform distribution, the Kravchuk polynomials to the binomial distribution, and so on. The theory that develops the idea of decomposing a probability space into orthogonal polynomials is known in the English-language literature as "polynomial chaos expansion".
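For completeness, here is a sketch of mine in the same spirit as hermite (not from the original article): an infinite list of Laguerre polynomials built from their standard three-term recurrence $(n+1)L_{n+1}(x) = (2n+1-x)L_n(x) - nL_{n-1}(x)$.

-- | 'laguerre' is an infinite list of Laguerre polynomials for given x (my addition)
laguerre :: (Enum a, Fractional a) => a -> [a]
laguerre x = s
  where s@(_:ts) = 1 : (1 - x) :
          zipWith3 (\ln1 ln n -> ((2*n + 1 - x) * ln - n * ln1) / (n + 1)) s ts [1..]
-- e.g. take 3 (laguerre 1.0) gives [1.0, 0.0, -0.5]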
Example. Now take $H = \mathbb{R}$, a function $f$, and define a random variable $X$ such that
$$X = f(\xi) \in L^2(\Omega, \mathcal{F}, \mathbb{P}),$$
where $\xi = W(1) \sim \mathcal{N}(0, 1)$. By the decomposition theorem, we can represent it as a weighted sum of Hermite polynomials:
$$X = \sum_{n=0}^{\infty} f_n H_n(\xi).$$
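The weights $f_n$ can be estimated numerically. Here is a rough Monte Carlo sketch of mine (the helper name chaosCoefficients is an assumption; it reuses gaussianProcess and hermite from above) based on the relation $f_n = \mathbb{E}[f(\xi) H_n(\xi)] / n!$, which follows from the orthogonality of the $H_n$:

-- Monte Carlo estimate of the first chaos coefficients f_n = E[f(xi) H_n(xi)] / n!
-- (sketch, my addition; accuracy depends on the sample size).
chaosCoefficients :: Seed -> Int -> Int -> (Double -> Double) -> [Double]
chaosCoefficients seed nSamples nCoeffs f =
    [ mean [ f x * hermite x !! k | x <- xs ] / factorial k | k <- [0 .. nCoeffs - 1] ]
  where
    xs          = concatMap toList (gaussianProcess seed nSamples 1 (\_ -> 1))
    mean ys     = sum ys / fromIntegral (length ys)
    factorial k = fromIntegral (product [1 .. k])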
Congratulations! Now, if you have a function of a standard normally distributed random variable, you can decompose it in the basis of Hermite polynomials. For example, tossing a fair 0-1 coin can be represented as
$$X = \mathbf{1}_{(0, \infty)}(\xi).$$
After a little conjuring with the mathematics (we leave the simple integrals to the reader), we get the decomposition:
$$X = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{k=0}^{\infty} \frac{(-1)^k}{2^k\, k!\, (2k+1)} H_{2k+1}(\xi).$$
Note that every second coefficient in the basis decomposition is zero.
-- | 'second' function takes a list and gives every second element of it
second (x:y:xs) = y : second xs
second _        = []

-- | 'coinTossExpansion' is a Wiener chaos expansion for coin-toss rv to n-th element
coinTossExpansion :: Int    -- number of elements in the sum
                  -> Double -- gaussian random variable
                  -> Double -- the sum
coinTossExpansion n xi = sum (take n $ 0.5 : zipWith (*) fn (second $ hermite xi))
  where fn = 1.0 / (sqrt $ 2 * pi) :
             zipWith (\fn1 k -> -fn1 * k / ((k + 1) * (k + 2))) fn [1, 3..]
The coinTossExpansion function returns the sum obtained by decomposing the coin-toss random variable into Wiener chaos, truncated at the $n$-th term, for a given $\xi$. The graph shows the gradual convergence for a randomly chosen $\xi$ as $n$ grows.
Judging by this plot, somewhere after $n \approx 10$ we can truncate the sum, round the result, and return it as $X$.
-- | 'coinTossSequence' is a coin-toss sequence of given size
coinTossSequence :: Seed  -- random state
                 -> Int   -- size of resulting sequence
                 -> [Int] -- coin-toss sequence
coinTossSequence seed n = map (round . coinTossExpansion 100) normals
  where normals = concatMap toList (gaussianProcess seed n 1 (\(i, j) -> 1))
Let's check what a sequence of 20 flips looks like.
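For example, a GHCi call of this kind (the seed 42 is my arbitrary choice; the resulting zeros and ones depend on it, so no specific output is shown):

Prelude> coinTossSequence 42 20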
Now, when you are asked to generate coin flips, you know what to show them.
Well, jokes aside: we computed something and decomposed something, but what is the use of it all, you ask. Do not rush to feel cheated. In subsequent articles we will show how this decomposition lets us take a derivative of a random variable (in a certain sense), expand stochastic integration (and your consciousness), and find a practical application for all of this in machine learning.