
Wiener Chaos or Another Way to Flip a Coin


Probability theory has never ceased to amaze me, from the moment I first encountered it to this day. At different times and to different degrees I was struck by what I'll call "wow effects": shocks to the cerebellum that opened a third eye, after which the world was never the same again.


Most recently I experienced the fourth such "wow effect". This one is not a single fact, but a whole theory, which I am going to present in a series of articles. And if the previous feints of probability theory no longer surprise you, then welcome under the cut (I know, you are already here).

Hermite Polynomials


Let's start with ordinary algebra and define the "probabilists'" Hermite polynomials (they differ slightly from the "physicists'" ones):

$$H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n}\big(e^{-x^2/2}\big), \quad n \in \mathbb{N}_0.$$

Values of the first polynomials: $H_0(x) = 1,\ H_1(x) = x,\ H_2(x) = x^2 - 1,\ H_3(x) = x^3 - 3x,\ \dots$


Hermite polynomials have several useful properties; the key one for us is the recurrence relation

$$H_{n+1}(x) = x\,H_n(x) - n\,H_{n-1}(x).$$

This relation will help us compute the first n Hermite polynomials for a given x. We will program in Haskell, because it allows mathematicians to express themselves in their usual language: Haskell is pure, rigorous, and beautiful, like mathematics itself.

-- | 'hermite' is an infinite list of Hermite polynomials for given x
hermite :: (Enum a, Num a) => a -> [a]
hermite x = s
  where s@(_:ts) = 1 : x : zipWith3 (\hn2 hn1 n1 -> x * hn1 - n1 * hn2) s ts [1..]

The hermite function takes a parameter x as input and outputs an infinite list of the polynomials $H_n(x)$ for $n = 0, 1, \dots$ Whoever is not familiar with the concept of lazy evaluation, I strongly advise you to read up on it. For those who know the concept but are not yet deep into functional programming: what happens here? Imagine that we already have an infinite list with all the values of the Hermite polynomials:

 s = [1, x, x^2-1, x^3-3x, x^4-6x^2+3, ... ] 

The tail of this list (without the first element):

 ts = [x, x^2-1, x^3-3x, x^4-6x^2+3, ... ] 

Next, we take another list, with the natural numbers:

 [1, 2, 3, ... ] 

The zipWith3 function combines these three lists element-wise using the function given to it:

 x * [ x, x^2-1, x^3-3x, ... ] - [ 1*1, 2*x, 3*(x^2-1), ... ] = [x^2-1, x^3-3x, x^4-6x^2+3, ... ] 

Prepend 1 and x, and we get the full set of Hermite polynomials. In other words, we built the list of polynomial values using the very list we were trying to build. Rumor has it that full awareness of the beauty and power of FP is akin to the ability to look into your own ear.
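
If this self-referential definition still feels like a magic trick, note that it is exactly the same pattern as the textbook lazy Fibonacci stream (a standard Haskell idiom, added here for comparison; it is not in the original article):

-- the classic corecursive list: each element is defined via the list itself
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)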

Check: the first 6 values for x = 1:

 Prelude> take 6 (hermite 1)
 [1,1,0,-2,-2,6]

What we expected to see.
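
One more quick check against the explicit polynomials above: for x = 2 we expect $H_0, \dots, H_4$ to equal 1, 2, 3, 2, -5 (e.g. $H_4(2) = 16 - 24 + 3 = -5$):

 Prelude> take 5 (hermite 2)
 [1,2,3,2,-5]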

Hilbert space


Let's wander into a different steppe for a moment and recall the definition of a Hilbert space. In scientific terms, it is a complete linear space with a scalar product $\langle X, Y \rangle$ defined on it (complete with respect to the metric that the scalar product induces). In this space, each element has a corresponding real number, called the norm and equal to

$$\|X\| = \sqrt{\langle X, X \rangle}.$$

Nothing extraordinary. When I try to imagine the Hilbert space, I start from simple and gradually come to complex.

  1. The simplest example is the space of real numbers: $H = \mathbb{R}$. In this case, the scalar product of two numbers X and Y is just

     $$\langle X, Y \rangle = XY.$$

  2. Next, I move to the Euclidean space $H = \mathbb{R}^n$. Now

     $$\langle X, Y \rangle = \sum_{i=1}^n X_i Y_i.$$

     This space can be extended to the space of complex vectors $H = \mathbb{C}^n$, for which the scalar product is

     $$\langle X, Y \rangle = \sum_{i=1}^n X_i \overline{Y_i}$$

     (the top bar denotes complex conjugation).
  3. Well, finally I arrive at a space for adults, a space of infinite dimension. In our case, this will be the space of square-integrable functions defined on some set $\Omega$ with a given measure $\mu$. We denote it $H = L^2(\Omega, \mu)$. The scalar product on it is defined as follows:

     $$\langle X, Y \rangle = \int_\Omega (X \cdot Y)\, d\mu.$$

     Usually the set $\Omega$ is taken to be an interval $[a, b]$, and the measure $\mu$ to be the uniform (Lebesgue) measure, i.e. $d\mu = \mu(d\omega) = d\omega$. The scalar product is then written as an ordinary Lebesgue integral

     $$\int_a^b X(\omega) Y(\omega)\, d\omega.$$

     If we think in terms of probability theory, then $\Omega$ is a space of elementary events, $X = X(\omega)$ and $Y = Y(\omega)$ are random variables, and $\mu$ is a probability measure. Each such measure has its own density function $\rho$, which may differ from a constant; then $d\mu = \rho(\omega)\, d\omega$ and the scalar product coincides with the expectation:

     $$\langle X, Y \rangle = \int_\Omega X(\omega) Y(\omega) \rho(\omega)\, d\omega = \mathbb{E}[XY].$$
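
To make the last inner product tangible, here is a minimal numerical sketch (the helper name dotL2 and the midpoint rule are my choices, not the article's): it approximates $\langle X, Y \rangle = \int X Y \rho\, d\omega$ on an interval by a Riemann sum.

-- A sketch, not from the article: approximate the L^2 inner product
-- <X, Y> = integral of X * Y * rho over [a, b] by a midpoint Riemann sum.
dotL2 :: (Double -> Double)  -- X(omega)
      -> (Double -> Double)  -- Y(omega)
      -> (Double -> Double)  -- density rho(omega)
      -> (Double, Double)    -- integration interval (a, b)
      -> Int                 -- number of grid cells
      -> Double
dotL2 x y rho (a, b) k = sum [ x t * y t * rho t * dw | t <- grid ]
  where
    dw   = (b - a) / fromIntegral k
    grid = [ a + (fromIntegral i + 0.5) * dw | i <- [0 .. k - 1] ]

For instance, dotL2 id id (\w -> exp (-w^2 / 2) / sqrt (2 * pi)) (-10, 10) 10000 returns a value close to $\mathbb{E}[\xi^2] = 1$ for the standard normal density.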




Gaussian process


It is time to introduce an element of chance into our considerations. Suppose we have a Hilbert space $H$. We will call $\{W(h)\}_{h \in H}$ an (isonormal) Gaussian process if

  1. the random vector $(W(h_1), \dots, W(h_n))$ is normally distributed with zero expectation for any $h_1, \dots, h_n \in H$, and
  2. for $h, g \in H$

     $$\mathbb{E}[W(h) \cdot W(g)] = \langle h, g \rangle.$$
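
In particular, taking $g = h$ in the second property gives the variance of each element: $W(h) \sim \mathcal{N}(0, \|h\|^2)$. We will lean on this in the Brownian motion example below.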


By its mathematical essence, $W$ is a mapping from one Hilbert space to another: from some $H$ into $L^2(\Omega, \mathcal{F}, \mathbb{P})$, the space of random variables with finite variance over a probability triple: $\Omega$ (the set of elementary events), $\mathcal{F}$ (a sigma-algebra), and $\mathbb{P}$ (a probability measure). It is easy to show that this mapping is linear:

$$W(ah + bg) = aW(h) + bW(g) \quad \forall a, b \in \mathbb{R},\ \forall h, g \in H$$

(in the sense of "almost sure" equality; hello, "wow effect" #2).
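
One way to see this (a short derivation the article leaves to the reader): the difference $W(ah+bg) - aW(h) - bW(g)$ is a centered Gaussian variable, and expanding its second moment via property 2 gives

$$\begin{aligned} \mathbb{E}\big[(W(ah+bg) - aW(h) - bW(g))^2\big] &= \|ah+bg\|^2 + a^2\|h\|^2 + b^2\|g\|^2 \\ &\quad - 2a\langle ah+bg, h \rangle - 2b\langle ah+bg, g \rangle + 2ab\langle h, g \rangle = 0. \end{aligned}$$

A Gaussian variable with zero mean and zero variance is zero almost surely, hence the linearity.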

Example. Let $H = L^2((0, \infty), \lambda)$, where $\lambda$ is the uniform (Lebesgue) measure. The dot product on it is

$$\langle f, g \rangle = \int (f \cdot g)\, d\lambda.$$

Let $h(s) = 1_{[0,t]}(s)$ be the indicator function of the interval $[0, t]$. Then $\|h\|^2 = \int 1_{[0,t]}(s)\, ds = t$, and

$$B(t) = W(1_{[0,t]}) \sim \mathcal{N}(0, t)$$

is none other than Brownian motion (the Wiener process). Moreover,

$$\int_0^t f(s)\, dB(s) = W(1_{[0,t]} f)$$

is called the Itô integral of the function $f$ with respect to $B$.

To implement the Gaussian process, I will use packages that noble people have already written for us.

import Data.Random.Distribution.Normal
import Numeric.LinearAlgebra.HMatrix as H

-- | 'gaussianProcess' samples from a Gaussian process
gaussianProcess :: Seed                   -- random state
                -> Int                    -- number of samples m
                -> Int                    -- number of dimensions n
                -> ((Int, Int) -> Double) -- function that maps indices of the
                                          -- matrix into dot products of its elements
                -> [Vector Double]        -- m n-dimensional samples of the process
gaussianProcess seed m n dotProducts = toRows $ gaussianSample seed m mean cov_matrix
  where
    mean       = vector (replicate n 0)
    cov_matrix = H.sym $ (n >< n) $ map (\i -> dotProducts (quot i n, rem i n)) [0 ..]

The gaussianProcess function takes a seed (standard for generators), the number of samples m, the dimension n of the vector $(h_1, \dots, h_n)^T$, and dotProducts, a function that takes an index (i, j) of the covariance matrix and returns the scalar product $\langle h_i, h_j \rangle$ corresponding to that index. On output, gaussianProcess produces m samples of the vector $(W(h_1), \dots, W(h_n))$.
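
As a quick application, here is a sketch of my own on top of gaussianProcess (the name brownianPath and the uniform grid are assumptions, not the article's code). Since $\mathbb{E}[B(s)B(t)] = \langle 1_{[0,s]}, 1_{[0,t]} \rangle = \min(s, t)$, a Brownian path on the grid $t_i = (i+1) \cdot dt$ is just one sample of the process with that covariance:

-- A sketch, not from the article: one Brownian path on a uniform grid,
-- using cov(B(t_i), B(t_j)) = min(t_i, t_j) with t_i = (i + 1) * dt.
brownianPath :: Seed -> Int -> Double -> Vector Double
brownianPath seed n dt = head $ gaussianProcess seed 1 n cov
  where cov (i, j) = fromIntegral (min i j + 1) * dt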


It is time to combine all the knowledge we have gained. But before that, it is worth mentioning one useful property of Hermite polynomials taken together with the normal distribution. Let $F(t, x) = \exp(tx - t^2/2)$. Then, using a Taylor expansion,

$$\begin{aligned} F(t,x) &= \exp\big(x^2/2 - (x-t)^2/2\big) \\ &= \exp(x^2/2) \sum_{n=0}^\infty \frac{t^n}{n!} \frac{d^n}{dt^n} \exp\big(-(x-t)^2/2\big) \bigg|_{t=0} \\ &= \sum_{n=0}^\infty t^n \frac{(-1)^n}{n!} \exp(x^2/2) \frac{d^n}{dz^n} \exp(-z^2/2) \bigg|_{z=x} \\ &= \sum_{n=0}^\infty \frac{t^n}{n!} H_n(x). \end{aligned}$$

Take $X, Y \sim \mathcal{N}(0, 1)$, two standard normally distributed random variables. Through the moment generating function of the (joint) normal distribution, we can pull out the following relation:

$$\mathbb{E}[F(s, X) \cdot F(t, Y)] = \exp(st\, \mathbb{E}[XY]).$$

Take the $(n+m)$-th mixed partial derivative $\frac{\partial^{n+m}}{\partial s^n \partial t^m}$ of both sides of the equation above, set $s = t = 0$, and we get

$$\mathbb{E}[H_n(X) \cdot H_m(Y)] = \begin{cases} n!\,(\mathbb{E}[XY])^n, & n = m, \\ 0, & n \neq m. \end{cases}$$


What does this tell us? First, we obtained the norm $\|H_n(X)\|^2 = n!$ for $X \sim \mathcal{N}(0, 1)$; second, we now know that distinct Hermite polynomials of normal random variables are orthogonal to each other. Now we are ready to build something bigger.
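
Before moving on, a Monte Carlo sanity check of this orthogonality relation (my sketch: System.Random and the Box-Muller transform are my choices, not the article's tooling):

import System.Random (mkStdGen, randoms)

-- Box-Muller: turn a stream of uniform samples into standard normal samples
normals :: Int -> [Double]
normals seed = go (randoms (mkStdGen seed))
  where go (u1:u2:us) = sqrt (-2 * log (1 - u1)) * cos (2 * pi * u2) : go us
        go _          = []

-- estimate E[H_n(X) * H_m(X)] for X ~ N(0,1) from k samples
hermiteMoment :: Int -> Int -> Int -> Double
hermiteMoment n m k =
  sum [ hs !! n * hs !! m | x <- take k (normals 42), let hs = hermite x ]
    / fromIntegral k

hermiteMoment 3 3 100000 should land near $3! = 6$, while hermiteMoment 3 2 100000 should hover near zero.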

Decomposing the space into chaos


Let $\mathcal{H}_n = \overline{\operatorname{span}}\big\{ H_n(W(h)) \,\big|\, \|h\| = 1 \big\}$ be the n-th Wiener chaos. Then

$$L^2(\Omega, \mathcal{F}, \mathbb{P}) = \bigoplus_{n=0}^\infty \mathcal{H}_n,$$

where $\mathcal{F}$ is the sigma-algebra generated by $W(h)$.

Whoa, whoa, easy! Let's take this decomposition theorem apart piece by piece and translate it from mathematical into human. We will not go into much detail, only explain intuitively what it is about. The symbol $\operatorname{span}(X)$ denotes the linear span of a subset $X$ of a Hilbert space $H$: the intersection of all subspaces of $H$ containing $X$. Simply put, it is the set of all linear combinations of elements of $X$. The bar on top of $\operatorname{span}$ denotes the closure of a set. If $\overline{\operatorname{span}}(X) = H$, then $X$ is called a complete set (roughly, "$X$ is dense in $H$"). Consequently, $\overline{\operatorname{span}}\{H_n(W(h)) \mid \|h\| = 1\}$ is the closure of the linear span of Hermite polynomials of the Gaussian process on the unit hypersphere.

With the notation more or less sorted out, now to what Wiener chaos is. We go from simple: $\mathcal{H}_0$ contains all linear combinations of Hermite polynomials of degree 0, that is, all multiples $a \cdot 1$, that is, the whole space of real numbers. Consequently, $\mathcal{H}_0 = \mathbb{R}$. Moving on, it is easy to see that $\mathcal{H}_1 = \{W(h) \mid h \in H\}$, the space made up of the Gaussian process itself. It follows that all centered normal random variables belong to $\mathcal{H}_1$; if we also add $\mathcal{H}_0$, the other normal random variables join in, those whose expectation is non-zero. The further sets $\mathcal{H}_n$ operate with the n-th powers of $W(h)$.

Example. Let $H = L^2((0, \infty), \lambda)$ and $X = B(t)^2$, the square of Brownian motion. Then, using $H_2(x) = x^2 - 1$,

$$\begin{aligned} B(t)^2 &= W(1_{[0,t]})^2 \\ &= \|1_{[0,t]}\|^2 \cdot W\bigg( \frac{1_{[0,t]}}{\|1_{[0,t]}\|} \bigg)^2 \\ &= t \cdot W\bigg( \frac{1_{[0,t]}}{\sqrt{t}} \bigg)^2 \\ &= t\, H_2\bigg( W\bigg( \frac{1_{[0,t]}}{\sqrt{t}} \bigg) \bigg) + t. \end{aligned}$$

The first term belongs to $\mathcal{H}_2$, the second to $\mathcal{H}_0$. This is called a decomposition into Wiener chaos.

We showed earlier that $\mathcal{H}_n \perp \mathcal{H}_m$ for $n \neq m$. The decomposition theorem states that these sets are not only orthogonal to each other, but also form a complete system in $L^2(\Omega, \mathcal{F}, \mathbb{P})$. What does this mean in practice? It means that any random variable $X$ with finite variance can be approximated by a polynomial function of a normally distributed random variable.

In fact, such a decomposition is useful when the distribution of $X$ is, in some sense, close to the normal distribution: for example, when we are dealing with Brownian motion or with a lognormal distribution. And it was not for nothing that we mentioned that $\mathcal{F}$ is generated by $W(h)$; this is a very important condition. Indeed, the density of the normal distribution,

$$\rho(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2},$$

is very similar to the definition of the Hermite polynomials:

$$H_n(x) = (-1)^n e^{x^2/2} \frac{d^n}{dx^n}\big(e^{-x^2/2}\big), \quad n \in \mathbb{N}_0.$$


If the distribution of $X$ is far from Gaussian, you can try other orthogonal polynomials. For example, take the density of the gamma distribution:

$$\rho(x) = \frac{x^{n-1} e^{-x}}{\Gamma(n)}.$$

Ring a bell? Yes, these are the Laguerre polynomials:

$$L_n(x) = \frac{e^x}{n!} \frac{d^n}{dx^n}\big(x^n e^{-x}\big).$$


The Legendre polynomials correspond to the uniform distribution, the Krawtchouk polynomials to the binomial distribution, and so on. The theory that develops the idea of decomposing a probability space into orthogonal polynomials is known in the English-language literature as "polynomial chaos expansion".
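
By the way, the corecursive one-liner from the beginning of the article carries over to these families; here is a sketch for the Laguerre polynomials (the function laguerre is mine, built on the standard recurrence $(n+1)L_{n+1}(x) = (2n+1-x)L_n(x) - nL_{n-1}(x)$, and is not from the article):

-- A sketch: infinite list of Laguerre polynomials at x, via the recurrence
-- (n + 1) L_{n+1}(x) = (2n + 1 - x) L_n(x) - n L_{n-1}(x)
laguerre :: (Enum a, Fractional a) => a -> [a]
laguerre x = s
  where s@(_:ts) = 1 : (1 - x) :
          zipWith3 (\ln1 ln n -> ((2 * n + 1 - x) * ln - n * ln1) / (n + 1)) s ts [1 ..]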

Example. Now take $H = \mathbb{R}$, a function $f$, and define a random variable $X$ such that

$$X = f(\xi) \in L^2(\Omega, \mathcal{F}, \mathbb{P}),$$

where $\xi = W(1) \sim \mathcal{N}(0, 1)$. By the decomposition theorem, we can represent it as a weighted sum of Hermite polynomials

$$f(\xi) = \sum_{n=0}^\infty f_n H_n(\xi),$$

where the coefficients are given by the formula

$$f_n = \frac{1}{n!} \mathbb{E}[f(\xi) \cdot H_n(\xi)].$$

We obtain these values $f_n$ as follows:

$$\begin{aligned} \mathbb{E}[f(\xi) \cdot H_n(\xi)] &= \langle f(\xi), H_n(\xi) \rangle \\ &= \Big\langle \sum_{k=0}^\infty f_k H_k(\xi), H_n(\xi) \Big\rangle \\ &= \sum_{k=0}^\infty f_k \langle H_k(\xi), H_n(\xi) \rangle \\ &= f_n \|H_n(\xi)\|^2 = f_n\, n!. \end{aligned}$$
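
These coefficients are also easy to estimate numerically. A Monte Carlo sketch reusing the normals helper from the orthogonality check above (chaosCoeff is my name; the estimator is an illustration, not the article's code):

-- A sketch: estimate f_n = E[f(xi) * H_n(xi)] / n! by Monte Carlo
chaosCoeff :: (Double -> Double) -> Int -> Int -> Double
chaosCoeff f n k = mc / fromIntegral (product [1 .. n])
  where mc = sum [ f x * hermite x !! n | x <- take k (normals 2019) ]
             / fromIntegral k

For the coin toss below, chaosCoeff (\x -> if x > 0 then 1 else 0) 1 100000 should come out near $1/\sqrt{2\pi} \approx 0.399$, matching the first non-constant coefficient of the expansion that follows.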

Congratulations! Now, if you have a function of a standard normally distributed random variable, you can expand it in the basis of Hermite polynomials. For example, tossing a fair 0-1 coin can be represented as

$$X = 1_{(0, \infty)}(\xi).$$

After a little conjuring with the mathematics (we leave the simple integrals to the reader), we get the expansion:

$$X = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{n=0}^\infty \frac{(-1)^n}{2^n (2n+1)\, n!} H_{2n+1}(\xi).$$


Note that every second element of the expansion in this basis is zero: apart from the constant, only the odd-degree polynomials $H_{2n+1}$ survive.

-- | 'second' takes a list and returns every second element of it
second :: [a] -> [a]
second (_:y:xs) = y : second xs
second _        = []

-- | 'coinTossExpansion' is the Wiener chaos expansion of the coin-toss
-- random variable, truncated at the n-th element
coinTossExpansion :: Int    -- number of elements in the sum
                  -> Double -- gaussian random variable
                  -> Double -- the sum
coinTossExpansion n xi = sum (take n $ 0.5 : zipWith (*) fn (second $ hermite xi))
  where
    fn = 1.0 / sqrt (2 * pi)
         : zipWith (\fn1 k -> -fn1 * k / ((k + 1) * (k + 2))) fn [1, 3 ..]

The coinTossExpansion function returns the partial sum of the Wiener chaos expansion of the coin-toss random variable, from term 0 to n, for a given $\xi$. The graph shows the gradual convergence, for a randomly chosen $\xi$, as n increases.


Judging by this graph, somewhere after $n \approx 10$ we can truncate the sum, round the result, and return it as $X$.
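
For instance, one can watch the truncation stabilize in GHCi (a usage note I am adding; exact values depend on $\xi$, but for $\xi$ well above 0 the rounded sum should settle at 1, and for $\xi$ well below 0 at 0):

 Prelude> map (\n -> coinTossExpansion n 1.0) [1, 5, 10, 100]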

-- | 'coinTossSequence' is a coin-toss sequence of given size
coinTossSequence :: Seed  -- random state
                 -> Int   -- size of resulting sequence
                 -> [Int] -- coin-toss sequence
coinTossSequence seed n = map (round . coinTossExpansion 100) xs
  where
    -- n independent N(0,1) draws: n one-dimensional samples with unit variance
    xs = concatMap toList (gaussianProcess seed n 1 (\_ -> 1))

Let's check what a sequence of 20 flips looks like.

 Prelude> coinTossSequence 42 20
 [0,0,1,0,0,0,1,1,0,1,0,0,0,1,0,1,1,1,0,1]

Now, when you are asked to generate coin flips, you know what to show them.


Well, jokes aside: we computed something, decomposed something, but what is the use of all this, you ask? Do not rush to feel cheated. In subsequent articles we will show how this decomposition allows us to take a derivative of a random variable (in a certain sense), to expand stochastic integration (and your consciousness), and to find practical applications for all of this in machine learning.

Source: https://habr.com/ru/post/343148/
