
Kalman filter



On the Internet, including on Habré, you can find a lot of information about the Kalman filter, but it is hard to find an easily digestible derivation of the formulas themselves. Without a derivation, all this science is perceived as a kind of shamanism, the formulas look like a faceless set of symbols, and, most importantly, many simple statements lying on the surface of the theory remain beyond comprehension. The purpose of this article is to explain this filter in the most accessible language possible.
The Kalman filter is a very powerful data-filtering tool. Its main principle is that the filtering uses information about the physics of the phenomenon itself. For example, if you are filtering data from a car's speedometer, the inertia of the car gives you the right to treat overly fast speed jumps as measurement errors. The Kalman filter is interesting because, in a certain sense, it is the best possible filter; we will discuss below what exactly the words "the best" mean. At the end of the article I will show that in many cases the formulas can be simplified to such an extent that almost nothing remains of them.

A quick primer


Before getting acquainted with the Kalman filter, I propose to recall some simple definitions and facts from probability theory.

Random variables


When we say that a random variable ξ is given, we mean that this quantity can take random values, and it takes different values with different probabilities. When you roll, say, a die, a discrete set of values falls out: 1, 2, 3, 4, 5, 6. When it comes to, for example, the speed of a wandering particle, we obviously have to deal with a continuous set of values. The "fallen-out" values of a random variable we will denote by x, but sometimes we will use the same letter that denotes the random variable itself: ξ.
In the case of a continuous set of values, a random variable is characterized by its probability density ρ(x), which dictates that the probability of the random variable "falling out" in a small neighborhood of the point x of length dx equals ρ(x)·dx. Graphically, this probability is equal to the area of a narrow rectangle of width dx under the graph of the density.

Quite often in life, random variables are distributed according to Gauss, with the probability density

ρ(x) = (1/(σ√(2π))) · e^(−(x − μ)²/(2σ²)).

We see that the function ρ(x) has a bell shape centered at μ, with a characteristic width of order σ.
Once we have started talking about the Gaussian distribution, it would be a sin not to mention where it came from. Just as the numbers e and π are firmly established in mathematics and turn up in the most unexpected places, the Gaussian distribution has taken deep roots in probability theory. One remarkable statement that partially explains this Gaussian omnipresence is the following:
Let there be a random variable ξ with an arbitrary distribution (in fact, there are some restrictions on this arbitrariness, but they are not rigid at all). Let us carry out n experiments and compute the sum ξ₁ + … + ξₙ of the "fallen-out" values. We will do many such experiments. It is clear that each time we will obtain a different value of the sum; in other words, this sum is itself a random variable with its own specific distribution law. It turns out that for sufficiently large n the distribution law of this sum tends to the Gaussian distribution (by the way, the characteristic width of the "bell" grows as √n). Read more on Wikipedia: the central limit theorem. In real life there are very often quantities composed of a large number of identically distributed independent random variables, and such quantities are therefore distributed according to Gauss.
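
To see the central limit theorem in action, here is a minimal MATLAB sketch (my own illustration, not part of the article): we sum n uniform random variables many times and look at the histogram of the sums, which comes out bell-shaped.

% A small sketch illustrating the central limit theorem:
% sums of many independent uniform random variables look Gaussian.
n = 50;                  % number of summands in each sum
M = 10000;               % number of experiments
s = sum(rand(n, M), 1);  % each column is one experiment; s holds M sums
hist(s, 50);             % the histogram of the sums has a bell shape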

Average value


The mean value of a random variable is what we get in the limit if we carry out many experiments and compute the arithmetic average of the fallen-out values. The mean value is denoted in different ways: mathematicians like to write Mξ (mathematical expectation or mean value), foreign authors write Eξ (expectation), and physicists write ⟨ξ⟩ or ξ̄. We will denote it in the foreign way: Eξ.
For example, for the Gaussian distribution the mean value is Eξ = μ.

Variance


In the case of the Gaussian distribution, we clearly see that the random variable prefers to fall out in a certain neighborhood of its mean value μ.

Let's admire the Gaussian distribution once again


As can be seen from the graph, the characteristic spread of the values is of order σ. How do we estimate this spread for an arbitrary random variable if we know its distribution? We could draw the graph of its probability density and estimate the characteristic width by eye, but we prefer to proceed algebraically. One can take the average length (modulus) of the deviation from the mean: E|ξ − Eξ|. This value is a good estimate of the characteristic spread of the values. But we all know very well that using moduli in formulas is one big headache, so this formula is rarely used for estimating the characteristic spread.
A simpler way (simpler in terms of calculations) is to find E(ξ − Eξ)². This value is called the variance and is often denoted σ²_ξ. The square root of the variance is a good estimate of the spread of the random variable; it is called the standard deviation.
For example, for the Gaussian distribution one can calculate that the variance defined above is exactly equal to σ², and therefore the standard deviation equals σ, which agrees very well with our geometrical intuition.
In fact, a small fraud is hidden here. The fact is that in the definition of the Gaussian distribution the exponent contains the expression −(x − μ)²/(2σ²). This two in the denominator is there precisely so that the standard deviation would equal the coefficient σ. That is, the Gaussian distribution formula itself is written in a form specially tailored for the fact that we will compute its standard deviation.
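
As a quick numerical sanity check (my own sketch, with arbitrarily chosen parameters), we can draw Gaussian samples with known μ and σ and verify that the sample mean and the root of the sample variance recover them:

% Estimate the mean and the standard deviation from samples.
mu = 5; sigma = 2;
xi = normrnd(mu, sigma, 1, 100000);              % 100000 "experiments"
meanEstimate = mean(xi)                          % close to mu = 5
sigmaEstimate = sqrt(mean((xi - mean(xi)).^2))   % close to sigma = 2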

Independent random variables


Random variables may or may not be dependent. Imagine that you throw a needle onto a plane and record the coordinates of both its ends. These two coordinates are dependent: they are connected by the condition that the distance between them always equals the length of the needle, even though both are random variables.
Random variables are independent if the outcome of the first one is completely independent of the outcome of the second one. If the random variables ξ and η are independent, then the mean value of their product equals the product of their mean values:

E(ξη) = Eξ · Eη

Proof
For example, having blue eyes and graduating from school with a gold medal are independent random variables. If, say, the fraction of blue-eyed people is p₁ and the fraction of gold medalists is p₂, then the fraction of blue-eyed medalists is p₁·p₂. This example tells us that if the random variables ξ and η are given by their probability densities ρ(x) and ρ(y), then the independence of these quantities is expressed in the fact that the probability density ρ(x, y) (the first value falls out at x and the second at y) is given by the formula:

ρ(x, y) = ρ(x) · ρ(y)

It immediately follows that:

E(ξη) = ∫∫ x y ρ(x) ρ(y) dx dy = ∫ x ρ(x) dx · ∫ y ρ(y) dy = Eξ · Eη

As you can see, the proof is carried out for random variables that have a continuous spectrum of values and are given by their probability density. In other cases the idea of the proof is similar.
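
The same fact is easy to check numerically. A minimal sketch (my own, with arbitrarily chosen distributions): for independent samples, the mean of the product matches the product of the means.

% Check E(xi*eta) = E(xi)*E(eta) for independent random variables.
N = 1000000;
xi = normrnd(2, 1, 1, N);    % Gaussian samples with mean 2
eta = rand(1, N);            % uniform samples on [0,1] with mean 0.5
mean(xi .* eta)              % approximately 1
mean(xi) * mean(eta)         % approximately 1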

Kalman filter


Formulation of the problem


Let us denote by x the quantity that we will measure and then filter. It may be a coordinate, a velocity, an acceleration, humidity, an odour level, temperature, pressure, etc.
Let's start with a simple example, which will lead us to the formulation of the general problem. Imagine that we have a radio-controlled toy car that can only go forwards and backwards. Knowing the weight of the car, its shape, the road surface, etc., we have calculated how the controlling joystick affects the speed of motion v_k = v(t_k).



Then the coordinate of the car will change according to the law:

x_{k+1} = x_k + v_k · dt

In real life we cannot take into account in our calculations the small perturbations acting on the car (wind, bumps, pebbles on the road), so the real speed of the car will differ from the calculated one. A random variable ξ_k is therefore added to the right-hand side of the written equation:

x_{k+1} = x_k + v_k · dt + ξ_k

We have a GPS sensor mounted on the car that tries to measure the true coordinate x_k of the car and, of course, cannot measure it exactly, measuring instead with an error η_k, which is also a random variable. As a result, we receive erroneous data from the sensor:

z_k = x_k + η_k

The problem is, knowing the wrong sensor readings z_k, to find a good approximation to the true coordinate of the car x_k. This good approximation we will denote by x^opt_k.
In the formulation of the general problem, the coordinate x_k can stand for anything (temperature, humidity, ...), and the term responsible for controlling the system from the outside we denote by u_k (in the example with the car, u_k = v_k · dt). The equations for the coordinate and the sensor readings look like this:

x_{k+1} = x_k + u_k + ξ_k
z_{k+1} = x_{k+1} + η_{k+1}        (1)

Let's discuss in detail what we know:

- u_k is a known quantity that controls the evolution of the system, and we know it at every step;
- the model error ξ_k and the sensor error η_k are random variables, and their distribution laws do not depend on the step number k;
- the mean values of the errors are zero: Eξ_k = Eη_k = 0;
- the distribution laws of the errors themselves may be unknown, but their variances σ²_ξ and σ²_η are known;
- the errors are independent of each other and of everything that happened at previous steps.

It is worth noting that the filtering problem is not a smoothing problem. We do not strive to smooth the data from the sensor; we strive to obtain the value closest to the real coordinate x_k.
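
To make the setup concrete, here is a minimal MATLAB sketch (my own, with arbitrarily chosen numbers) that simulates equations (1): the true coordinate x_k and the noisy sensor readings z_k. The filtering itself comes below.

% Simulate the model (1) with a constant control u and Gaussian errors.
N = 100; u = 0.5; sigmaXi = 1; sigmaEta = 10;
x(1) = 0;
z(1) = x(1) + normrnd(0, sigmaEta);
for k = 1:(N-1)
    x(k+1) = x(k) + u + normrnd(0, sigmaXi);   % true coordinate
    z(k+1) = x(k+1) + normrnd(0, sigmaEta);    % noisy sensor reading
end
plot(1:N, x, 1:N, z)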

Kalman algorithm


We will reason by induction. Imagine that at step k we have already found the filtered sensor value x^opt_k, which approximates the true coordinate of the system x_k well. Do not forget that we know the equation controlling the change of the coordinate unknown to us:

x_{k+1} = x_k + u_k + ξ_k

Therefore, before receiving the value from the sensor, we may assume that at step k+1 the system evolves according to this law, and the sensor will show something close to x^opt_k + u_k. Unfortunately, so far we cannot say anything more precise. On the other hand, at step k+1 we will have the inexact sensor reading z_{k+1} in our hands.
The idea of Kalman is the following: to get the best approximation to the true coordinate x_{k+1}, we must choose a middle ground between the reading z_{k+1} of the inaccurate sensor and x^opt_k + u_k, our prediction of what we expected the sensor to show. We give the sensor reading the weight K and the predicted value the weight (1 − K):

x^opt_{k+1} = K · z_{k+1} + (1 − K) · (x^opt_k + u_k)

The coefficient K is called the Kalman coefficient (Kalman gain). It depends on the iteration step, so strictly it should be written K_{k+1}, but for now, in order not to clutter the formulas, we will omit its index.
We must choose the Kalman coefficient so that the resulting optimal coordinate value x^opt_{k+1} would be closest to the true coordinate x_{k+1}. For example, if we know that our sensor is very accurate, then we trust its readings more and give the value z_{k+1} more weight (K close to one). If, on the contrary, the sensor is not at all accurate, then we rely more on the theoretically predicted value x^opt_k + u_k.
In general, to find the exact value of the Kalman coefficient K, we just need to minimize the error:

e_{k+1} = x_{k+1} − x^opt_{k+1}

We use equations (1) to rewrite the expression for the error:

e_{k+1} = (1 − K)(e_k + ξ_k) − K · η_{k+1}

Proof

e_{k+1} = x_{k+1} − x^opt_{k+1}
        = x_{k+1} − K·z_{k+1} − (1 − K)(x^opt_k + u_k)
        = x_{k+1} − K(x_{k+1} + η_{k+1}) − (1 − K)(x^opt_k + u_k)
        = (1 − K)(x_k + u_k + ξ_k − x^opt_k − u_k) − K·η_{k+1}
        = (1 − K)(e_k + ξ_k) − K·η_{k+1}

Now it is time to discuss what it means to minimize the error. After all, the error, as we have seen, is itself a random variable and takes different values each time. In fact, there is no unambiguous approach to defining what it means for the error to be minimal. Just as with the variance of a random variable, when we tried to estimate the characteristic width of its spread, here we choose the criterion that is simplest for calculations: we will minimize the mean of the squared error,

E(e²_{k+1}) → min

Let us expand the last expression:

E(e²_{k+1}) = (1 − K)² (E(e²_k) + σ²_ξ) + K² σ²_η

Key to the proof

Since all the random variables in the expression for e_{k+1} are independent, and the mean values of the sensor and model errors are zero, Eη_{k+1} = Eξ_k = 0, all the "cross" terms vanish:

E(e_k ξ_k) = E(e_k η_{k+1}) = E(ξ_k η_{k+1}) = 0.

Besides, the formulas for the variances look much simpler: σ²(η_{k+1}) = E(η²_{k+1}) = σ²_η and σ²(ξ_k) = E(ξ²_k) = σ²_ξ (because Eη_{k+1} = Eξ_k = 0).


This expression reaches its minimum when (setting the derivative with respect to K to zero):

K_{k+1} = (E(e²_k) + σ²_ξ) / (E(e²_k) + σ²_ξ + σ²_η)

Here we already write the Kalman coefficient with the step index k+1, emphasizing that it depends on the iteration step.
Substituting into the expression for the mean squared error the value of the Kalman coefficient that minimizes it, we get:

E(e²_{k+1}) = σ²_η (E(e²_k) + σ²_ξ) / (E(e²_k) + σ²_ξ + σ²_η) = K_{k+1} · σ²_η.

Our problem is solved. We obtained an iterative formula for calculating the Kalman coefficient.

All formulas in one place

The model:

x_{k+1} = x_k + u_k + ξ_k
z_{k+1} = x_{k+1} + η_{k+1}

The filter iteration:

x^opt_{k+1} = K_{k+1} · z_{k+1} + (1 − K_{k+1}) · (x^opt_k + u_k)
K_{k+1} = (E(e²_k) + σ²_ξ) / (E(e²_k) + σ²_ξ + σ²_η)
E(e²_{k+1}) = K_{k+1} · σ²_η

starting from x^opt_1 = z_1 and E(e²_1) = σ²_η.

Example


The advertising picture at the beginning of the article shows filtered data from a fictional GPS sensor mounted on a fictional car that moves with uniform acceleration, with a known fictional acceleration a.



Look again at the result of filtering.


MATLAB code

clear all;
N = 100;         % number of samples
a = 0.1;         % acceleration
sigmaPsi = 1;    % standard deviation of the model error
sigmaEta = 50;   % standard deviation of the sensor error
k = 1:N;
x = k;           % preallocate the array of true coordinates
x(1) = 0;
z(1) = x(1) + normrnd(0, sigmaEta);
for t = 1:(N-1)
    x(t+1) = x(t) + a*t + normrnd(0, sigmaPsi);   % true coordinate
    z(t+1) = x(t+1) + normrnd(0, sigmaEta);       % sensor reading
end
% Kalman filter
xOpt(1) = z(1);
eOpt(1) = sigmaEta;  % eOpt(t) is the square root of the error variance; it is not a random variable
for t = 1:(N-1)
    eOpt(t+1) = sqrt((sigmaEta^2)*(eOpt(t)^2 + sigmaPsi^2)/(sigmaEta^2 + eOpt(t)^2 + sigmaPsi^2));
    K(t+1) = (eOpt(t+1))^2/sigmaEta^2;
    xOpt(t+1) = (xOpt(t) + a*t)*(1 - K(t+1)) + K(t+1)*z(t+1);
end
plot(k, xOpt, k, z, k, x)



Analysis


If you follow how the Kalman coefficient K_k changes with the iteration step k, you can show that it always stabilizes to a certain value K_stab. For example, when the root-mean-square errors of the sensor and of the model relate to each other as ten to one, the graph of the Kalman coefficient as a function of the iteration step looks like this:



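This stabilized value is simply the fixed point of the recursion for E(e²_k) derived above, so it can be computed directly. A minimal sketch (my own, using the same σ values as in the example code):

% Find the stabilized Kalman coefficient by iterating the
% error-variance recursion to its fixed point.
sigmaPsi = 1; sigmaEta = 50;   % model and sensor standard deviations
e2 = sigmaEta^2;               % E(e_1^2) = sigmaEta^2, as in the main code
for k = 1:1000
    e2 = sigmaEta^2*(e2 + sigmaPsi^2)/(sigmaEta^2 + e2 + sigmaPsi^2);
end
Kstab = e2/sigmaEta^2          % the value that K stabilizes to
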
In the following example, we will discuss how this will help to significantly ease our lives.

Second example


In practice it often happens that we know nothing at all about the physical model of what we are filtering. For example, you want to filter the readings of your favourite accelerometer, and you do not know in advance by what law you will be turning it. The maximum information you can extract is the variance of the sensor error σ²_η. In such a difficult situation, all our ignorance of the motion model can be driven into the random variable ξ_k:

x_{k+1} = x_k + ξ_k

Frankly speaking, such a system no longer satisfies the conditions that we imposed on the random variable ξ_k, because it now hides all the physics of motion unknown to us, and therefore we cannot claim that at different moments of time the model errors are independent of each other and that their mean values are zero. In this case, by and large, the Kalman filter theory is not applicable. But we will ignore this fact and bluntly apply the whole colossus of formulas, choosing the coefficients σ_ξ and σ_η by eye so that the filtered data looks pretty.
But one can take a different, much simpler path. As we saw above, the Kalman coefficient always stabilizes with increasing step number to the value K_stab. Therefore, instead of picking the coefficients σ_ξ and σ_η and computing the Kalman coefficient K_{k+1} by the complicated formulas, we can assume this coefficient to be constant and pick only this constant. This assumption spoils almost nothing: first, we are already using the Kalman theory illegally, and second, the Kalman coefficient quickly stabilizes to a constant anyway. In the end everything is greatly simplified. We do not need any formulas from the Kalman theory at all; we just need to find an acceptable value of K_stab and insert it into the iterative formula:

x^opt_{k+1} = K_stab · z_{k+1} + (1 − K_stab) · x^opt_k

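In code this simplified filter is a one-liner inside the loop; it is nothing other than exponential smoothing. A minimal sketch (my own, with K_stab chosen by eye), applied to sensor readings z produced, for example, by the code above:

% The simplified filter with a constant coefficient Kstab.
Kstab = 0.1;                   % a constant chosen by eye
xSimple(1) = z(1);
for t = 1:(length(z)-1)
    xSimple(t+1) = Kstab*z(t+1) + (1 - Kstab)*xSimple(t);
end
plot(1:length(z), z, 1:length(z), xSimple)
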
The following graph shows data from a fictional sensor, filtered in two different ways, under the assumption that we know nothing about the physics of the phenomenon. The first way is the honest one, with all the formulas of the Kalman theory. The second is the simplified one, without the formulas.



As we see, the methods are almost the same. A small difference is observed only at the beginning, when the Kalman coefficient has not yet stabilized.

Discussion


As we have seen, the main idea of the Kalman filter is to find a coefficient K such that the filtered value

x^opt_{k+1} = K · z_{k+1} + (1 − K) · (x^opt_k + u_k)

would differ least, on average, from the real coordinate value x_{k+1}. We see that the filtered value x^opt_{k+1} is a linear function of the sensor reading z_{k+1} and of the previous filtered value x^opt_k. And the previous filtered value x^opt_k is, in turn, a linear function of the sensor reading z_k and of the filtered value before it, x^opt_{k−1}. And so on, until the chain fully unfolds. That is, the filtered value depends linearly on all the previous sensor readings:

x^opt_{k+1} = λ_1 z_1 + λ_2 z_2 + … + λ_{k+1} z_{k+1} + (known terms containing the controls u_l)

Therefore, the Kalman filter is called a linear filter.
It is possible to prove that of all linear filters, the Kalman filter is the best. Best in the sense that the mean square error of the filter is minimal.

Multidimensional case


The whole theory of the Kalman filter can be generalized to the multidimensional case. The formulas there look a little worse, but the very idea of ​​their derivation is the same as in the one-dimensional case. In this beautiful article you can see them: http://habrahabr.ru/post/140274/ .
And in this wonderful video, an example of how to use them is analyzed.

Literature


You can download the original Kalman article here: http://www.cs.unc.edu/~welch/kalman/media/pdf/Kalman1960.pdf .
This post can be read in English: http://david.wf/kalmanfilter

Source: https://habr.com/ru/post/166693/

