
How not to compute the matrix exponential

This post was inspired by a post by user pchelintsev_an.

In this article I will try to show what computational difficulties you may run into if you take the "naive" route to computing the matrix exponential. The article may be useful to those who would like to get acquainted with computational mathematics and are already familiar with such concepts as a system of ordinary differential equations and the Cauchy problem. The experiments were carried out in GNU Octave.

What is the matrix exponential and what is it used for?


The matrix exponential arises when considering the Cauchy problem for a linear system of ordinary differential equations with constant coefficients:

$$\frac{du}{dt} = A u, \qquad u(0) = u_0$$
Such systems sometimes arise after partial discretization of partial differential equations, for example, in finite element methods. The matrix A is then some finite-dimensional approximation of the differential operator. As a consequence, the number of rows of A can easily reach thousands, millions, or even billions. Such matrices cannot even be stored as a square array, so they are kept in special sparse storage formats.
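For example, here is a minimal Octave sketch (an illustrative one-dimensional example of my own, not taken from the post) of how such a matrix can be built directly in sparse form rather than as a dense array:

% Hypothetical illustration: a finite-difference 1-D Laplacian on n points,
% stored in Octave's sparse format (O(n) memory instead of O(n^2)).
n = 100000;
e = ones(n, 1);
A = spdiags([e, -2*e, e], -1:1, n, n);   % tridiagonal sparse matrix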

The solution to the ODE system can be written explicitly:

$$u(t) = e^{At} u_0$$

Here the matrix exponential appears, which is defined as the sum of the series

$$e^{At} = \sum_{k=0}^{\infty} \frac{(At)^k}{k!}$$

In fact, the matrix exponential is obtained by formally substituting the matrix A into the Taylor series for e^z (a mathematician would note that the series converges absolutely for any matrix A and any t). Conveniently, the matrix exponential lets one compute the solution at any time t directly. If, instead, the system is integrated numerically, for example by an Adams or Runge–Kutta method, the computation time grows proportionally to t. However, the solution via the matrix exponential is possible only for autonomous systems with constant coefficients.
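As a rough illustration (my own sketch, not from the original post), one can compare the matrix-exponential solution with the result of a numerical integrator; expm and lsode are standard Octave functions:

A  = [0 1; -1 0];                          % harmonic oscillator du/dt = A*u
u0 = [1; 0];
t  = 2;
u_expm = expm(A * t) * u0;                 % solution via the matrix exponential
u_ode  = lsode(@(u, s) A * u, u0, [0 t]);  % numerical integration up to t
norm(u_expm - u_ode(end, :)')              % the difference should be small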

In a course on differential equations, the method proposed for practical computation of the matrix exponential is reduction to Jordan form: if the matrix A is represented as SJS^-1, where J is the Jordan form and S is the transition matrix to the Jordan basis, then the matrix exponential of A is expressed through the matrix exponential of J:

$$e^{At} = S \, e^{Jt} S^{-1}$$

The matrix exponential of a Jordan matrix J can be written out explicitly. However, this method requires computing the eigenvectors together with the generalized eigenvectors attached to them. That task is much harder than the original one. Moreover, even for a real matrix A, its Jordan form J can be complex.
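For the diagonalizable case the idea can be sketched in Octave (my own snippet; a full Jordan decomposition is not available as a standard Octave routine, so this works only when A has a complete set of eigenvectors):

A = [0 1; -2 -3];                     % diagonalizable, eigenvalues -1 and -2
t = 1;
[S, D] = eig(A);                      % A = S*D*inv(S)
E = S * diag(exp(diag(D) * t)) / S;   % e^(At) = S * e^(Dt) * inv(S)
norm(E - expm(A * t), 'fro')          % compare with the built-in expm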

Summing the series


So, we have an infinite series and the desire to compute it in a foreseeable time. The obvious approach is to stop the summation at some term and discard the remainder. The summation of the series ends when the next, k-th, term becomes small in norm:

$$\left\| \frac{(At)^k}{k!} \right\| < \varepsilon$$

Matrix norms

Let us briefly dwell on matrix norms. There are quite a few of them. One can introduce a norm based on the requirement

$$\|Ax\| \leq \|A\|\,\|x\| \quad \text{for all } x.$$

In this case we speak of a matrix norm induced by (or subordinate to) some vector norm. For the most commonly used vector norms, the corresponding matrix norms have the form:

$$\|A\|_1 = \max_j \sum_i |a_{ij}|, \qquad \|A\|_\infty = \max_i \sum_j |a_{ij}|, \qquad \|A\|_2 = \sigma_{\max}(A)$$

The first two matrix norms are computed elementarily, while the last one requires the largest singular value, which is expensive to obtain. Besides the induced norms there are a number of others. Practically convenient is the Frobenius norm, which, on the one hand, is easy to compute and, on the other hand, is closely related to the singular values of A. It is, in fact, the Euclidean norm of the vector obtained by "stretching" the matrix into one row:

$$\|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2} = \sqrt{\sum_k \sigma_k^2(A)}$$
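All of these norms are available in Octave, so they are easy to compare on a small example (my own snippet):

A = rand(4);
[norm(A, 1), norm(A, Inf), norm(A, 2), norm(A, 'fro')]
% 1-norm: max column sum; Inf-norm: max row sum;
% 2-norm: largest singular value; 'fro': Frobenius norm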

Surprises when summing the series


To understand the difficulties that arise when summing a Taylor series, one does not even need the matrix exponential. Let us simply try to plot the function e^{-x} on the interval [0, 50] by summing the Taylor series for the exponential in the same way. The summation stops once the next term does not exceed 10^{-20}. We use the following Octave code:

function [y] = myexp(x, eps)
  xk = ones(size(x));          % current term x^k / k!
  y = xk;                      % partial sum of the series
  k = 0;
  do
    k = k + 1;
    xk = xk .* x / k;
    y = y + xk;
  until (max(abs(xk)) < eps)
end

x = linspace(0, 50, 2000);
y = myexp(-x, 1e-20);
plot(x, y, 'b', x, exp(-x), 'r');   % series sum (blue) vs built-in exp (red)
axis([0 50 -0.1 1.1]);

The result is rather strange:



For x > 30, the graph of the function starts to behave erratically and has nothing in common with the exponential.

The reason for this effect is the accumulation of rounding errors. For example, when computing e^{-35}, 132 terms were summed; the smallest in absolute value was the last one, s_132 ≈ 5.8674·10^{-21}, while the largest in absolute value was the 35th, s_35 ≈ -1.067·10^{14}. Since Octave works in double precision (about 16 significant digits), the rounding of the 35th term alone already introduces an error of order 0.01, which far exceeds both ε = 10^{-20} and the value of the sum of the series, e^{-35} ≈ 6.305·10^{-16}.
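These numbers are easy to reproduce (a quick check of my own, not from the original post):

k = 0:132;
s = (-35) .^ k ./ factorial(k);   % terms of the Taylor series for e^(-35)
[smax, kmax] = max(abs(s));       % largest term, about 1e14, around k = 35
slast = abs(s(end));              % last term kept, about 6e-21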

Is the Taylor series hopeless, then? No: rounding errors make no significant contribution as long as the summed terms do not reach such huge values. For example, the algorithm works reliably for small values of x (|x| ≲ 1). For the matrix exponential the analogous condition is $\|At\| \lesssim 1$. Besides computational stability, such a choice of argument also guarantees fast convergence, that is, only a few terms of the series suffice for acceptable accuracy.

Argument Doubling Method


Note that the matrix exponent satisfies the following relation:

$$e^{At} = \left(e^{At/N}\right)^N$$

This means that the matrix exponential can be computed for the scaled matrix $At/N$ and the result then raised to the power N. Taking N = 2^p, one can use the fast exponentiation algorithm, which requires only p matrix multiplications:

  1. Compute the matrix exponential $B = e^{At/2^p}$
  2. Square the result, $B \leftarrow B \cdot B$, repeating this operation p times
  3. The result is $e^{At} = B^{2^p}$

The number p can (and should) be chosen from considerations of computational stability of step 1 of the algorithm, that is,

$$\frac{\|At\|}{2^p} \lesssim 1, \qquad \text{i.e.} \quad p \geq \log_2 \|At\|.$$

Corresponding Octave code:

function [B] = taylorexp(t, A, eps, max_terms)
  B = eye(size(A));    % partial sum, starts from the identity
  Ak = B;              % current term (tA)^k / k!
  k = 0;
  while norm(Ak, 'fro') > eps && k < max_terms
    k = k + 1;
    Ak = (t / k) * Ak * A;
    B = B + Ak;
  end
end

function [B] = argdbl(t, A, eps, max_terms)
  p = ceil(log2(abs(t) * norm(A, 'fro')));   % scale so that ||tA|| / 2^p <= 1
  B = taylorexp(t / pow2(p), A, eps, max_terms);
  for i = 1:p
    B = B * B;
  end
end
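A possible sanity check (my own, not part of the original code) is to compare the result with the built-in expm:

A = rand(5);
norm(argdbl(1, A, 1e-20, 1000) - expm(A), 'fro')   % should be close to machine precision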


Padé approximation method


A truncated Taylor series is not the only way to approximate the value of a function. Better approximations can often be obtained not as polynomials but as rational functions. The best approximations in the form of a ratio of two polynomials of given degrees are called Padé approximants. For example, the Padé approximation of the exponential by a ratio of two quadratic polynomials is:

$$e^z \approx \frac{1 + \dfrac{z}{2} + \dfrac{z^2}{12}}{1 - \dfrac{z}{2} + \dfrac{z^2}{12}}$$

The analogous approximation for the matrix exponential is:

$$e^{At} \approx \left(I - \frac{At}{2} + \frac{(At)^2}{12}\right)^{-1} \left(I + \frac{At}{2} + \frac{(At)^2}{12}\right)$$

Padé approximants can be used in step 1 of the doubling algorithm. The main difficulty with Padé approximants is the need to invert a matrix (solve a linear system) once. For large matrices this can be quite expensive.
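A minimal sketch of the [2/2] Padé step above (my own code; production implementations such as Octave's expm typically use higher-order approximants together with scaling and squaring):

function [B] = pade2exp(t, A)
  M = t * A;
  I = eye(size(A));
  P = I + M / 2 + M^2 / 12;   % numerator polynomial
  Q = I - M / 2 + M^2 / 12;   % denominator polynomial
  B = Q \ P;                  % one linear solve instead of an explicit inverse
end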

Accuracy analysis


How can one make sure the matrix exponential has been computed correctly? Is it enough that the norm of the first discarded term of the series is small, or is a more "intelligent" check required?

Recall that the matrix exponential did not appear out of thin air, but as the result of solving a system of ordinary differential equations. The eigenvalues of the matrix A characterize the rates of the processes occurring in the system described by those equations. For example, the matrix

$$A = \begin{pmatrix} -1000 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}$$

describes a system with one rapidly decaying process (eigenvalue -1000) and two oscillatory processes (eigenvalues ±i). A good criterion for the correctness of the computed matrix exponential is the relation

$$\lambda_k\!\left(e^{At}\right) = e^{\lambda_k(A)\, t},$$

connecting the eigenvalues of the matrix A and of its matrix exponential.
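Such a check can be sketched in Octave as follows (my own snippet; the 3×3 matrix is just an illustration with eigenvalues -1000 and ±i):

A = [-1000 0 0; 0 0 1; 0 -1 0];
t = 0.1;
lam = exp(eig(A) * t);                % exponentials of the eigenvalues of A
mu  = eig(expm(t * A));               % eigenvalues of the computed exponential
% sort both sets (by real, then imaginary part) before comparing
[~, i1] = sortrows([real(lam), imag(lam)]);
[~, i2] = sortrows([real(mu),  imag(mu)]);
err = norm(lam(i1) - mu(i2))          % should be tiny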

Computational experiment


To begin with, let us compare the performance of the algorithms on random matrices of different sizes. We use direct Taylor series summation, the argument doubling method, and Octave's built-in expm function. Initially we also wanted to compare the eigenvalue correspondence, but that idea had to be abandoned, since all three algorithms failed this check equally for matrix sizes N ≳ 50. The culprit may be the eig function, which may not compute eigenvalues accurately, or the ill-conditioning of the eigenvalue problem itself for these matrices.



Judging by the graph, the doubling method takes about twice as long as the library function (perhaps the exaggerated accuracy requirement should be relaxed), while direct summation even exhibits a different asymptotic complexity, roughly O(N^3.7), whereas the other two scale as O(N^3).
Speed testing code
tries = 10;
N = 10;
while N < 1000
  taylortime = 0; dbltime = 0; expmtime = 0;
  for it = 1:tries
    A = rand(N);
    A = 0.5 * (A + A');          % symmetrize
    tic; B2 = argdbl(1, A, 1e-20, 1e10); dbltime = dbltime + toc;
    tic; B3 = expm(A); expmtime = expmtime + toc;
    tic; B1 = taylorexp(1, A, 1e-20, 1e10); taylortime = taylortime + toc;
  end
  taylortime = 1e3 * taylortime / tries;   % average time in milliseconds
  dbltime = 1e3 * dbltime / tries;
  expmtime = 1e3 * expmtime / tries;
  printf('%d,%f,%f,%f\n', N, taylortime, dbltime, expmtime);
  fflush(stdout);
  N = round(N * 1.25);
end



Next, let us check how the algorithms behave in the presence of large eigenvalues. Consider the following system of ordinary differential equations:

$$\frac{du}{dt} = \begin{pmatrix} -500 & 500 & 1 \\ 500 & -500 & 1 \\ -1 & -1 & 0 \end{pmatrix} u$$
with the initial condition

$$u(0) = (1,\, 0,\, 1)^T.$$

Its peculiarity is that one of its eigenvalues equals -1000 (a rapidly decaying solution), while the other two equal ±i√2 (oscillatory solutions). If the imaginary eigenvalues are distorted, it will be clearly visible. Take the time step τ = 0.038 and compute the matrix exponential $B = e^{\tau A}$ in different ways. The solution of the system on the grid with step τ is obtained by the following simple algorithm:
  1. The initial solution is known from the initial condition.
  2. Knowing the solution u_n at time t_n, the solution at time t_{n+1} = t_n + τ is obtained from it by multiplying by the matrix exponential B:

     $$u_{n+1} = B\, u_n$$
  3. Repeat step 2 until the end of the integration interval is reached.

Let's look at the obtained numerical trajectories (only the first component of the vector is displayed):



All trajectories, except the one obtained by direct summation of the Taylor series, visually coincide. Note that the graph does not appear to start from the initial condition x_1(0) = 1. In fact, the trajectory leaves the point y = 1 exponentially fast; the characteristic time of this transient (∼1/1000) is simply much smaller than the time step τ = 0.038.
ODE system solution code
mu = 1000;

function [B] = trueexp(t, mu)
  % exact matrix exponential for this system, written out analytically
  B = [ (exp(-mu*t) + cos(sqrt(2)*t))/2., (-exp(-mu*t) + cos(sqrt(2)*t))/2., sin(sqrt(2)*t)/sqrt(2);
        (-exp(-mu*t) + cos(sqrt(2)*t))/2., (exp(-mu*t) + cos(sqrt(2)*t))/2., sin(sqrt(2)*t)/sqrt(2);
        -(sin(sqrt(2)*t)/sqrt(2)), -(sin(sqrt(2)*t)/sqrt(2)), cos(sqrt(2)*t) ];
end

A = [-mu/2, mu/2, 1; mu/2, -mu/2, 1; -1, -1, 0];
t = 0:.038:100;
x0 = [1 0 1]';

B1 = trueexp(t(2), mu);
B2 = expm(t(2) * A);
B3 = taylorexp(t(2), A, 1e-20, 1e10);
B4 = argdbl(t(2), A, 1e-20, 1e10);

y1 = x0; y2 = x0; y3 = x0; y4 = x0;
for i = 2:length(t)
  y1 = B1 * y1; y2 = B2 * y2; y3 = B3 * y3; y4 = B4 * y4;
  printf('%f', t(i));
  y = y1; printf(' %e %e %e', y(1), y(2), y(3));
  y = y2; printf(' %e %e %e', y(1), y(2), y(3));
  y = y3; printf(' %e %e %e', y(1), y(2), y(3));
  y = y4; printf(' %e %e %e', y(1), y(2), y(3));
  printf('\n');
end


Why such a strange time step
All the norms of the matrix A are of magnitude about 1000, so the norm of the matrix τA is about 38. This value falls into the region (see the graph for e^{-x} above) where rounding errors become significant under head-on summation of the Taylor series. If the step is taken larger, the errors grow catastrophically and the solution quickly overflows; if smaller, the effect is not so noticeable.


One might say that this example is synthetic and will not occur in practice, but that is not so. There are many so-called stiff Cauchy problems, which are characterized by a wide spread of eigenvalues (in effect, of the rates of the physical processes involved).

Further reading


Various ways of computing the matrix exponential are surveyed in the article [2] (in English). A wonderful two-volume book has been written on methods for solving ordinary differential equations and stiff problems [3, 4].

List of used sources


  1. Fedorenko R. P. Introduction to Computational Physics. Moscow: MIPT Publishing House, 1994. 528 p. § 17
  2. Cleve Moler and Charles Van Loan. Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later // SIAM Rev., 45 (1), 3–49.
  3. Hairer E., Nørsett S. P., Wanner G. Solving Ordinary Differential Equations I. Nonstiff Problems. Moscow: Mir, 1990. 512 p.
  4. Hairer E., Wanner G. Solving Ordinary Differential Equations II. Stiff and Differential-Algebraic Problems. Moscow: Mir, 1999. 685 p.

Source: https://habr.com/ru/post/239303/

