Intel develops software not only for external consumers; many programs are written for use solely inside Intel. Among them are quite a few tools for numerical modeling of the physical processes that occur in processor manufacturing, since processors are, after all, Intel's main product. These programs naturally make heavy use of methods from computational mathematics and physics.
Some time ago I needed to solve an equation programmatically using Newton's method. That sounds simple, but it requires being able to compute the derivative of the equation's left-hand side. In my case that left-hand side was rather complicated: even just evaluating it was spread across several functions in the program, and the prospect of working out the derivative on paper did not appeal to me. The prospect of using a symbolic computation package appealed no more: typing in all the formulas, which also contained several special cases, would have been far from pleasant. Computing the derivative numerically, as the difference of the function's values at two neighboring points divided by the corresponding increment of the independent variable, is fraught with loss of accuracy, and in any case requires choosing a suitable increment.
After some thought, I applied the approach described below. Only later did I learn that it is called "automatic differentiation", that there is extensive English-language literature on it, and that a number of libraries implement it. In Russian, however, I found only a few scientific articles about the method, plus a post on Habrahabr that explains everything through a mix of dual and complex numbers, which in my opinion makes it hard to follow. For understanding and practically applying automatic differentiation, no dual numbers are needed, and I will explain the approach here.
Simple considerations
So, suppose we have some function $f(x)$. Suppose that at some point $x_0$ we know its value $v = f(x_0)$ and the value of its derivative $d = f'(x_0)$. We know only these two numbers, $v$ and $d$, and we need nothing more: neither an expression for $f$ nor even its values at other points.

Consider the function $g(x) = 2f(x)$ and ask: what are its value and the value of its derivative at the point $x_0$? Obviously, $g(x_0) = 2v$ and $g'(x_0) = 2d$. Note that the right-hand sides contain only numbers that we know.

A slightly harder question: consider the function $g(x) = \sin f(x)$ and ask the same about it. It is easy to see that in this case we readily find $g(x_0) = \sin v$ and $g'(x_0) = d \cos v$. Similarly, for $g(x) = e^{f(x)}$ we have $g(x_0) = e^v$ and $g'(x_0) = d\,e^v$. And so on: for any elementary function $\varphi$ applied to $f$, the chain rule lets us compute the value $\varphi(v)$ and the derivative $\varphi'(v)\,d$ using only the two numbers $v$ and $d$.
Further, suppose there is some other function $g(x)$, about which we also know only two numbers at the same point: $v_g = g(x_0)$ and $d_g = g'(x_0)$. Then for the function $f(x) + g(x)$ we can just as easily compute both the value and the derivative at that point, namely $v + v_g$ and $d + d_g$; and similarly for any binary operation. For example, for multiplication we have: if $h(x) = f(x)\,g(x)$, then $h(x_0) = v\,v_g$ and, by the product rule, $h'(x_0) = d\,v_g + v\,d_g$.
Thus, we can freely perform any operations on pairs of numbers (function value, derivative value), and no additional information will ever be needed. If at some point we know the values of certain "basic" functions and the values of their derivatives, then for any compound function built from these "basic" ones we can automatically compute both its value and the value of its derivative.
Code
It is easy to write a class that implements such arithmetic:

```cpp
#include <cmath>

class Derivable {
    double val, deriv;
public:
    Derivable(double _val, double _deriv) : val(_val), deriv(_deriv) {}
    double getValue() { return val; }
    double getDerivative() { return deriv; }
    Derivable operator+(Derivable f) {
        return Derivable(val + f.val, deriv + f.deriv);
    }
    Derivable operator-(Derivable f) {
        return Derivable(val - f.val, deriv - f.deriv);
    }
    Derivable operator*(Derivable f) {
        return Derivable(val * f.val, deriv * f.val + val * f.deriv);
    }
    Derivable operator/(Derivable f) {
        return Derivable(val / f.val,
                         (deriv * f.val - val * f.deriv) / f.val / f.val);
    }
    friend Derivable cos(Derivable f);
};

Derivable cos(Derivable f) {
    return Derivable(cos(f.val), -sin(f.val) * f.deriv);
}
```
Now, if we have code that evaluates some function, it is enough simply to replace `double` with `Derivable` everywhere, and we get code that computes the same function together with its derivative. True, a question arises: where do we start? We know how to obtain new `Derivable` values from existing ones, but where do the very first `Derivable`s come from? In fact, it is clear. The expressions for our function contain, first, various constants that do not depend on $x$, and second, the variable $x$ itself. Any constant $c$ independent of $x$ must, of course, be replaced by `Derivable(c, 0)`, since its derivative is zero; and each occurrence of the variable itself by `Derivable(x0, 1)`, since the derivative of $x$ with respect to $x$ is one. (Here `x0` denotes, for clarity, the point at which we evaluate the function; in a real program the corresponding variable will most likely also be called `x`.)
Here is an example that computes the function $f(x) = a x^3 - \cos(x/2)$ together with its derivative:

```cpp
Derivable f(double x, double a) {
    Derivable xd(x, 1);
    Derivable ad(a, 0);
    Derivable two(2, 0);
    return ad * xd * xd * xd - cos(xd / two);
}
```
Naturally, to avoid writing such verbose code, it is easier to add an implicit type conversion to our class:

```cpp
Derivable(double c) : val(c), deriv(0) {}
```

and a named constructor for the independent variable:

```cpp
static Derivable IndependentVariable(double x) { return Derivable(x, 1); }
```

after which the code of the function `f` becomes even simpler:

```cpp
Derivable f(double x, double a) {
    Derivable xd = Derivable::IndependentVariable(x);
    return xd * xd * xd * a - cos(xd / 2);
}
```
This makes it straightforward to solve the equation $a x^3 - \cos(x/2) = 0$ by Newton's method. (For this I, of course, made the operators global, so that `a*xd` works; above they are class members only to keep the code uncluttered.) If you want to change the equation being solved, you need to change the code of the function `f` and nothing else; the derivative will automatically be computed correctly. (Of course, in this example it might be easier to compute the derivative by hand, because the function is simple, but as soon as the equations become more complicated, being able not to think about the derivative-computing code is very convenient.)
All this works even if our code contains fairly complex branches, loops, calls to other functions, and so on. The main thing is to make all intermediate variables (including the results of intermediate functions) `Derivable`; the rest of the code does not even need to change!
Of course, you can come up with a function for which this approach will not work: what if you decide to compute the function `f` by the Monte Carlo method? Still, the scope of automatic differentiation is quite wide. (And besides, if you really compute your function by the Monte Carlo method, how do you even picture its derivative?)
Generalizations
Finally, a few words about generalizations of this approach. In the case of several independent variables everything generalizes in the obvious way: we simply store the corresponding number of derivatives, one partial derivative per variable. It may be useful to make the `Derivable` class a template that takes the number of independent variables as a template parameter.
It is also easy to generalize everything to higher-order derivatives. I have not done this myself, but offhand it does not seem very difficult, although it would take some thought. First, one can manually derive formulas for propagating all derivatives up to the required order through each operator and elementary function, and hard-code these formulas. Second, one can think in terms of dual numbers (see the links at the end of the post), or, more simply in my opinion, in terms of Taylor series: if we store not the derivatives but the coefficients of the expansion, we only need to be able to perform operations on truncated series, which is quite simple.
Third, there is another interesting approach, which I glimpsed in one of the open-source automatic differentiation codes (see the links from Wikipedia). You can make the `Derivable` class a template that takes the data type of the function and derivative values as a template parameter (i.e. `Derivable<double>`), and then instantiate it with itself: a `Derivable<Derivable<double>>` carries a value, a first derivative, and, through the nested level, a second derivative, all propagated by the very same arithmetic.

Note also that what I have described here is the so-called "forward" mode of automatic differentiation; there is also a "reverse" mode (see the Wikipedia link), which is more efficient when a function of many variables must be differentiated with respect to all of them.
Links:
- http://habrahabr.ru/post/63055/
- http://en.wikipedia.org/wiki/Automatic_differentiation
- http://www.coin-or.org/CppAD/Doc/mul_level.htm (the `Derivable<Derivable<...>>`-style multi-level example)