
On the question of Bezier curves and speed on Arduino, part two

We press on, little by little




In my previous post I showed how to speed up the calculation of points on a Bezier curve by:

  1. Transforming the calculation formulas: a speedup of about 3 times.
  2. Switching from floating-point to fixed-point numbers: almost no speedup by itself, but it makes step 3 possible.
  3. Replacing the division operation with a multiplication and a shift: a further 40% speedup.

A sad digression
I made an inaccuracy in the last formula: the calculation could have been sped up a little more by folding in one more constant expression and eliminating a multiplication, giving 410 cycles per point instead of 502. Unfortunately, none of the readers of the previous post pointed this out in the comments... and I had hoped for it, which means I failed to interest my readers enough to read my opuses properly (that is, carefully). Okay, let's try one more time.


At the end of that post I said that the calculations can be made even faster, and, as the saying goes, said and done.
Let me remind you once again of the resulting formula for calculating a point on the Bezier curve:

P = (A1 * u + B1) * u + C1 (=> 2* 2+)

The next speedup comes from a peculiarity of the task: we do not need the curve's value for an arbitrary value of the parameter u, but a series of values as the parameter changes (here, increases) by a known and, moreover, fixed step (in our case, one). This allows us to use the method described below. In my day it was called the method of finite differences (if memory serves, at least that is what it was called in lectures); the whole Internet is at your disposal, there may well (surely, even) be another name for it.

Consider the case P = A * u (=> 1*), and suppose we know the value P0 for some argument u0. Then the value at the next argument u0 + 1 can be computed as P = A * (u0 + 1) = A * u0 + A = P0 + A (=> 1+). Without losing any accuracy, we replaced the multiplication operation with an addition, which is much faster.

Now a more complex example, P = A * u * u (=> 2*). By analogy, P = A * (u + 1) * (u + 1) = A * (u * u + 2 * u + 1) = A * u * u + 2 * A * u + A = P0 + 2 * A * u + A (=> 2* 2+). At first glance we gained nothing, but if we precompute A' = 2 * A, we get (=> 1* 2+), already a possible win. But we will not stop there: applying the method we already know to the expression A' * u, we get two operations on two variables: P = P + A' + A; A' = A' + A'' (=> 3+). And if we note that the initial value of A' can simply be increased by A, we are left with just two additions instead of two multiplications. Yes, we needed two extra variables, but this is the classic exchange: we pay with memory for time.

It remains only to compute the correct initial values, but that is done once, before the iterations begin; and if the initial parameter value is u = 0, it is entirely trivial: P = 0, A' = A. Let us test our method over several iterations (this is completely unnecessary, since correctly applied mathematics cannot give a wrong answer, but it helps us see what is happening). So:
iteration 0: u = 0; P = 0 (correct); A' = A; A'' = 2 * A;
iteration 1: u = 1; P = 0 + A' = 0 + A = A (correct); A' = A' + A'' = A + 2 * A = 3 * A;
iteration 2: u = 2; P = A + 3 * A = 4 * A (correct); A' = 3 * A + 2 * A = 5 * A;
iteration 3: u = 3; P = 9 * A (correct); A' = 7 * A, and so on.

Note the connection between these formulas and Newton's method for computing the value of a function in the neighborhood of a point with a known value. Moreover, since our function is of second order and all derivatives from the third onward are zero, we can safely replace the approximate equality with an exact one. The only difference from that method is that we keep moving the point around which we build the new neighborhood, in order to avoid multiplication operations in the original formulation.

Let us rewrite our original expression for the Bezier curve in expanded form:

P = A1 * u * u + B1 * u + C1

To compute it by this method we need 2+ for the first term (and two variables), 1+ for the second (and one variable), and 2+ to add everything together (=> 5+). Even this (suboptimal) approach can be expected to beat 2* 2+, but things are much better. Obviously, addition is associative (thank you, captain), so we can group the terms and fold the constant parts into new expressions, which gives the following algorithm:

1. Initial values: P = C1; A' = A1 + B1; A'' = 2 * A1.
2. At each next step: P = P + A'; A' = A' + A'' (=> 2+), which is undoubtedly faster than Horner's scheme.

We implement our algorithm as a program (not as trivial as it may seem, since I glossed over the need for shifts, but not too hard: about 20 minutes), get a computational complexity of (=> 2+ 1>>), measure the speed, and it comes out at 140 cycles per point (a 3x speedup), which is close to the ideal result. To get the ideal (in a certain sense) version, we need to pay attention to the width of the operands in the formulas. Right now all operations are carried out on long (32-bit) integers, which may be excessive in places. Reducing the operands to the minimum necessary width could gain another 20-25 percent, but that requires dropping down to assembler (C has no suitable means for such operations) and a careful analysis of the original problem's data. I leave that to the reader; we have already accelerated the calculation by 1900/140 = 13 times compared to the "naive" approach.

One last remark on the implementation: we still eliminate the division operation by replacing it with multiplication by a constant, which we fold into the initial data, plus a shift by a constant multiple of 8. This can be done in several ways with slightly different execution times; I leave such experiments to the inquisitive reader.

However, quite unexpectedly, a problem arose: we can clearly account for two additions of 32-bit numbers, which should take 4 + 4 = 8 cycles, plus 8 * 3 * 2 = 48 cycles for operand transfers, 4 cycles for taking the shift result, 4 cycles for calling and returning from the calculation procedure, and 4 cycles for the loop itself; it is unclear where another 60 cycles come from. Earlier we simply did not notice this against the background of hundreds of cycles of computation, but now we can pay attention. The excess cycles were easily found: leftover debugging operations remained in the loop. After cleaning everything up neatly, only 120 cycles remain, and the purpose of each is understandable (well, not so completely understandable, but still). Next we measure the empty loop: 22 cycles (I wonder what it is doing there all that time, but never mind), which leaves a pure computation time of 98 cycles. Note that the estimate of the resulting speedup changes: instead of 1900/140 = 13 we get (1900-50)/(140-40) = 19 times, which has no practical significance, since the loop is still necessary, but lets us raise our self-esteem even higher.

And a final remark: as is easy to see, we began hunting for fleas only after we had dealt with the larger beasts and the fleas' existence became obvious and, moreover, significant. I strongly recommend exactly this approach (and I am not alone in this recommendation) when optimizing programs for speed.

Well, in conclusion, about the caveat "in a certain sense": if the task is to sequentially compute the coordinates of successive points on the Bezier curve as the parameter (representing time) changes, then the proposed algorithm (after all the optimizations) cannot be improved. But if the task is reformulated, for example, setting the goal of simply drawing the Bezier curve on the screen (with no reference to time), then there is a very promising method, keyword Bresenham, but "that is a completely different story," to which I will dedicate a separate post (maybe, someday, if the elder sister does not mind).

Source: https://habr.com/ru/post/429254/
