An article about using Python in scientific computing prompted me to write this one. It is a story that happened to my colleagues and me six years ago. At that time I had already done plenty with Delphi and Python, but only now do I feel I have worked enough with C/C++ to soberly assess the time spent "repairing" broken code, and the total development time in general. Yes, this is an article about code written by different people in Delphi, Python and C++ for the same task, within the same team.
Before describing the situation, I want to add a second preamble for those programmers who have never encountered programming in science and have no basic idea of what it looks like. Careful now, there are a few mini heart attacks ahead:
- The code simply has to contain a lot of variables, and many of them will have names like mu_e, mu_m, E, T_f, T, and so on - simply because that is how they appear in the formulas. A scientist doing research in the field may take a long time to recall a coefficient's full name, but will unmistakably recognize the letter it is denoted by in all the articles and say: "Aha! That's our mu..."
- Nobody protects the code from incorrect input, and there is no UX. The code is written so that scientists, graduate students and specially trained students can work with it - never typical users, and never hackers.
- You have no technical specification. You are doing research, so you write some code, look at the result it produces, analyze it, and depending on the results the next task takes shape, and the existing project is extended to meet it.
- Most of the code will be used once or twice. Once - if a graduate student or a researcher performs a certain calculation. Twice - if a student performs a calculation and then demonstrates the working code to their supervisor. Accordingly, the code is almost never abstract, and you can run into a crutch of any kind and size in it.
Enough preamble; let me begin my story. I was a little younger, my head was clear, my heart was hot, and my hands were cold and itching to do something. And it so happened that one of my first tasks in the new research group (after a change of supervisor) was analyzing someone else's code, which, for reasons unclear to its creator, did not work. What follows are three stories of debugging and tuning a program. One of them drifts slightly toward reflection; the other two describe actions actually taken, and all of them just as faithfully describe the average experience a scientist has with the corresponding programming language.
The task was the same for everyone: solve a differential equation (one with no analytical solution) under different initial conditions. The equation was large and motley - f(x) and x, describing the state of a system in some non-stationary process, entered it through derivatives and partial derivatives, through exponents and logarithms, and through assorted power functions. Some of the coefficients in the powers implicitly depended on them as well, and had to be recalculated at every step. Equations like this are quite common in theses and dissertations; substitutions are usually introduced so that such expressions fit on one line, but often even the substitutions do not help - written out, they cover a whole sheet. Predicting the shape of such a function (that is, the behavior of the system) merely from the form of the equation defining it is an absolutely impossible task for a human, even one with the most developed ability to analyze functions and the best of intentions. Accordingly, the program's task is to produce exactly that shape right away, by drawing a graph.
Such problems are solved simply: there is one approximately known point, which is taken as the initial data. Then, at each iteration of a loop, a step is taken away from it, computing the new values from the old ones. In the Euler method, the step is proportional to the derivative at the starting point. In the Runge-Kutta method, the dependence is somewhat more complicated, in exchange for somewhat better accuracy. There are other methods where, strictly speaking, no single point is fully known to you; instead you know the y coordinate of one point and the x coordinate of another. Then slightly more perverse methods are applied, but they always boil down to enumerating the possible options and applying Euler or RK many times over. Well, and having found the points, you must immediately send them to the interface as a plot. Sounds easy - let's get started.
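To make the setup concrete, here is a minimal sketch of such a solver in Python. The right-hand side `rhs` and its state-dependent coefficient are invented stand-ins for the real equation, which is not reproduced here; only the loop structure reflects the methods described above.

```python
import math

def rhs(x, f):
    """Hypothetical right-hand side: exponents, logs and powers,
    with a coefficient that itself depends on the current state."""
    mu = 1.0 + 0.1 * math.exp(-f * f)      # state-dependent coefficient (invented)
    return mu * math.log(1.0 + x * x) - f ** 3

def euler_step(x, f, h):
    """Euler: the step is proportional to the derivative at the current point."""
    return f + h * rhs(x, f)

def rk4_step(x, f, h):
    """Classical Runge-Kutta: a weighted blend of four derivative samples."""
    k1 = rhs(x, f)
    k2 = rhs(x + h / 2, f + h * k1 / 2)
    k3 = rhs(x + h / 2, f + h * k2 / 2)
    k4 = rhs(x + h, f + h * k3)
    return f + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

x, f = 0.0, 0.5                            # the one approximately known point
xs, fs = [x], [f]
for _ in range(10_000):
    f = rk4_step(x, f, h=1e-3)             # new values from the old ones
    x += 1e-3
    xs.append(x)
    fs.append(f)
# xs, fs now go straight to the plotting interface
```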
Let's start with C++

After checking the entire code, with its many one- and two-letter variable names, you come to the conclusion that the program must be calculating all the coefficients correctly, finding the values for the next step correctly, moving on to the next step correctly, and plotting the data correctly. Consequently, the program as a whole must work correctly, you decide, and you run it.
The computer freezes, and some very nasty error occurs, in some strange way connected with memory.
You are left with the problem of how to debug all this. The obvious way - turning on the debugger and walking through the flow of calculations - is not appropriate here. After all, you do not know how many times the loop has managed to run; what if it is a million, or two? So, to start debugging, you need to find the point of the crash. That means as many as four new stages: rewrite the code so that it writes the loop counter to a log; note the last iteration number before the crash; rewrite the code again to stop at that iteration and set a breakpoint in the debugger; and walk through that last iteration in the debugger, hoping the error becomes clear (see the sketch after this list). Moreover:
a) you have no guarantee that the log will survive the OS crash. You may have to do the work visually, splitting the first stage in two: first print the loop counter to the console and try to memorize the last number approximately, then rewrite the code to insert an artificial brake after that memorized iteration - on the next run the numbers will start arriving more slowly before the crash, and you will have time to read off the last iteration (say, the 6524th). Which means the number of crashes will be not 3 but 4.
b) you have no guarantee that you will understand, or even find, the error the first time. Most likely, somewhere before the error itself the code will drag you deep down the rabbit hole - through the winding labyrinths of the standard libraries - and even if you walk along just one branch, you will have to examine a significant part of the tree of possibilities around it to understand what happened. The functions there often just toss scraps of data from one check to another; to understand many of them you also need to understand the physical structure of machine operations, and understanding of the rest comes to a person only after an attempt to write their own operating system. Moreover, even after you decide you have found the problem, there is no guarantee you are right. A long debugging process begins, and you still have plenty of tests ahead... Instead of four crashes we get n, where n > 3.
c) the crashes required for these tests are, generally speaking, a rather sketchy thing. What else will break? All the more so when you have to trigger them many times.
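A minimal sketch of that first stage - shown in Python for brevity, though the program in the story was C++; the iteration numbers are invented:

```python
import sys
import time

def step(i):
    """Stand-in for one iteration of the numerical scheme."""
    return i * i

BRAKE_FROM = 6000  # hypothetical: brake shortly before the suspected crash region

for i in range(10_000):
    step(i)
    # Write the counter unbuffered, so the last number survives a hard crash.
    print(i, file=sys.stderr, flush=True)
    if i >= BRAKE_FROM:
        time.sleep(0.01)  # artificial brake: numbers arrive slowly enough to read
```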
Now you sit and wonder whether it is worth embarking on this plan at all. It looks cumbersome, and the uncertainty scares you a little - who knows where the error is hiding.
Pascal / Delphi

You check the code. Generally speaking, this can take more time, but in this particular case you like the fact that all 66 coefficients are declared and initialized at the beginning, rather than anywhere at all in the code, for example right before use. You check the coefficients against the table, then simply check all the formulas. Again, everything is fine. You run the program and - lo and behold - the graph is drawn. It is drawn, reaches the end of the stretch of the numerical axis left for it, and stops. The program did not crash; nothing terrible happened to the OS. But something did happen to the data. The curve runs smoothly up to coordinate 6524, and then a mad flea suddenly bites it. It jumps up and down completely at random and fills the remaining half-plane on the right with pitch darkness.
At this stage you are depressed by the failure. But you are still in a winning position compared to C++, because:
a) the system did not crash;
b) you have already localized the iteration where the error occurs;
c) you can try new fixes without fear that the system will suddenly go down;
and d) you can see one displayed value, and it is wrong. Already at this stage you can guess that it may be, say, an overflow, or a floating-point error. It remains to pin down exactly where, so as not to create a thousand extended-precision variables. You add a line of code to stop at iteration 6524, walk through with the debugger, and notice a suspicious operation of a very small number on a very large one, starting from which the error persists and leads to divergence. Your function does have such a particular point: a tiny region where the state has a local minimum, but just outside that small fence lies the potential for a steep descent, one that makes the entire solution diverge, drives the values out of range, and on into overflowing their types. And naturally, at this step the numerical method misses the desired local minimum slightly, due to a banal lack of precision in just two coefficients. It becomes noticeable only here, in that narrow place where even a slight deviation sends the iterations down the bad path of divergence. So you need to rewrite the code once more, screwing in a couple of extended-precision versions... But again some glitch occurs, although the graph looks a little different - the black part of it, I mean. It keeps kicking. How much more conjuring over the code is needed? Well, more work on handling long numbers, and finding (after fixing those two) one more "leaking"* coefficient.
* "Leaking" because going beyond the type's range would have to change a bit to the left of its physical storage; that is, we need one more bit, or several, that an object of this type simply does not contain. OK, that did not take so long.
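A toy illustration of the two failure modes in that story - range overflow and lost precision - using NumPy's fixed-width floats as stand-ins for the narrower and wider Delphi types (the numbers are invented):

```python
import numpy as np

# Range: a result that exceeds the type saturates to inf (with a warning).
print(np.float32(1e30) * np.float32(1e30))   # inf   (float32 max ~ 3.4e38)
print(np.float64(1e30) * np.float64(1e30))   # 1e+60 (fits in float64)

# Precision: a small correction to a coefficient is silently lost.
print(np.float32(1.0) + np.float32(1e-8))    # 1.0        (eps(float32) ~ 1.2e-7)
print(np.float64(1.0) + np.float64(1e-8))    # 1.00000001 (eps(float64) ~ 2.2e-16)
```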
Rewrite in Python

You check the code and run it. No problem is observed. The graph is drawn in full, over the whole area. No jumps and no blackness, no crashes.
The interpreter's magic helps the good Python see how much memory my number requires and allocate exactly as much as it needs (its integers are arbitrary-precision and simply grow).
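A minimal demonstration of that point, assuming nothing beyond the standard library:

```python
n = 2
for _ in range(10):
    n = n * n           # repeated squaring: 2, 4, 16, ... up to 2**1024
print(n.bit_length())   # 1025 bits - far beyond any fixed 64-bit integer type

# Fixed-width float arithmetic, by contrast, still saturates:
print(1e308 * 10)       # inf
```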

Development is complete.
The attentive reader may note that we scientists might soon want to run this code over a range of parameters, placing it inside a nested loop - and then Python's speed will stab us in the back. However, I should note that for other parameter values the code could have weak points elsewhere, in other computed coefficients. And then a person has a choice: either leave the Python code to work all night for the sake of a single graph, or leave the C++ code to work while going off to make tea - but on coming back, there is a tremendous chance of finding the system hung, or errors of another kind. They will sit down to fix the new weak point, then waste one more tea trip, and in the end will spit and start writing more abstract code that suits their growing requirements. Except that writing abstract code that monitors the variables in various computation streams and gives them more memory when necessary - while not forgetting to shrink types back down so as not to clutter memory - means writing a significant part of the Python interpreter.
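For completeness, a sketch of the nested sweep the reader has in mind; the parameter names and the toy solver are invented, and each full solve is a pure-Python loop, which is exactly where the interpreter's overhead adds up:

```python
import itertools
import math

def solve(mu_e, T_f, h=1e-3, n_steps=10_000):
    """Stand-in for the single-run solver above, parameterized."""
    f = 0.5
    for i in range(n_steps):
        f += h * (mu_e * math.log(1.0 + (i * h) ** 2) - T_f * 1e-4 * f ** 3)
    return f

# 100 full solves: an overnight job in Python, a tea break in C++ (if it survives).
for mu_e, T_f in itertools.product([0.1 * k for k in range(1, 11)],
                                   [100.0 * k for k in range(1, 11)]):
    result = solve(mu_e, T_f)
```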
So I have been using Python for almost all my scientific work ever since (I did one piece in a specialized package, and only for the sake of a beautiful visualization for a special occasion), and have even figured out some of its subtleties. Which is what I wish for you too.
Use Python. Advance science. Rays of goodness.