Tail recursion in C ++ using 64-bit variables - Part 2

In the last note, I talked about problems with recursion in the Fibonacci function when using 64-bit variables as its arguments and compiling code using Microsoft Visual C ++. It turned out that the compiler includes tail recursion when 32-bit variables are used, but does not do this when switching to 64-bit ones. Just in case, I remind you that tail recursion is an optimization made by the compiler, in which some types of tail calls are converted into unconditional jumps. Learn more about tail recursion .

I decided that the problem lies in the Visual C ++ compiler itself and is apparently explained by the presence of a bug in it.

Fibonacci series of large integers are rarely computed, but this is a very good example to demonstrate how tail calls are implemented.

The results did not please me, and, following the advice of some readers (in the comments on this blog and on the Reddit and StackOverflow sites), I decided to sort out the problem and see how other compilers behave.
')
Let's take the same Visual Studio project that was used in the last post and which I posted on GitHub . This was the Fibonacci function:

typedef unsigned long long ULONG64; ULONG64 fib_tail_x64(ULONG64 n, ULONG64 res, ULONG64 next) { if (n == 0) { return res; } return fib_tail_x64(n - 1, next, res + next); }

As you must remember, when compiling this code in release mode (which is usually needed to get tail recursion due to the optimization / Ox command) for the Win32 platform, tail tail recursion did not work:

Tail recursion does not work! Assembly code is terrible due to the included frame pointers

There is, however, one thing that I did not try to try (and if to be honest, I simply forgot). This is the choice of the solution platform for building the project. In Visual Studio, a Win32 configuration that compiles a solution using the x86 processor instructions is the default configuration. In the above fragment of the assembler code, you can see that the registers used are EAX , EBX , etc. - are registers of a 32-bit processor, in accordance with the selected configuration.

If we switch the configuration to x64 , build and run the project, we get the following assembly code:

The tail recursion appeared !!!

Such a surprise! The tail recursion worked when using x64 registers ( RAX , RDX , etc.), and the assembler code became much shorter and cleaner!

Summarizing all the above about tail calls, you can create the following visual table:

It turns out that the problem manifests itself only when we use x64 variables, and we choose x86 as the target platform.

Honestly, at this stage I was not so sure about the presence of a bug in the MS compiler, therefore, still dissatisfied with the results, I decided to repeat the experiment with another compiler.

This time I decided to switch to another operating system and work with Clang, based on LLVM and installed on my Macbook as part of Xcode.

Terminal - clang version

To view the generated assembler code, I used the handy Hopper Disassembler utility, available for both OS X and Linux (but, as I understand it, it can also work with the gdb debugger).

I exactly repeated the same experiment. The following picture shows the first version of the generated assembler:

The tail recursion is clearly present here: the JNE instruction (Jump Not Equal, transition by inequality) is only a conditional transition, when the ZF flag (“zero” flag) is set to 0. The red arrow in the picture indicates the address where the transition occurs if the condition is true. The attentive observer must have already noticed that this code was not compiled for 32-bit processors: the assembler registers used are actually 64-bit ( RBD , RDI , etc.).

However, now our goal is to make sure that tail recursion will still work when choosing a 32-bit platform as the target.

The compiler can be forced to generate a 32-bit code using the -m32 key. After rebuilding the solution in 32-bit mode, I got the following:

Tail recursion enabled!

Hooray!!! By the type of registers used, we can conclude that the executable module is built just for the architecture we specified, and in the last line you can see an unexpected result: tail recursion worked in 32-bit mode !!!!!

EDIT 1

As I understood, many users failed to reproduce the problem (see comments below, as well as on Reddit ). I continued to experiment with compiler optimization settings, adjusting some parameters. In release mode, the configuration of the default optimization settings is as follows:

Default optimization settings

After several attempts, I changed the 'Whole Program Optimization' setting from / GL (on) to No (off). I also changed the 'Omit Frame Pointers' parameter because I was told that with the included frame pointers, the generated assembler code for x86 is much longer and uglier (besides, we could have saved a few ticks, avoiding cache misses). StackOverflow has a detailed explanation of how to disable frame pointers.

Anyway, when I recompiled the solution under Win32 with all the changes described above, I got an unexpected result:

X86 tail recursion - triggered when 'Whole Program Optimization' is turned off

EDIT 2

Several people wrote that, regardless of the setting of the WPO ( Whole Program Optimization ) parameter, they did not experience any problems with tail recursion. This at first puzzled me, but then I realized that I did not take into account one important detail - namely, the method of calling the Fibonacci function from main . Until now, the function call in my code to test the Fibonacci function in x64 mode was as follows:

 ULONG64 res = fib_tail_x64(40, 0, 1);

So far so good, there is nothing special about this code — I simply pass literals as arguments to the function.

And what if instead of them we will pass variables or pointers to a function? Let's try:

 ULONG64 a = 40; ULONG64 res = fib_tail_x64(a, 0, 1);

Although the code does not seem to have changed much, calling the function fib_tail_x64 using variables enables tail recursion independently of WPO (provided that we use the optimization key> = / O2 ). One reader on Reddit pointed out that an indirect function call through pointers (regardless of the use of literals) will give exactly the same result:

 volatile bool a = true; ULONG64 res = 0; ULONG64(*ptr)(ULONG64 n, ULONG64 res, ULONG64 next) = NULL; if (a) { ptr = &fib_tail_x64; } res = ptr(40, 0, 1);

Final conclusions

Although I initially sinned to have a bug in the Visual C ++ compiler that caused the tail recursion to be turned off, in the end, it turned out that this was not the case. The assembler codes generated by the VC ++ and Clang compilers for x86 platforms are very similar to each other. As a first conclusion, I decided that it is imperative to turn off the 'Whole Program Optimization' parameter if and only when a direct call to the x64 recursive function occurs, whose literals are passed as arguments .

After the second check, it turned out that some users did not experience the problem even when they called the function directly, using variables, or indirectly through function pointers.

I would be glad to hear any thoughts from someone from the team working on the Visual C ++ compiler on the causes of this " problem ."

That's all, thank you for your attention! To stay in touch with me, write to the mail through the Contact- form or subscribe to the page on Twitter!

See you soon!

Source: https://habr.com/ru/post/261029/

All Articles

Tail recursion in C ++ using 64-bit variables - Part 2

EDIT 1

EDIT 2

Final conclusions

More articles: