About one compiler optimization bug

I originally planned to devote this post to a bug in the 64-bit xlc compiler on IBM AIX servers. I had spent many hours trying to track it down without success. It turned out, however, that the bug affects many compilers, and Visual Studio 2010 with SP1 installed is no exception. Amusingly, one could almost conclude that Microsoft and IBM developers work together on their optimizing compilers.

A little background. There is a scientific project that has been developed in C++ for a long time and is now being successfully ported to many platforms, including HP-UX, IBM AIX and Oracle Solaris mainframes. Porting essentially works like this: fix the compile-time errors, run the test suite, and if all the tests pass, conclude that the code works.

The speed of the mathematical routines is very important, so the code is compiled with the -O2 speed optimization flag. On IBM AIX, however, xlc for some reason produced code that failed the test suite. Without -O2, everything worked fine.

Of course, I could have tried to track down the bug directly on the IBM AIX mainframe if I had had enough time. But there was no debugger to help me, because the bug did not show up in debug builds. I had to hunt for it the old-fashioned way, by inserting printf calls into sections of the code. I was not given remote access to the IBM AIX machine, so I had to work directly in the data center. In the few hours I spent at the terminal I learned nothing useful beyond the fact that the bug existed and was quite reproducible. As a result, the bug stayed in the code for a long time.

This went on until I tried to port the code to Visual Studio 2010 SP1.

And lo and behold! The bug reproduced in its original form. In 32-bit mode everything worked both with and without -O2, while in x64 mode with -O2 one of the tests failed exactly as it had on IBM AIX! This was a breakthrough. Now, with no time pressure, I could methodically dig through the unexplored code, experimenting and comparing the printf output of passing and failing test runs.

Results came quickly. Below is an extract from the full code, reduced to the smallest size that still shows the problem. Because N is set to 4, this code fails in 32-bit mode as well. If you set #define N 8, you get the original code, which works in 32-bit mode but fails on x64. Not everyone has x64, and many readers will probably want to try it, so for simplicity here is the version that fails on both x86 and x64.

Let's compile this code with and without the -O2 flag:

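    #include <stdio.h>

    #define N 4

    unsigned char a[N];

    void f(unsigned int k)
    {
        int i;
        for (i = 0; i < N; ++i)
        {
            a[i] = k & 0xf;   /* store the low nibble */
            k >>= 4;          /* move on to the next nibble */
        }
    }

    int main(void)
    {
        int i;
        static unsigned int x = 0x76543210;
        f(x);
        if (a[3] == 2)        /* the correct value is 3 */
        {
            printf("Error!\n");
        }
        for (i = 0; i < N; i++)
        {
            printf("%02x ", a[i]);
        }
        /* sizeof yields size_t, so cast it to match %d */
        printf("\nsizeof(void*)=%d\n", (int)sizeof(void*));
        return 0;
    }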

Save the program as test32.c.

We compile with Visual Studio 2010 SP1 for a 32-bit operating system. The following batch file builds and runs both versions:

    call "C:\Program Files\Microsoft Visual Studio 10.0\VC\vcvarsall.bat"
    cl /nologo test32.c /Fano_opt >nul
    echo.
    test32
    pause
    echo.
    cl /nologo -O2 test32.c /Fawith_opt >nul
    test32

After launch, we get the results:

    Setting environment for using Microsoft Visual Studio 2010 x86 tools.

    00 01 02 03
    sizeof(void*)=4
    Press any key to continue . . .

    Error!
    00 01 02 02
    sizeof(void*)=4

With optimization enabled, the program prints 00 01 02 02 instead of 00 01 02 03.

Why does this happen?

Let's look at the assembly listing with_opt.asm produced with optimization enabled.

The listing no_opt.asm, produced with optimization disabled, is of little interest here, since that build works correctly. If you are curious, you will find it in your working directory.

Optimization enabled:

    _TEXT SEGMENT
    _main PROC ; COMDAT
    ; Line 16
        mov eax, DWORD PTR ?x@?1??main@@9@9
        mov cl, al
        shr eax, 4
        mov dl, al
        shr eax, 4
        and al, 15 ; 0000000fH
        and cl, 15 ; 0000000fH
        and dl, 15 ; 0000000fH
        mov BYTE PTR _a, cl
        mov BYTE PTR _a+1, dl
        mov BYTE PTR _a+2, al
        mov BYTE PTR _a+3, al
    ; Line 17
        cmp al, 2
        jne SHORT $LN4@main
    ; Line 18
        push OFFSET ??_C@_07NPIJMNAB@Error?$CB?6?$AA@
        call _printf
        add esp, 4
    $LN4@main:

Note that f() is not actually called. The compiler inlines it, unrolls the loop and fills the array a directly from the value of x. The unrolled code is wrong, though: elements _a+2 and _a+3 both receive the same value from the al register. Extracting four nibbles requires three shifts, but the listing contains only two shr eax, 4 instructions, so the value for a[3] is never computed.

The same happens when building a 64-bit executable. To target x64, replace the first line of the batch file with:

 call "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\vcvarsall.bat" amd64

We get the same incorrect result, this time with sizeof(void*)=8, which confirms that the code is 64-bit:

    Setting environment for using Microsoft Visual Studio 2010 x64 tools.

    00 01 02 03
    sizeof(void*)=8
    Press any key to continue . . .

    Error!
    00 01 02 02
    sizeof(void*)=8

The x64 assembly code looks like this:

    main PROC ; COMDAT
    ; Line 15
    $LN21:
        push rbx
        sub rsp, 32 ; 00000020H
    ; Line 16
        mov ecx, DWORD PTR ?x@?1??main@@9@9
        movzx eax, cl
        shr ecx, 4
        and al, 15
        mov BYTE PTR a, al
        movzx eax, cl
        shr ecx, 4
        and cl, 15
        and al, 15
        mov BYTE PTR a+1, al
        mov BYTE PTR a+2, cl
        mov BYTE PTR a+3, cl
    ; Line 17
        cmp cl, 2
        jne SHORT $LN4@main
    ; Line 18
        lea rcx, OFFSET FLAT:??_C@_07NPIJMNAB@Error?$CB?6?$AA@
        call printf
    $LN4@main:

Here, too, f() is not called: the compiler computes the values directly from x and fills the array a. Again there are only two shr ecx, 4 instructions, so elements a+2 and a+3 both receive the same value from the cl register, which is wrong.

In the end, I rewrote f() as follows:

    void f(unsigned int k)
    {
        int i;
        for (i = 0; i < N; ++i)
        {
            /* take nibble i directly from k instead of shifting k in place */
            a[i] = (k >> 4*i) & 0xf;
        }
    }

After that, everything worked correctly with Visual Studio (x86 and x64) and with xlc on IBM AIX.

With the -O2 flag, the tests now run roughly 2.5 to 3 times faster.

UPD: To avoid misunderstandings, I changed int to unsigned int in the code; the bug remained. The previous version can be found here.

UPD2: Received an official response from Microsoft:
Posted by Microsoft on Nov 2, 2011 at 11:17 am

Thanks for reporting this issue. I can confirm this problem with VS2010 SP1. It will be fixed in Visual Studio.

ian bearman
VC++ Code Generation and Optimization Team

Source: https://habr.com/ru/post/131615/
