I was surprised to find that there were no posts here about aliasing. That needs correcting, because in any complex C++ program aliasing is bound to show up somewhere. It can be good, enabling clever optimizations, and it can be bad, introducing especially slippery bugs. Below is a brief look at both cases (with the usual "the compiler strikes back" theme, of course; for a change, today's compiler is gcc).
About aliasing
What is aliasing? Very simple: it is when several different pointers point to the same memory location. For example:
int A;
int * B = &A;
int * C = &A;
In this example, the variable A suddenly has three different names (aliases): A, *B and *C. This is perfectly legal code. The compiler handles all three names correctly: whatever is written to A can be read through *B, and vice versa. Everything is fine.
About optimization and __restrict
Except for one small thing: possible optimizations. The compiler must take aliasing into account not only in obvious cases like this one, but also wherever a human implicitly assumes there is none. For example:
void SumIt ( int * out, const int * in, int count )
{
    for ( int i=0; i<count; i++ )
        (*out) += *in++;
}
An ordinary function with an out-parameter; nothing hints at trouble. But trouble is there. Nobody has guaranteed the compiler that the variable at *out does not overlap the input data at *in, and it has no right to assume so on its own: who knows what the programmer intended? Therefore *out is written back to memory on every iteration of the loop, even at the maximum optimization level. The disassembly (gcc -O3 -S, gcc 4.4.3, Ubuntu x64) looks like this:
.L7:
    addq    $4, %rax
    addl    (%rsi), %ecx
    cmpq    %rdx, %rax
    movq    %rax, %rsi
    movl    %ecx, (%rdi)    ; <-- !
    jne     .L7
However, you can tell the compiler that out does not overlap in. Humanity invented the __restrict modifier for exactly this purpose.
void SumIt ( int * __restrict out, const int * __restrict in, int count )
{
    for ( int i=0; i<count; i++ )
        (*out) += *in++;
}
.L14:
    addq    $4, %rax
    addl    (%rsi), %ecx
    cmpq    %rdx, %rax
    movq    %rax, %rsi
    jne     .L14
    movl    %ecx, (%rdi)
So what, one instruction? Modern processors are smart, with big caches and plenty of pipelines. In this tiny example the store will surely hit the cache immediately, the extra instruction will probably execute in parallel with something else, and I would guess the difference cannot even be measured?
1783293664 in 103818 usec
1783293664 in 69818 usec
Not so, it turns out. Oops: the speedup is about 1.5x. That is the price of a single instruction (and of two modifiers). Usually it does not matter, but for heavily loaded inner loops it is worth knowing.
About strict aliasing and bugs
As you can see, eliminating aliasing can yield a good speedup. Apparently with this in mind, the C standard (the rule was already in C89 and is kept in C99) and, following it, C++ have a rule about strict aliasing. A reference for those fluent in reading and understanding the Standard: N1124, 6.5 (7). A normal person will hardly find it there: neither the word "strict" nor the word "aliasing" appears in that paragraph. ;) (I managed to find it quickly only because footnote 74 contains the word "aliased".) Its practical meaning, however, can be explained quite simply.
In strict aliasing mode, the compiler assumes that objects accessed through pointers of "substantially different" types cannot occupy the same memory, and may use this for optimizations. This is harmless when the pointers really do point to different places, or when they are used far enough apart from each other. But it is deadly when the pointers point to the same memory, are used close together, and the compiler is gcc.
#include <stdio.h>
typedef unsigned int DWORD;
typedef unsigned short WORD;
inline DWORD SwapWords ( DWORD val )
{
    WORD * p = (WORD*) &val;
    WORD t;
    t = p[0];
    p[0] = p[1];
    p[1] = t;
    return val;
}
int main()
{
    printf ( "%d\n", SwapWords(1) );
    return 0;
}
This simple program prints 65536 when built with g++ test.cpp, but 1 when built with g++ -O3 test.cpp. What the heck?!
The thing is that starting with -O2, -fstrict-aliasing is enabled automatically. The compiler then believes that *p cannot, in principle, point to where val is stored, and optimizes the whole thing to death: if the writes through p cannot touch val, the function simply returns its argument unchanged, so SwapWords(1) can be replaced with the constant 1.
In this example, though, the problem is not really serious: if you enable -Wall (or at least -Wstrict-aliasing), the compiler honestly complains about the suspicious code:
test.cpp:8: warning: dereferencing pointer 'p.14' does break strict-aliasing rules
This is easy to fix. The kindergarten method is to disable the damned strict aliasing with the -fno-strict-aliasing option. The by-the-book fix, which de facto works everywhere, is to pass the value through a union: any compiler graciously allows union members to alias each other. Any pointer tricks may theoretically backfire (undefined behavior), and with gcc it is easy to turn that theory into practice and back again (-fstrict-aliasing).
inline DWORD SwapWords ( DWORD val )
{
    union { DWORD d; WORD v[2]; } u;
    u.d = val;
    WORD t = u.v[0];
    u.v[0] = u.v[1];
    u.v[1] = t;
    return u.d;
}
Hooray? Alas, there is one small catch:
-Wstrict-aliasing guarantees nothing. It is not enough to catch all aliasing that is incompatible with the current compilation mode. I don't have a short, and therefore clear, example, so you will have to take my word for it: add a little template stuffing, a functor or two, and the strict aliasing violation is cleverly disguised and no longer triggers a warning. In a program that makes heavy use of STL and/or Boost, I suspect it is quite easy to violate strict aliasing unnoticed somewhere deep in the code. Third parties also report that tricks with casting to void * and back successfully suppressed the warning, at least on gcc 4.1.x, while of course still producing broken code.
Despite the undefined behavior, it will not format your hard drive, of course. (Well, not right away.) But the compiler can easily reorder memory reads and writes for the sake of optimization. It looks like this:
#include <stdio.h>
#include <stdint.h>
typedef unsigned int DWORD;
inline uint64_t GetIt ( const DWORD * p )
{
    return *(uint64_t*)p;
}
int main()
{
    DWORD buf[10];
    uint64_t t;
    buf[0] = 1;
    buf[1] = 2;
    t = GetIt(buf);
    buf[2] = 3;
    buf[3] = 4;
    printf ( "%d, %d, %d, %d\n", buf[0], buf[1], int(t>>32), int(t) );
    return 0;
}
Again it prints somewhat unexpected numbers: 1, 2, 32573, -648193368. Unlike the previous example, where the compiler simplified SwapWords() out of existence, here the read and the writes get reordered. The compiler concludes that GetIt(buf) does not depend on the contents of buf, and so places the "call" to GetIt() wherever it sees fit. Here the value ends up being read before the buffer is even filled.
mov    (%rsp),%r8          ; t = GetIt(buf)
...
movl   $0x1,(%rsp)         ; buf[0] = 1
movl   $0x2,0x4(%rsp)      ; buf[1] = 2
movl   $0x3,0x8(%rsp)      ; buf[2] = 3
movl   $0x4,0xc(%rsp)      ; buf[3] = 4
mov    %r8d,%r9d           ; r9 = int(t)
shr    $0x20,%r8           ; r8 = int(t>>32)
callq  0x400510 <__printf_chk@plt>
As a result, some variable holds an incorrect value (either too old or too new), with all the consequences that follow. Hunting down such a nasty bug can take a long time and may never succeed: for it to surface at all, the optimizer has to decide to optimize exactly this "blind" spot, and the optimization then has to manifest itself in a way that makes the results visibly wrong.
How do you fight this reliably? I don't know of suitable automatic methods or tools. I used to think the compiler catches such cases more or less; now, however, I know it can miss a completely trivial conversion if it is wrapped slightly in templates (and perhaps even in plain functions). So the fight comes down to prayer, fasting and strict discipline. Changed a pointer type? Think about the side effects. Put yourself in the compiler's shoes and think for it: which accesses it may consider unrelated, what it may reorder, and where aliasing breaks.
Side note: a classic, detailed post on all the other subtleties and tricks of strict aliasing was written by Mike Acton:
cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
What about MSVC?
There are no strict aliasing problems there. Moreover, there is not even an option to enable it. Apparently, MS decided that the world has far less strictly C99-compliant code, free of subtle aliasing pitfalls, than ordinary code, so there is no need to create difficulties. The educational mission is left to gcc, which, naturally, quietly generates a bug or two along the way.
This automatically means that the optimization tricks with __restrict pointers matter somewhat more there. For example, for void SumIt (int64_t * out, const int * in, int count), gcc is entitled by the strict aliasing rule to "guess" that out cannot lie in the middle of in; MSVC cannot make that guess. You have to either add __restrict by hand, or manually cut the stores to a minimum: a local variable the compiler can keep in a register on its own.
It is important to understand that a class member is also data, accessed through the this pointer. Therefore, repeatedly accessing a class member inside a loop can compile into constant memory traffic.
Summary
1. Writing an inner loop? Remember aliasing and __restrict.
2. Casting a pointer? Remember strict aliasing and the explosive potential of reordered side effects.
3. Using gcc? Remember that -fstrict-aliasing is on by default (from -O2), and do not ignore -Wall.
4. Using MSVC? Remember that strict-aliasing-style optimizations are simply not available there, and optimize by hand.
5. See a strict aliasing warning? Figure it out; it may be undefined behavior in the making.
6. Don't see a strict aliasing warning? The violation may be there anyway. Like the gopher: you don't see it, but it's there.
6.1. Compilers lie, versions are unreliable; -Wall sometimes works (you do need to turn it on!), but it guarantees nothing.
6.2. You cannot trust even yourself; trust only benchmarks, disassembly, prayer and fasting.