I will start, perhaps, with the obvious (the image to the left of this text). The picture is fairly well known: it shows that Intel employees, instead of rings, wear Atom processors and grains of rice on their fingers, demonstrating the size of the Intel Atom processor compared to a grain of rice. And I will demonstrate to you, literally "on the fingers," some simple and, I hope, useful tips for C/C++ programmers on optimizing software for the Intel Atom. In general, the only openly available source of optimization wisdom for Intel processors is the Intel® 64 and IA-32 Architectures Optimization Reference Manual, which contains an entire chapter (#13) on the Atom. It offers numerous optimization tips... but only for those who write in assembly language. Here is a typical example: "Assembly/Compiler Coding Rule 4. For Intel Atom Processors, ...". Do you understand everything? Great! But the number of Atoms in the universe is constantly growing, while the number of people writing software in assembly is shrinking, so many readers could do no more than nurse an inferiority complex. For them, higher-level tips follow below.
Multithreading, or more precisely two-threading, to match the number of logical cores. Hyper-Threading (available on most Atom models) is very efficient on the Atom. So if you split your code into two threads, you can count on a performance boost of 30-50% (versus the expected 15-20% on Intel desktop architectures).
Memory alignment. Aligning memory to 16 bytes can really yield about a 10% gain in an application that actively allocates and copies memory.
A serious threat to performance is repeated calls to short (small) functions. Whenever possible, such functions should either be merged or forcibly inlined. The performance gain from this simplest of optimizations can reach 20% on serious applications! Short functions can hide in the libraries you use (for example, math libraries), as well as in Linux shared objects (PIC code). By the way, the following function is also effectively short whenever bar == 0:

void foo() {
    if (bar) {
        /* do something necessary, long and complicated */
    }
}
The cache in the Atom is small, so data access locality is especially relevant here. If possible, do not "jump" around arrays, but traverse them sequentially; group together the fields you use often, and do not pollute the cache with accesses to dead data.
The Atom works very slowly with data of type double: about 5 times slower than with float, in both scalar and vector (SSE) instructions. So, whenever possible, refuse double precision.
The Atom is also slow at division. It is better not to divide at all, but if you have to, then know that unsigned division is faster than signed, and 8-bit division is faster than 16-bit, which in turn is faster than 32-bit. The Intel compiler has a special flag, "-no-prec-div", that trades division precision for speed. One more thing: there is a single division unit in the processor, shared by all threads, so it can become a bottleneck.
Working with float (even in the scalar case) is faster through vector instructions (or intrinsics). The gcc flag "-mfpmath=sse" makes the compiler generate SSE code instead of x87. The Intel compiler does the same automatically.
And finally, the compilers. Here are the recommended flags for compiling for the Atom. In addition to the accelerations described above, the Intel compiler optimizes the code at the level of micro-instructions (Intel engineers have honestly studied the manual mentioned at the beginning :)).

gcc < 4.5:   -march=core2 -mtune=generic -mfpmath=sse -O3 [-ffast-math]
gcc >= 4.5:  -march=atom -mfpmath=sse -O3 [-flto] [-ffast-math]
icc < 11.1:  -xL -O3 -ipo [-fno-alias] [-no-prec-div]
icc >= 11.1: -xSSE3_ATOM -ipo [-ansi-alias] [-no-prec-div]
The tips are given simply as recipes, without explaining why things are this way; the standard answer would sound like "Thus God created the Atom." If deeper explanations are required, comments and personal correspondence are at your service.