
Let's look at what is new in the GCC compiler for Intel Atom processors and how it affects the performance and code size of the well-known EEMBC CoreMark benchmark.
Above is a graph showing CoreMark performance for the base and peak sets of options across different versions of GCC, relative to the performance of the base set of options on GCC 4.4.6 (higher is better).
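The normalization behind such a graph can be sketched as follows. The scores here are made-up placeholders rather than measured results, and the function name is purely illustrative.

```python
def relative_performance(score, baseline_score):
    """Return a score as a ratio of the baseline (1.0 means equal, higher is better)."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return score / baseline_score


# Hypothetical CoreMark scores (iterations per second), for illustration only.
baseline = 2000.0   # stands in for the base set of options on GCC 4.4.6
candidate = 2500.0  # stands in for some other compiler/option combination

print(f"{relative_performance(candidate, baseline):.2f}")  # prints 1.25
```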
The following compiler options were used for testing:
- base set of options (base): “-O2 -ffast-math -mfpmath=sse -m32 -march=atom”
- base set of options (base) + if-conversion: “-O2 -ffast-math -mfpmath=sse -ftree-loop-if-convert -m32 -march=atom”
- peak set of options (peak): “-Ofast -funroll-loops -mfpmath=sse -m32 -march=atom”; for versions 4.4 and 4.5, “-Ofast” was replaced by “-O3 -ffast-math”
The optimal GCC options for x86 are described in more detail here. It is worth noting that the “-flto” option does not improve CoreMark performance.
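As a minimal sketch of how a test script might pick these flags per compiler version, the helper below encodes the substitutions mentioned in this article: “-Ofast” becomes “-O3 -ffast-math” before GCC 4.6, and “-march=atom” becomes “-march=i686 -mtune=generic -mssse3” on GCC 4.4. The function and set names are hypothetical and not part of any real tool.

```python
BASE = ["-O2", "-ffast-math", "-mfpmath=sse", "-m32", "-march=atom"]
PEAK = ["-Ofast", "-funroll-loops", "-mfpmath=sse", "-m32", "-march=atom"]


def option_set(name, gcc_version):
    """Return the flag list for 'base', 'base+ifcvt' or 'peak'.

    gcc_version is a (major, minor) tuple such as (4, 5).
    """
    if name == "base":
        flags = list(BASE)
    elif name == "base+ifcvt":
        flags = list(BASE)
        # Keep the same flag order as in the article.
        flags.insert(flags.index("-m32"), "-ftree-loop-if-convert")
    elif name == "peak":
        flags = list(PEAK)
    else:
        raise ValueError(f"unknown option set: {name}")

    # Versions 4.4 and 4.5: "-Ofast" is replaced by "-O3 -ffast-math".
    if gcc_version < (4, 6) and "-Ofast" in flags:
        i = flags.index("-Ofast")
        flags[i:i + 1] = ["-O3", "-ffast-math"]

    # Version 4.4: "-march=atom" is replaced by the generic equivalent.
    if gcc_version < (4, 5):
        i = flags.index("-march=atom")
        flags[i:i + 1] = ["-march=i686", "-mtune=generic", "-mssse3"]

    return flags


print(" ".join(option_set("peak", (4, 5))))
```

Comparing versions as tuples keeps the version checks readable and avoids comparing version strings lexicographically.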
From the graph, it is clear that the base set of options with “-ftree-loop-if-convert” reaches the performance of the peak set on CoreMark.
Below is a graph showing the increase in the size of the CoreMark executable code compiled with the peak set of options relative to the base set, for different versions of GCC:

Below is a graph showing the increase in the size of the CoreMark executable code compiled by different versions of GCC with the base set of options, relative to the base set of options on GCC 4.4.6:
For measuring code size, “-ffunction-sections -Wl,--gc-sections -fno-asynchronous-unwind-tables -Wl,--strip-all” were added to both the base and peak sets of options. These options do not affect the performance of CoreMark.
The options for optimal executable code size are described in more detail here.
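A size-measurement build could be assembled along the following lines. This is only a sketch: the helper name, the source file name `main.c` and the output name are placeholders, not the actual CoreMark build setup.

```python
import shlex

# Options added for code-size measurements, as listed in the article.
SIZE_FLAGS = [
    "-ffunction-sections",              # place each function in its own section
    "-Wl,--gc-sections",                # let the linker discard unreferenced sections
    "-fno-asynchronous-unwind-tables",  # do not emit asynchronous unwind tables
    "-Wl,--strip-all",                  # strip all symbol information from the output
]


def size_build_command(perf_flags, sources, output="a.out", compiler="gcc"):
    """Build a compiler command line: performance flags plus the size flags."""
    return [compiler, *perf_flags, *SIZE_FLAGS, *sources, "-o", output]


base = ["-O2", "-ffast-math", "-mfpmath=sse", "-m32", "-march=atom"]
print(shlex.join(size_build_command(base, ["main.c"])))
```

Keeping the size flags in a separate list makes it easy to apply the same additions to both the base and peak sets.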
From the graphs it can be seen that the code size with the peak set of options is twice as large as with the base set and continues to grow. The base set of options, by contrast, provides a slight decrease in code size.
All measurements were made on a single thread on a dual-core Intel Atom CPU D525 at 1.80 GHz with 4 GB of memory, running Fedora 17.
GCC showed very good progress from version 4.4 to version 4.8 (mainly from version 4.6 to version 4.7, and in version 4.8 from “-ftree-loop-if-convert” on the base set of options). The code size on the base set of options remains unchanged, while on the peak set it grows.
Below is a brief description of options and changes in GCC from version to version:
- GCC 4.5 was the first version to introduce the “-march=atom” option (for more information). GCC 4.4 appears in this article as the last version without Atom support. CoreMark for this version was compiled with “-march=i686 -mtune=generic -mssse3”. A large number of Unix systems still use gcc-4.4+. However, it is worth noting that some special builds of gcc-4.4 do support “-march=atom”, for example the Android NDK gcc-4.4.
- GCC 4.6 brought better heuristics for function inlining and a way to improve CoreMark performance through the new option “-ftree-loop-if-convert”. By default, this option is enabled starting from “-O3” (“-Ofast”). Added to the base set of options, it speeds up CoreMark by ~8%. Official list of changes in GCC 4.6.
- In GCC 4.7, new Atom-specific optimizations appeared under “-march=atom”, in particular optimizations for the LEA and IMUL instructions. On the original Atom architecture, IMUL required switching to a special mode, so it was advantageous to group IMUL instructions (fixed in the latest Atom processor, Silvermont). An LEA whose result was fed to the ALU lost performance, so it was beneficial to replace some LEAs with a sequence of ADD and MOV (also fixed in Silvermont). Official list of changes.
- In version 4.8, the optimization of logical instructions was improved. As a result, register pressure in some CoreMark functions decreased (this applies only to the base set of options with “-ftree-loop-if-convert”). Also in version 4.8, it became possible to reduce register pressure with “-fschedule-insns -fsched-pressure” (previously these options were extremely unstable). On CoreMark, this adds about 1% on the peak set of options. Most often, “-fschedule-insns -fsched-pressure” improves performance when “-funroll-loops” is enabled. Official changelog.
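To make the effect of “-ftree-loop-if-convert” more concrete, here is a rough model of if-conversion: a conditional jump inside a loop body is replaced by a branch-free select. GCC applies this transformation to its internal representation of the compiled code; Python is used here only to model the semantics, and the function names are illustrative.

```python
def sum_clamped_branchy(values, limit):
    """Loop body with a conditional jump, as a programmer would write it."""
    total = 0
    for v in values:
        if v > limit:
            total += limit
        else:
            total += v
    return total


def sum_clamped_if_converted(values, limit):
    """The same loop after if-conversion: a mask derived from the
    condition selects between the two candidates, with no jump in the body."""
    total = 0
    for v in values:
        mask = int(v > limit)                   # 1 if the condition holds, else 0
        total += mask * limit + (1 - mask) * v  # branch-free select
    return total


data = [1, 5, 10, 3]
assert sum_clamped_branchy(data, 4) == sum_clamped_if_converted(data, 4)
```

A straight-line loop body of this kind is easier for the compiler's vectorizer to handle, which is the stated intent of this option in the GCC documentation.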
What if, in GCC version 4.8, “-march=atom” implied only “-march=i686 -mtune=generic -mssse3”? CoreMark performance would drop by 5%.
“-ftree-loop-if-convert” adds another 13% to the performance of the base set of options.
If both code size and performance are important for your Atom application, switch to GCC version 4.8 and try compiling with these options:
“-O2 -ffast-math -mfpmath=sse -ftree-loop-if-convert -fschedule-insns -fsched-pressure -m32 -march=atom”
If only performance is important, GCC 4.8 is optimal with these options:
“-Ofast -flto -funroll-loops -mfpmath=sse -fschedule-insns -fsched-pressure -m32 -march=atom”