Optimal options for x86 gcc

It is widely believed that GCC lags behind other compilers in performance. In this article we will try to figure out which basic GCC optimization options should be used to achieve acceptable performance.

What are the default options in GCC?

(1) By default, GCC uses the "-O0" optimization level. It is clearly not optimal in terms of performance and is not recommended for compiling the final product.
GCC does not detect the architecture on which the compilation runs unless the "-march=native" option is passed. Without it, GCC uses the architecture specified when the compiler itself was configured. To find out the GCC configuration, just run:

gcc -v
"Configured with: [PATH]/configure ... --with-arch=corei7 --with-cpu=corei7 ..."

This means that GCC will add "-march=corei7" to your options (unless another architecture is specified).
Most GCC builds for x86 (the standard ones on 64-bit Linux) add "-mtune=generic -march=x86-64" to the given options, because their configuration did not specify any architecture options. You can always see all the options passed when GCC starts, as well as its internal options, with the command:

echo "int main() {return 0;}" | gcc [OPTIONS] -xc -v -Q -

As a result, the commonly used command:

gcc -O2 test.c

builds test.c without any architecture-specific optimizations. This can lead to a significant performance loss (relative to code optimized for the architecture). Disabled or limited vectorization and suboptimal instruction scheduling are the most common causes of the loss when the architecture is not specified or is specified incorrectly.
To target the current architecture, compile as follows:

gcc -O2 test.c -march=native

Specifying the target architecture is important for performance. The only exception is programs that spend almost all of their run time in library functions: GLIBC can choose the optimal implementation of a function for the given architecture at run time. Note that with static linking, some GLIBC functions do not have versions for different architectures, so dynamic linking is better if the speed of GLIBC functions matters.
(2) By default, most GCC builds for x86 in 32-bit mode use the x87 floating-point model, since they were configured without "--with-fpmath=sse". Only if the GCC configuration contains "--with-fpmath=sse":

gcc -v
"Configured with: [PATH]/configure ... --with-fpmath=sse ..."

will the compiler use the SSE model by default. In all other cases, it is better to add the "-mfpmath=sse" option to the build in 32-bit mode.
So the commonly used command:

gcc -O2 -m32 test.c

can lead to a significant performance loss in code with floating-point arithmetic. The correct command is:

gcc -O2 -m32 test.c -mfpmath=sse

Adding the "-mfpmath=sse" option is important in 32-bit mode! The exception is a compiler whose configuration contains "--with-fpmath=sse".

32 bit mode or 64 bit mode?

32-bit mode is usually used to reduce memory consumption and, as a result, to speed up memory access (more data fits in the cache).
In 64-bit mode (compared to 32-bit mode), the number of available general-purpose registers increases from 6 to 14, and the number of XMM registers from 8 to 16. Also, all 64-bit architectures support the SSE2 extension, so in 64-bit mode you do not need to add the "-mfpmath=sse" option.
It is recommended to use 64-bit mode for computational tasks and 32-bit mode for mobile applications.

How to get maximum performance?

There is no single set of options that guarantees maximum performance; however, GCC has many options that are worth trying. Below are tables with recommended options and the predicted gains relative to "-O2" on Intel Atom and 2nd Generation Intel Core i7 processors. The predictions are based on the geometric mean of the results for a specific set of benchmarks compiled with GCC 4.7. It is also assumed that the compiler was configured for x86-64 generic.
Predicted performance gain for mobile applications relative to "-O2" (32-bit mode only, since it is the main mode for the mobile segment):

-m32 -mfpmath=sse	~5%
-m32 -mfpmath=sse -Ofast -flto	~36%
-m32 -mfpmath=sse -Ofast -flto -march=native	~40%
-m32 -mfpmath=sse -Ofast -flto -march=native -funroll-loops	~43%

Predicted performance gain for computational tasks relative to "-O2" (in 32-bit mode):

-m32 -mfpmath=sse	~4%
-m32 -mfpmath=sse -Ofast -flto	~21%
-m32 -mfpmath=sse -Ofast -flto -march=native	~25%
-m32 -mfpmath=sse -Ofast -flto -march=native -funroll-loops	~24%

Predicted performance gain for computational tasks relative to "-O2" (in 64-bit mode):

-m64 -Ofast -flto	~17%
-m64 -Ofast -flto -march=native	~21%
-m64 -Ofast -flto -march=native -funroll-loops	~22%

The advantage of 64-bit mode over 32-bit mode for computational tasks with the options "-O2 -mfpmath=sse" is about 5%.
All data in the article are predictions based on the results of a specific set of benchmarks.
Below is a description of the options used in the article. Full description: http://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/Optimize-Options.html

"-Ofast": similar to "-O3 -ffast-math"; enables a higher optimization level and more aggressive optimizations of arithmetic (for example, floating-point reassociation)
"-flto": intermodule (link-time) optimizations
"-m32": 32-bit mode
"-mfpmath=sse": enables the use of XMM registers for floating-point arithmetic (instead of the x87 register stack)
"-funroll-loops": enables loop unrolling

Source: https://habr.com/ru/post/158939/
