
The days when programmers tried to squeeze every last byte out of their applications are gone for good. The main reason is the significant growth in RAM and disk space on modern computers. Few people remember that while an application loaded from a cassette tape you had time to go and eat, or that you could count the blinks of the drive light to estimate the size of an application. Perhaps only developers of software for embedded systems still care about code size and memory consumption. Can tablets and smartphones bring developers back to the future?
This article is designed to help software developers using the GCC compiler reduce the code size of their applications. All data in the article was obtained using the x86 GCC compiler version 4.7.2 on the Fedora 17 operating system for the Intel Atom architecture.
With GCC, a significant reduction in size comes from dynamic linking (enabled by default). How much dynamic linking saves compared to static linking depends heavily on the libraries used.
Most often, when it comes to size optimization, the “-Os” option is used. Below is a table with the geometric mean of the code size for a set of smartphone and tablet applications.
The results in the table are shown relative to “-Os”. A smaller value means smaller code. The options “-m32”, “-mfpmath=sse” and “-march=atom” are used in all cases.
Options | Code size relative to -Os |
-O2 | 6% |
-O2 -flto | -5% |
-Ofast | 11.5% |
-Ofast -flto | 3% |
-Ofast -funroll-loops | 19% |
-Ofast -funroll-loops -flto | 10.5% |
“-Ofast” (or “-O3”) and “-funroll-loops” obviously increase code size. The “-flto” option inlines functions more aggressively, so it would be expected to increase code size as well. However, the result is the opposite. Why?
“-flto” makes it possible to remove unused functions. A function becomes unused if it is never called in a particular application configuration, or if it has been inlined at every call site. To remove unused functions without “-flto”, you can use “-ffunction-sections -Wl,--gc-sections”. This technique works well when the application uses internal static libraries.
Is the application still too large? There are a few more techniques for reducing its size. By default, GCC uses the “-fasynchronous-unwind-tables” option, which increases the size of the EH (exception handling) section even when compiling applications written in C. This makes debugging easier, but can significantly increase code size. To disable it, add “-fno-asynchronous-unwind-tables” to the compilation options.
“-Wl,--strip-all” tells the linker to remove all symbol information. This makes debugging even harder, but if code size is critical, the trade-off is acceptable.
Below is a table showing the effect of adding:
- “-ffunction-sections -Wl,--gc-sections” (+ garbage collection)
- “-ffunction-sections -Wl,--gc-sections -fno-asynchronous-unwind-tables” (+ no unwind tables)
- “-ffunction-sections -Wl,--gc-sections -fno-asynchronous-unwind-tables -Wl,--strip-all” (+ strip symbols)
to various optimization options.
The results in the table are shown relative to “-Os”. A smaller value means smaller code. The options “-m32”, “-mfpmath=sse” and “-march=atom” are used in all cases.
Options | default | + garbage collection | + no unwind tables | + strip symbols |
-Os | - | -5% | -10.5% | -22.5% |
-O2 | 6% | 0.5% | -3.5% | -13.5% |
-O2 -flto | -5% | -5% | -8% | -17% |
-Ofast | 11.5% | 6% | 2% | -6.5% |
-Ofast -flto | 3% | 2.5% | 0.5% | -6.5% |
-Ofast -funroll-loops | 19% | 12.5% | 9.5% | 3% |
-Ofast -funroll-loops -flto | 10.5% | 10% | 8.5% | 2.5% |
Below is a description of the compiler options used in the article. Full description (in English):
gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Optimize-Options.html
- "-Ofast" is similar to "-O3 -ffast-math": it enables a higher optimization level and more aggressive optimization of arithmetic (for example, floating-point reassociation)
- "-flto" enables link-time (inter-module) optimization
- "-m32" generates code for 32-bit mode
- "-mfpmath=sse" uses XMM registers for floating-point arithmetic (instead of the x87 register stack)
- "-funroll-loops" enables loop unrolling
- "-ffunction-sections" places each function in a separate section
- "-Os" optimizes for size, enabling the "-O2" optimizations that do not typically increase code size
- "-fno-asynchronous-unwind-tables" guarantees the accuracy of unwind tables only at function call boundaries, not at every instruction
Below is a description of the linker options used in the article. Full description (in English):
sourceware.org/binutils/docs/ld/Options.html
- “--gc-sections” enables removal of unused sections
- “--strip-all” removes all symbol information