Sometimes there are tasks for which reducing the size of the application, or rather, striking the right balance between size and performance, is a higher priority than raw speed. Problems of this kind arise in particular when developing for embedded systems: applications there are tailored to a specific type of processor with a very limited amount of memory, so the size of our application directly affects the cost of the final product. Besides, a smaller footprint leaves room for more functionality and for improving the quality of the code itself.
Intel compilers usually give preference to performance and care little about the size of the output: by default, the focus is on maximum speed. It is up to the developer to find the right balance between the speed of the application, together with the compiler optimizations used to achieve it, and its size. The Intel C/C++ compiler has a number of features that let you control this balance and make application size a higher priority than performance. Let's look at these possibilities.
Less is better!
The compiler has a special option, -Os, which enables optimizations that produce smaller code than the default -O2. You can also lower the optimization level to -O1, which disables vectorization and a number of other optimizations but reduces the size significantly. However, it is better to turn optimizations off step by step rather than all at once.
Options:
-Os (Linux/OS X) and
/Os (Windows)
Visual Studio: Optimization > Favor Size or Speed
+: smaller code size compared to -O2
-: slight performance loss
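As a rough sketch (the driver names and command lines below are illustrative and may differ between compiler versions), the same source file can be built at several levels and the resulting binaries compared with the size utility:

    // size_demo.cpp: a toy translation unit for comparing code size at
    // different optimization levels (file and binary names are made up).
    //   icpc -O2 size_demo.cpp -o demo_O2    # default: maximum speed
    //   icpc -Os size_demo.cpp -o demo_Os    # prefer smaller code
    //   icpc -O1 size_demo.cpp -o demo_O1    # lower level: no vectorization
    //   size demo_O2 demo_Os demo_O1         # compare the section sizes
    #include <cstdio>

    int main() {
        double sum = 0.0;
        // A simple loop the compiler may vectorize and unroll at -O2,
        // producing noticeably more code than at -Os or -O1.
        for (int i = 0; i < 1000; ++i)
            sum += i * 0.5;
        std::printf("%f\n", sum);
        return 0;
    }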
Not used? Into the furnace!
By default, the linker works at the level of COMDAT sections: during compilation all code is placed into a single section of the object file, an indivisible block that the linker is not allowed to cut apart. As a result, it cannot remove unused functions and global variables, even though that is exactly what we want.
But we can enable function-level linking, packing functions and variables into separate sections, for example with the /Gy option on Windows. The linker can then handle them individually, and with the linker switch /OPT:REF it throws out every entity that nothing refers to (a dependency graph is built, and anything that does not end up in that graph is discarded). This can significantly reduce the size of the application.
Options: -fdata-sections, -ffunction-sections, -Wl,--gc-sections (Linux/OS X) and /Gy /Qoption,link,/OPT:REF (Windows)
+: only the code actually used at run time is kept
-: linker support is needed and link time may increase
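A minimal sketch of the idea, with made-up names and one possible Linux command line: thanks to per-function sections plus the garbage-collecting linker switch, the unreferenced routine below can be thrown away at link time.

    // gc_demo.cpp: illustrative only.
    //   icpc -ffunction-sections -fdata-sections gc_demo.cpp -Wl,--gc-sections -o gc_demo
    #include <cstdio>

    // External linkage, so the compiler must emit the function even though
    // nothing calls it; with per-function sections the linker can discard it.
    int never_called(int x) {
        return x * 42;
    }

    int main() {
        std::printf("hello\n");
        return 0;
    }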
Disable inlining of intrinsic functions
Code that uses intrinsics is typically faster because there is no function-call overhead, so the compiler likes to replace calls to certain functions with intrinsics, generating more code for the sake of performance. Functions such as abs, strcpy and sin have intrinsic analogues. You can disable this inlining for one or more of these intrinsic functions.
Options:
-fno-builtin[-name] (Linux/OS X) and
/Oi- (/Qno-builtin-name) (Windows)
Visual Studio: Optimization > Enable Intrinsic Functions (/Oi)
+: smaller object code
-: other optimizations may be disabled; library functions may run slower
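For example (a sketch; the exact set of functions treated as intrinsics depends on the compiler version), calls such as abs and strlen are normally expanded inline, and the per-function -fno-builtin form keeps them as ordinary library calls:

    // builtin_demo.cpp: illustrative only.
    //   Default build:          icpc -O2 builtin_demo.cpp
    //   Keep the library calls: icpc -O2 -fno-builtin-abs -fno-builtin-strlen builtin_demo.cpp
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv) {
        // Both calls below are normally expanded inline as intrinsics, which
        // is faster but emits extra code at every call site.
        int a = abs(argc - 10);
        size_t n = strlen(argv[0]);
        printf("%d %zu\n", a, n);
        return 0;
    }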
Link the Intel libraries dynamically
On Linux we can cancel the static linking of the Intel libraries (-static-intel), which inflates the size. If you do this on OS X, you will also have to set the DYLD_LIBRARY_PATH variable. The -shared-intel option also goes together with -mcmodel=medium or -mcmodel=large, which control the compiler's memory model.
Options:
-shared-intel (Linux / OS X)
Xcode: Runtime> Intel Runtime Libraries
+: no effect on performance; all libraries are available for use
-: you will have to ship the libraries with the application
Use interprocedural optimization
A rare case where we enable an optimization in order to reduce the size of the application. IPO can shrink the code because no standalone code is generated for functions that are always inlined or never called, and dead code is removed.
Options:
-ipo (Linux/OS X) and
/Qipo (Windows)
Visual Studio: Optimization> Interprocedural Optimization
Eclipse: Optimization> Enable Whole Program Optimization
+: improves performance and reduces executable size
-: the size of binary files may increase; not recommended for cases where object files are the final product
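A minimal sketch of what IPO makes possible, assuming two translation units (shown here in one listing, with made-up file names):

    // Build with IPO on Linux (illustrative): icpc -ipo helper.cpp main.cpp -o ipo_demo

    // --- helper.cpp ----------------------------------------------------------
    int scale(int x) {           // a tiny helper living in a separate source file
        return x * 3;
    }

    // --- main.cpp ------------------------------------------------------------
    int scale(int x);            // without IPO this stays an opaque external call

    int main() {
        // With -ipo the compiler sees both files at link time: scale() can be
        // inlined here and its standalone out-of-line copy dropped entirely.
        return scale(14);
    }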
Disable passing arguments through registers
The compiler has an optimization that passes arguments in registers instead of on the stack. It can be disabled, which avoids creating an additional entry point for each function and thereby saves size. The option is available only on 32-bit architectures.
Options:
-opt-args-in-regs=none (Linux/OS X) and
/Qopt-args-in-regs:none (Windows)
+: may reduce code size
-: the size reduction may be small relative to the performance loss
Turn off inlining
Inlining improves performance by eliminating function calls, but it increases size. An alternative to disabling inlining completely is a special factor that gives finer control over how aggressively it is applied.
Options:
-fno-inline (Linux / OS X) and
/Ob0 (Windows)
/Qinline-factor=n (0 <= n < 100)
Visual Studio: Optimization> Inline Function Expansion
Eclipse: Optimization> Inline Function Expansion
Xcode: Optimization> Inline Function Expansion
+: reduce code size
-: reduced performance
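A small sketch, with made-up names and illustrative command lines, of the kind of function whose inlining these options control:

    // inline_demo.cpp: illustrative only.
    //   Turn inlining off:  icpc -fno-inline inline_demo.cpp       (Linux)
    //                       icl /Ob0 inline_demo.cpp               (Windows)
    //   Scale the inlining limits instead of disabling them:
    //                       icl /Qinline-factor=50 inline_demo.cpp
    #include <cstdio>

    static int square(int x) {   // a typical inlining candidate
        return x * x;
    }

    int main() {
        int total = 0;
        for (int i = 0; i < 16; ++i)
            total += square(i);  // each call may be expanded inline by default
        std::printf("%d\n", total);
        return 0;
    }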
Working with exceptions
Do not forget that the compiler generates special code for exception handling, which naturally increases the size of the application because of the large EH (exception handling) section. The -fno-exceptions option turns off the generation of exception handling tables, and it cannot be used for applications that throw exceptions: once code is compiled with this option, any use of exception handling, such as a try block, produces an error.
There is also the -fno-asynchronous-unwind-tables option, which disables the creation of unwind tables for the following functions:
- C++ functions that do not create objects with destructors and do not call other functions that can throw exceptions
- C/C++ functions compiled without -fexceptions and, on the Intel® 64 architecture, without the -traceback option
- C/C++ functions compiled with -fexceptions that do not contain calls to other functions that can throw exceptions
Options:
-fno-exceptions,
-fno-asynchronous-unwind-tables (Linux / OS X)
+: up to 15% reduction in binary size
-: application behavior may change
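For instance (a sketch with illustrative command lines), a translation unit like the one below builds normally, but is rejected once -fno-exceptions is passed:

    // eh_demo.cpp: illustrative only.
    //   Normal build:  icpc eh_demo.cpp                  # EH tables are generated
    //   Smaller build: icpc -fno-exceptions eh_demo.cpp  # fails: the try block
    //                                                    # below is rejected
    #include <cstdio>
    #include <stdexcept>

    int main() {
        try {   // any use of exception handling is an error under -fno-exceptions
            throw std::runtime_error("boom");
        } catch (const std::exception &e) {
            std::printf("%s\n", e.what());
        }
        return 0;
    }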
Do not use libraries
You can tell the compiler to follow the rules of a freestanding environment, one in which the standard libraries are not assumed to exist. The compiler will then generate calls only to functions that actually appear in the code. A typical example of such an application is an OS kernel.
Options:
-ffreestanding,
-nodefaultlibs (Linux/OS X) and
/Qfreestanding (Windows)
+: up to 15% reduction in binary size
-: loss in performance is possible if code from libraries was actively used
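A minimal freestanding-style sketch, with a hypothetical function and device register; note that it deliberately calls nothing from the standard library:

    // freestanding_demo.cpp: illustrative only. Possible build commands:
    //   icpc -ffreestanding -c freestanding_demo.cpp     (Linux)
    //   icl /Qfreestanding /c freestanding_demo.cpp      (Windows)
    // Nothing here touches the standard library, so the compiler generates
    // calls only to the code that is actually written below.
    extern "C" void device_init(volatile unsigned int *status_reg) {
        *status_reg = 0x1u;   // poke a (hypothetical) device status register
    }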
Strip symbols from the binary
Throwing debugging and symbol information out of the binary is quite logical if we care about the size of the application.
Options:
-Wl,--strip-all (Linux/OS X)
+: significant size reduction
-: application debugging is almost impossible without symbol information
Disable vectorization
You can disable vectorization either completely or selectively (through directives).
Options:
-no-vec (Linux / OS X) and
/Qvec- (Windows)
+: significantly reduced compile time, smaller size
-: performance will drop noticeably. Vectorization can instead be disabled only for specific loops that are not performance-critical, using the #pragma novector directive, as in the sketch below.
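A small sketch of the per-loop approach (names are made up):

    // novector_demo.cpp: illustrative only.
    #include <cstdio>

    int main() {
        float a[256], b[256];
        for (int i = 0; i < 256; ++i) {
            a[i] = 1.0f;
            b[i] = 2.0f;
        }

        // This loop is assumed not to be performance-critical, so vectorization
        // is turned off just for it; the rest of the file keeps the defaults.
    #pragma novector
        for (int i = 0; i < 256; ++i)
            a[i] += b[i];

        std::printf("%f\n", a[0]);
        return 0;
    }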
Unnecessary 16-byte alignment
On 32-bit architectures the compiler keeps the stack aligned to 16 bytes, which can add extra instructions for aligning the stack whenever a function is called. If the code contains many small functions, the size can grow considerably. Use the option if:
- the code does not call functions from other libraries built without this option
- the code is designed for architectures that do not support SSE instructions and do not require alignment for correct results
Options:
-falign-stack=assume-4-byte (Linux/OS X, 32-bit)
+: smaller code because the extra alignment instructions disappear; performance may also improve since fewer instructions are executed
-: incompatibility when linking with other libraries
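A minimal sketch, assuming a 32-bit build and an illustrative command line, of the kind of code this option is aimed at:

    // align_demo.cpp: illustrative only, for a 32-bit Linux/OS X build:
    //   icpc -m32 -falign-stack=assume-4-byte align_demo.cpp -o align_demo
    // Many tiny functions like these are where the default 16-byte stack
    // alignment costs the most: without the option, the extra alignment
    // instructions at each call can outweigh the function bodies themselves.
    static int add2(int a, int b)        { return a + b; }
    static int add3(int a, int b, int c) { return add2(add2(a, b), c); }

    int main() {
        return add3(1, 2, 3);   // calls nothing from external libraries
    }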
Disable loop unrolling
Unrolled loops can grow in size in proportion to the unroll factor. Disabling or limiting this optimization reduces the size at the cost of performance. Unrolling can also be controlled per loop with the #pragma unroll directive (see the sketch below). The option is enabled by default with -Os/-O1.
Options:
-unroll=0 (Linux/OS X) and
/Qunroll:0 (Windows)
+: smaller code size; unrolling can still be controlled for individual loops
-: performance may drop noticeably, since other loop optimizations may also be restricted
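A small sketch of the per-loop control mentioned above (names and command lines are illustrative):

    // unroll_demo.cpp: illustrative only.
    //   Disable unrolling globally:  icpc -unroll=0 unroll_demo.cpp   (Linux)
    //                                icl /Qunroll:0 unroll_demo.cpp   (Windows)
    #include <cstdio>

    int main() {
        float a[1024];

        // Cap the unroll factor for this loop only ("#pragma nounroll" would
        // forbid unrolling entirely) instead of turning the optimization off
        // for the whole program.
    #pragma unroll(2)
        for (int i = 0; i < 1024; ++i)
            a[i] = i * 0.5f;

        std::printf("%f\n", a[1023]);
        return 0;
    }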
In conclusion, note that by applying the options listed above step by step you can gradually reduce the size of the application, trading away some performance each time. The main thing is to find the right balance!