Sometimes there are tasks for which reducing the size of the application, or rather, striking the right balance between size and performance, is a higher priority than raw speed. Problems of this kind arise in particular when developing for embedded systems: applications there are tailored to a specific type of processor with a very limited amount of memory, so the size of our application directly affects the cost of the final product. Besides, a smaller footprint leaves room for more functionality and for improving the quality of the code itself.
Intel compilers usually give preference to performance and care little about the size of the output: by default, the focus is on maximum speed. It is up to the developer to find the right balance between the speed of the application, together with the compiler optimizations used to achieve it, and its size. The Intel C/C++ compiler has a number of features that let you control this balance and make application size a higher priority than performance. Let's look at these possibilities.
Less is better!
The compiler has a special option, -Os, which enables optimizations that produce smaller code than the default -O2. You can also lower the optimization level to -O1, which disables vectorization and a number of other optimizations but reduces the size significantly. However, it is better to turn optimizations off step by step rather than all at once.
Options:
-Os (Linux/OS X) and
/Os (Windows)
Visual Studio: Optimization > Favor Size or Speed
+: smaller code size compared to -O2
-: slight performance loss
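As a rough sketch (the driver names and command lines below are illustrative and may differ between compiler versions), the same source file can be built at several levels and the resulting binaries compared with the size utility:

    // size_demo.cpp: a toy translation unit for comparing code size at
    // different optimization levels (file and binary names are made up).
    //   icpc -O2 size_demo.cpp -o demo_O2    # default: maximum speed
    //   icpc -Os size_demo.cpp -o demo_Os    # prefer smaller code
    //   icpc -O1 size_demo.cpp -o demo_O1    # lower level: no vectorization
    //   size demo_O2 demo_Os demo_O1         # compare the section sizes
    #include <cstdio>

    int main() {
        double sum = 0.0;
        // A simple loop the compiler may vectorize and unroll at -O2,
        // producing noticeably more code than at -Os or -O1.
        for (int i = 0; i < 1000; ++i)
            sum += i * 0.5;
        std::printf("%f\n", sum);
        return 0;
    }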
Not used? Into the furnace!
By default, the linker works at the level of COMDAT sections: during compilation all code is placed into a single section of the object file, an indivisible block that the linker is not allowed to cut apart. As a result, it cannot remove unused functions and global variables, even though that is exactly what we want.
But we can enable function-level linking, packing functions and variables into separate sections, for example with the /Gy option on Windows. The linker can then handle them individually, and with the linker switch /OPT:REF it throws out every entity that nothing refers to (a dependency graph is built, and anything that does not end up in that graph is discarded). This can significantly reduce the size of the application.
Options: -fdata-sections, -ffunction-sections, -Wl,--gc-sections (Linux/OS X) and /Gy /Qoption,link,/OPT:REF (Windows)
+: only the code actually used at run time is kept
-: linker support is needed and link time may increase
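A minimal sketch of the idea, with made-up names and one possible Linux command line: thanks to per-function sections plus the garbage-collecting linker switch, the unreferenced routine below can be thrown away at link time.

    // gc_demo.cpp: illustrative only.
    //   icpc -ffunction-sections -fdata-sections gc_demo.cpp -Wl,--gc-sections -o gc_demo
    #include <cstdio>

    // External linkage, so the compiler must emit the function even though
    // nothing calls it; with per-function sections the linker can discard it.
    int never_called(int x) {
        return x * 42;
    }

    int main() {
        std::printf("hello\n");
        return 0;
    }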
Disable inlining of intrinsic functions
Code that uses intrinsics is typically faster because there is no function-call overhead, so the compiler likes to replace calls to certain functions with intrinsics, generating more code for the sake of performance. Functions such as abs, strcpy and sin have intrinsic analogues. You can disable this inlining for one or more of these intrinsic functions.
Options:
-fno-builtin[-name] (Linux/OS X) and
/Oi- (/Qno-builtin-name) (Windows)
Visual Studio: Optimization > Enable Intrinsic Functions (/Oi)
+: smaller object code
-: other optimizations may be disabled; library functions may run slower
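For example (a sketch; the exact set of functions treated as intrinsics depends on the compiler version), calls such as abs and strlen are normally expanded inline, and the per-function -fno-builtin form keeps them as ordinary library calls:

    // builtin_demo.cpp: illustrative only.
    //   Default build:          icpc -O2 builtin_demo.cpp
    //   Keep the library calls: icpc -O2 -fno-builtin-abs -fno-builtin-strlen builtin_demo.cpp
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv) {
        // Both calls below are normally expanded inline as intrinsics, which
        // is faster but emits extra code at every call site.
        int a = abs(argc - 10);
        size_t n = strlen(argv[0]);
        printf("%d %zu\n", a, n);
        return 0;
    }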
Link the Intel libraries dynamically
On Linux we can cancel the static linking of the Intel libraries (-static-intel), which inflates the size. If you do this on OS X, you will also have to set the DYLD_LIBRARY_PATH variable. The -shared-intel option also goes together with -mcmodel=medium or -mcmodel=large, which control the compiler's memory model.
Options:
-shared-intel (Linux / OS X)
Xcode: Runtime> Intel Runtime Libraries
+: no effect on performance; all libraries are available for use
-: you will have to ship the libraries with the application
Use interprocedural optimization
A rare case where we enable an optimization in order to reduce the size of the application. IPO can shrink the code because no standalone code is generated for functions that are always inlined or never called, and dead code is removed.
Options:
-ipo (Linux/OS X) and
/Qipo (Windows)
Visual Studio: Optimization> Interprocedural Optimization
Eclipse: Optimization> Enable Whole Program Optimization
+: improves performance and reduces executable size
-: the size of binary files may increase; not recommended for cases where object files are the final product
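A minimal sketch of what IPO makes possible, assuming two translation units (shown here in one listing, with made-up file names):

    // Build with IPO on Linux (illustrative): icpc -ipo helper.cpp main.cpp -o ipo_demo

    // --- helper.cpp ----------------------------------------------------------
    int scale(int x) {           // a tiny helper living in a separate source file
        return x * 3;
    }

    // --- main.cpp ------------------------------------------------------------
    int scale(int x);            // without IPO this stays an opaque external call

    int main() {
        // With -ipo the compiler sees both files at link time: scale() can be
        // inlined here and its standalone out-of-line copy dropped entirely.
        return scale(14);
    }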
Disable passing arguments through registers
The compiler has an optimization that passes arguments in registers instead of on the stack. It can be disabled, which avoids creating an additional entry point for each function and thereby saves size. The option is available only on 32-bit architectures.
Options:
-opt-args-in-regs=none (Linux/OS X) and
/Qopt-args-in-regs:none (Windows)
+: may reduce code size
-: the size reduction may be small relative to the performance loss
Turn off inlining
Inlining improves performance by eliminating function calls, but it increases size. An alternative to disabling inlining completely is a special factor that gives finer control over how aggressively it is applied.
Options:
-fno-inline (Linux / OS X) and
/Ob0 (Windows)
/Qinline-factor=n (0 <= n < 100)
Visual Studio: Optimization> Inline Function Expansion
Eclipse: Optimization> Inline Function Expansion
Xcode: Optimization> Inline Function Expansion
+: reduce code size
-: reduced performance
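A small sketch, with made-up names and illustrative command lines, of the kind of function whose inlining these options control:

    // inline_demo.cpp: illustrative only.
    //   Turn inlining off:  icpc -fno-inline inline_demo.cpp       (Linux)
    //                       icl /Ob0 inline_demo.cpp               (Windows)
    //   Scale the inlining limits instead of disabling them:
    //                       icl /Qinline-factor=50 inline_demo.cpp
    #include <cstdio>

    static int square(int x) {   // a typical inlining candidate
        return x * x;
    }

    int main() {
        int total = 0;
        for (int i = 0; i < 16; ++i)
            total += square(i);  // each call may be expanded inline by default
        std::printf("%d\n", total);
        return 0;
    }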
Working with exceptions
Do not forget that the compiler generates special code for exception handling, which naturally increases the size of the application because of the large EH (exception handling) section. The -fno-exceptions option turns off the generation of exception handling tables, and it cannot be used for applications that throw exceptions: once code is compiled with this option, any use of exception handling, such as a try block, produces an error.
There is also the -fno-asynchronous-unwind-tables option, which disables the creation of unwind tables for the following functions:
- C++ functions that do not create objects with destructors and do not call other functions that can throw exceptions
- C/C++ functions compiled without -fexceptions and, on the Intel® 64 architecture, without the -traceback option
- C/C++ functions compiled with -fexceptions that do not contain calls to other functions that can throw exceptions
Options:
-fno-exceptions,
-fno-asynchronous-unwind-tables (Linux / OS X)
+: up to 15% reduction in binary size
-: application behavior may change
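For instance (a sketch with illustrative command lines), a translation unit like the one below builds normally, but is rejected once -fno-exceptions is passed:

    // eh_demo.cpp: illustrative only.
    //   Normal build:  icpc eh_demo.cpp                  # EH tables are generated
    //   Smaller build: icpc -fno-exceptions eh_demo.cpp  # fails: the try block
    //                                                    # below is rejected
    #include <cstdio>
    #include <stdexcept>

    int main() {
        try {   // any use of exception handling is an error under -fno-exceptions
            throw std::runtime_error("boom");
        } catch (const std::exception &e) {
            std::printf("%s\n", e.what());
        }
        return 0;
    }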
Do not use libraries
You can tell the compiler to follow the rules of a freestanding environment, one in which the standard libraries are not assumed to exist. The compiler will then generate calls only to functions that actually appear in the code. A typical example of such an application is an OS kernel.
Options:
-ffreestanding,
-nodefaultlibs (Linux/OS X) and
/Qfreestanding (Windows)
+: up to 15% reduction in binary size
-: loss in performance is possible if code from libraries was actively used
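A minimal freestanding-style sketch, with a hypothetical function and device register; note that it deliberately calls nothing from the standard library:

    // freestanding_demo.cpp: illustrative only. Possible build commands:
    //   icpc -ffreestanding -c freestanding_demo.cpp     (Linux)
    //   icl /Qfreestanding /c freestanding_demo.cpp      (Windows)
    // Nothing here touches the standard library, so the compiler generates
    // calls only to the code that is actually written below.
    extern "C" void device_init(volatile unsigned int *status_reg) {
        *status_reg = 0x1u;   // poke a (hypothetical) device status register
    }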
Strip symbols from the binary
Throwing debugging and symbol information out of the binary is quite logical if we care about the size of the application.
Options:
-Wl,--strip-all (Linux/OS X)
+: significant size reduction
-: application debugging is almost impossible without symbol information
Disable vectorization
You can disable vectorization either completely or selectively (through directives).
Options:
-no-vec (Linux / OS X) and
/Qvec- (Windows)
+: significantly reduced compile time, smaller size
-: performance will drop noticeably. Vectorization can instead be disabled only for specific loops that are not performance-critical, using the #pragma novector directive, as in the sketch below.
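A small sketch of the per-loop approach (names are made up):

    // novector_demo.cpp: illustrative only.
    #include <cstdio>

    int main() {
        float a[256], b[256];
        for (int i = 0; i < 256; ++i) {
            a[i] = 1.0f;
            b[i] = 2.0f;
        }

        // This loop is assumed not to be performance-critical, so vectorization
        // is turned off just for it; the rest of the file keeps the defaults.
    #pragma novector
        for (int i = 0; i < 256; ++i)
            a[i] += b[i];

        std::printf("%f\n", a[0]);
        return 0;
    }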
Unnecessary 16-byte alignment
On 32-bit architectures the compiler keeps the stack aligned to 16 bytes, which can add extra instructions for aligning the stack whenever a function is called. If the code contains many small functions, the size can grow considerably. Use the option if:
- the code does not call functions from other libraries built without this option
- the code is designed for architectures that do not support SSE instructions and do not require alignment for correct results
Options:
-falign-stack=assume-4-byte (Linux/OS X, 32-bit)
+: smaller code because the extra alignment instructions disappear; performance may also improve since fewer instructions are executed
-: incompatibility when linking with other libraries
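A minimal sketch, assuming a 32-bit build and an illustrative command line, of the kind of code this option is aimed at:

    // align_demo.cpp: illustrative only, for a 32-bit Linux/OS X build:
    //   icpc -m32 -falign-stack=assume-4-byte align_demo.cpp -o align_demo
    // Many tiny functions like these are where the default 16-byte stack
    // alignment costs the most: without the option, the extra alignment
    // instructions at each call can outweigh the function bodies themselves.
    static int add2(int a, int b)        { return a + b; }
    static int add3(int a, int b, int c) { return add2(add2(a, b), c); }

    int main() {
        return add3(1, 2, 3);   // calls nothing from external libraries
    }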
Disable loop unrolling
Unrolled loops can grow in size in proportion to the unroll factor. Disabling or limiting this optimization reduces the size at the cost of performance. Unrolling can also be controlled per loop with the #pragma unroll directive (see the sketch below). The option is enabled by default with -Os/-O1.
Options:
-unroll=0 (Linux/OS X) and
/Qunroll:0 (Windows)
+: smaller code size; unrolling can still be controlled for individual loops
-: performance may drop noticeably, since other loop optimizations may also be restricted
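A small sketch of the per-loop control mentioned above (names and command lines are illustrative):

    // unroll_demo.cpp: illustrative only.
    //   Disable unrolling globally:  icpc -unroll=0 unroll_demo.cpp   (Linux)
    //                                icl /Qunroll:0 unroll_demo.cpp   (Windows)
    #include <cstdio>

    int main() {
        float a[1024];

        // Cap the unroll factor for this loop only ("#pragma nounroll" would
        // forbid unrolling entirely) instead of turning the optimization off
        // for the whole program.
    #pragma unroll(2)
        for (int i = 0; i < 1024; ++i)
            a[i] = i * 0.5f;

        std::printf("%f\n", a[1023]);
        return 0;
    }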
In conclusion, note that by applying the options listed above step by step you can gradually reduce the size of the application, trading away some performance each time. The main thing is to find the right balance!