How much does a compiler cost?

The compiler toolchain is one of the largest and most complex components of any system, and is usually built on open source code, either GCC or LLVM. On a Linux system, only the operating system kernel and the browser have more lines of code. For commercial systems, the compiler must be absolutely reliable: whatever the source code, it must generate correct, high-performance binary code.

How much does such a large, complex and important component cost? Thanks to open source, not as much as you might think. In this post, I give a real example showing that a new commercial compiler toolchain can be built without huge costs.

How much code do you need?

An analysis using SLOCCount (by David A. Wheeler) shows that GCC contains more than 5 million lines of code. LLVM is smaller, at only 1.6 million lines, but it is younger, supports only C and C++ by default, and targets about a third as many architectures. However, a useful toolchain has more components than the compiler alone:

Debugger: either GDB (800K lines) or LLDB (600K lines)
Linker: GNU ld (160K lines), gold (140K lines), or lld (60K lines)
Assembler / Disassembler: GNU gas (850K lines) or LLVM assembler
Binary utilities: GNU binutils (90K lines) and/or LLVM (included in the LLVM sources)
Emulation library: libgcc (included in the GCC sources) or CompilerRT (340K lines)
Standard C library: newlib (850K lines), glibc (1.2M lines), musl (82K lines) or uClibc-ng (251K lines)
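To get a feel for the total size, the figures above can be added up for two possible selections of components. This is only a rough sketch: the choice of C library for each stack is an assumption made for illustration, components that the list describes as included in another source tree (or gives no figure for) are counted as zero, and the names are hypothetical.

```python
# Approximate sizes in thousands of lines (KLOC), copied from the list above.
# Components described as "included" elsewhere, or given without a figure,
# are counted as 0 so that nothing is counted twice.
GNU_STACK = {
    "GCC": 5000,             # "more than 5 million lines", so a lower bound
    "GDB": 800,
    "GNU ld": 160,
    "GNU gas": 850,
    "GNU binutils": 90,
    "libgcc": 0,             # included in the GCC sources
    "newlib": 850,           # assumed choice of C library
}

LLVM_STACK = {
    "LLVM": 1600,
    "LLDB": 600,
    "lld": 60,
    "LLVM assembler": 0,     # no separate figure given in the list
    "LLVM binary utilities": 0,  # included in the LLVM sources
    "CompilerRT": 340,
    "musl": 82,              # assumed choice of C library
}


def total_kloc(stack):
    """Sum the component sizes of one toolchain selection, in KLOC."""
    return sum(stack.values())


if __name__ == "__main__":
    for name, stack in (("GNU", GNU_STACK), ("LLVM", LLVM_STACK)):
        print(f"{name} selection: about {total_kloc(stack) / 1000:.2f} million lines")
```

With these selections the GNU stack comes to roughly 7.75 million lines and the LLVM stack to roughly 2.7 million, before counting any test suites.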

In addition, the toolchain needs testing. In most GNU tools, the regression tests are built into the sources. In LLVM, the regression tests live in a separate code base of 500K lines. For embedded systems, you also need a debug server so that the debugger can talk to the target during testing.

What does it take to port a compiler?

Here we are concerned with porting a toolchain for commercial use. Many graduate students around the world port compilers during their research, but their efforts are concentrated on the one narrow area that their research addresses. As a result, the compiler is ported quickly, but it is neither complete nor reliable, since that was never the goal of the research program.

This article is about creating a toolchain that reliably generates correct and efficient binaries for any source code, and that is intended for commercial and industrial use.

Fortunately, most of this huge code base is generic. All the major compilers have put considerable effort into cleanly separating target-specific code, so porting a compiler toolchain to a new architecture is a feasible task. The task has five stages.

1. Prototyping
First, a port containing all the components is created. The prototype matters because it identifies the most problematic areas for the full port, and it should take a few months. At the end of this stage, we should be able to compile programs that demonstrate that all the components work together as expected.

2. Implementing full functionality
This stage completes all the functionality of the compiler and the other tools. Attributes, built-in / intrinsic functions, and emulation of missing functionality must all be finished. The linker must be working, board support packages (BSPs) written, and any custom options added where needed. At the end of this stage, the user has a full-featured toolchain. Most importantly, the complete set of regression tests must pass.

3. Testing
The biggest part of the project. Testing should cover three areas:

- Regression testing, showing that the toolchain is not broken and works across different architectures.
- Compliance testing, often using the customer's own tests, showing that all the required functionality is present.
- Benchmarking, showing that the code generated by the toolchain meets the required criteria for execution speed, code size and energy efficiency.

4. Deployment
At this stage, you need to help users learn the new compiler and how it differs from their previous tools, which usually requires written and video tutorials. New bugs will surface, and there will also be many bug reports caused by differences between the old and the new compiler. This is typical when LLVM or GCC replaces an old proprietary compiler, because they are much more advanced in functionality. If the user base is large, the deployment phase becomes very significant.

5. Maintenance
LLVM and GCC are under very active development, and new functionality is added constantly, both to support new language standards in the front end and to add new optimizations in the back end. You must keep your compiler up to date with all these changes. On top of that there will, of course, be new functionality specific to the target architecture, and bug reports from users.

How much effort is needed in the general case

Consider the general case: a new architecture with a large user base that must support C and C++, for both bare metal and Linux. Such an architecture probably has a range of implementations, from small processors used for bare metal or RTOS applications in embedded systems to large processors capable of running a full Linux environment.

The full release of such a toolchain takes 1-3 person-years. The initial proof-of-concept toolchain should be completed within 3 months. Implementing all the functionality should take another 6-9 months, plus a further 3 months if both bare metal and Linux must be supported.

Testing takes at least 6 months, but with a large number of specific tests it can take up to 12 months. Initial deployment takes 3 months, but with a large user base it can take up to 9 months longer.
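As a rough cross-check, the phase estimates above can be added up. The sketch below assumes that the phases run one after another, that one month of a phase costs one person-month, and that the extra 3 months for dual bare metal and Linux support is optional; the text states none of these, and all names are illustrative.

```python
# (phase, low, high) in person-months, taken from the estimates above.
GENERAL_CASE = [
    ("proof of concept", 3, 3),
    ("full functionality", 6, 9),
    ("bare metal and Linux support", 0, 3),  # assumed optional: 0 if not needed
    ("testing", 6, 12),
    ("deployment", 3, 12),                   # 3 months, up to 9 months longer
]


def effort_range(phases):
    """Return (low, high) total effort in person-months."""
    low = sum(lo for _, lo, _ in phases)
    high = sum(hi for _, _, hi in phases)
    return low, high


if __name__ == "__main__":
    low, high = effort_range(GENERAL_CASE)
    print(f"{low}-{high} person-months, i.e. {low / 12:.2f}-{high / 12:.2f} person-years")
```

Under these assumptions the total comes to 18-39 person-months, roughly 1.5 to 3.25 person-years, which is in the same range as the 1-3 person-years quoted above.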

The maintenance effort depends heavily on the number of bug reports from users and the number of new features. It can range from 0.5 person-months per month to, more likely, 1 person-month per month.

It is important to note that a whole team of engineers has to work on the project: a compiler specialist, a debugger specialist, a library implementation engineer, and so on. Compiler development is one of the most difficult disciplines, and no single engineer can have experience in all areas at once.

How much effort is needed in the simplest case

Not everyone needs a compiler that will be used by a large number of users. There are many application-specific processors, especially DSPs, that are developed and used within a single small company. Where such processors have proven commercially successful, they evolve: what was a tiny core programmed in assembly language by a lone engineer becomes a much more powerful processor with a large team of assembly programmers. In this case, moving to compiled C means a huge increase in productivity and a decrease in development cost.

In such cases, the toolchain only needs to support C, not C++, with a minimal C library. The architecture may also already have an assembler and linker that can be reused. This greatly reduces the effort needed to create a fully working compiler, to one person-year or less.

The proof-of-concept stage still takes 3 months, but the toolchain can then be brought to full functionality in 3 more months. Testing still takes the most effort, lasting from 3 to 6 months, but with a small user base 3 months is more than enough.

Maintenance is still needed, but for a small system with a small user base, 0.25 person-months per month will suffice.

For small customers, it may make sense to stop after implementing all the functionality. If a small handful of standard programs compile correctly, that may be enough to demonstrate that the compiler works, without going through the full set of tests.
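The figures for the simplest case can be combined in the same spirit to compare the two stopping points: with and without the full test phase. This is a sketch under the assumption that one month of a phase costs one person-month; the constant and function names are made up for illustration.

```python
# Person-month figures for the simplest case, taken from the text above.
PROOF_OF_CONCEPT = 3
FULL_FUNCTIONALITY = 3
TESTING = (3, 6)               # (low, high)
MAINTENANCE_PER_MONTH = 0.25   # person-months of maintenance per calendar month


def development_effort(include_testing=True):
    """Return (low, high) development effort in person-months."""
    base = PROOF_OF_CONCEPT + FULL_FUNCTIONALITY
    if not include_testing:
        return base, base
    return base + TESTING[0], base + TESTING[1]


def maintenance_effort(calendar_months):
    """Maintenance effort in person-months over a period of calendar months."""
    return MAINTENANCE_PER_MONTH * calendar_months


if __name__ == "__main__":
    print("stop after full functionality:", development_effort(include_testing=False))
    print("with full testing:", development_effort())
    print("first year of maintenance:", maintenance_effort(12))
```

Stopping after full functionality costs about 6 person-months, the fully tested toolchain 9-12 person-months, and a year of maintenance adds about 3 person-months.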

Case study

In 2016, Embecosm was approached by a microelectronics company that for many years had used its own DSP with a 16-bit address space, designed specifically for its narrow field. The company was on the third generation of this processor when it realized that it was spending too much effort on assembly programming. The situation was aggravated by the fact that the standard codecs it used had reference implementations in C. The company had a compiler, but it was very old, and porting code to the new generation of the DSP was not feasible.

Embecosm was asked to deliver an LLVM-based toolchain capable of compiling the C codecs and producing high-quality code. It was assumed that the code would be hand-tuned where necessary. The customer already had an assembler and a linker that combined all the assembly files into one, resolved all the references, and generated a binary file to be loaded into the DSP. The customer also wanted to gain experience in building compilers, so one of its engineers joined the Embecosm team and took over maintenance of the compiler when the project was completed.

In the first three months, we developed the toolchain based on the existing assembler and disassembler. In order to use newlib, we created a pseudo-linker that extracts the required files from newlib as assembly sources and combines them with the test program. Since the processor was not yet available in silicon, we tested on a Verilator model of the chip. To do this, we wrote a GDB server that allows GDB to interact with the model. Without ELF, source-level debugging is impossible, but GDB can load a program and retrieve the result, which is enough for testing.
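For readers unfamiliar with what a GDB server involves: GDB talks to such a server using the GDB Remote Serial Protocol, in which each packet has the form $payload#checksum, with the checksum being the sum of the payload bytes modulo 256 written as two hex digits. The sketch below shows only this framing. It is not the server written for the project; the function names are illustrative, and escaping, run-length encoding and the +/- acknowledgements are left out.

```python
def rsp_checksum(payload):
    """Sum of the payload bytes modulo 256."""
    return sum(payload.encode("ascii")) % 256


def rsp_frame(payload):
    """Wrap a payload as $payload#xx, where xx is the two-digit hex checksum."""
    return f"${payload}#{rsp_checksum(payload):02x}"


def rsp_parse(packet):
    """Check the framing and checksum of a packet and return its payload."""
    if len(packet) < 4 or not packet.startswith("$") or packet[-3] != "#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    if int(packet[-2:], 16) != rsp_checksum(payload):
        raise ValueError("checksum mismatch")
    return payload


if __name__ == "__main__":
    print(rsp_frame("g"))        # request to read the general registers
    print(rsp_frame("OK"))       # a typical success reply
    print(rsp_parse("$OK#9a"))
```

A server built on this would sit between GDB and the simulated chip, translating register, memory and run-control requests into operations on the model.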

This allowed us to compile a test program and demonstrate that the compiler toolchain works. It became clear that two obstacles stood in the way of full functionality: 1) the lack of support for ELF binary files and 2) the lack of support for a 16-bit char.
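Why a 16-bit char is an obstacle may not be obvious. In C, sizeof counts in units of char, and char is the smallest addressable unit, so on a target whose smallest unit is 16 bits, CHAR_BIT is 16 and many sizes that programmers take for granted change. The sketch below models this; the 32-bit type width and the helper names are assumptions for illustration, not details of the customer's DSP.

```python
def sizeof(type_bits, char_bit):
    """Size of a type in chars, as C's sizeof would report it (rounded up)."""
    return -(-type_bits // char_bit)


def string_storage_bits(text, char_bit):
    """Bits used by a C string: one char per character plus the terminating NUL."""
    return (len(text) + 1) * char_bit


if __name__ == "__main__":
    for char_bit in (8, 16):
        print(f"CHAR_BIT = {char_bit}:",
              "sizeof(32-bit type) =", sizeof(32, char_bit),
              "| bits for \"abc\" =", string_storage_bits("abc", char_bit))
```

Code that assumes a 32-bit value occupies four chars, or that a char holds exactly eight bits, breaks on such a target, and the same assumption can be buried inside a compiler.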

In the second phase, we implemented the GNU assembler / disassembler using CGEN, which took about 10 days. We also implemented support for a 16-bit char in LLVM, as described in a separate post. With these two pieces in place, completing a full-featured toolchain became easier, and we were able to run the standard LLVM lit and GCC regression tests against the toolchain, most of which passed. The DSP has a number of special modes supporting fixed-point arithmetic. To support them, we implemented dedicated built-in and intrinsic functions.

At this stage, we had a compiler that correctly compiled the customer's code. ELF support means that techniques such as link-time optimization (LTO) and garbage collection of sections are possible, which optimized the code well enough to meet the requirements of a customer with a limited amount of memory. At a cost of 120 person-days, the goal of compiling C code for the new DSP was achieved.
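To relate this figure to the earlier estimates, person-days can be converted to person-months. The sketch assumes 20 working days per month, which is not stated in the text; the names are illustrative.

```python
WORKING_DAYS_PER_MONTH = 20  # assumption, not a figure from the text


def person_days_to_months(days, days_per_month=WORKING_DAYS_PER_MONTH):
    """Convert an effort in person-days to person-months."""
    return days / days_per_month


if __name__ == "__main__":
    project = person_days_to_months(120)
    simplest_case = 3 + 3  # proof of concept plus full functionality, in months
    print(f"case study: {project:.1f} person-months; simplest-case estimate: {simplest_case}")
```

Under that assumption, 120 person-days is about 6 person-months, in line with the 3 + 3 months estimated for reaching full functionality in the simplest case.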

The customer decided that the functionality at this stage was sufficient and that no further work was required. If the customer later decides to make the compiler available to a wider audience, the work can continue with full testing of the toolchain.

Conclusions

Two factors made it possible to build a full-featured compiler toolchain in 120 person-days.

Using an open source compiler
The tools used in the project are the product of thousands of person-years of combined effort by the compiler community over the past three decades. Our customer was able to take advantage of this to get a modern toolchain.

A team of experts
Although it was a 120-day project, a team of five people worked on it, each with many years of experience. No single person can know everything about emulation, GDB, CGEN assemblers, the GNU linker and the LLVM compiler.

Source: https://habr.com/ru/post/354458/