The topic of machine learning and artificial intelligence is extremely popular right now: thanks to the computing power of modern machines, ideas and algorithms that originated long ago can finally be implemented and significantly improved. Almost every day you can read news about new achievements in this area. Moreover, machine learning is used in almost every field ... and compiler development is no exception. However, the area is quite specific, with its own characteristics and difficulties in creating "smart" compilers. At the same time, a great deal of research on this topic has been conducted over many years, both in academia and within various companies.
Where exactly are machine learning techniques being applied in compiler construction? And why haven't smart compilers become part of the developer's daily life?
Applications of machine learning in compiler development
Let's start with the first question: where specifically is machine learning used? Modern compilers are complex systems with a large number of optimizations that make it possible to generate more efficient machine code. However, some of the optimizations and related tasks, such as register allocation, are NP-complete, which forces compiler developers to use heuristic algorithms. As a result, most compilers expose a large number of optimization flags that let you tune the heuristics used. In LLVM, almost every pass has several hidden options that affect its operation; they can be used either with the -mllvm flag when calling clang or in the opt utility. However, this variety of flags is hidden behind the much more frequently used options that bundle many settings at once and are usually called optimization levels. For C/C++ compilers these are the well-known -O1, -O2, -O3 for optimizing runtime and -Os for optimizing code size. But, unfortunately, the result is not always optimal code (assembly-language experts can often rewrite the generated code better); much depends on the source code itself, the processor architecture, language features, and so on.
Although modern systems have plenty of RAM and high processor performance, there are still areas where application performance, energy efficiency, and machine-code size play a key role. Examples include software for embedded systems with a limited amount of RAM, digital signal processing, real-time systems, etc. Therefore, in cases where high-performance machine code is needed for fairly large systems, selecting the compilation options that give the best result is an important task. In addition, the worst-case execution time (WCET) problem has not gone away: for real-time systems, the execution time of a specific task on a given platform must be calculated and, as far as possible, minimized. To this day, programmers working on systems with a limited amount of RAM cannot rely entirely on compilers and often optimize the generated machine code by hand.
It is hard for a person to predict which optimizations will give a good result and which can lead to regressions: that requires being well versed in the intricacies of the heuristic algorithms applied, knowing the structure and passes of the compiler in use, and knowing the code of the compiled program completely, which is unrealistic in a modern development process. As a result, identifying the best compilation parameters for a program turns into an exhaustive search over combinations of options, measuring performance and code size for each one.
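To give a feel for the scale, here is a minimal sketch of such a brute-force search, assuming a self-contained benchmark file bench.c (a hypothetical name) and just two knobs: the optimization level and the hidden LLVM option -unroll-threshold passed via -mllvm, as described above. Even these two knobs already yield nine builds; real tuning spaces have hundreds of flags, so the number of combinations explodes.

```python
import itertools
import subprocess
import time

# Hypothetical benchmark source; any self-contained C file would do.
SRC = "bench.c"

# A tiny slice of the real search space: the optimization level plus
# one hidden LLVM knob passed through -mllvm.
levels = ["-O2", "-O3", "-Os"]
unroll = ["-mllvm -unroll-threshold=50", "-mllvm -unroll-threshold=300", ""]

def measure(flags):
    """Compile with the given flags and time one run of the binary."""
    subprocess.run(["clang", SRC, "-o", "bench"] + flags, check=True)
    start = time.perf_counter()
    subprocess.run(["./bench"], check=True)
    return time.perf_counter() - start

best = None
for level, u in itertools.product(levels, unroll):
    flags = [level] + u.split()
    t = measure(flags)
    if best is None or t < best[0]:
        best = (t, flags)
print("best:", best)
```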
In addition, there is the limitation of the compilation unit: the smallest piece of code for which options can be chosen. For C/C++ this is still a single file, which can contain a lot of code that might benefit from being optimized in different ways, but at the moment that is not possible. Therefore, an "intelligent" compiler that can be trained and then produce code well optimized for a wide variety of situations remains a dream for some developers.
Existing research and solutions
Naturally, the problem of automated selection of compilation options has interested researchers for many years. One of the best-known projects is MILEPOST GCC, developed by G. Fursin and his colleagues: a version of the gcc compiler that can choose optimization passes based on prior training on a collected data sample. This work used a set of 55 characteristics and a fairly simple model built on the idea of propagating good solutions via the K-nearest-neighbors algorithm. The project showed that tuning the optimization passes can produce code up to twice as fast as code obtained with the standard maximum-optimization option -O3.
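In sketch form, the MILEPOST idea is: describe each program with a feature vector and reuse the flag set of its nearest already-measured neighbor. Here is a minimal illustration with synthetic three-dimensional feature vectors standing in for MILEPOST's 55 characteristics:

```python
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Toy feature vectors: e.g. (number of loops, avg basic-block size,
# number of branches) -- stand-ins for MILEPOST's 55 characteristics.
X_train = np.array([[12, 6.5, 40], [3, 18.0, 9], [25, 4.1, 80]])
# Label = index of the flag combination that performed best
# for that training program.
y_train = np.array([0, 2, 0])

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)

# A new, unseen program inherits its nearest neighbor's flag set.
new_program = np.array([[20, 5.0, 70]])
print("suggested flag set:", knn.predict(new_program))
```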
There is also research by G. Pekhimenko and A. D. Brown for IBM's TPO (Toronto Portable Optimizer). Their main task was to select the heuristically chosen parameter values for optimizations as well as the set of code transformations themselves. Logistic regression was used for the implementation, which allowed effective tuning of penalties for faster learning. The classifier was built in Matlab. For each pass, the probability of applying it was computed, and the pass was used if the probability exceeded 50%. The target metric this study tried to reduce was static compile time.
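The per-pass decision rule described above can be sketched as one independent logistic-regression classifier per transformation, applied when its predicted probability exceeds 0.5. The features and labels below are synthetic, and scikit-learn stands in for the original Matlab implementation:

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# Rows: programs described by toy feature vectors.
X = np.array([[12, 6.5], [3, 18.0], [25, 4.1], [8, 9.0]])
# One binary column per transformation: 1 if enabling that pass
# helped (here: reduced compile time) on that program.
helped = {"inline": [1, 0, 1, 1], "unroll": [0, 0, 1, 0]}

# Train one independent classifier per pass.
models = {name: LogisticRegression().fit(X, y) for name, y in helped.items()}

# Decide for a new program: apply a pass if P(helps) > 50%.
program = np.array([[10, 7.0]])
for name, model in models.items():
    p = model.predict_proba(program)[0, 1]
    print(f"{name}: p={p:.2f} ->", "apply" if p > 0.5 else "skip")
```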
A. Askhari dealt with direct selection of compilation options for the whole program at once, minimizing execution time, compile time, code size, and power consumption. For this, the cTuning Framework and the Collective Mind Framework, developed by G. Fursin and A. Lokhmotov (also available on GitHub), were used.
There is also research by M. Stephenson and S. Amarasinghe on selecting optimizations for certain particularly important algorithms (register allocation, data prefetching, hyperblock formation). Characteristics specific to each function were used as features, and a genetic algorithm was used to search for solutions. The developed product was tested on the Open Research Compiler (ORC).
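A genetic search over pass selections can be sketched as evolving bit vectors with one bit per optimization. The fitness function below is a stub that rewards an arbitrary bit pattern; in the real study it would involve compiling with the selected passes and measuring the result:

```python
import random

N_PASSES = 16      # one bit per optimization pass
POP, GENS = 20, 30

def fitness(genome):
    # Stub: in reality, compile with the passes whose bit is set
    # and measure performance; here we just reward one fixed pattern.
    return sum(b == (i % 2) for i, b in enumerate(genome))

def crossover(a, b):
    # Single-point crossover of two parent genomes.
    cut = random.randrange(1, N_PASSES)
    return a[:cut] + b[cut:]

def mutate(g, rate=0.05):
    # Flip each bit with a small probability.
    return [1 - b if random.random() < rate else b for b in g]

pop = [[random.randint(0, 1) for _ in range(N_PASSES)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: POP // 2]                     # keep the best half
    children = [mutate(crossover(*random.sample(elite, 2))) for _ in elite]
    pop = elite + children
print("best pass selection:", max(pop, key=fitness))
```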
There is also the MAGEEC (Machine Guided Energy Efficient Compiler) project, whose goals are somewhat different. Its infrastructure uses machine learning to select the optimizations needed to generate code with maximum energy efficiency for high-performance computing systems. MAGEEC is designed to work with both gcc and LLVM. This compiler is part of a larger project, TSERO (Total Software Energy Reporting and Optimization).
One of the studies directly related to LLVM is LLVMTuner, a software product developed at the University of Illinois by I. Chen and V. Adve. In 2017 a report was presented describing the results available at that time. That work optimizes individual "hot" loops. The framework is designed for automated tuning of large programs: LLVMTuner runs on the LLVM IR intermediate representation, uses profiling to identify hot loops, and then automatically tunes heuristics for them. The focus is on top-level loops. The selected loops and any functions they call are extracted into a separate module, which is then subjected to the necessary optimizations. This approach yields improved performance on large programs.
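The first stage of such a pipeline, choosing which loops deserve individual tuning, can be sketched as a simple threshold on profile counts. The profile format and names here are hypothetical; LLVMTuner itself works on LLVM IR with real profile data:

```python
# Hypothetical profile: (function or loop name, sample count).
profile = [("main", 120), ("matmul.loop1", 54000), ("io_read", 300),
           ("matmul.loop2", 31000), ("init", 15)]

total = sum(count for _, count in profile)

# Keep candidates that account for at least 5% of all samples;
# only these would be extracted and individually tuned.
hot = [name for name, count in profile if count / total >= 0.05]
print("hot candidates for per-loop tuning:", hot)
```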
Existing problems
However, there is still no widely used compiler that tunes the heuristics of its optimization passes on its own. What is the problem? As is known, the effectiveness of machine learning methods and the quality of the resulting models depend on the correct choice of features and on the quality of the training data (despite the existence of algorithms less sensitive to "noisy" data). Without knowing the structure and algorithms used inside the compiler, it is not easy to select a complete and sufficient set of characteristics for training, although some are quite understandable and logical, for example loop size, number of loop exits, etc. Therefore, developing a universal solution suitable for a variety of compilers at once is difficult, and it is not a fact that it is even possible. Moreover, it is likely not even required.
Since compiler development should be effective and feasible in a fairly short time, even large companies naturally build their industrial compilers on ready-made foundations. Most modern solutions can be divided into two categories: those running on virtual machines, such as JIT compilers for the JVM, and static and dynamic compilers built on LLVM, a system implementing a virtual machine with RISC-like instructions. There are, of course, still companies' in-house solutions, but they are becoming less and less competitive due to the lack of a large community developing the technologies used in them. As a result, many large companies, such as Google, Apple, Adobe, and ARM, use LLVM to develop their own solutions today. Of course, gcc remains a principal compiler for C/C++, and other solutions exist for other languages, but either way, if a solution is found for LLVM, for example, it already covers a decent share of the compilers in use today.
The collection of characteristics for training is itself a big problem, since multi-pass compilers heavily transform the intermediate representation, and characteristics collected at the initial stage are not quite relevant for the compiler's later optimizations: with high probability they will have changed by then. Moreover, it makes sense to collect characteristics separately for different kinds of elements (modules, loops, basic blocks), since optimizations usually target one particular kind of element; in LLVM the passes are separated along exactly this line.
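For illustration, per-element feature records might look like the following sketch; the field names are illustrative and not taken from any particular compiler:

```python
from dataclasses import dataclass

@dataclass
class LoopFeatures:
    depth: int            # nesting depth
    trip_count_est: int   # estimated iteration count, if known
    n_exits: int          # number of loop exits
    body_insts: int       # instructions in the loop body

@dataclass
class BasicBlockFeatures:
    n_insts: int          # instructions in the block
    n_preds: int          # CFG predecessors
    n_succs: int          # CFG successors

# Loop-level passes would train on LoopFeatures, block-level passes
# on BasicBlockFeatures, mirroring how LLVM separates its passes.
lf = LoopFeatures(depth=2, trip_count_est=128, n_exits=1, body_insts=42)
print(lf)
```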
The first question, then, is identifying the elements for which characteristics need to be collected. There are several ways to compute unique identifiers that can be preserved across all optimizations, for example (a sketch of the last approach follows the list):
- an AST-based hash computed in the frontend
- unique numbers assigned while parsing the code in the frontend
- 64-bit numbers generated from the arcs of the CFG (control-flow graph) using a checksum (similar to PGO (profile-guided optimization) in LLVM)
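Here is a possible sketch of the third approach: a 64-bit identifier derived from a function's CFG arcs via a checksum. This is a simplified analogue of PGO-style hashing; the edge list is hand-written and the rotate-and-CRC scheme is illustrative, not LLVM's actual algorithm:

```python
import zlib

def cfg_hash(edges):
    """64-bit identifier computed from a function's CFG arcs.

    The hash depends only on the edge structure, so it survives
    renaming and reordering of unrelated code.
    """
    h = 0
    for src, dst in sorted(edges):
        h = (h << 3 | h >> 61) & (2**64 - 1)   # rotate left by 3
        h ^= zlib.crc32(f"{src}->{dst}".encode())
    return h

# Hand-written CFG of a tiny function: entry -> loop -> body -> ...
edges = [("entry", "loop"), ("loop", "body"), ("body", "loop"),
         ("loop", "exit")]
print(hex(cfg_hash(edges)))
```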
However, these identifiers must be correctly preserved across transformations, during which elements can be merged into one, split apart, created anew, or deleted, and that is not an easy task.
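There is no single correct answer here, but one possible, purely hypothetical set of propagation rules is sketched below: a merged element gets a deterministic ID derived from its parents' IDs, and a split element derives child IDs from the parent ID plus an index, so the same IDs can be recomputed on both sides of the training pipeline:

```python
import hashlib

def _h(s):
    # Short deterministic digest used as an element ID.
    return hashlib.sha1(s.encode()).hexdigest()[:16]

def merged_id(parent_ids):
    # Deterministic ID for a block formed by merging several blocks;
    # sorting makes the result independent of merge order.
    return _h("merge:" + ",".join(sorted(parent_ids)))

def split_id(parent_id, index):
    # Deterministic ID for the index-th block split off a parent.
    return _h(f"split:{parent_id}:{index}")

a, b = _h("block_a"), _h("block_b")
m = merged_id([a, b])                      # two blocks merge into one
s0, s1 = split_id(m, 0), split_id(m, 1)    # the result is later split
print(m, s0, s1)
```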
Secondly, it is difficult in principle to map the boundaries of the original loops, basic blocks, etc., as written in the source code, onto the already transformed IR. For example, because of the multi-stage machine-code generation adopted in LLVM, information about machine basic blocks is lost after code is emitted from machine instructions in AsmPrinter, and with it the identifiers of the basic blocks and loops for which, say, the offset from the beginning of the function is measured. With this method, therefore, the offset in bytes can only be obtained at the stage where machine code is generated from the instructions. At later stages, however, when machine fragments are processed, various alignments may be inserted that change the previously accounted-for instruction sizes, and nop instructions are added as well. Because of this, for basic blocks located near the end of large functions, the computed error can be very large, up to a complete shift into another block or loop. And although some of the late-stage transformations can be traced and accounted for, this still does not guarantee accurate measurements, since instruction sizes can keep changing right up to the linker.

As you can see, even obtaining the set of features on which training must be based, and which will most likely become the input of a trained model at decision time, is difficult and time-consuming. There is no obvious solution to these problems, which complicates the machine-learning work itself and makes it hard to attract many people to it, given the lack of adequate data sets. On top of that come the typical difficulties of machine learning: choosing models and methods, determining the right subset of features out of a large number of them, and so on. Almost everyone who has dealt with machine learning knows these, and there may well also be difficulties unique and specific to compilers.
It is difficult to predict when smart compilers will become widespread. Modern compilers have other problems that are unlikely to be solved this way and that, for now, are probably higher priority. Still, compilers have already become much more intelligent than they were at the dawn of their existence, and this process will continue, although perhaps somewhat more slowly than most developers would like.