
Comparing the same project in Rust, Haskell, C++, Python, Scala and OCaml

In my last semester of university, I took the CS444 compilers course. Each group of one to three people had to write a compiler from a substantial subset of Java to x86, in whatever language the group chose. It was a rare opportunity to compare implementations of large programs with the same functionality, written by very competent programmers in different languages, and to see how design and choice of language differed. The comparison generated a lot of interesting thoughts; it is rare to find such a controlled comparison of languages. It isn't perfect, but it's much better than most of the subjective stories on which people base their opinions about programming languages.



We wrote our compiler in Rust, and I first compared it with the Haskell team's project. I expected their program to be much shorter, but it was the same size or larger. The same was true for OCaml. Then I compared it with a compiler in C++, which, as expected, was about 30% larger, mainly because of headers and the lack of sum types and pattern matching. The next comparison was with my girlfriend, who wrote her compiler solo in Python and used less than half as much code as we did, thanks to the power of metaprogramming and dynamic types. Another team's Scala compiler was also smaller than ours. What surprised me most was the comparison with another team that also used Rust but ended up with three times more code because of different design decisions. In the end, the biggest difference in code size was within the same language!



I will explain why I consider this a good comparison, give some background on each project, and discuss some of the reasons for the differences in compiler size, then draw conclusions from each comparison. Feel free to jump ahead to the section you are interested in:



Contents

- Why I think it is meaningful
- Rust (basis for comparison)
- Haskell
- C++
- Python
- Rust (another group)
- Scala
- OCaml
- Conclusion





Why I think it is meaningful



Before you object that lines of code (I compared both lines and bytes) is a terrible metric, I want to note that in this case it can be quite informative. At the very least, this is the most well-controlled setting of different teams writing the same large program that I have ever heard or read about.




Thus, I think code size gives a decent idea of how much effort it would take to maintain each project if it were long-lived. I also think that the relatively small differences between the projects help refute some of the more extraordinary claims I have read, for example that a compiler in Haskell would be less than half the size of one in C++ purely because of the language.



Rust (basis for comparison)



One of my teammates and I had each written more than 10k lines of Rust before, and our third teammate had written perhaps 500 lines at hackathons. Our compiler came out at 6,806 lines by wc -l, 5,900 source lines of code (excluding blank lines and comments), and 220 KB by wc -c.



I found that the other projects kept roughly the same proportions, with a few exceptions that I will note. For the rest of the article, when I refer to lines or size I mean wc -l, but it doesn't matter much (unless I point out a difference), and you can convert with a rough factor.



I wrote another article describing our design, which passed all the public and secret tests. It also includes several extra features that we implemented for fun rather than to pass tests, which probably added about 400 lines. There are also about 500 lines of unit tests.



Haskell



The Haskell team included two of my friends, who had each written perhaps a couple of thousand lines of Haskell and had read a lot online about Haskell and similar functional languages such as OCaml and Lean. They had another teammate whom I didn't know very well, but he seemed to be a strong programmer who had used Haskell before.



Their compiler was 9,750 lines by wc -l, 357 KB, and 7,777 source lines of code (SLOC). This is also the only team where these ratios diverge significantly: their compiler is 1.4 times larger than ours in lines, 1.3 times in SLOC, and 1.6 times in bytes. They did not implement any extra features and passed 100% of the public and secret tests.



It is important to note that including tests affects this team the most. Because they cared a lot about correctness, they wrote 1,600 lines of tests. They caught several edge cases that our team did not, but those cases simply weren't checked by the course tests. So excluding tests on both sides (6.3k lines versus 8.1k lines), their compiler is only about 30% larger than ours.



Here I lean towards bytes as the more reasonable measure of size, because the Haskell project has longer lines on average: it doesn't have lots of lines consisting of a single closing brace, and rustfmt splits single-line chains of function calls across several lines.



After digging through the code with one of my teammates, we came up with the following explanations for the difference:





These differences plus tests explain nearly all of the difference in size. In fact, our constant folding and name resolution files are very close in size. There is still some difference in bytes because of longer lines, probably because more code is needed to rewrite the whole tree on each pass.



In the end, setting aside design decisions, I think Rust and Haskell are about equally expressive, perhaps with a slight edge to Rust because it is easy to use mutation where it's convenient. It was also interesting to see that my choice of recursive descent parsing and a hand-written lexer paid off: it was a risk that went against the professor's recommendations, but I judged it would be easier, and it was.
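
To give a sense of why recursive descent stays compact, here is a minimal sketch of the style; it is not our actual code, and the token, parser, and AST names are invented for illustration:

    // Hypothetical sketch of recursive-descent parsing, not the actual CS444 code.
    #[derive(Clone, PartialEq)]
    enum Token { If, LParen, RParen, Ident(String) }

    enum Expr { Var(String) }
    enum Stmt { If { cond: Expr, then: Box<Stmt>, els: Option<Box<Stmt>> } }

    struct Parser { tokens: Vec<Token>, pos: usize }

    impl Parser {
        fn expect(&mut self, t: Token) -> Result<(), String> {
            if self.tokens.get(self.pos) == Some(&t) { self.pos += 1; Ok(()) }
            else { Err(format!("unexpected token at position {}", self.pos)) }
        }

        // Each grammar rule becomes one function that builds its AST node directly,
        // so there is no separate parse tree that has to be rewritten afterwards.
        fn parse_if(&mut self) -> Result<Stmt, String> {
            self.expect(Token::If)?;
            self.expect(Token::LParen)?;
            let cond = self.parse_expr()?;
            self.expect(Token::RParen)?;
            let then = Box::new(self.parse_stmt()?);
            Ok(Stmt::If { cond, then, els: None })
        }

        fn parse_expr(&mut self) -> Result<Expr, String> {
            match self.tokens.get(self.pos).cloned() {
                Some(Token::Ident(name)) => { self.pos += 1; Ok(Expr::Var(name)) }
                _ => Err("expected an expression".to_string()),
            }
        }

        fn parse_stmt(&mut self) -> Result<Stmt, String> { self.parse_if() }
    }

A hand-written lexer is similar in spirit: a loop over characters with a match per token kind, and no regex/NFA/DFA machinery.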



Haskell fans will argue that the team probably didn't take full advantage of Haskell's capabilities, and that if they knew the language better they could have written the project in less code. I agree that someone like Edward Kmett could write the same compiler in far less code. Indeed, my friends' team did not use many fancy super-abstractions or combinator libraries such as lens. But all of that affects readability. Everyone on the team is an experienced programmer; they knew Haskell is capable of very fancy things and decided not to use them, because they judged that learning those things would take longer than it would save and would make the code harder for others to understand. That seems like a real trade-off to me, and the claim that Haskell is magically suited to compilers turns into something like "Haskell lets extremely skilled programmers write compilers in very little code, if you don't care about the code being maintainable by people who are less skilled at Haskell."



It is also interesting that at the start of the project the professor tells students they can use any language that runs on the university server, but warns that the Haskell teams stand out from the rest: they have the largest variation in grades. Many people overestimate their abilities, so the Haskell teams get the most bad grades, while others, like my friends, do great.



C++



Next I talked to my friend on the C++ team. I knew only one person on this team, but C++ is used in several courses at our university, so everyone on the team probably had C++ experience.



Their project was 8,733 lines and 280 KB, not counting test code but including about 500 lines of extra features. That makes it 1.4 times the size of our code without tests, which also includes about 500 lines of extra features. They passed 100% of the public tests but only 90% of the secret tests, presumably because they didn't implement the tricky vtables for arrays required by the spec, which might have taken 50-100 lines of code.



I didn't dig very deeply into the size difference here. I suspect it is mainly due to:





We also compared compile times. On my laptop, a clean debug build of our compiler takes 9.7 seconds, a clean release build 12.5 seconds, and an incremental debug build 3.5 seconds. My friend didn't have timings on hand for their C++ build (which used parallel make), but he said the numbers were similar, with the caveat that they put the implementations of many small functions in header files to avoid duplicating signatures, at the cost of longer compile times (which also means I can't measure the pure line overhead of their header files).



Python



My friend, a very good programmer, decided to do the project solo in Python. She implemented more fancy features (for fun) than any other team, including an SSA intermediate representation with register allocation and other optimizations. On the other hand, since she worked alone and implemented many extra features, she paid the least attention to code quality, for example throwing undifferentiated exceptions for all errors (relying on backtraces for debugging) instead of implementing error types and proper messages like we did.



Her compiler was 4,581 lines and passed all the public and secret tests. It's hard to determine how much code her extra features took, because many of them were more powerful versions of simple things everyone had to implement, such as constant folding and code generation. The extra features are probably 1,000-2,000 lines at least, so I'm confident her code is at least twice as expressive as ours.



One big part of the difference is probably dynamic typing. Our ast.rs alone has 500 lines of type definitions, and many more types are defined elsewhere in the compiler. We are also constrained by the type system itself: for example, we need infrastructure to ergonomically attach new information to the AST as passes run and to access it later, whereas in Python you can simply set new fields on AST nodes.
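
To make that concrete, here is a hedged sketch of the kind of typed-AST ceremony involved; the node and field names are invented and are not our actual ast.rs:

    // Hypothetical illustration: in a statically typed AST, information produced
    // by later passes has to be declared up front (or kept in a separate side table).
    struct TypeId(u32);

    struct FieldAccess {
        object: Box<Expression>,
        field_name: String,
        // Slot reserved for a later pass: None right after parsing,
        // filled in during type checking.
        resolved_type: Option<TypeId>,
    }

    enum Expression {
        Field(FieldAccess),
        // ...dozens of other node kinds, each with its own struct...
    }

In Python, by contrast, a pass can just assign node.resolved_type = ty on any node, with no declarations anywhere.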



Powerful metaprogramming also explains some of the difference. For example, although she used an LR parser rather than recursive descent, I think in her case it took less code, because instead of a separate tree-rewriting pass her LR grammar included Python code fragments for building the AST, which the generator could turn into Python functions using eval. Part of the reason we didn't use an LR parser is that building the AST without a tree-rewriting step would have required a lot of ceremony (generating Rust files or writing procedural macros) to link the grammar to Rust code fragments.



Another example of the power of metaprogramming and dynamic typing is our 400-line visit.rs file, which is basically repetitive boilerplate implementing the visitor pattern over a pile of AST structs. In Python this could be a short function of about 10 lines that recursively introspects the fields of an AST node and visits them (using the __dict__ attribute).
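
To illustrate the kind of repetitive code that fills visit.rs, here is a minimal sketch of the usual Rust visitor pattern; the node types and method names are invented for illustration, not our real code:

    // Hypothetical sketch of visitor boilerplate over a typed AST. Every node type
    // needs its own visit_* method whose default implementation walks the children,
    // so each pass can override only the methods it cares about.
    struct ClassDecl { methods: Vec<MethodDecl> }
    struct MethodDecl { body: Vec<Statement> }
    enum Statement { Return(Expression) }
    enum Expression { IntLiteral(i64), Var(String) }

    trait Visitor {
        fn visit_class(&mut self, class: &ClassDecl) {
            for m in &class.methods { self.visit_method(m); }
        }
        fn visit_method(&mut self, method: &MethodDecl) {
            for s in &method.body { self.visit_statement(s); }
        }
        fn visit_statement(&mut self, stmt: &Statement) {
            match stmt {
                Statement::Return(e) => self.visit_expression(e),
            }
        }
        fn visit_expression(&mut self, _expr: &Expression) {}
        // ...and so on for every other node type in the AST.
    }

    // A pass that only cares about expressions overrides a single method:
    struct LiteralCounter { count: usize }

    impl Visitor for LiteralCounter {
        fn visit_expression(&mut self, expr: &Expression) {
            if let Expression::IntLiteral(_) = expr { self.count += 1; }
        }
    }

A Python version can replace all of the walking methods with one function that loops over a node's __dict__ values and recurses.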



As a fan of Rust and statically typed languages in general, I would point out that the type system is very useful for preventing errors and for performance, and that unusual metaprogramming can make it harder to understand how code works. Still, this comparison surprised me: I didn't expect the difference in code size to be so big. If the difference really is close to having to write twice as much code, I still think Rust is a reasonable trade-off, but half the code is a real argument, and in the future I'll lean towards Ruby or Python when I just need to build something quickly by myself and then throw it away.



Rust (another group)



The most interesting comparison for me was with a friend who also did the project in Rust, with one teammate whom I didn't know. My friend had solid Rust experience: he had contributed to the Rust compiler and read a lot about the language. I know nothing about his teammate.



Their project was 17,211 raw lines, 15k source lines, and 637 KB, not counting test code and generated code. It had no extra features, and they passed only 4 of 10 secret tests and 90% of the public code-generation tests, because they ran out of time before the deadline to implement the more fiddly parts of the spec. Their program is three times larger than ours, written in the same language, with less functionality!



This result really amazed me: it dwarfs all the between-language differences I had looked at so far. So we compared our wc -l file-size listings and also looked at how each of us had implemented some specific things that led to different amounts of code.



It seems to come down to consistently making different design decisions. For example, their frontend (lexing, parsing, AST construction) took 7,597 lines against our 2,164. They used a DFA-based lexer and an LALR(1) parser, but other groups did similar things without that much code. Looking at their weeder file, I noticed a number of design decisions that differed from ours:





I didn't look at the code of their analysis passes, but they are large too. I talked to my friend, and it seems they didn't implement anything like our visitor infrastructure. I suspect that this, together with other smaller design differences, explains the size difference in that part. The visitor lets our analysis passes focus only on the parts of the AST they care about, instead of pattern matching over the entire AST structure, and that saves a lot of code.



Their code generation part is 3,594 lines, and ours is 1,560. I looked at their code, and it seems almost all of the difference is that they chose an intermediate data structure for assembly instructions, whereas we just used string formatting to emit assembly text directly. They had to define types and output functions for all the instructions they used and for the operand types, and building the assembly instructions also took more code. Where we had a one-line format statement for a short instruction such as mov ecx, [edx], they needed a giant constructor expression, split by rustfmt across 6 lines, building the instruction out of intermediate nested operand types with up to 6 levels of nested brackets. We could also emit blocks of related instructions, such as a function prologue, in one format statement, where they had to spell out the full structure of every instruction.
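
A hedged sketch of the two styles for comparison; the instruction and operand types below are invented, not either team's real code:

    use std::fmt::Write;

    // Style 1: emit text assembly directly with string formatting,
    // roughly the approach our team took.
    fn emit_field_load(out: &mut String, offset: i32) {
        // One line of code per line of assembly, with a comment for free.
        writeln!(out, "    mov ecx, [edx + {}]  ; load field", offset).unwrap();
    }

    // Style 2: build an instruction data structure first and print it later,
    // roughly the shape of the other Rust team's design.
    enum Reg { Ecx, Edx }
    enum Operand { Register(Reg), Memory { base: Reg, offset: i32 } }
    enum Instr { Mov { dst: Operand, src: Operand } }

    fn build_field_load(offset: i32) -> Instr {
        Instr::Mov {
            dst: Operand::Register(Reg::Ecx),
            src: Operand::Memory { base: Reg::Edx, offset },
        }
    }
    // ...plus printing impls for Reg, Operand and Instr, and a constructor
    // like this for every instruction and operand shape the compiler uses.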



Our team considered using an abstraction like theirs. It would have made it easier to output either text assembly or machine code directly, but that wasn't a course requirement, and the same could be achieved with less code and better performance by using an X86Writer with methods like push(reg: Register). We also considered that it might simplify debugging and testing, but decided that reading the generated text assembly is actually easier, and that it can be tested with snapshot tests, as long as liberal comments are inserted. We (apparently correctly) predicted that it would take a lot of extra code for no real benefit given our actual needs, so we didn't bother.
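
A minimal sketch of what that rejected X86Writer idea could have looked like; this is hypothetical, we never wrote it:

    // Hypothetical X86Writer: thin methods that append text (or machine code)
    // per instruction, without a full instruction data structure in between.
    enum Register { Eax, Ecx, Edx }

    struct X86Writer { text: String }

    impl X86Writer {
        fn push(&mut self, reg: Register) {
            let name = match reg {
                Register::Eax => "eax",
                Register::Ecx => "ecx",
                Register::Edx => "edx",
            };
            self.text.push_str("    push ");
            self.text.push_str(name);
            self.text.push('\n');
        }
        // ...one small method per instruction shape actually used.
    }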



It's instructive to compare this with the intermediate representation the C++ team built as an extra feature, which cost them only about 500 extra lines. They used a very simple structure (so the type definitions and construction code stayed simple) whose operations stayed close to what Java required. That meant their intermediate representation was much smaller (and therefore needed less construction code) than the resulting assembly, since many language operations, such as calls and casts, expand into many assembly instructions. They also said it really helped with debugging, since it cut out a lot of noise and improved readability. The higher-level representation also made it possible to do some simple optimizations on it. The C++ team came up with a really good design that gave them much more benefit for much less code.
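
For comparison, a hedged sketch of what such a Java-level intermediate representation might look like; it is sketched in Rust for consistency with the other examples (their real code was C++), and all names are invented:

    // Hypothetical high-level IR whose operations stay close to Java semantics.
    // Because a call or a cast is a single IR node but many assembly instructions,
    // the IR (and the code that builds it) stays much smaller than the output.
    enum Ir {
        Const(i32),
        LoadLocal(usize),
        StoreLocal { slot: usize, value: Box<Ir> },
        BinOp { op: BinOp, lhs: Box<Ir>, rhs: Box<Ir> },
        Call { method: String, args: Vec<Ir> },
        Cast { target_class: String, value: Box<Ir> },
    }

    enum BinOp { Add, Sub, Mul, Div }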



Overall, it seems the threefold size difference comes from consistently making design decisions, both large and small, that lean towards more code. They implemented a number of abstractions that we didn't, which added code, and skipped some of our abstractions that reduce code.



This result really surprised me. I knew design decisions matter, but I wouldn't have guessed in advance that they could produce differences of this size, given that everyone I looked at is a strong, competent programmer. Of all the comparison results, this one is the most significant to me. It probably helped that I had read a lot about how to write compilers before taking this course, so I could use clever designs that other people invented and found to work well, like AST visitors and recursive descent parsing, even though they weren't taught in the course.



What really made me think is the cost of abstraction. Abstractions can make future extension easier or protect against certain kinds of errors, but you have to weigh that against ending up with three times as much code to understand and refactor, three times as many places for bugs, and less time for testing and further development. Our course was different from the real world in that we knew for sure we would never touch the code after handing it in, which removes the advantages of proactive abstraction. Still, if I had to choose which compiler to extend with an arbitrary feature named only later, I would choose ours, even ignoring my familiarity with it, simply because it has much less code to understand, and I could then pick the best abstraction for the requirements (for example, the C++ team's intermediate representation) once I knew what those requirements actually were.



This also reinforced a taxonomy of abstractions in my mind: there are abstractions that shorten the code given only the current requirements, like our visitor pattern, and there are abstractions that add code but bring benefits in extensibility, debugging, or correctness.



Scala



I also talked to a friend who did the project in Scala in a previous semester, when the project and tests were exactly the same. Their compiler was 4,141 lines and ~160 KB of code, not counting tests. They passed 8 of 10 secret tests and 100% of the public tests, and did not implement any extra features. So compared with our 5,906 lines without extra features and tests, their compiler is about 30% smaller.



One factor in the smaller size was a different approach to parsing. The course allowed using a provided command-line LR table generator, and this was the only team that used it, which eliminated the need to implement an LR table generator themselves. They also avoided writing the LR grammar by hand with a 150-line Python script that scraped a Java grammar web page they found online and translated it into the generator's input format. They still needed to do tree rewriting in Scala, but overall their parsing stage was 1,073 lines compared to our 1,443, even though our recursive descent approach gave us a size advantage over every other team here.



The rest of their compiler was also smaller than ours, without any obvious big design differences, although I didn't dig into the code. I suspect this comes down to differences in expressiveness between Scala and Rust. Scala and Rust have similar functional-programming features useful for compilers, such as pattern matching, but Scala's garbage collection saves the code that Rust needs to satisfy the borrow checker. Scala also has more varied syntactic sugar than Rust.



OCaml



Since everyone on our team has interned at Jane Street (a technology-focused quantitative trading firm), I was particularly interested to see the results of the other former Jane Street interns, who chose OCaml for their compiler.



Their compiler was 10,914 lines and 377 KB, including a small amount of test code and no extra features. They passed 9 of 10 secret tests and all the public tests.



As with other groups, it seems the main size difference comes from using an LR parser plus tree rewriting for parsing, as well as a regex -> NFA -> DFA pipeline for lexing. Their frontend (lexing, parsing, AST construction) is 5,548 lines versus our 2,164, with similar ratios in bytes. They also used expect-style tests for their parser, which resemble our snapshot tests but keep the expected output inside the code, so their parser tests accounted for ~600 lines of their total, while ours were about 200.



That leaves 5,366 lines for the rest of their compiler (461 of which are interface files containing type declarations) versus 4,642 for ours: a difference of only 15% if their interface files are counted, and almost the same size if they are not. So, aside from our parsing design choices, Rust and OCaml seem about equally expressive, except that OCaml needs interface files and Rust does not.



Conclusion



Overall, I am very glad I made this comparison; I learned a lot and was surprised many times. I think the general conclusion is that design decisions matter much more than language, but language still matters because it gives you the tools to implement different designs.

Source: https://habr.com/ru/post/456638/


