
Compiling Kotlin: JetBrains VS ANTLR VS JavaCC


How fast can Kotlin be parsed, and why does it matter? JavaCC or ANTLR? Is the JetBrains source code itself usable?

Let's compare, speculate, and be surprised.

tl;dr


The JetBrains parser is too heavy to drag along, ANTLR is hyped but unexpectedly slow, and JavaCC is still too early to write off.

Parsing a simple Kotlin file with three different implementations:
| Parser | First call, ms | Min of 1000 calls, ms | Jar size, MB |
|---|---|---|---|
| JetBrains | 3254 | 16.6 | 35.3 |
| JetBrains (w/o analyzer) | 1423 | 0.9 | 35.3 |
| ANTLR | 3705 | 137.2 | 1.4 |
| JavaCC | 19 | 0.1 | 0.8 |

One serene sunny day ...


I decided to build a translator from some convenient language into GLSL. The idea was to program shaders right in the IDE and get its support "for free": syntax highlighting, debugging, and unit tests. It turned out to be really convenient.
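The translation idea in miniature: a Kotlin-style declaration becomes a GLSL one. This is a deliberately naive, string-level sketch for a single statement (the class and method names are invented for illustration); the real translator works on a parsed tree, not on strings, and infers the type itself.

```java
// Hypothetical, minimal illustration of Kotlin-to-GLSL translation of one
// declaration. The target type is passed in explicitly here; in the real
// pipeline it would come from the type-inference pass discussed below.
public class GlslSketch {
    public static String translateVal(String kotlinDecl, String type) {
        // "val x = vec3(1.0)" -> "vec3 x = vec3(1.0);"
        String body = kotlinDecl.replaceFirst("^val\\s+", "");
        return type + " " + body + ";";
    }
}
```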
Since then I have kept toying with the idea of using Kotlin: it allows the name vec3, it is stricter, and it works better with the IDE. Besides, it is hyped. From the point of view of my inner manager these are all insufficient reasons, but the idea came back so many times that I decided to get rid of it simply by implementing it.

Why not Java? It has no operator overloading, so the syntax of vector arithmetic would look quite different from what game developers are used to.

JetBrains


The folks at JetBrains have published their compiler code on GitHub. Examples of how to use it can be found here and here.

At first I used their parser together with the analyzer, because to translate into another language you need to know a variable's type even when it is not specified explicitly, as in val x = vec3(). Here the type is obvious to a reader, but extracting it from the AST is not so easy, especially when the right-hand side is another variable or a function call.
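A toy sketch of that "fill in the types" idea: for `val x = vec3()` the type comes from the callee, and for `val y = x` it is looked up from already-typed variables. Everything here (the `callReturns` table, the string-based right-hand side) is invented for illustration; the real analyzer works on an AST and resolves real function signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal type-propagation sketch, NOT the JetBrains analyzer or the
// article's typifier: it only handles no-arg calls and variable aliases.
public class TypeInference {
    private final Map<String, String> varTypes = new HashMap<>();
    private static final Map<String, String> callReturns =
            Map.of("vec3", "vec3", "vec2", "vec2", "length", "float");

    // rhs is either a no-arg call like "vec3()" or a plain variable name
    public String declare(String name, String rhs) {
        String type = rhs.endsWith("()")
                ? callReturns.getOrDefault(rhs.substring(0, rhs.length() - 2), "unknown")
                : varTypes.getOrDefault(rhs, "unknown");
        varTypes.put(name, type);
        return type;
    }
}
```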

And here came the disappointment. The first run of the parser on a trivial file takes 3s (THREE SECONDS).

Kotlin JetBrains parser
first call elapsed : 3254.482ms
min time in next 10 calls: 70.071ms
min time in next 100 calls: 29.973ms
min time in next 1000 calls: 16.655ms
Whole time for 1111 calls: 40.888756 seconds

This time causes the following obvious inconveniences:

  1. It adds three seconds to the launch of a game or application.
  2. During development I use hot shader reload and see the result immediately after changing the code.
  3. I restart the application often and appreciate that it starts quickly enough (a second or two).

An extra three seconds to warm up the parser is unacceptable. Admittedly, it quickly became clear that on subsequent calls the parsing time drops to 50ms and even 20ms, which (almost) removes inconvenience #2 from the list. But the other two remain. Besides, 50ms per file means an extra 2500ms for 50 files (one shader is 1-2 files). And what if this is Android? (So far we are talking only about time.)

The JIT's work here is remarkable: the parsing time of a simple file falls from 70ms to 16ms. Which means, first, that the JIT itself consumes resources, and second, that the results on another JVM may be very different.

Trying to find out where these numbers come from, I discovered an option to use their parser without the analyzer. After all, I only need to fill in the types, which can be done relatively easily, while the JetBrains analyzer does something much more complicated and collects far more information. With the analyzer off, the launch time drops by half (though almost half a second is still considerable), and the subsequent calls become much more interesting: from 8ms within the first ten down to 0.9ms somewhere within a thousand.

Kotlin JetBrains parser (without analyzer)
first call elapsed : 1423.731ms
min time in next 10 calls: 8.275ms
min time in next 100 calls: 2.323ms
min time in next 1000 calls: 0.974ms
Whole time for 1111 calls: 3.6884801 seconds

This is why I collected exactly these numbers. The time of the first run matters when the first shaders are loaded: it is critical, because you cannot distract the user while a shader loads in the background, they just wait. The decline in execution time matters in order to see the dynamics: how the JIT works, and how efficiently a shader can be loaded in a warmed-up application.
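The collection scheme can be sketched as follows. This is a simplified harness, not the article's actual measurement code: one cold "first call" time plus the minimum over the next N calls (the minimum filters out GC noise and shows the best warmed-up time, which is exactly where the JIT dynamics become visible).

```java
// Sketch of a micro-benchmark producing numbers in the shape quoted above:
// { first-call ms, min-of-next-N-calls ms }. For serious measurements a
// harness like JMH would be more trustworthy.
public class ParseBench {
    public static double[] firstAndMin(Runnable task, int warmCalls) {
        long t0 = System.nanoTime();
        task.run();                                   // cold, un-JIT-ted call
        double firstMs = (System.nanoTime() - t0) / 1e6;
        double minMs = Double.MAX_VALUE;
        for (int i = 0; i < warmCalls; i++) {         // watch the JIT kick in
            long s = System.nanoTime();
            task.run();
            minMs = Math.min(minMs, (System.nanoTime() - s) / 1e6);
        }
        return new double[] { firstMs, minMs };
    }
}
```

In the article the `Runnable` would wrap a call to the parser under test on the same source file each time.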

The main reason to look at the JetBrains parser first was the desire to use their type resolver. But once dropping it became an option under discussion, other parsers could be tried. Besides, non-JetBrains parsers are likely to be much smaller, less demanding of the environment, and easier to support and embed in a project.

ANTLR


There was no Kotlin parser for JavaCC, but for ANTLR, as expected, there is (one, two).

What was unexpected is the speed: the same 3s for loading (the first call) and a fantastic 140ms for subsequent calls. Not only does the first run take unpleasantly long, the situation never improves afterwards. Apparently the folks at JetBrains did some magic that lets the JIT optimize their code so well, because the ANTLR parser hardly speeds up over time at all.

Kotlin ANTLR parser
first call elapsed : 3705.101ms
min time in next 10 calls: 139.596ms
min time in next 100 calls: 138.279ms
min time in next 1000 calls: 137.20099ms
Whole time for 1111 calls: 161.90619 seconds

JavaCC


So, to our own surprise, we decline ANTLR's services. Parsing should not take that long! Kotlin's grammar has no cosmic ambiguities, and I tested on nearly empty files. Time, then, to dust off the old JavaCC, roll up the sleeves, and "do it yourself, properly."

This time the numbers came out as expected, although compared with the alternatives they were unexpectedly pleasant.

Kotlin JavaCC parser
first call elapsed : 19.024ms
min time in next 10 calls: 1.952ms
min time in next 100 calls: 0.379ms
min time in next 1000 calls: 0.114ms
Whole time for 1111 calls: 0.38707677 seconds

Unexpected advantages of your own JavaCC parser
Of course, instead of writing my own parser, I would prefer to use a ready-made solution. But the existing ones have serious drawbacks:

- performance (pauses when loading a new shader are unacceptable, as is three seconds of warm-up at startup)
- a huge runtime; I am not even sure the parser can be packed into a final product with it
- by the way, the current Groovy-based solution has the same trouble: the runtime gets dragged along

The resulting JavaCC parser, in contrast, offers:

+ excellent speed, both at startup and afterwards
+ just a few classes for the parser itself

Findings


The JetBrains parser is too heavy to drag along, ANTLR is hyped but unexpectedly slow, and JavaCC is still too early to write off.

Parsing a simple Kotlin file with three different implementations:

| Parser | First call, ms | Min of 1000 calls, ms | Jar size, MB |
|---|---|---|---|
| JetBrains | 3254 | 16.6 | 35.3 |
| JetBrains (w/o analyzer) | 1423 | 0.9 | 35.3 |
| ANTLR | 3705 | 137.2 | 1.4 |
| JavaCC | 19 | 0.1 | 0.8 |

At some point, I decided to look at the size of the jar with all dependencies. JetBrains is huge, as expected, but the ANTLR runtime surprised me with its size.
UPDATE: initially I wrote 15MB, but, as suggested in the comments, if you depend on antlr4-runtime instead of antlr4, the size drops to the expected figure. Still, the JavaCC parser itself remains 10 times smaller than the ANTLR one (if you strip away all code except the parsers themselves).
The jar size as such matters for mobile, of course. But it matters for desktop too, because in essence it means extra code in which bugs can occur, which the IDE has to index, and which affects the first-load and warm-up speed. Besides, for complex code there is little hope of translating it into another language.
I am not calling for counting kilobytes, and I value programmer time and convenience, but saving here is still worth considering, because this is how projects become sluggish and hard to maintain.

A couple more words about ANTLR and JavaCC

A serious feature of ANTLR is the separation of grammar from code. It would be fine if you didn't have to pay so much for it. And it mostly matters to "serial grammar developers"; for an end product it is less important, because even an existing grammar still has to be walked to write your own code. Moreover, if we economize and take a third-party grammar, it may simply turn out inconvenient: you still have to understand it thoroughly and transform the tree for your needs. JavaCC, of course, mixes grammar and code together, but does that matter much in practice, and is it really so bad?

Another argument for ANTLR is its set of target platforms. But you can look at this from the other side: the code produced with JavaCC is very simple. And simple code is very easy... to translate! Together with your custom code, be it to C# or to JS.

PS


All the code is here: github.com/kravchik/yast

The result of parsing, in my case, is a tree built of YastNode (a very simple class, essentially a map with convenience methods and an ID). But YastNode is not exactly a "spherical node in a vacuum": it is the class I actively use, and on top of it I have already built several tools: a typifier, several translators, and an optimizer/inliner.
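Based only on that description ("a map with convenience methods and an ID"), such a node class could look roughly like this. This is a guess for illustration; the real YastNode lives in the repository above and certainly differs in detail.

```java
import java.util.HashMap;

// Hypothetical YastNode-style class: a plain map of attributes plus a node
// ID, with chainable setters for compact tree construction.
public class AstNode extends HashMap<String, Object> {
    public final String id;

    public AstNode(String id) { this.id = id; }

    public AstNode set(String key, Object value) {
        put(key, value);
        return this; // chaining keeps tree-building code compact
    }

    public AstNode child(String key) {
        return (AstNode) get(key);
    }
}
```

A tree for `val x = vec3()` could then be built as `new AstNode("varDecl").set("name", "x").set("init", new AstNode("call").set("callee", "vec3"))`.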

The JavaCC parser does not yet cover the whole grammar; about 10 percent remains. But it does not look like that part could affect performance: I checked the speed as rules were added, and it did not change noticeably. In any case, I have already done much more than I needed and am simply sharing an unexpected result found along the way.

Source: https://habr.com/ru/post/433000/

