Papa Carlo and incremental compilers

Colleagues,

and remember there was such an article-translation in Habré Checklist of a developer of a programming language Colin McMillen about the problems of new programming languages? The article is simply amazing! If not read - be sure to look.
')
One of the key problems that Colin talks about is that no one needs languages without good IDE support. Of course, this is not the only problem facing the developer of a programming language. But, I think everyone will agree that, other things being equal, a language supported by many editors will already have a good competitive advantage.

By coincidence, I have been working on compilers and language plugins for IDE for several years now. And I will be happy to share with you the experience, talking about how to make a compiler that will be much easier to integrate with many modern code editors. And at the same time I will talk a little about my own work in this area.

Problem

It's not a secret, friends, that many of us have thought about creating our own language, and some have even tried or are going to try. The process is very creative and fascinating. If you do not go into details, in general, the compiler consists of the following components:

A parser that parses the source code and turns it into a syntax tree.
A semantic analyzer that connects the essence of the language with each other: links between names, variables, classes.
Backend responsible for generating and optimizing machine code.

Most compilers work in one pass: the programmer provides the source code, and the output is a ready-made program, or a list of syntax errors that need to be fixed. Depending on the size of the source code and the complexity of the language, the compilation process can take anywhere from seconds to minutes.

When developing a language plug-in for a code editor, say, for a language with static typing like Java, this approach is not applicable. That is, you cannot make the programmer wait until our compiler recompiles the entire project and checks if any errors appear in the code after one small change. Of course, if we want to do something not quite as trivial as syntax highlighting, but at least displaying syntax errors in real time.

The project full recompilation approach will not be applicable for the IDE, even if we disable the compiler backend. As the amount of source code in the project increases, the compilation time will still grow.

The so-called incremental approach comes to the rescue. The idea is that the compiler caches almost all intermediate results of its work: the compilation results of individual files, their syntax trees, and even individual lexemes of the language. And if a user programmer makes small changes to the source code, the compiler re-parses only some local code fragment around these changes. Thus, the compiler's performance becomes proportional to the changes made to the code, and not to the volume of the entire code.

Unfortunately, developing a parser for an incremental compiler is quite a non-trivial task. Especially since the parser must also be able to parse code that contains syntax errors. For example, if at the beginning of a class declaration a programmer makes a syntax error:

import javax.swing.JApplet; import java.awt.Graphics; public class Hello extends JApplet { int x = //    ,    . public void paintComponent(final Graphics g) { g.drawString("Hello, world!", 65, 95); //         . } }

In the method below, the programmer can still write code that the editor will understand: the user will be able to access both the code-completion, jump-to-definition, and many other IDE functions.

There are quite few ready-made generators and combinators of incremental parsers, and they are very specific. For example, in such a monstrous product as ANTLR in the latest version, there is support for incremental parsing in some form. It must be said that ANTLR is a rather heavy product, it is much more difficult to work with it than with any PEG combinator of ordinary (not incremental) parsers like PEG.js in JavaScript.

We have to admit the sad fact that today, with rare exceptions, the development of language plug-ins for each individual text editor or IDE is being conducted more or less “on the knee”. And it is quite a challenge, from which independent products do not rarely grow.

Decision

Now I am working on a Papa Carlo project, which will greatly simplify the task of creating language plug-ins for an IDE. This is a library on Scala, which allows you to build a full-featured incremental parser suitable for creating a language plug-in, or even a full-featured incremental compiler.

The developer sets the grammar of the language directly in the code on the Rock, using the API of this library. And the resulting parser can also parse the code containing syntax errors, and create a syntax tree right out of the box. There is no additional code generation step. The parser is created in runtime, like many modern combinators of ordinary parsers like the same JParsec for Java.

The developer then links the outputs of the generated compiler with the API of those code editors that he wants to support. For example, with the Sublime Text API .

In my opinion, this approach is much easier than developing a compiler and language plug-ins separately from scratch.

The project is not yet complete, but I have already laid out a working version on GitHub under the Apache license, and conducted several experiments. For example, there is a ready-made incremental parser for JSON files. The parser is defined by exactly one grammar file on Scala. Parser code can be found here .

In one of the input tests, this is the json that contains explicit syntax errors:

 { "key 1": "hello world", "key 1.1": "key 2": ["array value 1", "array value 2", "array value 3"], }

However, at the output the parser quite successfully parses those parts that do not contain errors. And it creates a tree like this:

 object 1 { entry: entry 27 { key: "key 1" value: string 26 { value: "hello world" } } entry: entry 25 { key: "key 1.1" value: string 24 { value: "key 2" } } }

While pointing to the syntax errors, of course:

  > code mismatched: { "key 1": "hello world", "key 1.1": "key 2"<<<: ["array value 1", "array value 2", "array value 3"],>>> }

However, another example is much more interesting, in which a relatively large file of 600 lines is introduced. After the first start, the parser safely creates the syntax tree, and it works 0.27 seconds. That, in general, is not enough. Then small changes are made to the file two times, and on the second and third launches the parser is already working 0.007 and 0.008 seconds, respectively. Similarly, creating a syntax tree for all 600 lines of these new files. This effect is achieved thanks to the use of the cache obtained during previous launches of the parser.

Input file	Size (lines)	The difference with the previous (line)	Time parsing and building AST (milliseconds)
step0.json	634	-	270
step1.json	634	one	7
step2.json	634	2	eight

Conclusion and links

Unfortunately, the format of the article makes it possible to present the topic only in the most general terms, omitting the details. Nevertheless, I hope that I managed to convey the very essence of the problem of incremental compilation and operation of language extensions for code editors.

I am sure that among us there are still developers who have experience in creating extensions for the IDE. It would be very interesting to hear your additions and comments.

Here are some useful links:

Grammar Kit . JetBrains parser constructor used in plugin development for IntelliJ Idea.
Java Development Tools . An incremental java parser used inside Eclipse.
Parboiled Constructor of non-incremental parsers for Scala and Java. Despite the fact that it builds ordinary, non-incremental parsers, it is one of the most developed and well-known parser constructors in the Scala community. In my opinion, the project deserves attention.
Papa carlo My own incremental parser constructor for Scala, mentioned above.

Source: https://habr.com/ru/post/201774/

All Articles

Papa Carlo and incremental compilers

Problem

Decision

Conclusion and links

More articles: