DSL Application Development and Code Generation

What this is all about

In this post I want to reflect, in fairly abstract terms, on application development. At first I planned to write only about code generation, but as I thought the topic over I ended up with many other thoughts that I also want to share. As a result, the post turned out somewhat broader than just DSLs.

What are a DSL (Domain-Specific Language) and code generation?

A DSL is a language specific to a particular domain, i.e. a language that operates directly on the concepts of that domain. It is usually contrasted with general-purpose languages. In principle, nothing prevents such a language from being just a formal syntax that a computer cannot process, but a language like that is of little use. A computer language usually implies some kind of processing, so it would be nice to have something that executes the DSL. Accordingly, there are two standard approaches: interpretation and compilation. Interpretation is more or less clear; with compilation the story is as follows. You can, of course, translate directly into processor instructions or, at worst, into assembly, but why bother if you can “write” normal code, that is, compile into the text of a high-level language, which its own compiler then turns into something the computer can run? That is why people often say “code generation” rather than compilation, although the latter term is also correct and is used.
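To make the difference between the two approaches concrete, here is a minimal sketch in Java. It assumes a made-up one-line DSL of the form `discount <percent> on <type>`; the syntax and the names `DiscountRule` and `DiscountDsl` are invented for illustration and do not come from any real tool. The same parsed rules are either evaluated directly (interpretation) or turned into Java source text (code generation).

```java
import java.util.List;

/** One parsed statement of the hypothetical DSL: "discount <percent> on <type>". */
final class DiscountRule {
    final int percent;
    final String productType;

    DiscountRule(int percent, String productType) {
        this.percent = percent;
        this.productType = productType;
    }

    /** Parses a single line such as "discount 10 on Bicycle". */
    static DiscountRule parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 4 || !parts[0].equals("discount") || !parts[2].equals("on")) {
            throw new IllegalArgumentException("Cannot parse: " + line);
        }
        return new DiscountRule(Integer.parseInt(parts[1]), parts[3]);
    }
}

final class DiscountDsl {
    /** Interpretation: evaluate the rules directly against the input. */
    static double interpret(List<DiscountRule> rules, String productType, double price) {
        for (DiscountRule rule : rules) {
            if (rule.productType.equals(productType)) {
                return price * (100 - rule.percent) / 100.0;
            }
        }
        return price; // no rule matched, price is unchanged
    }

    /** Code generation: emit Java source text that performs the same computation. */
    static String generate(List<DiscountRule> rules) {
        StringBuilder out = new StringBuilder();
        out.append("static double discounted(String productType, double price) {\n");
        for (DiscountRule rule : rules) {
            out.append("    if (productType.equals(\"").append(rule.productType).append("\")) ")
               .append("return price * ").append(100 - rule.percent).append(" / 100.0;\n");
        }
        out.append("    return price;\n}\n");
        return out.toString();
    }
}
```

The text returned by `generate` still has to go through the ordinary Java compiler before anything runs, which is exactly the extra step that distinguishes this route from interpretation.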

Labor productivity

If we take application development, I consider the main problem to be low productivity, i.e. the “amount of product” per unit of effort expended. In principle, a similar problem exists in every industry, and there are both general and industry-specific solutions. We have plenty of things meant to raise this productivity: high-level languages, powerful IDEs, continuous integration tools, Scrum, Kanban, coffee points, coffee ladies, and much more. Nevertheless, product development takes a lot of time. This is especially noticeable when what needs to be done can be described in words in a few minutes, yet doing it takes weeks. There is a significant gap between “what” and “how”. “What to do” is simple and clear; “how to do it” is also simple and clear, but takes a long time. I want the “how” to be quick, and ideally not to have to do it at all. In short, a declarative approach.

Levels of abstraction

There is a very useful concept: the level of abstraction. It helps to structure an application. Suppose we have an application for some subject area. On one side (the top) are the concepts of that subject area, which will show up in the application in one way or another; on the other side (the bottom) is a general-purpose programming language with bytes, types, methods, and similar elements that have nothing to do with the subject area (we will not descend further to the operating system, electrical impulses, transistors, molecules, atoms, protons, quarks ...). The programmer's job is precisely to connect these two layers, or to fill the area in the picture (left picture). If the application is large and the domain is sufficiently “far” from the language, various intermediate levels of abstraction emerge in the application, because otherwise the complexity becomes unmanageable (right picture).

The levels do arise, of course, but only logically. Some effort is needed to make the code respect them as well. This is especially difficult when there is a single language and everything runs in the same process. After all, nothing prevents a method from level 1 being called at level 3, and functions or classes are usually not labeled with their abstraction level. What does a DSL with code generation offer us? We still need to fill the same area. The upper part is filled by our language, and the lower part by the generated code.
Unlike the previous example, the boundary between levels here is impenetrable: DSL constructs cannot be invoked from the generated code (all the more so because they do not exist there). We will not consider cases where the generator emits code in the same DSL ... Another important point is that the generated code can be treated as compiled output, in the sense that it is created automatically and there is no need to look at it, provided that the generator is already written (and well tested). In other words, by writing a language and a generator for it, you can significantly shrink the area of the application that has to be filled by hand. This is especially valuable when developing several applications in the same domain, or when a single application changes constantly.

Managing "complication"

Let's imagine a situation that I think is quite common. Suppose you receive an order to develop a certain system. You are given an ideal specification and you come up with an ideal architecture where everything is fine: components, interfaces, encapsulation, and many other equally beautiful patterns. Take a concrete example: an online bicycle store. You built the store according to the specification and everyone is happy. The store is thriving and thinking about expanding the business, namely starting to sell scooters and motorcycles as well. So they come to you and ask you to modify the store. You had a beautiful architecture tailored to bicycles, and now it needs to be re-tailored. On the one hand, scooters and motorcycles are similar to bicycles: all of them have parts, accessories, and related products. But there are differences.
The system as a whole remains the same, but some functions must now support the new types of objects, or separate functions must be added for them.
The domain has become more complicated: instead of bicycles only, the store must now support bicycles, scooters, and motorcycles. Our system has to become more complicated too. I think that, in general, the complexity of a software system corresponds to the complexity of the system being modeled. At the same time, there is a minimum level of complexity at which the problem can still be solved. (There is no upper bound: you can come up with an infinitely complex solution to any problem.) I think we should strive for that minimum, because of all possible solutions the simplest is the best. In short, the code should be simple.
Let's return to our online store. Suppose there is a certain function that was written for bicycles. Now it has to work for the new types.

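public void process(Bicycle b) {
    genericCode();          // placeholder: logic shared by every product type
    specificForBicycle();   // placeholder: logic that applies only to bicycles
}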

For this to work, specificForMotobike code has to end up inside it. What are the options?

Copy / paste

public void process(Motobike b) {
    genericCode();           // placeholder: the same shared logic, duplicated
    specificForMotobike();   // placeholder: logic that applies only to motorbikes
}
We copied the method, replaced the type-specific code, and that's it. Simple, but there is a problem: if genericCode needs to change, the same change has to be made in several places, which costs time and invites errors ...

If / else

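public void process(Object b) {
    genericCode();                       // placeholder: shared logic
    if (b instanceof Bicycle) {
        specificForBicycle();            // placeholder: bicycle-only logic
    } else if (b instanceof Motobike) {
        specificForMotobike();           // placeholder: motorbike-only logic
    }
}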

Add the conditions and everything is ready. A bit better than copy / paste, but again there is a problem: tomorrow they will want to sell ATVs, and you will have to hunt for such fragments throughout the code and add yet another else branch to each.

Abstract method

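// declared in Vehicle, implemented by each concrete type
abstract void specific();

public void process(Vehicle b) {
    genericCode();   // placeholder: shared logic
    b.specific();    // dispatches to the implementation for b's actual type
}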

Here an abstract method is invoked, and it is implemented for each type. In principle this can be an acceptable option, but it can also complicate the system significantly. Multi-level inheritance hierarchies with a pile of overridden methods, where it is hard to figure out which particular method is actually being called, are not uncommon.
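For comparison with the other options, the abstract-method variant can be written out as a small compilable sketch. The `Shop` class and the strings returned by `specific()` are placeholders invented for illustration; they only stand in for genericCode and the type-specific code from the snippets above.

```java
import java.util.ArrayList;
import java.util.List;

/** Base type: declares the type-specific hook. */
abstract class Vehicle {
    /** Implemented once per product type. */
    abstract String specific();
}

final class Bicycle extends Vehicle {
    @Override
    String specific() {
        return "bicycle-specific";
    }
}

final class Motobike extends Vehicle {
    @Override
    String specific() {
        return "motobike-specific";
    }
}

final class Shop {
    /** Shared logic lives in one place; the hook supplies the differences. */
    static List<String> process(Vehicle b) {
        List<String> steps = new ArrayList<>();
        steps.add("generic");       // stands in for genericCode
        steps.add(b.specific());    // resolved at run time by b's actual type
        return steps;
    }
}
```

Adding ATVs means adding one more subclass, with no change to `Shop.process`. The price is the one mentioned above: with deeper hierarchies it gets harder to see which `specific()` actually runs.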

DSL and code generation

The DSL is designed so that all the distinctive features of the types can be described in it. In the code generator, templates are written that are applied to the type descriptions, and the resulting code looks like the copy / paste version.
Template:
public void process("TYPE" b) {
    genericCode
    "SPECIFIC CODE"
}

DSL:

type Bicycle:
    property A, ( description, value, links ...)
type Motobike:
    property B,

    property C,

Next, the template is turned into concrete code for each type described in the DSL. In my experience it is hard to write, right away, a language that would support new entities without any changes, but the changes to the language and the generator are usually small and simple. Overall the approach is this: a lot of simple code is generated that is easy to read and understand, and it does not matter that there are many files and that they may run to several thousand lines. After all, nobody writes them by hand.
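The template step described above can be sketched as plain string substitution in Java. This is only an illustration under stated assumptions: `TypeDescription` stands in for whatever model a real DSL parser would produce, the `%TYPE%` and `%SPECIFIC%` markers are an invented placeholder syntax, and `handleProperty` together with the `get<Property>` naming are hypothetical pieces of the generated code, not part of any real framework.

```java
import java.util.Arrays;
import java.util.List;

/** In-memory form of one "type ...: property ..." block from the DSL (assumed model). */
final class TypeDescription {
    final String name;
    final List<String> properties;

    TypeDescription(String name, String... properties) {
        this.name = name;
        this.properties = Arrays.asList(properties);
    }
}

final class ProcessGenerator {
    /** Template with two placeholders: the type name and the type-specific code. */
    private static final String TEMPLATE =
          "public void process(%TYPE% b) {\n"
        + "    genericCode();\n"
        + "%SPECIFIC%"
        + "}\n";

    /** Applies the template to one type description and returns Java source text. */
    static String generate(TypeDescription type) {
        StringBuilder specific = new StringBuilder();
        for (String property : type.properties) {
            // handleProperty is a hypothetical helper available to the generated code
            specific.append("    handleProperty(\"").append(property)
                    .append("\", b.get").append(property).append("());\n");
        }
        return TEMPLATE
                .replace("%TYPE%", type.name)
                .replace("%SPECIFIC%", specific.toString());
    }
}
```

Calling `ProcessGenerator.generate(new TypeDescription("Motobike", "B", "C"))` yields a `process(Motobike b)` method with one `handleProperty` line per property. The output is the copy / paste variant, except that the copying is done by the generator, so a change to the shared part is made once, in the template.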

DSL first, or a formalized specification

Here I come to the most important part (everything before this was the introduction :) What does the start of a project usually look like? Specifications are written, diagrams are drawn, the architecture and project stages are worked out. And when all of that is done, people start writing code. Specifications are free-form documents. Why not formalize them? My main idea is to first develop a language for describing the system in terms of the domain. This will be partly a description of the architecture and partly a formalized specification. At the same time, the customer will understand the language, since it directly uses the terms of the subject area, and so will be able to take part in developing the system too. The idea, of course, is not mine. In the literature, this approach is called Domain-Driven Design (DDD). I am only saying that the DDD approach works well with DSLs and code generation.
Formalization means the possibility of automatic processing. You can add a variety of consistency checks. On the other hand, the system's developers get a ready-made, formalized declaration of what has to be built. What remains is to write the converter into a working system, i.e. those same code generators.
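As an illustration of what such automatic checks might look like, here is a small Java sketch that validates a parsed specification. The model (`SpecType` with a name and a list of linked type names) and the two rules it checks are assumptions made for this example; a real specification language would define its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** One type declared in the formalized specification (assumed model). */
final class SpecType {
    final String name;
    final List<String> linkedTypes;

    SpecType(String name, String... linkedTypes) {
        this.name = name;
        this.linkedTypes = Arrays.asList(linkedTypes);
    }
}

final class SpecValidator {
    /** Returns human-readable problems; an empty list means the spec passed. */
    static List<String> validate(List<SpecType> spec) {
        List<String> errors = new ArrayList<>();
        Set<String> declared = new HashSet<>();

        // Check 1: every type name is declared only once.
        for (SpecType type : spec) {
            if (!declared.add(type.name)) {
                errors.add("Duplicate type: " + type.name);
            }
        }

        // Check 2: every link points at a type that is actually declared.
        for (SpecType type : spec) {
            for (String target : type.linkedTypes) {
                if (!declared.contains(target)) {
                    errors.add(type.name + " links to undeclared type: " + target);
                }
            }
        }
        return errors;
    }
}
```

Checks of this kind run before any code is generated, so a contradiction in the specification is reported in the customer's own terms rather than surfacing later as a bug in the generated system.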

Not so smooth

Of course, not everything is so simple and smooth. Like any other approach, this one has its problems and shortcomings.

It is not always clear what to generate. You have to picture the final system. After all, not all of the code is generated, and you need to understand what will be generated, what will be written by hand, and how it will all work together. Sometimes it is easier to write everything manually first (keeping future generation in mind) and then extract part of the code into templates and generators.
The second problem is the balance between generated and handwritten code. There is no point in putting code into a template if it is not actually parameterized and is always the same. Mixing all the approaches from the examples above at the same time is also bad practice.
Dependencies between handwritten and generated code. Handwritten code should not break when the DSL source (the text written in the DSL) changes.
"Brain damage" from code generation. Writing code generators is somewhat different from writing regular programs, and using the "wrong" style leads to "not very good" code. Code review and "healthy" colleagues are the cure.
Another thing I have run into is that it is hard to convince the customer that the approach is right. They say: we have managed somehow so far and will carry on just fine, and here you come with your ideas. And anyway, where is the scooter support you were supposed to deliver yesterday? Get to work.
Have you ever seen a job posting for a DSL developer? Here it is probably the same as getting a job as a Haskell programmer: get hired as a Java programmer (C++, Perl, Python, etc.), convince everyone that ~~Haskell~~ DSL is awesome, and there you are, a DSL developer.
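Returning to the dependency problem from the list above (handwritten code breaking when the DSL text changes), one commonly used arrangement is what is often called the Generation Gap pattern: the generator owns an abstract base class, and handwritten code lives in a subclass that the generator never touches. The sketch below is one possible way to organize things, not something the approach requires, and the class names are invented for illustration.

```java
/** GENERATED: rewritten on every DSL change and never edited by hand. */
abstract class BicycleProcessorBase {
    /** Generated flow: shared logic plus a call into the handwritten part. */
    final String process(String item) {
        return "generic:" + item + "|" + customize(item);
    }

    /** Extension point for handwritten code; the generator keeps this signature stable. */
    protected abstract String customize(String item);
}

/** HANDWRITTEN: kept in a separate file that the generator does not overwrite. */
final class BicycleProcessor extends BicycleProcessorBase {
    @Override
    protected String customize(String item) {
        return "handmade:" + item;
    }
}
```

As long as the generator keeps the `customize` signature stable, regenerating the base class after a DSL change leaves the handwritten subclass compiling. If the signature does have to change, the compiler points at every place that needs attention.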

Tools for developing DSL and writing code generators

Everything I have written so far would have little practical meaning without decent development tools. Fortunately, such tools exist. They vary, but my choice is Eclipse Xtext. The most important thing about Xtext is its integration with the Eclipse IDE; in particular, all the standard features are there: syntax highlighting, errors and warnings, content assist, quick fixes. This is what you get "out of the box". Beyond that, the only limit is your imagination. I think I will write a few more practical posts on the topic if there is interest.

Conclusion

I don't think I have discovered America here. Much of what I wrote is fairly trivial. On the other hand, I think the topic of DSLs and code generation is not covered well enough, so I decided to try my hand at spreading the word. Eclipse Xtext, too, is something few people have heard of, and fewer still use.

Source: https://habr.com/ru/post/239361/
