Non-standard approach to the construction of a modern programming language

Ever since university, I have periodically found time to test the quality of existing products and to do some research in software development. One of those research projects happened to be the creation of a modern programming language. Unfortunately, I have not succeeded at it, but along the way I discovered a few doors for myself, which I will share in a series of articles.

This article is very introductory and mostly "water" (filler), without technical details. Below I will try to start a difficult conversation about how long keywords reduce your development speed. You will either have to take the author at his word or conduct your own research, trample the claims of this post into the mud, and let the truth be born in the dispute. In fact, that is exactly what the author is calling for.

There is something wrong with modern programming languages

To keep things from getting boring, the discussion will focus on the subtleties that languages are missing rather than on the shortcomings of existing tools (a language is often strongly tied to a specific development environment). Below I have picked the 5 most "unnecessary and useless" points, the ones "nobody talks about", so I will. Yes, the author is only human and loves holy wars in the comments, but asks you to refrain.
1. Normal programmers want to write shorter code that everyone can understand. This trend has been very noticeable since the arrival of Python. Compare Swift with Objective-C, D with C++, and other recently created languages. All of them try to make language constructs visually lighter while keeping their meaning and purpose. Of course, this does not work everywhere.

2. In companies, testing eats up more and more time. It is time to bring test automation closer to the language itself, to relieve testers of writing unnecessary code. Built-in constructs for unit tests should become one with the language so that development environments can automate part of the testing process.

3. Programmers still see poor portability of previously written code. With libraries and frameworks everything is fine, but individual modules and classes get lost in tons of code. Neither OOP, nor AOP, nor the functional approach improves the reuse of such code. This problem has to be solved on several fronts at once: the lightness of the language, the features of the development environment, and an environment tailored to the individual developer. Like a portfolio of knowledge, your code will one day be as portable as a folder of documents: you will easily merge, split, and copy it, and form new packages from the combinations. In short, everything that was left unfinished with Java classes, and much more.

4. Modern languages lack code transparency. I mean your own code, even without any linked libraries. How well do you really know what is happening under the hood of this beast? Fortunately, problems are extremely rare. But when they do arise, people descend into the jungle of bytecode, stack layout, and the way Java classes are translated, looking for the cause of a problem in code that looks normal at first glance. Such problems are rare, complex (the language is certainly not to blame), and generally cannot be fixed. Still, the depth to which you can dig into the code to find a problem is insufficient. The trickiest details of how data types are built and laid out in memory are hidden outside the code. This should be countered by making code transparent from the high-level call down to every bit.

5. It would be great if source code (in essence, the logic) were as independent of the hardware architecture as possible. Today all data types, languages, and their standard libraries are geared toward binary architecture. This is fine and absolutely correct, but it gives no freedom to those who explore non-standard hardware and architectures. The constructs of modern languages cannot be applied to a decimal or ternary architecture.

In my project I am trying to solve these and many other, far more important, problems.

ΣL programming language

First, a few facts about how the name was chosen. Initially the language was called WL (from "white light"), but someone famous took that abbreviation for a grandiose project, the Wolfram Language. So I rotated the W by 90 degrees, and I liked the result. After consulting the Internet, I realized that I had rotated the letter in the wrong direction. And although a language named "Sigma" existed from 1965 until the 1990s, I had no desire to rename the project again.

"Sigma" is a general-purpose programming language.

The foundation of "Sigma" is the set, which unites most existing programming paradigms. Sets are a much more powerful and flexible tool than objects. Sets have always been with programmers, in every programming language. An array is a set of elements of the same kind. A function is a set of instructions. A structure (record) is a set of data fields. A class or object is a set of data fields and methods. Namespaces and modules are also sets that contain all of this stuffing: functions, classes, and so on. Like mathematical sets, these sets can be combined and have a number of properties.

At first glance this looks very much like Java objects and classes. In fact, it is more involved. Recall what was said above about arrays: arrays are sets. Now recall the operations on an array and its contents. Imagine changing an object or a class in a safe, controlled way, as if you were working with an array. Now you have a better idea of what sets are.

The base set has the classic name source; all other sets inherit from it.

A set is a collection of elements. It would be more accurate to say that a set is an interface for a collection of elements. Elements are functions, variables, other sets, etc.

Internally, a set really is a mathematical set: apart from a few nuances, it has the properties of one. Or, to put it more simply, a set is a container of elements. Visually, the description of a set resembles a class, hence the opportunity to borrow the term "inheritance" from OOP. Inheritance is the inclusion of one set's elements in a new set. Multiple inheritance is the union of sets.
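The idea of inheritance as inclusion and multiple inheritance as union can be pictured with a rough Python sketch. This is not ΣL and not the author's implementation: the helpers make_set and inherit are hypothetical, a set is modeled as a plain mapping from element names to elements, and the rule that later parents win on name clashes is an assumption the article does not cover.

```python
# Hypothetical model: a "set" is a mapping from element names to elements
# (fields, functions, nested sets). Nothing here is real ΣL syntax.

def make_set(**elements):
    """Build a set as a plain dict of named elements."""
    return dict(elements)


def inherit(*parents, **own):
    """Inheritance: include the parents' elements in a new set.

    With several parents this is a union. On a name clash the later
    parent (and then the set's own elements) wins; that rule is an
    assumption made only for this sketch.
    """
    result = {}
    for parent in parents:
        result.update(parent)
    result.update(own)
    return result


source = make_set()                      # the base set, empty in this sketch
point = inherit(source, x=0, y=0)        # single inheritance
colored = inherit(source, color="red")
colored_point = inherit(                 # multiple inheritance = union
    point,
    colored,
    describe=lambda me: f"{me['color']} point at ({me['x']}, {me['y']})",
)

print(sorted(colored_point))
print(colored_point["describe"](colored_point))
```

Because the result is an ordinary container, its elements can be listed, merged, or split like array contents, which is the property the text attributes to sets.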

It is worth noting that sets exist in all languages, but they have not been used as containers at the level of classes and objects. At least I have not seen this in popular languages.

Now to numbers. Usually numeric values are simply recognized by the compiler during compilation, whereas in Sigma numbers are basic structures of the language whose presence, like that of the source set, is unconditional. This is actively used as an advantage of the language. Numbers are denoted with the underscore character:

_ - ,
__ - ,
__._ - ,
__,_ - ,
_._ - ,
_,_ -

A value range can be specified; for example, __[-10:10] is an integer that takes values from -10 to 10 inclusive.
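For readers who think in mainstream languages, a declaration like __[-10:10] can be approximated by a small Python class. This is only a sketch under assumptions: the class name RangedInt is hypothetical, and the article does not say what ΣL does when a value falls outside the range, so raising an error here is a guess.

```python
class RangedInt:
    """Hypothetical stand-in for a ΣL declaration such as __[-10:10]."""

    def __init__(self, low, high, value):
        if low > high:
            raise ValueError("low must not exceed high")
        self.low = low
        self.high = high
        self.value = value  # validated by the property setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # Both bounds are inclusive, as in the article's example.
        if not isinstance(new_value, int) or not self.low <= new_value <= self.high:
            raise ValueError(
                f"{new_value!r} is outside [{self.low}:{self.high}]"
            )
        self._value = new_value


n = RangedInt(-10, 10, 3)
n.value = -10          # allowed: the lower bound is inclusive
try:
    n.value = 11       # rejected in this sketch
except ValueError as err:
    print(err)
```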

The language has an unusual keyword, "me". It replaces this; I will say more about it later.

Sample code in Sigma:

    @binary
    binary {
        signed {
            //…
        }
        unsigned {
            bit: source {
                _[0:1] value;
                state True is {     // true, TRUE, True, tRUE, ...
                    me.value == 1;
                }
                state False is {
                    me.value == 0;
                }
                operator := {}
                alloc() {}
                init() {}
                dealloc() {}
            } //bit
        } //unsigned
        alias bit signed.bit;
        alias ubit unsigned.bit;
    }

Note how the value of the bit and its states are described. And remember this code; we will come back to it.

Many people do not see the reason for this design, but programmers get used to it right away and simply use it. It may seem that there was no problem here, but in fact there were plenty. It took months to make the description of a state this clear and transparent. One could talk at length about how types are implemented in each language and how the if statement works, but that would be tedious.

The description above introduces a state explicitly, so that it is transparently described for the programmer rather than implicitly nailed to the type by a keyword. Roughly speaking, this code documents itself. At any time you can look up what a given state means, so you no longer need to memorize excerpts from the documentation. Remember values such as Infinity and NaN? Now they can be spelled out explicitly with this construct.

The state of a set is the correspondence of certain elements of the set to the desired values.

Currently this is implemented with set operations. A state is a fixed collection of certain elements of the set, its static subset. A set is in the described state if each element of the static subset is equal to the corresponding element of the set. Elements can themselves be sets. Let us briefly analyze the expression in braces:

 state False is { me.value == 0; }

On the left is the value substituted at run time (me.value); on the right is the static set (0) given by the description. There may be several such expressions, as long as they do not contradict each other: writing me.value a second time is not allowed.

The difference of the two sets is computed. If they are equal, the set is effectively subtracted from itself, and the result is the empty set. You probably wonder how the conditional operator (and the other operators) work with this. The if statement checks whether the resulting set is empty. If it is, the sequence of instructions is executed.

And do not worry: when compiling for the target platform, all of this is optimized. Part of the code turns into an ordinary comparison on the target platform.

The main disadvantage of this solution is that you cannot write a condition other than equality. For example, you cannot write "me.value >= 0;".
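The state check described above (take the static subset, subtract the matching run-time elements, test whether anything is left) can be mimicked in Python. This is a hedged sketch, not the ΣL implementation: the function in_state and the dict-based representation are assumptions made for illustration, and element values must be hashable for it to work.

```python
def in_state(current, state):
    """Return True when `current` is in `state`.

    `state` is the static subset: element names mapped to required values.
    The check follows the article's description: build both sets of
    (name, value) pairs, take their difference, and test for emptiness.
    """
    wanted = set(state.items())
    actual = {(name, current.get(name)) for name in state}
    return len(wanted - actual) == 0


# Hypothetical counterparts of `state True` and `state False` for a bit.
TRUE = {"value": 1}
FALSE = {"value": 0}

bit = {"value": 1}
if in_state(bit, TRUE):
    bit["value"] = 0   # the "if" body runs because the difference is empty

print(in_state(bit, TRUE), in_state(bit, FALSE))
```

The sketch also makes the stated limitation visible: a set difference can only tell whether pairs are identical, so a condition like "value >= 0" has no place in it.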

In code it looks quite familiar. Below are 4 ways of writing a comparison and 4 ways of assigning a state. All are valid in the current version of ΣL.

    use binary;
    alias bit binary.unsigned.bit;

    bit b := 1;

    if (b == 1) {
        b := 0;
    }
    if (b == True) {
        b := False;
    }
    if (b is True) {
        b := binary.unsigned.bit.state.False;
    }
    if (b.state is True) {
        b.value := 0;
    }

The binary.unsigned.bit set must implement the comparison and assignment operators (the latter for both a number and a state). The rules that determine their priority are a separate conversation.
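To see what "comparison and assignment for both a number and a state" might involve, here is a rough Python analogue. Everything in it is an assumption made for illustration: the classes State and Bit and the methods assign and is_in are hypothetical, and since Python cannot overload assignment or the "is" operator, ordinary methods stand in for the ΣL forms ":=" and "is".

```python
class State:
    """A named state: a fixed value that the bit's value must equal."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class Bit:
    # Hypothetical counterparts of `state True` and `state False`.
    TRUE = State("True", 1)
    FALSE = State("False", 0)

    def __init__(self, value=0):
        self.assign(value)

    def assign(self, other):
        """Stand-in for `:=`; accepts either a number or a state."""
        value = other.value if isinstance(other, State) else other
        if value not in (0, 1):
            raise ValueError(f"a bit cannot hold {value!r}")
        self.value = value

    def __eq__(self, other):
        """Stand-in for `==`; compares with a number, a state, or a bit."""
        if isinstance(other, (State, Bit)):
            return self.value == other.value
        return self.value == other

    def is_in(self, state):
        """Stand-in for `b is True`."""
        return self.value == state.value


b = Bit(1)
if b == 1:                 # comparison with a number
    b.assign(0)            # assignment of a number
b.assign(Bit.TRUE)         # assignment of a state
if b == Bit.TRUE:          # comparison with a state
    b.assign(Bit.FALSE)
print(b.value, b.is_in(Bit.FALSE))
```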

Now is the time to return to the first code example, which I asked you to remember.

That example described a bit, the foundation of data on real binary hardware ( ~~Yours, Captain Obvious~~ ). Program logic built on such a description is entirely in the programmer's hands and depends less on the hardware architecture. The programmer can implement the basic types of any hardware architecture without changing the programming language or introducing new keywords. This makes it easier to fit such code to the architecture of a machine or of an emulator of that architecture.

All other language types for binary machines are derived from the binary module. For example, the description of types for x86 architecture begins like this:

    @x86
    use binary;
    x86: binary {
        signed {
            byte {

The same is done for decimal and ternary architectures.
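One way to picture logic that does not depend on the digit base is a pair of Python functions parameterized by that base. This is an illustration only, not how ΣL describes architectures: the names to_digits and from_digits are hypothetical, and the ternary digits here are the plain 0, 1, 2 (the article does not say which ternary representation is meant).

```python
def to_digits(number, base):
    """Split a non-negative integer into digits, most significant first."""
    if number < 0 or base < 2:
        raise ValueError("expected number >= 0 and base >= 2")
    digits = []
    while True:
        number, digit = divmod(number, base)
        digits.append(digit)
        if number == 0:
            break
    return digits[::-1]


def from_digits(digits, base):
    """Rebuild the integer from digits produced by to_digits."""
    value = 0
    for digit in digits:
        if not 0 <= digit < base:
            raise ValueError(f"{digit} is not a valid base-{base} digit")
        value = value * base + digit
    return value


# The same logic serves binary, ternary, and decimal storage.
for base in (2, 3, 10):
    digits = to_digits(42, base)
    print(base, digits, from_digits(digits, base))
```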

Also note the "use" keyword: once you have used it, the usual "import", "using", and "include" look like "begin" and "end" after C-style braces. The author is too lazy to type and is always in a hurry. In fact, the author has met many programmers who really liked this solution.

The language has many other unusual solutions; all in good time. "Sigma" is still in development, so you cannot try it at work yet. Perhaps, in the future, community interest will allow the language to emerge and occupy its niche.

The keyword me replaces self / this, etc.

In this part we will talk, superficially, about how replacing the usual keyword this (or self) with the word me affects the speed of typing code and how programming feels.

Let me say right away that replacing one keyword is a small oasis in a huge desert; that is, I assume that such a replacement on its own will have practically no effect on development speed. Only by combining many such building blocks in a single tool can we achieve a significant simplification and acceleration of development.

Any substitution meant to speed up text input can be taken to the point of absurdity, and in the case of a programming language you end up with something like Brainfuck. For this reason, replacing the keyword this with assorted hieroglyphs, hash signs, and the like is absurd. Important note: some readers may see this differently; this article is simply not for them. Moreover, typing such characters on our keyboards is not always possible or convenient.

Programming language expressions should be as clear and as short as possible (and only then concise and elegant). Where possible, they should use human-readable phrases. This helps to remember the constructs and to adapt faster to a new language. Niklaus Wirth showed successful examples of such rigorous, clear, human-readable programming languages. But they had serious flaws when it came to the speed of writing and reading code. A simple example is the begin ... end construct in Pascal. I think everyone will agree that after the curly braces {} of C, writing begin and end is simply exhausting. New programming languages should take this into account, in addition to technology, past developments, and modern paradigms.

What short, human-readable keyword can replace the current this without distorting its meaning? As the author of a new language, I was troubled by this question. Only slightly, but troubled. So write your own version in the comments; maybe you will be the one to suggest "my", which I rejected.

Two promising variants come to mind: a very short i and a longer me. In my design I settled on the second. This option sounds a lot like a kitten talking, but, seriously, I will justify my choice in one of the following publications.

If anyone is developing their own programming language in this century, I highly recommend trying the i variant. With it you can try dropping the dot before the method or field name, and get problems with Hungarian notation and names like iValue that resemble Apple products.

Let us return to the question of how such changes affect a programmer's speed. Five fellow programmers, including your humble servant, conducted this small experiment. Omitting the details: the obvious was confirmed.

When measuring the speed of continuous typing (within the first 6 seconds from the start of the experiment), our team showed the following averaged results:

3 this.
3 self.
5 me.
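A quick back-of-the-envelope check of these figures, as a Python snippet. It assumes that the trailing dot was typed as part of each token, as the list suggests; the article does not state this explicitly.

```python
# Averaged repetition counts from the list above.
counts = {"this.": 3, "self.": 3, "me.": 5}

# Keystrokes = characters per token (dot included, by assumption) x repetitions.
for token, repetitions in counts.items():
    keystrokes = len(token) * repetitions
    print(f"{token:6} {repetitions} repetitions = {keystrokes} keystrokes")
```

Under that assumption all three rows come to the same number of keystrokes, so the advantage of me in this measurement is about what the difference in token length alone would predict.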

In the following seconds, typing me pulled significantly ahead of this and self.
The speed of continuously typing the same phrase has little to do with a programmer's real work. Therefore a more important measurement was taken, this time in a calm atmosphere while writing meaningful code. Nobody was in a hurry, and the conditions were close to everyday work.

The results showed that, on average, self and this take the same amount of a programmer's time, so they can hardly compete with each other. If a set (class / object) rarely calls its own methods and rarely refers to its own data fields, then using me gives no advantage. When writing code in various sets (mostly with frequent access to their own data fields), up to 16 me were typed in the time it took to type 10 self or this. Some colleagues noted that when using me they focused more on the name of the method or field; that is, the expression:

me.valueResult += me.valueOne * me.valueTwo;

read, as it was being typed, as if me were not there at all:

valueResult += valueOne * valueTwo;

According to them, this did not happen with self and this, but I do not believe them. In my opinion, self and this take up space in a line of code more than they attract attention when reading. For some, by the way, that may be a plus. A funny minus is that reading speed drops: "self" and "this" take milliseconds longer to read than "me". The fun part is that this really does affect how quickly a person understands the code.

Importantly, such a small replacement caused no problems. All participants switched to the keyword me very easily, and no one felt any discomfort. On the contrary, everyone liked the replacement.

If this topic interests you, be sure to run similar experiments, whatever programming language you write in. It would be interesting to collect not only positive and negative experiences from such an experiment, but also the opinions of those who do not care.

In any case, you are unlikely to ruin your project: pressing Ctrl + H and replacing every “me.” (with a leading space) with “this.” will return everything to the way it was.
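If you prefer to script the rollback, a word-boundary regular expression is a little safer than a plain text replacement, because identifiers such as frame or name also end in "me". The sketch below uses Python's standard re module; the helper name me_to_this and the sample line are made up for illustration.

```python
import re


def me_to_this(code):
    """Replace `me.` with `this.` only where `me` is a whole word."""
    return re.sub(r"\bme\.", "this.", code)


line = "me.valueResult += me.valueOne * frame.width + name.length;"

print(me_to_this(line))
# A naive replacement also rewrites the tails of `frame.` and `name.`:
print(line.replace("me.", "this."))
```

Note that this still touches occurrences inside string literals and comments, so review the diff before committing.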

Thank you for reading.
May patience and strength be with you!

Source: https://habr.com/ru/post/336598/
