📜 ⬆️ ⬇️

A new programming language with your own hands and head

Hi, Habr! Immediately to the point. At the moment I am reading The Dragonbook and I am developing a compiler for my programming language, named Lolo (in honor of a penguin from a Soviet-Japanese cartoon). I plan to finish within a year if nothing hurts. At the same time, I will be posting interesting excerpts from the experience of broadcasting, building intermediate code, optimization, etc., and today I will just introduce you to the language. Sit down and go.

The language is compiled, imperative, not object-oriented, semantics is impudently written off from C and supplemented with many useful features. Let's start with them.

Semantic modifications


Safe pointers


You may have thought about smart pointers from Rust, but it’s not them. In my language, the security of accessing memory is ensured by two idioms. First, the absence of a pointer dereference operation. Instead of it, when referring to the declared pointer, the object itself is referenced. That is, you can and should write like this:

int # pointer ~~ new int(5) int variable ~ pointer + 7 

The variable variable now contains the number 12. Now you see an unfamiliar syntax and, perhaps, a little perplexed, but I will explain everything in the course of the article. Second idiom: no pointer operations. Again: all operations when accessing pointers, including assignment, increment and decrement are performed on objects. The only operation directly related to a pointer is the assignment to an address, or, as I call it, identification. In the example code above, in the first line it is the identification. Any pointer can be put to the address only of the already allocated memory area, which is the one returned by the new operation. You can also put a pointer to the address of another variable allocated at least in the heap, even in the stack. Here is an example:
')
 int variable ~ 5 int # pointer ~~ variable 

Here "~" is the usual assignment operation. You can also identify pointers with a special null pointer. It acts as a pointer, referring to the zero address. After identifying the comparison operation and the comparison of the identity (the same address) with null will produce the truth:

 int # pointer ~~ null if (pointer = null) nop ;; true if (pointer == nul) nop ;; true 

Here "=" is a comparison of values, "==" is a comparison by addresses, "nop" is an empty operation, and after ";;" - comment. And yes, null is the only pointer, operations with which are possible without checking type compatibility.

Thus, pointers can be assigned only to allocated memory areas or null and cannot be moved anywhere. However, these measures do not fully protect against segmentation fault errors. To get it, just follow these steps:

 int # pointer1 ~~ new int(5) int # pointer2 ~~ pointer1 delete pointer1 int variable ~ pointer2 ;; segmentation fault! 

I think everything is clear here. But you can make such a mistake only on purpose, and then, having tried hard. After all, the delete operation does the same thing as the garbage collector, only less secure. By the way, about him ...

Garbage collector


Garbage collector - he and Lolo collector. Probably no need to explain what it is. I can only say that it can be turned off by a special option when compiling. We tested the program with the collector, everything works as it should - you can enter an option and try to optimize the program using manual memory management.

Embedded arrays


Although I said that the semantics of the language is written off from C, the differences are quite substantial. Here arrays are ≠ pointers. Arrays have their own syntax and secure addressing. No, not with range checking. With them, in principle, it is difficult to get a runtime error. This is because each array stores a length variable in size, as in Java, and with each indexing from the index ... the remainder of the division by this size is found! A stupid decision, at first glance, until we look at the negative indices. When finding the remainder of dividing -1 by the length of the array, we get a number equal to size-1, that is, the most final element. With this maneuver we can turn on indices not only from the beginning, but also from the end of the array. Another trick is casting any primitive type to a byte [] array. But what about all the same get a runtime error, you ask? I will leave this question to you as an easy riddle.

Links


I don’t know for sure if the current C standard includes references, but they will definitely be in Lolo. Perhaps the lack of references in earlier versions of C is one of the main reasons for pointers to pointers. They are needed to pass arguments to an address, to return values ​​from functions without copying. Pointers and arrays can also be passed by reference (since when passing by value, arrays will be completely copied, and pointers put to a new place will not save it).

Multithreading


Everything is more beautiful and more beautiful. I'm already in love with my language. His next fad is multithreading. Honestly, I have not fully decided what tools it will provide. Most likely, the synchronized keyword with all the ala-Java properties and, possibly, the concurrent keyword before non-embedded (inline) functions, which means “to run these functions in parallel threads”.

Embedded strings


It is strings, not string literals, as in C ++. Each line will have its own length, indexing will occur with finding the remainder. In general, strings in Lolo are very similar to character arrays, with the exception that arrays do not have concatenation operations with "+", multiplication with "*" and comparisons with "<" and ">". And since we started talking about strings, we need to mention symbols. The characters in Lolo are not numbers, as in C ++. Both contain not one byte, but 4 for DKOTI characters and 6 for UTF characters. I will tell you about DKOTI next time, but for now just know that Lolo supports characters and strings in two encodings. And yes, the length property can be taken even from constants:

 int len ~ "Hello, world!".length ;; len = 13 

Boolean type with three values


In the vast majority of programming languages ​​that have a logical data type, binary logic is used. But in Lolo it will be ternary, or rather, fuzzy ternary. Three values: true - true, false - false and none - nothing. So far I have not found operations that return none in the language itself, but I remember many examples from practice when flags with three values ​​would be very useful. Had to use enums or an integer type. No longer have to. That's just the name of this type can not choose. The most banal - “logical”, but too long. More options - “luk” in honor of Jan Lukasevich, “brus” in honor of N. P. Brusnetsov and “trit”, but strictly speaking, this type is not a perfect one. In general, a poll at the end of the article.

Lists of initialization of structures and lists


If after declaring a structural variable, put a ~ and open square brackets, you can set the values ​​of its fields in turn or as a dictionary. If you carry out such a procedure with an array, you can set the values ​​of its cells, only without a dictionary. There is nothing special to tell, just look at the code:

 struct { int i; real r; str s; } variable ~ [ i: 5, r: 3.14, s: "Hello!" ] int[5] arr ~ [ 1, 2, 3, 4, 5 ] 


Returning multiple values ​​from functions


Just like in Go! You can write several variable names separated by commas and assign all of them the values ​​returned from the function:

 int, real function() { return 5, 3.14 } byte § { int i; real r i, r ~ function } 

Modules instead of headers


It's all clear. Instead of C-shnyh headers - modules from Java.

for (auto item: array)


Again, native Java. Since we have arrays with length, it's a sin not to use the expression for each.

The select statement is not just for int


I do not know about you, but in C and C ++ I am terribly enraged by the lack of the ability to use the switch-case operation for non-integer variables. And the syntax is also annoying. Here in Pascal is another matter. And now in Lolo:

 case variable { "hello", "HELLO": nop "world": { nop; nop } "WORLD": nop } 

Operators of exponentiation and division of the whole


And this is from Python.

 real r ~ 3.14 ** 2 int i ~ r // 3 

Tuples of function parameters


Remember that in Lolo all operations with pointers are prohibited, except for identification? Now let's remember how to access function parameters from variable-length parameter lists. You need to declare a pointer to the first element, and then increment until the truth test returns true. In Lolo it is impossible to increment. But that's okay. After all, the list of parameters here is presented as a fixed-tuple (call-dependent) length, with safe, as with arrays, indexing. His name is "?" Type checking is performed only for parameters set in the function definition. The rest (“multi-point”) parameters are reduced to any type, and with an awkward movement, their behavior is not defined. But still, such a tuple is much safer and more convenient than macros in C.

 void function(...) { if (?.size > 1) { int i ~ ?[0] real r ~ ?[1] } } 

Numeric intervals


And one more character - a family of interval types (range, urange, lrange, etc.). They are given by two integers through two points (..) and can cut an array from an array, from a string a string, in general, a useful thing, I think.

 int[5] arr ~ [ 1, 2, 3, 4, 5 ] int[3] subarr = arr[1..3] ;; [ 2, 3, 4 ] 

In operator


From pascal. Works with strings, arrays, tuples? and ranges.

 int[5] arr ~ [ 1, 2, 3, 4, 5 ] if (4 in arr) nop 

Function Parameter Dictionary


Honestly, I got confused already, as this thing is correctly called, with the help of it you can set the arguments of non-pure (pure) functions directly:

 int pos = str_find(string, npos: -1) 

Default settings


From C ++. There is no need to give an example, and so everything is clear.

Exceptions


So where do without them?

 try { raise SEGMENTATION_FAULT_EXCEPTION } except (Exception e) { print(e.rus) } 

Lack of unconditional transition


Because in 2019, using the operator GOTO is like death.

Syntax


Well, a little talk about the syntax. As you noticed, the semi-colon is shaded. Modern programming languages ​​do very well without this source of errors. Examples are Python, Kotlin. The arrow operator (->) is combined with the dot operator. When calling functions without arguments, brackets are optional. Strings are given in numbers and vice versa. Logical and bitwise operators are combined. There are modifiers of functions for tabularization. Nested functions type_of. And the most important thing is multilingualism. Yes, I am going to duplicate the keywords, properties of strings and arrays and all the identifiers of the standard library in all languages ​​of international communication, namely English, Russian, Japanese, Chinese, Spanish, Portuguese, Arabic, French, German and Latin.

In fact, all of the above does not include half the possibilities of Lolo. I just can not immediately remember all of its features. I will add as the compiler is ready.

Source: https://habr.com/ru/post/460283/


All Articles