The o42a programming language

I do not like programming. I need a result.

Of course, any “result” in programming is intermediate. It is followed by maintenance, bug fixing, further development and, therefore, work with code that has already been written. So the result includes not only a working program but also its source code, which becomes more expensive to maintain the less suited it is to maintenance or, simply put, the more tightly coupled it is.

But the main thing is to make it work, and the sooner the better.
All that separates an idea from the result is programming, whose essence is expressing thoughts, the logic of solving the task at hand, in a form that a machine can process.

And here is the problem. Modern programming languages get in the way of expressing thoughts, burdening the programmer with many details that are irrelevant to the problem being solved and are needed only so that the language translator “gets it”. It is not even about syntax, although many languages, especially compiled ones, are too verbose. Above all, it is about the language entities, the “terms” in which thoughts have to be expressed. These entities (functions, variables, classes, methods, packages, namespaces, generics, templates, for example) are too narrow and specialized, designed more for machine representation than for human understanding. They force you to translate your thoughts into their language. This is not difficult, of course. But it has nothing to do with the problem being solved. Choosing suitable language entities and translating into them distracts from the task, breaks concentration and therefore reduces development efficiency, and, I suspect, substantially. Understanding the intent while reading such code is even harder, which does not help productivity either, especially in team development.

The problem with modern programming languages is that they force the programmer to adapt to the machine or to the theories they are based on, instead of adapting to the programmer. The fact that mathematical theories are rigorous, hardware is hardware, and programmer convenience is subjective does not mean that one should not even try.

The main idea of o42a is to automate the programmer's work. This is achieved by radically reducing the kinds of language entities to a single one that can directly replace them all. The task of representing this entity efficiently on the machine falls entirely on the compiler.

Idea

I should say right away: such an entity must not be a Swiss Army knife, useless in its universality. Nor is it a primitive brick from which anything can be built (like lists in Lisp).

Different programming paradigms have many entities that seem incompatible with each other. So how can they be made to coexist?

Prolog gave me the idea many years ago. Besides the magic of predicate calculus, it has one more striking feature: looking at a predicate definition, you can see that it is at the same time an ordinary function definition. You can view it either way; the essence does not change.

I wanted to push this idea as far as possible. What if concepts from different programming paradigms have much more in common than it seems? The main thing here is less dogmatism.

The second idea of o42a is separating the language's semantics from its syntax. Syntax does not have to map one-to-one onto language entities, however universal they may be. Syntax should express the semantics of the program, not of the programming language. It should be easy to read and fairly strict. It is the compiler's job, not the programmer's, to map program text onto language entities.

You can compare the syntax to the view and the language entities to the model in the MVC pattern. Or you can call it a DSL.

I will begin the description with the syntax. I suspect that its least significant features will provoke the strongest objections.

Basics of syntax

I decided that the best syntax humanity has invented is written language. So the closer a programming language's syntax is to written language (English, of course, following the tradition established in programming), the better. After all, the very phrase “code readability” suggests this choice: written language was created to be read, and we are taught to read it from childhood.

There are no keywords in o42a. Symbols are used for everything instead. At first glance this may seem like a terrible decision. But keep in mind that the language has only one entity, and it takes part in only a few syntactic constructs, so not many symbols are needed.

Names

Names in o42a are case-insensitive and consist of words, Latin digits, and hyphens, separated by spaces:

Hello World Links 2- 3- 4 -90

Words may contain any Unicode letters. Several consecutive spaces mean the same as one space. A space is optional between a letter and a non-letter, or between a digit and a non-digit. A name must not begin with a digit or a hyphen and must not end with a hyphen. Hyphens cannot be doubled. A space before a hyphen is prohibited, to distinguish the hyphen from the minus sign.
So the names above can also be written like this:

 Hello world links2-3-4 -90
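The spacing rules above effectively define a canonical form for names. The Java sketch below is my own reading of those rules, not code from the o42a compiler (the `NameRules` class and its `canonicalName` method are hypothetical, and the validity checks on hyphens and leading digits are left out): it lower-cases a name, drops the optional spaces between a letter and a non-letter or between a digit and a non-digit, and collapses the remaining runs of spaces, so that two spellings of the same name compare equal.

```java
import java.util.Locale;

// Hypothetical helper illustrating o42a's name-spacing rules; not taken from the o42a compiler.
final class NameRules {

    /** Reduces a name to a canonical form so that equivalent spellings compare equal. */
    static String canonicalName(String name) {
        // Names are case-insensitive, and a run of spaces means the same as one space.
        String s = name.trim().replaceAll(" +", " ").toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                // After trim() and collapsing, a space always has non-space neighbours.
                char prev = s.charAt(i - 1);
                char next = s.charAt(i + 1);
                // A space is optional between a letter and a non-letter,
                // or between a digit and a non-digit, so drop it there.
                boolean letterBoundary = Character.isLetter(prev) != Character.isLetter(next);
                boolean digitBoundary = Character.isDigit(prev) != Character.isDigit(next);
                if (letterBoundary || digitBoundary) {
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("Links 2- 3- 4")); // links2-3-4
        System.out.println(canonicalName("links2-3-4"));    // links2-3-4
        System.out.println(canonicalName("Hello   World")); // hello world
    }
}
```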

Comments

Comments in o42a are unusual too. They are delimited by tildes, for both line and block comments.

A line comment begins with two or more tildes:

 a + b ~~ The sum

A line comment ends at the end of the line, or with two or more tildes:

 a +~~~plus~~~ b

A block comment begins and ends with a horizontal line of three or more tildes. No characters other than whitespace may appear on that line:

 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 Copyright (C) 2012
 This file is part of o42a.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

Markdown is intended to be used for documentation.
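Here is a rough Java sketch of how the tilde rules could be applied line by line, assuming the source is already split into lines. `CommentStripper` is a hypothetical helper, not the o42a lexer, and for brevity it does not skip tildes that appear inside string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of o42a's comment rules; not the real o42a lexer.
// Simplification: tildes inside string literals are not treated specially.
final class CommentStripper {

    // A line holding only three or more tildes (and whitespace) opens or closes a block comment.
    private static final Pattern BLOCK_DELIMITER = Pattern.compile("\\s*~{3,}\\s*");

    // A line comment starts with two or more tildes and ends at the next run
    // of two or more tildes or at the end of the line.
    private static final Pattern LINE_COMMENT = Pattern.compile("~{2,}.*?(~{2,}|$)");

    static List<String> strip(List<String> lines) {
        List<String> code = new ArrayList<>();
        boolean insideBlock = false;
        for (String line : lines) {
            if (BLOCK_DELIMITER.matcher(line).matches()) {
                insideBlock = !insideBlock; // each delimiter line toggles the block comment
                continue;
            }
            if (!insideBlock) {
                code.add(LINE_COMMENT.matcher(line).replaceAll(""));
            }
        }
        return code;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "~~~~~~~~~~~~~~~~~~~~~~~~~~",
                "Copyright (C) 2012",
                "~~~~~~~~~~~~~~~~~~~~~~~~~~",
                "a + b ~~ The sum",
                "a +~~~plus~~~ b");
        strip(source).forEach(System.out::println);
    }
}
```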

Line breaks and underscore

Statements in o42a are combined into sentences and can be separated by different punctuation marks: commas, periods, semicolons, exclamation marks, or question marks. Each of these marks has its own purpose, which I will describe later. What matters here is that punctuation at the end of a line is optional: if it is omitted, a period is assumed. Therefore an expression or statement that continues on the next line must be wrapped explicitly, with an underscore:

 Sum =
 _left operand +
 _right operand

The underscore can be placed either at the end of the line being continued or at the beginning of the next one.

The underscore must also be used to separate adjacent names, since names themselves may contain spaces:

 Print error _nl ~~ Prints "\n" to the standard error stream (stderr in C).
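The two line-level rules above (an implied period at the end of a line, and an underscore at the start or end of a line to continue a statement) can be sketched in Java as follows. `LineJoiner` is a hypothetical name, and this is a simplification under my own assumptions, not the o42a parser: it ignores comments and string literals and leaves an underscore inside a line, such as the name separator in `Print error _nl`, untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of o42a's line-continuation rules; not the real o42a parser.
// It ignores comments and string literals, and does not interpret the punctuation marks.
final class LineJoiner {

    static List<String> logicalLines(List<String> physicalLines) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean previousEndedWithUnderscore = false;
        for (String raw : physicalLines) {
            String line = raw.strip();
            // A line continues the previous one if it starts with '_' or the previous one ended with '_'.
            boolean continues = previousEndedWithUnderscore || line.startsWith("_");
            if (line.startsWith("_")) {
                line = line.substring(1).strip();
            }
            previousEndedWithUnderscore = line.endsWith("_");
            if (previousEndedWithUnderscore) {
                line = line.substring(0, line.length() - 1).strip();
            }
            if (!continues && current.length() > 0) {
                statements.add(terminate(current));
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(line);
        }
        if (current.length() > 0) {
            statements.add(terminate(current));
        }
        return statements;
    }

    // Punctuation at the end of a line is optional: a period is assumed if it is missing.
    private static String terminate(CharSequence statement) {
        String s = statement.toString();
        char last = s.charAt(s.length() - 1);
        return ",.;!?".indexOf(last) >= 0 ? s : s + ".";
    }

    public static void main(String[] args) {
        List<String> source = List.of("Sum =", "_left operand +", "_right operand", "Print error _nl");
        logicalLines(source).forEach(System.out::println);
    }
}
```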

Other

Decimal numbers:

 1 234 567

The language has no literals for floating-point numbers or for hexadecimal notation. This is not a problem, though, because there are phrases (more on them later):

 float '3,141 592 653 59'

Adjacent string literals are concatenated, as in C:

 "abc" "def" ~~ the same as "abcdef"

Escaping works as usual, with a backslash. Unicode code points are always written in hexadecimal and escaped in a special way, between two backslashes:

 "\t \42a\ \" \' \\ \r\n"

Multi-line text is supported too. Escape sequences are not processed inside it:

 """"""
 Multi-line
 text
 """"""

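The escapes shown for ordinary strings could be decoded roughly as below. This Java sketch assumes that the listed sequences (\t, \r, \n, \", \', \\) are the complete set and that any other character after a backslash starts a hexadecimal code point closed by the next backslash; `Escapes.unescape` is a hypothetical helper, not part of o42a.

```java
// Hypothetical decoder for the string escapes shown above; not the o42a lexer.
final class Escapes {

    static String unescape(String body) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            if (i + 1 >= body.length()) {
                throw new IllegalArgumentException("Dangling backslash");
            }
            char next = body.charAt(++i);
            switch (next) {
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'n': out.append('\n'); break;
                case '"': case '\'': case '\\': out.append(next); break;
                default: {
                    // Assumed form: backslash, hex digits, backslash (\42a\ is code point 0x42A).
                    int end = body.indexOf('\\', i);
                    if (end < 0) {
                        throw new IllegalArgumentException("Unterminated code point escape");
                    }
                    out.appendCodePoint(Integer.parseInt(body.substring(i, end), 16));
                    i = end; // the loop's i++ then moves past the closing backslash
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = unescape("\\t \\42a\\ \\\" \\' \\\\ \\r\\n");
        System.out.println(Integer.toHexString(decoded.codePointAt(2))); // 42a
    }
}
```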
Objects

The object is the central entity of o42a.

An object is created by inheriting from another object. This is the only way to create objects.

All objects inherit, directly or indirectly, from the Void object, the only object that does not inherit from anything.

Fields and inheritance

An object may have fields. A field is a nested named object.

Here is an example of a field declaration:

 Object := void (
   Field := "Value" ~~ `Field` is a field of `Object` with a `String` value.
 )

By default, fields are public. But they can also be declared private or protected.

Any expression in o42a either refers to an existing object or creates a new one; there are no other kinds of expressions. Thus absolutely every expression in o42a is an object reference. A string literal is a reference to an object inherited from the standard String object, and a number, to one inherited from Integer.

An object's field is accessed with a colon:

 Object: field

Any object can be inherited. When an object is inherited, all its fields are inherited too, and they can be overridden in the process:

 Derived object := object ( ~~ `Derived object` inherits `Object`.
   Field = "New value" ~~ Overrides `Field`.
 )

To override a field, use the = sign instead of :=.

However, an inheriting object can also declare a new field with exactly the same name as a field of the object it inherits:

 Another object := object (
   Field := 123 ~~ A new field, also named `Field`.
 )

In this case, the new object will have two fields with the same name. You can refer to them as follows:

 Another object: field ~~ 123
 Another object: field @object ~~ "Value"
 Another object: field @another object ~~ 123
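To make the field rules concrete, here is a toy Java model, a sketch under my own assumptions rather than how o42a works internally (`ToyObject` and its methods are hypothetical). Each object copies its ancestor's fields when it is created; `declare` corresponds to `:=`, `override` to `=`, and every field remembers which object declared it, which is what lets `field @object` and `field @another object` pick different fields.

```java
import java.util.ArrayList;
import java.util.List;

// A toy model of o42a objects, for illustration only; the real compiler represents them differently.
// Every object is created by inheriting another one; Void is the only object without an ancestor.
final class ToyObject {
    static final ToyObject VOID = new ToyObject("Void", null);

    // A field remembers which object declared it, so same-named fields stay distinct.
    private record Field(ToyObject declaredIn, String name, Object value) {}

    private final String label;
    private final List<Field> fields = new ArrayList<>();

    private ToyObject(String label, ToyObject ancestor) {
        this.label = label;
        if (ancestor != null) {
            fields.addAll(ancestor.fields); // inheritance copies all of the ancestor's fields
        }
    }

    ToyObject inherit(String label) {
        return new ToyObject(label, this);
    }

    // `Name := value`: declares a new field, even if an inherited field has the same name.
    ToyObject declare(String name, Object value) {
        fields.add(new Field(this, name, value));
        return this;
    }

    // `Name = value`: overrides an existing field (here, the most recently declared one).
    ToyObject override(String name, Object value) {
        int i = lastIndexOf(name);
        fields.set(i, new Field(fields.get(i).declaredIn(), name, value));
        return this;
    }

    // `Object: name`: the most recently declared field with that name.
    Object field(String name) {
        return fields.get(lastIndexOf(name)).value();
    }

    // `Object: name @declaring object`: the field declared in a particular object.
    Object field(String name, ToyObject declaredIn) {
        for (Field f : fields) {
            if (f.name().equals(name) && f.declaredIn() == declaredIn) {
                return f.value();
            }
        }
        throw new IllegalArgumentException("No field '" + name + "' declared in " + declaredIn.label);
    }

    private int lastIndexOf(String name) {
        for (int i = fields.size() - 1; i >= 0; i--) {
            if (fields.get(i).name().equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("No field '" + name + "' in " + label);
    }

    public static void main(String[] args) {
        ToyObject object = VOID.inherit("Object").declare("field", "Value");
        ToyObject derived = object.inherit("Derived object").override("field", "New value");
        ToyObject another = object.inherit("Another object").declare("field", 123L);
        System.out.println(derived.field("field"));         // New value
        System.out.println(another.field("field"));         // 123
        System.out.println(another.field("field", object)); // Value
    }
}
```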

Prototypes and abstract fields

As is easy to see, objects replace classes. Indeed, why have classes at all if objects carry complete information about their own structure? Only one use remains: defining an (abstract) programming interface with several different implementations.

Prototypes serve this purpose:

 Interface :=> void ( ~~~ A prototype. Note the `:=>`. ~~~
   Name :=< string ~~~ An abstract field. Note the `:=<`. ~~~
 )

The difference between prototypes and ordinary objects is that a prototype's contents (its fields, for example) cannot be accessed. The following code results in an error:

 Interface: name ~~ Error: `Interface` is a prototype.

But a prototype can be inherited like any other object. In fact, that is the only thing it is for.

A prototype may also contain abstract fields, which must be overridden when the prototype is inherited:

 Implementation 1 := interface (
   Name = "Implementation 1"
   Implementation 1-specific field := 1
 )
 Implementation 2 := interface (
   Name = "Implementation 2"
   Implementation 2-specific field := 2
 )
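For comparison, here is roughly the same design in Java, reading a prototype as an abstract class and an abstract field as an abstract method. The mapping is my own analogy, not something o42a generates; the class names simply mirror the example above.

```java
// Java analogue of the `Interface` prototype: it can only be inherited, never used directly.
abstract class Interface {
    // The abstract field `Name :=< string`: every implementation must override it.
    abstract String name();
}

final class Implementation1 extends Interface {
    @Override
    String name() {
        return "Implementation 1";
    }

    // Implementation 1-specific field
    int specificField() {
        return 1;
    }
}

final class Implementation2 extends Interface {
    @Override
    String name() {
        return "Implementation 2";
    }

    // Implementation 2-specific field
    int specificField() {
        return 2;
    }
}

final class PrototypeDemo {
    public static void main(String[] args) {
        Interface[] implementations = {new Implementation1(), new Implementation2()};
        for (Interface implementation : implementations) {
            System.out.println(implementation.name());
        }
        // `new Interface()` would not compile, much as `Interface: name` is an error in o42a.
    }
}
```

The difference the article goes on to describe is that in o42a the implementations themselves are usable objects, not just classes to instantiate.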

In purpose, a prototype is an ordinary class. A regular o42a object, by contrast, is both a class and an instance of it.

Java programmers will find the latter familiar: this is how anonymous classes work:

 Runnable task = new Runnable() {
     @Override
     public void run() {
         System.err.println("Done!");
     }
 };

This expression creates both an anonymous class and its instance. The difference in o42a is that “classes” created this way need not be anonymous.

Object values

Every object has a value. The type of this value is inherited from the ancestor object and cannot be changed unless it is void.

There are several types of values in o42a. Each of them is represented by a standard object. Here are a few simple types:

Void - an empty value, the base type for all others.
Integer - a 64-bit integer.
Float - a 64-bit floating-point number.
String - a string of Unicode characters.

There are also more complex types: for example, rows and arrays, links and variables. The type system will expand over time, as needed.

An object's value is not necessarily constant. It is computed by the algorithm given in the object's value definition, which is the set of statements in the object's body. It can be fairly complex, with conditions, loops, and so on. But the value itself is set by a value statement (the equivalent of a return) of the form:

 = value

The following declarations are equivalent:

 Value := 5 ~~ is equivalent to:
 Value := integer (= 5)

A value definition is inherited and may be overridden.

Here is an example that defines the sum of two numbers:

 Sum :=> integer (
   Left operand :=< integer
   Right operand :=< integer
   = Left operand + right operand
 )

Note that the same definition can lead to different values:

 Sum (Left operand = 1. Right operand = 2) ~~ 3
 Sum (Left operand = -1. Right operand = 10) ~~ 9
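Read as a function, `Sum` takes its fields as arguments and returns its value. A plain Java record gives a rough analogue. This is only an illustration, not how o42a compiles `Sum`, and `withLeftOperand` is a hypothetical helper standing for overriding a field of an object whose fields are already set.

```java
// A plain-Java reading of the `Sum` object as a function: fields are the arguments, the value is the result.
// This record is an illustration only; it is not how o42a compiles `Sum`.
record Sum(long leftOperand, long rightOperand) {

    // The value definition: `= Left operand + right operand`.
    long value() {
        return leftOperand + rightOperand;
    }

    // Overriding a field of an object whose fields are already set yields a new object.
    Sum withLeftOperand(long newLeft) {
        return new Sum(newLeft, rightOperand);
    }

    public static void main(String[] args) {
        System.out.println(new Sum(1, 2).value());                     // 3
        System.out.println(new Sum(-1, 10).value());                   // 9
        System.out.println(new Sum(1, 2).withLeftOperand(40).value()); // 42
    }
}
```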

Adapters and Samples

When creating an object, you can specify, in addition to the ancestor, one or more samples on which the object is based:

 Object := ancestor & sample 1 & sample 2 (~~  ~~)

The new object then inherits the fields and definitions of the samples and becomes compatible with them (in the sense of the Liskov substitution principle). Possible inheritance conflicts are resolved according to certain rules.

Yes, this is multiple inheritance. But is it always appropriate? In practice, inheritance (including multiple inheritance) is used in one of three cases:

To state that an object is a kind of the object it inherits, that is, to express “is a”. For example, “A watermelon is a berry.”
To give an object certain properties; some languages use traits or mixins for this.
To add another programming interface to an object or to convert it to another type; composition is the better fit here.

Samples are convenient in the first two cases; for the third, o42a has a separate mechanism, adapters.

An adapter is an object field identified not by a name but by another object:

 Foo := void (
   Value := 123
   @String := "Foo=" + value ~~ `@String` declares an adapter to `String`.
 )

An adapter always inherits the object that serves as its identifier. When converting an object to a type, o42a first checks whether the object inherits from that type, and only then tries to use an adapter to it. So this code:

 Print [foo] nl

prints Foo=123, even though the Print parameter must be a string and Foo does not inherit from String: the corresponding adapter is passed as the argument instead.
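The conversion order described above (inheritance first, then an adapter) can be modelled in Java with a per-object table of conversions keyed by the target type. `Adaptable`, `adapt`, and `castTo` are hypothetical names for this sketch only; in o42a adapters are resolved by the compiler, not looked up in a runtime map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of o42a adapters; Adaptable, adapt, and castTo are hypothetical names for this sketch.
class Adaptable {
    private final Map<Class<?>, Function<Adaptable, ?>> adapters = new HashMap<>();

    // Registers an adapter: an object of type `target` derived from this one.
    <T> void adapt(Class<T> target, Function<Adaptable, T> adapter) {
        adapters.put(target, adapter);
    }

    // Conversion: use the object itself if it already inherits `target`, otherwise its adapter.
    <T> T castTo(Class<T> target) {
        if (target.isInstance(this)) {
            return target.cast(this);
        }
        Function<Adaptable, ?> adapter = adapters.get(target);
        if (adapter == null) {
            throw new ClassCastException("No adapter to " + target.getSimpleName());
        }
        return target.cast(adapter.apply(this));
    }
}

// Mirrors `Foo := void ( Value := 123  @String := "Foo=" + value )`.
final class Foo extends Adaptable {
    final long value = 123;

    Foo() {
        adapt(String.class, self -> "Foo=" + ((Foo) self).value);
    }
}

final class AdapterDemo {
    // Stand-in for Print, whose parameter must be a string.
    static void print(String text) {
        System.out.println(text);
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        print(foo.castTo(String.class));                       // Foo=123
        System.out.println(foo.castTo(String.class).length()); // 7, like `Foo: length @string`
    }
}
```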

Standard value types are converted to strings and other types with adapters as well.

The adapter can be accessed directly:

 Foo @@string

You can also refer to the adapter's own fields:

 Foo: length @string ~~ 7

Note that the syntax for referring to adapter fields is the same as for the object's own fields, except that specifying where the field comes from (@string in this case) is mandatory.

Adapters have other uses too, for example, designating the application's main object:

 Use namespace 'Console' ~~ `Print` and `Main` are declared in `Console`.
 @Main := * { ~~ `*` means `Main` is inherited implicitly.
   Print "Hello, World!" nl
 }

Running the application invokes the @Main adapter, which prints the famous text.

Many more uses are possible. For example, adapters can compute a hash code for an arbitrary object: it is enough to define a @Hash code adapter.

Adapters let you add the desired functionality to any object, so the base Void object does not need any fields at all. They are also type-safe, unlike annotations or specially named methods such as __str__.

Generic programming

There are no conventional templates or generics, with their characteristic type parameters, in o42a. Every o42a object is already generic, parameterized by its enclosing object and its sibling fields.

The point is that a reference to an object is usually not static. If you inherit an object that contains such a reference, the same reference may resolve to a different object in the inheriting object.

Here is an example:

 Base := void (
   A := void (
     F := 123 ~~ `Base: a` contains the field `F`.
   )
   B := a ( ~~ `Base: b` inherits `Base: a`.
     F = 456
   )
 )
 Object := base ( ~~ Inherits `Base`.
   A = * ( ~~ Overrides `A`.
     G := f * 10 ~~ A new field, `A: g`.
   )
 )
 Object: a: g ~~ 1230
 Object: b: g ~~ 4560

Notice that the Object: a: g field was defined after Base: b. Nevertheless, the expression Object: b: g is perfectly valid.

This is because objects inherit not just other objects but the expressions that refer to them. Evaluated in another context, such an expression may resolve to a different object and thus produce a slightly different inheritance hierarchy. What matters is that, whatever the context, the resulting object is always compatible with the original one: it is either the original itself or one of its descendants.
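Java has no direct equivalent, but late-bound method calls give a rough analogy. In the sketch below (my own illustration; `Lazy`, `Base`, and `Derived` are hypothetical, with `Derived` standing in for o42a's `Object`), `b()` is defined in terms of whatever `a()` returns in the current context, so overriding `a()` changes what `b()` inherits. Fields are evaluated lazily so that `g` sees the `F` of the object it belongs to; this reproduces the 1230 and 4560 results, assuming `Base: b` sets `F` to 456.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// An object whose fields are lazy definitions, so overrides are seen by fields that depend on them.
final class Lazy {
    private final Map<String, ToLongFunction<Lazy>> definitions = new LinkedHashMap<>();

    Lazy define(String name, ToLongFunction<Lazy> definition) {
        definitions.put(name, definition); // re-defining a name overrides it
        return this;
    }

    long get(String name) {
        return definitions.get(name).applyAsLong(this);
    }
}

class Base {
    // Base: a := void ( F := 123 )
    Lazy a() {
        return new Lazy().define("f", self -> 123);
    }

    // Base: b := a ( F = 456 ): the ancestor is whatever a() resolves to in the current context.
    Lazy b() {
        return a().define("f", self -> 456);
    }
}

// Stands in for o42a's `Object := base ( A = * ( G := f * 10 ) )`.
class Derived extends Base {
    @Override
    Lazy a() {
        return super.a().define("g", self -> self.get("f") * 10);
    }
}

final class LateBindingDemo {
    public static void main(String[] args) {
        Derived object = new Derived();
        System.out.println(object.a().get("g")); // 1230
        System.out.println(object.b().get("g")); // 4560
    }
}
```

The analogy is loose: in o42a this resolution happens at compile time and needs no virtual calls.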

This does not completely replace traditional generics, but in many cases it eliminates the need for them. For more complex cases, o42a provides macros (also unconventional, but how else?). I will not cover macros and metaprogramming in this article.

Summarizing the above

There is still a lot about o42a that I have not covered:

how sentences are built,
support for imperative programming,
phrases, a purely syntactic way of building object-oriented expressions (no lambda calculus!),
operator overloading (operators are phrases too, by the way),
complex data types, including variables and links,
macros and directives, metaprogramming mechanisms that work at compile time,
modules and the tree structure of an application's source code.

And, of course, I omitted many details. This is material for future articles.

However, the purpose of this article was to introduce the concept and its implementation. I hope I have explained everything clearly.

So the main and only semantic unit of the language is the object, which can replace a great deal:

An object is a namespace
There is nothing special to comment on: the object's fields are the symbols of this namespace.
An object is a class
Also described above: classes are unnecessary when objects can be inherited directly.
An object is a function (and a method at the same time)
The arguments of such a function are the object's fields, and the result is its value.
Inheriting an object is a function call, and overriding a field is substituting an argument (currying).
But an object can do more: it can be inherited again, and its fields overridden again (re... currying?).
An object is a generic
A peculiar one, admittedly, but with even greater possibilities.
A generic object has the same advantage over traditional generics or templates as an object-as-function has over ordinary functions: its parameter fields can be overridden again and again. Try replacing an already substituted type parameter in an instance of a Java generic or a C++ template. And sometimes you really do want to avoid producing unnecessary abstract classes.

Redundancy and normalization

Everything comes at a price, and the price of the “everything is an object” approach is high.

Even the simplest expression

 a + b

where a and b are integers that simply need to be added, turns into a pile of objects:

 Integers: add ( Left operand = a Right operand = b )

which have to be constructed (inherited with overridden fields) only to request their values.

There is nothing unexpected in this, however. It was clear from the start that the gap between machine representation and human understanding would make the machine representation highly redundant, if it were implemented naively. But that is not the way to do it.

Redundancy can and should be eliminated. This process is called normalization. Thanks to it, the example above compiles to a plain addition of two numbers. After all, if a person understands that this is just addition, the compiler can be taught to understand it too.

Normalization techniques are far from trivial, and their implementation is at an early stage. Still, in simple cases like this one everything works. You can verify this by compiling, for example, the o42a test suite with normalization enabled and disabled (o42ac -normalize=0). The sizes of the executables differ by a factor of four.

And this is what the “Hello, World!” program looks like in LLVM IR:

hello_world.ll

 ; ModuleID = 'hello_world'
 target datalayout = "Ep:64:64:64-S0-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f16:16:16-f32:32:32-f64:64:64-f128:128:128-v64:64:64-v128:128:128-a0:0:64"
 target triple = "x86_64-pc-linux-gnu"
 %o42a_val_t = type { i32, i32, i64 }
 @CONST.STRING.1 = private constant %o42a_val_t { i32 6145, i32 13, i64 ptrtoint ([13 x i8]* @DATA.STRING.0 to i64) }
 @CONST.STRING.2 = private constant %o42a_val_t { i32 1, i32 1, i64 10 }
 @DATA.STRING.0 = private constant [13 x i8] c"Hello, World!"
 define i32 @main(i32, i8*) nounwind {
 main:
   call void @o42a_init() nounwind
   call void @o42a_io_print_str(%o42a_val_t* @CONST.STRING.1) nounwind
   call void @o42a_io_print_str(%o42a_val_t* @CONST.STRING.2) nounwind
   ret i32 0
 }
 declare void @o42a_init()
 declare void @o42a_io_print_str(%o42a_val_t*)

You will agree there is almost nothing superfluous here, and not a single complex structure such as an object, even though the program involved several objects.

The name “normalization” was chosen deliberately. Unlike “optimization”, which relies on heuristics that do not always fit and on assumptions that are often wrong, normalization implies a predictable result achieved by following the principle of minimal redundancy. The strict semantics of the language, which limits the number of language entities, is a great help in implementing normalization techniques and rules. If only a proper mathematical theory could be brought to bear on it...

The essence of normalization is that, instead of generating executable code directly from the source constructs, the compiler finds out how each particular object is used in the program. Based on that knowledge, it first reduces the universal “object” to the least redundant executable entity (a constant, a block of executable code, a function or a pointer to one of several functions, a structure...) and then generates executable code for that entity.
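Here is a toy Java illustration of the idea, applied to the a + b example above. Everything in it is hypothetical (`Node`, `ObjectNode`, `PrimitiveAdd`, and the way `Integers: add` is recognized by name); o42a's normalizer works on its own internal representation. The point is only the shape of the transformation: a general object inheriting `Integers: add` is reduced to a constant when both operands are known at compile time, and to a primitive addition otherwise.

```java
import java.util.Map;

// Hypothetical node types for this sketch; o42a's normalizer uses its own internal representation.
interface Node {}

record Const(long value) implements Node {}

record Var(String name) implements Node {}

// A general, unnormalized object: an ancestor plus the fields it overrides.
record ObjectNode(String ancestor, Map<String, Node> fields) implements Node {}

// The minimal entity normalization aims for here: a plain machine-level addition.
record PrimitiveAdd(Node left, Node right) implements Node {}

final class Normalizer {

    static Node normalize(Node node) {
        if (node instanceof ObjectNode object && object.ancestor().equals("Integers: add")) {
            Node left = normalize(object.fields().get("Left operand"));
            Node right = normalize(object.fields().get("Right operand"));
            if (left instanceof Const l && right instanceof Const r) {
                return new Const(l.value() + r.value()); // computed at compile time
            }
            return new PrimitiveAdd(left, right); // no object left to construct at run time
        }
        return node; // constants, variables, and other objects are left unchanged here
    }

    public static void main(String[] args) {
        Node sum = new ObjectNode("Integers: add",
                Map.of("Left operand", new Var("a"), "Right operand", new Var("b")));
        Node constant = new ObjectNode("Integers: add",
                Map.of("Left operand", new Const(1), "Right operand", new Const(2)));
        System.out.println(normalize(sum));
        System.out.println(normalize(constant));
    }
}
```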

If you think about it, normalization fundamentally changes the development process. Today you have to design programming interfaces in advance and decide whether something will be a class, a function, a function parameter, or something else. A normalizing compiler makes such decisions on its own, based on the program as actually written, knowing exactly how each entity is really used rather than how it was supposed to be used. From the normalizing compiler's point of view, architectural decisions about programming interfaces made in advance, based on assumptions, are nothing but premature optimization.

Besides normalization, two more things have been implemented:

Everything that can be computed at compile time is computed. This is not an optimization but core functionality: without it, the language would have to be interpreted.
Everything unused is removed from the program: unneeded objects, as well as unused fields of the objects that are needed. This functionality is quite expected, so I do not count it as part of normalization.
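Removing unused objects and fields amounts to a reachability analysis. Here is a minimal Java sketch, assuming the program has already been reduced to a made-up graph of which object or field uses which (`UnusedRemoval` and the graph format are hypothetical): everything not reachable from the entry point is dropped.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical reachability pass: the "uses" graph is made up for illustration.
final class UnusedRemoval {

    // Returns everything reachable from the entry point; everything else can be thrown out.
    static Set<String> reachable(Map<String, List<String>> uses, String entryPoint) {
        Set<String> kept = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(entryPoint);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (kept.add(current)) { // visit each object or field only once
                for (String used : uses.getOrDefault(current, List.of())) {
                    pending.push(used);
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> uses = Map.of(
                "@Main", List.of("Print", "nl"),
                "Foo", List.of("Foo: value"), // never used by @Main
                "Foo: value", List.of());
        System.out.println(reachable(uses, "@Main"));
    }
}
```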

Status and prospects

The o42a project has been developed by one person, full-time, for the past three and a half years, so you can see that I take this work seriously.

The project was never conceived as an academic one; its goal is a pragmatic general-purpose programming language and a platform for it. I came up with the first ideas long ago, as a student, but I started building the language only after many years of developing new projects and maintaining old ones. So the very idea of the language, every decision about it, and the implementation itself are the fruit of practical experience, not abstract theory.

The project is at the prototype implementation stage. The compiler is written in Java and uses LLVM to generate executable code. The source code is available under GPLv3+ (runtime libraries under LGPLv3+). The project has a website with documentation in rather poor English, and a forum that is still empty (I do not give direct links for obvious reasons).

The current version, o42a-0.2.4, contains over 130 thousand lines of source code. The compiler is still quite fragile, and libraries are almost nonexistent. The target platform is GNU/Linux x86_64; others have not been tested.

In the coming months I plan to implement a collections library and an input/output library. Along the way I will debug the compiler, settle the remaining questions about the language itself, and write examples for Rosetta Code. Version 0.3.0 should already be usable to some extent.

The longer-term prospects are rather vague. At the start I did not expect development to take so long, but now I have a good idea of the scope of work. It is a lot for one person, and the savings I have lived on in recent years are exhausted. So unless a serious company takes the project under its wing, development will slow down, since I will have to return to freelancing on my familiar oDesk. I do not plan to ask for donations. It does not seem serious: you will not raise decent money, yet you will be under obligation.

If anyone seriously wants to take on such an ambitious project, please get in touch. Although I am just a programmer, I do have ideas for monetizing this monster. But keep in mind that this is not a Web-2.0, costs-10k, wants-a-million start-up. It calls for serious, long-term investment, including serious marketing.

If you want to volunteer for the benefit of open source in general and the o42a project in particular, I have tasks for you, including one not directly related to o42a and useful as an independent project (a library in pure C that the project needs).

If you have thoughts on how to finance the development, please share them.

Source: https://habr.com/ru/post/157329/
