Stack programming with a human face

I think many of you have found articles and books about stack programming and the Forth language on the Internet. First, a wave of enthusiasm: how simple it is, logical, understandable and powerful! And why are these ideas so insignificant? Why do so few programmers really use languages like Fort? After some time, a wave of disappointment comes up: yes, an interesting idea, but how hard it is to read the source code, how dreary work is being done with variables, strings and fractional numbers! An interesting toy, useful to a group of baitosles, no more.

Often this ends it. But personally, I could not come to terms with the idea that elegant concatenative programming will remain in the shadow of other ideas. Yes, difficulty reading the source code. Yes, one line syndrome. Yes, each time to understand the algorithm, you have to imagine in the imagination to broadcast the program and imagine the stack, reading the source code. But really, these are the shortcomings that are necessarily inherent in stack languages and without which stack programming would no longer be stackable? Is it really impossible to at least smooth out these shortcomings and make life easier for programmers? It turns out that it is possible and necessary!

Problem one: single line syndrome

I first found the phrase “single-line syndrome” in Barron’s “Introduction to Programming Languages”. And although the term is not widely spread, it very expressively characterizes many languages.
')
A single-line syndrome is characteristic of those languages that allow the free structure of the source code of a program and have short, and sometimes even single-character keywords. The programmer is trying to “cram” as many keywords as possible into one line, which is why the programs do not look too readable. This syndrome is especially pronounced in the APL and its descendants, in Brainfuck and, of course, in the Fort. Look again at the illustration at the beginning of the post and calculate the average number of words per line.

But this is a solvable problem. In Forte, the single-line syndrome appeared due to the addictions of Moore, who loves short and meaningful English words very much. Unfortunately, such words quickly end, and suffixes and prefixes (so-called namespaces on the knee) Moore does not like. Therefore, now the whole world enjoys laconic hieroglyphs in the spirit of "@! C @ C! / MOD CR \ S P", stuck in one line. The problem is simply solved by the selection of more successful words. Why write:

: FOR-EXAMPLE OVER OVER + ROT ROT - * ;

If I may:

 define for-exemple (ab -- {a+b}*{ab})    over over    (ab -- abab)    +    (ab -- ab a+b=c)    rot rot    (abc -- cab)    - * end

And even better:

 define for-exemple [ab -- r]    [ab -> abab]    +    [abc (=a+b) -> cab]    - / end

But such a record will be discussed below.

Problem two: code inside out

Another difficulty is the management structure inside out. Of course, they are elegant from the point of view of the idea and implementation, but how can such a code be hard to read:

 : sign-test ( n -- ) dup 0 < [ drop "" ] [ zero? [ "" ] [ "" ] if ] if print ;

In this example, the code blocks are elementary and the whole definition of the word is in full view. But it is worth a little more to complicate it, as the program will seem unreadable to the author a week after writing.

The most usual if and while from imperative programming are in a hurry to help:

 define sign-test [x]   [x -> xx]   0 > if     " "    else     "   "   end-ifprint-string end

Maybe they are not as powerful and extensible, but very understandable and practical. And if you really want to create something like arithmetic if from the good old Fortran, the mechanism of code blocks at the top of the stack can be included in the language as an additional element and not be used everywhere, but only where it is really necessary and justified. Fortunately, to eat such a mechanism is not asking, and the implementation will not be too complicated.

As for the Fort, he has another problem with it: all the same words. Moore did not want to add identical final words like END to complex structures, so IF, DO and other words should be closed with their own unique words. Therefore, we see all these IF ELSE THEN, which lead even experienced programmers to a dead end. If you really do not want to complicate the parser and the mechanism of words, why not enter words like END-IF? This is reflected in Moore’s dislike of suffixes and prefixes. But, like the first problem, this moment is also easily solved and is not a specific shortcoming of stack languages.

Problem three: interpretation in the head

Due to a number of features of the program on Forte and other stack languages, it is difficult to read and delve into their essence. The thing is that every time you read a block of code in which a new word is introduced, you have to interpret the program in your imagination and at every step imagine what elements are in what order are on the stack and how functions work with them. Needless to say that in some places it is very tiring and unproductive. But the most unpleasant thing is that, unlike the previous features that have simply historically been so unfortunate, the need for such an interpretation is an eternal companion of stack programming.

Of course, it is impossible to get rid of this drawback, but there are ways that make it much easier for a programmer to read the source code. And most importantly: the first steps have already been taken. Indeed, Fort programmers adopted a certain notation to make it easier to read the source code:

 (  --  )

Such comments facilitate the understanding of the numbers on the stack, their number and order. You do not need to strain your imagination every time to understand how many numbers and in what sequence you need to put on the stack before calling the function and how many numbers on the stack will remain as a result of the calculations. But here is a riddle: if such comments are very visual and useful, why do Fort programmers write them only at the beginning of the definition of a word, and even then not always? What religion interferes with after the heap drop dup swap rot to write such an explanatory comment?

Let's return to the second code example. Of course, this is a case in a vacuum, but in real complex programs such comments will be relevant:

 define for-exemple (ab -- {a+b}/{ab})    over over    (ab -- abab)    +    (ab -- ab a+b=c)    rot rot    (abc -- cab)    - / end

There is no need for the programmer every time after all these swap, rot and + to simulate in his head the order of the numbers in the stack. Just need to run through the eyes of the comments. But here is a new problem: records

 (ab -- abab)

AND

 over over

are similar. Just the first entry is made in a declarative style, and the second - in the imperative. That is, the lines in the program are constantly duplicated. With the same success, we can talk about the convenience of the assembler: on the left, write code with a bunch of mov and ret, and on the right in the comments a = a + 1, and then talk about readability. Well, yes, read the comments. But they can be contrived to write so that when using any programming language they will be read very easily and clearly. From this, of course, it does not follow that such languages are convenient. They can be arbitrarily bad from the point of view of reading the source code.

There is a natural desire to combine (ab - abab) and over over. Indeed, if you write comments in a certain notation, the compiler can parse them and manually add the necessary words to rearrange the elements of the stack. Comments in parentheses are completely ignored and are intended only for humans. Comments in square brackets are needed both by the person and the broadcaster. The translator analyzes such comments in which a person writes what he needs, and not how to arrive at such a result. That is, you can work with the stack in a declarative style.

Consider the main, so to speak, varieties of such expressions:
1. define foo [bar] or [bar - foo]
After the name of the new function in square brackets you need to specify the number of arguments that should be on top of the stack. If, when calling a function that requires three arguments, there are only two arguments on the stack, then the traditional Fort system will generate an error: the stack is empty. But with such comments, the translator will be able to “look” at the comment and determine the names of the missing arguments: aha, the argument r is missing here when calling the function foo. Agree, it is much more informative than the lean "stack is empty." Naturally, all that has been said applies only to working interactively when source code is available. After compilation, these comments will disappear, they are needed only for debugging and reading the source code. For comparison, error output in gforth:

 : add-xy ( xy -- x+y ) compiled + ; ok 4 5 add-xy . 9 ok 4 add-xy . :6: Stack underflow 4 >>>add-xy<<< . Backtrace: $7FF17383C310 + ok

2. [ab -> a]
The following types of expressions differ from the purely descriptive word ->, which is similar to the word - in classic comments. If the number of elements on the left is greater than the number of elements on the right, then the translator comes to the conclusion: some elements need to be thrown away, they are no longer needed. If an element is on the top of the stack, then this structure is converted to drop, but if it is not, the translator first performs a permutation of the elements of the stack so that the garbage will be on top of the stack, after which the drop will be applied. Do I have to say how such a record makes life easier for a programmer in the event that you need to remove, say, two items from the middle of the stack. Let the translator himself decide how best to do it, the programmer only describes what he needs.
3. [ab -> ba]
Such expressions mean a permutation of elements: the number of elements to the left and right of the word -> is the same and the elements themselves are the same. It remains only to choose the optimal scheme of permutation. Let the car do it, people are so arranged that long chains over swap drop rot enter them into a stupor.
4. [ab -> aabb]
The number of elements on the left is less than the number of elements on the right. This suggests that some elements need to be duplicated. But what are these elements and in what order should they be located? Let the translator decide. It is inconvenient for a person to think in terms of swap dup rot dup.
5. [ab -> ab]
Nothing has changed, purely descriptive design. An example application is given below.

Naturally, in such declarative expressions you can use the usual comments:

 dup rot + [ab -> ab (a + b)]

We describe some words of the Fort in the new form:

 define dup [x]    [x -> xx] end define drop [x]    [x -> ] end define swap [xy]    [xy -> yx] end define over [xyz]    [xy -> xyx] end define rot [xyz]    [xyz -> yzx] end

Problem four: floating point numbers

Imagine a stack consisting of 32-bit cells that store integers. All such a stack is good: and the speed of arithmetic operations, and the convenience of working with the addresses of variables. But if we need to multiply 3.14 by 4? Where to put a fractional number? If you organize the stack as a set of 64-bit cells in which floating-point numbers are stored and store integers as fractional numbers with a zero fractional part, advantages such as the speed of arithmetic operations will immediately evaporate.

We introduce the second stack for floating point numbers. Cells in it will be, say, 64-bit, but they will be less. All numbers without a point are put on the vertices of the integer (integer) stack, and all numbers with a dot (even if they have a zero fractional part) are stored in a fractional (float) stack. That is, we can multiply the numbers mentioned as follows:

 4.0 3.14 *f

where * f multiplies fractional numbers (similar to + f, -f, and so on).

We also introduce a new declarative expression:
[stack: abc -> stack: abc]
Which takes items from one stack and puts them in another:

 4 5 [integer: z1 z2 -> float: z1 z2] 2 times print-float end-times

The fractional stack will show the numbers 4.0 and 5.0. The reverse operation "cuts off" the fractional part.
Define new words:

 define integer->float [x]    [integer: x -> float: x] end define float->integer [x]    [float: x -> intger: x] end

Similarly with the return stack.

Fasting turned out quite voluminous and sometimes controversial, so much of the material will be in the second part. Again, the ideas and criticisms from the discussion will correct the writing plan.

Source: https://habr.com/ru/post/242821/

All Articles

Stack programming with a human face

Problem one: single line syndrome

Problem two: code inside out

Problem three: interpretation in the head

Problem four: floating point numbers

More articles: