EWD: Substitution Processes

Hi, Habr! I present to you the translation of the article Substitution Processes (1962) by Edsger Dijkstra . The division into paragraphs is not original.

Introduction

A machine determines (by its very structure) a language, namely: its own input language - and, on the contrary, the semantic definition of a language defines a machine capable of understanding it. In other words, a machine and a tongue are two sides of the same coin. I'm going to describe such a medal. I leave to you the decision of which of these two aspects of the subject of my conversation is, in your opinion, the most important, since I myself consider this choice rather ridiculous. The language, the outline of which I am going to give you, is prohibitively difficult for a person, and the car that I am going to describe is perversely ineffective.

Therefore, if my mental construct, nevertheless, has the right to exist, its justification must be found in other qualities. In my opinion and judgment, you can find them in my car, at least in its exceptional simplicity and elegance, in the uniformity of the ways in which it performs quite different (at first glance) operations; the justification of my language is its clarity and an unusually high degree of ambiguity arising from strict sequential interpretation and explicit indication of the operations performed in the program, although usually all operations are implied (and some misunderstandings arise from this). If someone wants, he can consider my car and language invented for educational purposes.

Before I begin the description, I would like to warn you about two deliberate omissions. The system that I am going to present is the result of a careful choice between a multitude of “neighboring possibilities”. I will not explain my choice, I will even deliberately leave the alternatives not mentioned. In other words, I will refrain from introducing my system as, as it were, a “local optimum”, at least in a sense. Since this reduces the persuasiveness of my presentation, I personally regret this omission; however, I must skip such explanations for the sake of brevity.

Another question that I will not touch upon is the question of how to implement this system using an ordinary machine. You can even raise the question (and I did it myself, for the sake of verifying that what I am doing is not stupid) whether it can be implemented at all, no matter how rude. Take my word for it that this is possible. I have developed a method of implementation to the extent that I could convince the most suspicious listener of this opportunity. But I do not intend to show you the details of this implementation, because I had to include too many arbitrary decisions that, if they were mentioned, would distract attention from the basics. In particular, the issue of memory allocation will remain intact.
')

Numbers and arithmetic

My machine manages (and manages) the units of information that I call words. Without loss of generality, I can limit myself to a finite number of different words, each of which is represented by the same number of bits.

The machine distinguishes between different types of words, such as numbers, operators, variables, and delimiters. Now we will focus on the first of them: "words-numbers" and "words-operators."

A normal arithmetic operation, such as the addition or multiplication of two numbers, contains two numerical words as input and one word, also a number, as an output. The rules that assign numerical words (for example, derived from bits) to numerical values are embodied in the operation of an arithmetic unit, which has the usual property that the same rules apply to both input and output: the output of the arithmetic unit can be entered again him in later stages of work. Since we assume that the properties of an arithmetic unit are constant in time, we can say that numerical words have a "fixed meaning." Since the fixed interpretation of numerical words is connected with the constant properties of the arithmetic unit, it is not surprising that we will denote the basic arithmetic operations with operator words (" + ", " - ", " * ", " / ", etc.), the meaning of which can also be considered fixed.

The machine is running a program that mainly consists of a string of words. At present, I will limit myself to parts of the program that prescribe the calculation of arithmetic expressions.

Consider an expression that is usually written as:

  5 + 39 / (7 + 2 * 3) - 6

in the usual postfix notation (also known as “Reverse Polish Notation”), this will result in the following sequence of numbers and operators (the adjacent elements in this sequence for the readable presentation on paper are separated by spaces):

  5 39 7 2 3 * + / + 6 -

A well-known mechanism specifically designed for the sequential evaluation of such expressions is what I prefer to call the “stack”. (This device was invented and generalized independently by many people, which is why it is now known by a wide variety of names such as push down list, nesting store, cellar, last-in-first-out-memory etc.) If we consider the above sequence of numbers and operators as a line of words representing a part of a program, the machine reads this line word by word from left to right. If she meets a numeric word, this number (i.e., a copy of that numeric word) is added to the top of the stack; if she meets an operator word, the corresponding operation is performed on the top of the stack. In the illustration below, the lines indicate the successive states of the upper part of the stack, while the vertex itself is located on the right side of the line:

  ... 5 ... 5 39 ... 5 39 7 ... 5 39 7 2 ... 5 39 7 2 3 ... 5 39 7 6 ... 5 39 13 ... 5 3 ... 8 ... 8 6 ... 2

and the final result of the execution of this small part of the program is that the value of the expression was added to the stack.

Calculations and substitutions

As clearly shown in the example above, the machine starts by copying the text of the program word by word to the top of the stack. Sooner or later it needs to be interrupted, otherwise our machine will simply be a copying machine. In the above system, the copying process is interrupted by the appearance of an arbitrary statement in the program text. Thus, the function of the operator is double: firstly, it shows that the copying should be interrupted, because now the operation needs to be performed, secondly, it sets this operation. I propose to separate these two completely different functions: henceforth, arithmetic operators are basically processed in the same way as numbers are processed, i.e., the word of the operator is also copied onto the stack. Every time the copying process has to be interrupted, I will indicate this in the program explicitly by inserting a special word entered from now on and denoted by “ E ” (from “Evaluate” - calculate). Now my car has the following form: it reads the text of the program word by word from left to right, where “reading” means the following: if the read word is not equal to “ E ”, its copy is added to the stack, if this word is equal to “ E ”, it is not is copied, and instead the operation is executed, indicated (initially) by the top word of the stack.

According to these rules, the program prescribing the evaluation of the expression of our previous example will now consist of the following line of words:

  5 39 7 2 3 * E + E / E + E 6 - E

and under the control of this part of the program text (i.e., when this line of words is “machine readable”) the top part of the stack will be sequentially changed as shown in the following lines:

  ... 5 ... 5 39 ... 5 39 7 ... 5 39 7 2 ... 5 39 7 2 3 ... 5 39 7 2 3 * ... 5 39 7 6 ... 5 39 7 6 + ... 5 39 13 ... 5 39 13 / ... 5 3 ... 5 3 + ... 8 ... 8 6 ... 8 6 – ... 2

As shown above, the machine performs the operation indicated by the top word of the stack when it reads the word “ E ” in the program text. We confine ourselves to such programs that at the moment after reading the word “ E ” the top word of the stack is really an operator word (and not, for example, a numerical word). In addition, we confine ourselves to the case when the words located directly behind the operator meet any requirements put forward by this operator. (For example, in the case of the binary arithmetic operations described above, the two words immediately after the operator must be numbers.)

In other words: if the operand of an arithmetic operation is an expression, we substitute its expression for its numeric value before performing the operation; with this, we satisfy the condition that, in its essence, arithmetic operations are defined only on numerical operands.

Variable calculations

We consider replacing an expression or subexpression with its numerical value as “substitution”, and we explicitly indicate when these substitutions should be performed, although it is unnecessary - “ 3 + 4 ” will always be equal to “ 7 ”, regardless of when we execute it. addition.

However, the situation changes radically as soon as variables (in contrast with constant numbers) come into play. (Below, we will denote variables in small letters, reserving capital letters for “special words” such as “ ” and others that will be introduced later.) Suppose we need to calculate the value of the expression:

  x + 4

at the moment when the value of the variable is equal to 3. This means that in the expression above, we must substitute its value for at the time of the calculation; only after that can we perform an arithmetic substitution (“ 3 + 4 ” replace with “ 7 ”). From something that depends on " x " (i.e., the expression " x + 4 "), we get a result (i.e., " 7 ") that does not depend on future changes of " x ", since we replaced " x "on its value when performing the substitution. We created a snapshot of the variable “ x ”. Obviously, I insist on explicitly specifying when to create this snapshot of the variable " x " (which changes over time!).

Now we will harvest the first fruits of our labor, since the mechanism of this explicit indication has already been introduced. The part of the program that prescribes the evaluation of the expression:

  x + 4

now looks like this:

  x E 4 + E

and, in accordance with the above assumptions, the sequence of states of the stack is:

  ... x ... 3 ... 3 4 ... 3 4 + ... 7

Our machine invites us to describe the fact that “the value of the variable x is 3 ” in several other formulations, namely: that the state of the program execution process is such that reading the word “ E ” at the moment when the top word of the stack is “ x ”, leads to the replacement of this upper word with the number word " 3 ". Thus, a variable in the upper part of the stack is considered as an operator of this variable, which in the process of calculation is replaced by something depending on the state of the program execution process at that moment; such an “operator-variable” is performed without setting special requirements to the stack words immediately following it. (The similarity between operators and variables will be further emphasized by our next example.)

Intermediate Calculations

All words read in the text of the program are added to the stack, with the exception of the word “ E ”, which causes the machine to perform substitution. For the reasons that will be explained below, we would also like to be able to add the word " E " to the stack. However, the framework for this extension is already present. We introduce a special operator, denoted by the word " P " (from "Postponement" - deferment), which is performed in the course of calculations by a fixed substitution, i.e. it is replaced by the word " ". We will illustrate the use of the “ P ” operator in the following example.

In this example, we have three variables with the names " x ", " y ", and " plinus " (plus-minus plinus plus or minus). Suppose that the state of the program execution process is such that reading “ plinus E ” generates the word “ + ” at the top of the stack. When reading the text:

  x PE y PE plinus EPE

the top of the stack will have a sequence of states:

  ... x ... x P ... x E ... x E y ... x E y P ... x E y E ... x E y E plinus ... x E y E + ... x E y E + P ... x E y E + E

and the top of the stack, therefore, contains a string of words that, when read as part of a program, will perform the evaluation of the expression " x + y ". If the value of the variable “ plinus ” were “ - ”, we would generate the expression “ x - y ” (i.e., a string of words that, being part of the program, will perform the evaluation of this expression).

What we have produced is an intermediate evaluation of the expression “ x plinus y ”, the result of which is again an expression. In our previous examples, the final action with the stack always left one number. The number is a trivial example of an expression, because the production of not only numbers, but also more general expressions in the form of intermediate results, is an obvious continuation of common practice.

Variable changes

So far, we have described the generation of words at the top of the stack, but not what we will do with these words. In addition, we have assumed that with respect to a given variable, the program execution process may be in such a state that the calculation of this variable will lead to a previously defined replacement, but the definition of this replacement has not been previously mentioned. These two spaces will be filled with the introduction of assignment operators.

To assign a value to one word, as in " x := 3 ", we could write in our program:

  3 x := E

resulting in top of stack states:

  ... 3 ... 3 x ... 3 x :=

During the execution of the assignment operator “ := ”, the machine examines the word immediately behind it. This should be a variable to which the assignment should be applied; the word following it is assigned to this variable (more on this below), and the three words at the top of the stack (which are now being processed) are removed from the stack. Prior to the subsequent assignment (i.e., the new assignment of the variable “ x ”), the calculation of this variable will lead to the replacement of the top word of the stack with the word “ 3 ”.

String assignments

Accurately to the position of the left and right side, this is closely related to the similar assignment statement used in ALGOL 60 . However, we don’t have enough to assign a value from just one word, we need to assign a value from a string of words, and therefore we should be able to indicate how deeply the assigned value is pulled into the stack. The easiest way to do this is to insert a marker in the stack, for example, the special word “ ” (from “Terminal” - end) after the last word of the value assigned. In addition, we introduce another assignment operator “ :- ” (called “string assignment” as opposed to “word assignment”, presented in the previous paragraph). During the execution of this operator, the machine explores the top of the stack in a downward direction. The first word (immediately under the operator " :- ") should be a variable, which should be assigned a value. After that, the machine continues its word-for-word study down until it meets the special “ ” marker: the words thus transmitted form together a string, which is perceived as an assigned value.

The easiest way to add “ ” to the stack is to simply insert the word “ ” in the right place in the program that controls the stack. However, we will not do this; for reasons that will be explained later, we need the ability to generate " " on top of the stack under the control of the program, which itself does not contain this word. We can do this with the same trick that allowed us to create an “ E ” on top of the stack. We introduce a new operator, denoted by the word " S " (say, from "Separator" is a separator, or simply because " S " precedes " T " in the alphabet), which is replaced by the word " T " during execution, and we will establish a rule that this is the only way to add the words " " to the stack.

Using all this, we have an alternative entry of the assignment operator “ x := 3 ”, namely:

  SE 3 x :- E

giving the sequence of states of the upper part of the stack:

  ... S ... T ... T 3 ... T 3 x ... T 3 x :-

The result of this is equivalent to the previous form, which used the assignment of the words " := ".

Saving intermediate calculations to strings

Let's use a more powerful assignment in the example, which is an extension of one of our early ones, namely, that which describes the intermediate calculation of the expression “ x plinus y ”. The result of this intermediate calculation was an expression depending on the variables “ x ” and “ y ”, and suppose we want to call this expression “ z ”. To do this, we write in the program:

  SE x PE y PE plinus EPE z :- E

When the last operator “ E ” of this line is executed, the upper part of the stack will look like this (with the same assumption regarding the value of “ plinus ”):

  ... T x E y E + E z :-

and after executing this statement, the above words will be removed from the stack, including the word " T " inclusive. Before the subsequent assignment, the calculation of the variable “ z ” will mean the execution (“reading”) of the string assigned to it. During the calculation of the variable " z ", the machine must have access to the first word of this line; however, when she starts reading this line, she also needs to know where the last word of this line is. We assume that the assignment statement takes this into account by adding the end marker again, and for this purpose we can use the same word " ". During the calculation of the variable “ z ”, the line assigned to it will be read as a part of the program from left to right until the end marker “ T ” is encountered. The new situation associated with the last assignment can be easily represented:

  z → x E y E + ET

Similarly, our previous assignments:

  3 x := E

  SE 3 x :- E

will contribute to the situation presented as follows:

  x → 3 T

One of the most striking aspects of this arrangement is that the usual distinction between “numbers” and “instructions” has completely disappeared. The value of a variable is defined as part of the program, the calculation of this variable implies the execution of this part of the program.

Notes on current developments

On duality of assignments and readings

In addition, we would like to draw attention to a certain form of duality between appropriation, on the one hand, and reading the text, on the other. When a machine reads a piece of program text, the top of the stack is filled under the control of that program text. In a readable text assignment, it is created under the control of the stack contents. Duality can also be illustrated with regard to accessibility requirements. Words in the stack should only be available from top to bottom. However, if the assignment statement converts the top of the stack into readable text, subsequent words become available in the other direction.

Finally, the stack is reserved for “anonymous intermediate results,” whereas the readable text is, in principle and at least, always “named” because we create it by assigning it to a variable.

About recursion

The attentive reader noticed that along with the presentation of the value of the variable, we silently entered two more changes in our car.

The first - the appearance of the word " " in the text of the program and the "immediate response" of the machine to it is relatively simple. According to our agreements, the word " ", read in the text, is not copied to the top of the stack! Instead, it causes the machine to continue reading until the word variable is immediately before the “ E ” operator that caused this calculation of the variable in question. In other words, it acts as a “ return ” at the end of a closed subroutine.

But computing a variable may require computing other variables (including itself): the practical definition of computing a variable is basically recursive, and the mechanism that needs the accompanying recursive definition is ... another stack! I call this second stack the “activation stack” as opposed to the first one, which I call the “anonymous stack”. One of the functions of the activation stack is to control the reading process. When the calculation of a variable begins, the stack of activations is filled, and when the corresponding word “ ” is read, it decreases to its previous size. (In the usual terminology of the machine structure: the activation stack contains a stack of order counter values, its top element is, by definition, the counter of the current order; in the same terminology, its old elements act as a stack containing return addresses.)

Comment. We could try to merge our two stacks into one.This merging occurs in a completely natural way if they are to expand and contract synchronously with each other. In general, however, this is not the case, and trying to combine these two stacks into one will give a very unnatural construction.

Concept of identifier

We will use the activation stack for another purpose to satisfy a very fundamental need, namely the creation of new variables. Above, I used special words (" x", " y", " plinus", etc.) to refer to variables, and I carefully avoided using the term "identifier". I have used the term “variable” in the designation of a single and unique object that exists for a certain period of time and is capable of consistently taking on different meanings. This concept of a variable should be carefully distinguished from the “identifier” used in ALGOL 60, because the same identifier can be used to refer to many objects, to a large number of different variables.

First of all, we note the fact that the same identifier can play different roles due to the fact that it is found in more than one ad. Then the lexicographic rule can determine which of these declarations in which case to apply, for all uses of the specified identifier. This form of reuse of the same identifier can be removed by a simple renaming process.

But there is a much more subtle case of “repeated use of the same identifier”, namely: as soon as a certain block appears in one or several nested activations (as in the case of a recursive procedure). In other words: the same identifier sometimes refers to this variable, sometimes to another.

Use identifiers

In fact, an identifier denotes a variable and in order to clearly define which one, I will explicitly denote the moment when variable substitution occurs instead of an identifier.

For convenience (more precisely: convenience for the machine, and not for a hypothetical user), I intend to use the same identifiers for the local variables of each activation. (What I call “activation” is close to the same block or procedure used ALGOL 60.) I use the special identification words “ L0”, “ L1”, “ L2”, etc. for this purpose .

If the machine starts calculating a variable, the activation stack increases by one item. The beginning of this element also contains the record that so far no local variables have been entered in this activation.

If the machine reads the word " E" at the moment when the top of the anonymous stack contains one of the word identifiers (for example, " L2"), it examines the top element of the activation stack. If this is the first calculation of the identifier in the current activation, the machine creates a new variable for it (and can give this variable an empty value) and makes a record of this action in the top element of the activation stack. It then replaces the upper word of the anonymous stack with the newly created variable. During the next calculation of the same identifier within the same activation that remains current or returns to this state, the machine finds in the top element of the activation stack the record left there during the first calculation of this identifier and the upper word of the stack is replaced with the same variable.

Procedures

Now we can consider a more complex example. Let the values of the variables " x", " y" and " complus" (from "complex-plus" - complex plus) be represented as follows:

  x → 10 23 T

  y → 5 -2 T

  complus → L0 E := E L1 E := E L2 E := E L1 EE + E L1 EE L0 EE + E T

if we read the text now:

  SE x E y E complus E z :- E

the end result will be that we can present a new value " z":

  z → 15 21 T

and what we have done can be interpreted as the addition of two complex numbers.

In terminology ALGOL: “complus” is a procedure with four numerical parameters, all called by value. The simple structure of the process allows the first of them to remain anonymous even in the body of the procedure. In addition, it is a kind of “procedure with type”, that is, it accepts and returns, syntactically speaking, one value of some type instead of two primitives.

Note on arbitrariness of primitives

Let me finish with a trivial example. Suppose we want to write " plus" instead of " +". Assignment:

  SE + PE plus :- E

leads to the situation:

  plus → + ET

then expressions:

  x E y E plus E

  x E y E + E

fully equivalent. This example is included to show the arbitrariness of our primitives as clearly as possible.

Conclusion

I am fully aware that the developments cited here are definitely incomplete. It is especially important to introduce a conditional operator and some equivalent " goto" if you want to create a complete system of this. At the moment I do not understand them, and I do this for two reasons. Firstly, for the sake of brevity, and secondly, because I have not yet decided: I know several possible ways, but none of them completely satisfy me.

With some versions of these objects I created some more complex programs. They showed me both the strength and the weakness of my tongue, its strength - flexibility and unambiguity, weakness in that it was reasonably difficult to use it, at least for me.

If, however, I demand attention to this project, I do not do it just because it enchants me and can enchant others. This report is the quintessence of my thinking after we completed our implementation ALGOL 60. This implementation was conceived at high speed, and the main rationale for the numerous decisions made during those difficult months was the recognition that our designs would lead to our goal and somehow do our job. However, the machine described in this report represents the end of the spectrum of possible implementations of the algorithmic language, which (as in the case ofALGOL 60) satisfies recursiveness. In this capacity, everything was very understandable for me personally: it helped me a lot in evaluating various (not originally appreciated) tricks that we included intuitively, and this clearly showed us a number of alternative solutions. This justifies the hope that in the future the construction of translators and machine design will develop in the same vein.

In addition, the machine presented here is so ridiculously inefficient that, in all likelihood, any practical implementation of a practical algorithmic language can be regarded as its optimization, optimization through the restriction of a language. It may be useful to compare the proposed language with my language; During the language building process, timely detection of “costly functions” may be helpful. Whether such an expensive function will be included in the language or not, it is more or less a political question, but it is nice that the question was asked at all, regardless of how you answer this question.

Finally, the language described in this report (or a language developed in a similar way) may be an appropriate means to formalize the semantic definition of an algebraic language. The absence of such a strict semantic definition is one of the recognized shortcomings of the official “Report on Algorithmic Language ALGOL 60” and, having seen the huge problem caused by this defect, I sincerely hope that this report will help to avoid this error the next time an algorithmic language is developed.

Acknowledgments

A significant number of people contributed to this, consciously or not. In addition to all my colleagues in the Computing Department of the Amsterdam Mathematical Center, I would like to mention Dr. Maurice Wilks and Professor John McCarthy , who turned out to be inspirational listeners, and especially Mr. Michael Woodger : his criticism and comments (I recall his lack of enthusiasm for my The first tests in this direction now with gratitude) were a big help for me.

Source: https://habr.com/ru/post/336982/

All Articles