Tuples in programming languages. Part 2

In the previous part, I looked at the implementation of tuples in various programming languages (I also considered non-scriptable compiled languages with static typing and classical C-like syntax, which surprised some readers). In this part, I propose to go beyond the existing and take up the essence of the programming language design. The source data is the same: a compiled non-scripting language with static typing and a C-like syntax that includes an imperative paradigm (although not limited to it, of course).

In this part we will try to dream and experiment - and what can you do with tuples? How to squeeze the maximum out of them? How to use them to make the programming language more powerful and expressive, how to cause admiration from the true Hackers of the Code and at the same time not too confuse the ordinary programmers? What unexpected possibilities appear in the language, if you correctly and correctly extrapolate the semantics of tuples in different directions, and what difficulties arise in this case?

So, if you like reflections and holivars on the topic of programming language design, then I ask for cat.

SYNTAX
To begin with we will be defined with syntax. It is possible to consider different versions of tuples in the code for a long time - in curly brackets, in round brackets, without brackets at all ... I like the curly version for a number of reasons, and take it as a basis (at least until there is a real need for another syntax). So, a tuple is a sequence of names or expressions, separated by commas and enclosed in braces. Like unified initialization in C ++.

{1,2,3} {i, j, k}

Other options may exist, but for our purposes this will be enough for now.
')
I would also like to mention the simple idea of creating tuples from ranges and repetitions. Ranges - a way to specify tuples for enumerated elements (which include integers and enumeration elements)

 {1..10}

Repetitions - taken from Assembler, a way to fill a tuple with the same value.

 {10 dup 'A'}

In this article, I will not use these methods, and just mentioned them as a beautiful idea.

GENERAL IDEAS
In the first part, I mentioned a certain thought that caused some confusion - that the tuple is “not quite the type”. If you open the same Wikipedia, then it clearly states

In some programming languages, such as Python or ML, a tuple is a special data type.
In programming languages with static typing, a tuple differs from the list in that the elements of a tuple can belong to different types and the set of such types is predefined by the type of tuple, and hence the size of the tuple is also defined. On the other hand, collections (lists, arrays) have a restriction on the type of stored items, but do not have a limit on the length.

However, what did I mean by saying that this is “not exactly a type”? Sometimes (and quite often) you want to have a certain language construct that would allow locally grouping arbitrary objects at the compilation stage. Some group pseudonym that would simply be the equivalent of its constituent parts without introducing any additional entities. In fact, this is just a way to look at the tuples from a slightly different side. For example, multiple return from functions. It may well be considered as the return of a single object of the tuple type; but on the other hand, the assignment operation for multiple returns is just one of many operations; using Rust as an example, we have seen the possibility of another operation on tuples — comparisons for equality. And what if you allow all operations to be performed on tuples? Immediately, various questions arise - what kind of operations can they be, can they be performed on tuples of different lengths, on a tuple and a single value? You can also consider the semantics of transferring tuples to functions (for example, “opening” a tuple during transmission to a function, or performing a function on all elements of a tuple, etc.). Of course, we will need convenient tools for creating and decomposing tuples, accessing elements of a tuple. Perhaps something else? ..

MULTIPLE RETURN FROM FUNCTIONS AND VOID TYPE
Let's start with the simplest, and already implemented in many languages - with multiple return from functions. A function can return multiple values, and we call it returning a tuple of several values — just as a function can take multiple arguments. The following generalization suggests itself: a function that returns a single value returns a tuple of one value. At the very least, it can be taken as a wish that tuples from one value be freely converted to these same values, and vice versa.

Further extrapolation leads us to the void type. As you know, this is a special type used in most programming languages to indicate that a function does not return a result. Creating objects of this type is prohibited. It is not difficult to guess that void is not a full-fledged type, but a symbol for a zero-length tuple. Creating objects of this type in the usual sense is really impossible; but if our compiler is able to work with tuples in a more advanced way, then nothing prevents us from introducing a “pseudo-object” of type void in the form of an empty tuple {} and something to do with it. The question is what? And in what cases can it be useful? For now, just note this and move on to the following extrapolations.

MULTIPLE OPERATIONS
We considered multiple return from functions and the associated multiple assignment. But assignment is just one of many possible operations. By analogy with the assignment (from which everything started), we will try to build other operations on tuples:

 {x,y,z} = foo(); //   ,   foo()  3  {x,y,z} += foo(); //   ,   ? {x,y,z} ++; //    {x,y,z} = {1,2,3} + {a,b,c}; //

So far, it looks good, although perhaps a little alarming is the thought of how such expressions with a large number of operations will look. The important thing is that each object in the expression is a tuple of the same number of elements . Deviation from this rule gives rise to various ambiguities, which should be avoided when designing a programming language. However, they must be considered necessarily (and, if possible, allowed).

GROUP OPERATIONS
In addition to multiple operations, group operations on a tuple and a single value seem very attractive. In fact, this is the first deviation from the rule of the same number of elements in each operand of an expression. But I want to make it beautiful, so we try:

 {i, j, k} = 0; //    ...  ,    ,     ? ++{i, j, k}; //    ... {i, j, k} += 100; //    rec.{x,y,z}.At(i).flag = true; //

The general view of such a binary operation is “tuple operator value” , or “value operator tuple” (most operations are commutative); And under the form “value operator tuple” , the assignment of a single variable to a whole tuple, and in particular, the result of a multiple return of a function , falls.

 x = {1, 2, 3}; //     x   ? x = foo(); //  foo()   ,    x ?

In the case of assigning multiple returns from a function, obviously I want the first return value to be written in “x” and the following ones ignored. Here is such an unexpected ambiguity. But somehow it is necessary to solve it - I really want to have such a syntax. It could be useful not only for group assignments, but also in a number of interesting cases. For example, indexing an array and even accessing a structure field by name is also an operation.

 arr[ {1,2,3} ] = { 10, 20, 30 }; //     obj.{x,y,z} = {10,20,30}; //     ... {obj1, obj2, obj3}.x = { 10, 20, 30 }; //  ... {arr1, arr2, arr3} [ 0 ] = {10, 20, 30 }; //

Despite the apparent obviousness of the expressions, formally arr [{1,2,3}] is the form “value operator tuple”, where “value” is arr, “tuple” is {1,2,3}, and “operator” - square brackets. Unlike assigning a tuple to a single value, there are no questions here - the result of such an operation must be a tuple {arr [1], arr [2], arr [3]}. But for the compiler, that assignment, that indexing are just binary operations. Hence, an expression of the form x = {1,2,3} should unfold in {x = 1, x = 2, x = 3}, that is, the variable x is sequentially assigned all values of the tuple, and the result is also a tuple (by the way, if such a programming language was in reality - it would be an interesting question for backfilling for all sorts of interviews: what would its elements be equal to? {1,2,3} or {3,3,3}?)
Thus, the question of the proper organization of operations on a tuple and the only element is still open.

BINARY OPERATIONS ABOVE COLORS WITH DIFFERENT SIZES
Consider a more general case - the sizes of the tuples - operands of a binary operation do not match. What to do in this case? As always, the simplest solution is to prohibit :) But let's try to figure it out ... Go and Rust have a special metavariable "_" (underscore), which is used when some of the elements of the tuple to the right of the assignment operator are not needed. In our syntax, this will look like this:

 {x, _ ,z} = foo(); {x, _, z} = {1, 2, 3};

The operation with the second component of the tuple is simply ignored. Using the metavariable "_" in Go and Rust is mandatory for multiple assignment in case some results of the returned tuple are not needed. Thus, in these languages, the requirement of mandatory matching of the sizes of the tuples is preserved. But in these languages there are no other multiple operations except assignment; when trying to extrapolate the metavariable "_" to other operations, such interesting results are obtained that they should be considered in a separate chapter.

Let's try to consider the general case: what to do if such an expression is written (@ is a generalized binary operation)

 {a, b} @ {x, y, z}

What are the opportunities?

Perform smaller tuple size operations. That is, the result is {a @ x, b @ y}. It contradicts group operations (if we consider single values equivalent to tuples with one element) —for a group operation, a single operand must interact with each element of the tuple.
Perform operations on the size of a large tuple. At the same time, elements of a larger tuple, which did not have enough elements of a smaller tuple, remain unchanged. {a @ x, b @ y, z}. Also contradicts group operations for the same reason.
Perform operations on the size of a large tuple. When the smaller tuple ends, cycle to its beginning. {a @ x, b @ y, a @ z}. This corresponds to group operations, but abstrusely.
Perform operations on the size of a tuple with a larger size. When the smaller tuple is complete, perform operations on the last element of the smaller tuple. {a @ x, b @ y, b @ z}. Also corresponds to group operations, and also abstruse.
Perform operations in pairs "each element of one tuple with each element of another tuple" ("Cartesian product"). {a @ x, a @ y, a @ z, b @ x, b @ y, b @ z}. Also corresponds to group operations, and also abstruse. And by the way, the question arises of how to group pairs - according to the first tuple or the second, or some other way.

Perhaps there are other options. But now one thing is clear: if a programmer is given such an opportunity, then an indication of the way tuples interact should be made explicitly . In this case, it is almost a prerequisite for maintaining clarity in the code. For this purpose, some keywords can be used that cover such an expression. Until we specify the list of such words (we can assume beforehand that for each of the options there can be one keyword)

ACCESS TO ELEMENTS
And now we should turn to another important opportunity - access to the elements of the tuple.
Traditionally, brackets are used for indexing, and a dot is used for access by name; although the Swift variant with the index through a dot is not so bad, it is ambiguous when using named numeric constants instead of numbers, and it is not at all unusual; I would prefer to use square brackets for access by index (it is important - by constant index) and a point for access by name (if there is one)

 {10,20,30}[1]; // 20 {x:10,y:20,z:30}.z; // 30

It seems everything is simple? Actually no longer. Since we have introduced multiple and group operations on tuples, and these include indexing "[]" and referring to named members of the structure ".", Using these operations on tuples, whose elements are complex objects (for which indexing is defined or "Point") - it is not clear what to do: access the element of a tuple or perform a group operation on all elements of a tuple?

 {arr1, arr2, arr3} [1] // "arr2"  "{arr1[1], arr2[1], arr3[1]} " ? {x:obj1, y:obj2, z:obj3}.z // "obj3"  "{obj1.z, obj2.z, obj3.z}" ?

Another interesting aspect is getting (designing) tuples. According to the agreement adopted at the beginning of the article, we use curly brackets for simple construction of a tuple object. However, in some cases it may be necessary to build a tuple from another tuple by excluding or adding elements. Syntactically, this can be done using “multiple indexing”, applying essentially the same rules as for arrays or structures.
To obtain a tuple, you could use multiple indexing or ranges:

 {a,b,c,d,e}[1,4,0] // {b,e,a} {a,b,c,d,e}[1..4] // {b,c,d,e} {a,b,c,d,e}[1..3,0] // {b,c,d,a}

(the indexing operation itself contains brackets, therefore, within brackets one more curly ones can be omitted)
If the elements of the composite object have names, you can refer to the names:

 obj.{a, d, e} //   obj.{b .. f} //  -

It should be noted an interesting consequence - since the tuples are indexed by constant indices, the idea suggests that the field names should be reduced to constant indices (and maybe vice versa)

Thus, there are at least two “special” operations, “dot” and “square brackets”, which can act on the tuple itself as an integral object. The remaining operations for the tuple are not defined, although it can be assumed that we need, for example, concatenation of tuples — flat pasting of two or more tuples into one long one. Therefore, the open question arises: is it necessary to somehow allocate access operations directly to the elements of the tuple? Or is it more appropriate to allocate operations on each element of the tuple?

FROM OPERATIONS TO FUNCTIONS
Any operation is equivalent to some function. For example, the unary bit-inversion operation ~ x can be represented as neg (x), and the binary addition x + y as sum (x, y);
therefore, considering operations on tuples as multiple operations, the question arises - what to do if a function call is involved in such an expression?
To begin with, a unary operation:

 ~{1,2,3}; //    neg({1,2,3}); //

By analogy with “group assignment”, we need to expand the tuple as follows:

 {neg(1), neg(2), neg(3)}

at first glance, this seems quite logical; the function itself takes a single value and returns a single value; must return a single value so that you can make a tuple of them.
Probably, functions with two arguments could be similarly expanded. for example

 {1,2,3} + {4,5,6} sum( {1,2,3}, {4,5,6} )

 { sum(1,4), sum(2,5), sum(3,6) }

Although it must be admitted that the syntax of an implicit multiple function call for tuples is too implicit, and just like with operations, some explicit way to indicate such a call suggests (although the possibility of such a call is not bad by itself).

On the other hand, recall the syntax Go, D, C∀: there the tuple passed to the function unfolds inside the argument list, replacing the corresponding number of arguments. And in general, this is also very logical and expected - but again incompatible with “group” semantics! Is it possible to somehow resolve this contradiction? And we have not yet considered complex options, when the dimensions of the tuples-arguments do not coincide, when tuples and single values are mixed, when we want to get the Cartesian product from the results of the operation on the elements of tuples, etc.

The solution, which seems to be quite good, came oddly from C ++. There is such an opportunity as templates with a variable number of parameters, and for transferring a package of parameters (by the way, too, a tuple) to another template, the syntax is used with a triple-point. The ellipsis visually marks that the given argument is “revealed” in this context (and this is very important for the perception of the code!). It is important that this is immediately visible in the code. The only thing that is not visible is how many arguments it reveals. But if you want to specify in detail - nothing prevents access to individual elements of the tuple.

 foo(t..., 10, 20); //  foo(t[0], t[1], t[2], 10, 20); //  foo(tx, ty, tz, 10, 20); //

Finally, the most difficult situation: a function with several parameters, and tuples are transferred to some (or all) parameters, and with different lengths. Similar to the situation with binary operations, there are many possibilities for opening such a call: a multiple function call for a minimal number of elements, for a maximum with different types of additions to shorter tuples, a Cartesian product, etc. All these methods are rather complicated for perception, and therefore any of them should be declared only explicitly - for example, using the appropriate keyword before calling a function.

META VARIABLE "_"
We return to the consideration of the behavior of the metavariable "_", which is used in some languages to ignore the elements of the source tuple when assigning tuples. Let's see if this metavariable can be extrapolated to more complex cases (after all, assignment is just a binary operation).

 {x,y,z} = {1,2,3} + {a,_,c};

By analogy, the operation of adding the number 2 with "_" will be ignored, but what will be the result? In general, there are two possibilities: either leave the number 2 in the resulting tuple, or extend the "_" there. In the first case, "_" can be considered as a " neutral element ". for any operation (that is, for any operation "@" and any argument "x", it is true x @ _ == _ @ x == x). For example, the expression x = y * ~ (_ + z) can be transformed into x = y * ~ z.
However, this is not all clear. For example, the unary operation of changing the sign "-x" can be written as a binary operation subtracting the number from zero "0-x". If instead of "x" put "_", then this expression will have a different meaning depending on the method of recording.

 z = y * (0 - _) // z = y*0,   z = 0 z = y * (- _ ) // z = y

In the second case, when "_" appears in some position of the tuple, all further calculations for this position from the node of the syntactic tree containing "_" to the root of the tree (that is, the end of the expression, as a rule - semicolons) are discarded (that is, it is true x @ _ == _ @ x == _). That is, the presence of at least one "_" in the i-th element of a tuple means that all calculations with the i-th element of all tuples in the whole expression are discarded.
I find it difficult to say which way to work with "_" is better. This is another question that requires careful thought.

ATTACHED TRAINING
Another interesting aspect that is practically not considered in the documentation for languages is nesting of tuples. The most obvious and realistically existing (even in the C language) application of nested tuples is the initialization of nested structures.

 struct Foo { int x, y, z; }; struct Bar { Foo f; int i, j; }; Bar b = { {10, 20}, 30 };

x, y, i, z j. ( , / ). , . .

bool. :

 {x, y, z} == {1, 2, 3} //  {false, true, false}

if, while .. bool! ?

( true)
- ( - true).

. , , . : , , . , — .
? , ; ( - ) .

« » :

 if( {x, y, z} == 0) { /*...*/ }

— « » ( ) - , .

, ! , ; , . — « » , ( — , , - — Evernote Wiznote ). , , — !

Source: https://habr.com/ru/post/280608/

All Articles

Tuples in programming languages. Part 2

More articles: