Inside llst, part 1. Introduction to Smalltalk

Good day. Here is the second article in the series on Low Level Smalltalk (LLST). If you do not know what this is about, I recommend reading the previous overview article, which explains what llst is and why it was created.

In this part, we will focus on the Smalltalk language itself, its syntax and "rules of the game."

In the following parts we will move smoothly on to the details of the virtual machine implementation and the in-memory representation of objects. We will discuss how the memory manager and the garbage collector are organized, and talk about the virtual machine's bytecodes. We will learn how the text of a Smalltalk method turns into a sequence of instructions. Finally, we will trace the path from loading the image into memory to what happens when objects send messages to each other, and find out how closures are implemented in blocks.

Introduction

Programming languages differ. Some have a fairly low entry threshold. Others scare off a potential adept from afar, terrifying them with bizarre and alien syntax, excessive verbosity, or complex concepts. There are languages that require the programmer to literally turn their brain inside out in order to learn to think the way successful programming in that language demands. Some languages only look simple, but in fact require a solid grounding in mathematics, lambda calculus and category theory...

Smalltalk is a simple language. Simple not only in its syntax, but also in how easily an unprepared person can understand it. Not surprisingly, its author originally positioned Smalltalk as a language for teaching programming to children. Unfortunately, it did not work out. Children prefer PHP and BASIC (or whatever is fashionable now?). Well, let's not talk about it.

As you know, any theory needs to be confirmed by practice. So, in support of the thesis above, we will now go over the key concepts of the language, and by the end of this introduction we will be able to read program source code with confidence.

Object World

Forget about Java, forget about C++. ~~Forget everything you were taught.~~

The phrase “in language X everything is an object” is so worn out that I did not want to use it here. Nevertheless, it is hard to convey the full depth of this idea as applied to Smalltalk without resorting to such clichés.

Indeed, in Smalltalk everything is an object. Strings, numbers (with one useful exception), arrays: that much is obvious. But every object has a class. Which (surprise!) is also an object. And yes, it also has a class of its own, which is also an object, and so on. The methods of a class are objects too. The bytecodes of methods: well, you get the idea. Even pieces of code, represented in the language by so-called blocks, are objects, and you can have a friendly chat with each of them and it will tell you everything it knows.

The description of the base image of Little Smalltalk contains this wonderful psychedelic passage:

    name        instanceOf  subclassOf
    Object      MetaObject  nil
    Class       MetaClass   Object
    MetaObject  Class       Class
    MetaClass   Class       MetaObject

It tells us that:

class Object is an instance of MetaObject and a subclass of nil (objects arose out of non-existence)
class Class is an instance of MetaClass and a subclass of Object (all classes are also objects)
class MetaObject is an instance of Class and also its subclass (uh ...)
class MetaClass is an instance of Class and a subclass of MetaObject (metaclasses are also classes)
Note to connoisseurs: There is no Behavior class in Little Smalltalk.

Brain explosion? Yeah. But this is the only place in the entire class hierarchy that looks contradictory. It is contradictory. But at the price of this small madness great possibilities are gained.

I cite this passage not to frighten the reader, but to show that classes and objects in Smalltalk, like Yin and Yang in the Taoist symbol, are interpenetrating entities.

Returning to the question of simplicity, I note that this contradiction does not get in the way of programming at all, and in 99% of cases the programmer does not think about it (or does not even know). In the remaining one percent, it lets you do things that greatly simplify writing and reading programs in this language.

Image

Like real living creatures, objects are born and die. They live in the image: the area of computer memory in which the virtual machine's objects are stored. The image can be saved to disk and loaded later. After loading it from disk, we get exactly the same state that existed at the moment it was saved. This applies to all objects without exception, from numbers and strings to user interface elements. No special measures are needed to preserve the state of the system; the image takes care of that. In this sense, I think Jef Raskin would have liked the user interfaces of Smalltalk programs for their persistence.

For example, the user closes a text editor and returns to it some time later. Once the image is loaded, they find the system in exactly the state in which they left it. The cursor position, the selected text and the clipboard contents are all restored. This is fundamentally different from the startup model of ordinary programs. Modern desktop environments carry this idea forward, but, from my point of view, they are a pale imitation of what could be.

Message concept

Programming in Smalltalk comes down entirely to communicating with the objects in the image. There is no traditional editing of source files. Or rather, it is successfully replaced by work in the built-in IDE. But underneath it is still objects interacting with other objects.

For example, to create a new class we send a message to its parent, asking it to create a subclass. To create a new method, we ask the class to create a method and then fill it with meaning. In effect, we create a method object and add it to the class's list of methods. There are no backstage intrigues, no native-code operations and no other private arrangements here. Everything stays within the protocol.

So, sending a message is the only operation in the language that lets objects interact with each other (and even with themselves). Moreover, it is in fact the only complex operation the virtual machine can perform. All other instructions exist to support this main task.

However, this is easiest to see with examples (how to run them is described in the first article):

    -> 2 + 3
    5

Here we took the object 2 and sent it the message + with the parameter 3. The result of the message is the sum object, which was returned and printed by the shell. This is an example of a binary message, in which two objects take part.

Here is an example of a unary message. Let's ask the object 2 which class it belongs to. This is done by sending it the class message:

    -> 2 class
    SmallInt

Fine. The two turned out to be an object of class SmallInt.

And what else can instances of class SmallInt do? Let's ask:

    -> SmallInt listMethods
    * + - / < = asInteger asSmallInt bitAnd: bitOr: bitShift: hash quo: rem: truncSmallInt
    SmallInt

Yeah. The + operator we already know, and a handful of arithmetic operations. To get this information, we sent the listMethods message to the SmallInt class. This was possible because the SmallInt class is also an object, which can receive messages too. And all thanks to the "psychedelic" inheritance tricks described above. It is important to note that sending messages to classes and to ordinary objects is implemented in the same way; it is one and the same mechanism (no hacks). Classes and objects really do coexist side by side and do not interfere with each other in any way.

What kind of objects are there at all?

There are ordinary objects (instances of some class), classes themselves, and metaclasses. Metaclasses are objects whose instances are ordinary classes. It is that simple, and at first you can safely ignore them.

And there are true, false and nil. The first two are the only instances of the classes True and False respectively. That is, there is exactly one true object in the whole image. Every place that returns or stores a Boolean value uses these objects, explicitly or implicitly.

Now let's talk about nil. As you might have guessed, this object stands for an empty, uninitialized (or erroneous) value. But, unlike the C++ null pointer and null from the Java world, nil is a full-fledged object.

Let's check:

    -> 1 isNil
    false
    -> nil isNil
    true
    -> nil class
    Undefined

As we see, sending messages to this object is no different from others, which, I think, is very convenient.

Symbols

Another important kind of object is the symbol. A symbol in Smalltalk is an object that resembles a string in its properties but, like nil, true and false, exists in the image in a single copy.

This is how ordinary strings behave:

    -> 'hello' = 'hello'
    true
    -> 'hello' == 'hello'
    false
    -> 'hello' + 'World' = 'helloWorld'
    true
    -> 'hello' + 'World' == 'helloWorld'
    false

Here the = operator compares the values of the two strings, whereas the == operator checks the objects for identity. The == operator returns true only if the receiver and the argument are the same object. That is not the case above, because two instances of the String class are being compared: they are created one after the other, but they are not the same object.

But here is what happens with symbols:

    -> #helloWorld = #helloWorld
    true
    -> #helloWorld == #helloWorld
    true
    -> ('hello' + 'World') asSymbol == #helloWorld
    true

Smalltalk controls the creation of symbols and makes sure they never lose their uniqueness. Symbols are typically used as identifiers of various kinds, as keys in collections, and as method selectors (see below).
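To make the uniqueness guarantee more tangible, here is a rough Python analogue. It is only a sketch of the idea of interning, not how llst actually implements symbols; the Symbol class and its _table dictionary are hypothetical names invented for this illustration.

```python
class Symbol:
    """Hypothetical stand-in for a Smalltalk symbol: one object per name."""

    _table = {}  # name -> the single Symbol object carrying that name

    def __new__(cls, name):
        # Reuse the existing object if this name has been seen before.
        existing = cls._table.get(name)
        if existing is None:
            existing = super().__new__(cls)
            existing.name = name
            cls._table[name] = existing
        return existing


# Two strings built at run time: equal in value, like 'hello' + 'World'.
first = "".join(["hello", "World"])
second = "".join(["hello", "World"])

strings_equal = first == second  # value comparison, like Smalltalk =

# Turning either string into a symbol always yields the same object,
# so the identity check (Python "is", Smalltalk ==) succeeds.
symbols_identical = Symbol(first) is Symbol(second)
```

The point of the design is that comparing two symbols never needs to look at their characters: an identity check is enough.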

Cascading messages

So far we have worked with a single object, sending it messages and observing the result. But the result is also an object. So we can send a message to it as well.

Let's try:

    -> Array parent
    Collection
    -> Object parent
    nil
    -> Array parent isNil
    false
    -> Object parent isNil
    true

In this example, we first print the parents of the classes Array and Object, and then send the isNil message to the result to check whether a value is present. The Object class is the top of the hierarchy of regular classes, so it answers nil to the parent message. As we can see, to chain several messages it is enough to write them separated by spaces. And such chains can be of any length:

    -> 12
    12
    -> 12 class
    SmallInt
    -> 12 class methods
    Dictionary (* -> Method, + -> Method, - -> Method, / -> Method, < -> Method, = -> Method, asInteger -> Method, asSmallInt -> Method, bitAnd: -> Method, bitOr: -> Method, bitShift: -> Method, hash -> Method, quo: -> Method, rem: -> Method, truncSmallInt -> Method)
    -> 12 class methods keys
    OrderedArray (* + - / < = asInteger asSmallInt bitAnd: bitOr: bitShift: hash quo: rem: truncSmallInt)
    -> 12 class methods keys size
    15

Cascading is another way of combining messages. Here a series of messages is sent to the same object without naming the receiver each time. The messages are written one after another, separated by semicolons, and the whole statement ends with a period.

Note: Message cascading in Little Smalltalk now works differently from standard implementations. Why this happens remains to be seen.

Keyword messages

So far we know two kinds of messages: unary and binary. There are also keyword messages, which can take one or more parameters. Let's take the method dictionary of the SmallInt class from the last example and ask which key is stored at index 7:

    -> SmallInt methods keys at: 7
    asInteger

Here we send the #at: message to the keys object with the parameter 7. Indices in Smalltalk are counted from 1, so the first element has index 1 and the last has an index equal to the size of the container.

Here is another example of a keyword message:

    -> (Array new: 5) at: 1 put: 42
    Array (42 nil nil nil nil)

First, we created an array by sending the #new: message to the Array object with a parameter of 5, meaning the number of elements. Then we placed 42 in the newly created array at index 1. The resulting array was displayed on the screen. Notice that the remaining 4 cells are filled with nil values.

A remarkable feature of keyword messages is that the text at: 1 put: 42 is a single parameterized message #at:put: and not two, as you might think. In a C-like language this could be written as keys->atPut(1, 42), but such notation loses the correspondence between the arguments and their purpose.

Suppose we have a class Rectangle that represents a rectangle on some plane, and in C++ code we come across the following lines:

    Rectangle* rect1 = new Rectangle(200, 100);
    Rectangle* rect2 = new Rectangle(200, 100, 115, 120, 45);

How can we tell which numbers mean what? In the first case, experience suggests that these are most likely the dimensions of the rectangle, with the first parameter being the size along X and the second along Y. But to know for sure, we have to look at the prototype of the Rectangle constructor. The second variant is even less readable. Of course, a good programmer would add comments to the code and change the function prototype so that it accepts “telling” types such as Point, but that is beside the point here.

Let's see what a similar construct might look like in Smalltalk:

    rect1 <- Rectangle width: 200 height: 100.
    rect2 <- Rectangle new
        width: 200;
        height: 100;
        positionX: 115;
        positionY: 120;
        rotationDegrees: 45.

In the first case, we sent the #width:height: message to the Rectangle class, which created an instance of itself and set the corresponding fields from the parameters. In the second case, we created the instance in the usual way, by sending #new, and then cascaded messages to set the values one by one. Notice how readable the code becomes. We do not even need comments for the reader to understand what is going on.
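For readers who think in other languages, the closest everyday relative of a keyword message is probably a call with named arguments. The Python sketch below is only an analogy: the Rectangle dataclass and its snake_case field names are my own assumptions, modelled on the selectors in the Smalltalk example.

```python
from dataclasses import dataclass


@dataclass
class Rectangle:
    """Hypothetical Rectangle with the fields named in the Smalltalk example."""
    width: int = 0
    height: int = 0
    position_x: int = 0
    position_y: int = 0
    rotation_degrees: int = 0


# Positional call: the reader has to look up what each number means.
unclear = Rectangle(200, 100, 115, 120, 45)

# Keyword calls: each value sits next to its name, as in #width:height:.
rect1 = Rectangle(width=200, height=100)
rect2 = Rectangle(width=200, height=100, position_x=115,
                  position_y=120, rotation_degrees=45)
```

The difference is that in Python the names are optional decoration on the call, whereas in Smalltalk they are part of the selector itself.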

In principle, this code could also be written the blunt way, but it looks less elegant:

    rect1 <- Rectangle width: 200 height: 100.
    rect2 <- Rectangle new.

    rect2 width: 200.
    rect2 height: 100.
    rect2 positionX: 115.
    rect2 positionY: 120.
    rect2 rotationDegrees: 45.

The ability to intersperse parts of a message selector with passed parameters, it seems to me, is one of the strengths of Smalltalk. With proper use of the names of variables and selectors, this allows you to write very clear methods that practically do not require commenting.

Take a look at the following sample. This is the code of the Dictionary class that responds to the unary message #keysAsArray:

    keysAsArray | index result |
        result <- Array new: keys size.
        1 to: keys size do: [ :index | result at: index put: (keys at: index) ].
        ^ result

In the body of this method we first create the array for the return value and then fill it with the contents of the keys field. Here the number 1 is sent the #to:do: message with two parameters. The first is keys size, and the second is a piece of code to be executed (the expression in square brackets). In Smalltalk such pieces of code are called blocks. Of course, they are objects and can be stored in a variable. Here no variable is created for the block; it is passed directly at the point of use. To execute a block, you send it the #value message, or #value: if it takes a parameter. This is what the SmallInt class does in its implementation of #to:do:.

In our case the block will be called keys size times. Each time it receives the iteration number, which is used as an index to pick a value from keys and store it in result.
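The division of labour between #to:do: and the block can be imitated in Python, where a nested function plays the role of the block. This is a sketch under my own assumptions: to_do and keys_as_array are hypothetical names, and the real #to:do: lives in the image as a SmallInt method rather than as a free function.

```python
def to_do(start, stop, block):
    """Call block once for every integer from start to stop, inclusive.

    This mirrors what the receiver of #to:do: does: it owns the loop
    and evaluates the block (Smalltalk #value:) with each index.
    """
    index = start
    while index <= stop:
        block(index)
        index += 1


def keys_as_array(keys):
    """Copy keys into a fresh list, one slot per call of the block."""
    result = [None] * len(keys)

    def fill(index):
        # Smalltalk indices start at 1, Python lists at 0.
        result[index - 1] = keys[index - 1]

    to_do(1, len(keys), fill)
    return result
```

Note that keys_as_array contains no loop of its own. It only describes what to do with one index and hands that description to to_do.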

Syntax

At this point we suddenly realize that we already know 90% of Smalltalk's syntax. All that remains is to comment on a few points and explain the purpose of certain parts. To keep it from getting boring, let's do this with a real code example. I shamelessly borrowed it from the source code of the base image. First I will give the whole text, and then go through it in parts and comment on individual lines.

    METHOD Collection
    sort
        ^ self sort: [ :x :y | x < y ]
    !
    METHOD Collection
    sort: criteria | left right mediane |
        (self isEmpty) ifTrue: [^self].

        mediane <- self popFirst.
        left <- List new.
        right <- List new.
        self do: [ :x |
            (criteria value: x value: mediane)
                ifTrue: [ left add: x ]
                ifFalse: [ right add: x ] ].

        left <- left sort: criteria.
        right <- right sort: criteria.
        right add: mediane.
        ^ left appendList: right
    !

The Collection class represents some abstract collection of elements. Collection does not know how to store data, it only provides general algorithms for operating with them. One of these algorithms is sorting.

So, in order:

    METHOD Collection
    sort
        ^ self sort: [ :x :y | x < y ]
    !

Here we declare a default sorting method that takes no parameters and calls its partner, the #sort: method, which takes as its parameter a block that compares two elements of the collection according to some criterion. We supply the default criterion: the less-than relation. For complex collection elements, though, nothing prevents you from sending additional messages, as in x someField < y someField.

The notation [ :x :y | declares the formal parameters of a block, and the ^ symbol is the equivalent of return from the C world. The self keyword is used to send a message to the object itself, and super to address its ancestor.

Moving on:

    sort: criteria | left right mediane |
        (self isEmpty) ifTrue: [^self].
        mediane <- self popFirst.

Here the #sort: method is declared with one formal parameter, criteria. Next come the local variables, set off from the rest of the text by vertical bars. Style permits writing them on the same line, although you can move them to the next one:

    sort: criteria
        | left right mediane |
        (self isEmpty) ifTrue: [^self].
        mediane <- self popFirst.

Then we check the recursion base. In the case of an empty collection, the result of the sort will also be an empty collection.

In the conventional sense, Smalltalk has no syntax. There are individual messages and keywords that have a special meaning, but there are no hard-coded rules. Hence there are no conditional statements either. Instead, the language successfully relies on what objects do so well: exchanging messages.

The construction (self isEmpty) ifTrue: [^self] is essentially no different from any other like it. The parentheses are not required here and are purely decorative. First we send the #isEmpty message to self, and then we send the result of that (one of the Boolean instances) the #ifTrue: message, whose parameter is the block to execute if the value is true.
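How can a condition be "just a message"? One way to picture it is two objects that answer the same message differently. The Python sketch below is an illustration only; TrueObject, FalseObject and the method names are hypothetical and say nothing about how llst itself implements True and False.

```python
class TrueObject:
    """Plays the role of Smalltalk's true."""

    def if_true(self, block):
        return block()          # the truth case: evaluate the block

    def if_true_if_false(self, true_block, false_block):
        return true_block()


class FalseObject:
    """Plays the role of Smalltalk's false."""

    def if_true(self, block):
        return None             # ignore the block and answer "nil"

    def if_true_if_false(self, true_block, false_block):
        return false_block()


TRUE = TrueObject()
FALSE = FalseObject()

# No conditional statement is involved in choosing the branch:
# which block runs depends only on which object receives the message.
answer_for_true = TRUE.if_true_if_false(lambda: "left", lambda: "right")
answer_for_false = FALSE.if_true_if_false(lambda: "left", lambda: "right")
```

The branch is selected by ordinary method dispatch, which is exactly the mechanism the language already has for everything else.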

In the last line we bind the local variable mediane to the object that the current object returns in response to the #popFirst message. I deliberately used the verb “bind” instead of “assign” to emphasize that no copying takes place here. All variables store only references to objects, not values. Collections also store references, so we do not have to worry about copying large amounts of data. To create a copy of an object explicitly, there are separate messages for deep or shallow (non-recursive) copying.

The next part of the code is the sorting itself:

    left <- List new.
    right <- List new.
    self do: [ :x |
        (criteria value: x value: mediane)
            ifTrue: [ left add: x ]
            ifFalse: [ right add: x ] ].

We create a pair of lists for the elements that do and do not satisfy the sorting criterion. Let's call them the "left" and "right" halves. Then we walk through the contents of the current collection (the #do: method), call the comparison block with the median for each element (x), and distribute the elements between the lists according to the result.

Notice that blocks, although they are in fact separate objects, freely refer to variables declared outside them. The block inside #do: uses the variable mediane, while the block passed to #ifTrue: refers both to x and to left, which is declared one level higher still. This is possible because blocks in Smalltalk are closures and are bound to the lexical context in which they are written.

Finally, the rest of the method:

    left <- left sort: criteria.
    right <- right sort: criteria.
    right add: mediane.
    ^ left appendList: right

We recursively sort the resulting parts, and then combine the sorted left, median and sorted right parts into one list, which we return as the result of the sort.
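For comparison, the same algorithm can be written in Python in a few lines. This is a loose transcription, not a faithful port: sort_with is a hypothetical name, it builds new lists instead of popping the first element from the receiver, and it appends where the Smalltalk version adds to the head of a List.

```python
def sort_with(items, criteria=lambda x, y: x < y):
    """Quicksort driven by a two-argument criterion.

    criteria(x, y) plays the role of the block passed to #sort:
    and answers whether x should come before y.
    """
    if not items:
        return []  # recursion base: an empty collection is already sorted

    pivot, rest = items[0], items[1:]  # "mediane" and the remaining elements
    left, right = [], []
    for x in rest:
        if criteria(x, pivot):
            left.append(x)
        else:
            right.append(x)

    # sorted left half, then the pivot, then the sorted right half
    return sort_with(left, criteria) + [pivot] + sort_with(right, criteria)


numbers = [13, 0, -6, 221, 64, 7, -273, 42, 1024]
ascending = sort_with(numbers)
descending = sort_with(numbers, lambda x, y: x > y)
as_strings = sort_with(numbers, lambda x, y: str(x) < str(y))
```

As in the Smalltalk original, the algorithm itself never looks inside the elements. Everything it knows about ordering comes from the criterion it is handed.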

Let us now see how sorting can be used:

    -> #(13 0 -6 221 64 7 -273 42 1024) sort
    Array (-273 -6 0 7 13 42 64 221 1024)

    -> #(13 0 -6 221 64 7 -273 42 1024) sort: [ :x :y | x > y ]
    Array (1024 221 64 42 13 7 0 -6 -273)

    -> #(13 0 -6 221 64 7 -273 42 1024) sort: [ :x :y | x asString < y asString ]
    Array (-273 -6 0 1024 13 221 42 64 7)

    -> #(13 0 -6 221 64 7 -273 42 1024) sort: [ :x :y | x asString size < y asString size ]
    Array (7 0 13 -6 42 64 221 1024 -273)

    -> ' ,     !' words
    List ( ,     !)

    -> ' ,     !' words sort: [ :x :y | x size < y size ]
    List (   !  , )

Thus, a single generalized algorithm can be successfully used to process any type of data. It is enough to set the correct criterion.

Lists

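The List class is a descendant of Collection that stores its elements as a linked list (of Link objects).

Adding an element to the head of the list looks like this: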

    METHOD List
    add: anElement
        elements <- Link value: anElement next: elements.
        ^ anElement
    !

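The new link takes the previous contents of the elements field as its next link, and then becomes the head of the list itself.

Adding to the end is a little more involved, because the list has to be walked to its last link: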

    METHOD List
    addLast: anElement
        elements isNil
            ifTrue: [ self add: anElement ]
            ifFalse: [ elements addLast: anElement ].
        ^ anElement
    !
    METHOD Link
    addLast: anElement
        next notNil
            ifTrue: [ ^ next addLast: anElement ]
            ifFalse: [ next <- Link value: anElement ]
    !

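Lists in Smalltalk resemble lists in Haskell (hello to Haskell fans). The correspondence is: the : operator is #add:, and ++ is #addLast:.

Finally, the #appendList: method, which the sort uses to join the two halves: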

    METHOD List
    appendList: aList | element |
        (elements isNil)
            ifTrue: [ elements <- aList firstLink. ^self ].

        element <- elements.
        [element next isNil]
            whileFalse: [element <- element next].

        element next: aList firstLink.
        ^self
    !

Here we walk to the last link of the receiver and attach the first link of the other list to it.
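The three List methods shown above can be approximated in Python as follows. Treat this as a sketch for following the pointer manipulation, not as a port: LinkedList, to_list and the snake_case method names are hypothetical, and add_last uses a loop where the Smalltalk code recurses through the links.

```python
class Link:
    """One cell of the list: a value plus a reference to the next cell."""

    def __init__(self, value, next_link=None):
        self.value = value
        self.next = next_link


class LinkedList:
    def __init__(self):
        self.elements = None  # first link, or None for an empty list

    def add(self, value):
        # The new link points at the old head and becomes the new head.
        self.elements = Link(value, self.elements)
        return value

    def add_last(self, value):
        if self.elements is None:
            return self.add(value)
        link = self.elements
        while link.next is not None:  # walk to the last link
            link = link.next
        link.next = Link(value)
        return value

    def append_list(self, other):
        if self.elements is None:
            self.elements = other.elements
            return self
        link = self.elements
        while link.next is not None:
            link = link.next
        link.next = other.elements  # splice; the links are shared, not copied
        return self

    def to_list(self):
        values, link = [], self.elements
        while link is not None:
            values.append(link.value)
            link = link.next
        return values
```

Note the cost difference that the code makes visible: add touches only the head, while add_last and append_list have to traverse the whole receiver first.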

Conclusion

This concludes our introduction to Smalltalk. Thanks for your attention!

:)

PS: Thanks to sheknitrtch, llst can now be built with MSVC10. See the downloads section.

Source: https://habr.com/ru/post/164769/
