
LLST: A New Life for Little Smalltalk


Hello! Happy end of the world and happy upcoming holidays :)
As a gift to the open source community, and to fans of antiques, we (my friend humbug and I) decided to publish our latest research project.

We present a from-scratch C++ rewrite of a virtual machine compatible with Little Smalltalk. At the moment, the virtual machine code is written and the basic primitives are implemented. humbug wrote a series of simple tests that nevertheless helped uncover problems in the original version of the VM. The implementation is binary compatible with images of the original LST, version 5.

A month of work, 300+ commits. Read on to find out what came of it.


But why?


I have always liked Smalltalk, with its clinical simplicity (forgive me, Lispers and Forth programmers) and its no less clinically broad capabilities. I believe the programming community has undeservedly forgotten it, although even in the 21st century there is a lot to gain from it. However, the existing industrial implementations are too cumbersome for a first acquaintance and do not exactly shine with beauty. As the saying goes, you are judged by your clothes. A newcomer who sees such an interface for the first time is unlikely to regard it as something modern and groundbreaking.

Little Smalltalk is compact enough to figure out in a couple of hours. At the same time, it is a full-featured Smalltalk, although it is not compatible with the Smalltalk-80 or ANSI-92 standards. From my point of view, a competent implementation of such a microsystem could be a great help in teaching students at technical universities. It is particularly useful for studying OOP, since the concepts of encapsulation, polymorphism, and inheritance take on an absolutely clear and at the same time obvious expression here. Many of my acquaintances confused these concepts or did not understand their original meaning. With such a tool at hand, in 10 minutes you can demonstrate, literally on your fingers, the advantages of OOP and the mechanisms by which it works. Moreover, unlike in other languages, these principles do not look "far-fetched", since they make up the actual core of the language.

In the end, it is rather amusing to have something that is essentially written in itself and that fits a virtual machine, a compiler, and a standard library with the full source code of all its methods into 100 KB.

However, I have digressed, and this post is not really about that. Let's talk about the project and its goals instead. So,

Goal number 1. New VM (completed).


Rewrite the Little Smalltalk code in C++, eliminate the flaws of the original design, comment the code, and make it readable and easy to modify.

Unfortunately, the original code reads like student work written in a hurry. From my point of view, it is unacceptable for an educational project (and this is how the author positioned Little Smalltalk) to have sources like this. Switch blocks a thousand lines long, sprinkled with goto and macros, the same variable reused in five different places for different purposes... well, fun. On top of that, the entire code base has one and a half comments, in the style of Landau and Lifshitz: "this obviously follows ...".

Of course, living with this was impossible. So the code was analyzed and, in an attempt to understand the Great Idea, the current implementation emerged. A convenient type system, container templates, and template pointers to heap objects were developed so that you do not have to think about the garbage collector every time you create an object. It is now possible to work with virtual machine objects from C++ as easily as with ordinary structures. All memory handling, calculation of object sizes, and their correct initialization now rest on the shoulders of the compiler.
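To illustrate the idea behind such template pointers, here is a minimal sketch. It is not the LLST implementation: `RootRegistry`, `RootedPtr`, and `TDemoObject` are hypothetical names invented for this example, and the moving collector is imitated by a single `relocate` call. The assumption is that the pointer registers its own slot as a GC root, so the collector can patch it when an object moves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a VM heap object (not an LLST type).
struct TDemoObject {
    int value;
};

// Hypothetical registry of root slots that a moving collector must update.
class RootRegistry {
public:
    void add(void** slot) { slots_.push_back(slot); }

    void remove(void** slot) {
        auto it = std::find(slots_.begin(), slots_.end(), slot);
        if (it != slots_.end()) slots_.erase(it);
    }

    // Imitates object relocation during a collection: every registered
    // slot that points at `from` is redirected to `to`.
    void relocate(void* from, void* to) {
        for (void** slot : slots_)
            if (*slot == from) *slot = to;
    }

    std::size_t size() const { return slots_.size(); }

private:
    std::vector<void**> slots_;
};

// Smart pointer that keeps its slot registered as a root for its whole lifetime.
template <typename T>
class RootedPtr {
public:
    RootedPtr(T* object, RootRegistry& registry)
        : target_(object), registry_(registry) { registry_.add(&target_); }

    RootedPtr(const RootedPtr& other)
        : target_(other.target_), registry_(other.registry_) { registry_.add(&target_); }

    RootedPtr& operator=(const RootedPtr& other) {
        target_ = other.target_;  // the slot itself stays registered
        return *this;
    }

    ~RootedPtr() { registry_.remove(&target_); }

    T* operator->() const { return static_cast<T*>(target_); }
    T* get() const { return static_cast<T*>(target_); }

private:
    void* target_;
    RootRegistry& registry_;
};

// Returns the value seen through the pointer after the object has "moved".
inline int demoRelocation() {
    RootRegistry registry;
    TDemoObject oldCopy{1};
    TDemoObject newCopy{2};
    RootedPtr<TDemoObject> pointer(&oldCopy, registry);
    registry.relocate(&oldCopy, &newCopy);
    return pointer->value;
}
```

In this sketch the calling code never touches the registry directly: creating a `RootedPtr` is enough to keep the pointer valid across a relocation, which is the kind of convenience the paragraph above describes.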

As an example, here is the implementation of opcode number 12, "PushBlock".

This is how it was (the original formatting and the author's comments are preserved):
 case PushBlock:
     DBG0("PushBlock");
     /* create a block object */
     /* low is arg location */
     /* next byte is goto value */
     high = VAL;
     bytePointer += VALSIZE;
     rootStack[rootTop++] = context;
     op = rootStack[rootTop++] =
         gcalloc(x = integerValue(method->data[stackSizeInMethod]));
     op->class = ArrayClass;
     memoryClear(bytePtr(op), x * BytesPerWord);
     returnedValue = gcalloc(blockSize);
     returnedValue->class = BlockClass;
     returnedValue->data[bytePointerInContext] =
     returnedValue->data[stackTopInBlock] =
     returnedValue->data[previousContextInBlock] = NULL;
     returnedValue->data[bytePointerInBlock] = newInteger(bytePointer);
     returnedValue->data[argumentLocationInBlock] = newInteger(low);
     returnedValue->data[stackInBlock] = rootStack[--rootTop];
     context = rootStack[--rootTop];
     if(CLASS(context) == BlockClass) {
         returnedValue->data[creatingContextInBlock] = context->data[creatingContextInBlock];
     } else {
         returnedValue->data[creatingContextInBlock] = context;
     }
     method = returnedValue->data[methodInBlock] = context->data[methodInBlock];
     arguments = returnedValue->data[argumentsInBlock] = context->data[argumentsInBlock];
     temporaries = returnedValue->data[temporariesInBlock] = context->data[temporariesInBlock];
     stack = context->data[stackInContext];
     bp = bytePtr(method->data[byteCodesInMethod]);
     stack->data[stackTop++] = returnedValue;
     /* zero these out just in case GC occurred */
     literals = instanceVariables = 0;
     bytePointer = high;
     break;

And this is what it became:
 void SmalltalkVM::doPushBlock(TVMExecutionContext& ec)
 {
     hptr<TByteObject> byteCodes = newPointer(ec.currentContext->method->byteCodes);
     hptr<TObjectArray> stack = newPointer(ec.currentContext->stack);

     // Block objects are usually inlined in the wrapping method code
     // pushBlock operation creates a block object initialized
     // with the proper bytecode, stack, arguments and the wrapping context.

     // Blocks are not executed directly. Instead they should be invoked
     // by sending them a 'value' method. Thus, all we need to do here is initialize
     // the block object and then skip the block body by incrementing the bytePointer
     // to the block's bytecode' size. After that bytePointer will point to the place
     // right after the block's body. There we'll probably find the actual invoking code
     // such as sendMessage to a receiver (with our block as a parameter) or something similar.

     // Reading new byte pointer that points to the code right after the inline block
     uint16_t newBytePointer = byteCodes[ec.bytePointer] | (byteCodes[ec.bytePointer+1] << 8);

     // Skipping the newBytePointer's data
     ec.bytePointer += 2;

     // Creating block object
     hptr<TBlock> newBlock = newObject<TBlock>();

     // Allocating block's stack
     uint32_t stackSize = getIntegerValue(ec.currentContext->method->stackSize);
     newBlock->stack = newObject<TObjectArray>(stackSize, false);

     newBlock->argumentLocation = newInteger(ec.instruction.low);
     newBlock->blockBytePointer = newInteger(ec.bytePointer);

     // Assigning creatingContext depending on the hierarchy
     // Nested blocks inherit the outer creating context
     if (ec.currentContext->getClass() == globals.blockClass)
         newBlock->creatingContext = ec.currentContext.cast<TBlock>()->creatingContext;
     else
         newBlock->creatingContext = ec.currentContext;

     // Inheriting the context objects
     newBlock->method = ec.currentContext->method;
     newBlock->arguments = ec.currentContext->arguments;
     newBlock->temporaries = ec.currentContext->temporaries;

     // Setting the execution point to a place right after the inlined block,
     // leaving the block object on top of the stack:
     ec.bytePointer = newBytePointer;
     stack[ec.stackTop++] = newBlock;
 }
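The new listing reads the jump target as two consecutive bytes, low byte first, and then skips over them. As a small illustration of that decoding step in isolation, here is a sketch with a hypothetical helper, `readUint16LE`, which is not part of LLST; the bytecode is modeled as a plain `std::vector` purely for the sake of the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLST function): reads a 16-bit little-endian
// value starting at `offset` and advances `offset` past the two bytes.
inline std::uint16_t readUint16LE(const std::vector<std::uint8_t>& code,
                                  std::size_t& offset) {
    const std::uint16_t low  = code.at(offset);      // least significant byte first
    const std::uint16_t high = code.at(offset + 1);  // then the most significant byte
    offset += 2;
    return static_cast<std::uint16_t>(low | (high << 8));
}
```

For example, for the bytes `0x34 0x12` the helper yields `0x1234` and moves the offset forward by two, mirroring the `ec.bytePointer += 2` step in the listing.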

The situation is the same for almost all of the code. Readability, it seems to me, has improved, although at the cost of some loss in performance. However, no proper profiling has been done yet, so there is room for improvement. In addition, there are LST forks on the net that are said to be faster.

Goal number 2. Integration with LLVM.


Some developers believe that JIT is ineffective for Smalltalk because its methods are so fine-grained. However, this usually refers to a “literal” translation of virtual machine instructions into JIT code.

LLVM, by contrast, offers extensive code optimization facilities in addition to the JIT itself. The main task, then, is to “explain” to LLVM what can be optimized and how best to do it.

I am curious how successfully LLVM can be applied in such a “hostile” environment (a large number of small methods, very late binding, and so on). This is the next major task, to be tackled in the near future. This is where humbug's experience with LLVM will come in handy.

Goal number 3. Use as a control system in embedded devices.


This development is not purely a research effort. One real application of our VM could be the control module of a smart home system, which I am developing together with another Habr user (droot).

Using Smalltalk in embedded systems is nothing out of the ordinary. On the contrary, history knows examples of its quite successful application. For example, the oscilloscopes of the Tektronix TDS 500 Osprey series have a graphical interface implemented in Smalltalk.

This device has an MC68020 processor plus a DSP on board. The control code is written in Smalltalk, with critical sections in assembly language. The image consists of approximately 250 classes and resides entirely in ROM. It requires less than 64 KB of DRAM.

As for possible applications in general, there is a presentation that covers many of them. Caution: eye-searing design and Comic Sans MS.

Goal number 4. Try to imagine what Smalltalk “with a human face” could look like.


Alan Kay, who worked at the Xerox PARC lab in the 1970s, developed the Smalltalk language. He also laid the foundations of what we now call the graphical user interface. The first use of this interface was precisely in the Smalltalk IDE; in fact, it was created for it. These developments were later used in the Lisa and Macintosh projects by another clever fellow, whom many now call the “father of the GUI”, and of the PC to boot.

Screenshot: the severe VisualAge is severe

Classic Smalltalk has always been notable for its austere appearance and rigid grid layout of elements. The austerity of the interface, rivaling that of the Motif library, never added to its appeal.

Nowadays, customers are used to “wet floor” reflections and gradients, so only nerds in “professorial” tortoiseshell glasses can comfortably use Smalltalk to solve problems. As a tool for developing modern applications it is not well suited, unless the customer happens to be a fan of such systems, which is unlikely.

Dolphin


Of Squeak, Pharo, the various VisualAges, and the rest, Dolphin Smalltalk is the only one that was designed from the start for tight integration with the OS.

Unfortunately, it is commercial, Windows-only, and the community version has been cut down, for the most part, with rusty garden shears. Once you have worked through the exercises in the documentation (which is good, by the way), there is absolutely nothing left to do: you can write your own classes, and that is all. The community version provides no proper facilities for building a user interface. As a result, we have fast native widgets, transparent WinAPI calls, and zero portability. An excellent design that nobody wants to release from commercial captivity.

As part of the LLST project, I want to integrate the Qt library and to experiment with the user interface. Later, the library could be ported to industrial Smalltalk implementations.

Where to get the source and what to do with it?


Since you have read this far (which is amazing in itself!), you probably want to get the source code. I have it! The main working repository is currently on GitHub at github.com/0x7CFE/llst (llst.org is also hosted there).

Note 1: Due to its specifics, the code is built in 32-bit mode. Therefore, to build and run it on x64, you need the 32-bit libraries (ia32-libs in the case of Ubuntu) as well as g++-multilib.
 sudo apt-get install ia32-libs g++-multilib 

Note 2: Those who do not want to struggle with compilation can download a ready-made statically linked package from the releases page.

UPD: It is better to follow the new build instructions in the Usage section on the main page of the repository (do not forget to read the LLVM section).

Build as follows:
 ~ $ git clone https://github.com/0x7CFE/llst.git
 ~ $ cd llst
 ~/llst $ mkdir build && cd build
 ~/llst/build $ cmake ..
 ~/llst/build $ make llst

Given the correct phase of the moon and some personal luck, the build directory will contain the llst executable, which can then be put to good use.

For example:
 build$ ./llst 

If all is well, the output should be something like this:
  Image read complete.  Loaded 4678 objects

 Running CompareTest
 equal (1) OK
 equal (2) OK
 greater (int int) OK
 greater (int symbol) ERROR

 true (class True): does not understand asSmallInt
 VM: error trap on context 0xf728d8a4

 Backtrace:
 error: (True, String)
 doesNotUnderstand: (True, Symbol)
 = (SmallInt, True)
 assertEq: withComment: (Block, True, String)
 assertWithComment: (Block, String)
 greater (CompareTest)

 less (int int) OK
 less (symbol int) OK
 nilEqNil OK
 nilIsNil OK

 Running SmallIntTest
 add OK
 div OK
 mul OK
 negated (1) OK
 negated (2) OK
 negative (1) OK
 negative (2) OK
 quo (1) OK
 quo (2) OK
 sub OK

 Running LoopTest
 loopCount OK
 sum ok
 symbolStressTest OK

 Running ClassTest
 className (1) OK
 className (2) OK
 sendSuper OK

 Running MethodLookupTest
 newline (Char) OK
 newline (string) OK
 parentMethods (1) OK
 parentMethods (2) OK

 Running StringTest
 asNumber OK
 asSymbol OK
 at (f) OK
 at (o) OK
 at (x) ok
 at (b) OK
 at (A) OK
 at (r) OK
 copy OK
 indexOf OK
 lowerCase OK
 plus (operator +. 1) OK
 plus (2) OK
 plus (3) OK
 plus (4) OK
 plus (5) OK
 plus (6) OK
 plus (7) OK
 plus (8) OK
 plus (9) OK
 reverse OK
 size (1) OK
 size (2) OK
 size (3) OK
 size (4) OK

 Running arraytest
 at (int) OK
 at (char) OK
 atPut OK

 Running gctest
 copy OK

 Running ContextTest
 backtrace (1) OK
 backtrace (2) OK
 instanceClass OK

 Running PrimitiveTest
 SmallIntAdd OK
 SmallIntDiv OK
 SmallIntEqual OK
 SmallIntLess OK
 SmallIntMod OK
 SmallIntMul OK
 SmallIntSub OK
 bulkReplace OK
 objectClass (SmallInt) OK
 objectClass (Object) OK
 objectSize (SmallInt) OK
 objectSize (Char) OK
 objectSize (Object) OK
 objectsAreEqual (1) OK
 objectsAreEqual (2) OK
 smallIntBitAnd OK
 smallIntBitOr OK
 smallIntShiftLeft OK
 smallIntShiftRight OK
 -> 

The error shown above comes from the image code and is not a problem in the VM. The same behavior is observed when running the test image on the original lst5.

Then you can play around with the image and talk to it:
 -> 2 + 3
 5
 -> (2+3) class
 SmallInt
 -> (2+3) class parent
 Number
 -> Object class
 MetaObject
 -> Object class class
 Class
 -> 1 to: 10 do: [ :x | (x * 2) print. $ print ]
 2 4 6 8 10 12 14 16 18 20 1


…and so on. The listMethods, viewMethod, and allMethods methods are also available:
 -> Collection viewMethod: #collect:
 collect: transformBlock | newList |
     newList <- List new.
     self do: [:element | newList addLast: (transformBlock value: element)].
     ^ newList


Any class can be asked about its parent (via parent) and its descendants:
 -> Collection subclasses
 Array
 ByteArray
 MyArray
 OrderedArray
 String
 Dictionary
 MyDict
 Interval
 List
 Set
 IdentitySet
 Tree
 Collection
 ->

You can end the session by pressing Ctrl+D:
 ->
 Exited normally

 GC count: 717, average allocations per gc: 25963, microseconds spent in GC: 375509
 9047029 messages sent, cache hits: 4553006, misses: 53201, hit ratio 98.85 %
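The hit ratio in these statistics is consistent with the usual formula, hits divided by the total number of cache lookups (hits plus misses). The sketch below only checks that arithmetic; `hitRatioPercent` is a hypothetical helper, and the assumption that the VM computes the figure this way is mine, not something stated in the output.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: percentage of cache lookups that were hits.
// Returns 0 when there were no lookups at all.
inline double hitRatioPercent(std::uint64_t hits, std::uint64_t misses) {
    const std::uint64_t lookups = hits + misses;
    if (lookups == 0) return 0.0;
    return 100.0 * static_cast<double>(hits) / static_cast<double>(lookups);
}
```

With the numbers from the session above (4553006 hits, 53201 misses) this gives roughly 98.845, which matches the reported 98.85 % after rounding to two decimal places.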

In general, the image can tell you a lot of interesting things about itself. More can be found in its source code, which lives in the file llst/image/imageSource.st.

To make it easier to read, I wrote a syntax highlighting scheme for Katepart, which is in the same repository at llst/misc/smalltalk.xml. To make it work, copy this file to the /usr/share/kde4/apps/katepart/syntax/ directory, or to its equivalent under ~/.kde, and restart the editor. It will work in all editors that use Katepart: Kate, KWrite, Krusader, KDevelop, and so on.

Conclusion


I hope I have not tired you with my lengthy reflections on Smalltalk and its place in the programmer’s arsenal. I would really like to hear feedback on the project in general and on the readability of its source code in particular.

The next article will discuss the Smalltalk language itself and outline the basic concepts needed to read the source code successfully. A series of articles will follow in which I will describe the internal structure of the virtual machine in more detail, focusing on the representation of objects in memory. Finally, the closing articles will most likely be devoted to the results of working with LLVM and Qt. Thank you for your attention! :)

PS: At the moment I am looking for paid work where I can apply my skills. If you have interesting projects (especially along similar lines), please send me a private message. I am based in Novosibirsk Akademgorodok.

Source: https://habr.com/ru/post/164153/

