
Hello! Happy end of the world and happy upcoming holidays :)
As a gift to the Open Source community, as well as to lovers of antiquities, my friend humbug and I decided to publish our latest research project.
We present a from-scratch C++ rewrite of a virtual machine compatible with Little Smalltalk. At the moment, the virtual machine code is written and the basic primitives are implemented. Humbug wrote a series of simple tests that nevertheless helped to detect problems in the original version of the VM. The implementation is binary compatible with images of the original LST, version 5.
A month of work, 300+ commits. What came out of it in the end, you can find out under the cut.
But why?

I have always liked Smalltalk, with its clinical simplicity (forgive me, Lispers and Forth programmers) and its no less clinically wide capabilities. I believe it has been undeservedly forgotten by the programming community, although one can get a lot of benefit from it in the 21st century. However, the existing industrial implementations are too cumbersome for a first acquaintance and do not exactly shine with beauty. As the saying goes, first impressions are made by appearance. A newcomer who sees such an interface for the first time is unlikely to regard it as something modern and groundbreaking.
Little Smalltalk is compact enough to be figured out in a couple of hours. At the same time, it is a full-featured Smalltalk, although it is not compatible with the Smalltalk-80 or ANSI-92 standards. From my point of view, a competent implementation of such a microsystem could be a good aid in teaching students at technical universities. It is particularly useful for studying OOP, since the concepts of encapsulation, polymorphism and inheritance are expressed here in an absolutely clear and obvious way. Many of my friends confused these concepts or did not understand their original meaning. With such a tool at hand, you can demonstrate the advantages of OOP and the mechanisms behind it in ten minutes, literally on your fingers. Moreover, unlike in other languages, these principles do not look "far-fetched", since they form the actual core of the language.
In the end, it is rather amusing to have something that is essentially written in itself and fits a virtual machine, a compiler and a standard library, with the full source of all its methods, into 100 KB.
However, I digress, and the post is not quite about that. Let's talk about the project and its goals instead. So,
Goal number 1. New VM (completed).
Rewrite the Little Smalltalk code in C++, eliminate the flaws of the original design, comment the code, and make it readable and easy to modify.
Unfortunately, the original code reads as if it were written by hurried students, or by someone else entirely. From my point of view, it is unacceptable for an educational project (and this is how its author positioned Little Smalltalk) to have sources like these. Switch blocks a thousand lines long, sprinkled with goto and macros, the same variable reused in five different places for different purposes... well, you get the idea. On top of that, the whole code base contains one and a half comments, in the style of Landau and Lifshitz: "from this it obviously follows...".
Of course, one could not live like that. So the code was analyzed and, in an attempt to grasp the Great Idea, the current implementation was born. A convenient type system, container templates and template pointers to heap objects were developed so that you do not have to think about the garbage collector every time you create an object. VM objects can now be handled from C++ as easily as ordinary structures. All memory handling, object size calculation and correct initialization now rest on the shoulders of the compiler.
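To illustrate the idea of heap pointers that take care of the collector themselves, here is a minimal sketch. It is not the LLST hptr implementation: ToyObject, RootRegistry and RootedPtr are hypothetical names invented for this illustration, the handle is deliberately not a template, and the sketch assumes a moving collector that has to rewrite every rooted pointer when an object is relocated.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical stand-in for a VM heap object; not the LLST object layout.
struct ToyObject {
    int value;
};

// Registry of every live handle slot. A moving collector would walk this
// list, treat each slot as a root and rewrite it when the object moves.
class RootRegistry {
public:
    using Slot = ToyObject**;
    using Position = std::list<Slot>::iterator;

    Position add(Slot slot) { return slots.insert(slots.end(), slot); }
    void remove(Position position) { slots.erase(position); }

    // Simulates the "move" phase of a copying collector: every rooted
    // pointer that equals `from` is redirected to `to`.
    void relocate(ToyObject* from, ToyObject* to) {
        for (Slot slot : slots)
            if (*slot == from) *slot = to;
    }

    std::size_t size() const { return slots.size(); }

private:
    std::list<Slot> slots;
};

// RAII handle: registers its internal pointer on construction and
// unregisters it on destruction, so the caller never does it by hand.
class RootedPtr {
public:
    RootedPtr(RootRegistry& reg, ToyObject* object)
        : registry(reg), target(object), position(reg.add(&target)) {}
    ~RootedPtr() { registry.remove(position); }

    // Copying is disabled to keep the sketch short; a copy would need
    // its own registry entry.
    RootedPtr(const RootedPtr&) = delete;
    RootedPtr& operator=(const RootedPtr&) = delete;

    ToyObject* get() const { return target; }
    ToyObject* operator->() const { return target; }

private:
    RootRegistry& registry;
    ToyObject* target;                 // the slot the collector may rewrite
    RootRegistry::Position position;   // where the slot sits in the registry
};
```

With a handle of this kind, VM code can keep a pointer alive across an allocation (and therefore across a possible collection) without pushing it onto a root stack by hand, which is what the original C code does with rootStack.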
As an example, here is the implementation of opcode number 12, PushBlock.
This is how it was (the author's formatting and comments are preserved):

    case PushBlock:
        DBG0("PushBlock");
        high = VAL;
        bytePointer += VALSIZE;
        rootStack[rootTop++] = context;
        op = rootStack[rootTop++] =
            gcalloc(x = integerValue(method->data[stackSizeInMethod]));
        op->class = ArrayClass;
        memoryClear(bytePtr(op), x * BytesPerWord);
        returnedValue = gcalloc(blockSize);
        returnedValue->class = BlockClass;
        returnedValue->data[bytePointerInContext] =
            returnedValue->data[stackTopInBlock] =
            returnedValue->data[previousContextInBlock] = NULL;
        returnedValue->data[bytePointerInBlock] = newInteger(bytePointer);
        returnedValue->data[argumentLocationInBlock] = newInteger(low);
        returnedValue->data[stackInBlock] = rootStack[--rootTop];
        context = rootStack[--rootTop];
        if(CLASS(context) == BlockClass) {
            returnedValue->data[creatingContextInBlock] =
                context->data[creatingContextInBlock];
        } else {
            returnedValue->data[creatingContextInBlock] = context;
        }
        method = returnedValue->data[methodInBlock] =
            context->data[methodInBlock];
        arguments = returnedValue->data[argumentsInBlock] =
            context->data[argumentsInBlock];
        temporaries = returnedValue->data[temporariesInBlock] =
            context->data[temporariesInBlock];
        stack = context->data[stackInContext];
        bp = bytePtr(method->data[byteCodesInMethod]);
        stack->data[stackTop++] = returnedValue;
        literals = instanceVariables = 0;
        bytePointer = high;
        break;
And this is what it became:

    void SmalltalkVM::doPushBlock(TVMExecutionContext& ec)
    {
        hptr<TByteObject> byteCodes = newPointer(ec.currentContext->method->byteCodes);
        hptr<TObjectArray> stack = newPointer(ec.currentContext->stack);
The same goes for almost all of the code. Readability, it seems to me, has improved, albeit at the cost of some loss of performance. However, no proper profiling has been done yet, so there is room for creativity. Besides, there are LST forks on the net that are said to be faster.
Goal number 2. Integration with LLVM.
Some developers believe that JIT compilation is unproductive for Smalltalk because its methods are so fine-grained. However, this usually refers to a “literal” translation of virtual machine instructions into JIT code.
LLVM, by contrast, provides ample opportunities for code optimization in addition to the JIT itself. Thus, the main task is to “explain” to LLVM what can be optimized and how best to do it.
I am curious how successfully LLVM can be applied in such a “hostile” environment (a large number of small methods, extremely late binding, and so on). This is the next major task, to be tackled in the near future. This is where humbug's experience with LLVM will come in handy.
Goal number 3. Use as a control system in embedded devices.
As I wrote above, this project is not purely research. One real application of our VM may be the control module of a smart home system that I am developing together with another Habr user (droot).
Using Smalltalk in embedded systems is nothing out of the ordinary. On the contrary, history knows examples of its quite successful use. For example, the oscilloscopes of the Tektronix TDS 500 Osprey series have a graphical interface implemented in Smalltalk.

This device carries an MC68020 processor plus a DSP on board. The control code is written in Smalltalk, with the critical sections in assembly language. The image consists of approximately 250 classes and is stored entirely in ROM. It requires less than 64 KB of DRAM.
As for the possible uses in general, there is a presentation that covers many of them. Caution: eye-searing design and Comic Sans MS.
Goal number 4. Try to imagine what Smalltalk “with a human face” could look like.
Alan Kay, who worked at the Xerox PARC lab in the 1970s, developed the Smalltalk language. He also laid the foundations of what we now call the graphical user interface. The first use of this interface was precisely in the Smalltalk IDE; in fact, it was created for it. Later, these developments were used in the Lisa and Macintosh projects by another clever fellow, whom many now call the “father of the GUI”, and of the PC to boot.
Stern VisualAge is stern
Classic Smalltalk has always been distinguished by its austere look and boxy, grid-like arrangement of elements. The austerity of the interface, rivaling that of the Motif library, never added to its appeal.
Nowadays customers are used to “wet floor” reflections and gradients, so only nerds in “professorial” tortoiseshell glasses can comfortably use Smalltalk to solve their problems. As a tool for developing modern applications, it is not a great fit. Unless, of course, the customer happens to be a fan of such systems, which is unlikely.
Dolphin
Dolphin Smalltalk is the only one among Squeak, Pharo, the various VisualAges and the rest that was designed from the start for tight integration with the OS.
Unfortunately, it is commercial and Windows-only, and the community version has been mostly crippled, as if with rusty garden shears. After working through a few exercises from the documentation (which is good, by the way), there is absolutely nothing left to do. You can write your own classes, and that is all. The community version does not provide proper tools for building user interfaces. As a result, we have fast native widgets, transparent WinAPI calls and zero portability. It is an excellent piece of engineering that nobody wants to release from its commercial captivity.
As part of the LLST project, I want to integrate the Qt library and to experiment with the user interface. Later, the library could be ported to industrial Smalltalk implementations.
Where to get the source and what to do with it?
Since you have read this far (which is amazing in itself!), you probably want to get the source code. I have it! The main working repository is currently hosted on GitHub at:
github.com/0x7CFE/llst (llst.org is also hosted there)
Note 1: Due to its specifics, the code is built in 32-bit mode. Therefore, to build and run it on x64, you need the 32-bit libraries (ia32-libs in the case of Ubuntu), as well as g++-multilib:

    sudo apt-get install ia32-libs g++-multilib
Note 2: Those who do not want to struggle with compilation can download a ready-made, statically linked package from the releases page.
UPD: It is better to read the new build instructions in the Usage section on the main page of the repository (do not forget to read the LLVM section).
Build as follows:
    ~ $ git clone https://github.com/0x7CFE/llst.git
    ~ $ cd llst
    ~/llst $ mkdir build && cd build
    ~/llst/build $ cmake ..
    ~/llst/build $ make llst
With the right phase of the moon and some personal luck, you will find the llst executable in the build directory, ready to be put to good use.
For example:
build$ ./llst
If all is well, the output should be something like this:
(lots of output)
Image read complete. Loaded 4678 objects
Running CompareTest
equal (1) OK
equal (2) OK
greater (int int) OK
greater (int symbol) ERROR
true (class True): does not understand asSmallInt
VM: error trap on context 0xf728d8a4
Backtrace:
error: (True, String)
doesNotUnderstand: (True, Symbol)
= (SmallInt, True)
assertEq: withComment: (Block, True, String)
assertWithComment: (Block, String)
greater (CompareTest)
less (int int) OK
less (symbol int) OK
nilEqNil OK
nilIsNil OK
Running SmallIntTest
add OK
div OK
mul OK
negated (1) OK
negated (2) OK
negative (1) OK
negative (2) OK
quo (1) OK
quo (2) OK
sub OK
Running LoopTest
loopCount OK
sum ok
symbolStressTest OK
Running ClassTest
className (1) OK
className (2) OK
sendSuper OK
Running MethodLookupTest
newline (Char) OK
newline (string) OK
parentMethods (1) OK
parentMethods (2) OK
Running StringTest
asNumber OK
asSymbol OK
at (f) OK
at (o) OK
at (x) ok
at (b) OK
at (A) OK
at (r) OK
copy OK
indexOf OK
lowerCase OK
plus (operator +. 1) OK
plus (2) OK
plus (3) OK
plus (4) OK
plus (5) OK
plus (6) OK
plus (7) OK
plus (8) OK
plus (9) OK
reverse OK
size (1) OK
size (2) OK
size (3) OK
size (4) OK
Running arraytest
at (int) OK
at (char) OK
atPut OK
Running gctest
copy OK
Running ContextTest
backtrace (1) OK
backtrace (2) OK
instanceClass OK
Running PrimitiveTest
SmallIntAdd OK
SmallIntDiv OK
SmallIntEqual OK
SmallIntLess OK
SmallIntMod OK
SmallIntMul OK
SmallIntSub OK
bulkReplace OK
objectClass (SmallInt) OK
objectClass (Object) OK
objectSize (SmallInt) OK
objectSize (Char) OK
objectSize (Object) OK
objectsAreEqual (1) OK
objectsAreEqual (2) OK
smallIntBitAnd OK
smallIntBitOr OK
smallIntShiftLeft OK
smallIntShiftRight OK
->
The observed error relates to the image code, and is not a problem in the VM. The same behavior is observed when running the test image on the original lst5.
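The backtrace above shows what happens when an object receives a message it has no method for: the send is redirected to doesNotUnderstand:, which in turn calls error:. As a rough sketch of that fallback (not the LLST lookup code; ToyClass, lookup and resolveSend are hypothetical names, and the method dictionary is simplified to a map of strings), it might look like this:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified class descriptor; the real image format differs.
struct ToyClass {
    std::string name;
    const ToyClass* parent;                      // nullptr at the root of the hierarchy
    std::map<std::string, std::string> methods;  // selector -> placeholder for a method
};

// Walks the inheritance chain and returns the class that implements
// `selector`, or nullptr when no class in the chain does.
const ToyClass* lookup(const ToyClass* cls, const std::string& selector) {
    for (; cls != nullptr; cls = cls->parent)
        if (cls->methods.count(selector) != 0) return cls;
    return nullptr;
}

// Decides which selector is actually executed for a send. If the requested
// selector is not found, the send is retried as doesNotUnderstand:.
// An empty string means nothing can handle the message at all.
std::string resolveSend(const ToyClass* receiverClass, const std::string& selector) {
    if (lookup(receiverClass, selector) != nullptr) return selector;
    if (lookup(receiverClass, "doesNotUnderstand:") != nullptr) return "doesNotUnderstand:";
    return "";
}
```

In terms of the failing test, sending asSmallInt to true finds no method in True or its ancestors, so the send resolves to doesNotUnderstand: inherited from a parent class, which is exactly the frame visible in the backtrace.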
Then you can play around with the image and talk to it:
    -> 2 + 3
    5
    -> (2+3) class
    SmallInt
    -> (2+3) class parent
    Number
    -> Object class
    MetaObject
    -> Object class class
    Class
    -> 1 to: 10 do: [ :x | (x * 2) print. $ print ]
    2 4 6 8 10 12 14 16 18 20 1
…and so on. The listMethods, viewMethod and allMethods methods are also available:
    -> Collection viewMethod: #collect:
    collect: transformBlock | newList |
        newList <- List new.
        self do: [:element | newList addLast: (transformBlock value: element)].
        ^ newList
Any class can be asked about its parent (via parent) and its descendants:
    -> Collection subclasses
    Array
    ByteArray
    MyArray
    OrderedArray
    String
    Dictionary
    MyDict
    Interval
    List
    Set
    IdentitySet
    Tree
    Collection
    ->
You can end the session by pressing Ctrl+D:
    ->
    Exited normally
    GC count: 717, average allocations per gc: 25963, microseconds spent in GC: 375509
    9047029 messages sent, cache hits: 4553006, misses: 53201, hit ratio 98.85 %
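The statistics printed on exit can be checked by hand. The small sketch below assumes that the hit ratio is computed over cache lookups (hits plus misses) rather than over all messages sent, which is what the printed figures suggest; the function names are hypothetical and are not taken from the LLST sources.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Percentage of cache lookups that were served from the cache.
// Returns 0 when no lookups were made, to avoid dividing by zero.
double hitRatioPercent(std::uint64_t hits, std::uint64_t misses) {
    const std::uint64_t lookups = hits + misses;
    if (lookups == 0) return 0.0;
    return 100.0 * static_cast<double>(hits) / static_cast<double>(lookups);
}

// Average time of a single collection in microseconds.
// Returns 0 when the collector never ran.
double averageGcPauseMicros(std::uint64_t totalMicros, std::uint64_t gcCount) {
    if (gcCount == 0) return 0.0;
    return static_cast<double>(totalMicros) / static_cast<double>(gcCount);
}
```

For the run above, hitRatioPercent(4553006, 53201) gives about 98.85, matching the printed value, and averageGcPauseMicros(375509, 717) gives roughly 524 microseconds per collection.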
In general, the image can tell a lot of interesting things about itself. More can be found in its source code, which lives in the file llst/image/imageSource.st.
To make it easier to read, I wrote a syntax highlighting scheme for Katepart, which lives in the same repository at llst/misc/smalltalk.xml. To make it work, copy this file to the /usr/share/kde4/apps/katepart/syntax/ directory, or to its counterpart under ~/.kde, and restart the editor. It will work in all editors that use Katepart: Kate, KWrite, Krusader, KDevelop, etc.
Conclusion
I hope I have not tired you with lengthy reflections on Smalltalk and its place in the programmer's arsenal. I would really like to hear feedback on the project in general and on the readability of its source code in particular.
The following article discusses the Smalltalk language itself and outlines the basic concepts needed to read the source code successfully. A series of articles will then follow in which I will describe the internal structure of the virtual machine in more detail and focus on how objects are represented in memory. Finally, the closing articles will most likely be devoted to the results of working with LLVM and Qt. Thanks for your attention! :)
PS: At the moment I am looking for paid work. If you have interesting projects (especially of a similar kind), please send me a private message. I am based in Akademgorodok, Novosibirsk.