hptr<>
. Previously, they used external std::list<>
lists, now the list is maintained in a stack space. This alone sped up the software VM by half. loopBenchmark | sum | sum <- 0. 1 to: 100000 do: [ :x | sum <- sum + 1 ]. ^sum
sum
. Of course, at the output we must get the same one hundred thousand (or our VM can be sent to the garbage). $ ./llst Image read complete. Loaded 5442 objects Soft run: 60 Cold jit run: 46 Hot jit run: 28 JIT Runtime stat: Messages dispatched: 200006 Objects allocated: 200004 Blocks invoked: 200002 Block cache hits: 199999 misses 3 ratio 100.00 % Message cache hits: 400004 misses 6 ratio 100.00 % Hot methods: Hit count Method name 200000 Block>>value: (0 sites) 2 Number>>to:do: (1 sites) value: (index 20, offset 109) class hits: (Block 200000) 2 Undefined>>loopBenchmark (1 sites) to:do: (index 15, offset 73) class hits: (SmallInt 2) 2 Block>>value (0 sites) Patching active methods that have hot call sites Recompiling method for patching: Number>>to:do: Patching Number>>to:do: ...done. Verifying ...done. Recompiling method for patching: Undefined>>loopBenchmark Patching Undefined>>loopBenchmark ...done. Verifying ...done. Optimizing Number>>to:do: ...done. Verifying ...done. Optimizing Undefined>>loopBenchmark ...done. Verifying ...done. Compiling machine code for Number>>to:do: ...done. Compiling machine code for Undefined>>loopBenchmark ...done. All is done. Patched cold jit run: 12 Patched hot jit run: 9 JIT Runtime stat: Messages dispatched: 200010 Objects allocated: 400008 Blocks invoked: 400004 Block cache hits: 399998 misses 6 ratio 100.00 % Message cache hits: 400006 misses 10 ratio 100.00 % Hot methods: Hit count Method name 200000 Block>>value: (0 sites) 4 Block>>value (0 sites) 2 Undefined>>loopBenchmark (0 sites) 2 Number>>to:do: (1 sites) value: (index 20, offset 109) class hits: (Block 200000) 2 Undefined>>loopBenchmark (1 sites) to:do: (index 15, offset 73) class hits: (SmallInt 2) ===-------------------------------------------------------------------------=== ... Statistics Collected ... ===-------------------------------------------------------------------------=== 2 adce - Number of instructions removed 2 branchfolding - Number of block tails merged 2 branchfolding - Number of dead blocks removed 1 cgscc-passmgr - Maximum CGSCCPassMgr iterations on one SCC 31 codegen-dce - Number of dead instructions deleted 63 codegenprepare - Number of GEPs converted to casts 9 codegenprepare - Number of blocks eliminated 114 codegenprepare - Number of memory instructions whose address computations were sunk 47 codegenprepare - Number of uses of Cast expressions replaced with uses of sunken Casts 313 dagcombine - Number of dag nodes combined 0 dse - Number of other instrs removed 12 dse - Number of stores deleted 54 gvn - Number of blocks merged 2 gvn - Number of instructions PRE'd 125 gvn - Number of instructions deleted 2 gvn - Number of loads PRE'd 37 gvn - Number of loads deleted 265 inline - Number of functions inlined 271 inline-cost - Number of call sites analyzed 263 instcombine - Number of dead inst eliminated 1 instcombine - Number of dead stores eliminated 67 instcombine - Number of instructions sunk 492 instcombine - Number of insts combined 159 isel - Number of blocks selected using DAG 7667 isel - Number of times dag isel has to try another path 101 jit - Number of bytes of global vars initialized 5310 jit - Number of bytes of machine code compiled 12 jit - Number of global vars initialized 239 jit - Number of relocations applied 3 jit - Number of slabs of memory allocated by the JIT 1 loop-simplify - Number of pre-header or exit blocks inserted 3 machine-licm - Number of hoisted machine instructions CSEed 11 machine-licm - Number of machine instructions hoisted out of loops 73 machine-sink - Number of machine instructions sunk 6 memdep - Number of block queries that were completely cached 11 memdep - Number of fully cached non-local ptr responses 43 memdep - Number of uncached non-local ptr responses 784 pei - Number of bytes used for stack in all functions 1 phi-opt - Number of dead PHI cycles 15 phielim - Number of atomic phis lowered 31 pre-RA-sched - Number of loads clustered together 48 reassociate - Number of insts reassociated 29 regalloc - Number of cross class joins performed 251 regalloc - Number of identity moves eliminated after coalescing 92 regalloc - Number of identity moves eliminated after rewriting 7 regalloc - Number of instructions deleted by DCE 4 regalloc - Number of instructions re-materialized 1 regalloc - Number of interferences evicted 251 regalloc - Number of interval joins performed 11 regalloc - Number of new live ranges queued 683 regalloc - Number of original intervals 369 regalloc - Number of registers assigned 1 regalloc - Number of registers unassigned 3 regalloc - Number of rematerialized defs for spilling 4 regalloc - Number of rematerialized defs for splitting 3 regalloc - Number of spilled live ranges 2 regalloc - Number of splits finished 4 simplifycfg - Number of blocks simplified 2 twoaddrinstr - Number of instructions aggressively commuted 2 twoaddrinstr - Number of instructions commuted to coalesce 4 twoaddrinstr - Number of instructions promoted to 3-address 30 twoaddrinstr - Number of two-address instructions 14 x86-codegen - Number of floating point instructions 1414 x86-emitter - Number of machine instructions emitted ->
Soft run: 60 Cold jit run: 46 Hot jit run: 28 Patched cold jit run: 12 Patched hot jit run: 9
Number>>to:do:
method, three operations are performed, equivalent to the following message parcels:<
to the counter object inside #to:do:
(loop condition);#value:
to a block object (block call);+
sum
object (loop body).whileTrue:
construction whileTrue:
loopBenchmark | sum | sum <- 0. [ sum < 1000000 ] whileTrue: [ sum <- sum + 1 ]. ^sum
Soft run: 197 Cold jit run: 13 Hot jit run: 4
whileTrue:
expands to a linear instruction set with transitions; all operations occur on numbers for which there are separate branches of execution that involve arithmetic operations directly. Soft run: 48 Cold jit run: 140 Hot jit run: 25 Patched cold jit run: 7 Patched hot jit run: 6
Preparing test data ...done Soft run: 48 Cold jit run: 140 Hot jit run: 25 JIT Runtime stat: Messages dispatched: 210613 Objects allocated: 17746 Blocks invoked: 43006 Block cache hits: 43001 misses 5 ratio 99.99 % Message cache hits: 369520 misses 51704 ratio 87.73 % Hot methods: Hit count Method name 44061 Link>>next (0 sites) 35102 MetaObject>>in:at:put: (0 sites) 27775 Link>>value (0 sites) 25778 Block>>value:value: (0 sites) 17746 Class>>new (0 sites) 17356 MetaLink>>value:next: (3 sites) new (index 3, offset 7) class hits: (MetaLink 17356) in:at:put: (index 11, offset 31) class hits: (MetaLink 17356) in:at:put: (index 18, offset 72) class hits: (MetaLink 17356) 17226 Block>>value: (0 sites) 15619 List>>add: (1 sites) value:next: (index 5, offset 13) class hits: (MetaLink 15619) 1999 List>>isEmpty (1 sites) = (index 4, offset 9) class hits: (SmallInt 1999) 1999 SmallInt>>= (0 sites) 1867 List>>insert:onCondition: (10 sites) isEmpty (index 3, offset 7) class hits: (List 1867) add: (index 10, offset 27) class hits: (List 130) value (index 33, offset 166) class hits: (Link 10419) value:value: (index 40, offset 210) class hits: (Block 10419) next (index 48, offset 268) class hits: (Link 1481) value:next: (index 50, offset 286) class hits: (MetaLink 1481) value:next: (index 57, offset 21) class hits: (Link 1481) next (index 68, offset 8) class hits: (Link 8938) value:next: (index 81, offset 24) class hits: (MetaLink 256) next: (index 83, offset 9) class hits: (Link 256) 1481 Link>>value:next: (0 sites) 392 List>>size (0 sites) 390 MetaList>>new (2 sites) new (index 4, offset 9) class hits: (MetaCollection 390) in:at:put: (index 12, offset 34) class hits: (MetaList 390) 384 Link>>next: (0 sites) 262 Collection>>sort: (13 sites) size (index 3, offset 7) class hits: (List 262) insertSort: (index 12, offset 34) class hits: (List 132) popFirst (index 21, offset 88) class hits: (List 130) new (index 26, offset 126) class hits: (MetaList 130) new (index 31, offset 158) class hits: (MetaList 130) value:value: (index 42, offset 219) class hits: (Block 15359) add: (index 49, offset 279) class hits: (List 8207) add: (index 56, offset 12) class hits: (List 7152) do: (index 59, offset 31) class hits: (List 130) sort: (index 64, offset 64) class hits: (List 130) sort: (index 70, offset 19) class hits: (List 130) add: (index 76, offset 4) class hits: (List 130) appendList: (index 81, offset 24) class hits: (List 130) 260 Link>>do: (2 sites) value (index 18, offset 72) class hits: (Link 260) value: (index 20, offset 82) class hits: (Block 260) 260 List>>do: (1 sites) do: (index 9, offset 25) class hits: (Link 260) 132 Collection>>insertSort: (4 sites) isEmpty (index 3, offset 7) class hits: (List 132) new (index 16, offset 55) class hits: (MetaList 130) insert:onCondition: (index 27, offset 130) class hits: (List 1867) do: (index 30, offset 143) class hits: (List 130) 130 List>>popFirst (3 sites) value (index 14, offset 43) class hits: (Link 130) next (index 19, offset 76) class hits: (Link 130) - (index 25, offset 111) class hits: (SmallInt 130) 130 SmallInt>>- (0 sites) 130 List>>appendList: (7 sites) firstLink (index 8, offset 21) class hits: (List 2) size (index 13, offset 40) class hits: (List 2) next (index 36, offset 181) class hits: (Link 8207) next (index 43, offset 234) class hits: (Link 8079) firstLink (index 54, offset 3) class hits: (List 128) next: (index 56, offset 12) class hits: (Link 128) size (index 61, offset 49) class hits: (List 128) 130 List>>firstLink (0 sites) 2 Collection>>sort (1 sites) sort: (index 10, offset 27) class hits: (List 2) 2 Block>>value (0 sites) ===-------------------------------------------------------------------------=== ... Statistics Collected ... ===-------------------------------------------------------------------------=== 2 adce - Number of instructions removed 14 branchfolding - Number of block tails merged 6 branchfolding - Number of branches optimized 5 branchfolding - Number of dead blocks removed 1 cgscc-passmgr - Maximum CGSCCPassMgr iterations on one SCC 38 codegen-dce - Number of dead instructions deleted 220 codegenprepare - Number of GEPs converted to casts 2 codegenprepare - Number of blocks eliminated 151 codegenprepare - Number of memory instructions whose address computations were sunk 123 codegenprepare - Number of uses of Cast expressions replaced with uses of sunken Casts 854 dagcombine - Number of dag nodes combined 250 dce - Number of insts removed 194 dse - Number of other instrs removed 158 dse - Number of stores deleted 51 gvn - Number of blocks merged 353 gvn - Number of instructions deleted 6 gvn - Number of loads PRE'd 277 gvn - Number of loads deleted 862 inline - Number of functions inlined 862 inline-cost - Number of call sites analyzed 1085 instcombine - Number of dead inst eliminated 69 instcombine - Number of instructions sunk 2540 instcombine - Number of insts combined 194 isel - Number of blocks selected using DAG 18193 isel - Number of times dag isel has to try another path 461 jit - Number of bytes of global vars initialized 12042 jit - Number of bytes of machine code compiled 25 jit - Number of global vars initialized 375 jit - Number of relocations applied 2 jit - Number of slabs of memory allocated by the JIT 15 llst - Number of removed loads from gc.root protected pointers <<<<<< 222 llst - Number of removed roots <<<<<< 4 machine-cse - Number of common subexpression eliminated 1 machine-licm - Number of hoisted machine instructions CSEed 14 machine-licm - Number of machine instructions hoisted out of loops 71 machine-sink - Number of machine instructions sunk 10 memdep - Number of block queries that were completely cached 81 memdep - Number of fully cached non-local ptr responses 84 memdep - Number of uncached non-local ptr responses 2792 pei - Number of bytes used for stack in all functions 9 phielim - Number of atomic phis lowered 2 phielim - Number of critical edges split 36 pre-RA-sched - Number of loads clustered together 23 reassociate - Number of insts reassociated 21 regalloc - Number of cross class joins performed 250 regalloc - Number of identity moves eliminated after coalescing 124 regalloc - Number of identity moves eliminated after rewriting 6 regalloc - Number of instructions deleted by DCE 1 regalloc - Number of interferences evicted 248 regalloc - Number of interval joins performed 21 regalloc - Number of new live ranges queued 1240 regalloc - Number of original intervals 891 regalloc - Number of registers assigned 1 regalloc - Number of registers unassigned 6 regalloc - Number of rematerialized defs for spilling 4 regalloc - Number of rematerialized defs for splitting 6 regalloc - Number of spilled live ranges 4 regalloc - Number of splits finished 13 simplifycfg - Number of blocks simplified 3 twoaddrinstr - Number of instructions re-materialized 43 twoaddrinstr - Number of two-address instructions 40 x86-codegen - Number of floating point instructions 2697 x86-emitter - Number of machine instructions emitted Patching active methods that have hot call sites Recompiling method for patching: MetaLink>>value:next: Patching MetaLink>>value:next: ...done. Verifying ...done. Recompiling method for patching: List>>add: Patching List>>add: ...done. Verifying ...done. Recompiling method for patching: List>>isEmpty Patching List>>isEmpty ...done. Verifying ...done. Recompiling method for patching: List>>insert:onCondition: Patching List>>insert:onCondition: ...done. Verifying ...done. Recompiling method for patching: MetaList>>new Patching MetaList>>new ...done. Verifying ...done. Recompiling method for patching: Collection>>sort: Patching Collection>>sort: ...done. Verifying ...done. Recompiling method for patching: Link>>do: Patching Link>>do: ...done. Verifying ...done. Recompiling method for patching: List>>do: Patching List>>do: ...done. Verifying ...done. Recompiling method for patching: Collection>>insertSort: Patching Collection>>insertSort: ...done. Verifying ...done. Recompiling method for patching: List>>popFirst Patching List>>popFirst ...done. Verifying ...done. Recompiling method for patching: List>>appendList: Patching List>>appendList: ...done. Verifying ...done. Recompiling method for patching: Collection>>sort Patching Collection>>sort ...done. Verifying ...done. Optimizing MetaLink>>value:next: ...done. Verifying ...done. Optimizing List>>add: ...done. Verifying ...done. Optimizing List>>isEmpty ...done. Verifying ...done. Optimizing List>>insert:onCondition: ...done. Verifying ...done. Optimizing MetaList>>new ...done. Verifying ...done. Optimizing Collection>>sort: ...done. Verifying ...done. Optimizing Link>>do: ...done. Verifying ...done. Optimizing List>>do: ...done. Verifying ...done. Optimizing Collection>>insertSort: ...done. Verifying ...done. Optimizing List>>popFirst ...done. Verifying ...done. Optimizing List>>appendList: ...done. Verifying ...done. Optimizing Collection>>sort ...done. Verifying ...done. Compiling machine code for MetaLink>>value:next: ...done. Compiling machine code for List>>add: ...done. Compiling machine code for List>>isEmpty ...done. Compiling machine code for List>>insert:onCondition: ...done. Compiling machine code for MetaList>>new ...done. Compiling machine code for Collection>>sort: ...done. Compiling machine code for Link>>do: ...done. Compiling machine code for List>>do: ...done. Compiling machine code for Collection>>insertSort: ...done. Compiling machine code for List>>popFirst ...done. Compiling machine code for List>>appendList: ...done. Compiling machine code for Collection>>sort ...done. All is done. Patched cold jit run: 7 Patched hot jit run: 6
Array
class is created, in which the message arguments are placed. Time is also being spent on this. Finally, inline methods are almost never used. It concerns only official functions (as we shall see later) and some trivial constructions. All this allows us to conclude that the potential for further optimization is far from exhausted. struct TObject { TSize size; TClass* klass; union { TObject* fields[0]; uint8_t bytes[0]; }; };
String
). Byte codes of methods are stored in instances.ByteArray
which are also binary objects. Binary objects are always padded to multiple lengths. Flag I is used by the garbage collector as it passes through the heap to mark those objects that have already been processed.1 class
and get the expected response.SmallInt
, although this unit in the image will be represented by just such an “object” SmallInt
.Object
. I must say that this is a rather expensive operation, since you have to do a large number of memory accesses. Therefore, search results are cached. Thus, a complete search has to be done only once. The method caches are reset only in two cases - with the next garbage collection (method objects could move) and with the addition / removal of the next method (this could affect the hierarchy). Even with regular cleaning of caches during garbage collection, the percentage of hits remains very high (about 99%), so the time spent on finding a method can be considered on average insignificant.'Hello world'
(class instance String
) #isNil
.#doesNotUnderstand:
, which is guaranteed to be processed (at least in itself Object
). In some cases, classes intentionally intercept this message for specific purposes. For example, proxy objects can do this, which then delivers the message to a valid address.String>>isNil
search string would be:String
→ Array
→ Collection
→ Magnitude
→ Object
. struct TContext : public TObject { TMethod* method; TObjectArray* arguments; TObjectArray* temporaries; TObjectArray* stack; TInteger bytePointer; TInteger stackTop; TContext* previousContext; };
Array
) where the passed arguments are stored.| |
in the method header. struct TMethod : public TObject { TSymbol* name; TByteObject* byteCodes; TSymbolArray* literals; TInteger stackSize; TInteger temporarySize; TClass* klass; TString* text; TObject* package; };
ByteArray
, - .Undefined>>main
there is a code: [ command <- String readline: '->'. command notNil ] whileTrue: [ command isEmpty ifFalse: [ command doIt printNl ] ]
#doIt
, the result of which is displayed on the screen. The method itself is #doIt
as follows: doIt | method | method <- Undefined parseMethod: 'doItCommand ^ ' + self. ^ method notNil ifTrue: [ ^ Context new perform: method withArguments: (Array new: 1) ]
Undefined>>parseMethod:
that, according to the source text, forms the object of the method #doItCommand
using the compiler implemented here in the text of the image. It turns out that Smalltalk compiles its own methods by means of a compiler written in Smalltalk itself and an integral part of the image. I find this moment rather amusing.Method
. The instance itself also contains additional information, which is also used when initializing the context object (see above).SmallInt
, 0-9 , nil , true false , Undefined
, True
False
.Block
, - , . bytePointer .Array
. Further, a pointer to this object returns to the top of the stack, which will be used when initializing the arguments field of the context object being created.Collection>>at:ifAbsent:
at: value ifAbsent: exceptionBlock self do: [ :element | element = value ifTrue: [ ^element ] ]. ^exceptionBlock value
^element
that is in the nested block itself will be compiled using the blockReturn statement . This is necessary because in reality, the block will be executed not in the current method, but much deeper. The method Collection>>at:ifAbsent:
calls a method Collection>>do:
, passing an external block as a parameter. The method Collection>>do:
in turn will call Block>>value:
for each element of the collection, passing it as a parameter to the block. And only inside Block>>value:
is located the primitive number 8, which will already lead to the execution of the block code. Therefore, the code block decides that it is necessary to perform a return value element
, it will do this using the blockReturn instruction , which will transfer control to the very top, beyondCollection>>at:ifAbsent:
, returning the required value as the result of the message.^
standing in the body of the block will be converted to a blockReturn instruction . Whenever possible, the compiler tries to expand the code into simpler instructions: embed the blocks in the body of the executing method and replace the block call with simple transitions according to the instructions. In this case, blockReturn will also be replaced by stackReturn or selfReturn . ->Collection viewMethod: #sort: sort: criteria | left right mediane | (self size < 32) ifTrue: [ ^ self insertSort: criteria ]. mediane <- self popFirst. left <- List new. right <- List new. self do: [ :x | (criteria value: x value: mediane) ifTrue: [ left add: x ] ifFalse: [ right add: x ] ]. left <- left sort: criteria. right <- right sort: criteria. right add: mediane. ^ left appendList: right
->Collection methods at: #sort:; disassemble " " 0000 PushArgument 0 " self" 0001 MarkArguments 1 0002 SendMessage size 0003 PushLiteral 1 " 1 32" 0004 SendBinary < " " 0005 DoSpecial branchIfFalse 16 " " " :" 0008 PushArgument 0 0009 PushArgument 1 0010 MarkArguments 2 0011 SendMessage insertSort: 0012 DoSpecial stackReturn 0013 DoSpecial branch 17 " :" 0016 PushConstant nil " " 0017 DoSpecial popTop "mediane <- self popFirst" 0018 PushArgument 0 0019 MarkArguments 1 0020 SendMessage popFirst 0021 AssignTemporary 2 0022 DoSpecial popTop "left <- List new" 0023 PushLiteral 4 0024 MarkArguments 1 0025 SendMessage new 0026 AssignTemporary 0 0027 DoSpecial popTop "right <- List new" 0028 PushLiteral 6 0029 MarkArguments 1 0030 SendMessage new 0031 AssignTemporary 1 0032 DoSpecial popTop "self do: ..." 0033 PushArgument 0 0034 PushBlock " " 0037 PushArgument 1 "criteria" 0038 PushTemporary 3 "x" 0039 PushTemporary 2 "mediane" 0040 MarkArguments 3 0041 SendMessage value:value: " " 0042 DoSpecial branchIfFalse 52 "left add: x" 0045 PushTemporary 0 0046 PushTemporary 3 0047 MarkArguments 2 0048 SendMessage add: 0049 DoSpecial branch 56 "right add: x" 0052 PushTemporary 1 0053 PushTemporary 3 0054 MarkArguments 2 0055 SendMessage add: 0056 DoSpecial stackReturn 0057 MarkArguments 2 0058 SendMessage do: 0059 DoSpecial popTop " left" 0060 PushTemporary 0 0061 PushArgument 1 0062 MarkArguments 2 0063 SendMessage sort: 0064 AssignTemporary 0 0065 DoSpecial popTop " right" 0066 PushTemporary 1 0067 PushArgument 1 0068 MarkArguments 2 0069 SendMessage sort: 0070 AssignTemporary 1 0071 DoSpecial popTop "right add: mediane" 0072 PushTemporary 1 0073 PushTemporary 2 0074 MarkArguments 2 0075 SendMessage add: 0076 DoSpecial popTop " " 0077 PushTemporary 0 0078 PushTemporary 1 0079 MarkArguments 2 0080 SendMessage appendList: " " 0081 DoSpecial stackReturn "( )" 0082 DoSpecial popTop 0083 DoSpecial selfReturn
Source: https://habr.com/ru/post/191250/
All Articles