📜 ⬆️ ⬇️

Technoporn with WebAssembly

At the request of workers, I am writing about the internal structure of WebAssembly.


WebAssembly - bytecode for the stack virtual machine. So, to run such a code, an interpreter, a stack, and a code storage are needed. If we want to interact with the outside world, we need an interface to the external machine, the host. Additionally, the standard defines two structures: continuous memory and tables. In the MVP version of the standard, there may be one each, or not at all.


As a result, our technoborel looks like this:




Let's do business!


Stack virtual machine.


There are two basic types of machine architectures: stack and register. We are interested in stack.


image


All stack machine operations obviously work with a stack of variables. The WebAssembly virtual machine has two features:



The remaining principles of the machine are usual: before performing the operation, the operands must be pushed onto the stack. The operation takes the arguments from the stack, runs and puts the result on the stack.


When you call a function from WebAssembly, the interpreter:


  1. Puts function arguments on stack
  2. Supplements them with local variables.
  3. Performs work
  4. Removes local variables and parameters from the stack, leaving only the result

There are only 4 types of variables in the standard:



Neither the character nor the types of 16-bit and 8-bit values ​​are defined. All this is at the mercy of the source language compiler. In the case of C / C ++ / Rust, local variables will be extended to 32-bit values. Pointers are represented as i32, which allows only 4GB of memory to be addressed. A version of wasm64 with 64-bit pointers in development.


Let's look at an example:


int increment(int value) { return value + 1; } 

 (module ;; C :   i32,   i32 (type (;0;) (func (param i32) (result i32))) ;;   -   (func (;0;) (type 0) (param i32) (result i32) (local i32 i32) i32.const 1 ;;  1   set_local 1 ;;    1,     get_local 0 ;;     0 (  ) get_local 1 ;;     1 i32.add ;;       set_local 2 ;;    2    get_local 2 ;;     2 return) ;;   ,   (export "incr_decr" (func 0))) 

Trivia

The syntax is highlighted for ocaml, which is built on the same S-expressions as the WebAssembly text format. Suddenly such a dirty hack someone come in handy.


As you can see, the compiler creates quite a lot of unnecessary code for managing local variables. To eliminate this, an optimizer pass has recently been added to binaryen.


I cite examples not in the handwritten form of S-expressions, but in the form of decompiling a binary format using WABT. So it is easier to understand from the habit, IMHO.


Device code.


The code in WebAssembly is the commands for the virtual machine that control the state of affairs on the stack. Traditionally they are called opcodes (from the operation code). The standard set of codes is less than 255, so when binary encoding such codes occupy one byte. Extension sets (SIMD, streams, exceptions) define their sets of opcodes for which single-byte prefixes are used. That is, the opcode extension takes two bytes.


All basic operations are represented by opcodes: arithmetic, type conversions, work with local variables and memory, function calls, execution flow control.


Opcodes are grouped into functions. Each function has a sequence number and signature: a list of parameters and return values ​​with an indication of the type. Based on the signature, it is determined how many values ​​to take from the stack, and how many to return.


Functions are called with the call and call_indirect . First put the arguments on the stack, and as a result, the arguments will be dropped, and the result laid added to stack


  (func (;0;) (type 0) (param f64 f32) (result f32) ;;       get_local 1 ) (func (;1;) (type 1) (result f32) ;;      f64.const 0x1p+6 (;=64;) f32.const 0x1p+5 (;=32;) ;;   0 call 0 ;;     , ;;       ) 

In the traditional assembler, cycles and conditional transitions are organized using goto analogs. But in WebAssembly out of hate for goto went the other way.


Inside the function, the code is organized into blocks. There are three types of blocks:



There are transition commands for blocks:



A jump command parameter is a block that it interrupts. The block is indicated by the depth of nesting: 0 - the last open block, 1 - immediately before the last open one.


Blocks serve as stack limiters and can return values. Inside the block it is impossible to remove from the stack the arguments that were on the stack at the time of entering the block. If the return value is not specified by the block, the stack size at the input to the block will correspond to the size at the exit of the block. Return values ​​will be added to the stack above.


The system came out quite unusual for those who are used to the traditional if-else-end and while , and for those who are used to working with goto .


It looks like this:


 (func (type 1) (param i32) (result i32) i32.const 1 block (result i32) ;;    i32.const 2 drop ;;      i32.const 4 block (result i32) ;;    i32.const 8 get_local 0 ;;       br_if 1 ;;         ;;    ,     8   drop ;;   8   i32.const 1 ;;    end br_if 0 ;;      0,   1 drop i32.const 16 end i32.add ;;    ) 

Moving to the table is a bit more complicated:


 (func (type 5) (param i32) (result i32) block ;;   4 block ;;   3 block ;;   2 block ;;   1 block ;;   0 get_local 0 ;;   ;;         ;;    ,   ;;  (4   ) br_table 3 2 1 0 4 i32.const 99 return end i32.const 100 return end i32.const 101 return end i32.const 102 return end i32.const 103 return end i32.const 104 ) 

To understand, in C it looks like this:


 int switch_test(int l0) { switch(l0) { case 0: return 103; break; case 1: return 102; break; case 2: return 101; break; case 3: return 100; break; default: return 104; break; } } 

Further, the cycles look like this:


 (func (;16;) (type 2) (param i64) (result i64) (local i64) i64.const 1 set_local 1 block ;;     (1)    loop ;;    (0):    get_local 0 i64.eqz br_if 1 ;;     get_local 0 get_local 1 i64.mul set_local 1 get_local 0 i64.const 1 i64.sub set_local 0 br 0 ;;     end unreachable end get_local 1 ) 

Details from the kitchen of writing the interpreter

Although you are bursting, but in the code for execution, all the interpreters I have seen convert all the control instructions into goto analogues even at the reading stage. So faster and more familiar to machine code. What are the reasons for choosing such a format of control structures for bytecode? Is it only for beautiful handwriting ?


Memory


Complex operations in WebAssembly are performed through continuous memory. For WebAssembly, continuous blocks of memory of 64 kilobytes are allocated from host memory. According to the requirements of the standard, the blocks should be allocated only for special needs; neither the stack, nor the tables, nor external resources should be stored in them. Addressing inside the memory is strictly controlled, receiving or recording at an address outside the block is considered an error. Thus, a sandbox is created for safe work with memory. So that the malware on wasm does not have access to the host memory.


Since the stack is limited to basic data types, all operations on complex and composite types are performed through memory. In the same memory are stored static and global variables, string literals.


From the inside, opcodes of the type TYPE.storeN and TYPE.loadN work with memory. The first one writes a variable or its part into memory at a given offset, the second one loads at a given offset.


Memory is the only way to transfer large enough objects of complex types between WebAssembly and a host. On a host, memory is usually represented by an array of bytes. Pointers to memory are i32 type i32 that store the offset within this array. WebAssembly can request more memory with the grow_memory operation. In this case, usually, a new array is allocated with a new size; old data is copied to a new array.


Having knowledge of the memory device, you can finally write the most famous program:


 const char *hello_world() { return "Hello World!"; } 

 (module (type (;0;) (func (result i32))) (func (;0;) (type 0) (result i32) ;;         ;;          i32.const 16 ;;     16 -      return) (memory (;0;) 1) (export "hello_world" (func 0)) ;;         , ;;       . ;;      16 (data (i32.const 16) "Hello World!\00")) 

Tables


A table is a continuous container for items of a specific type (one of the basic types of WebAssembly). Tables, like memory, can be used to communicate with a host. Unlike memory, tables are strongly typed. From a design point of view, lists of links for garbage collection, lists of external resources like file descriptors and sockets can be made based on tables. Now the table is allowed only one and it is used with a strictly defined goal: for the work call_indirect .


call_indirect , which is indirect or virtual call, is a tool for calling code that cannot be given a name when compiling. This can be shown on the example of virtual functions in C ++. At compilation, we determine the offset of the function in the virtual functions table, and the table itself can be determined while the program is running.


call_indirect is also used to call function pointers. In this way, you can organize a call to a pointer to a host function that was not imported into the module directly: put this function in a table during code execution and use call_indirect .


Understand easier by example:


 struct Vec2 { float x; float y; Vec2(float x, float y) : x(x), y(y) { } virtual ~Vec2() { } virtual float sq_dist() const{ return x * x + y * y; } virtual bool is_3d() const { return false; } }; struct Vec3 : Vec2 { float z; Vec3(float x, float y, float z) : Vec2(x, y), z(z) { } virtual ~Vec3() { } virtual float sq_dist() const{ return Vec3::sq_dist() + z * z; } virtual bool is_3d() const { return true; } }; Vec2 makeVec2(float x, float y) { return Vec2 ( x, y ); } Vec3 makeVec3(float x, float y, float z) { return Vec3 ( x, y, z ); } float sq_dist(const Vec2 &vec) { return vec.sq_dist(); } 

 (module (type (;0;) (func (param i32) (result f32))) (type (;1;) (func (param i32))) (type (;2;) (func)) (type (;3;) (func (param i32) (result i32))) (type (;4;) (func (param i32 f32 f32))) (type (;5;) (func (param i32 f32 f32 f32))) (import "env" "_ZdlPv" (func (;0;) (type 1))) (func (;1;) (type 4) (param i32 f32 f32) ... return) (func (;2;) (type 3) (param i32) (result i32) ... return) (func (;3;) (type 1) (param i32) ... return) (func (;4;) (type 0) (param i32) (result f32) ... return) (func (;5;) (type 3) (param i32) (result i32) ... return) (func (;6;) (type 5) (param i32 f32 f32 f32) ;;   Vec3(float x, float y, float z) ;;      ;;   -   ,      (local i32 i32 i32 i32) i32.const 36 ;;       Vec3 i32.const 8 ;;  8       ;;    i32.add ;;       set_local 7 ;;     ;;     Vec3     ;;  , x     4  get_local 0 get_local 1 f32.store offset=4 ;;   x get_local 0 get_local 2 f32.store offset=8 ;;   y get_local 0 get_local 7 i32.store ;;     get_local 0 get_local 3 f32.store offset=12 ;;   z return) (func (;7;) (type 1) (param i32) ... return) (func (;8;) (type 0) (param i32) (result f32) ... return) (func (;9;) (type 3) (param i32) (result i32) ... return) (func (;10;) (type 0) (param i32) (result f32) ;; float sq_dist(const Vec2 &vec) ;;  -     (local i32 i32 f32) get_local 0 i32.load ;;     set_local 2 get_local 2 i32.load offset=8 ;;         8 ;;  ,        set_local 1 get_local 0 get_local 1 call_indirect (type 0) ;;      return) (func (;11;) (type 2) ;;          ;; ,     unreachable) (table (;0;) 8 8 anyfunc) (memory (;0;) 1) (export "memory" (memory 0)) (export "_Z7sq_distRK4Vec2" (func 10)) ;;   .   0     ;;   3  6 -  sq_dist    ;; (         8, ;;      8,    16     ;;  3,    - 6) (elem (i32.const 0) 11 2 3 4 5 7 8 9) ;;     Vec2 (data (i32.const 12) "\00\00\00\00\00\00\00\00\01\00\00\00\02\00\00\00\03\00\00\00\04\00\00\00") ;;     Vec3 (data (i32.const 36) "\00\00\00\00\00\00\00\00\01\00\00\00\05\00\00\00\06\00\00\00\07\00\00\00")) 

Global variables


Global variables are local, only for all. The module has a list of global variables, which can be accessed by any function in the module using the get_global and set_global . Global variables are changeable and constant. The type of global variables, as well as local ones, is limited to base ones. If you need to store something more complicated, memory is used, and the variable will be a pointer to it.


The main purpose of global variables is to transfer the global state to the host and from the host. If you use a global variable when compiling your module, the compiler will most likely place it in linear memory and will not create a global object.


Post Comment

I did not manage to force LLVM to create a global variable inside the module, only imported. On the one hand, this is understandable, and global variables in a general sense are global evil. But if there is a mechanism, there must be a way to use it. And then there is a word, but there is no word.


Let's go...


 typedef struct { char valueChar; short valueShort; int valueInt; long valueLong; long long valueLongLong; unsigned char valueUnsignedChar; unsigned short valueUnsignedShort; unsigned int valueUnsignedInt; unsigned long valueUnsignedLong; unsigned long long valueUnsignedLongLong; float valueFloat; double valueDouble; char *valueCharPointer; char valueCharArray[25]; int valueIntArray[3]; } test_struct; extern test_struct g_importedStruct; static test_struct g_internalStruct = { .valueChar = -1, .valueShort = -2, .valueInt = -3, .valueLong = -4, .valueLongLong = -5, .valueUnsignedChar = 1, .valueUnsignedShort = 2, .valueUnsignedInt = 3, .valueUnsignedLong = 4, .valueUnsignedLongLong = 5, .valueFloat = 123.0f, .valueDouble = 5.0, .valueCharPointer = "StringValue" }; test_struct *get_imported_by_ptr() { return &g_importedStruct; } test_struct get_imported_by_value() { return g_importedStruct; } test_struct *get_internal_by_ptr() { return &g_internalStruct; } test_struct get_internal_by_value() { return g_internalStruct; } extern float g_importedFloat; float get_imported_float() { return g_importedFloat; } 

 (module (type (;0;) (func (param i32 i32 i32) (result i32))) (type (;1;) (func (result i32))) (type (;2;) (func (param i32))) (type (;3;) (func (result f32))) ;;       , ;;   memcpy -   ;;  webassembly       ;;        (import "env" "memcpy" (func (;0;) (type 0))) (import "env" "g_importedStruct" (global (;0;) i32)) ;;  g_importedStruct (import "env" "g_importedFloat" (global (;1;) i32)) ;; - i32... (func (;1;) (type 1) (result i32) ;; get_imported_by_ptr get_global 0 ;;       return) (func (;2;) (type 2) (param i32) ;; get_imported_by_value ;;  ,    ,      ;;   . ,   , RVO, ,  , ;;        ;;   get_local 0 ;;  -    ,     get_global 0 ;;   -      i32.const 112 ;;   call 0 ;;  memcpy(dest, source, size) drop ;;    return) (func (;3;) (type 1) (result i32) ;; get_internal_by_ptr ;;   ,      ;;      i32.const 16 ;;    return) (func (;4;) (type 2) (param i32) ;; get_internal_by_value ;;   ,    get_imported_by_value, ;;      get_local 0 i32.const 16 i32.const 112 call 0 drop return) (func (;5;) (type 3) (result f32) ;; get_imported_float ;;     f32.load,  g_importedFloat   i32.const 0 ;;  ,      - ;; .  ,      , ;;   float  . get_global 1 i32.add f32.load return) (memory (;0;) 1) (export "memory" (memory 0)) (export "get_imported_by_ptr" (func 1)) (export "get_imported_by_value" (func 2)) (export "get_internal_by_ptr" (func 3)) (export "get_internal_by_value" (func 4)) ;;   ,   g_internalStruct    (data (i32.const 16) "\ff\00\fe\ff\fd\ff\ff\ff\fc\ff\ff\ff\00\00\00\00\fb\ff\ff\ff\ff\ff\ff\ff\01\00\02\00\03\00\00\00\04\00\00\00\00\00\00\00\05\00\00\00\00\00\00\00\00\00\f6B\00\00\00\00\00\00\00\00\00\00\14@\80\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00") ;;       ,    ;;    (data (i32.const 128) "StringValue\00")) 

Communication with the host


The highest form of organization is WebAssembly - module. WebAssembly is distributed, loaded, and executed as modules.


In total, we had 4 types of objects in WebAssembly: functions, memory, tables, and global variables. Each of these objects can be imported and exported (and even imported and exported at the same time).


The module contains all the objects defined in it, and describes the structure of imports and exports. As you can see, all objects (and even signatures that are type ) have their own sequence numbers. Each room has its own space. That is, function 2 has nothing to do with either global variable 2 or signature 2. The space of numbers usually begins with imported objects.


From a code point of view, calling a host function or getting a global host variable is no different from normal: just call N or get_global N The interpreter, based on the module's metadata, will determine whether it is necessary to call the host function, or even the function of another module that is loaded by this host.


Memory and tables are more complicated. When sharing several modules of one memory or one table, you need to determine the regions in which each module stores its global data. There is a recommendation for this , but for the time being it is poorly supported by compilers.


As a result, for communication between the module and the host:



Conclusion


With the main guts I finished. Almost every item you can write a huge list of additions and features of the compilation, but this is a special case. As you can see, although the technology is called Web Assembly, there are no bindings inside the web.


The WABT and LLVM 5.0 package was used to compile and decompile the examples. I do not use Emscripten, because it just creates a bunch of everything designed exclusively for web modules that are run in conjunction with JavaScript.


Next, we will discuss what is needed in the standard library, how to allocate memory, and how to run all this bedlam on mobile platforms.


PS I wanted to write something for the front-fenders who want to deal with this technology. Not sure what happened. If you are a fender, and something is not clear to you - write in the comment boldly, I will be corrected.


')

Source: https://habr.com/ru/post/345450/


All Articles