Introduction to WebAssembly

This article is based on my talk at ITSubbotnik, held in Ryazan on October 14, 2017. There is still rather little material on this topic in Russian, so I hope the article will be useful to you.

Disclaimer: The author is an expert in neither WebAssembly nor JavaScript. This article is a compilation of thoughts and ideas taken from other people's talks on this topic, plus my own experience of studying WebAssembly on and off for several months.

What is WebAssembly?

WebAssembly (WASM) is a new binary format that allows you to run code in a browser.
This definition will suffice for now; a more complete one can be found on Wikipedia.

Problem

First, let's look at what problem WebAssembly was created to solve. The problem is not new: it is how to execute code quickly in the browser. But things are not that simple: it gradually turned out that, besides the problem itself, there are several related requirements:

Faster than JavaScript - ideally, at the speed of ~~light~~ our processor's native code.
Zero configuration - an out-of-the-box solution with nothing to install; all you need is a browser.
Safe - the new technology must not create new security threats.
Cross-platform - there are several platforms, including mobile ones, and several operating systems.
Developer-friendly - convenient development and debugging tools are needed.

Situation

Among the attempts to solve this problem, there is one winner so far, and that is JavaScript.

Losers (not a complete list):

ActiveX - the technology allowed doing absolutely anything, without any sandbox, and therefore posed a real security threat.
Flash - in 2017, Adobe announced that it would stop supporting Flash at the end of 2020.
Silverlight
and other plugins

Solution 1: Native code right in the browser

Examples: ActiveX, NaCl
Drawbacks: no portability; potential or real security problems.

Solution 2: Virtual Machine Code

Examples: Java Applets, Silverlight, etc.
Drawbacks: you need a plugin and/or a runtime ⇒ no zero configuration.
In general, if you want your code to run cross-platform, a virtual machine is the right approach.

What's wrong with JavaScript?

JavaScript is good. But if you look at how its performance has grown over the years, you will see that it is now on the second "plateau" of an S-shaped curve. At first performance was low and grew gradually; with the arrival of V8 there was a sharp jump, which has long since given way to smooth growth again.

(Picture from An Abridged Cartoon Introduction To WebAssembly by Lin Clark.)

Let's see how modern JavaScript engines work.

First, the source code (the JS text) goes through the parser, which produces the internal representation of the code: an abstract syntax tree. Then the interpreter takes over. For execution, individual functions are converted to bytecode, which is essentially a sequence of calls to the interpreter's internal functions. Along the way, the engine collects statistics on how JS functions are used. If a function exceeds a call threshold, the engine decides that it needs optimizing and passes it to the optimizing compiler. The compiler generates machine code that is specialized for the types of the input values it has observed.

Suppose we have a function with two arguments, foo(a, b), and we call it many times with numeric arguments. At some point the function is passed to the compiler and starts running faster. Now suppose we call it with a string argument. The engine then performs "deoptimization": it moves the function from the compiler back to the interpreter, and the machine code that was already generated is thrown away.
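The scenario above can be sketched in JavaScript. This only illustrates the call pattern and does not prove anything about the JIT: whether and when `foo` gets optimized is entirely up to the engine. In V8-based Node.js, running such a script with `--trace-opt --trace-deopt` prints the engine's decisions (the output format varies between versions).

```javascript
// A function the engine can specialize for numbers.
function foo(a, b) {
  return a + b;
}

// Warm-up: many calls with numbers only. After enough calls an optimizing
// JIT may compile foo into machine code that assumes numeric arguments.
let sum = 0;
for (let i = 0; i < 100000; i++) {
  sum = foo(sum, 1) % 1000;
}

// A call with a string breaks that assumption: `+` now means string
// concatenation, so specialized code cannot be used (deoptimization).
const s = foo("fib", 10); // "fib10"
```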

What am I getting at? The JavaScript engine developers do an excellent job, and many thanks to them for it. JavaScript is not bad at all, but it has internal limitations that will no longer allow it to be made radically faster.

asm.js

Another interesting initiative, this time from the Mozilla Foundation, brings us close to the topic of WebAssembly. It grew out of the Emscripten project, started in 2010, and became publicly available in 2013.

The idea is to define a subset of JavaScript into which C and C++ code can be compiled using the special Emscripten compiler.

Since this is a subset of JavaScript, such code runs in any browser. In addition, the major modern browsers have long been able to recognize asm.js quickly and compile it efficiently into native processor code. Compared with native code compiled directly from C/C++, code obtained from asm.js is at most 1.5-2 times slower (that is, it runs at 50-67% of native speed).

For the simplest C/C++ function, the asm.js code looks like this:

Here, 'use asm' is a directive telling the JS engine that the code below is asm.js, and constructs of the form |0 indicate that we are working with integers (a bitwise OR with zero discards the fractional part of a Number and converts it to a 32-bit integer).
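As a minimal hand-written sketch of the shape such code takes, here is asm.js for a function like `int add(int a, int b) { return a + b; }`. The names `AsmAddModule` and `add` are illustrative, and this is not Emscripten's literal output. Because asm.js is plain JavaScript, the sketch also runs in engines that ignore `"use asm"`.

```javascript
// asm.js module: stdlib, foreign imports and a heap buffer are the
// three standard module parameters (the heap is unused here).
function AsmAddModule(stdlib, foreign, heap) {
  "use asm";

  function add(a, b) {
    a = a | 0;          // parameter annotation: a is an int
    b = b | 0;          // parameter annotation: b is an int
    return (a + b) | 0; // result annotation: int, wraps to 32 bits
  }

  return { add: add };
}

const asmAdd = AsmAddModule(globalThis, {}, new ArrayBuffer(0x10000)).add;

console.log(asmAdd(2, 3));          // 5
console.log(asmAdd(2147483647, 1)); // -2147483648: 32-bit wrap-around
console.log(7.9 | 0, -7.9 | 0);     // 7 -7: the fractional part is dropped
```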

Development Goals for WebAssembly

Speed - almost like native code.
Efficiency - a binary format with fast parsing and compilation.
Portability - all browsers and operating systems.
Security - execution in a sandbox.
Easy debugging - debugging support in browsers; the debugger is already there.
An open standard - this is no longer the initiative of a single company trying to "pull the blanket over to its own side". Cross-browser consensus on the first version of the standard was reached in 2017.

So what is WebAssembly?

A binary format
NOT a programming language, but bytecode.
After all, we do not call Java bytecode a programming language.
It is loaded into the browser and executed there.
Strictly speaking, WebAssembly is executed by the JavaScript engine rather than by the browser itself, so it can also run in other environments, for example in Node.js.
Executed by a virtual machine
It is a simple stack machine with memory; this simplicity makes it easy to implement on any modern processor.
It has nothing to do with the Web, except that it communicates with the outside world through JavaScript.
In fact, WebAssembly is just a virtual machine that has memory and executes instructions.

Where to begin? Hello world

To start learning WebAssembly, I strongly recommend the WasmFiddle online tool.
(I myself started with Emscripten and only realized after a while that this was a mistake.)

WasmFiddle interface:

At the top left is the source code; at the bottom left, the compilation result produced by the Build button (here the textual representation is shown); at the top right, the code that runs the module; and at the bottom right, the output produced by the Run button.

Sample C/C++ code

As an example I used simple code that computes Fibonacci numbers (yes, Fibonacci again :) ). It is not particularly good code, just the first variant that came to hand:

    int fib(int n) {
        if (n == 0) {
            return 0;
        } else {
            if ((n == -1) || (n == 1)) {
                return 1;
            } else {
                if (n > 0) {
                    return fib(n - 1) + fib(n - 2);
                } else {
                    return fib(n + 2) - fib(n + 1);
                }
            }
        }
    }

A few words about the textual representation (WAST). As already mentioned, WebAssembly is a binary format: compilation produces a WASM file. A textual representation can always be obtained from a WASM file; it lets you see exactly what the module contains, which tables and which code. This representation is also used for debugging.

In this case, the textual representation shows that 1 memory page is allocated (each page is 64 KiB), that the memory and the fib function are visible from outside (exported), and then comes the definition of the function itself, that is, its implementation.

Text View (WAST)

The beginning of the textual representation of this module looks like this:

    (module
      (table 0 anyfunc)
      (memory $0 1)
      (data (i32.const 12) "\01\00\00\00\00\00\00\00\01\00\00\00")
      (export "memory" (memory $0))
      (export "fib" (func $fib))
      (func $fib (param $0 i32) (result i32)
        (local $1 i32)
        (block $label$0
          (br_if $label$0
            (i32.ge_u
              (tee_local $1
                (i32.add
                  (get_local $0)
                  (i32.const 1)
                )
              )
              (i32.const 3)
            )
          )
          (return
            (i32.load
              (i32.add
                (i32.shl
                  (get_local $1)
                  (i32.const 2)
                )
                (i32.const 12)
              )
            )
          ...

Putting it all together, the minimal JavaScript code to run the example looks like this:

    // The compiled fib module as raw bytes (normally loaded from the server).
    const wasmCode = new Uint8Array([
      0,97,115,109,1,0,0,0,                       // magic "\0asm" + version 1
      1,134,128,128,128,0,1,96,1,127,1,127,       // type section
      3,130,128,128,128,0,1,0,                    // function section
      4,132,128,128,128,0,1,112,0,0,              // table section
      5,131,128,128,128,0,1,0,1,                  // memory section: 1 page
      6,129,128,128,128,0,0,                      // global section (empty)
      7,144,128,128,128,0,2,6,109,101,109,111,114,121,2,0,3,102,105,98,0,0, // exports "memory", "fib"
      10,203,128,128,128,0,1,197,128,128,128,0,1,1,127,2,64,32,0,65,1,106,34,1,65,3,79,13,0, // code section
      32,1,65,2,116,65,12,106,40,2,0,15,11,2,64,32,0,65,1,72,13,0,32,0,65,127,106,16,0,
      32,0,65,126,106,16,0,106,15,11,32,0,65,2,106,16,0,32,1,16,0,107,11,
      11,146,128,128,128,0,1,0,65,12,11,12,1,0,0,0,0,0,0,0,1,0,0,0 // data section
    ]);
    const wasmModule = new WebAssembly.Module(wasmCode);           // compile
    const wasmInstance = new WebAssembly.Instance(wasmModule, {});  // instantiate (no imports)
    console.log(wasmInstance.exports.fib(10));                     // 55

Here the finished WASM module is embedded in the code as an array of numbers; in real life the WASM file will, of course, be much larger, and we will load it from the server.

Executing WebAssembly in the browser looks like this. The browser, as usual, loads an HTML page, whose JavaScript in turn loads the WebAssembly code: this gives you a WebAssembly module. You then create an instance of the module, after which you can call the functions that this instance exports.
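The load, compile, instantiate and call sequence can be sketched with a tiny hand-assembled module instead of the Fibonacci one; the module, its export name `add` and the variable names are assumptions for illustration. The synchronous `WebAssembly.Module`/`WebAssembly.Instance` constructors keep the sketch short; for real files in a browser the asynchronous `WebAssembly.instantiate` or `WebAssembly.instantiateStreaming(fetch(url), imports)` is the usual choice, since browsers limit how large a module may be compiled synchronously on the main thread.

```javascript
// Hand-assembled module equivalent to:
// (func (export "add") (param i32 i32) (result i32)
//   local.get 0  local.get 1  i32.add)
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic "\0asm", version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // function section: func 0 has type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                         // code section: 1 body, 7 bytes, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,                   // local.get 0; local.get 1; i32.add; end
]);

// Step 1: compile the bytes into a stateless module (can be cached or shared).
const addModule = new WebAssembly.Module(addModuleBytes);
console.log(WebAssembly.Module.exports(addModule)); // lists the "add" function export

// Step 2: create an instance; the second argument supplies imports (none here).
const addInstance = new WebAssembly.Instance(addModule, {});

// Step 3: call an exported function like an ordinary JS function.
console.log(addInstance.exports.add(2, 3)); // 5
```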

Note the gray arrow: JavaScript functions can be called from inside WebAssembly. Let's look at the sequence diagram in more detail:

Here we first call WebAssembly from JavaScript, then call the JavaScript function from WebAssembly.

In the second call, I showed how WebAssembly uses an arbitrary API (for example, DOM, WebGL, etc.). This is possible, but not directly: such calls also go only through JavaScript. Obviously, this creates a bottleneck: if WASM code works with such APIs intensively, we lose a lot of time passing these calls through JavaScript.
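Calling JavaScript from WebAssembly works through imports: the module declares a function it needs, and JavaScript supplies it when instantiating. A minimal sketch, assuming a hand-assembled module that imports `env.log` and exports `run` (both names are made up for this example):

```javascript
// Hand-assembled module equivalent to:
// (module
//   (import "env" "log" (func $log (param i32)))
//   (func (export "run") (param i32) local.get 0 call $log))
const importModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // header
  0x01, 0x05, 0x01, 0x60, 0x01, 0x7f, 0x00,       // type 0: (i32) -> ()
  0x02, 0x0b, 0x01,                               // import section, 1 entry
  0x03, 0x65, 0x6e, 0x76,                         //   module "env"
  0x03, 0x6c, 0x6f, 0x67,                         //   field "log"
  0x00, 0x00,                                     //   function of type 0 (func index 0)
  0x03, 0x02, 0x01, 0x00,                         // func index 1 has type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" -> func 1
  0x0a, 0x08, 0x01, 0x06, 0x00,                   // code: 1 body, 6 bytes, no locals
  0x20, 0x00, 0x10, 0x00, 0x0b,                   // local.get 0; call 0 (log); end
]);

const logged = [];
const importObject = {
  env: {
    // This JavaScript function is called from inside WebAssembly.
    log: (value) => { logged.push(value); },
  },
};

const runInstance = new WebAssembly.Instance(
  new WebAssembly.Module(importModuleBytes),
  importObject
);

// JS -> WASM (run) -> JS (env.log)
runInstance.exports.run(42);
```

Each call to `run` crosses the JS/WASM boundary twice, into WASM and back out to `log`; doing this in a tight loop is exactly the bottleneck described above.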

The WebAssembly memory model is very simple. It is a flat block of memory ("linear memory") that holds the global variables, the stack and the heap of the C/C++ program; the compiled code itself is kept outside this block and cannot be read or modified by the program. The memory can be made growable: then, if there is not enough space for the next allocation, the upper limit of the memory is raised, in steps of whole 64 KiB pages.

The entire memory block is accessible from JavaScript as an array of bytes (and also as an array of 16-bit or 32-bit integers, or of 32-bit or 64-bit floating-point values). Moreover, JavaScript can both read and write this memory.
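The JavaScript side of linear memory can be tried with a standalone `WebAssembly.Memory` object, the same kind of object a module exports as `memory`. A sketch, assuming a growable memory of 1 to 4 pages; note that growing replaces `memory.buffer`, so typed-array views created earlier become unusable and must be recreated:

```javascript
const PAGE_SIZE = 64 * 1024; // one WebAssembly page = 64 KiB

// A growable memory: 1 page initially, at most 4 pages.
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
console.log(memory.buffer.byteLength); // 65536

// Several typed-array views over the same bytes.
let bytes = new Uint8Array(memory.buffer);
const words = new Uint32Array(memory.buffer);
bytes[0] = 42;         // write a single byte from JS
console.log(words[0]); // 42 on little-endian hosts (WASM memory is little-endian)

// grow() returns the previous size in pages and detaches the old buffer.
const oldPages = memory.grow(1);
console.log(bytes.length); // 0: the old view no longer sees any memory
bytes = new Uint8Array(memory.buffer);
console.log(oldPages, bytes.length / PAGE_SIZE); // 1 2
console.log(bytes[0]); // 42: existing contents are preserved
```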

Emscripten

Emscripten is the main compiler for producing asm.js and WebAssembly from C/C++. (There are also compilers to WASM from other languages, for example Rust and TypeScript.) Here I will describe using Emscripten on Windows, but I don't think there will be significant differences on other systems.

LLVM

Speaking of Emscripten, it's worth saying a few words about LLVM (originally short for Low Level Virtual Machine).

LLVM is a family of compilers. The main idea of LLVM is to split compilation into a frontend and a backend. The frontend compiles source code into an intermediate representation (IR). IR is code for an abstract virtual machine. The backend converts IR into code for a specific platform; x86 and x86-64 backends, for example, are widely used. If you need a compiler for a new programming language, you only write a new frontend. If you need to compile for a new platform, you write a new backend.

Emscripten uses LLVM to compile C/C++ and provides its own backends for generating asm.js and WebAssembly.

Installing Emscripten

Installing Emscripten is quite simple. In my case it was on Windows, and I didn't even need to build anything from source.

Download from here: http://emscripten.org/
Unpack it into a separate folder, in my case C:\bin\emsdk
Open a command prompt, go to the emsdk folder and run three commands:

    emsdk update
    emsdk install latest
    emsdk activate latest

That's it: everything is installed and configured, and we can compile. The emsdk list command shows all versions of all tools available for installation, marking the ones currently selected.

Compiling to asm.js

Let's see how to compile code with Emscripten, starting with asm.js.

Emscripten Example

The example is the same as above, slightly modified for Emscripten (fib.c):

    #include <stdio.h>
    #include <emscripten/emscripten.h>

    // EMSCRIPTEN_KEEPALIVE keeps fib from being removed and exports it to JS.
    int EMSCRIPTEN_KEEPALIVE fib(int n) {
        if (n == 0) {
            return 0;
        } else {
            if ((n == -1) || (n == 1)) {
                return 1;
            } else {
                if (n > 0) {
                    return fib(n - 1) + fib(n - 2);
                } else {
                    return fib(n + 2) - fib(n + 1);
                }
            }
        }
    }

    int main() {
        printf("fib(10) = %d\n", fib(10));
        return 0;
    }

Here we added the EMSCRIPTEN_KEEPALIVE macro, which does two things. First, it prevents the compiler from throwing the function away, even if it is not used anywhere in our code. Second, it marks the function for export so that it can be called from outside.

To compile, I use the following batch file:

    SET EMSDKPATH=C:\bin\emsdk
    CALL %EMSDKPATH%\emsdk_env.bat
    emcc -O1 fib.c -o fib.html -fno-exceptions -fno-rtti

Here emcc is Emscripten itself, -O1 is the optimization level, fib.c is what to compile, -o fib.html is where to put the result, and the -f options that follow turn off things we don't need.

What we get as output

The compiler produces an HTML file (fib.html) that includes JavaScript to run the compiled code:

We also get the file fib.js, deep inside which you can find the fib() function and its call in main():

In addition, the binary file fib.html.mem is generated: this is the "memory image", i.e. what memory looks like before the program starts, with all the constants and global variables.

Opening fib.html, we see the following:

This is the standard Emscripten result page. The black rectangle in the center is the "console" output (stdout); in particular, printf() output goes there. The black box at the top is a canvas; Emscripten does not know whether you need it, but creates it just in case.

Compiling to WebAssembly

To compile to WebAssembly, we don't need to change the C/C++ source code at all (and that's great!).

We change only the compiler command line:

    SET EMSDKPATH=C:\bin\emsdk
    CALL %EMSDKPATH%\emsdk_env.bat
    emcc -O1 fib.c -o fib.html -s WASM=1 -s NO_EXIT_RUNTIME=1 -s NO_FILESYSTEM=1 -fno-exceptions -fno-rtti --llvm-lto 1

The main difference here is the added -s WASM=1 option. The remaining -s and -f options were added in an attempt to tell Emscripten what we do NOT need. Emscripten can do too much, so "just in case" and "in case it's suddenly needed" it adds a lot of things to the output files.

As a result of compilation we again get fib.html, plus fib.js (a set of Emscripten support functions), and finally fib.wasm:

The WASM file starts with a zero byte followed by the characters "asm"; these first four bytes let you check that you are really loading wasm and not, say, a stub page with an error code. The next 4 bytes are the version number, in this case 1. No separate memory image file is generated: constants and global variables are included in the WASM file itself.
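A check of the header described above can be sketched as follows; `looksLikeWasm` is a hypothetical helper name. It only inspects the first 8 bytes, while `WebAssembly.validate(bytes)` performs a full validation of the module.

```javascript
// True if the bytes start with the magic number "\0asm" followed by
// a little-endian 32-bit version field equal to 1.
function looksLikeWasm(bytes) {
  if (bytes.length < 8) return false;
  const magicOk =
    bytes[0] === 0x00 && bytes[1] === 0x61 && // "\0a"
    bytes[2] === 0x73 && bytes[3] === 0x6d;   // "sm"
  const version = new DataView(bytes.buffer, bytes.byteOffset, 8).getUint32(4, true);
  return magicOk && version === 1;
}

const header = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
const htmlPage = new TextEncoder().encode("<!DOCTYPE html><p>404 Not Found</p>");
console.log(looksLikeWasm(header));   // true
console.log(looksLikeWasm(htmlPage)); // false: an error page, not a module
```

In a browser, the bytes would come from something like `new Uint8Array(await response.arrayBuffer())` before compiling.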

I won't include a screenshot of the result here: it looks exactly the same as for the asm.js example.

Debugging WebAssembly in Chrome

Let's see what we have in terms of debugging. Open the Chrome developer tools (F12) and go to Sources: there is a new "wasm" section, where we can find our function among a set of blocks, set a breakpoint, stop at it and step through the code in the debugger.

As you can see, the text view (WAST), which I mentioned above, is used for debugging.

Now let's compile the same code with debugging information. To do this, add the -g option to the emcc command line. The compiler will then generate two more files: fib.wast and fib.html.map.
The fib.wast text file contains not only the code but also references to the C/C++ source:

    (func $_fib (param $0 i32) (result i32)
      (local $1 i32)
      ;; fib.c:11
      (block $switch-default

Let's see what this gives us for debugging. After refreshing the page, the Sources section now shows our fib.c source code; we can set a breakpoint in it, stop at it, inspect local variables and step through the code in the debugger.

Emscripten features

Emscripten has been in development since 2010 and already offers a lot. Here it is no longer about the compiler itself, but about support for popular libraries used from C/C++ code. Supported:

Standard C / C ++ Libraries
SDL 2 - input (keyboard / mouse / joysticks), video, sound
OpenGL, EGL - 2D / 3D graphics, implemented via WebGL
OpenAL - sound

And other features:

File system emulation - many C/C++ programs work heavily with files, and you can leave that code as it is; see Emscripten - File System Overview
EM_ASM(...) - execute arbitrary JavaScript code embedded inline in the C/C++ source
Workers

More complicated example

I have a hobby project, an emulator of an old Soviet computer written in C++, which I described in this article. Since then I have developed it a bit further and ported it to Qt (Windows/macOS/Linux). So I already had the emulation core separated out (~280 KB of code, ~7 thousand lines), and it had already been built with different compilers. In fact, I started learning WebAssembly by compiling this emulator with Emscripten. It took me two evenings after work to get the first successful run, which I consider a good result, indicating that the entry threshold is relatively low. Most of the work was on the JavaScript side: how to call WASM functions from JS, how to pass parameters, how to draw on a canvas, and so on.

By the way, the emulated machine's screen is produced entirely inside WASM as one contiguous block of memory, in a pixel format suitable for the canvas. Only the address of the finished "screen" is passed out to JavaScript, which then just has to copy this block of memory to the canvas.
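The JavaScript side of this can be sketched as follows, under a hypothetical layout: the screen is a width x height RGBA image (4 bytes per pixel) starting at the address reported by the module. The names `framebufferView` and `screenPtr` are illustrative, not taken from the emulator's source, and `ImageData`/`putImageData` exist only in browsers, so that step is shown as a comment.

```javascript
// Returns a view of a width x height RGBA framebuffer (4 bytes per pixel)
// located at byte address `ptr` inside a WebAssembly memory.
function framebufferView(memory, ptr, width, height) {
  const byteLength = width * height * 4;
  if (ptr < 0 || ptr + byteLength > memory.buffer.byteLength) {
    throw new RangeError("framebuffer lies outside WebAssembly memory");
  }
  // No copy: the view aliases the module's memory directly.
  return new Uint8ClampedArray(memory.buffer, ptr, byteLength);
}

// Stand-ins for the emulator's exported memory and the address it reports.
const screenMemory = new WebAssembly.Memory({ initial: 1 });
const screenPtr = 1024;
new Uint8Array(screenMemory.buffer).fill(255, screenPtr, screenPtr + 4); // one white pixel

const pixels = framebufferView(screenMemory, screenPtr, 64, 48);

// In the browser, the remaining step is a single copy to the canvas:
//   const ctx = canvas.getContext("2d");
//   ctx.putImageData(new ImageData(pixels, 64, 48), 0, 0);
```

Creating the view copies nothing; the copy happens in `putImageData`. If the module's memory grows, the view has to be recreated, because the old buffer is detached.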

A working emulator can be seen here, and the source code is also available.

Screenshot of emulator compiled as WASM

At some point I also decided, "to complete the picture", to build this emulator for asm.js as well. I made myself a coffee and set aside a couple of free hours, and in less than 15 minutes the emulator was working. That was rather unexpected. All I had to do was compare the generated HTML files and move the added JavaScript block to the right place. The only difference was that asm.js has to load the .mem file, the memory image with constants and global variables. Otherwise all the calls worked the same way, and the finished page looked and behaved exactly the same, just a little slower.

So, to sum up Emscripten: I was convinced that it produces both asm.js and WebAssembly from the same code, and the results look and work exactly the same (except for speed, of course). The entry threshold for getting a real result was relatively low.

On the other hand, Emscripten is a rather complex and heavily "stuffed" tool. It adds a lot of things to the output that you do not expect but that may turn out to be useful. As a result, even small sources produce a large amount of output code. Some of these things can be disabled with command line options, some cannot.

I think you shouldn't start learning WebAssembly with Emscripten, because it is quite hard to tell what Emscripten provides from what WASM provides out of the box. But on real projects Emscripten is more than useful, precisely because of the features it gives the developer.

The current state of WebAssembly

WebAssembly news in 2017:

March 2017 - cross-browser consensus on WebAssembly and the end of the browser preview
May 2017 - the Chromium team abandoned PNaCl in favor of WebAssembly https://blog.chromium.org/2017/05/goodbye-pnacl-hello-webassembly.html
September 2017 - Safari 11 released with WebAssembly support https://webkit.org/blog/7956/new-webkit-features-in-safari-11/

Browser support

At the beginning of October 2017, the situation looked like this:

The Edge version is tied to the Windows version. With the Windows 10 Fall Creators Update we get Edge 16, in which WebAssembly works out of the box, without having to enable anything in the settings.

For browsers that do not support WebAssembly, the plan is to use a so-called "polyfill", i.e. automatic conversion of WASM into code that the browser can execute. In particular, a prototype was made that efficiently converts WebAssembly to asm.js. But so far I have not seen this approach applied in practice.
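Whatever fallback is used, the page first has to detect WebAssembly support. A minimal sketch that picks between two builds of the same code, such as the asm.js and WASM builds Emscripten can produce; the file names are hypothetical. The 8-byte header alone is a valid empty module, which makes it a convenient probe for `WebAssembly.validate`.

```javascript
// The smallest valid module: just the magic number and version, no sections.
const EMPTY_MODULE = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

function supportsWasm() {
  try {
    return typeof WebAssembly === "object" &&
           typeof WebAssembly.validate === "function" &&
           WebAssembly.validate(EMPTY_MODULE);
  } catch (e) {
    return false; // e.g. WebAssembly disabled by policy
  }
}

// Hypothetical names for the two builds of the same C/C++ code.
function pickBuild() {
  return supportsWasm() ? "emulator-wasm.js" : "emulator-asmjs.js";
}
```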

The future of WebAssembly

Some of the things the WebAssembly team is currently working on:

Threads
The current implementation of WASM is completely single-threaded.
SIMD
SIMD support will significantly speed up uniform processing of large amounts of data: images, video, sound.
Exceptions
Garbage collection
GC support will make it possible to compile languages with automatic memory management to WASM. This is possible even now, but only if the GC itself is implemented in C/C++ inside the WASM module.

Performance

The performance question is actually quite complicated, because WebAssembly is not always faster than the equivalent JavaScript or asm.js. For example, look at the comparisons in the JavaScript vs WebAssembly easy benchmark. On the very first test, collisionDetection, WASM reaches only 88% of the JS performance. On the next test, Fibonacci, WASM does much better: it is 3 times faster than JS.

I will note only a few of the factors that affect speed; of course, there are many more.

As noted above, WebAssembly can lose a lot of performance when it calls JS functions intensively.

WebAssembly loses some performance on memory accesses: every access has to be checked against the bounds of the available memory block (although on 64-bit platforms engines can often avoid an explicit check by reserving guard pages).
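The effect of the bounds check is visible from JavaScript: an access outside linear memory does not read garbage but aborts the call with a `WebAssembly.RuntimeError` (a trap). A sketch with a hand-assembled module that has one 64 KiB page and exports a hypothetical `load` function doing a 32-bit read at the given address:

```javascript
// Hand-assembled module equivalent to:
// (module (memory 1)
//   (func (export "load") (param i32) (result i32) local.get 0 i32.load))
const loadModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // header
  0x01, 0x06, 0x01, 0x60, 0x01, 0x7f, 0x01, 0x7f,             // type 0: (i32) -> i32
  0x03, 0x02, 0x01, 0x00,                                     // func 0 has type 0
  0x05, 0x03, 0x01, 0x00, 0x01,                               // memory: min 1 page, no max
  0x07, 0x08, 0x01, 0x04, 0x6c, 0x6f, 0x61, 0x64, 0x00, 0x00, // export "load" -> func 0
  0x0a, 0x09, 0x01, 0x07, 0x00,                               // code: 1 body, 7 bytes, no locals
  0x20, 0x00, 0x28, 0x02, 0x00, 0x0b,                         // local.get 0; i32.load align=4; end
]);
const { load } = new WebAssembly.Instance(new WebAssembly.Module(loadModuleBytes), {}).exports;

// Returns the loaded value, or "trap" if the access was out of bounds.
function tryLoad(address) {
  try {
    return load(address);
  } catch (e) {
    if (e instanceof WebAssembly.RuntimeError) return "trap";
    throw e;
  }
}

console.log(tryLoad(0));     // 0: inside the 64 KiB page
console.log(tryLoad(65532)); // 0: the last 4 bytes of the page
console.log(tryLoad(65533)); // trap: the 4-byte read would cross the end
```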

WebAssembly can gain significantly from its integer types. In JS we only have the Number type, which is always a 64-bit floating-point value; integers are simply floating-point numbers without a fractional part. The JIT compiler may switch to an integer representation internally, but only speculatively, guarded by type checks. In WASM we choose the integer type and its width ourselves (i32 or i64), and since operations on 32-bit integers can be slightly cheaper than on 64-bit ones, this gives WASM a somewhat "unfair" advantage in computation speed.
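The difference in integer semantics can be shown in plain JavaScript: a Number silently grows past the 32-bit range, while WebAssembly's `i32.add` wraps around modulo 2^32, which JS code has to request explicitly with `| 0` (the same trick asm.js uses) or `Math.imul`. A small illustration:

```javascript
// Every JS number is an IEEE-754 double: no wrap-around at 32 bits...
const big = 2147483647 + 1; // 2147483648, still exact as a double

// ...and no exact integers beyond 2^53.
console.log(Number.isSafeInteger(2 ** 53 + 1)); // false
console.log(2 ** 53 + 1 === 2 ** 53);           // true: precision lost

// WebAssembly's i32 arithmetic wraps modulo 2^32; in JS it must be explicit.
const wrapped = (2147483647 + 1) | 0;        // -2147483648, like i32.add
const product = Math.imul(0x10000, 0x10000); // 0: low 32 bits of 2^32, like i32.mul
```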

Overall, my impression is that there is no such thing as "on average, WebAssembly gives a 10-15% speedup": there is no "average", and for each algorithm you need to find out whether WASM gives you a speedup. Still, it is reasonable to expect that for compute-intensive code WebAssembly will most likely give a more or less noticeable performance gain. Besides, even over the last six months WASM has become somewhat faster with each new browser release.
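Since there is no "average", each case has to be measured. A very rough timing helper, sketched under the assumption that both candidates take one argument and return comparable results; `timeIt`, `compare` and `fibJs` are hypothetical names. Serious measurements need warm-up runs, many repetitions and several browsers.

```javascript
// performance.now() where available (browsers, modern Node.js), else Date.now().
const now = () =>
  (typeof performance !== "undefined" ? performance.now() : Date.now());

// Calls fn(arg) `repetitions` times; returns the last result and elapsed ms.
function timeIt(fn, arg, repetitions) {
  let result;
  const start = now();
  for (let i = 0; i < repetitions; i++) {
    result = fn(arg);
  }
  return { result, ms: now() - start };
}

// Plain JS Fibonacci for non-negative n, as the JS side of a comparison.
function fibJs(n) {
  if (n < 2) return n;
  return fibJs(n - 1) + fibJs(n - 2);
}

// Times two implementations and refuses to compare if they disagree.
// For WASM, pass e.g. wasmInstance.exports.fib from the minimal example.
function compare(label, first, second, arg, repetitions = 5) {
  const a = timeIt(first, arg, repetitions);
  const b = timeIt(second, arg, repetitions);
  if (a.result !== b.result) throw new Error(`${label}: results differ`);
  return { label, firstMs: a.ms, secondMs: b.ms };
}
```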