Developing for WebAssembly: real pitfalls and examples

WebAssembly was announced back in 2015, but years later few teams can boast of running it in production. That makes first-hand accounts of what it is like to live with it in practice all the more valuable, and they are still in short supply.

At the HolyJS conference, a talk on hands-on experience with WebAssembly was rated highly by the audience, and a text version of that talk has now been prepared specifically for Habr (the video is also attached).

My name is Andrew, and I will tell you about WebAssembly. You could say that I started working on the web in the last century, but I am modest, so I will not say so. Over that time I have worked on the back end and the front end, and even did a little design. Today I am interested in things like WebAssembly, C++ and other native technologies. I also love typography and collect old hardware.

First, I will talk about how my team and I introduced WebAssembly into our project, then we will discuss whether you might need WebAssembly yourself, and I will finish with a few tips in case you want to adopt it.

How we implemented WebAssembly

I work for Inetra. We are based in Novosibirsk and run several projects of our own. One of them is ByteFog, a peer-to-peer technology for delivering video to users. Our clients are services that distribute huge amounts of video. They have a problem: when a popular event happens, for example a press conference or a sporting event, nobody is ever quite ready for it. A lot of clients arrive at once and hammer the server, and the server struggles. Viewers get very poor video quality as a result.

But everyone is watching the same content. So let's ask users' neighboring devices to share pieces of the video: this takes load off the server and saves bandwidth, and users get the video in better quality. The clouds in the picture are our technology, the ByteFog proxy.

We have to be installed on every device that can display video, so we support a very wide range of platforms: Windows, Linux, Android, iOS, Web, Tizen. Which language lets you keep a single code base across all of them? We chose C++, because it turned out to have the most pluses :-D More seriously, we have solid C++ expertise, it is a genuinely fast language, and in portability it is probably second only to C.

We ended up with a fairly large application (900 classes), but it works well. On Windows and Linux we compile to native code. On Android and iOS we build a library that is linked into the application. We will talk about Tizen another time; on the Web we used to work as a browser plugin.

That was the Netscape Plugin API. As the name implies, it is quite old, and it has a drawback: it gives very broad access to the system, so plugin code can cause security problems. Perhaps that is why Chrome turned off support for it in 2015, and then all the other browsers joined the flash mob. So we were left without a web version for almost two years.

In 2017, a new hope dawned. As you might guess, it was WebAssembly. So we set ourselves the task of porting our application to the browser. Firefox and Chrome had supported it since the spring, and by the fall of 2017 Edge and Safari had caught up.

It was important for us to reuse the existing code, since we have a lot of business logic that we did not want to duplicate, so as not to double the number of bugs. We took the Emscripten compiler. It does what we need: it compiles a C++ application for the browser and recreates there the environment a native application is used to. You could say that Emscripten is a kind of Browserify for C++ code. It also lets you pass objects from C++ to JavaScript and back. Our first thought was: we take Emscripten, just compile, and it will work. Of course, it did not. That is where our string of pitfalls began.

The first thing we ran into was dependencies. Our code base used several libraries. There is no point in listing them all, but for those in the know: we have Boost. It is a large library that lets you write cross-platform code, but it makes the build very difficult to configure. And we wanted to drag as little code as possible into the browser.

ByteFog architecture

So we identified the core: essentially a proxy server that contains the main business logic. This proxy takes data from two sources. The first and main one is HTTP, the channel to the video distribution server; the second is our P2P network, the channel to the same kind of proxy running at some other user. We give data to the player first, since our job is to show the user quality content. If resources remain, we distribute content to the P2P network so that other users can download it. Inside is a smart cache that does all the magic.

Having compiled all this, we ran into the fact that WebAssembly executes in the browser sandbox, which means it cannot do more than JavaScript allows. Native applications, meanwhile, use many platform-specific things, such as the file system, the network, or random numbers. All of these have to be implemented in JavaScript using what the browser gives us. The table on the slide listed the fairly obvious replacements.

To make this possible, you have to cut the native implementation out of the application and put an interface in its place, that is, draw a boundary. Then you implement that interface in JavaScript, keep the native implementation as well, and pick the right one at build time. So we looked at our architecture and found all the places where such a boundary could be drawn. As it happens, they all fall in the transport subsystem.

For each such place we wrote a specification, that is, we fixed the contract: which methods there would be, what parameters they would take, and which data types they would use. Once that is done, developers can work in parallel, each on their own side.

What did we end up with? We replaced the main delivery channel, video from the provider, with ordinary AJAX. We hand data to the player through the popular HLS.js library, although in principle we can integrate with other players if necessary. And we replaced the entire P2P layer with WebRTC.

Compilation produces several files. The most important is the .wasm binary. It contains the compiled bytecode that the browser will execute, with all of your C++ legacy inside. It does not work on its own, though: it needs so-called "glue code", which the compiler also generates. The glue code loads the binary, and you ship both files to production. For debugging you can also generate a text representation of the assembly, the .wast file, and a source map. Be aware that they can be very large; in our case they reached 100 megabytes or more.

Building the bundle

Let's look at the glue code more closely. It is plain old ES5, collected in a single file. When we include it in a web page, we get a global variable that holds our instantiated wasm module, ready to accept calls to its API.

But including a separate file is a serious complication for a library that other people will use. We would like to collect everything in a single bundle. For this we use Webpack and a special compilation option, MODULARIZE.

It wraps the glue code in the “Module” pattern, and we can simply pick it up with import, or with require if we write ES5; Webpack understands this dependency without trouble. There was a problem with Babel, which did not like the large amount of code, but since this is ES5 code that does not need transpiling, we just added it to the ignore list.

Chasing a smaller number of files, I decided to try the SINGLE_FILE option. It converts all the binaries produced by compilation to Base64 and embeds them in the glue code as a string. It sounds like a great idea, but after that our bundle grew to 100 megabytes. Neither Webpack, nor Babel, nor even the browser copes with that volume. And are we really going to make the user download 100 megabytes?

If you think about it, this option is not needed. The glue code downloads the binary files itself. It does so over HTTP, which means we get caching out of the box and can set any headers we want, for example to enable compression, and WebAssembly files compress very well.

But the coolest technology is streaming compilation. While the WebAssembly file is still being downloaded from the server, the browser can already compile it as the data arrives, which greatly speeds up the loading of your application. In general, the whole WebAssembly technology is focused on starting a large code base quickly.
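
As a rough illustration of what happens underneath, a loader that uses streaming compilation might look like the sketch below. It relies on the standard WebAssembly.instantiateStreaming API and falls back to downloading the whole file first where that API is missing. The function name instantiateWasm and the URL in the usage comment are made up for this example; with Emscripten the generated glue code normally takes care of loading for you.

```javascript
// Sketch only: `instantiateWasm` is a hypothetical helper name.
// Prefer streaming compilation when the runtime offers it; otherwise
// download the whole file first and compile it afterwards.
function instantiateWasm(responsePromise, imports = {}) {
  if (typeof WebAssembly.instantiateStreaming === 'function') {
    // Compiles the module while the bytes are still arriving.
    return WebAssembly.instantiateStreaming(responsePromise, imports);
  }
  return Promise.resolve(responsePromise)
    .then((response) => response.arrayBuffer())
    .then((bytes) => WebAssembly.instantiate(bytes, imports));
}

// Usage in a browser (the path is an assumption for illustration):
// instantiateWasm(fetch('/static/app.wasm')).then(({ instance }) => {
//   console.log(Object.keys(instance.exports));
// });
```

Streaming compilation only works when the server sends the file with the Content-Type: application/wasm header, so it is worth checking the server configuration together with compression and caching.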

Thenable

Another problem with the module is that it is a Thenable object, that is, it has a .then() method. This method lets you attach a callback to the moment the module starts, which is very convenient. But I would like the interface to be a Promise. A Thenable is not a Promise, but no matter, we will wrap it ourselves. Let's write this simple code:

return new Promise((resolve, reject) => { Module(config).then((module) => { resolve(module); }); });

We create a Promise, start our module, and as the callback we call resolve and pass it the module that was instantiated. Everything seems obvious, everything is fine, we run it, and something is wrong: the browser hangs, DevTools hangs, and the CPU heats up. We understand nothing; it must be some kind of recursion or infinite loop. Debugging this is quite hard, and when we interrupted JavaScript we found ourselves in the then function of the Emscripten module.

 Module['then'] = function(func) { if (Module['calledRun']) { func(Module); } else { Module['onRuntimeInitialized'] = function() { func(Module); }; }; return Module; };

Let's look at it in more detail. The fragment

 Module['onRuntimeInitialized'] = function() { func(Module); };

is responsible for attaching the callback. Everything is clear here: an asynchronous function that calls our callback, just what we want. There is another part to this function:

 if (Module['calledRun']) { func(Module);

It runs when the module has already started. In that case the callback is called synchronously, right away, and the module is passed to it as the argument. This imitates the behavior of a Promise and seems to be what we expect. So what is wrong?

If you read the documentation carefully, it turns out there is a very subtle point about Promise. When we resolve a Promise with a Thenable object, the browser unwraps the value from that Thenable, and to do so it calls its .then() method. So we resolve the Promise and pass it the module. The browser asks: is this a Thenable object? Yes, it is. So it calls the module's .then() method, passing the resolve function itself as the callback.

The module checks whether it is running. It already is, so the callback is called immediately and the same module is passed to it again. The callback is the resolve function, so the browser asks: is this a Thenable object? Yes, it is. And everything starts over. We end up in an infinite loop from which the browser never returns.

I did not find an elegant solution to this problem. In the end I simply delete the .then() method just before calling resolve, and it works.
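
The workaround can be sketched as follows. createFakeModule is a hypothetical stand-in that mimics only the then logic quoted above, so the snippet runs without a real Emscripten build; in a real project the factory would be the function generated with MODULARIZE. The name loadModule is also made up for the example.

```javascript
// Hypothetical stand-in for the Emscripten factory: it mimics only the
// `then` logic quoted above and pretends the runtime has already started.
function createFakeModule(config) {
  const Module = { config, calledRun: true };
  Module.then = function (func) {
    if (Module.calledRun) {
      func(Module);
    } else {
      Module.onRuntimeInitialized = function () { func(Module); };
    }
    return Module;
  };
  return Module;
}

// Wrap the thenable in a real Promise. Removing `then` before resolving
// stops the Promise machinery from treating the module as a thenable
// and calling its `then` over and over again.
function loadModule(factory, config) {
  return new Promise((resolve) => {
    factory(config).then((module) => {
      delete module.then;
      resolve(module);
    });
  });
}

const ready = loadModule(createFakeModule, { name: 'demo' });
ready.then((module) => console.log('module started:', module.config.name));
```

The cost of this approach is that the module can no longer be used as a thenable afterwards, which is fine as long as all callers go through the Promise wrapper.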

Emscripten

So, we have compiled the module and built the JS, but something is missing. We probably ought to do some useful work. For that we need to pass data and connect the two worlds, JS and C++. How? Emscripten provides three options:

The first is the ccall and cwrap functions. You will meet them most often in WebAssembly tutorials, but they are not suitable for real work, since they do not support C++ features.
The second is WebIDL Binder. It does support C++ features, so you can work with it. WebIDL is a serious interface description language used, for example, by the W3C in its specifications. But we did not want to bring it into our project, so we used the third option.
The third is Embind. You could call it the native way of binding objects in Emscripten: it is based on C++ templates and lets you do a great deal, passing all sorts of entities from C++ to JS and back.

Embind allows you to:

Call C++ functions from JavaScript
Create JS objects from a C++ class
Access browser APIs from C++ code (if for some reason you want to, you can, for example, write a front-end framework entirely in C++)
Most importantly for us, implement in JavaScript an interface described in C++

Data exchange

The last point matters, because this is exactly what you will be doing constantly when porting an application. So I would like to dwell on it in more detail. There will be some C++ code now, but do not be scared, it is almost like TypeScript :-D

The scheme is as follows:

On the C++ side there is a core that we want to give access to, say, the external network, to download video. It used to do this through native sockets and an HTTP client built on them, but there are no native sockets in WebAssembly. We have to get around this somehow, so we cut out the old HTTP client, put an interface in its place, and implement that interface in JavaScript using ordinary AJAX or any other means. Then we pass the resulting object back to C++, where the core will use it.

Let's make the simplest HTTP client, one that can only make GET requests:

 class HTTPClient { public: virtual std::string get(std::string url) = 0; };

It takes a string with the URL to download and returns a string with the result of the request. In C++ a string can hold binary data, so this works for video. Emscripten makes us write this rather scary wrapper:

Two things matter in it: the names of the functions on the C++ side (marked in green) and the corresponding names on the JavaScript side (marked in blue). Then we write the binding declaration:

It works like Lego bricks that we snap together. We have a class, the class has a method, and we want to inherit from this class to implement the interface. That is all. Now we go to JavaScript and inherit. There are two ways to do it. The first is extend, which is very similar to the good old extend from Backbone.

The module contains everything that Emscripten compiled, and it has a property with the exported interface. We call its extend method and pass in an object with the implementation, that is, a get function that fetches the data using AJAX.

extend returns an ordinary JavaScript constructor. We can call it as many times as we like and create as many objects as we need. But sometimes we have a single object and simply want to pass it to the C++ side.

To do that, the object has to be bound somehow to a type that C++ understands. This is what the implement function does. It returns not a constructor but a ready-to-use object, our client, which we can hand back to C++. For example, like this:

 var app = Module.makeApp(client, …)

Suppose we have a factory that creates our application and takes its dependencies as parameters, for example the client and something else. When this function runs, we get our application object, which already exposes the API we need. You can also go the other way:

 val client = val::global("client"); client.call<std::string>("get", val(...) );

Straight from C++ we take our client from the browser's global scope. In place of the client there could be any browser API, from the console to the DOM API or WebRTC, whatever you want. Then we call the methods this object has, wrapping all values in the magic val class that Emscripten provides.

Binding errors

That is basically it, but once you start developing, you run into binding errors. They look something like this:

Emscripten tries to help by explaining what went wrong. To sum it all up, you need to make sure that the following match (it is easy to make a typo and get a binding error):

Names
Types
Number of parameters

Embind syntax is unfamiliar not only to front-end developers but also to people who work with C++. It is a DSL in which it is easy to make a mistake, so you have to be careful. And speaking of interfaces: when you implement an interface in JavaScript, it must match exactly what you described in your contract.

We had an interesting case. My colleague Jura, who worked on the C++ side of the project, used extend to test his modules. They worked fine for him, so he committed them and handed them over to me. I used implement to integrate these modules into the JS project, and for me they stopped working. When we dug into it, it turned out that there was a typo in the function names in the bindings.

As the name suggests, extend extends the interface, so if you make a typo somewhere, extend will not report an error: it decides that you have simply added a new method, and everything is fine.

In other words, it hides binding errors until the method itself is called. I suggest using implement wherever it fits, since it checks the passed interface for correctness right away. If you do need extend, cover a call to every method with tests so that nothing slips through.
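
One cheap safety net when extend is unavoidable is to check the implementation object against the contract yourself before handing it over. The helper below is a sketch of that idea; assertImplements is a hypothetical function and not part of Emscripten, and the contract list simply repeats the single get method from the HTTP client example.

```javascript
// Hypothetical helper, not part of Emscripten: verify that an implementation
// object provides every method named in the contract before it is passed
// to extend(), so a typo fails immediately instead of at the first call.
function assertImplements(impl, methodNames) {
  const missing = methodNames.filter((name) => typeof impl[name] !== 'function');
  if (missing.length > 0) {
    throw new TypeError('Missing interface methods: ' + missing.join(', '));
  }
  return impl;
}

// The contract from the HTTP client example has a single method, `get`.
const contract = ['get'];

const goodClient = { get(url) { return 'response for ' + url; } };
const typoClient = { gte(url) { return 'response for ' + url; } };

assertImplements(goodClient, contract); // passes silently

try {
  assertImplements(typoClient, contract); // `gte` instead of `get`
} catch (err) {
  console.log(err.message);
}
```

The check only compares method names, so it does not replace tests that actually call each method, but it catches the kind of typo described above at the moment the object is created.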

Extend and ES6

Another problem with extend is that it does not support ES6 classes. When you inherit from an object created from an ES6 class, extend expects all of its properties to be enumerable, but with ES6 that is not the case: methods live on the prototype and have enumerable: false. I use this crutch, which walks over the prototype and sets enumerable: true:

 function enumerateProto(obj) { Object.getOwnPropertyNames(obj.prototype) .forEach(prop => Object.defineProperty(obj.prototype, prop, {enumerable: true}) ) }

I hope we can get rid of it someday, since the Emscripten community is discussing better ES6 support.
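
To see what the crutch changes, here is a small self-contained check. The Client class and the visibleKeys helper are made up for the demonstration; the snippet only shows how a for...in loop sees the methods before and after, which is the enumerability behaviour described above.

```javascript
// Same workaround as above: make every prototype property enumerable.
function enumerateProto(cls) {
  Object.getOwnPropertyNames(cls.prototype).forEach((prop) =>
    Object.defineProperty(cls.prototype, prop, { enumerable: true })
  );
}

// Hypothetical ES6 implementation of the HTTP client interface.
class Client {
  get(url) { return 'response for ' + url; }
}

// Collect property names the way a for...in loop sees them.
function visibleKeys(obj) {
  const keys = [];
  for (const key in obj) keys.push(key);
  return keys;
}

const before = visibleKeys(new Client());
enumerateProto(Client);
const after = visibleKeys(new Client());

console.log(before); // class methods are hidden from for...in
console.log(after);  // now lists the prototype members, including "get"
```

Note that the helper mutates the class prototype, so it affects every instance of the class, including ones created earlier.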

RAM

Speaking of C++, we cannot avoid the subject of memory. When we tested everything on SD-quality video, it worked perfectly. As soon as we ran the Full HD test: an out-of-memory error. No problem, there is the TOTAL_MEMORY option, which sets the initial amount of memory for the module. We set it to half a gigabyte and everything was fine, but that is rather unkind to users, because we reserve the memory for everyone, and not everyone has a subscription to Full HD content.

There is another option, ALLOW_MEMORY_GROWTH. It lets memory grow gradually as needed. It works like this: by default Emscripten gives the module 16 MB to work with. When you have used it all up, a new block of memory is allocated. All the old data is copied there, and you get the same amount of space again for new data. This continues until you reach 4 GB.

Suppose growth has given you 256 megabytes, but you know for certain, because you have calculated it, that 192 is enough for your application. Then the rest of the memory is wasted: you have allocated it, taken it away from the user, and do nothing with it. I would like to avoid that. There is a small trick: start with the initial memory increased by one and a half times. Then on the third growth step we reach 192 megabytes, which is exactly what we need. We have reduced memory consumption by that remainder and saved one extra allocation, and the further you go, the longer allocations take. That is why I recommend using both options together.
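
The arithmetic behind this trick is easy to check with a few lines of code. The sketch assumes that memory doubles on every growth step, as described above; growthPlan is a made-up helper name, and 24 MB is simply the default 16 MB multiplied by one and a half.

```javascript
// Assumes memory doubles on each growth step, as described above.
// Returns the sequence of sizes, the number of growth steps, and how much
// memory is left unused once the required amount has been reached.
function growthPlan(initialMb, neededMb) {
  const sizes = [initialMb];
  while (sizes[sizes.length - 1] < neededMb) {
    sizes.push(sizes[sizes.length - 1] * 2);
  }
  const finalMb = sizes[sizes.length - 1];
  return { sizes, steps: sizes.length - 1, wastedMb: finalMb - neededMb };
}

console.log(growthPlan(16, 192)); // sizes 16, 32, 64, 128, 256: 4 steps, 64 MB unused
console.log(growthPlan(24, 192)); // sizes 24, 48, 96, 192: 3 steps, 0 MB unused
```

Starting from 24 MB lands exactly on 192 MB one step earlier, which is where both savings come from.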

Dependency Injection

It would seem that is all, but the pitfalls kept coming. There is a problem with Dependency Injection. Let's write the simplest class that needs a dependency.

 class App { constructor(httpClient) { this.httpClient = httpClient } }

For example, we pass our HTTP client into the application and save it in a property of the class. It would seem that everything should work.

 Module.App.extend( "App", new App(client) )

We inherit from the C++ interface: first we create our object and pass it the dependency, then we inherit. At the moment of inheritance, Emscripten does something incredible to the object. The easiest way to think of it is that it kills the old object, creates a new one from its template, and drags all the public methods over. But the object's state is lost in the process, and you get an object that is not fully formed and does not work correctly. The fix is quite simple: use a constructor that runs after the inheritance stage.

 class App { __construct(httpClient) { this.httpClient = httpClient; this.__parent.__construct.call(this); } }

We do almost the same thing: we store the dependency in a field of the object, but this is now the object produced by inheritance. Do not forget to forward the constructor call to the parent object, which lives on the C++ side. The last line is the equivalent of super() in ES6. This is how inheritance looks in this case:

 const appConstr = Module.App.extend( "App", new App() ); const app = new appConstr(client);

First we inherit, then we create a new object and pass the dependency to it, and it works.

The pointer trick

Another problem is passing objects by pointer from C++ to JavaScript. We have already made an HTTP client, but to keep things simple we left out one important detail.

 std::string get(std::string url)

The method returns the value immediately, which means the request has to be synchronous. But AJAX requests are asynchronous by definition, so in real life the method will return either nothing or, say, a request ID. And so that there is someone to deliver the response to, we pass a listener as the second parameter, which carries a callback into C++.

 void get(std::string url, Listener listener)

In JS, it looks like this:

 function get(url, listener) { fetch(url).then((result) => { listener.onResult(result) }) }

We have a get function that accepts this listener object. We start downloading the file and attach a callback. When the file has been downloaded, we call the right function on the listener and pass it the result.

The plan looks good, but when the get function returns, all local variables are destroyed, and the function parameters with them. That is, the pointer is destroyed, and the Emscripten runtime destroys the object on the C++ side.

As a result, by the time we get to the line listener.onResult(result), the listener no longer exists, and accessing it causes a memory access error that crashes the application.

I would like to avoid this, and there is a solution, but it took several weeks to find it.

 function get(url, listener) { const listenerCopy = listener.clone(); fetch(url).then((result) => { listenerCopy.onResult(result); listenerCopy.delete(); }); }

It turns out there is a method for cloning the pointer. For some reason it is not documented, but it works fine and increments the reference count in the Emscripten pointer. That lets us keep it in the closure, so when the callback runs, the listener is still reachable through this pointer and we can work with it as needed.

The most important thing is to remember to delete this pointer, otherwise you get a memory leak, which is very bad.
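
The lifetime rules can be modelled without Emscripten at all. The sketch below uses FakeHandle, a hypothetical class that only imitates the reference counting described above (it is not the real embind handle), and a pending array in place of the network, so the order of events is explicit.

```javascript
// Hypothetical model of a handle: the underlying object lives until
// every JavaScript handle that points to it has been deleted.
class FakeHandle {
  constructor(shared) {
    this.shared = shared || { refs: 0, destroyed: false, results: [] };
    this.shared.refs += 1;
    this.deleted = false;
  }
  clone() { return new FakeHandle(this.shared); }
  delete() {
    if (this.deleted) throw new Error('handle already deleted');
    this.deleted = true;
    this.shared.refs -= 1;
    if (this.shared.refs === 0) this.shared.destroyed = true;
  }
  onResult(result) {
    if (this.deleted || this.shared.destroyed) throw new Error('use after free');
    this.shared.results.push(result);
  }
}

const pending = []; // stands in for callbacks waiting on the network

function get(url, listener) {
  const listenerCopy = listener.clone(); // keep the object alive for the callback
  pending.push(() => {
    listenerCopy.onResult('data from ' + url);
    listenerCopy.delete(); // release our reference, otherwise it leaks
  });
}

const listener = new FakeHandle();
get('/video/segment1', listener);
listener.delete();             // models the runtime dropping the original handle
pending.forEach((cb) => cb()); // the "response" arrives later
console.log(listener.shared.results, listener.shared.destroyed);
```

Without the clone() call the callback would hit the "use after free" branch, and without the final delete() the reference count would never reach zero, which is the leak mentioned above.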

Quick write to memory

When we download video, we are dealing with relatively large amounts of data, and we would like to cut down on copying it back and forth to save both memory and time. There is a trick for writing a large amount of data directly into WebAssembly memory from JavaScript.

 var newData = new Uint8Array(…); var size = newData.byteLength; var ptr = Module._malloc(size); var memory = new Uint8Array( Module.buffer, ptr, size ); memory.set(newData);

newData is our data as a typed array. We take its length and ask the WebAssembly module to allocate that much memory. The malloc function returns a pointer, which is simply an index into the array that holds all of the module's memory. From the JavaScript side it looks like a plain ArrayBuffer.

Next we open a window of the required size into this ArrayBuffer at that position and copy our data there. Even though the set operation has copy semantics, when I looked at this spot in the profiler I did not see a long-running process. I think the browser optimizes this operation with move semantics, that is, it transfers ownership of the memory from one object to another.

In our application we also rely on move semantics to avoid copying memory.
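
The same steps can be tried outside a real module. In the sketch below fakeModule is a hypothetical stand-in: a plain ArrayBuffer plays the role of WebAssembly memory and a trivial bump allocator plays the role of _malloc. Only the view-and-set part corresponds to what the real code does; writeToModule is a made-up helper name.

```javascript
// Hypothetical stand-in for the compiled module: a plain ArrayBuffer plays
// the role of WebAssembly memory and a bump allocator plays the role of _malloc.
const fakeModule = {
  buffer: new ArrayBuffer(1024),
  nextFree: 8,
  _malloc(size) {
    const ptr = this.nextFree;
    this.nextFree += Math.ceil(size / 8) * 8; // keep allocations 8-byte aligned
    return ptr;
  },
};

// Copy a Uint8Array into module memory and return the pointer to it.
function writeToModule(module, data) {
  const ptr = module._malloc(data.byteLength);
  // A view over existing memory: no new buffer is allocated here.
  const view = new Uint8Array(module.buffer, ptr, data.byteLength);
  view.set(data); // one bulk copy straight into the module memory
  return ptr;
}

const chunk = new Uint8Array([1, 2, 3, 4, 5]);
const ptr = writeToModule(fakeModule, chunk);
const stored = new Uint8Array(fakeModule.buffer, ptr, chunk.byteLength);
console.log(ptr, Array.from(stored));
```

One thing to keep in mind with a real module: when memory growth is enabled, the underlying buffer may be replaced as memory grows, so it is safer to create the view after the allocation and not to cache it between calls.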

Adblock

An interesting problem, just for a change, involves Adblock. It turns out that all the popular blockers in Russia subscribe to the RU AdList filter list, and it contains a wonderful rule that forbids loading WebAssembly from third-party sites. For example, from a CDN.

One solution is not to use a CDN and to host everything on your own domain (which does not suit us). Another is to rename the .wasm file so that it no longer matches the rule. You can also go to these comrades' forum and try to persuade them to remove the rule. I think their justification is that this is how they fight miners, although I do not know why miners would not think of renaming the file.

Production

In the end we went to production. Yes, it was not easy, it took 8 months, and it is fair to ask whether it was worth it. In my opinion, it was:

No need to install

Our code now reaches the user without any program having to be installed. When we had a browser plugin, the user had to download and install it, and that is a huge barrier to spreading a technology. Now the user just watches video on the site and does not even realize that a whole machine is working under the hood, and that things are complicated in there. The browser simply downloads one more file with code, like an image or a .css file.

Single code base and debugging across platforms

At the same time we managed to keep our single code base. We can run the same code on different platforms, and more than once bugs that were invisible on one platform showed up on another. So we can uncover hidden bugs with different tools on different platforms.

Quick release

We got fast releases, since we can ship like an ordinary web application and update the C++ code with every new release. That is nothing like releasing new plugins, a mobile app or a Smart TV app. The release depends only on us: it goes out when we want it to.

Fast feedback

And that means fast feedback: if something goes wrong, we can learn about the problem within a day and respond to it.

I believe all those problems were worth these advantages. Not everyone has a C++ application, but if you have one and want it in the browser, WebAssembly is a one hundred percent use case for you.

Where to apply

Not everyone writes C++. But C++ is not the only language available for WebAssembly. Yes, it is historically the very first one, available back in asm.js, an earlier Mozilla technology. That, by the way, is why it has fairly good tools: they are older than WebAssembly itself.

Rust

Rust, a newer language also developed by Mozilla, is now catching up with C++ and overtaking it in tooling. Everything points to them building the coolest development process for WebAssembly.

Lua, Perl, Python, PHP, etc.

Almost all interpreted languages are already available in WebAssembly: their interpreters are written in C or C++, so they were simply compiled to WebAssembly, and now you can run PHP in the browser.

Go

Version 1.11 added beta support for compiling to WebAssembly, and release-level support is promised for 2.0. Support arrived later than for other languages because WebAssembly has no garbage collector, and Go is a managed-memory language. So they had to bring their own garbage collector to WebAssembly.

Kotlin / Native

The story with Kotlin is about the same. Its compiler has experimental support, but they too have to do something about the garbage collector. I do not know what the current status is.

3D graphics

What else can you think of? The first thing that comes to mind is 3D applications. Indeed, historically asm.js and WebAssembly began with porting games to the browser. So it is no surprise that all the popular engines can now export to WebAssembly.

Processing data locally

You can also process user data right in the user's browser, on their own computer: take an uploaded image or a camera frame, record sound, process video. Or read an archive the user has selected, or build one yourself from a batch of files and upload it to the server in a single request.

Neural networks

This picture shows almost all neural network architectures. Indeed, you can take your neural network, train it, and ship it to the client so that it processes a live stream from a camera or microphone. Or, for example, track the user's mouse movements and implement gesture control, or face recognition. The possibilities are almost endless.

For example, the piece of Google Chrome responsible for detecting the language of a text is already available as a WebAssembly library. You can add it as an npm module and that is it: you are using Wasm but working with ordinary JS. You do not deal with neural networks, C++ or anything else; everything works out of the box.

There is the popular Hunspell spell-checking library: just install it and use it as a Wasm module.

Cryptography

The first rule of cryptography is "do not write your own cryptography". If you want to sign user data, encrypt something and send it to the server in that form, generate strong passwords, or you need GOST, use OpenSSL. There are already instructions on how to compile it for WebAssembly. OpenSSL is robust code, proven in thousands of applications; there is no need to invent anything.

Offloading computation from the server

A cool use case can be found at wotinspector.com. It is a service for World of Tanks players. You can upload your replay, analyze it, collect statistics on the game, draw a nice map; all in all, a very useful service for professional players.

There is one problem: analyzing such a replay takes a lot of resources. If it happened on the server, this would probably be a closed paid service, not available to everyone. But the author of the service, Andrei Karpushin, wrote the business logic in C++ and compiled it to WebAssembly, and now users can run the processing right in their browser (and send the results to the server so that other users can access them).

This is an interesting case from the point of view of monetizing a site. Instead of taking money from users, we use the resources of their computers. It resembles monetization with a miner. But unlike a miner, which simply burns users' electricity and brings the site's authors pennies in return, we build a service that does work the user actually needs. In other words, the user agrees to share resources with us. That is why this scheme works.

Libraries

The world also has a great many libraries written in C and C++ over their long history. Take FFmpeg, the leader in video processing. Many people use video processing programs with FFmpeg inside. It can be run in the browser to encode video. It will be long and slow, yes, but if you are building a service that generates avatars or three-second clips, the browser's resources will be enough.

The same with audio - you can record in a compressed format and send already small files to the server. And the OpenCV library is the leader in machine vision, available in WebAssembly, you can do face recognition and gesture control. You can work with PDF. You can use a SQLite file database that supports true SQL. SQLite porting under WebAssembly was made by Emscripten, he probably tested the compiler on it.

Node.js

The browser is not the only one to benefit from WebAssembly; Node.js can use it as well. Probably everyone knows Sass, the CSS preprocessor. It was written in Ruby and then rewritten in C++ for speed (the libsass project). But nobody wants to run a separate program to process sources; you want to integrate it into the bundle build with Webpack, and for that you need a Node.js module. The node-sass project solves this problem: it is a JS wrapper around the library.

The library is native, which means it has to be compiled for the platform on which the user will run it. And that brings us to the version matrix. These columns need to be multiplied:

As a result, a single release of node-sass requires about 100 compilations, one for each combination from the table. All of this then has to be stored, and that is dozens of megabytes of files for every release, even a minor one. WebAssembly solves this problem by collapsing the entire table into one file, because a WebAssembly executable does not depend on the platform.

It is enough to compile the code once and ship a single file to all platforms, regardless of the architecture or Node version. Such a project already exists: the port to WebAssembly is under way in the libsass-asm project. The work started recently, and the project needs contributors. It is a great chance to practice WebAssembly on a real project...

Application acceleration

There is a popular application called Figma, a graphics editor for web designers. It is to some extent an analogue of Sketch that works on all platforms because it runs in the browser. It is written in C++ (which few people know) and originally used asm.js. The application is very large, so it was slow to start.

When WebAssembly appeared, the developers recompiled their sources, and the application started 3 times faster. That is a major improvement for an editor, which should be ready for work as quickly as possible.

Another familiar application, Visual Studio Code, runs in Electron yet uses native modules for the most critical sections of code, so it has the same problem with a huge number of builds as node-sass. The developers probably control the Node version, but to support different operating systems and architectures they still have to rebuild these modules. So I am sure the day when they also move to WebAssembly is not far off.

Porting applications to the browser

But the coolest example of porting a code base is AutoCAD. The software is 30 years old, it is written in C++, and the code base is huge. The product is very popular among designers whose habits were formed long ago, so to bring it to the browser the development team would have had to port all the accumulated business logic to JavaScript, which made the idea almost hopeless. But now, thanks to WebAssembly, AutoCAD is available as a web service where you can register and start using it in 5 minutes.

There is a cool demo made by Fabrice Bellard, a unique programmer in my opinion: he has created many popular projects, while an ordinary programmer creates perhaps one in a lifetime. I mentioned FFmpeg: that is his project. Another of his creations is QEMU. Perhaps few people have heard of it, but the KVM virtualization system, certainly a leader in its field, is built on it.

Bellard has maintained a browser port of QEMU since 2011. It means you can run any system in the emulator right in your browser. For example, Linux with a console: a real Linux kernel running in the browser, without a server or any additional connection.

You can turn off the Internet and it will keep working. There is bash, and you can do everything you do in normal Linux. There is another demo, with a GUI, where you can even launch a real browser. Unfortunately, the demo has no network, so you won't be able to open the demo inside itself...

And to convince you completely, I will show you something incredible. This is Windows 2000, the very same one from 18 years ago, only now it runs in your browser. It used to need a whole computer, and now Chrome (or Firefox) is quite enough.

As you can see, there are plenty of uses for WebAssembly. I have listed only what I found myself; you will have new ideas, and you will be able to implement them.

How to adopt it in your own project

I want to give some tips to those who want to port their application to WebAssembly. The first thing to start with is, of course, the team. The minimum team is two people: one from the native side and one front-end developer.

C++ application programmers are often not very well versed in web technologies. So our task as front-end developers, if we find ourselves on such a project, is to take on that part of the work. But the ideal team consists of people who are interested not only in their own platform but also want to understand the one on the other side of the compiler.

Fortunately, that is how it turned out on our project. My colleague Yura, a great C++ specialist, had long wanted to learn JavaScript, and Flanagan's book helped him a lot. I took a small volume of Stroustrup and, with Yura's help, began to delve into the basics of C++. As a result, during the project we told each other a lot about our main languages and found that JS and C++ have surprisingly much in common, however strange that may seem.

If you put together a team like that, it will be perfect.

CI Pipeline

What did our daily development process look like? We moved all JS artifacts into a separate repository to make it easier to configure the build with Webpack. When the native code changes, we pull the changes and compile them (sometimes this takes the most time), and the compilation result is copied into the JS project. There it is picked up by Webpack in watch mode, which builds the bundle, and we can launch the application in the browser or run the tests.

Debugging

Debugging matters, of course, during development. Unfortunately, things are not great here.

You have to enable DevTools experiments in Chrome, and then a folder with wasm modules appears on the Sources tab. Breakpoints work (we can stop the browser at a given place), but unfortunately the code is shown in its textual assembly representation.

Although our architect Kolya, when he first saw this picture, ran his eyes over the listing and said: “Look, it's a stack machine, here we work with memory, here is arithmetic, everything is clear!” But Kolya knows how to write for embedded systems, and we don't, so we would like some explicit link to the source code.

There is a small trick: at the maximum debug level, -g4, additional comments appear in the wast file, and it looks like this.

You need an editor that can open a 100-megabyte file (we chose FAR). The numbers are the module numbers we have already seen in the Chrome console. E:/_work/bfg/bytefrog/... is a link to the source code. You can live with this, but I would like to see real C++ code right in the browser debugger. And that sounds like a job for SourceMap!

Sourcemap

Unfortunately, source maps have their problems.

They work only in Firefox.
The --source-map-base http://localhost option tells the compiler to generate a SourceMap and gives the address of the web server where the sources will be hosted.
Source code is accessed over HTTP.
Paths to source files must be relative.
On Windows, there is a problem with ":" in paths: every path is truncated at the colon.

The last two points hit us. CMake converts all paths in the build to absolute form, so the files cannot be found by URL on the web server. We solved it like this: we preprocess the wast file, reducing all paths to relative form and removing the colon along the way. I think you will not run into this.
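The path-rewriting step can be sketched roughly as follows. This is not our actual preprocessing script, only a minimal illustration of the idea: the function name, the project root, and the sample paths are all made up for the example, and it assumes the root is a plain prefix of the path.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: turns an absolute source path, as a build system
// might emit it, into a path relative to `root`, with no drive-letter colon.
std::string to_relative_source_path(std::string path, std::string root) {
    auto normalize = [](std::string& s) {
        for (char& c : s) {
            if (c == '\\') c = '/';  // unify Windows and POSIX separators
        }
    };
    normalize(path);
    normalize(root);
    if (!root.empty() && root.back() != '/') root += '/';

    // Strip the project root when the path starts with it.
    if (path.compare(0, root.size(), root) == 0) {
        path.erase(0, root.size());
    }
    // Drop a leftover Windows drive prefix such as "C:" so no colon remains.
    if (path.size() >= 2 && path[1] == ':') {
        path.erase(0, 2);
    }
    // A relative path must not start with a slash.
    while (!path.empty() && path.front() == '/') {
        path.erase(0, 1);
    }
    assert(path.empty() || path.front() != '/');
    return path;
}
```

With a made-up root of C:/project, the path C:/project/src/main.cpp becomes src/main.cpp, which a web server can then serve relative to the source-map base URL.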

As a result, it looks like this:

C++ code in the browser debugger. Now we have seen everything! On the left is the source tree, there are breakpoints, and we see the stack trace that led to this point. Unfortunately, if we click any wasm call in the stack trace, we drop into the assembly view. It is an annoying bug that I think will be fixed.

Unfortunately, another issue will not be fixed: SourceMap fundamentally does not support mapping variables. We see that local variables have lost not only their names but also their types. Their values are shown as signed integers, and we cannot tell what was really there.

But we can tie them to a specific place in the assembly using the generated name “var0”.

Of course, I would like to simply hover over a variable name and see its value. Perhaps in the future a new SourceMap format will be invented that maps not only code but also variables.

Profiler

You can also look at the profiler. It works in both Chrome and Firefox. Firefox does it better: it demangles the names, so they appear as they are in the source code.

Chrome shows them in encoded form (for those in the know, these are mangled function names), but if you squint, you can tell what they refer to.
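If squinting is not enough, a mangled name copied from the profiler can be decoded offline. A minimal sketch, assuming the names follow the Itanium C++ ABI scheme used by Clang and that a GCC or Clang toolchain is available (abi::__cxa_demangle comes from their <cxxabi.h>; MSVC does not provide it). The demangle wrapper is a hypothetical helper for this example.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Returns the human-readable form of an Itanium-ABI mangled name,
// or the input unchanged when it is not a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so free() releases it.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        std::free);
    return (status == 0 && text) ? std::string(text.get()) : mangled;
}
```

For example, demangle("_Z3foov") gives "foo()". Depending on the compiler version, names in a wasm build may carry an extra leading underscore that has to be removed first.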

Performance

Let's talk about performance. It is a complex and multifaceted topic, and here is why:

Runtime. Performance measurements depend on the runtime you use. Measurements for C++ will differ from measurements for Rust or Go.
Losses at the JS-Wasm boundary. Measuring pure math makes little sense, because performance is lost where calls cross the boundary between JS and Wasm. The more calls you make back and forth and the more objects you move across, the lower the speed. Browser vendors are working on this problem, and the situation is gradually improving.
The technology is evolving. Measurements made today will be meaningless tomorrow, let alone in a couple of months.
Wasm speeds up application startup. Wasm does not promise to speed up your code or to replace JS. The WebAssembly team is focused on speeding up the launch of large application code bases.
In synthetic tests, you get JS-level speed.

We made a simple test: graphic filters for the image.

wasp_cpp_bench
Chrome 65.0.3325.181 (64-bit)
Core i5-4690
24 GB RAM
5 measurements; max and min discarded; the rest averaged
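The averaging rule from the last line can be written down in a few lines. This is a sketch of the method only, not the benchmark's real harness; trimmed_mean and benchmark_ms are names invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <numeric>
#include <utility>
#include <vector>

// Averages timing samples after dropping the single largest and the
// single smallest value.
double trimmed_mean(std::vector<double> samples) {
    assert(samples.size() >= 3 && "need at least three samples");
    std::sort(samples.begin(), samples.end());
    const double sum =
        std::accumulate(samples.begin() + 1, samples.end() - 1, 0.0);
    return sum / static_cast<double>(samples.size() - 2);
}

// Runs `run` several times and returns the trimmed mean in milliseconds.
template <typename Fn>
double benchmark_ms(Fn&& run, int repetitions = 5) {
    assert(repetitions >= 3);
    std::vector<double> samples;
    for (int i = 0; i < repetitions; ++i) {
        const auto start = std::chrono::steady_clock::now();
        run();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return trimmed_mean(std::move(samples));
}
```

Dropping the extremes keeps a single slow run (a garbage collection pause or a background process, for instance) from skewing the average of such a small sample.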

Here are the results. Everything is normalized to the execution time of the equivalent filter in JS: the yellow bar, which is exactly one in every case.

C++ compiled without optimization behaves strangely, as the Grayscale filter shows. Even our C++ developers could not explain why. But with optimization turned on (the green bar), the time almost matches JS. And, looking ahead, we get similar results when we compile the same C++ as a native application.
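For readers who have not written such a filter, here is roughly what a grayscale pass over an RGBA buffer looks like. It is an illustrative sketch, not the code from our benchmark; the function name and the integer luma weights are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an RGBA8 buffer to grayscale in place; alpha is left untouched.
// The integer weights 77/150/29 approximate the BT.601 luma coefficients
// 0.299/0.587/0.114 and sum to 256, so a shift replaces the division.
void grayscale_rgba(std::uint8_t* pixels, std::size_t byte_count) {
    assert(byte_count % 4 == 0 && "expects whole RGBA pixels");
    for (std::size_t i = 0; i < byte_count; i += 4) {
        const unsigned luma =
            (77u * pixels[i] + 150u * pixels[i + 1] + 29u * pixels[i + 2]) >> 8;
        const auto gray = static_cast<std::uint8_t>(luma);
        pixels[i] = gray;
        pixels[i + 1] = gray;
        pixels[i + 2] = gray;
    }
}

// Convenience overload for an image held in a vector.
void grayscale_rgba(std::vector<std::uint8_t>& image) {
    grayscale_rgba(image.data(), image.size());
}
```

The function takes the whole buffer in one call, so the JS-Wasm boundary is crossed once per image rather than once per pixel.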

Crash and error collection

We use Sentry, and there is a problem with it: wasm frames disappear from stack traces. It turned out that the TraceKit library used by Raven, the Sentry client, contains a regular expression that simply does not know wasm exists. We made a patch and will probably send a pull request soon; for now we apply it during npm install of our JS project.
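To show what "knowing that wasm exists" means for a stack-trace parser, here is the idea sketched in C++ with std::regex. It is not the TraceKit patch (TraceKit is JavaScript). The frame shape in the comment is an assumption based on how browsers print wasm frames, and the sample function names are made up.

```cpp
#include <cassert>
#include <regex>
#include <string>

struct WasmFrame {
    std::string function;     // symbol name as printed in the trace
    unsigned long index = 0;  // function index inside the wasm module
};

// Extracts the wasm part of one stack-trace line; returns false when the
// line is not a wasm frame.
bool parse_wasm_frame(const std::string& line, WasmFrame& out) {
    // Assumed shape: "at <name> (<optional url>:wasm-function[<index>]:<offset>)"
    static const std::regex pattern(
        R"(at\s+(.*?)\s*\((?:.*:)?wasm-function\[(\d+)\]:(?:0x[0-9a-fA-F]+|\d+)\))");
    std::smatch match;
    if (!std::regex_search(line, match, pattern)) {
        return false;
    }
    out.function = match[1].str();
    out.index = std::stoul(match[2].str());
    return true;
}
```

A parser whose pattern expects only "file:line:column" locations drops such lines, and that is how the wasm frames get lost.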

In Sentry, the result looks like this. This is the production build: there are no function names, only module numbers. And this is what a debug build looks like; here you can already figure out what went wrong:

Summary

WebAssembly can already be used in production, and our project has proven it.
Porting even a large application is feasible. It took us 8 months, the lion's share of which went into refactoring our C++ application to mark out boundaries, interfaces, and so on.
The tooling is still weak, but work is under way, because WebAssembly really is the future of the web.
The speed is at JS level. Modern JS engines optimize code so well that it simply “falls through” into machine instructions and runs as fast as your processor can.

If you take it on, I recommend:

Take Emscripten and Embind. They are good, working technologies.
If you need something unusual in Emscripten, look at its tests. Documentation exists, but it does not cover everything, whereas the test file contains 3000 lines covering all kinds of Emscripten usage.
Sentry is suitable for collecting errors.
Debug in Firefox.
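To make the first recommendation concrete, here is a minimal Embind sketch. The greet function and the module name demo_module are hypothetical, chosen only for this example; EMSCRIPTEN_BINDINGS and emscripten::function are Embind's own API. The #ifdef guards are an addition that lets the same file build as an ordinary native program too.

```cpp
#include <cassert>
#include <string>

#ifdef __EMSCRIPTEN__
#include <emscripten/bind.h>
#endif

// Hypothetical function we want to call from JavaScript.
std::string greet(const std::string& name) {
    return "Hello, " + name + "!";
}

#ifdef __EMSCRIPTEN__
// Exposes greet() on the generated JS Module object under the same name.
EMSCRIPTEN_BINDINGS(demo_module) {
    emscripten::function("greet", &greet);
}
#endif
```

Built with something like emcc --bind greet.cpp -o greet.js (check the flag against the documentation of your Emscripten version), the function can be called from JS as Module.greet("wasm"). Embind converts std::string to and from a JS string on its own.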

Thank you for your attention! I am ready to answer your questions.


Source: https://habr.com/ru/post/441140/
