Porting a C/C++ library to JavaScript (xml.js)

This article is an expanded translation of “HOWTO: Port a C/C++ Library to JavaScript (xml.js)” (by azakai). The author of the original has considerable experience porting C/C++ libraries to JavaScript; his ports include lzma.js and sql.js. In his article he describes the general approach to porting C/C++ code, using libxml, an open-source library for validating XML, as the example.

In addition, this article contains the complete sequence of steps that were required to port libxml on Ubuntu 12.04, including setting up the environment and emscripten.

Installing and Configuring Emscripten

Emscripten is a compiler from LLVM bytecode to JavaScript. C/C++ code can be compiled to LLVM bytecode with the clang compiler, and some other languages also have compilers that target LLVM bytecode. Emscripten generates the corresponding JavaScript code, which can be executed by any JavaScript interpreter, for example a modern browser. Using emscripten, developers at Mozilla recently ported Doom.

Emscripten provides:

emconfigure - a utility that sets up the environment and then runs ./configure;
emmake - a utility that sets up the environment and then runs make;
emcc - the compiler that produces JavaScript (a replacement for gcc or clang).

So, let's set up the environment for working with emscripten (see the manual).

Install clang + llvm (>= 3.0):

wget llvm.org/releases/3.0/clang+llvm-3.0-i386-linux-Ubuntu-11_10.tar.gz
tar xfv clang+llvm-3.0-i386-linux-Ubuntu-11_10.tar.gz

Install node.js (>= 0.5.5):

sudo apt-get install nodejs

Clone the current version of emscripten:

git clone git://github.com/kripken/emscripten.git
cd emscripten

Check that clang works:

../clang+llvm-3.0-i386-linux-Ubuntu-11_10/bin/clang tests/hello_world.cpp
./a.out
>> hello, world!

Check that node.js works:

node tests/hello_world.js
>> hello, world!

Run emcc for the first time to create the configuration file '~/.emscripten':

./emcc

In the configuration file, specify the clang + llvm directory as well as the emscripten installation directory:

EMSCRIPTEN_ROOT = os.path.expanduser('~/path/emscripten') # this helps projects using emscripten find it
LLVM_ROOT = os.path.expanduser('~/path/clang+llvm-3.0-i386-linux-Ubuntu-11_10/bin')

Run emcc again to make sure that it is configured correctly. If it is, it prints the message 'emcc: no input files':

./emcc
>> emcc: no input files

Now you can verify that everything works by compiling hello_world.cpp with emcc:

./emcc tests/hello_world.cpp
node a.out.js
>> hello, world

Part 1: Compiling C/C++ sources

Before you start porting, make sure that the project's source code compiles without errors with a native C/C++ compiler.

Clone libxml from the repository and compile it:

git clone git://git.gnome.org/libxml2
cd libxml2
git checkout v2.7.8
CC=~/path/clang+llvm-3.0-i386-linux-Ubuntu-11_10/bin/clang ./autogen.sh --without-debug --without-ftp --without-http --without-python --without-regexps --without-threads --without-modules
make

libxml includes a console utility, xmllint, that validates XML against a schema. It can be used to check that the compiled code works. Such checks are necessary, among other things, to make sure that the original and the ported versions behave the same. Testing with xmllint looks like this:

./xmllint --noout --schema test.xsd test.xml
>> test.xml validates

If this works, make a few changes to test.xml that violate the schema and run the command again: xmllint should now print an error message.

Part 2: Configuration

To configure the project for compilation with emscripten, use the command:

~/path/emscripten/emconfigure ./autogen.sh --without-debug --without-ftp --without-http --without-python --without-regexps --without-threads --without-modules

emconfigure sets environment variables so that ./configure uses the emcc compiler instead of gcc or clang. It sets up the environment so that ./configure works correctly, including its configuration tests (which compile native code).
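To make the idea of "setting environment variables and then running the command" concrete, here is a minimal Node.js sketch of such a wrapper. It is not emscripten's actual implementation: the helper name runWithEmscriptenEnv and the exact set of variables (CC, CXX, AR and the tool names em++ and emar) are assumptions chosen for illustration.

```javascript
// Conceptual sketch of an emconfigure/emmake-style wrapper.
// This is NOT emscripten's real code; names below are illustrative assumptions.
const { spawnSync } = require('child_process');

// Hypothetical helper: run `command` with compiler-related environment
// variables pointing at emscripten's tools instead of gcc or clang.
function runWithEmscriptenEnv(emscriptenRoot, command, args) {
  const env = Object.assign({}, process.env, {
    CC: emscriptenRoot + '/emcc',   // C compiler picked up by ./configure and make
    CXX: emscriptenRoot + '/em++',  // C++ compiler (assumed tool name)
    AR: emscriptenRoot + '/emar'    // static library archiver (assumed tool name)
  });
  const result = spawnSync(command, args, { env: env, encoding: 'utf8' });
  return { status: result.status, stdout: result.stdout, stderr: result.stderr };
}

// Example call (commented out because it needs a real source tree):
// runWithEmscriptenEnv('/home/user/path/emscripten', './configure', ['--static']);
```

The wrapped command itself is unchanged; only the environment it inherits is different, which is why an unmodified ./configure script ends up calling emcc.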

Configured with default settings (no flags), libxml includes a lot of functionality that is not needed at this stage, for example HTTP and FTP support. We only want to validate XML against a schema, so the project should be configured without the unnecessary features. In general, excluding unneeded functionality is a good idea when porting: the resulting code is smaller, which matters when it is delivered over the network. In addition, some header files may require manual editing, because newlib is used instead of glibc.

Part 3: Building the project

The build is run with the command:

~/path/emscripten/emmake make

emmake is similar to emconfigure: it also sets environment variables. With emmake, the build produces LLVM bytecode instead of native code. This avoids generating JavaScript for each object file and then linking the JavaScript; instead, linking is done by the LLVM bytecode linker.

The build produces many files, but none of them can be executed yet. As mentioned above, they contain LLVM bytecode (it can be viewed with BC), so one more step is needed.

Part 4: Conversion to JavaScript

xmllint depends on xmllint.o and libxml2.a. The LLVM linker does not support dynamic linking (late binding), and emcc ignores it. Therefore the libxml2.a static library has to be specified for linking manually.

A less obvious dependency is libz (an open-source compression library). If you build without libz.a, a runtime error occurs when the code tries to call the “gzopen” function. So libz.a has to be built as well:

cd ~/path
wget zlib.net/zlib-1.2.7.tar.gz
tar xfv zlib-1.2.7.tar.gz
cd zlib-1.2.7
~/path/emscripten/emconfigure ./configure --static
~/path/emscripten/emmake make

Now you can compile to JavaScript:

cd ~/path/libxml2
~/path/emscripten/emcc -O2 xmllint.o .libs/libxml2.a ../zlib-1.2.7/libz.a -o xmllint.test.js --embed-file test.xml --embed-file test.xsd

Where:

emcc - the replacement for gcc or clang (see above);
-O2 - the optimization flag. It enables LLVM-level and advanced JavaScript-level optimizations, including Closure Compiler (in advanced mode);
xmllint.o, .libs/libxml2.a, ../zlib-1.2.7/libz.a - the files to build;
-o - the resulting file, xmllint.test.js. The “js” suffix tells emcc which format to generate, in this case JavaScript;
--embed-file - instructs emcc to include the contents of the specified file in the generated code and to set up the virtual file system so that the file is accessible through standard stdio calls (fopen, fread, etc.). This is the easiest way to access files from compiled code.
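As a rough mental model of what embedding a file means, the following sketch keeps file contents in memory and serves them by path. It is only a conceptual illustration, not emscripten's real file system code; embedFile and readEmbeddedFile are hypothetical names.

```javascript
// Conceptual model of embedded files (NOT emscripten's real file system code).
const embeddedFiles = new Map();  // path -> array of byte values

// Hypothetical helper: store a file's contents, as a build step might do.
function embedFile(path, text) {
  embeddedFiles.set(path, Array.from(Buffer.from(text, 'utf8')));
}

// Hypothetical stand-in for what fopen + fread would hand back to compiled code.
function readEmbeddedFile(path) {
  if (!embeddedFiles.has(path)) {
    throw new Error('No such file: ' + path);
  }
  return Buffer.from(embeddedFiles.get(path)).toString('utf8');
}

embedFile('/test.xml', '<root/>');
console.log(readEmbeddedFile('/test.xml'));
```

The point of the model is that the compiled program never touches the real disk: the data travels inside the generated script, which is why the script grows with every embedded file.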

Part 5: Testing JavaScript

Any JavaScript shell, such as the ones provided by Node.js, SpiderMonkey, or V8, can be used to run this code:

node xmllint.test.js --noout --schema test.xsd test.xml
>> test.xml validates

The result should be exactly the same as with the native code. Likewise, if you introduce errors into the XML or the schema, xmllint should detect them.
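The comparison between the native and the JavaScript build can be automated. The sketch below runs two commands with the same arguments and checks that they print the same text and exit with the same status; capture and sameBehaviour are hypothetical helper names, and treating the exit status as part of "the same result" is an assumption.

```javascript
const { spawnSync } = require('child_process');

// Run a command and collect everything it printed plus its exit status.
function capture(command, args) {
  const r = spawnSync(command, args, { encoding: 'utf8' });
  if (r.error) throw r.error;
  return { status: r.status, output: r.stdout + r.stderr };
}

// Hypothetical helper: true when both commands, given the same extra
// arguments, print the same text and exit with the same status.
function sameBehaviour(a, b, sharedArgs) {
  const ra = capture(a.command, a.args.concat(sharedArgs));
  const rb = capture(b.command, b.args.concat(sharedArgs));
  return ra.status === rb.status && ra.output === rb.output;
}

// Example call (commented out because it needs both builds to exist):
// sameBehaviour(
//   { command: './xmllint', args: [] },
//   { command: 'node', args: ['xmllint.test.js'] },
//   ['--noout', '--schema', 'test.xsd', 'test.xml']);
```

Running this check for both a valid and a deliberately broken test.xml covers the success and the failure path.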

Important: all arguments used for the native and the JavaScript builds must be exactly identical.

Part 6: Refactoring and reuse

At this point the script has the two test files built into it. What we need instead is a generic function that validates any XML file against any schema. This is easy to do, although the fact that the code is optimized with the Closure Compiler adds some work.

The first thing to do is to call emcc with the --pre-js option. It adds JavaScript code before the generated code (--post-js, correspondingly, adds it after). The important point is that --pre-js adds the code before optimization is done. This means the added code is optimized together with the generated code, which is necessary for the optimization to be correct. On the other hand, the Closure Compiler can discard functions we need if it considers them unused.

Here is the script (pre.js) that needs to be included with the --pre-js option:

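  // pre.js: included before the generated code and optimized together with it.
  Module['preRun'] = function () {
    // Create /test.xml from the XML text passed in by the caller.
    FS.createDataFile(
      '/',
      'test.xml',
      Module['intArrayFromString'](Module['xml']),
      true,
      true);
    // Create /test.xsd from the schema text passed in by the caller.
    FS.createDataFile(
      '/',
      'test.xsd',
      Module['intArrayFromString'](Module['schema']),
      true,
      true);
  };
  // Same command-line arguments as in the earlier xmllint tests.
  Module['arguments'] = ['--noout', '--schema', 'test.xsd', 'test.xml'];
  // Collect everything the program prints in a buffer.
  Module['return'] = '';
  Module['print'] = function (text) {
    Module['return'] += text + '\n';
  };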

Let's go through this script:

Module is the object through which the code generated by emscripten interacts with other JavaScript code.
It is important to use string names to access the module, for example Module['name'] instead of Module.name. In that case Closure leaves the name unchanged.
First, set Module.preRun, which runs right before the generated code (but after the environment is set up). The preRun function creates two files using the file system API (Emscripten FileSystem API). For simplicity, the same file names are used as in the previous tests (test.xml and test.xsd). The contents of these files are taken from Module['xml'] and Module['schema']. These fields must contain the XML and the XML Schema. The strings are converted to arrays with intArrayFromString.
Set Module.arguments, the equivalent of the argument list of the console command. The arguments must be exactly the ones used earlier in testing. The only difference is that the test.xml and test.xsd files will contain user data.
Module.print is called when the code writes output through stdio (for example, with printf). All output is saved in a buffer so that it can be read later.

Thus, we have ensured that the input files test.xml and test.xsd will contain the information entered by the user, and the validation results will be saved to the buffer.
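The two mechanisms used in pre.js, turning a string into an array of numbers and buffering printed output, can be tried in isolation. This is a simplified sketch: toCharCodeArray is a hypothetical stand-in for intArrayFromString (emscripten's own helper may differ in details, for example in how it handles a terminating zero or non-ASCII text), and sink is an illustrative object playing the role of Module.

```javascript
// Hypothetical stand-in for intArrayFromString: one character code per element.
// Assumption: the input is plain ASCII text.
function toCharCodeArray(text) {
  const codes = [];
  for (let i = 0; i < text.length; i++) {
    codes.push(text.charCodeAt(i));
  }
  return codes;
}

// Same buffering idea as Module['print'] in pre.js: append each line to a string.
const sink = { buffer: '' };
function bufferedPrint(text) {
  sink.buffer += text + '\n';
}

console.log(toCharCodeArray('<a/>'));  // character codes 60, 97, 47, 62
bufferedPrint('test.xml validates');
console.log(JSON.stringify(sink.buffer));
```

Because every printed line is appended with a trailing newline, the buffer reads exactly like the console output would.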

However, this is not all. Compile the code:

~/path/emscripten/emcc -O2 xmllint.o .libs/libxml2.a ../zlib-1.2.7/libz.a -o xmllint.raw.js --pre-js pre.js

The command looks the same as before, except that the files no longer need to be embedded. Instead, the --pre-js flag includes the pre.js file.

After compiling, xmllint.raw.js contains optimized and minified code. For ease of use, wrap it with a JavaScript function:

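  function validateXML (xml, schema) {
    // Inputs read by pre.js through Module['xml'] and Module['schema'].
    var Module = {
      xml: xml,
      schema: schema
    };
    {{{GENERATED_CODE}}}
    // Output collected by Module['print'].
    return Module.return;
  }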

{{{GENERATED_CODE}}} must be replaced with the result of the compilation (the contents of xmllint.raw.js). The validateXML function assigns its arguments to the xml and schema fields of Module. This ensures that the test.xml and test.xsd files contain user data. After the generated code has run, the function returns the validation results.

That's all! xml.js can be used from regular JavaScript code. All that is needed is to include the js file and call the validateXML function with the XML and the schema.
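A caller will usually want a yes/no answer rather than raw text. The sketch below shows one possible way to interpret the returned string. It assumes that the result ends with "validates" on success, as in the tests above; isValid is a hypothetical helper, and validateXML is replaced here by a stand-in so that the snippet runs on its own.

```javascript
// Stand-in so this sketch runs by itself; in real use validateXML comes from xml.js.
// Assumption: it returns xmllint's captured output as a string.
function validateXML(xml, schema) {
  return 'test.xml validates\n';
}

// Hypothetical helper: success is assumed to be reported as "<file> validates".
function isValid(output) {
  return output.trim().endsWith(' validates');
}

const output = validateXML('<root/>', '<xs:schema/>');
console.log(isValid(output) ? 'document is valid' : 'document is invalid:\n' + output);
```

Any other text, including error messages printed by xmllint, is treated as a failure, and the full output is shown so that the user can see what went wrong.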

Source: https://habr.com/ru/post/143583/
