This article is an extended translation of the article “HOWTO: Port a C/C++ Library to JavaScript (xml.js)” by azakai. The author of the original article has considerable experience porting C/C++ libraries to JavaScript; in particular, he ported lzma.js and sql.js. In his article, he describes the general approach to porting C/C++ code, using libxml, an open-source XML library, to validate XML as the example.
In addition, this article contains the complete sequence of steps that were needed to port libxml on Ubuntu 12.04, including setting up the environment and emscripten.
Installing and Configuring Emscripten
Emscripten is a compiler from LLVM bytecode to JavaScript. C/C++ code can be compiled to LLVM bytecode with the clang compiler, and some other languages also have compilers that target LLVM bytecode. Emscripten generates equivalent JavaScript code that can be run by any JavaScript engine, for example, a modern browser. Developers at Mozilla recently used emscripten to port Doom.
Emscripten provides:
emconfigure - a utility that sets up the environment and then runs ./configure;
emmake - a utility that sets up the environment and then runs make;
emcc - the compiler from LLVM bytecode to JavaScript, used as a replacement for gcc or clang.
So, let's set up the environment for working with emscripten (see the manual).
Install clang+llvm (>= 3.0):
wget llvm.org/releases/3.0/clang+llvm-3.0-i386-linux-Ubuntu-11_10.tar.gz
tar xfv clang+llvm-3.0-i386-linux-Ubuntu-11_10.tar.gz
Install node.js (>= 0.5.5):
sudo apt-get install nodejs
Clone the current version of emscripten:
git clone git://github.com/kripken/emscripten.git
cd emscripten
Check that clang works:
../clang+llvm-3.0-i386-linux-Ubuntu-11_10/bin/clang tests/hello_world.cpp
./a.out
>> hello, world!
Check that node.js works:
node tests/hello_world.js
>> hello, world!
Run emcc for the first time to create the configuration file ~/.emscripten:
./emcc
In the configuration file, specify the clang+llvm directory and the emscripten installation directory:
EMSCRIPTEN_ROOT = os.path.expanduser('~/path/emscripten') # this helps projects using emscripten find it
LLVM_ROOT = os.path.expanduser('~/path/clang+llvm-3.0-i386-linux-Ubuntu-11_10/bin')
Run emcc again to make sure it is configured correctly. It should print the message 'emcc: no input files':
./emcc
>> emcc: no input files
Now you can verify that everything works by compiling hello_world.cpp with emcc:
./emcc tests/hello_world.cpp
node a.out.js
>> hello, world!
Part 1: Compiling the C/C++ sources
Before you start porting, make sure that the project's source code compiles without errors with a native C/C++ compiler.
Clone libxml from the repository and build it:
git clone git://git.gnome.org/libxml2
cd libxml2
git checkout v2.7.8
CC=~/path/clang+llvm-3.0-i386-linux-Ubuntu-11_10/bin/clang ./autogen.sh --without-debug --without-ftp --without-http --without-python --without-regexps --without-threads --without-modules
make
libxml includes a command-line utility, xmllint, that can validate XML files against XML schemas. It can be used to check that the compiled code is correct. Such checks are also needed to make sure that the native and ported versions behave identically. Testing with xmllint looks like this:
$ ./xmllint --noout --schema test.xsd test.xml
>> test.xml validates
If everything works correctly, introduce a few errors into test.xml; xmllint should then report a validation error.
Part 2: Configuration
To configure the project for compilation with emscripten, run:
~/path/emscripten/emconfigure ./autogen.sh --without-debug --without-ftp --without-http --without-python --without-regexps --without-threads --without-modules
emconfigure sets environment variables so that ./configure uses the emcc compiler instead of gcc or clang. It also sets things up so that ./configure still works correctly, including its configuration tests (which compile native code).
The default configuration (without flags) includes a lot of functionality that is not needed here, such as HTTP and FTP support. We only want to validate XML against schemas, so we configure the project without the unnecessary functionality. In general, it is a good idea to exclude unneeded functionality when porting: the resulting code is smaller, which matters when it is delivered over the network. In addition, some header files may need manual editing, since emscripten's C library headers come from newlib rather than glibc.
Part 3: Building the project
The project is built with the command:
~/path/emscripten/emmake make
emmake is similar to emconfigure: it also sets environment variables. With emmake, the build produces LLVM bytecode instead of native code. This avoids generating JavaScript for each object file and then linking the JavaScript; instead, the LLVM bytecode linker is used.
The build produces many files, but none of them can be executed: as mentioned above, they contain LLVM bytecode rather than native code (raw LLVM bitcode files start with the characters 'BC'), so one more step is needed.
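One way to confirm that an object file such as xmllint.o holds LLVM bytecode rather than native code is to look at its first bytes: raw LLVM bitcode begins with 'B', 'C', 0xC0, 0xDE. The sketch below assumes Node.js; isLlvmBitcode and fileIsLlvmBitcode are hypothetical helpers. It applies to single object files; static archives such as libxml2.a start with the ar signature "!<arch>" and contain the object files inside.

```javascript
// Sketch: detect raw LLVM bitcode by its magic number (hypothetical helpers).
const fs = require('fs');

// "BC" followed by 0xC0 0xDE
const BITCODE_MAGIC = [0x42, 0x43, 0xc0, 0xde];

// Returns true if the byte sequence starts with the bitcode magic number.
function isLlvmBitcode(bytes) {
  if (bytes.length < BITCODE_MAGIC.length) return false;
  return BITCODE_MAGIC.every((b, i) => bytes[i] === b);
}

// Reads only the first four bytes of a file (e.g. xmllint.o) and checks them.
// Does not handle ar archives such as libxml2.a.
function fileIsLlvmBitcode(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const header = Buffer.alloc(BITCODE_MAGIC.length);
    const n = fs.readSync(fd, header, 0, header.length, 0);
    return isLlvmBitcode(header.subarray(0, n));
  } finally {
    fs.closeSync(fd);
  }
}

module.exports = { isLlvmBitcode, fileIsLlvmBitcode };
```

For example, running fileIsLlvmBitcode('xmllint.o') in the libxml2 directory after the emmake build should return true, while the same check on an object file from the native clang build should return false.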
Part 4: Conversion to JavaScript
xmllint is built from xmllint.o and libxml2.a. The LLVM linker does not support dynamic linking (late binding), and emcc ignores dynamic libraries. Therefore, the static library libxml2.a has to be specified explicitly when linking.
A less obvious dependency is libz (zlib, an open-source compression library). If you link without libz.a, the program fails at runtime when it tries to call the gzopen function. So you need to build libz.a as well:
cd ~/path
wget zlib.net/zlib-1.2.7.tar.gz
tar xfv zlib-1.2.7.tar.gz
cd zlib-1.2.7
~/path/emscripten/emconfigure ./configure --static
~/path/emscripten/emmake make
Now you can compile to JavaScript:
cd ~/path/libxml2
~/path/emscripten/emcc -O2 xmllint.o .libs/libxml2.a ../zlib-1.2.7/libz.a -o xmllint.test.js --embed-file test.xml --embed-file test.xsd
Where:
- emcc - the replacement for gcc or clang (see above);
- -O2 - the optimization flag: LLVM-level and advanced JavaScript-level optimizations are run, including the Closure Compiler (in advanced mode);
- xmllint.o, .libs/libxml2.a, ../zlib-1.2.7/libz.a - the files to link;
- -o - the output file, xmllint.test.js. The .js suffix tells emcc which format to generate, in this case JavaScript;
- --embed-file - tells emcc to include the contents of the specified file in the generated code and to set up the virtual file system so that the file is accessible through standard stdio calls (fopen, fread, etc.). This is the easiest way to give compiled code access to files.
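For repeated builds, the same command can be driven from a Node script. This is a sketch, assuming emcc is at ~/path/emscripten/emcc as in the commands above (adjust EMCC_PATH); buildEmccArgs and runEmcc are hypothetical helpers. Each option maps to one of the flags listed above, and the arguments are passed as an array, so no shell quoting is involved.

```javascript
// Sketch of a Node build script wrapping the emcc command (hypothetical helpers).
const { spawnSync } = require('child_process');
const os = require('os');
const path = require('path');

// Assumed location of emcc, matching ~/path/emscripten in the article.
const EMCC_PATH = path.join(os.homedir(), 'path', 'emscripten', 'emcc');

// Builds the emcc argument list in the same order as the command above.
function buildEmccArgs({ optLevel = 2, inputs, output, embedFiles = [], preJs }) {
  const args = [`-O${optLevel}`, ...inputs, '-o', output];
  for (const file of embedFiles) {
    args.push('--embed-file', file); // make the file visible to fopen etc.
  }
  if (preJs) {
    args.push('--pre-js', preJs); // prepend JS before optimization
  }
  return args;
}

// Runs emcc and throws if it cannot be started or exits with an error.
function runEmcc(options) {
  const result = spawnSync(EMCC_PATH, buildEmccArgs(options), { stdio: 'inherit' });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`emcc exited with status ${result.status}`);
  }
}

module.exports = { buildEmccArgs, runEmcc };
```

Calling runEmcc({ inputs: ['xmllint.o', '.libs/libxml2.a', '../zlib-1.2.7/libz.a'], output: 'xmllint.test.js', embedFiles: ['test.xml', 'test.xsd'] }) from the libxml2 directory reproduces the command above; passing preJs: 'pre.js' instead of embedFiles gives the variant used in Part 6.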
Part 5: Testing JavaScript
This code can be run in a JavaScript shell such as Node.js, SpiderMonkey or V8:
node xmllint.test.js --noout --schema test.xsd test.xml
>> test.xml validates
The result should be exactly the same as with the native build. Likewise, if you introduce errors into the XML file, the JavaScript xmllint should detect them.
Important: run the native and JavaScript versions with exactly the same arguments.
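To check the ported build against the native one with identical arguments, a small Node script can run both and compare what they print. This is a sketch: captureOutput and compareBuilds are hypothetical helpers, stdout and stderr are concatenated in case the two builds send a message to different streams, and the JavaScript file is run with the same node binary that executes the script.

```javascript
// Sketch: run the native binary and the JS port with identical arguments.
const { spawnSync } = require('child_process');

// Runs a command and returns everything it printed (stdout, then stderr).
function captureOutput(command, args) {
  const r = spawnSync(command, args, { encoding: 'utf8' });
  if (r.error) throw r.error;
  return r.stdout + r.stderr;
}

// Returns both outputs and whether they are identical.
function compareBuilds(nativeCmd, jsFile, args) {
  const nativeOut = captureOutput(nativeCmd, args);
  const jsOut = captureOutput(process.execPath, [jsFile, ...args]);
  return { nativeOut, jsOut, same: nativeOut === jsOut };
}
```

For example, compareBuilds('./xmllint', 'xmllint.test.js', ['--noout', '--schema', 'test.xsd', 'test.xml']) should report same: true when the embedded test.xml and test.xsd are the same files the native xmllint reads from disk.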
Part 6: Refactoring and reuse
At this point, the script validates only the two files embedded in it. What we need is a general-purpose function that validates any XML document against a schema. This is fairly easy to do, although the fact that the code is optimized with the Closure Compiler adds some extra work.
First, call emcc with the --pre-js option, which adds JavaScript code before the generated code (--post-js, correspondingly, adds it after). Importantly, --pre-js adds the code before optimization is done. This means the added code is optimized together with the generated code, which is necessary for the optimization to be correct. On the other hand, the Closure Compiler may rename or discard functions we need, treating them as unused.
Here is the script (pre.js) to include with the --pre-js option:
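// pre.js: prepended to the generated code by --pre-js
Module['preRun'] = function () {
  // Create the input files in the root of the virtual file system
  // from the strings passed in Module['xml'] and Module['schema'].
  FS.createDataFile(
    '/',
    'test.xml',
    Module['intArrayFromString'](Module['xml']),
    true,  // canRead
    true); // canWrite
  FS.createDataFile(
    '/',
    'test.xsd',
    Module['intArrayFromString'](Module['schema']),
    true,  // canRead
    true); // canWrite
};
// Same command-line arguments as in the native test
Module['arguments'] = ['--noout', '--schema', 'test.xsd', 'test.xml'];
// Buffer that collects everything the program prints
Module['return'] = '';
Module['print'] = function (text) {
  Module['return'] += text + '\n';
};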
Let's go through this script:
- Module is the object through which code generated by emscripten interacts with other JavaScript code.
- It is important to access Module properties by string name, for example Module['name'] instead of Module.name. Closure leaves quoted property names unchanged, so they stay accessible from outside the optimized code.
- First, we set Module.preRun, a function that runs right before the generated program starts (but after the environment is set up). In preRun, two files are created with the Emscripten FileSystem API. For simplicity, the same file names as in the earlier tests are used (test.xml and test.xsd). Their contents are taken from Module['xml'] and Module['schema'], which must hold the XML document and the XML Schema, respectively. The strings are converted to byte arrays with intArrayFromString.
- We set Module.arguments, the equivalent of the command-line argument list. The arguments must be exactly the ones used in the earlier tests; the only difference is that test.xml and test.xsd now contain user data.
- Module.print is called whenever the program prints a line of output. We append all output to a buffer (Module['return']) so it can be read afterwards.
As a result, the input files test.xml and test.xsd contain the user's data, and the validation results are collected in a buffer.
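The data flow of pre.js can be checked without compiling anything by running its logic against a stub runtime. This is a hypothetical mock, not emscripten's runtime: FS is replaced by an in-memory Map with the same argument order as FS.createDataFile, intArrayFromString by a UTF-8 encoder that appends a NUL terminator (as current emscripten's version does by default), and main by a fake program that only reports the size of each input file.

```javascript
// Hypothetical mock runtime for exercising pre.js logic (not emscripten code).
function runWithMockRuntime(Module) {
  const files = new Map();
  // Stub with FS.createDataFile's argument order; permissions are ignored.
  const FS = {
    createDataFile(parent, name, data, canRead, canWrite) {
      files.set(parent + name, Uint8Array.from(data));
    },
  };
  // Stand-in for intArrayFromString: UTF-8 bytes plus a NUL terminator.
  Module['intArrayFromString'] = (s) => [...Buffer.from(s, 'utf8'), 0];

  // --- pre.js, as in the article ---
  Module['preRun'] = function () {
    FS.createDataFile('/', 'test.xml', Module['intArrayFromString'](Module['xml']), true, true);
    FS.createDataFile('/', 'test.xsd', Module['intArrayFromString'](Module['schema']), true, true);
  };
  Module['arguments'] = ['--noout', '--schema', 'test.xsd', 'test.xml'];
  Module['return'] = '';
  Module['print'] = function (text) {
    Module['return'] += text + '\n';
  };
  // --- end of pre.js ---

  // The runtime calls preRun, then main(); this fake main() only reports
  // how many bytes each file argument contains.
  Module['preRun']();
  for (const arg of Module['arguments']) {
    if (arg.startsWith('--')) continue;
    Module['print']('read ' + arg + ': ' + files.get('/' + arg).length + ' bytes');
  }
  return Module['return'];
}
```

A call such as runWithMockRuntime({ xml: '<root/>', schema: '<xs:schema/>' }) shows that both strings reach the virtual files (each one byte longer because of the terminator) and that the output ends up in Module['return'].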
That is not all, though. Next, compile the code:
~/path/emscripten/emcc -O2 xmllint.o .libs/libxml2.a ../zlib-1.2.7/libz.a -o xmllint.raw.js --pre-js pre.js
The command is the same as before, except that the files are no longer embedded; instead, the --pre-js flag includes pre.js.
After compilation, xmllint.raw.js contains optimized and minified code. For convenience, wrap it in a JavaScript function:
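function validateXML(xml, schema) {
  // Inputs that pre.js reads as Module['xml'] and Module['schema']
  var Module = {
    xml: xml,
    schema: schema
  };
  {{{GENERATED_CODE}}}
  // Output collected by Module['print'] in pre.js
  return Module['return'];
}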
{{{GENERATED_CODE}}} must be replaced with the compiled code (the contents of xmllint.raw.js). validateXML stores its arguments in the xml and schema fields of Module, which is how test.xml and test.xsd come to contain the user's data. After the generated code runs, the function returns the validation output.
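The substitution itself can be scripted. A minimal Node sketch, assuming the wrapper is saved as template.js and the result is written to xml.js (both file names and the helper names are hypothetical). A replacer function is used because, with a plain string replacement, sequences such as "$&" in minified code would be interpreted as special patterns instead of being copied literally.

```javascript
// Sketch: splice the compiled code into the validateXML wrapper.
const fs = require('fs');

const PLACEHOLDER = '{{{GENERATED_CODE}}}';

// Replaces the placeholder with the generated code, copied verbatim.
function wrapGenerated(template, generated) {
  if (!template.includes(PLACEHOLDER)) {
    throw new Error('template has no ' + PLACEHOLDER + ' placeholder');
  }
  // A replacer function's return value is inserted literally ("$" is not special).
  return template.replace(PLACEHOLDER, () => generated);
}

// Reads the wrapper and the compiled code, writes the combined file.
function buildXmlJs(templatePath, rawPath, outPath) {
  const template = fs.readFileSync(templatePath, 'utf8');
  const generated = fs.readFileSync(rawPath, 'utf8');
  fs.writeFileSync(outPath, wrapGenerated(template, generated));
}
```

Usage: buildXmlJs('template.js', 'xmllint.raw.js', 'xml.js').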
That's all!
xml.js can be used from regular JavaScript code: simply include the .js file and call validateXML with the XML document and the schema.
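In a browser, including xml.js with a script tag makes validateXML available as a global function. In Node, the file does not export anything, so one option is to evaluate its source and take the function from it. The sketch below assumes xml.js is exactly the wrapper above; loadValidateXML and isValid are hypothetical helpers, and isValid relies only on the "test.xml validates" message shown earlier, treating any other output as a failed validation.

```javascript
// Hypothetical helpers for using xml.js from Node.

// Evaluates the xml.js source and returns the validateXML function it declares.
// new Function runs the code in global scope, roughly like a <script> tag.
function loadValidateXML(source) {
  return new Function(source + '\nreturn validateXML;')();
}

// On success xmllint reports "test.xml validates"; anything else is a failure.
function isValid(output) {
  return /^test\.xml validates$/m.test(output);
}
```

For example, loadValidateXML(require('fs').readFileSync('xml.js', 'utf8')) returns the function, and isValid(validateXML(xmlText, schemaText)) tells whether the document matched the schema.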