
Overview of C++ deep learning libraries: Apache.SINGA, tiny-dnn, OpenNN

While I enjoy building models in Python with wonderful deep learning frameworks like Keras or Lasagne, from time to time I like to see what has appeared for C++ developers beyond the mainstream TensorFlow and Caffe. This time I decided to take a closer look at three of them: tiny-dnn, Apache.SINGA and OpenNN. A brief account of my experience installing, building and using them under Windows is below the cut.

A model problem: binary classification of word chains


I compared the C++ deep learning libraries as part of experiments with different ways of representing words in a simplified version of building a so-called language model. A detailed description of all the options is beyond the scope of this article, so I will state the problem briefly.

There are n-grams (chains of words) of a fixed, pre-selected length, in this case 3 words. If an n-gram was extracted from the text corpus, we treat it as a valid combination of words and the model should output the target value y = 1. Note that the n-grams are extracted from the corpus without regard to the syntactic structure of the sentence, so the 3-gram "cat sat on" counts as valid even though its right boundary falls between the preposition and its object. If an n-gram was obtained by randomly replacing one of the words and the resulting chain does not occur in the corpus, the model is expected to output y = 0.
Invalid n-grams are generated while parsing the corpus, in the same quantity as the valid ones, which gives a perfectly balanced dataset and simplifies the task. And since no manual labeling is needed, it is easy to play with parameters such as the number of records in the training set.

So the task is binary classification. For the experiments with the C++ libraries I took the simplest word representation: concatenating the word2vec vectors of the individual words. The w2v model was trained on a 70 GB text corpus with a vector size of 32, so each 3-gram is represented by a vector of length 96. The training set consists of 340,000 records; validation and the final evaluation are performed on separate sets of roughly the same size. The data files for the C++ models are generated by a Python script, so all the compared libraries and models are guaranteed to be trained and validated on exactly the same data.

Solving the problem of binary classification in C++ using tiny-dnn


A search on Habr for mentions of the tiny-dnn library turns up a single article: habrahabr.ru/post/319436

Installing and connecting tiny-dnn to your project is extremely simple.

1. Clone the repository.
2. No compilation of the library is required, since everything is implemented in header files.
To use it, it is enough to add the directive #include "tiny_dnn/tiny_dnn.h" to your code.

The price of this ease of integration is a noticeably longer compile time. On the other hand, precompiled headers exist, and they should help here.

Now let's see what can be achieved with tiny-dnn. I should say up front that this was my first encounter with tiny-dnn, so I may well have missed some important functionality that would have improved the results. Still, I will describe my experience as a user.

The source code of a simple feed-forward network (MLP) in C++ and the corresponding VS 2015 project are here. Building the network boils down to calling a single function that takes the number of inputs, the size of the hidden layer and the number of outputs:

auto nn = make_mlp<sigmoid_layer>({ input_size, num_hidden_units, output_size }); 

Then we create an optimizer object that will adjust the weights, define a couple of callback functions to track the training process, and start training:

 gradient_descent optimizer;
 optimizer.alpha = 0.1;

 auto on_enumerate_epoch = [&]() { /*...*/ };
 auto on_enumerate_data = [&]() { /*...*/ };

 nn.train<mse>(optimizer, X_train, y_train, batch_size, nb_epochs,
               on_enumerate_data, on_enumerate_epoch);
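For reference, here is a minimal sketch of what those two callbacks might contain; the validation containers X_val and y_val and the epoch counter are my own names living in the enclosing scope, not anything tiny-dnn prescribes:

 // Hedged sketch of the callback bodies from the call above.
 // Assumes X_val / y_val (same types as X_train / y_train) are in scope.
 int epoch = 0;
 tiny_dnn::timer t;

 auto on_enumerate_epoch = [&]() {
   // after each epoch: measure accuracy on the held-out set
   tiny_dnn::result res = nn.test(X_val, y_val);
   std::cout << "epoch " << ++epoch
             << " val_acc=" << static_cast<double>(res.num_success) / res.num_total
             << " (" << t.elapsed() << " s)" << std::endl;
   t.restart();
 };

 auto on_enumerate_data = [&]() {
   // called after every mini-batch; left empty to keep the console quiet
 };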

It is a little jarring that the training and validation data passed to the training procedure are declared like this:

 std::vector<vec_t> X_train = ...;
 std::vector<label_t> y_train = ...;

where:

 typedef std::vector<float_t, aligned_allocator<float_t, 64>> vec_t; 

A vector of vectors standing in for a rectangular matrix leaves an uneasy feeling, at least performance-wise.
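Be that as it may, for completeness here is a sketch of how such containers can be filled. The file format (one sample per line: 96 floats followed by a 0/1 label) is my own convention for the generator script, not something tiny-dnn requires:

 #include <fstream>
 #include <sstream>
 #include <string>
 #include <vector>

 #include "tiny_dnn/tiny_dnn.h"

 // Hedged sketch: parse one sample per line, 96 features then a 0/1 label.
 void load_dataset(const std::string &path,
                   std::vector<tiny_dnn::vec_t> &X,
                   std::vector<tiny_dnn::label_t> &y)
 {
   std::ifstream f(path);
   std::string line;
   while (std::getline(f, line)) {
     std::istringstream ss(line);
     tiny_dnn::vec_t x(96);
     for (auto &v : x) ss >> v;
     int label = 0;
     ss >> label;
     X.push_back(std::move(x));
     y.push_back(static_cast<tiny_dnn::label_t>(label));
   }
 }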

The resulting network, with sigmoid activations on all layers, gives an accuracy of about 0.58. That is much worse than what can be obtained with Keras or Apache.SINGA. I assume that switching the output layer activation to softmax, the hidden layers to relu, and playing with the optimizer settings would improve the result.
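For the record, the variant I have in mind would be assembled layer by layer instead of through make_mlp, roughly as below. I have not verified that it actually lifts the accuracy, and the exact layer and loss class names (relu_layer, softmax_layer, cross_entropy_multiclass, adagrad) may differ between tiny-dnn versions, so treat this as a sketch:

 using namespace tiny_dnn;

 // Hedged sketch: relu hidden layer, softmax output, cross-entropy loss.
 network<sequential> nn;
 nn << fully_connected_layer(input_size, num_hidden_units) << relu_layer()
    << fully_connected_layer(num_hidden_units, output_size) << softmax_layer();

 adagrad optimizer;  // or keep gradient_descent and tune alpha
 nn.train<cross_entropy_multiclass>(optimizer, X_train, y_train,
                                    batch_size, nb_epochs,
                                    on_enumerate_data, on_enumerate_epoch);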

Generally speaking, the library has all the basic building blocks needed for experiments with deep learning models: convolutional layers, pooling layers, recurrent layers, as well as regularizers such as dropout and batch normalization. There are also facilities for building more complex computational graphs, for example a merge layer.

Still, the library needs additional functionality (perhaps it exists and I simply missed it), at least for early stopping and model checkpoints; without such basic capabilities it is hard to use it for serious tasks. The on_enumerate_epoch callback passed to tiny_dnn::network<...>::train<...> returns void. It would make sense to have it return bool: then user code could signal an exit from the epoch loop and implement its own early-stopping criterion, preventing an overfitted model.
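Until something like that appears, the workaround I can think of is to drive the epochs from your own loop, one train() call per epoch, and break when the validation metric stops improving. A sketch, under the assumption that consecutive train() calls simply continue from the current weights:

 // Hedged sketch of manual early stopping on top of tiny-dnn.
 gradient_descent optimizer;
 optimizer.alpha = 0.1;

 double best_acc = 0.0;
 int epochs_without_gain = 0;
 const int patience = 5;

 for (int epoch = 0; epoch < max_epochs; ++epoch) {
   // one epoch per call, no callbacks
   nn.train<mse>(optimizer, X_train, y_train, batch_size, 1, [] {}, [] {});

   tiny_dnn::result res = nn.test(X_val, y_val);
   double acc = static_cast<double>(res.num_success) / res.num_total;

   if (acc > best_acc) {
     best_acc = acc;
     epochs_without_gain = 0;
     // this is also the point where the best weights could be saved
   } else if (++epochs_without_gain >= patience) {
     break;  // early stop: no improvement for `patience` epochs
   }
 }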

Solving the problem of binary classification in C++ using Apache.SINGA


Why do I like Apache.SINGA?


First, it has a nice logo: a kitty mascot. Such a cutie cannot belong to a bad project!

Secondly, looking at the examples of building networks in C++ with this library ( https://github.com/apache/incubator-singa/blob/master/examples/cifar10/alexnet.cc ) shows an ideological closeness to how network models are described, for example, in Keras. The thought flashes by: yes, it's almost like for humans, in the sense of Pythonists.

Spoiler
Well, not exactly like for humans; it is still C++ style, after all.

The documentation positions SINGA as a library with support for distributed training of large network models and implementations of the basic deep learning architectures, including convolutional and recurrent layers. GPU computation via CUDA and OpenCL is declared, which is a serious claim for production use with heavy models. However, I only checked the CPU option, because under Windows I have no GPU, and under Ubuntu I was not ready to repeat the entire build cycle (see below).

Apache.SINGA was mentioned on Habr only once, a year ago, in a couple of paragraphs, so I think a more detailed description with installation tips and examples will be useful to someone.

We proceed to the installation and testing.

Installing Apache.SINGA from sources under Windows


This is not Python but hardcore C++, and under Windows at that, so be prepared for pain and suffering. To minimize the stress, brew a cup of tea and settle into a relaxed coachman's pose.

The build documentation is here, including a separate section on building under Windows.

Now step by step.

0. Build tools.

0.1 CMake is needed to generate the Visual Studio sln files ( https://cmake.org/download/ ).
0.2 A git client is needed to download the sources.
0.3 I use Visual Studio 2015; I have not checked whether all the components compile with other versions of the studio.
0.4 Apparently, Perl is needed to build OpenBLAS; I installed ActivePerl ( https://www.perl.org/get.html ) and had no problems.

1. Install Protobuf .

1.1 Download Protobuf sources for Windows .

1.2 Follow the build instructions. In particular, generating an sln file for VS 2015 x64 looks like this:

 cmake -G "Visual Studio 14 2015 Win64" ^
   -DCMAKE_INSTALL_PREFIX=../../../install ^
   -Dprotobuf_BUILD_SHARED_LIBS=ON ^
   ../..

Pay particular attention to the -Dprotobuf_BUILD_SHARED_LIBS=ON option. It is not mentioned in the SINGA build documentation, but somewhere in its source code the use of DLL dependencies is hard-wired. The same applies to GLOG and CBLAS.

1.3 Compile the desired configuration, Release or Debug. I did this in VS using the sln file generated by CMake. While it compiles, listen to some ambient, drink tea, read Habr.

2. Installing OpenBLAS .

2.1 Download the OpenBLAS source code somewhere:

 git clone https://github.com/xianyi/OpenBLAS.git 

2.2 Prepare the sln using CMake; for VS 2015 x64 my command was:

 cmake -G "Visual Studio 14 2015 Win64" 

2.3 Open the generated OpenBLAS.sln in the studio and start building the configurations you need. Drink tea. It compiles for a long time, so you will have to wait.

3. Installing glog .

In the SINGA source code you can see a build variant that does not use GLOG, with some simplified logger instead, but under Windows that option did not take off for me.

3.1 Download:

 git clone https://github.com/google/glog.git 

3.2 Run:

 mkdir build & cd build 

Then

 cmake -G "Visual Studio 14 2015 Win64" -DBUILD_SHARED_LIBS=1 .. 

I already explained the purpose of -DBUILD_SHARED_LIBS=1 above, in the description of building Protobuf.

Ignore the pile of "not found" messages in the console, then open the generated solution in the studio:

 glog.sln 

Select the required configuration (Release or Debug) and compile. Drink tea.

4. Build SINGA

4.1 Write down the paths to the libraries and headers from the previous steps and put together roughly this batch of commands for the console:

 set CBLAS_H="e:\polygon\SINGA\cblas\OpenBLAS"
 set CBLAS_LIB="e:\polygon\SINGA\cblas\OpenBLAS\lib\RELEASE\libopenblas.lib"
 set PROTOBUF_H=e:\polygon\SINGA\protobuf\protobuf-3.3.0\src
 set PROTOBUF_LIB=e:\polygon\SINGA\protobuf\protobuf-3.3.0\cmake\build\solution\Release\
 set PROTOC_EXE=e:\polygon\SINGA\protobuf\protobuf-3.3.0\cmake\build\solution\Release\protoc.exe
 set GLOG_H=e:\polygon\SINGA\glog\src\windows\
 set GLOG_LIB=e:\polygon\SINGA\glog\build\Release\glog.lib

This stage is the trickiest, since it is hard to guess right away exactly which header paths need to be specified. I got it right on the fourth or fifth attempt, so don't give up halfway; it will work out.

 cmake -G "Visual Studio 14 2015 Win64" -DUSE_CUDA=OFF -DUSE_PYTHON=OFF ^
   -DCBLAS_INCLUDE_DIR=%CBLAS_H% ^
   -DCBLAS_LIBRARIES=%CBLAS_LIB% ^
   -DProtobuf_INCLUDE_DIR=%PROTOBUF_H% ^
   -DProtobuf_LIBRARIES=%PROTOBUF_LIB% ^
   -DProtobuf_PROTOC_EXECUTABLE=%PROTOC_EXE% ^
   -DGLOG_INCLUDE_DIR=%GLOG_H% ^
   -DGLOG_LIBRARIES=%GLOG_LIB% -DUSE_GLOG ^
   ..

We get singa.sln. Open it in the studio.

One more tiny modification of the project is needed. I did not find a way to pass the required defines through the CMake command line, so I simply added them to the studio project in the C++ / Preprocessor section:

USE_GLOG
PROTOBUF_USE_DLLS

Start compiling. No more tea will fit in, just relax.

I got the dependency paths wrong several times, so it took me about an hour to get a working sln for VS, but overall there is nothing overly complex here.

As a result, the coveted singa.lib file appears in the build\lib\Release\ folder.

Colleague, if you have made it this far, congratulations on successfully completing the first phase of the mission! You have most likely built a non-functional singa.lib, and I will now explain how to check it and what to do next.

Checking that singa.lib works


For a quick check, build a program like this:

 #include <string>
 #include <vector>

 #include "singa/model/layer.h"
 #include "singa/utils/channel.h"

 int main(int argc, char **argv)
 {
   singa::InitChannel(nullptr);
   std::vector<std::string> rl = singa::GetRegisteredLayers();
   // inspect the contents of rl
   return 0;
 }

If singa::GetRegisteredLayers() returns an empty list, the problem is present and no neural network will work further on.

With the debugger you can see the following picture. The library implements several factory classes that return, for a string name like "singacpp_dense", an object of the corresponding class. To register classes with the factory, the library authors wrote a macro:

 #define RegisterLayerClass(Name, SubLayer) \
   static Registra<Layer, SubLayer> Name##SubLayer(#Name);

Each class is registered with the factory by declaring a global static object:

 RegisterLayerClass(singa_dense, Dense);
 RegisterLayerClass(singacpp_dense, Dense);
 RegisterLayerClass(singacuda_dense, Dense);
 RegisterLayerClass(singacl_dense, Dense);

To Pythonistas, C#-ers and other owners of proper reflection this may sound crazy, but that's good old C++.

And this is where you should get suspicious. C++ generally implements the principle of maximum surprise (hello, Python): a construct with unspecified behavior always behaves unexpectedly and shoots you in the foot. In particular, experience says you should avoid relying on static constructors in C++ libraries, since they may never run at all (in a static library) or run in an order the developer did not expect; in general, they cause all sorts of mischief. It is better to write a function that explicitly invokes the right constructors in the right order and call it from client code than to deal with the non-portable nuances of the Windows and Linux compilers.

So I added a separate function to SINGA:

 namespace singa {

 void initialize_static_ctors()
 {
   RegisterLayerClass(singa_dense, Dense);
   RegisterLayerClass(singacpp_dense, Dense);
   //RegisterLayerClass(singacuda_dense, Dense);
   //RegisterLayerClass(singacl_dense, Dense);

   RegisterLayerClass(singa_relu, Activation);
   RegisterLayerClass(singa_sigmoid, Activation);
   RegisterLayerClass(singa_tanh, Activation);

   RegisterLayerClass(singacpp_relu, Activation);
   //RegisterLayerClass(singacuda_relu, Activation);
   //RegisterLayerClass(singacl_relu, Activation);

   RegisterLayerClass(singacpp_sigmoid, Activation);
   //RegisterLayerClass(singacuda_sigmoid, Activation);
   //RegisterLayerClass(singacl_sigmoid, Activation);

   RegisterLayerClass(singacpp_tanh, Activation);
   //RegisterLayerClass(singacuda_tanh, Activation);
   //RegisterLayerClass(singacl_tanh, Activation);

   return;
 }

 }

With it, everything works as it should: the factories get initialized and the network starts working.
If building the SINGA binaries puts you off, I have put the compiled Win x64 libraries into the repository.

Implementing a simple feed-forward neural network


A cursory look at the sources shows that Apache.SINGA supports fully connected (dense) layers, convolutional layers, pooling, and variants of recurrent architectures: RNN, GRU, LSTM and bidirectional LSTM. The activation options are the minimal set of tanh, sigmoid and relu. In general, the whole gentleman's kit of deep learning is in stock.

I will warn right away those accustomed to Python-level comfort: in many cases, incorrect use of SINGA functionality leads to a memory fault.

Wrong choice of data type for tensor y? Get a memory fault.

Set 1 output instead of 2 on the final layer for binary classification? Here, have a memory fault.

And so on. Actually, a memory fault is the good outcome, because if you get "lucky" with the block sizes you may silently get a wrong result instead.

In general, many things in SINGA have to be discovered by touch, since they cannot always be guessed in advance. As I said, the tensor holding the target variable y for training and validation must have the data type kInt. Trying to create it with kFloat32, like the tensor X, leads to a segmentation fault, because in one place the code casts a void* to int*, and if the pointer actually refers to float data, we get a crash.
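To illustrate, here is roughly how the tensors get created in my code. The constructor and CopyDataFromHostPtr calls are how I recall SINGA's Tensor API, so treat the exact signatures as an assumption and check tensor.h:

 #include <vector>
 #include "singa/core/tensor.h"

 using singa::Tensor;
 using singa::Shape;

 // Hedged sketch: X must stay kFloat32, y must be created as kInt,
 // otherwise Train() crashes when it casts the label buffer to int*.
 const size_t n_samples = 340000, n_features = 96;

 Tensor train_x(Shape{n_samples, n_features});     // kFloat32 by default
 Tensor train_y(Shape{n_samples}, singa::kInt);    // labels MUST be kInt

 std::vector<float> x_data(n_samples * n_features); // filled from the data file
 std::vector<int>   y_data(n_samples);

 train_x.CopyDataFromHostPtr(x_data.data(), x_data.size());
 train_y.CopyDataFromHostPtr(y_data.data(), y_data.size());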

As a starting point for my code I took the alexnet.cc file. In my network-building example you can see that sigmoid activations are placed on top of the fully connected layers explicitly, as separate layers:

 static FeedForwardNet create_net(size_t input_size)
 {
   FeedForwardNet net;
   Shape s{ input_size };

   net.Add(GenHiddenDenseConf("dense1", 96, 1.0, 1.0), &s);
   net.Add(GenSigmoidConf("dense1_a"));
   net.Add(GenOutputDenseConf("dense_output", 2, 1.0, 1.0));
   net.Add(GenSigmoidConf("dense2_a"));

   return net;
 }

If, out of habit from Keras, you forget to specify the activation and expect to get a sigmoid by default, you are in for a surprise that you can only guess at from a strange learning curve: the dense layer will simply be linear.

As the example above shows, the network is created in a fairly familiar way. First a container is created, an object of the FeedForwardNet class, and layers that sequentially process the input data are added to it. For the first layer you must pass the input dimensionality, an object of the Shape class, so that the library can correctly wire up the connections between layers.

Training the network, as usual, boils down to adjusting the weights and biases to minimize the loss function. The optimization process is driven by an optimizer (there is also an Updater playing a related role), which you specify when building the model. The library offers the following optimizers (classes derived from Optimizer):

SGD
Nesterov
AdaGrad
Rmsprop

For the basic stochastic gradient descent you can set your own learning-rate schedule through a callback function: for reference, look for the SetLearningRateGenerator method in my experiment code. So next we create and configure the optimizer variant we want:

 SGD sgd;
 OptimizerConf opt_conf;
 opt_conf.set_momentum(0.9);
 sgd.Setup(opt_conf);
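For illustration, attaching a schedule to the optimizer looks roughly like this. The step-decay thresholds are arbitrary, and the callback signature (step index in, learning rate out) is my reading of the optimizer headers, so double-check it:

 // Hedged sketch: a step-decay learning-rate schedule for the SGD above.
 sgd.SetLearningRateGenerator([](int step) {
   if (step < 50)  return 0.05f;
   if (step < 100) return 0.01f;
   return 0.001f;
 });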

Finally, we "compile" the network, specifying the loss function and the metric. This stage will also be familiar to anyone who uses DL libraries from Python:

 SoftmaxCrossEntropy loss;
 Accuracy acc;
 net.Compile(true, &sgd, &loss, &acc);

Everything is ready: we start training, passing the tensors, the number of epochs and the batch size:

 size_t batch_size = 128;
 net.Train(batch_size, num_epoch, train_x, train_y, val_x, val_y);

In the source code you can see hints of tools for building computation graphs more complex than simply stacking layers sequentially, for example concat and merge.

There are also regularizers: dropout and batch normalization .

In principle, all of the above is enough for evaluation experiments and getting acquainted with this fine library. If you look into the examples folder, there are fairly complex models written against the Python wrapper, such as Char-RNN or a GoogleNet-based image classifier. My binary classification experiment, by the way, gives an accuracy estimate of around 0.745. That is somewhat worse than the 0.80 of the Keras version with the Theano backend, but not enough to sound the alarm, especially since further tuning of the parameters and architecture could improve the model.

Subjective evaluation of Apache.SINGA


(+) All major grid architectures are presented, including convolutional and recurrent.
(+) GPU support.
(+) Calculations on float.
(+) More or less familiar workflow in the design of the model, understandable terminology.
(-) Difficult installation and compilation.
(-) Not always intuitive requirements for the choice of parameters and options, manifested through memory faults.

Solving the problem of binary classification in C++ using OpenNN


This is the last library I tried in this experiment. The description in the repository positions it as a high-performance implementation of neural networks. I cannot judge how well that goal is achieved, but the training of my network ran on a single CPU core (Apache.SINGA, for comparison, occupied both available cores without any extra prodding on my part), and the use of double precision is even more puzzling.

Nevertheless, I recommend installing and trying this library, since this is an extremely simple process.

Building OpenNN


The process is described in the build documentation; there is nothing complicated about it.

1. Download:

 git clone https://github.com/Artelnics/OpenNN.git 

2. Create a sln file for VS 2015:

 mkdir build & cd build
 cmake -G "Visual Studio 14 2015 Win64" ..

3. Open the generated OpenNN.sln in the studio and compile it. It takes 10-15 minutes. A static library file opennn.lib appears in the subdirectory ...\build\opennn\Debug (or Release).

That is the end of the build.

Using OpenNN


The main problem with this library is the entry barrier caused by unusual terminology and the overall organization of working with a model. Class names are often unintuitive and it is hard to guess their purpose. For example, what does the LossIndex class do? It obviously has something to do with the loss function, but why "Index"?

Or take the training options. When you start training a network with the default parameters, the console reports that quasi-Newton optimization is being performed. Great, but I would rather have something like stochastic gradient descent. Switching the optimizer is done like this:

 training_strategy.set_main_type(OpenNN::TrainingStrategy::GRADIENT_DESCENT);

At the same time, the corresponding class adjusts the learning rate entirely on its own, and so tightly that I found no way to influence the process, in particular no way to set the initial learning rate.

Setting the maximum number of iterations and other parameters looks like this:

 TrainingStrategy training_strategy;
 // ...
 training_strategy.get_gradient_descent_pointer()->set_maximum_iterations_number(100);
 training_strategy.get_gradient_descent_pointer()->set_maximum_time(3600);
 training_strategy.get_gradient_descent_pointer()->set_display_period(1);

As you can see, the names of classes and methods are a bit cumbersome.

Furthermore, the training data is represented as double. This certainly adds overhead when copying tensors, and raises the question of whether a quiet cast to float happens somewhere under the hood when a GPU is used.

Another disadvantage: I did not find any analog of sklearn's predict for a trained model that would give me a vector of predictions for the whole test set at once so I could compute my metric. I had to write a loop, feed the network one test sample at a time and process each individual result, updating the metric as I went.
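For reference, my evaluation loop looked roughly like the sketch below. X_test and y_test are my own containers filled from the test file, and calculate_outputs() is how I recall the NeuralNetwork API of the OpenNN version I built, so verify the name against the headers:

 // Hedged sketch: per-sample evaluation in the absence of a batch predict().
 size_t correct = 0;
 const size_t n_test = X_test.get_rows_number();   // OpenNN::Matrix<double>

 for (size_t i = 0; i < n_test; ++i) {
   OpenNN::Vector<double> inputs(96);
   for (size_t j = 0; j < 96; ++j) inputs[j] = X_test(i, j);

   OpenNN::Vector<double> outputs = neural_network.calculate_outputs(inputs);
   const int predicted = outputs[0] >= 0.5 ? 1 : 0;

   if (predicted == y_test[i]) ++correct;          // y_test: std::vector<int>
 }

 const double accuracy = static_cast<double>(correct) / n_test;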

There are pluses too. The library has its own class for dataset operations with the straightforward name DataSet. It can load data from a CSV file and even understands a header with column names. On the other hand, the class is organized so that it loads both the input variables X and the target values y from the same file. That came as a surprise, so I had to add a separate option to my Python script to save the dataset in a format specifically for OpenNN.
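In case it saves someone a search, loading looked roughly like this in the version I built; the method names here (set_data_file_name, load_data, get_variables_pointer, Variables::set) are from memory, so double-check them against data_set.h:

 // Hedged sketch: DataSet reads both X and y from a single delimited file.
 OpenNN::DataSet data_set;
 data_set.set_data_file_name("ngrams_for_opennn.csv");  // hypothetical file name
 data_set.load_data();

 // first 96 columns are inputs, the last one is the target
 OpenNN::Variables* variables = data_set.get_variables_pointer();
 variables->set(96, 1);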

Overall, writing my own network implementation for the classification problem with OpenNN took a bit longer than with SINGA or tiny-dnn. How transparent the result is, you can judge for yourself.

Subjective evaluation of OpenNN


(+) simple installation and compilation, no dependencies
(+) there seems to be support for CUDA
(±) its DataSet class for working with datasets can load from CSV and understands headers, but requires that both X and y be in the same file.
(-) operations with double
(-) extremely unusual terminology, non-intuitive class names, unusual workflow in the description of the model.

Continuing with other libraries


At a minimum, it is worth looking at Microsoft's CNTK. Besides the standard set of deep learning building blocks, its documentation mentions an implementation of reinforcement learning, the technological foundation of the hype wave of the last couple of years.

Source: https://habr.com/ru/post/335838/

