We present a translation of a series of articles on deep learning. This first part covers choosing an open-source framework for symbolic deep learning from among MXNET, TensorFlow, and Theano. The author compares the advantages and disadvantages of each in detail. In the following parts, you will learn about fine-tuning deep convolutional networks, as well as combining a deep convolutional neural network with a recurrent neural network.

The "Deep Learning" series of articles:

1. Comparison of frameworks for symbolic deep learning.
2. Transfer learning and fine-tuning of deep convolutional neural networks.
3. Combining a deep convolutional neural network with a recurrent neural network.
Note: the rest of the article is written in the author's voice.
Symbolic frameworks
Symbolic computing frameworks (MXNET, TensorFlow, Theano) are characterized by symbolic graphs of vector operations, such as matrix addition/multiplication or convolution. A layer is simply a set of such operations. Because everything is broken down into small composable components (operations), users can create new, complex layer types without resorting to low-level languages (as in Caffe).
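To make the "symbolic graph" idea concrete, here is a minimal, hypothetical sketch in plain Python (this is not any framework's real API): operations first build a graph, and values flow only when the graph is evaluated with concrete inputs.

```python
import numpy as np

class Var:
    """A node in a tiny symbolic graph: either a named input or an operation."""
    def __init__(self, name=None, op=None, args=()):
        self.name, self.op, self.args = name, op, args

    def __matmul__(self, other):   # builds a matrix-multiplication node
        return Var(op=np.matmul, args=(self, other))

    def __add__(self, other):      # builds an elementwise-addition node
        return Var(op=np.add, args=(self, other))

    def eval(self, feed):
        if self.op is None:        # leaf node: look up the fed value
            return feed[self.name]
        return self.op(*(a.eval(feed) for a in self.args))

# A "layer" is just a set of such operations: y = x @ W + b
x, W, b = Var("x"), Var("W"), Var("b")
y = x @ W + b   # only builds the graph; nothing is computed yet

out = y.eval({"x": np.ones((1, 3)),
              "W": np.ones((3, 2)),
              "b": np.zeros(2)})
print(out)  # [[3. 3.]]
```

Because the whole expression is known before anything runs, a real symbolic framework can inspect and optimize this graph; an imperative framework only ever sees one operation at a time.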
I have experience with several symbolic computing frameworks. As it turned out, they all have advantages and disadvantages in their design and current implementation, but none of them fully meets all requirements. For now, however, I prefer Theano.
Next, we compare the listed frameworks for symbolic computing.
| Characteristic | Theano | TensorFlow | MXNET |
|---|---|---|---|
| Author | University of Montreal | Google Brain Team | Distributed (Deep) Machine Learning Community |
| Software license | BSD | Apache 2.0 | Apache 2.0 |
| Open source | Yes | Yes | Yes |
| Platforms | Cross-platform | Linux, Mac OS X (Windows support planned) | Ubuntu, OS X, Windows, AWS, Android, iOS, JavaScript |
| Written in | Python | C++, Python | C++, Python, Julia, MATLAB, R, Scala |
| Interface | Python | C/C++, Python | C++, Python, Julia, MATLAB, JavaScript, R, Scala |
| CUDA support | Yes | Yes | Yes |
| Automatic differentiation | Yes | Yes | Yes |
| Pre-trained models | Via the Lasagne model zoo | No | Yes |
| Recurrent networks | Yes | Yes | Yes |
| Convolutional networks | Yes | Yes | Yes |
| Restricted Boltzmann machines / deep belief networks | Yes | Yes | Yes |
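The "automatic differentiation" row in the table hides real machinery. A toy, hypothetical reverse-mode sketch (again, not any framework's actual API) shows the core idea: each operation records how to push gradients back to its inputs.

```python
class Scalar:
    """A scalar value that remembers which inputs produced it,
    together with the local derivative toward each input."""
    def __init__(self, value, parents=()):
        self.value, self.parents, self.grad = value, parents, 0.0

    def __mul__(self, other):   # d(a*b)/da = b,  d(a*b)/db = a
        return Scalar(self.value * other.value,
                      parents=((self, other.value), (other, self.value)))

    def __add__(self, other):   # d(a+b)/da = d(a+b)/db = 1
        return Scalar(self.value + other.value,
                      parents=((self, 1.0), (other, 1.0)))

    def backward(self, seed=1.0):
        # Accumulate the incoming gradient, then pass it upstream,
        # scaled by each local derivative (the chain rule).
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

x = Scalar(3.0)
y = x * x + x           # y = x^2 + x
y.backward()
print(y.value, x.grad)  # 12.0 7.0  (dy/dx = 2x + 1 = 7 at x = 3)
```

Real frameworks apply the same chain-rule bookkeeping to tensors rather than scalars, and traverse the graph iteratively instead of recursively.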
Comparison of symbolic and non-symbolic frameworks
Non-symbolic frameworks

Benefits:

- Non-symbolic (imperative) neural network frameworks such as Torch and Caffe tend to have very similar computational cores.
- In terms of expressiveness, imperative frameworks are designed quite well, and they can provide a graph-based interface (for example, torch/nngraph).

Disadvantages:

- The main disadvantage of imperative frameworks is manual optimization. For example, in-place operations have to be implemented by hand.
- Most imperative frameworks are inferior to symbolic ones in expressiveness.
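The "manual optimization" point can be illustrated with plain NumPy, which is itself imperative: avoiding temporary allocations is the user's job, not the framework's.

```python
import numpy as np

a = np.ones(1_000_000)
b = np.ones(1_000_000)

# Naive version: `a + b` allocates a brand-new array for the result.
c = a + b

# Hand-optimized in-place version: the result is written into `a`'s own
# buffer, saving one allocation. A symbolic framework could make this
# choice automatically after seeing that the old value of `a` is never
# needed again; here the programmer must spot it and rewrite the code.
np.add(a, b, out=a)

print(c[0], a[0])  # 2.0 2.0
```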
Symbolic frameworks

Benefits:

- Symbolic frameworks can optimize automatically, based on the dependency graph.
- Symbolic frameworks offer far more opportunities for memory reuse; MXNET, for example, implements this particularly well.
- Symbolic frameworks can automatically compute an optimal execution schedule. More details can be found here.

Disadvantages:

- The available open-source symbolic frameworks are still immature and lag behind imperative ones in performance.
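The memory-reuse benefit follows directly from knowing the dependency graph. A hypothetical planner sketch (illustrative only, not MXNET's actual algorithm): once the framework knows each buffer's last use, it can recycle dead buffers instead of allocating a fresh one per operation.

```python
def plan_buffers(ops):
    """ops: list of (output_name, input_names) pairs in execution order.
    Returns a dict mapping each output name to a reusable buffer id."""
    # Pass 1: record the step at which each value is used for the last time.
    last_use = {}
    for step, (_, inputs) in enumerate(ops):
        for name in inputs:
            last_use[name] = step

    free, assignment, next_id = [], {}, 0
    # Pass 2: assign buffers, recycling any that are already dead.
    for step, (out, inputs) in enumerate(ops):
        if free:
            assignment[out] = free.pop()   # reuse a dead buffer
        else:
            assignment[out] = next_id      # otherwise allocate a new one
            next_id += 1
        for name in inputs:                # release buffers after last use
            if last_use[name] == step and name in assignment:
                free.append(assignment[name])
    return assignment

# Chain a -> b -> c -> d: every intermediate dies immediately,
# so two buffers suffice instead of four.
plan = plan_buffers([("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])])
print(plan)  # {'a': 0, 'b': 1, 'c': 0, 'd': 1}
```

An imperative framework cannot do this planning, because it never sees the whole graph: by the time one operation runs, it does not know whether the inputs will be needed later.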
Adding new operations
In all of these frameworks, adding new operations while maintaining acceptable performance is not easy.
| Theano / MXNET | TensorFlow |
|---|---|
| Python operations can be added, with support for inline C operators. | Forward pass in C++, symbolic gradient in Python. |
Code reuse
Training deep networks takes a long time. Caffe therefore released a set of pre-trained models (the model zoo) that can be used as starting points for transfer learning or fine-tuning deep networks on domain-specific or custom image data.
| Theano | TensorFlow | MXNET |
|---|---|---|
| Lasagne, a high-level framework built on Theano, makes it easy to use pre-trained Caffe models. | No support for pre-trained models. | Provides the caffe_converter tool for converting pre-trained Caffe models to MXNET format. |
Low-level tensor operators
A reasonably efficient implementation of low-level operators lets them serve as building blocks when creating new models, without the effort of writing new operators.
| Theano | TensorFlow | MXNET |
|---|---|---|
| Many basic operations | Quite good | Very few |
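This composition is exactly how new "operators" get built without touching C++: with good low-level primitives, something like softmax is just a few of them glued together. Plain NumPy stands in for a framework's tensor operations in this sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum first: exp() of large inputs would
    # overflow, and this shift does not change the result.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)               # primitive: elementwise exp
    return e / e.sum(axis=axis, keepdims=True)  # primitives: sum, divide

probs = softmax(np.array([[1.0, 2.0, 3.0]]))
print(probs)  # each row sums to 1 (up to float rounding)
```

In a symbolic framework, the same composition also gets the gradient for free, since each primitive already knows its derivative.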
Flow control operators
Flow control operators increase the expressiveness and versatility of a symbolic system.
| Theano | TensorFlow | MXNET |
|---|---|---|
| Supported | Experimental | Not supported |
High-level support
| Theano | TensorFlow | MXNET |
|---|---|---|
| A "pure" symbolic computing framework. High-level platforms can be built on top of it as needed. Successful examples include Keras, Lasagne, and Blocks. | Well designed for training neural networks, yet not focused exclusively on them, which is very good. Its graph visualization, queues, and image augmentation can serve as components for higher-level wrappers. | In addition to the symbolic part, MXNET provides all the components needed for image classification, from data loading to model construction, with methods to start training. |
Performance
Single-GPU performance measurement
In my tests, I measure the performance of a LeNet model on the MNIST dataset in a single-GPU configuration (NVIDIA Quadro K1200).
| Theano | TensorFlow | MXNET |
|---|---|---|
| Good | Average | Excellent |
Memory
GPU memory is limited, so working with large models can be problematic.
| Theano | TensorFlow | MXNET |
|---|---|---|
| Good | Average | Excellent |
Single-GPU speed
Theano takes a very long time to compile graphs, especially for complex models. TensorFlow is a bit slower still.
| Theano / MXNET | TensorFlow |
|---|---|
| Comparable to cuDNNv4 | Roughly twice as slow |
Support for parallel and distributed computing
| Theano | TensorFlow | MXNET |
|---|---|---|
| Experimental multi-GPU support | Multi-GPU | Distributed |
Conclusion
Theano (with the high-level Lasagne and Keras libraries) is an excellent choice for deep learning models. With Lasagne/Keras it is very easy to create new networks and modify existing ones. I prefer Python, so I choose Lasagne/Keras for their highly developed Python interfaces. However, these solutions do not support R. The transfer learning and fine-tuning capabilities of Lasagne/Keras show that modifying existing networks, and adapting them to domain-specific user data, is very easy.
Having compared the frameworks, we can conclude that MXNET is the best option (better performance, efficient memory use). In addition, it has excellent R support; in fact, it is the only platform that supports all of these features in R. In MXNET, transfer learning and fine-tuning of networks are possible, but harder to carry out (compared with Lasagne/Keras). This makes it more difficult both to modify existing trained networks and to adapt them to domain-specific user data.
If you notice an inaccuracy in the translation, please let us know in a private message.