“What do we have there?” asked the horned one, turning around.
“An Aldan-3,” said the bearded one.
“A fine machine,” I said. [1]
Recently, I decided to study deep learning. At work I was given a new card with CUDA support, and my boss expressed the hope that this pinnacle of engineering would let our laboratory make a leap forward, or at least not fall behind the mass of competitors. I already had some experience with TensorFlow, but this time I decided to try Torch. I was attracted by the fact that it is written in Lua and C, is fairly lightweight, and is easy to extend through FFI. Besides, I don't like Python.
Recently I came across an article on Habrahabr, and while discussing it I remembered that I had seen a Raspberry Pi Model B+ somewhere in my nightstand. I wanted to see whether I could get Torch running on it and run something simple.
Naturally, the first thing I wanted was to see how AlexNet and other well-known networks would train on my desktop with the new GPU card. On GitHub there is a small project in which several popular networks are implemented in Torch. Having played with them, I moved on to my own problems, but I am not going to talk about those here.
Now on to the Raspberry Pi (Model B+).
Copy the Torch installer to the Pi:
 apt-get install git-core
 git clone https://github.com/torch/distro.git ~/torch --recursive
I did not want to wait while the standard Torch installer compiled OpenBLAS and installed Qt with all its dependencies, so I installed the dependencies manually:
 apt-get install -y build-essential gcc g++ curl cmake libreadline-dev libjpeg-dev libpng-dev ncurses-dev imagemagick libzmq3-dev gfortran libopenblas-base libopenblas-dev
Start compiling Torch:
 cd ~/torch; ./install.sh
For me the compilation took only about an hour.
“And what is that lamp you have there?” Farfurkis asked suspiciously. [1]
And here the first disappointment awaits us: the creators of Torch did not expect it to be compiled on the ARM architecture without NEON support:
 [ 6%] Building C object lib/TH/CMakeFiles/TH.dir/THVector.co
 In file included from /home/pi/torch/pkg/torch/lib/TH/THVector.c:2:0:
 /home/pi/torch/pkg/torch/lib/TH/generic/THVectorDispatch.c: In function 'THByteVector_vectorDispatchInit':
 /home/pi/torch/pkg/torch/lib/TH/generic/simd/simd.h:64:3: error: impossible constraint in 'asm'
 asm volatile ( "cpuid\n\t"
I had to fix this. After that, everything worked! If you are too lazy to do all this yourself and want to try an example quickly, I made an archive with a pre-compiled Torch for the Raspberry Pi B (without NEON support): https://github.com/vfonov/deep-pi/releases/download/v1/torch_intstall_raspbian_arm6l_20161218.tar.gz, to be unpacked in /home/pi.
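Whether a given board has NEON can be checked before starting the long build. The following is a minimal sketch in plain Lua, not part of the Torch installer: `has_cpu_feature` is a hypothetical helper, and the two sample `Features` strings are assumed examples of what the Linux kernel reports in /proc/cpuinfo on ARM, not output captured for this article.

```lua
-- Hypothetical helper: returns true if `feature` is listed on a
-- "Features" line of the given /proc/cpuinfo text (ARM Linux layout assumed).
local function has_cpu_feature(cpuinfo, feature)
  for line in cpuinfo:gmatch("[^\n]+") do
    local list = line:match("^Features%s*:%s*(.*)$")
    if list then
      -- Compare whole words so that "vfp" does not match "vfpv3".
      for word in list:gmatch("%S+") do
        if word == feature then return true end
      end
    end
  end
  return false
end

-- Illustrative samples (assumed, not taken from the author's board).
local without_neon = "Features\t: half thumb fastmult vfp edsp java tls\n"
local with_neon = "Features\t: half thumb fastmult vfp edsp neon vfpv3 tls\n"
print(has_cpu_feature(without_neon, "neon"))
print(has_cpu_feature(with_neon, "neon"))

-- On a real system, check the file itself if it exists.
local f = io.open("/proc/cpuinfo", "r")
if f then
  print("NEON listed on this machine:", has_cpu_feature(f:read("*a"), "neon"))
  f:close()
end
```

If the check reports no NEON, the SIMD dispatch code in TH is the place where a build like the one above can be expected to fail.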
As a test, I decided to look at the training speed for MNIST handwritten digit recognition; the corresponding example is in the Torch set of demos:
 th train-on-mnist.lua
 <torch> set nb of threads to 4
 <mnist> using model:
 nn.Sequential {
 [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> output]
 (1): nn.SpatialConvolutionMM(1 -> 32, 5x5)
 (2): nn.Tanh
 (3): nn.SpatialMaxPooling(3x3, 3,3, 1,1)
 (4): nn.SpatialConvolutionMM(32 -> 64, 5x5)
 (5): nn.Tanh
 (6): nn.SpatialMaxPooling(2x2, 2,2)
 (7): nn.Reshape(576)
 (8): nn.Linear(576 -> 200)
 (9): nn.Tanh
 (10): nn.Linear(200 -> 10)
 }
 <warning> only using 2000 samples to train quickly (use flag -full to use 60000 samples)
 <mnist> loading only 2000 examples
 <mnist> done
 <mnist> loading only 1000 examples
 <mnist> done
 <trainer> on training set:
 <trainer> online epoch # 1 [batchSize = 10]
 [===================>.................... 471/2000 ....................................] ETA: 2m20s | Step: 92ms
Not bad overall. For comparison, on a desktop with an i5-4590 CPU @ 3.30GHz, without using the GPU:
 [=======================>................ 571/2000 ....................................] ETA: 27s613ms | Step: 19ms
That is, in this example the Pi is about 5 times slower than a modern desktop (92 ms versus 19 ms per step).
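The ETA figures in the two log lines can be cross-checked with simple arithmetic. This sketch assumes that `Step` is the time per training sample and that the ETA is the number of remaining samples multiplied by that step; both are inferences from the log lines, not documented behaviour of the progress bar, and `eta_seconds` is a hypothetical helper.

```lua
-- Hypothetical helper: remaining time = remaining samples * time per sample.
local function eta_seconds(done, total, step_ms)
  return (total - done) * step_ms / 1000
end

local pi_eta = eta_seconds(471, 2000, 92)       -- Raspberry Pi log line
local desktop_eta = eta_seconds(571, 2000, 19)  -- desktop log line
local slowdown = 92 / 19                        -- ratio of per-step times

print(string.format("Pi ETA: %.1f s", pi_eta))
print(string.format("Desktop ETA: %.1f s", desktop_eta))
print(string.format("Slowdown: %.1fx", slowdown))
```

Under these assumptions the Pi estimate comes to about 140.7 s, which matches the logged 2m20s. The desktop estimate of about 27.2 s is slightly below the logged 27s613ms, which is consistent with the step being rounded to whole milliseconds in the display. The ratio of step times is roughly 4.8.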
The now animated “Aldan” sometimes printed on its output: “I am thinking. Please do not disturb.” [1]
Now it's time to make the Pi recognize images using a trained GoogLeNet. A second catch was waiting for me here: AlexNet has so many parameters that the Pi does not have enough memory for it. SqueezeNet and Network-in-Network come to the rescue; the author of the latter even published a trained model in Torch format.
First you need to convert the model so that it can be used on the ARM architecture (you should not train on the Raspberry Pi itself: the results would be ready in a hundred years).
On the desktop, load the model in Torch's binary format and save it in 'ascii' format; then, on the Pi, convert it back:
Desktop:
 model=torch.load('blah.t7')
 torch.save('blah_ascii.t7',model,'ascii')
Raspberry Pi:
 model=torch.load('blah_ascii.t7','ascii')
 torch.save('blah_arm.t7',model)
The version for ARM can be downloaded here.
I made a small script to run on the Pi:
Full text here.
 ...
 local m=torch.load(prefix..'nin_bn_final_arm.t7')
 ...
 local input=image.load(prefix.."n07579787_ILSVRC2012_val_00049211.JPEG")
 ...
 local output=model:forward(cropped)
 ...
And voila, we launch it with an image from the ImageNet test set:
 >th test_single.lua n07579787_ILSVRC2012_val_00049211.JPEG
 loading model:0.57sec
 Running neural net:13.46sec
 25.3%: n07579787: plate
 13.8%: n07873807: pizza, pizza pie
 8.8%: n04263257: soup bowl
 8.0%: n07590611: hot pot, hotpot
 7.2%: n07831146: carbonara
That is, in 14 seconds the Pi successfully completed the image recognition!
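A ranked list like the one above can be produced by sorting the class scores and keeping the first few entries. Below is a minimal plain-Lua sketch. The full script is only linked, not shown, so `top_k` is a hypothetical helper and not necessarily how the author does it (with Torch tensors one would more likely sort the output tensor directly). The sample scores are the five percentages printed above, deliberately shuffled.

```lua
-- Hypothetical helper: return the k highest-scoring (label, score) pairs.
local function top_k(scores, labels, k)
  local order = {}
  for i = 1, #scores do order[i] = i end
  -- Sort the indices by descending score.
  table.sort(order, function(a, b) return scores[a] > scores[b] end)
  local result = {}
  for rank = 1, math.min(k, #order) do
    local i = order[rank]
    result[rank] = { label = labels[i], score = scores[i] }
  end
  return result
end

-- Percentages from the run above, in shuffled order.
local labels = { "soup bowl", "plate", "carbonara", "pizza, pizza pie", "hot pot, hotpot" }
local scores = { 8.8, 25.3, 7.2, 13.8, 8.0 }

for _, item in ipairs(top_k(scores, labels, 3)) do
  print(string.format("%.1f%%: %s", item.score, item.label))
end
```

Sorting an index table rather than the scores themselves keeps each score paired with its label without copying either list.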
Now for a more interesting example: we attach the camera interface from the camera package and the web interface from the display package, and we get an interactive machine that announces to the world every 14 seconds what it sees. You only need to install the package for working with the camera (luarocks install camera) and the package for visualization via the web interface (luarocks install display).
Full text here.
 …
 local cam = image.Camera {idx=0,width=iW,height=iH}
 ...
 local frame = cam:forward()
 local cropped = image.crop(frame, w1, h1, w1+oW, h1+oH) -- center patch
 …
 display_sample_in.win=display.image(cropped,display_sample_in)
 …
 local output=model:forward(cropped)
 …
 display_output.win=display.text(out_text,display_output)
Before launching, you must start the daemon from the display package:
 nohup th -ldisplay.start 8000 0.0.0.0 &
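The excerpt does not show how `w1` and `h1` are obtained. One common way to get a centered patch is to split the leftover margin evenly, sketched here in plain Lua. The helper `center_offsets` and the 320x240 and 224x224 sizes are assumptions for illustration; the author's script may compute the offsets differently.

```lua
-- Hypothetical helper: top-left corner of an oW x oH patch centered
-- in an iW x iH frame (offsets rounded down).
local function center_offsets(iW, iH, oW, oH)
  assert(oW <= iW and oH <= iH, "patch must fit inside the frame")
  local w1 = math.floor((iW - oW) / 2)
  local h1 = math.floor((iH - oH) / 2)
  return w1, h1
end

-- Assumed sizes, for illustration only.
local iW, iH = 320, 240   -- camera frame
local oW, oH = 224, 224   -- patch fed to the network
local w1, h1 = center_offsets(iW, iH, oW, oH)

-- The patch spans (w1, h1) to (w1 + oW, h1 + oH), the same corner
-- arguments that are passed to image.crop in the excerpt.
print(w1, h1, w1 + oW, h1 + oH)
```

With these assumed sizes the patch starts 48 pixels from the left edge and 8 pixels from the top.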
Test setup:

Result:



So we have an inexpensive image recognition machine with which you can entertain your friends during the New Year holidays. Naturally, the image classification task can be replaced by something more useful: for example, you can easily build a system that identifies people by their faces (there are several examples here), or detect human figures on your garden surveillance camera.
To optimize performance, you can try using NNPACK, or even write an interface to the vector accelerator built into the Raspberry Pi processor, like this.
[1] Quotes from "Monday Begins on Saturday" and "Tale of the Troika" by A. and B. Strugatsky.
A description of the procedure in English and the repository with all the source code are on GitHub.
Source: https://habr.com/ru/post/400141/