
Using Intel Movidius for Neural Networks

Introduction


We develop deep neural networks for analyzing photos, videos and text. Last month we bought a very interesting gadget for one of our projects: the Intel Movidius Neural Compute Stick.
Intel MNCS

This is a specialized device for neural network computation: essentially an external graphics card tailored to neural networks, and it is very compact and inexpensive (~$83). We want to share our first impressions of working with the Movidius. Details below.

Computing power of the device


Neural networks are extremely demanding in terms of computation: training them requires GPUs, and running them in real-world tasks also calls for GPUs or powerful CPUs. The Movidius NCS lets you run deep neural networks on devices that were not originally designed for it, such as the Raspberry Pi, DJI Phantom 4 and DJI Spark. We are talking only about the prediction stage (inference with a pretrained network): training neural networks on the Movidius is not yet supported.

The chip's performance is about 100 gigaflops, i.e. 10^11 FLOPS (1 GFLOPS = 10^9 FLOPS). This roughly matches the top-end supercomputers of the early 1990s; today's top systems reach hundreds of petaflops (1 PFLOPS = 10^15 FLOPS).
For reference: FLOPS is the number of operations (or instructions) on floating-point (FP) operands performed per second. To go deeper into the topic, I recommend the Intel article.
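To make these orders of magnitude concrete, here is a back-of-the-envelope sketch. The workload of 2 billion operations per image is a hypothetical placeholder, not a measured figure for any particular network, and peak ratings are upper bounds: real throughput is always lower.

```python
# Back-of-the-envelope FLOPS arithmetic (peak ratings; real throughput is lower)
GFLOPS = 10**9
PFLOPS = 10**15

NCS_PEAK = 100 * GFLOPS           # ~100 GFLOPS, as quoted above
TOP_SUPERCOMPUTER = 100 * PFLOPS  # lower end of "hundreds of petaflops"


def ideal_latency_s(ops_per_inference: float, peak_flops: float) -> float:
    """Lower bound on time per inference if the chip ran at its peak rate."""
    return ops_per_inference / peak_flops


if __name__ == "__main__":
    # Hypothetical workload: 2 billion floating-point operations per image
    ops = 2 * GFLOPS
    print(f"ideal latency: {ideal_latency_s(ops, NCS_PEAK) * 1000:.1f} ms")
    print(f"gap to top supercomputers: {TOP_SUPERCOMPUTER / NCS_PEAK:.0e}x")
```

Even at peak rate such a workload would take about 20 ms per image, and today's largest machines are roughly a million times faster than the stick.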

The device is based on the Myriad 2 chip, which contains 12 specialized programmable vector processors. The SoC components are linked by a high-speed, low-latency on-chip interconnect. Myriad 2 is positioned either as a coprocessor working alongside an application processor in mobile devices, or as a stand-alone processor in wearable or embedded electronics.

The Myriad 2 processor itself

In the USB flash-drive form factor (the Neural Compute Stick), it can be used to bring neural networks to drones, for example paired with a Raspberry Pi.

Installing the SDK and running a first program on the NCS


What we need



Preparation


Connect the Movidius stick to a USB 3.0 port, then run the following in the console:

```shell
$ git clone https://github.com/movidius/ncsdk.git
$ cd ncsdk
$ sudo make install
```

These commands will install:


They also add the path to the Movidius Python library to PYTHONPATH.
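A quick way to check that the PYTHONPATH change took effect is to ask Python whether it can locate the mvnc package, the name imported by run.py below. This is our own small check, not part of the NCSDK. If it reports False, opening a new shell may help; that assumes the installer sets the variable through a shell startup file, which we have not verified.

```python
import importlib.util
import os


def mvnc_importable() -> bool:
    """True if the 'mvnc' package (the NCSDK Python API) is on the import path."""
    return importlib.util.find_spec("mvnc") is not None


if __name__ == "__main__":
    print("PYTHONPATH =", os.environ.get("PYTHONPATH", "<not set>"))
    print("mvnc importable:", mvnc_importable())
```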

Running an example


In the same folder, run the command to build the examples:

```shell
$ make examples
```

To prepare a standard example, an implementation of inception_v1 trained on ImageNet, run the following commands:

```shell
$ cd examples/tensorflow/inception_v1
$ make all
```

The last command takes the network description and the pretrained weights and compiles them into a binary graph file, which can then be run on the Myriad 2 VPU.

Next comes the test script run.py. Here is a brief overview of what it does (some parts of the script are omitted):

```python
# Import the NCS API, sys, numpy and OpenCV
from mvnc import mvncapi as mvnc
import sys
import numpy
import cv2

# ... (omitted: this part defines path_to_networks and path_to_images)

# Name of the compiled graph file
graph_filename = 'graph'

# Test image to classify
image_filename = path_to_images + 'mouse.jpg'

# Look for NCS devices; exit if none are found, otherwise open the first one
devices = mvnc.EnumerateDevices()
if len(devices) == 0:
    print('No devices found')
    sys.exit(1)
device = mvnc.Device(devices[0])
device.OpenDevice()

# Read the graph compiled from the TensorFlow model
# (graphs can also be compiled from Caffe models)
with open(path_to_networks + graph_filename, mode='rb') as f:
    graphfile = f.read()

# ... (omitted: this part sets mean, std, categories and reqsize)

# Allocate the compiled graph on the device
graph = device.AllocateGraph(graphfile)

# Load the image and center-crop it to a square
img = cv2.imread(image_filename).astype(numpy.float32)
dx, dy, dz = img.shape
delta = float(abs(dy - dx))
if dx > dy:  # crop the x dimension
    img = img[int(0.5 * delta):dx - int(0.5 * delta), 0:dy]
else:
    img = img[0:dx, int(0.5 * delta):dy - int(0.5 * delta)]

# Resize to the network's input size; OpenCV loads images as BGR, convert to RGB
img = cv2.resize(img, (reqsize, reqsize))
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Normalize each color channel
for i in range(3):
    img[:, :, i] = (img[:, :, i] - mean) * std

# Send the image to the NCS as a float16 tensor
print('Start download to NCS...')
graph.LoadTensor(img.astype(numpy.float16), 'user object')

# Get the inference result
output, userobj = graph.GetResult()

# Print the top-5 classes, then free the graph and close the device
top_inds = output.argsort()[::-1][:5]
print('*' * 79)
print('inception-v1 on NCS')
print('*' * 79)
for i in range(5):
    print(top_inds[i], categories[top_inds[i]], output[top_inds[i]])
print('*' * 79)

graph.DeallocateGraph()
device.CloseDevice()
print('Finished')
```
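The center-crop step in run.py can be isolated as a small pure function, which makes it testable without a device. This is our own sketch, not part of the NCSDK example. It computes the same centered square, but takes the offset with integer division so the crop is exactly square even when the size difference is odd. With int(0.5*delta) trimmed from both sides, an odd difference leaves the crop one pixel too long, and the following resize then stretches it slightly.

```python
from typing import Tuple


def center_crop_box(rows: int, cols: int) -> Tuple[int, int, int, int]:
    """Return (row_start, row_end, col_start, col_end) of the largest centered square."""
    side = min(rows, cols)
    row_start = (rows - side) // 2
    col_start = (cols - side) // 2
    return row_start, row_start + side, col_start, col_start + side


if __name__ == "__main__":
    # A 480x640 (rows x cols) landscape frame keeps the middle 480 columns
    print(center_crop_box(480, 640))  # (0, 480, 80, 560)
```

With a NumPy image from cv2.imread, it would be used as `r0, r1, c0, c1 = center_crop_box(*img.shape[:2])` followed by `img = img[r0:r1, c0:c1]`.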

When we built the example with make all, the console also printed useful diagnostics. For example, the Detailed Per Layer Profile shows how long data takes to pass through each layer of the network, which is handy for debugging and optimization.

Run the script:

```shell
$ python3 run.py
```

The test image is loaded onto the NCS and passed through Inception, and the recognition result is printed to the console: a probability distribution over the 1000 ImageNet classes plus one extra background class (1001 categories in total).
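run.py picks the five most probable classes with NumPy's argsort. The same selection in plain Python looks like this; the five-element distribution is made up for illustration and is not real network output.

```python
from typing import List, Sequence, Tuple


def top_k(probs: Sequence[float], k: int = 5) -> List[Tuple[int, float]]:
    """Return (class_index, probability) pairs for the k most probable classes."""
    # sorted() is stable even with reverse=True, so ties keep their original order
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in order[:k]]


if __name__ == "__main__":
    # Made-up distribution over 5 classes (not real network output)
    probs = [0.01, 0.90, 0.02, 0.05, 0.02]
    print(top_k(probs, k=3))  # [(1, 0.9), (3, 0.05), (2, 0.02)]
```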

Console output:

```
Number of categories: 1001
Start download to NCS...
*******************************************************************************
inception-v1 on NCS
*******************************************************************************
674 mouse, computer mouse 0.99512
663 modem 0.0037899
614 joystick 0.00031853
528 desktop computer 0.00021553
623 lens cap, lens cover 0.0001626
*******************************************************************************
Finished
```


Test picture: we uploaded this photo to the Movidius and ran it through Inception.

The network is about 99.5% confident that the picture shows a computer mouse (thanks to our hint :)); the modem comes second with roughly 0.4%, and so on. The network is right, so congratulations on your first neural network running successfully on this device!

Conclusion


To wrap up, here are the main advantages and disadvantages of the device.

First, the bad news:


Now the good news:


How Intel suggests using Movidius sticks to speed up computation

Of course, the device has alternatives.

One of them, and the most promising so far, is the Gyrfalcon Technology Laceli, which is claimed to offer 28 times the performance and 90 times the energy efficiency. The only obstacle to buying one is that it has not reached the market yet.

Another competitor, one that has long been on the market, is the NVIDIA Jetson TX2. The differences:


If there is interest, we will soon write another article about using the Jetson TX2 for neural networks. Thank you for your attention, and have a nice day!

P.S. Intel has announced a competition for optimizing neural networks for the Intel Movidius Neural Compute Stick. Registration is open until January 26, and the competition ends on March 15.

Source: https://habr.com/ru/post/346958/

