Who is this article for?
For people who are interested in computer vision and augmented reality on mobile devices, but do not know where to start.
Foreword
So, we are students of the Faculty of Mathematics and Mechanics of St. Petersburg State University who decided, in our spare time, to get acquainted with the basic aspects of computer vision. To consolidate the theory, we decided to do something practical. Attending our extremely interesting lectures prompted the idea of an application that lets you throw bombs at people in augmented reality.
Android was chosen as the mobile platform, since we already had a little experience writing applications for it, and we know Java much better than Objective-C. For image processing, we decided to use the well-known OpenCV library.
Under the cut is the story of how we created our simple application.
What augmented reality applications are already on Google Play?
Conventionally, they can be divided into three categories:
- Apps that read data from the accelerometer, compass, and GPS and, relying only on those, overlay something on the camera image (example)
- Apps that use markers of some kind, which have to be printed out or laid out in the scene (example)
- Apps that do not receive or use any data at all and simply overlay a picture (example)
And that, by and large, is everything you can find on Google Play.
Where to begin?
The easiest way to get started with computer vision is with the OpenCV library. We recommend the O'Reilly book "Learning OpenCV" (it can be found on rutracker). You will also frequently need to consult the developers' wiki.
Problem statement
So what did we actually have to do? First of all, we needed to pick out objects in the frame that a bomb could explode against. We settled on the following approach: find moving objects and check for collisions with them. This can be done fairly quickly, which is very important, since the resources of mobile devices are limited.
Implementation
What did we come up with? Take a set of key points in one frame and find where they end up in the next. Then compute their shifts and group the points into clusters, which become our objects. The background can be taken to be the cluster with the largest area (or the one with the most points, but we abandoned that variant in the end).
To determine the shift of the key points, we decided to use the Lucas-Kanade method. Its input is two images and an array of points in the first one; its output is an array of the same points located in the second image. Exactly what we need, and it runs fast enough.
The goodFeaturesToTrack(...) function is well suited for finding the key points themselves. As the name suggests, it looks for features (key points that stand out from the rest by a certain criterion) that can easily be traced from frame to frame. It is slower than a naive feature search, but the computed shifts come out more accurate.
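To make this concrete, here is a minimal C++ sketch of the tracking step, assuming single-channel grayscale frames; the function name and parameter values are ours for illustration, not the exact ones from our app:

```cpp
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Track key points from the previous frame to the current one.
// status[i] == 1 means point i was found in the second image;
// currPts[i] - prevPts[i] is then its shift between the frames.
std::vector<cv::Point2f> trackPoints(const cv::Mat& prevGray,
                                     const cv::Mat& currGray,
                                     std::vector<cv::Point2f>& prevPts,
                                     std::vector<uchar>& status)
{
    // Pick up to 100 corners that are easy to follow from frame to frame.
    cv::goodFeaturesToTrack(prevGray, prevPts,
                            100,    // maximum number of corners
                            0.01,   // quality level relative to the best corner
                            10.0);  // minimum distance between corners, px

    std::vector<cv::Point2f> currPts;
    std::vector<float> err;
    // Pyramidal Lucas-Kanade: find where each point moved in the new frame.
    cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts,
                             status, err);
    return currPts;
}
```

The per-point shifts are then what we cluster to separate objects from the background.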
Problems and Tips
- It is impossible to determine the camera shift with good accuracy.
Initially, we planned the following approach: crop the two frames to their common part and subtract one from the other. In theory, the non-zero regions would show where the moving objects are. In practice, because of the inaccuracy of the estimated shift and of brightness fluctuations, there were almost no zero regions even with a slight camera movement (see the first sketch after this list). This has to be taken into account when designing the application.
- Everything needs to be done in native code
The first version was written in pure Java. But even though the library itself is native, it ran very slowly because of the many unnecessary transfers of large amounts of data and the frequent garbage collector runs. After moving all the computations into native code, performance improved several times over (see the JNI sketch after this list).
- Debugging
Debugging on the device takes quite a lot of time, and the emulator is not an option for our purposes (far too slow). It is therefore convenient to keep a PC version of the application in C++, from which a particular piece of code can simply be copied into the native part of the Android app.
- Multithreading
Processing frames takes a long time, and the user is unlikely to enjoy watching the picture stutter. So we did the completely natural thing: split displaying the frames and processing them into two threads (a sketch of the hand-off follows this list).
- Documentation
In the Android version of OpenCV, we were struck by the lack of intelligible documentation. For the most part it matches the PC version, but some points still differ.
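To illustrate the first tip, here is a minimal sketch of the frame-differencing idea we abandoned, assuming the frames are grayscale and already cropped to their common part; the threshold value is arbitrary:

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Naive motion mask by frame differencing.
cv::Mat motionMask(const cv::Mat& prevGray, const cv::Mat& currGray)
{
    cv::Mat diff, mask;
    cv::absdiff(prevGray, currGray, diff);                  // per-pixel |a - b|
    cv::threshold(diff, mask, 25, 255, cv::THRESH_BINARY);  // suppress small noise
    // In theory only moving objects survive the threshold; in practice,
    // alignment errors and brightness flicker light up most of the mask.
    return mask;
}
```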
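For the native-code tip, this is roughly what the bridge looks like; the package and class names below are hypothetical, but Mat.getNativeObjAddr() is the standard OpenCV4Android way to hand a frame to C++ without copying the pixels across JNI:

```cpp
#include <jni.h>
#include <opencv2/core/core.hpp>

// Java side (hypothetical):
//   public static native void processFrame(long matAddrGray);
// called as NativeBridge.processFrame(gray.getNativeObjAddr());
extern "C"
JNIEXPORT void JNICALL
Java_com_example_ar_NativeBridge_processFrame(JNIEnv*, jclass, jlong matAddrGray)
{
    // The jlong is the address of a cv::Mat living on the Java side.
    cv::Mat& gray = *reinterpret_cast<cv::Mat*>(matAddrGray);
    // ... run the feature tracking and clustering on `gray` in pure C++ ...
}
```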
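And for the multithreading tip, a minimal sketch of the hand-off between the two threads, written in C++ like the other examples (on Android the display side actually lives in Java). A single-slot "mailbox" is enough, since stale frames can simply be overwritten:

```cpp
#include <condition_variable>
#include <mutex>
#include <opencv2/core/core.hpp>

// The camera/UI thread publishes frames; the processing thread always
// picks up the newest one, so display never waits for processing.
struct FrameMailbox {
    std::mutex m;
    std::condition_variable cond;
    cv::Mat latest;
    bool fresh = false;

    // Camera thread: publish the newest frame and keep going.
    void put(const cv::Mat& frame) {
        {
            std::lock_guard<std::mutex> lock(m);
            frame.copyTo(latest);
            fresh = true;
        }
        cond.notify_one();
    }

    // Processing thread: block until a fresh frame appears.
    cv::Mat take() {
        std::unique_lock<std::mutex> lock(m);
        cond.wait(lock, [this] { return fresh; });
        fresh = false;
        return latest.clone();
    }
};
```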
Conclusion
Image processing requires serious computing power. Even so, our application works well even on devices comparable in performance to the Acer Liquid (768 MHz). On the majority of devices being produced now, it is quite realistic to do something more complex.
Besides, as you can see, there is nothing terribly complicated about computer vision, and at the moment there are almost no augmented reality apps for Android, so go for it.