
Author: Igor Litvinenko, Senior Mobile Developer.
Everyone has probably heard of VR headsets, which create the effect of presence in a virtual world. Today, however, I would like to talk not about virtual reality but about augmented reality. It is important to distinguish the two. In a VR headset the entire image is generated, so that reality is completely artificial. Augmented reality, by contrast, does not create an entirely artificial world; it overlays virtual objects and data onto a video stream of the real world. The result is a combination of the virtual and the real.
Core Augmented Reality Technologies
How is augmented reality created? To augment a real object, you first need to detect it in the video stream. This is the most important step: once the object has been detected, drawing something on top of it or supplementing it in some other way is not difficult. There are different ways to detect the necessary objects; most of them rely on augmented reality markers. The following list shows, in roughly evolutionary order, the main ways to detect the objects to be augmented:
- The simplest marker or image.
- Markerless augmented reality.
- Simple 3D markers.
- The combination of markers for rendering complex objects (cylinder, cube, parallelepiped).
- Frame Marker.
- Based on location.
- Real augmented reality.
Marker Picture
The simplest marker of augmented reality can be easily recognized by the thick black frame. Such an object is very easy to detect in a video stream:

- Requirements:
- Black frame, the width of which is not less than 10%.
- Only black and white.
- Invariant to rotations - at any time we can specify the exact angle of rotation.
- Always square.
- Algorithms. They are the simplest here: after binarizing the image with a threshold, we can find the edges in the image (edge detection), then a square and a white square in the middle. This gets us to the content inside, which we can cut out and work with as a separate picture; that is much easier than scanning the entire frame.
- Corner detection.
- Blob detection.
- Edge detection.
- Thresholding.
- Benefit:
- The simplest detection algorithm: many open-source and proprietary libraries can detect such a marker.
- The most stable - the marker is always detected very accurately and there is practically no effect of model shake.
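The thresholding and square check described above can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions, not how any particular library works: the image is a list of rows of grayscale values (0-255), the marker is axis-aligned and is the only dark object in the frame, and the names `binarize`, `find_square_marker` and `min_frame` are made up for this example.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 1 for dark and 0 for light."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def find_square_marker(gray, threshold=128, min_frame=0.10):
    """Return (top, left, size) of a black-framed square marker, or None.

    Assumes the marker is axis-aligned and is the only dark object in view.
    """
    binary = binarize(gray, threshold)
    dark = [(r, c) for r, row in enumerate(binary) for c, v in enumerate(row) if v]
    if not dark:
        return None
    top = min(r for r, _ in dark)
    bottom = max(r for r, _ in dark)
    left = min(c for _, c in dark)
    right = max(c for _, c in dark)
    height, width = bottom - top + 1, right - left + 1
    if height != width:  # the marker must be square
        return None
    size = height
    # Measure the frame thickness: walk inward along the middle row
    # until the first light pixel is reached.
    mid = top + size // 2
    thickness = 0
    while thickness < size and binary[mid][left + thickness]:
        thickness += 1
    # Reject a solid black square (no interior) and frames that are too thin.
    if thickness == size or thickness / size < min_frame:
        return None
    return top, left, size
```

A real detector also has to handle perspective distortion and rotation, which is where corner and edge detection come in; the sketch only shows why a thick black frame makes the first step cheap.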
Markerless
Despite the name, the markerless approach does, in fact, still use a marker. It simply looks like an ordinary picture rather than a marker.

- Requirements:
- A large number of small details.
- The more colors, the better.
- Invariant to rotations.
- The ideal aspect ratio is 1:1, i.e., the closer to a square, the better. At a ratio of 1:2 or more, the marker is recognized very poorly.
- Algorithms. To detect such markers, characteristic (interest) points are used: for example, a point where the gradient changes, that is, a point where a clear border is visible. An algorithm for finding geometric points can also be used: some places in the picture contain distinct features that form angles; for example, the crossing point of two lines can be a geometric point.
- Interest point detection.
- Fiducial markers.
- Edge detection - detection of image borders for searching characteristic points.
- Simultaneous localization and mapping (SLAM) - finds the characteristic points and builds a map from them.
- Benefit:
- It is easier to fit into the real world: there is no need for a thick black frame. The user may not even notice that the picture in front of them is an augmented reality marker. The markerless approach is convenient, for example, in the printing industry: it can be a picture in a book that comes to life when the camera is pointed at it.
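To make the idea of "a point where the gradient changes" more concrete, here is a deliberately crude sketch. It is not one of the production detectors used by AR libraries; it simply marks pixels where brightness changes sharply both horizontally and vertically, which on a clean image corresponds to corners. The function name `interest_points` and the `threshold` value are assumptions made for this example.

```python
def interest_points(gray, threshold=100):
    """Return (row, col) of pixels where brightness changes sharply
    in both the horizontal and the vertical direction."""
    points = []
    for r in range(1, len(gray) - 1):
        for c in range(1, len(gray[0]) - 1):
            gx = gray[r][c + 1] - gray[r][c - 1]  # horizontal central difference
            gy = gray[r + 1][c] - gray[r - 1][c]  # vertical central difference
            if abs(gx) >= threshold and abs(gy) >= threshold:
                points.append((r, c))
    return points
```

On a white image with a single black rectangle this returns only the four corner pixels, while pixels along straight edges are skipped because one of the two differences is zero. This also hints at why a picture with many small details works better as a marker: it simply yields more such points.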
Marker combination
This technology allows us to take into account the shape of simple three-dimensional objects: a cube, a cylinder, etc. Here we can create a configuration that helps to understand what kind of object is in front of us. An object of a certain shape with a certain color combination can serve as a marker; for example, we made an application that identifies a drug by its packaging and label. We also made an application that recognizes wine brands: the library could find labels from different angles, which does not work with the markerless or the simplest marker technology because of the nonlinear transformation of the marker.
Frame marker
Let's say you are holding a conference. You have a logo that you hang on the walls to show people where to go. There is only one logo, so all the images are the same, yet you need to identify each picture uniquely. How can you do it? With a frame marker. When a frame marker is used, the ID of the image is encoded in the frame:

- Requirements:
- Unified frame.
- Invariant to rotations.
- Always square.
- The internal image should be in contrast with the frame.
- Small size (3 - 10 cm).
- Algorithms:
- Benefit:
- The ability to uniquely identify the same marker.
Location Based Augmented Reality
If you walk around a city and get information about the buildings you see, reality is most likely being augmented based on your location.

In this case, there is no image recognition task. This technology is based on the use of the GPS receiver, compass and accelerometer present in the mobile device. Thanks to them, we know in which direction we are looking. Thus, to supplement reality, you just need to correctly respond to the readings of the sensors of the mobile device. This task is not so difficult - there are enough libraries that cope well with it.
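As a rough illustration of "responding to the sensor readings", the sketch below computes the compass bearing from the device to a point of interest and decides where, if anywhere, it falls in the camera's horizontal field of view. The bearing formula is the standard initial great-circle bearing; the function names and the 60-degree field of view are assumptions for this example, and a real application would also use the accelerometer for the vertical tilt.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2,
    in degrees clockwise from north (0 <= result < 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360.0


def screen_offset(heading_deg, target_bearing_deg, fov_deg=60.0):
    """Horizontal position of the target on screen in [-1, 1]
    (-1 = left edge, 1 = right edge), or None if it is out of view."""
    # Signed angle between where the camera looks and where the target is.
    diff = (target_bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    half = fov_deg / 2.0
    if abs(diff) > half:
        return None
    return diff / half
```

Here `heading_deg` would come from the compass and the coordinates from the GPS receiver; the returned offset tells the renderer where to draw the label for the building.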
Real Augmented Reality
There are no markers here. We determine, on the fly, the 3D shapes and characteristics of any objects that fall into the camera lens. To turn the 2D image into 3D, we need to know the depth of each object. For this you can use, for example, the above-mentioned SLAM algorithm, which searches for characteristic points on surrounding objects. So far, all of this works very slowly on mobile devices. At the moment, Sony is actively introducing this kind of augmented reality in conjunction with the PlayStation.
Keyshare - keyshare.org
And now I’ll tell you how we at DataArt wrote our own augmented reality engine and why we did it.
One Swiss startup decided to offer a new system to increase sales, built on the use of augmented reality technology, and we developed this system for it. Here is how it works.
We have a patented marker of augmented reality in the form of a key image that can be placed, for example, in a magazine next to a description of a product. White dots of different sizes inside this key allow you to uniquely identify the content. There is a server - it accepts the code read from the key and returns to the user a variety of data about the product, its 3D model, etc.

To develop such a key, we tried all the most popular libraries, but none of them could handle a marker that had to work with any combination of dots. Marker-based augmented reality relies on key points, and because our marker is black and white, its key points are concentrated exactly in the places that change from key to key. In the end, we decided to write everything from scratch.
We used the MSER search algorithm, which simply finds regions. After all, we know for sure that there is a black key and that there is a white cross inside this key. Therefore, we first find a large black area, and then a white area inside it. Next we crop the picture and check the aspect ratio: it should be 2:1. Then we analyze the shape. Using the cross as a reference, we can find where the encoded sequence begins. As for the dots, they are always in the same places, so finding them is not difficult either. As a result, we got an algorithm that searches for a marker by its shape. This is, of course, not a universal solution, but it solves our task very well.
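The region search can be illustrated with a simplified stand-in. The sketch below is not MSER and not our production code: it replaces MSER with plain connected-component labeling on an already binarized image (1 = dark, 0 = light), and it assumes that the 2:1 ratio means width to height. The names `components`, `bbox` and `find_key` and the `tolerance` parameter are made up for this example.

```python
from collections import deque


def components(binary, value):
    """Yield 4-connected regions of cells equal to `value` as lists of (row, col)."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and binary[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield region


def bbox(region):
    """Bounding box of a region as (top, left, bottom, right), inclusive."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    return min(ys), min(xs), max(ys), max(xs)


def find_key(binary, ratio=2.0, tolerance=0.2):
    """Return the bounding box of the largest dark region if it contains a
    light region strictly inside it and is roughly `ratio` times wider
    than it is tall; otherwise return None."""
    dark = max(components(binary, 1), key=len, default=None)
    if dark is None:
        return None
    top, left, bottom, right = bbox(dark)
    # Step 2: look for a light region lying strictly inside the dark one.
    has_inner_light = any(
        t > top and l > left and b < bottom and r < right
        for t, l, b, r in map(bbox, components(binary, 0))
    )
    if not has_inner_light:
        return None
    # Step 3: check the aspect ratio of the cropped area.
    width, height = right - left + 1, bottom - top + 1
    if abs(width / height - ratio) > tolerance:
        return None
    return top, left, bottom, right
```

The appeal of this kind of search is that it never needs key points at all: it relies only on the parts of the key that stay the same, namely the black body and the white cross.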
As a result, on the iPhone 5S we got a performance of more than 25 FPS. Achieving this was quite difficult. First, as with any such algorithm, we downscaled the picture: the recognition algorithm works much better on a smaller, lower-quality picture. Then we implemented a prediction algorithm: once the key has been found, we assume that it cannot move within the frame by more than a certain number of pixels, so we crop the picture accordingly. After that, we analyze the dynamics: if the user turns the phone to the left, the key will move to the right. This is a probabilistic algorithm: if we don’t find what we are looking for right away, we start processing a larger area. We also have an excellent model rendering algorithm, which was written from scratch.
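The prediction step can be sketched roughly as follows. This is an illustration of the idea rather than the actual engine code: `detect` is a hypothetical callback that scans a given window and returns the key position or `None`, and `max_shift`, `growth` and `velocity` (the estimated movement of the key between frames, in pixels) are names and values assumed for this example.

```python
def search_window(last_pos, velocity, frame_size, max_shift):
    """Return the (left, top, right, bottom) region to scan, centered on the
    predicted key position and clamped to the frame."""
    (x, y), (vx, vy) = last_pos, velocity
    width, height = frame_size
    cx, cy = x + vx, y + vy  # predicted position in the next frame
    left = max(0, cx - max_shift)
    top = max(0, cy - max_shift)
    right = min(width, cx + max_shift)
    bottom = min(height, cy + max_shift)
    return left, top, right, bottom


def track(detect, last_pos, velocity, frame_size, max_shift=20, growth=2):
    """Scan progressively larger windows until `detect` finds the key
    or the whole frame has been scanned."""
    width, height = frame_size
    shift = max_shift
    while True:
        window = search_window(last_pos, velocity, frame_size, shift)
        found = detect(window)
        if found is not None:
            return found
        if window == (0, 0, width, height):
            return None  # the whole frame was scanned without success
        shift *= growth
```

In the common case the key is found in the first small window, so only a fraction of the frame is processed; the full-frame scan is a fallback for sudden movements.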
What else do we have? There are three rows on the key, with 13 points in each. This means that 469 combinations are possible. Since the picture is already somewhat blurred at a distance of more than a meter, we made a probabilistic decoding algorithm with error correction. We use it in conjunction with a self-correcting key. This way we can reliably detect four erroneous symbols, which is enough. We also have an optimized detection algorithm and an algorithm for tracking and predicting the next position.
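One common way to decode blurred points with error correction is to compare the reading against every valid code and pick the closest one. The sketch below shows that general idea only; it is not the decoding scheme actually used in the key. The codebook, the `max_cost` limit and the representation of each point as a probability of being "on" are all assumptions for this example.

```python
def decode(probabilities, codebook, max_cost=4.0):
    """Return the ID of the codeword closest to the reading, or None.

    probabilities: one value per point, the estimated chance (0.0-1.0)
                   that the point is "on".
    codebook:      hypothetical mapping of ID -> tuple of 0/1 bits.
    max_cost:      reject the reading if even the best match is this far off.
    """
    best_id, best_cost = None, float("inf")
    for key_id, bits in codebook.items():
        # Each point contributes how strongly the reading disagrees with the bit.
        cost = sum(p if bit == 0 else 1.0 - p
                   for p, bit in zip(probabilities, bits))
        if cost < best_cost:
            best_id, best_cost = key_id, cost
    return best_id if best_cost <= max_cost else None
```

With 3 rows of 13 points, each reading would contain 39 values. Blurry points with values near 0.5 contribute little evidence either way, while confidently misread points count as full errors.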
Although such a key is somewhat similar to a QR code, there are fundamental differences. You cannot attach augmented reality to a QR code, because its content is constantly changing; in other words, it cannot serve as a marker. You cannot put a 3D model on it, and you cannot determine its angle of rotation. In addition, our key is very easy to recognize.
Football Clubs Recognizer
We also developed an application that helps users follow their favorite football clubs. It augments the image of a football club's logo with virtual content: when you point the camera at the logo, the application shows data about the club.