
Many articles have been written about augmented reality (AR); Habr even has an entire section devoted to the area. In short, there are two fundamental approaches to creating augmented reality: with a pre-prepared marker that needs to be printed, and without one. Both approaches use computer vision algorithms to recognize objects in a frame and augment them.
This article focuses on the recognition algorithms used to create augmented reality with a pre-prepared marker.
What can a marker be?
In theory, a marker can be any shape (object). In practice, however, we are limited by the resolution of the webcam (or phone camera), color-rendering quality, lighting, and the computing power of the hardware (everything happens in real time and therefore must be fast), so a black-and-white marker of simple shape is usually chosen. As a rule, it is a rectangle or square with an identifier inscribed inside it.
Article [1] describes the main types of markers, which can look like this:
ARToolKit (ATK) marker system.

Institut Graphische Datenverarbeitung (IGD) marker system.

Siemens Corporate Research (SCR) marker system.

Hoffman marker system (HOM), used by SCR and Framatome ANP.

The same article [1] compares the performance of different marker implementations and recognition algorithms.
Generalized Marker Recognition Algorithm
In short, the marker recognition algorithm looks like this:

a) Convert the image to grayscale.
b) Binarize the image (thresholding).
c) Detect closed regions.
d) Extract the contours.
e) Find the corners of the marker.
f) Transform the coordinates.
Let us consider in more detail what happens at each step with illustrations.
Color conversion to grayscale
Here are three algorithms for converting a color image to grayscale.
1. Lightness
GS = (max (R, G, B) + min (R, G, B)) / 2
2. Luminosity
GS = 0.21 × R + 0.72 × G + 0.07 × B
3. Average
GS = (R + G + B) / 3
Here is an example of how these three methods look. As a rule, luminosity is used.
Original Image:

Lightness:

Luminosity:

Average:

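The three formulas above can be sketched as plain per-pixel functions (a minimal illustration; the function names are mine, not from any library):

```python
def lightness(r, g, b):
    # GS = (max(R, G, B) + min(R, G, B)) / 2
    return (max(r, g, b) + min(r, g, b)) / 2

def luminosity(r, g, b):
    # Weighted sum: the eye is most sensitive to green
    return 0.21 * r + 0.72 * g + 0.07 * b

def average(r, g, b):
    # GS = (R + G + B) / 3
    return (r + g + b) / 3

# A pure red pixel lands at noticeably different gray levels:
print(lightness(255, 0, 0))   # 127.5
print(luminosity(255, 0, 0))  # ~53.55
print(average(255, 0, 0))     # 85.0
```

This is why the choice of conversion matters: the same pixel can end up on either side of a binarization threshold depending on the formula used.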
Image binarization. Threshold
It is clear that some threshold is used to convert an image into a two-color state; the question is how to choose and apply that threshold. The simplest way is to fix the threshold: for example, with 256 gray levels we can set it to 128. A slightly more sophisticated option is to choose the threshold from the color histogram. In general, all methods of converting an image into black-and-white (binary) form can be divided into six large groups, listed and classified in article [2]:
1) Methods based on histogram “shape.”
2) Methods based on clustering.
3) Methods based on the study of entropy.
4) Methods based on searching for similarities between the grayscale and the binary image.
5) Methods using correlation dependencies and features of the statistical distribution between pixels in the image areas.
6) Methods based on local adaptation of the threshold for each pixel of the image.
Methods based on local threshold adaptation are used most often. There are about forty binarization methods in all; comparing them would make for another full-fledged article, so instead here are examples of binarization with a fixed threshold, with an adaptive threshold, and with Otsu's method.
Fixed threshold:

Adaptive:

Otsu method:

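As a sketch, fixed thresholding and Otsu's method can be implemented in a few lines of NumPy (the function names are illustrative, not a library API):

```python
import numpy as np

def threshold_fixed(img, t=128):
    # Pixels at or above the threshold become white (255)
    return np.where(img >= t, 255, 0).astype(np.uint8)

def threshold_otsu(img):
    # Otsu's method: pick the threshold that maximizes the
    # between-class variance of the two pixel populations
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, 0.0
    w0, sum0 = 0.0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0                 # mean of the "dark" class
        m1 = (sum_all - sum0) / w1     # mean of the "bright" class
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Two clearly separated gray populations:
img = np.array([[10, 12, 11], [200, 210, 205]], dtype=np.uint8)
t = threshold_otsu(img)
# t falls between the two clusters, so binarization splits them cleanly
```

Adaptive thresholding works similarly, except the threshold is recomputed for each pixel from its local neighborhood rather than from the global histogram.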
Detecting closed regions
We need to find closed regions on a white background. As a rule, a combination of algorithms is used here; in the general case, flood-fill algorithms for the white areas plus selection of the closed regions. Similar problems have been solved elsewhere. Variations are possible at this point: the contours can be extracted first and then checked for closure, which may be faster.
Like this:

Or like this: contours are extracted first, then checked for closure:

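A minimal sketch of region detection via flood fill, assuming a binary image where 0 is black and 1 is white (pure Python, illustrative names):

```python
from collections import deque

def connected_components(grid):
    # Label 4-connected black (0) regions of a binary image
    # with a breadth-first flood fill
    h, w = len(grid), len(grid[0])
    labels = [[0] * w for _ in range(h)]
    current = 0
    for sy in range(h):
        for sx in range(w):
            if grid[sy][sx] == 0 and labels[sy][sx] == 0:
                current += 1
                q = deque([(sy, sx)])
                labels[sy][sx] = current
                while q:
                    y, x = q.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and grid[ny][nx] == 0 and labels[ny][nx] == 0:
                            labels[ny][nx] = current
                            q.append((ny, nx))
    return current, labels

# A white field (1) with one black region (0) inside:
img = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1],
]
n, labels = connected_components(img)
# n == 1: a single closed black region was found
```

A region whose label never touches the image border is closed, which is exactly the property a marker candidate must have.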
Contour extraction
There are several approaches to extracting contours from an image. The main ones I found are [3]:
1) Marr-Hildreth Edge Detector
2) Canny Edge Detector
3) Boolean function based Edge Detector
4) Euclidian distance and Vector Angle based Edge Detector
5) Depth Edge Detection using Multi-Flash Imaging
6) Sobel Edge Detector
Again, the material is quite voluminous; a comparison and description of the algorithms could fill more than one page. Canny and Sobel are used most often.
Contours are searched for inside the white closed regions. Applying these algorithms, we get the following images.
Canny (you need to play with the algorithm's parameters)

Sobel

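For illustration, here is a naive pure-NumPy Sobel edge detector (unoptimized; real code would use a library convolution such as OpenCV's):

```python
import numpy as np

def sobel_edges(img):
    # Convolve with the two 3x3 Sobel kernels and combine the
    # horizontal and vertical gradients into an edge magnitude map
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h, w))
    padded = np.pad(img.astype(float), 1, mode="edge")
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 3, x:x + 3]
            gx = (win * kx).sum()
            gy = (win * ky).sum()
            out[y, x] = np.hypot(gx, gy)
    return out

# A vertical step edge produces strong responses along the boundary
# and zero response in the flat areas:
img = np.zeros((5, 6), dtype=np.uint8)
img[:, 3:] = 255
edges = sobel_edges(img)
```

Canny builds on the same gradients but adds non-maximum suppression and hysteresis thresholding, which is why it needs more parameter tuning.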
Finding the marker corners
Having extracted a contour, we need to compare it with our marker. Many different contours can be found in an image, and we need one that looks like a quadrilateral.
This problem can be solved with the Ramer-Douglas-Peucker algorithm (also known as the Douglas-Peucker algorithm, the iterative end-point fit algorithm, or the split-and-merge algorithm), which reduces the number of points in a curve approximated by a larger series of points.
For example, OpenCV has the approxPolyDP function, which already does this. If its result is post-processed a little, we get a quite suitable result.

Thus we have obtained the coordinates of the marker's corners.
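The corner-finding step can be sketched with a bare-bones Ramer-Douglas-Peucker implementation; OpenCV's approxPolyDP plays the same role (all names below are illustrative):

```python
import math

def rdp(points, epsilon):
    # Ramer-Douglas-Peucker: keep the endpoints, find the point
    # farthest from the chord, and recurse if it exceeds epsilon
    if len(points) < 3:
        return points
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        px, py = points[i]
        # Perpendicular distance from (px, py) to the chord
        d = abs(dy * (px - x1) - dx * (py - y1)) / norm
        if d > dmax:
            dmax, idx = d, i
    if dmax > epsilon:
        left = rdp(points[:idx + 1], epsilon)
        right = rdp(points[idx:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]

# A noisy square-ish polyline collapses to its corner points:
contour = [(0, 0), (5, 0.2), (10, 0), (10.1, 5), (10, 10), (5, 9.8), (0, 10)]
corners = rdp(contour, epsilon=1.0)
```

If the simplified polygon has four vertices and is convex, it is a good quadrilateral candidate for a marker.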
Coordinate Transformation
Now we have the coordinates of the marker's corners, which ideally meet at right angles but in reality are viewed at some other angle. Moreover, both ideally and in reality, the sides of the square define the coordinate axes. Thus we can determine the position of the camera relative to our object, as well as the origin of coordinates.
The basis of the method for determining the coordinates can be shown graphically as follows [4]:

That is, the idea is that as the angle from which the camera looks changes, the shape and size of the projection change. For example:

The drawings and basic information are taken from [4], which also describes the mathematics.
Knowing the position of the camera and the reference point, we can draw the projection for our 3D model, in this case a cube.
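One building block of this coordinate transformation is the homography that maps the marker's corners onto their positions in the image. Here is a minimal Direct Linear Transform (DLT) sketch, with illustrative names:

```python
import numpy as np

def find_homography(src, dst):
    # DLT: stack two linear equations per point correspondence and
    # take the null vector of A as the 3x3 homography
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]

def project(H, pt):
    # Apply the homography in homogeneous coordinates
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# A unit-square marker seen under a perspective tilt:
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 10), (90, 20), (80, 85), (15, 95)]
H = find_homography(src, dst)
# The marker center maps to a point inside the image quadrilateral
cx, cy = project(H, (0.5, 0.5))
```

From such a homography (plus the camera's intrinsic parameters) the camera's rotation and translation relative to the marker can be decomposed, which is what [4] describes in detail.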
As you can see, if we use a square with a clean field as a marker, it is symmetric, and we can recognize the rotation only partially. Sometimes this is enough. If not, we can add an additional mark inside the square, so that it looks like this:

and then obtain the final rotation angle using the Hough transform.
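For reference, a minimal Hough transform for straight lines might look like this (a sketch assuming the edge points have already been extracted; not production code):

```python
import numpy as np

def hough_lines(edge_points, shape, n_theta=180):
    # Each edge point votes in (rho, theta) space;
    # a straight line shows up as a peak in the accumulator
    h, w = shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=int)
    for y, x in edge_points:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1
    return acc, diag

# Edge points along the horizontal line y = 3:
pts = [(3, x) for x in range(10)]
acc, diag = hough_lines(pts, (10, 10))
# All ten points vote for the cell rho = 3, theta = 90 degrees
```

The theta of the strongest peak gives the orientation of the marker's internal line, which is exactly the rotation angle we were missing.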
To build augmented reality, there are ready-made libraries that let you avoid walking this whole path again, for example ARToolKit and related projects.
This article was conceived as the first part of a tutorial series on augmented reality. In the second part, if it arouses interest among Habr users, I am going to cover creating augmented reality applications for Android. After that, we can try to create augmented reality without a specially prepared marker.
Continued: How to use the webcam in the Android emulator

References
1. Xiang Zhang, Stephan Fronz, Nassir Navab. Visual Marker Detection and Decoding in AR Systems: A Comparative Study.
2. Mehmet Sezgin. Survey over image thresholding techniques and quantitative performance evaluation.
3. Ehsan Nadernejad. Edge Detection Techniques: Evaluations and Comparisons.
4. Hirokazu Kato, Mark Billinghurst. Marker Tracking and HMD Calibration for a Video-Based Augmented Reality Conferencing System.