Restoration of three-dimensional models by the active parallax method

Hello, dear readers.

I am a student at Bauman Moscow State Technical University (BMSTU), and I would like to share my experience in image processing and in the reconstruction of three-dimensional objects by the active parallax method.

At present, three-dimensional modeling and prototyping of real-world objects are actively used in many fields, such as manufacturing, medicine, computer graphics, robotics and machine vision. As a result, the development of 3D scanners and cameras that build a 3D model of the registered object is becoming increasingly relevant.

A bit of history

In 1999, 3DV Systems, a world leader in three-dimensional video imaging, developed the ZCam video camera, which used a unique technology for measuring distances to objects in real time. This technology made it possible to capture and process a three-dimensional image while observing the object from only one side. In 2009, Microsoft bought the assets of 3DV Systems, and a controller for the Xbox game console was developed on the basis of ZCam. In 2010, Microsoft announced the much-loved Kinect, a game controller that lets you control the game with your own body. The Artec Group company produces 3D scanners that digitize the shape of an object in real time. Such scanners can be used in medicine, in car manufacturing and tuning, and in creating special effects for films and video games.

Fig.1. An example of using algorithms in video games

0. Parallax method of registering 3d objects.

Systems for registering three-dimensional objects can be built on various principles, one of which is the stereoscopic principle. A stereoscopic system consists of two cameras that record an object from different, but not too different, angles. Corresponding points are identified in the resulting images (stereo matching). Then, knowing the internal parameters of the stereo pair cameras and their relative position, one can determine the three-dimensional coordinates of the object points by triangulation [1]. Despite the successes of recent years, a number of issues remain unsolved. They stem from the fundamental limitations of this method, in particular the difficulty of matching points on objects that have no pronounced texture or have large untextured (homogeneous) areas [2].
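As a small illustration of triangulation, here is a sketch for the simplest special case: an ideal rectified stereo pair, where depth follows from Z = f * B / d. The rectified geometry, the function name and the numbers are assumptions made for this example. The general case described in [1] works with full camera matrices.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by an ideal rectified stereo pair.

    focal_px     - focal length in pixels (assumed equal for both cameras)
    baseline_m   - distance between the camera centers in meters
    disparity_px - horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Hypothetical values: f = 800 px, B = 0.1 m, d = 20 px
    print(depth_from_disparity(800.0, 0.1, 20.0))
```

The smaller the disparity, the farther the point. If no reliable match is found (for example on a homogeneous surface), there is no disparity and therefore no depth for that point.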

Fig.2. Stereo pair example

To overcome these disadvantages of the stereoscopic method, one can replace one of the stereo pair cameras with a projector and obtain a device for registering three-dimensional objects based on the active parallax principle. A system built on this principle is shown in Fig. 3: a known pattern (structured illumination) is projected onto the object, and its distortions caused by the shape of the object are recorded by the camera [3, 4].

Fig.3. Schematic diagram of the system, built on the active parallax method

1. Active parallax method for registering three-dimensional objects

Many different patterns have been developed for structured illumination systems, including both sequences of changing patterns (time-multiplexed patterns) and fixed patterns that use various color coding schemes [3, 4]. Temporal coding uses a sequence of black-and-white patterns, as shown in Fig. 4, a. The idea of this method is to encode the position of a pixel on the projector's matrix by its set of intensities across the sequence of projected patterns. The set of patterns shown in Fig. 4, a uses "bit" encoding: the set of two-tone (black and white) patterns forms a binary code that defines the "number" of a pixel in a row. Besides "bit" coding, other binary coding methods are also used (shifting a binary pattern, the Gray code, and others [3, 4]). This method is insensitive to the color of the surface and makes it possible to encode every pixel on the projector's matrix, but because of the large number of patterns used it requires the object to remain static.
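The temporal coding described above can be sketched in a few lines of Python with NumPy. This is only an illustration: the function names and the pattern layout (one row of 0/1 values per projected pattern, most significant bit first) are assumptions of this example, not the implementation used in the cited works.

```python
import numpy as np


def gray_to_binary(g):
    """Convert a Gray-coded integer back to its plain binary value."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def make_patterns(width, n_bits, gray=True):
    """Return an (n_bits, width) array of 0/1 stripe patterns, MSB first.

    Column x of the projector is encoded by the bits of x ("bit" coding)
    or by the bits of its Gray code x ^ (x >> 1).
    """
    cols = np.arange(width)
    codes = cols ^ (cols >> 1) if gray else cols
    shifts = np.arange(n_bits - 1, -1, -1)[:, None]
    return (codes[None, :] >> shifts) & 1


def decode_patterns(bits, gray=True):
    """Recover the projector column from the observed stack of bits."""
    n_bits = bits.shape[0]
    weights = 1 << np.arange(n_bits - 1, -1, -1)
    codes = (bits * weights[:, None]).sum(axis=0)
    if gray:
        codes = np.array([gray_to_binary(int(c)) for c in codes])
    return codes


if __name__ == "__main__":
    patterns = make_patterns(width=16, n_bits=4)
    print(patterns)
    print(decode_patterns(patterns))
```

With the Gray code, neighboring columns differ in exactly one pattern, so a single misread bit at a stripe boundary shifts the decoded column by at most one position.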

Fig. 4. Patterns used to create structured illumination: a - with temporal coding, b - with color coding

Color coding, an example of which is shown in Fig. 4, b, uses only one pattern. The position of each pixel is uniquely encoded by the color of that pixel and of several of its “neighbors”. When designing a color-coded pattern, one usually aims for the smallest neighborhood (number of “neighbors”) needed for unambiguous decoding and for the smallest number of distinct colors (to make each color easier to identify reliably). M-sequences and de Bruijn sequences have these properties [3, 4]. The advantage of this method is the ability to reconstruct the shape of an object from a single image and, as a result, to register moving objects. Its disadvantage is that decoding the color pattern is sensitive to the structure of the recorded surface and to its color.
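To make the idea of a de Bruijn sequence more concrete, here is a sketch of the well-known recursive construction based on Lyndon words. The mapping of symbols 0..5 to the letters R, G, B, C, M, Y and the cut to 128 stripes are assumptions made for illustration. Note also that a plain de Bruijn sequence contains runs of equal neighboring colors, which a practical pattern may want to avoid.

```python
def de_bruijn(k, n):
    """Return a de Bruijn sequence B(k, n) as a list of ints in 0..k-1.

    Every word of length n over k symbols appears exactly once
    when the sequence is read cyclically.
    """
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return seq


if __name__ == "__main__":
    COLORS = "RGBCMY"          # hypothetical symbol-to-color mapping
    seq = de_bruijn(6, 3)      # 6 colors, windows of 3 stripes
    stripes = "".join(COLORS[s] for s in seq[:128])
    print(len(seq))
    print(stripes)
```

Any contiguous part of such a sequence keeps the key property: each window of n neighboring symbols occurs at most once, so a window identifies its position.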

2. Algorithms for processing registered images

The illumination pattern used, shown in Fig. 5, consists of 128 narrow vertical stripes of six colors (three primary: R, G, B, and three secondary: C, M, Y) separated by black gaps. The color sequence was obtained with an M-sequence generator, so that every combination of 3 adjacent stripes occurs only once.
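Because every triplet of adjacent stripes is unique, decoding can be reduced to a table lookup. The sketch below shows this on a short hypothetical sequence of 12 stripes. It is not the 128-stripe sequence from Fig. 5, and the function names are made up for the example.

```python
def build_triplet_table(stripes):
    """Map each triplet of adjacent stripe colors to the index of its first stripe."""
    table = {}
    for i in range(len(stripes) - 2):
        key = tuple(stripes[i:i + 3])
        if key in table:
            raise ValueError(f"triplet {key} is not unique in the sequence")
        table[key] = i
    return table


def decode_stripe(table, left, center, right):
    """Return the index of the center stripe in the projected sequence, or None."""
    first = table.get((left, center, right))
    return None if first is None else first + 1


if __name__ == "__main__":
    stripes = "RGBCMYRBGYCM"   # hypothetical short sequence with unique triplets
    table = build_triplet_table(stripes)
    print(decode_stripe(table, "B", "C", "M"))
    print(decode_stripe(table, "R", "R", "R"))
```

A triplet that is not in the table (for example because one stripe was misclassified or hidden by the object's shape) returns None, which gives a natural way to reject unreliable points.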

Fig. 5. Used color sequence

The image processing algorithm solves two main tasks:
1. Detection of stripes in the image and determining the position of the center line for each strip (selection of stripes);
2. Determination of color for each selected part of the strip (classification by color).

3. The selection of the centers of the bands

Stripe detection algorithms for color-coded illumination patterns can be divided into two types: edge detection and peak detection. The illumination pattern used here contains black gaps and therefore calls for the second type. To detect the stripes in the image, one can use a direct search for local maxima, zero crossings of the second derivative (the Laplacian of Gaussian (LoG) detector), or the Canny method [6]. The registered image contains three color channels (R, G, B), so applying these methods requires either converting the image to a single channel and processing that channel, or combining the stripe detection results from several channels. After the initial detection of the stripe centers, the coordinates of the maxima are usually refined to subpixel accuracy either by parabola interpolation [7, 8] or by computing the center of gravity of normalized values in the neighborhood.
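Subpixel refinement by parabola interpolation can be sketched as follows: a parabola is fitted through the maximum and its two neighbors, and the position of its vertex is taken as the refined coordinate. The function name is hypothetical, and the three-point fit is the simplest common variant, assumed here for illustration.

```python
def refine_peak_parabola(values, i):
    """Sub-pixel position of a local maximum located at integer index i.

    Fits a parabola through (i-1, i, i+1) and returns the x coordinate
    of its vertex. The index i must have a neighbor on each side.
    """
    y0 = float(values[i - 1])
    y1 = float(values[i])
    y2 = float(values[i + 1])
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        # The three samples lie on a straight line: nothing to refine
        return float(i)
    return i + 0.5 * (y0 - y2) / denom


if __name__ == "__main__":
    # Samples of a hypothetical intensity profile along an image row
    row = [10, 40, 200, 180, 30]
    print(refine_peak_parabola(row, 2))
```

For a true local maximum the correction always lies within half a pixel of the integer position.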

Various papers have proposed the following approaches to this problem: in [9], the Canny method was applied to the luma component (Y) after conversion to the YCbCr color space; in [7], a direct search for local maxima along scan lines was performed in the three color channels R, G, B, with the results merged at the subpixel refinement stage; in [9], the stripe centers were detected using the second derivative of the value component (V) after conversion to the HSV color space.

Figure 6 shows an image recorded in a dark room. In [5], implementations of the algorithms for finding maxima and determining their colors were considered; they work on images recorded without external illumination. However, once an external light source was added, the algorithms became unstable.

Fig.6. Image registered in a dark room without illumination

To choose the best color conversion for detecting the maxima, cross sections of the images recorded during the experiments were analyzed, and each candidate quantity was assessed for how easily reliable threshold values can be chosen for it. In [7], the following quantities were plotted along a cross section of the object image perpendicular to the projected stripes: the R, G, and B color channels, the arithmetic mean of the three channels, the luma component (Y) after conversion to YCbCr, and the value component (V) after conversion to HSV. It was also concluded in [7] that with a linear combination of the channels (R + G + B), pure colors are strongly suppressed and their stripes can be missed. The analysis shows that the most stable detection of the stripe centers is obtained with the value component V.
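The suppression of pure colors is easy to see on ideal, fully saturated stripe colors. The sketch below compares the arithmetic mean, the luma Y and the value V for such colors. The BT.601 luma weights are an assumption, since the text does not say which YCbCr variant was used, and real camera images will of course be less clean than these ideal triples.

```python
PURE_COLORS = {
    "R": (255, 0, 0), "G": (0, 255, 0), "B": (0, 0, 255),
    "C": (0, 255, 255), "M": (255, 0, 255), "Y": (255, 255, 0),
    "W": (255, 255, 255),
}


def stripe_measures(rgb):
    """Return (mean, luma, value) for an RGB triple with components in 0..255."""
    r, g, b = (float(c) for c in rgb)
    mean = (r + g + b) / 3.0
    luma = 0.299 * r + 0.587 * g + 0.114 * b   # BT.601 weights (assumed)
    value = max(r, g, b)                       # V of HSV on the same 0..255 scale
    return mean, luma, value


if __name__ == "__main__":
    for name, rgb in PURE_COLORS.items():
        mean, luma, value = stripe_measures(rgb)
        print(f"{name}: mean={mean:6.1f}  Y={luma:6.1f}  V={value:6.1f}")
```

For the primary colors the mean drops to a third of the white level and the luma of blue is lower still, while V stays at the maximum for all six stripe colors. A single threshold on V therefore treats all stripes equally.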

Fig. 7. The values of various quantities in the cross section of the image of the object, perpendicular to the direction of the projected bands.

The algorithms for detecting the stripe centers were implemented in MATLAB for two quantities, the value component V and the root mean square (RMS) of the RGB channels, using both the direct search for local maxima and a method similar to the Canny method. The direct search used two thresholds: the minimum absolute value at a candidate maximum and the minimum difference between the value at the candidate maximum and the values at its “neighbors”. In the Canny-like method, the non-maximum suppression stage operates not on the gradient magnitude but, in the first case, directly on the V values, as in [5], and, in the second case, on the RMS of the RGB channels. Before conversion to the HSV color space, the image was smoothed with a Gaussian filter; subpixel refinement of the coordinates of the maxima used parabola interpolation along the image row.
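The direct search for local maxima with two thresholds can be sketched for a single image row as follows. This is a Python illustration, not the MATLAB code from the work. The function name, the parameter names and the choice of "neighbors" as the samples at a fixed distance `radius` on each side are assumptions.

```python
import numpy as np


def find_stripe_peaks(row, min_value, min_contrast, radius=2):
    """Return indices of local maxima in a 1-D intensity profile.

    min_value    - minimum absolute value at a candidate maximum
    min_contrast - minimum difference between the candidate and the
                   samples `radius` pixels to its left and right
    """
    row = np.asarray(row, dtype=float)
    peaks = []
    for i in range(radius, len(row) - radius):
        v = row[i]
        if v < min_value:
            continue
        # Local maximum test; on a flat top the last sample is kept
        if v < row[i - 1] or v <= row[i + 1]:
            continue
        if v - row[i - radius] >= min_contrast and v - row[i + radius] >= min_contrast:
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    # Hypothetical V profile across two bright stripes and one weak bump
    profile = [0, 0, 10, 200, 12, 0, 0, 30, 0, 0, 5, 180, 170, 6, 0, 0]
    print(find_stripe_peaks(profile, min_value=100, min_contrast=50))
```

The weak bump is rejected by the absolute threshold, and the contrast threshold rejects maxima that are not separated from their surroundings by a dark gap.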

To compare the two methods on different color channels, we used images recorded during the experiments on the test bench from [5]. The characteristics of the devices used and the registration conditions during the experiment are described in detail at the end of this article.
The dependence of the stripe detection results on the color channel used (R + G + B, Y or V) was assessed quantitatively on images of a smooth white plane (“Plane”) and of a smooth white object (a plaster bust of Lenin, “Bust”). The number of stripe-center points detected by the algorithm in the color channel under consideration, with colored stripes projected, was compared with the number of stripe-center points detected by the same algorithm in the luma channel Y of the image with white stripes projected. The threshold for each color channel was set from the maximum value of that channel within the image fragment under consideration and a fixed value that is the same for all channels. The results confirmed the earlier conclusion that the value component V is preferable.

To assess how the stripe detection results depend on the method used (the direct search for local maxima or the Canny method), the same images were used. In this case a quantitative assessment is possible only for the “Plane” object, since for it the “true” number of stripe-center points can be determined as the number of stripes multiplied by the number of image rows. For both methods, stripe detection on the flat object using V gave the same maximum result. For the “Bust” object only a qualitative assessment is possible; the detection results are shown in Fig. 8. The Canny method detects more local-maximum points at the same threshold value thanks to its “strong” and “weak” thresholds, but this advantage is insignificant. The main advantage of the Canny method is that the detected points are already linked into stripe fragments, which can be used later, at the classification and decoding stages, to increase the overall reliability of the algorithm.

Fig. 8. Stripe detection results: (a) direct method, (b) Canny method

4. Classification of selected bands by color. Clustering

At this stage, the problem of classifying the selected point-centers of bands by color into 7 types is solved: 6 types, corresponding to 6 used colors and “unclassified” points that cannot be reliably correlated with any of the used colors. To solve this problem, you can apply a number of different methods, among which there are two groups: methods with fixed thresholds and adaptive methods.

Let us consider the solutions to this problem presented in the literature for two color spaces, YCbCr and HSV. With fixed thresholds, classification is performed by thresholding with values chosen in advance. Using a clustering algorithm instead makes it possible to adapt to changes in ambient light and increases the reliability of classification on colored objects. Below we consider clustering in the YCbCr color space (on the color-difference components Cb and Cr) and in HSV (on saturation S and hue H).

The clustering algorithm consists of two repetitive actions:
• Assign each point to a cluster, the distance to the central point of which is the smallest of all;
• Using the current distribution of points across the clusters, determine the average value for each cluster and assign this value to the central point of the cluster.
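The two repeated steps above are the core of what is usually called k-means clustering. A minimal NumPy sketch is given below. The function name, the fixed number of iterations and the way initial centers are passed in are assumptions of this example; for the task at hand the points would be (Cb, Cr) or (H, S) pairs, and the initial centers would be the nominal colors of the six stripes.

```python
import numpy as np


def kmeans(points, centers, n_iter=20):
    """Cluster 2-D points around the given initial centers.

    Returns (labels, centers): the cluster index of each point and
    the final cluster centers.
    """
    points = np.asarray(points, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Step 1: assign each point to the nearest center (Euclidean distance)
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Step 2: move each center to the mean of the points assigned to it
        for c in range(len(centers)):
            members = points[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, centers


if __name__ == "__main__":
    # Two hypothetical groups of points
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    labels, centers = kmeans(pts, centers=[[0, 0], [10, 10]])
    print(labels)
    print(centers)
```

The plain Euclidean distance used in step 1 treats all directions equally, so it does not take the shape of a cluster into account.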

Fig.9. The histogram obtained from the white plane

Fig.10. The influence of the structure of the object on the quality of the histogram. Left: the histogram obtained from a frame with a white bust. Right: the histogram obtained from a frame with a colored toy

Fig.11. Clustering results for the white object: a, b - clustering in the Cb-Cr color space; c, d - clustering in the HS color space

The cluster membership of each point is shown by color, and the location of the points corresponds to their coordinates in the CbCr plane (Fig. 12, a) and in the HS plane (Fig. 12, b). The figures show that the algorithm misclassifies many points with low saturation S. The cause of this error is the way the distance from a point to a cluster center is calculated: when clustering in Cartesian space, it is determined without taking the shape of the cluster into account.

This problem can be eliminated by clustering on hue H and saturation S and introducing an artificial anisotropy coefficient. In this case, the distance from an arbitrary point to the nearest cluster is calculated by a formula with a coefficient k < 1. Such anisotropy accounts for the “elongation” of the clusters along S. Fig. 11, c, d shows the results of clustering on H and S at k = 1/3. A hue shift is used to keep the red cluster in one piece.

Fig. 12, b, d also shows a set of points that cannot be confidently assigned to any of the clusters. By introducing a threshold on saturation S or on the distance to the cluster center, these points can be placed in the “unclassified” cluster.
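The exact distance formula is not reproduced in the text above, so the sketch below assumes one plausible form: a Euclidean distance in the H-S plane in which the saturation difference is multiplied by k < 1. Instead of the hue shift mentioned above, this example uses a circular hue difference to keep red in one piece. That is a substitution made for brevity, and all names and numbers are hypothetical.

```python
import math


def hs_distance(h, s, hc, sc, k=1 / 3):
    """Anisotropic distance between point (h, s) and cluster center (hc, sc).

    Hue and saturation are in 0..1. Hue is treated as circular, and the
    saturation difference is down-weighted by k (assumed form).
    """
    dh = abs(h - hc) % 1.0
    dh = min(dh, 1.0 - dh)
    ds = s - sc
    return math.sqrt(dh * dh + (k * ds) ** 2)


def classify(h, s, centers, k=1 / 3, max_dist=None):
    """Return the index of the nearest center, or None for "unclassified"."""
    dists = [hs_distance(h, s, hc, sc, k) for hc, sc in centers]
    best = min(range(len(centers)), key=dists.__getitem__)
    if max_dist is not None and dists[best] > max_dist:
        return None
    return best


if __name__ == "__main__":
    # Hypothetical centers for red, green and blue: (hue, saturation)
    centers = [(0.0, 0.8), (1 / 3, 0.8), (2 / 3, 0.8)]
    print(classify(0.97, 0.2, centers))                # pale red point
    print(classify(0.97, 0.2, centers, max_dist=0.1))  # with a distance threshold
```

With k = 1 the pale red point would be much farther from the red center. The smaller k is, the more the decision is driven by hue alone, which matches the elongated shape of the clusters along S.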

Fig.12. Results of the stripe detection and clustering algorithms

After decoding, we obtain a three-dimensional point cloud:

Fig.13. Reconstructed three-dimensional point cloud for a white object

Unfortunately, when a weak background or texture is added, non-linear distortions are introduced and the algorithm stops working. How we dealt with the background and made the algorithm adaptive will be covered in the next part.

Bibliography

1. Hartley R.I., Zisserman A. Multiple View Geometry in Computer Vision. Cambridge, UK: Cambridge University Press, 2000.
2. Scharstein D., Szeliski R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms // International Journal of Computer Vision. 2002. Vol. 47 (1-3). P. 7–42.
3. Salvi J., Pages J., Batlle J. Pattern codification strategies in structured light systems // Pattern Recognition. 2004. Vol. 37 (4). P. 827–849.
4. Geng J. Structured-light 3D surface imaging: a tutorial // Advances in Optics and Photonics. 2011. Vol. 3. P. 128–160.
5. Safroshkin M.A. Experimental studies of the parallax method of registering 3d objects with color coding. Molodyozhny Scientific Technical Bulletin, 2013. URL: sntbul.bmstu.ru/doc/608517.html .
6. Gonzalez R., Woods R., Eddins S. Digital Image Processing Using MATLAB. Moscow: Technosphere, 2006.
7. Fechteler P., Eisert P. Adaptive color classification for structured light systems // Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference. P. 1–7.
8. Fechteler P., Eisert P., Rurainsky J. Fast and High Resolution 3D face scanning // Image Processing, 2007. ICIP 2007. IEEE International Conference. P. 81–84.
9. Xing Lu, Jung-Hong Zhou, Dong-Dong Liu, Jue Zhang // Acta Mech. Sin. 2011. Vol. 27 (6). P. 1098–1104.
10. Permuter H., Francos J., Jermyn I.H. A study of Gaussian mixture models of color and texture features for image classification and segmentation // Pattern Recognition. 2006. Vol. 39, No. 4. P. 695–706.
11. Zhang Z. Flexible camera calibration by viewing a plane from unknown orientations // International Conference on Computer Vision, 1999. P. 666–673.
12. Falcao G., Hurtos N., Massich J. Plane-based calibration of a projector-camera system // VIBOT Master, 2008.
13. De Wansa Wickramarante V.K., Ryazanov V.V., Vinogradov A.P. Accurate reconstruction of a human face using a structured light // Pattern Recognition and Image Analysis. 2008. Vol. 18, No. 3. P. 442–446.

The stereo pair image was taken from urixblog.com.

Source: https://habr.com/ru/post/243771/
