
Make3D from one photo, part 2



This continues the article about the Stanford University project (now at Cornell University) "Make3D", whose goal, still far from commonplace, is to reconstruct a three-dimensional model of a scene from just one photograph.

The publication consists of two parts: Part 1 and Part 2 (this one).
It is published to satisfy curiosity and to dispel the magic by showing how it actually works.


Contents:


Part One (here)
Part Two (current)


The wish list is ready; now we stuff it into the MRF model


To model the dependence between the plane parameters α, the features of the image itself, and the other properties such as connectivity, coplanarity, and collinearity, the MRF model is formulated, by analogy with the Gibbs distribution function in multiplicative form, as follows:

Distribution function for MRF:

P(α | X, υ, y, R; θ_r) = (1/Z) ∏_(i=1)^K f_1(α_i | X_i, υ_i, R_i; θ_r) ∏_(i,j) f_2(α_i, α_j | y_ij, R_i, R_j)


In this formula, α_i are the plane parameters for superpixel i. For each of the S_i points of the i-th superpixel, x_(i,s_i) denotes the feature vector of the point s_i of the image. X_i={x_(i,s_i)∈R^524 : s_i=1,…,S_i} denotes the set of all features of superpixel i. Similarly, R_i={R_(i,s_i) : s_i=1,…,S_i} denotes the set of all rays to superpixel i. The parameter υ represents the level of confidence in the distance to a superpixel, i.e. how confident we are in a distance predicted from the local features of the image alone.

The MRF will be "minimizing" the set of distance errors described above.
By analogy with the Gibbs distribution, the MRF looks for the state of the system with the smallest energy (in our case, the total "error in determining distances"); this is the most likely state of the model.
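
To make the analogy concrete, here is a minimal illustrative sketch in Python (the project itself is written in MATLAB): each factor is an exponential of an error term, so the product of factors is largest exactly when the summed error, the "energy", is smallest. The function and variable names are mine, not the project's.

```python
import numpy as np

# Gibbs-style relationship used by the MRF: factor = exp(-error term),
# so maximizing the product of factors = minimizing the total error.

def log_posterior(local_errors, pairwise_errors):
    """Unnormalized log P(alpha | ...): the negated total error ("energy")."""
    return -(np.sum(local_errors) + np.sum(pairwise_errors))

# Two hypothetical candidate plane configurations: the one with the
# smaller total distance error is the more probable state of the model.
candidate_a = log_posterior([0.10, 0.05], [0.02])
candidate_b = log_posterior([0.40, 0.30], [0.25])
assert candidate_a > candidate_b
```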




Now let's look at each type of distance term in more detail.



1. Local features: Preliminary determination of distance by local features


In the MRF formula, the first factor f_1(.) models the plane parameters α_i as a function of the image features x_(i,s_i). Let the true distance to a point s_i lying on the i-th superpixel be d_(i,s_i); then the equality R_(i,s_i)^T α_i = 1/d_(i,s_i) holds, where R_(i,s_i) is the ray connecting the camera center to the point s_i of the image. If the model has estimated the preliminary distance from the local image features as d^_(i,s_i)=x_(i,s_i)^T θ_r, then the relative error is:

Relative error:

(d^_(i,s_i) − d_(i,s_i)) / d_(i,s_i) = R_(i,s_i)^T α_i d^_(i,s_i) − 1

Pic.17 Accuracy of determining distances according to local features



Thus, to minimize the relative error over all S_i points of a superpixel, the MRF models the dependence between the distance to a superpixel and the local image features by the formula:

Minimize local error:

f_1(α_i | X_i, υ_i, R_i; θ_r) = exp( −∑_(s_i=1)^(S_i) υ_(i,s_i) | R_(i,s_i)^T α_i (x_(i,s_i)^T θ_r) − 1 | )

The parameters of this model are θ_r∈R^524. In "Make3D", different parameters are used for different image rows r = 1, ..., 11, on the assumption that the photograph was most likely taken with the camera held horizontally, so different parts of the image have different statistics: the top of the image will most often contain sky or a roof, while the bottom will most often contain lawn or road. The values υ_i={υ_(i,s_i) : s_i=1,…,S_i} represent the confidence level of the preliminary distance d^_(i,s_i) to the point s_i.

The variables υ_(i,s_i), i.e. the confidence levels for the distance to a point s_i based only on local superpixel properties, are learned in advance from monocular image features by evaluating the predictions against the true (ground-truth) distances d_i to superpixels with the formula |d_i − x_i^T θ_r| / d_i.
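
As a rough illustration of this local stage, here is a small Python sketch (the real project code is MATLAB). The feature dimension 524 and the 11 row bands come from the description above; the helper names and zero-initialized parameters are placeholders of my own.

```python
import numpy as np

# Local depth prediction: d_hat = x^T theta_r, with a separate parameter
# vector theta_r for each of the 11 horizontal row bands of the image.

N_ROW_BANDS, N_FEATURES = 11, 524
theta = np.zeros((N_ROW_BANDS, N_FEATURES))   # trained parameters (placeholder)

def predict_depth(x, row_band):
    """Preliminary depth from local features only."""
    return x @ theta[row_band]

def relative_error(d_hat, d_true):
    """Fractional error |d - d_hat| / d used to evaluate local predictions."""
    return abs(d_true - d_hat) / d_true

# The confidences nu_{i,s_i} are learned offline by computing this relative
# error against ground-truth (laser) depths: points whose purely local
# prediction tends to be accurate receive a higher confidence.
```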

The second factor in the MRF formula, f_2(.), models the relationship between the plane parameters α_i and α_j of two superpixels i and j. It in turn decomposes into a product of functions h_(s_i,s_j)(.) that establish relations of three types, connection, coplanarity, and collinearity, for point pairs {s_i,s_j} belonging to the corresponding superpixels.

Modeling the relations between superpixels:

f_2(α_i, α_j | y_ij, R_i, R_j) = ∏_({s_i,s_j}) h_(s_i,s_j)(α_i, α_j, y_ij, R_i, R_j)



2. Connection


Two pairs of points {s_i, s_j} lying on the border of neighboring superpixels i and j are selected. So that these superpixels end up connected in the reconstructed three-dimensional model of the scene whenever they are connected in reality, the MRF minimizes the distance between these points.

Pic.18 Minimizing the distance of the connection



For the distance from the camera center to the point s_i the equality R_(i,s_i)^T α_i=1/d_(i,s_i) holds; similarly, for the point s_j we have R_(j,s_j)^T α_j=1/d_(j,s_j). The quantity (R_(i,s_i)^T α_i − R_(j,s_j)^T α_j) d^ gives an estimate of the fractional distance |(d_(i,s_i) − d_(j,s_j)) / √(d_(i,s_i) d_(j,s_j))| with d^=√(d^_(s_i) d^_(s_j)), where d^_(s_i) and d^_(s_j) are the preliminary distances to the corresponding superpixels.

Minimization in the MRF is carried out using the formula:

h_(s_i,s_j)(α_i, α_j, y_ij, R_i, R_j) = exp( −y_ij | (R_(i,s_i)^T α_i − R_(j,s_j)^T α_j) d^ | )
It is worth noting that if the superpixels belong to different objects and the points s_i, s_j lie on the boundary between those objects, then y_ij = 0 and the "connection" constraint is effectively switched off.
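
A minimal sketch of this connection term in Python (illustrative names; the project itself is MATLAB), assuming the relation R^T α = 1/d from above:

```python
import numpy as np

# Connection penalty for one pair of boundary points of superpixels i and j.

def connection_energy(R_i, alpha_i, R_j, alpha_j, d_hat_i, d_hat_j, y_ij):
    """y_ij * |R_i^T alpha_i - R_j^T alpha_j| * d_hat (fractional distance).

    y_ij = 0 when the points lie on an occlusion boundary between
    different objects, which switches the connection constraint off.
    """
    d_hat = np.sqrt(d_hat_i * d_hat_j)   # geometric mean of preliminary depths
    return y_ij * abs(R_i @ alpha_i - R_j @ alpha_j) * d_hat

# The corresponding MRF factor is exp(-connection_energy(...)): it equals 1
# (no penalty) when the two superpixels meet at the same depth.
```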



3. Coplanarity


Three pairs of points (s_i, s_j) are selected: two pairs lying on the border of the neighboring superpixels i and j, as before, and a third pair s_i'', s_j'' of points lying at the centers of the corresponding superpixels. So that these superpixels lie in the same plane in the reconstructed three-dimensional model whenever this is the case in reality, the MRF minimizes the distance, measured along one ray, between the planes on which the points s_i'', s_j'' lie.

Pic.19 Minimizing the distance of coplanar planes



Minimization in the MRF is carried out using the formula (written for the center point s_j''; the term for s_i'' is analogous):

h_(s_j'')(α_i, α_j, y_ij, R_(j,s_j'')) = exp( −y_ij | (R_(j,s_j'')^T α_i − R_(j,s_j'')^T α_j) d^_(s_j'') | )
The product of the corresponding errors, h_(s_i'',s_j'')(.) = h_(s_i'')(.) h_(s_j'')(.), is then computed; note that if two superpixels are coplanar, then h_(s_i'',s_j'') = 1. If the superpixels are coplanar but lie some distance apart, i.e. if the first two pairs of points do not lie on a shared border but are separated in space, this term still works.
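
A short sketch of how the coplanarity factor can be assembled as a product of two one-sided terms, one per center point (Python, names of my own choosing):

```python
import numpy as np

# Coplanarity factor: each superpixel's center point is compared with the
# other superpixel's plane along its own ray, and the two one-sided
# factors are multiplied together.

def one_sided_factor(R_center, alpha_i, alpha_j, d_hat_center, y_ij):
    return np.exp(-y_ij * abs(R_center @ alpha_i - R_center @ alpha_j) * d_hat_center)

def coplanarity_factor(R_ci, R_cj, alpha_i, alpha_j, d_hat_ci, d_hat_cj, y_ij):
    """h = h(s_i'') * h(s_j''): equals 1 when the two planes coincide,
    even if the superpixels themselves are far apart in the image."""
    return (one_sided_factor(R_ci, alpha_i, alpha_j, d_hat_ci, y_ij) *
            one_sided_factor(R_cj, alpha_i, alpha_j, d_hat_cj, y_ij))
```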



4. Collinearity


Suppose two superpixels i and j lie along one straight line in the photograph. An infinite number of straight lines in three-dimensional space can project onto this single line in the two-dimensional image. However, a long straight line in a photo will, with high probability, also be a straight line in space. Therefore, the "Make3D" model also minimizes the deviation of a point of superpixel j from the plane in which superpixel i lies.

In more detail, suppose two superpixels i and j lie in planes with parameters α_i and α_j and also lie along one straight line in the two-dimensional image. For a point s_j belonging to superpixel j, the MRF minimizes the fractional distance, measured along the ray R_(j,s_j), from the point s_j to the point lying in the plane of superpixel i along the same ray:

Pic.20 Minimizing the distance of collinear planes



Minimizing the distance of the collinear planes:

h_(s_j)(α_i, α_j, y_ij, R_(j,s_j)) = exp( −y_ij | (R_(j,s_j)^T α_i − R_(j,s_j)^T α_j) d^ | )
The product of the corresponding errors, h_(s_i,s_j)(.) = h_(s_i)(.) h_(s_j)(.), is then computed. In more detail, for the point s_j the equalities R_(j,s_j)^T α_j = 1/d_(j,s_j) and R_(j,s_j)^T α_i = 1/d'_(j,s_j) hold. Thus (R_(j,s_j)^T α_i − R_(j,s_j)^T α_j) d^ gives an estimate of the fractional distance |(d_(j,s_j) − d'_(j,s_j)) / √(d_(j,s_j) d'_(j,s_j))| with d^ = √(d^_(j,s_j) d^'_(j,s_j)). In this case, y_ij characterizes not the "probability" of a boundary between the superpixels but is set according to the length and curvature of the line along which the superpixels lie.
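
For completeness, the matching sketch of the collinearity term (Python, illustrative names); the only real difference from the connection term is where y_ij comes from:

```python
import numpy as np

# Collinearity penalty: a point s_j of superpixel j is compared, along its
# own ray, with the plane of superpixel i. Here y_ij is assumed to reflect
# the length and straightness of the shared 2D line, not a boundary probability.

def collinearity_energy(R_js, alpha_i, alpha_j, d_hat, y_ij):
    """y_ij * |R_js^T alpha_i - R_js^T alpha_j| * d_hat along the ray R_js."""
    return y_ij * abs(R_js @ alpha_i - R_js @ alpha_j) * d_hat

# Two coincident planes give zero penalty along any ray:
ray = np.array([0.0, 0.0, 1.0])
plane = np.array([0.0, 0.0, 0.5])   # R^T alpha = 1/d  ->  depth d = 2
assert collinearity_energy(ray, plane, plane, d_hat=2.0, y_ij=1.0) == 0.0
```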



Finding the solution


Briefly about training the MRF


Since determining exact parameters over the model's entire solution space is not feasible, the authors of "Make3D" used MCL (Multi-Conditional Learning) to break the learning task into smaller subtasks, one for each distribution density. MCL makes it possible to optimize a graphical model by representing it as a product of several conditional likelihoods, each of which operates both on parameters shared with the combined model and on its own subset of conditioning variables.

The parameters θ_r∈R^524 are learned from the ground-truth depths d, with the quantities y_ij∈[0,1] and υ_(i,s_i) already determined, by maximizing the conditional log-likelihood log P(α|X,υ,y,R;θ_r):

θ_r = arg max_(θ_r) log P(α | X, υ, y, R; θ_r)

In this formula the function f_2(.) does not depend on the parameter θ_r, so learning this parameter reduces to minimizing an L_1 deviation norm, which is done with linear programming (LP) methods:

θ_r = arg min_(θ_r) ∑_i ∑_(s_i=1)^(S_i) υ_(i,s_i) | R_(i,s_i)^T α_i (x_(i,s_i)^T θ_r) − 1 |, with α_i fixed by the ground-truth depths d.
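
To make the LP reduction tangible, here is an illustrative Python sketch (the project uses MATLAB) that fits a parameter vector by minimizing the summed relative L_1 error with a linear-programming solver. The feature matrix, depths, and small dimensions are stand-ins, not Make3D data.

```python
import numpy as np
from scipy.optimize import linprog

# L1 regression as a linear program: minimize sum_s |x_s^T theta - d_s| / d_s
# by introducing one slack variable t_s per training point.

def fit_theta_l1(X, d):
    S, F = X.shape
    A = X / d[:, None]                              # rows x_s^T / d_s
    c = np.concatenate([np.zeros(F), np.ones(S)])   # minimize sum of slacks
    A_ub = np.block([[ A, -np.eye(S)],              #  x_s^T theta/d_s - t_s <= 1
                     [-A, -np.eye(S)]])             # -x_s^T theta/d_s - t_s <= -1
    b_ub = np.concatenate([np.ones(S), -np.ones(S)])
    bounds = [(None, None)] * F + [(0, None)] * S   # theta free, slacks >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:F]

# Toy example with random stand-in features and strictly positive depths:
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
d = np.abs(X @ rng.normal(size=8)) + 1.0
theta_hat = fit_theta_l1(X, d)
```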



Briefly about the MRF Solution


When constructing a three-dimensional model of a scene from an uploaded photo, the maximum a posteriori (MAP) estimate of the polygon plane parameters α is likewise obtained by maximizing the conditional log-likelihood log P(α|X,υ,y,R;θ_r). The parameters α are treated as continuous values. Thus we compute:

α* = arg max_α P(α | X, υ, y, R; θ_r) = arg max_α (1/Z) ∏_(i=1)^K f_1(α_i | X_i, υ_i, R_i; θ_r) ∏_(i,j) f_2(α_i, α_j | y_ij, R_i, R_j)

Since the normalizing factor Z does not depend on the parameter α, this is equivalent to the minimization:

α* = arg min_α ∑_(i=1)^K ∑_(s_i=1)^(S_i) υ_(i,s_i) | R_(i,s_i)^T α_i (x_(i,s_i)^T θ_r) − 1 | + ∑_(i=1)^K ∑_(j∈N(i)) ∑_({s_i,s_j}) y_ij | (R_(i,s_i)^T α_i − R_(j,s_j)^T α_j) d^_(s_i,s_j) |

Here K is the number of superpixels in the image; N(i) is the set of neighbors of superpixel i; B_ij is the set of points located on the border between superpixels i and j, through which connectivity is modeled; C_j is the center point of superpixel j, through which collinearity and coplanarity are modeled (the inner sum over point pairs {s_i,s_j} runs over B_ij and the center points C_j); d^_(s_i,s_j)=√(d^_(s_i) d^_(s_j)). Each element of the sum is an L_1 norm of a linear function of α, so the solution reduces to an L_1 deviation minimization problem. Using a change of variables, the task can be rewritten in matrix form:

α* = arg min_α ‖ A α − b ‖_1, where the confidences υ and the weights y_ij are absorbed into the rows of A and b


In this form the problem can be solved with linear programming (LP) methods. In the "Make3D" project, a modified Newton method that exploits the sparsity of the matrices when forming the Hessian is used to solve it efficiently.
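
As an illustration of how such an L_1 problem can be attacked with a sparse Newton-type method, here is a hedged Python sketch: the absolute value is smoothed as √(u² + ε²) so that a Hessian exists, and sparse matrices are used throughout. This mirrors the general idea only, not the project's actual MATLAB implementation; A, b, and alpha0 are assumed inputs with the weights already folded in.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Minimize ||A @ alpha - b||_1 with |u| smoothed as sqrt(u^2 + eps^2),
# taking plain (undamped) Newton steps and exploiting the sparsity of A.
# A is expected to be a scipy.sparse matrix (CSR/CSC).

def l1_map_estimate(A, b, alpha0, eps=1e-3, iters=20):
    alpha = alpha0.copy()
    for _ in range(iters):
        r = A @ alpha - b                 # residual of every stacked error term
        s = np.sqrt(r * r + eps * eps)
        grad = A.T @ (r / s)              # gradient of the smoothed objective
        W = sp.diags(eps * eps / s**3)    # curvature weights of the smoothed |.|
        H = (A.T @ W @ A).tocsc()         # sparse Hessian
        alpha = alpha + spla.spsolve(H, -grad)
    return alpha
```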

Pic.21 Illustration for finding the minimum of a function





Practical implementation


Source code and trained model parameters


On the official site of the "Make3D" project, the authors have published the source code of the method for reconstructing a three-dimensional scene model from one photo. The MRF training part of the project is not provided in open access, but you can download the parameters of the already trained Markov network (MRF).
The parameters were obtained from photographs paired with laser measurements of the true distances (ground-truth). The resolution of the captured depth images is 55x305, and the resolution of the photographs used for training is 2272x1704. The MRF network was trained on 400 such image pairs. The downloadable database of parameters is approximately 150 MB.
The source code is a program written in MATLAB; the project also uses third-party components written in C/C++, which are compiled into MEX files so they can be called from the MATLAB environment.

Program execution, testing


The project was run on a laptop with an Intel Core 2 Duo T7500 CPU (2.20 GHz) and 2 GB of RAM, running Microsoft Windows 7 Home Premium, inside a VMWare Workstation 7.0.0 virtual machine with Linux Mint 8 Helena (kernel 2.6.31-generic) and 1.7 GB of RAM allocated to it. The programs were executed in MATLAB R2009b (7.9.0.529, 32-bit).
Processing one photo took 270-300 seconds on average and used about 1 GB of RAM in the Linux environment. The resulting VRML file with the *.wrl extension averages 120-160 KB. The *.wrl files were viewed with the "Cortona3D Viewer" player.




Summing up


The technology is far from flawless, and the researchers themselves acknowledge this. At present the project handles landscapes and outdoor scenery best, while the conversion of close-up objects into a three-dimensional view is still unsatisfactory. The reasons for this behavior are most likely the relatively small training set of photographs, most of which were landscapes of the Palo Alto area in summer; the low resolution of the laser scans used to measure true distances to objects; and the many system parameters that are not adjusted to the features of the scene being recognized or to the parameters of the camera with which the pictures were taken. However, the authors of the project have already begun working on improvements: adding interactivity and user feedback to the reconstruction of the three-dimensional model, and exploring how to combine monocular cues with three-dimensional reconstruction from several photographs.

To keep this article from growing even larger, I will not dwell on the applications of such algorithms. As one of the most obvious examples, take common services such as Google Street View or Bing Maps 3D, in which three-dimensional imagery is already available, but mostly only for central streets, while the rest of the area is usually ignored, to say nothing of the interiors of large supermarkets and warehouses. Following the Web 2.0 concept, in which users add content to the services themselves, Google and Microsoft are already developing the SketchUp and 3DVIA Shape products, respectively, so that users can independently build three-dimensional models of the buildings around them. A project like "Make3D" could fit harmoniously into such services for the initial processing of one or several photos of an object taken with an ordinary phone, helping to build a preliminary three-dimensional model and thereby simplifying user input.


A few files


If someone wants to fly through a "3D model" themselves but has no time to figure it all out, here is a link to an archive with the models from the video. It contains 10 models (photos and *.wrl files), the whole archive weighs ~11.1 MB, and all you need is Cortona3D Viewer.

For now it is hosted on my DropBox (if you can suggest a decent file-hosting service, I will re-upload it).



P.S. A humorous picture about the reasons for publishing this opus:


Beginning of the article: Part 1

Source: https://habr.com/ru/post/95559/

