This continues the article about "Make3D", a Stanford University (now Cornell University) project that sets out to reconstruct a three-dimensional model of a scene from a single photograph, a task that is still far from routine.
The publication consists of Part 1 and Part 2. It is published to satisfy curiosity: to demystify the magic and make it clear how the method works.
Part One:
The Ising model, or where the MRF formula comes from
Efficient graph-based image segmentation
We want a 3D model and the location of polygons
We will have to look more broadly ...
Monocular image features
Boundaries and folds
MRF Input and Output
What will the MRF do? Goal: min (distance error)
Part Two ( current ):
The wish list is ready, we pack it into the MRF model
1. Local features: Preliminary determination of distance by local features
2. Connection: Connection
3. Co-planar: Coplanarity
4. Co-linear: Collinearity
Briefly about learning MRF
Briefly about the MRF Solution
Source code and learned model parameters
Program execution, testing
Summary
A few files
The wish list is ready, we pack it into the MRF model
To model the relationship between the plane parameters α, the features of the image itself, and other properties such as connectivity, coplanarity, and collinearity, the MRF model is formulated in multiplicative form, by analogy with the Gibbs distribution:
Distribution function for MRF:
$$P(\alpha \mid X, \nu, y, R; \theta) = \frac{1}{Z} \prod_i f_1(\alpha_i \mid X_i, \nu_i, R_i; \theta) \prod_{i,j} f_2(\alpha_i, \alpha_j \mid y_{ij}, R_i, R_j)$$
In this formula, $\alpha_i$ denotes the plane parameters of superpixel i. For each of the $S_i$ points of the i-th superpixel, $x_{i,s_i}$ denotes the feature vector of the image point $s_i$. $X_i = \{x_{i,s_i} : s_i = 1, ..., S_i\}$ denotes the set of all features of superpixel i. Similarly, $R_i = \{R_{i,s_i} : s_i = 1, ..., S_i\}$ denotes the set of all rays to superpixel i. The parameter $\nu_i$ represents the level of confidence in the distance to the superpixel, i.e. how much we trust the distance that the model suggested based only on local features of the image.
The MRF "minimizes" the combination of the distance errors listed above. By analogy with the Gibbs distribution, the MRF looks for the state of the system with the lowest energy (in our case, the "error in determining distances"), which is the most probable state of the model.
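The link between lowest energy and highest probability can be checked numerically. Below is a minimal Python sketch, not code from "Make3D": the three energy values are made up, and `gibbs_probabilities` is a hypothetical helper that applies P(s) = exp(-E(s)) / Z.

```python
import math

def gibbs_probabilities(energies, temperature=1.0):
    """Turn state energies into Gibbs probabilities P(s) = exp(-E(s)/T) / Z."""
    weights = [math.exp(-e / temperature) for e in energies]
    z = sum(weights)  # normalizing factor Z
    return [w / z for w in weights]

# Hypothetical total "distance errors" (energies) of three candidate reconstructions.
energies = [2.5, 0.7, 1.9]
probs = gibbs_probabilities(energies)

# Index of the state with the lowest energy and of the most probable state.
lowest_energy = min(range(len(energies)), key=energies.__getitem__)
most_probable = max(range(len(probs)), key=probs.__getitem__)
```

Because exp(-E) decreases monotonically in E, both indices always coincide, which is why maximizing the probability can be replaced by minimizing the error.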
Now more about each type of distance ...
1. Local features: Preliminary determination of distance by local features
In the MRF formula, the first factor $f_1(\cdot)$ models the plane parameters as a function of the image features $x_{i,s_i}$. Let the true distance to the point $s_i$ located on the i-th superpixel be $d_{i,s_i}$; then the equality $R_{i,s_i}^T \alpha_i = 1 / d_{i,s_i}$ holds, where $R_{i,s_i}$ is the ray connecting the camera center to the point $s_i$. If the model has determined the preliminary distance from local image features as $\hat{d}_{i,s_i} = x_{i,s_i}^T \theta_r$, then the relative error is:
Relative error:
$$\frac{\hat{d}_{i,s_i} - d_{i,s_i}}{d_{i,s_i}} = \frac{1}{d_{i,s_i}} \hat{d}_{i,s_i} - 1 = R_{i,s_i}^T \alpha_i \, (x_{i,s_i}^T \theta_r) - 1$$
Pic.17 Accuracy of determining distances according to local features
Thus, to minimize the relative error over all points of the superpixel, the MRF model expresses the relationship between the superpixel plane parameters and the local image features by the formula:
Minimize local error:
$$f_1(\alpha_i \mid X_i, \nu_i, R_i; \theta) = \exp\left( - \sum_{s_i=1}^{S_i} \nu_{i,s_i} \left| R_{i,s_i}^T \alpha_i \, (x_{i,s_i}^T \theta_r) - 1 \right| \right)$$
The parameters of this model are $\theta_r$. In "Make3D", different parameters are used for different image rows r = 1, ..., 11, on the assumption that the photograph was most likely taken horizontally and therefore shows different statistical patterns in different parts of the image: the top will most often contain sky or a roof, while the bottom will most often contain a lawn or a road. The parameters $\nu_i = \{\nu_{i,s_i}\}$ represent the confidence level of the preliminary distance to the point $s_i$.
The variables $\nu_{i,s_i}$, i.e. the confidence levels for the distance to a point based only on local properties of superpixels, are learned beforehand from monocular image features by evaluating the predicted distances against true (ground-truth) distances $d_i$ to superpixels using the relative error $|d_i - x_i^T \theta_r| / d_i$.
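To make the local term concrete, here is a minimal Python sketch under stated assumptions: the term is taken in the form exp(-Σ ν |Rᵀα · d̂ - 1|), with d̂ standing for the feature-based estimate xᵀθ_r. The plane, rays, estimates, and confidences are toy values, and the function names are hypothetical; none of this comes from the "Make3D" source code.

```python
import numpy as np

def fractional_errors(alpha, rays, d_hat):
    """Relative error (d_hat - d) / d per point, written as R^T alpha * d_hat - 1."""
    inv_depth = rays @ alpha          # R^T alpha = 1 / d for every ray
    return inv_depth * d_hat - 1.0

def local_term_energy(alpha, rays, d_hat, nu):
    """Negative log of the local factor: confidence-weighted absolute relative errors."""
    return float(np.sum(nu * np.abs(fractional_errors(alpha, rays, d_hat))))

# Toy superpixel on the fronto-parallel plane z = 10, so alpha = (0, 0, 1/10).
alpha = np.array([0.0, 0.0, 0.1])
rays = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [0.0, 0.1, 1.0]])
rays /= np.linalg.norm(rays, axis=1, keepdims=True)   # unit rays from the camera
true_d = 1.0 / (rays @ alpha)                          # true distance along each ray

# Hypothetical feature-based estimates: exact, 10% too far, 20% too near.
d_hat = true_d * np.array([1.0, 1.1, 0.8])
nu = np.array([1.0, 1.0, 0.5])                         # hypothetical confidences

errors = fractional_errors(alpha, rays, d_hat)
energy = local_term_energy(alpha, rays, d_hat, nu)
```

The third point has the largest error but the lowest confidence, so it contributes no more to the energy than the second point; this is the role the confidence values play in the model.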
The second factor in the MRF formula, $f_2(\cdot)$, models the relationship between the plane parameters $\alpha_i$ and $\alpha_j$ of two superpixels i and j. It in turn decomposes into a product of functions $h(\cdot)$, defined over pairs of points $s_i$, $s_j$ belonging to the corresponding superpixels, that capture three types of relationship: connection, coplanarity, and collinearity.
Modeling superpixel relationships:
$$f_2(\cdot) = \prod_{\{s_i, s_j\} \in N} h_{s_i,s_j}(\cdot)$$
2. Connection: Connection
Two pairs of points $(s_i, s_j)$ lying on the boundary between neighboring superpixels i and j are selected. So that superpixels which are connected in reality are also connected in the reconstructed three-dimensional model of the scene, the MRF minimizes the distance between these points.
Pic.18 Minimizing the distance of the connection
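For the distance from the camera center to the point $s_i$, the equality $R_{i,s_i}^T \alpha_i = 1 / d_{i,s_i}$ holds; similarly, for the point $s_j$, $R_{j,s_j}^T \alpha_j = 1 / d_{j,s_j}$. The quantity $(R_{i,s_i}^T \alpha_i - R_{j,s_j}^T \alpha_j)\,\hat{d}$ gives an estimate of the fractional distance $|(d_{i,s_i} - d_{j,s_j}) / \sqrt{d_{i,s_i} d_{j,s_j}}|$ at $\hat{d} = \sqrt{\hat{d}_{s_i} \hat{d}_{s_j}}$, where $\hat{d}_{s_i}$ and $\hat{d}_{s_j}$ are the preliminary distances to the corresponding superpixels.
Minimization in MRF is carried out using the formula:
$$h_{s_i,s_j}(\alpha_i, \alpha_j, y_{ij}, R_i, R_j) = \exp\left( - y_{ij} \left| (R_{i,s_i}^T \alpha_i - R_{j,s_j}^T \alpha_j)\,\hat{d} \right| \right)$$
It is worth noting that if the superpixels belong to different objects and the points $s_i$, $s_j$ lie on the boundary between these objects, then $y_{ij}$ = 0 and the "connection" is not enforced.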
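The connection penalty can be illustrated with a small Python sketch. It assumes the penalty has the form y_ij · |(R_iᵀα_i - R_jᵀα_j) · d̂| with d̂ = sqrt(d̂_i · d̂_j); the planes and the shared boundary ray are toy values, and `connection_energy` is a hypothetical name, not a function from the project.

```python
import numpy as np

def connection_energy(alpha_i, alpha_j, ray_i, ray_j, d_hat_i, d_hat_j, y_ij):
    """Energy (negative log of h) for one pair of boundary points."""
    d_hat = np.sqrt(d_hat_i * d_hat_j)   # geometric mean of preliminary distances
    return y_ij * abs((ray_i @ alpha_i - ray_j @ alpha_j) * d_hat)

# Two adjacent boundary points that, for simplicity, share the same viewing ray.
ray = np.array([0.0, 0.0, 1.0])
alpha_i = np.array([0.0, 0.0, 1.0 / 10.0])    # plane of superpixel i at depth 10
alpha_j = np.array([0.0, 0.0, 1.0 / 12.5])    # plane of superpixel j at depth 12.5

# With exact preliminary distances the energy equals the fractional distance.
gap = connection_energy(alpha_i, alpha_j, ray, ray, 10.0, 12.5, y_ij=1.0)
fractional_distance = abs(10.0 - 12.5) / np.sqrt(10.0 * 12.5)

# y_ij = 0 models an occlusion boundary: the gap is no longer penalized.
occluded = connection_energy(alpha_i, alpha_j, ray, ray, 10.0, 12.5, y_ij=0.0)

# Planes that meet at the boundary point cost nothing.
joined = connection_energy(alpha_i, alpha_i, ray, ray, 10.0, 10.0, y_ij=1.0)
```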
3. Co-planar: Coplanarity
Three pairs of points are selected: two pairs $(s_i, s_j)$ lying on the boundary between neighboring superpixels i and j, as before, and a third pair $(s_i'', s_j'')$ lying at the centers of the corresponding superpixels. So that superpixels which lie in the same plane in reality also lie in the same plane in the reconstructed three-dimensional model of the scene, the MRF minimizes the distance between the two planes measured along a single ray through these points.
Pic.19 Minimizing the distance of coplanar planes
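Minimization in MRF is carried out using the formula:
$$h_{s_j''}(\alpha_i, \alpha_j, y_{ij}, R_{j,s_j''}) = \exp\left( - y_{ij} \left| (R_{j,s_j''}^T \alpha_i - R_{j,s_j''}^T \alpha_j)\,\hat{d}_{s_j''} \right| \right)$$
The product of the corresponding terms is then taken: $h_{s_i'',s_j''}(\cdot) = h_{s_i''}(\cdot)\, h_{s_j''}(\cdot)$. Note that if two superpixels are coplanar, then $h_{s_i'',s_j''}$ = 1. If the superpixels are coplanar but lie at some distance from each other, i.e. the first two pairs of points do not lie on a shared boundary but are spaced apart, this metric still works.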
4. Co-linear: Collinearity
Suppose two superpixels i and j lie along one straight line in a photograph. In three-dimensional space, infinitely many curves project onto this straight line in the two-dimensional image. However, a long straight line in a photo is, with high probability, also a straight line in space. Therefore, the "Make3D" model also minimizes the deviation of a point of superpixel j from the straight line lying in the plane of superpixel i.
In more detail, suppose two superpixels i and j lie in planes with parameters $\alpha_i$ and $\alpha_j$ respectively, and also lie along one straight line in the two-dimensional image. For a point $s_j$ belonging to superpixel j, the MRF minimizes the fractional distance along the ray $R_{j,s_j}$ from the point $s_j$ to the point lying in the plane of superpixel i along the same ray:
Pic.20 Minimizing the distance of collinear planes
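Minimizing the distance of the collinear planes:
$$h_{s_j}(\alpha_i, \alpha_j, y_{ij}, R_{j,s_j}) = \exp\left( - y_{ij} \left| (R_{j,s_j}^T \alpha_i - R_{j,s_j}^T \alpha_j)\,\hat{d} \right| \right)$$
The product of the corresponding terms is then taken: $h_{s_i,s_j}(\cdot) = h_{s_i}(\cdot)\, h_{s_j}(\cdot)$. In more detail, for the point $s_j$ the equalities $R_{j,s_j}^T \alpha_j = 1 / d_{j,s_j}$ and $R_{j,s_j}^T \alpha_i = 1 / d'_{j,s_j}$ hold. Thus $(R_{j,s_j}^T \alpha_i - R_{j,s_j}^T \alpha_j)\,\hat{d}$ gives an estimate of the fractional distance $|(d_{j,s_j} - d'_{j,s_j}) / \sqrt{d_{j,s_j} d'_{j,s_j}}|$ at $\hat{d} = \sqrt{\hat{d}_{j,s_j} \hat{d}'_{j,s_j}}$. In this case, $y_{ij}$ characterizes not the "probability" of a boundary between these superpixels, but is set depending on the length of the line along which the superpixels are located and on its curvature.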
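The coplanarity and collinearity penalties compare two planes along a single ray of superpixel j, so one small function can illustrate both. The Python sketch below assumes the penalty has the form y_ij · |(R_jᵀα_i - R_jᵀα_j) · d̂|; all numbers are toy values and `same_ray_energy` is a hypothetical name.

```python
import numpy as np

def same_ray_energy(alpha_i, alpha_j, ray_j, d_hat, y_ij):
    """Energy (negative log of h) for one point of superpixel j seen along ray_j."""
    return y_ij * abs((ray_j @ alpha_i - ray_j @ alpha_j) * d_hat)

# Ray through a point of superpixel j (its center for coplanarity,
# a point on the shared straight line for collinearity).
ray_j = np.array([0.3, 0.0, 1.0])
ray_j /= np.linalg.norm(ray_j)

alpha_i = np.array([0.0, 0.0, 0.1])        # plane of superpixel i
alpha_same = alpha_i.copy()                # superpixel j on the same plane
alpha_tilted = np.array([0.02, 0.0, 0.1])  # superpixel j on a tilted plane

d_on_i = 1.0 / (ray_j @ alpha_i)           # where the ray meets the plane of i
d_j = 1.0 / (ray_j @ alpha_tilted)         # where the ray meets the tilted plane of j
d_hat = np.sqrt(d_j * d_on_i)

e_coplanar = same_ray_energy(alpha_i, alpha_same, ray_j, d_on_i, y_ij=1.0)
e_tilted = same_ray_energy(alpha_i, alpha_tilted, ray_j, d_hat, y_ij=1.0)
fractional_distance = abs(d_j - d_on_i) / np.sqrt(d_j * d_on_i)
```

Since the comparison uses only a ray and two plane parameters, it does not require the superpixels to touch, which matches the remark that the coplanarity metric still works for superpixels that are spaced apart.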
Solution accepted
Briefly about learning MRF
Since exact parameter learning over the entire solution space of the model is intractable, the authors of "Make3D" used MCL ( Multi-Conditional Learning ) to break the learning task into smaller subtasks, one for each of the individual densities. MCL optimizes a graphical model by representing its objective as a product of several marginal conditional likelihoods, each of which relies on shared parameters from the joint model and predicts its own subset of variables conditioned on the others.
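Given ground-truth depths and fixed values of $\nu$ and $y_{ij}$, the parameters $\theta_r$ are learned by maximizing the conditional likelihood:
$$\theta_r^* = \arg\max_{\theta_r} \sum_i \log f_1(\alpha_i \mid X_i, \nu_i, R_i; \theta_r) + \sum_{i,j} \log f_2(\alpha_i, \alpha_j \mid y_{ij}, R_i, R_j)$$
In this formula the function $f_2(\cdot)$ does not depend on the parameter $\theta_r$. Thus, learning this parameter reduces to the problem of minimizing an $L_1$-norm of deviations, which is solved using linear programming ( LP ) methods:
$$\theta_r^* = \arg\min_{\theta_r} \sum_i \sum_{s_i=1}^{S_i} \nu_{i,s_i} \left| \frac{1}{d_{i,s_i}} (x_{i,s_i}^T \theta_r) - 1 \right|$$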
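The reduction of an L1-norm problem to a linear program can be sketched in Python. This is an illustration under assumptions, not the project's training code (which is not published): it minimizes Σ w |xᵀθ / d - 1| with SciPy's `linprog` by introducing one auxiliary variable per absolute value. The synthetic features, depths, and the helper names `weighted_l1_fit` and `objective` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_fit(features, depths, weights):
    """Solve min_theta sum_s w_s * |x_s^T theta / d_s - 1| as a linear program."""
    n, k = features.shape
    scaled = features / depths[:, None]            # rows x_s^T / d_s
    # Variables: [theta (k, free), t (n, >= 0)] with |scaled @ theta - 1| <= t.
    cost = np.concatenate([np.zeros(k), weights])
    a_ub = np.block([[scaled, -np.eye(n)], [-scaled, -np.eye(n)]])
    b_ub = np.concatenate([np.ones(n), -np.ones(n)])
    bounds = [(None, None)] * k + [(0, None)] * n
    result = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return result.x[:k]

def objective(theta, features, depths, weights):
    """Weighted sum of absolute relative errors for a given theta."""
    return float(np.sum(weights * np.abs(features @ theta / depths - 1.0)))

rng = np.random.default_rng(0)
theta_true = np.array([2.0, 0.5])                  # hypothetical "true" parameters
features = rng.uniform(1.0, 5.0, size=(20, 2))     # synthetic feature vectors
depths = features @ theta_true                     # noise-free ground-truth depths
weights = np.ones(20)                              # confidences, all equal here

theta_clean = weighted_l1_fit(features, depths, weights)

noisy_depths = depths.copy()
noisy_depths[0] *= 3.0                             # one grossly wrong depth reading
theta_noisy = weighted_l1_fit(features, noisy_depths, weights)
```

On noise-free data the fit recovers the generating parameters. With one corrupted depth, the L1 objective penalizes the outlier linearly rather than quadratically, which is a common reason for preferring it when ground-truth measurements are imperfect.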
Briefly about the MRF Solution
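When constructing a three-dimensional model of a scene from a new photo, the maximum a posteriori (MAP) estimate of the polygon plane parameters α is likewise obtained by maximizing the conditional likelihood $P(\alpha \mid X, \nu, y, R; \theta_r)$. The parameters α are treated as continuous values. Thus, we compute:
$$\alpha^* = \arg\max_{\alpha} \log P(\alpha \mid X, \nu, y, R; \theta_r) = \arg\max_{\alpha} \log \frac{1}{Z} \prod_i f_1(\alpha_i \mid X_i, \nu_i, R_i; \theta_r) \prod_{i,j} f_2(\alpha_i, \alpha_j \mid y_{ij}, R_i, R_j)$$
Since the normalizing factor Z does not depend on the parameter α:
$$\alpha^* = \arg\min_{\alpha} \sum_{i=1}^{K} \sum_{s_i=1}^{S_i} \nu_{i,s_i} \left| (R_{i,s_i}^T \alpha_i)\,\hat{d}_{i,s_i} - 1 \right| + \sum_{i=1}^{K} \sum_{j \in N(i)} \sum_{s_i,s_j \in B_{ij}} y_{ij} \left| (R_{i,s_i}^T \alpha_i - R_{j,s_j}^T \alpha_j)\,\hat{d}_{s_i,s_j} \right| + \sum_{i=1}^{K} \sum_{j \in N(i)} \sum_{s_j \in C_j} y_{ij} \left| (R_{j,s_j}^T \alpha_i - R_{j,s_j}^T \alpha_j)\,\hat{d}_{s_j} \right|$$
Here K is the number of superpixels in the image; N(i) is the set of neighbors of superpixel i; $B_{ij}$ is the set of points located on the boundary between superpixels i and j, through which connectivity is modeled; $C_j$ is the central point of superpixel j, through which collinearity ( co-linearity ) and coplanarity ( co-planarity ) are modeled; $\hat{d}_{i,s_i} = x_{i,s_i}^T \theta_r$ and $\hat{d}_{s_i,s_j} = \sqrt{\hat{d}_{s_i} \hat{d}_{s_j}}$. Each element of the sum is an $L_1$-norm of a linear function of α, so the solution reduces to an $L_1$-norm minimization problem. Using a change of variables, where x stacks the components of all $\alpha_i$, the task can be rewritten in matrix form:
$$x^* = \arg\min_{x} \|Ax - b\|_1 + \|Bx\|_1 + \|Cx\|_1$$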
In this case, the solution of the problem is found using the methods of linear programming ( LP ). In the "Make3D" project, a modified Newton method is used, which computes the Hessian efficiently by exploiting the sparsity of the matrices.
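A toy version of this inference step can be written in a few lines of Python. The sketch below is only an illustration under assumptions: it stacks local and connection terms for two superpixels into a single matrix, then solves min ||Ax - b||_1 with SciPy's general-purpose `linprog` instead of the custom sparse Newton solver mentioned above. The planes, rays, and helper names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(a, b):
    """Solve min_x ||a @ x - b||_1 as an LP with one slack variable per row."""
    m, n = a.shape
    cost = np.concatenate([np.zeros(n), np.ones(m)])
    a_ub = np.block([[a, -np.eye(m)], [-a, -np.eye(m)]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m
    return linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs").x[:n]

def unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Two toy superpixels whose planes meet along the central ray at depth 10.
alpha_true = np.array([[0.01, 0.0, 0.1], [-0.02, 0.0, 0.1]])
rays = [
    [unit([-0.2, 0.0, 1.0]), unit([-0.1, 0.1, 1.0]), unit([-0.1, -0.1, 1.0])],
    [unit([0.2, 0.0, 1.0]), unit([0.1, 0.1, 1.0]), unit([0.1, -0.1, 1.0])],
]
border_ray = unit([0.0, 0.0, 1.0])

rows, rhs = [], []
# Local terms |R^T alpha_i * d_hat - 1| with confidence 1 and exact estimates d_hat.
for i in range(2):
    for ray in rays[i]:
        d_hat = 1.0 / (ray @ alpha_true[i])
        row = np.zeros(6)
        row[3 * i:3 * i + 3] = d_hat * ray
        rows.append(row)
        rhs.append(1.0)

# One connection term |(R^T alpha_1 - R^T alpha_2) * d_hat| on the shared border.
row = np.zeros(6)
row[0:3] = 10.0 * border_ray
row[3:6] = -10.0 * border_ray
rows.append(row)
rhs.append(0.0)

x = l1_minimize(np.array(rows), np.array(rhs))
alpha_est = x.reshape(2, 3)     # recovered plane parameters, one row per superpixel
```

A dense LP like this is only practical for toy sizes; with thousands of superpixels the matrices are large but very sparse, which is presumably why the project relies on a solver that exploits sparsity.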
Pic.21 Illustration for finding the minimum of a function
Practical implementation
Source code and learned model parameters
On the official site of the "Make3D" project, the authors have published the source code of the method for reconstructing a three-dimensional model of a scene from one photo. The MRF training part of the project is not publicly available, but you can download the parameters of the already trained Markov network (MRF). The training data were collected by photographing scenes together with a laser scanner that measured the true (ground-truth) distances. The resolution of the resulting distance maps is 55x305, and the resolution of the photographs used for training is 2272x1704. The MRF was trained on 400 pairs of images. The downloadable database of parameters is approximately 150 MB. The source code is written in MATLAB; the project also uses third-party components written in C / C++, which are compiled into MEX files so that they can be called from the MATLAB environment.
Program execution, testing
The project was run on a laptop with an Intel Core 2 Duo T7500 CPU (2.20 GHz) and 2 GB of RAM running Microsoft Windows 7 Home Premium. The code itself ran inside a VMware Workstation 7.0.0 virtual machine with Linux Mint 8 Helena (kernel version 2.6.31-generic) and 1.7 GB of RAM allocated. The programs were executed in the MATLAB R2009b (7.9.0.529 32-bit) environment. Processing one photo took 270-300 seconds on average and used about 1 GB of RAM in the Linux environment. The resulting VRML file, with the extension *.wrl, takes 120-160 KB on average. The *.wrl files were viewed with the "Cortona3D Viewer" player.
Summary
The technology is far from flawless, and the researchers themselves acknowledge this. At present the project is best at analyzing landscapes and outdoor scenery, while the three-dimensional reconstruction of close-up objects is still unsatisfactory. The likely reasons are the relatively small training set of photographs, most of which were landscapes of the Palo Alto area in summer; the low resolution of the laser used to measure true distances to objects; and the many system parameters that are not adjusted to the features of the scene being recognized or to the parameters of the camera with which the pictures were taken. However, the authors of the project have already begun to improve the system: adding elements of interactivity and feedback when reconstructing a three-dimensional model, and exploring ways to combine monocular cues with reconstruction from several photographs.
To keep this already long article from growing further, I will not dwell on applications of such algorithms. One of the most obvious examples: popular services such as Google Street View or Bing Maps 3D already offer three-dimensional imagery, but mostly only for central streets; the rest of the area is usually ignored, not to mention the interiors of large supermarkets and warehouses. Following the concept of Web 2.0, in which users themselves add content to services, Google and Microsoft are already developing the SketchUp and 3DVIA Shape products, respectively, so that users can build three-dimensional models of the buildings around them on their own. A project like "Make3D" could fit naturally into such services as an initial processing step for one or several photos of an object taken with an ordinary phone, helping to build a preliminary three-dimensional model and thereby simplifying user input.
A few files
If you would like to fly around a "3D model" yourself but have no time to set everything up, here is a link to an archive with the models from the video. It contains 10 models (photos and *.wrl files) and weighs about 11.1 MB; all you need is Cortona3D Viewer.
For now it is hosted on my Dropbox (please suggest a decent file hosting service and I will re-upload it).
P.S. A humorous picture about the reasons for publishing this wall of text: