
Fast panorama stitching



Panoramic shooting has long been widespread: it is supported by the built-in camera applications on most smartphones and tablets. Panorama-stitching applications work like this: they take several images, find matching features, and join the images together. Device manufacturers typically use their own stitching methods, which work very quickly. There are also several open-source alternatives.

For more information about how panorama stitching is implemented, as well as about a new approach in which two cameras are used to capture a full circular panorama, see my previous publication. This document provides a brief comparison of two popular libraries, followed by a detailed description of building an application that can quickly stitch images into a panorama.

OpenCV * and PanoTools *


I tested the two most popular open-source libraries: OpenCV and PanoTools. I started with PanoTools, a fairly mature stitching library available for Windows *, Mac OS * and Linux *. It supports many advanced features and delivers consistent quality. The second library reviewed is OpenCV. OpenCV is a very large project containing several image-processing libraries, with an extensive user base. It is available for Windows, Mac OS, Linux, Android * and iOS *. Both libraries include sample stitching applications. The PanoTools sample application completed the workload in 1 minute 44 seconds; the OpenCV sample took 2 minutes 16 seconds. Although PanoTools is faster, we decided to use OpenCV as a starting point, given its significant user base and its availability on mobile platforms.

Description of the original application and the test scenario


We will use the OpenCV sample application cpp-example-stitching_detailed as a starting point. The application runs the stitching pipeline, which consists of several separate steps. Briefly, these steps are:
  1. Import images.
  2. Find features.
  3. Pairwise matching.
  4. Warp images.
  5. Compositing.
  6. Blending.

For testing, we used a tablet with a quad-core Intel Atom Z3770 system-on-chip and 2 GB of RAM running Windows 8.1. The workload consisted of stitching 16 images with a resolution of 1280 x 720.

Multi-threaded feature finding using OpenMP *


Most of the pipeline stages consist of repetitive work performed on images that do not depend on one another. As a result, these steps are well suited to multi-threaded processing. All of these stages use a for loop, so these blocks of code are very easy to parallelize with OpenMP.
The first stage we parallelize is feature finding. First, add the OpenMP compiler directive above the for loop:

#pragma omp parallel for
for (int i = 0; i < num_images; ++i) 

Now the loop will be executed in several threads. But inside the loop we assign the variables full_img and img. This creates a race between threads and will corrupt the result. The easiest way to solve this problem is to convert the variables into vectors. We take the following variable declarations:

 Mat full_img, img; 

and replace them with:

 vector<Mat> full_img(num_images);
 vector<Mat> img(num_images); 

Now, inside the loop, change each occurrence of each variable to its new name:
full_img becomes full_img[i]
img becomes img[i]

The content loaded into full_img and img is used later in the application, so we no longer release this memory early. Delete these lines:

 full_img.release();
 img.release(); 

Then you can remove this line from the compositing stage:

 full_img = imread(img_names[img_idx]); 

full_img is referenced again when scaling in the compositing loop. Change the variable names there as well:
full_img becomes full_img[img_idx]
img becomes img[img_idx]

So, the first loop is parallelized. Now we parallelize the warping loop. First, add a compiler directive to make the loop parallel:

 #pragma omp parallel for
 for (int i = 0; i < num_images; ++i) 

This is all that is needed to make the loop parallel, but this section can be optimized a little further. There is a second for loop immediately after the first. We can move its work into the first loop to reduce the number of parallel regions launched. Move this line into the first for loop:

 images_warped[i].convertTo(images_warped_f[i], CV_32F); 

You also need to move the definition of the images_warped_f variable above the first loop:

 vector<Mat> images_warped_f(num_images); 

Now you can parallelize the compositing loop. Add a compiler directive before the for loop:

 #pragma omp parallel for
 for (int img_idx = 0; img_idx < num_images; ++img_idx) 

The third loop is now parallelized. After all these changes, the workload runs in 2 minutes 8 seconds, that is, 8 seconds faster than before.

Optimizing the pairwise matching algorithm


Pairwise feature matching is implemented so that each image is compared with every other image, so the amount of work grows as O(n²). This is unnecessary, since we know the order in which the images follow. We rewrite the algorithm so that each image is compared only with its neighbors in that order.
To do this, change this block here:

 vector<MatchesInfo> pairwise_matches;
 BestOf2NearestMatcher matcher(try_gpu, match_conf);
 matcher(features, pairwise_matches);
 matcher.collectGarbage(); 

with this one:

 vector<MatchesInfo> pairwise_matches;
 BestOf2NearestMatcher matcher(try_gpu, match_conf);
 Mat matchMask(features.size(), features.size(), CV_8U, Scalar(0));
 for (int i = 0; i < num_images - 1; ++i)
 {
     matchMask.at<char>(i, i + 1) = 1;
 }
 matcher(features, pairwise_matches, matchMask);
 matcher.collectGarbage(); 

Now the processing time is reduced to 1 minute 54 seconds; we saved another 14 seconds. Please note that with this change the images must be imported in sequential order.

Optimization of parameters


Several options control the resolution at which images are matched and blended. Thanks to the improved matching algorithm, we are more robust to matching errors and can reduce the values of some of these parameters to significantly cut the amount of work.
We changed these default settings:

 double work_megapix = 0.6;
 double seam_megapix = 0.1;
 float conf_thresh = 1.f;
 string warp_type = "spherical";
 int expos_comp_type = ExposureCompensator::GAIN_BLOCKS;
 string seam_find_type = "gc_color"; 

to these:

 double work_megapix = 0.08;
 double seam_megapix = 0.08;
 float conf_thresh = 0.5f;
 string warp_type = "cylindrical";
 int expos_comp_type = ExposureCompensator::GAIN;
 string seam_find_type = "dp_colorgrad"; 

After changing these parameters, processing the workload took 22 seconds, which is 1 minute 32 seconds faster. This acceleration is mainly due to the lower values of the work_megapix and seam_megapix parameters: matching and seam finding now operate on much smaller images. This reduces the number of features that can be found and matched, but thanks to the improved matching algorithm we can afford to sacrifice a little accuracy.

Removing unnecessary work


In the compositing loop there are two blocks of code that do not need to be repeated, since all our input images have the same size. They relate to resizing images that do not match the compositing scale and to initializing the blender. These code blocks can be moved to just before the compositing loop:

 if (!is_compose_scale_set) { … } if (blender.empty()) { … } 

Note that there is one line in the warping code where you need to replace full_img[img_idx] with full_img[0]. This change speeds up processing by 2 seconds: it now takes 20 seconds.

Moving feature finding


We made one more change, but whether it applies depends on how your stitching application works. In our case, the application captures the images and then stitches them immediately after the last image is taken. If your application works the same way, you can move the feature-finding part of the pipeline into the image-capture stage. To do this, run the feature-finding algorithm immediately after each image is captured and save the data until it is needed. In our experiments, this sped up stitching by about 20%, reducing the stitching time in this case to 16 seconds.

Logging


Logging is disabled by default; note that enabling it slightly reduces performance. We found that enabling logging increases the stitching time by about 10%. Therefore, it is important to disable logging in the final version of the application.

Conclusion


Given the popularity of panoramic-shooting applications on mobile platforms, it is important to have an open-source solution that can quickly stitch images into a panorama. By speeding up image stitching, we made the application more convenient for end users. Thanks to all these changes, we managed to reduce the total stitching time from 2 minutes 16 seconds to 16 seconds, that is, to speed up processing by about 8.5 times. Details are given in this table.

Change                                Time reduction
Multithreading with OpenMP *          0:08
Pairwise matching algorithm           0:14
Optimization of initial parameters    1:32
Removing unnecessary work             0:02
Moving feature finding                0:04

The software and loads used in the performance tests could be optimized to achieve high performance only on Intel microprocessors. Performance tests, such as SYSmark * and MobileMark *, are conducted on specific computer systems, components, programs, operations, and functions. Any changes to any of these elements may result in a change in results. When selecting products to be purchased, refer to other information and performance tests, including performance tests of a particular product in combination with other products.
Configurations: Intel Atom Z3770 quad-core system-on-chip with 2 GB of RAM running Windows 8.1. For more information, see the Intel site.

Source: https://habr.com/ru/post/256533/

