How we ported OpenCV to Windows RT

As its functionality grows, OpenCV keeps moving to new platforms. A new platform means a different compiler, different system APIs, new testing challenges, and plenty of new adventures. In this article we share our experience of porting a large, fairly mature project to a new hardware and software platform. By Windows RT we mean both the new Windows Runtime API for application development and the Windows RT operating system for ARM-based processors. The second was the main goal, because that is what the customer needed.

First approach

The library's CMake-based build infrastructure has been moved to new platforms many times, but this time the blitzkrieg failed. As it turned out, CMake (version 2.8.11 at the time) could not generate Visual Studio projects for Windows RT. A workaround was quickly found online: instead of generating Visual Studio projects, use one of the makefile generators. The build can then be run with make, nmake, or Ninja; you only need to set up the environment to build with the ARM compiler.

Getting the existing code to compile brought many small details to the surface. In particular, building an ordinary Win32 application for the ARM architecture is blocked at the level of header files and Visual Studio projects. To compile CMake's check files and our tests, we had to resort to a hack: adding the _ARM_WINAPI_PARTITION_DESKTOP_SDK_AVAILABLE macro to the compiler command line, which suppresses the platform check. The same applies to all the tests and samples developed together with the library.
To handle all the macros and compiler switches in one place, a pseudo-toolchain file was added to the library. It is “pseudo” because it does not set the compiler path, OS, platform, and so on; it only adds the necessary flags and macros. In addition, we had to install the Windows Platform SDK, which provides additional static C and C++ runtime libraries for ARM.

To get the highgui module to build, we had to sacrifice some of the video and window code: the system libraries lack the functions needed to link it. Windows RT has no support for the old windowing API, GDI, Video for Windows, or some parts of COM. We also had to give up FFmpeg support: it is a good, portable library, but integrating it requires additional effort. As a result, we were left without windows and without video I/O. Windows are no great loss, since they are needed only for prototyping and debugging, not for the final application. Video I/O, on the other hand, is essential and was worth working on.

A couple of tightened screws later, the tests built and ran almost successfully on the ARM tablet. But compiling and running the tests of the main library is not even half the battle. Besides OpenCV itself, we wanted to bring over a large number of optimizations written in assembly and intrinsics for Android and built with GCC, add Intel TBB support on the new platform, restore video I/O, and much more.

Video I/O

Active development for Windows RT began with video I/O. Starting with Windows Vista, Windows operating systems gained a new multimedia API, Microsoft Media Foundation. As it quickly turned out, this is the only video API available on ARM tablets. A new implementation of cv::VideoCapture promised great benefits for both desktops and tablets. At that point OpenCV already supported several video backends on Windows, including the outdated DirectShow. The DirectShow-based implementation of cv::VideoCapture is built on the VideoInput library. We hoped that the VideoInput project was alive and perhaps already supported Media Foundation in some form. But the autopsy showed that the project had been abandoned long ago, and even our copy of the library had diverged far from the original. The big breakthrough was a find on CodeProject. Our compatriot from the Far East, Evgeny Peregud, had implemented his own VideoInput-like library to integrate Media Foundation into his project with OpenCV (!). Many thanks to him for that work.

We decided to use the published code as the basis for the new implementation of cv::VideoCapture. While adapting it, we had to get rid of the inline assembler and replace some calls that could not be linked on ARM. In addition, by slightly changing how the pipeline is built, we managed to add reading and writing of video files. On the desktop the new cv::VideoCapture worked fine with the camera, but on the tablet an unpleasant surprise awaited us. In an ARM application compiled for the Win32 subsystem, the Media Foundation pipeline initialized normally, but no video could be obtained: the pipeline hung and not a single frame arrived from the camera. An application built to run in an app container showed a different anomaly: enumerating cameras with the MFEnumDeviceSources function returned a pile of garbage and NULL instead of pointers to the camera interfaces.

As it soon turned out, in the first case the culprit is the system broker, which controls access to the camera and other devices. Ours was a Win32 application without a manifest, so it could not be granted access to the camera. In the second case the problem lies in the implementation of MFEnumDeviceSources, or more precisely in its absence. The function is on the list of calls prohibited for container applications and, unlike many other banned calls, is not implemented at all. To avoid link errors, a stub is called instead, which returns NULL in place of the interface pointer and garbage in the description. As a result, we managed to support Media Foundation on desktop systems, but the question remained open for ARM.

Optimization

The next step was to build and integrate the TBB library. By that time OpenCV already used it actively to parallelize algorithms, and we wanted to carry that over to Windows RT on ARM as well. But a quick, painless adaptation did not work out. Besides standard system calls, TBB uses its own set of low-level synchronization primitives for each platform. In all previous cases where we needed TBB, the compiler was GCC, and the library code took the branch with GCC-specific intrinsics. This time a new branch was needed. The TBB development team helped us port TBB to Windows RT on ARM. Special thanks to Vladimir Polin (vpolin). Stable Windows RT support appeared in TBB 4.1 Update 4.

With the library code itself in order, we turned to our own set of in-house developments and optimizations. Most of them had been written for Android and GCC and had never been built for Windows.

Let's start with the optimizations written with intrinsics and inline assembler. With intrinsics things went very well. All the intrinsics for NEON instructions, as well as the names of the data types, match those in GCC and require practically no code changes. The only exception is register variables: with the Visual Studio compiler you cannot explicitly specify a register name. Binding variables to specific registers had been necessary for old versions of the Android NDK, whose compiler sometimes got confused and used the same registers twice, for C++ code and for intrinsics, corrupting local variables.
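As a minimal sketch of why such code ports with almost no changes, here is a small function that uses the same NEON intrinsics and type names under GCC and under the Visual Studio ARM compiler. The function name add_arrays and the SKETCH_HAVE_NEON macro are made up for this illustration and are not part of OpenCV; a scalar fallback is included only so that the sketch also builds on machines without NEON.

```cpp
#include <cassert>
#include <cstddef>

// Detect NEON: __ARM_NEON / __ARM_NEON__ are set by GCC-style compilers,
// _M_ARM by the Visual Studio compiler when targeting 32-bit ARM.
#if defined(__ARM_NEON) || defined(__ARM_NEON__) || defined(_M_ARM)
#  include <arm_neon.h>
#  define SKETCH_HAVE_NEON 1   // hypothetical macro, used only in this sketch
#else
#  define SKETCH_HAVE_NEON 0
#endif

// Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
void add_arrays(const float* a, const float* b, float* dst, std::size_t n)
{
    assert(a && b && dst);
    std::size_t i = 0;
#if SKETCH_HAVE_NEON
    // Process four floats per iteration; the same type and intrinsic
    // names are accepted by both compilers.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    // Scalar tail (and the whole loop when NEON is not available).
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Note that nothing here names a specific register; the compiler allocates registers for va and vb itself, which is the only mode the Visual Studio compiler allows.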

With the assembler, however, things turned out very badly. New versions of Microsoft's compiler do not support inline assembler for x86_64 or ARM, so the idea of quickly porting the assembly code by fixing syntax differences had to be abandoned. We rewrote the most significant parts using compiler intrinsics, but that was not without surprises either. The Visual Studio compiler has no intrinsics for VFPv3 instructions (the hardware floating-point unit on ARM), and the only ordinary way to use them explicitly is to implement whole functions in assembler and place them in a separate asm file. Separate compilation of assembler did not suit us at all, because it would have required significant changes to the build infrastructure. It also matters that the compiler refuses to inline assembler functions or optimize calls to them in any way. The problem was solved only with the __emit keyword, which lets you insert an arbitrary piece of machine code into any block of C or C++ code. Knowing the calling conventions and instruction opcodes, you can build any missing intrinsic yourself.

Cleaning up the Win32 API

The final step in getting OpenCV ready was passing the certification test for container applications, which is a prerequisite for every application in the Marketplace.
First we had to add /APPCONTAINER to the linker flags. Then came a sweep of the system API calls in use. All ANSI versions of Win32 calls fell out of favor, as did some functions that have an Ex version, such as InitializeCriticalSection.
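A minimal sketch of this kind of replacement, assuming a small lock wrapper: on Windows it initializes the critical section with InitializeCriticalSectionEx (spin count 0, no flags) instead of InitializeCriticalSection. The class name SketchLock and the helper guarded_increment are hypothetical and do not come from the OpenCV sources; the std::mutex branch exists only so the sketch also compiles on other platforms.

```cpp
#include <cassert>
#ifdef _WIN32
#  include <windows.h>
#else
#  include <mutex>
#endif

// Hypothetical wrapper used only for this illustration.
class SketchLock {
public:
    SketchLock()
    {
#ifdef _WIN32
        // The Ex variant takes a spin count and flags in addition
        // to the critical section pointer.
        BOOL ok = InitializeCriticalSectionEx(&cs_, 0, 0);
        assert(ok);
        (void)ok;
#endif
    }
    ~SketchLock()
    {
#ifdef _WIN32
        DeleteCriticalSection(&cs_);
#endif
    }
    SketchLock(const SketchLock&) = delete;
    SketchLock& operator=(const SketchLock&) = delete;

    void lock()
    {
#ifdef _WIN32
        EnterCriticalSection(&cs_);
#else
        m_.lock();
#endif
    }
    void unlock()
    {
#ifdef _WIN32
        LeaveCriticalSection(&cs_);
#else
        m_.unlock();
#endif
    }

private:
#ifdef _WIN32
    CRITICAL_SECTION cs_;
#else
    std::mutex m_;   // fallback for non-Windows builds of this sketch
#endif
};

// Increment a counter under the lock and return the new value.
int guarded_increment(SketchLock& lock, int& counter)
{
    lock.lock();
    int value = ++counter;
    lock.unlock();
    return value;
}
```

Only the initialization call changes; entering, leaving, and deleting the critical section use the same functions as before.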

Special attention had to go to the Win32 API function GetTempFileName, unremarkable at first glance. In OpenCV it is used in the tempfile call. GetTempFileName has no suitable counterpart that could be used in a container; the only option we found was to call into the Windows Runtime library. The code for the function is taken from here, and here is how to link with the Windows Runtime when building a normal application.

Besides GetTempFileName, the functions for working with thread-local storage (TLS) also had no replacement. In this case, though, an alternative was found rather quickly, and where we did not expect it at all: in C++11. The new C++ standard includes facilities for working with TLS data, which made it possible to move from a system-dependent API to language constructs.
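A minimal sketch of the language construct in question, assuming a compiler that implements the C++11 thread_local keyword. The names tls_counter, bump_counter, and bump_in_new_thread are hypothetical and serve only to show that each thread gets its own copy of the variable, with no system TLS calls involved.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread counter: every thread sees its own copy,
// initialized to zero the first time that thread uses it.
thread_local int tls_counter = 0;

// Increment the calling thread's copy and return its new value.
int bump_counter()
{
    return ++tls_counter;
}

// Run `times` increments on a fresh thread and return the last value
// that thread observed. The caller's copy of tls_counter is untouched.
int bump_in_new_thread(int times)
{
    assert(times > 0);
    int result = 0;
    std::thread worker([&result, times] {
        for (int i = 0; i < times; ++i)
            result = bump_counter();
    });
    worker.join();
    return result;
}
```

The new thread starts counting from zero regardless of how many times the calling thread has already bumped its own copy, which is the behavior a slot-based system TLS API would otherwise have to provide.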

Replacing the Win32 API calls for working with TLS with the new language constructs was the last step needed to pass certification testing. Once we had added a splash screen, icons for every occasion, and descriptions to the test application, the test turned green.

Continuous integration

In parallel with porting the code, we worked on the project's test infrastructure. Unfortunately, there was no acceptable way to run tests remotely on an ARM tablet, so all automated testing was done on the desktop, and the tablet was tested manually.

The first rake to step on appeared while integrating the appcert.exe test. This is the standard certification test for Marketplace applications, shipped as an executable file and several libraries. The test has a command-line interface and runs fine when started manually from the console. But as soon as we added it to the list of Buildbot steps, the surprises began. First, the test spawns additional consoles and writes all its output to them, so Buildbot receives no output at all. Second, the test hangs in some places and does not finish within the 20 minutes allotted for a step, even though in manual mode it takes 2-3 minutes at most.

The solution to the paradox was found on the MSDN forum, in a discussion about integrating similar steps into Jenkins. The certification test needs an active user session to complete. Both Jenkins and Buildbot run their nodes on test machines as a system service, which has no active session. To get rid of the hang, we had to stop running Buildslave as a service, set up automatic login on the Windows 8 test machine, and add Buildslave to the standard startup items. In the new mode the hang was gone, but the problem with multiple consoles remained.

Figure: sample Marketplace applications using OpenCV on a Windows 8 tablet.

Conclusion

The overall result of the work: OpenCV is ready for use in production applications on both ARM and x86. The entire algorithmic part works, and there is support for parallelization with TBB and for reading and writing images in most popular formats. A couple of examples of integration into a XAML application have been implemented. There are problems with video I/O, but as practice has shown, this functionality is of little use in the code of a final application: in any case you will have to plug into an existing camera or video decoding pipeline. Experience has also shown that the platform is not very friendly to automated testing, so do not put that question off. Infrastructure issues can take significantly longer than you expect.

Now, apparently because the platform is not very popular, the customer's interest in Windows Runtime support has diminished considerably, and development has smoothly passed to enthusiasts from the community. If you are interested in developing OpenCV on Windows RT, join in (OpenCV How to contribute, http://itseez.com/jobs/)!

Source: https://habr.com/ru/post/210832/
