OpenCV on STM32F7-Discovery

I am one of the developers of the Embox operating system, and in this article I will describe how I managed to run OpenCV on the STM32F746G-Discovery board.

If you type something like "OpenCV on STM32 board" into a search engine, you will find quite a few people interested in using this library on STM32 boards or other microcontrollers.
There are several videos that, judging by the title, should demonstrate exactly that, but in all the videos I saw, the STM32 board only captured the image from the camera and displayed the result on the screen, while the image processing itself was done either on an ordinary computer or on a more powerful board (for example, a Raspberry Pi).

Why is it difficult?

The popularity of these search queries is explained by the fact that OpenCV is the most popular computer vision library, so many developers are familiar with it, and being able to run code written for the desktop on a microcontroller greatly simplifies development. So why are there still no popular ready-made recipes for this?

Using OpenCV on small boards is difficult for two reasons:

- Even with a minimal set of modules, the library simply won't fit into the flash memory of the STM32F7Discovery (even without an OS) because the code is very large (several megabytes of instructions).
- The library is written in C++, which means:
  - it needs C++ runtime support (exceptions, etc.);
  - the LibC/POSIX support usually found in an OS for embedded systems is not enough: you also need the C++ standard library, including the STL (vector, etc.).

Porting to Embox

As usual, before porting a program to an operating system, it is a good idea to try building it the way its developers intended. In our case this is no problem: the sources are on GitHub, and the library builds under GNU/Linux with the usual cmake.

The good news is that OpenCV can be built as a static library out of the box, which makes porting easier. Let's build the library with the standard config and see how much space it takes. Each module is built as a separate library.

  > size lib/*so --totals
      text    data     bss      dec     hex filename
   1945822   15431     960  1962213  1df0e5 lib/libopencv_calib3d.so
  17081885  170312   25640 17277837 107a38d lib/libopencv_core.so
  10928229  137640   20192 11086061  a928ed lib/libopencv_dnn.so
    842311   25680    1968   869959   d4647 lib/libopencv_features2d.so
    423660    8552     184   432396   6990c lib/libopencv_flann.so
   8034733   54872    1416  8091021  7b758d lib/libopencv_gapi.so
     90741    3452     304    94497   17121 lib/libopencv_highgui.so
   6338414   53152     968  6392534  618ad6 lib/libopencv_imgcodecs.so
  21323564  155912  652056 22131532 151b34c lib/libopencv_imgproc.so
    724323   12176     376   736875   b3e6b lib/libopencv_ml.so
    429036    6864     464   436364   6a88c lib/libopencv_objdetect.so
   6866973   50176    1064  6918213  699045 lib/libopencv_photo.so
    698531   13640     160   712331   ade8b lib/libopencv_stitching.so
    466295    6688     168   473151   7383f lib/libopencv_video.so
    315858    6972   11576   334406   51a46 lib/libopencv_videoio.so
  76510375  721519  717496 77949390 4a569ce (TOTALS)

As the last line shows, .bss and .data do not take much space, but the code is more than 70 MiB. Of course, when the library is linked statically into a particular application, the code will be smaller.

Let's try to throw out as many modules as possible while still being able to build a minimal example (one that, say, simply prints the OpenCV version). We look at the output of cmake .. -LA and switch off every option that can be switched off.

  -DBUILD_opencv_java_bindings_generator=OFF \
  -DBUILD_opencv_stitching=OFF \
  -DWITH_PROTOBUF=OFF \
  -DWITH_PTHREADS_PF=OFF \
  -DWITH_QUIRC=OFF \
  -DWITH_TIFF=OFF \
  -DWITH_V4L=OFF \
  -DWITH_VTK=OFF \
  -DWITH_WEBP=OFF \
  <...>

  > size lib/libopencv_core.a --totals
     text    data     bss     dec     hex filename
  3317069   36425   17987 3371481  3371d9 (TOTALS)

On the one hand, this is only one module of the library; on the other hand, it was built without compiler optimization for code size (-Os). About 3 MiB of code is still quite a lot, but it gives hope for success.

Run in emulator

Debugging is much easier on an emulator, so first let's make sure the library works under QEMU. As the emulated platform I chose Integrator/CP: first, it is also ARM, and second, Embox supports graphics output on this platform.

Embox has a mechanism for building external libraries. Using it, we add OpenCV as a module (passing all the same options as for the "minimal" build as static libraries), and then add the simplest possible application, which looks like this:

  version.cpp:

  #include <stdio.h>
  #include <opencv2/core/utility.hpp>

  int main() {
      /* Print the configuration OpenCV was built with */
      printf("OpenCV: %s", cv::getBuildInformation().c_str());
      return 0;
  }

We build the system, launch it, and get the expected output.

  root@embox:/#opencv_version
  OpenCV:
  General configuration for OpenCV 4.0.1 =====================================
    Version control:               bd6927bdf-dirty

    Platform:
      Timestamp:                   2019-06-21T10:02:18Z
      Host:                        Linux 5.1.7-arch1-1-ARCH x86_64
      Target:                      Generic arm-unknown-none
      CMake:                       3.14.5
      CMake generator:             Unix Makefiles
      CMake build tool:            /usr/bin/make
      Configuration:               Debug

    CPU/HW features:
      Baseline:
        requested:                 DETECT
        disabled:                  VFPV3 NEON

    C/C++:
      Built as dynamic libs?:      NO
  <...>

The next step is to run an actual example, preferably one of the standard ones that the developers themselves provide on their site. I chose the Canny edge detector.

The example had to be slightly rewritten so that it draws the resulting image directly into the framebuffer. This was necessary because imshow() can draw images through the Qt, GTK, and Windows interfaces, none of which, of course, will be in the config for STM32. In fact, Qt can also be run on the STM32F7Discovery, but that is a topic for another article :)

After briefly working out the format in which the edge detector stores its result, we get an image.

Original picture

Result
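The framebuffer output step mentioned above can be sketched roughly as follows. Canny produces a single-channel 8-bit edge map, while the screen here uses 32-bit pixels, so every byte has to be expanded into a full pixel. This is not the code of the Embox example: the names gray_to_xrgb and blit_gray, the 0x00RRGGBB pixel layout, and the stride parameters are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Expand one 8-bit gray value into a 32-bit pixel.
// The 0x00RRGGBB layout is an assumption for this sketch.
constexpr std::uint32_t gray_to_xrgb(std::uint8_t v) {
    return (static_cast<std::uint32_t>(v) << 16) |
           (static_cast<std::uint32_t>(v) << 8) |
           static_cast<std::uint32_t>(v);
}

// Copy a width x height single-channel image into a 32-bit framebuffer.
// src_stride and fb_stride are row lengths in elements, not bytes.
inline void blit_gray(const std::uint8_t *src, std::size_t src_stride,
                      std::uint32_t *fb, std::size_t fb_stride,
                      std::size_t width, std::size_t height) {
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            fb[y * fb_stride + x] = gray_to_xrgb(src[y * src_stride + x]);
        }
    }
}
```

With an OpenCV matrix, src would be the data pointer of the edge map and src_stride its row step; fb would point to wherever the platform maps the framebuffer.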

Run on STM32F7Discovery

The 32F746GDISCOVERY has several kinds of memory that we can use in one way or another:

- 320 KiB RAM
- 1 MiB flash for the image
- 8 MiB SDRAM
- 16 MiB QSPI flash
- microSD card slot

An SD card can be used to store images, but for running a minimal example this is not very useful.
The display has a resolution of 480x272, so with a depth of 32 bits the framebuffer takes 522,240 bytes. This is more than the size of the RAM, so the framebuffer and the heap (which OpenCV also needs for image data and auxiliary structures) are placed in SDRAM, and everything else (memory for stacks and other system needs) goes to RAM.

If we take the minimal config for STM32F7Discovery (throw out networking and all the commands, make the stacks as small as possible, etc.) and add OpenCV with the examples to it, the memory requirements are as follows:
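The arithmetic behind this placement can be checked with a few lines of code. This is only a sketch of the calculation from the text; the helper name framebuffer_bytes and the constant names are made up for illustration.

```cpp
#include <cstddef>

// Bytes needed for a framebuffer of the given resolution and color depth.
constexpr std::size_t framebuffer_bytes(std::size_t width, std::size_t height,
                                        std::size_t bits_per_pixel) {
    return width * height * (bits_per_pixel / 8);
}

constexpr std::size_t kInternalRam = 320 * 1024;       // 320 KiB of RAM
constexpr std::size_t kSdram = 8 * 1024 * 1024;        // 8 MiB of SDRAM
constexpr std::size_t kFramebuffer = framebuffer_bytes(480, 272, 32);

static_assert(kFramebuffer == 522240, "matches the figure in the text");
static_assert(kFramebuffer > kInternalRam, "too large for the internal RAM");
static_assert(kFramebuffer < kSdram, "fits in SDRAM with room for the heap");
```

The checks run at compile time, so the file only has to compile for the numbers to be confirmed.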

     text    data     bss     dec     hex filename
  2876890  459208  312736 3648834  37ad42 build/base/bin/embox

For those who are not very familiar with what goes into which section: .text and .rodata hold instructions and constants (roughly speaking, read-only data), .data holds mutable data, and .bss holds zero-initialized variables, which nevertheless need space (this section goes to RAM).

The good news is that .data and .bss should fit; the trouble is .text, since there is only 1 MiB of flash for the image. We could take the picture used by the example out of .text and read it at startup from, say, the SD card, but fruits.png weighs only about 330 KiB, so this would not solve the problem: most of .text is OpenCV code.

By and large, only one option remains: placing part of the code in QSPI flash (it has a special memory-mapped mode in which it is mapped onto the system bus, so the processor can access it directly). This raises two problems: first, QSPI flash is not available immediately after the device resets (memory-mapped mode has to be initialized separately), and second, this memory cannot be flashed with the usual bootloader.

In the end, I decided to link all the code into QSPI and flash it with a self-written bootloader that receives the required binary over TFTP.
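The size argument can be written down as a small compile-time check. The numbers are the ones quoted in the text (the picture size is approximate); the constant names and the fits helper are invented for this sketch.

```cpp
#include <cstddef>

constexpr std::size_t kKiB = 1024;
constexpr std::size_t kMiB = 1024 * kKiB;

constexpr std::size_t kFlashSize = 1 * kMiB;    // internal flash for the image
constexpr std::size_t kQspiSize = 16 * kMiB;    // QSPI flash
constexpr std::size_t kText = 2876890;          // "text" column of the size output
constexpr std::size_t kData = 459208;           // "data" column (initial values are stored in ROM too)
constexpr std::size_t kPicture = 330 * kKiB;    // approximate size of fruits.png

// True when `bytes` fit into a memory of `capacity` bytes.
constexpr bool fits(std::size_t bytes, std::size_t capacity) {
    return bytes <= capacity;
}

static_assert(!fits(kText, kFlashSize), "the image does not fit in 1 MiB");
static_assert(!fits(kText - kPicture, kFlashSize),
              "dropping the picture is still not enough");
static_assert(fits(kText + kData, kQspiSize), "QSPI has plenty of room");
```

Even without the picture, about 2.4 MiB of code and constants remain, which is why the internal flash alone is not enough.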

Result

The idea of porting this library to Embox appeared about a year ago, but it kept being postponed for various reasons. One of them was support for libstdc++ and the Standard Template Library. C++ support in Embox is beyond the scope of this article, so here I will only say that we managed to get it to the level this library needs :)

In the end, these problems were overcome (at least enough for the OpenCV example to work), and the example ran. It takes the board 40 seconds to find the edges with the Canny filter. This is, of course, too long (there are ideas on how to optimize it; if they work out, that could be a separate article).

Nevertheless, the intermediate goal was to build a prototype showing that running OpenCV on STM32 is possible in principle, and that goal has been achieved, hooray!

tl;dr: step-by-step instructions

0: Download Embox sources, for example:

  git clone https://github.com/embox/embox && cd ./embox

1: Let's start by building the bootloader that will write to the QSPI flash.

  make confload-arm/stm32f7cube

Now you need to configure the network, because we will upload the image over TFTP. To set the IP addresses of the board and the host, edit the conf/rootfs/network file.

Configuration example:

  iface eth0 inet static
      address 192.168.2.2
      netmask 255.255.255.0
      gateway 192.168.2.1
      hwaddress aa:bb:cc:dd:ee:02

gateway is the address of the host from which the image will be downloaded, and address is the address of the board.

After that, build the bootloader:

  make

2: Load the bootloader onto the board in the usual way (sorry for the pun). There is nothing specific here: do it just as you would for any other application for STM32F7Discovery. If you do not know how this is done, you can read about it here.

3: Build the image with the config for OpenCV.

  make confload-platform/opencv/stm32f7discovery
  make

4: Extract the ELF sections to be written to QSPI into qspi.bin.

  arm-none-eabi-objcopy -O binary build/base/bin/embox build/base/bin/qspi.bin \
      --only-section=.text --only-section=.rodata \
      --only-section='.ARM.ex*' \
      --only-section=.data

There is a script in the conf directory that does this, so you can simply run it.

  ./conf/qspi_objcopy.sh # output: build/base/bin/qspi.bin

5: Using TFTP, download qspi.bin to the QSPI flash. On the host, copy qspi.bin to the root directory of the TFTP server (usually /srv/tftp/ or /var/lib/tftpboot/; server packages exist in most popular distributions and are usually called tftpd or tftp-hpa; sometimes you need to run systemctl start tftpd.service to start the server).

  # for tftpd
  sudo cp build/base/bin/qspi.bin /srv/tftp
  # for tftp-hpa
  sudo cp build/base/bin/qspi.bin /var/lib/tftpboot

On Embox (i.e. in the bootloader), you need to execute this command (we assume that the server has the address 192.168.2.1):

  embox> qspi_loader qspi.bin 192.168.2.1

6: Use the goto command to "jump" into QSPI memory. The exact address depends on how the image was linked; you can see it with the command mem 0x90000000 (the start address is stored in the second 32-bit word of the image). You also need to set the stack with the -s flag; the stack address is stored in the first word, at 0x90000000. For example:

  embox> mem 0x90000000
  0x90000000:  0x20023200  0x9000c27f  0x9000c275  0x9000c275
                   ↑           ↑
                 stack     start address

  embox> goto -i 0x9000c27f -s 0x20023200
  <...>
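The two words used above are the first entries of the Cortex-M vector table: the initial stack pointer and the reset handler address (its lowest bit is set to mark Thumb code, which is why the address is odd). A minimal sketch of reading them from the beginning of a little-endian image such as qspi.bin might look like this; BootVectors, le32, read_boot_vectors, and kImageHead are names invented for the illustration, and the byte values are simply the ones from the dump above.

```cpp
#include <cstdint>

// First two entries of a Cortex-M vector table.
struct BootVectors {
    std::uint32_t initial_sp;  // word 0: initial stack pointer
    std::uint32_t reset;       // word 1: reset handler (bit 0 set for Thumb)
};

// Assemble a little-endian 32-bit word from four bytes.
constexpr std::uint32_t le32(const std::uint8_t *p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Read the stack pointer and the entry point from the start of an image.
constexpr BootVectors read_boot_vectors(const std::uint8_t *image) {
    return BootVectors{le32(image), le32(image + 4)};
}

// First 8 bytes of the image, taken from the `mem 0x90000000` dump in the text.
constexpr std::uint8_t kImageHead[8] = {0x00, 0x32, 0x02, 0x20,
                                        0x7f, 0xc2, 0x00, 0x90};

static_assert(read_boot_vectors(kImageHead).initial_sp == 0x20023200u,
              "value for goto -s");
static_assert(read_boot_vectors(kImageHead).reset == 0x9000c27fu,
              "address to jump to");
```

The same two values can be read from qspi.bin on the host before flashing, which is a quick sanity check that the image was linked for QSPI (the entry point should start with 0x90).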

7: Launch

  embox> edges 20

and enjoy the 40-second edge detection :)

If something goes wrong, open an issue in our repository, write to the embox-devel@googlegroups.com mailing list, or leave a comment here.

Source: https://habr.com/ru/post/457724/
