
Intel® RealSense™. Working with raw data streams


Developers interested in building touch-free, controller-less interaction into their applications need only familiarize themselves with the Intel RealSense SDK, its samples, and the related resources on the Internet. Immerse yourself in this solution and you will find a wide range of functions for building entirely new kinds of interfaces on top of these technologies.
In this article we will talk about the various raw data streams, how to access them, and how to use them. Direct access to the raw data not only lets us work with the metadata, it is also the fastest way to determine what the user is doing in the real world.

In this article we used a Bell Cliff 3D camera as the Intel RealSense camera; it produces several data streams, from a traditional RGB color image to depth data and imagery from an infrared camera. Each of these streams behaves in its own way, and we will discuss them in more detail below. After reading this article, you will know which streams are available and when to work with each of them.
To follow the material it is useful (but not necessary) to know C++, so you can follow the code samples, and to have a general idea of the Intel RealSense technology (or its earlier incarnation, the Intel Perceptual Computing SDK).

Why this is important


If you only need a simple gesture or face recognition system, you will find everything you need in the Intel RealSense SDK algorithm modules and need not bother with raw data streams. The problem arises when you need functionality that the SDK's algorithm modules do not provide; without an alternative approach, your application simply will not work.
So the first question is: "What does your application need, and can those requirements be met with the Intel RealSense SDK algorithm modules?" If all you need is an on-screen pointer that tracks the movement of your hand, the hand and finger tracking module may be enough. The samples in the SDK are a quick way to determine whether the available functionality matches your needs. If it does not, you can begin planning around the raw data stream.

For example, 2D gesture detection is currently supported. But what if you need to detect gestures on a full three-dimensional model of the hands and extract additional information about how the user's hands move? What if you need to record a high-speed stream of gestures and save them as a sequence rather than a single snapshot? You would need to bypass the finger and hand recognition system, which carries its own computational load, and implement a technique for encoding the motion telemetry dynamically in real time. In short, you may run into missing functionality, and a particular software problem may call for a more direct solution.
Another example: suppose you are creating an application that detects and recognizes sign language and converts it into text for posting to a newsgroup. The current Intel RealSense SDK supports hand and finger tracking (but only single gestures) and has no dedicated support for sign language recognition. The only solution in such cases is to develop your own gesture recognition system, one that quickly converts gestures into a sequence of finger and hand positions and then uses template matching to recognize the signs and reconstruct the text. The only way to achieve this today is to access the raw data stream, record it at high speed, and convert the values on the fly.

The ability to write code to fill the gap between existing and desired functionality is extremely important, and it is provided in the Intel RealSense SDK.
This technology is still relatively new, and developers are still exploring its capabilities. Access to raw data streams expands possible actions, and from such improvements new solutions are born.

Streams


The best way to learn about the data streams is to see them for yourself. To do so, run the raw streams sample from the bin folder of your installed Intel RealSense SDK:

\Intel\RSSDK\bin\win32\raw_streams.exe

The sample ships with full source code and a project file, which will be very useful to us. If you run the executable and press the START button, you will get an RGB color stream, as shown in Fig. 1.


Figure 1. Typical RGB color stream

Wave your hand at the camera, press the STOP button, open the Depth menu, and select 640x480x60. Then press START again.


Figure 2. Filtered depth data stream from an Intel RealSense 3D camera.

As you can see in Fig. 2, this image is quite different from the RGB color stream. It is a monochrome image in which each pixel represents distance from the camera: light pixels are closer, dark pixels are farther away, and black marks pixels that are either background or could not be measured reliably.
Move around in front of the camera and you will see how quickly the camera can support decisions about what the user is doing. For example, it is easy to pick out the hands in the scene thanks to the thick black outline separating them from the body and head, which are farther from the camera.
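As a concrete illustration, a rendering like the one in Fig. 2 can be produced by mapping each 16-bit depth value to a brightness. The following is only a minimal sketch: DepthToGray is a hypothetical helper, and the 4000-unit cut-off is an assumption you should tune to your camera's depth range.

unsigned char DepthToGray(short d)
{
    if (d <= 0) return 0;           // invalid or missing pixel: black
    if (d >= 4000) return 0;        // beyond the chosen range: treat as background
    return (unsigned char)(255 - (d * 255 / 4000));   // nearer = brighter
}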


Figure 3. Night Vision. The Intel RealSense 3D Camera delivers a stream of raw video captured in the infrared spectrum

This last type of stream may be unfamiliar to developers who worked with the earlier Intel Perceptual Computing SDK, but as Fig. 3 shows, the IR menu gives you the image the camera captures in the infrared range. This is a raw data stream, and its read rate far exceeds the refresh rate of a typical monitor.

You can initialize any or all of these streams for simultaneous reading, as your application requires, and for each stream you can select the desired resolution and refresh rate. Note that the final frame rate of the incoming streams depends on the available bandwidth. For example, if you try to initialize the RGB stream at 60 frames per second, the depth stream at 120 frames per second, and the IR stream at 120 frames per second, and request all of them with a single synchronization, only the lowest rate (60 frames per second) will be delivered, and only if the system can keep up with the load.
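In code, requesting several streams at once looks roughly like the sketch below, using the PXCSenseManager calls discussed later in this article; the exact resolutions and rates are assumptions, so check them against what your camera reports.

PXCSenseManager *sm = g_session->CreateSenseManager();
sm->EnableStream(PXCCapture::STREAM_TYPE_COLOR, 640, 480, 60);
sm->EnableStream(PXCCapture::STREAM_TYPE_DEPTH, 640, 480, 120);
sm->EnableStream(PXCCapture::STREAM_TYPE_IR,    640, 480, 120);
if (sm->Init() >= PXC_STATUS_NO_ERROR)
{
    // With synced capture (AcquireFrame(true)), frames arrive no faster
    // than the slowest enabled stream, as described above.
}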

The raw streams sample is fine for getting started, but it does not let you combine streams, so use it only to explore the stream types, resolutions, and refresh rates available on your camera. Remember that the Intel RealSense SDK is designed to work with different models of 3D camera, so a resolution available in the sample on one camera may be absent on another. Do not hard-code resolutions in your application.

Creating streams and accessing data


You can view the full source code of the raw streams sample by opening the following project in Visual Studio*:

\Intel\RSSDK\sample\raw_streams\raw_streams_vs20XX.sln

The sample contains a simple user interface and handles a full set of parameters, so the code is not particularly easy to read. It makes sense to strip the extra code and keep only the lines needed to create, process, and shut down the stream coming from the camera. The code below is a "cleaned-up" version of that project, but it still contains every component required by even the simplest Intel RealSense application.

The first two important functions initialize the Intel RealSense 3D camera and release it when the program ends. They appear in the code below; the details of the functions they call follow later.
int RSInit ( void )
{
    InitCommonControls();
    g_session = PXCSession::CreateInstance();
    if (!g_session) return 1;
    g_bConnected = false;
    g_RSThread = CreateThread(0, 0, ThreadProc, g_pGlob->hWnd, 0, 0);
    Sleep(6000);   // give the camera thread time to connect
    if (g_bConnected == false)
        return 1;
    else
        return 0;
}

void RSClose ( void )
{
    g_bConnected = false;                       // signal the camera thread to stop
    WaitForSingleObject(g_RSThread, INFINITE);  // and wait for it to finish
}

Here we have the top-level functions of any application that consumes raw data: create a session instance and a thread to run the stream-handling code, then shut the stream down via the global flag g_bConnected. Using a CPU thread to handle the data stream is recommended: it lets the main application run at whatever frame rate it likes, independent of the camera's frame rate, and it spreads the CPU load across several cores, improving the application's overall performance.

In the code above we are mainly interested in the ThreadProc function, which contains all the code responsible for managing the streams. Before getting into the details, note that this source code is not exhaustive: global declarations and secondary sections have been removed for readability. For the global declarations, see the original source code of the raw_streams project.
static DWORD WINAPI ThreadProc(LPVOID arg)
{
    InitializeCriticalSection(&g_depthdataCS);   // g_depthdataCS is a global CRITICAL_SECTION
    HWND hwndDlg = (HWND)arg;
    PopulateDevices(hwndDlg);
    PXCCapture::DeviceInfo dinfo = GetCheckedDevice(hwndDlg);
    PXCCapture::Device::StreamProfileSet profiles = GetProfileSet(hwndDlg);
    StreamSamples((HWND)arg, &dinfo, &profiles, false, false, false, g_file);
    ReleaseDeviceAndCaptureManager();
    g_session->Release();
    DeleteCriticalSection(&g_depthdataCS);
    return 0;
}

When working with stream data it is important to create a "critical section" in the code. Without one, in a multithreaded environment two threads could in principle write to the same global variable at the same time, which is undesirable.
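For readers new to this, here is the pattern in miniature using the Win32 API; the helper function and buffer sizes are illustrative, not part of the sample. InitializeCriticalSection must be called once before the lock is first used.

#include <windows.h>

CRITICAL_SECTION g_depthdataCS;   // guards the shared depth array
short g_depthdata[640][480];      // written by the camera thread, read by the main thread

void WriteFrame(const short *src, int w, int h)
{
    EnterCriticalSection(&g_depthdataCS);   // block other threads while we copy
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            g_depthdata[x][y] = src[y * w + x];
    LeaveCriticalSection(&g_depthdataCS);   // release so readers can proceed
}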

For those unfamiliar with multithreading: this function is called and does not return until the main thread (the one that created this thread) sets g_bConnected to false. The key call here is StreamSamples; the code above and below it merely handles setup and teardown. The first function of interest is PopulateDevices, which is almost identical to its namesake in the raw_streams project. It fills the g_devices list with the names of all available devices. If you are using an Intel RealSense 3D camera with an ultrabook, you may well see two devices (the second being the ultrabook's built-in camera). Pay attention to the following lines.
static const int ID_DEVICEX = 21000;
static const int NDEVICES_MAX = 100;
int c = ID_DEVICEX;   // menu item ID of the selected device entry
g_session->CreateImpl<PXCCapture>(g_devices[c], &g_capture);
g_device = g_capture->CreateDevice((c - ID_DEVICEX) % NDEVICES_MAX);


The code, constants, and globals are copied from the original source and could be trimmed further. The important calls here are CreateImpl and CreateDevice; once they succeed, the pointer to the Intel RealSense 3D camera is stored in g_device.
Given a valid device pointer, the rest of the initialization code runs without problems. The GetProfileSet function, shown below, is a wrapper around this call:
 g_device->QueryDeviceInfo(&dinfo); 

The GetProfileSet function is responsible for collecting the stream types and resolutions to be initialized; it can be as simple or as elaborate as your needs dictate. For compatibility across cameras, it is strongly recommended to enumerate the available resolutions and types rather than hard-coding fixed settings.
PXCCapture::Device::StreamProfileSet GetProfileSet(HWND hwndDlg)
{
    PXCCapture::Device::StreamProfileSet profiles = {};
    if (!g_device) return profiles;
    PXCCapture::DeviceInfo dinfo;
    g_device->QueryDeviceInfo(&dinfo);
    for (int s = 0, mi = IDXM_DEVICE + 1; s < PXCCapture::STREAM_LIMIT; s++)
    {
        PXCCapture::StreamType st = PXCCapture::StreamTypeFromIndex(s);
        if (!(dinfo.streams & st)) continue;   // skip stream types this camera lacks
        int id = ID_STREAM1X + s * NPROFILES_MAX;
        int nprofiles = g_device->QueryStreamProfileSetNum(st);
        for (int p = 0; p < nprofiles; p++)
        {
            if (st == PXCCapture::StreamType::STREAM_TYPE_COLOR) continue;
            if (st == PXCCapture::StreamType::STREAM_TYPE_IR) continue;
            if (st == PXCCapture::StreamType::STREAM_TYPE_DEPTH && p == 2)
            {
                PXCCapture::Device::StreamProfileSet profiles1 = {};
                g_device->QueryStreamProfileSet(st, p, &profiles1);
                profiles[st] = profiles1[st];   // keep only this depth profile
            }
        }
        mi++;
    }
    return profiles;
}

QueryStreamProfileSet sits inside a fair amount of code whose job is simply to enumerate the available streams, pick out a single depth stream, and return its profile. You can write your own conditions to search for the streams you need, whether by resolution or by frame rate, with fallback conditions so the application can still work with a stream of an acceptable format.
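For instance, a fallback search might look like the sketch below. PickDepthProfile is a hypothetical helper, not part of the SDK; it prefers 640x480 at 60 frames per second and otherwise settles for the first depth profile the camera reports.

PXCCapture::Device::StreamProfileSet PickDepthProfile(PXCCapture::Device *device)
{
    PXCCapture::Device::StreamProfileSet chosen = {};
    PXCCapture::StreamType st = PXCCapture::StreamType::STREAM_TYPE_DEPTH;
    int nprofiles = device->QueryStreamProfileSetNum(st);
    for (int p = 0; p < nprofiles; p++)
    {
        PXCCapture::Device::StreamProfileSet candidate = {};
        device->QueryStreamProfileSet(st, p, &candidate);
        bool preferred = candidate[st].imageInfo.width == 640 &&
                         candidate[st].imageInfo.height == 480 &&
                         candidate[st].frameRate.max >= 60;
        if (preferred) { chosen[st] = candidate[st]; break; }   // exact match wins
        if (!chosen[st].imageInfo.format)
            chosen[st] = candidate[st];                         // first profile as fallback
    }
    return chosen;
}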
The final function, and the central block of code for accessing stream data, is StreamSamples. With the safety code and comments removed, it looks like this.
void StreamSamples(HWND hwndDlg, PXCCapture::DeviceInfo *dinfo,
                   PXCCapture::Device::StreamProfileSet *profiles,
                   bool synced, bool isRecord, bool isPlayback, pxcCHAR *file)
{
    PXCSenseManager *pp = g_session->CreateSenseManager();
    pp->QueryCaptureManager()->FilterByDeviceInfo(dinfo);
    for (PXCCapture::StreamType st = PXCCapture::STREAM_TYPE_COLOR;
         st != PXCCapture::STREAM_TYPE_ANY; st++)
    {
        PXCCapture::Device::StreamProfile &profile = (*profiles)[st];
        if (!profile.imageInfo.format) continue;   // stream type not requested
        pp->EnableStream(st, profile.imageInfo.width,
                         profile.imageInfo.height, profile.frameRate.max);
    }
    pp->QueryCaptureManager()->FilterByStreamProfiles(profiles);
    MyHandler handler(hwndDlg);
    if (pp->Init(&handler) >= PXC_STATUS_NO_ERROR)
    {
        pp->QueryCaptureManager()->QueryDevice()->SetMirrorMode(
            PXCCapture::Device::MirrorMode::MIRROR_MODE_DISABLED);
        g_bConnected = true;
        for (int nframes = 0; g_bConnected == true; nframes++)
        {
            pxcStatus sts2 = pp->AcquireFrame(synced);
            if (sts2 < PXC_STATUS_NO_ERROR && sts2 != PXC_STATUS_DEVICE_LOST) break;
            if (sts2 >= PXC_STATUS_NO_ERROR)
            {
                PXCCapture::Sample *sample = (PXCCapture::Sample*)pp->QuerySample();
                short invalids[2];
                invalids[0] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthSaturationValue();
                invalids[1] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthLowConfidenceValue();
                PXCImage::ImageInfo dinfo = sample->depth->QueryInfo();
                PXCImage::ImageData ddata;
                if (sample->depth->AcquireAccess(PXCImage::ACCESS_READ,
                        PXCImage::PIXEL_FORMAT_DEPTH, &ddata) >= PXC_STATUS_NO_ERROR)
                {
                    EnterCriticalSection(&g_depthdataCS);
                    memset(g_depthdata, 0, sizeof(g_depthdata));
                    short *dpixels = (short*)ddata.planes[0];
                    int dpitch = ddata.pitches[0] / sizeof(short);   // row pitch in pixels
                    for (int y = 0; y < (int)dinfo.height; y++)
                    {
                        for (int x = 0; x < (int)dinfo.width; x++)
                        {
                            short d = dpixels[y*dpitch + x];
                            if (d == invalids[0] || d == invalids[1]) continue;   // skip invalid pixels
                            g_depthdata[x][y] = d;
                        }
                    }
                    LeaveCriticalSection(&g_depthdataCS);
                    g_bDepthdatafilled = true;
                    sample->depth->ReleaseAccess(&ddata);
                }
            }
            pp->ReleaseFrame();
        }
    }
    pp->Close();
    pp->Release();
}

At first glance this is quite a lot of code, but look closer and you will see it is just a few configuration calls, a conditional loop, and a final cleanup before returning to the ThreadProc function that called it. The key variable is pp, the pointer to the Intel RealSense SDK sense manager through which we perform the main actions. Note: as mentioned above, error handling has been removed for readability, but in practice you should never write code that assumes every Intel RealSense SDK call succeeds.
The first code fragment, which enables the streams we are interested in, looks like this.
pp->EnableStream(st, profile.imageInfo.width, profile.imageInfo.height, profile.frameRate.max);

This simple call enables a stream type at a specific resolution and frame rate and instructs the camera to prepare to send us that raw data. The next important lines activate the manager so it can begin delivering data.
MyHandler handler(hwndDlg);
if (pp->Init(&handler) >= PXC_STATUS_NO_ERROR)

The MyHandler class is defined in the original raw_streams project and derives from the PXCSenseManager::Handler class. If the call succeeds, you know the camera is on and transmitting the data stream.
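For reference, a minimal handler in the spirit of MyHandler might look like the sketch below. The callback shown follows the virtual methods that PXCSenseManager::Handler exposes in the SDK headers; the body here is only illustrative.

class MyHandler : public PXCSenseManager::Handler
{
public:
    MyHandler(HWND hwnd) : m_hwnd(hwnd) {}
    virtual pxcStatus PXCAPI OnConnect(PXCCapture::Device *device, pxcBool connected)
    {
        // React to the camera connecting or disconnecting, e.g. post a
        // message to the window identified by m_hwnd.
        return PXC_STATUS_NO_ERROR;   // returning an error aborts initialization
    }
private:
    HWND m_hwnd;
};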
Now we need to start a conditional loop that runs until some external event changes the loop condition. Inside this loop we receive the stream data one frame at a time, using the AcquireFrame call.
for (int nframes = 0; g_bConnected == true; nframes++)
{
    pxcStatus sts2 = pp->AcquireFrame(synced);

As long as g_bConnected remains true, we do this as fast as possible in the separate thread created for the purpose. A few more lines of code are needed to get at the actual data.
PXCCapture::Sample *sample = (PXCCapture::Sample*)pp->QuerySample();
short invalids[2];
invalids[0] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthSaturationValue();
invalids[1] = pp->QueryCaptureManager()->QueryDevice()->QueryDepthLowConfidenceValue();
PXCImage::ImageInfo dinfo = sample->depth->QueryInfo();
PXCImage::ImageData ddata;
if (sample->depth->AcquireAccess(PXCImage::ACCESS_READ,
        PXCImage::PIXEL_FORMAT_DEPTH, &ddata) >= PXC_STATUS_NO_ERROR)

The first command obtains a sample pointer from the manager, and the final AcquireAccess call uses it to obtain a pointer to the actual data in memory. In between, the code performs two queries asking the manager which values denote a "saturated" pixel and a "low-confidence" pixel; both conditions can occur in depth data from the camera, and such pixels should be ignored when interpreting the returned data. The main result of this code is that the ddata structure is now filled with the information we need for direct access to the depth data (in this example). By changing the corresponding parameters you can access the COLOR and IR streams in the same way, provided they are enabled.
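For example, reading the color stream follows the same pattern. This is a sketch, assuming the COLOR stream was enabled during initialization:

PXCImage::ImageData cdata;
if (sample->color && sample->color->AcquireAccess(PXCImage::ACCESS_READ,
        PXCImage::PIXEL_FORMAT_RGB32, &cdata) >= PXC_STATUS_NO_ERROR)
{
    // Rows of 32-bit BGRA pixels, ddata-style: pitches[0] bytes per row.
    unsigned char *pixels = (unsigned char*)cdata.planes[0];
    // ... copy or inspect the pixels here ...
    sample->color->ReleaseAccess(&cdata);
}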
That completes the Intel RealSense SDK-specific part of the code, from the first initialization call to obtaining the pointer to the stream data. The rest will look more familiar to developers with experience writing image-processing programs.
EnterCriticalSection(&g_depthdataCS);
memset(g_depthdata, 0, sizeof(g_depthdata));
short *dpixels = (short*)ddata.planes[0];
int dpitch = ddata.pitches[0] / sizeof(short);   // row pitch in pixels
for (int y = 0; y < (int)dinfo.height; y++)
{
    for (int x = 0; x < (int)dinfo.width; x++)
    {
        short d = dpixels[y*dpitch + x];
        if (d == invalids[0] || d == invalids[1]) continue;   // skip invalid pixels
        g_depthdata[x][y] = d;
    }
}
LeaveCriticalSection(&g_depthdataCS);

Notice that the critical section created earlier is used to lock access so that no other thread can touch our global variables. This lets us write to a global array with confidence that code elsewhere in the application cannot interfere. Following the nested loops, you can see that after entering the critical section we clear the global g_depthdata array and fill it with values from the ddata structure described above, which holds the pointer to the depth data. Inside the loops we also compare each depth pixel against the two invalid values obtained earlier via the QueryDepthSaturationValue and QueryDepthLowConfidenceValue calls.

Once the data has been copied to the global array, the camera thread can fetch the next frame of stream data while the main thread analyzes the data and makes decisions. You could even create a further worker thread to perform that analysis, letting the application run in three threads and make better use of a multi-core architecture.
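Such an analysis thread is not part of the sample, but a minimal sketch might look like this, reusing the globals introduced above:

static DWORD WINAPI AnalysisProc(LPVOID)
{
    while (g_bConnected)
    {
        if (g_bDepthdatafilled)
        {
            EnterCriticalSection(&g_depthdataCS);
            // ... run nearest-point or shape analysis on g_depthdata here ...
            g_bDepthdatafilled = false;   // mark the frame as consumed
            LeaveCriticalSection(&g_depthdataCS);
        }
        Sleep(1);   // yield so the camera and UI threads keep their own pace
    }
    return 0;
}

// Launched alongside the camera thread:
// CreateThread(0, 0, AnalysisProc, 0, 0, 0);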

What to do with stream data


So now you know how to get stream data out of an Intel RealSense 3D camera, and you are probably wondering what to do with it. You could, of course, simply draw the data on the screen and admire the image, but before long you will need to convert it into useful information and feed it into your application.

Just as no two snowflakes are alike, no two raw-data implementations will be the same, but some general approaches to analyzing the data recur. To keep new code to a minimum, the examples below use the code above as a template.

Find the nearest point


Suppose you want to find the nearest point of whatever object is in front of the camera. Imagine you have just copied the depth data from the stream into the global array in the camera thread; you can now run a nested loop to check every value in the array.
short bestvalue = 32767;   // depth values are distances, so smaller means closer
int bestx = 0;
int besty = 0;
for (int y = 0; y < (int)dinfo.height; y++)
{
    for (int x = 0; x < (int)dinfo.width; x++)
    {
        short thisvalue = g_depthdata[x][y];
        if (thisvalue == 0) continue;   // zero marks pixels skipped by the invalid filter
        if (thisvalue < bestvalue)
        {
            bestvalue = thisvalue;
            bestx = x;
            besty = y;
        }
    }
}

Each time a closer value is found it replaces the current best value, and its X and Y coordinates are recorded. By the time the loop has visited every pixel of the depth data, bestx and besty hold the depth-map coordinates of the point nearest the camera.
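As a usage example, the result could drive an on-screen pointer. This small helper is hypothetical; windowWidth and windowHeight stand in for values your application already knows.

POINT DepthToCursor(int bestx, int besty, int depthW, int depthH,
                    int windowWidth, int windowHeight)
{
    POINT p;
    p.x = bestx * windowWidth / depthW;    // scale depth-map X to window X
    p.y = besty * windowHeight / depthH;   // scale depth-map Y to window Y
    return p;
}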

Ignore objects in the background


You may need to identify the shapes of foreground objects without letting the application confuse them with objects in the background, such as a seated user or people passing by. A range filter does the job, as shown below.
static short newshape[640][480];        // fixed-size buffer matching g_depthdata (640x480 assumed)
memset(newshape, 0, sizeof(newshape));  // memset, not memcpy: clear the buffer
for (int y = 0; y < (int)dinfo.height; y++)
{
    for (int x = 0; x < (int)dinfo.width; x++)
    {
        short thisvalue = g_depthdata[x][y];
        if (thisvalue > 32000 && thisvalue < 48000)   // keep only pixels in the chosen depth band
            newshape[x][y] = thisvalue;
    }
}

By adding a condition as each pixel is read and copying across only the pixels that fall within a certain range, you can extract objects from the depth data into a second array for further processing.
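A natural refinement, not taken from the sample, is to derive the range from the nearest point found in the previous example, so the cut-out follows the user rather than a fixed band; the margin below is an illustrative value in raw depth units.

short nearLimit = bestvalue;                  // nearest point from the previous example
short farLimit  = (short)(bestvalue + 200);   // margin behind it (tune to your data)
for (int y = 0; y < (int)dinfo.height; y++)
{
    for (int x = 0; x < (int)dinfo.width; x++)
    {
        short d = g_depthdata[x][y];
        if (d >= nearLimit && d <= farLimit)
            newshape[x][y] = d;   // keep only pixels close to the nearest object
    }
}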


Conclusion


Knowing how to get the raw data stream from an Intel RealSense 3D camera will help you push this technology further and build genuinely new solutions. We have already seen great touch-free applications from the first developers in this area, and they represent only a small part of what these technologies make possible.
Many users still treat computers as devices that must be actively operated before they will do anything, but computers now have "vision" and can watch our every movement. Not spying, mind you, just watching, ready to come to the rescue at the right moment. As the saying goes, in the land of the blind the one-eyed man is king. And do we not live in a world populated by "blind" computers? Imagine the revolution when, in the not-too-distant future, one of them begins to see. As developers, we are the architects of that revolution; together we can create an entirely new paradigm in which computers see their operators and try to help them.

Learn more about RealSense on the Intel website.
All about RealSense for Developers
Download RealSense SDK
RealSense Developer Forum

Source: https://habr.com/ru/post/253361/

