
Depth cameras are a quiet revolution (when robots see). Part 2



In the first part of this article, we looked at depth cameras based on structured light and on measuring the round-trip time of light, both of which mainly use infrared illumination. They work well indoors at distances from 10 centimeters to 10 meters, and most importantly, they are very cheap. Hence the massive wave of their current use in smartphones. But... as soon as we step outside, the sun, even through clouds, drowns out the infrared illumination, and their performance degrades dramatically.

As Steve Blank says (about something else, admittedly): "If you want success, get out of the building." Below we will discuss depth cameras that work outdoors. Today this topic is driven largely by autonomous cars but, as we will see, not only by them.

Source: Innoviz Envisions Mass Produced Self-Driving Cars With Solid State LiDAR

So: depth cameras, i.e. devices that shoot video in which each pixel is the distance to an object in the scene, and that work in sunlight!

If you're curious, welcome under the cut!

Let's start with the eternal classics ...

Method 3: Depth from a Stereo Camera


Building a depth map from stereo is well known and has been in use for more than 40 years. Below is an example of a $450 compact camera that can be used for gesture control, for professional photography, or with VR headsets:


Source

The main advantage of such cameras is that sunlight not only does not interfere with them but, on the contrary, improves their results. Hence their active use in all kinds of outdoor scenarios; for example, here is a great example of building a three-dimensional model of an old fort:


ZED outdoor camera example

Serious money has, of course, been poured into taking stereo depth estimation to a new level by the autonomous car industry. Of the five methods of capturing depth video considered here, only two, this one and the next (stereo and plenoptic), neither suffer from the sun nor interfere with neighboring cars. At the same time, the plenoptic approach is several times more expensive and less accurate at large distances. Things could still play out in different ways, and forecasts are hard to make, but here it is worth agreeing with Elon Musk: of all five methods, stereo has the best prospects. And the current results are quite encouraging:


Source: Large-Scale Direct SLAM with Stereo Cameras

But it is interesting that an even stronger influence on stereo depth estimation will, it seems, come not from self-driving cars (of which only a few are produced so far) but from devices where stereo depth maps are already being built en masse, namely... That's right! Smartphones!

About three years ago there was a boom in "two-eyed" smartphones, in which literally every brand took part: the quality of photos taken with two cameras differed dramatically from those taken with one, while the increase in the price of the smartphone was not that significant:



Moreover, last year the process went even further: "You have 2 cameras in your smartphone? Lame! I have three... no, four!!!":


Source: Samsung Galaxy A8 & A9

Sony's "six-eyed" future was mentioned in the first part. In general, multi-camera smartphones are gaining popularity among manufacturers.

The fundamental reasons for this phenomenon are simple:


In general, there are so many advantages that, once people become aware of them, they start to wonder why at least two cameras were not installed long ago.

And here it turns out that not everything is so simple. To meaningfully use the extra cameras to improve the image (and not just switch to a camera with a different lens), we must build a so-called disparity map, which converts directly into a depth map. This is a highly nontrivial task that smartphones have only recently become powerful enough to attack, and even now depth maps are often of rather dubious quality; exact matching of "this pixel in the right image to that pixel in the left" is still a long way off.
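To get a feel for the task, here is a minimal sketch of the classical pipeline in Python with OpenCV: block matching on a rectified stereo pair yields a disparity map, and disparity converts to metric depth as depth = focal_length * baseline / disparity. The file names and camera parameters are made-up placeholders, and production phones use far more sophisticated matching than this:

```python
# Minimal sketch: disparity from a rectified stereo pair, then metric depth.
# File names and camera parameters are illustrative assumptions.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classic block matching; assumes the pair is rectified, i.e. matching
# pixels lie on the same scanline.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed point -> pixels

focal_px = 700.0    # focal length in pixels (hypothetical)
baseline_m = 0.12   # distance between the cameras in meters (hypothetical)

depth_m = np.zeros_like(disparity)
valid = disparity > 0                      # unmatched pixels stay at 0
depth_m[valid] = focal_px * baseline_m / disparity[valid]
```

Note the inverse relationship: the farther the object, the smaller the disparity, which is exactly why stereo accuracy degrades with distance. For example, here are real examples of depth maps from the iPhone: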


Source: iPhone XS & XR depth maps comparison

Even to the naked eye, massive problems are visible in the background and at object borders, to say nothing of translucent hair. Hence the numerous problems that arise both when transferring color from a monochrome camera to a color one and during further processing.

For example, here is a rather telling example from Apple's "Creating Photo and Video Effects Using Depth" talk:


Source: Creating Photo and Video Effects • Using Depth, Apple, WWDC18

You can clearly see how the depth gets buggy under the hair, and indeed on any more or less uniform background; more importantly, the collar has gone to the background in the matting map on the right (that is, it will be blurred). Another problem is that the real resolution of the depth and matting maps is significantly lower than the image resolution, which also affects processing quality:
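A common counter-measure to this resolution gap is "guided" upsampling, where the full-resolution color image steers the interpolation of the low-resolution depth map so that depth edges snap to color edges. Here is a minimal sketch of the idea using a joint bilateral filter from opencv-contrib; the file names and filter parameters are illustrative assumptions, not what any phone vendor actually ships:

```python
# Sketch: upsample a low-res depth map to RGB resolution, then refine it
# with a joint bilateral filter guided by the color image.
# Requires opencv-contrib-python (cv2.ximgproc).
import cv2

rgb = cv2.imread("photo.png")                           # full-resolution color frame
depth = cv2.imread("depth.png", cv2.IMREAD_GRAYSCALE)   # low-res depth/matting map

h, w = rgb.shape[:2]
depth_up = cv2.resize(depth, (w, h), interpolation=cv2.INTER_LINEAR)

# The color image "guides" the smoothing: filtering does not cross strong
# color edges, which reduces (though does not eliminate) halos around
# hair and collars.
refined = cv2.ximgproc.jointBilateralFilter(rgb, depth_up, d=9,
                                            sigmaColor=25, sigmaSpace=7)
```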



However, all of these are growing pains. Four years ago there was no talk of any serious effects on video, phones simply could not handle them; today, depth-based video processing is demonstrated on flagship production phones (this is from the same Apple presentation):


Source: Creating Photo and Video Effects • Using Depth, Apple, WWDC18

In general, the topic of multi-camera phones, and with it depth from stereo on phones, is conquering the masses without a fight:


Source: "found on these internets of yours"

Key findings:


The next method is less trivial and less well known, but very cool. Meet...

Method 4: Depth from a Light Field Camera


The topic of plenoptics (from the Latin plenus, "full", and the Greek optikos, "visual"), or light fields, is still relatively little known to the general public, although professionals have been studying it very closely. At many top conferences, separate sections have been allocated to Light Field papers (the author was once amazed at the number of Asian researchers closely involved in this topic at the IEEE International Conference on Multimedia and Expo).

Google Trends says that the US and Australia lead in interest in Light Field, followed by Singapore, Korea, and Great Britain. Russia is in 32nd place... We are closing the gap behind India and South Africa:


Source: Google Trends

Your humble servant wrote a detailed article on Habr some time ago describing how this works and what it offers, so here we will only go over it briefly.

The main idea is to capture at each point not just the light intensity but a two-dimensional array of light rays, making each frame four-dimensional. In practice, this is done using a microlens array:


Source: plenoptic.info (recommended: click through and view in full resolution)
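A nice way to see what the extra two dimensions buy is synthetic refocusing: shift each sub-aperture view in proportion to its position within the aperture, then average all of them. Below is a toy sketch assuming a light field already unpacked into an array of shape (U, V, H, W); real implementations use sub-pixel interpolation rather than whole-pixel rolls:

```python
# Toy shift-and-add refocusing over a 4D light field L[u, v, y, x].
# The (U, V, H, W) layout and the meaning of `alpha` are assumptions
# for illustration.
import numpy as np

def refocus(lf: np.ndarray, alpha: float) -> np.ndarray:
    """lf has shape (U, V, H, W); alpha selects the virtual focal plane."""
    U, V, H, W = lf.shape
    out = np.zeros((H, W), dtype=np.float64)
    for u in range(U):
        for v in range(V):
            # Shift each view by its offset from the aperture center,
            # scaled by alpha, then accumulate.
            dy = int(round(alpha * (u - U // 2)))
            dx = int(round(alpha * (v - V // 2)))
            out += np.roll(lf[u, v], shift=(dy, dx), axis=(0, 1))
    return out / (U * V)  # objects at the chosen plane align and stay sharp
```

Objects whose disparity across the aperture matches alpha add up coherently and appear sharp; everything else blurs, which is exactly the refocusing effect Lytro made famous.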

As a result we get a lot of new capabilities, but the resolution drops seriously. Lytro (later bought by Google), having solved a host of complex technical problems, raised the resolution drastically: in Lytro Cinema the camera sensor was brought to 755 megapixels of RAW data, and the camera itself looked as cumbersome as the first television cameras:


Source: NAB: New Chandelier Light

Interestingly, even professionals regularly misjudge the resolution drop of plenoptic cameras, because they underestimate how well Super Resolution algorithms can be tuned for them, superbly restoring many details from the micro-shifts of the image within the light field (pay attention to the blurry needles and the moving swing in the background):


Source: Naive, Smart and Super Resolution plenoptic frame recovery from the Adobe Technical Report "Superresolution with Plenoptic Camera 2.0"

All of this would be of relatively theoretical interest had Google not implemented plenoptics in the Pixel 2 by covering two pixels with a single microlens:


Source: AI Google Blog

As a result, a micro stereo pair is formed, which makes it possible to MEASURE depth; Google, true to its new traditions, topped this up with neural networks, and the result turned out wonderfully:




More depth examples in full resolution can be found in a special gallery.

Interestingly, Google (like Huawei and others) stores the depth inside the image file itself, so you can extract it from there and take a look:


Source: Google's New Secret Camera App.
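Digging the depth map out yourself is a good exercise. A brute-force approach that works on many such photos is to scan the file for extra embedded JPEGs appended after the main image; a proper parser would instead read the metadata of Google's GDepth / Dynamic Depth formats. A rough sketch (the file name is a placeholder):

```python
# Brute-force extraction of images embedded in a photo file: scan for
# additional JPEG Start-Of-Image / End-Of-Image markers after the main
# image. One of the extracted chunks is typically the depth map.
data = open("portrait.jpg", "rb").read()

chunks, pos = [], 2  # skip the main image's own SOI marker at offset 0
while True:
    soi = data.find(b"\xff\xd8\xff", pos)   # next embedded JPEG start
    if soi == -1:
        break
    eoi = data.find(b"\xff\xd9", soi)       # its end marker
    if eoi == -1:
        break
    chunks.append(data[soi:eoi + 2])
    pos = eoi + 2

for i, chunk in enumerate(chunks):
    with open(f"embedded_{i}.jpg", "wb") as f:
        f.write(chunk)
```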

And then the photo can be turned into a three-dimensional one:


Source: Google's New Secret Camera App.

You can experiment with this at http://depthy.me, where you can upload your own photo. Interestingly, the site's source code is available, i.e. you can improve the depth processing yourself; there is plenty of room for that, since the simplest processing algorithms are used there for now.

Key points:


And finally, we come to the last depth measurement method in this review.

Method 5: Cameras based on lidar technology


Laser rangefinders have long been firmly established in our lives; they are inexpensive and highly accurate. The first lidars (LiDAR, from Light Detection and Ranging), built as stacks of such rangefinders rotating around a vertical axis, were first used by the military and then tried in car autopilots. There they performed quite well, which triggered a strong surge of investment in the area. Initially, lidars rotated, producing a picture like this several times per second:


Source: LIDAR: The Key Self-Driving Car Sensor

This was inconvenient, unreliable due to the moving parts, and rather expensive. Tony Seba in his lectures presents interesting data on how fast the cost of lidars is falling. For the first autonomous Google cars, lidars cost about 70 thousand dollars (for example, the specialized HDL-64E used on the first cars cost 75 thousand):

Source: This and the Next From Parking to Parks –Bellevue & the Disruption of Transportation

With mass production of next-generation models, manufacturers promise to push the price well below $1000:


One can argue with Tony's example (a startup's promise is not the final cost), but the boom in this area, the rapid growth of production runs, the emergence of completely new products, and the general decline in prices are indisputable. A bit later, in 2017, the price-decline forecast looked like this (the moment of truth will come when lidars start going into cars in large quantities):


Source: LiDAR Completes Sensing Triumvirate

In particular, several manufacturers have relatively recently introduced so-called Solid State Lidars, which have no moving parts at all and therefore offer radically higher reliability (especially under vibration), lower cost, and so on. I recommend watching this video, which explains their design very clearly in 84 seconds:


Source: Solid State Lidar Sensor

What matters for us is that a Solid State Lidar produces a rectangular picture, i.e. in essence it starts working like a "regular" depth camera:


Source: Innoviz Envisions Mass Produced Self-Driving Cars With Solid State LiDAR

The example above produces video of approximately 1024x256 at 25 FPS, 12 bits per component. Such lidars will be mounted behind the radiator grille (since the device generates a lot of heat):


Source: Solid-State LiDAR Magna Electronics
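Such a rectangular range frame converts to a 3D point cloud in a few lines: each pixel's row and column give its elevation and azimuth, and the measured range is projected along that ray. The frame size below matches the example above, while the field-of-view values are assumptions for illustration:

```python
# Sketch: rectangular lidar range image -> 3D point cloud.
# Resolution matches the example in the text; FOV values are assumed.
import numpy as np

H, W = 256, 1024
range_m = np.random.uniform(1.0, 100.0, size=(H, W))  # stand-in for a real frame

h_fov = np.deg2rad(120.0)   # horizontal field of view (assumption)
v_fov = np.deg2rad(25.0)    # vertical field of view (assumption)

az = np.linspace(-h_fov / 2, h_fov / 2, W)   # azimuth per column
el = np.linspace(v_fov / 2, -v_fov / 2, H)   # elevation per row (top = up)
az, el = np.meshgrid(az, el)

x = range_m * np.cos(el) * np.cos(az)   # forward
y = range_m * np.cos(el) * np.sin(az)   # sideways
z = range_m * np.sin(el)                # up/down
points = np.stack([x, y, z], axis=-1).reshape(-1, 3)  # (H*W, 3) cloud
```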

As usual, the Chinese, today first in the world in electric car production and clearly aiming for first place in autonomous cars, make an appearance:


Source: Alibaba, RoboSense launch unmanned vehicle using solid-state LIDAR

In particular, their experiments with non-square depth "pixels" are interesting: with joint processing alongside an RGB camera, the resolution can be raised, and this is a rather interesting compromise ("square" pixels matter, in fact, only to humans):


Source: MEMS Lidar for Driverless Vehicles Takes Another Big Step

Lidars are mounted in different configurations, depending on the cost of the kit and the power of the on-board system that will have to process all this data. The overall capabilities of the autopilot change accordingly. As a result, more expensive cars will better tolerate dusty roads and will find it easier to "dodge" cars cutting in from the side at intersections, while cheap ones will only help reduce the number of (numerous) silly accidents in traffic jams:


Source: Description RoboSense RS-LiDAR-M1

Note that besides Solid-State, a couple of other directions of lidar development also promise low prices. Making predictions here is a thankless task, since too much depends not on the potential engineering characteristics of a technology but, for example, on patents. It is just that Solid-State is already being pursued by several companies, which is why this direction looks the most promising. But it would be unfair not to mention the rest:


Source: This 2017 LiDAR Bottleneck Is Causing a Modern-Day Gold Rush

Speaking of lidars as cameras, it is worth mentioning another important feature that matters when using solid-state lidars. By their nature, they work like cameras with the nearly forgotten rolling shutter, which produces noticeable distortions when shooting moving objects:


Source: What is the difference between a global shutter and a rolling shutter

Considering that on the road, especially on the autobahn at 150 km/h, everything changes quite quickly, this feature of lidars will distort objects, including ones flying toward us fast... including in depth... Note that the two previous methods do not have this problem.


Source: a good animation on Wikipedia showing the distortion of a car
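The scale of the problem is easy to estimate on the back of an envelope. Assuming the frame is scanned line by line over the full 1/25 s frame time (an assumption; actual scan timing varies by device), an oncoming car at 150 km/h shifts noticeably within a single frame:

```python
# Back-of-the-envelope rolling-shutter skew for a scanned lidar frame.
t_frame = 1.0 / 25.0     # 25 FPS, frame assumed scanned over ~40 ms
v_kmh = 150.0            # relative speed of an oncoming car
v_ms = v_kmh / 3.6       # ~41.7 m/s

skew_m = v_ms * t_frame  # ~1.7 m of displacement across one frame
print(f"The object shifts by {skew_m:.2f} m between the first and last scan line")
```

More than a meter and a half of smear on a nearby car is far from negligible, hence the need to account for scan time in the processing.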

This feature, coupled with the low FPS, requires adapting the processing algorithms, but in any case, thanks to their high accuracy, including at large distances, lidars have no real competitors.

Interestingly, by their nature lidars work very well on flat areas and worse at borders, while stereo does poorly on flat areas and comparatively well at borders. In addition, lidars deliver relatively low FPS and cameras much higher. As a result, they essentially complement each other, which was exploited, among other places, in the Lytro Cinema camera (in the photo, the glowing plenoptic lens on top of the camera captures up to 300 FPS, and below it sits the lidar, Malevich's black square):


Source: Lytro poised to forever change filmmaking: debuts Cinema prototype and short film at NAB

And if two depth sensors have already been combined in a movie camera, then in other devices (from smartphones to cars) we can likewise expect mass adoption of hybrid sensors that deliver maximum quality.

In a sense, lidars today are still largely about investment, which over the previous three years amounted to about 1.5 billion dollars (this market is projected at 10 billion within 6 years, and the graph below clearly shows how the average deal size has grown):


Source: an interesting March review of lidar market trends from TechCrunch

The development cycle of an innovative product, even in highly competitive markets, is 1.5-2 years, so we will soon see extremely interesting products. In prototype form, they already exist.

Key points:


Processing video with depth


In a nutshell, let us consider why depth is not so easy to process.

Below is an example of raw depth data from the very nice Intel RealSense line of cameras. The fingers in this video make it relatively easy to judge how the camera and the processing algorithms perform:


Source: here and below, the author's materials

The typical problems of depth cameras are clearly visible:


But that is not all. If we look at the data more closely, we can see that in places the farther data "shows through" the closer data:



This happens because, for processing convenience, the depth image is reprojected onto the image from the RGB camera, and since the borders are not determined precisely, such "overlaps" have to be handled specially; otherwise problems arise, such as the "holes" in the depth on the arm visible below:
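One standard remedy, sketched here, is to resolve the overlaps with a z-buffer when splatting depth samples onto the RGB image (the nearest sample wins) and then patch the remaining holes from valid neighbors. This is a naive illustration of the idea, not the actual algorithm of any particular SDK:

```python
# Sketch: z-buffer splatting of reprojected depth samples plus naive
# hole filling. Coordinates us, vs and depths zs are assumed to come
# from reprojecting the depth camera's samples into the RGB view.
import numpy as np

def splat_depth(us, vs, zs, h, w):
    zbuf = np.full((h, w), np.inf)
    for u, v, z in zip(us, vs, zs):
        if 0 <= v < h and 0 <= u < w and z < zbuf[v, u]:
            zbuf[v, u] = z          # nearer sample wins; "see-through" is dropped
    zbuf[np.isinf(zbuf)] = 0.0      # 0 marks pixels with no measurement
    return zbuf

def fill_holes(depth):
    """Naive hole filling: interpolate between valid values along each row."""
    out = depth.copy()
    for row in out:
        valid = np.flatnonzero(row > 0)
        if valid.size:
            row[:] = np.interp(np.arange(row.size), valid, row[valid])
    return out
```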



There are plenty of problems during processing, for example:




Nevertheless, it is clear that the picture has improved greatly. Note that slower algorithms could improve the quality even further.

In general, depth video processing is a separate huge topic that few people work on so far, but one that is rapidly gaining popularity. The previous market it completely transformed was the conversion of films from 2D to 3D. The results of manually building stereo from 2D video improved so quickly, proved so much more predictable, and caused so much less headache and cost that, after almost 100 years of stereo filming (!), 3D movies fairly quickly almost stopped being shot and started being converted instead. A separate specialty even appeared, the depth artist, and old-school stereographers were shocked ("ho-ow co-ome?"). And the key role in this revolution was played by the rapid improvement of video processing algorithms. Perhaps someday I will find the time for a separate series of articles about this, since I have plenty of material.

Approximately the same can be expected in the near future for depth cameras in robots, smartphones, and autonomous cars.

Wake up! The revolution is coming!

Instead of a conclusion


A couple of years ago, your humble servant gathered a comparison of the different approaches to capturing video with depth into a single table. Overall, the situation has not changed since then. The division here is rather arbitrary, since manufacturers are well aware of their technologies' shortcomings and try to compensate for them (sometimes very successfully). Nevertheless, in general terms the approaches can be compared:


Source: the author's materials

Let's run through the table:


It is important to understand the situation in specific markets. For example, the plenoptic approach clearly looks like it is lagging behind. However, if it turns out to be the best fit for AR (and this will become clear within the next 3-4 years), it will take its rightful place in every phone and tablet. It is also clear that solid-state lidars look best of all (the "greenest" in the table), but this is also the youngest technology, which carries its own risks, first of all patent risks (the area has a huge number of fresh patents that will not expire any time soon):

Source: LiDAR: An Overview of the Patent Landscape

Nevertheless, let us not forget: the smartphone depth sensor market is projected to reach $6 billion next year, and it can hardly be less than 4, because last year it already exceeded 3 billion. This is a huge amount of money that will go not only toward returns on investment (for the most successful investors) but also toward developing new generations of sensors. Lidars show no comparable growth yet, but by all indications it will begin literally within the next 3 years. And then comes the usual avalanche-like exponential process, which has already happened more than once.

The near future of depth cameras looks extremely interesting!

Carthage must be destroyed... By the end of the century, all video will be three-dimensional!

Stay tuned!

Part 1


Source: https://habr.com/ru/post/458458/

