
Legs, wings... the main thing is the tail! The human body from the point of view of Intel RealSense


A programmer's work is interesting for its variety. Depending on the problem at hand, you find yourself delving into the modeling of climate processes, then into the biology of cell division, then into stellar physics... But it also happens the other way around: a problem that looks perfectly ordinary at first glance opens up an abyss of nuances. Developers who encounter Intel RealSense technology for the first time are probably surprised at how complex the processes of recognizing and tracking the position of hands or a face turn out to be, because our brain handles them almost without our participation. Which features of our anatomy have to be taken into account when designing natural interfaces, and how far have the creators of RealSense advanced along this path?
At the end of the post you will find an invitation to the Intel RealSense Meet Up in Nizhny Novgorod on April 24. Nizhny Novgorod, don't miss it!

Look at your hands and try to bend different fingers one at a time. Notice that, when flexing, they depend on one another. Because of this, it is enough to track only two joints to achieve realistic flexion of four fingers (all but the thumb). Only the index finger can be bent without bending the rest, so it requires its own joint-tracking algorithm. For the other fingers everything is simpler: if you bend the middle finger, the ring finger and little finger bend with it; if you bend the ring finger, the middle finger and little finger bend; and if you bend the little finger, the middle and ring fingers bend.
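
To make this dependence concrete, here is a minimal illustrative sketch of how a hand model could propagate flexion from one tracked finger to its neighbors. This is not RealSense SDK code, and the coupling coefficients are invented for the example:

```cpp
#include <algorithm>
#include <cstdio>

// Illustrative hand model: flexion of each finger in [0, 1].
// The coupling coefficients below are assumptions made up for
// this sketch, not values taken from the RealSense SDK.
struct HandModel {
    double index = 0, middle = 0, ring = 0, little = 0;

    // Bending one finger drags its neighbors along; the index
    // finger is independent and needs its own tracked joint.
    void bendMiddle(double f) {
        middle = f;
        ring   = std::max(ring,   0.7 * f);  // assumed coupling
        little = std::max(little, 0.5 * f);  // assumed coupling
    }
    void bendIndex(double f) { index = f; }  // no coupling
};

int main() {
    HandModel hand;
    hand.bendMiddle(1.0);  // clench the middle finger...
    // ...and the ring and little fingers follow automatically.
    std::printf("ring=%.1f little=%.1f\n", hand.ring, hand.little);
}
```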

Let us continue the study of our hands. The angle to which a given phalanx of a finger can bend depends on the length of that phalanx (and not on the joint). The distal phalanx of the middle finger bends to a smaller angle than the middle phalanx of the same finger, and the middle phalanx bends less than the proximal one. Note here that tracking a child's hands is much harder than an adult's, since it is more difficult both to obtain data for small hands and to interpret it accurately.
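
The same observation in data form; this is a sketch only, and the angle limits below are rough assumptions for illustration, not values taken from the SDK:

```cpp
// Approximate maximum flexion per phalanx of the middle finger,
// in degrees. The numbers are illustrative assumptions: the
// shorter the phalanx, the smaller the angle it can bend to.
struct PhalanxLimits {
    double proximal;  // longest phalanx, largest bend
    double middle;
    double distal;    // shortest phalanx, smallest bend
};

constexpr PhalanxLimits kMiddleFinger{90.0, 80.0, 70.0};

static_assert(kMiddleFinger.distal < kMiddleFinger.middle &&
              kMiddleFinger.middle < kMiddleFinger.proximal,
              "bend limit grows with phalanx length");
```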

Currently, RealSense technology can simultaneously track 22 joints in each of two hands (which, by the way, do not have to be a right and a left hand; they may belong to different people). In any case, the computer knows which hand it is looking at. An important step forward was the elimination of the calibration stage, although in some difficult cases (again, say, when a child is in front of the camera) the system asks for an initial calibration. Beyond that, a hand is not only marked up at key points, but can also be reconstructed if part of it leaves the camera's field of view or is poorly lit. Likewise, if conditions allow, the hand is separated from the background behind it, even if that background periodically changes.
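
From code this looks roughly as follows. The sketch is based on the C++ API of the 2015-era Intel RealSense SDK; names such as PXCSenseManager, PXCHandData and QueryTrackedJoint are given from memory and may differ between SDK versions:

```cpp
#include "pxcsensemanager.h"
#include "pxchanddata.h"
#include <cstdio>

int main() {
    // Pipeline setup: enable the hand-tracking module.
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    sm->EnableHand();
    sm->Init();

    PXCHandModule *handModule = sm->QueryHand();
    PXCHandData *handData = handModule->CreateOutput();

    while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
        handData->Update();

        // Up to two hands, each with 22 tracked joints.
        for (pxcI32 i = 0; i < handData->QueryNumberOfHands(); ++i) {
            PXCHandData::IHand *hand = nullptr;
            if (handData->QueryHandData(
                    PXCHandData::ACCESS_ORDER_BY_TIME, i, hand) <
                PXC_STATUS_NO_ERROR)
                continue;

            PXCHandData::JointData joint;
            if (hand->QueryTrackedJoint(PXCHandData::JOINT_INDEX_TIP,
                                        joint) >= PXC_STATUS_NO_ERROR) {
                // The SDK also reports which hand it sees.
                std::printf("%s hand, index tip at (%.3f, %.3f, %.3f)\n",
                            hand->QueryBodySide() ==
                                    PXCHandData::BODY_SIDE_LEFT
                                ? "left" : "right",
                            joint.positionWorld.x, joint.positionWorld.y,
                            joint.positionWorld.z);
            }
        }
        sm->ReleaseFrame();
    }
    sm->Release();
}
```
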
The accuracy of determining the position of some parts of the hand relative to others makes it possible to implement very interesting ways of conveying information. Say, you can use the relative degree of openness of the palm, from a fully open hand to a fully clenched fist (from 0 to 100). You will agree this is somewhat similar to sign language. By the way, implementing classic sign language would open up another important and much-needed application area for RealSense: the rehabilitation of people with disabilities. It is hard to imagine a more humane use of computer technology...
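
In the same legacy SDK, the degree of openness is available directly per hand. A sketch, with the caveat that QueryOpenness is the method name as I recall it:

```cpp
#include "pxchanddata.h"

// Maps palm openness (0 = clenched fist, 100 = fully open) to a
// discrete command. QueryOpenness is quoted from memory from the
// legacy RealSense SDK; treat it as an assumption.
const char *OpennessToCommand(PXCHandData::IHand *hand) {
    pxcI32 openness = hand->QueryOpenness();
    if (openness < 10) return "grab";
    if (openness > 90) return "release";
    return "hold";
}
```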

Now let us turn to gesture recognition. Currently, Intel RealSense supports 10 ready-made gestures; you can see them in the figure. Recognition can be static (motionless poses) or active (a pose in motion). Naturally, nothing prevents you from switching from one mode to the other: for example, a static open palm becomes active when you wave it. Gesture recognition is an order of magnitude more difficult than simply tracking movements, since here it is necessary not only to compute the positions of points, but also to match movements against stored templates. So there is no getting around training, and both sides must learn: the computer has to learn to detect your movements, and you have to learn to move correctly.
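
Subscribing to and polling the ready-made gestures looked roughly like this in the legacy SDK; the gesture identifier strings ("thumb_up", "wave") are quoted from memory and should be checked against the SDK documentation:

```cpp
#include "pxcsensemanager.h"
#include "pxchanddata.h"
#include "pxchandconfiguration.h"
#include <cstdio>

// Minimal sketch: enable two of the built-in gestures, then poll
// for them on every frame. Names follow the legacy RealSense SDK
// as I recall it and are assumptions.
void TrackGestures(PXCSenseManager *sm, PXCHandModule *handModule,
                   PXCHandData *handData) {
    PXCHandConfiguration *config = handModule->CreateActiveConfiguration();
    config->EnableGesture(L"thumb_up");
    config->EnableGesture(L"wave");
    config->ApplyChanges();
    config->Release();

    while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
        handData->Update();
        PXCHandData::GestureData gesture;
        if (handData->IsGestureFired(L"wave", gesture))
            std::printf("wave detected\n");
        sm->ReleaseFrame();
    }
}
```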

The clearer your gesture, the clearer it is to the machine. At first you may feel a certain psychological discomfort: in real "human" life we almost never hold a gesture fixed; we produce them one after another, continuously. RealSense, however, needs an initial phase, a final phase, and a sustained action between them (the duration of the gesture, by the way, can also be used as a parameter). A dynamic gesture is identified by its motion, its timing, or both.
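
Conceptually, a recognizer for such a three-phase gesture can be sketched as a tiny state machine. This is an illustration of the phases described above, not SDK code, and the thresholds are invented:

```cpp
#include <chrono>

// Illustrative three-phase recognizer: a gesture must show a clear
// start pose, a sustained action, and an end pose. The thresholds
// are assumptions for the sketch.
class GesturePhases {
    enum Phase { Idle, Started, Moving } phase_ = Idle;
    std::chrono::steady_clock::time_point start_;

public:
    // Feed one observation per frame; returns true when a complete
    // gesture (start -> motion -> end) has been seen.
    bool Update(bool startPose, bool inMotion, bool endPose) {
        auto now = std::chrono::steady_clock::now();
        switch (phase_) {
        case Idle:
            if (startPose) { phase_ = Started; start_ = now; }
            return false;
        case Started:
            if (inMotion) phase_ = Moving;
            return false;
        case Moving:
            if (endPose) {
                phase_ = Idle;
                // The duration can itself serve as a gesture parameter.
                auto ms = std::chrono::duration_cast<
                    std::chrono::milliseconds>(now - start_).count();
                return ms > 200;  // assumed minimum duration
            }
            return false;
        }
        return false;
    }
};
```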

As you can see, natural interfaces leave plenty of room for misunderstanding. Say, gestures that we would call "similar" the computer will most likely interpret as identical; designers should avoid such situations. Furthermore, the application must constantly make sure that the person in the frame does not leave it and, if necessary, issue warnings. The RealSense camera, with its own peculiarities, adds a lot of nuances of its own... but then, a simple problem is no fun to solve, right? Here we solve the difficult ones.
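
The legacy SDK helped with the keep-the-user-in-frame problem through its alert mechanism. A sketch, with the caveat that the names (ALERT_HAND_OUT_OF_BORDERS and the query methods) are quoted from memory:

```cpp
#include "pxchanddata.h"
#include "pxchandconfiguration.h"
#include <cstdio>

// Minimal sketch: ask the hand module to report when a hand leaves
// the camera's field of view, then poll fired alerts each frame.
// Names follow the legacy RealSense SDK as I recall it.
void WatchBorders(PXCHandModule *handModule, PXCHandData *handData) {
    PXCHandConfiguration *config = handModule->CreateActiveConfiguration();
    config->EnableAlert(PXCHandData::ALERT_HAND_OUT_OF_BORDERS);
    config->ApplyChanges();
    config->Release();

    // ...then, after each handData->Update():
    for (pxcI32 i = 0; i < handData->QueryFiredAlertsNumber(); ++i) {
        PXCHandData::AlertData alert;
        if (handData->QueryFiredAlertData(i, alert) >= PXC_STATUS_NO_ERROR &&
            alert.label == PXCHandData::ALERT_HAND_OUT_OF_BORDERS)
            std::printf("Warning: hand %d left the frame\n", alert.handId);
    }
}
```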

Next time, if the opportunity presents itself, we will talk about face recognition. In the meantime, taking this opportunity, I would like to invite all programmers from Nizhny Novgorod who are interested in Intel RealSense technology to an informal meeting with the company's specialists, which will be held on Friday, April 24, at ul. Magistratskaya 3. On the program: talks on the topic, answers to questions, hardware demonstrations and, of course, lively discussions (how could we do without them?). Come along, it will be interesting.

Intel Developer Zone (IDZ) articles were used in writing this post.

Source: https://habr.com/ru/post/256167/

