
User feedback based on Intel RealSense technology


Software helps us overcome human limitations. Programs let people with visual impairments read, helped humans land on the Moon and return, and make it possible to share information on a global scale with incredible ease. A few decades ago all of this seemed like fantasy. Yet despite how powerful software has become in our lives, the ways we interact with it are far from perfect.
With the advent of natural user interfaces (NUI), such as Intel® RealSense™ technology, we can interact with software in a new, more natural way. NUI lets us work more conveniently, more simply, and more efficiently. But these new ways of interacting with software require a new language.
This article describes the experience of Chronosapien Interactive in developing natural game interfaces, with a special focus on user feedback in that environment.

User expectations


Software in its current state is rigid and, if you like, unforgiving. It accepts nothing without explicit user action and expects complete commands before it will do anything. We are well trained and have adapted ourselves to what programs require. With natural interfaces, the picture changes. Everything we have learned about computers and how they perceive the world around them falls away the moment we say "Hello!" to one. When we are told to wave a hand in front of the screen and the computer does not respond immediately, we are confused, because from our point of view we did exactly what we were asked to do. Some of this misunderstanding stems from a lack of knowledge about the technology, but most of it comes from asking users to communicate with the computer naturally, which leads them to humanize it. Users behave as if they were talking to a person, yet they receive none of the cues of natural communication: facial expressions, eye contact, gestures, and so on. The absence of these feedback signals has to be compensated for by creating obvious responses to user actions, messages along the lines of "I received your message", "I did not understand this because...", and "Got it, I am working on a response." In addition, shaping the right user expectations requires some training. Treat it like getting to know a new person from another country.

Have you ever talked with someone who paused for a long time in the middle of a sentence to better formulate an idea? Or imagine waving to someone, and they awkwardly half-raise a hand in response. Or you were in a very noisy room and caught only fragments when a friend shouted to you: "Time to leave!" In such situations you rely on contextual cues and past experience to correctly interpret people's intentions from only partial information. But moments like these create serious difficulties for natural interfaces.

In the examples above, some of the information was missing, but in most cases we could recover it from other, related information. When someone stops in the middle of a phrase to collect their thoughts, you do not forget what was said earlier and respond to the half-spoken phrase; you let the other person finish. The reason is that you know, from indirect information such as intonation, facial expression, and eye contact, that they are going to say something more. If someone waves back at you awkwardly, you are not confused just because the gesture does not fully match the accepted standard. Instead, you interpret it based on the most likely behavior in that context, and you may also make some assumptions about that person in order to better adapt to them in the future. If you hear only part of a phrase in a noisy, crowded room, you do not need the complete sentence to guess that it is time to leave. These examples highlight two important things: context and related information. In my examples of user feedback in natural interfaces, you will keep running into the same premise: it is better to give too much information than not enough.
The analogy of trying to talk to someone in a noisy, crowded room suits work with natural interfaces very well. The situation is made worse by the fact that here your interlocutor (the computer) has the short-term memory of a newborn and, in its ability to perceive context, is at the level of a fruit fly. Below are the main challenges in creating user feedback from the data available in Intel RealSense applications.

These issues are discussed in the sections below, based on various implementations of Intel RealSense technology. There are a number of general principles to keep in mind when designing both the feedback and the interaction itself. In the course of my work I managed to solve some of these problems, but they remain a serious obstacle to the natural use of computers. When developing with or for natural interfaces, be prepared for a huge amount of testing and many iterations. Some of the problems you encounter will be related to the hardware, others to the SDK, and still others to natural interfaces themselves.

Hand tracking in the Intel RealSense SDK


The ability of software to interpret hand movements opens up new possibilities for its creators. Besides providing an intuitive foundation for human-computer interaction, using the hands gives a level of immersion in an application that is otherwise unattainable. With the Intel RealSense SDK, developers can work with a set of tracked hand joints, the hand's current degree of openness, and various poses, movements, and gestures. These features, of course, come with certain limitations, as in the other Intel RealSense application modes, and these limitations have to be worked around somehow. Below I discuss these limitations and describe the different ways of hand control that we tried.

Hand Interaction Restrictions


Tracking volume


The tracking volume of Intel® RealSense™ in hand-tracking mode is finite and can limit the application's capabilities.

One of the problems of hand interaction in the SDK is the limited tracking volume of the hardware. Since the natural range of human hand movement is quite large, hands often move outside this volume. Leaving the tracked volume is the most common problem new users face when trying to interact with Intel RealSense applications using their hands.
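One simple mitigation is to warn the user before their hand reaches the boundary. Below is a minimal sketch of this idea, assuming the application already receives hand positions normalized to the tracking volume; the function names, threshold, and callbacks are illustrative assumptions, not part of the Intel RealSense SDK.

```python
# Sketch: warn the user before their hand leaves the tracked volume.
# Assumes hand positions arrive normalized to [0, 1] on each axis of the
# tracking volume; the threshold and callbacks here are hypothetical.

EDGE_THRESHOLD = 0.1  # how close to the volume boundary counts as "near the edge"

def edge_proximity(pos):
    """Return distance to the nearest boundary (0 = at the edge, 0.5 = dead center)."""
    return min(min(c, 1.0 - c) for c in pos)

def update_tracking_feedback(hand_pos, show_warning, hide_warning):
    """Call once per frame with the normalized hand position (x, y, z), or None if lost."""
    if hand_pos is None:
        show_warning("Hand lost - move it back in front of the camera")
        return
    if edge_proximity(hand_pos) < EDGE_THRESHOLD:
        show_warning("Your hand is about to leave the tracked area")
    else:
        hide_warning()

# Example: a hand near the top edge of the volume triggers the warning.
update_tracking_feedback((0.5, 0.95, 0.4), print, lambda: None)
```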

Occlusion


Robust Arm and Hand Tracking by Unsupervised Context Learning

The second most common limitation of the SDK and other image-based tracking systems is occlusion. Simply put, occlusion is when one object blocks another. This problem matters most when tracking hands, because in many natural poses and gestures one hand ends up in front of the other from the camera's point of view. Conversely, if the screen is used as the viewer, the hands often block the screen from the user.

Hand size relative to screen size

When building hand interaction, it is natural to design the interface as if the user were touching the viewer, which in the most common case is the screen. However, if the hands are represented at that scale, there is almost no room left on the screen for anything else. This causes problems both for the graphical user interface and for the application itself.

Hand fatigue

Controlling the digital world with your hands is a new degree of freedom, but it is easy to overdo it. One of the most important problems noted both in our applications and in others: when using their hands, users begin to feel tired after 60–90 seconds. The situation improves somewhat if there is a table the user can rest their elbows on, but this does not completely solve the problem.

Lack of tactile feedback

Of everything we lose when we abandon traditional computer interfaces, tactile feedback is the most important. When you gesture with your hands in the air, the simplest feedback is lost: the mechanical sensation of pressing a button. Since tactile feedback is impossible, the application must compensate with visual and audible feedback.

Hands as a pointer



Our implementation of hands as a pointer in the game Space Between. The pointer is the glowing ball next to the sharks.

In our game Space Between, we found that using the hands as a pointer is a convenient way to control the application. It provides an intuitive bridge between controlling the application the traditional way (with a mouse) and the new way (with the hands). Below I describe some of the problems we encountered with this approach, our implementation, and our successes and failures in terms of usability.

Our tasks

Here are the problems that we found when trying to use hands as a pointer.

Users did not understand exactly what they were controlling.
In Space Between, users directly control a glowing ball that follows their hands on the screen in real time. In our games, a player-controlled character follows the pointer. The result is a somewhat indirect control scheme. Many times, when users tried our game for the first time, they needed a long while to realize that they were controlling the pointer and not the character itself.

Users did not understand what the pointer controls.
Since pointer control is used in different contexts and in different ways in our game, users sometimes could not work out what exactly the pointer was supposed to control.

Users' hands often went beyond the tracked volume.
As mentioned earlier, this is the most common problem when using hands to interact with Intel RealSense applications. Even when the visible pointer sat at the edge of the screen, users did not connect this with the fact that their hands had reached the limits of the tracked volume.
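One way to make that connection explicit is to clamp the pointer to the screen and put it into a visible "at the edge" state instead of letting it silently stop. The sketch below assumes hand positions normalized to the tracked volume; the coordinate convention and names are assumptions rather than SDK API.

```python
# Sketch: map a normalized hand position to a screen-space pointer and flag
# when the hand has hit the limits of the tracked volume, so the pointer can
# change its appearance instead of quietly sticking to the screen edge.

SCREEN_W, SCREEN_H = 1920, 1080

def hand_to_pointer(hand_xy):
    """hand_xy: (x, y) normalized to [0, 1] within the tracked volume."""
    x = min(max(hand_xy[0], 0.0), 1.0)
    y = min(max(hand_xy[1], 0.0), 1.0)
    at_edge = x != hand_xy[0] or y != hand_xy[1] or x in (0.0, 1.0) or y in (0.0, 1.0)
    return (x * SCREEN_W, y * SCREEN_H), at_edge

pointer, at_edge = hand_to_pointer((1.2, 0.5))   # hand moved past the right boundary
print(pointer, at_edge)                          # (1920.0, 540.0) True -> e.g. tint the pointer red
```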

The pointer in Space Between

In Space Between, we used a two-dimensional pointer in three ways.

Gust of wind

Pointer in the form of a gust of wind in the game Space Between

What worked
Of the three options, the gust of wind was the most abstract. Fortunately, its amorphous outline masked most of the positional noise that inevitably occurs in Intel RealSense applications. In addition, we played a sound whose volume varied with the speed of the pointer's movement. This was helpful because users could tell whether their hand movements were being tracked (the movement of the clouds on screen indicated this as well).
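A minimal sketch of that audio cue, assuming the application computes the pointer's speed from frame-to-frame positions; the speed range and the way the volume is applied are placeholders for whatever engine is in use.

```python
# Sketch: drive the wind sound's volume from the pointer's speed, so users can
# hear whether their hand is being tracked. MAX_SPEED is a tuning value.

import math

MAX_SPEED = 2000.0   # pixels per second that maps to full volume

def wind_volume(prev_pos, cur_pos, dt):
    """Return a 0..1 volume from how fast the pointer moved this frame."""
    if dt <= 0:
        return 0.0
    speed = math.dist(prev_pos, cur_pos) / dt
    return min(speed / MAX_SPEED, 1.0)

# Example: pointer moved 30 px in one 60-fps frame -> ~1800 px/s -> loud wind.
print(round(wind_volume((100, 100), (130, 100), 1 / 60), 2))
```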

What did not work
The amorphous shape was convenient for hiding noise, but it made it impossible to indicate an exact location on the screen. Because of this, there were difficulties when, for example, trying to select a particular game by hovering over objects on the screen.

Glowing ball

Another pointer in the game Space Between

What worked
The pointer cast light onto the environment but was drawn on top of it. Thanks to this, users knew exactly where their character would move, and there were no "the pointer got lost among the walls" problems. Its relatively small size also showed off the accuracy of the SDK's hand-tracking module. Initially we used the ball alone as the pointer, but it was easy to lose sight of it during quick hand movements. To deal with this, we created a trail of particles that lingered behind the pointer for about a second. This solution had a pleasant side effect: it was fun simply to move the pointer around and draw shapes with it. Finally, to connect the pointer with the player's character, we drew a trace between them. This was especially useful when the character was blocked by the environment and could not move anywhere.
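A sketch of that kind of trail: each frame the pointer's position is recorded with a timestamp, points older than about a second are dropped, and the rest are drawn with an alpha that fades with age. The class and rendering hooks here are illustrative; the real version lives in the game engine.

```python
# Sketch: a trail that follows the pointer and fades out over roughly a second.

import time
from collections import deque

TRAIL_LIFETIME = 1.0  # seconds a trail point stays visible

class PointerTrail:
    def __init__(self):
        self.points = deque()  # (position, timestamp) pairs, oldest first

    def update(self, pointer_pos, now=None):
        now = time.monotonic() if now is None else now
        self.points.append((pointer_pos, now))
        while self.points and now - self.points[0][1] > TRAIL_LIFETIME:
            self.points.popleft()

    def renderables(self, now=None):
        """Yield (position, alpha) pairs; alpha fades from 1 to 0 with age."""
        now = time.monotonic() if now is None else now
        for pos, t in self.points:
            yield pos, max(0.0, 1.0 - (now - t) / TRAIL_LIFETIME)

trail = PointerTrail()
trail.update((10, 20), now=0.0)
trail.update((15, 25), now=0.5)
print(list(trail.renderables(now=0.5)))  # the older point is already half faded
```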

What did not work
The main problem with the glowing ball in our games was that users sometimes did not understand that they were controlling the pointer rather than the character itself. There was another problem as well: besides controlling the character's position, we tried to use the ball for a second function, showing an open or closed palm, by increasing the intensity and brightness of its light. In the future we will refine the pointer so that it visibly changes when the palm opens. We may also briefly show an image of a hand next to it so that users understand exactly what they are controlling.

Hand pointer

The hand pointer in Space Between is used to interact with the menu.

What worked
The hand pointer was the simplest and most intuitive of the three options. Since the pointer was shaped like a hand (and was the right size), users immediately understood what they were controlling and how. We went further and created animated transitions between the different hand poses corresponding to the current state. This was very convenient, because users immediately saw that the system had recognized the change and was responding to it. Even when an action was not used in the current context, players could easily see exactly what the application was interpreting and how well.

What did not work
In terms of usability the hand pointer was great, but it did not fit the style of the game at all. We either broke the player's immersion in the game world or had to restrict this pointer to application-control contexts, such as the pause menu and settings.

Findings

Our experience shows that there is no single answer to how a hand-driven pointer should be implemented: everything depends on the context and the application. Nevertheless, there are a number of universal rules about providing feedback to users.

Hands and gestures



The first stage of the raise gesture in The Risen

Gestures are a powerful tool for expressing intent and performing actions. Their consistency makes very precise control possible and creates sensations unique to the environment in which they are used. Gestures helped us build our Intel RealSense games, Space Between and The Risen, and connect players with the actions they perform. As before, I will first describe the problems we encountered with gestures, then how we implemented them, and finally what we think worked and what did not.

Our tasks

Gestures are harder to work with than plain position tracking. Here are some of the problems we discovered while working on them.

There is no way to determine the beginning of a gesture.
This depends to some extent on the particular gesture, but in general the built-in gestures supported by the corresponding Intel RealSense mode give no indication that a gesture has started until it has actually been completed. That may not sound serious, but with complex gestures it means waiting for the whole motion to finish only to discover that it did not register and has to be repeated.

Many users perform gestures correctly, but not precisely enough for the application to recognize them.
As I said above, gesture recognition is very strict. Swipe gestures must travel a certain distance, hands must move in a particular way and at a particular distance from the camera, and so on. All of this can make gestures very inconvenient to use.

Some hand angles are not tracked well by Intel RealSense technology.
One of the most serious shortcomings of the hand-tracking algorithms is the inability to track certain angles. Currently the system detects hands very well when the palms face the camera, but detection is much worse when the palms are perpendicular to it. This affects many gestures, especially ones with complex movements. For example, in The Risen we created a gesture for raising skeletons: the user first shows their palms to the camera, then lowers their hands while turning the palms up, and then raises them. During the part of the gesture when the palms are edge-on, the application often loses track of them, which interrupts the gesture midway.

The raise gesture in The Risen


The second stage of the raise gesture in The Risen

The Risen uses a custom raise gesture that is important for the player's enjoyment of the game's atmosphere and sense of being part of its world. Here is what we learned while working on it.

What worked
We managed to get players to fully understand the required movements, since the gesture is used repeatedly in the game. We also wanted to avoid long texts describing in minute detail how the hand position should change over time. Our solution was to show animated hands in the scene in the tutorial section, so players could see exactly how the gesture should be performed. The hands in the animation were the same size as the user's hands in the scene, so users immediately understood what was required of them.

When creating the gesture, we knew that users' hands would most likely not be positioned exactly right, and we also took the SDK's hand-tracking limitations into account. To address this, we chose a starting pose for the gesture that the tracking module recognizes reliably. In addition, we warn the user, in effect: "Aha, a raise gesture is coming." The user receives a visual and audible notification that the system has registered the beginning of the gesture. This avoids unnecessary repetitions and tells the player exactly what the system needs.

Following the principle of dividing gestures into parts for convenience, we also trigger visual and sound effects when the second stage of the gesture is reached. Since this gesture is fairly complex (and non-standard), this signals to players that they are doing everything correctly.

We divided the gesture into stages for both technical and usability reasons, but it can still be performed as one continuous movement; the stages are used only to display hints about correct execution and to indicate errors, if any.
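A minimal sketch of that staging idea: each stage is a named predicate over the current hand data, reaching a stage fires a feedback cue, and losing tracking mid-gesture resets everything with an explanation. The pose predicates and field names are hypothetical stand-ins for real joint data.

```python
# Sketch: a staged gesture whose stages exist only to drive feedback, while the
# player still performs one continuous motion.

class StagedGesture:
    def __init__(self, stages, on_stage_reached, on_reset):
        self.stages = stages              # list of (name, predicate) pairs, in order
        self.on_stage_reached = on_stage_reached
        self.on_reset = on_reset
        self.current = 0

    def update(self, hand_data):
        if hand_data is None:             # tracking dropped mid-gesture
            if self.current > 0:
                self.on_reset("Lost your hands - please start the gesture again")
            self.current = 0
            return False
        name, predicate = self.stages[self.current]
        if predicate(hand_data):
            self.on_stage_reached(name)   # trigger the visual/sound cue for this stage
            self.current += 1
            if self.current == len(self.stages):
                self.current = 0
                return True               # whole gesture completed
        return False

# Hypothetical raise gesture: palms shown, then lowered palms-up, then raised.
raise_gesture = StagedGesture(
    stages=[
        ("palms_facing_camera", lambda h: h["palms_facing_camera"]),
        ("hands_lowered_palms_up", lambda h: h["palms_up"] and h["hands_low"]),
        ("hands_raised", lambda h: h["palms_up"] and not h["hands_low"]),
    ],
    on_stage_reached=lambda name: print("cue:", name),
    on_reset=print,
)
raise_gesture.update({"palms_facing_camera": True, "palms_up": False, "hands_low": False})
```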

What did not work
Our main problem with gestures was related to tracking limitations. When the palm becomes perpendicular to the camera during the gesture, tracking often stops and the gesture is canceled halfway through. We cannot control this, but we can at least inform users about the limitation.

Findings

Here are some things to keep in mind when creating feedback for input using gestures.

Virtual hands



Using virtual hands to interact with the environment in the game The Risen

Reaching into a virtual world and interacting with it as with our own is an unforgettable experience, and the level of immersion it creates is hard to compare with anything else. In The Risen, we let players reach their hands into the game world to open doors or trigger traps. Below I list some of the problems with hand-based interaction, describe our implementation of virtual hands in The Risen, and discuss how well it worked.

Problems found

Controlling virtual hands is very cool, but implementing such control with the SDK's built-in features comes with certain difficulties. Here are some of the problems you will have to solve one way or another.

Lots of noise in the data.
When you drive an on-screen hand with data from the SDK, you get a lot of noise. The SDK has smoothing algorithms, but they are far from eliminating it completely.

The data is not constrained by the real world.
Besides the noise, the tracked nodes (corresponding to the joints of the hand) are sometimes reported at positions that are physically impossible in the real world. They can also jump across the screen at absurd speed for a few frames, which happens when part of the hand is not visible (a simple filter for both problems is sketched after this list).

Small interactions are very difficult to perform and to detect.
We wanted to let players interact with objects that are relatively small compared with the size of the hand. But because of the significant noise in the data, the fuzzy sense of depth, and the absence of tactile feedback, this turned out to be almost impossible.
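A minimal sketch of the kind of filtering described above, assuming joint positions arrive in meters once per frame: frames where a joint moves faster than a hand physically can are discarded, and the rest are smoothed with an exponential moving average. The thresholds are tuning values, not SDK constants.

```python
# Sketch: taming noisy joint data with two simple filters - reject frames where
# a joint "teleports" faster than a hand can physically move, then apply an
# exponential moving average to what remains.

MAX_SPEED_M_PER_S = 6.0   # anything faster than this is treated as a glitch
SMOOTHING = 0.5           # 0 = no smoothing, closer to 1 = heavier smoothing

class JointFilter:
    def __init__(self):
        self.last = None

    def update(self, raw_pos, dt):
        """raw_pos: (x, y, z) in meters; returns the filtered position."""
        if self.last is None:
            self.last = raw_pos
            return raw_pos
        speed = sum((a - b) ** 2 for a, b in zip(raw_pos, self.last)) ** 0.5 / max(dt, 1e-6)
        if speed > MAX_SPEED_M_PER_S:
            return self.last   # ignore physically impossible jumps
        self.last = tuple(
            SMOOTHING * prev + (1.0 - SMOOTHING) * new
            for prev, new in zip(self.last, raw_pos)
        )
        return self.last

f = JointFilter()
f.update((0.10, 0.20, 0.50), dt=1 / 60)
print(f.update((0.90, 0.20, 0.50), dt=1 / 60))  # glitch frame is rejected
```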

Virtual hands in the game The Risen

In The Risen, players interact with the world through the hands of a ghostly skeleton. Using their hands, the player helps the skeletons in various ways, for example by opening doors or triggering traps for enemies. Implementing virtual hands taught us a great deal.

What worked

The interface of the game The Risen: display of the detected face and right hand

The first thing to note is the graphical user interface we created for The Risen. The skull in the upper-left corner represents the player and the inputs currently being tracked. When the system detects the hands, they are shown on screen in animated form, telling the player that the system recognizes them. It may seem very simple, but it is genuinely useful for the player to be able to tell what is working and what is not. For example, if the system detects the player's head but not their hands, it means the hands are outside the tracked volume.
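A trivial sketch of such an indicator, assuming the application has per-frame booleans for whether the head and each hand are currently tracked; the drawing callback is a placeholder for the engine's UI layer.

```python
# Sketch: a HUD that mirrors what the tracker currently sees, so the player can
# tell at a glance why nothing is responding (e.g. head found, hands lost).

def update_tracking_hud(head_tracked, left_hand_tracked, right_hand_tracked, draw_icon):
    draw_icon("head", lit=head_tracked)
    draw_icon("left_hand", lit=left_hand_tracked)
    draw_icon("right_hand", lit=right_hand_tracked)

# Example frame: the head is visible but both hands have left the tracked volume,
# which is exactly the situation the on-screen icons are meant to explain.
update_tracking_hud(True, False, False,
                    lambda name, lit: print(name, "on" if lit else "off"))
```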

To indicate which objects in the world can be manipulated with the hands, we show an icon that hovers over them when they first appear on screen and shows how they can be used. We wanted players both to know how to use specific things and to be able to discover the interactive possibilities of the environment on their own. Showing the icon only in the early stages of the game turned out to be a well-balanced solution.

I describe our initial approach to interacting with objects in the environment in the "What did not work" section below, but in the end we got what we wanted: a simple grab gesture using the whole hand worked acceptably well. To some extent this addressed the two problems mentioned above (the fuzzy sense of depth and the absence of tactile feedback) without significant harm to the game. It did, however, force us to be stricter about which objects can be interacted with in this way, because if two or more objects end up in the hand, all of them are affected at once.
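One way to avoid triggering several objects at once is to resolve each grab to the single closest interactable within reach, as in the sketch below; the scene query, radius, and object structure are assumptions for illustration.

```python
# Sketch: resolving a whole-hand grab to exactly one object, so that two
# interactables sitting close together are not triggered at the same time.

import math

GRAB_RADIUS = 0.15  # meters around the palm considered "in the hand"

def resolve_grab(palm_pos, interactables):
    """Return the single closest interactable within reach, or None."""
    best, best_dist = None, GRAB_RADIUS
    for obj in interactables:
        dist = math.dist(palm_pos, obj["position"])
        if dist < best_dist:
            best, best_dist = obj, dist
    return best

scene = [
    {"name": "door_lever", "position": (0.0, 1.0, 0.5)},
    {"name": "trap_switch", "position": (0.05, 1.02, 0.5)},
]
target = resolve_grab((0.01, 1.0, 0.5), scene)
print(target["name"])  # only the door lever is activated
```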

To show users that their hands are in the interaction state (a closed palm), we changed the color of the hands. This made using the hands similar to using buttons: there was an inactive state and an active state, and it was quite obvious what the application expected. From there, users could work out where interaction was possible.

What did not work
When we first thought about using hands to interact with the environment, we imagined movements such as pulling a chain or moving a book, as if the objects were right in front of the player. The problem was that these movements were very difficult to perform precisely. Grabbing a chain with your fingers when you cannot judge depth correctly and get no tactile feedback turned out to be a very difficult task with a huge number of failed attempts. More accurate tracking would soften the problem somewhat, but truly solving it would require a stereoscopic display and haptic feedback for the hands.

Findings

A brief summary of the main findings obtained when trying to use virtual hands.

Head Tracking in the Intel® RealSense™ SDK



Head tracking in the Intel RealSense SDK shows user head orientation

Suppose the application "knows" where the user's head is. So what? The benefit is not always obvious. Unlike with hands, it is probably unwise to build elaborate control schemes on head tracking, or users will not escape without dizziness. Nevertheless, limited use of head-position tracking can give an application a unique flavor and immerse users more deeply in the virtual world. Below I describe how we used head tracking in the Intel RealSense SDK and what we learned.






The Twilight Zone in Space Between


Head tracking in Space Between



Head tracking from the Intel RealSense SDK in Space Between



Findings


Speech recognition in the Intel RealSense SDK


The Intel RealSense SDK supports two modes of speech recognition: command and control, and dictation.





The Risen




Speech recognition in The Risen


We relied on the LABEL_SPEECH_BEGIN and LABEL_SPEECH_END alerts to know when the player started and stopped speaking, and used them to drive feedback.
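A minimal sketch of how those two alerts might be surfaced to the player as feedback; how the alerts are delivered to the application is assumed here, and the HUD is a stand-in for the game's real UI.

```python
# Sketch: turning speech begin/end alerts into on-screen feedback so the player
# knows the game is listening. A real application would register a handler with
# the SDK's speech module; here the alert labels simply arrive as strings.

def on_speech_alert(label, hud):
    if label == "LABEL_SPEECH_BEGIN":
        hud.show("Listening...")
    elif label == "LABEL_SPEECH_END":
        hud.show("Processing what you said...")

class ConsoleHud:
    def show(self, text):
        print(text)

hud = ConsoleHud()
on_speech_alert("LABEL_SPEECH_BEGIN", hud)  # player starts speaking
on_speech_alert("LABEL_SPEECH_END", hud)    # player stops; recognition runs
```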


The Risen



Findings




Intel RealSense technology and other natural user interfaces are a first step toward more natural interaction between people and computers. These technologies let us truly interact with the world in a new way. But this is only the beginning, and we, as developers and creators, are responsible for making sure the technology evolves in the right direction. One day computers will be like our best friends, able to anticipate our intentions before we even begin to express them. For now they are closer to pets: to tell us they need a walk, they need our help.

Source: https://habr.com/ru/post/275173/

