
The mouse has been used to control personal computers for more than 30 years, and it is hard to imagine computing without mice and keyboards. Yet methods of interacting with computer systems are constantly evolving, and consumers increasingly expect to control systems and applications in more intuitive, expressive ways. Fortunately, this is now easy to achieve with nothing more than a consumer-grade personal computer. Gesture control is developing rapidly in gaming, and Intel RealSense technology is one of the most advanced efforts in this direction; bringing gesture control to desktop PCs was only a matter of time.
This article describes a solution from the American company Ideum, the GestureWorks Fusion application, and its use of multimodal input to create a powerful, intuitive system capable of interpreting gestures and voice commands. It shows how Ideum's developers used the Intel RealSense SDK and the new Cursor mode to interact quickly and conveniently with traditional applications designed for keyboard and mouse. It also discusses the challenges the designers and developers faced and how they solved them by combining technologies from Intel and Ideum.
Introducing GestureWorks Fusion
GestureWorks Fusion is an application that uses the Intel RealSense SR300 camera for multimodal input such as gestures and voice commands. In its initial version, users can intuitively control websites that play streaming video, such as YouTube*. Using traditional graphical user interface controls, they can play, pause, and rewind video without touching the mouse, keyboard, or screen. Thanks to immediate feedback, the system is easy to use and quick to learn.
GestureWorks Fusion makes streaming websites such as YouTube easy and fun to use, with intuitive voice commands and gestures, on systems equipped with an Intel RealSense SR300 camera.
The Intel RealSense SR300 camera is an evolution of the Intel RealSense F200 camera, which was one of the first and most compact cameras with integrated 2D and depth imaging modules. Like the F200, the SR300 captures high-definition color images at 1080p and supports advanced three-dimensional capture, and its operating range has been increased. Together with its microphone, the camera is an ideal solution for head and hand tracking, as well as for face recognition. “What attracts us in the Intel RealSense SR300 camera is that it can do all of this at the same time, very quickly and exceptionally reliably,” explains Paul Lacey, technical director of Ideum and head of the GestureWorks development team.
GestureWorks Fusion builds on the capabilities of two existing Ideum products: GestureWorks Core and GestureWorks Gameplay 3. GestureWorks Gameplay 3 is a Microsoft Windows* application that adds touch controls to popular PC games. Users can create their own touch control layouts, share them with other users, or download layouts created by the community.
GestureWorks Core is a multimodal interaction engine that performs full three-dimensional analysis of head and hand gestures and supports interaction through multi-touch input and voice. The GestureWorks Core SDK features over 300 ready-to-use gestures and supports the most common programming languages, including C++, C#, Java*, and Python*.
Initially, GestureWorks Fusion was designed to work with Google Chrome* and Microsoft Internet Explorer* on Microsoft Windows 10, and it is expected to run on any system equipped with an Intel RealSense camera. The company also plans to extend the system to a wide range of applications, including games, office applications, and presentation software.
Problems and Solutions
Ideum's experts faced several challenges in making GestureWorks Fusion intuitive and easy to use, especially for new users. The developers already had experience building multi-touch tables and wall displays for public institutions, and they knew that users quickly become frustrated when technology does not behave as expected. Based on that experience, the designers decided to keep gestures as simple as possible and to focus on the most familiar behaviors.
GestureWorks Fusion uses a simple set of gestures tied directly to the application's user interface, providing access to popular existing applications without a traditional or touch interface.
The next set of difficulties arose from limitations in the operating system and the browser. Modern web browsers are not optimized for multimodal input. This makes it hard to determine, for example, the user's input focus, that is, the area of the screen the user intends to interact with. It also disrupts smooth movement between different parts of the interface, and even switching from one website to another. At the same time, it became obvious that scrolling and clicking could not simply be abandoned: these operations are fundamental to the desktop and are used in virtually all modern applications.
Moreover, for interfaces of this type it is important to be able to enable and disable gesture control intuitively. A person intuitively understands which gestures are meaningful and in what circumstances; an application, unlike a person, needs context to interpret gestures. In GestureWorks Fusion, raising a hand into the camera's field of view is enough to bring up the gesture-control interface; when the hand leaves the camera's view, the interface disappears. This approach is similar to the way extra information is shown when hovering the mouse over an element.
Multimodal input itself raises programming issues that affect the architecture and implementation of Ideum's software. For example, the application pairs each gesture with a voice command, which can cause conflicts. “Multimodal input needs careful consideration to succeed,” explains Lacey.
An equally important factor was response time, which must meet the standards already set by mice and keyboards; otherwise every operation becomes dramatically harder for users, who have to constantly correct their input. This means the response time should be no more than 30 ms, and ideally 6 ms, a figure Lacey calls the “Holy Grail of human-computer interaction.”
Finally, the Ideum developers faced the problem of customization. In GestureWorks Fusion, customization is mostly implicit, happening behind the scenes. “The system automatically adapts and changes, gradually improving usability as the product is used,” explains Lacey.
Using the Intel RealSense SDK
Developers access the capabilities of the Intel RealSense SR300 camera through the Intel RealSense SDK, a standard interface to an extensive library of pattern-detection and recognition algorithms. These include a number of useful capabilities, such as face recognition, gesture and speech recognition, and text-to-speech processing.
The system is divided into modules that let developers focus on different aspects of interaction. Some components, such as the SenseManager interface, coordinate common functions, including face and hand tracking, and control the multimodal pipeline, including I/O and processing. Other elements, such as the Capture and Image interfaces, let developers control the camera and work with captured images. The HandModule, FaceModule, and AudioSource interfaces provide access to hand tracking, face tracking, and audio input, as the sketch below illustrates.
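As a rough illustration, a minimal C++ sketch of this pipeline, based on the SDK's PXCSenseManager interface with the hand module enabled (error handling omitted; not Ideum's actual code), looks like this:

    #include "pxcsensemanager.h"
    #include "pxchanddata.h"

    int main()
    {
        // Create the pipeline and enable the hand-tracking module.
        PXCSenseManager* sm = PXCSenseManager::CreateInstance();
        sm->EnableHand();
        PXCHandModule* hand = sm->QueryHand();
        PXCHandData* handData = hand->CreateOutput();
        sm->Init();                                   // start streaming

        // Poll frames; each iteration refreshes the tracking output.
        while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
            handData->Update();
            // ...query joints and fired gestures from handData here...
            sm->ReleaseFrame();
        }
        sm->Release();
        return 0;
    }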
The Intel RealSense SDK simplifies integration by supporting many coding styles and techniques. Wrappers are provided for several common programming languages, platforms, and game engines, including C++, C#, Unity*, Processing, and Java. The SDK also offers limited support for browser-based applications written in JavaScript*. Because the SDK implements the complex human-computer interaction algorithms, developers can focus on improving the user experience rather than writing gesture- and speech-recognition code.
"Thanks to Intel solutions, development costs are significantly reduced," said Lacy. "Intel technologies take on an important part of the work, they guarantee the input and recognition of gestures, which greatly simplifies the tasks of developers, gives them the opportunity to confidently engage in new projects on the interaction between humans and computers."
Building the Solution
When creating GestureWorks Fusion, the Ideum developers applied a number of new techniques. Consider, for example, the problem of determining the user's focus. To solve it, they adopted the new Cursor mode, which first appeared in the Intel RealSense SDK 2016 R1 for Windows. Cursor mode provides a fast, accurate way to track a single point that corresponds to the overall position of the hand. This lets the system support a small set of gestures, such as clicking, opening and closing the palm, and rotating in either direction. Cursor mode also resolves the focus problem: the system interprets gesture input the same way it interprets mouse input.
Using the Cursor mode built into the Intel RealSense SDK, developers can easily mimic common desktop actions, such as mouse clicks.
With these gestures, users can navigate the application precisely and confidently and control it in mid-air, without touching the keyboard, mouse, or screen. Cursor mode helps in other areas as well. “We found, among other things, that not everyone gestures in the same way,” said Lacey. Cursor mode helps map similar gestures to the same context, which contributes to the overall reliability of the system.
The developers also emphasized how easy it was to add Cursor mode to existing prototypes, which made it possible to produce new builds of GestureWorks Fusion in just a few hours: only a few lines of code had to be added. For example, GestureWorks uses Cursor mode to obtain the coordinates of the pointer image and to synthesize mouse events, as in the following sketch.
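A minimal sketch of this pattern, assuming the cursor-module names from the 2016 R2 SDK headers (PXCHandCursorModule, PXCCursorData, and the related calls are assumptions here, not Ideum's code) and using the standard Win32 SendInput API for the mouse events:

    #include <windows.h>
    #include "pxcsensemanager.h"
    #include "pxchandcursormodule.h"   // assumed 2016 R2 naming

    void RunCursorToMouse()
    {
        PXCSenseManager* sm = PXCSenseManager::CreateInstance();
        sm->EnableHandCursor();                    // assumption: 2016 R2 API
        PXCHandCursorModule* module = sm->QueryHandCursor();
        PXCCursorData* data = module->CreateOutput();
        sm->Init();

        while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
            data->Update();
            PXCCursorData::ICursor* cursor = nullptr;
            if (data->QueryCursorData(PXCCursorData::ACCESS_ORDER_BY_TIME,
                                      0, cursor) >= PXC_STATUS_NO_ERROR) {
                // Adaptive point is normalized to [0,1] on each axis.
                PXCPoint3DF32 p = cursor->QueryAdaptivePoint();
                INPUT in = {};
                in.type = INPUT_MOUSE;
                in.mi.dwFlags = MOUSEEVENTF_MOVE | MOUSEEVENTF_ABSOLUTE;
                in.mi.dx = (LONG)(p.x * 65535.0f); // absolute range is 0..65535
                in.mi.dy = (LONG)(p.y * 65535.0f);
                SendInput(1, &in, sizeof(INPUT));  // synthesize the mouse move
            }
            sm->ReleaseFrame();
        }
        sm->Release();
    }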
After that, the window holding the user's focus can be determined quickly using the standard Windows API, for example along these lines.
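A hedged sketch using only standard Win32 calls (not necessarily how GestureWorks does it):

    #include <windows.h>

    // Resolve the top-level window under the synthesized pointer.
    HWND FocusWindowUnderPointer()
    {
        POINT pt;
        GetCursorPos(&pt);                    // where the pointer now is
        HWND hwnd = WindowFromPoint(pt);      // may return a child control
        return GetAncestor(hwnd, GA_ROOT);    // walk up to the top-level window
    }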
In Cursor mode, tracking runs twice as fast as full-hand tracking while consuming half the power. “Ease of use is about shaping the expected results in the most predictable way possible,” explains Lacey. “When a very high level of confidence in gestures is achieved, it lets you focus on refining other areas of the user interaction; it also lowers development costs and allows greater results with fewer resources.”
To support multimodal input, GestureWorks uses the Microsoft Speech API (SAPI), which provides capabilities not found in the Intel RealSense SDK, such as partial hypotheses. This makes it possible to pair each gesture with a corresponding voice command, as shown in the following code fragment.
    ISpRecognizer* recognizer;
    ISpRecoContext* context;
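The fragment above only declares the SAPI interfaces. A fuller, hedged sketch (a minimal setup, not Ideum's actual code) of a shared recognizer configured to report partial hypotheses via the SPEI_HYPOTHESIS event as well as final recognitions:

    #include <windows.h>
    #include <sapi.h>

    ISpRecognizer*  recognizer = nullptr;
    ISpRecoContext* context    = nullptr;
    ISpRecoGrammar* grammar    = nullptr;

    void InitSpeech()
    {
        CoInitialize(nullptr);
        CoCreateInstance(CLSID_SpSharedRecognizer, nullptr, CLSCTX_ALL,
                         IID_ISpRecognizer, (void**)&recognizer);
        recognizer->CreateRecoContext(&context);

        // Subscribe to partial hypotheses as well as final recognitions.
        context->SetInterest(SPFEI(SPEI_HYPOTHESIS) | SPFEI(SPEI_RECOGNITION),
                             SPFEI(SPEI_HYPOTHESIS) | SPFEI(SPEI_RECOGNITION));

        // A real command grammar would be loaded here; dictation is used
        // only to keep the sketch self-contained.
        context->CreateGrammar(1, &grammar);
        grammar->LoadDictation(nullptr, SPLO_STATIC);
        grammar->SetDictationState(SPRS_ACTIVE);
    }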
Parallelization is used to recognize the user's intent, allowing interaction and feedback to happen almost simultaneously at 60 frames per second. “Efficient use of multithreaded processing enabled us to reduce response times,” says Lacey. “Multithreading expanded our capabilities; we were able to achieve results whose feasibility we were not even sure of, while keeping latency low.”
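Purely as an illustration (the article does not show Ideum's threading code), one common shape for such a pipeline puts each recognizer on a worker thread that posts events into a shared queue, which the 60 fps UI loop drains once per frame:

    #include <mutex>
    #include <vector>

    // Hypothetical event type: a gesture or a voice command.
    struct InputEvent { enum Kind { Gesture, Voice } kind; int id; };

    std::mutex              queueLock;
    std::vector<InputEvent> pending;

    // Called by the gesture and speech worker threads.
    void Post(const InputEvent& e)
    {
        std::lock_guard<std::mutex> guard(queueLock);
        pending.push_back(e);
    }

    // Called by the UI thread roughly every 16.7 ms (60 fps).
    void UiFrame()
    {
        std::vector<InputEvent> batch;
        {
            std::lock_guard<std::mutex> guard(queueLock);
            batch.swap(pending);           // grab all events at once
        }
        // ...fuse gesture and voice events, then update the interface...
    }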
The Ideum developers also sought to describe and formalize gesture-based interaction more fully by developing an extended XML configuration format called Gesture Markup Language (GML). Using GML, they were able to build a complete library of gestures for human-computer interaction tasks. This helped them avoid excessive complexity in the gesture-recognition algorithms, since the input space for motion tracking and multi-touch control can span thousands of variations.
“The impact of multimodal interaction with the Intel RealSense camera can be described in one word: context,” said Lacey. “We can recognize a new level of context, opening up fundamentally new possibilities for human-computer interaction.”
Next steps
The Ideum developers plan to keep evolving GestureWorks Fusion, adding support for additional applications, including office suites, graphics applications, and computer-aided design systems, where three-dimensional gestures will be used to manipulate virtual objects. GestureWorks could also run on tablets that support Intel RealSense, in home entertainment systems, and even in cars, as well as alongside other technologies in solutions far removed from traditional desktops and laptops.
Other systems may follow in the future, including virtual, augmented, and mixed reality solutions. The same applies to the Internet of Things, where new interaction models will let users create their own unique spaces.
“While working on GestureWorks Fusion, we discovered new ways to interact in a modern environment,” explains Lacey. “But whatever the environment, it should be possible to simply control the device with gestures and speech and choose the desired sequence of actions, without being forced to control the device in the traditional way, like a computer.”
Resources
Visit the Intel Developer Zone to get started with Intel RealSense technology.
Learn more about Ideum, which developed GestureWorks.
Download the Intel RealSense SDK.