
Developing games for children with Intel Perceptual Computing: the Clifford Adventures example


This is an abridged translation of an article about the development of "The Adventures of Clifford", a series of interactive educational games for young children from Scholastic Interactive. The game's varied use of gestures and voice was made possible by the Intel Perceptual Computing SDK 2013 together with the Creative Senz3D camera. The article also discusses new methods of gesture and voice recognition based on perceptual computing, and ways of working around limitations of the SDK.

Educational game concept

In a series of four interactive episodes about Clifford, players follow the story and interact with it. The game draws children into the action, offering various ways to "help" Clifford with particular gestures and phrases. Thanks to Scholastic's interactive technology, Clifford responds to children's voices and movements. As the story unfolds, children watch animated excerpts of each adventure and actively help the characters by touching the screen or answering questions. The plot advances as the child interacts with the game. Each game is designed to develop basic literacy skills and can be replayed as many times as desired.
The Intel Perceptual Computing SDK 2013 includes APIs, code samples, and tutorials on interpreting gestures and speech. Developers can easily combine the SDK's capabilities in speech recognition, hand and finger gestures, facial expressions, augmented reality, and background subtraction to create software for a variety of devices. Using the microphone, camera, touch screen, positioning, and geolocation features common on tablets, laptops, convertibles, and all-in-one computers makes new applications perceptually richer.

Intel Perceptual Computing Platform Development

Adapting perceptual computing to the movements and voices of children presents a number of difficulties. Scholastic tested each prototype comprehensively to evaluate the game design and verify that its levels could realistically be completed. This helped identify potential problems the target audience might face and find solutions for them.
Some aspects of this work are of particular interest from the perceptual computing point of view. They are described below.

Voice recognition calibration
To achieve acceptable voice recognition quality, a series of checks was needed. A child's voice changes with age, especially over the age range the Clifford series is aimed at. Calibration therefore had to reach a level at which children's voices and speech patterns were recognized correctly.
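As an illustration only (this is not the Scholastic code, and all names here are hypothetical), such a calibration pass might collect the recognizer's confidence scores while the child repeats a known prompt, then derive a per-child acceptance threshold:

#include <algorithm>
#include <vector>

// Hypothetical calibration sketch: the child repeats a known prompt several
// times; we record the confidence the recognizer reported for each attempt
// and derive a threshold lenient enough for that particular voice.
class VoiceCalibrator {
public:
    // Called once per calibration attempt with the recognizer's
    // confidence (0-100) for the expected word.
    void AddSample(int confidence) { m_samples.push_back(confidence); }

    // Threshold = slightly below the child's median confidence, clamped
    // so that background noise is still rejected.
    int DeriveThreshold() const {
        std::vector<int> s(m_samples);
        std::sort(s.begin(), s.end());
        int median = s.empty() ? 50 : s[s.size() / 2];
        return std::max(30, median - 10);
    }

private:
    std::vector<int> m_samples;
};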

A game episode that requires the player to speak

Gesture recognition and localization
In one of the Clifford Adventures games, the child must help the dog catch toys falling from a tree. To do this, the player "grabs" a basket on the screen with a hand and moves it in different directions.
Special algorithms were developed to recognize gestures and correlate them with touch coordinates, so that the basket on the screen moves in the direction of the child's hand. Young players took part in the testing with pleasure. The developers initially assumed, mistakenly, that a child's gestures for holding an object on the screen would not differ much from an adult's. Working with children forced them to revise the design so that the game could handle fuzzy movements. Teaching the sensors to understand a child's sweeping, often erratic and chaotic gestures, consisting of many touches, was not easy. It took considerable work to define gesture prototypes and select their most common configurations. The touch registration area was expanded so that even an imprecise gesture is recognized and triggers the desired application response, as the sketch below illustrates.
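A minimal sketch of the expanded registration area (a hypothetical helper, not the shipped game code): the hit test inflates the target radius by a tolerance factor so that an imprecise grab still counts as a hit.

#include <cmath>

struct Point { float x, y; };

// Returns true if the hand lands anywhere inside the inflated radius
// around the target, e.g. tolerance = 1.5f for a 50% larger hit area.
bool HitTestWithTolerance(const Point& hand, const Point& target,
                          float targetRadius, float tolerance)
{
    float dx = hand.x - target.x;
    float dy = hand.y - target.y;
    return std::sqrt(dx * dx + dy * dy) <= targetRadius * tolerance;
}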
For example, in another mini-game children help Clifford pull weeds from the garden. Instead of forcing players to take a weed and move a hand upward to pull it out, the developers chose a grab-and-release palm movement to denote pulling and throwing away.
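A grab-and-release gesture like this can be modeled as a tiny state machine. The sketch below is our own illustration, not the game's code; the handIsOpen flag is assumed to come from the SDK's hand-openness data.

enum class WeedState { Idle, Grabbed };

class WeedPullTracker {
public:
    // Returns true when a full grab-then-release cycle has completed.
    bool Update(bool handIsOpen, bool handOverWeed) {
        switch (m_state) {
        case WeedState::Idle:
            if (!handIsOpen && handOverWeed)  // palm closed over the weed: pull
                m_state = WeedState::Grabbed;
            return false;
        case WeedState::Grabbed:
            if (handIsOpen) {                 // palm reopened: weed thrown away
                m_state = WeedState::Idle;
                return true;
            }
            return false;
        }
        return false;
    }
private:
    WeedState m_state = WeedState::Idle;
};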

Below is a fragment of the game code that calibrates the player's gestures in a training exercise where the ball must be rotated with the hands. In the episode shown in the figure below, exponential smoothing was used for more precise control of the object and smoother movement. It filters out, or at least approximately accounts for, the player's random movements that the program should ignore.


Ball rotation

void TutorialActivity::MoveHandHandler(Hand^ hand)
{
    // Convert the hand's normalized position to screen coordinates.
    D2D_POINT_2F normalizedTouchPos = { hand->x * GetWidth(), hand->y * GetHeight() };

    // Exponential smoothing: keep 90% of the previous position and blend in
    // 10% of the new one to filter out the player's random movements.
    float newX = m_gestureBallSpin->GetPosition().x * 0.9f + normalizedTouchPos.x * 0.1f;
    float newY = m_gestureBallSpin->GetPosition().y * 0.9f + normalizedTouchPos.y * 0.1f;
    m_gestureBallSpin->SetPosition(newX, newY);

    // Distance between the ball and the on-screen hand marker.
    float x = m_gestureBallSpin->GetPosition().x - m_EEhand->GetPosition().x;
    float y = m_gestureBallSpin->GetPosition().y - m_EEhand->GetPosition().y;
    if (sqrt(x * x + y * y) < 400)
    {
        SetTutorialState(TUTORIAL_MOVEYOURHAND_DONE);

        // There it is: fade out the hand marker.
        m_EEhand->FadeTo(0, 0.5f);

        if (!m_tutorialIsStopping)
        {
            m_moveTutorial[5]->Play([this](SoundInstance^, bool reachedEnd)
            {
                m_gestureBallSpin->MoveTo(GetWidth() * 0.5f, -GetHeight(), 0.5f);
                GoToSprinkleHandState();
            });
        }
    }
}


Troubleshooting the Intel Perceptual Computing SDK

Intel's SDK gives a real effect of immersion in the game: players get an immediate response from the program to their actions, which creates a sense of physical participation in what is happening. However, the developers ran into some limitations in recognizing children's complex movements and vocal responses.

Gestures
The camera that perceives gestures is focused at a distance of about 60–90 cm. Small movements are therefore captured better than sweeping or complex movements that go beyond this range. The optimal set of gestures was determined by trial and error. The developers also had to account for varying environmental conditions, lighting, and distance to the camera.
From the point of view of the SDK, APIs, and other technologies used, developing the initial gestures is easy thanks to the tutorials, code samples, and frameworks supplied with the SDK. After setting up the development environment, you can work through a tutorial exercise, such as finger tracking, to study how the sensors and the SDK code interact.

 #include "gesture render.h" #include "pxcgesture.h" class GesturePipeline: public UtilPipeline { public: GesturePipeline (void):UtilPipeline(),m_render(L"Gesture Viewer") { EnableGestureO ; } virtual void PXCAPI OnGesture(PXCGesture::Gesture *data) { if (data->active) m_gdata = (*data); } virtual void PXCAPI OnAlert(PXCGesture::Alert *data) { switch (data->label) { case PXCGesture::Alert::LABEL_FOV_TOP: wprintf_s(L"******** Alert:   .\n"); break; case PXCGesture::Alert::LABEL_FOV_BOTTOM: wprintf_s(L"******** Alert:   .\n"); break; case PXCGesture::Alert::LABEL_FOV_LEFT: wprintf_s(L"******** Alert:   .\n"); break; case PXCGesture::Alert::LABEL_FOV_RIGHT: wprintf_s(L"******** Alert:   .\n"); break; } } virtual bool OnNewFrame(void) { return m_render.RenderFrame(Querylmage(PXCImage::IMAGE TYPE DEPTH),QueryGesture() ,&m gdata); } protected: GestureRender m render; PXCGesture::Gesture m gdata; }; 


Programmers found that the SDK lacked a mapping between the different coordinate systems used for gestures, and they had to fill the gap with their own code.


Visual gesture coordinate chart
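
A minimal sketch of such a mapping (our own illustration, not SDK code) converts the camera's normalized hand coordinates into screen space. Whether the horizontal axis needs mirroring depends on the camera's mirror mode, so the flip here is only an assumption.

struct ScreenPoint { float x, y; };

// normX/normY are in [0, 1] as reported for the camera image.
ScreenPoint CameraToScreen(float normX, float normY,
                           float screenW, float screenH)
{
    ScreenPoint p;
    p.x = (1.0f - normX) * screenW;  // mirror horizontally: the camera faces the player
    p.y = normY * screenH;
    return p;
}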

Initially, the development team used the node[8].positionImage.x/y approach, ignoring the depth data, since it was not needed to interpret gestures. Later a better approach was found: the depth image was used and the pixel nearest to the camera was located, from which the gesture position was derived effectively. Exponential smoothing was then added, as sketched below.
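The sketch below reconstructs that idea under our own assumptions (a raw 16-bit depth buffer where zero means "no data"); it is not the original game code.

#include <cstdint>

struct HandEstimate { int x = 0, y = 0; float sx = 0.0f, sy = 0.0f; };

// Scan the depth image for the pixel closest to the camera, treat it as
// the hand tip, then exponentially smooth the result across frames.
void UpdateNearestPixel(const uint16_t* depth, int width, int height,
                        HandEstimate& est, float alpha = 0.1f)
{
    uint16_t best = 0xFFFF;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            uint16_t d = depth[y * width + x];
            if (d > 0 && d < best) {   // zero means "no depth data"
                best = d;
                est.x = x;
                est.y = y;
            }
        }
    }
    // Exponential smoothing suppresses frame-to-frame jitter.
    est.sx = est.sx * (1.0f - alpha) + est.x * alpha;
    est.sy = est.sy * (1.0f - alpha) + est.y * alpha;
}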

Voice recognition
Voice recognition in the game depended heavily on the device and the scene. On some devices and in some situations it worked well; under other conditions it did not work at all.
The game should prompt children with the command to pronounce so that the microphone can pick it up, and the feature must work even against background noise and the game's musical accompaniment. Voice recognition can operate in dictation mode, where the program tries to determine whatever you have said, or in dictionary mode, where what is said is matched against a vocabulary defined for the given game.
At first, the experts tried the first mode, configured to record any sounds, reasoning that young children's speech is not always clearly articulated. The results were unsatisfactory, so it was decided to switch to dictionary mode, which works well when words are pronounced distinctly. The developers tried adding word variations to the dictionary to increase the likelihood of recognition (for example, tractor, tlaktol, teaktol). However, dictionary mode did not give the expected results either: the more entries in the dictionary, the higher the probability of error. A compromise had to be found between the size of the word list and the potential error rate. In the final version, the list of admissible words was minimized so that the child could interact with the game simply. A sketch of the variant-dictionary idea follows.
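The variant-dictionary idea can be illustrated with a small lookup table (hypothetical names and words, not the game's actual list): several child-like pronunciations map to one canonical command, and the list stays small to keep the error rate down.

#include <initializer_list>
#include <map>
#include <string>

class CommandDictionary {
public:
    CommandDictionary() {
        // Each variant maps to the canonical word the game reacts to.
        Add("tractor", {"tractor", "tlaktol", "teaktol"});
        Add("dig",     {"dig", "dih"});
    }

    // Returns the canonical command, or an empty string if unrecognized.
    std::string Lookup(const std::string& heard) const {
        auto it = m_variants.find(heard);
        return it != m_variants.end() ? it->second : std::string();
    }

private:
    void Add(const std::string& canonical,
             std::initializer_list<const char*> variants) {
        for (const char* v : variants) m_variants[v] = canonical;
    }
    std::map<std::string, std::string> m_variants;
};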

Conclusion

The testing phase was fun. The developers gained valuable experience working with children, the application's end users. It was even more pleasant to see the finished game in use. One of our senior specialists showed it to her three-year-old daughter, and we were all delighted to hear that the girl played "The Adventures of Clifford" with great interest and enthusiasm.
Now Scholastic can't wait to apply this technology in new projects. Together with Symbio, we are working on a new game based on the Intel RealSense 3D SDK, planned for release this fall. Intel RealSense technology, announced at CES 2014, is the new incarnation of Intel Perceptual Computing: an SDK with an intuitive user interface and recognition of speech, gestures, hand movements, and facial expressions, combined with improved 3D cameras. Intel RealSense gives developers additional capabilities such as scanning, editing, 3D printing, and augmented reality. With them, users can manipulate scanned 3D objects using the latest touch-free control technology.

Source: https://habr.com/ru/post/237227/
