
Manage Drones with Intel RealSense SDK Speech Recognition Applications



Drones, that is, unmanned aerial vehicles, are in the news literally every day. They have a wide range of applications: reconnaissance and combat operations, photo and video shooting, and simple entertainment. Drone technology is still quite new and well worth exploring.

Developers can create their own drone control applications. A drone is ultimately just a programmable device, so you can connect to it and send it commands from ordinary PC and smartphone applications. For this article, I chose one of the drones with the richest programming features: Parrot's AR.Drone 2.0.
We will learn how to interact with and manage such a drone using a library written in C#. On this basis, we will then add voice commands to control the drone using the Intel RealSense SDK.

Parrot's AR.Drone 2.0 is one of the most interesting drones on the enthusiast market. It is rich in features and includes an integrated assistance system with stabilization and calibration interfaces. The drone is fitted with a protective frame of durable polystyrene that shields the propeller blades and moving parts in case of a fall or a collision with fixed obstacles.


AR.Drone* 2.0 by Parrot

The drone connects to external devices (smartphones, tablets, PCs) over its own Wi-Fi* network. The communication protocol is based on AT-like messages (similar commands were used years ago to program dial-up modems over the telephone network).

Using this simple protocol, you can send the drone all the commands needed for takeoff, ascent and descent, and flight in different directions. You can also read the image stream from the drone's two HD cameras (one pointing forward, the other pointing down) to save photos taken in flight or to record video.
The manufacturer provides several applications for piloting the drone manually, but it is much more interesting to learn how to achieve autonomous flight control. For this, I decided (with the assistance of my colleague Marco Minerva) to create an interface that allows controlling the drone from different devices.

Controlling the Drone from Software


The drone creates its own Wi-Fi network, so we connect to it to send control commands. All the necessary information can be found in the AR.Drone 2.0 Developer Guide. For example, the guide says that commands must be sent over UDP to IP address 192.168.1.1, port 5556. These are simple strings in the AT format:
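Here are a few representative commands reconstructed from the Developer Guide for illustration (every command carries a sequence number, shown here as <seq>; the numeric arguments are explained below):

  AT*REF=<seq>,290718208                             (takeoff)
  AT*REF=<seq>,290717696                             (land)
  AT*PCMD=<seq>,<flag>,<roll>,<pitch>,<gaz>,<yaw>    (progressive movement)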


After connecting to the drone, we will create a kind of "game loop" in which we send commands to the drone based on the application's input data. Let's start by creating a class library.

First you need to connect to the device.

// Connection parameters from the AR.Drone 2.0 Developer Guide (see above).
private const string HOST_NAME = "192.168.1.1";
private const string REMOTE_PORT = "5556";

private static DatagramSocket udpSocket;
private static DataWriter udpWriter;
private static uint sequenceNumber = 1;

public static async Task ConnectAsync(string hostName = HOST_NAME, string port = REMOTE_PORT)
{
    // Set up the UDP connection.
    var droneIP = new HostName(hostName);

    udpSocket = new DatagramSocket();
    await udpSocket.BindServiceNameAsync(port);
    await udpSocket.ConnectAsync(droneIP, port);
    udpWriter = new DataWriter(udpSocket.OutputStream);

    // The first byte only initializes the connection; the drone discards it.
    udpWriter.WriteByte(1);
    await udpWriter.StoreAsync();

    var loop = Task.Run(() => DroneLoop());
}

As mentioned earlier, we use the UDP protocol, so we need a DatagramSocket object. After connecting with ConnectAsync, we create a DataWriter on the output stream to send commands. Finally, we send the first byte over Wi-Fi; it serves only to initialize the system and is discarded by the drone.

Now let's look at the loop that prepares and sends commands to the drone.

private static async Task DroneLoop()
{
    while (true)
    {
        var commandToSend = DroneState.GetNextCommand(sequenceNumber);
        await SendCommandAsync(commandToSend);

        sequenceNumber++;
        await Task.Delay(30);
    }
}

The DroneState.GetNextCommand method formats the AT command string that needs to be sent to the device. This requires a sequence number: the drone expects each command to be accompanied by one, and it ignores any command whose number is less than or equal to the number of a command it has already received.

After that, we use WriteString to write the command to the DatagramSocket's output stream, and StoreAsync buffers and sends it. Finally, we increment the sequence number and use Task.Delay to wait 30 milliseconds before the next iteration.
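SendCommandAsync itself can be as simple as the following sketch (assuming the udpWriter field created in ConnectAsync):

private static async Task SendCommandAsync(string command)
{
    // WriteString queues the AT string; StoreAsync flushes it to the drone.
    udpWriter.WriteString(command);
    await udpWriter.StoreAsync();
}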
The DroneState class defines which command to send.

public static class DroneState
{
    public static double StrafeX { get; set; }
    public static double StrafeY { get; set; }
    public static double AscendY { get; set; }
    public static double RollX { get; set; }
    public static bool Flying { get; set; }
    public static bool isFlying { get; set; }

    internal static string GetNextCommand(uint sequenceNumber)
    {
        // Determine whether the drone needs to take off or land.
        if (Flying && !isFlying)
        {
            isFlying = true;
            return DroneMovement.GetDroneTakeoff(sequenceNumber);
        }
        else if (!Flying && isFlying)
        {
            isFlying = false;
            return DroneMovement.GetDroneLand(sequenceNumber);
        }

        // If the drone is flying, send movement commands to it.
        if (isFlying && (StrafeX != 0 || StrafeY != 0 || AscendY != 0 || RollX != 0))
            return DroneMovement.GetDroneMove(sequenceNumber, StrafeX, StrafeY, AscendY, RollX);

        return DroneMovement.GetHoveringCommand(sequenceNumber);
    }
}

The StrafeX, StrafeY, AscendY, and RollX properties determine, respectively, the speed of movement left and right, forward and backward, the rate of ascent, and the angle of rotation of the drone. These properties are of type Double, with valid values from -1 to 1. For example, if you set the StrafeX property to -0.5, the drone moves to the left at half its maximum speed; if you set it to 1, the drone flies to the right at full speed.
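For example, a short hypothetical usage, following the text's convention that negative StrafeX means left:

// Strafe left at half the maximum speed for half a second, then stop.
DroneState.StrafeX = -0.5;
await Task.Delay(500);
DroneState.StrafeX = 0;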

The Flying property controls takeoff and landing. In the GetNextCommand method, we check the values of these fields to determine which command to send to the drone. The commands themselves are produced by the DroneMovement class.
Note that if no command is pending, the last line creates the so-called hovering command. This is an "empty" command that keeps the communication channel between the drone and the device open: the drone must constantly receive messages from the application managing it, even when no action is needed and nothing has changed.

The most interesting method of the DroneMovement class is GetDroneMove, which actually composes the movement commands sent to the drone. The other motion-related helpers follow the same pattern; a hedged sketch of them appears after the next snippet.

public static string GetDroneMove(uint sequenceNumber, double velocityX, double velocityY,
    double velocityAscend, double velocityRoll)
{
    var valueX = FloatConversion(velocityX);
    var valueY = FloatConversion(velocityY);
    var valueAscend = FloatConversion(velocityAscend);
    var valueRoll = FloatConversion(velocityRoll);

    var command = string.Format("{0},{1},{2},{3}", valueX, valueY, valueAscend, valueRoll);
    return CreateATPCMDCommand(sequenceNumber, command);
}

private static string CreateATPCMDCommand(uint sequenceNumber, string command, int mode = 1)
{
    return string.Format("AT*PCMD={0},{1},{2}{3}", sequenceNumber, mode, command, Environment.NewLine);
}
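For completeness, here is a hedged sketch of the takeoff, landing, and hovering helpers called from DroneState. It assumes the AT*REF control values documented in the AR.Drone 2.0 Developer Guide (bit 9 of the value selects takeoff) and reuses CreateATPCMDCommand with flag 0 for hovering; the library's actual implementation may differ in detail.

public static string GetDroneTakeoff(uint sequenceNumber)
{
    // 290718208 = base control value with bit 9 (takeoff) set.
    return string.Format("AT*REF={0},290718208{1}", sequenceNumber, Environment.NewLine);
}

public static string GetDroneLand(uint sequenceNumber)
{
    // 290717696 = the same base value with the takeoff bit cleared.
    return string.Format("AT*REF={0},290717696{1}", sequenceNumber, Environment.NewLine);
}

public static string GetHoveringCommand(uint sequenceNumber)
{
    // Flag (mode) 0 with zeroed velocities tells the drone to hover in place.
    return CreateATPCMDCommand(sequenceNumber, "0,0,0,0", 0);
}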

The FloatConversion method is not shown here; it converts a Double value in the range from -1 to 1 into the signed integer representation that AT commands such as the PCMD string expect for motion control.
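A minimal sketch of what FloatConversion can look like, assuming the convention from the Developer Guide that a velocity is transmitted as the signed 32-bit integer with the same bit pattern as the 32-bit float:

private static string FloatConversion(double value)
{
    // Reinterpret the bits of the 32-bit float as a signed 32-bit integer.
    int intValue = BitConverter.ToInt32(BitConverter.GetBytes((float)value), 0);
    return intValue.ToString();
}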

The code shown here is available as a free library on NuGet (the AR.Drone 2.0 Interaction Library). The library provides everything you need to control the drone, from takeoff to landing.


The AR.Drone UI library on the NuGet website

Thanks to this library, you can forget about the intricacies of the implementation and focus on creating applications that let you fly the drone using different methods of interaction.

Intel RealSense SDK


Now let's look at one of the most interesting and, for me, easiest-to-use features of the Intel RealSense SDK: speech recognition.

The SDK supports two approaches to speech recognition.


The first approach uses a list of commands, defined by the application in a given language, that is processed by the recognizer. Any words not in the list are ignored.

The second approach works like a dictation machine that "understands" arbitrary free-form text. It is ideal for transcription, automatic subtitling, and so on.

In this project we use the first approach, since we only need to support a finite set of commands to send to the drone.
First you need to define some variables.

private PXCMSession Session;
private PXCMSpeechRecognition SpeechRecognition;
private PXCMAudioSource AudioSource;
private PXCMSpeechRecognition.Handler RecognitionHandler;

Session is the object required for accessing the SDK's I/O and algorithm modules; all subsequent objects are created from this instance.
SpeechRecognition is an instance of the recognition module, created by the CreateImpl function within the Session.
AudioSource is the device interface that lets you enumerate and select an audio input device (in our code example, for simplicity, we select the first available audio device).
RecognitionHandler is the handler object to which we attach a delegate for the OnRecognition event.

Now we initialize the session, the AudioSource and the SpeechRecognition instance.

Session = PXCMSession.CreateInstance();
if (Session != null)
{
    // Session is a PXCMSession instance.
    AudioSource = Session.CreateAudioSource();

    // Scan and enumerate the audio devices.
    AudioSource.ScanDevices();

    PXCMAudioSource.DeviceInfo dinfo = null;
    for (int d = AudioSource.QueryDeviceNum() - 1; d >= 0; d--)
    {
        // The loop ends on device 0, so the first available device is kept.
        AudioSource.QueryDeviceInfo(d, out dinfo);
    }
    AudioSource.SetDevice(dinfo);

    Session.CreateImpl<PXCMSpeechRecognition>(out SpeechRecognition);

    // ... (configuration continues in the following snippets) ...
}

As noted earlier, for simplicity of code, we select the first available audio device.

PXCMSpeechRecognition.ProfileInfo pinfo;
SpeechRecognition.QueryProfile(0, out pinfo);
SpeechRecognition.SetProfile(pinfo);

Then we query the module for its current configuration and store it in the pinfo variable.

A number of parameters in the profile can also be configured: you can change the recognition language, set the recognition confidence level (a higher value requires more confident recognition), adjust the end-of-recognition interval, and so on.

In our case, we keep the default parameters of profile 0 (obtained from QueryProfile).
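As an example, here is a hedged sketch of selecting the language explicitly before applying the profile, assuming the SDK's LanguageType enumeration (language packs other than US English must be installed separately):

PXCMSpeechRecognition.ProfileInfo pinfo;
SpeechRecognition.QueryProfile(0, out pinfo);

// Select US English explicitly (the default on a standard install).
pinfo.language = PXCMSpeechRecognition.LanguageType.LANGUAGE_US_ENGLISH;
SpeechRecognition.SetProfile(pinfo);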

String[] cmds = new String[]
{
    "Takeoff", "Land", "Rotate Left", "Rotate Right", "Advance", "Back",
    "Up", "Down", "Left", "Right", "Stop", "Dance"
};
int[] labels = new int[] { 1, 2, 4, 5, 8, 16, 32, 64, 128, 256, 512, 1024 };

// Build the grammar.
SpeechRecognition.BuildGrammarFromStringList(1, cmds, labels);

// Set the active grammar.
SpeechRecognition.SetGrammar(1);

Next we define the command dictionary that the recognition system should learn. Using BuildGrammarFromStringList, we create a simple list of commands and their corresponding return values, defining it as grammar number 1.

You can define several grammars in an application and activate any one of them when needed, so you can create separate command dictionaries for all supported languages and let the user switch between the languages recognized by the SDK. In that case you need to install the corresponding language support DLL files, because the SDK installer only installs English (US) support by default. In this example, we use a single grammar together with the default English (US) language.

Then we choose which grammar to designate as active in the SpeechRecognition instance.

RecognitionHandler = new PXCMSpeechRecognition.Handler();
RecognitionHandler.onRecognition = OnRecognition;

These instructions create a new handler for the OnRecognition event and assign it to the method shown below.
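The last step is to start recognition on the selected audio source with this handler; the SDK samples do it along these lines (StopRec stops it later):

SpeechRecognition.StartRec(AudioSource, RecognitionHandler);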

public void OnRecognition(PXCMSpeechRecognition.RecognitionData data)
{
    var RecognizedValue = data.scores[0].label;
    double movement = 0.3;
    // 500 milliseconds (the original code passed 500 as the seconds argument,
    // which was almost certainly unintended).
    TimeSpan duration = new TimeSpan(0, 0, 0, 0, 500);

    switch (RecognizedValue)
    {
        case 1:
            DroneState.TakeOff();
            WriteInList("Takeoff");
            break;
        case 2:
            DroneState.Land();
            WriteInList("Land");
            break;
        case 4:
            DroneState.RotateLeftForAsync(movement, duration);
            WriteInList("Rotate Left");
            break;
        case 5:
            DroneState.RotateRightForAsync(movement, duration);
            WriteInList("Rotate Right");
            break;
        case 8:
            DroneState.GoForward(movement);
            Thread.Sleep(500);
            DroneState.Stop();
            WriteInList("Advance");
            break;
        case 16:
            DroneState.GoBackward(movement);
            Thread.Sleep(500);
            DroneState.Stop();
            WriteInList("Back");
            break;
        case 32:
            DroneState.GoUp(movement);
            Thread.Sleep(500);
            DroneState.Stop();
            WriteInList("Up");
            break;
        case 64:
            DroneState.GoDown(movement);
            Thread.Sleep(500);
            DroneState.Stop();
            WriteInList("Down");
            break;
        case 128:
            DroneState.StrafeX = .5;
            Thread.Sleep(500);
            DroneState.StrafeX = 0;
            WriteInList("Left");
            break;
        case 256:
            DroneState.StrafeX = -.5;
            Thread.Sleep(500);
            DroneState.StrafeX = 0;
            WriteInList("Right");
            break;
        case 512:
            DroneState.Stop();
            WriteInList("Stop");
            break;
        case 1024:
            WriteInList("Dance");
            DroneState.RotateLeft(movement);
            Thread.Sleep(500);
            DroneState.RotateRight(movement);
            Thread.Sleep(500);
            DroneState.RotateRight(movement);
            Thread.Sleep(500);
            DroneState.RotateLeft(movement);
            Thread.Sleep(500);
            DroneState.GoForward(movement);
            Thread.Sleep(500);
            DroneState.GoBackward(movement);
            Thread.Sleep(500);
            DroneState.Stop();
            break;
        default:
            break;
    }

    // Process recognition data.
    Debug.WriteLine(data.grammar.ToString());
    Debug.WriteLine(data.scores[0].label.ToString());
    Debug.WriteLine(data.scores[0].sentence);
}

This method receives the value returned in the recognition data and executes the corresponding command (in our case, the corresponding flight command for the drone).

Each drone command maps to a call into DroneState with a specific method (TakeOff, GoUp, GoDown, and so on) and a specific movement or duration parameter, which in each case determines the amount or duration of the motion.

Some commands need an explicit call to the Stop method to end the current action; otherwise, the drone keeps moving according to the last command received (see the commands in the previous code snippet).

In some cases, you need to insert a Thread.Sleep between two commands to wait for the previous action to complete before sending a new one.

To test recognition even when no drone is available, I added a flag (controlled by a checkbox in the main window) that turns on a "drone stub" mode in which commands are created but not sent.
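A hypothetical sketch of how such a stub can be wired into the SendCommandAsync method shown earlier (the flag name is invented for illustration; the library's actual switch may look different):

private static bool IsDroneStubMode; // hypothetical flag, bound to the checkbox

private static async Task SendCommandAsync(string command)
{
    if (IsDroneStubMode)
    {
        Debug.WriteLine("STUB: " + command); // command is built but not sent
        return;
    }

    udpWriter.WriteString(command);
    await udpWriter.StoreAsync();
}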

When the application closes, the OnClosing method closes and destroys all instances and handlers and performs general cleanup.
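A hedged sketch of that cleanup, assuming the standard Dispose pattern of the RealSense C# wrapper (every PXCM object derives from PXCMBase):

private void OnClosing()
{
    // Stop recognition, then release the SDK objects in reverse order of creation.
    if (SpeechRecognition != null)
    {
        SpeechRecognition.StopRec();
        SpeechRecognition.Dispose();
    }
    if (AudioSource != null) AudioSource.Dispose();
    if (Session != null) Session.Dispose();
}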

The code contains some debug statements that print useful information to the Visual Studio* output window while testing the system.

Conclusion


In this article, we saw how to interact with a device as complex as a drone through a natural language interface. We saw how to create a simple command dictionary, teach the system to understand it, and use it to manage a complex device: a drone in flight. What is shown here is only a small fraction of the available options for controlling a drone; the possibilities are truly endless.

Original article

Source: https://habr.com/ru/post/273083/

