
Controlling the standard Sailfish OS player with voice commands

Many Android users are familiar with features such as Google Now and Google Assistant, which not only deliver timely, useful information and search the Internet, but also let you control the device with voice commands. Unfortunately, Sailfish OS (the operating system developed by the Finnish company Jolla and the Russian company Open Mobile Platform) does not provide such functionality out of the box. So I decided to fill this gap myself. One feature of the resulting solution is the ability to control the music player with voice commands; its technical side is the subject of this article.

To implement recognition and execution of voice commands, you need to go through four simple steps:

  1. design a command system;
  2. implement speech recognition;
  3. implement command identification and execution;
  4. add voice feedback.

For a better understanding of the material, it is assumed that the reader already has basic knowledge of C++, JavaScript, Qt, QML and Linux, and is familiar with the example of their interaction within Sailfish OS. It may also be useful to watch the lectures on related topics given at the Sailfish OS Summer School in Innopolis in the summer of 2016, as well as other articles about development for this platform already published on Habr.

Development of a command system


Let us analyze a simple example limited to five functions:

  1. starting playback from scratch;
  2. resuming playback;
  3. pausing playback;
  4. moving to the next track;
  5. returning to the previous track.

To start playback from scratch, you need to check whether an instance of the player is already open (and create one if necessary) and start playing music in random order. To activate this function, we will use the "Turn on the music" command.

To resume or stop playback, you need to check the state of the player and, depending on it, start playback or pause it. To resume playback we will use the "Play" command; to pause it, the "Pause" and "Stop" commands.

For navigating between tracks, the same principle of checking the player state applies. To navigate forward, we use the "Forward" and "Next" commands; to navigate back, the "Back" and "Previous" commands.
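The command set above can be captured as a simple lookup table. The sketch below is a hypothetical, player-independent illustration in plain C++ (the Action enum and the identify function are not part of the article's code): recognized text is lowercased and mapped to an action, with an error fallback for unknown phrases.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical action set mirroring the five player functions.
enum class Action { ShuffleAndPlay, Play, Pause, Next, Previous, Unknown };

// Map each voice command pattern to its action; several phrases
// may trigger the same action, as with "pause" and "stop".
Action identify(std::string query) {
    std::transform(query.begin(), query.end(), query.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    static const std::map<std::string, Action> commands = {
        {"turn on the music", Action::ShuffleAndPlay},
        {"play", Action::Play},
        {"pause", Action::Pause},
        {"stop", Action::Pause},
        {"forward", Action::Next},
        {"next", Action::Next},
        {"back", Action::Previous},
        {"previous", Action::Previous},
    };
    auto it = commands.find(query);
    return it != commands.end() ? it->second : Action::Unknown;
}
```

Keeping the whole vocabulary in one table makes it easy to add synonyms later without touching the dispatch logic.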

Speech recognition


The recognition of voice commands is divided into three stages:

  1. recording the voice command to a file;
  2. recognizing the command on the server;
  3. identifying the command on the device.

Recording a voice command to a file

First, you need to create a user interface for capturing the voice command. To keep the example simple, recording will start and stop on a button press, since automatically detecting the beginning and end of a voice command deserves an article of its own.

    IconButton {
        property bool isRecording: false

        width: Theme.iconSizeLarge
        height: Theme.iconSizeLarge
        icon.source: isRecording ? "image://theme/icon-m-search" : "image://theme/icon-m-mic"

        onClicked: {
            if (isRecording) {
                isRecording = false
                recorder.stopRecord()
                yandexSpeechKitHelper.recognizeQuery(recorder.getActualLocation())
            } else {
                isRecording = true
                recorder.startRecord()
            }
        }
    }

The code above shows that the button uses standard size values and standard icons (a nice Sailfish OS feature that unifies application interfaces) and has two states. In the first state, when nothing is being recorded, pressing the button starts recording the voice command. In the second state, while recording is active, pressing the button stops the recording and starts speech recognition.

To record speech, we will use the QAudioRecorder class, which provides a high-level interface for controlling the audio input stream, together with QAudioEncoderSettings for configuring the recording process.

    #include <QAudioRecorder>
    #include <QAudioEncoderSettings>

    class Recorder : public QObject {
        Q_OBJECT
    public:
        explicit Recorder(QObject *parent = 0);

        Q_INVOKABLE void startRecord();
        Q_INVOKABLE void stopRecord();
        Q_INVOKABLE QUrl getActualLocation();
        Q_INVOKABLE bool isRecording();

    private:
        QAudioRecorder _audioRecorder;
        QAudioEncoderSettings _settings;
        bool _recording = false;
    };

    Recorder::Recorder(QObject *parent) : QObject(parent) {
        _settings.setCodec("audio/PCM");
        _settings.setQuality(QMultimedia::NormalQuality);
        _audioRecorder.setEncodingSettings(_settings);
        _audioRecorder.setContainerFormat("wav");
    }

    void Recorder::startRecord() {
        _recording = true;
        _audioRecorder.record();
    }

    void Recorder::stopRecord() {
        _recording = false;
        _audioRecorder.stop();
    }

    QUrl Recorder::getActualLocation() {
        return _audioRecorder.actualLocation();
    }

    bool Recorder::isRecording() {
        return _recording;
    }

The constructor specifies that the command will be recorded in WAV format at normal quality; methods are also defined for starting and stopping the recording, obtaining the location of the audio file, and querying the recording state.

Command recognition on the server

To transcribe the audio file into text, we will use the Yandex SpeechKit Cloud service. All that is required to start working with it is to obtain a token in the developer's cabinet. The service documentation is quite detailed, so we will dwell only on a few specific points.

The first step is to transfer the recorded command to the server.

    void YandexSpeechKitHelper::recognizeQuery(QString path_to_file) {
        QFile file(path_to_file);
        if (file.open(QIODevice::ReadOnly)) {
            QUrlQuery query;
            query.addQueryItem("key", "API_KEY");
            query.addQueryItem("uuid", _buildUniqID());
            query.addQueryItem("topic", "queries");

            QUrl url("https://asr.yandex.net/asr_xml");
            url.setQuery(query);

            QNetworkRequest request(url);
            request.setHeader(QNetworkRequest::ContentTypeHeader, "audio/x-wav");
            request.setHeader(QNetworkRequest::ContentLengthHeader, file.size());

            _manager->post(request, file.readAll());
            file.close();
        }
        file.remove();
    }

Here a POST request to the Yandex server is formed, passing the obtained token, a unique device ID (in this case the MAC address of the Wi-Fi module is used) and the request topic ("queries" is used here, since voice interaction with a device most often consists of short, precise commands). The request headers specify the format of the audio file and its size; the body carries the audio data itself. After the request is sent, the file is deleted as no longer needed.
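The _buildUniqID helper itself is not shown in the article. A plausible stdlib-only sketch is given below; the exact format SpeechKit expects, and the zero-padding convention, are assumptions for illustration, not the article's actual implementation.

```cpp
#include <cctype>
#include <string>

// Hypothetical sketch of building a unique device ID from the
// Wi-Fi MAC address, as the article says _buildUniqID does:
// strip the colon separators and lowercase the hex digits.
std::string buildUniqId(const std::string &mac) {
    std::string id;
    for (char c : mac) {
        if (c != ':')
            id += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    // Pad the 12 hex digits of the MAC address up to the 32 characters
    // of a dashless UUID (assumed convention, not from the article).
    while (id.size() < 32)
        id += '0';
    return id;
}
```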

In response, the SpeechKit Cloud server returns XML with recognition variants and a confidence score for each. We use standard Qt tools to extract the required information.

    void YandexSpeechKitHelper::_parseResponce(QXmlStreamReader *element) {
        double idealConfidence = 0;
        QString idealQuery;
        while (!element->atEnd()) {
            element->readNext();
            if (element->tokenType() != QXmlStreamReader::StartElement) continue;
            if (element->name() != "variant") continue;
            QXmlStreamAttribute attr = element->attributes().at(0);
            if (attr.value().toDouble() > idealConfidence) {
                idealConfidence = attr.value().toDouble();
                element->readNext();
                idealQuery = element->text().toString();
            }
        }
        if (element->hasError())
            qDebug() << element->errorString();
        emit gotResponce(idealQuery);
    }

Here the response is scanned sequentially, and for each variant tag the confidence score is checked. If a new variant is more confident than the current best, it is saved and the scan continues. When the whole response has been processed, a signal carrying the text of the chosen command is emitted.
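The selection logic of _parseResponce — keep the variant with the highest confidence — can be isolated from the XML handling. Below is a stdlib-only sketch over already-parsed (confidence, text) pairs; the function name and types are illustrative, not from the article.

```cpp
#include <string>
#include <utility>
#include <vector>

// Pick the recognition variant with the highest confidence,
// mirroring the scan in _parseResponce. An empty string is
// returned when no variants were recognized at all.
std::string bestVariant(const std::vector<std::pair<double, std::string>> &variants) {
    double bestConfidence = 0;
    std::string bestText;
    for (const auto &v : variants) {
        if (v.first > bestConfidence) {
            bestConfidence = v.first;
            bestText = v.second;
        }
    }
    return bestText;
}
```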

Command identification on the device

Finally, it remains to identify the command. At the end of the YandexSpeechKitHelper::_parseResponce method, as mentioned above, the gotResponce signal is emitted with the command text. Next, it needs to be handled in the QML code of the program.

    Connections {
        target: yandexSpeechKitHelper
        onGotResponce: {
            // The original article's code compares against Russian phrases;
            // the English equivalents defined earlier are used here.
            switch (query.toLowerCase()) {
                case "turn on the music":
                    dbusHelper.startMediaplayerIfNeed()
                    mediaPlayer.shuffleAndPlay()
                    break;
                case "play":
                    mediaPlayerControl.play()
                    break;
                case "pause":
                case "stop":
                    mediaPlayerControl.pause()
                    break;
                case "forward":
                case "next":
                    mediaPlayerControl.next()
                    break;
                case "back":
                case "previous":
                    mediaPlayerControl.previous()
                    break;
                default:
                    generateErrorMessage(query)
                    break;
            }
        }
    }

Here the Connections element is used to handle the incoming signal and compare the recognized text against the voice command patterns defined earlier.

Controlling a working player


If the audio player is already open, it can be controlled through the standard MPRIS DBus interface inherited from desktop Linux. With it, you can navigate through the playlist and start or pause playback. This is done using the QML element DBusInterface.

    DBusInterface {
        id: mediaPlayerControl
        service: "org.mpris.MediaPlayer2.jolla-mediaplayer"
        iface: "org.mpris.MediaPlayer2.Player"
        path: "/org/mpris/MediaPlayer2"

        function play() { call("Play", undefined) }
        function pause() { call("Pause", undefined) }
        function next() { call("Next", undefined) }
        function previous() {
            call("Previous", undefined)
            call("Previous", undefined)
        }
    }

This element wraps the standard audio player's DBus interface in four basic functions. The undefined argument to call is passed when the DBus method takes no arguments.

It should be noted that to move to the previous song the Previous method is called twice, since a single call merely restarts the current song from the beginning.

Start playback from scratch


There is nothing complicated about controlling an already running player. However, if you want to start playing music while the player is closed, a problem arises: by default there is no way to launch the standard player and immediately start playing the whole collection.

But do not forget that Sailfish OS is an open operating system whose source code is available for modification. As a result, this problem can be solved in two stages:

  1. extending the functions of the standard audio player;
  2. launching the player if it is closed.
Expansion of the functions of the standard audio player

The standard audio player, in addition to the org.mpris.MediaPlayer2.Player interface, provides the com.jolla.mediaplayer.ui interface, defined in the /usr/share/jolla-mediaplayer/mediaplayer.qml file. It follows that this file can be modified to add the required function.

    DBusAdaptor {
        service: "com.jolla.mediaplayer"
        path: "/com/jolla/mediaplayer/ui"
        iface: "com.jolla.mediaplayer.ui"

        function openUrl(arg) {
            if (arg[0] == undefined) {
                return false
            }
            AudioPlayer.playUrl(Qt.resolvedUrl(arg[0]))
            if (!pageStack.currentPage || pageStack.currentPage.objectName !== "PlayQueuePage") {
                root.pageStack.push(playQueuePage, {}, PageStackAction.Immediate)
            }
            activate()
            return true
        }

        function shuffleAndPlay() {
            AudioPlayer.shuffleAndPlay(allSongModel, allSongModel.count)
            if (!pageStack.currentPage || pageStack.currentPage.objectName !== "PlayQueuePage") {
                root.pageStack.push(playQueuePage, {}, PageStackAction.Immediate)
            }
            activate()
            return true
        }
    }

Here the DBusAdaptor element, which exposes the DBus interface, was modified by adding the shuffleAndPlay method. It uses the standard player functionality, provided by the com.jolla.mediaplayer module, to start playing all songs in random order, and brings the current play queue to the foreground.

For simplicity, the example modifies the system file directly. However, when distributing the program, such changes should be shipped as patches using the appropriate instructions.

Now the developed program needs to call the new method. This is done with the already familiar DBusInterface element, connecting to the service defined above and calling the function added to the player.

    DBusInterface {
        id: mediaPlayer
        service: "com.jolla.mediaplayer"
        iface: "com.jolla.mediaplayer.ui"
        path: "/com/jolla/mediaplayer/ui"

        function shuffleAndPlay() {
            call("shuffleAndPlay", undefined)
        }
    }

Launching the player if it is closed

Finally, the last thing left is to launch the audio player if it is closed. The task can be divided into two stages:

  1. checking whether the player's DBus service is already registered;
  2. launching the player and waiting for its music collection to be scanned.
    void DBusHelper::startMediaplayerIfNeed() {
        QDBusReply<bool> reply = QDBusConnection::sessionBus().interface()->
                isServiceRegistered("com.jolla.mediaplayer");
        if (!reply.value()) {
            QProcess process;
            process.start("/bin/bash -c \"jolla-mediaplayer &\"");
            process.waitForFinished();

            QDBusInterface interface("com.jolla.mediaplayer",
                                     "/com/jolla/mediaplayer/ui",
                                     "com.jolla.mediaplayer.ui");
            while (true) {
                QDBusReply<bool> reply = interface.call("isSongsModelFinished");
                if (reply.isValid() && reply.value())
                    break;
                QThread::sleep(1);
            }
        }
    }

From the code of this function it can be seen that the first stage checks for the presence of the required DBus service. If it is registered in the system, the function returns and playback can be started. If the service is not found, a new instance of the audio player is launched using QProcess, waiting for it to start fully. In the second part of the function, QDBusInterface is used to poll the flag indicating that the music collection on the device has finished scanning.
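Note that the polling loop above never gives up if the player fails to start. A small, hypothetical refinement (not in the article) is to bound the polling with a retry limit; in the sketch below the predicate stands in for the isSongsModelFinished DBus call and the sleep callback for QThread::sleep(1).

```cpp
#include <functional>

// Poll a readiness predicate up to maxAttempts times, invoking a
// sleep callback between attempts; returns true as soon as the
// predicate holds, false if the attempts are exhausted.
bool waitUntilReady(const std::function<bool()> &isReady,
                    const std::function<void()> &sleepOnce,
                    int maxAttempts) {
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        if (isReady())
            return true;
        sleepOnce();
    }
    return false;
}
```

With such a helper, a failed player launch can be reported to the user instead of blocking the program forever.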

It should be noted that two additional changes were made to the /usr/share/jolla-mediaplayer/mediaplayer.qml file to support checking the collection scan flag.

First, the GriloTrackerModel element provided by the com.jolla.mediaplayer module was extended with a scan-completion flag.

    GriloTrackerModel {
        id: allSongModel

        property bool isFinished: false

        query: {
            //: placeholder string for albums without a known name
            //% "Unknown album"
            var unknownAlbum = qsTrId("mediaplayer-la-unknown-album")
            //: placeholder string to be shown for media without a known artist
            //% "Unknown artist"
            var unknownArtist = qsTrId("mediaplayer-la-unknown-artist")
            return AudioTrackerHelpers.getSongsQuery("",
                    {"unknownArtist": unknownArtist, "unknownAlbum": unknownAlbum})
        }

        onFinished: {
            isFinished = true

            var artList = fetchAlbumArts(3)
            if (artList[0]) {
                if (!artList[0].url || artList[0].url == "") {
                    mediaPlayerCover.idleArtist = artList[0].author ? artList[0].author : ""
                    mediaPlayerCover.idleSong = artList[0].title ? artList[0].title : ""
                } else {
                    mediaPlayerCover.idle.largeAlbumArt = artList[0].url
                    mediaPlayerCover.idle.leftSmallAlbumArt = artList[1] && artList[1].url ? artList[1].url : ""
                    mediaPlayerCover.idle.rightSmallAlbumArt = artList[2] && artList[2].url ? artList[2].url : ""
                    mediaPlayerCover.idle.sourcesReady = true
                }
            }
        }
    }

Second, one more function was added, available through the com.jolla.mediaplayer.ui DBus interface, which returns the value of the collection scan flag.

    function isSongsModelFinished() {
        return allSongModel.isFinished
    }

Reporting an unrecognized command


The final element of the example is a voice message about an unrecognized command. For this we use the speech synthesis service of Yandex SpeechKit Cloud.

    Audio {
        id: audio
    }

    function generateErrorMessage(query) {
        var message = ".  " + query + "  ."
        audio.source = "https://tts.voicetech.yandex.net/generate?" +
                "text=\"" + message + "\"&" +
                "format=mp3&" +
                "lang=ru-RU&" +
                "speaker=jane&" +
                "emotion=good&" +
                "key=API_KEY"
        audio.play()
    }

Here an Audio object is created to play the synthesized speech, and the generateErrorMessage function is declared to form a request to the Yandex server and start playback. The request passes the following parameters:

  - text — the message to synthesize;
  - format — the audio format of the response (mp3);
  - lang — the language of the synthesis (ru-RU);
  - speaker — the voice to use (jane);
  - emotion — the intonation of the voice (good);
  - key — the developer token.

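One caveat: the message is concatenated into the URL verbatim, so spaces and non-Latin characters should really be percent-encoded (in Qt this is what QUrlQuery handles, as in the recognition request). A stdlib-only sketch of the encoding, for illustration only:

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Percent-encode a query-string value per RFC 3986: keep
// unreserved characters, escape every other byte as %XX.
std::string percentEncode(const std::string &value) {
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof(buf), "%%%02X", c);
            out += buf;
        }
    }
    return out;
}
```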
Conclusion


This article has described a simple example of controlling music playback in the standard Sailfish OS audio player with voice commands. Along the way, the basics of speech recognition and synthesis with Yandex SpeechKit Cloud using Qt tools were covered, as well as the principles of interaction between programs in Sailfish OS. This material can serve as a starting point for deeper research and experimentation on this platform.

An example of the above code can be viewed on the video:


Posted by: Peter Vytovtov

Source: https://habr.com/ru/post/313680/

