Implementing an analogue of Apple iCloud Voicemail with free-grammar speech recognition from Yandex

Introduction

Some time ago, Apple announced that it had started testing a new service, Apple iCloud Voicemail.

So what's the big deal, you ask? Let's see.

Apple has long been famous for reviving interest in long-forgotten technologies, and for making good money on them along the way. This is how the iPad, the Apple Watch and others were born. Voice mail is a technology that hardly anyone has used for a long time, even though almost every one of us has a voicemail service enabled by default on a mobile phone.
The reason is very simple: no one wants to spend time listening to these messages, and no one remembers how to get to them. It was the inconvenience of voice mail that killed the technology. Mobile communication helped, of course: it is easier to call back later on a mobile phone than to leave a message.

In our country, people never really got used to voice mail, but in the United States it was the de facto standard. About 10 years ago, every Western company installing a telephone exchange in Russia required voice mail to be installed with it.

So what has Cupertino come up with this time?

Service description

The idea is actually extremely simple. Using recognition of unconstrained speech (now called speech recognition by free grammars), the service recognizes a voice message containing arbitrary text and sends it as an SMS.

What does it change? Everything! You call an office or mobile phone, the subscriber is busy, does not answer or is unavailable, and you are asked to leave a voice message. You say whatever you want, the system recognizes this arbitrary text and sends it to the mailbox owner by SMS. The mailbox owner receives the SMS, sees the caller's number and the recognized text, and responds to it. The owner no longer needs to remember the voicemail access number, login, password, menu structure and so on. There is also no need to spend time listening to the message itself, which very often contains only beeps or meaningless phrases. It is enough to skim the text and decide how to react to the message. In other words, everything becomes much more convenient than before.

Leaving a message is also convenient for the caller. After an unsuccessful call, there is no need to open the SMS app and type the text. The caller has already said everything, and that is much faster than writing an SMS.

In other words, the technology has a big future because it saves time for both the caller and the owner of the voicemail service. Behind the facade there are also advantages for IT: the system needs almost no administration, unlike the voicemail systems of the past. In short, all the makings of a new revolution are there.

What is the heart of this technology? Recognition by free grammar, of course. Apple uses Siri, which is based on ASR (Automatic Speech Recognition) technology from Nuance, a long-established vendor of ASR / TTS technology.

Google initially built its recognition for Android phones: a cloud service with an API for developers, aimed primarily at mobile application developers and, of course, with a monetization scheme. Yandex did not stand aside either and called its product Cloud SpeechKit. It can recognize free speech and convert it into text, and it also has an API.

As for the Russian language, Google recognizes it too, and there are plenty of smaller vendors on the Internet that do the same. It is not especially difficult, and Nuance started doing it many years ago, so Yandex is not unique in this respect. Yandex does, however, have several very important advantages. First, Yandex SpeechKit recognizes addresses very well, with close to 100% accuracy. Obviously, this relies on the material accumulated by the Yandex.Maps project. Second, Yandex has SpeechBox, something that Google and other cloud services do not have and probably never will. What matters for this solution is minimizing recognition delay and meeting security requirements, and with SpeechBox you can install a local server and run free-grammar recognition without going online.

Service development

General description

Now about developing an analogue of this revolutionary idea from Apple. A corporate IVR from Avaya was available: Avaya Aura Experience Portal 7.0.1. A feature of this system is its ability to connect to email servers and SMS aggregators. It also has an internal database that is easy to use.

Sending the text by SMS was not a goal, because it costs money and requires maintaining a connection to an SMS aggregator. With email, everything works much the same as with SMS: as soon as the message reaches the mailbox, a push notification appears on the mobile phone. The rest of the development is the same. For those who really want to receive the text by SMS, an email2sms gateway can be used. Incidentally, email2sms also takes care of the text encoding.

It is not necessary to use Avaya IVR; it just happened to be at hand. You can build the same thing on Genesys GVP, Cisco IVR, Asterisk, FreeSWITCH, or even 3CX. However, with both Genesys and Cisco, a voice application that then sends its result by email requires installing several servers. With Avaya, a single virtual machine with 2 cores and 4 GB of RAM is enough, and it does everything: accepts the voice call, runs recognition in the cloud, and sends an email or SMS.

With Asterisk, the service can be implemented over the MRCP v2 protocol. Email can be sent from it as well.

Application schema

The description is given for Avaya IVR, since the prototype was written for it. The prototype used the cloud HTTP connector, for the following reason. Cloud MRCP does not work with Avaya: Avaya IVR is designed for a corporate environment and cannot substitute the correct IP address for RTP when MRCP traffic passes through NAT. In theory, you could dedicate a separate interface to MRCP and assign a public IP to it, but that option was not available. Local MRCP was not possible either, because free-grammar ASR requires a large amount of RAM, and no such resources were available.

For Asterisk, it is easier to use cloud MRCP: the MRCP client lets you manually set the public IP that is placed in the SDP when passing through NAT.

The application was developed in Avaya Aura Orchestration Designer 7.0, the application designer for Avaya IVR. The product is based on Eclipse and is implemented as an Eclipse plugin.

Here is a general view of the application:

With the HTTP connector, unlike MRCP, you have to use the Record function instead of Prompt & Collect, which is the usual input-collection mechanism in an IVR. There is a difference, of course. With MRCP, recognition starts as soon as speech begins. With the HTTP connector and Record, you first have to record the entire message, then transfer it over the Internet to the cloud, and only then recognize it. This introduces a delay of a couple of seconds, but in our case it does not matter, because the person leaving the message does not have to wait. After leaving the message, the caller hangs up, and the IVR application continues its work, completing the recognition and sending the result by email. In this particular application it does not matter how long recognition takes: nobody will notice the delay and it will not hurt anyone.

Before recording starts, the following greeting is played: “Voice mail welcomes you. This subscriber cannot answer your call. Please leave a message and its text will be delivered to the subscriber by email.”

The Record block records the voice into a temporary file, whose name is stored in the variable Record.value. Next, the Yandex HTTP connector is launched; it sends this temporary file to the cloud through the API and receives the recognized text. This connector is not a standard part of the IVR and is written entirely in Java. The result is then passed to the Data block, where the message duration in seconds is calculated and the database is queried, using the number to which the original call was made, for the email address where the result should be sent.
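The upload step performed by the connector can be illustrated with a minimal Java sketch. This is not the author's connector and not the actual SpeechKit API: the endpoint URL, the `key` query parameter, the `audio/x-wav` content type and the class name `RecognitionRequestSketch` are all assumptions made for illustration. It only shows the general shape of posting the recorded file over HTTP, using the standard `java.net.http` client (Java 11 or later).

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical sketch: builds the HTTP request that uploads a recorded message.
final class RecognitionRequestSketch {

    // endpoint and the "key" parameter are placeholders, not the real cloud API.
    static HttpRequest build(String endpoint, String apiKey, byte[] wavBytes) {
        // URL-encode the key so that special characters do not break the query string.
        String query = "?key=" + URLEncoder.encode(apiKey, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + query))
                .timeout(Duration.ofSeconds(30))          // recognition may take a few seconds
                .header("Content-Type", "audio/x-wav")    // assumed recording format
                .POST(HttpRequest.BodyPublishers.ofByteArray(wavBytes))
                .build();
    }

    public static void main(String[] args) {
        // Stand-in for the contents of the temporary file named in Record.value.
        byte[] fakeRecording = {0, 1, 2, 3};
        HttpRequest request =
                build("https://asr.example.invalid/recognize", "demo-key", fakeRecording);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

In a real connector the request would then be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, and the recognized text would be extracted from the response body in whatever format the recognition service returns.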

Then, in the standard Avaya IVR Email connector, the body of the letter is composed and the recognition result is sent to the owner of the voice mailbox.
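The Data and Email steps described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the logic of the Avaya blocks themselves: the recording is assumed to be a WAV file with a 44-byte header and 8 kHz, 16-bit, mono PCM audio; the in-memory map stands in for the IVR's internal database; and the class name, method names and letter layout are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the Data block and of composing the letter body.
final class VoicemailMailSketch {

    // Assumed recording format: 44-byte WAV header, 8000 samples/s, 2 bytes per sample, mono.
    static final int HEADER_BYTES = 44;
    static final int BYTES_PER_SECOND = 8000 * 2;

    // Message duration in whole seconds, derived from the size of the recorded file.
    static long durationSeconds(long fileSizeBytes) {
        long audioBytes = Math.max(0, fileSizeBytes - HEADER_BYTES);
        return audioBytes / BYTES_PER_SECOND;
    }

    // Looks up the mailbox address for the number that was originally dialed.
    static Optional<String> mailboxFor(Map<String, String> directory, String dialedNumber) {
        return Optional.ofNullable(directory.get(dialedNumber));
    }

    // Composes the plain-text body of the letter sent to the mailbox owner.
    static String emailBody(String callerNumber, long seconds, String recognizedText) {
        String shown = (recognizedText == null || recognizedText.isBlank())
                ? "(nothing was recognized)"
                : recognizedText.trim();
        return "Voice message from " + callerNumber + "\n"
                + "Duration: " + seconds + " s\n"
                + "Text: " + shown + "\n";
    }

    public static void main(String[] args) {
        Map<String, String> directory = Map.of("1001", "owner@example.com");
        long seconds = durationSeconds(HEADER_BYTES + 5L * BYTES_PER_SECOND);
        mailboxFor(directory, "1001").ifPresent(address ->
                System.out.println("To: " + address + "\n"
                        + emailBody("+70000000000", seconds, "Please call me back")));
    }
}
```

Handling an empty recognition result explicitly is useful here, since a recording that contains only beeps or silence would otherwise produce a letter with no text at all.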

That is all there is to it. It is even surprising how easily such a revolutionary solution can be implemented.

Source: https://habr.com/ru/post/276187/
