
Asterisk speech recognition through Google + smart IVR



Good day, dear Habr users.
In one project I needed to build a smart IVR on top of the Asterisk IP PBX. By "smart" I mean the following: when someone calls a certain number, the PBX asks them to say the name of a subscriber; the caller says the name, and the PBX connects them to that subscriber.

In my case I used the ready-made AsteriskNow build with FreePBX pre-installed, although this does not matter much here: the only differences will be in how the dial plan is edited.

Step 1:


Working with Google

The first thing we need is some way to recognize the caller's speech. Habr already has several articles (one, two) on how to do this using Google Translate. I decided to use ready-made scripts found on GitHub: googletts.agi, to teach Asterisk to speak, and speech-recog.agi, to let Asterisk recognize speech.

The files googletts.agi and speech-recog.agi go into the /var/lib/asterisk/agi-bin folder.
The scripts need the following packages to work: Perl, perl-libwww, IO-Socket-SSL, flac, sox and mpg123. I installed all of them from the repositories (via yum install), except for mpg123, which had to be downloaded separately.
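
Before editing the dial plan, it can help to confirm that the command-line tools are actually installed. The following is a minimal sketch in Python, not part of the original setup; the function name missing_tools is hypothetical, and the executable names are an assumption derived from the package list above. It does not check the Perl modules (perl-libwww, IO-Socket-SSL).

```python
import shutil

# Executable names are an assumption based on the package list above.
REQUIRED_TOOLS = ("perl", "flac", "sox", "mpg123")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so that the lookup can be replaced in a test; in normal use the default `shutil.which` is enough.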

In googletts.agi, change the value of the $lang variable from en to ru, because we want Asterisk to speak Russian.
In speech-recog.agi, change the value of the $language variable from en-US to ru-RU so that Google returns the result in Russian.
That is all; I did not touch anything else in the scripts.

Step 2:


Writing a dial plan

As I said above, I have FreePBX installed, so I will make all the changes in the extensions_custom.conf file.
First, it is polite to greet the caller and tell them what to do next.

exten => 100,1,Answer()
exten => 100,n,agi(googletts.agi,"Hello! After the beep, say the name of the subscriber.",ru)


Next, with the help of speech-recog.agi, we record what the user says, convert it, send it to Google and get the result back.

exten => 100,n(record),agi(speech-recog.agi,ru-RU)

Next, using the GotoIf function, we check how the script did.
The script returns the following values:

status: returns the status of the execution. 0 means success
utterance: string returned by google
confidence: a value from 0 to 1 indicating the probability of correct recognition

exten => 100,n,GotoIf($[$["${status}" = "0"] & $["${confidence}" > "0.8"]]?if1:retry)

If the test succeeds, we go to the if1 label; if it fails, we go to the retry label, where we ask the user to repeat.
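
The same acceptance test can be written out as ordinary code, which makes the edge cases easier to see. This is only an illustrative sketch in Python, not part of the dial plan: the function name should_accept and the threshold parameter are hypothetical, and the values are assumed to arrive as strings, the way channel variables do.

```python
def should_accept(status: str, confidence: str, threshold: float = 0.8) -> bool:
    """Mirror the dial-plan test: status is 0 and confidence is above the threshold."""
    if status.strip() != "0":
        return False  # the recognition script reported a failure
    try:
        value = float(confidence)
    except ValueError:
        return False  # an empty or malformed confidence is treated as "not recognized"
    return value > threshold


if __name__ == "__main__":
    print(should_accept("0", "0.93"))  # True
    print(should_accept("0", "0.42"))  # False
```

Note that the comparison is strict, as in the dial plan: a confidence of exactly 0.8 is rejected.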

exten => 100,n(retry),agi(googletts.agi,"Please repeat",ru)

Then we work directly with the string received from Google. We need to compare the resulting ${utterance} string with some pattern and decide what to do next. Let's use the GotoIf function again:

exten => 100,n(if1),GotoIf($["${utterance}" = "vasya"]?vasya:retry)

If the string received from Google matches "Vasya", we go to the vasya label; if it does not match, we ask the user to repeat.

All that remains is to call Vasya:

exten => 100,n(vasya),Dial(SIP/101)

The complete dial plan:
exten => 100,1,Answer()
exten => 100,n,agi(googletts.agi,"Hello! After the beep, say the name of the subscriber.",ru)
exten => 100,n(record),agi(speech-recog.agi,ru-RU)
exten => 100,n,GotoIf($[$["${status}" = "0"] & $["${confidence}" > "0.8"]]?if1:retry)
exten => 100,n(if1),GotoIf($["${utterance}" = "vasya"]?vasya:retry)
exten => 100,n(vasya),Dial(SIP/101)
; retry is placed after Dial and jumps back to record, so it never falls through into the call
exten => 100,n(retry),agi(googletts.agi,"Please repeat",ru)
exten => 100,n,Goto(record)


Variations on the topic



Subtleties

When working with Google Translate this way, keep in mind that recognition works well but not perfectly. This has to be taken into account when writing the patterns against which the result from Google is compared.

Here is an example of a rake I stepped on:
My name is Kirill (two "l"s at the end). For reasons known only to Google, it sometimes returned "Kirill" and sometimes "Kiril".
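
One way to tolerate such spelling variants is to compare by similarity rather than by exact equality. The sketch below uses Python's standard difflib module; it is an illustration under assumptions, not the method used in this article. The function name closest_name, the KNOWN_NAMES list and the 0.8 cutoff are hypothetical, and the names are transliterated for readability (with ru-RU, Google would return Cyrillic strings).

```python
import difflib
from typing import Optional, Sequence

# Hypothetical directory of names; transliterated here for readability.
KNOWN_NAMES = ("kirill", "vasya", "alexander")


def closest_name(
    utterance: str,
    names: Sequence[str] = KNOWN_NAMES,
    cutoff: float = 0.8,
) -> Optional[str]:
    """Return the known name most similar to the utterance, or None if nothing is close."""
    cleaned = utterance.strip().lower()
    matches = difflib.get_close_matches(cleaned, names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(closest_name("Kiril"))   # kirill
    print(closest_name("Kirill"))  # kirill
```

A lower cutoff accepts more distorted spellings but also raises the risk of connecting the caller to the wrong person, so the value would need tuning against real recognition results.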

Afterword

I suspect the comparison could be implemented in a more sophisticated way; I will be glad to hear your opinions and suggestions in the comments.
The question of scale also remains open: what happens if there are many subscribers, and how long will all the comparisons take if they are implemented the way I proposed? But for a small PBX with about 20 subscribers this method is acceptable.
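
For a larger directory, one option is to replace the chain of comparisons with a single table lookup. The following Python sketch assumes the comparison is moved out of the dial plan into a helper script; the function name dial_target and the DIRECTORY table are hypothetical (the two entries reuse the names and extensions from the examples in this article), and passing the result back to Asterisk is not shown.

```python
from typing import Mapping, Optional

# Hypothetical directory: recognized name -> dial string.
DIRECTORY = {
    "vasya": "SIP/101",
    "alexander": "SIP/8201",
}


def dial_target(utterance: str, directory: Mapping[str, str] = DIRECTORY) -> Optional[str]:
    """Look up the dial string for a recognized name; None means 'ask the caller to repeat'."""
    key = utterance.strip().lower()
    return directory.get(key)


if __name__ == "__main__":
    print(dial_target("Vasya"))   # SIP/101
    print(dial_target("nobody"))  # None
```

With a dictionary, adding a subscriber means adding one entry, and the cost of a lookup does not grow with each new name the way a sequence of GotoIf checks does.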

Thank you for your attention.

UPD

Examples

As an example, I used a slightly different dial-plan, but the essence does not change.

Dial plan for the example:
exten => 8251,1,Answer()
exten => 8251,n,MixMonitor(/var/spool/asterisk/monitor/8251/${CDR(start)}-${DST-NUM}-${ID_CALL}-full.wav)
exten => 8251,n,agi(googletts.agi,"Please say the name of the subscriber with whom you connect.",ru)
exten => 8251,n(record),agi(speech-recog.agi,ru-RU)
exten => 8251,n,GotoIf($[$["${status}" = "0"] & $["${confidence}" > "0.5"]]?if1:retry)
exten => 8251,n(if1),GotoIf($["${utterance}" = "alexander"]?al:retry)
exten => 8251,n(al),Dial(SIP/8201)
exten => 8251,n(retry),agi(googletts.agi,"Please repeat?",ru)
exten => 8251,n,Goto(record)

Link to a recording with successful recognition.
Asterisk output:
-- Executing [8251@from-internal:1] Answer("SIP/8211-00000000", "") in new stack
-- Executing [8251@from-internal:2] MixMonitor("SIP/8211-00000000", "/var/spool/asterisk/monitor/8251/2013-04-24 10:28:03---full.wav") in new stack
-- Executing [8251@from-internal:3] AGI("SIP/8211-00000000", "googletts.agi,"Please say the name of the subscriber with whom you connect.",ru") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/googletts.agi
== Begin MixMonitor Recording SIP/8211-00000000
-- Playing '/tmp/16ae8d012843179807cfdabd9a34608f' (escape_digits=) (sample_offset 0)
-- Playing '/tmp/ef3ccb070117857a8045932052f3fd7b' (escape_digits=) (sample_offset 0)
-- <SIP/8211-00000000> AGI Script googletts.agi completed, returning 0
-- Executing [8251@from-internal:4] AGI("SIP/8211-00000000", "speech-recog.agi,ru-RU") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/speech-recog.agi
-- <SIP/8211-00000000> Playing 'beep.ulaw' (language 'en')
-- <SIP/8211-00000000> AGI Script speech-recog.agi completed, returning 0
-- Executing [8251@from-internal:5] GotoIf("SIP/8211-00000000", "1?if1:retry") in new stack
-- Goto (from-internal,8251,6)
-- Executing [8251@from-internal:6] GotoIf("SIP/8211-00000000", "1?al:retry") in new stack
-- Goto (from-internal,8251,7)
-- Executing [8251@from-internal:7] Dial("SIP/8211-00000000", "SIP/8201") in new stack
== Using SIP RTP TOS bits 184
== Using SIP RTP CoS mark 5
-- Called SIP/8201
-- SIP/8201-00000001 is ringing
-- SIP/8201-00000001 answered SIP/8211-00000000
-- Executing [h@from-internal:1] Hangup("SIP/8211-00000000", "") in new stack
== Spawn extension (from-internal, h, 1) exited non-zero on 'SIP/8211-00000000'
== Spawn extension (from-internal, 8251, 7) exited non-zero on 'SIP/8211-00000000'
== MixMonitor close filestream
== End MixMonitor Recording SIP/8211-00000000

Link to a recording with unsuccessful recognition.
Asterisk output:
-- Executing [8251@from-internal:1] Answer("SIP/8211-00000002", "") in new stack
-- Executing [8251@from-internal:2] MixMonitor("SIP/8211-00000002", "/var/spool/asterisk/monitor/8251/2013-04-24 10:36:29---full.wav") in new stack
-- Executing [8251@from-internal:3] AGI("SIP/8211-00000002", "googletts.agi,"Please say the name of the subscriber with whom you connect.",ru") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/googletts.agi
== Begin MixMonitor Recording SIP/8211-00000002
-- Playing '/tmp/16ae8d012843179807cfdabd9a34608f' (escape_digits=) (sample_offset 0)
-- Playing '/tmp/ef3ccb070117857a8045932052f3fd7b' (escape_digits=) (sample_offset 0)
-- <SIP/8211-00000002> AGI Script googletts.agi completed, returning 0
-- Executing [8251@from-internal:4] AGI("SIP/8211-00000002", "speech-recog.agi,ru-RU") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/speech-recog.agi
-- <SIP/8211-00000002> Playing 'beep.ulaw' (language 'en')
-- <SIP/8211-00000002> AGI Script speech-recog.agi completed, returning 0
-- Executing [8251@from-internal:5] GotoIf("SIP/8211-00000002", "1?if1:retry") in new stack
-- Goto (from-internal,8251,6)
-- Executing [8251@from-internal:6] GotoIf("SIP/8211-00000002", "0?al:retry") in new stack
-- Goto (from-internal,8251,8)
-- Executing [8251@from-internal:8] AGI("SIP/8211-00000002", "googletts.agi,"Please repeat?",ru") in new stack
-- Launched AGI Script /var/lib/asterisk/agi-bin/googletts.agi
-- Playing '/tmp/0c5de11c17dda57dabeaebe335110036' (escape_digits=) (sample_offset 0)
-- <SIP/8211-00000002> AGI Script googletts.agi completed, returning 0

Source: https://habr.com/ru/post/177623/

