
Asterisk + UniMRCP + VoiceNavigator. Synthesis and speech recognition in Asterisk. Part 3

Part 1
Part 2
Part 4

In the previous article we covered synthesis tags and how to build recognition grammars.
In this part I want to walk through building a concrete voice application in Asterisk. Rather than invent a voice menu for some imaginary "Horns and Hoofs" store, I took the easier route and picked an example already implemented on Habr, one that clearly shows the advantages of synthesis and recognition.

I found this post on Habr, which was quite actively discussed at the time. Its author offers a way to listen to the weather forecast over the phone, using many pre-recorded files and the XML informers from the Gismeteo site. I would like to improve that application and show how synthesis and recognition make life easier when you build an IVR that delivers dynamic information.
The application asks for the city whose weather you want to know, then asks for the time (this afternoon, tomorrow evening, and so on) and reads out the forecast.

Weather XML file


The files from Gismeteo look like this:
<?xml version="1.0" encoding="utf-8"?>
<MMWEATHER>
  <REPORT type="frc3">
    <TOWN index="10381" sname="%C1%E5%F0%EB%E8%ED" latitude="52" longitude="13">
      <FORECAST day="02" month="08" year="2011" hour="20" tod="3" predict="0" weekday="3">
        <PHENOMENA cloudiness="0" precipitation="10" rpower="0" spower="0"/>
        <PRESSURE max="760" min="758"/>
        <TEMPERATURE max="21" min="19"/>
        <WIND min="2" max="4" direction="1"/>
        <RELWET max="74" min="72"/>
        <HEAT min="19" max="21"/>
      </FORECAST>
      <FORECAST day="03" month="08" year="2011" hour="02" tod="0" predict="0" weekday="4">
        <PHENOMENA cloudiness="0" precipitation="10" rpower="0" spower="0"/>
        <PRESSURE max="761" min="759"/>
        <TEMPERATURE max="15" min="13"/>
        <WIND min="1" max="3" direction="1"/>
        <RELWET max="83" min="81"/>
        <HEAT min="13" max="15"/>
      </FORECAST>
      <FORECAST day="03" month="08" year="2011" hour="08" tod="1" predict="0" weekday="4">
        <PHENOMENA cloudiness="0" precipitation="10" rpower="0" spower="0"/>
        <PRESSURE max="761" min="759"/>
        <TEMPERATURE max="18" min="16"/>
        <WIND min="2" max="4" direction="2"/>
        <RELWET max="80" min="78"/>
        <HEAT min="16" max="18"/>
      </FORECAST>
      <FORECAST day="03" month="08" year="2011" hour="14" tod="2" predict="0" weekday="4">
        <PHENOMENA cloudiness="1" precipitation="10" rpower="0" spower="0"/>
        <PRESSURE max="760" min="758"/>
        <TEMPERATURE max="26" min="24"/>
        <WIND min="2" max="4" direction="2"/>
        <RELWET max="56" min="54"/>
        <HEAT min="22" max="24"/>
      </FORECAST>
    </TOWN>
  </REPORT>
</MMWEATHER>

Each day is divided into four periods: night, morning, afternoon and evening.
A file always contains the weather for four consecutive periods, starting from the moment it was updated. The files are updated four times a day: at 2:30, 8:30, 14:30 and 20:30 Moscow time (MSK). The file shown above covers the evening of August 2 and the night, morning and afternoon of August 3. We will take this logic into account both when processing the file and in the application itself.
We will use the following parameters:
weekday - day of the week (1 - Sunday, 2 - Monday, etc.)
tod - time of day the forecast is for (0 - night, 1 - morning, 2 - afternoon, 3 - evening)
cloudiness - cloud cover by grade (0 - clear, 1 - partly cloudy, 2 - cloudy, 3 - overcast)
precipitation - type of precipitation (4 - rain, 5 - heavy rain, 6, 7 - snow, 8 - thunderstorm, 9 - no data, 10 - no precipitation)
TEMPERATURE - air temperature in degrees Celsius (min and max)
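
To illustrate how these parameters can be read, here is a minimal Python sketch (the application itself is written in Perl). It parses a shortened copy of the sample file with the standard library. The names SAMPLE and parse_forecasts and the dictionary keys are my own choices for the example, and decoding sname as Windows-1251 is an observation about this particular sample, not a documented guarantee.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

# Shortened copy of the sample file (XML declaration omitted,
# first two FORECAST elements only, unused child elements dropped).
SAMPLE = """<MMWEATHER><REPORT type="frc3">
<TOWN index="10381" sname="%C1%E5%F0%EB%E8%ED" latitude="52" longitude="13">
<FORECAST day="02" month="08" year="2011" hour="20" tod="3" predict="0" weekday="3">
<PHENOMENA cloudiness="0" precipitation="10" rpower="0" spower="0"/>
<TEMPERATURE max="21" min="19"/>
</FORECAST>
<FORECAST day="03" month="08" year="2011" hour="02" tod="0" predict="0" weekday="4">
<PHENOMENA cloudiness="0" precipitation="10" rpower="0" spower="0"/>
<TEMPERATURE max="15" min="13"/>
</FORECAST>
</TOWN></REPORT></MMWEATHER>"""


def parse_forecasts(xml_text):
    """Return the decoded town name and a list of forecast dicts."""
    root = ET.fromstring(xml_text)          # root element is MMWEATHER
    town = root.find("./REPORT/TOWN")
    # sname is percent-encoded; assumed here to be Windows-1251 text.
    name = unquote(town.get("sname"), encoding="cp1251")
    forecasts = []
    for f in town.findall("FORECAST"):
        phen = f.find("PHENOMENA")
        temp = f.find("TEMPERATURE")
        forecasts.append({
            "weekday": int(f.get("weekday")),
            "tod": int(f.get("tod")),
            "cloudiness": int(phen.get("cloudiness")),
            "precipitation": int(phen.get("precipitation")),
            "t_min": int(temp.get("min")),
            "t_max": int(temp.get("max")),
        })
    return name, forecasts


name, forecasts = parse_forecasts(SAMPLE)
print(name, forecasts[0])
```

Converting the attributes to integers up front keeps the later comparisons (weekday, tod) free of string-versus-number surprises.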

Automatic receipt of xml-files


To always have up-to-date XML files on hand, create a script (get_xml.sh) and add it to cron. The script downloads the files for the cities we need and puts them in the specified folder. We will also take the city names from this script: the AGI script reads them from the comment at the end of each line.
#!/bin/bash
DIR=/var/www/html/gismeteo/xml
/usr/bin/wget 'http://informer.gismeteo.ru/xml/27612_1.xml' -O $DIR/27612_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/26063_1.xml' -O $DIR/26063_1.xml #-
/usr/bin/wget 'http://informer.gismeteo.ru/xml/22892_1.xml' -O $DIR/22892_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/29634_1.xml' -O $DIR/29634_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/31960_1.xml' -O $DIR/31960_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/26850_1.xml' -O $DIR/26850_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/33345_1.xml' -O $DIR/33345_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/36870_1.xml' -O $DIR/36870_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/76680_1.xml' -O $DIR/76680_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/2974_1.xml' -O $DIR/2974_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/10381_1.xml' -O $DIR/10381_1.xml #
/usr/bin/wget 'http://informer.gismeteo.ru/xml/48454_1.xml' -O $DIR/48454_1.xml #
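
The city lookup that the AGI script performs later can be sketched in a few lines of Python: find the line that mentions the XML file and take whatever follows the #. The two script lines below are a hypothetical excerpt, and the city labels in them are assumptions made for illustration (the Berlin label agrees with the sname attribute of the sample file).

```python
import re

# Hypothetical excerpt of the download script; labels after '#' are assumed.
SCRIPT_LINES = [
    "/usr/bin/wget 'http://informer.gismeteo.ru/xml/27612_1.xml' -O $DIR/27612_1.xml #Moscow",
    "/usr/bin/wget 'http://informer.gismeteo.ru/xml/10381_1.xml' -O $DIR/10381_1.xml #Berlin",
]


def city_for(xml_file, lines):
    """Return the text after '#' on the first line mentioning xml_file.

    An empty string is returned when no line matches or the line
    carries no comment.
    """
    for line in lines:
        if xml_file in line:
            match = re.search(r"#(.*)", line)
            return match.group(1).strip() if match else ""
    return ""


print(city_for("10381_1.xml", SCRIPT_LINES))
```

Keeping the names next to the download commands means a new city is added in exactly one place, at the cost of coupling the AGI script to the layout of a shell script.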

Grammar


Let's start by building the grammars. We will need three files.
towns.xml - the list of cities whose weather we want to know. The name of the XML file on the Gismeteo server is used as the semantic tag. You can add every city whose weather might interest you.
<?xml version="1.0" encoding="utf-8"?>
<grammar xml:lang="ru-RU" root="speak" mode="voice" version="1.0" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0-literals">
  <rule id="speak" scope="public">
    <one-of>
      <item>!SYLLABLES</item>
      <item>!SYLLABLES <ruleref uri="#town"/> !SYLLABLES</item>
    </one-of>
  </rule>
  <rule id="town">
    <one-of>
      <item><tag>27612_1.xml</tag></item>
      <item><tag>27612_1.xml</tag></item>
      <item>-<tag>26063_1.xml</tag></item>
      <item><tag>26063_1.xml</tag></item>
      <item><tag>26063_1.xml</tag></item>
      <item><tag>22892_1.xml</tag></item>
      <item><tag>29634_1.xml</tag></item>
      <item><tag>31960_1.xml</tag></item>
      <item><tag>26850_1.xml</tag></item>
      <item><tag>33345_1.xml</tag></item>
      <item><tag>36870_1.xml</tag></item>
      <item>-<tag>36870_1.xml</tag></item>
      <item><tag>36870_1.xml</tag></item>
      <item><tag>76680_1.xml</tag></item>
      <item><tag>2974_1.xml</tag></item>
      <item><tag>10381_1.xml</tag></item>
      <item><tag>48454_1.xml</tag></item>
    </one-of>
  </rule>
</grammar>

The most obvious advantage of recognition over DTMF shows up with large choices, when the caller has to pick not one item out of three (sales, technical support or accounting) but, say, one out of three hundred (a list of cities or streets). To keep this example simple I included only a few cities, but nothing prevents you from offering a choice of a hundred.
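
With hundreds of cities it becomes tedious to maintain the town rule by hand, so one option is to generate it. The sketch below is a Python illustration under my own assumptions: the CITIES mapping, the spoken variants in it and the build_town_rule helper are hypothetical, and only the shape of an item with a nested tag is taken from the grammar above.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: Gismeteo file name -> spoken variants of the city.
CITIES = {
    "27612_1.xml": ["Moscow"],
    "10381_1.xml": ["Berlin", "city of Berlin"],
}


def build_town_rule(cities):
    """Build a <rule id="town"> element with one <item> per spoken variant."""
    rule = ET.Element("rule", id="town")
    one_of = ET.SubElement(rule, "one-of")
    for xml_file, variants in cities.items():
        for spoken in variants:
            item = ET.SubElement(one_of, "item")
            item.text = spoken              # phrase the caller may say
            tag = ET.SubElement(item, "tag")
            tag.text = xml_file             # semantic tag returned on a match
    return rule


rule = build_town_rule(CITIES)
print(ET.tostring(rule, encoding="unicode"))
```

Several spoken variants can share one tag, which is how the hand-written grammar above maps more than one item to the same file.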

time.xml - the options for choosing the time of day. In the semantic tag the first digit is the day (0 - today, 1 - tomorrow) and the second digit is the time of day, using the same values as the tod parameter in the weather XML file.

<?xml version="1.0" encoding="utf-8"?>
<grammar xml:lang="ru-RU" root="speak" mode="voice" version="1.0" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0-literals">
  <rule id="speak" scope="public">
    <one-of>
      <item>!SYLLABLES</item>
      <item>!SYLLABLES <ruleref uri="#time"/> !SYLLABLES</item>
    </one-of>
  </rule>
  <rule id="time">
    <one-of>
      <item> <tag>00</tag></item>
      <item> <tag>01</tag></item>
      <item> <tag>02</tag></item>
      <item> <tag>03</tag></item>
      <item> <tag>10</tag></item>
      <item> <tag>11</tag></item>
      <item> <tag>12</tag></item>
      <item> <tag>13</tag></item>
    </one-of>
  </rule>
</grammar>
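
The two-digit tag is simple to decode. As a small Python sketch of the convention described for time.xml (the function names and the English labels are mine, chosen only for the illustration):

```python
DAY_NAMES = ["today", "tomorrow"]
TOD_NAMES = ["night", "morning", "afternoon", "evening"]


def decode_time_tag(tag):
    """Split a two-digit semantic tag into (day offset, time of day)."""
    if len(tag) != 2 or not tag.isdigit():
        raise ValueError(f"unexpected tag: {tag!r}")
    day_offset, tod = int(tag[0]), int(tag[1])
    if day_offset not in (0, 1) or tod not in range(4):
        raise ValueError(f"unexpected tag: {tag!r}")
    return day_offset, tod


def describe(tag):
    """Human-readable form of a tag, for logging or debugging."""
    day_offset, tod = decode_time_tag(tag)
    return f"{DAY_NAMES[day_offset]}, {TOD_NAMES[tod]}"


print(describe("12"))
```

Validating the tag before use is optional when it can only come from the grammar, but it makes a mismatch between grammar and script visible immediately.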

end_next.xml - a simple grammar with only three items, used at the end to either continue working with the application or end the call.
<?xml version="1.0" encoding="utf-8"?>
<grammar xml:lang="ru-RU" root="speak" mode="voice" version="1.0" xmlns="http://www.w3.org/2001/06/grammar" tag-format="semantics/1.0-literals">
  <rule id="speak" scope="public">
    <one-of>
      <item>!SYLLABLES</item>
      <item>!SYLLABLES <ruleref uri="#check"/> !SYLLABLES</item>
    </one-of>
  </rule>
  <rule id="check">
    <one-of>
      <item> <tag>city_choice</tag></item>
      <item> <tag>time_choice</tag></item>
      <item><tag>bye</tag></item>
    </one-of>
  </rule>
</grammar>

Prerecorded sound files


Synthesis pays off for dynamically generated phrases and for responses to the caller's choices. Static phrases are better recorded in advance. This approach has two main advantages:
- it reduces the load on the synthesis resources
- pre-recorded files can be passed as a parameter to the MRCPRecog function and played in barge-in mode (the caller can interrupt the prompt, and recognition starts as soon as speech begins).

The files can be recorded with the MRCPSynth and Monitor functions. Our application uses the following files:
city_choice.wav - Say the name of the city whose weather you are interested in.
no_input.wav - Please speak louder.
error_city.wav - Sorry, I did not understand. Please repeat the name of the city.
error.wav - I did not understand you. Please repeat.
time_choice.wav - Say the time for which you want the weather. For example, tomorrow afternoon.
end_next.wav - To find out the weather in another city, say "choose a city". To find out the weather at another time of day, say "another time". To end the call, say "finish".
bye.wav - Thank you for your call. Goodbye!
not_found_time.wav - There is no weather data for the selected time. Choose a different time.

Script for working with weather xml file


Create an AGI script, gismeteo.agi, that receives from the dialplan the name of the weather XML file and the recognized time of day, then looks up the forecast in that file.
Asterisk::AGI is used for interaction with Asterisk and XML::Simple for parsing the XML.
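#!/usr/bin/perl
use strict;
use XML::Simple;
use Asterisk::AGI;
use Time::localtime;

$| = 1;

my $AGI = new Asterisk::AGI;
my %var = $AGI->ReadParse();

# Name of the weather XML file selected by the caller
my $xml_file   = $AGI->get_variable("xml_file");
my $xml_source = "/var/www/html/gismeteo/xml/$xml_file";

if ($ARGV[0] eq "city") {
    # Argument = city: take the city name from the trailing comment
    # of the matching line in the download script
    open(LIST, "/var/www/html/gismeteo/agi-bin/get_xml.sh")
        || die "cannot open get_xml.sh";
    my $city = "";
    while (<LIST>) {
        if (m/$xml_file/) {
            ($city) = /#(.*)/;
            last;
        }
    }
    close(LIST);
    $AGI->set_variable('city' => $city);
    exit;
}
elsif ($ARGV[0] eq "time") {
    # Argument = time: find the forecast for the requested period.
    # Split the recognized tag into (day offset, time of day)
    my @cl_time = $AGI->get_variable("RECOG_INT0") =~ /(.)/g;

    # Current day of the week (0 - Sunday ... 6 - Saturday)
    my $present_time    = localtime(time());
    my $present_weekday = $present_time->wday;
    # Requested weekday in Gismeteo numbering (1 - Sunday ... 7 - Saturday);
    # the modulo wraps "tomorrow" from Saturday to Sunday
    my $wanted_weekday = ($present_weekday + $cl_time[0]) % 7 + 1;

    # Spoken words, indexed by the codes used in the tag and the XML file
    my @day        = ('', '');           # 0 - today, 1 - tomorrow
    my @tod        = ('', '', '', '');   # 0 - night ... 3 - evening
    my @cloudiness = ('', '', '', '');   # 0 - clear ... 3 - overcast
    my %precipitation = ('4'=>'', '5'=>'', '6'=>'', '7'=>'', '8'=>'', '9'=>'', '10'=>' ');

    # Parse the XML file
    my $xmlWeather = new XML::Simple(keeproot => 1, searchpath => ".", forcearray => 1, suppressempty => '');
    my $xmlTown = $xmlWeather->XMLin($xml_source);
    my $xmlData = $xmlTown->{MMWEATHER}[0]->{REPORT}[0]->{TOWN}[0]->{FORECAST};

    # Look through the four periods in the file for the requested one
    for (my $i = 0; $i < 4; $i++) {
        my $f = $xmlData->[$i];
        # Debug output goes to STDERR: STDOUT is the AGI command channel
        print STDERR "$f->{weekday}, $wanted_weekday, $f->{tod}\n";
        if ($f->{weekday} == $wanted_weekday && $f->{tod} == $cl_time[1]) {
            # Phrase: day, time of day, temperature from min to max, cloudiness, precipitation
            $AGI->set_variable('speech_text' => "$day[$cl_time[0]] $tod[$f->{tod}]    $f->{TEMPERATURE}[0]->{min}  $f->{TEMPERATURE}[0]->{max} . $cloudiness[$f->{PHENOMENA}[0]->{cloudiness}]. $precipitation{$f->{PHENOMENA}[0]->{precipitation}}.");
            $AGI->set_priority('found');
            exit;
        }
    }
    # Nothing found: the file has no data for the requested period
    $AGI->set_priority('not_found');
}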
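
The matching step is easy to get wrong at the end of the week, when "tomorrow" has to wrap from Saturday to Sunday, so here is the same lookup as a small Python sketch. It assumes, as the Perl code does, that the local weekday counts from 0 for Sunday while Gismeteo counts from 1 for Sunday. The function names and the FORECASTS list (values copied from the sample file) are illustrative only.

```python
from datetime import date


def gismeteo_weekday(local_wday, day_offset):
    """Convert a 0-based weekday (0 = Sunday) plus an offset in days
    to Gismeteo numbering (1 = Sunday ... 7 = Saturday)."""
    return (local_wday + day_offset) % 7 + 1


def find_forecast(forecasts, local_wday, day_offset, tod):
    """Return the first forecast for the requested day and time of day,
    or None when the file does not cover that period."""
    wanted = gismeteo_weekday(local_wday, day_offset)
    for f in forecasts:
        if f["weekday"] == wanted and f["tod"] == tod:
            return f
    return None


# The four periods of the sample file (weekday, tod, temperature range).
FORECASTS = [
    {"weekday": 3, "tod": 3, "t_min": 19, "t_max": 21},
    {"weekday": 4, "tod": 0, "t_min": 13, "t_max": 15},
    {"weekday": 4, "tod": 1, "t_min": 16, "t_max": 18},
    {"weekday": 4, "tod": 2, "t_min": 24, "t_max": 26},
]

# isoweekday() is 1 = Monday ... 7 = Sunday; modulo 7 gives 0 = Sunday.
today_wday = date(2011, 8, 2).isoweekday() % 7

tomorrow_morning = find_forecast(FORECASTS, today_wday, 1, 1)
this_morning = find_forecast(FORECASTS, today_wday, 0, 1)
print(tomorrow_morning, this_morning)
```

The second lookup returns None: a file that starts with the evening of August 2 holds nothing for that morning, which is the case the not_found branch of the dialplan handles.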

Application in extensions.conf


I prefer to keep the application in a separate file and pull it into /etc/asterisk/extensions.conf with #include.
Create a file named gismeteo.conf.

Recognition macro

First we write a macro that handles the recognition itself:
[macro-recog-gismeteo]
; ARG1 - grammar file, ARG2 - prompt sound file, ARG3 - recognition error sound file, ARG4 - priority to go to on success
exten => s,1,MRCPRecog(${GRAMMARS_PATH}/${ARG1},ct=0.20&b=1&f=${SND_PATH}/${ARG2})
exten => s,n(recog),SET(RECOG_HYP_NUM=0)
exten => s,n,SET(RECOG_UTR0=)
; Parse the NLSML result
exten => s,n,AGI(NLSML.agi,${QUOTE(${RECOG_RESULT})})
; Check for no-input
exten => s,n,GotoIf(${REGEX("Completion-Cause: 002" ${RECOG_RESULT})}?$[${PRIORITY}+1]:check_error)
exten => s,n,MRCPRecog(${GRAMMARS_PATH}/${ARG1},ct=0.20&b=1&f=${SND_PATH}/no_input)
exten => s,n,Goto(recog)
; Check for a recognition error
exten => s,n(check_error),GotoIf($["${RECOG_UTR0}" = ""]?$[${PRIORITY}+1]:ok)
exten => s,n,MRCPRecog(${GRAMMARS_PATH}/${ARG1},ct=0.20&b=1&f=${SND_PATH}/${ARG3})
exten => s,n,Goto(recog)
; Recognition succeeded: return to the calling context
exten => s,n(ok),Goto(${MACRO_CONTEXT},${MACRO_EXTEN},${ARG4})

The macro takes as parameters the grammar file, the prompt sound file, the recognition error message file and the priority to jump to when recognition succeeds.
It handles recognition errors (reports the error and asks the caller to repeat) and no-input (asks the caller to speak louder if no speech was detected on the channel).
The confidence threshold is set to 0.20 (ct=0.20), which should be enough to weed out results with low recognition confidence.
The NLSML.agi parser takes the ${RECOG_RESULT} variable as input and returns the following variables to the dialplan:
${RECOG_UTR0} - the recognized phrase from the grammar,
${RECOG_INT0} - the semantic tag,
${RECOG_CNF0} - the confidence level,
${RECOG_SNR0} - the signal-to-noise ratio.
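
The branching inside the macro can be summarised as a small decision function. The Python sketch below mirrors the two checks in the dialplan: the "Completion-Cause: 002" test for no-input, then an empty utterance for a recognition error. The function name, the return labels and the sample strings are my own, and the real contents of ${RECOG_RESULT} depend on the MRCP server.

```python
import re


def classify(recog_result, utterance):
    """Decide what the macro should do after a recognition attempt.

    recog_result -- raw text of the recognition result (assumed to contain
                    the Completion-Cause header, as the dialplan expects)
    utterance    -- value the parser stored in RECOG_UTR0
    """
    if re.search(r"Completion-Cause: 002", recog_result):
        return "no_input"   # play no_input and listen again
    if utterance == "":
        return "error"      # play the error prompt (ARG3) and listen again
    return "ok"             # continue at the priority passed in ARG4


# Hypothetical result strings, for illustration only.
print(classify("Completion-Cause: 002 no-input-timeout", ""))
print(classify("Completion-Cause: 001 no-match", ""))
print(classify("Completion-Cause: 000 success", "tomorrow morning"))
```

The order of the checks matters: a no-input result also leaves the utterance empty, so testing the utterance first would play the wrong prompt.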

Gismeteo application

[gismeteo]
; Entry point: extension 6853 jumps to the application
exten => 6853,1,Goto(gismeteo,1)

; Answer the call and set the paths
exten => gismeteo,1,Answer()
exten => gismeteo,n,Set(SND_PATH=/var/www/html/gismeteo/sounds)
exten => gismeteo,n,Set(GRAMMARS_PATH=http://192.168.2.103/gismeteo/grammars)
exten => gismeteo,n,Set(AGI_PATH=/var/www/html/gismeteo/agi-bin)

; City selection
exten => gismeteo,n(city_choice),Macro(recog-gismeteo,towns.xml,city_choice,error_city,$[${PRIORITY}+1])
exten => gismeteo,n,SET(xml_file=${RECOG_INT0})
; Run the AGI script to get the city name, then confirm it by synthesis
exten => gismeteo,n,AGI(${AGI_PATH}/gismeteo.agi,city)
exten => gismeteo,n,MRCPSynth(<?xml version=\"1.0\"?><speak version=\"1.0\" xml:lang=\"ru-ru\" xmlns=\"http://www.w3.org/2001/10/synthesis\"><voice name=\"8000\">   ${city}.</voice></speak>)

; Time selection
exten => gismeteo,n(time_choice),Macro(recog-gismeteo,time.xml,time_choice,error,agi_check)
; Run the AGI script to look up the forecast
exten => gismeteo,n(agi_check),AGI(${AGI_PATH}/gismeteo.agi,time)
; No data for the requested time: ask for another time
exten => gismeteo,n(not_found),Macro(recog-gismeteo,time.xml,not_found_time,error,agi_check)
; Data found: read out the forecast
exten => gismeteo,n(found),MRCPSynth(<?xml version=\"1.0\"?><speak version=\"1.0\" xml:lang=\"ru-ru\" xmlns=\"http://www.w3.org/2001/10/synthesis\"><voice name=\"8000\">${speech_text}</voice></speak>)

; Offer to continue or finish; the semantic tag is the label to jump to
exten => gismeteo,n,Macro(recog-gismeteo,end_next.xml,end_next,error,$[${PRIORITY}+1])
exten => gismeteo,n,Goto(${RECOG_INT0})
exten => gismeteo,n(bye),Playback(${SND_PATH}/bye)
exten => gismeteo,n,Hangup()

Application logic

System: Say the name of the city whose weather you are interested in.
Caller: Moscow.
If the city is not recognized or the caller speaks too quietly, the system plays the corresponding voice message. If recognition succeeds, the system confirms the result.
System: You have chosen the city of Moscow.
System: Say the time for which you want the weather. For example, tomorrow afternoon.
Caller: Tomorrow morning.
If the time of day is not recognized or the caller speaks too quietly, the system plays the corresponding voice message. If recognition succeeds, gismeteo.agi is launched and looks up the requested period in the weather XML file. If the file has no data for that period (for example, the caller asks in the evening for "this morning"), the caller hears "There is no weather data for the selected time. Choose a different time." If the data is found, the system reads out the result.
System: Tomorrow morning the air temperature is from 16 to 18 degrees. Clear. No precipitation.
System: To find out the weather in another city, say "choose a city". To find out the weather at another time of day, say "another time". To end the call, say "finish".
The phrase "choose a city" returns to the beginning of the application. The phrase "another time" lets you find out the weather in the selected city at a different time of day.
Caller: Finish.
System: Thank you for your call. Goodbye!

I think the solution turned out fairly simple and elegant. You can compare it with the one described in the original article ;)
Along the way I tried to show a number of features and tricks for working in Asterisk: using the NLSML parser, simplifying the dialplan with a recognition macro, using pre-recorded phrases for barge-in, and so on. I also touched on writing AGI scripts that work with external data. In the same way as with the XML processing here, you can query a database or any other source.

In the next article I would like to touch on a sore subject and describe some of the problems and limitations of Asterisk as a platform for synthesis and recognition.

I look forward to your questions, comments and suggestions.

PS: You can test the application at (812) 3258848, ext. 6853
PPS: Friends, I hope there won't be too many callers at once and you won't bring down our corporate telephony :)

Source: https://habr.com/ru/post/125512/

