Customer Self-Service with Google ASR

It all started with a request from a client who wanted to keep up with the times and partially replace live operators with a soulless robot that takes meter readings over the phone. Entering readings via DTMF is straightforward and holds no real interest, so I wanted voice input.

Under the cut is a short guide to bolting Google ASR onto one of the boxed Asterisk distributions.

A search of the Internet turned up several ready-made solutions built on MDG technologies. They may well be good, but the price tag started at 150 thousand rubles, so we passed (stinginess won).

I didn't take bare Asterisk because I wanted some kind of GUI. FreePBX was also ruled out because it turns into one big custom context. We chose a boxed XVB product, VirtualPBX, because it offers CDR, TTS, GSM modems, an autoinformer (our second stage), ASR, and database access through an HTTP API out of the box. We took the free version (it differs from the paid one only by a limit of 10 simultaneous calls), which is more than enough for our 2 PSTN lines and 2 3G dongles. In principle, bolting ASR onto other systems, with the exception of Switchvox, works in about the same way.

Go:

We download a large tar.bz2 archive with a ready-to-use image for VMware Player from the site or from a mirror (a stable build with a slightly older Asterisk is also available). If an older version is already installed, we apply the updates instead.

We unpack the archive and either run it in VMware Player, convert it for ESXi, or dump it onto real hardware. Nothing complicated. The first launch takes a while because sound files are unpacked and updates are checked.

Lyrical digression:
The system was designed for hosting virtual PBXs, but since we are the only ones using it, we are both the admins and the users. All in all, it is rather a large cannon for shooting sparrows. But there are extra perks, such as the ability to create multiple QA out of the box.

We go to the administrator interface, create a user, add a DID, and change the user's language to Russian-Woman (this is the Google TTS voice). This virtual PBX will be our self-service system. The setup is probably not worth describing in more detail, since it has already been covered elsewhere.

Then we drop into the console and write a CGI script that will talk to Google. We take an existing example and tweak it slightly so that it understands CGI parameters, and we get something like:

    #!/usr/bin/perl
    ########################################################################
    use strict;
    use CGI qw(:standard);
    use LWP::UserAgent;
    use JSON::XS;

    my $lang     = param('lang');
    my $file     = param('file');                 # uploaded WAV; CGI.pm returns it as a file handle
    my $var_name = param('var') || 'ASR_RESULT';  # variable name that XVB expects in the reply

    print "Content-type: text/plain; charset=utf-8\n\n";

    my $url;
    if ( lc($lang) eq 'ru' ) {
        $url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=ru-RU";
    }
    else {
        $url = "http://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US";
    }

    # Save the upload to a temporary WAV file.
    my $new_file = "/tmp/google-asr-$$.wav";
    unless ( open( TMP, ">$new_file" ) ) {
        print "$var_name=\n";
        die "Can't create $new_file: $!\n";
    }
    binmode(TMP);
    my $buffer;
    while ( !eof($file) ) {
        read( $file, $buffer, 16384 );
        print TMP $buffer;
    }
    close(TMP);    # flush the data before ffmpeg reads the file

    # Convert WAV to FLAC.
    system "ffmpeg -y -i $new_file $new_file.flac 2>/dev/null";
    if ($?) {
        print "$var_name=\n";
        die "Can't convert file $new_file to $new_file.flac: $?\n";
    }
    else {
        unlink $new_file;    # the WAV is no longer needed
        $new_file .= '.flac';
    }

    # Take the sample rate from the output of file(1).
    my $file_info = `file $new_file`;
    if ( $file_info =~ /FLAC audio.*\s([\d.]+)\s*kHz/ ) {
        $file_info = $1 * 1000;
    }
    else {
        unlink $new_file;
        print "$var_name=\n";
        die "Incorrect FLAC file: $file_info\n";
    }

    unless ( open( FILE, "<$new_file" ) ) {
        print "$var_name=\n";
        die "Can't open input file[$file]: $!\n";
    }
    binmode(FILE);
    undef $/;    # slurp the whole file
    my $audio = <FILE>;
    $/ = "\n";
    close(FILE);
    unlink $new_file;

    # Send the FLAC data to Google.
    my $ua       = LWP::UserAgent->new;
    my $response = $ua->post(
        $url,
        Content_Type => "audio/x-flac; rate=$file_info",
        Content      => $audio
    );

    if ( $response->is_success ) {
        my $h_ref;
        eval {
            my $json = JSON::XS->new();
            $h_ref = $json->decode( $response->content() );
        };
        if ( ref $h_ref eq 'HASH' and exists $h_ref->{'hypotheses'} ) {
            my $data = $h_ref->{'hypotheses'}->[0]->{'utterance'};

            # Number words to digits. The Cyrillic keys did not survive in
            # this copy; the number words are restored from context.
            my $map = {
                'ноль'   => 0, 'один'   => 1, 'два'    => 2,
                'три'    => 3, 'четыре' => 4, 'пять'   => 5,
                'шесть'  => 6, 'семь'   => 7, 'восемь' => 8,
                'девять' => 9, 'десять' => 10,
                # Aliases for meter-type names and confirmation words go here
                # (their text did not survive in this copy).
            };

            # Replace known words; tokens are joined without spaces.
            my $result = '';
            foreach my $ch ( split( /\s+/, $data ) ) {
                $result .= length( $map->{$ch} ) ? $map->{$ch} : $ch;
            }
            print "$var_name=$result\n";
            exit;
        }
    }
    print "$var_name=\n";

In addition to reading the CGI parameters, we added conversion to digits for strings such as 'one', 'two', and so on, plus several aliases for the meter-type names and for confirmation (yes == yes || further || true), because some words, such as 'further', are recognized more reliably than a plain 'yes'.
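The word-to-digit step is easy to try in isolation. The sketch below is in Python rather than the article's Perl and only mirrors the `$map` lookup in the script: each whitespace-separated token is replaced if it is in the table, and the pieces are concatenated without spaces. The alias words (`далее` and `верно` mapped to `да`) are assumptions based on the description above, the lower-casing is an addition, and the names `NUMBER_WORDS`, `ALIASES`, and `normalize_utterance` are hypothetical.

```python
# Russian number words mapped to digit strings.
NUMBER_WORDS = {
    "ноль": "0", "один": "1", "два": "2", "три": "3", "четыре": "4",
    "пять": "5", "шесть": "6", "семь": "7", "восемь": "8", "девять": "9",
    "десять": "10",
}

# Hypothetical confirmation aliases: several words collapse to a plain "да".
ALIASES = {"далее": "да", "верно": "да"}


def normalize_utterance(text: str) -> str:
    """Replace known words token by token and join the result without spaces."""
    table = {**NUMBER_WORDS, **ALIASES}
    tokens = text.lower().split()  # lower-casing is an addition to the Perl logic
    return "".join(table.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(normalize_utterance("один два три"))
    print(normalize_utterance("далее"))
```

Unknown tokens pass through unchanged, so digits that Google already returned as digits are kept and simply glued together.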

As input, the script receives a file in WAV format and the name of the variable in which to store the value (the variable is passed back to XVB and used in the dialplan). We write two more scripts:

The second (account_check.cgi) checks that the personal account number was entered correctly.
The third (commit.cgi) writes the data to the database.

I will not include them here because everything in them depends on your database. The third script takes the account number, the meter type, and the readings, writes them to the database, and returns the difference, which we then read out to the user.
We save these scripts on the same machine (or on another one) in /var/www/cgi-bin/.
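Since the commit script depends entirely on the database, the following is only a sketch of the idea, in Python rather than the article's Perl. It assumes a hypothetical SQLite table `readings` and assumes that "the difference" means the new reading minus the previous one for the same account and meter type. The names `init_db` and `commit_reading` are made up for illustration.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the hypothetical readings table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " account TEXT NOT NULL,"
        " meter_type TEXT NOT NULL,"
        " value INTEGER NOT NULL)"
    )


def commit_reading(conn, account, meter_type, value):
    """Store a reading and return its difference from the previous one.

    Returns None when there is no earlier reading for this account and meter.
    """
    row = conn.execute(
        "SELECT value FROM readings"
        " WHERE account = ? AND meter_type = ?"
        " ORDER BY id DESC LIMIT 1",
        (account, meter_type),
    ).fetchone()
    conn.execute(
        "INSERT INTO readings (account, meter_type, value) VALUES (?, ?, ?)",
        (account, meter_type, value),
    )
    conn.commit()
    return None if row is None else value - row[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    init_db(db)
    commit_reading(db, "1001", "water", 120)
    print(commit_reading(db, "1001", "water", 135))
```

A real CGI wrapper would print the returned number as plain text so that the PBX can pronounce it to the caller.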

Next comes the tedious job of assembling the dialplan from nearly identical pieces:

a data request and hand-off to our CGI script, which returns a text string recognized from the WAV file (WebVar variables)
verification of the user's input (GotoIF)
playback of the entered data (RoboText)
confirmation of the data by voice (WebVar variables)

That is, for each reading we:

request the data (record a sound file);
transfer the file to the web server, which decodes the spoken input into text and returns the answer in a variable;
read back to the user what was entered and ask for confirmation;
write the result.

As you can see, the steps are very repetitive, so we did not do it all by hand and instead generated the XML config with a script
(XVB supports importing and exporting its configuration as an XML file). A small piece of the result:

    <opt>
      <IVR name="0" EXT_NUMBER="0" NAME="." GREET_REPEAT_CNT="1" GREETING="     ." GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="V5_2*0" TYPE="1" WAITEXTENSION="0">
      </IVR>
      <IVR name="error" EXT_NUMBER="error" NAME="WEB " GREETING=" .   ." GREET_REPEAT_CNT="1" GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="hangup" TYPE="1" WAITEXTENSION="0">
      </IVR>
      <IVR name="V5_2*0" EXT_NUMBER="V5_2*0" NAME="  -  --------------" GREET_REPEAT_CNT="0" GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="V5_2*1" TYPE="1" WAITEXTENSION="0">
      </IVR>
      <IVR name="V5_2*1" EXT_NUMBER="V5_2*1" NAME="  -  /  " GOTO_IF_FAIL="error" GREETING="    " GREET_REPEAT_DELAY="0.00" GREET_REPEAT_CNT="1" MAX_MSG_DURATION="10" MAX_SILENCE="2" NEED_VOICE="1" NEXTEXTENSION="V5_2*2" TYPE="20" WAITEXTENSION="0" WEBVAR_URL="http://localhost/cgi-bin/gv.pl?lang=ru&file=% VAR:FILE_DATA %&var=ASR_RESULTV5_2">
      </IVR>
      <IVR name="V5_2*2" EXT_NUMBER="V5_2*2" NAME="  -  /  " GREET_REPEAT_CNT="0" GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="V5_2*3" TYPE="21" WAITEXTENSION="0">
        <_VB_DATA COND="==" FUNC="strlen" PRIORITY="5" REDIRECT_TO="V5_2*2*1" VAR_NAME="ASR_RESULTV5_2" VAR_VALUE="0"/>
      </IVR>
      <IVR name="V5_2*2*1" EXT_NUMBER="V5_2*2*1" NAME="  -  /  " GREETING="    ,    ." GREET_REPEAT_CNT="1" GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="V5_2*1" TYPE="1" WAITEXTENSION="0">
      </IVR>
      <IVR name="V5_2*3" EXT_NUMBER="V5_2*3" NAME="  -  /  " GREETING=" " GREET_REPEAT_CNT="1" GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="V5_2*4" SAY_PATTERN="char" SAY_PATTERN_ID="0" TEXT_STR="% VAR:ASR_RESULTV5_2 %" TYPE="25" WAITEXTENSION="0">
      </IVR>
      <IVR name="V5_2*4" EXT_NUMBER="V5_2*4" NAME="  -  / " GOTO_IF_FAIL="error" GREETING=",    ." GREET_REPEAT_DELAY="0.00" GREET_REPEAT_CNT="1" MAX_MSG_DURATION="3" MAX_SILENCE="2" NEED_PARAMS="0" NEED_VOICE="1" NEXTEXTENSION="V5_2*4*1" TYPE="20" WAITEXTENSION="0" WEBVAR_URL="http://localhost/cgi-bin/gv.pl?lang=ru&file=% VAR:FILE_DATA %">
      </IVR>
      <IVR name="V5_2*4*1" EXT_NUMBER="V5_2*4*1" NAME="  -  /  " GREET_REPEAT_CNT="0" GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="V5_2*3" TYPE="21" WAITEXTENSION="0">
        <_VB_DATA COND="==" FUNC="value" PRIORITY="5" REDIRECT_TO="V5_2*0" VAR_NAME="ASR_RESULT" VAR_VALUE=""/>
        <_VB_DATA COND="==" FUNC="value" PRIORITY="5" REDIRECT_TO="V5_2*1" VAR_NAME="ASR_RESULT" VAR_VALUE=""/>
      </IVR>
      <IVR name="V5_3*9" EXT_NUMBER="V5_3*9" NAME="  -  / " GREET_REPEAT_CNT="0" GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="V5_3*9*1" TYPE="6" WAITEXTENSION="0" QUIET_MODE="1" GOTO_IF_FAIL="error" TTS_URL="http://127.0.0.1/cgi-bin/commit.pl?level=V5_3&account=% VAR:ASR_RESULTV1 %&value1=% VAR:ASR_RESULTV5_1 %&value2=% VAR:ASR_RESULTV5_2 %&value3=% VAR:ASR_RESULTV5_3 %">
      </IVR>
      <IVR name="V5_3*9*1" EXT_NUMBER="V5_3*9*1" NAME="-----------------------" GREET_REPEAT_CNT="1" GREETING="  . .       ." GREET_REPEAT_DELAY="0.00" NEXTEXTENSION="0" TYPE="1" WAITEXTENSION="0">
      </IVR>
    </opt>

You can save the configuration above as xxx.xml and upload it in the user profile via the restore-configuration item.
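The generator script itself is not shown in the article. A minimal sketch of how such a generator might look, in Python rather than the article's Perl, is given below. The element and attribute names (`IVR`, `_VB_DATA`, `WEBVAR_URL`, and so on), the `TYPE` codes, and the `% VAR:... %` placeholders are copied from the sample config as they appear there. What each `TYPE` code means is inferred from that sample, and the helper names `ivr`, `add_reading_steps`, and `build_config` are hypothetical. Only the record, empty-check, and read-back steps are generated. `ElementTree` escapes `&` in URLs as `&amp;`, and whether XVB accepts that form has not been verified.

```python
import xml.etree.ElementTree as ET

ASR_URL = "http://localhost/cgi-bin/gv.pl"  # ASR script URL from the sample config


def ivr(parent, name, **attrs):
    """Add one IVR element; name and EXT_NUMBER are identical in the sample."""
    attrs = {key: str(value) for key, value in attrs.items()}
    return ET.SubElement(parent, "IVR", name=name, EXT_NUMBER=name, **attrs)


def add_reading_steps(root, prefix, next_ext):
    """Generate the repeated block of steps for one meter reading."""
    var = f"ASR_RESULT{prefix}"
    # Record the caller and send the file to the ASR script (TYPE 20 in the sample).
    ivr(root, f"{prefix}*1", TYPE=20, NEED_VOICE=1, GOTO_IF_FAIL="error",
        NEXTEXTENSION=f"{prefix}*2",
        WEBVAR_URL=f"{ASR_URL}?lang=ru&file=% VAR:FILE_DATA %&var={var}")
    # Branch to a retry prompt when the recognized string is empty (TYPE 21).
    check = ivr(root, f"{prefix}*2", TYPE=21, NEXTEXTENSION=f"{prefix}*3")
    ET.SubElement(check, "_VB_DATA", COND="==", FUNC="strlen", PRIORITY="5",
                  REDIRECT_TO=f"{prefix}*2*1", VAR_NAME=var, VAR_VALUE="0")
    # Retry prompt that returns to the recording step (TYPE 1).
    ivr(root, f"{prefix}*2*1", TYPE=1, NEXTEXTENSION=f"{prefix}*1")
    # Read the recognized value back to the caller (TYPE 25).
    ivr(root, f"{prefix}*3", TYPE=25, SAY_PATTERN="char",
        TEXT_STR=f"% VAR:{var} %", NEXTEXTENSION=next_ext)


def build_config(prefixes):
    """Chain the per-reading blocks and return the config as an XML string."""
    root = ET.Element("opt")
    for index, prefix in enumerate(prefixes):
        is_last = index + 1 == len(prefixes)
        next_ext = "hangup" if is_last else f"{prefixes[index + 1]}*1"
        add_reading_steps(root, prefix, next_ext)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_config(["V5_1", "V5_2", "V5_3"]))
```

Each extra meter then costs one more entry in the list passed to `build_config` instead of another hand-edited block of near-identical elements.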

In the web interface, we ended up with something like this:

So the simplest working prototype for collecting meter readings by voice and via DTMF is ready (VirtualPBX supports DTMF-based self-service out of the box, so I did not describe it in detail here). Now we polish our scripts and record proper voice greetings.

P.S. Let's see how many people prefer voice to DTMF. To me, DTMF seems simpler (more familiar) and faster whenever it is available.

Source: https://habr.com/ru/post/188382/
