
Speech Synthesis and Recognition from Google for Asterisk

Good morning!

Last night I was browsing Habr, came across the article "Google Translate + Asterisk IVR", and the hair in my armpits stood on end.

Speech synthesis, and so easy!
No need to build Festival and hunt for voice samples for it. Everything is ready, simple, and from Google.

I immediately rewrote the proposed version in my favorite PHP and packaged it as an AGI script to be called from Asterisk. I wanted synthesis to take a single line in the dialplan, just like the standard SayDigits() command:

An example for extensions.ael:
```
s => {
    Answer();
    Wait(1);
    AGI(say.php,"");
    AGI(say.php,"  ");
    AGI(say.php,"Habrahabr!",en);
    AGI(say.php,"    !");
    AGI(say.php,"!");
    AGI(say.php,"  ");
    AGI(say.php,"  !");
};
```


And the PHP code itself (it must be placed at /var/lib/asterisk/agi-bin/say.php):
```php
#!/usr/bin/php -q
<?php
// Read the AGI environment header ("agi_key: value" lines ending with an empty line).
$agivars = array();
while (!feof(STDIN)) {
    $agivar = trim(fgets(STDIN));
    if ($agivar === '') break;
    $agivar = explode(':', $agivar, 2); // values may themselves contain ':'
    $agivars[$agivar[0]] = trim($agivar[1]);
}

// Arguments: text to speak and an optional language (Russian by default).
$text = $_SERVER["argv"][1];
$lang = isset($_SERVER["argv"][2]) ? $_SERVER["argv"][2] : 'ru';

// Synthesized phrases are cached under the MD5 of the text.
$prefix = '/var/lib/asterisk/festivalcache/';
$filename = $prefix.md5($text);

if (!file_exists($filename.'.alaw')) {
    // Download MP3 from Google Translate TTS; the text must be URL-encoded.
    $url = 'http://translate.google.com/translate_tts?q='.urlencode($text).'&tl='.urlencode($lang);
    $wget = 'wget -U '.escapeshellarg('Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5')
          .' '.escapeshellarg($url).' -O '.escapeshellarg($filename.'.mp3');
    // Convert straight to 8 kHz mono A-law, the codec used on the channels.
    $ffmpeg = 'ffmpeg -i '.escapeshellarg($filename.'.mp3').' -ar 8000 -ac 1 -f alaw '.escapeshellarg($filename.'.alaw');
    exec($wget.' && '.$ffmpeg.' && rm '.escapeshellarg($filename.'.mp3'));
}

// Play the cached file (Asterisk finds the .alaw itself) and consume the AGI reply.
echo 'STREAM FILE "'.$filename.'" ""'."\n";
fgets(STDIN);
exit(0);
```
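The script above talks to Asterisk over AGI: Asterisk first writes a block of `agi_name: value` lines terminated by an empty line, and each command the script prints is answered with a line such as `200 result=0`. As a sketch, that exchange can be factored into two helpers; `agi_read_env` and `agi_parse_result` are hypothetical names (not part of any library), and the parser assumes the plain `200 result=N` success format.

```php
<?php
// Hypothetical helper: read the AGI environment header from a stream.
// Asterisk sends "agi_key: value" lines terminated by an empty line.
function agi_read_env($in): array
{
    $env = array();
    while (($line = fgets($in)) !== false) {
        $line = trim($line);
        if ($line === '') {
            break; // end of the header block
        }
        // Split on the first colon only: values such as agi_channel may contain colons.
        $parts = explode(':', $line, 2);
        $env[$parts[0]] = isset($parts[1]) ? trim($parts[1]) : '';
    }
    return $env;
}

// Hypothetical helper: extract the numeric result from a reply like "200 result=0".
// Returns null when the line is not a successful "200" reply.
function agi_parse_result(string $reply): ?int
{
    if (preg_match('/^200 result=(-?\d+)/', trim($reply), $m)) {
        return (int) $m[1];
    }
    return null;
}
```

For example, after printing `STREAM FILE`, `agi_parse_result(fgets(STDIN))` would return -1 if playback failed or the caller hung up.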

In my Asterisk setup the main codec is A-law (alaw), so I convert the MP3 directly to alaw.
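At the sample level, G.711 A-law stores each 16-bit linear PCM sample as one byte (1 sign bit, 3 segment bits, 4 mantissa bits), so 8000 samples per second give 64 kbit/s. The function below is an illustrative sketch modeled on the classic g711.c reference encoder; in the scripts ffmpeg does this work, and `linear_to_alaw` is a hypothetical name.

```php
<?php
// Sketch of a G.711 A-law encoder for one signed 16-bit PCM sample.
function linear_to_alaw(int $pcm): int
{
    // Upper bound of each of the 8 segments for the 13-bit magnitude.
    static $segEnd = array(0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF);

    $pcm >>= 3; // A-law works on 13-bit samples
    if ($pcm >= 0) {
        $mask = 0xD5; // positive: sign bit set, combined with the 0x55 even-bit inversion
    } else {
        $mask = 0x55; // negative: sign bit clear, only the even-bit inversion
        $pcm = -$pcm - 1;
    }

    // Find the first segment whose upper bound covers the magnitude.
    $seg = 0;
    while ($seg < 8 && $pcm > $segEnd[$seg]) {
        $seg++;
    }
    if ($seg >= 8) {
        return 0x7F ^ $mask; // clip out-of-range input to the maximum code
    }

    // Combine the segment number with 4 quantization bits.
    $aval = $seg << 4;
    $aval |= ($seg < 2) ? (($pcm >> 1) & 0x0F) : (($pcm >> $seg) & 0x0F);
    return $aval ^ $mask;
}
```

Silence (sample 0) encodes to 0xD5, which is why an all-silent A-law file is a run of 0xD5 bytes.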

After ten minutes of excitement, I remembered that Google can also recognize speech (as in voice search on a mobile phone). I searched the Internet and found the article "Voice Control. Recognition of Russian Speech", which includes a PHP example of speech recognition with Google.

I rewrote that code as an AGI script and got this (/var/lib/asterisk/agi-bin/voice.php):
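```php
#!/usr/bin/php -q
<?php
// Read the AGI environment header sent by Asterisk.
$agivars = array();
while (!feof(STDIN)) {
    $agivar = trim(fgets(STDIN));
    if ($agivar === '') break;
    $agivar = explode(':', $agivar, 2);
    $agivars[$agivar[0]] = trim($agivar[1]);
}

// Argument: path of the recording without extension (Record() wrote <path>.wav).
$filename = $_SERVER["argv"][1];

// The Speech API accepts FLAC, so encode the WAV recording.
exec('flac -f -s '.escapeshellarg($filename.'.wav').' -o '.escapeshellarg($filename.'.flac'));

// POST the raw FLAC data; the declared rate must match the 8 kHz recording.
// (The old '@file' upload syntax no longer works in current PHP.)
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=ru-RU");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: audio/x-flac; rate=8000"));
curl_setopt($ch, CURLOPT_POSTFIELDS, file_get_contents($filename.'.flac'));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);

// Take the first hypothesis, or an empty string if nothing was recognized.
$json_array = json_decode($result, true);
$voice_cmd = isset($json_array["hypotheses"][0]["utterance"]) ? $json_array["hypotheses"][0]["utterance"] : '';

unlink($filename.'.flac');
unlink($filename.'.wav');

// Store the text in ${VOICE} and print it on the Asterisk console.
echo 'SET VARIABLE VOICE "'.$voice_cmd.'"'."\n";
fgets(STDIN);
echo 'VERBOSE "'.$voice_cmd.'" 1'."\n";
fgets(STDIN);
exit(0);
```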

The Google Speech API accepts FLAC and Speex audio; I kept FLAC from the example.
The recognized text is stored in the variable ${VOICE}.
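The script reads only `hypotheses[0].utterance` from the JSON reply. A more defensive version of that step, as a sketch: `extract_utterance` is a hypothetical helper, and the reply shape is assumed from the keys the script already reads. It returns an empty string on transport errors or empty results, and strips double quotes and line breaks because the text is embedded in a quoted `SET VARIABLE` command.

```php
<?php
// Hypothetical helper: pull the best transcription out of the recognizer's JSON.
// Returns '' when the body is missing, not JSON, or has no hypotheses.
function extract_utterance($body): string
{
    if (!is_string($body) || $body === '') {
        return ''; // curl_exec() returns false on transport errors
    }
    $json = json_decode($body, true);
    if (!is_array($json) || !isset($json['hypotheses'][0]['utterance'])) {
        return '';
    }
    $text = (string) $json['hypotheses'][0]['utterance'];
    // Keep the AGI command line intact: no quotes, no line breaks.
    return trim(str_replace(array('"', "\r", "\n"), array('', ' ', ' '), $text));
}
```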

A combined example in extensions.ael:
```
s => {
    Answer();
    Wait(1);
    AGI(say.php,"");
    AGI(say.php,"");
    AGI(say.php,"  ");
    Record(/tmp/${UNIQUEID}.wav,3,20);
    AGI(say.php," ");
    Playback(/tmp/${UNIQUEID});
    AGI(voice.php,/tmp/${UNIQUEID});
    AGI(say.php," ");
    AGI(say.php,"${VOICE}");
    Hangup();
};
```

Record records a wav file with a maximum length of 20 seconds and ends the recording after 3 seconds of silence.
Since this is a test example, we play back what was said and then synthesize the recognized text.

What can I say, Google is great!
Now it is clear how to teach plain Asterisk speech synthesis and recognition without Festival or Sphinx.

And if your bosses ask you to whip up a voice IVR menu in a hurry, you will have something to surprise them with!

Update

I read a comment by user int80h, then read about the migration from the Google Translate API to the Bing Translate API, and decided it is good to have an alternative for everything.

Version 2.0
say.php with speech synthesis through Microsoft Translator:
```php
#!/usr/bin/php -q
<?php
// Read the AGI environment header sent by Asterisk.
$agivars = array();
while (!feof(STDIN)) {
    $agivar = trim(fgets(STDIN));
    if ($agivar === '') break;
    $agivar = explode(':', $agivar, 2);
    $agivars[$agivar[0]] = trim($agivar[1]);
}

// Arguments: text, voice ('g' = Google, 'm' = Microsoft, default 'g'), language (default 'ru').
$text = $_SERVER["argv"][1];
if (isset($_SERVER["argv"][2]) && in_array($_SERVER["argv"][2], array('g','m'))) $voice = $_SERVER["argv"][2]; else $voice = 'g';
if (isset($_SERVER["argv"][3])) $lang = $_SERVER["argv"][3]; else $lang = 'ru';

// Cache each text/voice/language combination separately.
$md5 = md5($text.$voice.$lang);
$prefix = '/var/lib/asterisk/festivalcache/';
$appid = 'T0CQJrrwQ1NcJFlJshEfWTzaI18B4TzVvBKx9CDoLvf8*';
$filename = $prefix.$md5;

if (!file_exists($filename.'.alaw')) {
    if ($voice == 'm') {
        // Microsoft Translator returns WAV.
        $ext = '.wav';
        $url = 'http://api.microsofttranslator.com/V2/Http.svc/Speak?language='.urlencode($lang).'&format=audio/wav&options=MaxQuality&appid='.$appid.'&text='.urlencode($text);
        exec('wget '.escapeshellarg($url).' -O '.escapeshellarg($filename.$ext));
    } else {
        // Google Translate returns MP3 and expects a browser User-Agent.
        $ext = '.mp3';
        $url = 'http://translate.google.com/translate_tts?q='.urlencode($text).'&tl='.urlencode($lang);
        exec('wget -U '.escapeshellarg('Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5').' '.escapeshellarg($url).' -O '.escapeshellarg($filename.$ext));
    }
    // wget -O leaves an empty file on failure, so convert only real downloads.
    if (@filesize($filename.$ext) > 0) {
        exec('ffmpeg -i '.escapeshellarg($filename.$ext).' -ar 8000 -ac 1 -f alaw '.escapeshellarg($filename.'.alaw'));
    }
    @unlink($filename.$ext);
}

if (file_exists($filename.'.alaw')) {
    echo 'STREAM FILE "'.$filename.'" ""'."\n";
    fgets(STDIN);
} else {
    echo 'VERBOSE "Speech Error!" 1'."\n";
    fgets(STDIN);
}
exit(0);
```

Microsoft returns audio in WAV format (the MP3 quality is terrible) and requires some Bing AppId (I picked one up from microsofttranslator.com; we'll see how long it lasts).
The synthesis quality seemed worse to me than Google's, but it places the stress in names more correctly.

```
AGI(say.php,"",m);
AGI(say.php,"",${voice},${lang});
```

${voice} is m or g (default g); ${lang} is ru, en, etc. (default ru).

Russian text works only with ru; English text always works, but with ru it comes out "broken".
Stress marks (an apostrophe ' before the vowel) and punctuation (for example, !) in the text are honored and change the intonation.
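The argument rules above (voice g or m with g as the default, language defaulting to ru, a cache key built from text, voice and language) can be isolated into one testable function. This is a sketch: `say_request` and its return keys are hypothetical, the URLs mirror the ones hard-coded in say.php, and the text and language are passed through `urlencode()` so spaces and non-ASCII characters survive in the query string.

```php
<?php
// Hypothetical helper mirroring say.php v2: resolve defaults and build the request.
function say_request(array $argv, string $prefix, string $appid): array
{
    $text  = isset($argv[1]) ? $argv[1] : '';
    // Voice: 'g' = Google, 'm' = Microsoft; anything else falls back to Google.
    $voice = (isset($argv[2]) && in_array($argv[2], array('g', 'm'), true)) ? $argv[2] : 'g';
    $lang  = isset($argv[3]) ? $argv[3] : 'ru';

    // The cache key covers text, voice and language.
    $base = $prefix . md5($text . $voice . $lang);

    if ($voice === 'm') {
        $url = 'http://api.microsofttranslator.com/V2/Http.svc/Speak?language=' . urlencode($lang)
             . '&format=audio/wav&options=MaxQuality&appid=' . $appid
             . '&text=' . urlencode($text);
        $ext = '.wav';
    } else {
        $url = 'http://translate.google.com/translate_tts?q=' . urlencode($text)
             . '&tl=' . urlencode($lang);
        $ext = '.mp3';
    }
    return array('url' => $url, 'download' => $base . $ext, 'alaw' => $base . '.alaw');
}
```

An unknown voice such as `x` silently falls back to Google, matching the `in_array` check in the script.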

P.S. I noticed that speech recognition sometimes returns empty text, yet re-sending the same file works fine. Strange :-)
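Since re-sending the same file often succeeds, one workaround is to retry a few times while the result is empty. A minimal sketch, assuming the recognition request (the curl call in voice.php) is wrapped in a callable that returns the utterance; `recognize_with_retry` and the default attempt count and delay are hypothetical choices, not values tested against the API.

```php
<?php
// Hypothetical helper: call $recognize until it returns non-empty text
// or the attempts run out. Returns '' if every attempt came back empty.
function recognize_with_retry(callable $recognize, int $attempts = 3, int $delayMs = 200): string
{
    for ($i = 1; $i <= $attempts; $i++) {
        $text = trim((string) $recognize());
        if ($text !== '') {
            return $text;
        }
        if ($i < $attempts && $delayMs > 0) {
            usleep($delayMs * 1000); // short pause before re-sending the same file
        }
    }
    return '';
}
```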

Source: https://habr.com/ru/post/133869/

