
The implementation of the voice directory based on YandexSpeechKit

Various implementations can be found on the Internet, but in my opinion they are all rather simple. I want to present my own version of a voice directory for Asterisk.


Note: I am not a professional programmer, so some solutions may seem wild to you, and some tricks may be outdated. I am ready to accept criticism and improve the system.


Brief description of features:


The caller enters the IVR, says who or what they are looking for and, in most cases, is connected to the right number. The system also keeps statistics, which are written to a MySQL table.
Briefly about the company and the network in which the system is deployed:
about 1000 phones, about 50 departments


Software products used by the system:



Description of dialplan in asterisk.


[officevoicerec]
exten => s,1,Answer()
 same => n,Macro(hangercheck,${CALLERID(num)})
 same => n,Set(ITERATIONS=1)
 same => n,Set(HANGFLAG=TRUE)
 same => n,Background(/var/lib/asterisk/sounds/ru/speechrec/zdravstvuite)

In this fragment, a macro is run that checks whether the caller hung up right after hearing the greeting message. Then two variables are set:
ITERATIONS is needed to repeat the recognition process a specified number of times. HANGFLAG is used by the hangercheck macro.


 same => n(rec),Set(RECFILE=/tmp/${UNIQUEID}.wav)
 same => n,Playback(/var/lib/asterisk/sounds/en/beep)
 same => n,Record(${RECFILE},3,8)
 same => n,AGI(pyreq8.py,${RECFILE})
 same => n,GotoIf($["${NUMTOCALL}" = "repeat"]?repeat)
 same => n,Set(HANGFLAG=FALSE)

Set the recording file variable and record the file. Run the AGI script that sends the file for recognition and searches for the number (the script is described below), check the NUMTOCALL variable (set by the script), and clear HANGFLAG to indicate that the caller did not hang up early.


 same => n,Macro(VRstat,${CALLERID(num)},${NUMTOCALL},${RSTATUS},${CHANNEL},${RECREZ})
 same => n,GotoIf($[[${EXISTS(${FNAME})}]]?foundName:havenodescr)
 same => n(foundName),Set(FILE_FNAME=${STRREPLACE(FNAME, ,)})
 same => n,GotoIf($["${STAT(f,/var/lib/asterisk/sounds/ru/cache/${FILE_FNAME}.mp3)}"="1"]?havecache:nocache)
 same => n(nocache),System(curl "https://tts.voicetech.yandex.net/generate?format=mp3&lang=ru-RU&speaker=zahar&emotion=neutral&speed=0.8&key=" -G --data-urlencode "text= ${FNAME}." > /tmp/speech-${UNIQUEID}.mp3)
 same => n,System(/usr/local/bin/lame -S --scale 30 /tmp/speech-${UNIQUEID}.mp3 /var/lib/asterisk/sounds/ru/cache/${FILE_FNAME}.mp3)
 same => n(havecache),Playback(/var/lib/asterisk/sounds/ru/cache/${FILE_FNAME})
 same => n,Dial(Local/${NUMTOCALL}@common-context)

Run the VRstat macro (it is responsible for the statistics and is trivial, so it is not described here). Check whether a description FNAME (set by pyreq8.py) exists for the query. If it does, derive the cache file name from it and check whether the file exists. If the file does not exist, synthesize it with Yandex TTS and re-encode it with lame, raising the volume. Then (or immediately, if the cache exists) play it and call the subscriber.
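The cache-or-synthesize step can also be expressed outside the dialplan. The following is a minimal Python sketch, not the code used in the system: `cached_prompt`, `cache_dir` and the `synthesize` callable are hypothetical names, and the real TTS request and lame conversion are left to whatever `synthesize` does.

```python
import os


def cached_prompt(name, cache_dir, synthesize):
    """Return the path of the cached prompt for `name`, creating it if needed.

    `synthesize(text, path)` is a hypothetical callable that writes the
    synthesized audio for `text` to `path` (for example, by calling a TTS
    service and then re-encoding the result).
    """
    # Mirror the dialplan: the cache file name is the description without spaces.
    file_name = name.replace(" ", "") + ".mp3"
    path = os.path.join(cache_dir, file_name)
    if not os.path.isfile(path):
        # Cache miss: generate the prompt once, later calls reuse the file.
        synthesize(name + ".", path)
    return path
```

With this shape the TTS service is contacted only on the first call for a given description; every later call for the same name only checks that the file exists.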


 same => n(repeat),GotoIf($["${ITERATIONS}"="1"]?secretary)
 same => n,Background(/var/lib/asterisk/sounds/ru/speechrec/1-wav)
 same => n,Set(ITERATIONS=$[${ITERATIONS}+1])
 same => n,Goto(rec)

Repeat the recognition. If the number of iterations reaches the specified limit, transfer the call to the secretaries.


 same => n(secretary),Macro(VRstat,${CALLERID(num)},${NUMTOCALL},${RSTATUS},${CHANNEL},${RECREZ})
 same => n,Set(HANGFLAG=FALSE)
 same => n,Dial(Local/1000@common-context)

Transfer to the secretaries: record the statistics, clear the flag (the caller did not hang up), and call the secretary.


 same => n(havenodescr),Playback(/var/lib/asterisk/sounds/ru/speechrec/wait2);thanks-wait)
 same => n,Noop('no description')
 same => n,Dial(Local/${NUMTOCALL}@common-context)
 same => n,Hangup()

Call a subscriber who has no description in the directory: play the wait message (the statistics were already recorded by VRstat above) and dial the number.


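exten => h,1,Gotoif($["${HANGFLAG}"="TRUE"]?exec:noop)
 same => n(exec),Macro(VRstat,${CALLERID(num)},x,HANGER,${CHANNEL})
 same => n(noop),Noop('exiting')

Hangup handling for the hangercheck macro: if HANGFLAG is still TRUE when the call ends, the caller hung up before recognition finished, and a HANGER record is written to the statistics.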


Description of the dialplan. Macros.


[macro-hangercheck]
;${ARG1} -clid
exten => s,1,GotoIf($["${ARG1}"="anonymous"]?end)
exten => s,n,MYSQL(Connect connid SRV user password db utf8)
exten => s,n,MYSQL(SET NAMES utf8)
exten => s,n,MYSQL(Query resultid ${connid} SELECT IFNULL((SELECT clid from ivr_stat where rstatus="HANGER" and calldate >=ADDDATE(NOW(),INTERVAL -48 HOUR) and clid="${ARG1}" order by calldate desc limit 1),"NF"))
exten => s,n,MYSQL(Fetch fetchid ${resultid} VAR)
exten => s,n,MYSQL(Clear ${resultid})
exten => s,n,MYSQL(Disconnect ${connid})
exten => s,n,GotoIf($["${VAR}"="NF"]?end)
exten => s,n,Macro(VRstat,${ARG1},x,H_RECALL,${CHANNEL})
exten => s,n,Dial(Local/1000@common-context)
exten => s,n,Hangup()
exten => s,n(end),Noop(Hanger check failed)

We check in the statistics database whether this number has called in the last 48 hours and hung up without waiting for recognition to complete. If so, we record this in the statistics and connect the caller straight to the secretary.
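The same check can be illustrated in Python. This is only a sketch: it uses an in-memory SQLite database as a stand-in for the MySQL `ivr_stat` table, the function name `recently_hung_up` is hypothetical, and the table is reduced to the three columns the query needs.

```python
import sqlite3
from datetime import datetime, timedelta


def recently_hung_up(conn, clid, now, window_hours=48):
    """Return True if `clid` has a HANGER record within the last `window_hours`."""
    if clid == "anonymous":
        return False  # the macro skips callers without a number
    since = (now - timedelta(hours=window_hours)).strftime("%Y-%m-%d %H:%M:%S")
    row = conn.execute(
        "SELECT clid FROM ivr_stat "
        "WHERE rstatus = 'HANGER' AND calldate >= ? AND clid = ? "
        "ORDER BY calldate DESC LIMIT 1",
        (since, clid),
    ).fetchone()
    return row is not None


# Demo data: SQLite stands in for MySQL, timestamps are stored as ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ivr_stat (calldate TEXT, clid TEXT, rstatus TEXT)")
now = datetime(2018, 6, 15, 12, 0, 0)
conn.executemany(
    "INSERT INTO ivr_stat VALUES (?, ?, ?)",
    [
        ("2018-06-15 09:00:00", "1064", "HANGER"),       # three hours ago
        ("2018-06-10 09:00:00", "2001", "HANGER"),       # outside the window
        ("2018-06-15 10:00:00", "3002", "NAMESUCCESS"),  # not a hang-up
    ],
)
print(recently_hung_up(conn, "1064", now))
print(recently_hung_up(conn, "2001", now))
```

Only the first caller falls inside the 48-hour window with a HANGER status, so only that caller would be sent straight to the secretary.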


A brief description of the MS SQL and MySQL tables used.

MS SQL table dbo.phrases.

id(PK, int), phrase (text), number(varchar(50))

Full-text search is enabled for this table. It is used to route by keyword when a long phrase is received. For example, if the phrase “Please connect me with a representative of the advertising department” is recognized and the database contains a record with phrase = “advertising department”, the caller is connected to the corresponding number (number).
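To make the idea concrete, here is a deliberately simplified Python sketch. It replaces the real full-text search with plain substring matching, and both the function name `find_number_by_keyword` and the rule "prefer the longest stored phrase" are assumptions made for the illustration, not part of the described system.

```python
def find_number_by_keyword(recognized, phrases):
    """Return the number of the longest stored phrase contained in `recognized`.

    `phrases` is a list of (phrase, number) pairs, shaped like rows of
    dbo.phrases. Substring matching stands in for full-text search here.
    """
    text = recognized.lower()
    best = None
    for phrase, number in phrases:
        if phrase.lower() in text:
            # Prefer the most specific (longest) keyword phrase.
            if best is None or len(phrase) > len(best[0]):
                best = (phrase, number)
    return best[1] if best else None


rows = [("advertising department", "1234"), ("department", "1000")]
print(find_number_by_keyword(
    "Please connect me with a representative of the advertising department",
    rows,
))
```

A real full-text index also handles word forms and ranking, which substring matching does not.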


MySQL table recogStats.

id(PK, int), date (datetime), ctime(float), rtime(float), stime(float), phrase(varchar(60))

This table stores recognition results and statistics on recognition time. ctime is the time taken to convert the audio file, rtime is the time taken to upload and recognize it, and stime is the time taken to search.

MySQL table ivr_stat.

id(PK, int), calldate (datetime), clid(varchar(15)), duration(int(20)), callednum(varchar(10)), rstatus(varchar(20)), channame(varchar(30)), RECREZ(varchar(200))

This table records the recognition results (RECREZ, rstatus), the number the caller was connected to after the search on the recognized phrase (callednum), the time the caller spent in the recognition menu (duration) and the channel name for debugging (channame).

MySQL table CustomRequests.

id(PK, int), nomer(int(10)), request(varchar(100))

Auxiliary directory of additional groups and departments.

MySQL table numdescriptions.

Num(text), name(text)

Directory of descriptions for the numbers, used for voice synthesis.

MySQL tables spravochnik_rus and spravochnik_rus_name_num_no_tops

nomer(varchar(11)), fio(varchar(100))

Directories of people with their numbers. The first is complete and is for internal use. In the second, for external callers, the numbers of directors and managers are redirected to the secretariat.


AGI script that converts, sends and searches the recognized phrase.


Used libraries:


import difflib
from sys import exit
import uuid
import time
import os
import subprocess
import xml.etree.ElementTree
import MySQLdb
import pymssql
import string
import re
from itertools import permutations
from os import remove
import timeit

The asterisk.agi library is also used. It is imported only when not debugging (debugging is done on Windows machines).


Description of variables.


WINDEBUG = False

Debug flag.

_digits = re.compile(r'\d')

Compiled regular expression that matches digits.

uniqid = str(uuid.uuid1()).replace('-', '')

Unique identifier.

dkey = '12345678-9101-1121-3141-51617181920'

Yandex SpeechKit API key.

lang = 'ru-RU'

Language option for Yandex SpeechKit.

topic = 'queries'

Recognition topic option for Yandex SpeechKit.

callnumber = '222'

The number to call by default.

setVar('NUMTOCALL', callnumber)

Set the number to call by default, in case something goes wrong.

persondic = dict()

Dictionary: full name - number.

persondicFI=dict()

Dictionary: surname and first name - number.

persondicF=dict()

Dictionary: surname - number.

otherdic = dict()

Dictionary with other records, in the form record - number.

descriptions = dict()

Dictionary with the descriptions used for voice synthesis, in the form number - description.

duplicates = list()

Service list for filtering duplicates.

nums = list()

List with all internal numbers.

outfile = '/tmp/' + uniqid + '-pcm.wav'

Temporary output audio file.

mysqlhost='myhost'
mysqlpass='mypass'
mysqluser='myuser'
mysqldb='mydb'
mssqlhost='mshost'
mssqlpass='mspass'
mssqluser='msuser'
mssqldb='msdb'

Connection details for MySQL and MS SQL.

if not WINDEBUG:
    from asterisk.agi import *
    agi = AGI()
    infile = agi.env['agi_arg_1']
    caller = agi.get_variable('CALLERID(num)')
else:
    caller = '1064'
    infile = ''

Import the library and set the input audio file name and the caller's number, depending on whether we are debugging.


Description of functions

def verb(s):
    if not WINDEBUG:
        agi.verbose(s)
    else:
        print s

Depending on the debug flag, messages are written to the Asterisk console or to stdout.

def setVar(varname, varval):
    if not WINDEBUG:
        agi.set_variable(varname, varval)
    else:
        print "setting var " + varname + " with value " + varval

Depending on the debug flag, we either set the Asterisk dialplan variable or print the assignment to stdout.


def contains_digits(s):
    verb('enter contains_digits')
    return bool(_digits.search(s))

Check if s contains digits.

def return_digits(s):
    verb('return digits')
    pstr = s.encode("utf-8")
    all = string.maketrans('', '')
    nodigs = all.translate(all, string.digits)
    return unicode(pstr.translate(all, nodigs), "utf-8")

Return only the digits from s.


def check_dob(num):
    if int(num) in nums:
        verb('checkdob success')
        return True
    else:
        verb('checkdob fail' + num)
        return False

Check whether the extension number exists in the nums list.

def set_dob(strnum):
    buf = return_digits(strnum)
    if contains_digits(strnum):
        if check_dob(buf):
            verb('setting var ' + buf)
            setVar('RSTATUS', 'SAYDIAL')
            return buf
        else:
            return "repeat"
    else:
        return "repeat"

Return the extension number spoken by the caller if it exists, otherwise request a repeat.
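The script itself is written for Python 2 (`string.maketrans`, `unicode`). As a hedged illustration of what `return_digits`, `check_dob` and `set_dob` do together, here is a compact Python 3 sketch. The function name `extract_extension` and the `known_numbers` parameter are hypothetical and exist only for this example.

```python
import re


def extract_extension(text, known_numbers):
    """Return the digits spoken in `text` if they form a known extension.

    Returns the string "repeat" when there are no digits or the number
    is not in `known_numbers`, mirroring the behavior described above.
    """
    digits = "".join(re.findall(r"\d", text))
    if digits and int(digits) in known_numbers:
        return digits
    return "repeat"


print(extract_extension("extension 12 34", {1234, 1064}))
print(extract_extension("hello there", {1234, 1064}))
```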


def checkSize(infile):
    if int(os.stat(infile).st_size) <= 26364:
        setVar('NUMTOCALL', 'repeat')
        setVar('RSTATUS', 'SILENCE')
        verb('empty file received')
        remove(infile)
        exit(9)

Check the audio file received for recognition. If the file is too small, we assume the caller stayed silent.
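The size threshold can be read as a minimum recording length. The sketch below does that arithmetic under assumptions that are not stated in the article: a 44-byte WAV header, an 8 kHz sample rate and 16-bit mono samples. The function name `wav_seconds` is hypothetical.

```python
def wav_seconds(size_bytes, sample_rate=8000, sample_width=2, channels=1,
                header_bytes=44):
    """Estimate the audio length of an uncompressed WAV file from its size.

    All default parameters are assumptions about the recording format.
    """
    payload = max(size_bytes - header_bytes, 0)
    return payload / float(sample_rate * sample_width * channels)


print(wav_seconds(26364))
```

Under those assumptions the threshold corresponds to roughly 1.6 seconds of audio. A different recording format would give a different figure.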


def addSessionStat(ctime, rtime, stime, phrase):
    cdate = time.strftime("%Y-%m-%d %H:%M:%S")
    db = MySQLdb.connect(host=mysqlhost, user=mysqluser, passwd=mysqlpass, db=mysqldb, charset='utf8')
    cur = db.cursor()
    # Parameterized query: the driver escapes the recognized phrase
    cur.execute(
        "INSERT INTO recogStats(date,ctime,rtime,stime,phrase) VALUES (%s,%s,%s,%s,%s)",
        (cdate, str(ctime), str(rtime), str(stime), phrase))
    db.commit()
    db.close()

Function that records the recognition statistics.


def fillDics():
    db = MySQLdb.connect(host=mysqlhost, user=mysqluser, passwd=mysqlpass, db="central_cdr", charset='utf8')
    cur = db.cursor()
    if len(caller) != 4:
        tbname = 'spravochnik_rus_name_num_no_tops'
    else:
        tbname = 'svravochnik_rus_persons'
    cur.execute("""SELECT nomer,fio from """ + tbname)
    for row in cur.fetchall():
        fullfio=row[1].lower()
        f=" ".join(fullfio.split(' ')[0:1])
        if not f in persondicF.keys():# and f not in duplicates :
            persondicF[f] = int(row[0])
        elif fullfio in persondic.keys():
            pass
        else:
            duplicates.append(f)
        if not fullfio in persondic.keys():
            persondic[fullfio] = int(row[0])
        fi=" ".join(fullfio.split(' ')[0:2])
        if not fi in persondicFI.keys():
            persondicFI[fi] = int(row[0])
        elif fullfio in persondic.keys():
            pass
        else:
            persondicFI.pop(fi)
    uniquedups=[x for x in list(set(duplicates)) if x != '']
    for item in uniquedups:
        for key in persondicF.keys():
            if item in key:
                persondicF.pop(key)
    cur.execute("""SELECT nomer,request from CustomRequests """)
    for row in cur.fetchall():
        otherdic[row[1].lower()] = row[0]
    cur.execute("""SELECT num,name from numdescriptions """)
    for row in cur.fetchall():
        descriptions[str(row[0])] = row[1]
    cur.execute("""SELECT nomer from spravochnik_rus """)
    for row in cur.fetchall():
        nums.append(int(row[0]))
    db.close()

This function fills the dictionaries of surnames (persondicF), surnames with first names (persondicFI), full names (persondic), department names (otherdic), descriptions for speech synthesis (descriptions) and all company numbers (nums).
Depending on the length of the caller's number, we use the table that contains the directors' numbers (internal caller) or the one that does not (external caller).
The surname dictionary is checked for uniqueness, so that it does not contain two Ivanovs with different numbers.
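The uniqueness rule for surnames can be shown in isolation. The following Python sketch is a simplified variant rather than the function above: it groups numbers by surname and keeps only surnames that map to a single number. The name `build_surname_index` and the sample rows are made up for the example.

```python
from collections import defaultdict


def build_surname_index(rows):
    """Map surname -> number, keeping only surnames that belong to one number.

    `rows` is an iterable of (number, full_name) pairs, shaped like the
    (nomer, fio) rows read from the directory table.
    """
    by_surname = defaultdict(set)
    for number, full_name in rows:
        surname = full_name.lower().split(" ")[0]
        by_surname[surname].add(int(number))
    # A surname shared by several numbers is ambiguous, so it is dropped.
    return {
        surname: next(iter(numbers))
        for surname, numbers in by_surname.items()
        if len(numbers) == 1
    }


rows = [
    ("1001", "Ivanov Ivan Ivanovich"),
    ("1002", "Ivanov Petr Petrovich"),
    ("1003", "Sidorov Oleg Olegovich"),
]
print(build_surname_index(rows))
```

A caller who says only "Ivanov" then gets no surname match and has to be resolved by first name or full name instead.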


def convert(infile, outfile):
    #convert file
    verb("Converting WAV " + infile)
    soxconvert = subprocess.Popen(['sox', infile, '-r', '16000', '-b', '16', '-c', '1', outfile], stdout=subprocess.PIPE)
    (out, err) = soxconvert.communicate()
    # remove(infile)
    if soxconvert.returncode != 0:
        setVar('NUMTOCALL', callnumber)
        # return ""
        exit(9)

Converts the file recorded by Asterisk. If sox returns an error, we set the default number and exit the script.


def sendRecog(file):
    verb("Sending file to yandex: " + outfile)
    proc = subprocess.Popen(['curl', '--max-time', '5', '--silent', 'asr.yandex.net/asr_xml?key=' + dkey + '&uuid=' + uniqid + '&topic=' + topic + '&lang=ru-RU', '-F', 'Content-Type=audio/x-pcm;bit=16;rate=16000', '-F', 'audio=@' + outfile], stdout=subprocess.PIPE)
    (out, err) = proc.communicate()
    verb("return code is: " + str(proc.returncode))
    if proc.returncode != 0:
        return ""
    remove(file)
    e = xml.etree.ElementTree.fromstring(out)
    if e.attrib['success'] == '1':
        verb(e._children[0].text)
        return e._children[0].text
    else:
        return ""

Sends the file for recognition and receives the response from Yandex. If recognition succeeded, we take the first variant. I could not get this to work with the curl library, which is why I call the curl binary instead.
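The response handling can be tried without a network call. In the Python sketch below, the sample XML is an assumption about the shape of the response (a root element with a `success` attribute and `variant` children), inferred from how the function above reads it. `first_variant` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed response shape, used only for this illustration.
SAMPLE_OK = (
    '<recognitionResults success="1">'
    '<variant confidence="1">advertising department</variant>'
    '<variant confidence="0.5">advertising apartment</variant>'
    '</recognitionResults>'
)
SAMPLE_FAIL = '<recognitionResults success="0"/>'


def first_variant(xml_text):
    """Return the text of the first recognition variant, or "" on failure."""
    root = ET.fromstring(xml_text)
    if root.attrib.get("success") != "1":
        return ""
    # Index the root element directly instead of using the private
    # `_children` attribute, which newer ElementTree versions do not have.
    if len(root) == 0 or root[0].text is None:
        return ""
    return root[0].text


print(first_variant(SAMPLE_OK))
print(repr(first_variant(SAMPLE_FAIL)))
```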


def searchfiobyf(f):
    if len(f.split(" "))==2:
        limit=2
    elif len(f.split(" "))>=3:
        return f
    else:
        limit=1
    for fio,num in persondic.iteritems():
        cutfio=" ".join(fio.split(' ')[0:limit])
        if f == cutfio:
            return fio

Looks up the full name, depending on whether the input is a surname alone or a surname and first name.


def combinationSearcher(phrase,pdic,quality):
    verb(u'searching in persons')
    result = difflib.get_close_matches(phrase, pdic.keys(), 1, quality)
    #verb(" ".join(result))
    if len(result) > 0:
        fullfio=searchfiobyf(result[0])
        verb(u'found ' + str(pdic[result[0]]))
        setVar('RSTATUS', 'NAMESUCCESS')
        setVar('FNAME', fullfio)
        return pdic[result[0]]
    else:
        return ""

Searches for close matches. Example: phrase = "Alexey Peter" and pdic contains the entry "Alexeyev Peter"; get_close_matches returns "Alexeyev Peter". The function may produce incorrect results but, as practice has shown, correct matches are far more common. The quality parameter sets how closely a phrase must match.
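The example from the text can be reproduced with `difflib` alone. This is a small Python sketch with made-up directory entries; `closest_number` is a hypothetical name and it leaves out the dialplan variables that the real function sets.

```python
import difflib

# Made-up directory entries for the illustration.
directory = {"alexeyev peter": 1001, "ivanov ivan": 1002}


def closest_number(phrase, pdic, quality):
    """Return the number of the entry closest to `phrase`, or "" if none.

    `quality` is passed to difflib as the cutoff (0.0 to 1.0): candidates
    whose similarity ratio is below it are ignored.
    """
    matches = difflib.get_close_matches(phrase, list(pdic.keys()), n=1, cutoff=quality)
    return pdic[matches[0]] if matches else ""


print(closest_number("alexey peter", directory, 0.7))
print(repr(closest_number("smith john", directory, 0.7)))
```

Raising the cutoff reduces false matches at the cost of rejecting more misrecognized names, which is the trade-off that the quality parameter controls.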


def getNumByName(recognizedString):
    if u'' in recognizedString:
        verb('enter dobavochn')
        return set_dob(recognizedString)
    elif len(recognizedString.replace(" ", "")) == 4:
        verb('enter num say')
        return set_dob(recognizedString)
    if len(recognizedString) <= 5 and recognizedString.lower() not in [u"",u"",u""]:
        setVar('RSTATUS', 'SHORT')
        return "repeat"
    verb(u'start searching')
    split = list(set(recognizedString.split(" ")))
    parts = len(split)
    fixedstring=" ".join(split)
    if parts >= 5:
        verb('Phrase is long. Using FTDB')
        buf = mssqlwrapper(fixedstring)
        if buf != '':
            buf = mssqlwrapper(fixedstring)[0]
            if str(buf) in descriptions.keys():
                setVar('FNAME', descriptions[str(buf)])
            setVar('RSTATUS', 'DEPTSUCCESS')
            return buf
        else:
            setVar('RSTATUS', 'REQUESTNOTFOUND')
            return "repeat"
    result=""
    if parts == 1:
        mssqlcheck=mssqlwrapper(fixedstring)
        if mssqlcheck!='':
            if mssqlcheck[2]>=80:
                buf=str(mssqlcheck[0])
                if buf in descriptions.keys():
                    setVar('FNAME', descriptions[str(buf)])
                setVar('RSTATUS', 'DEPTSUCCESS')
                result=buf
        else:
            result=combinationSearcher(fixedstring,persondicF,0.7)
    elif parts ==2:
        # all word orders, e.g. (Ivanov Ivan), (Ivan Ivanov)
        combs = list(permutations(split, parts))
        for item in combs:
            element = " ".join(item)
            result=combinationSearcher(element,persondicFI,0.7)
            if result!="":
                break
    elif parts ==3:
        combs = list(permutations(split, parts))
        for item in combs:
            element = " ".join(item)
            result=combinationSearcher(element,persondic,0.8)
            if result!="":
                break
    if result!="":
        return result
    verb('Low ftdbsearch')
    buf = mssqlwrapper(recognizedString)
    if buf != '':
        buf = mssqlwrapper(recognizedString)[0]
        if str(buf) in descriptions.keys():
            verb('it is')
            setVar('FNAME', descriptions[str(buf)])
        setVar('RSTATUS', 'DEPTSUCCESS')
        return buf
    else:
        verb(u'item not found ' + recognizedString)
        setVar('RSTATUS', 'REQUESTNOTFOUND')
        return callnumber

The main function, which finds a number by phrase.

Let's go through this function piece by piece.


if u'' in recognizedString:
    verb('enter dobavochn')
    return set_dob(recognizedString)

Example: for the phrase “Please extension 1234”, the function returns the extension number (if it exists).

elif len(recognizedString.replace(" ", "")) == 4:
    verb('enter num say')
    return set_dob(recognizedString)

If the caller said a four-digit number, return the corresponding extension number (if it exists).

if len(recognizedString) <= 5 and recognizedString.lower() not in [u"",u"",u""]:
    setVar('RSTATUS', 'SHORT')
    return "repeat"

This avoids running the search on short phrases. Such phrases can appear when Yandex hears and recognizes a fragment of conversation because the caller was not listening to the prompt.

if parts >= 5:
    verb('Phrase is long. Using FTDB')
    buf = mssqlwrapper(fixedstring)
    if buf != '':
        buf = mssqlwrapper(fixedstring)[0]
        if str(buf) in descriptions.keys():
            setVar('FNAME', descriptions[str(buf)])
        setVar('RSTATUS', 'DEPTSUCCESS')
        return buf
    else:
        setVar('RSTATUS', 'REQUESTNOTFOUND')
        return "repeat"
result=""

For long phrases (five words or more), we go straight to the MS SQL full-text (FT) database.

if parts == 1:
    mssqlcheck=mssqlwrapper(fixedstring)
    if mssqlcheck!='':
        if mssqlcheck[2]>=80:
            buf=str(mssqlcheck[0])
            if buf in descriptions.keys():
                setVar('FNAME', descriptions[str(buf)])
            setVar('RSTATUS', 'DEPTSUCCESS')
            result=buf
    else:
        result=combinationSearcher(fixedstring,persondicF,0.7)

If the recognized phrase consists of one word, we first query the FT database and accept a match of at least 80%: we look up the description for the number found and set the result variable and the dialplan status variable. Otherwise we search for similar words in the surnames dictionary.

elif parts ==2:
    combs = list(permutations(split, parts))
    for item in combs:
        element = " ".join(item)
        result=combinationSearcher(element,persondicFI,0.7)
        if result!="":
            break

If the phrase consists of two words, we assume that it is a surname and a first name.
We build the list of word orders (Ivanov Ivan, Ivan Ivanov) and look for close matches.
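Trying every word order matters because the caller may say the first name before the surname. A minimal Python sketch of this step follows, with a made-up directory and the hypothetical name `search_any_order`. It omits the status variables that the real code sets.

```python
import difflib
from itertools import permutations


def search_any_order(words, pdic, quality):
    """Try every ordering of `words` and return the first close match's number."""
    for combo in permutations(words, len(words)):
        candidate = " ".join(combo)
        matches = difflib.get_close_matches(candidate, list(pdic.keys()), n=1, cutoff=quality)
        if matches:
            return pdic[matches[0]]
    return ""


# Made-up entries keyed as "surname firstname".
people = {"ivanov ivan": 1002, "petrov oleg": 1003}
print(search_any_order(["ivan", "ivanov"], people, 0.7))
```

For two words there are only two orderings and for three words six, so the extra lookups stay cheap.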


elif parts ==3:
    combs = list(permutations(split, parts))
    for item in combs:
        element = " ".join(item)
        result=combinationSearcher(element,persondic,0.8)
        if result!="":
            break
if result!="":
    return result

If the phrase consists of three words, we assume that it is a full name, build the list of word orders and look for it in the dictionary of full names. We return result if it is not empty.

buf = mssqlwrapper(recognizedString)
if buf != '':
    buf = mssqlwrapper(recognizedString)[0]
    if str(buf) in descriptions.keys():
        verb('it is')
        setVar('FNAME', descriptions[str(buf)])
    setVar('RSTATUS', 'DEPTSUCCESS')
    return buf
else:
    verb(u'item not found ' + recognizedString)
    setVar('RSTATUS', 'REQUESTNOTFOUND')
    return callnumber

If no match has been found so far, we search the FT database. If that finds a match, we return the result and the description. Otherwise we return the default number and set the status indicating that the phrase was not found.


The main body of the program.


fillDics()
if not WINDEBUG:
    checkSize(infile)
    start_time = timeit.default_timer()
    convert(infile, outfile)
    convert_elapsed = timeit.default_timer() - start_time
    start_time = timeit.default_timer()
    checkstring = sendRecog(outfile).lower()
    recog_elapsed = timeit.default_timer() - start_time
    verb('recog_elapsed = ' + str(recog_elapsed))
    if checkstring == "":
        verb('not recognized. using default.')
        setVar('NUMTOCALL', 'repeat')
        setVar('RSTATUS', 'SILENCE')
        exit(9)
    else:
        setVar('RECREZ', checkstring)
else:
    checkstring = u""
start_time = timeit.default_timer()
callnumber = getNumByName(checkstring)
search_elapsed = timeit.default_timer() - start_time
if not WINDEBUG:
    setVar('NUMTOCALL', str(callnumber))
    addSessionStat(convert_elapsed, recog_elapsed, search_elapsed, checkstring.lower())

First the dictionaries are filled. If we are not debugging, we check the file size, convert the file and send it for recognition, timing each step. Otherwise we set checkstring to a test phrase.
Next we time the search for the phrase in our structures. Finally, when running in production, we set the dialplan variable and write the statistics.
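The repeated start/stop timer pattern can be wrapped in a small helper. This is only a sketch of one possible refactoring in Python. The helper name `timed` is hypothetical and does not appear in the script.

```python
import timeit


def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = timeit.default_timer()
    result = func(*args)
    return result, timeit.default_timer() - start


# Usage with a stand-in function; in the script the stages would be
# convert, sendRecog and getNumByName.
total, elapsed = timed(sum, [1, 2, 3])
print(total, elapsed >= 0)
```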


Some statistics.


June 2018:
successful recognition - 1010
silent on the phone - 78
recognition failed (no match found) - 79
short request - 4
recognized department - 7
average recognition time (average response time of Yandex) - 2.6 seconds


June 2017:
successful recognition - 1271
recognized department - 18
recognition failed (no match found) - 127
short request - 9
silent on the phone - 71
average recognition time (average response time of Yandex) - 1.5 seconds


This service is successfully used in the company where I work. I hope my work will help other people implement their own ideas or similar functionality. I am ready to answer any questions.



Source: https://habr.com/ru/post/417273/

