
Using Google Cloud Speech API v2 in Asterisk to recognize Russian speech

Good evening, colleagues. Recently we needed to add voice requests to our ticket system. Listening to the voice file every time is not always convenient, so the idea came up to add automatic speech recognition on top of it; besides, it would be useful in other projects later. In the course of this work I tried the APIs of the two most popular speech recognition services, Google and Yandex. In the end the choice fell on the first one. Unfortunately, I did not find detailed information about this on the Internet, so I decided to share my experience. If you are curious what came out of it, welcome under the cut.

Choice of speech recognition API


I considered only the API option: boxed solutions were ruled out because they require resources, the data to be recognized is not business-critical, and deploying them is much more complicated and takes far more man-hours.

The first was Yandex SpeechKit Cloud. I immediately liked its ease of use:

curl -X POST -H "Content-Type: audio/x-wav" --data-binary "@speech.wav" "https://asr.yandex.net/asr_xml?uuid=<uuid>&key=<API-key>&topic=queries"
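For reference, the same request can be issued from Python; a minimal sketch, assuming the requests library is installed, that the API key placeholder is hypothetical, and that the service answers with XML containing variant elements (check the SpeechKit Cloud documentation for the exact response format):

# A sketch of the same Yandex SpeechKit Cloud request from Python.
import uuid
import xml.etree.ElementTree as ET

import requests

YANDEX_API_KEY = 'your-api-key'  # hypothetical placeholder

with open('speech.wav', 'rb') as f:
    audio = f.read()

resp = requests.post(
    'https://asr.yandex.net/asr_xml',
    params={'uuid': uuid.uuid4().hex,
            'key': YANDEX_API_KEY,
            'topic': 'queries'},
    headers={'Content-Type': 'audio/x-wav'},
    data=audio)

# Print every recognition variant returned by the service.
for variant in ET.fromstring(resp.content).iter('variant'):
    print(variant.text)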

Pricing: 400 rubles per 1000 requests, with the first month free. But after that came only disappointments:

- When a long sentence was sent, the answer contained only 2-3 words
- Those words came back in a strange order
- Attempts to change the topic parameter did not improve the results

Perhaps this was due to the mediocre recording quality: all our tests went through voice gateways and ancient Panasonic phones. Still, I plan to use it in the future to build an IVR.

Next was the service from Google. The Internet is full of articles suggesting the Chromium API for developers, but keys for that API are no longer easy to get, so we will use the commercial platform.

Pricing: the first 60 minutes per month are free; beyond that it is $0.006 per 15 seconds of speech, and each request is rounded up to the nearest multiple of 15 seconds (a quick cost estimate is sketched right after the script below). The first two months are free, but a credit card is required to create a project. The basic documentation shows several ways of using the API; we will use the Python script:

Script from the documentation
 """Google Cloud Speech API sample application using the REST API for batch processing.""" import argparse import base64 import json from googleapiclient import discovery import httplib2 from oauth2client.client import GoogleCredentials DISCOVERY_URL = ('https://{api}.googleapis.com/$discovery/rest?' 'version={apiVersion}') def get_speech_service(): credentials = GoogleCredentials.get_application_default().create_scoped( ['https://www.googleapis.com/auth/cloud-platform']) http = httplib2.Http() credentials.authorize(http) return discovery.build( 'speech', 'v1beta1', http=http, discoveryServiceUrl=DISCOVERY_URL) def main(speech_file): """Transcribe the given audio file. Args: speech_file: the name of the audio file. """ with open(speech_file, 'rb') as speech: speech_content = base64.b64encode(speech.read()) service = get_speech_service() service_request = service.speech().syncrecognize( body={ 'config': { 'encoding': 'LINEAR16', # raw 16-bit signed LE samples 'sampleRate': 16000, # 16 khz 'languageCode': 'en-US', # a BCP-47 language tag }, 'audio': { 'content': speech_content.decode('UTF-8') } }) response = service_request.execute() print(json.dumps(response)) if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument( 'speech_file', help='Full path of audio file to be recognized') args = parser.parse_args() main(args.speech_file) 

Preparing to use Google Cloud Speech API


We will need to register a project and create a service account key for authorization. The trial registration requires a Google account. After registration, you must activate the Speech API and create a key for authorization, then copy the key file to the server.

Let us proceed to setting up the server itself. We will need:

- python
- python-pip
- google-api-python-client

sudo apt-get install -y python python-pip
pip install --upgrade google-api-python-client

Now we need to export two environment variables for the API to work: the first is the path to the service account key, the second is the name of your project.

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account_file.json
export GCLOUD_PROJECT=your-project-id
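To verify that the key and project are actually picked up before wiring anything into Asterisk, a quick check can be run; a minimal sketch, assuming the packages installed in the previous step:

# Sanity check: resolve the application default credentials from
# GOOGLE_APPLICATION_CREDENTIALS and force a token refresh, which fails
# loudly if the key file is wrong or unreadable.
import os

import httplib2
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default().create_scoped(
    ['https://www.googleapis.com/auth/cloud-platform'])
credentials.refresh(httplib2.Http())
print('project: ' + str(os.environ.get('GCLOUD_PROJECT')))
print('token acquired: ' + str(bool(credentials.access_token)))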

Download the test audio file and try to run the script:

wget https://cloud.google.com/speech/docs/samples/audio.raw
python voice.py audio.raw
{"results": [{"alternatives": [{"confidence": 0.98267895, "transcript": "how old is the Brooklyn Bridge"}]}]}

Fine! The first test is successful. Now let's change the recognition language in the script and try to recognize Russian speech:

nano voice.py

    service_request = service.speech().syncrecognize(
        body={
            'config': {
                'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
                'sampleRate': 16000,  # 16 khz
                'languageCode': 'ru-RU',  # a BCP-47 language tag

We need a .raw audio file; we use sox for the conversion:

apt-get install -y sox
sox test.wav -r 16000 -b 16 -c 1 test.raw
python voice.py test.raw
{"results": [{"alternatives": [{"confidence": 0.96161985, "transcript": "\u0417\u0434\u0440\u0430\u0432\u0441\u0442\u0432\u0443\u0439\u0442\u0435 \u0412\u0430\u0441 \u043f\u0440\u0438\u0432\u0435\u0442\u0441\u0442\u0432\u0443\u0435\u0442 \u043a\u043e\u043c\u043f\u0430\u043d\u0438\u044f"}]}]}

Google returns the answer with Unicode escape sequences, but we want to see readable Cyrillic letters. Let's change our voice.py a bit.

Instead of

 print(json.dumps(response)) 

We will use

s = simplejson.dumps({'var': response}, ensure_ascii=False)
print s
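As a side note, the standard json module can produce the same output without the extra dependency, with the same requirement on stdout encoding; this is only an alternative, and the final script below keeps simplejson:

# Equivalent output using only the standard library.
print(json.dumps({'var': response}, ensure_ascii=False))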

Add import simplejson to the imports. The final script is below:

Voice.py
 """Google Cloud Speech API sample application using the REST API for batch processing.""" import argparse import base64 import json import simplejson from googleapiclient import discovery import httplib2 from oauth2client.client import GoogleCredentials DISCOVERY_URL = ('https://{api}.googleapis.com/$discovery/rest?' 'version={apiVersion}') def get_speech_service(): credentials = GoogleCredentials.get_application_default().create_scoped( ['https://www.googleapis.com/auth/cloud-platform']) http = httplib2.Http() credentials.authorize(http) return discovery.build( 'speech', 'v1beta1', http=http, discoveryServiceUrl=DISCOVERY_URL) def main(speech_file): """Transcribe the given audio file. Args: speech_file: the name of the audio file. """ with open(speech_file, 'rb') as speech: speech_content = base64.b64encode(speech.read()) service = get_speech_service() service_request = service.speech().syncrecognize( body={ 'config': { 'encoding': 'LINEAR16', # raw 16-bit signed LE samples 'sampleRate': 16000, # 16 khz 'languageCode': 'en-US', # a BCP-47 language tag }, 'audio': { 'content': speech_content.decode('UTF-8') } }) response = service_request.execute() s = simplejson.dumps({'var': response}, ensure_ascii=False) print s if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument( 'speech_file', help='Full path of audio file to be recognized') args = parser.parse_args() main(args.speech_file) 

But before launching it you will need to export one more environment variable, PYTHONIOENCODING=UTF-8. Without it I had problems with stdout when the script was called from other scripts.

export PYTHONIOENCODING=UTF-8
python voice.py test.raw
{"var": {"results": [{"alternatives": [{"confidence": 0.96161985, "transcript": "Здравствуйте Вас приветствует компания"}]}]}}
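The answer is a nested JSON structure, so instead of cutting the transcript out with sed (as the send.sh below does) the field can also be picked out directly in Python; a minimal sketch of a helper that takes the dict returned by service_request.execute():

# Pull the recognized text straight out of the API response dict instead of
# post-processing the printed JSON.
def extract_transcript(response):
    results = response.get('results', [])
    if not results:
        return ''
    # Take the top alternative of the first result.
    return results[0]['alternatives'][0].get('transcript', '')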

Fine. Now we can call this script in the dialplan.

Asterisk dialplan example


To call the script, I will use a simple dialplan:

exten => 1234,1,Answer
exten => 1234,n,wait(1)
exten => 1234,n,Playback(howtomaketicket)
exten => 1234,n,Playback(beep)
exten => 1234,n,Set(FILE=${CALLERID(num)}--${EXTEN}--${STRFTIME(${EPOCH},,%d-%m-%Y--%H-%M-%S)}.wav)
exten => 1234,n,MixMonitor(${FILE},,/opt/test/send.sh support@test.net "${CDR(src)}" "${CALLERID(name)}" "${FILE}")
exten => 1234,n,wait(28)
exten => 1234,n,Playback(beep)
exten => 1234,n,Playback(Thankyou!)
exten => 1234,n,Hangup()

I record with MixMonitor and run the script after the recording finishes. You could use Record() instead, and it would probably work even better. Here is an example of send.sh; it assumes you have already configured mutt:

#!/bin/bash
# Script called by MixMonitor after the recording ends:
# converts the recording, recognizes it and mails the result.
# Path to the service account key
export GOOGLE_APPLICATION_CREDENTIALS=/opt/test/project.json
# Project name
export GCLOUD_PROJECT=project-id
# Encoding for python stdout
export PYTHONIOENCODING=UTF-8
# Arguments passed from the dialplan
EMAIL=$1
CALLERIDNUM=$2
CALLERIDNAME=$3
FILE=$4
# Convert the recording to raw, as required for recognition
sox /var/spool/asterisk/monitor/$FILE -r 16000 -b 16 -c 1 /var/spool/asterisk/monitor/$FILE.raw
# Recognize the speech and strip the JSON wrapping, leaving only the transcript
TEXT=`python /opt/test/voice.py /var/spool/asterisk/monitor/$FILE.raw | sed -e 's/.*transcript"://' -e 's/}]}]}}//'`
# Send the mail with the recognized text and the recording attached
# (message text, subject and realname below are placeholders)
echo "Voice request from: $CALLERIDNUM $CALLERIDNAME $TEXT" | mutt -s "Voice request" -e 'set from=test@test.net realname="Voice robot"' -a "/var/spool/asterisk/monitor/$FILE" -- $EMAIL

Conclusion


Thus the task was solved. I hope someone will benefit from my experience. I would be happy to see your comments (perhaps that alone makes reading Habr worthwhile!). In the future I plan to build an IVR with voice control elements on this basis.

Source: https://habr.com/ru/post/310622/

