I have already written about using pocketsphinx for speech recognition in ROS. In this article I want to cover gspeech, a ROS package that uses the Google Speech API:
wiki.ros.org/gspeech .
Getting the Google API Key
So, let's begin. First you need a Google API key. To get one, you first need a Google account. Second, you need to subscribe to the chromium-dev@chromium.org mailing list.
Now you can get your Google API key. To do this, go to the Google developer console:
cloud.google.com/console . Here you need to create a project. After creating the project, activate the Speech API in the APIs section, under the APIs & auth item in the left menu. Be careful: the Speech API may be missing from the list, as happened to me. If you do not see it, check that you have subscribed to chromium-dev and that you are logged in with the same Google account whose email address you used for the subscription.
You can get the Google API key in the Credentials section under the same APIs & auth. Here you need to create a key by clicking on the Create new Key button in the Public API access section.
Gspeech installation
Now only the easy part remains: installing the gspeech package. To do this, clone gspeech from its GitHub page:
github.com/kusha/gspeech . gspeech requires sox to be installed:
sudo apt-get install sox
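If you want to confirm that sox is actually available before starting the node, a small check like the following may help. This is only a sketch: the helper name find_sox is my own and is not part of gspeech, and it simply looks for the sox executable on the PATH.

```python
import shutil


def find_sox(executable="sox"):
    """Return the full path of the sox binary, or None if it is not on PATH.

    find_sox is a hypothetical helper for illustration, not part of gspeech.
    """
    return shutil.which(executable)


path = find_sox()
if path is None:
    print("sox not found; install it with: sudo apt-get install sox")
else:
    print("sox found at", path)
```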
You also need to insert your Google API key into the gspeech.py script, in the line:
api_key = ""
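If you would rather not keep the key in the source file, one possible variation is to read it from an environment variable. The sketch below is an assumption on my part and not how gspeech.py is written: the variable name GSPEECH_API_KEY and the function load_api_key are hypothetical.

```python
import os


def load_api_key(env_var="GSPEECH_API_KEY"):
    """Read the Google API key from an environment variable.

    Both the variable name and this helper are hypothetical; gspeech.py
    itself expects the key to be written directly into the script.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError("Set %s to your Google API key" % env_var)
    return key


# In gspeech.py the hard-coded line could then become:
# api_key = load_api_key()
```

With this variation you would run export GSPEECH_API_KEY=... in the shell before starting the node.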
Running gspeech
Everything is ready and you can start the ROS gspeech node:
rosrun gspeech gspeech.py
Gspeech recognition
During recognition, gspeech publishes the recognized phrases to the /speech topic as String messages and the recognition “confidence” to the /confidence topic as Int8 messages.
Recognizing a phrase may take some time, because gspeech sends requests to Google's servers. Nevertheless, gspeech is fairly accurate and recognizes phrases much better than the pocketsphinx package. In my tests, gspeech usually recognized phrases with a “confidence” of 70-80, and in some cases as high as 94.
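To use the results in your own node, you can subscribe to both topics and ignore phrases with low confidence. The following is a minimal sketch, not part of gspeech. The class PhraseFilter, the node name gspeech_listener and the threshold of 70 are my own choices. It also assumes that each phrase on /speech has exactly one matching value on /confidence, arriving in either order.

```python
import threading


class PhraseFilter:
    """Pair each phrase with its confidence and drop low-confidence phrases.

    Hypothetical helper: assumes one /confidence value per /speech phrase.
    """

    def __init__(self, min_confidence=70):
        self.min_confidence = min_confidence
        self._phrase = None
        self._confidence = None
        self._lock = threading.Lock()  # rospy callbacks may run in different threads

    def on_speech(self, text):
        with self._lock:
            self._phrase = text
            return self._try_emit()

    def on_confidence(self, value):
        with self._lock:
            self._confidence = value
            return self._try_emit()

    def _try_emit(self):
        # Wait until both halves of the pair have arrived.
        if self._phrase is None or self._confidence is None:
            return None
        phrase, confidence = self._phrase, self._confidence
        self._phrase = self._confidence = None
        return phrase if confidence >= self.min_confidence else None


def run_node(min_confidence=70):
    """Start a listener node; requires a ROS environment with gspeech running."""
    import rospy  # imported here so the filter can be tried without ROS
    from std_msgs.msg import Int8, String

    phrase_filter = PhraseFilter(min_confidence)

    def report(phrase):
        if phrase is not None:
            rospy.loginfo("Accepted phrase: %s", phrase)

    rospy.init_node("gspeech_listener")
    rospy.Subscriber("/speech", String,
                     lambda msg: report(phrase_filter.on_speech(msg.data)))
    rospy.Subscriber("/confidence", Int8,
                     lambda msg: report(phrase_filter.on_confidence(msg.data)))
    rospy.spin()
```

Calling run_node() in a ROS environment, with roscore and the gspeech node already running, logs only the phrases whose confidence reaches the threshold. The filtering logic is kept apart from rospy so it can be tried without ROS.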
I wish you good luck in speech recognition with the Google Speech API.