Until recently, only science fiction films showed the "miracle box" that you speak into and that then plays back what you said in another language. But progress marches on ...
I have been waiting a long time for Google to open the API of its speech recognition service (the company uses it in its own products). A few months ago I scoured the Internet for it, but to no avail. Then the other day I saw a topic on Habr: "Use Google Voice Search in your .NET application!" I was incredibly happy. The topic refers to the original article, "Accessing Google speech API / Chrome 11", whose author carefully studied and dug through the Chrome source code.
The Google Speech Recognition API is still unofficial; it became accessible to the public thanks to the Chrome browser.
The options for its use are truly endless. And if you combine it with morphological analysis modules, you can build some truly wild things in the field of voice control.
To demonstrate this and to build a skeleton engine (for my own future needs), I wrote a "Voice Translator" for Mac OS X in a couple of days. It combines technologies from Google, Microsoft (speech synthesis) and the open-source ffmpeg project (conversion to FLAC). It is a voice translator in the full sense: no typing. Just say a phrase and listen to its translation. Yes, the recognition quality is not perfect, but on short, clear phrases it is quite acceptable.
Here is the video of the program:
As usual, I will divide this topic into two parts: one for ordinary users who want to play around with the program, and another for developers (I provide the source code of the base project).
FOR USERS
You can download the program here (Mac OS X 10.6+).
The program interface is very simple. Choose the language direction you need (this demo project offers only two directions, although the services support many more languages). Click the "Record" button and say a phrase. Recording stops automatically after 5 seconds, or you can stop it yourself. That's it: listen to the translation :).
FOR DEVELOPERS
The project uses a prebuilt ffmpeg binary to convert the recorded audio to FLAC. If you want to port the project to iOS, you can use a static library built from the libFLAC project instead.
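As a rough sketch of the conversion step (in Python rather than the project's Objective-C, purely for brevity), the helper below builds an ffmpeg command line that turns a recording into FLAC. The function names, the file paths and the 16 kHz mono settings are assumptions made for illustration; the project may call ffmpeg with different options.

```python
import subprocess


def build_flac_command(src_path, dst_path, sample_rate=16000, ffmpeg="ffmpeg"):
    """Return the ffmpeg argument list that converts src_path to a mono FLAC file."""
    return [
        ffmpeg,
        "-y",                     # overwrite the output file if it already exists
        "-i", src_path,           # input recording
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumed value
        "-ac", "1",               # downmix to a single channel
        dst_path,                 # ffmpeg infers FLAC from the .flac extension
    ]


def convert_to_flac(src_path, dst_path, **kwargs):
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    subprocess.run(build_flac_command(src_path, dst_path, **kwargs), check=True)
```

Keeping the command construction separate from the call that runs it makes the options easy to inspect without having ffmpeg installed.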
For the HUD interface, the project uses the prebuilt BGHUDAppKit framework.
After conversion, the recognition process starts. It is implemented in a separate GoogleASR class. An object of this class sends an asynchronous request to https://www.google.com/speech-api/v1/recognize, processes the result, and passes the recognized text to its delegate or reports an error. The server response is handled carefully: the algorithm is copied in full from the Chrome browser. The class has a single main method:
- (void)speechRecognition:(NSString *)flacPath language:(NSString *)language;
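To make the request/response cycle more concrete, here is a minimal Python sketch of what such a class has to do. Only the URL comes from the text above. The query parameters (xjerr, client, lang), the "audio/x-flac; rate=..." content type and the JSON layout with a "hypotheses" list are assumptions based on how the unofficial API was described at the time, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

RECOGNIZE_URL = "https://www.google.com/speech-api/v1/recognize"


def build_recognition_request(flac_bytes, language, sample_rate=16000):
    """Build (but do not send) a POST request carrying the FLAC audio."""
    # Assumed query parameters; they are not documented officially.
    query = urllib.parse.urlencode(
        {"xjerr": "1", "client": "chromium", "lang": language}
    )
    return urllib.request.Request(
        RECOGNIZE_URL + "?" + query,
        data=flac_bytes,
        # Assumed content type: FLAC audio plus its sample rate.
        headers={"Content-Type": "audio/x-flac; rate=%d" % sample_rate},
        method="POST",
    )


def parse_recognition_response(body):
    """Return the first recognized utterance, or None if there is none."""
    try:
        payload = json.loads(body)
    except ValueError:
        return None  # not JSON at all
    if not isinstance(payload, dict):
        return None
    hypotheses = payload.get("hypotheses") or []
    if not hypotheses or not isinstance(hypotheses[0], dict):
        return None
    return hypotheses[0].get("utterance")
```

Sending the request would be a matter of passing it to urllib.request.urlopen and feeding the response body to parse_recognition_response; since the service is unofficial, any of the assumed details may differ or stop working.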
Next, the recognized text is passed to an object of the GoogleTranslate class. It translates the text and passes the translation to its delegate or reports an error. The main method:
- (void)translate:(NSString *)text from:(NSString *)inLanguage to:(NSString *)outLanguage;
Next comes an object of the MicrosoftTTS class. It requests speech for the translated text, receives the audio data, and passes it to its delegate or reports an error. The main method:
- (void)textToSpeech:(NSString *)text language:(NSString *)language;
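The three classes form a simple chain: recognize, translate, synthesize, with the delegate hearing either the result or an error. The Python sketch below shows that flow under simplifying assumptions: it runs synchronously (the project works asynchronously through delegates), the three stages are passed in as plain functions, and every name in it is hypothetical.

```python
def run_pipeline(flac_path, in_language, out_language,
                 recognize, translate, synthesize, delegate):
    """Chain the three stages and report the outcome to the delegate."""
    stage = "recognition"
    try:
        text = recognize(flac_path, in_language)
        stage = "translation"
        translated = translate(text, in_language, out_language)
        stage = "speech synthesis"
        audio = synthesize(translated, out_language)
    except Exception as error:
        # Tell the delegate which stage failed and stop the chain.
        delegate.did_fail(stage, error)
        return
    delegate.did_finish(text, translated, audio)


class PrintingDelegate:
    """Example delegate that just prints what it is told."""

    def did_finish(self, text, translated, audio):
        print("heard:", text)
        print("translated:", translated)
        print("audio bytes:", len(audio))

    def did_fail(self, stage, error):
        print("failed during", stage + ":", error)
```

Passing the stages in as arguments keeps the chain independent of any particular service, so a stage can be swapped or stubbed without touching the rest.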
Do not forget to get your own Bing AppID from Microsoft (it is free) and insert it into SpeechURL in the MicrosoftTTS class.
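For illustration only, here is how a speech URL carrying the AppID might be assembled in Python. The base address below is a placeholder, not the real service endpoint, and the parameter names (appId, text, language) are assumptions; take the actual format from SpeechURL in the MicrosoftTTS class.

```python
import urllib.parse

# Placeholder endpoint (hypothetical); substitute the real SpeechURL value.
SPEECH_URL_BASE = "https://example.invalid/speak"


def build_speech_url(app_id, text, language, base=SPEECH_URL_BASE):
    """Return a URL that carries the AppID, the text to speak and its language."""
    if not app_id:
        raise ValueError("a Bing AppID is required")
    # urlencode percent-encodes the text, so spaces and non-ASCII are safe.
    query = urllib.parse.urlencode(
        {"appId": app_id, "text": text, "language": language}
    )
    return base + "?" + query
```

Rejecting an empty AppID early gives a clearer failure than an error returned by the server.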