Voice translator for Mac OS X

Not long ago, the "miracle box" that you speak into and that answers with a translation of your words into another language existed only in science fiction films. But progress marches on...

I had been waiting a long time for Google to open the API of its speech recognition service (the company uses it in its own products). A few months ago I searched the whole Internet for it, but to no avail. Then the other day I saw a post on Habr, "Use Google Voice Search in your .NET application", and was incredibly happy. That post refers to the original article, "Accessing Google speech API / Chrome 11". Everything in it was carefully studied and dug out of the Chrome source code.

The Google Speech Recognition API is still unofficial; it became accessible to the public thanks to the Chrome browser.
The options for its use are truly endless. Combine it with morphological analysis modules and you can build some truly wild things in the field of voice control.

To demonstrate this, and to build a skeleton engine for my own future needs, I made a "Voice Translator" for Mac OS X in a couple of days. It combines technologies from Google, Microsoft (speech synthesis) and the open-source ffmpeg project (conversion to FLAC). It is a true voice translator: no typing. Just say a phrase and listen to its translation. Yes, the recognition quality is not perfect, but on short, clear phrases it is quite acceptable.

Here is the video of the program:

As usual, I will split this post into two parts: one for ordinary users who want to play around with the program, and one for developers (I provide the source code of the base project).

FOR USERS

You can download the program here (Mac OS X 10.6+).

The program interface is very simple. Choose the language direction you need (in this demo project I implemented only two directions, although the services support many more languages). Click the "Record" button and say a phrase. Recording stops automatically after 5 seconds, or you can stop it yourself. That's it: listen to the translation :).

FOR DEVELOPERS

The source code is on GitHub.

The project uses a prebuilt ffmpeg binary to convert the recorded sound to FLAC. If you want to port the project to iOS, you can use the static library from the libFLAC project instead.

For the HUD interface, the project uses the prebuilt BGHUDAppKit framework.

The JSON framework is used for processing JSON.

Additionally, for simplicity, some classes from the Google Data API are used.

SOUND RECORDING

Sound is recorded using the standard QTKit library (QuickTime Kit).

Here is the initialization code for the audio data capture session:

    BOOL success = NO;
    mCaptureSession = [[QTCaptureSession alloc] init];

    // Find the default audio input device.
    QTCaptureDevice *audioDevice = [QTCaptureDevice defaultInputDeviceWithMediaType:QTMediaTypeSound];
    if (!audioDevice) {
        // Show the error in the UI (execution is not interrupted here).
        [mCaptureSession release], mCaptureSession = nil;
        [textLabel setStringValue:NSLocalizedString(@"AudioError", @"")];
        [button setHidden:YES];
        [popUp setHidden:YES];
        [textLabel setHidden:NO];
    }

    // Open the device.
    success = [audioDevice open:NULL];
    if (!success) {
        [mCaptureSession release], mCaptureSession = nil;
        [textLabel setStringValue:NSLocalizedString(@"AudioError", @"")];
        [button setHidden:YES];
        [popUp setHidden:YES];
        [textLabel setHidden:NO];
    }

    // Attach the device to the session as an input.
    mCaptureAudioDeviceInput = [[QTCaptureDeviceInput alloc] initWithDevice:audioDevice];
    success = [mCaptureSession addInput:mCaptureAudioDeviceInput error:NULL];
    if (!success) {
        [mCaptureSession release], mCaptureSession = nil;
        [mCaptureAudioDeviceInput release], mCaptureAudioDeviceInput = nil;
        [textLabel setStringValue:NSLocalizedString(@"AudioError", @"")];
        [button setHidden:YES];
        [popUp setHidden:YES];
        [textLabel setHidden:NO];
    }

    // Add a movie file output that will receive the recorded audio.
    mCaptureMovieFileOutput = [[QTCaptureMovieFileOutput alloc] init];
    success = [mCaptureSession addOutput:mCaptureMovieFileOutput error:NULL];
    if (!success) {
        [mCaptureSession release], mCaptureSession = nil;
        [mCaptureAudioDeviceInput release], mCaptureAudioDeviceInput = nil;
        [mCaptureMovieFileOutput release], mCaptureMovieFileOutput = nil;
        // error handler
    }

    // Compress the audio to AAC and start the session.
    [mCaptureMovieFileOutput setDelegate:self];
    [mCaptureMovieFileOutput setCompressionOptions:[QTCompressionOptions compressionOptionsWithIdentifier:@"QTCompressionOptionsHighQualityAACAudio"]
                                     forConnection:[[mCaptureMovieFileOutput connections] objectAtIndex:0]];
    [mCaptureSession startRunning];

Now, to start writing to the file, we call:

    [mCaptureMovieFileOutput recordToOutputFileURL:path];

To finish recording:

    [mCaptureMovieFileOutput recordToOutputFileURL:nil];

CONVERSION

Once we have the recorded sound file, we convert it to FLAC using ffmpeg:

    NSTask *aTask = [[NSTask alloc] init];

    // ffmpeg arguments: FLAC codec, one channel (mono), 16 kHz sample rate.
    NSMutableArray *args = [NSMutableArray array];
    [args addObject:@"-i"];
    [args addObject:@"record.m4a"];
    [args addObject:@"-acodec"];
    [args addObject:@"flac"];
    [args addObject:@"-ac"];
    [args addObject:@"1"];
    [args addObject:@"-ar"];
    [args addObject:@"16000"];
    [args addObject:@"record.flac"];

    // Run the ffmpeg binary bundled with the application and wait for it to finish.
    [aTask setCurrentDirectoryPath:recordPath];
    [aTask setLaunchPath:[[[NSBundle mainBundle] resourcePath] stringByAppendingPathComponent:@"ffmpeg"]];
    [aTask setArguments:args];
    [aTask launch];
    [aTask waitUntilExit];
    [aTask release];
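The call above does not inspect the exit status of ffmpeg, so a failed conversion would only show up later, when record.flac is missing. Below is a minimal sketch of one way to make this step checkable. The helper names FlacConversionArguments and RunConverter are hypothetical and are not part of the project; the ffmpeg arguments are the same ones used above.

```objective-c
#import <Foundation/Foundation.h>

// Lets the memory-management guard below also compile on non-Clang compilers.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// Hypothetical helper: the same ffmpeg arguments as above
// (FLAC codec, one channel, 16 kHz sample rate).
static NSArray *FlacConversionArguments(NSString *inputName, NSString *outputName) {
    return [NSArray arrayWithObjects:@"-i", inputName,
                                     @"-acodec", @"flac",
                                     @"-ac", @"1",
                                     @"-ar", @"16000",
                                     outputName, nil];
}

// Hypothetical helper: runs the tool in workingDirectory and returns YES
// only if it could be launched and exited with status 0.
static BOOL RunConverter(NSString *toolPath, NSString *workingDirectory, NSArray *arguments) {
    NSTask *task = [[NSTask alloc] init];
    BOOL ok = NO;
    @try {
        [task setCurrentDirectoryPath:workingDirectory];
        [task setLaunchPath:toolPath];
        [task setArguments:arguments];
        [task launch];          // raises an exception if the tool cannot be launched
        [task waitUntilExit];
        ok = ([task terminationStatus] == 0);
    }
    @catch (NSException *exception) {
        ok = NO;
    }
#if !__has_feature(objc_arc)
    [task release];             // needed under manual reference counting, as in the project
#endif
    return ok;
}
```

With the layout used above, the call could look like RunConverter(ffmpegPath, recordPath, FlacConversionArguments(@"record.m4a", @"record.flac")), where ffmpegPath is the path to the bundled binary; recognition would then start only when the result is YES.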

COMMUNICATION WITH NETWORK SERVICE

After conversion, we start the recognition process. It is implemented in a separate GoogleASR class. An object of this class asynchronously sends a request to https://www.google.com/speech-api/v1/recognize, processes the result, and passes either the recognized text or an error to its delegate. The server response is handled reliably: the algorithm is copied in full from the Chrome browser. The class has only one main method:

    - (void)speechRecognition:(NSString *)flacPath language:(NSString *)language
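To give an idea of what such a class does internally, here is a minimal sketch of building the request and reading the answer. It is not the project's GoogleASR code. The query parameters (xjerr, client, lang), the Content-Type value and the response fields (status, hypotheses, utterance) are assumptions based on how Chrome 11 was reported to call this unofficial endpoint. The helper names are hypothetical, and NSJSONSerialization (Mac OS X 10.7+) stands in for the JSON framework the project uses.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a POST request that carries the FLAC audio.
// The URL parameters and the header value are assumptions (see the text above).
static NSURLRequest *MakeRecognitionRequest(NSData *flacData, NSString *language) {
    NSString *urlString = [NSString stringWithFormat:
        @"https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=%@",
        language];
    NSMutableURLRequest *request =
        [NSMutableURLRequest requestWithURL:[NSURL URLWithString:urlString]];
    [request setHTTPMethod:@"POST"];
    // 16000 matches the sample rate passed to ffmpeg with -ar.
    [request setValue:@"audio/x-flac; rate=16000" forHTTPHeaderField:@"Content-Type"];
    [request setHTTPBody:flacData];
    return request;
}

// Hypothetical helper: returns the first recognized phrase, or nil if the
// response is missing, malformed, or reports a non-zero status.
// Assumed shape: {"status":0,"hypotheses":[{"utterance":"...","confidence":0.9}]}
static NSString *BestUtterance(NSData *responseData) {
    if (responseData == nil) {
        return nil;
    }
    id json = [NSJSONSerialization JSONObjectWithData:responseData options:0 error:NULL];
    if (![json isKindOfClass:[NSDictionary class]]) {
        return nil;
    }
    NSDictionary *root = (NSDictionary *)json;

    id status = [root objectForKey:@"status"];
    if (![status isKindOfClass:[NSNumber class]] || [(NSNumber *)status integerValue] != 0) {
        return nil;
    }

    id hypotheses = [root objectForKey:@"hypotheses"];
    if (![hypotheses isKindOfClass:[NSArray class]] || [(NSArray *)hypotheses count] == 0) {
        return nil;
    }

    id first = [(NSArray *)hypotheses objectAtIndex:0];
    if (![first isKindOfClass:[NSDictionary class]]) {
        return nil;
    }
    id utterance = [(NSDictionary *)first objectForKey:@"utterance"];
    return [utterance isKindOfClass:[NSString class]] ? (NSString *)utterance : nil;
}
```

The type checks on every level are deliberate: since the service is unofficial, the response format can change without notice, and returning nil lets the caller report an error to the delegate instead of crashing.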

Next, we pass the recognized text to a GoogleTranslate object. It translates the text and passes either the translation or an error to its delegate. The main method:

    - (void)translate:(NSString *)text from:(NSString *)inLanguage to:(NSString *)outLanguage

Then a MicrosoftTTS object takes over. It fetches the synthesized audio for the translated text and passes it to its delegate, or reports an error. The main method:

    - (void)textToSpeech:(NSString *)text language:(NSString *)language

Do not forget to get your own Bing AppID from Microsoft (it is free) and insert it into SpeechURL in the MicrosoftTTS class.
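For orientation, here is a minimal sketch of how a speech URL containing the AppID could be assembled. It assumes the Speak method of the since-retired Microsoft Translator V2 HTTP interface with its appId, text, language and format parameters; the actual SpeechURL value in the project may differ. SpeakURL and PercentEncode are hypothetical helper names, and the encoding call requires Mac OS X 10.9 or later.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: percent-encodes everything except the
// RFC 3986 unreserved characters, so any text is safe in a query value.
static NSString *PercentEncode(NSString *value) {
    NSCharacterSet *unreserved = [NSCharacterSet characterSetWithCharactersInString:
        @"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"];
    return [value stringByAddingPercentEncodingWithAllowedCharacters:unreserved];
}

// Hypothetical helper: builds the URL that returns spoken audio for `text`.
// The endpoint and parameter names are assumptions (see the text above).
static NSURL *SpeakURL(NSString *appID, NSString *text, NSString *language) {
    NSString *urlString = [NSString stringWithFormat:
        @"http://api.microsofttranslator.com/V2/Http.svc/Speak"
        @"?appId=%@&text=%@&language=%@&format=%@",
        PercentEncode(appID),
        PercentEncode(text),
        PercentEncode(language),
        PercentEncode(@"audio/wav")];
    return [NSURL URLWithString:urlString];
}
```

Encoding the text matters here because translated phrases routinely contain spaces, punctuation and non-ASCII characters, which would otherwise produce an invalid URL.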

Experiment!

Source: https://habr.com/ru/post/117570/
