Yandex hears you, dude

Out of the blue came an order: write an iOS application that uses Yandex SpeechKit to recognize Russian speech. More precisely, to recognize short phrases on arbitrary topics. The goal was to compare the accuracy of the Yandex engine with that of our own Sarov engine.

An order is an order, so I took the following steps.

  1. Went to the speech recognition section on yandex.ru
  2. Registered and received a key, referred to below as API_KEY
  3. Sent an email to Yandex asking them to activate the key

Asked how the key would be used, I replied that I was releasing a voice-controlled card game, Diablo 3-13.

Two days later the key was activated. At first I pawed the ground impatiently, then I realized that the people working at Yandex are thoughtful and synchronous.
Later, in my own application, I gave up asynchronous requests to the Yandex API as well.

Having received the magic key, API_KEY, I downloaded the archive from the link provided:
YandexSpeechKit-2.1-ios.zip


The archive contains two projects that demonstrate how the library works.
I built both examples with SAMPLE_API_KEY in the source replaced by my own key and launched the applications.
Neither of them works under Xcode 5.1.1: both crash with an internal error buried deep in the bowels of the library.

So I had to download the current SDK from GitHub.

I downloaded the archive from that link:
yandex-speechkit-ios-master.zip

and built the examples, but the error did not go away.

I immediately sent a diagnostic report to the support service and, while waiting for an answer, wrote another toy app.
A couple of days passed with no response from support.

Having published the toy in the store, I decided to write my own iOS application that talks to the Yandex speech recognition service through plain URL requests.

After all, the same magic key can be used.

Step One - Command Line Verification


From the command line, we need to submit WAV files with pre-recorded phrases.

The request is as simple as bamboo:
curl -v -4 "asr.yandex.net/asr_xml?key=e547b4f5----97130fdbcd74&uuid=01ae13cb744628b58fb536d496daa177&topic=notes&lang=ru-RU" -H "Content-Type: audio/x-wav" --data-binary "@recordedFile.wav" 


The request needs no commentary; everything is done exactly according to the excellent documentation on the Yandex website.
The first attempt failed because, instead of a 32-digit uuid, I slipped in the UDID of my iPhone, and that one contains more than just hex characters.
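
The uuid pitfall is easy to guard against before anything is sent. Here is a minimal sketch, assuming the service wants exactly 32 hexadecimal characters, as the failed first attempt suggests. The helper names is_valid_uuid and make_uuid are hypothetical and not part of any Yandex tooling:

```shell
# Hypothetical helper: succeeds only if $1 is exactly 32 hexadecimal characters.
is_valid_uuid() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{32}$'
}

# Hypothetical helper: prints 16 random bytes as 32 lowercase hex digits.
make_uuid() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}
```

Something like uuid=$(make_uuid) then yields a value that can be dropped into the uuid parameter of the curl request.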

Phrases such as
thirty eight parrots
let's go have a smoke
Vladimir Sysoev
what are you looking at
who doesn't plow that net
Anton Subbotin
Habrahabr - full fly

were recognized perfectly, as spoken by various speakers.

To my satisfaction, Yandex mercilessly cuts out obscene words.

Step Two - Building an iOS Application That Records Speech


The Apple website has a standard sample project that demonstrates sound recording and playback.
Download the SpeakHere project and run it: everything is in order. I respect these guys from Cupertino. The code is, well, hmm, but it works.

Modify the SpeakHereController.mm file.

Go to the method - (void)stopRecord and append one line:

  - (void)stopRecord
  {
      // ... existing code ...
      btn_play.enabled = YES;
      // added: hand the recorded file over to Yandex
      [self yandexTool];
  }

Clearly, we have added a call to the method that processes the audio file produced by the recording.
Initially, the project records the sound to the file recordedFile.caff.

  recordFilePath = (CFStringRef)[NSTemporaryDirectory() stringByAppendingPathComponent: @"recordedFile.caff"]; 

Yandex cannot work with files of this type, so the file extension has to be changed in SpeakHereController.mm:

  recordFilePath = (CFStringRef)[NSTemporaryDirectory() stringByAppendingPathComponent: @"recordedFile.wav"]; 

In addition, in the project file AQRecorder.mm, in the body of the function void AQRecorder::StartRecord(CFStringRef inRecordFile), change the file type parameter in the line

  OSStatus status = AudioFileCreateWithURL(url, kAudioFileCAFType, &mRecordFormat, kAudioFileFlags_EraseFile, &mRecordFile); 

to

  OSStatus status = AudioFileCreateWithURL(url, kAudioFileWAVEType, &mRecordFormat, kAudioFileFlags_EraseFile, &mRecordFile); 

And one last thing: Yandex understands sound files recorded at a sample rate of 16000 Hz, while Apple's default is 44100 Hz, so it has to be changed.

In the project file AQRecorder.mm, in the body of the function void AQRecorder::SetupAudioFormat(UInt32 inFormatID), add the lines

  Float64 newRate = 16000;
  XThrowIfError(AudioSessionSetProperty(kAudioSessionProperty_PreferredHardwareSampleRate,
                                        sizeof(newRate), &newRate),
                "couldn't set hardware sample rate");
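
Before uploading, it is cheap to confirm from the command line that a recorded file really ended up at 16000 Hz. This is a rough sketch, assuming a WAV file whose fmt chunk comes right after the RIFF/WAVE header, so that the sample rate sits in bytes 24-27 as a little-endian integer. The helper name wav_sample_rate is hypothetical:

```shell
# Hypothetical helper: prints the sample rate stored in a WAV header.
# Assumes the "fmt " chunk directly follows the RIFF/WAVE header, which puts
# the 32-bit little-endian sample rate at byte offset 24.
wav_sample_rate() {
  # Read the four bytes as unsigned decimals, then combine them by hand
  # so the result does not depend on the byte order of the host.
  set -- $(od -An -j24 -N4 -tu1 "$1")
  echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}
```

Running wav_sample_rate on a copy of recordedFile.wav pulled from the device should then print the rate the recorder actually used.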

All that is left is to add the function that sends the request to the Yandex server. Just as in the command-line request, the request submits the file recordedFile.wav.
Below is the text of the yandexTool function, as simple as the track of a Belarus tractor.

  - (void)yandexTool
  {
      // Build the request URL from the key and the 32-character uuid
      NSString *urltext_temp = [NSString stringWithFormat:
          @"https://asr.yandex.net/asr_xml?key=%@&uuid=%@&topic=queries&lang=ru-RU",
          API_KEY, API_UUID];
      NSString *urltext = [urltext_temp stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
      NSLog(@"url=%@", urltext);

      NSURL *url = [NSURL URLWithString:urltext];
      NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
      [request setHTTPMethod:@"PUT"];
      [request setValue:@"audio/x-wav" forHTTPHeaderField:@"Content-Type"];

      // The request body is the recorded WAV file
      NSString *filePath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"recordedFile.wav"];
      NSData *myData = [NSData dataWithContentsOfFile:filePath];
      request.HTTPBody = myData;

      // Synchronous request: the call blocks until the server answers
      NSError *error = nil;
      NSURLResponse *response = nil;
      NSData *data2 = [NSURLConnection sendSynchronousRequest:request
                                            returningResponse:&response
                                                        error:&error];
      NSString *responseString = [[NSString alloc] initWithData:data2 encoding:NSUTF8StringEncoding];
      NSLog(@"responseString=%@", responseString);

      // The response is XML; it can be parsed if needed
      // NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data2];
      // [parser setDelegate:self];
      // [parser parse];
  }
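
The XML answer can also be inspected from the command line, in the spirit of Step One. The sketch below rests on an assumption not shown in this post: that the response wraps each recognition hypothesis in a variant element, one per line, for example `<variant confidence="0.69">some text</variant>`. The helper name asr_variants is hypothetical:

```shell
# Hypothetical helper: reads an asr_xml response on stdin and prints the text
# of each <variant> element, one per line.
# Assumption: every variant sits on its own line, as in
#   <variant confidence="0.69">some text</variant>
asr_variants() {
  sed -n 's:.*<variant[^>]*>\(.*\)</variant>.*:\1:p'
}
```

Piping the output of the curl request into asr_variants would then leave only the recognized text, provided the response really has that shape.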

I modified my application a little more: I added my own interface and the output of the recognized text.

The recognition, I must say, is awesome.

As I promised Yandex, I built the recognition into the card game King of Hearts, but the 1-2 second delay in voice control starts to get annoying about 5 minutes into the game.

However, not a single playing card name failed to be recognized during the game.
Bravo, Yandex!

While I was preparing this post, a reply came from Yandex tech support: they ask me to send the full logs of the non-working applications.

I should probably answer them.

Source: https://habr.com/ru/post/236079/