
Use Google Voice Search in your .NET application.



Speech recognition has been available in the Google Chrome browser for some time. You can see what it looks like, for example, here.

Since Chromium is open source, it is natural to want to peek inside and see whether the technology can be used for one's own selfish purposes, such as achieving peace on earth.

As often happens, everything has already been done for us in this article. It all turns out to be very simple: you make a POST request to www.google.com/speech-api/v1/recognize with audio data in FLAC or Speex format. Below we implement a demo that recognizes WAVE files in C#.


Like the author of the original article, we will not deal with Speex. To convert audio from WAVE to FLAC I used the CUETools library. For some reason its code threw an exception when saving FLAC with any number of channels other than two. Simply commenting out that check lets it save mono files, which Google understands perfectly well.

    /// <summary>Converts a WAV file to FLAC.</summary>
    /// <returns>Sample rate of the source audio, in Hz.</returns>
    public static int Wav2Flac(String wavName, string flacName)
    {
        IAudioSource audioSource = new WAVReader(wavName, null);
        AudioBuffer buff = new AudioBuffer(audioSource, 0x10000);
        FlakeWriter audioDest = new FlakeWriter(flacName, audioSource.PCM);
        int sampleRate = audioSource.PCM.SampleRate;

        // Copy the audio block by block until the reader is exhausted.
        while (audioSource.Read(buff, -1) != 0)
        {
            audioDest.Write(buff);
        }

        audioDest.Close();
        return sampleRate;
    }


If anyone wants to, I think it would be easy to do this without saving a temporary FLAC file, but we will not complicate the example. I will only note that Google responded with a 400 error to files with a high sampling rate (44100 Hz). I did not determine the maximum supported rate; 8 and 16 kHz work without problems.

The main request method for Google Voice:
    public static String GoogleSpeechRequest(String flacName, int sampleRate)
    {
        WebRequest request = WebRequest.Create(
            "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=ru-RU");
        request.Method = "POST";

        byte[] byteArray = File.ReadAllBytes(flacName);

        // Set the ContentType property of the WebRequest, e.g. "audio/x-flac; rate=16000".
        request.ContentType = "audio/x-flac; rate=" + sampleRate;
        request.ContentLength = byteArray.Length;

        // Get the request stream and write the data to it.
        using (Stream dataStream = request.GetRequestStream())
        {
            dataStream.Write(byteArray, 0, byteArray.Length);
        }

        // Get the response and read its content; the using blocks clean up the streams.
        using (WebResponse response = request.GetResponse())
        using (Stream responseStream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(responseStream))
        {
            return reader.ReadToEnd();
        }
    }
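Given the 400 errors noted above for 44100 Hz input, one option is to check the sample rate before sending anything. The following is only a sketch: the class name SpeechRequestHelper and the method BuildFlacContentType are hypothetical, and the list of accepted rates simply mirrors the two rates reported as working in this article (8 and 16 kHz), not any documented limit of the service.

```csharp
using System;

public static class SpeechRequestHelper
{
    // Assumption: only the rates this article reports as working are allowed.
    // The real upper limit of the service was not determined.
    private static readonly int[] VerifiedRates = { 8000, 16000 };

    // Builds the Content-Type value used by the request, e.g. "audio/x-flac; rate=16000".
    // Throws for any rate that is not in the verified list.
    public static string BuildFlacContentType(int sampleRate)
    {
        if (Array.IndexOf(VerifiedRates, sampleRate) < 0)
        {
            throw new ArgumentOutOfRangeException(
                "sampleRate",
                sampleRate,
                "Only 8000 and 16000 Hz were verified to work; resample the audio first.");
        }
        return "audio/x-flac; rate=" + sampleRate;
    }
}
```

With such a helper, the request code could assign request.ContentType = SpeechRequestHelper.BuildFlacContentType(sampleRate) and fail early on the client rather than wait for the server's 400 response.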


We will deserialize the JSON response with DataContractJsonSerializer. Here, I honestly confess, I am no expert; besides, the results from Google always came in this form:

{"status":0,"id":"4531050901df65542082eacfebf3bb1b-1","hypotheses":[{"utterance":" ","confidence":0.89697623}]}

So the following simple deserialization was enough. I would be glad to hear your comments.

    [DataContract]
    public class RecognizedItem
    {
        [DataMember] public string utterance;
        [DataMember] public float confidence;
    }

    [DataContract]
    public class RecognitionResult
    {
        [DataMember] public string status;
        [DataMember] public string id;
        [DataMember] public RecognizedItem[] hypotheses;
    }

    public static RecognitionResult Parse(String toParse)
    {
        DataContractJsonSerializer ser = new DataContractJsonSerializer(typeof(RecognitionResult));
        MemoryStream stream1 = new MemoryStream(Encoding.UTF8.GetBytes(toParse));
        RecognitionResult result = (RecognitionResult)ser.ReadObject(stream1);
        return result;
    }
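Since the response carries an array of hypotheses, each with a confidence, a caller will usually want just the most confident one. Below is a minimal self-contained sketch of that step. The names Hypothesis, SpeechResponse and ResponseReader are hypothetical variants of the classes above, status is declared as an int here, and treating status 0 as success is an assumption based only on the sample response shown in this article.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

[DataContract]
public class Hypothesis
{
    [DataMember] public string utterance;
    [DataMember] public float confidence;
}

[DataContract]
public class SpeechResponse
{
    [DataMember] public int status;
    [DataMember] public string id;
    [DataMember] public Hypothesis[] hypotheses;
}

public static class ResponseReader
{
    // Returns the hypothesis with the highest confidence,
    // or null when the response carries no usable result.
    public static Hypothesis BestHypothesis(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(SpeechResponse));
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            var response = (SpeechResponse)serializer.ReadObject(stream);

            // Assumption: status 0 means success, as in the sample response.
            if (response == null || response.status != 0 || response.hypotheses == null)
            {
                return null;
            }

            return response.hypotheses
                .OrderByDescending(h => h.confidence)
                .FirstOrDefault();
        }
    }
}
```

A caller could pass the string returned by GoogleSpeechRequest straight into ResponseReader.BestHypothesis and check the result for null before reading utterance.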




In the screenshot, “Buy a bike” was recognized perfectly, while “one two times” came back as “1 2 rus”. The source code archive can be downloaded from here.

Let's enjoy the technology while it has not been shut down and is available without restrictions!

Source: https://habr.com/ru/post/117234/

