Microsoft Speech and Streaming Audio

I wondered how well Microsoft Speech can recognize speech. As a source for recognition, I decided to use a stream of police radio traffic from youarelistening.to.

There are two namespaces, System.Speech and Microsoft.Speech. As I understand it, to use Microsoft.Speech you need to install the Microsoft Speech Platform Runtime and the Microsoft Speech Platform SDK, whereas System.Speech is already included in recent versions of the .NET Framework.

We will use System.Speech because it supports dictation, while Microsoft.Speech does not.
We also need the NAudio library for working with sound. It ships with the Mp3StreamingDemo example, which can play streaming audio, and that is exactly what we need. Create your project, copy the StreamMp3 method from MP3StreamingPanel together with everything it depends on, and add a reference to NAudio.

In our class, we create the StartStreaming method, which runs StreamMp3 on a separate thread:

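    public void StartStreaming()
    {
        playbackState = StreamingPlaybackState.Buffering;
        bufferedWaveProvider = null;
        // Run the borrowed StreamMp3 method on a thread-pool thread;
        // the stream URL is passed to it as the state argument
        ThreadPool.QueueUserWorkItem(StreamMp3, "http://relay.broadcastify.com:80/949398448");
    }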

The constructor of our class creates and configures a SpeechRecognitionEngine. We use dictation as the grammar:

    // False while a recognition pass is running; it is set back to true from the
    // RecognizeCompleted handler on another thread, hence volatile
    private volatile bool completed = true;
    readonly SpeechRecognitionEngine sre = new SpeechRecognitionEngine();

    public Recognition()
    {
        // Free-form dictation instead of a fixed list of phrases
        var grammarBuilder = new GrammarBuilder();
        grammarBuilder.Culture = new CultureInfo("en-GB");
        grammarBuilder.AppendDictation();
        var grammar = new Grammar(grammarBuilder);
        grammar.Enabled = true;
        sre.LoadGrammar(grammar);

        // Generous timeouts so the engine does not give up on long noise or silence
        sre.BabbleTimeout = TimeSpan.FromHours(1);
        sre.EndSilenceTimeout = TimeSpan.FromSeconds(10);
        sre.InitialSilenceTimeout = TimeSpan.FromSeconds(10);

        sre.SpeechRecognized += sre_SpeechRecognized;
        sre.RecognizeCompleted += sre_RecognizeCompleted;
    }

The data from the buffer is copied into a MemoryStream, which is passed to SetInputToAudioStream. The audio format parameters must be set correctly here. The SetInputToWaveStream method did not work for me, presumably because the buffer holds raw PCM samples without the WAV header that method expects.

    public void Recognize()
    {
        // BufferLength is the capacity; reading more than BufferedBytes
        // would pad the data with silence
        var size = bufferedWaveProvider.BufferedBytes;
        byte[] bytes = new byte[size];
        int read = bufferedWaveProvider.Read(bytes, 0, size);

        using (var ms = new MemoryStream(bytes, 0, read))
        {
            // Decoded audio is 16-bit PCM; the sample rate comes from the MP3 stream
            sre.SetInputToAudioStream(ms, new SpeechAudioFormatInfo(
                bufferedWaveProvider.WaveFormat.SampleRate,
                AudioBitsPerSample.Sixteen,
                AudioChannel.Mono));
            sre.RecognizeAsync(RecognizeMode.Multiple);

            // Wait for RecognizeCompleted so ms is not disposed mid-recognition
            while (!completed)
            {
                Thread.Sleep(333);
            }
        }
    }

    void sre_RecognizeCompleted(object sender, RecognizeCompletedEventArgs e)
    {
        Debug.WriteLine("Finished");
        completed = true;
    }

    private static void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Console.WriteLine(e.Result.Text);
    }
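The Recognize code declares the input as AudioChannel.Mono, which is only correct if the decoded stream really is mono. In the streaming demo the decoder's output format is derived from the MP3 frames, so it is worth checking bufferedWaveProvider.WaveFormat.Channels. If a stream turns out to be stereo, you can either declare AudioChannel.Stereo or downmix to mono before building the MemoryStream. Below is a minimal sketch of the downmix, assuming interleaved 16-bit little-endian PCM; Pcm16 and StereoToMono are hypothetical names, not part of NAudio or System.Speech:

```csharp
using System;

// Hypothetical helper: downmixes interleaved 16-bit little-endian stereo PCM to mono.
public static class Pcm16
{
    // count = number of valid bytes in 'stereo'; each stereo frame is 4 bytes (L, R)
    public static byte[] StereoToMono(byte[] stereo, int count)
    {
        int frames = count / 4;
        var mono = new byte[frames * 2];
        for (int i = 0; i < frames; i++)
        {
            // Decode little-endian samples explicitly (independent of machine endianness)
            short left = (short)(stereo[i * 4] | (stereo[i * 4 + 1] << 8));
            short right = (short)(stereo[i * 4 + 2] | (stereo[i * 4 + 3] << 8));

            // Average the two channels; the int sum cannot overflow
            short mixed = (short)((left + right) / 2);

            mono[i * 2] = (byte)(mixed & 0xFF);
            mono[i * 2 + 1] = (byte)((mixed >> 8) & 0xFF);
        }
        return mono;
    }
}
```

Usage would be along the lines of `var mono = Pcm16.StereoToMono(bytes, bytes.Length);`, then a MemoryStream over `mono` with AudioChannel.Mono as before. Averaging halves the level of a signal present in only one channel, which is usually acceptable for speech.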

I took the completed flag and the Thread.Sleep loop from the Speech API documentation. Without this loop, recognition does not happen, most likely because RecognizeAsync returns immediately and the using block then disposes the MemoryStream before the engine has read it.
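The polling loop can also be replaced with a wait handle that the RecognizeCompleted handler signals. A sketch under the same assumptions as above (16-bit mono PCM, the default recognizer); BufferRecognizer and RecognizeBuffer are hypothetical names, and DictationGrammar is used as a shorter alternative to GrammarBuilder.AppendDictation:

```csharp
using System;
using System.IO;
using System.Speech.AudioFormat;
using System.Speech.Recognition;
using System.Threading;

// Hypothetical class: waits for RecognizeCompleted with a ManualResetEventSlim
// instead of polling a bool flag with Thread.Sleep. Windows only.
public sealed class BufferRecognizer : IDisposable
{
    private readonly SpeechRecognitionEngine sre = new SpeechRecognitionEngine();
    private readonly ManualResetEventSlim done = new ManualResetEventSlim(false);

    public BufferRecognizer()
    {
        sre.LoadGrammar(new DictationGrammar());
        sre.SpeechRecognized += (s, e) => Console.WriteLine(e.Result.Text);
        // Raised on an engine thread once the input stream has been consumed
        sre.RecognizeCompleted += (s, e) => done.Set();
    }

    // Recognizes one chunk of 16-bit mono PCM and returns when the engine is done with it
    public void RecognizeBuffer(byte[] pcm, int sampleRate)
    {
        using (var ms = new MemoryStream(pcm))
        {
            done.Reset();
            sre.SetInputToAudioStream(ms, new SpeechAudioFormatInfo(
                sampleRate, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
            sre.RecognizeAsync(RecognizeMode.Multiple);
            done.Wait(); // keeps ms alive until recognition has finished
        }
    }

    public void Dispose()
    {
        sre.Dispose();
        done.Dispose();
    }
}
```

The blocking wait does the same job as the flag loop, but without waking up every 333 ms. If you need a specific recognizer such as en-GB, pass `new CultureInfo("en-GB")` to the SpeechRecognitionEngine constructor.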

All that remains is to modify the borrowed StreamMp3 method: as soon as the buffer is nearly full, we read data from it and start recognition:

    if (IsBufferNearlyFull)
    {
        Debug.WriteLine("Buffer getting full, taking a break");
        // Start a new recognition pass only if the previous one has finished
        if (completed)
        {
            completed = false;
            Recognize();
        }
        Thread.Sleep(200);
    }
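The "nearly full" check is simple arithmetic on the buffer. Assuming the demo's IsBufferNearlyFull property compares the free space left in the buffer with a quarter of a second of audio (check your copy of Mp3StreamingPanel), the check and the related byte-rate arithmetic can be sketched with plain integers; BufferMath and its methods are hypothetical helpers, not NAudio API:

```csharp
using System;

// Hypothetical helpers mirroring the buffer arithmetic with plain integers.
public static class BufferMath
{
    // PCM byte rate: samples per second * channels * bytes per sample
    public static int AverageBytesPerSecond(int sampleRate, int channels, int bitsPerSample) =>
        sampleRate * channels * bitsPerSample / 8;

    // Assumed rule: "nearly full" when less than 1/4 second of free space is left
    public static bool IsNearlyFull(int bufferLength, int bufferedBytes, int averageBytesPerSecond) =>
        bufferLength - bufferedBytes < averageBytesPerSecond / 4;

    // How many seconds of audio a given number of buffered bytes represents
    public static double BufferedSeconds(int bufferedBytes, int averageBytesPerSecond) =>
        (double)bufferedBytes / averageBytesPerSecond;
}
```

For example, with a mono 16-bit stream at, say, 22,050 Hz the byte rate is 44,100 bytes per second, so a 20-second buffer holds 882,000 bytes and counts as nearly full once fewer than 11,025 bytes are free.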

Now we can run it:

    private static void Main(string[] args)
    {
        var recognition = new Recognition();
        recognition.StartStreaming();
        // Keep the process alive while streaming runs on a background thread
        Console.ReadKey();
    }

Of course, the output is complete gibberish:

Recognition Results

Michio politically inclined to
it and
regarded
it
her have her had
had
her her
her
in any category goalkeeper were adequate he will
ye cant
have been to ensure tha
may take it nine
but lineup plenty alignment
there get them into productive
all the legal
definitely been likely
a moment
building unlucky in allied
january initiative commissioner
clive be clinging
the Italian Italian open
the relational for transplant
partner new-line
there that they are likely to be alive
Eddie then entitling and it didn't go
thirteen and children ultima
te
augusting inundated with an
it entailed million any luckily a
English allowed her
lineker nine treated for nine
there are
point overhauled understanding complain a
bout it because frankly and that
it is essential either a touchline play
he
scooped old lucky enough to get losing
htly down to that internal it changed
near the of light relief Latino fondly

Findings:

Police radio traffic is recognized very poorly.
It works quite quickly: recognition finishes before the buffer has time to fill up again.
System.Speech does not support Russian. Microsoft.Speech does, but it has no dictation support.
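To see which languages System.Speech can recognize on a particular machine, you can list the installed recognizers with SpeechRecognitionEngine.InstalledRecognizers(). A minimal sketch; RecognizerCheck and its methods are hypothetical names, and the culture comparison is kept in a separate pure method so it can be checked without a speech engine:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Speech.Recognition;

// Hypothetical helper for checking recognizer language support.
public static class RecognizerCheck
{
    // Windows only: culture names (e.g. "en-US") of the installed System.Speech recognizers
    public static List<string> InstalledCultureNames() =>
        SpeechRecognitionEngine.InstalledRecognizers()
            .Select(r => r.Culture.Name)
            .ToList();

    // Pure helper: case-insensitive lookup of a culture name such as "ru-RU"
    public static bool Supports(IEnumerable<string> cultureNames, string wanted) =>
        cultureNames.Any(n => string.Equals(n, wanted, StringComparison.OrdinalIgnoreCase));
}
```

Usage: `Console.WriteLine(RecognizerCheck.Supports(RecognizerCheck.InstalledCultureNames(), "ru-RU"));` prints False on a machine without a Russian System.Speech recognizer.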

Source: https://habr.com/ru/post/267933/
