Cognitive services provide access to various cloud services that allow you to work with visual, voice and textual information. In addition, various Bing search functions are available.
To try Cognitive Services, you do not even need a Microsoft account: a trial key can also be obtained with a GitHub or LinkedIn account. The trial subscription is not limited in time, but it is limited in the number of transactions allowed per period. An online demo is available at:
Speaker Recognition API
The rest of this article describes how to try out voice-based user authentication. Although the service is still in preview, it is already quite interesting.
The service can be used from various platforms, but here I will build a C#/XAML UWP application.
You can sign in and get a trial key at:
Microsoft Cognitive Services - Get started for free

Click + and select Speaker Recognition - Preview (10,000 transactions per month, 20 per minute).
Alternatively, you can get the key from your Azure account (how could we do without it?). Find Cognitive Services APIs and create an account of type Speaker Recognition API.

This method suits those who do not intend to limit themselves to the trial features.
Find the key here:

Before proceeding, let's define the terminology used in this task:
Verification - confirming that a phrase was spoken by a specific person, that is, confirming the speaker's identity.
Identification - determining which of several known users spoke a phrase.
Enrollment - the process by which the service learns to recognize a user's voice. Once the service has received a certain number of sample phrases, the user profile becomes enrolled and can be used for recognition.
Configuration and the voice recognition process
A user profile is created.
Enrollment is performed: the same phrase is sent to the service several times.
Currently, only the following languages are supported: en-US (US English) and zh-CN (Mandarin Chinese).
For English, the phrase can be chosen from the following list:
"I can't refuse him anymore."
"Houston we had a problem"
"My voice is my passport verify me"
"Apple juice tastes funny after toothpaste"
"You can get in without your password"
"You can activate security system now"
"My voice is stronger than passwords"
"My password is not your business"
"My name is unknown to you"
"Be yourself everyone else is already taken"
The first phrase spoken is bound to the profile. The phrase can be changed only by resetting the enrollments with ResetEnrollmentsAsync.
Let's imagine what a set of phrases might look like in Russian. I'll start, and you can suggest more in the comments:
"Is this the apartment of Anton Semenovich Shpak?"
"Easy, Masha, I am Dubrovsky!"
"I am a smart, handsome, moderately plump man in the prime of his life."
Creating a UWP Application
Create a UWP application and add the following NuGet package:
Microsoft.ProjectOxford.SpeakerRecognition
Add the Microphone capability in the Capabilities section of the manifest. Internet (Client) should already be enabled by default. Configuration is now complete, and we can move on to the code. The namespaces required for working with the service:
using Microsoft.ProjectOxford.SpeakerRecognition;
using Microsoft.ProjectOxford.SpeakerRecognition.Contract;
using Microsoft.ProjectOxford.SpeakerRecognition.Contract.Verification;
Namespaces required for working with audio:
using System.IO; // AsStream() and SeekOrigin, used when the recording is stopped
using Windows.Media.Capture;
using Windows.Media.MediaProperties;
using Windows.Storage.Streams;
To work with the service, you need two fields: a string holding the subscription key and a verification client, which is the object that talks to the service.
private SpeakerVerificationServiceClient _serviceClient;
private string _subscriptionKey;
After the page is initialized, initialize these fields:
_subscriptionKey = "ec186af1f65d428137f9568ec8d896b5";
_serviceClient = new SpeakerVerificationServiceClient(_subscriptionKey);
Replace the value of _subscriptionKey with your own subscription key. The next logical step is to create a user profile:
CreateProfileResponse response = await _serviceClient.CreateProfileAsync("en-us");
From the response of the service we can get the profile ID:
String _profileId=response.ProfileId;
The next step is to "train" the service to recognize the voice. First, a word on how to create the audio stream. The easiest way is to read a file from disk:
Windows.Storage.Pickers.FileOpenPicker picker = new Windows.Storage.Pickers.FileOpenPicker();
picker.FileTypeFilter.Add(".wav");
Windows.Storage.StorageFile fl = await picker.PickSingleFileAsync();
if (fl == null) return; // the user cancelled the picker
string _selectedFile = fl.Name;
AudioStream = await fl.OpenAsync(Windows.Storage.FileAccessMode.Read);
The phrase must be recorded in mono at a sampling rate of 16 kHz.
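To catch a wrongly formatted file before it is sent to the service, the header can be checked locally. The following is only a sketch, not part of the service's client library: WavFormatCheck and IsMono16kHz are names I made up for illustration, and the code assumes a simple PCM WAV file in which the "fmt " chunk immediately follows the RIFF header.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the Speaker Recognition NuGet package.
public static class WavFormatCheck
{
    // Returns true if the stream starts with a PCM WAV header that declares
    // one channel at 16000 samples per second. Assumes the "fmt " chunk comes
    // right after the RIFF header. Throws EndOfStreamException if the stream
    // is shorter than the header.
    public static bool IsMono16kHz(Stream wav)
    {
        // leaveOpen: true so the caller can rewind and reuse the stream.
        using (var reader = new BinaryReader(wav, Encoding.ASCII, leaveOpen: true))
        {
            string riff = ReadTag(reader);       // "RIFF"
            reader.ReadInt32();                  // overall chunk size
            string wave = ReadTag(reader);       // "WAVE"
            string fmt = ReadTag(reader);        // "fmt "
            if (riff != "RIFF" || wave != "WAVE" || fmt != "fmt ")
                return false;

            reader.ReadInt32();                  // size of the fmt chunk
            reader.ReadInt16();                  // audio format tag
            short channels = reader.ReadInt16(); // channel count
            int sampleRate = reader.ReadInt32(); // samples per second
            return channels == 1 && sampleRate == 16000;
        }
    }

    // Reads a four-character chunk identifier.
    private static string ReadTag(BinaryReader reader)
    {
        return Encoding.ASCII.GetString(reader.ReadBytes(4));
    }
}
```

In the UWP code from this article, the check could be run on AudioStream.AsStream(); remember to seek back to the beginning of the stream afterwards.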
The second option is to record the voice from the microphone. Start recording:
MediaCapture CaptureMedia = new MediaCapture();
var captureInitSettings = new MediaCaptureInitializationSettings();
captureInitSettings.StreamingCaptureMode = StreamingCaptureMode.Audio;
await CaptureMedia.InitializeAsync(captureInitSettings);

MediaEncodingProfile encodingProfile = MediaEncodingProfile.CreateWav(AudioEncodingQuality.High);
encodingProfile.Audio.ChannelCount = 1;
encodingProfile.Audio.SampleRate = 16000;

IRandomAccessStream AudioStream = new InMemoryRandomAccessStream();
CaptureMedia.RecordLimitationExceeded += MediaCaptureOnRecordLimitationExceeded;
CaptureMedia.Failed += MediaCaptureOnFailed;
await CaptureMedia.StartRecordToStreamAsync(encodingProfile, AudioStream);
The following parameters are required:
encodingProfile.Audio.ChannelCount = 1;
encodingProfile.Audio.SampleRate = 16000;
Stop recording after some time has passed:
await CaptureMedia.StopRecordAsync();
Stream str = AudioStream.AsStream();
str.Seek(0, SeekOrigin.Begin);
Then send the stream to the service for enrollment:
Guid _speakerId = Guid.Parse(_profileId);
Enrollment response = await _serviceClient.EnrollAsync(str, _speakerId);
From the response, we can get the following data:
response.Phrase - the phrase that was spoken
response.RemainingEnrollments - the number of repetitions of the phrase still required
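Because enrollment needs several repetitions, it is convenient to repeat the record-and-send step until the remaining count reaches zero. Below is a rough sketch of such a loop. EnrollmentLoop, RunAsync and the enrollOnce callback are hypothetical names of mine, not part of the client library; I assume the callback records one utterance, calls EnrollAsync and returns response.RemainingEnrollments.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the Speaker Recognition NuGet package.
public static class EnrollmentLoop
{
    // enrollOnce is assumed to record one utterance, send it to the service
    // and return the RemainingEnrollments value from the response.
    // maxAttempts guards against looping forever if the count never drops.
    public static async Task<bool> RunAsync(Func<Task<int>> enrollOnce, int maxAttempts = 10)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            int remaining = await enrollOnce();
            if (remaining <= 0)
                return true;   // no more repetitions are required
        }
        return false;          // gave up before enrollment completed
    }
}
```

The attempt limit is a design choice: a rejected recording does not reduce the remaining count, so without a cap the user could be asked to repeat the phrase indefinitely.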
Verification differs from enrollment only in that it calls the VerifyAsync method:
Guid _speakerId = Guid.Parse(_profileId);
Verification response = await _serviceClient.VerifyAsync(str, _speakerId);
The source code of the resulting application is available on GitHub.
A screenshot of the result is below:

As the sole means of protection, voice authentication is probably not the most reliable option, but it can serve as one of the factors in multifactor authentication.
Identification is interesting as well; its code is similar, with a few changes.
Because the recording can be long, the service needs time to complete the operation, so the result is checked in a polling loop. An example for the enrollment operation:
_speakerId = Guid.Parse((lbProfiles.SelectedItem as ListBoxItem).Content.ToString());
OperationLocation processPollingLocation;
processPollingLocation = await _serviceClient.EnrollAsync(str, _speakerId);

EnrollmentOperation enrollmentResult = null;
int numOfRetries = 10;
TimeSpan timeBetweenRetries = TimeSpan.FromSeconds(5.0);
while (numOfRetries > 0)
{
    await Task.Delay(timeBetweenRetries);
    enrollmentResult = await _serviceClient.CheckEnrollmentStatusAsync(processPollingLocation);
    if (enrollmentResult.Status == Status.Succeeded)
    {
        break;
    }
    else if (enrollmentResult.Status == Status.Failed)
    {
        txtInfo.Text = enrollmentResult.Message;
        return;
    }
    numOfRetries--;
}
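Note that the loop above does nothing special when all retries are used up while the operation is still running. The same polling pattern can be pulled into a small reusable helper that makes this case explicit. This is only a sketch: PollState, Poller and WaitAsync are hypothetical names of mine, and the checkStatus callback is assumed to wrap a call such as CheckEnrollmentStatusAsync and map its status to PollState.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical types, not part of the Speaker Recognition NuGet package.
public enum PollState { Running, Succeeded, Failed }

public static class Poller
{
    // Calls checkStatus up to maxRetries times, waiting `delay` before each call.
    // Returns Succeeded or Failed as soon as one of them is reported;
    // returns Running if the retries run out first (a timeout).
    public static async Task<PollState> WaitAsync(
        Func<Task<PollState>> checkStatus, int maxRetries, TimeSpan delay)
    {
        PollState state = PollState.Running;
        for (int i = 0; i < maxRetries; i++)
        {
            await Task.Delay(delay);
            state = await checkStatus();
            if (state != PollState.Running)
                break;
        }
        return state;
    }
}
```

With the values from the example above, the call would use maxRetries 10 and TimeSpan.FromSeconds(5.0), and the caller could show a timeout message when the result is still Running.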
For identification, the same NuGet package is used.
The source code of the identification application is also posted on GitHub.
The official WPF sample project on GitHub may also be useful:
Microsoft Speaker Recognition API: Windows Client Library & Sample
An example in Python:
Microsoft Speaker Recognition API: Python Sample
The Android SDK for Microsoft Speaker Recognition API is also available on GitHub.