
Hello!
In the last post, we announced the “Native Speech 2014” developer contest, in which participants must create a working algorithm that converts a recognized sequence of phonemes into text conforming to the norms of the Russian language.
Registration has already begun, and to help those who are still undecided about participating, I will try to explain what needs to be done in the competition.
First, let's do an experiment. Try to read and understand the text of the following paragraph. Note that an apostrophe after a consonant, for example l', denotes that the consonant is softened.
[sample paragraph in phonetic transcription]
Did you manage? Now let's see what this text should actually look like:
[the same paragraph in standard Russian spelling]
In the example above, we tried to simulate a recognition system working at the phonetic level. The paragraph with apostrophes is a raw recognition result. Contest participants will receive data files in approximately this form for developing their systems and running experiments. The essence of the task is as follows: given a dictionary in which each word is paired with its transcription, as well as a confusion matrix, restore the original message. In our case, that means matching the phonetic notation of each word to its original spelling.
It seems simple, doesn't it? However, consider the difficulties that may arise when implementing the algorithm. The main problem is that, because of recognition errors, the resulting sequence of recognized phonemes will not always match the transcriptions of the spoken words. Errors come in three types: replacing one sound with another (pasch s, t 's'eni), skipping a sound (in n'imateln, giving a l), and inserting an extra sound, either within a word (pasch s) or when breathing artifacts and extraneous noises are mistakenly recognized as phonemes (jc). The confusion matrix gives the probability of confusing one sound with another, as well as the probabilities of skips and insertions.

It should also be borne in mind that, depending on the speaker's pace, the output of the recognition system may turn out to be one long line with no word boundaries. We therefore face the problem of segmentation, i.e., dividing the input sequence into separate words. This is not trivial, since Russian is rich in phrases that sound alike but split into words differently (the pairs below are given in translation): "it has stung" and "for the cause", "and wildly for me" and "come to me", "have cured" and "while being treated", "we are married" and "we are you", "you are a foal" and "you are a child", etc. This problem can be solved with a language model.
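To make the role of a language model more concrete, here is a minimal sketch of segmentation by dynamic programming. It assumes error-free recognition, a notation with one character per phoneme, and a simple unigram model. The lexicon, its probabilities, and the function name `segment` are invented for illustration and are not part of the contest materials.

```python
import math

# Hypothetical toy lexicon: transcription -> (spelling, unigram probability).
LEXICON = {
    "ais": ("ice", 0.2),
    "krim": ("cream", 0.2),
    "ai": ("I", 0.5),
    "skrim": ("scream", 0.1),
}

def segment(phonemes, lexicon):
    """Return the most probable split of an unspaced phoneme string, or None."""
    n = len(phonemes)
    max_len = max(len(t) for t in lexicon)
    # best[i] holds (log-probability, words) for the best split of phonemes[:i].
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            entry = lexicon.get(phonemes[start:end])
            if entry is None or best[start][0] == -math.inf:
                continue  # no such word, or the prefix itself cannot be split
            word, prob = entry
            score = best[start][0] + math.log(prob)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    return best[n][1] if best[n][0] > -math.inf else None

print(segment("aiskrim", LEXICON))
```

With these toy probabilities the call returns `["I", "scream"]`. Raising the probabilities of "ice" and "cream" would flip the result, which is the kind of ambiguity a language model is meant to resolve. A real system would also have to tolerate recognition errors instead of requiring exact dictionary matches.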
So, to complete the contest task, two problems must be solved: the mismatch between the dictionary transcriptions and the recognized phoneme sequence caused by skipped, inserted, and substituted sounds, and the segmentation of the input phoneme sequence into separate words.
The simplest solution that immediately comes to mind is a modification of the Levenshtein distance or of the Viterbi algorithm. Additional information can be found in the list of references at the end of the “Competition Challenge” section.
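As an illustration of what a modified Levenshtein distance might look like, the sketch below replaces unit edit costs with negative log-probabilities of substitutions, skips, and insertions. The probability values, the `CONFUSION` table, and the function names are hypothetical stand-ins for what the contest's matrix would provide. This is one possible starting point, not the organizers' method.

```python
import math

# Hypothetical probabilities; a real system would read them from the contest matrix.
CONFUSION = {("d", "t"): 0.2, ("t", "d"): 0.2}  # P(heard | spoken) for confusable pairs
P_CORRECT = 0.9   # phoneme recognized correctly
P_OTHER = 0.01    # any substitution not listed in CONFUSION
P_SKIP = 0.05     # spoken phoneme missing from the output
P_INSERT = 0.05   # spurious phoneme (for example, noise) in the output

def sub_cost(spoken, heard):
    """Cost of hearing `heard` when `spoken` was pronounced."""
    if spoken == heard:
        return -math.log(P_CORRECT)
    return -math.log(CONFUSION.get((spoken, heard), P_OTHER))

def weighted_edit_distance(reference, recognized):
    """Cheapest way to turn a dictionary transcription into the recognized string."""
    skip, insert = -math.log(P_SKIP), -math.log(P_INSERT)
    rows, cols = len(reference) + 1, len(recognized) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + skip
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + insert
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + skip,      # reference phoneme was skipped
                dist[i][j - 1] + insert,    # extra phoneme was inserted
                dist[i - 1][j - 1] + sub_cost(reference[i - 1], recognized[j - 1]),
            )
    return dist[-1][-1]

def closest_word(recognized, transcriptions):
    """Pick the dictionary transcription with the lowest weighted distance."""
    return min(transcriptions, key=lambda ref: weighted_edit_distance(ref, recognized))

print(closest_word("tom", ["dom", "kot"]))
```

Here the recognized sequence `tom` ends up closer to the dictionary transcription `dom` than to `kot`, because the toy table treats confusing `d` with `t` as relatively likely.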
However, we would rather not give participants any obvious “recipes”, since the goal of our competition is to find specialists who can come up with non-standard solutions to complex and interesting problems. We hope that the young, talented developers we find through Native Speech will join the MDG team and help make our products even better. And so that all finalists can attend the final stage of the competition in St. Petersburg, the organizer will cover transportation and accommodation costs.
I would also like to note that the prototype systems developed by contest participants remain their intellectual property, to which MDG makes no claim. The company has its own solution to this problem, which is used in our products.
Follow contest news on social networks: VK, FB, LinkedIn, and online.
The main contest materials are on the forum.