
The invisible death of speech recognition

It was once assumed that as soon as a computer learned to understand human speech, artificial intelligence would quickly follow. But the accuracy of speech recognition systems peaked in 1999 and has been frozen ever since. Academic tests in 2006 recorded the fact: general-purpose systems have not surpassed 80% accuracy, whereas for humans the figure is 96-98%.

Professor Robert Fortner of the Media Research Institute believes that the creators of speech recognition systems have hit a dead end. Programmers did everything they could, and it was not enough. After several decades they realized that human speech is not just a collection of sounds: the acoustic signal alone does not carry enough information for text recognition.

The complexity of the task is easy to imagine. By some estimates, the number of possible sentences in a human language is on the order of 10^570. Only a tiny fraction of them is recorded in documented sources, so the system cannot be trained on them all, even if you feed it every text people have ever created.
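A back-of-the-envelope calculation shows why the sentence space cannot be covered by any corpus. The vocabulary size and maximum sentence length below are assumptions chosen for illustration, not figures from the article:

```python
import math

# Illustrative estimate (numbers are assumptions, not from the article):
# even a modest vocabulary and sentence length give a sentence space
# no corpus could ever cover.
vocabulary = 100_000   # assumed word count of a large language
max_length = 20        # assumed maximum sentence length, in words

# Crude upper bound: every slot may hold any word.
sentences = sum(vocabulary ** n for n in range(1, max_length + 1))

print(f"~10^{math.floor(math.log10(sentences))} possible sentences")  # ~10^100
```

Even this deliberately low bound dwarfs the number of sentences ever written down, which is the point the estimate above is making.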
Many words in a language have hundreds or even thousands of meanings. Which one applies depends on the context, that is, on the surrounding words; in oral speech it also depends on facial expression and intonation.
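The idea that context selects a word's meaning can be sketched as a toy disambiguator in the style of the classic Lesk approach: score each sense by how many of its cue words appear near the ambiguous word. The sense inventory here is invented for illustration:

```python
# Hypothetical sense inventory: each sense of an ambiguous word is
# described by a small set of cue words (all data here is invented).
SENSES = {
    "bank": {
        "financial institution": {"money", "account", "loan"},
        "river edge": {"river", "water", "fishing"},
    }
}

def pick_sense(word, context):
    """Return the sense whose cue words overlap the context most."""
    senses = SENSES[word]
    return max(senses, key=lambda s: len(senses[s] & set(context)))

print(pick_sense("bank", ["he", "sat", "by", "the", "river"]))  # river edge
```

Real systems face exactly the problem the article describes: compiling such cue sets for every sense of every word, by hand or from corpora, has never been completed for a whole language.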

Our brain can generate text spontaneously, using the intuitive rules of functional grammar and the semantic paradigm of each word, learned over a lifetime. These rules describe which words can be combined with one another and how (through which functional elements). The meaning of each word depends on the meaning of the preceding ones, and in difficult cases our brain recognizes speech only from fragments of phrases, relying on context.

The basic rules of functional grammar are intuitively clear to everyone, but they cannot be formalized in a way a computer can use. And without that, there is no way forward. When a computer tries to recognize sentences it has not encountered before, it will inevitably make mistakes, because it lacks the grammatical parser and the dictionary of semantic paradigms embedded in the human brain.

For example, Russian linguists once tried to compile the semantic paradigm of a single simple preposition of the Russian language (if I remember correctly, PRI). They reached several hundred meanings, each of which allows its own set of subsequent elements. And that was clearly not a complete list.

Entire scientific conferences are devoted to the grammar of prepositions (some scholars spend their whole lives studying the preposition PO and still cannot fully uncover its secrets). But a similar description is required for every morpheme of human language, including prefixes and suffixes. Only then could one begin programming computer speech recognition systems. Is humanity up to this task? Bear in mind, too, that the paradigm of each element of human speech is constantly changing, because a language lives its own life and evolves all the time. How is a computer system supposed to keep up?

Even the most superficial analysis Google performed on texts published on the Internet revealed a trillion objects. And that is only a tiny part of the morphemes that make up our speech. Google released a 24-gigabyte archive of these texts for public access and stopped further publications on the topic.
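The archive in question consists of n-gram counts: how often each short sequence of words occurs in the corpus. A minimal sketch of such counting over a toy corpus (invented here; the real dataset is vastly larger) looks like this:

```python
from collections import Counter

# Toy corpus, invented for illustration; the Google archive mentioned
# above holds counts like these for a trillion-word web crawl.
corpus = "the cat sat on the mat and the cat slept".split()

def ngrams(tokens, n):
    """Yield all overlapping n-word sequences from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

bigram_counts = Counter(ngrams(corpus, 2))
print(bigram_counts[("the", "cat")])  # the pair "the cat" occurs twice
```

Counts like these tell you which word sequences are frequent, but, as the article argues, frequency alone says nothing about grammar or meaning.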

In 1991 Microsoft started the MindNet project to create a “universal parser”: a universal map of all possible interconnections between words. A great deal of manpower and money went into the project, but by 2005 the research had practically stopped.

At this point one can draw a line and start over, in a different and much harder way. A language must be formalized within a single functional grammar, universal for all languages, and linguists will not manage without serious help, if the task is solvable at all.

Source: https://habr.com/ru/post/92771/

