Thinking out loud: text recognition.

1. First point: handwriting recognition with neural networks.

“In the summer of 1987, I had an experience that cooled my already low enthusiasm for neural networks even further. I went to a neural network conference, where I saw a presentation by a company called Nestor. Nestor was trying to sell a neural network application for recognizing handwriting on a tablet. It offered to license the program for one million dollars. That caught my attention. Although Nestor promoted its improved neural network algorithm and advertised it as yet another big breakthrough, I felt that the problem of handwriting recognition could be solved in a simpler, more traditional way. I went home that night thinking about the problem, and in two days I had designed a handwriting recognizer that was fast, small, and flexible. My solution did not use a neural network, and it worked nothing like the brain. Although this conference sparked my interest in designing computers with a stylus (which ultimately led to the PalmPilot ten years later), it also convinced me that neural networks were not much of an improvement over traditional methods. The handwriting recognizer I created was ultimately used in the text input system called Graffiti, in the first series of Palm products. I think Nestor went out of business.” Jeff Hawkins, "On Intelligence"

In his book, Hawkins proposes a theory of ~~artificial~~ intelligence and suggests implementing it as a neural network that replicates the structure of the neocortex (the cerebral cortex). His theory explains intelligence through the memory-prediction model and invariant representations of data.

2. Second point: text recognition is, above all, an intellectual task. Even if the computer is not asked to understand the text, but only to convert handwriting into a digital format suitable for further processing (ASCII), recognition with "simple" neural networks will still be inefficient. Just think of doctors' handwriting...

Moreover, even when a letter is completely illegible, a person can still make out the word, or the whole text, from context.
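The role of context can be illustrated with a toy Python sketch. It is not a real recognizer: the lexicon, the word-pair counts, and the function `read_word` are all hypothetical. The legible letters narrow the word down to a few candidates, and the preceding word picks among them.

```python
import re
from collections import Counter

# Hypothetical toy data: a real system would learn its vocabulary and
# word-pair statistics from a large body of text.
LEXICON = ["cat", "cot", "cut", "hat", "hot", "hut"]
BIGRAMS = Counter({("black", "cat"): 5, ("baby", "cot"): 4, ("paper", "cut"): 3})


def read_word(glyphs: str, previous_word: str) -> str:
    """Resolve a word in which '?' marks an illegible letter.

    The legible letters restrict the candidates to lexicon words that fit;
    the preceding word (the context) then picks the candidate seen most
    often after it. Ties go to the earliest word in LEXICON.
    """
    pattern = re.compile("".join("." if ch == "?" else re.escape(ch) for ch in glyphs))
    candidates = [w for w in LEXICON if pattern.fullmatch(w)]
    if not candidates:
        return glyphs  # nothing fits: leave the word unresolved
    return max(candidates, key=lambda w: BIGRAMS[(previous_word, w)])


print(read_word("c?t", "black"))  # -> cat
print(read_word("c?t", "baby"))   # -> cot
print(read_word("c?t", "paper"))  # -> cut
```

The same smudged word fits three entries; only the neighbouring word settles which one was meant.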

In other words, existing recognition algorithms can realistically be improved, but they will never recognize absolutely any handwriting (in traditional algorithms, the slightest deviation from the template leads to an error), and the computer will keep reading text like a preschooler: spelling it out letter by letter and splitting the handwriting apart at the gaps.

An adult, by contrast, reads the whole word:

“Aoccdrnig to the rsuelts of rsceearh at an Egnilsh uinervtisy, it deos not mtetar in waht oredr the ltetres in a wrod are araengrd. The mian tihng is taht the frist and lsat lteetrs are in plcae. The oehtr lteetrs can be in a cmpleote mses, and the txet is stlil raed wtiohut porbelms. The rsaeon is taht we do not raed ecah leettr sretaeaply, but the wolhe wrod.”
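The effect can be reproduced, and "reading the whole word" imitated, with a short Python sketch. It assumes lowercase words without punctuation and a hypothetical toy lexicon; the order-independent signature (first letter, last letter, sorted interior) is a common string trick, not a model of how people read. Where two words share a signature (e.g. "salt" and "slat"), the sketch simply takes the first one.

```python
import random
from collections import defaultdict


def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle the interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def word_key(word: str) -> tuple:
    """Order-independent signature: first letter, last letter, sorted interior."""
    return (word[0], word[-1], "".join(sorted(word[1:-1])))


def build_index(lexicon):
    """Map each signature to the lexicon words that share it."""
    index = defaultdict(list)
    for word in lexicon:
        index[word_key(word)].append(word)
    return index


def read_whole_words(text: str, index) -> str:
    """Read each token as a whole word by looking up its signature."""
    result = []
    for token in text.split():
        matches = index.get(word_key(token), [])
        result.append(matches[0] if matches else token)
    return " ".join(result)


rng = random.Random(0)
sentence = "we do not read each letter but the whole word"
lexicon = sentence.split()
scrambled = " ".join(scramble_word(w, rng) for w in sentence.split())
print(scrambled)  # interior letters shuffled
print(read_whole_words(scrambled, build_index(lexicon)))  # -> we do not read each letter but the whole word
```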

3. Another point, about how the brain works:

“As it turned out, an unexpected discovery came from the basic anatomy of the cortex itself, but it took an unusually perceptive mind to recognize it. That mind belonged to Vernon Mountcastle, a neurophysiologist at Johns Hopkins University in Baltimore. In 1978, he published a paper titled "An Organizing Principle for Cerebral Function." In it, Mountcastle pointed out that the neocortex is remarkably uniform in appearance and structure. The regions of the neocortex that handle auditory input look like the regions that handle touch, the regions that control muscles, Broca's language area, and practically every other region of the neocortex. Mountcastle suggested that since these regions look the same, perhaps they actually perform the same basic operation! He proposed that the cortex uses the same computational tool for everything it does.” Jeff Hawkins.

The question remains, however: how are waves, light, and sound stored in the neocortex as patterns?

“Put simply, Fourier developed a mathematical method for converting a pattern of any complexity into the language of simple waves. He also showed how these waveforms can be converted back into the original pattern. In other words, just as a television camera converts an image into electromagnetic frequencies [8] and a television set restores the original image from them, the mathematical apparatus Fourier developed transforms patterns. The equations used to convert patterns into waveforms and back are known as Fourier transforms. It was these equations that allowed Gabor to convert an image of an object into the blur of interference patterns on holographic film, and also to devise a way of converting the interference patterns back into the original image.” Michael Talbot, The Holographic Universe.
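The round trip Talbot describes (pattern to waves and back) can be shown with the discrete Fourier transform. A minimal Python sketch: the toy 8-pixel "image" is an assumption, and the naive O(n²) sums are written out for clarity rather than speed. It illustrates the mathematics only, not holography or the brain.

```python
import cmath


def dft(signal):
    """Discrete Fourier transform: pattern -> amplitudes of simple waves."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform: wave amplitudes -> original pattern."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


pattern = [0, 1, 1, 0, 0, 1, 0, 0]  # a toy 1-D "image": one row of pixels
waves = dft(pattern)                 # waves[0] is the sum of all pixels
restored = [round(v.real, 9) for v in idft(waves)]
print(restored)  # the original pixel values, up to floating-point rounding
```

Each entry of `waves` is the strength and phase of one frequency; feeding all of them back through `idft` rebuilds the row of pixels.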

In general, the brain shares properties with a hologram: for example, it stores a huge amount of information in a relatively small volume. Just as holographic film illuminated by a laser at different angles yields many different, previously recorded images, a person's memory yields different information, including different assessments of the same facts, when their state of consciousness changes, whether naturally ("mood", "hormones", including endorphins, etc.) or with the help of "mediators" (alcohol, tobacco, other drugs).

“The Pribram-Bohm theory
If we combine the theories of Bohm and Pribram, we arrive at a radically new view of the world: our brain mathematically constructs objective reality by processing frequencies that come from another dimension, a deeper order of existence beyond space and time. The brain is a hologram enfolded in a holographic universe.” Michael Talbot, The Holographic Universe.

4. To recognize handwritten text, printed text, or other information such as images and sound with neural networks, a computer needs an impressive amount of memory. In this respect, a neural network that replicates the structure of the neocortex has serious potential.

Conclusion:

To build a handwriting recognition system, one could use a neural network with a six-layer structure (the neocortex itself has six layers) that follows the basic structural principles of the neocortex.

The core operating principle is the memory-prediction model. That is, the system does not compute the answer (the correspondence between the handwritten text and the ASCII code) but "retrieves it from memory". For this reason, the system will need a fairly long training period (memorization).

Initial training should be done "manually", with constant checking of the results; after that, the system can switch to continuous automatic training. For this, a dedicated auxiliary training program could feed the system images together with the corresponding ASCII codes.
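The "retrieve instead of compute" idea, together with a training program that feeds images and ASCII codes, can be sketched in Python. This is a minimal sketch under loud assumptions: a nearest-neighbour memory over tiny 3×3 bitmaps stands in for the proposed six-layer network, and the names `PatternMemory`, `train`, and the bitmap format are hypothetical. It is not the memory-prediction model itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical input format: a small black/white image flattened row by row.
Bitmap = Tuple[int, ...]


@dataclass
class PatternMemory:
    """Toy recognizer that 'retrieves the answer from memory'.

    A nearest-neighbour lookup stands in for the proposed neocortex-like
    network; it illustrates recall instead of computation, nothing more.
    """
    samples: List[Tuple[Bitmap, int]] = field(default_factory=list)

    def memorize(self, bitmap: Bitmap, ascii_code: int) -> None:
        self.samples.append((bitmap, ascii_code))

    def recall(self, bitmap: Bitmap) -> Optional[int]:
        """Return the ASCII code of the closest stored image (assumes equal sizes)."""
        if not self.samples:
            return None
        # Hamming distance: the number of pixels that differ.
        best = min(self.samples,
                   key=lambda s: sum(a != b for a, b in zip(s[0], bitmap)))
        return best[1]


def train(memory: PatternMemory, labelled: List[Tuple[Bitmap, int]]) -> int:
    """One pass of a hypothetical training program feeding (image, ASCII) pairs.

    A sample is memorized only when recall gets it wrong; the returned error
    count is what an operator would watch during the 'manual' phase.
    """
    errors = 0
    for bitmap, code in labelled:
        if memory.recall(bitmap) != code:
            errors += 1
            memory.memorize(bitmap, code)
    return errors


letters = [
    ((0, 1, 0, 0, 1, 0, 0, 1, 0), ord("I")),
    ((1, 0, 0, 1, 0, 0, 1, 1, 1), ord("L")),
    ((1, 1, 1, 0, 1, 0, 0, 1, 0), ord("T")),
]
memory = PatternMemory()
first_pass = train(memory, letters)    # every letter is new
second_pass = train(memory, letters)   # every letter is recalled
noisy_i = (0, 1, 0, 0, 1, 0, 0, 1, 1)  # an "I" with one stray pixel
print(first_pass, second_pass, chr(memory.recall(noisy_i)))  # -> 3 0 I
```

The design choice mirrors the text: nothing is "calculated" beyond a similarity comparison, so quality depends entirely on how much has been memorized.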

If such a system is developed successfully, then after appropriate training it will be able to recognize not only handwritten text but also other information, visual and auditory, i.e. any information that can initially be represented as waves.

06.2007

P.S. "Zen" :)

Visual information travels from the eyes through the thalamus (the "eye at the apex"), from where it rises, "expanding", through the cerebral cortex toward the base of an imaginary pyramid. Only as the "pyramid" widens does the information become more specific, and at the apex a single "quantum" of information has "many paths" it can take. In other words, the pyramid is not so much a data-representation structure as the path a unit of information takes through the neocortex.

On the other hand, without turning the pyramid over, you get a different picture: information enters the cerebral cortex at the "base of the pyramid", is refined toward the apex by following a certain "algorithm", and, on reaching the "eye at the apex of the pyramid", becomes what we actually "think we see". From this point of view, the principle of intelligent selection of visual information holds: what is seen depends on the intellect and the "algorithms" it follows, which supports the hypothesis of the "principle of information relativity".

Combining these two points of view gives a general picture of how the intellect recognizes visual information.

Visual information entering the "intellectual system" is processed simultaneously by two (or more) opposing processes. The first generates many paths, i.e. possible interpretations of the information. The second, following a certain rule (an algorithm), narrows the incoming information down. What we see, then, is the result of the interaction of these two opposing processes.
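The interplay of the two processes resembles the classic generate-and-test scheme from computing. A minimal Python sketch, assuming hypothetical per-glyph candidate letters (as an ambiguous shape matcher might produce) and a toy lexicon playing the role of the "rule"; it illustrates the idea, not how the cortex actually works.

```python
from itertools import product


def generate(candidates_per_glyph):
    """Process 1: expand every possible interpretation of the glyph sequence."""
    for letters in product(*candidates_per_glyph):
        yield "".join(letters)


def constrain(interpretations, rule):
    """Process 2: keep only the interpretations that satisfy the rule."""
    return [w for w in interpretations if rule(w)]


# Hypothetical ambiguous input: each glyph could be read several ways.
glyphs = [("h", "b", "k"), ("a", "o", "u"), ("n", "r", "m"), ("d",)]
lexicon = {"band", "bond", "bird", "word", "kind"}

readings = constrain(generate(glyphs), lambda w: w in lexicon)
print(readings)  # -> ['band', 'bond']
```

Of the 27 readings the first process proposes, the rule leaves two; what is finally "seen" is the product of both processes, not of either one alone.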

P.P.S. So who is the Master who makes the grass green? :)

Source: https://habr.com/ru/post/46960/
