Inspired by the publication "Dirty programming with a pure soul" (http://habrahabr.ru/company/abbyy/blog/144859/).
A good, many-layered metaphor was offered by Dmitry of ABBYY in his post. The author, who clearly does not lack talent or literary gift, touches a very thin border zone between the substantive and the mental: the transition from "computer hardware," together with its accompanying software infrastructure, into the information domain, where an invisible "quantum transformation" of physical laws into mathematical stochastics takes place.
Indeed, to make something dirty clean, you always have to make something clean dirty. This principle from Murphy's collection holds absolutely in the substantive world, but, as it turns out, it does not apply to the mental sphere at all.
Speaking of "dirty programming", as about "dirty technologies" for cleaning and processing "dirty matter", you need to know and remember that the technology itself is neither "clean" nor "dirty", such as Chemistry can not be "socialist" or "capitalist." But all technologies are characterized as “exact” or “coarse”. In this case, accuracy or rudeness is influenced by one of the two components, of which any technology is composed, namely, its tool. It is clear that if an ordinary shovel acts as such, then this garden tool cannot be used as a screwdriver for repairing the mechanism of a wristwatch, although the second component of the technology, the methodology, remains the same in both cases, that is, you just need to rotate the tool. Even simple electrical engineering screwdrivers will be too rough for precision screws. Therefore, the first principle of technotronics is that technological means must be comparable to the scale of the “dirt”, otherwise it will not be removed.
So, in order to successfully process "dirty" informational raw material into a "clean" informational product, strict fulfillment of the above condition is required: the processing tool must be finer, smaller, more exact, in a word, more precise than the "elements of dirt" that litter, soil, and befoul the informational raw material to be cleaned. Only then is it possible to separate "the cutlets from the flies," that is, the essential from the secondary, the useful from the useless, and so on.
And conversely, if something cannot be cleaned by any means and simply refuses to be processed to the required level of purity, this means exactly one thing: the tool clearly fails the precision condition, that is, it is a blunt and crude instrument.
Today a single super-ambitious task confronts programmers, whose variations have resisted solution for more than half a century by cyberneticists and linguists, by brain scientists and philologists alike: using computers to recognize any information products of human consciousness embodied in linguistic form, epistolary or verbal (text or speech), where what is required is to understand the semantic and/or sense load of any audio-visual or simply graphic images created by people.
One of these variations is machine understanding of natural-language texts, both at the level of distinguishing the semantics of individual linguistic graphemes and symbols and at the level of extracting the meaning of context, which, as I understand it, is exactly what glorious workers such as Dmitry are engaged in at so authoritative and highly respected a company as ABBYY.
And it would seem that if anywhere, then here everything necessary is available in necessary and sufficient quantity: the intellectual potential of the staff (unusually high), adequate funding (decent), the required methodology (in the widest range), and the necessary tools (from Compreno to neuro-semantic networks with all sorts of bells and whistles). And yet the desired result is absent!? Why? Precisely because the tools are not PRECISE enough! In this field they simply do not cut it. I will not compare their sharpness to a Siberian felt boot, but the means employed, which may be perfectly good for solving "substantive" tasks, do not at all correspond to the role assigned to them in the information sphere.
Speaking concretely, linguistic means can NOT operate in the field of thought processes; that is, one cannot wield a coarse "lingo-scalpel" in the hyperfine sense zone of brain neurons. The tool for this must be correspondingly fine, exact, and sensitive. As, for example, in electrical engineering, where a principle holds that the adjustment step of a variable resistor (potentiometer) must not be larger than the magnitude of the error, that is, of the "noise," or, in the case under consideration, the "dirt."
Linguistics, as is well known, can establish, and even then not absolutely, only connections and relations between words in sentences (speech or text). In just the same way, a construction discipline can establish connections and relations between building elements (words) in various building structures (contexts). Specialists from both fields can express these connections and relations ("conrelations") in some formalized notation, for example: "three tons of bricks of hardness 6 on the Mohs scale, 1000 pieces in all, are bound by 100 kilograms of Portland cement grade 500 and 200 kilograms of quartz sand."
It is unlikely that you will guess what structure such a record describes. In exactly the same way, the computer has no idea what is described by the record: [the subject "cartridge" is connected with the predicate "entered," forming a predication that is related through the preposition "in" to the prepositional group "into the store"]. But if the builders simply told us "a brick wall," we would understand at once what is meant and would not rack our brains linking tons with kilograms, or sand with cement and bricks.
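To make the contrast concrete, here is a minimal sketch of the kind of record a modern dependency parser actually produces for such a sentence. The choice of spaCy and its small English model is my own illustrative assumption; the original post names no tools. What comes out is precisely the formalized "conrelations" record described above: grammatical links, with nothing about which sense of "cartridge" or "store" is meant.

```python
# Minimal sketch: what a dependency parser emits for the ambiguous sentence.
# Library and model are illustrative assumptions, not tools from the post.
import spacy

nlp = spacy.load("en_core_web_sm")  # small English model, assumed installed
doc = nlp("The cartridge went into the store.")

for token in doc:
    # Each token is linked to its syntactic head by a grammatical relation,
    # a purely structural fact that says nothing about the intended SENSE.
    print(f"{token.text:10} --{token.dep_:>5}--> {token.head.text}")

# Expected output, roughly (punctuation omitted):
#   The        --  det--> cartridge
#   cartridge  --nsubj--> went
#   went       -- ROOT--> went
#   into       -- prep--> went
#   the        --  det--> store
#   store      -- pobj--> into
```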
At the same time, it is far from certain that if linguists utter the phrase "The patron entered the store," we will understand exactly which [patron] is meant: a boss, an electrical lamp socket, the chuck of a lathe, or a firearm cartridge. And under [store] one may likewise picture different images, from the magazine of an assault rifle to a trading enterprise. (In the Russian original both words, "патрон" and "магазин," are homonyms covering exactly these senses.)
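The same point in toy code (the sense inventory below is invented purely for illustration): one surface form maps to several stored senses, and nothing in the grammatical record above selects among them.

```python
# Toy sketch with an invented sense inventory: one surface form,
# several stored senses. Syntax alone cannot choose between them.
SENSES = {
    "patron": ["boss", "firearm cartridge", "lathe chuck", "lamp socket"],
    "store":  ["trading enterprise", "rifle magazine"],
}

def possible_readings(subject: str, obj: str) -> list[tuple[str, str]]:
    """Every combination of senses fits the same grammatical record."""
    return [(s, o) for s in SENSES[subject] for o in SENSES[obj]]

readings = possible_readings("patron", "store")
print(len(readings))  # 8 equally grammatical readings of one parse
```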
These examples should give us a clear understanding that no computer can grasp the semantic load of the context of speech by the means of computational linguistics; a completely different tool is required for that. Which one? Exactly the one that you and I use in everyday life.
Let us look at how thinking and communication between people actually proceed. First, certain thought forms, which he pictures as Models of the Behavior of Images (MPO), swarm in an individual's mind. Then he builds one or another semantic structure out of these MPOs and decides to convey it to us using the means of natural language, since, unfortunately or fortunately, we are not endowed with telepathy. In his message the author of the thought forms presents the MPOs in coded form, using language elements (words, lexemes, and so on). The recipient, perceiving the message, engages his associative memory, retrieves from it his own MPOs stored there for just such an occasion, and with their help tries to understand what the author wanted to say.
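The author offers no formalization of MPOs, so what follows is only my toy reading of the scheme just described, with every name and weight invented: the sender encodes an image model as a word; the receiver decodes the word through his own associative memory, which may well hold different candidates.

```python
# Toy reading of the MPO communication scheme (my assumption, not the
# author's formalization): words are codes, and understanding happens on
# the receiving side by lookup in the listener's own associative memory.

# The speaker's mind: the image model he means, and the word encoding it.
speaker_encoding = {"firearm cartridge": "patron"}

# The listener's associative memory: word -> image models stored earlier,
# weighted by personal experience (weights invented for illustration).
listener_memory = {
    "patron": {"boss": 0.6, "firearm cartridge": 0.3, "lamp socket": 0.1},
}

def decode(word: str) -> str:
    """The listener picks the image model his memory ties most strongly
    to the word -- which need not be the one the speaker had in mind."""
    candidates = listener_memory.get(word, {})
    return max(candidates, key=candidates.get) if candidates else "<no image>"

word = speaker_encoding["firearm cartridge"]
print(decode(word))  # -> "boss": the listener misunderstood the speaker
```

The point of the sketch is that meaning is not computed from the word itself; it is retrieved from what the receiver has previously memorized, which is exactly the step the next paragraph says developers skip.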
That is how our consciousness works. And how does the computer work? What technologies does it use, and are they anything like ours? No, they are nothing like ours. But since a computer is a counting machine, developers stubbornly try to COMPUTE the final result, instead of teaching the computer to understand meaning by simply memorizing these MPOs and operating with them thereafter.
It turns out that "computational" technologies are, in fact, cast in the role of that same "dirty Cinderella," when what we all need are the most precise possible means of understanding and recognizing images and meanings.