Roll call in the army.
- Ivanov!
- Here!
- Petrov!
- Here!
- Thirty-thirty!
- ???
- Thirty-thirty, is there anyone by that name???
- Comrade Lieutenant! My last name is Zozo (ЗОЗО).
(A joke.)
We all know that document recognition programs sometimes make mistakes. Indeed, if they never made mistakes, they would not need a sprawling user interface with a text editor. Our FineReader, alas, is no exception yet, but that's not the point. Recognition programs have been around for quite a while, and their mistakes have been around just as long. Who among us has not come across a phrase like “I am walking but the road” in a digitized book (in Russian, “но дороге” where “по дороге”, “along the road”, was meant)? Today let's go looking for these endearing pranks of recognizers and see how they have ended up influencing our language as an observable object; that is, let's try to spot statistically noticeable anomalies. Yes, of course, in a serious study of language a linguist will surely manage to separate the wheat from the chaff, but you must admit the chaff is very interesting too.
One caveat up front: not all of the anomalies found come from FineReader, although it had a hand in many of them. So, let's begin.
Let's start with a charming character, Fafik the dog. Perhaps you have heard of the "Thoughts of the Great, the Average, and Fafik the Dog"? It turns out that fafiks can be plotted. The query “построение фафиков” (“plotting fafiks”, a misread “построение графиков”, “plotting graphs”), without quotes, is worth more than 3.5 thousand googles. Special “компьютерные профаммы” (“computer profams”, a misread “программы”, “programs”) are used for this: more than 3 thousand googles without quotes. The word “профаммы” with no qualifier, however, scores as many as 11 thousand googles. The wonderful word “офаничение” (for “ограничение”, “restriction”) lags behind at just over a thousand.
How can we not recall “па”, the favorite word of players of "Erudit" (the Russian counterpart of Scrabble)? It is in especially wide use these days, standing in for a misread “на” (“on”). For example, “па фоне” (“pa the background”, in quotes) collects 5.5 thousand googles, “па столе” (“pa the table”, also in quotes) about 3 thousand, and “па руках” (“pa the hands”, in quotes) exceeds 13 thousand.
Electricity is a dangerous thing. The 88 thousand googles for the query “поминальное напряжение ток” (“memorial voltage current”, without quotes; “поминальное” is a misread “номинальное”, “nominal”) seem to be evidence of that. If you search separately for “memorial voltage”, “memorial power” or “memorial current” (all without quotes), the results shoot past a hundred thousand. Scary!
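We have a fine drink called mors (морс, a berry drink). Yet the word also turns up with neuter adjectives, as a misread “море” (“sea”). “Чёрное морс” (“Black mors”), “Белое морс” (“White mors”), “Красное морс” (“Red mors”) and “Балтийское морс” (“Baltic mors”) each have several hundred googles.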
In Armenia (in the part that now belongs to Turkey) there is the ancient city of Kars (Карс). Recognizers are doing their best to make a city of Kare (Каре) appear as well. The query “армянский город Каре” (“Armenian city Kare”), without quotes, is worth 12 thousand googles.
But why do we keep talking about Russian, as if there were no other languages? Our neighbors' language, Ukrainian, has the preposition “зо”. It is not very common; it is a counterpart of the Russian “со” (“with”, as in “to come to school with a change of shoes”). Yet it is often found before counted nouns, where a recognizer has read “30” as “зо”: one such expression has more than twenty thousand googles, “зо тисяч” (“zo thousand”) has 2.5 thousand, and “зо днів” (“zo days”) almost eight thousand (all figures here are for searches in quotes), and so on. Evidently recognizers are no less popular in (on?) Ukraine than in Russia. And they make the same lovely mistake as in the roll-call joke.
Do you think recognizers leave the English language alone? No, of course you could never suspect them of that. Take, for example, the wonderful English word “puc”. As a rule it is set in italics, capitalized and placed under a picture, and it ends with a period followed by a number. (It is the Russian “рис.”, short for “рисунок”, “figure”, read as Latin letters.) The scale of the phenomenon is harder to assess here, but for fun you can google puc.1, puc.2 and so on across the Runet.
Purely English things are reflected in search engines too. The expression “in die room” is worth more than three hundred thousand googles, although I suspect a lot of that is noise, with the German article "die" getting mixed in. Among the results was the remarkable “I turn die lights off in die room”. By the way, "die lights" on its own is worth 35 thousand googles.
So the innocent-looking definite article, distorted by a recognizer, became an unpleasant verb, and it is indecent even to mention what the ordinary "click" turned into. Again, the scale of the phenomenon is hard to estimate, but it can be noted that the first pages of results for the query “point-and-dick” do contain the misrecognized "click". And yes, dear commenters! I warn you that I have already told every vulgar joke on this subject, making our entire PR department, three cab drivers, two horses and one and a half Moscow taxis blush, so your unfunny jokes would be nothing more than a repetition of mine. Do you need that?
There is an English word, "comer" (literally “one who comes”, that is, a visitor). The phrase “comer kick”, a misread "corner kick", picks up more than one and a half thousand googles (in quotes) and is found mainly in football texts.
Quite a few people in the world dislike the United States of America. Recognition programs sometimes back up their displeasure: the query “United Stales” (in quotes) collects more than two hundred thirty thousand googles. It is even strange that it has not become an English-language Internet meme.
With English, our research is harder. I never managed to expose, precisely and in bulk, those "look" that were really an unrecognized "took", those "cat" that were once "eat", or the "take" and "lake" pair. Try it; maybe you will manage.
Phenomena similar to the Ukrainian “зо” turned out to be easier to find here. Io is a moon of Jupiter, and it turns up, in place of the number 10, in the combinations “Io miles” (more than 14 thousand googles), “Io pounds” (another couple of thousand) and “Io states” (4 thousand).
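For anyone who wants to hunt for more of these, the English substitutions in the examples above (th read as di, cl as d, rn as m, t as l, 10 as Io) can be turned into a small generator of candidate search queries. The sketch below is only an illustration: the table `CONFUSIONS` and the function `ocr_variants` are names made up here, not part of FineReader or any library, and the table covers just the confusions mentioned in this post.

```python
# Hypothetical confusion table: (correct text, how a recognizer misreads it).
# Only the pairs that appear in this post are listed; real recognizers
# certainly make other mistakes as well.
CONFUSIONS = [
    ("th", "di"),   # "the" read as "die"
    ("cl", "d"),    # "cl" read as "d"
    ("rn", "m"),    # "corner" read as "comer"
    ("t", "l"),     # "States" read as "Stales"
    ("10", "Io"),   # "10 miles" read as "Io miles"
]


def ocr_variants(text, confusions=CONFUSIONS):
    """Return every string obtained by applying one confusion at one position."""
    variants = set()
    for correct, misread in confusions:
        start = text.find(correct)
        while start != -1:
            # Replace this single occurrence and keep the rest unchanged.
            variants.add(text[:start] + misread + text[start + len(correct):])
            start = text.find(correct, start + 1)
    variants.discard(text)  # never report the original as its own variant
    return variants


if __name__ == "__main__":
    for phrase in ["the", "corner", "States", "10"]:
        print(phrase, "->", sorted(ocr_variants(phrase)))
```

Each variant is a candidate to put in quotes and count googles for. Most will be noise, so the results still need a human eye.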
On that astronomical note, let's end our linguistic digression on recognition. Have a nice Friday, and thank you for your attention!
Dmitry Deryagin (57DeD)
Technology Development Department