
Runic processing

Good afternoon, dear readers.

You probably expect yet another triumphant success story about rolling out our cloud technologies. Sorry to disappoint: today we will talk about matters that are more than earthly, though no less interesting for it. I will try to tell you about an ambitious project to process runic documents obtained from various sources. For example, ones like these:

image
In this project we ran into tasks that were unusual not only for the recognition, text synthesis and DA systems (document analysis, our name for the part of FineReader responsible for locating text areas), but also for image processing and export.

Our company played a somewhat unfamiliar role in this project. Our technologies are usually used for high-volume document capture; we know that area thoroughly and are always ready for such tasks. This time the customers chose us to solve research problems that require painstaking restoration of each document with maximum accuracy.


As our readers probably know, runes were meant not so much for writing on paper-like media such as bark, parchment or papyrus as for carving on stones and wooden planks. Even the shape of most runic characters points to this:

image

Since runes were usually cut with a carving tool, using the standard feature classifier trained on printed text made little sense, especially since we have a tool as powerful as structural recognition, where each character is described in terms of segments, arcs and their relative positions (see the SDK documentation).

The process turned out to be familiar, if rather laborious: we had to describe the structure of each rune by hand and then semi-automatically tune the relationships between them. Note that runic writing makes no use of the "arc" element (not everyone can chisel an arc or a circle into stone).
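To give a feel for what a structural description might look like, here is a minimal Python sketch. It is not the FineReader description language: the `Segment` class, the `RUNES` table, the two Elder Futhark shapes chosen and the crude `signature` function are all assumptions made for illustration. Each rune is described only by straight segments in a unit square, with no arcs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A straight stroke from (x0, y0) to (x1, y1) in a unit square."""
    x0: float
    y0: float
    x1: float
    y1: float

    def is_vertical(self, tol: float = 1e-6) -> bool:
        return abs(self.x0 - self.x1) <= tol


# Hypothetical hand-written descriptions (illustration only):
# isaz is a single vertical stroke, gebo is two crossing diagonals.
RUNES = {
    "isaz": [Segment(0.5, 0.0, 0.5, 1.0)],
    "gebo": [Segment(0.0, 0.0, 1.0, 1.0), Segment(0.0, 1.0, 1.0, 0.0)],
}


def signature(segments):
    """Crude structural signature: (stroke count, vertical stroke count)."""
    return (len(segments), sum(s.is_vertical() for s in segments))


def classify(segments):
    """Return the names of all known runes with a matching signature."""
    sig = signature(segments)
    return [name for name, model in RUNES.items() if signature(model) == sig]
```

A real structural classifier would also compare the relative positions and angles of the strokes; the signature here only counts them, which is enough to tell these two shapes apart.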

As we have already mentioned, most runes are inscribed on media other than paper, so using a scanner for the initial digitization is very difficult. We had to make do with cameras. As a result, we often did not know the physical size of the photographed object during processing. At first this was solved quite simply: when photographing a runestone, we stood a developer of known height next to it. The photos looked something like this:

image

But later, when more sources became available to us, we had to stop relying on the image resolution in our algorithms.
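Both approaches can be sketched in a few lines of Python. This is an illustration under assumptions, not the code used in the project: `pixels_per_cm` estimates the scale from a reference object of known height (the developer), while `normalize_points` (a hypothetical helper) rescales stroke coordinates to a unit bounding box so that later steps do not depend on resolution at all.

```python
def pixels_per_cm(reference_height_px: float, reference_height_cm: float) -> float:
    """Estimate image scale from a reference object of known physical height."""
    if reference_height_px <= 0 or reference_height_cm <= 0:
        raise ValueError("heights must be positive")
    return reference_height_px / reference_height_cm


def normalize_points(points):
    """Rescale (x, y) points so the longer side of their bounding box is 1.0.

    The result is the same for any uniformly scaled copy of the input,
    so it no longer depends on the image resolution.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    extent = max(max(xs) - min_x, max(ys) - min_y)
    if extent == 0:
        raise ValueError("points must span a non-zero area")
    return [((x - min_x) / extent, (y - min_y) / extent) for x, y in points]
```

For example, a 180 cm developer who is 900 pixels tall in the photo gives a scale of 5 pixels per centimetre, while `normalize_points` returns identical coordinates for the same stroke photographed at two different resolutions.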

Both our customers and our developers were particularly interested in the so-called moon runes. As you know, these runes can be read only by moonlight, and some can be made out only when the moon is in the same phase as when they were written. For some, the lunar month matters as well. As it turned out, no magic is involved: the spectrum of moonlight simply varies somewhat from phase to phase and from month to month over the year, so the necessary components do not always appear, only on a particular day of the lunar year. This is easy to verify by examining such runes exactly six months after the moment they can be read, at roughly the same latitude in the southern hemisphere: the runes will be just as clearly visible.

To speed things up (so as not to wait six months for each inscription), we slightly modified a standard camera. We had to widen the range of captured spectral bands by adding octarine, the component most characteristic of moonlight, so our images are now stored in a 32-bit RGBO color space. For convenience we started writing the blue and octarine components into the high-order bits, so it would be more correct to call our color space BORG (or GROB on big-endian systems).
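The layout can be illustrated with a short Python sketch. The article only says that the pixel is 32 bits wide and that blue and octarine go into the high-order bits; the 8 bits per channel, the exact B, O, R, G order from the most significant byte down, and the function names `pack_borg` and `unpack_borg` are assumptions made here.

```python
import struct


def pack_borg(r: int, g: int, b: int, o: int) -> int:
    """Pack four 8-bit channels into one 32-bit word, B highest, then O, R, G."""
    for value in (r, g, b, o):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (b << 24) | (o << 16) | (r << 8) | g


def unpack_borg(word: int):
    """Inverse of pack_borg; returns the channels as (r, g, b, o)."""
    return (
        (word >> 8) & 0xFF,   # red
        word & 0xFF,          # green
        (word >> 24) & 0xFF,  # blue
        (word >> 16) & 0xFF,  # octarine
    )


word = pack_borg(r=0x11, g=0x22, b=0x33, o=0x44)
big_endian_bytes = struct.pack(">I", word)     # stored as B, O, R, G
little_endian_bytes = struct.pack("<I", word)  # stored as G, R, O, B
```

The same 32-bit word is laid out in memory in opposite byte orders on big-endian and little-endian machines, which is where the second name comes from; which name belongs to which platform depends on whether you read the letters from the most significant byte or in memory order.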

It did not go without incident. As you probably know, quite a few spells are written in runes, and some of them are protected against copying: copyright, after all, is not a recent invention. We carelessly exported a couple of such protected spells to HTML, and as a result the browser used to view them turned out to be cursed: whenever we tried to open, say, Habr, it started showing messages roughly like this:

image

We had to save the texts with simple compression and unpack them piece by piece, checking them for copy-protection spells.
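Unpacking in parts might look roughly like the following Python sketch. The article does not name the compression format, so zlib is an assumption, as are the function names and the idea of searching for known byte markers; real copy-protection spells are presumably harder to spot.

```python
import zlib


def iter_decompressed(compressed: bytes, chunk_size: int = 64):
    """Yield decompressed data piece by piece instead of all at once."""
    decompressor = zlib.decompressobj()
    for start in range(0, len(compressed), chunk_size):
        piece = decompressor.decompress(compressed[start:start + chunk_size])
        if piece:
            yield piece
    tail = decompressor.flush()
    if tail:
        yield tail


def find_markers(compressed: bytes, markers, chunk_size: int = 64):
    """Return the set of markers (hypothetical byte strings) found in the stream.

    A short tail of the previous piece is carried over so that a marker
    split across two pieces is still detected.
    """
    if not markers:
        return set()
    found = set()
    keep = max(len(m) for m in markers) - 1
    carry = b""
    for piece in iter_decompressed(compressed, chunk_size):
        window = carry + piece
        for marker in markers:
            if marker in window:
                found.add(marker)
        carry = window[-keep:] if keep > 0 else b""
    return found
```

Only a small window of the text is held in memory at any time, so a suspicious document never has to be opened in full.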

Fortunately for us, the test set contained none of the sufficiently powerful fire spells that our customers often have to deal with. For them this is a long-solved problem: all their computers are fitted with fire extinguishers governed by an autonomous control system running BeOS. The point is that BeOS implements the useful function is_computer_on_fire, which also measures the temperature of the burning motherboard quite accurately (for more details, click here). By the way, the customer's system administrators treated us to a very good barbecue: if the motherboard temperature is held at around 230-240 degrees Celsius, then in just forty minutes the meat comes out tender and very juicy.

All the recognized runes were neatly painted onto scrolls and handed over to the customer. And of course we cannot deny ourselves the pleasure of posting a group photo with the customers here:

image

Dmitry Deryagin
technology development department

Source: https://habr.com/ru/post/174789/

