
Discovering the history of the Bolshoi Theater. Part one

image

Have you ever collected theater programs? If so, your collection probably holds dozens of them, perhaps even a hundred. Now imagine having 120 thousand programs, 48 thousand posters and 100 thousand historical photographs at your disposal. That is how many paper documents the Bolshoi Theater has kept since the middle of the 19th century. The oldest and most valuable of them have already yellowed and become worn, and searching the theater archives for information took hours. To save these treasures, the staff of the theater museum began transferring the documents into electronic form by hand, but it turned out that this could take years.

So in September 2016, together with the Bolshoi Theater and with the active support of Fekla Tolstaya, great-great-granddaughter of Leo Tolstoy, we launched a crowdsourcing project to digitize the history of the country's main theater. In this post we describe the first stage of the project and its technical side: how we digitized unique documents using ABBYY FineReader and how volunteers helped verify the recognition results.

A bit of history


The Bolshoi Theater was founded by Empress Catherine II on March 28, 1776. The buildings that housed the theater caught fire more than once, and the largest fire broke out in 1853. It blazed for three days and destroyed a large part of the Bolshoi's historical heritage. The oldest surviving theater document is a poster from 1830. All other posters and programs have been preserved only from 1858 onward.

Poster of the Bolshoi Theater, 1830.



The Bolshoi Museum wanted not only to preserve its most valuable archive by digitizing it, but also to make information about performances, actors, directors, choreographers and many others accessible to everyone. If the staff of the Bolshoi Theater had retyped the data from the programs and posters by hand, it would have taken several decades. So the theater decided to call on intelligent technologies and volunteers. The volunteer project “Discover the History of the Bolshoi” was initiated by Fekla Tolstaya. We had already worked with her on the project “All Tolstoy in One Click”. Back then, in 2014, with the help of ABBYY Recognition Server and ABBYY FineReader and with the participation of 3 thousand volunteers, we digitized 46 thousand pages of the 90-volume collected works of Leo Tolstoy. All the books are now available in electronic form on the official portal tolstoy.ru. Read more about that project here.

In the project “Discover the History of the Bolshoi,” our task was not only to digitize the collection of documents, but also to extract valuable information from them in order to create an electronic archive.

Therefore, the project is divided into three stages:

image


The project was officially launched in October 2016, when volunteers began checking the digitized texts of programs and posters. But we began preparing a little earlier.

Hot August 2016


In August 2016, the scanning team arrived at the Bolshoi Theater. Over 7 months they scanned tens of thousands of programs and photos from the collections of the Bolshoi Theater Museum. The posters did not need scanning, as the museum had already done that itself.

The friendly scanning team. From left to right: Nikolay Altunin, Irina Andryukhina, Dmitry Nesterov.

image

Our partner, Fujitsu, provided the project with two flatbed scanners, a Fujitsu fi-6770 and a Fujitsu fi-6750S, and two non-contact Fujitsu ScanSnap SV600 scanners.

The flatbed scanners helped us digitize programs that had been bound into folders and tightly stitched. We also scanned the photos on both sides. The backs of the photos carry valuable information: the titles of the productions and the names of the artists and photographers.

image

The non-contact ScanSnap SV600 scanners helped us digitize large-format and fragile programs, which had to be handled very carefully.

image

You can see how the digitization stage went in more detail in the gallery.

The scanning produced photos as TIFF files at 600 dpi and programs as JPEG files at 300 dpi.

Recognize and remove


We divided all the scanned documents into small parts, or “packages”, so that the work would not be too hard for the participants. One package is one program or poster. A program can contain anywhere from one sheet to 30; the average is four. We sorted the packages by year and numbered them.

Next, we had to run recognition on the scanned documents and create PDF files with a text layer. Why a text layer? So that museum staff could not only view the digitized posters and programs, but also search and copy the information in them. FineReader automatically recognized the scans and marked areas on them: text was highlighted in green, images in red, and tables in purple.
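The grouping and numbering described above can be sketched roughly as follows. This is only an illustration: the `YEAR-NNNN` identifier format, the `number_packages` function and the file names are assumptions, not the scheme the project actually used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def number_packages(
    documents: Iterable[Tuple[int, List[str]]]
) -> Dict[str, List[str]]:
    """Group scanned documents by year and give each one a package number.

    Each input item is (year, list of sheet files) for one program or poster.
    The "YEAR-NNNN" identifier format is hypothetical.
    """
    counters: Dict[int, int] = defaultdict(int)
    packages: Dict[str, List[str]] = {}
    # sorted() is stable, so documents keep their original order within a year.
    for year, sheets in sorted(documents, key=lambda doc: doc[0]):
        counters[year] += 1
        packages[f"{year}-{counters[year]:04d}"] = list(sheets)
    return packages


docs = [
    (1936, ["poster_a_1.jpg", "poster_a_2.jpg"]),
    (1883, ["poster_b_1.jpg"]),
    (1936, ["program_c_1.jpg"]),
]
for package_id, sheets in number_packages(docs).items():
    print(package_id, len(sheets), "sheet(s)")
```

Keeping one document per package means a volunteer never has to coordinate with anyone else on a half-finished program.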

image

Cast lists in the posters and programs are laid out as tables:



Each participant in the first phase of the project registered on the site openbolshoi.ru. They then logged in to their personal account, read the detailed instructions, installed the free version of FineReader, downloaded a package (a FineReader document archived in zip format) and started checking. Volunteers checked that the areas were marked correctly, read the text and corrected any recognition errors that can occur during digitization.

We called the participants of this stage verifiers. They checked the programs and posters from October 2016 to June 2017, starting with the most recent documents and gradually moving back toward the 19th century.

A brief description of how the project site was built is under the spoiler.

Crowdsourcing platform

Openbolshoi.ru is a collaboration platform for volunteers. It runs on the 1C-Bitrix CMS with a MySQL database and is written in PHP. Amazon S3 stores the programs and posters, and Git was used for version control. Once the preparation was done, the technical implementation took just one month.

Components of the platform:

1. Public part (available to all users, contains information about the project).
2. Personal account of the participant (available to registered users and designed for checking packages and managing personal information). In the personal account, volunteers saw the number of packages they had taken, their place in the rating and the points they had earned.

image

3. Personal account of the administrator (available only to platform administrators and designed for reviewing the work of volunteers).
4. Administrative part of the platform (available only to CMS administrators and needed for global platform management).
5. Amazon file storage (designed to store packages).

Make it in 48 hours


Each volunteer was given 48 hours to check one package. If the person did not manage to check the document in that time, the file returned to the common pool, and another volunteer could take it for checking. If the participant checked the package carefully and on time, the package was accepted and the volunteer was awarded 5 points. If the participant checked the document carelessly, the package was not accepted and the volunteer lost 10 points.
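The rules above can be sketched as a small model. The 48-hour window and the +5 and -10 point values come from the text; everything else (the `Package` and `Pool` classes, their method names, and the assumptions that a rejected package returns to the pool and that a score may go negative) is hypothetical and does not describe how openbolshoi.ru was actually implemented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional

CHECK_WINDOW = timedelta(hours=48)  # time allowed per package (from the text)
POINTS_ACCEPTED = 5                 # awarded when a package is accepted
POINTS_REJECTED = -10               # penalty when a package is rejected


@dataclass
class Package:
    package_id: str
    holder: Optional[str] = None         # volunteer currently checking it
    taken_at: Optional[datetime] = None  # when the volunteer took it
    accepted: bool = False


@dataclass
class Pool:
    packages: Dict[str, Package] = field(default_factory=dict)
    scores: Dict[str, int] = field(default_factory=dict)

    def release_expired(self, now: datetime) -> None:
        """Return packages held for more than 48 hours to the common pool."""
        for pkg in self.packages.values():
            if pkg.holder and not pkg.accepted and now - pkg.taken_at > CHECK_WINDOW:
                pkg.holder, pkg.taken_at = None, None

    def take(self, volunteer: str, now: datetime) -> Optional[Package]:
        """Give the volunteer the first free package, or None if there is none."""
        self.release_expired(now)
        for pkg in self.packages.values():
            if pkg.holder is None and not pkg.accepted:
                pkg.holder, pkg.taken_at = volunteer, now
                return pkg
        return None

    def review(self, package_id: str, ok: bool) -> int:
        """Record an administrator's verdict and return the volunteer's new score."""
        pkg = self.packages[package_id]
        if pkg.holder is None:
            raise ValueError("package is not checked out")
        volunteer = pkg.holder
        delta = POINTS_ACCEPTED if ok else POINTS_REJECTED
        self.scores[volunteer] = self.scores.get(volunteer, 0) + delta
        if ok:
            pkg.accepted = True
        else:
            # Assumption: a rejected package goes back to the pool.
            pkg.holder, pkg.taken_at = None, None
        return self.scores[volunteer]
```

Releasing expired packages lazily, at the moment someone asks for work, avoids the need for a background job in a sketch like this; a real platform might just as well use a scheduled task.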

Translation difficulties


Checking posters was harder than checking programs. In old posters and programs, characters were not always recognized correctly because of small, blurred type, complex layout and poor print quality.

For example, one volunteer spent a whole day checking this large, complex 1936 poster with small print. Every third surname had to be looked up on the Internet:


And on this poster, the caption at the bottom is hard to make out:


Volunteers often came across old, torn posters, and some of the information from them had to be entered by hand. On this 1883 poster, the volunteer could make out only the heading and the first two columns, because part of the document had not survived:



Although FineReader can recognize old Russian text, participants found it unusual to check pre-revolutionary posters and programs with their atypical style and the long-forgotten letters “і”, “ѣ”, “ѳ”, etc. Nevertheless, the volunteers coped with the task successfully and wrote about it with humor in the comments: “After checking the posters of 18** years, the hand reaches to write ‘actyah’ instead of ‘actions’...”.

Pictured: a program from 1910:

image

Technical support on call


The project organizing committee answered volunteers' questions around the clock by e-mail, on social networks and by phone. In the VKontakte group, volunteers asked many questions and actively helped each other. It looked like this:

image

Participants also shared interesting details and unusual facts found in unique documents.


Under the spoiler, we collected other volunteer finds.

The smallest poster, things left behind in the theater and 40 ladies in debardeur costumes





Winning ticket


As you remember, the organizing committee awarded points for each checked package. This is how the volunteer rating was formed. The five most active participants received prizes: tickets to the Bolshoi Theater. First place went to Igor Alimov from Belgorod, who checked 4,349 packages. The top five also included Galina Zarina from Moscow, Alexander Aksenov from St. Petersburg, Natalya Klementieva from Moscow and Larisa Ogorodnikova from Ekaterinburg. They chose the performances themselves and went to see Don Quixote, The Nutcracker, The Snow Maiden, Iolanthe and the premiere of the ballet Romeo and Juliet.



Reviews from other winners of the first stage of the project can be read here and here.

In addition, the top ten active volunteers received ABBYY FineReader as a gift. And participants who checked at least one package received special diplomas:

image

Some statistics


In the first phase of the project, 4,000 volunteers from 60 countries participated: USA, Australia, Brazil, India, China, Kazakhstan, Mongolia, many European countries and, of course, Russia.

image

Top 10 cities in which most of the volunteers live:



The project involved programmers, IT specialists, teachers, musicians, photographers, journalists, corporate executives, historians, retirees, students, artists, housewives and people of many other professions.



Thanks to the volunteers, all the programs and posters were digitized and checked in just 9 months. The programs and posters in JPEG format and as PDF files with a text layer, as well as the photos in TIFF format, have already been handed over to the Bolshoi Theater Museum.



The second phase of the project is now under way, with 6,450 volunteers already taking part. They help extract and organize data from the digitized documents. This stage uses the whole range of ABBYY technologies, from ABBYY Compreno to ABBYY FlexiCapture, and volunteers help check the work of the artificial intelligence. We will describe how it works in more detail in the next article. In the meantime, you too can join the volunteer project “Discover the History of the Bolshoi”. Join now!

Elizaveta Titarenko, editor of the ABBYY corporate blog,
Marina Antropova, Lead Special Projects Manager, ABBYY

Source: https://habr.com/ru/post/352620/

