
In 1896, the remains of manuscripts were found in the geniza of the Cairo synagogue: 320 thousand torn pieces of paper and parchment. Over the 117 years since then, researchers have managed to join 4 thousand of the fragments by hand. Thousands of scholarly papers have been written on the basis of what was recovered, and now programmers have joined the effort.
On May 16, 2013, a project for computer processing of the fragments was launched. Specialists at Tel Aviv University are using pattern recognition techniques of the kind known since the DARPA competition on reconstructing shredded documents. All fragments are scanned, separated from the background and aligned; their borders are then traced, and the program looks for matches in the shape of the pieces, the lines on the paper, ink contact points and so on. Operators check that the fragments have been joined correctly, and the final assembly is done in a graphics editor.
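The shape-matching step can be illustrated with a toy example. This is only a minimal sketch, not the Tel Aviv team's algorithm: it assumes each torn edge has already been reduced to a list of contour depths sampled at equal steps, and the names `edge_mismatch` and `rank_candidate_pairs` are hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def edge_mismatch(edge_a, edge_b):
    """Return a mismatch score; 0 means the two edges interlock perfectly.

    Assumption: an edge is a list of contour depths sampled at equal steps.
    The facing edge is walked in the opposite direction, so a bump on one
    piece should meet a dent on the other and the sums stay constant.
    """
    if len(edge_a) != len(edge_b):
        return float("inf")
    sums = [a + b for a, b in zip(edge_a, reversed(edge_b))]
    return pvariance(sums)


def rank_candidate_pairs(edges):
    """Score every pair of fragments and return them best match first."""
    scored = [
        (edge_mismatch(edges[x], edges[y]), x, y)
        for x, y in combinations(sorted(edges), 2)
    ]
    return sorted(scored)


# Made-up edge profiles for three fragments (illustrative data only).
edges = {
    "A": [0, 2, 5, 3, 1],
    "B": [4, 2, 0, 3, 5],
    "C": [1, 1, 4, 0, 2],
}

ranking = rank_candidate_pairs(edges)
for score, first, second in ranking:
    print(first, second, score)
```

In a real pipeline a score like this would only produce a shortlist of candidates; as described above, operators still check every proposed join.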
In the case of the Cairo manuscripts, matters are complicated by the fact that over the past century the fragments have been dispersed among 67 libraries and private collections around the world, from Cambridge to St. Petersburg.
In addition, it was not known at the outset how many different documents the heap of fragments contained. To solve a problem of this kind, it is important to determine the language of each text, the exact set of characters used and the distance between the lines, so that each fragment can be assigned to a particular document. The hundreds of thousands of fragments come from thousands of different documents written in Hebrew, Aramaic and Judeo-Arabic.
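The idea of sorting fragments into candidate documents by such features can be sketched as follows. This is an illustration under stated assumptions, not the project's method: the feature names (`language`, `line_spacing_mm`), the bucket width and the sample values are all hypothetical.

```python
from collections import defaultdict


def group_fragments(fragments, spacing_step_mm=0.5):
    """Group fragment ids by (language, rounded line spacing).

    Assumption: fragments whose line spacing falls into the same
    0.5 mm bucket and share a language may belong to one document.
    """
    groups = defaultdict(list)
    for frag in fragments:
        bucket = round(frag["line_spacing_mm"] / spacing_step_mm)
        groups[(frag["language"], bucket)].append(frag["id"])
    return dict(groups)


# Made-up measurements for four fragments (illustrative data only).
fragments = [
    {"id": "f1", "language": "Hebrew", "line_spacing_mm": 7.1},
    {"id": "f2", "language": "Hebrew", "line_spacing_mm": 7.0},
    {"id": "f3", "language": "Judeo-Arabic", "line_spacing_mm": 7.1},
    {"id": "f4", "language": "Hebrew", "line_spacing_mm": 9.6},
]

groups = group_fragments(fragments)
for key, ids in groups.items():
    print(key, ids)
```

Fixed buckets are the simplest possible choice and can split two near-identical spacings that fall on either side of a boundary. Their benefit is that only fragments in the same group need to be compared pairwise, which shrinks the search space.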
It has already been established that the documents date from the 9th to the 19th centuries. Among them are letters of Moses of Egypt from the 12th century, parts of the Torah and prayer books, sheets of poems, personal letters, contracts, alchemical manuals, court records and even recipes, along with other documents on parchment and paper that describe the life of the Jewish community in Egypt. Historians have learned, for instance, that Jews took part in importing flax, cloth and sheep's cheese from Sicily. A rather nasty recipe for honey wine was also found, the NY Times writes.
Among other things, a marriage contract was found in which the bride, Faiza bat Solomon, requires her fiancé Tobias to “abandon nonsense and idiocy” and “not to associate with bad people”; otherwise he is to pay a fine of 10 gold dinars. The court documents include a legal dispute between a woman named Sitt I-Nasab and her husband Solomon, in which the wife demands that her mother-in-law and her daughters be forbidden to enter her chambers or to address her with any complaints at all.
Of the whole heap, about 15 thousand fragments are everyday, non-religious household records from the years 950-1250. They show, for example, that for the “shuttle traders” on the trade routes of that time the most profitable goods were not gold and spices but fabrics and soap.
Restored fragment of the letter of Moses of Egypt

The number of fragments here is far greater than in the aforementioned DARPA competition. Assembling the puzzle requires 12.4 billion pairwise comparisons of fragments, of which about 3.3 billion have been made so far. A cluster of 100 computers at Tel Aviv University can compare only 10 million pairs per hour, so the process will continue for several more weeks (expected completion time: June 26, 9:46 am). Progress can be followed on the official website.
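The remaining running time follows from the figures above by simple arithmetic. The sketch below assumes the throughput stays constant at the quoted 10 million pairs per hour, which is a simplification; the variable names are illustrative.

```python
# Figures quoted in the article.
TOTAL_COMPARISONS = 12_400_000_000
DONE_COMPARISONS = 3_300_000_000
PAIRS_PER_HOUR = 10_000_000  # assumed constant for this estimate

remaining = TOTAL_COMPARISONS - DONE_COMPARISONS
hours_left = remaining / PAIRS_PER_HOUR
days_left = hours_left / 24

print(f"{remaining:,} comparisons left")  # 9,100,000,000 comparisons left
print(f"{hours_left:.0f} hours")          # 910 hours
print(f"{days_left:.1f} days")            # 37.9 days
```

At that rate the remaining work takes roughly five and a half weeks.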
Computer comparison of the fragments is the last stage of a large program to collect and digitize the manuscript fragments. The project began in 1997, and anyone who wished could help look for matches by registering on the Genizah Project website.
The $20 million for the program was provided by Canadian financier Albert Friedberg.