
In this issue:
- An IT integrator retrained from an HR agency
- People in long robes at the ABBYY office*
- A 6-year-old dump truck driver
Scandals, intrigues, investigations
But seriously, this post is about a census in a country with a distinctly Eastern flavor.
As regular readers of our blog know, we make not only Lingvo and FineReader but also programs that extract data from forms. In their day, these programs were actively used to process population censuses: we helped process census forms in Greece, Lithuania, Saudi Arabia, Tajikistan and Kuwait. The Kuwait project turned out to be perhaps the most interesting one, and we describe it below the cut.
* The real photos have not survived; the picture was reconstructed from memory.
The census in question was held in Kuwait in 2011. Kuwait had counted its people, buildings and small businesses before, but all the data (it's scary even to imagine) had been entered into the database by hand. Everyone understood that this was slow and expensive, so ahead of the new census, officials from Kuwait's Ministry of Statistics visited neighboring countries to learn from their experience. They found a success story in Saudi Arabia, where our ABBYY FlexiCapture 9.0 was just finishing processing the census forms.
The East is a delicate matter, so our prospective Kuwaiti partners could not simply buy a program from us. They came to our Moscow office to make sure that ABBYY really exists and is a fairly large company. We have no dress code and have seen all sorts of outfits, but people in long white robes looked unusual even to us. Our Eastern guests got answers to all their questions, and the work began.
In census projects we usually handle a fairly large part of the work, namely processing the forms (more on that below), but the Kuwait project had many other interesting aspects too. Typically, the government agencies that run censuses either do everything themselves or hire contractors. Kuwait's Ministry of Statistics appointed Gulf Business Services & Recruitment Group as census manager, a company whose main business was... recruitment. Unusual, you have to admit. Why them? At first there were several contractors, but one of the most important tasks was to assemble a team of enumerators who would work well and make few mistakes, so the recruitment contract turned out to be the largest. The officials then reasonably concluded that it was better to manage all the processes from a single center and handed all the work over to this company.
The census took place in several stages. First, advertisements for the upcoming census were placed all over the country, and people were invited to take part online. To do this, they had to leave a request on the website and, on the appointed day, return to it and fill in the questionnaire. Instead of filling in the questionnaire themselves, people could also request a callback and dictate all their data over the phone. This way, data on about 320,000 residents was collected, roughly 11% of the total.
Next came the paper census. Since we were going to process the census forms, it made sense for us to design them as well: double-sided A3 sheets. The task was complicated by the fact that the field labels had to be not only in English but also in Arabic, which was unfamiliar to us, and the text direction was unusual for us too: right to left. The forms looked like this:

In addition to their first and last names, people were asked about their age, education, marital status, workplace and work experience, whether they could use a computer and the Internet, who owned the apartment they lived in, and their ID number (from their passport). Here, by the way, is what a passport looks like:

The census forms carried no unique numbers; the main identifier of a sheet was the address of the house (apartment) whose residents were being counted. All residents of one house went on one sheet, and the next house got another sheet.
The address was encoded as a 13-digit number: each district, block, street and household had its own unique code, which the enumerator entered on the form.
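To make the address code a little more concrete, here is a minimal Python sketch of splitting such a code back into its parts. The article only says the code had 13 digits made up of district, block, street and household codes; the field widths below (2 + 3 + 4 + 4) and the names `parse_address_code` and `AddressCode` are hypothetical, chosen purely for illustration.

```python
from typing import NamedTuple

# HYPOTHETICAL layout: the article only says the code had 13 digits made up of
# district, block, street and household codes. These widths are illustrative.
FIELD_WIDTHS = (("district", 2), ("block", 3), ("street", 4), ("household", 4))


class AddressCode(NamedTuple):
    district: str   # kept as strings so that leading zeros survive
    block: str
    street: str
    household: str


def parse_address_code(code: str) -> AddressCode:
    """Split a 13-digit address code into its component codes."""
    code = code.strip()
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected 13 ASCII digits, got {code!r}")
    parts = []
    pos = 0
    for _name, width in FIELD_WIDTHS:
        parts.append(code[pos:pos + width])
        pos += width
    return AddressCode(*parts)
```

For example, under these assumed widths `parse_address_code("0112300450017")` yields district `"01"`, block `"123"`, street `"0045"` and household `"0017"`. Keeping the parts as strings rather than integers avoids losing leading zeros.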


113 collection centers were set up, where enumerators handed in the completed forms. Every evening the paper forms were sent to the central data processing center, where they were scanned and recognized; 14,000 to 17,000 forms arrived there every day. Two scanning stations worked almost around the clock (using Fujitsu fi-6800 scanners with a throughput of 20,000–60,000 pages per day). At scanning, each batch of documents received a unique number made up of the date, the number of the box it arrived in, and the number of the center that sent it.
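A minimal sketch of how such a batch number could be composed and read back. The fixed-width `YYYYMMDD-BBBB-CCC` layout (date, four-digit box number, three-digit center number) and the function names are hypothetical; the article does not say how the three parts were actually formatted or joined.

```python
from datetime import date, datetime
from typing import Tuple

# HYPOTHETICAL format: the article says the batch number combined the scan
# date, the box number and the sending center's number, but not how.
NUM_CENTERS = 113  # collection centers mentioned in the article


def make_batch_id(scan_date: date, box_no: int, center_no: int) -> str:
    """Compose a batch number such as '20110421-0007-045'."""
    if not 1 <= center_no <= NUM_CENTERS:
        raise ValueError(f"unknown center number {center_no}")
    if not 1 <= box_no <= 9999:
        raise ValueError(f"box number out of range: {box_no}")
    return f"{scan_date:%Y%m%d}-{box_no:04d}-{center_no:03d}"


def parse_batch_id(batch_id: str) -> Tuple[date, int, int]:
    """Inverse of make_batch_id: return (scan_date, box_no, center_no)."""
    day, box, center = batch_id.split("-")
    return datetime.strptime(day, "%Y%m%d").date(), int(box), int(center)
```

Fixed-width, zero-padded fields mean that such numbers sort as plain strings by date first and box number second.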
The scanned documents were recognized and verified in ABBYY FlexiCapture 9.0 (30 verification stations were set up). It must be said that the product was substantially refined for this project. The customer kept discovering the need for one function or another as the work went on, so one of our employees spent about a month and a half in Kuwait, wilting from the heat and from the women in veils.
At one point, for example, it became necessary to let scripts assemble document packages. What is a “package” here? A package is the set of questionnaire sheets belonging to one address. One sheet held 8 people, but Kuwaitis often live in large households, and many homes had more than 8 residents; in that case the enumerator started a new sheet. Now imagine a scanning center employee carrying a stack of documents to the scanner and dropping it. Gathered up from the floor, the sheets are no longer in their original order and are, of course, scanned in random order. The customer did not trust verification operators to reliably reassemble, by hand, all the sheets belonging to the same house in the system. So we “taught” the system to assemble packages automatically using the coded address (which, as you recall, was the main identifier of a census form). In the current, tenth version of FlexiCapture, this function is available out of the box, but at the time of the Kuwaiti census version 10 had not yet been released, so we had to write it as we went.
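The idea behind automatic package assembly can be sketched in a few lines of Python: group the shuffled scans by their recognized address code. This is an illustration of the grouping idea, not FlexiCapture's scripting API; the `ScannedSheet` type and its fields are hypothetical, and keeping scan order within a package is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScannedSheet:
    image_id: str      # hypothetical: identifier assigned at scan time
    address_code: str  # 13-digit address recognized (and verified) on the sheet


def assemble_packages(sheets: List[ScannedSheet]) -> Dict[str, List[ScannedSheet]]:
    """Group sheets scanned in arbitrary order into one package per address.

    Sheets with the same address keep their relative scan order.
    """
    packages: Dict[str, List[ScannedSheet]] = defaultdict(list)
    for sheet in sheets:
        packages[sheet.address_code].append(sheet)
    return dict(packages)
```

Because the address is the only key, a misrecognized address code would put a sheet into the wrong package, which makes verifying that field particularly important.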
The forms were recognized in two stages. Why two? The Ministry of Statistics needed to get the number of houses (apartments) and residents quickly, without waiting for all the other data to be checked. So in the first stage, only the completion date, the address and the number of people living at that address were extracted from each form. These fields were verified (verification means that an operator compares the scanned image with the recognized data and corrects any errors) and sent to the database, which made it possible to track how fast the census was progressing and how much remained. Then the document packages were exported to dedicated folders, and an XML description was generated that specified the correct order for loading the sheets. In the second stage, the documents were recognized in the required order using this description, and information was extracted from all the fields not touched during the first pass. The documents were then verified, and the data was loaded into the same database. SQL Server 2008 R2 was used to store the data, and SharePoint 2010 to publish the results.
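The XML description that fixed the loading order for the second pass might look roughly like the output of the sketch below, which uses the standard library's `xml.etree.ElementTree`. The element and attribute names (`loadOrder`, `package`, `sheet`, `position`, `file`) are hypothetical; the article does not describe the real schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_load_order(packages: Dict[str, List[str]]) -> str:
    """Return an XML document listing packages and the order of their sheets.

    `packages` maps an address code to the image files of its sheets, already
    in the order in which the second recognition pass should load them.
    """
    root = ET.Element("loadOrder")
    # Sorting by address only makes the output deterministic.
    for address in sorted(packages):
        pkg = ET.SubElement(root, "package", address=address)
        for position, image in enumerate(packages[address], start=1):
            ET.SubElement(pkg, "sheet", position=str(position), file=image)
    return ET.tostring(root, encoding="unicode")
```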

Another feature we built specifically for this project was automatic rule checking. The rules were meant to catch obvious logical errors, for example: children should not have children; children under a certain age and people of retirement age should not have the “Work” field filled in; the head of the household should be at least 18 years older than his sons. So if a questionnaire suddenly listed a 6-year-old boy working as a truck driver, the system detected the rule violation and reported an error. The operator then had to decide whether the error could be fixed by comparing the recognized data with the scanned image of the sheet. If, for example, the enumerator's pen had skipped and the system had missed the 2 in front of the 6 in the age, the error was corrected. If it could not be fixed, the error was marked critical, and the system automatically sent the household's phone number and a description of the error to the call center, whose operators contacted the residents and clarified the information.
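The rule checks can be sketched as plain Python over the people listed on one household sheet. The thresholds `ADULT_AGE`, `MIN_WORKING_AGE` and `RETIREMENT_AGE`, the relation labels, and reading “children” in the first rule as minors are all hypothetical; only the 18-year gap comes from the text. Routing critical errors to the call center is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

# HYPOTHETICAL thresholds: the text only mentions "a certain age" and
# "retirement age"; the 18-year gap between head and sons is from the text.
ADULT_AGE = 18
MIN_WORKING_AGE = 15
RETIREMENT_AGE = 65
MIN_PARENT_GAP = 18


@dataclass
class Person:
    name: str
    age: int
    relation: str                     # hypothetical labels: "head", "son", "daughter", ...
    occupation: Optional[str] = None  # the "Work" field; None if left blank
    children_count: int = 0


def check_household(people: List[Person]) -> List[str]:
    """Return a description of every rule violated on one household sheet."""
    errors: List[str] = []
    head = next((p for p in people if p.relation == "head"), None)
    for p in people:
        # Rule 1: children (read here as minors) should not have children.
        if p.age < ADULT_AGE and p.children_count > 0:
            errors.append(f"{p.name}: a {p.age}-year-old is listed with children")
        # Rule 2: young children and retirees should not have "Work" filled in.
        if p.occupation and not (MIN_WORKING_AGE <= p.age < RETIREMENT_AGE):
            errors.append(f"{p.name}: 'Work' filled in at age {p.age}")
        # Rule 3: the head of household is at least 18 years older than his sons.
        if head is not None and p.relation == "son" and head.age - p.age < MIN_PARENT_GAP:
            errors.append(f"{p.name}: head of household is less than {MIN_PARENT_GAP} years older")
    return errors
```

With a 50-year-old head and a son recognized as 6 years old with the occupation “truck driver”, this check reports exactly one violation (the “Work” rule); correcting the age to 26 clears it.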
Preliminary census results were delivered to the Ministry two weeks after form collection ended. In total, 750,000 double-sided census sheets were processed, which cut census data processing time many times over compared with manual entry. But numbers are just numbers; this project also mattered to us because it turned a company with no IT background into a knowledgeable partner that now helps us promote our products in Kuwait.
Svetlana Luzgina,
supported by ABBYY 3A (3A = Asia, Africa, Latin America)