Series of publications about collecting signatures
1. Introduction, the site "Bulk 20! 8", preparation for the collection
2. Iron and networks, video surveillance
3. Reaper 2018: a system for collecting signatures
4. Project Management
This is the third part of the material about the IT infrastructure of Navalny headquarters. In previous chapters, we talked about the
development of the site âBulk 20! 8â , the
organization of a network at headquarters and the production of document scanners.
This chapter describes the creation of a system for collecting signatures for the nomination of Navalny as a presidential candidate. The stages of work and the resulting solutions are described. The system of physical storage of sheets with signatures is described.

')
Sheets, QR codes and ways to work with them
The subscription list is the main document in our system. The first thing I want to do to work with a large collection of objects is to assign them a unique identifier in order to associate each object with an entry in the database. But the form of the subscription list is very strictly spelled out in the law, any violation of it is a reason to reject in general all the signatures of the candidate. On the sheet that is submitted to the electoral commission, no unnecessary marks and symbols are allowed.
When collecting signatures in Novosibirsk, we placed each sheet in a multifore (transparent âfileâ), on which the sheet ID and all official notes were written on it with a marker. It came up to four thousand sheets, but would not work for hundreds of thousands. This time we considered the use of multiformers an unreliable and inconvenient solution.

Lawyers invented a method that allowed us to identify each sheet and not to disrupt the form of the subscription list. The law does not say anything about the physical size of the subscription list. This allowed us to design the sheet so that the identification codes were put on its upper part, and before being submitted to the electoral commission, they were simply cut off.
Sheet code consists of 6 characters. You can use Latin numbers and letters that have graphic analogs in Cyrillic (you can write in any layout in forms). For convenience, added separators: 91 â X7 â BA.
The same identifier is printed as a QR code for automatic recognition at various stages of work. QR codes won all other types of bar codes in terms of reliability and speed of recognition.

The life of the staffs is full of difficulties, therefore QR codes were thoroughly tested in various stress situations for the sheets ...

... and decided that the three codes will be enough to process any living sheet.

Lawyers and designers have seriously worked to make the layout consistent with both the law and common sense. Separately tested the number of signatures on the sheet. Few signatures - too many sheets, a lot of unnecessary scribbling (data collector and trustee), more errors in the testimony. Many signatures - inconvenient to make voter data, more errors in the signature lines. After experiments with prototypes, we stopped at five signatures.
Each sheet (more precisely, sheet identifier) ââis created in the database, after which it can be printed on A4 paper. But you can not just take and print a sheet on the nearest printer. By law, the production of subscription lists must be paid from the candidateâs electoral account. Usually they are made by an external contractor. Therefore, we have made the technical side as friendly and flexible as possible. Sheets are either printed directly from the browser, or they are previously saved in a multi-page PDF file that can be sent to the contractor in any convenient way.
Owl: preparation for collecting signatures
The collection of physical signatures in the subscription lists can be started only after the nomination of a candidate and the opening of a special electoral account. The law gives very little time for this. It was important for us to do as many operations as possible in advance in order to debug all the processes and to speed up the work as soon as the elections were officially announced. To preliminarily verify the data of our supporters, to train the staffs and test the collection mechanics, we launched the verification procedure.
Verification is a beta version of collecting signatures: in real headquarters, with the same equipment, with the same strict verification of documents, but without putting a signature on the paper sheet. To work with data of verified people, an application Sych was developed.
Composition Owl
RESTful API backend: Python 3.6, aiohttp, aiohttp_admin, SQLAlchemy.
Databases: PostgreSQL, Redis.
Daemon notifications.
Passport number recognition daemon.
Daemon to build analytics.
Service passport check on his number.
The boxed version
Kladr-API for work with addresses (PHP 5.6 + MongoDB).
We decided to make for the Owl a separate backend with the RESTful API, because it was planned to integrate it with several services, including the Navalny 20! 8 website. A separate PostgreSQL and Redis database was used as storage for caching. To manage users came the aiohttp_admin library, which we modified to fit our needs.
The internal operator interface is a step-by-step form for scanning a passport and filling in personal data. Due to the large number of possible states, this form was written on React.
Interaction with the site âBulk 20! 8â was conducted through an API that is protected by a token and is available only over the local network between virtual machines.
Record for verification
In order to evenly distribute the load on the staffs over time, they invented a verification record. After registering on the site, the person got access to the recording interface, where he chose a convenient headquarters and time.

For load control, records management and schedule, we developed a separate interface available to the regional manager and headquarters coordinator:

If a headquarters has an emergency, the coordinator may massively cancel future verification entries. However, he cannot do this on his own - you must request a cancellation confirmation code from the regional manager. We had to repeatedly use this option.
Notifications
Sychy was a branching notification system. The signer should have received notifications by mail when he signed up for verification, missed the recording, a week after the cancellation of the recording, after successful verification, after the cancellation of the recording by headquarters and in several other cases.

SMS notifications were sent to remind you of the recording in three hours and to inform you that the headquarters canceled the recording. The queue of notifications was made on the same principle as on the site âBulk 20! 8â: tables in the database with messages that were sent in groups via email and SMS gateways.
Passport Recognition
In order to evaluate the work of operators and determine the percentage of errors during data entry, I wanted to have an additional scan recognition. Reliable automatic recognition was impossible to do because of the variability of passports, so two options were considered: send scans to Yandex.Tolok for users to recognize, or take a group of volunteers who would do this in the office. But the issue of personal data security prevented both options, and we left automatic recognition only for the passport number.
Analyst Sycha
During verification, we not only refined and verified our base of supporters, but also tested the work of headquarters, infrastructure, equipment and the mechanics of collecting signatures. To observe the process and adjust it, we made a simple analytics.
Since the headquarters has three levels of process management - the coordinators of the headquarters (responsible for the work of one headquarters), regional managers (monitor the group of headquarters in several regions) and the management of the federal headquarters (monitors everything and everyone), the system grouped the data in different ways. for each category of users.
Most of the details we showed to the headquarters coordinator. He saw the statistics of all operators and the dynamics of key indicators and could make management decisions based on them: put up more or less operators, strengthen the alert, change the work schedule on weekends, fire or re-educate employees who often make mistakes, etc.

We saved the regional manager from unnecessary details, and on the first screen he saw only the most important things in his group of headquarters: key indicators, ratings and problem headquarters (marked with alarming red). We referred to âproblemâ headquarters with N% below the average, chronically underloaded (they needed additional notification) and overloaded by the number of records (this meant that not all people could enroll and need to increase the number of operators).

In order to better deal with the detected problem, the regional manager could easily see the detailed statistics for each headquarters and see all the data that are available to the coordinator.
It was important for the federal headquarters to immediately see the full picture, so we collected key campaign metrics on one screen and made a pivot table for all the cities where the verification is taking place. In the table, you can select the headquarters of interest to view the full set of data on it.

In total, more than 50 indicators were displayed in the analytics. The flexibility of SQLAlchemy was enough to never switch to pure SQL and to keep the code readable. For the most time-consuming indicators, we first made caching in Redis, but it turned out to be easier to periodically calculate them in the background and, when prompted, take them from a file.
Reaper 2018: a system for collecting signatures
In parallel with the verification process, a system for collecting signatures was developed. The architecture of the system used in Novosibirsk and able to work with physical objects - sheets and signatures was taken as a basis.
From the backend side, the Reaper-2018 is the heir to the old Reaper, but he received the operator interface from the verification system. Some screens were refined after analyzing reviews of the owl. In addition, interfaces have been added for several levels of data verification and for controlling sheet movement.
Operator Interface
In the process of obtaining a signature, the operator must scan the voter's passport, fill out a questionnaire (given that the address indicated in the registration stamp can be written in a completely different format) and enter the data into the subscription list, following the instructions of the system. But first we must check whether the voter meets the three key conditions:
1. At the time of the election, he must be over 18 years old.
2. If the voter is 20 or 45 years old, he must have a new passport.
3. Passport should not be listed as invalid.
Checking the base of invalid passports is a simple operation, but it also has its subtleties. The base is distributed by the Ministry of Internal Affairs on its website. Previously, before the elections, they for some reason turned off the possibility of unloading this base, so we started daily downloading the current version of the database in advance (do not forget to turn it off).
Now in the database of more than 110 million entries (series and passport numbers). For a quick search with a small amount of database and indexes, the following scheme was invented: PostgreSQL creates a table with a million entries, the primary key of which is the passport number (from 0 to 999999), and the second field contains all the series of invalid passports for this number. To reduce the volume of the series, they were converted to binary format (two bytes each) and compressed using zlib (I just wanted to). Initially, the base is about 1 GB without indexes. After processing, it turns out 260 MB with an index. One entry is checked for an average of 15 ms.
0.6% of passports of people who passed the verification were found in the database of invalid passports. This means that without such verification we would spend 12% of the limit on invalid signatures only for this type of error.
0.88% of the passports did not suit us, since the citizen was 20 or 45 years old, but he had not yet replaced the passport. And this is another 18% of the limit of invalid signatures.
The subscription list contains 4 columns filled out by the operator: name, year of birth, passport number and address of permanent registration. All these data passed through the Reaper to check and correct possible errors. For example, in the fields for name and patronymic, the search for typos works:

For prompts for names in the API, there is a method that compares the value with a large list and returns three response options:
- everything is OK, there is such a name;
- there is a similar name (such and such);
- Unknown name (rare name or serious spelling mistakes).
A separate story - the letter "e". There are passports in which it is used, but in most cases it is replaced with âeâ, so we display a warning if there is âeâ in some field of passport data.
The system does not fix anything itself, only informs. The operator and inspectors should pay attention to such cases and make the right decision.
Document scanning
We use our own production scanners to obtain document images, and the Raspberry Pi as an operator station. This is described in detail in the
second chapter .
This image is not a scan of a passport, but is collected in a graphical editor from random data.The image is obtained on the client side of the HTML 5 Canvas API and is sent to the server as a base64 string, in which lies the JPEG. From the frontend point of view, scanners can operate in two modes: USB webcam and streaming video from a computer on a local subnet. Owl works only with USB-cameras, and the Reaper-2018 allows you to switch between modes. The operator chooses which scanner to use.
There was a small problem with the choice of the video stream of neighboring computers: tables and scanners can be moved, and operators can change. We do not know which scanner will be next to the operator next time. I had to sort through the headquarters subnet and give the operator the opportunity to choose any of the live scanners. But it turned out that the scanner's video broadcast server, although they put the correct CORS headers (Access-Control-Allow-Origin: *), do not respond to OPTIONS requests. The browser denied ajax requests to neighboring hosts, which made it impossible to use regular jQuery.ajax () for searching. JSONP requests didnât help either, because they couldnât be canceled programmatically, and several dozen pending requests completely blocked the page. Pictures helped to solve the problem. We added tags to the DOM and assigned them a src video stream. If the picture was resized in accordance with the size of the stream, then the stream was considered alive and shown to the operator.
The display of the video stream in the browser noticeably loads modest Raspberry Pi processors, so we had to do a âscreensaverâ: after 5 minutes of inactivity, the browser pauses the broadcast.
It is important for us to select the current information about the place of registration. At the turn of the passport can be 6 stamps, but only one is needed. The interface prompts you to select it using the arrows on the keyboard or by clicking on the desired stamp on the preview.

More registration may not be. Such voters are recorded in a separate subscription list with an empty region and address, and the registration scan is skipped.
Address Processing
The most difficult part in filling out the subscription list is the address of the voter. More than half the errors due to which the signature is considered invalid are associated with the address.
To the address of registration there is a large list of legal requirements. For example:
- it must be an address according to FIAS (federal information address system);
- for the renamed streets it is necessary to indicate new names, even if the passport was old;
- The law establishes a certain format of the hierarchy of address objects that need to be recorded (for example, you cannot specify an urban area).
These are only basic points, but there are a lot of trifles, the list of which was replenished with each interaction with the election commission. Failure to comply with even minor requirements - the reason for the electoral commission not to accept the signature.
At the collection of signatures in Novosibirsk, about 3.5% of signatures were invalidated due to claims to the âaddressâ field. And this is 70% of the limit that is set for signatures for the nomination of a presidential candidate.
In order to fulfill all the requirements, we are forced to drive each address through a computer in order to form the correct format and indicate to the collector up to the symbol what he should write on the subscription list.
We try, when it is possible, not to use API of third-party services, in order not to give data about our users and not to be in a situation when the API is suddenly turned off at the most crucial moment. Work with addresses is a critical function for collecting signatures, so we had to make our API to the FIAS database.
In the FIAS database, there is not yet enough high-quality and complete information about houses and apartments, so we stopped at street level. In this form, the base with all additional constructions weighs about 2 GB and lives quite comfortably in the form of PostgreSQL. Modified scripts from the
fias2pgsql repository were used for import.
For the universal all-Russian form of entering addresses, you cannot simply make the fields âcityâ, âstreetâ, âhouseâ, since there are many different address formats and types of address objects. A well-known example of an unusual format is Zelenograd, in which there are houses without a street name. But believe me, on a nationwide scale, this is a rather trivial case.
After a series of experiments, we settled on a form of three fields:
- the subject of the Russian Federation - it is always there, this is the most understandable field;
- FIAS address - field with autocompletion to addresses of the given region within FIAS;
- house / building / apartment - a line where the data is copied exactly in accordance with the permanent registration stamp.

The lawyers have compiled a table of address transformations, with the help of which we brought the FIAS addresses into a format consistent with the election legislation. Most often it was necessary to exclude one of the address elements. Some addresses were excluded entirely (garage cooperatives, yard areas and other similar objects). The IT department received a table with rules, and the legal department received in response 10 examples for each of the 44 address types.

After several such iterations, the base was ready for operation.
The technical part of the task was to organize a convenient and fast search with auto-completion, which will withstand the load of 1 million queries per day. Sphinx was used as a search engine. The request is cleared of unnecessary characters and transferred to Sphinx, and it returns the full addresses of the objects, ranking them according to the specified rules.
Sphinx indexes an address field written in XML format. This storage format turned out to be convenient in that all the metadata can be hidden in XML attributes that Sphinx does not use for searching, but keeps in memory and returns in the results without additional reference to the database. Somewhere on the frontend, these attributes are used to form a beautiful address bar.

The solution was convenient and fast. One request to the suggest API is performed in 15â20 ms, the backend calmly processes 300 simultaneous connections on a not very powerful virtual machine.
Filling out the subscription list
Signatures must be made on the lists of the subject of the Russian Federation to which the address of the citizenâs permanent registration belongs (or on special sheets without a region if there is no registration). The reaper tells the operator which sheet of the region to take, and does not give a signature to the sheet of another region.
Imagine that you want to solve such a problem without a computer, collecting signatures at the station, where there will be many people from different regions and there will be no card store with empty sheets, sorted by region. About a third of passports, the registration stamp does not contain the name of the region, and passersby do not know the rules of the game and can easily confuse something. It looks like a source of a large number of errors, which is unacceptable at the legal limit of 5%.
Filling out the subscription list is a complicated and responsible procedure. On the sheet there are lines of signatures, the tester of the collector and the signature of the trustee. All these blocks must be completed in accordance with strict formal requirements. At each stage of filling errors are possible, which can make the whole sheet or part of the signatures invalid.
We have developed such scenarios of the operator's work that reduce the likelihood of typical errors. The letters of witness of the sheets of the âhomeâ region (about 80% of the signatures will be from the region in which the headquarters is located) are filled in by the collector in advance in a calm atmosphere. For all blocks of the sheet, the Reaper shows exactly how they should be filled.

The fill interface simulates the real subscription list, which is currently on the table in front of the operator. Showing occupied lines, columns for filling, sheet number, large - data for entering.
For a filled line, the operator must indicate its status (it is not always possible to fill in the line successfully the first time). Each correction and deletion must correspond to a note from the collector on the sheet and the corresponding status in the database.
After filling out the entire sheet on it is stamped with the date and signature of the collector. The sheet is submitted for review.
Verification of signatures, work with sheets in the headquarters
At the end of each working day, all sheets with signatures are checked out, which takes place late in the evening or at night (we have small headquarters, there is simply no place to conduct all processes in parallel). The verifier (he is the candidate's attorney) looks at each sheet and each signature, compares with fragments of the scanned pages of the passport, checks all relevant elements on the check list. If errors are detected, it is marked in a special interface.
A witness record is checked separately. Errors in the certification are especially dangerous, since they affect the entire sheet at once. Such errors are due to the origin of approximately 9% of all invalid signatures.
Some errors can be corrected, but only the collector can make corrections to the signature lines, and the headquarters does not have it in the evening / at night, so all the information necessary for the correction is transmitted electronically. To understand the context, you need to see everything that happened with the line before. This is how the âchatâ appeared between the examiner, the operator and the lawyer. All names and other data on the image are made up. If errors seem fatal or have any doubts, the sheet is sent to a lawyer. If the signatures do not contain errors or all corrections have already been made, the verifier signs the trustee and sends the sheet for sending to the central headquarters.
Smile and neurophysiology of happiness
For quick and error-free selection of the status of the scanned string, we used buttons in the form of emoticons. It has deep neurophysiological causes. In the visual system of the brain there are ancient low-level mechanisms that respond to certain images. The visual system reacts most quickly to straight line segments of different orientations, since the lines are easily detected by the primary visual cortex. In the secondary visual cortex, simple geometric shapes are recognized (this needs to be learned) and a face pattern. And not just a face is recognized, but basic mimic expressions. That is, emoticons. Like the recognition of straight lines, it is an innate ability. Thanks to this low-level system, emoticons are recognized much faster and more accurately than text.
Icons in the form of emoticons correspond well to the meaning of statuses that the verifier can assign signatures: âgoodâ, âthere are problemsâ, âbadâ. There were some doubts with the âshow a lawyerâ emoticon, but we coped with it.There is also an opinion that emoticons humanize the interface and thus slightly improve the life of the operator. This is important because
operators had to spend long hours at work with our system and not lose vigilance.Sending sheets
Finished sheets are sent to the central headquarters every day. Sheets can be many, several hundred. We want to know exactly which sheets are ready and left the headquarters, but manually registering them is long and unreliable. To account for the sheets sent written mobile application.
He has a mode that allows you to quickly scan the codes of hundreds of sheets and reports if a sheet is attempted to be sent by mistake when it has not yet passed all processing steps at headquarters. It takes 1-2 seconds to scan a single sheet.After scanning, the sheets are packed and sent to Moscow.Form Details
All passport data is entered and displayed in Source Code Pro Regular monospaced font. It is easy to distinguish zero from the letter âOâ, and the characters are quite similar to those commonly used in modern passports.All forms are made so that between the fields and the main buttons you can switch tabs. The input focus is in the right field, not only when the page is loaded, but also after closing the error message. Modal dialogs capture the focus so that switching occurs only between their controls.
All buttons that, when pressed, take something long to show, show it with all their appearance. Input fields at the time of sending data are turned off. In case of errors, detailed explanations appear.Logistics and physical storage of sheets
Shifting the pieces of paper is one of the activities in which humanity has achieved incredible success. It would seem that you can go to the stationery shop, buy a set for collecting signatures "Federal" and not think about the details. But there is a problem: all office solutions are too expensive. We cannot deliver document scanners for several tens of thousands of rubles and cabinets with hanging folders for one hundred thousand at each headquarters, so at each stage we had to invent and create something from scrap materials.Some facts about process physics
We need to hand over 315 thousand signatures. For this, taking into account the regional quotas and the margin for various errors, about 1 million signatures must be collected and processed. On each sheet there can be a maximum of five signatures, but in reality there will be about 3-4. This gives us, roughly speaking, 300 thousand sheets.A sheet of A4 paper has an area of ââ1/16 m².The density of ordinary office paper is 80 g / m², each sheet weighs 5 g.The height of a pack of 500 sheets is 4.5 cm for blank sheets, more than 6 cm for filled ones.It turns out that all the collected sheets will weigh 1.5 tons, and they will be about 36 meters high when they are folded in one pack.How to store all this?
Subscription lists are printed, filled with signatures, checked, certified and sent daily to the central headquarters. One headquarters sends several hundred sheets a day, so there should be no problem at this stage.The most interesting begins in the central headquarters. There you need to organize a storage system that allows you to easily take sheets from the regional headquarters and work with them until the end of the collection. After the collection is completed, the sheets should be grouped by region and sewn into folders for the electoral commission.We cannot simply stack sheets in endless packs, since lawyers may at any time wish to remove part of the sheets according to a certain sample. You need to know exactly where each sheet is located, be able to quickly get it and return it.For quick access, a system of indexing the physical database of sheets was invented. The index consists of several levels: headquarters (box), box, folder. Address folder in the archive looks like this: 77â1â15. Inside each folder is 25 sheets (in random order).
On the left upper picture there is a box for 500 subscription lists in paper folders.
On the right picture - a box for 2000 sheets in overhead folders.Retrieving and sorting sheets
All sheets coming from the regions are scanned with an automatic two-way scanner (he was already in the office, so you did not have to collect it yourself from LEGO and Arduino). This device is able to upload the result to the server via SFTP. There, the scans are run through a python script that searches for QR codes in standard places, recognizes them and links scans to a common database. The script reliably handles even crumpled sheets.
After scanning, the sheets go on sorting. Each sheet is scanned using a mobile application (sorting mode). It finds a sheet in the system, changes its status to âarrived at the central headquartersâ and shows the coordinates of the folder in which the sheet should be put. The operator confirms that put the sheet in the specified folder (closes the transaction).
Sheets of one region are placed in a folder sequentially, as long as there is space in it, so the whole process happens very quickly.Backend
The Reaper 2018 is made on Django with a standard template engine and ORM. The database uses PostgreSQL. The service parts of the system - FIAS, passport checking, work with the preliminary registration data - are rendered into separate modules (django app) with their own databases.The physical world of signatures is presented in the form of several classes of objects: a subscription list, a line in a sheet, a signature. The objects of these classes have attributes that reflect the state of the object in the real world. The state machine uses the âstate machineâ template (state machine, finite state machine) and the django-fsm library. All transitions between states are registered in the form of FSM transactions, within which the necessary checks and additional actions with the object are carried out.The state diagram looks like this:
The position of the sheet in space is determined by the state of the lines that it contains. If there are lines that a lawyer has to check, the sheet receives the status âto a lawyerâ. As soon as the lawyer took the sheet and entered his code in the verification interface, the sheet receives the status âfrom the lawyerâ. Thus, we always know the exact position of all the sheets and understand their immediate fate.Testing
The signature collection system has too many different states and transitions between them to check it manually. To automate checks, all scenarios related to the work of operators and verifiers are covered with tests on the django side.It is useless to look at the system of collecting a million signatures when it does not contain these signatures. To fill the database, scripts were written that initialize the typical state of the database in the collection process, so that you can look at a system filled with something similar to real data.The collection of signatures is very limited in time, and much of this time falls on New Year holidays. We expected that the load on staffs and the collection system would be uneven. It was important that the system easily coped with any realistic stream of signatures. At peak times, up to 10,000 signatures per hour were expected. For a regular web site, this does not look serious, but in our case this order of âvisitorsâ can create a large load on the server. This is not just a visit or registration: obtaining each signature involves about 50 requests to the server and processing several high-resolution images.Stress testing was performed using Locust. This is a simple tool, accessible through PyPI. Scenarios are described as python code, roughly as unit tests in Django:
Tests can be run via a web interface, which displays graphs of the speed of requests, the number of clients, and the server response time.
The deployment of the project is organized in the same way as for the âNavalny 20! 8â site.Access to the Reaper's web applications is only possible through the headquarters VPN network.Monitoring
We use various tools for monitoring servers and applications involved in the signature collection system.Zabbix monitors the status of all virtual machines of the project.Elasticsearch collects nginx logs from all virtualoks, Kibana shows this in the form of graphs. In Sentry, all the errors from the applications and from the frontends fall. The frontends are placed in a separate âorganizationâ in order not to spoil the statistics on the backend errors. A handy thing, but getting Sentry to work under our load was quite difficult.
Goose
This is a functional monitoring, somewhat similar to uptime.com , only self-made. The backend is built on django, the queues are made on celery with a backend in redis.In the Goose added project domains. For each domain, the addresses to be monitored, the check interval and the type of check are indicated. You can check the certificate, content, HTTP-headers, redirects, and something else useful.If something went wrong, Gus can send letters and SMS or call in the middle of the night and explain the situation with a human voice (Twillio is used for calls and speech synthesis).The web interface always shows which domains have errors and how are the queue checks. Every minute is 20-25 checks.
Even in Gus, there is distributed monitoring via a âbotnetâ from mobile applications. Applications receive a list of addresses for monitoring, check them (the status of the response and the presence of the specified content) and send the result to the server. Such monitoring is convenient to monitor the blocking of Roskomnadzor and the state of the server under attack.
In the next, final chapter, we will tell about the management of projects of the IT department.