
How not to build federal information systems

This article should interest a narrow circle of Habr readers, the developers of federal information systems, and a much wider audience: those who already interact with such systems or will have to in the future.
The story is told using the example of FIS GIA and Admission (this name was assigned by D. Medvedev on August 31, 2013; for the preceding year and a half the system was known by the name given by V. Putin, FIS EGE and Admission).


What is it all about and who needs it?


The federal information system for supporting the state final certification of students and the admission of citizens to educational organizations (full name by reference) was created for the benefit of Rosobrnadzor. For 3 years now, universities and colleges have been required to enter information about their admission campaigns into it, including the personal data of all applicants. Specifically, before admission starts, institutions submit the number of places, the lists of entrance tests, and the permitted benefits in a prescribed format. During the campaign, data on applicants' applications, including full names and passport details, is transmitted practically in real time.

In theory, the supervisory authority uses this data to check that the admission procedure approved by the Ministry is not violated. In practice, penalties so far have been imposed only for failing to transfer data to the system.

Interaction of educational institutions with FIS


All institutions of higher and secondary vocational education are obliged to transmit information on the admission process to FIS every day. Two channels are provided for this: a web interface for entering and viewing data, and an automated interaction service that accepts packages in XML format. In theory everything looks fine, but there are serious caveats. The first is the speed of interaction: in manual mode, entering a single application takes up to 20 minutes at peak hours, and in automated mode packages can wait in the processing queue for days. The second is bugs in the software that produce contradictions in the data. But first things first.

Data Model Design


In the admission campaign domain, one applicant can submit up to three applications to a university. When data is transferred as XML, the various entities are represented as elements of a tree. How would you model the submission of applications in XML? Obviously, applications belong to an applicant, so it is logical to place the applicant's identity in a higher-level element and the applications he or she filed in nested elements. The FIS developers did the opposite: the applicant's data is repeated in every application, and the copies may even contradict each other. The web interface then shows several rows with the same full name but different passport data, for example. All of these rows link to the same card, which displays only one of the contradictory passports, chosen at random.
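The "logical" layout described above can be sketched as follows. This is only an illustration, not the FIS format: the element and attribute names (Package, Applicant, Applications, Application, uid, program) and the sample data are all invented for the example. The point is that the applicant's identity is written once, so two applications cannot carry contradictory passports.

```python
import xml.etree.ElementTree as ET

# Limit taken from the article: up to three applications per applicant at one university.
MAX_APPLICATIONS = 3


def build_package(applicants):
    """Build a package with one <Applicant> per person and nested <Application> elements.

    All tag and attribute names here are hypothetical, not the real FIS schema.
    """
    root = ET.Element("Package")
    for person in applicants:
        if len(person["applications"]) > MAX_APPLICATIONS:
            raise ValueError(f"too many applications for applicant {person['uid']}")
        applicant = ET.SubElement(root, "Applicant", {"uid": person["uid"]})
        # Identity data is stored exactly once, at the applicant level.
        ET.SubElement(applicant, "FullName").text = person["full_name"]
        ET.SubElement(applicant, "Passport").text = person["passport"]
        applications = ET.SubElement(applicant, "Applications")
        for app in person["applications"]:
            ET.SubElement(
                applications,
                "Application",
                {"uid": app["uid"], "program": app["program"]},
            )
    return root


sample = [
    {
        "uid": "applicant-1",
        "full_name": "Ivan Ivanov",
        "passport": "0000 000000",
        "applications": [
            {"uid": "app-1", "program": "Physics"},
            {"uid": "app-2", "program": "Mathematics"},
        ],
    }
]
xml_text = ET.tostring(build_package(sample), encoding="unicode")
```

With this shape, a correction to the passport is a change to a single element, and the receiving side never has to decide which of several copies to believe.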

More wonders. Clearly, XML is used only for data exchange, while the internal representation in the system is still relational and lives in a decent DBMS. Then a seemingly good idea comes up: add to the exchange protocol the primary keys of the entities as they exist in the university systems that act as data sources. After all, this should simplify detecting new entities and updating old ones. But is it safe to assume that every client system has a similar data model, or that its model is relational at all? University programmers writing exchange clients will inevitably either have to generate a unique, immutable identifier where none ever existed, or discover by experiment an error showing that the identifier of a certificate and, for example, an olympiad diploma must not coincide (apparently FIS stores all documents in one table, but where is that reflected in the documentation?).
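One way a client could cope with both problems is to derive identifiers deterministically from the entity kind plus whatever local key it has. This is a sketch under assumptions: the namespace string, the function name exchange_uid, and the kind labels are all made up for illustration, and nothing here is prescribed by FIS.

```python
import uuid

# Hypothetical namespace for one institution's exchange client.
CLIENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "admissions.example.edu")


def exchange_uid(kind: str, local_key: str) -> str:
    """Return a stable identifier for an entity sent to the remote system.

    The entity kind is mixed into the hashed name, so a certificate and an
    olympiad diploma that share the same local key get different identifiers.
    """
    return str(uuid.uuid5(CLIENT_NAMESPACE, f"{kind}:{local_key}"))


certificate_uid = exchange_uid("certificate", "42")
diploma_uid = exchange_uid("olympiad_diploma", "42")
```

Because uuid5 is a pure function of its inputs, the same record yields the same identifier on every export, as long as the local key itself never changes. No extra column is needed in the source system.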

Documentation


Competent documentation is the key to success when a production system has to be joined by hundreds or thousands of disparate smaller systems. Sadly, the FIS developers have not solved this problem even after 3 years, although there has been some progress. Publishing the XML structure as a PDF document and an XSD schema is certainly necessary. But it is important to at least check that the XSD is, first, valid and, second, consistent with the reference XML document. Otherwise hundreds of third-party developers end up fixing a clumsy regex or an annoying length="50" where maxLength="50" was meant, instead of the people whose job it is.
In addition, a formal description of the exchange protocol is categorically insufficient. With a complex data structure, the system will not accept just any schema-valid package, only one that also passes a number of additional consistency checks. One example, involving foreign keys, is described above.

Limitations and checks when interacting with external clients


In general, staying on the right side of the thin line between necessary checks and redundant restrictions that reject correct data is like balancing on a knife's edge. The main thing here is a thorough understanding of the subject area before development starts. In particular, this year the FIS developers began rejecting applications against zero quotas, which are perfectly acceptable in some cases. When the goal is to collect information, it is better to accept seemingly incorrect data for later analysis and reject only data that is obviously incomplete.

Errors in the system and recommendations for developers of integrating systems


FIS in particular and, I suspect, state systems in general are a remarkable example of an unstable "partner" on which to hone the skills of working with remote systems, where everything has to be checked. For example, XML is sent in an HTTP request and another XML document is expected in response, but:

1. The network connection may simply break.
2. A timeout may occur. It is better to set a reasonable one in advance, because otherwise waiting for a response may take hours.
3. The response may be not XML at all, but anything.
4. The XML that arrives may not conform to the schema declared by the developer.
5. XML arrives, but the data in it is inconsistent. For example, the request sent 100 objects for import, and the response is expected to contain the number of successfully imported objects and a list of those rejected due to errors. In fact, only 83 objects appear in the response. Where to look for the remaining 17, and which objects were actually loaded in the end, is a mystery.

In theory all of these situations are commonplace, but it is far from every system in which all of them occur regularly and with high probability.
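The five failure modes listed above can each be given an explicit check in the client. This is a minimal sketch and not the real FIS protocol: the response layout (ImportResult, Imported, Failed, Object, uid), the function names, and the default timeout are assumptions made for the example.

```python
import http.client
import urllib.request
import xml.etree.ElementTree as ET


class ExchangeError(Exception):
    """Raised when the remote side fails or its response cannot be trusted."""


def send_package(url: str, payload: bytes, timeout: float = 60.0) -> bytes:
    """POST an XML package; cases 1 and 2 (broken connection, timeout)."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "text/xml"}
    )
    try:
        # An explicit timeout keeps the client from waiting for hours.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except (OSError, http.client.HTTPException) as exc:
        # URLError, socket timeouts and connection resets are all OSError subclasses.
        raise ExchangeError(f"transport failure: {exc}") from exc


def check_import_response(body: bytes, sent_uids: set) -> dict:
    """Validate a response against what was actually sent; cases 3 to 5."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        # Case 3: the response is not XML at all.
        raise ExchangeError("response is not well-formed XML") from exc
    if root.tag != "ImportResult":
        # Case 4: XML, but not the expected document (hypothetical root name).
        raise ExchangeError(f"unexpected root element: {root.tag}")
    imported = {node.get("uid") for node in root.findall("Imported/Object")}
    failed = {node.get("uid") for node in root.findall("Failed/Object")}
    # Case 5: objects that were sent but are mentioned nowhere in the response.
    unaccounted = set(sent_uids) - imported - failed
    return {"imported": imported, "failed": failed, "unaccounted": unaccounted}
```

A caller would treat a non-empty "unaccounted" set as a reason to query the remote state again, or to resend those objects, rather than assuming they were loaded.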

Connecting to the system and protecting personal data


For those who have read this far, here is the most interesting part. FIS GIA and Admission is located in the closed network of the Federal Testing Center, to which universities connect via ViPNET VPN clients. In addition, a unique solution from a little-known monopolist company is being imposed, for decent money, to filter data on the client side, "so that nothing extra is pulled out of a system holding the personal data of millions of citizens." There is no explanation of why this filtering has to be done at every client rather than once on the server side. Judging by indirect evidence, this unique solution is merely a proxy server that filters the URLs allowed when working with the FIS server.

However, inquiring minds recently noticed that if you accidentally (or intentionally) specify a different package identifier when viewing package import results in the web interface, that package will open! And it will not only open, but also let you download an XML file with all the data of all its applicants, including passports, data on previous education, information on benefits (medical ones included), and so on. Thus any user with access to FIS can obtain, by simple enumeration of identifiers, the data of a significant share of applicants for the last 3 years.
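The missing server-side check is small. How FIS actually stores packages and users is unknown, so everything in this sketch (ImportPackage, owner_institution_id, get_package_for_user, the in-memory dictionary) is hypothetical. It shows only the general idea of comparing the package owner with the requesting user's institution before returning anything.

```python
from dataclasses import dataclass


class AccessDenied(Exception):
    """Raised when a user requests a package they are not allowed to see."""


@dataclass(frozen=True)
class ImportPackage:
    package_id: int
    owner_institution_id: int  # hypothetical field: who uploaded the package
    xml_payload: str


def get_package_for_user(packages: dict, package_id: int, user_institution_id: int):
    """Return a package only if it belongs to the requesting user's institution."""
    package = packages.get(package_id)
    # Missing and foreign packages get the same answer, so the response
    # does not reveal which identifiers exist.
    if package is None or package.owner_institution_id != user_institution_id:
        raise AccessDenied(f"package {package_id} is not available")
    return package
```

With a check like this on the server, changing the identifier in the URL yields a refusal, and no client-side proxy is needed for this particular case.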



Summary


In conclusion, a few platitudes suggest themselves, but since thousands and thousands of IT specialists will have to deal with this FIS and others like it, I think they are worth writing down.

To those who have to interact with insufficiently thought-out, poorly documented, narrowly specialized information systems built under state contracts: be prepared for anything. Build handling for every conceivable and inconceivable error into the exchange algorithm from the start, because nothing can be trusted. Even when development time is short, it is better to include as many checks as possible.

Good luck to you!

Source: https://habr.com/ru/post/236879/

