
A task with an asterisk: how we recoded FIAS into KLADR



From January 1, the Federal Tax Service (FTS) will stop updating the KLADR address directory. It will officially become obsolete, and only FIAS will remain. But many industrial systems still work with KLADR. Their vendors are not going to update them, and reworking them in-house is a long and expensive process.

We listened to our customers and came up with a solution: take FIAS, which is very much alive, and write a recoder from it into KLADR.
From the outside, the task seems easy. That is what people told us: “So you just take FIAS and rework it into KLADR?” In fact, there is no “just”. The directories have completely different structures, and it is not clear how to scatter the data from a FIAS download into the homely KLADR. And there is no documentation that covers both directories.

It was fun, and now we are generously sharing the experience.

We compared the structures of the directories


FIAS weighs about 28 GB and has about 450 tables. KLADR is about 500 MB and 6 tables. The data KLADR needs lives in only three FIAS tables. The rest is not needed for recoding, because KLADR has no place for it.



If a name in the diagram has “<i>” in parentheses, each region has its own table with a similar name.

Transferring record by record will not work: the relationships between tables, the logic inside tables, or something else will break.

For example, in FIAS one house is one table entry. We loaded houses into KLADR one after another, and the directory swelled fatally: 3.5 GB versus the normal 500 MB.

KLADR is a modest fellow: it cannot afford a separate entry for each house. So similar houses are grouped into one line, and their numbers are stored, with a perfectly straight face, in the NAME field, separated by commas.



When houses are on the same street and differ only in number, they are written in one line.

So we started loading FIAS street by street and grouping houses into records. You count how many numbers fit into NAME separated by commas (NAME holds 40 characters, by the way) and create one shared KLADR entry for those houses. Things got going.
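The grouping step can be sketched as greedy packing: keep appending house numbers to the current NAME value while the result stays within 40 characters, then start a new record. This is a minimal illustration, not HFLabs' code; the function names and the `attrs` grouping key (everything except the house number, e.g. postal code) are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict

# Assumed NAME width for house records in KLADR (40 characters, per the text).
NAME_MAX_LEN = 40


def pack_house_numbers(numbers: list[str], max_len: int = NAME_MAX_LEN) -> list[str]:
    """Greedily pack house numbers into comma-separated strings of at most max_len chars."""
    chunks: list[str] = []
    current = ""
    for number in numbers:
        if len(number) > max_len:
            raise ValueError(f"house number does not fit into NAME: {number!r}")
        candidate = f"{current},{number}" if current else number
        if len(candidate) <= max_len:
            current = candidate  # still fits: keep appending to this record
        else:
            chunks.append(current)  # record is full: start a new one
            current = number
    if current:
        chunks.append(current)
    return chunks


def group_houses(houses):
    """houses: iterable of (street_code, attrs, number), where attrs is a hashable
    tuple of everything except the house number (hypothetical: postal code,
    tax office code). Houses that differ only in number share KLADR records."""
    groups: dict[tuple, list[str]] = defaultdict(list)
    for street_code, attrs, number in houses:
        groups[(street_code, attrs)].append(number)
    return {key: pack_house_numbers(nums) for key, nums in groups.items()}
```

For houses 1 to 29 on one street, the first record holds "1,2,...,16" (38 characters) and the second holds 17 to 29.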



For those who decide to follow our path, this is what the FIAS and KLADR dependency diagram looks like.

The usual recoding problem: it is not clear how to map fields between similar tables, and the documentation does not help. No surprise there, since the documentation was clearly not written for cutting one directory into another.

For example, FIAS has the following fields:


KLADR has only two fields for the same purpose:


Which code goes where is unclear.

Another example. This is a description of the fields in FIAS:


It is hard to read, but suppose we had no choice. The trouble is that the description clarifies nothing. What is a “formalized name”? What does formalization affect? And most importantly, which field goes to KLADR?

In the end, we took the KLADR and FIAS published by the FTS and watched how field values flow from one directory to the other. There is no universal recipe, so we ended up with a whole host of disparate transfer rules.

Found where to get KLADR codes


The KLADR code is the main identifier in KLADR. Only it uniquely identifies an address object.

FIAS also stores KLADR codes. And since the directory is not exactly well optimized, the ADDROB table alone offers three ways to get a code:


At first we relied on PLAINCODE. We checked it, and it sometimes does not match the code in the original KLADR.

No problem, we have CODE! Another surprise: it is empty so often that after a while you stop being surprised. And when it is not empty, it is too early to relax: the code in it may not match PLAINCODE.

We fell back to the last bastion and started assembling KLADR codes ourselves. And guess what? The assembled code often disagrees with PLAINCODE or CODE.
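Here is a sketch of assembling a code and checking it against the other two sources. It assumes the usual KLADR layout SS RRR GGG PPP (region, district, city, settlement) plus UUUU for streets, with CODE carrying a two-digit actuality suffix that PLAINCODE lacks. It uses only REGIONCODE, AREACODE, CITYCODE, PLACECODE and STREETCODE from ADDROB and ignores the other code columns. It does not reproduce the pattern the authors eventually found.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AddrObj:
    """The ADDROB columns involved; values are digit strings as in FIAS."""
    regioncode: str          # REGIONCODE, 2 digits
    areacode: str            # AREACODE, 3 digits
    citycode: str            # CITYCODE, 3 digits
    placecode: str           # PLACECODE, 3 digits
    streetcode: str | None   # STREETCODE, 4 digits; empty for non-street objects
    code: str | None         # CODE: assumed to end with a 2-digit actuality suffix
    plaincode: str | None    # PLAINCODE: assumed to be the code without that suffix


def glue_plain_code(obj: AddrObj) -> str:
    """Assemble SS RRR GGG PPP [UUUU] from the individual code columns."""
    parts = [
        obj.regioncode.zfill(2),
        obj.areacode.zfill(3),
        obj.citycode.zfill(3),
        obj.placecode.zfill(3),
    ]
    if obj.streetcode:
        parts.append(obj.streetcode.zfill(4))
    return "".join(parts)


def code_candidates(obj: AddrObj) -> dict[str, str | None]:
    """The three candidate codes, all reduced to the form without the actuality suffix."""
    return {
        "PLAINCODE": obj.plaincode or None,
        "CODE": obj.code[:-2] if obj.code else None,  # empty CODE is common
        "glued": glue_plain_code(obj),
    }


def candidates_agree(obj: AddrObj) -> bool:
    """True if every non-empty candidate yields the same code."""
    values = {v for v in code_candidates(obj).values() if v}
    return len(values) == 1
```

Running `candidates_agree` over every ADDROB record is a cheap way to list the objects where the three sources diverge and a reference directory has to decide.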

The harsh reality of FIAS: CODE != PLAINCODE != assembled code. Where to get the correct identifier is unclear. We started comparing against the reference directories and, 50 cups of coffee later, found a pattern:


When you follow it, the KLADR code matches the code in the official directory.

Figured out how to transfer planning structures and additional territories


FIAS was designed with more address object levels than KLADR: 13 versus 6.

At the same time, KLADR contains objects from “extra” levels: planning structures and additional territories. These are garden associations, dacha cooperatives, and the like. They are transferred from FIAS with a fair amount of crutches: a garden partnership, say, is turned into a street, and the partnership's name is added in brackets to the names of its subordinate streets.

That sounds complicated in theory, so here is an example. The garden partnership “Array N2 ST Vishnya” has a Lugovaya street. When these objects are transferred from FIAS to KLADR, this is what happens:


From the standpoint of human logic it looks awful, but the necessary data ends up in KLADR. Who are we to argue? Naturally, we do the same.
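A minimal sketch of that flattening, assuming the bracketed partnership name is appended after the street name with a space (the exact KLADR output is not reproduced in the text). The level constant relies on the abbreviation level table further down, where FIAS levels 7 (streets) and 65 (planning structures) both map to KLADR level 5; the record type and function name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# KLADR street level; FIAS levels 7 and 65 both map to KLADR level 5.
KLADR_STREET_LEVEL = 5


@dataclass(frozen=True)
class KladrRecord:
    name: str
    level: int


def flatten_planning_structure(structure_name: str, street_names: list[str]) -> list[KladrRecord]:
    """Turn a planning structure (e.g. a garden partnership) into a KLADR street and
    append its name in brackets to every street subordinate to it.
    Assumption: the bracketed name follows the street name after a space."""
    records = [KladrRecord(structure_name, KLADR_STREET_LEVEL)]
    for street in street_names:
        records.append(KladrRecord(f"{street} ({structure_name})", KLADR_STREET_LEVEL))
    return records
```

For the example above this yields a street “Array N2 ST Vishnya” and a street “Lugovaya (Array N2 ST Vishnya)”, both at level 5.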

Figured out renames and reassignments


In Russia, cities, regions and streets are constantly renamed and reassigned. For example, Zheleznodorozhny in the Moscow Region was a city for 40 years and then suddenly became a district of Balashikha.

Address directories keep outdated entries for backward compatibility, so chains of versions form.



There was a city, V1. It was renamed, and version V2 appeared in the directory. It was reassigned: V3. And so on up to Vk.
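Walking such a chain is a linked-list traversal. The sketch below assumes the chain is available as an AOID to NEXTID mapping (the ADDROB table has AOID, PREVID and NEXTID columns); the function name is hypothetical, and the cycle check is a defensive assumption, not a known property of the data.

```python
from __future__ import annotations


def version_chain(aoid: str, next_id: dict[str, str | None]) -> list[str]:
    """Follow NEXTID links from one version of an address object to the newest one.
    next_id maps AOID -> NEXTID (None or missing for the latest version)."""
    chain = [aoid]
    seen = {aoid}
    current = next_id.get(aoid)
    while current:
        if current in seen:
            raise ValueError(f"cycle in version chain at {current!r}")
        chain.append(current)
        seen.add(current)
        current = next_id.get(current)
    return chain
```

With links V1 → V2 → V3, `version_chain("V1", links)` returns all three versions, and its last element is the current one.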

Subordination chains are transferred with the levels that KLADR lacks taken into account. For example, the chain “street → city district → city” arrives in KLADR as “street → city”.
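Collapsing a chain like that means walking up the parent links and skipping every parent whose level has no KLADR counterpart. In this sketch only FIAS level 5 (city districts, marked “X” in the level table below) is skipped by default; the record type is hypothetical, and treating level 7 as street and 4 as city is our assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# FIAS levels KLADR does not have; pass a larger set if needed.
DEFAULT_SKIP_LEVELS = frozenset({5})


@dataclass(frozen=True)
class FiasObject:
    guid: str             # AOGUID
    level: int            # AOLEVEL
    parent: str | None    # PARENTGUID, None for top-level objects


def kladr_parent(guid: str, objects: dict[str, FiasObject],
                 skip_levels=DEFAULT_SKIP_LEVELS) -> str | None:
    """Walk up the FIAS parent chain, skipping parents whose level KLADR lacks."""
    parent = objects[guid].parent
    while parent is not None and objects[parent].level in skip_levels:
        parent = objects[parent].parent
    return parent
```

For “street (7) → city district (5) → city (4)”, the street's KLADR parent becomes the city.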

Sometimes two versions of an object in FIAS differ only in fields that KLADR does not have at all. For example, on May 20, 2017, the city of Maykop changed its OKTMO code from 79701000 to 79701000001. Nothing else changed, but FIAS still got a new version of the object. If the changed data has no place in KLADR, the correct thing is to transfer only the new version. We pretend the past never happened.
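One way to express this rule: project each version onto the fields KLADR can represent, and drop any version that looks identical in that projection to the version right after it. The field list below is a hypothetical subset; OKTMO is left out because, per the Maykop example, KLADR does not carry it.

```python
from __future__ import annotations

# Hypothetical set of attributes that have a counterpart in KLADR.
KLADR_FIELDS = ("name", "socr", "postal_code", "okato")


def kladr_view(version: dict) -> tuple:
    """Project a FIAS version onto the fields KLADR can represent."""
    return tuple(version.get(field) for field in KLADR_FIELDS)


def versions_to_transfer(chain: list[dict]) -> list[dict]:
    """chain is ordered oldest -> newest. Drop every version that looks identical
    in KLADR to the version right after it, so only the newest of such a run is kept."""
    result = []
    for i, version in enumerate(chain):
        is_last = i == len(chain) - 1
        if not is_last and kladr_view(version) == kladr_view(chain[i + 1]):
            continue  # only non-KLADR fields changed: pretend this version never existed
        result.append(version)
    return result
```

For the two Maykop versions, which differ only in `oktmo`, only the newer one is transferred.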

And sometimes reassignment moves an object to a level that KLADR does not have. A typical story: the village of Obiralovka → the town of Zheleznodorozhny → the Zheleznodorozhny district of Balashikha. KLADR has no city-district level, so Zheleznodorozhny moves from level 4 to level 99. Sounds like a promotion, but no: level 99 is where vanished address objects are kept. It used to be a city; now it is a declassed element. The streets were reassigned too, and cleverly: several settlements were created in KLADR, the streets of Zheleznodorozhny were assigned to them, and the settlements were assigned to Balashikha.

Sorted out abbreviations


FIAS stores all official abbreviations for address object types.
LEVEL | SOCRNAME | SCNAME | KOD_T_ST
3 | Autonomous okrug | AO | 305
3 | Territory | ter | 303
3 | District | rn | 301
3 | Ulus | u | 302
3 | Municipal district | m rn | 309
KLADR also has an abbreviation table, but with fewer records. Abbreviations cannot simply be copied from FIAS to KLADR because the levels do not match: some FIAS levels do not exist in KLADR at all, others have different numbers.

We compared the directories for a long time and finally worked out what the FTS does. Here is how it shifts abbreviation levels:
FIAS level | KLADR level
0 | X
1 | 1
2 | 1
3 | 2
4 | 3
5 | X
6 | 4
7 | 5
8 | 6
9 | X
35 | X
65 | 5
75 | X
90 | X
91 | X
“X” means the level is simply dropped: none of the FIAS abbreviations at that level make it into KLADR. For example, the directory has no city districts, premises within buildings, land plots, etc.
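The level shift is a plain lookup table. The mapping below is transcribed from the FIAS-to-KLADR table above, with `None` standing for “X”; the function name and row format are hypothetical. Unknown FIAS levels raise `KeyError` on purpose, so a new level does not get silently dropped.

```python
from __future__ import annotations

# FIAS abbreviation level -> KLADR level; None means "X" (not transferred).
FIAS_TO_KLADR_LEVEL: dict[int, int | None] = {
    0: None, 1: 1, 2: 1, 3: 2, 4: 3, 5: None, 6: 4, 7: 5, 8: 6,
    9: None, 35: None, 65: 5, 75: None, 90: None, 91: None,
}


def shift_abbreviations(rows):
    """rows: iterable of (fias_level, socrname, scname).
    Returns (kladr_level, socrname, scname) rows, dropping "X" levels.
    An unknown FIAS level raises KeyError instead of being silently dropped."""
    shifted = []
    for fias_level, socrname, scname in rows:
        kladr_level = FIAS_TO_KLADR_LEVEL[fias_level]
        if kladr_level is not None:
            shifted.append((kladr_level, socrname, scname))
    return shifted
```

Note that two FIAS levels can collapse into one KLADR level (1 and 2 into 1, 7 and 65 into 5), which is exactly where abbreviation code conflicts come from.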

Level shifts cause problems with KOD_T_ST, the unique identifier of an abbreviation. KOD_T_ST consists of two parts: the abbreviation level and the abbreviation's own ID. When moving from FIAS to KLADR, the levels shift and codes start to collide.

For example, in FIAS:


In KLADR, after the level shift:


And now it is unclear whether the code points to an embankment or a district.

In the official KLADR, the problem is solved with flair: in the code of one of the conflicting abbreviations, the level digit is replaced with a nine. Because they can. As a result, in the reference KLADR the district abbreviation has code 911 instead of 511, while the LEVEL field still says 5.

Needless to say, none of this is documented.
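The nine-substitution trick can be sketched as follows. The code is built as the single-digit KLADR level followed by a two-digit abbreviation ID; on a collision the level digit becomes 9 while LEVEL stays as is. Which of the two conflicting abbreviations gets the nine in the official KLADR is not explained, so here we assume it is the one processed later. The IDs in the example are hypothetical, chosen to reproduce the 511/911 pair.

```python
from __future__ import annotations


def assign_kod_t_st(abbreviations):
    """abbreviations: list of (scname, kladr_level, abbr_id) in processing order.
    KOD_T_ST = single-digit KLADR level + two-digit abbreviation ID.
    On a collision the level digit is replaced by 9; LEVEL itself is unchanged.
    Assumption: the abbreviation processed later is the one that gets the 9."""
    taken: set[str] = set()
    result = []
    for scname, level, abbr_id in abbreviations:
        code = f"{level}{abbr_id:02d}"
        if code in taken:
            code = f"9{abbr_id:02d}"
            if code in taken:
                raise ValueError(f"cannot resolve KOD_T_ST collision for {scname!r}")
        taken.add(code)
        result.append((scname, level, code))
    return result
```

With an embankment and a district that both land on 511, the district ends up as 911 with LEVEL 5.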

Tested in the field


We built from FIAS the same KLADR that the FTS supplies and checked it against three criteria:

  1. Structure.
  2. Table sizes.
  3. Compatibility with specialized software.

We tested the third item with the ASVCheck utility. It checks that addresses comply with the format required by the Bank of Russia's instructions. You load KLADR into ASVCheck, then a list of addresses; the utility checks the addresses against the directory and flags errors.

We ran addresses from a live bank registry through ASVCheck: first with the official KLADR, then with ours. Then, of course, we looked at the differences.
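Comparing the two runs boils down to a per-address diff. ASVCheck's output format is not described here, so this sketch assumes each run has already been parsed into a hypothetical mapping from address to verdict (`True` = accepted).

```python
from __future__ import annotations


def verdict_diff(official: dict[str, bool], ours: dict[str, bool]) -> dict[str, tuple]:
    """Compare per-address verdicts from two ASVCheck runs.
    Returns {address: (official_verdict, our_verdict)} for addresses that differ;
    None means the address is missing from that run."""
    diff = {}
    for address in sorted(official.keys() | ours.keys()):
        a, b = official.get(address), ours.get(address)
        if a != b:
            diff[address] = (a, b)
    return diff
```

An empty result means both directories produce the same verdicts for the whole registry.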

I must say that ASVCheck is a mysterious fellow. It does not say why it flagged an address as incorrect, so we debugged at random. Some errors looked like a bug in the utility itself: it stopped rejecting some of the addresses when we simply sorted the records in our KLADR by code.

But it all ended well: ASVCheck now gives the same result on our KLADR as on the FTS directory.

Got a KLADR with all the trimmings and updates


Now we have a KLADR that will live for centuries. Our customers do not have to rush to rework their software for FIAS: they connect the new directory, and everything works as before.

What the HFLabs KLADR offers:


We distribute the full December version for free, so you can deploy it on a live system and test it.

If you are interested, write to Elena Rastorgueva at elenar@hflabs.ru. Elena will ask a few clarifying questions and send you the new KLADR within a day.

Source: https://habr.com/ru/post/345012/

