Part 1. The tragic. “Why do I need all this ?!”
We somehow faced the need to enter correct information about the location (registration, registration) of users, and with the fact that this problem is not solved quite as easily and simply as we would like. At first we tried KLADR, in its free incarnation. Not that we really didn’t like everything, but there was at least one very annoying circumstance - some addresses were missing in the directory. For example, there is house 10, and 10k1 - sorry, they did not deliver. In general, KLADR was attractive because it has a simple API and plugins (jQuery in particular) that can be easily embedded into the application, but repelled by content. We thought - if there is no such resource containing the full and most relevant address information, with API and plugins, the only way out is to create such a resource yourself.
Part 2. Seeker. “Where is the dog buried ?!”
And we heard about FIAS. About how he is infinitely full and beautiful. And this is the way out! There are bases on the FIAS website, and all that is needed is a young growing information system! However, the size of the base is more than 4 Gb, but oh well, this is all of Russia! The base is updated regularly, so there is where to roam. Things are easy - to deploy the base, to tie API and plugins. It is necessary to make a reservation that there are several relevant and important articles. For example, a series of
these articles here , which initially helped a lot.
Part 3. "How it works"
The problems began with the fact that the FIAS database is supplied in the .dbf format, the data from which had to somehow be transferred to PostgreSQL, which was decided to be used as a database.
')
It should be noted that in addition to the .dbf format, the database is in xml format. As you know, a person wants everything at once. To turn on and work. But to deal immediately with what FIAS gives is quite problematic.
So, at the fias.nalog.ru address in the update section there is the following:
- Information about updates: according to official data, they come out at least once a week. Specificity is as follows. Sometimes there are broken updates. Their size is about 1.2 Mb and when downloading a broken archive is given, so hurry up to tune in only for the latest update - raises questions.
- Updates in the form of fias_delta_. - in .dbf and .xml formats.
- Actually fias_ database. - in .dbf and .xml formats.
- Database in KLADR format. File name “Base. *” Arj or 7z It differs in that many fields are missing (regarding tax data, etc.), the search is performed by KLADR id. Initially, there is a temptation to use this particular format, as it is easier to understand and subsequent implementation. But a closer examination of the portal will reveal that this version will be supported only until the end of 2017, and then it will be deprecated.
Note: to work with .dbf files, you need to install a third-party extension for php_dbase.dll.So what is all the same to choose the version to download .xml or .dbf? The differences are as follows:
- xml - each file in the archive is a separate table. What is the problem. For example, the size of the file “HOUSEXX” where stored at home is more than 20 Gb in unzipped form. How to parse it? On the one hand, if you have more than 30 Gb of RAM, then this will not be a problem. Also, if you are a connoisseur of literate libraries, such as SAX Parser for Java, which do not load the entire file into memory, but parse parts, then something may happen.
- dbf - files are not only divided as tables, but also divided by region. It would be a more acceptable option if you do not need the addresses of the vast Motherland, but only one region (as in our particular case).
So, unzipped the files. What information do we need? We see the following set of files:
- ADDROB.DBF, where XX is the number of the region, contains data directly on the region, autonomous okrugs, cities and other settlements, streets.
- HOUSE.DBF, where XX is the region number, contains information on house numbers.
- NORDOC.DBF, where XX is the region number, contains information on the reasons for changes in various records.
- ROOM.DBF, where XX - the number of the region - contains information about the premises.
- STEAD.DBF, where XX - the number of the region - contains information about land plots.
- SOCRBASE.DBF - contains information about abbreviations.
- STRSTAT.DBF - contains information about the type of building.
In addition to these tables, there are a number of other service tables that contain information about abbreviations in other tables.
In most cases, it is enough to form an address right up to the house. Although, if anyone needs it, you can go further.
Thus, we will create 2 tables in the postgresql database.
Table with addresses:
CREATE TABLE addrs ( "ACTSTATUS" integer, "AOGUID" character varying(36) COLLATE pg_catalog."default", "AOID" character varying(36) COLLATE pg_catalog."default", "AOLEVEL" integer, "AREACODE" integer, "AUTOCODE" integer, "CENTSTATUS" integer, "CITYCODE" integer, "CODE" character varying(20) COLLATE pg_catalog."default", "CURRSTATUS" integer, "ENDDATE" timestamp, "FORMALNAME" character varying(120) COLLATE pg_catalog."default", "IFNSFL" integer, "IFNSUL" integer, "NEXTID" character varying(36) COLLATE pg_catalog."default", "OFFNAME" character varying(120) COLLATE pg_catalog."default", "OKATO" VARCHAR(11), "OKTMO" VARCHAR(11), "OPERSTATUS" integer, "PARENTGUID" character varying(36) COLLATE pg_catalog."default", "PLACECODE" integer, "PLAINCODE" character varying(20) COLLATE pg_catalog."default", "POSTALCODE" integer, "PREVID" character varying(36) COLLATE pg_catalog."default", "REGIONCODE" integer, "SHORTNAME" character varying(15) COLLATE pg_catalog."default", "STARTDATE" timestamp, "STREETCODE" integer, "TERRIFNSFL" integer, "TERRIFNSUL" integer, "UPDATEDATE" timestamp, "CTARCODE" integer, "EXTRCODE" integer, "SEXTCODE" integer, "LIVESTATUS" integer, "NORMDOC" character varying(36) COLLATE pg_catalog."default", "PLANCODE" integer, "CADNUM" integer, "DIVTYPE" integer ) WITH ( OIDS = FALSE ) TABLESPACE pg_default; ALTER TABLE address OWNER to postgres;
Table with house numbers:
CREATE TABLE hous ( "AOGUID" character varying(36) COLLATE pg_catalog."default", "BUILDNUM" character varying(10) COLLATE pg_catalog."default", "ENDDATE" timestamp, "ESTSTATUS" integer, "HOUSEGUID" character varying(36) COLLATE pg_catalog."default", "HOUSEID" character varying(36) COLLATE pg_catalog."default", "HOUSENUM" character varying(15) COLLATE pg_catalog."default", "STATSTATUS" integer, "IFNSFL" integer, "IFNSUL" integer, "OKATO" VARCHAR(11), "OKTMO" VARCHAR(11), "POSTALCODE" integer, "STARTDATE" timestamp, "STRUCNUM" VARCHAR(15), "STRSTATUS" integer, "TERRIFNSFL" integer, "TERRIFNSUL" integer, "UPDATEDATE" timestamp, "NORMDOC" character varying(36) COLLATE pg_catalog."default", "COUNTER" integer, "CADNUM" VARCHAR(50), "DIVTYPE" integer ) WITH ( OIDS = FALSE ) TABLESPACE pg_default;
Data import is carried out in a simple way. Open files in Excel and save them as csv. Additionally, it is recommended to change the encoding, because unlike xml files that are represented in utf-8 encoding, dbf files are encoded in win-866. Open the files in the editor (notepad ++ is suitable for this purpose) and convert to utf-8.
Import a table with addresses:
COPY addrs FROM 'PathToTheFile\ADDROB01.csv' DELIMITER ';' CSV;
Import tables with houses:
COPY addrs FROM 'PathToTheFile\HOUSE30.csv' DELIMITER ';' CSV;
What is the ADDROBXX table made of?Despite the abundance of fields, only some of them will be needed.
- ACTSTATUS - the field contains the values 0 and 1. Upon careful viewing, it will be revealed that there are many streets in the table with the same name. The fact is that the tax base contains the history of objects, i.e. changes that occurred with any address elements. For example, the streets of Lenin may be 10 pieces. Of these, 9 - will be irrelevant. They could be renamed or changed in another way. ACTSTATUS - 0 - irrelevant addresses, and 1 - will be the only relevant.
- AOGUID - Record ID. For different historical records will be the same AOGUID. They will differ in the AOID field.
- AOLEVEL - object level. At number 1 - the subject of the federation, 4 - the city, 7 - the street, etc.
- CENTSTATUS - the status of the center. For example, cities in the regional center.
- FORMALNAME is the name of the object.
- PARENTGUID - ID of the parent element. When searching for the city to which the street belongs. PARENTGUID streets will match with AOGUID streets.
- POSTALCODE - zip code.
- REGIONCODE - code of the subject of the federation.
Note: on fias-nalog.ru you can find a complete description of all fields.Important in the HOUSEXX table:- AOGUID - ID of the street in ADDROBXX where the house is located.
- HOUSEGUID - ID of the house.
- The difficulty lies in the name of the house. It consists of 4 fields:
- BUILDNUM is the hull number.
- HOUSENUM - house number.
- STRUCNUM is the building number.
- STRSTATUS is a sign of the structure (from 0 to 4, where 0 is none, 1 is a structure, 2 is a structure, 3 is a letter).
Fields can contain similar or identical information and have to leave the situation experimentally. In the QuerryController class (the link to the repository is below) there is a method: chooseBuilding, in which we tried to solve this problem. Maybe someone will get better. Queries to the database are in the same place.
Then, after the base was imported into Postgres, we started creating an API and a plug-in for our system.
For the API, in order not to get stuck, used Laravel. The query scheme turned out quite simple. The hierarchy of objects is as follows:
- Subject of the federation
- Region of the Federation
- Locality
- The outside
- House
By the way, testing the system, we are faced with the fact that not all settlements have streets, and not all streets have houses, which surprised us a lot and puzzled us. This circumstance did not give rest because if you give the user the opportunity to enter information himself, they will write a question that involuntarily raises the question - “Do these people learn anything at school ?!”.
Therefore, it was decided not to give the user the slightest opportunity for self-activity, and for such “missing” addresses to provide the opportunity to apply for the inclusion of the missing address in the help system.
The query scheme looks simple:
For the last item, a list of houses with letters, buildings, buildings, and other things is formed.
After all the ordeals made a plugin on vue.js, to work with the directory, and its alternative to jQuery.
The region and the district can be removed as unnecessary, as they are pulled together with the cities.
As you type, autocomplete hints pop up, as in KLADR. True, the difference is that the KLADR plugin is intended for autocompletion, and here only the address selected from the prompts is considered valid.
In the ASPUDcomponent folder, there is a VueJs component for working with the address base.
Sources are available in
our repository .
Part 4. How to update all this?
With updates, the situation is the following: you first need to get the update versions via the SOAP protocol. You can see how this is done in the UpdateController class (method: filesVersions ()).
Note: the version that is listed last in the list is not necessarily the one that can be downloaded on the main page. But do not rush to download only the latest version, as it may be “bat”. There have been such precedents. Further the archive with the latest version is downloaded and unpacked. To work, you must use the extension for php (php_rar.dll).
Well, then select the desired region file (or, if necessary, all files) to update the database.