📜 ⬆️ ⬇️

FIAS addresses in the PostgreSQL environment. Part 1

No matter how we treat the quality of FIAS addresses, it is necessary to work with them, because this is the only all-Russian reference book of addresses. Therefore, sooner or later, it is necessary to solve the problem of associating the location of real estate objects, legal and physical addresses with the address from FIAS.

This article describes the experience with the list of FIAS addressing elements loaded into the database running PostgreSQL. To work with FIAS addressing elements, several functions have been created in the PL / pgSQL language.

The full text of the article consists of 4 parts. The first half of this part of the article contains comments on the implementation of the function. The second is the source text of the function. For those readers who are interested only in the source code, we suggest that you go straight to the Appendix .


Pedigree addressing element


Let's start with an example. A call to the fstf_AddressObjects_AddressObjectTree function ('bfc1236d-b5d2-4734-a238-3b1e4830e963') will result in the following list of entries.

Table 1. Result of executing the fstf_AddressObjects_AddressObjectTree function
AOGUIDCurrStatusActstatusAOLevelShortNameFormalName
db9c4f8b-b706-40e2-b2b4-d31b98dcd3d10oneoneedgeKrasnoyarsk
625497d3-22de-4390-b4b4-2febfbfc15ce0one3rnBalakhtinsky
39da6405-b3e6-4baf-b332-d47b73b4d5fb0one6PMighty
bfc1236d-b5d2-4734-a238-3b1e4830e9630one7streetNew

On closer examination, you can see that the element identifier (AOGUID) “Novaya Street” was passed as the function argument, as a result, four entries were received:
')

The function has one more optional parameter sign of relevance (CurrStatus), with which you can view the pedigree not only of the current address-forming element, but also already outdated.

The full text of the function is given in the Appendix in the subsection Creating a function fstf_AddressObjects_AddressObjectTree

From the very beginning


If you know how FIAS tables are arranged, then this section can be skipped.

The need for such a function is dictated by the fact that the list of FIAS addressing elements (ADDROBJ) is a tree structure, by which each element refers to the identifier (AOGUID) of the parent record by the value of the ParentGUID field. Those. looking through the records of the original list, you usually observe a long list of streets. In order to determine in which locality a street is located, it is necessary by the value of the ParentGUID to find records with such an element identifier.



Fig. 1. The hierarchy of the addressing elements of FIAS.

This is not a slip of the pen and not a reservation. According to the value of ParentGUID, not one, but several records can be found. From which it follows that the identifier of the address-forming element is not the primary key for the table containing the list ADDROBJ.

The fact is that the list of address-forming elements, together with each element, stores the history of its “renaming”. Those. under one element identifier is stored not only the current name of the element, but its former names. That is, a separate entry in the ADDROBJ list stores data about the addressing element, as well as the characteristics of the calendar period during which the name of the element was relevant.

Table 2. History of the street "Krasnoyarsk Territory, Taimyr District, Dolgano-Nenetsky, Dudinka City, Levinsky Peski, Beregovaya Street"
AOID record IDPrevious PrevID IDID of the next record NextIDSign on KLADR CurrStatusSign of Actuality ActStatusStart Date StartDateEndDate Period End Date
fcf51361-5494-4edc-a6bc-d5c0d471c7292a993f3b-5743-426c-8b7d-b5c7affe49cd0one11/25/2015 0:0006.06.2079 0:00
2a993f3b-5743-426c-8b7d-b5c7affe49cd9199c92b-18a5-431a-8b13-f54abe36e84ffcf51361-5494-4edc-a6bc-d5c0d471c7297009/30/2015 0:0011/25/2015 0:00
9199c92b-18a5-431a-8b13-f54abe36e84fb06ff65e-aadb-42eb-9c70-a8548a40645c2a993f3b-5743-426c-8b7d-b5c7affe49cd6009/28/2015 0:0009/30/2015 0:00
b06ff65e-aadb-42eb-9c70-a8548a40645ca997aa7c-f2b1-4b7e-9471-5fc1f51a67529199c92b-18a5-431a-8b13-f54abe36e84ffive004/08/2013 0:0009/28/2015 0:00
a997aa7c-f2b1-4b7e-9471-5fc1f51a6752b27d8e44-ade1-4dd0-b4b5-5eaa7205ff0bb06ff65e-aadb-42eb-9c70-a8548a40645cfour001.11.2013 0:0004/08/2013 0:00
b27d8e44-ade1-4dd0-b4b5-5eaa7205ff0b327b28cc-5171-43c6-bd88-a0a2172bbf71a997aa7c-f2b1-4b7e-9471-5fc1f51a67523012/07/2012 0:0001.11.2013 0:00
327b28cc-5171-43c6-bd88-a0a2172bbf71fb7b54db-7efb-4aaf-b4cb-2364b35b80b3b27d8e44-ade1-4dd0-b4b5-5eaa7205ff0b2001.02.2012 0:0012/07/2012 0:00
327b28cc-5171-43c6-bd88-a0a2172bbf71fb7b54db-7efb-4aaf-b4cb-2364b35b80b3b27d8e44-ade1-4dd0-b4b5-5eaa7205ff0bone001.02.2012 0:0012/07/2012 0:00

The order of the periods of the relevance of the name of the address-forming element can be determined by viewing two multidirectional lists. To do this, each entry on the period of the relevance of the element contains two pointers to the previous one (PrevID) and subsequent periods (NextID). The first period of the address-forming element does not have a pointer to the previous period, while the last (current) one does not have a pointer to the next period.



Fig. 2. The main fields of the entry of the FIAS addressing element.

The period of relevance is characterized by the dates of the beginning and end of the period, respectively StartDate and EndDate. In this case, the dates of the beginning of the first period and the end of the last have conditional values. The date of the beginning of the first period is taken as “01/01/1900 0:00”, and the date of the end of the last (current) period is taken as “06/06/2079 0:00”.

The current (currently valid) name of the address-forming element is indicated in the record of the last period, provided that it is not completed, i.e. The end date of the period is greater than or equal to the current date.

To simplify the search for an entry about the current period of an item, in addition to the start and end date of the period, two more fields are entered: CurrStatus and ActStatus.

ActStatus takes quite expected values: “1” is the current version of the characteristics of the element, “0” is not the current or historical version, as indicated in the directory.

With the values ​​of the CurrStatus field, the situation is more complicated. Using its values, two are solved.
Tasks at the same time: the identifier of each version of the record about the address-forming element is set and the sign of the relevance of the record is assigned. Therefore, the last actual entry about the element contains the value “0” in this field, and all historical records are numbered in the order of appearance - “1” is the earliest entry, following it in time - “2”, etc.

Table 2 contains a list of records with versions of the description of the Beregovaya village, the village of Levinsky Peski. In this list, the previous to the current record contains “7” in the CurrStatus field.

How it works




Fig. 3. Simplified implementation of the fstf_AddressObjects_AddressObjectTree function.

To implement the function, it is suggested to use a recursive query similar to the one shown in Fig. 3, where a_AOGUID is the identifier of the address-forming element, a_CurrStatus is the sign of relevance according to KLADR. Both values ​​are passed to the function via parameters. Like any recursive query, this consists of two parts: the first part contains the first record of the element with the identifier a_AOGUID, the next recursive part sequentially contains all the actual parent records with respect to the records obtained by the process of the previous iterations. The transition to the parent element is performed via the ParentGUID link of the current record. It is important here that each iteration results in only one record. For this, a limit is imposed on the value of the CurrStatus feature.

An example of the result of applying a recursive query in the absence of a unique record at least at one iteration step is given in Fig. 3

The question arises - why is the restriction on the CurrStatus value implemented through the nested subquery, and not by assigning actual values?

Firstly, because CurrStatus = 0 is not always in the current item record, as Table 4 demonstrates.

Secondly, it is necessary that the function returns the result for irrelevant element names. Including, even in the case when on the way between the initial and final elements there is an element for which no actual record is declared.

Table 3. The result of performing the function on an element with a non-unique sign of the choice of record
AOGUIDCurrStatusActstatusAOLevelShortNameFormalName
db9c4f8b-b706-40e2-b2b4-d31b98dcd3d10oneoneedgeKrasnoyarsk
db9c4f8b-b706-40e2-b2b4-d31b98dcd3d10oneoneedgeKrasnoyarsk
3d4c8618-9e22-4838-8f89-80da6851da900one3rnRybinsk
719b789d-2476-430a-89cd-3fedc643d821510fourgZaozerny
719b789d-2476-430a-89cd-3fedc643d82151onefourgZaozerny

If a_CurrStatus = 0 is used as a condition for selecting a single record for an element, the function will not return Zaozerny for city records whose entries are given in Table 4. If you set a_CurrStatus = 51, then the result will be as shown in Table 3 .

Condition:

ao.currstatus = (SELECT MIN(iao.currstatus) FROM fias_AddressObjects iao WHERE ao.aoguid = iao.aoguid) 

... which is present in the second part of the recursive query is not entirely correct, if we proceed from the rule of assigning values ​​for the CurrStatus attribute. Indeed, if among the records of the address-forming element there is no actual (CurrStatus = 0), then the most recent is the record with the maximum, i.e. the last used value of the CurrStatus attribute.

When the above condition is used, the oldest is selected from the records of the irrelevant element.

Table 4. An element with a recurring non-zero value CurrStatus.
AOGUIDCurrStatusActstatusAOLevelShortNameFormalName
719b789d-2476-430a-89cd-3fedc643d821510fourgZaozerny
719b789d-2476-430a-89cd-3fedc643d82151onefourgZaozerny

Therefore, a more correct solution would be to use the following condition:

 ao.currstatus = CASE WHEN 0 < ALL(SELECT iao.currstatus FROM fias_AddressObjects iao WHERE ao.aoguid = iao.aoguid) THEN (SELECT MAX(iao.currstatus) FROM fias_AddressObjects iao WHERE ao.aoguid = iao.aoguid) ELSE 0 END 

But, in this condition there are two subqueries. So you have to choose between semantic severity.
request and the effectiveness of its implementation.

The justification for the use of the first variant of the condition is the actual absence of differences in the names of the address-forming element in the historical records.

So as of October 13, 2016, out of 26,728 address-forming elements of the Krasnoyarsk Territory in 19865 there are historical records. However, only for 1350 elements (6.8% of the number of elements with a history) there are differences in the names of the same element. Those. for 93.2% of the elements, the first and second conditions will return the same list of names. Differences are possible only in the values ​​of the CurrStatus attribute, which, given the purpose of the function, can be neglected.

Fully replacing the CurrStatus feature with ActStatus is not possible. By the condition ActStatus = 1, the current item record is definitely selected, but you must use both features to work with historical records. Such a solution is given in the Appendix Creating the function fstf_AddressObjects_AddressObjectTree .

ATTACHMENT


Creating the function fstf_AddressObjects_AddressObjectTree


A detailed description of the function can be found here.


function code
 BEGIN TRANSACTION; DROP FUNCTION IF EXISTS fstf_AddressObjects_AddressObjectTree(a_AOGUID VARCHAR(36), a_CurrStatus INTEGER); /*************************************************************************/ /*   (  )   */ /*   */ /*************************************************************************/ CREATE OR REPLACE FUNCTION fstf_AddressObjects_AddressObjectTree( a_AOGUID VARCHAR(36), /*    */ /*  */ a_CurrStatus INTEGER default NULL /*    4: */ /* 0 - , */ /* 1-50 - , .. */ /*   , */ /*      */ /*    , */ /* 51 -  */ ) RETURNS TABLE (rtf_AOGUID VARCHAR(36), rtf_CurrStatus INTEGER, rtf_ActStatus INTEGER, rtf_AOLevel INTEGER,rtf_ShortTypeName VARCHAR(10), rtf_AddressObjectName VARCHAR(100)) AS $BODY$ DECLARE c_ActualStatusCode CONSTANT INTEGER :=1; /*    */ /*   */ c_NotActualStatusCode CONSTANT INTEGER :=0; /*     */ v_AOGUID VARCHAR(36); /*    */ v_ParentGUID VARCHAR(36); /*    */ v_CurrStatus INTEGER; /*    4*/ v_ActStatus INTEGER; /*   */ /*   . */ v_AOLevel INTEGER; /*   */ v_ShortName VARCHAR(10); /*     */ v_FormalName VARCHAR(120); /*    */ v_Return_Error INTEGER; /*   */ --*********************************************************************** --*********************************************************************** BEGIN IF a_CurrStatus IS NOT NULL THEN SELECT INTO v_AOGUID,v_ParentGUID,v_CurrStatus,v_ActStatus,v_AOLevel, v_ShortName, v_FormalName ao.AOGUID,ao.ParentGUID,ao.CurrStatus,ao.ActStatus,ao.AOLevel, ao.ShortName, ao.FormalName FROM fias_AddressObjects ao WHERE ao.AOGUID=a_AOGUID AND ao.CurrStatus=a_CurrStatus; ELSE SELECT INTO v_AOGUID,v_ParentGUID,v_CurrStatus,v_ActStatus,v_AOLevel, v_ShortName, v_FormalName ao.AOGUID,ao.ParentGUID,ao.CurrStatus,ao.ActStatus,ao.AOLevel, ao.ShortName, ao.FormalName FROM fias_AddressObjects ao WHERE ao.AOGUID=a_AOGUID AND ao.ActStatus=c_ActualStatusCode; IF NOT FOUND THEN SELECT INTO v_AOGUID,v_ParentGUID,v_CurrStatus,v_ActStatus,v_AOLevel, v_ShortName, v_FormalName ao.AOGUID,ao.ParentGUID,ao.CurrStatus,ao.ActStatus,ao.AOLevel, ao.ShortName, ao.FormalName FROM fias_AddressObjects ao WHERE ao.AOGUID=a_AOGUID AND ao.ActStatus=c_NotActualStatusCode AND ao.currstatus = (SELECT MAX(iao.currstatus) FROM fias_AddressObjects iao WHERE ao.aoguid = iao.aoguid); END IF; END IF; RETURN QUERY SELECT v_AOGUID,v_CurrStatus,v_ActStatus,v_AOLevel, v_ShortName,v_FormalName; WHILE v_ParentGUID IS NOT NULL LOOP SELECT INTO v_AOGUID,v_ParentGUID,v_CurrStatus,v_ActStatus,v_AOLevel, v_ShortName, v_FormalName ao.AOGUID,ao.ParentGUID,ao.CurrStatus,ao.ActStatus,ao.AOLevel, ao.ShortName,ao.FormalName FROM fias_AddressObjects ao WHERE ao.AOGUID=v_ParentGUID AND ao.ActStatus=c_ActualStatusCode; IF NOT FOUND THEN SELECT INTO v_AOGUID,v_ParentGUID,v_CurrStatus,v_ActStatus,v_AOLevel, v_ShortName,v_FormalName ao.AOGUID,ao.ParentGUID,ao.CurrStatus,ao.ActStatus,ao.AOLevel, ao.ShortName, ao.FormalName FROM fias_AddressObjects ao WHERE ao.AOGUID=v_ParentGUID AND ao.ActStatus=c_NotActualStatusCode AND ao.currstatus = (SELECT MAX(iao.currstatus) FROM fias_AddressObjects iao WHERE ao.aoguid = iao.aoguid); END IF; RETURN QUERY SELECT v_AOGUID,v_CurrStatus,v_ActStatus,v_AOLevel,v_ShortName, v_FormalName; END LOOP; END; $BODY$ LANGUAGE plpgsql; COMMENT ON FUNCTION fstf_AddressObjects_AddressObjectTree(a_AOGUID VARCHAR(36), a_CurrStatus INTEGER) IS '  (  )    '; --ROLLBACK TRANSACTION; COMMIT TRANSACTION; SELECT * FROM fstf_AddressObjects_AddressObjectTree('719b789d-2476-430a-89cd-3fedc643d821',51) ORDER BY rtf_AOLevel; SELECT * FROM fstf_AddressObjects_AddressObjectTree('719b789d-2476-430a-89cd-3fedc643d821') ORDER BY rtf_AOLevel; 


Thanks for attention!

Source: https://habr.com/ru/post/316314/


All Articles