
Hi, Habr!
At HumanFactorLabs we parse addresses on a particularly large scale. Our products make it
easy to enter contact data and work with it.
Over 10 years of work, after analyzing numerous exceptions in Russian addresses, we have worked out rules for storing addresses that keep you from losing important information.
Recently on Habr we were asked to give examples of unusual addresses, which is why this article was written.
The house number is not a number, but a string
Let's start with my favorite place, the city of Elektrostal in the Moscow region. Like any self-respecting city, it has a Lenin Avenue. Soviet times are over, but the avenue keeps growing and developing. Recently a new building went up there.

Usually, new houses are assigned numbers that continue the existing sequence. If construction takes place at the beginning of the street, where it would be illogical to put house 36 next to house 1, a new street is simply started.
On this avenue, the new houses were planned at its beginning. However, instead of creating a new street, the city decided to extend Lenin Avenue and prefix the numbers of the new houses with a 0.
That is, the addresses
Elektrostal pr Lenina 4 and
Elektrostal pr Lenina 04 are two different addresses.
Unfortunately, this is not the only such case in Russia.
Conclusion: store the house number as a string, so as not to lose the leading zero.
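A minimal sketch of why a numeric column is a bad fit, using the two Elektrostal house numbers and the "L-1" number mentioned in this section. The helper name `to_int_or_none` is hypothetical and exists only for this illustration.

```python
# The two house numbers on Lenin Avenue from the example above.
house_a = "4"
house_b = "04"

# Stored as strings, the two houses stay distinct.
strings_differ = house_a != house_b

# Converted to integers, the leading zero is lost and the houses collide.
ints_collide = int(house_a) == int(house_b)


def to_int_or_none(house: str):
    """Hypothetical helper: try to squeeze a house number into an int."""
    try:
        return int(house)
    except ValueError:
        # Numbers such as "L-1" cannot be stored in a numeric field at all.
        return None


print(strings_differ, ints_collide, to_int_or_none("L-1"))
```

With a string field both problems disappear: "04" keeps its zero, and "L-1" is stored exactly as the client entered it.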
By the way, we recently received this address from a client:
675018, Amurskaya, Mokhovaya Pad n, house L-1. An unusual house, isn't it? We did not find it on the maps, but it is in FIAS. We are still figuring out whether a house number with a leading letter is valid, but the house quite likely exists, since it was given by a real client.
Zip code is important
It happens that two streets in the same city share a name. For example, Moscow has two streets named 8 Marta (March 8). You can tell them apart only by zip code.
Or, for example,
Russia, Arkhangelsk region, Ustyansky district, the village of Berezhnaya is found in three places on Yandex Maps. If you do not know the zip code, the letter will not arrive.
Conclusion: save the zip code of the object.
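As a rough illustration, assume street records are looked up by a composite key. The zip codes below are placeholders, not the real codes of the two Moscow streets, and `index_by` is a hypothetical helper written for this sketch.

```python
from collections import defaultdict

# Placeholder zip codes: they only stand in for "two different codes".
streets = [
    {"city": "Moscow", "street": "8 Marta", "zip_code": "000001"},
    {"city": "Moscow", "street": "8 Marta", "zip_code": "000002"},
]


def index_by(records, *fields):
    """Group records by the values of the given fields."""
    index = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in fields)
        index[key].append(record)
    return index


# Without the zip code, both streets land under the same key.
by_name = index_by(streets, "city", "street")

# With the zip code in the key, every street is found unambiguously.
by_name_and_zip = index_by(streets, "city", "street", "zip_code")

print(len(by_name[("Moscow", "8 Marta")]), len(by_name_and_zip))
```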
You can't do without the type
Zip codes are important, but not always sufficient to identify a street or even a settlement. Different settlements can share the same zip code:
- Russia, Zabaikalsky Territory, Aginsky district, Aginskoye settlement, Olympic street, zip code: 687000
- Russia, Zabaikalsky Territory, Aginsky district, village of Amitkhasha, Olympic street, zip code: 687000
That is, it is extremely important to keep the type of settlement.
The same applies to street types: if you type "Pushkinskaya Moscow" into Yandex and click "Find", Yandex will show Pushkinskaya Embankment, although there is also a street and a square with that name.
Conclusion: save the type of the settlement and the type of the street.
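A small sketch of the Pushkinskaya case, assuming the type is kept in its own field. The `Street` class and its field names are hypothetical; the same idea applies to a settlement type field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Street:
    city: str
    street_type: str  # "street", "embankment", "square", ...
    street_name: str


pushkinskaya = [
    Street("Moscow", "street", "Pushkinskaya"),
    Street("Moscow", "embankment", "Pushkinskaya"),
    Street("Moscow", "square", "Pushkinskaya"),
]

# If the type is dropped, three different objects collapse into one key.
names_only = {(s.city, s.street_name) for s in pushkinskaya}

# If the type is stored, all three remain distinguishable.
with_types = set(pushkinskaya)

print(len(names_only), len(with_types))
```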

There are addresses without a street. And even without a house.
Sometimes we come across addresses without a street: rarely in cities, more often in smaller settlements. The address “Altai, town of Shebalino, house 2” really exists (and they make delicious cakes there).
It also happens that a house has no number, only a building. For example, this is how people live in Zelenograd and in the Suponevo neighborhood of Zvenigorod:
Zvenigorod, Suponevo, building 1.
Conclusion: if you check for empty values when saving an address to the database, then:
- allow saving an address without a street;
- allow saving an address without a house number if a building is specified.
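The two rules above could be sketched as a validation function like the one below. The `Address` class, its field names, and the assumption that the settlement is always required are illustrative, not taken from any real schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Address:
    settlement: str
    street: Optional[str] = None
    house: Optional[str] = None
    building: Optional[str] = None


def validate(address: Address) -> List[str]:
    """Return a list of problems; an empty list means the address can be saved."""
    errors = []
    if not address.settlement:
        # Assumption of this sketch: a settlement is always required.
        errors.append("settlement is required")
    # The street is deliberately not checked: addresses without one exist.
    if not address.house and not address.building:
        errors.append("either a house number or a building is required")
    return errors


# Both examples from the text pass validation.
print(validate(Address("Shebalino", house="2")))
print(validate(Address("Zvenigorod, Suponevo", building="1")))
```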
Take care of the letters
The house number identifies a separate building. Letters identify the structures, extensions, and so on located on the territory of that house. For example, if a house has number 4, then its extension, its basement, its fence, a separate building, and a switchboard room in its own building each get their own designation appended to the 4 (the fence, for instance, is 4I).
The details, in boring official language, if you want proof:
In connection with questions addressed to the Office about the rules for using symbols when lettering buildings and structures, we explain the rules adopted in technical inventory.
All individual elements of a site (buildings, structures, sidewalks, paving, etc.) must be lettered.
Main buildings and structures are lettered with capital letters of the Russian alphabet: A, B, V, etc. (except the letter G, which is reserved for auxiliary buildings and structures).
Extensions, basements, mezzanines, attics, etc. are lettered with the letter of the main structure they belong to, followed by their ordinal number: A1, A2 or B2, B4, etc.
Unheated extensions are lettered with lowercase letters of the Russian alphabet matching the letter of the main structure: a1, a2 or b1, b2, etc.
Auxiliary buildings and structures are lettered with a capital G followed by their ordinal number in the inventory: G1, G2, etc.
Gates, fences and yard paving are lettered with Roman numerals: I, II, III, etc.
It is possible that the number of buildings and structures on one land plot exceeds the number of letters in the Russian alphabet. The existing lettering rules do not cover this case. Taking the requirements of the Instruction into account, we consider it necessary in this case to use combinations of two letters of the Russian alphabet, for example AB, AV ... AYa, BV, BG ... BYa, etc.
According to the Instruction on the procedure for ordering the numbering of buildings in Leningrad and its administratively subordinate suburbs, approved by the Head of the Technical Inventory Bureau of the Lensovet on September 12, 1974, a household is assigned a single number regardless of how many main buildings stand on it. The position of a building within the household is identified by its letter. A special case of such a household is a group of buildings united by a single land plot, for example the territory of an industrial enterprise.
When an independent land plot is carved out of a previously formed one and needs its own address, a building-number system is used: in addition to the main house number, the building is assigned a building number, for example building 1. not necessary.
(see "Instructions on the accounting of housing in the Russian Federation")

The lettering rules apply throughout Russia, but they are especially loved in St. Petersburg. An address containing several letters is a normal situation there:
St. Petersburg, ul. Markina, d. 16B, liter A.
(Screenshot of St. Petersburg from the maps of DoubleGIS, the city information guide)
Some letters are easily confused with digits: the Cyrillic letter З looks like the digit 3 (how would you read the address
Moscow Zvezdny 23 З?), and a handwritten Ч can be mistaken for a four. The letters Й and Я look like ordinal endings (is "house 4й" house 4 with the letter Й, or the fourth house?).
You can separate the house number and the letter with the word “liter” so that addresses like the one on Zvezdny Boulevard are easier to read. For example:
Moscow Zvezdny 23 liter З.
Conclusions:
- Do not strip the letters from house numbers.
- Allocate a few characters for storing the letter (we store three).
- Separate the house number and the letter.
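One possible way to follow these three rules is sketched below. It assumes a simple input shape: digits, an optional word “liter”, then a letter designation of up to three characters. The names `HouseNumber`, `split_house`, and `format_house` are hypothetical. Numbers with a leading letter, such as L-1, are outside the scope of this sketch and are rejected.

```python
import re
from typing import NamedTuple, Optional


class HouseNumber(NamedTuple):
    number: str            # kept as a string so that "04" survives
    letter: Optional[str]  # up to three characters, or None


# Digits, optional "liter", then a letter followed by up to two more
# letters or digits. Unicode-aware, so Cyrillic letters match too.
_HOUSE_RE = re.compile(
    r"^\s*(\d+)\s*(?:liter\s+)?([^\W\d_][^\W_]{0,2})?\s*$",
    re.IGNORECASE,
)


def split_house(raw: str) -> HouseNumber:
    """Split a raw house string into the number and the letter."""
    match = _HOUSE_RE.match(raw)
    if match is None:
        raise ValueError(f"unrecognized house number: {raw!r}")
    return HouseNumber(match.group(1), match.group(2))


def format_house(house: HouseNumber) -> str:
    """Render the house with an explicit 'liter' separator."""
    if house.letter is None:
        return house.number
    return f"{house.number} liter {house.letter}"


print(format_house(split_house("23Z")))  # the letter is no longer glued to the digits
print(split_house("04"))                 # the leading zero is preserved
```

Storing the two parts separately means the display format can be changed later without reparsing the original input.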
That's all. As an example, look at the structure of the response returned by the
Dadata.ru API. With this address storage structure, the problems described above should not affect you.