
Government agencies publish directories online that hold dozens of gigabytes of information. If you know where to look, you can legally collect data on apartments on an industrial scale.
Databases with postcodes and city districts are also open. As a bonus, I will explain how to find these parts of an address when they are missing.
All directories in this article are free and openly available on the Internet. No mysterious hackers stole anything from the FSB.
Does the housing exist
When you send a letter or a pizza to a customer, it helps to check that the apartment exists. It's a shame to pay for a letter that never arrives or to send a courier for nothing.
The Central Election Commission (CEC) database is an easy-to-understand directory that can be used to check whether an apartment exists.
The CEC database is meant to let every citizen find their polling station. What matters for us is that it holds addresses down to the apartment.
The online version of the CEC database resembles "Explorer" in older versions of Windows.
The database has a flaw: it only stores apartments where Russian citizens with the right to vote are registered.
Another disadvantage is crawling speed. The database is hosted on very slow servers, so downloading the data takes weeks: a large city, without its surrounding region, takes 24 hours to load. On top of that, you can launch the crawler, come back the next day and find that the collection was interrupted because the structure of the database changed.
The FIAS state address register, which we wrote about in other articles, looks like the ideal alternative. According to the documentation, it has addresses down to the apartment, and it is also available in .dbf. There is one problem: the table of apartments in FIAS is almost empty. According to FIAS, there are about 70,000 apartments in Moscow. The number of residential buildings is only slightly smaller.
The data of the State Cadastral Evaluation Fund is far better suited than FIAS for checking apartments. We will cover this directory in the "Area" section.
Postcode
The recipient's postcode must be checked before any shipment, otherwise the parcel may go astray. The deputy director of the Tyumen branch of the Post laments: “Almost 30% <...> of shipments, which is about 6 million units, come with an incorrect postcode, and we either return the correspondence or it goes to another address ...”.
You can find or check a postcode in the Russian Post database, which is published as a .dbf file at
http://vinfo.russianpost.ru/database/ops.html. The data is updated twice a month.
The Post database has no street-level or house-level detail, only the settlement and its list of postcodes. For us this is enough: if the postcode is correct down to the settlement, forwarding the parcel to the right office delays delivery by only 2-3 days.
For example, a client left the address "683000, Nizhny Novgorod, ul ...". The Post database says that there is no such postcode in Nizhny Novgorod, and that the 683xxx offices are in Petropavlovsk-Kamchatsky. A client is more likely to mistype digits than the name of the city, so a good solution is to load the Nizhny Novgorod postcodes from the Post database and substitute the smallest one, 603000. That way the parcel arrives in the right city without traveling around the country for months.
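The substitution rule from this example can be sketched in a few lines of Python. This is only an illustration: the sample rows, the tuple layout and the function names are assumptions, and the real .dbf file from the Post has its own field names.

```python
from collections import defaultdict

# Hypothetical sample rows: (postcode, settlement).
# The layout is an assumption; the real Post .dbf has its own fields.
SAMPLE_OFFICES = [
    ("603000", "Nizhny Novgorod"),
    ("603001", "Nizhny Novgorod"),
    ("603950", "Nizhny Novgorod"),
    ("683000", "Petropavlovsk-Kamchatsky"),
    ("683009", "Petropavlovsk-Kamchatsky"),
]


def build_index(offices):
    """Group postcodes by settlement name."""
    by_city = defaultdict(set)
    for postcode, city in offices:
        by_city[city].add(postcode)
    return by_city


def fix_postcode(postcode, city, by_city):
    """Keep the postcode if it belongs to the city, else use the city's smallest one.

    Returns None when the city is unknown, so the address can go to manual review.
    """
    known = by_city.get(city)
    if not known:
        return None
    if postcode in known:
        return postcode
    return min(known)


by_city = build_index(SAMPLE_OFFICES)
print(fix_postcode("683000", "Nizhny Novgorod", by_city))  # wrong city, gets replaced
print(fix_postcode("603001", "Nizhny Novgorod", by_city))  # already valid, kept
```

Returning None for an unknown settlement is a design choice of this sketch: guessing a postcode when the city itself is not found would only hide the error.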
Envek has already written about how to connect the Russian Post postcode database; you will find the details in his article.
City district
Realtors need districts to group apartments for sale and rent. It is also convenient to tie the cost of delivery within a city to the district.
The district is rarely given in an address: it is simply not customary. Even so, knowing which district a building is in is sometimes useful.
It is convenient to look up the district in the CEC database, which we covered in the first section. It has houses and apartments, so the result is accurate. The problems have not gone away either: the database lacks many apartments and houses, and crawling it takes weeks.
A quick way to find a district is OKATO (All-Russian Classifier of Administrative-Territorial Division Objects). This colossal document describes the administrative division of Russia from the federal subject down to the street. There are no houses in the directory.
OKATO is available as a ready-made database on the GNIVTS FTS website. For those who like browsing, there is also a text version in ConsultantPlus, which also describes the structure of the directory in detail.
Starting from the districts of the Altai Territory, OKATO describes the entire administrative structure of the country.
Parsing the directory yields a database of intracity districts with their subordinate streets.
The downside of OKATO is that a query cannot be refined by house number. If a street runs through several districts, you will not get an unambiguous result. In such cases the CEC database works better.
FIAS, the country's main address directory, does not suit our purposes at all: districts are not filled in even for Moscow.
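The ambiguity described here is easy to see in a small lookup sketch. It assumes OKATO has already been parsed into (city, street, district) rows; the rows, names and functions below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical rows, as a parsed OKATO might yield them: (city, street, district).
ROWS = [
    ("Example City", "Lenina", "Central"),
    ("Example City", "Lenina", "Northern"),
    ("Example City", "Sadovaya", "Central"),
]


def build_street_index(rows):
    """Map (city, street) to the set of districts the street belongs to."""
    index = defaultdict(set)
    for city, street, district in rows:
        index[(city, street)].add(district)
    return index


def find_district(city, street, index):
    """Return the district only when it is unambiguous, otherwise None."""
    districts = index.get((city, street), set())
    if len(districts) == 1:
        return next(iter(districts))
    return None  # unknown street, or a street that spans several districts


index = build_street_index(ROWS)
print(find_district("Example City", "Sadovaya", index))  # one district, resolved
print(find_district("Example City", "Lenina", index))    # two districts, None
```

A None result is the signal to fall back to a source that knows house numbers, such as the CEC database.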
Area
Area is used to calculate the price of an apartment and to filter listings in realtors' databases.
Housing area is available in the open data of the State Cadastral Evaluation Fund. Besides area, the cadastral valuation database holds many interesting things: by the rules, every building and premises in Russia must have its cadastral value, materials, number of floors and so on assessed. Cadastral valuation data is also used to verify that an apartment exists.
Cadastral valuation databases can be downloaded from the Rosreestr website.
We are interested in the item "Reports on the determination of cadastral value".
Reports are split by region and have to be downloaded separately. Several reports are available for each region.
Apartment areas are stored in the report on premises, facilities, construction in progress and buildings.
Most of the reports are dated 2011–2014, but the state has no other data for us. We are grateful for that much.
Downloading the reports is a separate “pleasure”: the archive for the Republic of Mari El “weighs” only 1.5 gigabytes, yet it took five hours to download from the Rosreestr website.
Inside a Rosreestr archive are hundreds of XML files with the parameters of apartments and houses: areas, cadastral numbers, KLADR identifiers, OKATO codes, the materials the houses are built from.
Parsing a cadastral report is a special topic that deserves a full article. Someday we will write it, but for now there is a description of the XML schema of the reports on the Rosreestr website.
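Until that article exists, here is a minimal sketch of reading a large XML file as a stream rather than loading it whole. The element and attribute names (Flat, CadastralNumber, Area) and the sample values are invented for illustration; the real names must be taken from the Rosreestr schema description.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag and attribute names are invented,
# the real ones are defined by the Rosreestr XML schema.
SAMPLE = b"""<Report>
  <Flat CadastralNumber="00:00:0000000:1" Area="45.3"/>
  <Flat CadastralNumber="00:00:0000000:2" Area="61.0"/>
  <Building CadastralNumber="00:00:0000000:3" Floors="5"/>
</Report>"""


def iter_flats(stream):
    """Yield (cadastral_number, area) for each Flat element, one at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "Flat":
            yield elem.get("CadastralNumber"), float(elem.get("Area"))
            elem.clear()  # drop the element's attributes and children once used


flats = list(iter_flats(io.BytesIO(SAMPLE)))
print(flats)
```

For real reports, pass an opened file instead of `io.BytesIO`; streaming matters because each archive holds hundreds of files.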
Approximate cost
Banks use the cost of housing to assess the wealth of borrowers.
Once the area is known, only the price per square meter is missing to calculate the cost of housing. The price of a “square” in a given house is looked up on ad sites: irr.ru, avito.ru, cian.ru. An ad is suitable if it contains the address of the house, the price and the area.
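The arithmetic is simple: estimated cost = area × price per square meter. Below is a minimal sketch with made-up ads. Taking the median of the per-meter prices is an assumption of this sketch, since the article does not say how to aggregate several ads for one house.

```python
from statistics import median

# Hypothetical ads: (house address, price in rubles, area in square meters).
ADS = [
    ("Example St, 1", 3_000_000, 50.0),
    ("Example St, 1", 4_200_000, 60.0),
    ("Example St, 1", 2_600_000, 40.0),
]


def price_per_sqm(ads, address):
    """Median price per square meter over all ads for the given house."""
    rates = [price / area for addr, price, area in ads
             if addr == address and area > 0]
    return median(rates) if rates else None


def estimate_cost(ads, address, area):
    """Estimated cost = area * price per square meter, or None without ads."""
    rate = price_per_sqm(ads, address)
    return None if rate is None else rate * area


print(price_per_sqm(ADS, "Example St, 1"))        # 65000.0
print(estimate_cost(ADS, "Example St, 1", 45.0))  # 2925000.0
```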
There is no recipe for building the database quickly: ads have to be collected regularly, and the longer the better. This is the only way to reach any decent address coverage. Even so, after a few years there will still be blank spots in small cities, towns and villages.
To receive ads regularly, it is better to come to an agreement with the site: write to the owners and offer cooperation.
If the site refuses, all that remains is to crawl it. Be mentally prepared for ad sites to change their layout constantly, so the crawler will have to be rewritten before each launch.
What cannot be learned legally
Some data can be obtained only by breaking the law. Usually, sellers of illegal information simply lie, offering a set of unrelated tables under the guise of a secret Ministry of Internal Affairs database.
The public domain does not and cannot contain:
- a list of people registered in an apartment or house;
- passport data of residents;
- home phone number.
The most you can find out about a phone number is whether it matches the locality according to the numbering plan (available on the official Rossvyaz website in .csv).
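Such a check boils down to a range lookup. The sketch below assumes the plan is a semicolon-separated table of area code, range start, range end and region. The column names, the three-digit code split and the sample rows are all assumptions and need to be adjusted to the real Rossvyaz file.

```python
import csv
import io

# Hypothetical extract: column names, layout and ranges are made up.
SAMPLE_CSV = """code;from;to;region
831;2000000;2999999;Example region A
415;2200000;2499999;Example region B
"""


def load_plan(text):
    """Parse the plan into (code, range_start, range_end, region) tuples."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [(row["code"], int(row["from"]), int(row["to"]), row["region"])
            for row in reader]


def region_for(number, plan):
    """Find the region for a 10-digit national number given as a digit string."""
    code, rest = number[:3], int(number[3:])
    for plan_code, start, end, region in plan:
        if plan_code == code and start <= rest <= end:
            return region
    return None  # the number is not covered by any range in the plan


plan = load_plan(SAMPLE_CSV)
print(region_for("8312345678", plan))  # falls into the first sample range
print(region_for("8319999999", plan))  # outside every range, None
```

Comparing the returned region with the locality in the address tells you whether the phone number plausibly belongs there, and nothing more.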
Problems with open data
In the section on each directory I wrote “there is a problem”, “there is another quirk”, “it needs double-checking”. That is because official state registries are crooked.
For example, this is how our developer assessed the cadastral valuation database for the Krasnodar Territory: “The apartment number is not in the apartments field but in the name field. The city field contains many rural districts that cannot be parsed (about 40%). Out of 55885 apartment buildings, 25483 made it into the final sample.”
Bringing FIAS, the CEC database or a cadastral valuation report to a production-ready state is a task for a person with strong nerves and free time. We ate this salty porridge by the kilogram when we were building Dadata. To avoid fighting crooked directories, try our API.
One request to the Dadata API costs 10 kopecks, but we have already been through seven rounds of parsing the directories and keep track of everything the state puts into open access.