Housing prices are shaped by many factors, chief among them proximity to the city center and the availability of various infrastructure. But real prices can only be found in paper newspapers and on real estate websites. Let's build our own map of real estate prices in Moscow using Python, the Yandex API and matplotlib; the full report from the scene is under the cut.
Hypothesis
As someone who does not live in Moscow, I would characterize Moscow prices as follows:
- very expensive - inside the Garden Ring
- expensive - from the Garden Ring to the Third Transport Ring (TTK)
- not very expensive - between the TTK and the MKAD, with the price decreasing linearly towards the MKAD
- cheap - beyond the MKAD (Moscow Ring Road)
The map will contain local maxima and minima caused by proximity to important facilities or industrial zones. There will also be a price discontinuity across the MKAD, since this ring largely coincides with the city's administrative boundary.
Hundreds of lines of magnificent and not-so-magnificent Python code are available via the link at the end of the article.
For this research I took data from two real estate websites for this past summer. The sample contained a total of 24,000 records of new-build and resale housing; multiple ads with the same address were averaged by price.
The ads were parsed by the scripts and stored in an SQLite database in the following format:
, , ..
About the web spiders
Yes, due to a lack of knowledge, no third-party libraries were used, which led to two separate scripts, one per site, pulling the addresses, floor areas and prices of apartments. Addresses magically turned into coordinates via the Google Geocoder API. But because of its rather low usage quota, I had to run the script every day for a week. The Yandex geocoder's free quota is 10 times larger.
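The exact record format is not preserved above, so here is a minimal sketch of the storage and averaging steps using Python's built-in `sqlite3`. The table name `ads` and its columns (`address`, `lon`, `lat`, `area_m2`, `price_rub`) are hypothetical, and averaging the price per square meter (rather than the total price) per address is an assumption about what "averaged by price" means.

```python
import sqlite3

# Hypothetical schema: column names and types are assumptions, not the
# article's actual format.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ads (
    address   TEXT NOT NULL,
    lon       REAL,
    lat       REAL,
    area_m2   REAL NOT NULL,
    price_rub REAL NOT NULL
)
"""


def store_ads(conn, ads):
    """Insert parsed ads given as (address, lon, lat, area_m2, price_rub) tuples."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO ads VALUES (?, ?, ?, ?, ?)", ads)
    conn.commit()


def price_per_m2_by_address(conn):
    """Average the price per square meter over all ads sharing one address."""
    rows = conn.execute(
        """
        SELECT address, AVG(lon), AVG(lat), AVG(price_rub / area_m2)
        FROM ads
        GROUP BY address
        ORDER BY address
        """
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_ads(conn, [
        ("Tverskaya 1", 37.61, 55.76, 50.0, 15_000_000.0),
        ("Tverskaya 1", 37.61, 55.76, 100.0, 40_000_000.0),
        ("Lenina 5", 37.80, 55.65, 40.0, 6_000_000.0),
    ])
    for row in price_per_m2_by_address(conn):
        print(row)
```

The sample addresses and prices are made up; with them, "Tverskaya 1" averages to 350,000 rubles per square meter and "Lenina 5" to 150,000.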
Building the function
To extend the function to the whole plane, it must be interpolated from the existing points. The LinearNDInterpolator class from the scipy.interpolate module is suitable for this. All you need is Python with the set of scientific libraries known as SciPy. When the data is very heterogeneous, it is almost impossible to find a plausible smooth function on the plane, so LinearNDInterpolator uses Delaunay triangulation instead, splitting the plane into many triangles and interpolating linearly inside each one.
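A minimal sketch of this step with `scipy.interpolate.LinearNDInterpolator`: the five sample points and their prices are hypothetical stand-ins for the averaged addresses from the database. The grid resolution is also an arbitrary choice.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Hypothetical sample: (longitude, latitude) of averaged addresses and their
# price per square meter in rubles.
points = np.array([
    [37.50, 55.70],
    [37.70, 55.70],
    [37.50, 55.80],
    [37.70, 55.80],
    [37.60, 55.75],
])
prices = np.array([200_000.0, 180_000.0, 220_000.0, 190_000.0, 400_000.0])

# The constructor builds a Delaunay triangulation of the points; evaluation
# interpolates linearly inside the triangle containing each query point and
# returns NaN outside the convex hull of the data.
interp = LinearNDInterpolator(points, prices)

# Evaluate on a regular lon/lat grid suitable for plotting.
lon = np.linspace(37.50, 37.70, 201)
lat = np.linspace(55.70, 55.80, 101)
xx, yy = np.meshgrid(lon, lat)
zz = interp(xx, yy)  # shape (len(lat), len(lon))
```

The resulting `zz` can be passed directly to matplotlib, for example `plt.contourf(xx, yy, zz)`.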
An important factor to consider when building the function is the spread of its values. Among the ads there are real monsters, with prices of more than 10 million rubles per square meter inside the Kremlin. They spoil the plot, and you see only a uniform field with one bright dot. For the plot to distinguish almost all of the data, such values need to be filtered out using a threshold chosen empirically. For a statistical model, these values carry no useful information.
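A minimal sketch of the filtering step, assuming the data is already split into coordinate and price arrays. The cap of 2 million rubles per square meter in the usage example is a hypothetical value; the article chose its threshold by hand and does not state it here.

```python
import numpy as np


def drop_outliers(points, prices, max_price_per_m2):
    """Keep only ads whose price per square meter is below an empirical cap."""
    points = np.asarray(points, dtype=float)
    prices = np.asarray(prices, dtype=float)
    keep = prices < max_price_per_m2  # boolean mask over the ads
    return points[keep], prices[keep]


if __name__ == "__main__":
    pts = [[37.60, 55.75], [37.62, 55.75], [37.617, 55.752]]
    pr = [250_000, 300_000, 12_000_000]  # the last one is a "Kremlin monster"
    pts_f, pr_f = drop_outliers(pts, pr, max_price_per_m2=2_000_000)
    print(pr_f)
```

A percentile such as `np.percentile(prices, 99)` could replace the hand-picked cap, but that is an alternative, not what the article describes.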
Meanwhile, the raw interpolation result looks like gradient hell.
To get an easy-to-read map, the resulting values need to be split into discrete levels. After that, the map looks like a page from a 7th-grade atlas.
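One way to split a continuous price grid into discrete levels is sketched below with equal-width bins and `np.digitize`. The equal-width binning and the function name `discretize` are assumptions for illustration; the article's own levels come from the companding formula discussed next.

```python
import numpy as np


def discretize(zz, n_levels):
    """Map a continuous price grid onto n_levels equal-width integer levels.

    NaN cells (outside the interpolation hull) stay NaN so they can be left
    blank on the map.
    """
    zz = np.asarray(zz, dtype=float)
    lo, hi = np.nanmin(zz), np.nanmax(zz)
    # Inner bin edges between lo and hi; np.digitize then yields 0..n_levels-1
    # (the maximum value falls into the top level).
    edges = np.linspace(lo, hi, n_levels + 1)[1:-1]
    levels = np.digitize(zz, edges).astype(float)
    levels[np.isnan(zz)] = np.nan
    return levels


if __name__ == "__main__":
    print(discretize([[100, 150, 200], [250, 300, np.nan]], 4))
```

With matplotlib, a similar effect can be had by passing an explicit `levels` sequence to `contourf`.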
About discretization on the map
Depending on whether we want to see the overall picture of prices or fluctuations near the average value, it is necessary to apply data companding, i.e. distributing the data more uniformly across the value scale by compressing large values and expanding small ones. In code, it looks like this:
zz = (2 * (0.956657 * np.log(zz) - 10.6288)).astype(int)  # element-wise log companding, truncated to integer levels
The functions were selected empirically by fitting them through 3–4 points on WolframAlpha.
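The same kind of fit can be done locally. The sketch below is a NumPy stand-in for the WolframAlpha step, using `np.polyfit` on `ln(price)` to get `level ≈ a * ln(price) + b`. The four anchor points (each doubling of price moves up one level) are hypothetical.

```python
import numpy as np


def fit_log_companding(prices, levels):
    """Least-squares fit of level ≈ a * ln(price) + b through a few anchor points."""
    a, b = np.polyfit(np.log(prices), levels, deg=1)
    return a, b


def compand(zz, a, b):
    """Apply the fitted curve element-wise and truncate to integer levels."""
    return (a * np.log(zz) + b).astype(int)


if __name__ == "__main__":
    # Hypothetical anchors: 100k rubles/m2 -> level 0, 200k -> 1, 400k -> 2, 800k -> 3.
    a, b = fit_log_companding([100_000, 200_000, 400_000, 800_000], [0, 1, 2, 3])
    print(a, b)
    print(compand(np.array([150_000.0, 500_000.0]), a, b))
```

With these anchors, `a` comes out as `1 / ln 2` and the two sample prices land on levels 0 and 2.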
It is worth noting that linear interpolation cannot compute values outside the convex hull of the data points. So at a sufficiently large scale we will see a polygon with very many sides. The scale must be chosen so that the plotted area lies entirely inside that polygon.
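A simple way to pick such a scale is to shrink the view until the interpolator returns no NaN on it. The sketch below assumes a square view in degrees centered on a chosen point; the helper names and the sampling density are hypothetical choices, not the article's procedure.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator


def is_fully_covered(interp, lon_range, lat_range, n=100):
    """True if the interpolator returns finite values on the whole view rectangle."""
    lon = np.linspace(*lon_range, n)
    lat = np.linspace(*lat_range, n)
    xx, yy = np.meshgrid(lon, lat)
    return bool(np.all(np.isfinite(interp(xx, yy))))


def largest_centered_view(interp, center, max_half_span, steps=50):
    """Shrink a square view around `center` until it has no NaN gaps."""
    for half in np.linspace(max_half_span, 0, steps, endpoint=False):
        lon_range = (center[0] - half, center[0] + half)
        lat_range = (center[1] - half, center[1] + half)
        if is_fully_covered(interp, lon_range, lat_range):
            return lon_range, lat_range
    return None  # no non-empty view fits inside the hull


if __name__ == "__main__":
    pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
    interp = LinearNDInterpolator(pts, [1.0, 2.0, 3.0, 4.0])
    print(largest_centered_view(interp, (0.5, 0.5), 1.0))
```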
A map of low- and high-price areas offers another view of the statistics. By dynamically moving the threshold that separates low prices from high ones, we can watch how the price picture changes. The price value at each point no longer matters; only the density of the points in each group contributes.
The calculation is similar to computing a gravitational field at a point. As an optimization, only the points that actually contribute to the final field value are taken into account. After the calculation, the result resembles a spray.
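A minimal sketch of such a field, under stated assumptions: each ad above the threshold contributes `+1/r²` and each ad below it `-1/r²`, only ads within a cutoff `radius` of a grid node are summed (using `scipy.spatial.cKDTree.query_ball_point`), and a small softening term `eps` avoids division by zero. The exponent, cutoff and softening are hypothetical choices; distances are computed directly in whatever units the coordinates use.

```python
import numpy as np
from scipy.spatial import cKDTree


def price_field(points, prices, threshold, grid_x, grid_y, radius, eps=1e-4):
    """Signed 'gravity' field: expensive ads add +1/r^2, cheap ones add -1/r^2.

    Only ads within `radius` of a grid node are summed; `eps` softens the
    1/r^2 singularity when a node coincides with an ad.
    """
    points = np.asarray(points, dtype=float)
    sign = np.where(np.asarray(prices) >= threshold, 1.0, -1.0)
    tree = cKDTree(points)
    xx, yy = np.meshgrid(grid_x, grid_y)
    nodes = np.column_stack([xx.ravel(), yy.ravel()])
    field = np.zeros(len(nodes))
    # query_ball_point returns, for each node, the indices of nearby ads.
    for i, neighbours in enumerate(tree.query_ball_point(nodes, r=radius)):
        if not neighbours:
            continue
        d2 = np.sum((points[neighbours] - nodes[i]) ** 2, axis=1) + eps ** 2
        field[i] = np.sum(sign[neighbours] / d2)
    return field.reshape(xx.shape)


if __name__ == "__main__":
    pts = [[0.0, 0.0], [1.0, 0.0]]
    prices = [500_000, 100_000]  # one expensive, one cheap ad
    print(price_field(pts, prices, 300_000, [0.1, 0.9], [0.0], radius=2.0))
```

A positive value marks a node where the "expensive" group dominates; moving `threshold` and recomputing gives the dynamic picture described above.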
What transformation is this?
A strictly computed field plot shows scattered points where the "expensive" field locally prevails over the "inexpensive" one and vice versa. These points act as noise and spoil the plot. They can be removed, for example, by applying a median filter with a sufficiently large window to the image. For this, I used the command-line interface of IrfanView.
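The article used IrfanView for this; as a swapped-in alternative, Pillow's `ImageFilter.MedianFilter` applies the same kind of median filter from Python. The window size of 9 pixels is a hypothetical default and must be odd.

```python
from PIL import Image, ImageFilter


def despeckle(img, size=9):
    """Remove isolated specks from a rendered field image with a median filter."""
    return img.filter(ImageFilter.MedianFilter(size=size))


if __name__ == "__main__":
    # A white image with a single black speck in the middle.
    img = Image.new("L", (20, 20), 255)
    img.putpixel((10, 10), 0)
    print(despeckle(img, size=3).getpixel((10, 10)))
```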
Visualization
Let's combine the images with a schematic map of Moscow. The Yandex API lets you fetch a map by coordinates, specifying its angular extent in longitude and latitude as well as the desired image size.
Request example:
static-maps.yandex.ru/1.x/?ll=37.5946002,55.7622764&spn=0.25,0.25&size=400,400&l=map
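The request can be assembled programmatically with the standard library. This sketch only rebuilds the query string from the example (parameters `ll`, `spn`, `size`, `l`); the function name is hypothetical, and the host/path is copied from the example without a scheme, so prepend one before fetching.

```python
from urllib.parse import urlencode

STATIC_MAPS_BASE = "static-maps.yandex.ru/1.x/"  # host/path as in the request example


def static_map_url(lon, lat, span_lon, span_lat, width, height, layer="map"):
    """Build a Static Maps request with the same parameters as the example."""
    params = {
        "ll": f"{lon},{lat}",            # map center: longitude,latitude
        "spn": f"{span_lon},{span_lat}",  # angular extent in degrees
        "size": f"{width},{height}",      # image size in pixels
        "l": layer,                       # map layer
    }
    # safe="," keeps the commas literal, matching the hand-written example.
    return STATIC_MAPS_BASE + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    print(static_map_url(37.5946002, 55.7622764, 0.25, 0.25, 400, 400))
```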
The only problem is that the specified angular dimensions define not the borders of the visible area but its guaranteed minimum size. This means we get a picture with angular dimensions >= 0.25. I found no way to control the boundaries of the visible coordinates, so they were determined manually.
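If the zoom level of the returned tile is known, the visible bounds can be estimated instead of searched for by hand. The sketch below assumes spherical Web Mercator with 256-pixel tiles; Yandex uses an ellipsoidal Mercator projection, so the latitudes it returns are only approximate, and how to learn the zoom level the server picked is left open here (it is an input to the function).

```python
import math

TILE = 256  # assumed tile size in pixels


def visible_bounds(lon, lat, zoom, width, height):
    """Approximate (west, south, east, north) of a width x height map centered at (lon, lat).

    Assumes spherical Web Mercator; latitudes are approximate for Yandex maps.
    """
    world = TILE * 2 ** zoom  # world width in pixels at this zoom
    cx = (lon + 180.0) / 360.0 * world
    siny = math.sin(math.radians(lat))
    cy = (0.5 - math.log((1 + siny) / (1 - siny)) / (4 * math.pi)) * world

    def to_lonlat(px, py):
        # Inverse Mercator: x is linear in longitude, y maps through sinh/atan.
        lon_ = px / world * 360.0 - 180.0
        lat_ = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * py / world))))
        return lon_, lat_

    west, north = to_lonlat(cx - width / 2, cy - height / 2)
    east, south = to_lonlat(cx + width / 2, cy + height / 2)
    return west, south, east, north


if __name__ == "__main__":
    print(visible_bounds(37.5946002, 55.7622764, 10, 400, 400))
```

Because longitude is linear in pixels, the east-west extent is exact for a given zoom (`width * 360 / (256 * 2**zoom)` degrees); only the north-south extent depends on the projection details.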
About fitting the maps together
You can align the maps relative to each other using Yandex markers: draw points with given coordinates on your own map and request a Yandex map with markers at the same coordinates.
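Drawing the reference points on the price image needs a mapping from coordinates to pixels. This sketch assumes the price image was rendered with linear longitude/latitude axes over known bounds (as matplotlib does with a fixed `extent`); the helper names, marker radius and color are hypothetical.

```python
from PIL import Image, ImageDraw


def lonlat_to_pixel(lon, lat, bounds, size):
    """Linear mapping from (lon, lat) to (x, y) pixels for an image with given bounds.

    bounds = (west, south, east, north); y grows downwards, as in PIL.
    """
    west, south, east, north = bounds
    width, height = size
    x = (lon - west) / (east - west) * (width - 1)
    y = (north - lat) / (north - south) * (height - 1)
    return round(x), round(y)


def draw_markers(img, coords, bounds, radius=3, color=(255, 0, 0, 255)):
    """Draw small dots at reference coordinates to compare against Yandex markers."""
    draw = ImageDraw.Draw(img)
    for lon, lat in coords:
        x, y = lonlat_to_pixel(lon, lat, bounds, img.size)
        draw.ellipse([x - radius, y - radius, x + radius, y + radius], fill=color)
    return img


if __name__ == "__main__":
    img = Image.new("RGBA", (101, 101), (0, 0, 0, 0))
    draw_markers(img, [(37.5, 55.5)], bounds=(37.0, 55.0, 38.0, 56.0))
    print(img.getpixel((50, 50)))
```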
With a couple of calls to the PIL library, the images are blended with a transparency level comfortable for viewing:
from PIL import Image
map_img = Image.open(map_img_name, 'r').convert('RGBA')
price_img = Image.open(prices_img_name, 'r').convert('RGBA')
if price_img.size == map_img.size:
    result_img = Image.blend(map_img, price_img, 0.5)  # 50/50 mix of map and price layer
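The snippet above silently skips blending when the two images differ in size. A variant that resizes the price layer first is sketched below; it assumes both images already cover the same geographic area, so stretching one onto the other is valid. The function name `overlay` is hypothetical.

```python
from PIL import Image


def overlay(map_img, price_img, alpha=0.5):
    """Blend the price layer over the map, resizing it first if the sizes differ."""
    map_rgba = map_img.convert("RGBA")
    price_rgba = price_img.convert("RGBA")
    if price_rgba.size != map_rgba.size:
        price_rgba = price_rgba.resize(map_rgba.size)
    # Image.blend computes map * (1 - alpha) + price * alpha per channel.
    return Image.blend(map_rgba, price_rgba, alpha)


if __name__ == "__main__":
    red = Image.new("RGBA", (4, 4), (255, 0, 0, 255))
    blue = Image.new("RGBA", (2, 2), (0, 0, 255, 255))
    print(overlay(red, blue).getpixel((1, 1)))
```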
Results
Three images with different levels of companding, plus an animation of the field version.




Some analytics:
Overall, as the hypothesis predicted, housing prices are highest inside the Garden Ring and the TTK and decrease with distance from the center. However, inside the MKAD, prices stay at the average level in the western and south-western parts. Outside the MKAD, as well as in the eastern part of the TTK, prices are below average.
In detail, things are much more interesting; let's note the main areas:
- Luzhniki and the Sparrow Hills: the area looks rather expensive, but there is practically no residential real estate on the Sparrow Hills themselves; the values there were interpolated from the boundary points above and below.
- Residential areas near the MSU Fundamental Library, and the high-rise buildings, both completed and under construction, near the Mosfilmovsky pond, are more expensive, presumably due to active construction and extensive forest-park areas. The high price around the Memorial Synagogue and the Anna German Square is likewise explained by the surrounding prices and the location among parks and nature reserves.
- Housing in the area between the Krylatskoye metro station and Marshal Zhukov Avenue is also expensive.
- Despite being outside the MKAD and close to a cemetery, the houses along General Beloborodov Street stand out for their high prices.
As the maps show, the hypothesis is fully confirmed in practice: the combined effect of infrastructure, distance to the center and proximity to well-known Moscow landmarks is revealed by linear interpolation over the coordinates.
Unfortunately, much of the work is not automated, but if Habr readers find the article interesting, I will build similar maps for the other large cities of our country.
The code of the web spiders, the program itself and the databases used are available on GitHub.