
Housing cost as a function of coordinates


Housing prices are shaped by many factors, chief among them proximity to the city center and the presence of nearby infrastructure. But real prices can only be found in printed newspapers and on real estate sites. Let us build our own map of real estate prices in Moscow with Python, the Yandex API and matplotlib; a special report from the scene is under the cut.

Hypothesis


As someone who does not live in Moscow, I picture the nature of prices there as follows:

The map will contain local maxima and minima caused by the proximity of important facilities or industrial zones. There will also be a jump in prices across the MKAD, since this ring road largely coincides with the administrative boundary of the city.

Hundreds of lines of Python code, some magnificent and some not so much, are available via the link at the end of the article.

For the study, I took data from two real estate sites for the summer of this year. The sample comprised a total of 24,000 records of new-build and resale housing; ads sharing the same address were averaged by price.
The ads were parsed by a script and stored in an SQLite database in the format:
, , ..
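The averaging of ads that share an address can be sketched as follows. The record layout (an address paired with a price) and the sample values are assumptions made for illustration; they are not the actual database schema.

```python
from collections import defaultdict

def average_by_address(ads):
    """Average the price of all ads that share one address.

    `ads` is an iterable of (address, price) pairs; this layout is hypothetical.
    """
    totals = defaultdict(lambda: [0.0, 0])  # address -> [price sum, ad count]
    for address, price in ads:
        totals[address][0] += price
        totals[address][1] += 1
    return {address: s / n for address, (s, n) in totals.items()}

# Made-up sample records, for illustration only
sample_ads = [
    ("Tverskaya 1", 300000.0),
    ("Tverskaya 1", 320000.0),
    ("Profsoyuznaya 10", 180000.0),
]
averaged = average_by_address(sample_ads)
```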


About web spiders
Yes, out of ignorance I used no third-party libraries, which led to two separate scripts, one for each site, extracting the addresses, floor areas and prices of apartments. The addresses were magically turned into coordinates via the Google Geocoder API. But because of its rather low usage quota, I had to run the script every day for a week. The free quota of the Yandex geocoder is 10 times more generous.


Building the function


To extend the function to the whole plane, it must be interpolated from the available points. The LinearNDInterpolator class from the scipy module is suitable for this; all you need is Python with the set of scientific libraries known as scipy. When the data are very heterogeneous, it is almost impossible to find a plausible function on the plane. LinearNDInterpolator uses a Delaunay triangulation, splitting the region covered by the points into many triangles.
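A minimal sketch of this step is shown below. The coordinates and prices are made up, and the 5x5 grid is only an assumption to keep the example small; a real map would use a much denser grid.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Hypothetical sample: (longitude, latitude) points and a price at each one
points = np.array([
    [37.50, 55.70],
    [37.70, 55.70],
    [37.50, 55.80],
    [37.70, 55.80],
    [37.60, 55.75],
])
prices = np.array([150.0, 160.0, 170.0, 180.0, 300.0])

# Builds a Delaunay triangulation of the points and interpolates
# linearly inside each triangle
interpolate = LinearNDInterpolator(points, prices)

# Evaluate the function on a regular grid covering the sample
xs = np.linspace(37.50, 37.70, 5)
ys = np.linspace(55.70, 55.80, 5)
xx, yy = np.meshgrid(xs, ys)
zz = interpolate(xx, yy)
```

The resulting `zz` array can then be passed to a matplotlib plotting call together with `xx` and `yy`.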

An important factor to consider when building the function is the spread of its values. Among the ads there are real monsters priced at more than 10 million rubles per square meter inside the Kremlin; they spoil the plot, leaving only a uniform field with one bright dot. For the plot to resolve almost all of the data, such values must be filtered out by an empirically chosen threshold. For a statistical model, these values carry no useful information.
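Such a filter might look like the sketch below. The function name, the sample records and the threshold of one million are all hypothetical; in practice the threshold is picked by looking at the data.

```python
import numpy as np

def drop_outliers(points, prices, max_price):
    """Keep only records whose price does not exceed `max_price`."""
    points = np.asarray(points)
    prices = np.asarray(prices)
    keep = prices <= max_price
    return points[keep], prices[keep]

# Made-up data: the third record is an extreme outlier
pts = np.array([[37.60, 55.75], [37.65, 55.76], [37.62, 55.75]])
vals = np.array([250_000.0, 310_000.0, 12_000_000.0])

MAX_PRICE = 1_000_000.0  # hypothetical threshold, chosen empirically in practice
pts_ok, vals_ok = drop_outliers(pts, vals, MAX_PRICE)
```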

Meanwhile, the result of the interpolation looks like gradient hell:



To get an easy-to-read map, the resulting values need to be distributed over discrete levels. After that, the map looks like a page from a seventh-grade atlas:
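One way to assign discrete levels is sketched here with `np.digitize`. The grid values and the band boundaries are invented for the example, and this is not necessarily how the maps in the article were produced.

```python
import numpy as np

# Hypothetical interpolated grid of prices and level boundaries
zz = np.array([[120.0, 180.0, 260.0],
               [150.0, 240.0, 410.0]])
boundaries = np.array([150.0, 200.0, 300.0])

# np.digitize returns, for each value, the index of the band it falls into:
# 0 for values below 150, 1 for [150, 200), 2 for [200, 300), 3 for 300 and above
levels = np.digitize(zz, boundaries)
```

With matplotlib, passing an explicit `levels` argument to `contourf` gives a similarly banded picture directly from the continuous grid.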



About discretization on the map
Depending on whether we want to see the overall picture of prices or the fluctuations near the average value, the data must be companded, i.e. spread more uniformly over the scale of values by reducing large values and increasing small ones. In code, it looks like this:
  # Requires: import numpy as np. Use ONE of the three companding levels.
  zz = (2 * (0.956657 * np.log(zz) - 10.6288)).astype(int)  # HARD
  # zz = (2 * (0.708516 * np.log(zz) - 7.12526)).astype(int)  # MEDIUM
  # zz = (2 * (0.568065 * np.log(zz) - 5.10212)).astype(int)  # LOW

The functions were fitted empirically through 3–4 points using WolframAlpha.
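A comparable fit can be obtained without WolframAlpha. The sketch below swaps in a least-squares fit with `np.polyfit`; the control points are hypothetical and the coefficients they produce differ from those in the article.

```python
import numpy as np

# Hypothetical control points: a price and the scale value it should map to
control_prices = np.array([100_000.0, 200_000.0, 400_000.0])
control_values = np.array([1.0, 2.0, 3.0])

# Least-squares fit of value = a * ln(price) + b
a, b = np.polyfit(np.log(control_prices), control_values, 1)

def compand(price):
    """Map a price to the compressed scale using the fitted curve."""
    return a * np.log(price) + b
```

Because the three control prices here double at each step, the fit is exact and `a` comes out as 1 / ln 2.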


It is worth noting that linear interpolation cannot compute values outside the boundary points. So, on a plot covering a sufficiently large area, we will see a polygon with very many sides. The extent of the plot must be chosen so that it lies entirely inside the resulting figure.
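This behaviour is easy to check: by default, LinearNDInterpolator returns NaN outside the convex hull of the sample. The sketch below uses a made-up triangle, and the helper `grid_is_covered` is a hypothetical name for testing whether a chosen plot extent fits inside the hull.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Hypothetical sample: three points forming a single triangle
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
values = np.array([1.0, 2.0, 3.0])
interpolate = LinearNDInterpolator(points, values)

inside = interpolate(0.25, 0.25)   # within the triangle: a finite value
outside = interpolate(1.0, 1.0)    # beyond the hull: NaN by default

def grid_is_covered(xs, ys):
    """Return True if every node of the xs-by-ys grid lies inside the hull."""
    xx, yy = np.meshgrid(xs, ys)
    return not np.isnan(interpolate(xx, yy)).any()
```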

Another view of the statistics is a map of low-price and high-price areas. By varying the boundary that separates low prices from high ones, we can watch how the picture changes. The price at each point no longer matters; only the density of the points of each group contributes.



The calculation is similar to that of a gravitational field at a point. As an optimization, we take into account only those points that actually contribute to the final value of the field. After the calculation, the result resembles a spray.
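A rough sketch of such a field is given below. The inverse-square weight, the cutoff radius used as the optimization, and the function name `price_field` are all assumptions; the actual formula in the author's program may differ.

```python
import numpy as np

def price_field(grid_xy, points, prices, threshold, radius):
    """Signed 'field' at each grid node: positive where expensive points dominate.

    Each point contributes 1 / r**2, as in a gravitational field; points
    farther than `radius` are ignored. Both choices are assumptions.
    """
    grid_xy = np.asarray(grid_xy, dtype=float)
    points = np.asarray(points, dtype=float)
    # +1 for expensive points, -1 for cheap ones; the price itself is not used
    signs = np.where(np.asarray(prices) >= threshold, 1.0, -1.0)

    # Pairwise distances, shape (n_grid, n_points)
    diff = grid_xy[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    dist = np.maximum(dist, 1e-9)  # avoid division by zero
    weights = np.where(dist <= radius, 1.0 / dist ** 2, 0.0)
    return (weights * signs).sum(axis=1)

# Made-up example: one expensive point on the left, one cheap point on the right
pts = np.array([[0.0, 0.0], [10.0, 0.0]])
vals = np.array([300.0, 100.0])
nodes = np.array([[1.0, 0.0], [9.0, 0.0], [5.0, 50.0]])
field = price_field(nodes, pts, vals, threshold=200.0, radius=20.0)
```

Moving `threshold` up or down and recomputing the field yields the frames of an animation like the one described above.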



What is the conversion?
With a strict construction of the field, the plot shows a scattering of points where the “expensive” field locally prevails over the “inexpensive” one and vice versa. These points act like noise and spoil the plot. They can be removed, for example, with a median filter applied to the image with a sufficiently large filter size. For this, I used the command-line interface of IrfanView.
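The same kind of cleanup can be done in Python instead of IrfanView. The sketch below swaps in `scipy.ndimage.median_filter` on a made-up binary field; the 3x3 window is an assumption, and a real map would need a larger one.

```python
import numpy as np
from scipy.ndimage import median_filter

# Made-up binary field: 1 = "expensive" prevails, 0 = "inexpensive" prevails
field = np.zeros((7, 7), dtype=np.uint8)
field[:, 4:] = 1          # a solid expensive region on the right
field[3, 1] = 1           # an isolated noisy pixel on the left

# Each pixel is replaced by the median of its 3x3 neighbourhood
smoothed = median_filter(field, size=3)
```

The isolated pixel disappears, while the solid region on the right stays intact.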


Visualization


We overlay the images on a schematic map of Moscow. The Yandex API lets you request a map by coordinates, specifying its angular dimensions in longitude and latitude as well as the desired image size.

Request example:
static-maps.yandex.ru/1.x/?ll=37.5946002,55.7622764&spn=0.25,0.25&size=400,400&l=map
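A request like the one above can be assembled in code. The parameter names `ll`, `spn`, `size` and `l` are taken from the example; the helper name `static_map_url` and the `https://` scheme are assumptions.

```python
from urllib.parse import urlencode

def static_map_url(lon, lat, span_lon, span_lat, width, height):
    """Build a static-map request like the example above."""
    params = {
        "ll": f"{lon},{lat}",              # map center: longitude,latitude
        "spn": f"{span_lon},{span_lat}",   # angular size in degrees
        "size": f"{width},{height}",       # image size in pixels
        "l": "map",                        # schematic map layer
    }
    # safe="," keeps the commas unescaped, as in the example request
    return "https://static-maps.yandex.ru/1.x/?" + urlencode(params, safe=",")

url = static_map_url(37.5946002, 55.7622764, 0.25, 0.25, 400, 400)
```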


The only problem is that the specified angular dimensions define not the borders of the visible area but its guaranteed minimum size. This means we get an image with angular dimensions >= 0.25. I found no way to obtain the exact boundaries of the visible area, so they were determined manually.

About alignment
The maps can be aligned with each other using Yandex markers: draw points with given coordinates on the plot and request a map with markers at the same coordinates.

With a couple of calls to the PIL library, the images are blended at a transparency level that is comfortable for viewing.

 from PIL import Image

 map_img = Image.open(map_img_name, 'r').convert('RGBA')
 price_img = Image.open(prices_img_name, 'r').convert('RGBA')
 # Image.blend requires both images to have the same size and mode
 if price_img.size == map_img.size:
     result_img = Image.blend(map_img, price_img, 0.5)


Results


Three images with different levels of companding, and an animation of the field version.



Some analytics:

Overall, as the hypothesis predicted, housing prices are highest inside the Garden Ring and the Third Transport Ring (TTK) and decrease with distance from the center. Within the MKAD, however, prices stay around the average in the western and south-western parts. Outside the MKAD, as well as in the eastern part of the TTK, prices are below average.

In detail, everything is much more interesting. Let us note the main areas:

As the maps show, theory is fully confirmed by practice: a good combination of infrastructure, distance to the center and proximity to well-known Moscow landmarks is revealed by linear interpolation over the coordinates.

Unfortunately, much of this work is not automated, but if the article proves interesting to Habr readers, I will build similar maps for the other big cities of our country.

The code of the web spiders and of the program itself, as well as the databases used, is available on GitHub.

Source: https://habr.com/ru/post/265783/

