Recently I have noticed that more and more people are running into problems caused by the lack of geodata, or rather, not so much by its absence as by its being closed. Yesterday everyone was happy with what mapping services offered for placing markers; today users want more: highlight a street, show the houses on it, calculate the length of a river, and so on. And here a surprise awaits them: everything seems to be right there on the map, yet nothing can be done with it, because it is all just pictures. Both Google Maps and Yandex.Maps stopped developing at displaying pictures, geocoding, and navigation.
A step to the right or to the left and you hit a dead end. The only way out is to recreate the geometry you need yourself. Drawing a road and a dozen houses along a street is quick and easy, but tracing, say, a hundred-odd kilometers of a waterway kills all enthusiasm.
It is at such difficult moments that the OpenStreetMap project peeks out from behind the backs of the heavily advertised Google and Yandex and says: "I have the data, do you want it?"
Yes, it may be incomplete in places, but it is geodata rather than funny pictures, and that opens up great opportunities for processing and analysis. In addition, the data is available under the open ODbL license, which has two main conditions of use: mandatory attribution of the data source (OpenStreetMap contributors), and, if derived data is made public, it must also be published under the ODbL license.
Now let's dive into this data. Today our goal is to extract addresses from it and get their coordinates. We will do this not one at a time, as geocoders do (enter an address, get coordinates), but for all available addresses at once. The result will be saved as CSV.
The beauty of the simple way is that we do not need any databases: just copy and paste. To make that possible, though, we have to make a sacrifice. Our simplification is that we will extract addresses for one predetermined locality only. We simply limit the search to a given territory from the start by cutting the required locality out of the general data set.
So, the starting point is the OSM data for the entire planet. It is very large, so if we do not need to cover the whole planet, we take a more local extract; for example, Geofabrik offers extracts split by continent. For the Russian segment the best option is Gis-Lab, where the files are conveniently cut by region. We take the file for the region we need in pbf format.
For further work we will need the osmconvert and osmfilter tools. The first converts data between formats and cuts out only the area of interest, which reduces the volume of data and, as a consequence, the processing time. The second is designed to filter objects by their properties.
As agreed, we need to limit the data to our locality. With osmconvert the crude way to do this is to specify a bounding box by two points: the parameter -b=<x1>,<y1>,<x2>,<y2> takes the south-west point followed by the north-east point (x is longitude, y is latitude). If the settlement has a complex shape and a rectangular cutout is not suitable, you can describe a clipping polygon as a closed broken line in a file and pass it with the parameter -B=file.poly. The format is pretty simple: the first line is the name; then "1", the contour number; then the coordinates of the points, which together must completely cover our locality, with the last point repeating the first; then the end of the contour; then the end of the file. More information is available about the format and how to get such a file.
Here is an example file:
kursk
1
   36.035249 51.838105
   35.991534 51.562810
   36.125976 51.563141
   36.317305 51.681037
   36.333813 51.780274
   36.159021 51.837612
   36.035249 51.838105
END
END
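To avoid typing such a file by hand, the boundary points can be kept in a short script that produces both variants of the cutout. The following is only a minimal sketch: the function names `bbox_param` and `poly_text` are made up for illustration, and it assumes a single contour with points given as (longitude, latitude), as in the example file.

```python
from typing import List, Tuple

# (longitude, latitude), the same order as in the example file
Point = Tuple[float, float]


def bbox_param(points: List[Point]) -> str:
    """Build an osmconvert -b= argument (south-west, then north-east corner)
    for the smallest rectangle that covers all points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return f"-b={min(lons):.6f},{min(lats):.6f},{max(lons):.6f},{max(lats):.6f}"


def poly_text(name: str, points: List[Point]) -> str:
    """Render a single-contour .poly file, repeating the first point
    at the end if the ring is not closed yet."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    lines = [name, "1"]
    lines += [f"   {lon:.6f} {lat:.6f}" for lon, lat in ring]
    lines += ["END", "END"]  # end of contour, end of file
    return "\n".join(lines) + "\n"


kursk = [
    (36.035249, 51.838105),
    (35.991534, 51.562810),
    (36.125976, 51.563141),
    (36.317305, 51.681037),
    (36.333813, 51.780274),
    (36.159021, 51.837612),
]

print(bbox_param(kursk))
print(poly_text("kursk", kursk), end="")
```

The rectangle is always at least as large as the polygon, so the -b variant is the one more likely to pull in houses from neighboring localities.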
The main thing is that houses from neighboring localities do not end up in the area being cut out. Also, to keep things simple, we will turn all houses into points; the --all-to-nodes key does this. We ask for the output file in o5m format (it is enough to give the file that extension), since the other utility can only work with it.
As a result, the first command looks like this:
osmconvert.exe -B=city.poly --all-to-nodes RU-Region.pbf -o=1_to-node.o5m
Now we have point data for the specified area only. But it still contains a lot that we do not need in the result: roads, parks, fences, and so on.
So in the next step we filter out of all this variety only the houses that have addresses. Houses in OSM are marked with the building tag, and address information is stored in tags beginning with addr. We also discard the author and version information of each object, since we simply do not need it.
osmfilter.exe 1_to-node.o5m --keep="building AND addr*" --drop-author --drop-version -o=2.building-addr.o5m
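The selection rule of this step can be spelled out in a few lines of Python. This is only an illustration of the logic described above, not of how osmfilter is implemented; the function name `has_address` and the sample objects are invented for the example.

```python
from typing import Dict, List

Tags = Dict[str, str]


def has_address(tags: Tags) -> bool:
    """Keep an object only if it is a building AND has at least one addr:* tag."""
    return "building" in tags and any(key.startswith("addr:") for key in tags)


# Hypothetical objects, each reduced to its tag dictionary
objects: List[Tags] = [
    {"building": "yes", "addr:street": "School Street", "addr:housenumber": "71"},
    {"building": "yes"},                                   # building without an address
    {"highway": "residential", "name": "School Street"},   # a road
    {"amenity": "cafe", "addr:housenumber": "3"},          # address, but no building tag
]

kept = [tags for tags in objects if has_address(tags)]
print(kept)
```

Only the first object passes both conditions, which is why the file shrinks so noticeably after this step.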
The file has shrunk again, and it is time to see what is left inside by turning it into something human-readable. As promised, that will be CSV.
We use the same osmconvert, which can write CSV output. The parameters to pay attention to are the list of columns and the tags from which their values are taken.
osmconvert.exe 2.building-addr.o5m -o=3.addr.csv --csv-headline --csv-separator=; --csv="@id addr:street addr:housenumber @lat @lon"
@id | addr:street | addr:housenumber | @lat | @lon |
---|---|---|---|---|
1000000147959515 | Masalova street | 25A | 51.6522509 | 36.0337820 |
1000000147960436 | School Street | 71 | 51.6546536 | 36.0139438 |
1000000147965426 | Kotova Gora Street | 1 | 51.7337383 | 36.1837660 |
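The resulting file is easy to pick up in a script. The sketch below reads it with Python's standard csv module; it assumes that the headline row repeats the column names passed to --csv, and that --all-to-nodes adds its default offset of 10^15 to the ids of former ways and 2*10^15 to the ids of former relations, which would explain the long ids in the table. The function names are made up for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple

WAY_OFFSET = 10**15           # assumed default offset for nodes made from ways
RELATION_OFFSET = 2 * 10**15  # assumed default offset for nodes made from relations


def original_object(node_id: int) -> Tuple[str, int]:
    """Guess the original OSM object type and id from a converted node id."""
    if node_id >= RELATION_OFFSET:
        return "relation", node_id - RELATION_OFFSET
    if node_id >= WAY_OFFSET:
        return "way", node_id - WAY_OFFSET
    return "node", node_id


def read_addresses(stream) -> List[Dict[str, object]]:
    """Parse the semicolon-separated CSV produced in the last step."""
    rows = []
    for row in csv.DictReader(stream, delimiter=";"):
        kind, osm_id = original_object(int(row["@id"]))
        rows.append({
            "type": kind,
            "id": osm_id,
            "street": row["addr:street"],
            "housenumber": row["addr:housenumber"],
            "lat": float(row["@lat"]),
            "lon": float(row["@lon"]),
        })
    return rows


# Two rows from the table above, in the assumed file layout
sample = io.StringIO(
    "@id;addr:street;addr:housenumber;@lat;@lon\n"
    "1000000147959515;Masalova street;25A;51.6522509;36.0337820\n"
    "1000000147960436;School Street;71;51.6546536;36.0139438\n"
)
for address in read_addresses(sample):
    print(address)
```

For a real file, replace the in-memory sample with `open("3.addr.csv", newline="", encoding="utf-8")`; the encoding is also an assumption and may need adjusting.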
Well, this is already real geodata that can be loaded into a GIS and analyzed, as can be seen in the picture at the beginning.
But the topic is not limited to addresses. In the second step you could keep, for example, public transport stops and visualize transport accessibility. But that is a topic for another article...
Source: https://habr.com/ru/post/320562/