
OpenStreetMap: Past, Present and Future

I will say right away: I am not an OSM user and, moreover, not a member of the project. Nevertheless, I believe I know quite a lot about it, and I want to present my thoughts in the form of a review note on the articles I managed to find here. A kind of polemic with the authors of those articles and with the comments on them. More precisely, with their theses: it matters not who said it, but what was said.

Part One: The Beginning


It began, apparently, with "The Shuttle Radar Topography Mission (SRTM) that flew on the 11-day mission in February of 2000". Then Steve Coast, following Wikipedia's example, created the OpenStreetMap project in July 2004 (before the advent of Google Maps, naturally). After that the triumphal march began: satellite imagery and data were made available for use in OSM, under licenses such as CC-BY-SA, from NearMap, NOAA, GeoEye, DigitalGlobe, EROS-B, Google (!), US Census TIGER, AND, MassGIS, GeoBase and many others. The volunteer population also grew by leaps and bounds. Just a little more, and the whole planet would be digitized and offered for free use to absolutely everyone!

Part Two: The Present Day


We wanted the best, but it turned out as always: hands got tied, and everyone had a good laugh.
The so-called "big business" clearly smelled a profit, and GIS services began to multiply like cockroaches: Google Maps, Yandex Maps, Rambler Maps, Mail.Ru Maps aka Maps.me, Yandex People's Map, Apple Maps, Bing Maps, Yahoo! Maps and so on and so forth. Suddenly everyone started talking about GIT, "geo-information technologies" (roughly in the same vein as the much more widely and loudly promoted "nanotechnologies", which have since gone quiet), about "spatial databases" (as if anyone could say how a spatial database differs from any other), about "environmental monitoring" (a popular way to carve up budgets these days), and so on. There is even a "spatial economy", plus a whole industry of automotive, pedestrian, bicycle and other navigators. In a word, the trend reversed: where once various "private shops" (for example, the Japan KSJ2 import, or TIGER specializing in North America) poured their data into the common OSM pot free of charge, now large companies hoard such data and provide it only for money, and nearly every one of them is ready to devour every single competitor (some have already been eaten). OpenStreetMap has become almost invisible in this crowd, even before you add the rapidly multiplying zoo of "geosoftware": Mapnik, Maperitive, CloudMade, JOSM, Cartagen, Potlatch, Merkaartor, Vespucci, Go Map!!, OsmAnd, Navit, GNOME Maps and the rest.

The main competitors of the OpenStreetMap project are Wikimapia (finally woke up!), Google Map Maker and Yandex People's Map. We shall see... Unlike OpenStreetMap and Wikimapia, data created by users on Yandex's resources cannot be freely used by third parties (or by the users themselves); that requires the consent of the Yandex administration... well, to hell with you, guys! Unlike OpenStreetMap, Google treats the map created by its community as its own intellectual property... there you have it, you unfortunate thieves! Wikimapia is an interactive map built on the Google Maps API... ah, so it is just an add-on, and apparently there is no access to the full database at all... not very appealing either! Moreover, the Free Software Foundation encourages everyone to contribute to the OpenStreetMap project in order to create a free alternative to the proprietary Google Earth; this task is on the Foundation's list of high-priority projects. OpenStreetMap is described as the free-software world's answer to Google and to commercially restricted geodata.

As you can see, OSM differs sharply from the rest of the GIS world when it comes to money: even Wikimapia is run by a commercial company and earns advertising revenue! But this is not about money at all; after all, why shouldn't users pay for a quality product? But what, excuse me, is the "product" here? To my mind, users are interested first of all in the quality of the data - its accuracy, currency, precision and completeness - and only secondly in a convenient, friendly, intuitive interface. And here is the radical difference between OSM and any of its "competitors": it, and only it, provides full access to the data itself rather than to "pretty pictures", as the other services do. Because OSM is not a map but a database containing information about objects on the Earth's surface.

So we have established that at the first stage data from various companies flowed into OSM, while now all sorts of APIs pour out of it in the guise of competitors. Why? Because it lets them quietly declare not only the service itself (usually fairly shoddy, much like OSM's own) but also all the geodata wrapped inside it to be their intellectual property. And that data is surely borrowed: you can cut me, burn me - I do not believe that each of them (or even one of them) assembled its own map using only its own volunteers (whose work, by the way, it is indecent to appropriate)! What, it is all yours? And if we check? Databases on the table, please! Oh, you won't show them? Well then, excuse me, the presumption of guilt applies here.

OSM gives access to all of its information; the others give out only what their server software deigns to return in response to a request, which creates (at least in theory) opportunities for all kinds of manipulation of that information. The other "charms" of access to data only through an API have already been described in articles here on Habr. For example: the task this time was to process approximately 5 million addresses, or maybe 50 million, it was not immediately clear. As is well known, Google will ban your IP after roughly 10 thousand addresses, and Yandex will do the same, though perhaps a little later (something like 25 thousand requests a day). Besides, both APIs are REST, which means they are comparatively slow, and even a paid subscription will not make them one bit faster.
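For comparison, OSM's own geocoder, Nominatim, can be queried without any paid subscription at all (and self-hosted for bulk jobs). Below is a minimal sketch, not taken from the quoted article, assuming the public nominatim.openstreetmap.org endpoint and its usage policy of roughly one request per second with an identifying User-Agent; the sample addresses are made up.

```python
# A minimal sketch of batch geocoding against the public Nominatim endpoint.
# For millions of addresses you would run your own Nominatim instance instead.
import json
import time
import urllib.parse
import urllib.request

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"

def geocode(address: str):
    """Return Nominatim matches for a free-form address string."""
    query = urllib.parse.urlencode({"q": address, "format": "jsonv2", "limit": 1})
    request = urllib.request.Request(
        f"{NOMINATIM_URL}?{query}",
        headers={"User-Agent": "osm-review-demo/0.1"},  # required by the usage policy
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

if __name__ == "__main__":
    addresses = ["Nevsky Prospekt 28, Saint Petersburg", "Baker Street 221B, London"]
    for addr in addresses:
        matches = geocode(addr)
        if matches:
            print(addr, "->", matches[0]["lat"], matches[0]["lon"])
        time.sleep(1)  # stay under the public instance's one-request-per-second limit
```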

The OSM position seems to me immaculately clean, and it probably galls all of its competitors: the data is available under the open ODbL license with two main conditions of use: mandatory attribution of the data source (the OpenStreetMap contributors), and, if derived data is published, it must also be published under the ODbL. Moreover, using information from proprietary services like Google Maps to create OSM maps is prohibited without the copyright holder's permission. Finally, OpenStreetMap maps are used by such sites and organizations as the United Nations, Wikipedia, Microsoft Bing Maps, the Russian Federal Space Agency, MapQuest, Wikimapia (Wow! Have they changed their orientation?), Oxford University, the website of the US President and others.

But enough dithyrambs - "Now let's talk about rubbish."

Yes, there is a vast amount of it. The same thing can be marked with a bunch of different tags, and it is very hard to foresee all the options. There are old tags that nobody has cleaned up, and there are simply erroneous ones. There is a huge amount of outdated information whose currency nobody tracks; this applies especially to establishments of all kinds and to data on whether a road is passable or not.

I agree: the number of errors there is simply gigantic, far greater than the Habr author just quoted imagines! And identifying them is not so simple: the data, though textual, is multilingual - roughly 40-50 languages. On top of that there are plenty of duplicates and plain garbage. And that is before we even recall that data errors have a nasty tendency to accumulate (and to induce further errors), and that the data itself constantly changes, grows and becomes obsolete. But who told you that the "proprietary" systems have no errors? Or even that they have fewer? Their errors are simply not visible, just as their data is not visible. Yet unreliable information in an open system can be corrected, while in a closed one it cannot. So, at least potentially, OSM is still better than any "analogue": it is verifiable!

Once, a few years ago, I laughed at the fact that as many as 17 South Poles were found in the OSM database! I decided to check how things stand now, downloaded Antarctica (2019-03-30) and got... 50 poles. In words: FIFTY! Don't believe me? Look:

1. node id = 1042050310 lat = -90.0 lon = 0.0
2. node id = 4028080674 lat = -90.0 lon = -23.1236527
3. node id = 4055651342 lat = -90.0 lon = 19.0
4. node id = 4055651343 lat = -90.0 lon = 19.0
5. node id = 4055651344 lat = -90.0 lon = 19.0
6. node id = 4055651345 lat = -90.0 lon = 19.0
7. node id = 4055651346 lat = -90.0 lon = 19.0
8. node id = 4055651347 lat = -90.0 lon = 19.0
9. node id = 4055651348 lat = -90.0 lon = 19.0
10. node id = 4055651349 lat = -90.0 lon = 19.0
11. node id = 4324771561 lat = -90.0 lon = 1.1844246
12. node id = 4324771562 lat = -90.0 lon = 105.6989404
13. node id = 4324771563 lat = -90.0 lon = 132.4186852
14. node id = 4324771564 lat = -90.0 lon = 137.780023
15. node id = 4324771565 lat = -90.0 lon = 143.3682855
16. node id = 436012592 lat = -90.0 lon = 0.0
17. node id = 5478583892 lat = -90.0 lon = 33.2303881
18. node id = 5478583893 lat = -90.0 lon = 158.3249615
19. node id = 5478583894 lat = -90.0 lon = -86.5237898
20. node id = 5478583895 lat = -90.0 lon = 48.2165046
21. node id = 5478583904 lat = -90.0 lon = 45.5704167
22. node id = 5478583905 lat = -90.0 lon = 72.694447
23. node id = 5478583906 lat = -90.0 lon = -4.6036415
24. node id = 5478583907 lat = -90.0 lon = 156.4156146
25. node id = 5478583908 lat = -90.0 lon = 174.4279763
26. node id = 5478583909 lat = -90.0 lon = -25.4418087
27. node id = 5478583910 lat = -90.0 lon = 90.8751034
28. node id = 5478583911 lat = -90.0 lon = 81.7634245
29. node id = 5478583912 lat = -90.0 lon = -40.8082874
30. node id = 5478583913 lat = -90.0 lon = -85.5106502
31. node id = 5478583914 lat = -90.0 lon = -112.6347103
32. node id = 5478583915 lat = -90.0 lon = -35.3366556
33. node id = 5478583916 lat = -90.0 lon = 163.6440556
34. node id = 5478583917 lat = -90.0 lon = 145.6316657
35. node id = 5478583918 lat = -90.0 lon = -14.4985701
36. node id = 5515059773 lat = -90.0 lon = 152.6074683
37. node id = 5515059774 lat = -90.0 lon = 125.0793442
38. node id = 5515059775 lat = -90.0 lon = -19.8055055
39. node id = 5515059776 lat = -90.0 lon = 125.378694
40. node id = 5515059777 lat = -90.0 lon = 96.2779594
41. node id = 5515059778 lat = -90.0 lon = 9.34518
42. node id = 5515059779 lat = -90.0 lon = -47.4747063
43. node id = 5515059780 lat = -90.0 lon = -49.0473178
44. node id = 5515059781 lat = -90.0 lon = 8.9047508
45. node id = 5515059782 lat = -90.0 lon = 166.9006628
46. node id = 5515059783 lat = -90.0 lon = -165.5712167
47. node id = 5515059784 lat = -90.0 lon = -20.6863713
48. node id = 5515059785 lat = -90.0 lon = -165.8705753
49. node id = 5515059786 lat = -90.0 lon = -136.769845
50. node id = 5515059787 lat = -90.0 lon = -49.8370691

Moreover, 8 of these South Poles were entered into the database on 2016-03-11 at 23:52:59, 10 on 2018-03-14 at 18:09:21, 9 on 2018-03-14 at 18:09:22, and 15 on 2018-03-29 at 20:30:33. It even looks like vandalism (especially since such nodes, and their IDs, often come in consecutive runs). But even without vandalism, there is plenty of junk in the OSM database.
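The check itself is easy to reproduce. Here is a minimal sketch of it, assuming an Antarctica extract converted to uncompressed OSM XML (the file name antarctica.osm is a placeholder): stream the file and keep every node whose latitude is exactly -90.

```python
# A minimal sketch of the South Pole check described above.
import xml.etree.ElementTree as ET

def south_pole_nodes(path: str):
    """Yield (id, lat, lon) for every node with latitude exactly -90."""
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "node":
            lat = float(elem.attrib["lat"])
            if lat == -90.0:
                yield elem.attrib["id"], lat, float(elem.attrib["lon"])
            elem.clear()  # keep memory flat while streaming a large file

if __name__ == "__main__":
    poles = list(south_pole_nodes("antarctica.osm"))
    for i, (node_id, lat, lon) in enumerate(poles, 1):
        print(f"{i}. node id = {node_id} lat = {lat} lon = {lon}")
    print("total south poles:", len(poles))
```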

The second thing OSM is blamed for is the low level of service for the end user. I do not even want to discuss this: it may well be worse than many others, perhaps even the worst - so what? The competitors ought to be at least somewhat better! But, I repeat, OSM is a database, and everything else is merely a "superstructure over the base".

Another complaint about OpenStreetMap goes like this: the tagging scheme in OSM is at once its most important architectural advantage, since it lets you describe virtually any property of any object (nobody really restricts your choice of new tags for new properties), and at the same time its sorest spot, because any freedom in choosing the means of designation inevitably gives rise to religious wars between groups of users who cannot agree on how to designate one or another contentious object.

Or:

Sometimes the use of ordinary natural language leads, if not to catastrophic results for the semantics, then to something deserving the epithet "extreme uncertainty".

Or even:

"Any tags you like" is OSM's biggest curse. Clearly it was invented by people with a Linux mindset. The core must be carefully thought out and standardized. I had assumed that this was the axiom of axioms.

But it seems to me that this "axiom of axioms" would kill the project in the bud! And here I intend to defend OSM's position firmly: it is an open question who has the "Linux mindset" here! Tell me: why oppose these concepts at all? Natural language is good; formal classifications are good too! The OSM principle of "any tags you like" is great! And the fact that in this project the content of the cartographic database comes first, and how that content is rendered by the Standard style on osm.org comes second, is also great! The volunteers do a gigantic, grandiose job of truly monstrous labor-intensity - and we are going to wrinkle our noses in disgust and tell them how, in our opinion, they ought to enter or edit the data?! No: we should bow down at their feet and give them the opportunity to work in whatever way is convenient for them. And nothing else!

Objections? You will say that the database should preserve more or less well-balanced data semantics? That in different countries the same objects can differ considerably in the composition and content of their tags? That a fuzzy procedure for assigning tags undermines another important OSM principle, verifiability, which states that any designation entered into the database should be such that another project participant could confirm it, that is, would unambiguously designate it the same way? Are you sure this is even possible in principle? Once again we are confusing the concepts of "meaning" and "value"...

The tags of a single object must be unique, i.e. an object cannot contain two tags with the same key and different values; for example, highway = primary and highway = secondary on the same object is not allowed.

And everyone, of course, will meekly listen and strictly observe this? Do you really believe that? Or that they are able to determine unambiguously which highway value applies: primary, secondary, or something else? Never mind that the highway key has more than 500 values, many of them far more popular than the two just mentioned: 11 million "unclassified", 15 million "track", 23 million "service", 43 million "residential", not to mention all the street_lamp, living_street, traffic_signals, bus_stop and the rest. And the fact that, alongside the ordinary single building = yes, several thousand objects manage to carry building = yes more than once - what would you say to that? And do you know that the total number of distinct tags in the database exceeds 50 thousand, and that next to the multi-millionaires (amenity, barrier, landuse, maxspeed, natural, oneway, power, surface and others) there are about 20 thousand tags that occur exactly once in the whole database? And that popular data such as foot = yes, access = private, waterway = stream, the same highway = secondary, building = house, oneway = no, surface = asphalt, power = tower and so on are more than just tags? Does anyone seriously believe that all the variety of data in OSM can be described on some unfortunate Wiki page and, moreover, that the entire community can be made to comply with it?
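Numbers of this kind are easy to gather yourself. A minimal sketch, assuming an OSM XML extract under the placeholder name extract.osm: stream the file once and count, for every key, how often it is used and how many distinct values it has.

```python
# A minimal sketch of the tag statistics discussed above. The figures quoted in
# the text refer to the full planet; a regional extract needs the same code.
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def tag_statistics(path: str):
    key_counts = Counter()             # how many times each key is used
    values_per_key = defaultdict(set)  # distinct values seen for each key
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            for tag in elem.findall("tag"):
                key, value = tag.attrib["k"], tag.attrib["v"]
                key_counts[key] += 1
                values_per_key[key].add(value)
            elem.clear()
    return key_counts, values_per_key

if __name__ == "__main__":
    counts, values = tag_statistics("extract.osm")
    print("distinct keys:", len(counts))
    print("keys used only once:", sum(1 for c in counts.values() if c == 1))
    print("distinct highway values:", len(values.get("highway", set())))
```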

A polyline can be either a way object containing nodes, or a relation containing ways; i.e. it cannot be a relation containing both ways and nodes at the same time. It can, however, be a relation object containing both ways and other relations that in turn contain only way objects.

Can't it? Do you know how many such things there really are? 9,478,938 references to 6,163,522 nodes from 2,106,305 relations; moreover - hold on tight! - 1,794,837 of those relations (85%) contain both ways and nodes at the same time, and 106,219 of the remaining ones have only nodes as child elements. Data as of December 2018.
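Anyone can recount such figures from a planet or regional dump. A minimal sketch, again over an OSM XML extract with the placeholder name extract.osm: group relations by the set of member types they reference and count the ones that mix ways and nodes.

```python
# A minimal sketch of the relation check behind the numbers above.
import xml.etree.ElementTree as ET
from collections import Counter

def relation_member_profiles(path: str) -> Counter:
    """Count relations by the set of member types they contain."""
    profiles = Counter()
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "relation":
            member_types = frozenset(m.attrib["type"] for m in elem.findall("member"))
            profiles[member_types] += 1
            elem.clear()
    return profiles

if __name__ == "__main__":
    profiles = relation_member_profiles("extract.osm")
    mixed = sum(n for types, n in profiles.items() if {"way", "node"} <= types)
    nodes_only = profiles.get(frozenset({"node"}), 0)
    print("relations mixing ways and nodes:", mixed)
    print("relations with node members only:", nodes_only)
```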

A great deal of talk is also devoted to the definition of areal and linear objects and to how things ought to be, but almost nobody is interested in how things actually are. The value of such conversations is therefore zero, if not outright negative.

One more note before moving on to the "bright future":
The file containing vector data for the entire planet, Planet.osm, takes more than 1 TB in uncompressed OSM XML, 41.8 GB as bz2-compressed XML, and 18.1 GB in the binary PBF format. All of these formats store the same data.

This is not quite true: the whole planet occupies less than a terabyte (under 900 GB, in fact), and as for the rest, here are the latest figures:

Latest Weekly Planet XML File 75 GB, created 3 days ago.
Latest Weekly Planet PBF File 44 GB, created 3 days ago.

In any case, in my opinion this is no reason to dig around in binary formats (especially if you need not only to read them but also to modify them), all the more so because the size of the database in uncompressed form can quite plausibly be reduced by roughly an order of magnitude.
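Incidentally, even the bz2-compressed planet file can be processed as a stream, element by element, without ever materializing the terabyte of uncompressed XML on disk. A minimal sketch, with planet-latest.osm.bz2 standing in as a placeholder for whatever weekly dump you downloaded:

```python
# A minimal sketch: stream a .osm.bz2 dump directly and count element types,
# never writing the uncompressed XML to disk.
import bz2
import xml.etree.ElementTree as ET
from collections import Counter

def count_element_types(path: str) -> Counter:
    counts = Counter()
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag in ("node", "way", "relation"):
                counts[elem.tag] += 1
                elem.clear()
    return counts

if __name__ == "__main__":
    print(count_element_types("planet-latest.osm.bz2"))
```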

Part Three: How It Will End


Obviously, the OpenStreetMap project will be utterly crushed: it simply does not stand the slightest chance! Too many moneybags have joined the fight for the consumer. OSM has long been discussed precisely as a service rather than as a database, with all the consequences that follow. Even those who do work with it as a database treat it merely as a data warehouse, a read-only source for their extracts; they modify something in the extract, compute something, show something to the user... and the base itself remains the same mess it was before all the fuss. And then they complain that the results of their calculations quickly become obsolete because OSM changes every day. Sisyphean labor indeed! But a huge army of programmers can keep this up almost forever, thereby demonstrating how needed and relevant they are.

So, is there any chance left to save OSM from defeat? I think only one: programmers should focus their efforts on editing the database itself, not on the endless gadgets bolted onto it. We need to work in "centaur" mode, so that people do their part and processors do theirs: bring OSM into a normal state, correct the errors, structure the data, make it convenient to process for human and computer brains alike. How to do that is the topic of another article.

Or even a whole series of articles. For now, however, that is still a utopia...

Thanks to everyone who read this to the end.

Source: https://habr.com/ru/post/448460/

