Prologue
The OpenStreetMap (OSM) project, which publishes open geographic data under the free CC-BY-SA license (and soon under the Open Database License), is well known enough that I will not spend time introducing it in detail. The main feature of the project, and its main advantage over comparable projects, is the principle of fully open geographic data, which anyone may use in any way (within the terms of the CC-BY-SA license) and which any project participant can freely supplement and refine. Like any other data, geographic data has to be structured for storage and processing. In this article I will try to describe the main parts of the OSM data structure, focusing on the data types it uses and how they represent spatial objects. When working with OSM data constantly, one often has to clarify or explain the same basic points, so it seemed worthwhile to gather them briefly in a single text.
In general, the entire data structure can be represented schematically in the following figure:

Figure 1. OpenStreetMap data structure.
All data can be divided into three main groups:
- the data types, which describe the object itself as a spatial entity through a hierarchy of references whose end result is the known coordinates of every part of the object;
- the informational part, which holds the descriptive characteristics of an object that are not directly related to its spatial structure (its name and its physical, logical, and other properties);
- the service attributes of an object, which are needed to store and process the information as a data set: the unique identifier, the state of the object in the database, the time the object was last edited, and so on.
I will not dwell on the service attributes. I will only note that every independent object in OSM carries them, and that the particular version of the project's API determines which attributes are mandatory. Currently, this is API version 0.6.
Part 1. Basic types of geographic data in OpenStreetMap
There are three basic types: point (node), line (way), and relation (relation). Here I am not translating the names of the data types literally but naming them by what they actually are. The types are called node, way, and relation because that is how they were originally named. Every object in OSM is described by these three data types, and its informational content is then filled in with combinations of tags. The OSM data model is built as a hierarchy of references: each subsequent data type does not copy the information held in the previous types but forms a new entity that refers to a set of objects of a previous type. Every object in the OSM data structure also has its own identifier (ID), unique within its object type, and it is through this identifier that other objects refer to it. Let us consider the basic types in order.
The first type, the point (node), is the minimal data set: it holds a pair of coordinates, latitude and longitude (lat, lon), and is the base of the hierarchical model. It is the only data type that stores geographic information itself, that is, coordinates. The OSM data model operates exclusively on two-dimensional data in the WGS84 coordinate system. From here on we will treat the coordinates not as part of the point's informational component but as an integral part of its structure. In XML notation, an object of this type looks like this:
<node id='19' lat='58.888047127548994' lon='49.747870758186764' />
This is one point with the unique id 19 and a pair of coordinates. OSM uses coordinates in decimal degrees, since these are much easier to process than formats with minutes and seconds. A point can be an independent object describing a point feature (a geometric primitive), or it can have no informational component of its own and simply be part of another object (a line or a relation). Running a little ahead, I will note that a point can also be both at once: an independent object carrying its own information and a part of another object.
The second data type, the line (way), is a collection of references to objects of type point (node). At a minimum a line consists of a single point, i.e. it must contain at least one reference to an existing object of type point. A one-point line does not contradict the OSM data structure, but it contradicts elementary geometry and causes fainting and panic in some overly sensitive data processing algorithms, so a correct line always references at least two existing points.
The correct XML notation of a line consists of the descriptions of all the points it needs, followed by the record of the line itself, which lists all of its points. In its simplest form, it looks like this:
<node id='23' lat='58.875047918145675' lon='49.785240674006126' />
<node id='22' lat='58.86687448573524' lon='49.737090974777324' />
<way id='24'>
  <nd ref='22' />
  <nd ref='23' />
</way>
The order in which points are listed in a line matters: it defines the sequence of points and the direction of the line itself, i.e. a line always has a beginning and an end, even if it is closed (in which case they simply coincide). In this example we first described two points, giving their coordinates, and then described a line that refers to the ids of these points. One point can belong to any number of lines, and it is described only once: a point may be common to two lines, in which case both lines contain a reference to it. This is how a connected graph of objects is built (most often a road graph used for routing): a set of objects (lines) connected through their common members (points).
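To make the reference mechanism concrete, here is a minimal Python sketch that reads the fragment above and resolves line 24 into an ordered list of coordinates. It uses only the standard library; the resolve_way helper is a hypothetical name, and the osm root element is added here only so that the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET

# The node/way fragment from the text, wrapped in a root element
# (the wrapper is an assumption made so the fragment is one XML document).
OSM_XML = """
<osm>
  <node id='23' lat='58.875047918145675' lon='49.785240674006126' />
  <node id='22' lat='58.86687448573524' lon='49.737090974777324' />
  <way id='24'>
    <nd ref='22' />
    <nd ref='23' />
  </way>
</osm>
"""

def resolve_way(root, way_id):
    """Return the way's coordinates as (lat, lon) tuples, in member order."""
    # Index every node by its id so that <nd ref=...> can be looked up.
    nodes = {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
             for n in root.iter("node")}
    way = next(w for w in root.iter("way") if w.get("id") == way_id)
    # The order of <nd> elements is the direction of the line.
    return [nodes[nd.get("ref")] for nd in way.iter("nd")]

root = ET.fromstring(OSM_XML)
coords = resolve_way(root, "24")
print(coords)
```

The way itself stores no coordinates; everything comes from the node index, which is why a node has to be described only once however many lines use it.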
If we want to create another line from the already existing points 19 and 23, then we describe it as follows:
<way id='48'>
  <nd ref='19' />
  <nd ref='23' />
</way>
In our case, points 19 and 23 have already been described above, and point 23 has become part of two lines, 24 and 48, and is now common to them.
Our lines 24 and 48 can be drawn in the Mercator projection as follows:
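The shared point is what turns separate lines into a graph. As a small illustration, the following Python sketch finds the points used by more than one line; the dictionary representation and the shared_nodes helper are assumptions made for the example, not part of any OSM tool.

```python
from collections import defaultdict

# Lines 24 and 48 from the text, reduced to ordered lists of point ids.
ways = {24: [22, 23], 48: [19, 23]}

def shared_nodes(ways):
    """Map each node id used by two or more ways to the sorted ids of those ways."""
    users = defaultdict(set)
    for way_id, refs in ways.items():
        for ref in refs:
            users[ref].add(way_id)
    return {node: sorted(w) for node, w in users.items() if len(w) > 1}

junctions = shared_nodes(ways)
print(junctions)  # {23: [24, 48]}
```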

Figure 2. Two lines
The labels in the figure are object ids: red for points, black for lines. The arrows indicate the direction of each line, i.e. both lines end at point 23.
The next data type is the relation. Strictly speaking, every object except the point is already a relation, but lines are split off into a separate data type as the most common case, describing the main geometric primitives: lines, polylines and polygons. For all more complex geometric objects, and for objects that are not purely geometric but logical (collections, lists, hierarchies of interrelations), there is a universal data type: the relation.
In general, a relation differs from a line in that a line is always a collection of points, while a relation is a collection of any objects: points, lines, and other relations. For this reason each member of a relation is given not only by the object id but also by its type. In the minimal case a relation can contain a reference to just one object. Using the objects described in the examples above, we can write a relation:
<relation id='31'>
  <member type='way' ref='24' />
  <member type='node' ref='19' />
</relation>
This is a fictitious, abstract relation that says only that it contains two objects (members of the relation), line 24 and point 19, and carries no other information. In real data, a relation must state its type as a tag (informational component) of the relation object itself, and its members must state their roles in the references to the objects.
Below is an example of the most common kind of relation, a multipolygon. It describes one closed outer polygon of three points with a smaller closed polygon, also of three points, cut out of it. Geometric primitives (closed and non-closed polygons) and object tags are discussed later; for now, pay attention to the role attribute of the relation members and to the tag that describes the type.
<node id='1218' lat='58.870941122729505' lon='49.758021019729554' />
<node id='1216' lat='58.8704000725183' lon='49.74703196841415' />
<node id='1215' lat='58.879055860772034' lon='49.74964840920353' />
<node id='1209' lat='58.86471853452049' lon='49.780522410518245' />
<node id='1207' lat='58.863365649894774' lon='49.72453057762546' />
<node id='1206' lat='58.892035483174' lon='49.74755525657201' />
<way id='1217'>
  <nd ref='1215' />
  <nd ref='1216' />
  <nd ref='1218' />
  <nd ref='1215' />
</way>
<way id='1208'>
  <nd ref='1206' />
  <nd ref='1207' />
  <nd ref='1209' />
  <nd ref='1206' />
</way>
<relation id='1221'>
  <member type='way' ref='1208' role='outer' />
  <member type='way' ref='1217' role='inner' />
  <tag k='type' v='multipolygon' />
</relation>
The outer role indicates that the object forms the outer contour of the resulting figure, and the inner role indicates that the space inside the object is excluded from the area of the resulting object. Graphically, our multipolygon looks like this:

Figure 3. Multipolygon
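As a rough sketch of how software might read the type tag and the member roles of relation 1221, the following Python fragment uses the standard library XML parser. The members_by_role helper is a hypothetical name introduced for this illustration.

```python
import xml.etree.ElementTree as ET

# Relation 1221 from the text.
RELATION_XML = """
<relation id='1221'>
  <member type='way' ref='1208' role='outer' />
  <member type='way' ref='1217' role='inner' />
  <tag k='type' v='multipolygon' />
</relation>
"""

def members_by_role(relation):
    """Group (type, ref) member pairs by role, keeping the member order."""
    grouped = {}
    for m in relation.iter("member"):
        # The member type is kept because ids are unique only within a type.
        key = m.get("role", "")
        grouped.setdefault(key, []).append((m.get("type"), int(m.get("ref"))))
    return grouped

relation = ET.fromstring(RELATION_XML)
tags = {t.get("k"): t.get("v") for t in relation.iter("tag")}
roles = members_by_role(relation)
print(tags["type"], roles)
```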
As with a line, the order in which members of a relation are listed matters and is taken into account when the relation is used. For example, a relation may describe not a geometric figure but a public transport route (a logical scheme). It then includes the successive road sections along which the bus travels and the list of points, the stops, at which it halts; the order of the road sections gives the sequence of the route, and the order of the stops gives the sequence in which they are visited.
A relation can be a member of another relation, and neither the nesting depth nor the height of the hierarchy is limited. The one structural constraint is that a relation cannot be a member of itself, i.e. contain a reference to itself. Recursion in the structure of OSM data types is unacceptable, although of course nothing prevents someone from creating such an object and even inserting it into the database quite successfully.
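Since such an object can end up in the database anyway, processing code may want to check for it. Below is a minimal Python sketch of such a check. It assumes the relations have already been reduced to a dictionary from relation id to the ids of its member relations, and it also treats indirect loops (A contains B, B contains A) as recursion, which goes slightly beyond the rule stated above. All ids are hypothetical.

```python
def has_cycle(relations, start):
    """Return True if relation `start` reaches itself through relation members."""
    stack = list(relations.get(start, []))
    seen = set()
    while stack:
        current = stack.pop()
        if current == start:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(relations.get(current, []))
    return False

# Hypothetical ids: 1 contains 2, 2 contains 3, 3 contains 1 again;
# relations 4 and 5 form an ordinary, loop-free hierarchy.
relations = {1: [2], 2: [3], 3: [1], 4: [5], 5: []}
print(has_cycle(relations, 1))  # True
print(has_cycle(relations, 4))  # False
```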
Having defined three basic types of objects, it is necessary to introduce the notion of initial and final objects.
An initial object is an object that is part of some other object, i.e. it is a child of at least one object, but does not itself include any other object. In OSM this is always a point. Put another way, an initial object carries in its structure (not in its informational part!) only geographic coordinates and contains no references to other objects.
A final object is the topmost parent in a hierarchy: an object that is not a child of any other object, i.e. is not part of any other object. It can be any of the three types: point, line, or relation. A standalone point that belongs to nothing, so that the whole object consists of a single node, is not an initial object, because it does not begin any hierarchy of objects, but it is a final object, because the spatial description of the object ends with it.
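These two definitions can be checked mechanically. The Python sketch below classifies the objects from the Part 1 examples; the dictionary layout is an assumption made for the illustration, and point 99 is a hypothetical standalone point added to show the last case.

```python
# Each key is (type, id); each value lists the members the object refers to.
objects = {
    ("node", 19): [], ("node", 22): [], ("node", 23): [],
    ("node", 99): [],  # hypothetical standalone point, used by nothing
    ("way", 24): [("node", 22), ("node", 23)],
    ("way", 48): [("node", 19), ("node", 23)],
    ("relation", 31): [("way", 24), ("node", 19)],
}

def final_objects(objects):
    """Objects that no other object lists as a member."""
    referenced = {m for members in objects.values() for m in members}
    return sorted(key for key in objects if key not in referenced)

def initial_objects(objects):
    """Nodes that are a member of at least one other object."""
    referenced = {m for members in objects.values() for m in members}
    return sorted(m for m in referenced if m[0] == "node")

finals = final_objects(objects)
initials = initial_objects(objects)
print(finals)    # [('node', 99), ('relation', 31), ('way', 48)]
print(initials)  # [('node', 19), ('node', 22), ('node', 23)]
```

Point 99 appears among the final objects but not among the initial ones, while line 24 is neither, because it is a member of relation 31.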
Part 2. Information scheme of objects
An object type describes the geographical (spatial) properties of an object but says nothing about the properties of the object itself: its characteristics, purpose, and so on. For this there is the informational part of the OSM data structure, based on tagging, i.e. assigning labels to objects and giving those labels values. Tags are specified as key = value pairs, which in XML notation for our line 24 look like this:
<way id='24'>
  <nd ref='22' />
  <nd ref='23' />
  <tag k='highway' v='primary' />
</way>
Here we added a property to our line: the highway tag with the value primary, which in the accepted tagging scheme means that the line is a primary road (a class below trunk but above secondary). Any object can have any number of tags, which makes it possible to set all of its basic properties, describe all of its minor parameters, and attach any other information in free form. The tagging scheme is at once the most important architectural advantage of OSM, since nothing restricts you from choosing new tags for new properties and so virtually any property of an object can be described, and its sorest spot, because freedom in choosing notation always breeds holy wars between groups of users who cannot agree on how to tag one or another controversial object.
If we slightly expand the informational description of our two lines 24 and 48 from the first part, we can get something like:
<node id='23' lat='58.87753645355202' lon='49.79290110146539'>
  <tag k='highway' v='traffic_signals' />
</node>
<node id='22' lat='58.87456113991739' lon='49.73690926857261' />
<node id='19' lat='58.89362576054878' lon='49.7492065402827' />
<way id='48'>
  <nd ref='19' />
  <nd ref='23' />
  <tag k='embankment' v='yes' />
  <tag k='highway' v='secondary' />
  <tag k='incline' v='up' />
  <tag k='lanes' v='2' />
  <tag k='maxspeed' v='60' />
  <tag k='name' v='Pozharsky street' />
</way>
<way id='24'>
  <nd ref='22' />
  <nd ref='23' />
  <tag k='highway' v='primary' />
  <tag k='lanes' v='6' />
  <tag k='lit' v='yes' />
  <tag k='name' v='Minin Avenue' />
  <tag k='oneway' v='yes' />
  <tag k='ref' v='M84' />
</way>
Line 48 has now become Pozharsky street: a secondary road with a speed limit of 60 km/h and two lanes, rising from point 19 towards point 23, and raised above ground level on an embankment. Line 24 is our primary road (a class above secondary) with 6 lanes, street lighting, and one-way traffic permitted in the direction from point 22 to point 23; it is named Minin Avenue and is part of the federal highway M84. The two roads share point 23, which is a junction with traffic lights.
Describing the current tagging conventions in detail is beyond the scope of a single article; a separate wiki documentation project exists to record the accepted conventions [1]. Let us dwell only on the basic principles of tagging.
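In code, the tags of an object map naturally onto a dictionary. The short Python sketch below reads some of the tags of line 48 with the standard library parser. Note that every value arrives as a string, so numeric properties such as lanes or maxspeed have to be converted by the consumer; treating them as plain integers here is a simplifying assumption.

```python
import xml.etree.ElementTree as ET

# Line 48 from the text, with a subset of its tags.
WAY_XML = """
<way id='48'>
  <nd ref='19' />
  <nd ref='23' />
  <tag k='highway' v='secondary' />
  <tag k='lanes' v='2' />
  <tag k='maxspeed' v='60' />
  <tag k='name' v='Pozharsky street' />
</way>
"""

way = ET.fromstring(WAY_XML)
# Collect key = value pairs; both keys and values are plain strings.
tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
lanes = int(tags["lanes"])
maxspeed = int(tags["maxspeed"])
print(tags["name"], tags["highway"], lanes, maxspeed)
```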
- Every final object must carry tags, no matter how long the hierarchy below it. Any geographical (spatial) description of an object must end with at least one informational property of the object. The type = (multipolygon | *) tag on relations is an exception: it is not informational but part of the structure by which relations are described, so a relation needs at least one more tag besides it. The final object was defined in the first part.
- Tags can be present on any type of object. The presence or absence of tags on an object that is part of another object neither requires nor forbids tags on the other object, provided of course that they do not contradict each other logically and are not redundant. For example, a line may have a tag saying that it is a road, while points belonging to that road carry tags describing point features located there: traffic lights, speed bumps, and so on. Such a point is not a final object (it is part of the road), yet it can describe a feature of its own.
- The tag keys of a single object must be unique, i.e. an object cannot contain two tags with the same key and different values; for example, highway = primary and highway = secondary on the same object is not allowed. Some software supports value lists, so that highway = primary; secondary can be interpreted as highway = primary and highway = secondary at the same time, but most software reads it as "highway" = "primary; secondary", where "primary; secondary" matches neither primary nor secondary.
- Tags describing the properties of an object should be placed only on the object itself. This sounds so logical that it is almost silly, but in practice it causes a lot of trouble when processing data. The point is that an object's properties should not be duplicated on its members or on its parent objects. Take, for example, a forest described as a closed polygon with the property natural = wood. That this property should not be entered on every point from which the polygon is built is clear even to a small child, but with relations and multipolygons things are more complicated. Consider the same forest drawn not as a polygon but as a multipolygon, i.e. as a relation consisting of at least two closed polygons: an outer one (the outer role) and an inner one (the inner role). In this case the natural = wood property belongs not on the polygon with the outer role but on the relation itself. If a relation includes several objects that have both shared and differing properties, then the properties common to all members are described on the relation itself, i.e. on their parent object, and the unique properties on each object they describe. Returning to the forest, suppose we have several members with the outer role and some number of inner members (or none, which is also possible). All the outer members are forest, but each has its own name, so the natural = wood tag goes on the relation itself and each outer polygon carries its own name tag.
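The semicolon case from the third principle is easy to show in code. The Python sketch below contrasts a naive string comparison with an explicit split of the value; the tag_values helper is a hypothetical name, and splitting on semicolons is a choice made by the consuming software, not something the data model guarantees.

```python
def tag_values(tags, key):
    """Split a semicolon-separated tag value into a list of trimmed values."""
    raw = tags.get(key, "")
    return [v.strip() for v in raw.split(";") if v.strip()]

tags = {"highway": "primary; secondary"}

# Most software compares the whole string, so neither class matches.
naive_match = tags["highway"] == "primary"
# Software that understands value lists sees both classes.
values = tag_values(tags, "highway")

print(naive_match)  # False
print(values)       # ['primary', 'secondary']
```

Storing tags in a dictionary also enforces the uniqueness rule by construction: a key can hold only one value.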
Part 3. Geometric primitives
The main, or at least the most common, task for geographic spatial data is to produce a graphical representation of the objects the data describes: simply put, to render the maps, schemes, and plans themselves. The algorithms, rules, styles, and methods of rendering are the business of application software, but it all comes down to drawing basic geometric primitives, which are obtained from objects of the three data types listed in the first part.
From here on I will refer to the data types by their OSM names, i.e. node, way, and relation, so that they are not confused with the geometric objects: points, lines, polylines, and so on. So below, by a line I will mean not the OSM data type way but the geometric figure: the segment that is the shortest distance between two points on the plane. So, what are the basic geometric primitives?
A point is a single object of type node. Its position in a given projection on the map corresponds to its spatial position in geographic coordinates. One pair of lat / lon coordinates is translated into x / y coordinates of the map, taking into account the projection.
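As an illustration of this translation, here is a minimal Python sketch that converts lat / lon to x / y using the spherical Mercator formulas. The choice of projection is an assumption for the example (the article only names Mercator for Figure 2), and the to_mercator helper is a hypothetical name.

```python
import math

# WGS84 semi-major axis in metres, used here as the radius of a sphere.
EARTH_RADIUS_M = 6378137.0

def to_mercator(lat, lon):
    """Project latitude/longitude in degrees to spherical Mercator x/y in metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

# Point 19 from the first part.
x, y = to_mercator(58.888047127548994, 49.747870758186764)
print(x, y)
```

The formula is undefined at the poles, so real renderers clip latitude to a limited range before projecting.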
line features
A line is, as we have said, the shortest distance between two points. It corresponds to an object of type way containing two node objects. Any two nodes will do: since we operate in flat space, the shortest path between any two points is always a straight segment.
A polyline is a connected sequence of segments, where each segment is one line whose end is joined to the beginning of the next segment. The whole sequence is a single object. It corresponds to a way object containing three or more nodes. A polyline can also be a relation containing successively listed way objects, where each way begins with the node object that ended the previous way. So a polyline is either a way containing nodes or a relation containing ways; it cannot be a relation containing both ways and nodes at the same time, but it can be a relation containing both ways and other relations that contain only ways.
Types of polylines:

Figure 4. polyline = way (node1, node2, node3, node4)

Figure 5. polyline = relation (way1 (node1, node2), way2 (node2, node3), way3 (node3, node4))

Figure 6. polyline = relation2 (way1 (node1, node2), relation1 (way2 (node2, node3), way3 (node3, node4)))
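The nested case in Figure 6 can be unfolded recursively. The Python sketch below is one possible way to do it, assuming members are listed in order and in the right direction, as described above. The flatten helper and the ids 101 to 103 and 201 to 202 are hypothetical, and empty relations are not handled.

```python
def flatten(member, ways, relations):
    """Expand a ('way' | 'relation', id) member into an ordered list of node ids."""
    kind, ident = member
    if kind == "way":
        return list(ways[ident])
    nodes = []
    for child in relations[ident]:
        part = flatten(child, ways, relations)
        # Each part must start at the node where the previous part ended.
        if nodes and part[0] != nodes[-1]:
            raise ValueError("members do not connect end to start")
        # Skip the shared joint node so that it is not listed twice.
        nodes.extend(part[1:] if nodes else part)
    return nodes

# Figure 6: relation2 (way1, relation1 (way2, way3)).
ways = {101: [1, 2], 102: [2, 3], 103: [3, 4]}
relations = {
    201: [("way", 102), ("way", 103)],
    202: [("way", 101), ("relation", 201)],
}
polyline = flatten(("relation", 202), ways, relations)
print(polyline)  # [1, 2, 3, 4]
```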
areal objects
A polygon is a closed polyline whose last point is the same as the first. In OSM data types this is a way object with three or more distinct node objects; the way's member list is always one entry longer than the number of distinct nodes, since the first node is listed twice: at the beginning and at the end of the list. A polygon can also be assembled as a relation in which the successively listed ways together form a closed contour, i.e. the beginning of each way coincides with the end of exactly one other way. Unlike a polygon stored as a way, a polygon stored as a relation does not repeat its first member at the end of the list. A way needs the repetition because it refers to node objects, and only the duplication of a node as the first and last element of the member list shows that this is a closed polygon and not a linear polyline. In a polygon assembled as a relation, the last node of the last included way (or of the last relation containing ways) coincides with the first node of the first included way.
When a polygon is described as a relation, the type = multipolygon tag must be set on the relation object itself. This is how we state that the object is an areal geometric object rather than a linear one.
Types of polygons:

Figure 7. polygon = way (node1, node2, node3, node4, node1)

Figure 8. polygon = relation (way1 (node1, node2, node3), way2 (node3, node4), way3 (node4, node1))

Figure 9. polygon = relation3 (relation1 (way1 (node1, node2, node3), way2 (node3, node4), way3 (node4, node5), way4 (node5, node6, node7)), relation2 (way5 (node7, node8), way6 (node8, node1)))
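The closure rule for both forms can be sketched in a few lines of Python. The helpers below are hypothetical names. They assume the ways of a relation are listed in order and in a consistent direction, as in Figure 8, which real data does not always guarantee.

```python
def is_closed_way(refs):
    """A way is a polygon if it has 3+ distinct corners and repeats its first node last."""
    return len(refs) >= 4 and refs[0] == refs[-1]

def ring_from_ways(ways):
    """Chain ordered ways end to start; return the node list if it closes, else None."""
    ring = list(ways[0])
    for way in ways[1:]:
        if way[0] != ring[-1]:
            return None
        ring.extend(way[1:])
    return ring if is_closed_way(ring) else None

print(is_closed_way([1, 2, 3, 4, 1]))                 # True, as in Figure 7
print(ring_from_ways([[1, 2, 3], [3, 4], [4, 1]]))    # [1, 2, 3, 4, 1], as in Figure 8
print(ring_from_ways([[1, 2, 3], [3, 4]]))            # None, the contour stays open
```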
composite objects
Composite objects are objects that cannot be described by a single way primitive; they are always built on relation objects with type = multipolygon. Polygons and polylines described as a relation can always be simplified to a single way object, whereas a composite object, even in its simplest form, yields at least two way objects. An example is a square (a polygon) from which another figure (a smaller polygon) is mathematically subtracted. A multipolygon that is a composite object was shown, with an illustration, in the first part, in the description of relation objects.
The polygons that make up a composite object, like ordinary polygons and polylines, can each be a simple way or another relation consisting of any number of ways. In such a multipolygon a role must be given for every member, whether it is a way or a relation. There must be at least one member with the outer role: the object or objects with this role define the main (external) geometric contour of the resulting object. Members with the inner role may be absent in a particular case, but then the multipolygon is just an ordinary polygon described redundantly. Members with the inner role indicate which areas inside the outer contour are not part of the resulting shape, for example a clearing in a forest, or a courtyard enclosed on all sides by the walls of a building.
All members with the same role in one multipolygon must assemble into one or more closed contours whose boundaries do not intersect.
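The subtraction of inner contours from outer ones can be illustrated with a small area calculation. The Python sketch below uses the shoelace formula on already projected planar coordinates. The square coordinates are hypothetical, and the code assumes every inner ring lies entirely inside an outer ring, as the conventions above require.

```python
def ring_area(points):
    """Shoelace area of a closed ring of (x, y) points, first point repeated last."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def multipolygon_area(outers, inners):
    """Area of all outer rings minus the area of all inner rings."""
    return sum(map(ring_area, outers)) - sum(map(ring_area, inners))

# A 10 x 10 square with a 2 x 2 square cut out of it.
outer = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
inner = [(4, 4), (6, 4), (6, 6), (4, 6), (4, 4)]
area = multipolygon_area([outer], [inner])
print(area)  # 96.0
```

With an empty list of inner rings the result is simply the area of the outer polygon, which matches the remark that such a multipolygon is an ordinary polygon described redundantly.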
Restrictions and agreements on the use of graphic primitives described as a relation
Despite the simplicity, consistency, and scalability with which relation objects are constructed, in the real world of OSM they are subject to a large number of limitations, which stem mainly from the limitations of current data processing software. For this reason there are a number of conventions on the use of relation objects in particular situations. They can be reduced to the following main points:
- All areal objects described by a relation are multipolygons, i.e. they have type = multipolygon and a mandatory outer | inner role for each member, so the polygons in figures 8 and 9 are always multipolygons;
- There are only two linear generally accepted objects described by relation: type = boundary for administrative boundaries and type = route for public transport routes, all other types or relation without types are simply collections or lists of objects and are not treated as geometric objects; 5 6 , , relation, ;
- relation, type = multipolygon|boundary|route — , - , .
4. , ?
OSM, . OSM , .
, OSM . , . relation , way , , , way , , . , , , , . . , , () , .
, , , . , OSM , area=yes , area=no. , , , area=yes. , , , , , .
, , . ( type=multipolygon ), - , . — .
, OSM , node. ,
area , , . way , . relation , , . OSM relation.
area, . relation , , , type, type=polyline multipolygon, , , , area.
, OpenStreetMap.
[1] OSM wiki documentation of the accepted tags (Map Features):
http://wiki.openstreetmap.org/wiki/Map_Features
Russian version:
http://wiki.openstreetmap.org/wiki/RU:Map_Features