
Hello! Today I would like to talk about rendering, though not in the context of computer games, animated films, or industrial design systems. This post is about the engine that renders maps in real time for the MAPS.ME project. I will describe the general principles of the engine and some of the pitfalls we stepped into (as well as some we managed to avoid). If you work on rendering large amounts of data, especially cartographic data, I hope our experience will prove useful for your projects, or at least interesting.
About MAPS.ME and Rendering
MAPS.ME is a mobile application that gives users full-fledged offline maps on their device, with search, navigation, and a number of other handy features. We take map data from
OpenStreetMap, process it, package it, and deliver it to users through the application. Today OpenStreetMap covers the whole world at a fairly high level of detail, we render the map directly on the device, and the amount of displayed map data can be very large. To display cartographic data efficiently, we at MAPS.ME developed an engine called Drape. This engine is just being prepared for release and is considerably better than our current graphics library, so today (a little ahead of time) I will talk about Drape.
Naturally, the decision to write a new graphics engine for an existing project did not come out of nowhere: the previous drawing mechanisms did not let us take the project in the direction we wanted. Many people (especially those from game development) have a firm, and quite understandable, opinion about "engine writers." Unfortunately, there is very little third-party software to choose from in this area, and the requirements for rendering maps differ greatly from the requirements for rendering games. A graphics developer in gamedev mainly displays a virtual world according to the style of the game, and game designers, artists, and modelers are always standing by, ready to help: reduce the polygon count of a model, redraw a texture, or rebuild a game level so that artifacts stop hurting performance. We draw the real world, and one mapped by an open community at that. Despite the fairly high mapping culture in OpenStreetMap, it contains a lot of vector data that in its raw form is poorly suited for display. Only thanks to the people working on our data preprocessing algorithms do we have a chance to draw the map at all. So in our business we can rely only on our own ability to optimize rendering and on the algorithms that prepare the data in advance. Our main task is to draw a map consisting of many objects as quickly and as beautifully as possible, from data that cannot be blindly trusted. This is exactly the problem our Drape engine solves, and we now move on to its principles.
About the nature of the Drape engine
If we talk about the maps we are used to on the web, the vast majority of applications use a tile-based display model. A tile is an image containing a part of the map at a certain level of detail. As the user manipulates the map (zooming, panning, initial load), the necessary tiles are fetched from a server and the browser displays them. This works great online, and even somewhat offline if tiles are cached on the user's side. But as soon as we start thinking about fully offline maps, this approach becomes unacceptable: at the moment it is extremely difficult to compress pre-rendered raster data enough to fit in the disk space of a user's mobile device. Therefore, Drape operates on vector data and renders the map in real time. The Drape engine is a cross-platform, multithreaded system written in C++. The engine itself operates with three main threads:
- UI thread. User actions are registered in this thread, and platform-specific code (iOS, Android) is executed here. This thread is spawned by the operating system the application runs on.
- FR thread (short for Frontend Renderer). Its main task is rendering the prepared vertex and index buffers to the screen. This thread is created by the engine itself.
- BR thread (short for Backend Renderer). This thread prepares the vertex and index buffers and other data that FR draws. It is also created by the engine.
Interaction between the threads is standardized through messages. The FR and BR threads each have a message queue and can exchange data with each other without restrictions. They can also request the execution of arbitrary code on the UI thread. The UI thread can send messages to either of the other two threads and, if necessary, block until the message is processed. A schematic diagram of the engine is shown below.
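As a rough illustration, the messaging between FR and BR could be built on a minimal thread-safe queue like the sketch below. The names here are illustrative, not Drape's actual API:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Minimal sketch of a thread-safe message queue of the kind the FR and BR
// threads could use to exchange work. One thread calls Push, another Pop.
template <typename Message>
class MessageQueue
{
public:
  void Push(Message msg)
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_queue.push(std::move(msg));
    }
    m_cond.notify_one();
  }

  // Blocks until a message is available, then returns it.
  Message Pop()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return !m_queue.empty(); });
    Message msg = std::move(m_queue.front());
    m_queue.pop();
    return msg;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::queue<Message> m_queue;
};
```

In a real engine the messages would be polymorphic commands (build tile, accept buffers, and so on) rather than plain values, but the blocking semantics are the same.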

To make it easier to picture how this works, let me walk through an example. Suppose we need to draw a rectangular fragment of the map, given by the coordinates of its corners (in our case, in the Mercator projection), on a screen with a given resolution. The engine solves this problem roughly as follows:
- On the UI thread, user actions (in our case, inaction) and everything needed to compute the viewport are collected, packed, and sent to FR.
- From this input, FR builds the viewport (the world, view, and projection matrices) and determines the level of detail of the map data. The computed data is projected onto the so-called tile tree. This tree somewhat resembles the well-known quadtree widely used in graphics engines to optimize scene rendering. Like a quadtree, the tile tree captures the nesting of spaces: for example, 4 tiles of detail level 7 make up 1 tile of level 6. Unlike a classic quadtree, though, our tree rebuilds its structure depending on the set of tiles currently displayed. That is, if only level-6 tiles are displayed at the moment, the tree degenerates into a list. If the user then zooms in to detail level 7, the level-7 tiles become child nodes of the corresponding level-6 tiles, and once all 4 level-7 tiles are ready for drawing, the level-6 tile covering them is evicted from the tile tree. I should clarify that by tiles I do not mean pre-rendered images here: the tree operates only on regions of space and the graphic data associated with those regions (vertex and index buffers, textures, shaders, and so on).
After this small lyrical digression on the nature of the tile tree, back to FR. Given the new viewport, FR marks the tile-tree nodes that are no longer visible, if any, and determines which tiles must appear in the tree to fully cover the new viewport. All this data is packed into a message and sent to the BR thread. Meanwhile, the FR thread keeps drawing whatever it currently has, using the new matrices.
- BR receives the message and starts generating geometry (and other related graphic data) for the new tiles, parallelizing the generation across several helper threads. As soon as the graphic data for a tile is generated, it is immediately sent in a message to the FR thread; other tiles may still be in the process of generation at that point.
- FR receives the graphic data for a tile, finds a place for it in the tile tree, and reorganizes the tree structure to match the current situation. The new data can be drawn starting from the next frame. Over time FR receives the data for all the tiles, and the system comes to rest until a new viewport arrives from the UI thread.
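The parent/child relationship between detail levels described above can be sketched with a toy tile key. This is a hypothetical illustration; Drape's real structures are richer:

```cpp
#include <array>

// Illustrative tile identifier: a detail level plus grid coordinates.
struct TileKey
{
  int m_zoom, m_x, m_y;
  bool operator==(TileKey const & rhs) const
  {
    return m_zoom == rhs.m_zoom && m_x == rhs.m_x && m_y == rhs.m_y;
  }
};

// The level-(z-1) tile that covers a given level-z tile.
TileKey Parent(TileKey const & key)
{
  return { key.m_zoom - 1, key.m_x / 2, key.m_y / 2 };
}

// The four level-(z+1) tiles that together cover a level-z tile.
// When all four are ready, the parent can be evicted from the tree.
std::array<TileKey, 4> Children(TileKey const & key)
{
  int const z = key.m_zoom + 1, x = key.m_x * 2, y = key.m_y * 2;
  return {{ {z, x, y}, {z, x + 1, y}, {z, x, y + 1}, {z, x + 1, y + 1} }};
}
```

With keys like these, "rebuilding the tree structure" amounts to attaching incoming tiles under their `Parent` node and dropping the parent once all `Children` have arrived.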
The description of the engine's main principle above is, of course, high-level. Turning all of this into C++ code involves a huge number of nuances, some of which we will now consider.
About the nuances
Thread Synchronization and Performance
An attentive reader may have guessed that the threads will exchange messages intensively whenever the user moves the map. A finger usually moves smoothly across the device screen, which means an enormous number of messages would be sent to the FR thread, each of which could spawn dozens of messages from FR to BR, accompanied by heavyweight geometry-generation operations. Not a pleasant prospect, as you can imagine. To overcome this, we made an exception to our own rules and gave up on using messages to set the viewport. This is the only case where data from the UI thread goes to the FR thread directly, bypassing the queue, and the viewport is updated once per frame.
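The "bypass the queue, update once per frame" idea can be sketched as a single-slot mailbox: the UI thread overwrites the pending viewport, and FR consumes at most one value per frame, so a burst of touch events collapses into a single update. Names are hypothetical, not Drape's actual code:

```cpp
#include <mutex>
#include <optional>
#include <utility>

// Single-slot "mailbox": Set overwrites any pending value, Take consumes it.
// Many rapid Set calls between two frames result in exactly one update.
template <typename T>
class LatestValueSlot
{
public:
  // Called from the UI thread on every touch event.
  void Set(T value)
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_value = std::move(value);
  }

  // Called from the FR thread once per frame; empty if nothing changed.
  std::optional<T> Take()
  {
    std::lock_guard<std::mutex> lock(m_mutex);
    std::optional<T> result = std::move(m_value);
    m_value.reset();
    return result;
  }

private:
  std::mutex m_mutex;
  std::optional<T> m_value;
};
```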
In general, with thread-safe queues, common sense dictates keeping message traffic as low as possible: the fewer the messages, the less often the queues are locked for writing and reading. We also had a large number of messages with graphic data sent from BR to FR as generation completed. To deal with this, we began aggregating the data, collecting heterogeneous geometry into shared buffers (which, incidentally, is also useful for rendering, since it minimizes draw calls). As a result, our buffers approached the maximum size addressable with 16-bit indices.
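Aggregating geometry under the 16-bit index limit might look roughly like the sketch below; the real batcher is considerably more involved, and all names here are illustrative:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One aggregated buffer pair, shared by many small meshes.
struct Batch
{
  std::vector<float> m_vertices;   // interleaved vertex data
  std::vector<uint16_t> m_indices; // 16-bit indices into m_vertices
};

class Batcher
{
public:
  // Appends a mesh, rebasing its indices; opens a fresh batch when the
  // accumulated vertices would no longer be addressable by uint16_t.
  void Add(std::vector<float> const & verts,
           std::vector<uint16_t> const & inds, size_t vertexSize)
  {
    size_t const vertexCount = verts.size() / vertexSize;
    if (m_batches.empty() ||
        CurrentVertexCount(vertexSize) + vertexCount > kMaxVertices)
      m_batches.emplace_back();

    Batch & b = m_batches.back();
    uint16_t const base = static_cast<uint16_t>(b.m_vertices.size() / vertexSize);
    b.m_vertices.insert(b.m_vertices.end(), verts.begin(), verts.end());
    for (uint16_t i : inds)
      b.m_indices.push_back(static_cast<uint16_t>(base + i));
  }

  size_t BatchCount() const { return m_batches.size(); }
  std::vector<Batch> const & Batches() const { return m_batches; }

private:
  static constexpr size_t kMaxVertices = 65536; // uint16_t address space

  size_t CurrentVertexCount(size_t vertexSize) const
  {
    return m_batches.back().m_vertices.size() / vertexSize;
  }

  std::vector<Batch> m_batches;
};
```

Each resulting `Batch` maps naturally onto one vertex/index buffer pair and hence one draw call.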
Warm greetings from OpenGL ES
OpenGL ES plus multithreading is an explosive mix, and the situation is aggravated by the drivers for mobile graphics chips, especially on Android. Problems have arisen throughout development, they keep arising now, and, unfortunately, there is no reason to believe they will ever stop.
- OpenGL contexts are created by the operating system, and the FR and BR threads each have their own. The OpenGL specification does not allow a context to be accessed outside the thread in which it was created. However, at least some graphic resources can be shared between contexts, which is what makes asynchronous resource preparation possible. So we have two threads that operate on OpenGL contexts, plus an operating system thread that controls the lifetime of those contexts. On Android, for example, the OpenGL context may quietly die when the application goes to the background, and by that time both threads using the contexts must have released all their resources and stopped issuing OpenGL commands. This forces us to put extra effort into managing the lifetimes of the FR and BR threads so that they finish before the system thread does.
- Font glyphs are also prepared on the BR thread. They are first loaded with the FreeType library, processed with the SDF (Signed Distance Field) algorithm, and then transferred into a texture with glTexSubImage2D. That means one thread (BR) writes to the texture while the other (FR) reads from it during rendering. Unfortunately, not all drivers coped with this, and we were forced to move the glTexSubImage2D call to the render thread. So it turned out that the BR thread is not the only one preparing data for drawing.
- The Tegra 3 chip brought us special joy: on it, glGetActiveUniform does not work if you call glDetachShader immediately after linking the shaders, and multiplying by 0 in a shader gives unpredictable results.
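The "threads must finish before the context dies" requirement boils down to an explicit stop-and-join handshake performed on the system thread. A minimal sketch, with the GL work reduced to comments and all names hypothetical:

```cpp
#include <atomic>
#include <thread>

// Sketch of the shutdown ordering: before the OS is allowed to destroy the
// OpenGL context (e.g. when Android sends the app to the background), the
// render thread must release its GL resources and stop issuing GL commands.
class RenderThread
{
public:
  void Start()
  {
    m_thread = std::thread([this]
    {
      while (!m_stop.load())
      {
        // render loop: process messages, issue GL commands
        std::this_thread::yield();
      }
      // Release all GL resources here, while the context is still alive.
      m_resourcesReleased.store(true);
    });
  }

  // Called on the system thread *before* it lets the context be destroyed.
  void StopAndJoin()
  {
    m_stop.store(true);
    if (m_thread.joinable())
      m_thread.join();
  }

  bool ResourcesReleased() const { return m_resourcesReleased.load(); }

private:
  std::atomic<bool> m_stop{false};
  std::atomic<bool> m_resourcesReleased{false};
  std::thread m_thread;
};
```

The key property is that `StopAndJoin` returns only after the worker's cleanup has run, so the system thread can safely proceed to tear down the context.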
Geometry update
Sometimes the geometry needs to be updated while it is being drawn. This is not even about vertex animation, although everything below applies to it as well. In our case, a vivid example is updating vertex data depending on the map scale. Vertex data needs to change if we want, for example, a texture pattern to keep its on-screen size regardless of zoom, as with the arrows indicating the flow direction of rivers. A solution based on uniform variables is not always suitable, since in some cases it would require an array of uniforms as long as the vertex buffer. To solve this problem we used the following trick. We pack the unchanging components of the vertices into a buffer in the usual way, for example:
{ [x1, y1] [x2, y2] … [xN, yN] }
For the variable components of the vertices we form a separate buffer:
{ [u1] [u2] … [uN] }
When we need to update the texture coordinates, we rewrite only this one small buffer, which significantly reduces the amount of data transferred to the GPU.
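A bare-bones sketch of this static/dynamic split, with illustrative field names:

```cpp
#include <vector>

// The vertex attributes are split into two buffers: one packed once and never
// touched again, one small buffer rewritten when the zoom level changes.
struct SplitVertexData
{
  std::vector<float> m_static;   // e.g. positions: uploaded to the GPU once
  std::vector<float> m_dynamic;  // e.g. texture coordinates: rewritten on zoom
};

// Recomputes only the dynamic components; only this small buffer then needs to
// be re-sent to the GPU (e.g. with glBufferSubData), not the whole mesh.
void UpdateDynamic(SplitVertexData & data, float scale)
{
  for (float & v : data.m_dynamic)
    v *= scale;  // purely illustrative recomputation
}
```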
Intersection of objects
Maps contain a huge number of text labels and icons. When zooming out, these symbols begin to overlap one another, making the map hard to read. A single building may house many establishments, each with its own name and icon, and these will overlap even at a large scale. We therefore needed a mechanism that prioritizes this graphic information and prevents it from being displayed with overlaps. We call this mechanism the overlay tree. Its essence is quite simple: during rendering, all text and icon data is organized into a kd-tree by its position on the screen. Using the tree, we determine which objects compete for the same screen space at the current moment, and then the priorities of those objects come into play. The result is a set of the highest-priority non-overlapping symbols, which we display. Here, however, we ran into one unpleasant problem: during animation, and especially when the map was rotated, the texts and icons flickered badly, because the tree was rebuilt on every frame. To avoid this, we had to introduce some inertia into the tree-rebuilding mechanism.
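The selection logic can be illustrated with a simplified linear version: sort the candidates by priority and accept each one only if it overlaps nothing already accepted. The real engine performs the spatial queries through a kd-tree rather than a linear scan, and the names below are illustrative:

```cpp
#include <algorithm>
#include <vector>

// Axis-aligned screen-space bounding box of a label or icon.
struct ScreenRect { float minX, minY, maxX, maxY; };

bool Intersects(ScreenRect const & a, ScreenRect const & b)
{
  return a.minX < b.maxX && b.minX < a.maxX &&
         a.minY < b.maxY && b.minY < a.maxY;
}

struct Overlay { ScreenRect rect; int priority; };

// Keeps the highest-priority overlays that do not overlap anything accepted.
std::vector<Overlay> SelectVisible(std::vector<Overlay> overlays)
{
  std::sort(overlays.begin(), overlays.end(),
            [](Overlay const & a, Overlay const & b)
            { return a.priority > b.priority; });
  std::vector<Overlay> visible;
  for (Overlay const & o : overlays)
  {
    bool blocked = false;
    for (Overlay const & v : visible)
      if (Intersects(o.rect, v.rect)) { blocked = true; break; }
    if (!blocked)
      visible.push_back(o);
  }
  return visible;
}
```

Replacing the inner loop with a kd-tree range query turns the quadratic scan into something fast enough to run every frame, which is exactly what the overlay tree does.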
About the future
This post does not cover all of the technology that went into our map rendering engine. Text rendering with the SDF algorithm, the algorithms for generating the geometry of roads and routes, antialiasing, and much more worthy of posts of their own were left out. I hope you found what we are doing at least curious. If so, expect new posts about our rendering in the future and, of course, the release of the Drape engine.
P.S. Fellow graphics developers: if, while reading this post, you felt a burning desire to build maps of your own (with preference and courtesans), I hasten to inform you that you have an alternative: join our team! If you have any questions, feel free to send me a private message.