📜 ⬆️ ⬇️

We draw a map with one select or about the benefits of multi-indexes

image
This article is written in continuation of the series, which tells about the prototyping of a simple and productive dynamic web map server. Earlier it was described how spatial indexes are arranged in it, as well as how to take and draw a spatial layer just like that. Now we make it a bit more elegant.

In terms of working with one layer, we more or less figured out, but there are dozens of layers in reality, each of them will have to be queried in the spatial index, even if nothing has got into the search extent ... architecturally clean up the process and get rid of the obviously useless actions?

Indexing


It is possible, why not. But for this we will have to create a single spatial index for all spatial layers. Recall how our indexes are arranged:

Well, to create a single index, you will have to divide all the layers with one grid, number the same blocks and, in addition to the object identifier, store the table identifier in the index. So:

We measure the performance of the index.


We will launch a series of 10,000 random searches in the square [35 ... 45.50 ... 60] ° on the general index and in the old way. All requests are executed in one connection i. show performance on one core.
Request sizeTime in a single indexTime as before
0.01 °7'35 ''8'26 ''
0.1 °8'25 ''9'20 ''
0.5 °15'11 "16'7 "
Conclusion : yes, a single index is faster, but the result is not amazing. On the other hand, all 4 of our layers are relatively densely populated, again, it is important that the amount of data is relatively small so that all the advantages of a single search should stand up to their full height. On the third hand, this is live data as it is.

Draw a map


First of all, let's draw it in the old way so that it can be compared with anything.
 create procedure mk_test_gif( in cminx double precision, in cminy double precision, in cmaxx double precision, in cmaxy double precision) { declare img any; img := img_create (512, 512, cminx, cminy, cmaxx, cmaxy, 1); declare cl integer; declare bg integer; { cl := img_alloc_color (img, 0, 0, 255); bg := img_alloc_color (img, 0, 0, 255); whenever not found goto nf; for select blob_to_string(Shape) as data from xxx.YYY."water-polygon" as x, xxx.YYY."v_water-polygon_spx_enum_items_in_box" as a where a.minx = cminx and a.miny = cminy and a.maxx = cmaxx and a.maxy = cmaxy and x."_OBJECTID_" = a.oid and x.maxx_ >= cminx and x.minx_ <= cmaxx and x.maxy_ >= cminy and x.miny_ <= cmaxy do { img_draw_polygone (img, data, cl, bg); } nf:; ...      ,    img_draw_polyline. ... } declare ptr integer; ptr := img_tostr (img); img_destroy (img); declare image any; image := img_fromptr(ptr); string_to_file('test.gif', image, -2); return; }; mk_test_gif(35., 50., 45., 60.); 
Work takes 17.2 seconds.
')
Now as it is promised, one. The first thing that comes to mind
aggregates .
An aggregate is an object that is created when the cursor is opened, called on each line of the result, and finalized when the cursor is closed.
 create function mapx_agg_init (inout _agg any) {;}; create function mapx_agg_acc ( inout _agg any, in _tab integer, in _oid integer ) { if (_agg is null) { declare img any; img := img_create (512, 512, _tab[0], _tab[1], _tab[2], _tab[3], 1); _agg := img; return 0; } else { return case when _tab = 4 then (img_draw_polygone(_agg, ( select blob_to_string(bl.Shape) from "xxx"."YYY"."building-polygon" as bl where bl."_OBJECTID_" = _oid), 255, 255)) when _tab = 3 then (img_draw_polyline(_agg, ( select blob_to_string(hw.Shape) from "xxx"."YYY"."highway-line" as hw where hw."_OBJECTID_" = _oid), 100, 100)) when _tab = 2 then (img_draw_polygone(_agg, ( select blob_to_string(vg.Shape) from "xxx"."YYY"."vegetation-polygon" as vg where vg."_OBJECTID_" = _oid), 10, 10)) when _tab = 1 then (img_draw_polygone(_agg, ( select blob_to_string(wt.Shape) from "xxx"."YYY"."water-polygon" as wt where wt."_OBJECTID_" = _oid), 50, 50)) else 1 end; } }; create function mapx_agg_final (inout _agg any) returns integer { declare ptr integer; ptr := img_tostr (_agg); img_destroy (_agg); declare image any; image := img_fromptr(ptr); string_to_file('nskx_ii.gif', image, -2); return 1; }; create aggregate mapx_agg (in _tab integer, in _oid integer) returns integer from mapx_agg_init, mapx_agg_acc, mapx_agg_final; create procedure mk_testx_ii_gif( in cminx double precision, in cminy double precision, in cmaxx double precision, in cmaxy double precision) { declare cnt integer; select mapx_agg(tab, oid) into cnt from ( select * from (select vector(cminx, cminy, cmaxx, cmaxy) as tab, 0 as oid) as f1 union all (select tab, oid from xxx.YYY."v_total__spx_enum_items_in_box" as a where a.minx = cminx and a.miny = cminy and a.maxx = cmaxx and a.maxy = cmaxy) ) f_all; } mk_testx_ii_gif(35., 50., 45., 60.); 
Unfortunately, there is no regular method to pass the initialization parameters to the aggregate, so you have to go for the trick, slip the union from the data and the initialization string to it, construct the context not in the constructor, but when you get the first string.
How so, the attentive reader will say, one select was promised, and there a whole bunch of them! In fact, the row identifiers come to the aggregate in an orderly and tablely manner, so seeming subqueries are in fact hand-organized join.
So, the running time is 42 seconds. NDA

Another attempt
 create procedure mk_testx_gif( in cminx double precision, in cminy double precision, in cmaxx double precision, in cmaxy double precision) { declare img any; img := img_create (512, 512, cminx, cminy, cmaxx, cmaxy, 1); declare cnt, cnt2 integer; declare cl1, bg1 integer; cl1 := img_alloc_color (img, 0, 0, 255); bg1 := img_alloc_color (img, 0, 0, 255); declare cl2, bg2 integer; cl2 := img_alloc_color (img, 0, 255, 0); bg2 := img_alloc_color (img, 0, 255, 0); declare cl3, bg3 integer; cl3 := img_alloc_color (img, 255, 100, 0); bg3 := img_alloc_color (img, 255, 100, 0); declare cl4, bg4 integer; cl4 := img_alloc_color (img, 255, 0, 0); bg4 := img_alloc_color (img, 255, 0, 0); select sum ( case when geom is null then 0 when geom_type = 2 then (img_draw_polyline(img, geom, cl, bg)) else (img_draw_polygone(img, geom, cl, bg)) end ) into cnt from ( select case when a.tab = 4 then ( select blob_to_string(bl.Shape) from "xxx"."YYY"."building-polygon" as bl where bl."_OBJECTID_" = a.oid) when a.tab = 3 then ( select blob_to_string(hw.Shape) from "xxx"."YYY"."highway-line" as hw where hw."_OBJECTID_" = a.oid) when a.tab = 2 then ( select blob_to_string(vg.Shape) from "xxx"."YYY"."vegetation-polygon" as vg where vg."_OBJECTID_" = a.oid) when a.tab = 1 then ( select blob_to_string(wt.Shape) from "xxx"."YYY"."water-polygon" as wt where wt."_OBJECTID_" = a.oid) else '' end as geom, case when a.tab = 3 then 2 else 1 end as geom_type, case when a.tab = 4 then cl4 when a.tab = 3 then cl3 when a.tab = 2 then cl2 when a.tab = 1 then cl1 else 0 end as cl, case when a.tab = 4 then bg4 when a.tab = 3 then bg3 when a.tab = 2 then bg2 when a.tab = 1 then bg1 else 0 end as bg from xxx.YYY."v_total__spx_enum_items_in_box" as a where a.minx = cminx and a.miny = cminy and a.maxx = cmaxx and a.maxy = cmaxy ) f_all; declare ptr integer; ptr := img_tostr (img); img_destroy (img); declare image any; image := img_fromptr(ptr); string_to_file('testx.gif', image, -2); return; }; mk_testx_gif(35., 50., 45., 60.); 
This query runs 17.4 seconds. Thus, the optimizer was able to recognize hidden join'y and execute the query without any loss for beauty. A small gain in the actual index was eaten by the increased complexity of the query.

And here is the result:
image
In this nondescript image several million objects.

findings


Drawing a map in one query is not difficult. We even managed to do it without losing much performance. Win, however, also failed and the reason seems in the data structure. And this structure is sharpened by the use of traditional GIS.

For example, the 'highway-line' table contains a couple of dozen layers of different types with different attributes. Usually, such tables serve as the basis for all road layers that refer to one physical table and differ in filters. Of course, it is more convenient to work with one table than with two dozen (this moment was one of the motives for this work). Again, we have a common spatial index for you of these layers.

But there are also disadvantages. As before, to draw each layer, you need to execute a separate SQL query. Those. even though there is one index, there are still several searches for it. The maximum you can win on is on page caching. An additional index is needed - a record type; it is also required to search by it and cross the samples. In addition, since objects of different types are intermixed, there is little chance that objects of the same type will be side by side (on the same page) and thus the total number of readings increases.

What if we scatter, for example, a 'highway-line' table on a bunch of sub-tables by type and combine all of them into one spatial index, as we did above? Working with the index will not change, we will need only one search in it. Working with data will only accelerate. data locality will increase - data of the same type, close spatially more often appear next to the disk. And if the data of any type is not in the search extent, they simply will not be processed. No matter how many of them, it will not affect the reading of useful data.

And one more observation at the end. Indexes as such are pretty odd objects. Neither in relational algebra nor in relational calculus are they close to them. This is an implementation-dependent extension that allows the query processor to execute them more efficiently. In our case, the indices are loaded with some semantics, which is not in the data. In our multi-table index, the relationships between the layers are described, in fact, the layers that we want to draw together turn out to be in a specific index.

On the other hand, we cannot perceive our index as a table (although in fact it is a table, it is a forced measure due to the fact that we are forced to remain within the framework of a specific DBMS) because its values ​​are table identifiers. This is a metatable and the query plan depends on the metadata from this one.

Again, traditionally, the query processor is free to choose which indices to use. But this is not our case. We can create multiple multi-table indexes and explicitly indicate which index is the source of the primary data stream. Regardless of what the optimizer thinks about this. It is nice to see how the “tree of life is forever green” through the “dry theory” of the relational model.

PS:

As an illustration, the marvelous "city plan of Paris" by Mark Twain’s hand from the story of the same name was used in the article’s cap.

Source: https://habr.com/ru/post/201474/


All Articles