We draw a map with one select or about the benefits of multi-indexes

This article is written in continuation of the series, which tells about the prototyping of a simple and productive dynamic web map server. Earlier it was described how spatial indexes are arranged in it, as well as how to take and draw a spatial layer just like that. Now we make it a bit more elegant.

In terms of working with one layer, we more or less figured out, but there are dozens of layers in reality, each of them will have to be queried in the spatial index, even if nothing has got into the search extent ... architecturally clean up the process and get rid of the obviously useless actions?

Indexing

It is possible, why not. But for this we will have to create a single spatial index for all spatial layers. Recall how our indexes are arranged:

The space of the layer is divided into blocks by a rectangular lattice.
All blocks are numbered.
Each indexed object is assigned one or more block numbers in which it is located.
When searching, the extent of the query is split into blocks, and for each block or a continuous chain of blocks from the usual integer index, all the objects that belong to them are taken

Well, to create a single index, you will have to divide all the layers with one grid, number the same blocks and, in addition to the object identifier, store the table identifier in the index. So:

We will train as before on the shape of the data obtained from OSM Russia .
Take 4 relatively populated layers - water, forests, buildings and roads.
As a DBMS, as before, we will use OpenSource edition of OpenLink Virtuoso , the official Win_x64 build of version 7.0.0, taken from the developer’s site.
For drawing, we will use the GD- based and previously described C plug-in that extends the set of built-in PL / SQL functions.
Take the extent of the data: xmin = -180, ymin = 35, xmax = 180, ymax = 85 degrees.
Divide this extent into blocks of 0.1 ° in latitude and longitude. Thus we will receive a grid of 3600500 blocks. 0.1 ° is about 11 km, a bit too much to draw at the house level, but our goal at the moment is not the maximum performance in specific conditions, but a concept check.
Load the layers in the DBMS in the same way as before:
- vegetation-polygon.shp: 657673 items, load time 2 '4' '
- water-polygon.shp: 380868 items, load time 1 '22' '
- building-polygon.shp: 5326421 items, load time 15 '42' '
- highway-line.shp: 2599083 elements, load time 7 '56' '

Now there are data tables (Ex: "xxx". "YYY". "Water-polygon") and associated tables with spatial indexes (Ex: "xxx". "YYY". "Water-polygon__spx"). Using the fact that they are all formed with the same parameters, we merge them into a single spatial index

registry_set ('__spx_startx', '-180'); registry_set ('__spx_starty', '35'); registry_set ('__spx_nx', '3600'); registry_set ('__spx_ny', '500'); registry_set ('__spx_stepx', '0.1'); registry_set ('__spx_stepy', '0.1'); drop table "xxx"."YYY"."total__spx"; create table "xxx"."YYY"."total__spx" ( "node_" integer, "tab_" integer, "oid_" integer, primary key ("node_", "tab_", "oid_")); create procedure mp_total_spx () { for select "node_", "oid_" from "xxx"."YYY"."water-polygon__spx" do { insert into "xxx"."YYY"."total__spx" ("oid_","tab_","node_") values ("oid_", 1, "node_"); commit work; } for select "node_", "oid_" from "xxx"."YYY"."vegetation-polygon__spx" do { insert into "xxx"."YYY"."total__spx" ("oid_","tab_","node_") values ("oid_", 2, "node_"); commit work; } for select "node_", "oid_" from "xxx"."YYY"."highway-line__spx" do { insert into "xxx"."YYY"."total__spx" ("oid_","tab_","node_") values ("oid_", 3, "node_"); commit work; } for select "node_", "oid_" from "xxx"."YYY"."building-polygon__spx" do { insert into "xxx"."YYY"."total__spx" ("oid_","tab_","node_") values ("oid_", 4, "node_"); commit work; } }; mp_total_spx ();

Making a general index takes 2'16 ''. Using cursors instead of constructions (insert ... select ...) is caused by the fact that in the OpenSource version there is a 50 MB limit on the size of a single transaction log in the log. This restriction is easy to fix by rebuilding, but that’s another story.

Now we set up work with the index in the same way as we did before

 create procedure "xxx"."YYY"."total__spx_enum_items_in_box" ( in minx double precision, in miny double precision, in maxx double precision, in maxy double precision) { declare nx, ny integer; nx := atoi (registry_get ('__spx_nx')); ny := atoi (registry_get ('__spx_ny')); declare startx, starty double precision; startx := atof (registry_get ('__spx_startx')); starty := atof (registry_get ('__spx_starty')); declare stepx, stepy double precision; stepx := atof (registry_get ('__spx_stepx')); stepy := atof (registry_get ('__spx_stepy')); declare sx, sy, fx, fy integer; sx := cast (floor ((minx - startx)/stepx) as integer); fx := cast (floor ((maxx - startx)/stepx) as integer); sy := cast (floor ((miny - starty)/stepy) as integer); fy := cast (floor ((maxy - starty)/stepy) as integer); declare ress any; ress := vector(dict_new (), dict_new (), dict_new (), dict_new ()); for (declare iy integer, iy := sy; iy <= fy; iy := iy + 1) { declare ixf, ixt integer; ixf := nx * iy + sx; ixt := nx * iy + fx; for select "node_", "tab_", "oid_" from "xxx"."YYY"."total__spx" where "node_" >= ixf and "node_" <= ixt do { dict_put (ress["tab_" - 1], "oid_", 0); } } result_names('oid', 'tab'); declare ix integer; for (ix := 0; ix < 4; ix := ix + 1) { declare arr any; arr := dict_list_keys (ress[ix], 1); gvector_digit_sort (arr, 1, 0, 1); foreach (integer oid in arr) do { result (oid, ix + 1); } } }; create procedure view "xxx"."YYY"."v_total__spx_enum_items_in_box" as "xxx"."YYY"."total__spx_enum_items_in_box" (minx, miny, maxx, maxy) ("oid" integer, "tab" integer);

We check the correctness of the work:

 select count(*) from "xxx"."YYY"."v_total__spx_enum_items_in_box" as a where a.minx = 82.963 and a.miny = 54.9866 and a.maxx = 82.98997 and a.maxy = 55.0133; select count(*) from "xxx"."YYY"."v_vegetation-polygon_spx_enum_items_in_box" ... select count(*) from "xxx"."YYY"."v_water-polygon_spx_enum_items_in_box" ... select count(*) from "xxx"."YYY"."v_building-polygon_spx_enum_items_in_box" ... select count(*) from "xxx"."YYY"."v_highway-line_spx_enum_items_in_box" ...

The first request returned 29158, the rest: 941 + 33 + 20131 + 8053. Matches.

We measure the performance of the index.

We will launch a series of 10,000 random searches in the square [35 ... 45.50 ... 60] ° on the general index and in the old way. All requests are executed in one connection i. show performance on one core.

Request size	Time in a single index	Time as before
0.01 °	7'35 ''	8'26 ''
0.1 °	8'25 ''	9'20 ''
0.5 °	15'11 "	16'7 "

Conclusion : yes, a single index is faster, but the result is not amazing. On the other hand, all 4 of our layers are relatively densely populated, again, it is important that the amount of data is relatively small so that all the advantages of a single search should stand up to their full height. On the third hand, this is live data as it is.

Draw a map

First of all, let's draw it in the old way so that it can be compared with anything.

 create procedure mk_test_gif( in cminx double precision, in cminy double precision, in cmaxx double precision, in cmaxy double precision) { declare img any; img := img_create (512, 512, cminx, cminy, cmaxx, cmaxy, 1); declare cl integer; declare bg integer; { cl := img_alloc_color (img, 0, 0, 255); bg := img_alloc_color (img, 0, 0, 255); whenever not found goto nf; for select blob_to_string(Shape) as data from xxx.YYY."water-polygon" as x, xxx.YYY."v_water-polygon_spx_enum_items_in_box" as a where a.minx = cminx and a.miny = cminy and a.maxx = cmaxx and a.maxy = cmaxy and x."_OBJECTID_" = a.oid and x.maxx_ >= cminx and x.minx_ <= cmaxx and x.maxy_ >= cminy and x.miny_ <= cmaxy do { img_draw_polygone (img, data, cl, bg); } nf:; ...      ,    img_draw_polyline. ... } declare ptr integer; ptr := img_tostr (img); img_destroy (img); declare image any; image := img_fromptr(ptr); string_to_file('test.gif', image, -2); return; }; mk_test_gif(35., 50., 45., 60.);

Work takes 17.2 seconds.
')
Now as it is promised, one. The first thing that comes to mind
aggregates .
An aggregate is an object that is created when the cursor is opened, called on each line of the result, and finalized when the cursor is closed.

 create function mapx_agg_init (inout _agg any) {;}; create function mapx_agg_acc ( inout _agg any, in _tab integer, in _oid integer ) { if (_agg is null) { declare img any; img := img_create (512, 512, _tab[0], _tab[1], _tab[2], _tab[3], 1); _agg := img; return 0; } else { return case when _tab = 4 then (img_draw_polygone(_agg, ( select blob_to_string(bl.Shape) from "xxx"."YYY"."building-polygon" as bl where bl."_OBJECTID_" = _oid), 255, 255)) when _tab = 3 then (img_draw_polyline(_agg, ( select blob_to_string(hw.Shape) from "xxx"."YYY"."highway-line" as hw where hw."_OBJECTID_" = _oid), 100, 100)) when _tab = 2 then (img_draw_polygone(_agg, ( select blob_to_string(vg.Shape) from "xxx"."YYY"."vegetation-polygon" as vg where vg."_OBJECTID_" = _oid), 10, 10)) when _tab = 1 then (img_draw_polygone(_agg, ( select blob_to_string(wt.Shape) from "xxx"."YYY"."water-polygon" as wt where wt."_OBJECTID_" = _oid), 50, 50)) else 1 end; } }; create function mapx_agg_final (inout _agg any) returns integer { declare ptr integer; ptr := img_tostr (_agg); img_destroy (_agg); declare image any; image := img_fromptr(ptr); string_to_file('nskx_ii.gif', image, -2); return 1; }; create aggregate mapx_agg (in _tab integer, in _oid integer) returns integer from mapx_agg_init, mapx_agg_acc, mapx_agg_final; create procedure mk_testx_ii_gif( in cminx double precision, in cminy double precision, in cmaxx double precision, in cmaxy double precision) { declare cnt integer; select mapx_agg(tab, oid) into cnt from ( select * from (select vector(cminx, cminy, cmaxx, cmaxy) as tab, 0 as oid) as f1 union all (select tab, oid from xxx.YYY."v_total__spx_enum_items_in_box" as a where a.minx = cminx and a.miny = cminy and a.maxx = cmaxx and a.maxy = cmaxy) ) f_all; } mk_testx_ii_gif(35., 50., 45., 60.);

Unfortunately, there is no regular method to pass the initialization parameters to the aggregate, so you have to go for the trick, slip the union from the data and the initialization string to it, construct the context not in the constructor, but when you get the first string.
How so, the attentive reader will say, one select was promised, and there a whole bunch of them! In fact, the row identifiers come to the aggregate in an orderly and tablely manner, so seeming subqueries are in fact hand-organized join.
So, the running time is 42 seconds. NDA

Another attempt

 create procedure mk_testx_gif( in cminx double precision, in cminy double precision, in cmaxx double precision, in cmaxy double precision) { declare img any; img := img_create (512, 512, cminx, cminy, cmaxx, cmaxy, 1); declare cnt, cnt2 integer; declare cl1, bg1 integer; cl1 := img_alloc_color (img, 0, 0, 255); bg1 := img_alloc_color (img, 0, 0, 255); declare cl2, bg2 integer; cl2 := img_alloc_color (img, 0, 255, 0); bg2 := img_alloc_color (img, 0, 255, 0); declare cl3, bg3 integer; cl3 := img_alloc_color (img, 255, 100, 0); bg3 := img_alloc_color (img, 255, 100, 0); declare cl4, bg4 integer; cl4 := img_alloc_color (img, 255, 0, 0); bg4 := img_alloc_color (img, 255, 0, 0); select sum ( case when geom is null then 0 when geom_type = 2 then (img_draw_polyline(img, geom, cl, bg)) else (img_draw_polygone(img, geom, cl, bg)) end ) into cnt from ( select case when a.tab = 4 then ( select blob_to_string(bl.Shape) from "xxx"."YYY"."building-polygon" as bl where bl."_OBJECTID_" = a.oid) when a.tab = 3 then ( select blob_to_string(hw.Shape) from "xxx"."YYY"."highway-line" as hw where hw."_OBJECTID_" = a.oid) when a.tab = 2 then ( select blob_to_string(vg.Shape) from "xxx"."YYY"."vegetation-polygon" as vg where vg."_OBJECTID_" = a.oid) when a.tab = 1 then ( select blob_to_string(wt.Shape) from "xxx"."YYY"."water-polygon" as wt where wt."_OBJECTID_" = a.oid) else '' end as geom, case when a.tab = 3 then 2 else 1 end as geom_type, case when a.tab = 4 then cl4 when a.tab = 3 then cl3 when a.tab = 2 then cl2 when a.tab = 1 then cl1 else 0 end as cl, case when a.tab = 4 then bg4 when a.tab = 3 then bg3 when a.tab = 2 then bg2 when a.tab = 1 then bg1 else 0 end as bg from xxx.YYY."v_total__spx_enum_items_in_box" as a where a.minx = cminx and a.miny = cminy and a.maxx = cmaxx and a.maxy = cmaxy ) f_all; declare ptr integer; ptr := img_tostr (img); img_destroy (img); declare image any; image := img_fromptr(ptr); string_to_file('testx.gif', image, -2); return; }; mk_testx_gif(35., 50., 45., 60.);

This query runs 17.4 seconds. Thus, the optimizer was able to recognize hidden join'y and execute the query without any loss for beauty. A small gain in the actual index was eaten by the increased complexity of the query.

And here is the result:

In this nondescript image several million objects.

findings

Drawing a map in one query is not difficult. We even managed to do it without losing much performance. Win, however, also failed and the reason seems in the data structure. And this structure is sharpened by the use of traditional GIS.

For example, the 'highway-line' table contains a couple of dozen layers of different types with different attributes. Usually, such tables serve as the basis for all road layers that refer to one physical table and differ in filters. Of course, it is more convenient to work with one table than with two dozen (this moment was one of the motives for this work). Again, we have a common spatial index for you of these layers.

But there are also disadvantages. As before, to draw each layer, you need to execute a separate SQL query. Those. even though there is one index, there are still several searches for it. The maximum you can win on is on page caching. An additional index is needed - a record type; it is also required to search by it and cross the samples. In addition, since objects of different types are intermixed, there is little chance that objects of the same type will be side by side (on the same page) and thus the total number of readings increases.

What if we scatter, for example, a 'highway-line' table on a bunch of sub-tables by type and combine all of them into one spatial index, as we did above? Working with the index will not change, we will need only one search in it. Working with data will only accelerate. data locality will increase - data of the same type, close spatially more often appear next to the disk. And if the data of any type is not in the search extent, they simply will not be processed. No matter how many of them, it will not affect the reading of useful data.

And one more observation at the end. Indexes as such are pretty odd objects. Neither in relational algebra nor in relational calculus are they close to them. This is an implementation-dependent extension that allows the query processor to execute them more efficiently. In our case, the indices are loaded with some semantics, which is not in the data. In our multi-table index, the relationships between the layers are described, in fact, the layers that we want to draw together turn out to be in a specific index.

On the other hand, we cannot perceive our index as a table (although in fact it is a table, it is a forced measure due to the fact that we are forced to remain within the framework of a specific DBMS) because its values are table identifiers. This is a metatable and the query plan depends on the metadata from this one.

Again, traditionally, the query processor is free to choose which indices to use. But this is not our case. We can create multiple multi-table indexes and explicitly indicate which index is the source of the primary data stream. Regardless of what the optimizer thinks about this. It is nice to see how the “tree of life is forever green” through the “dry theory” of the relational model.

PS:

As an illustration, the marvelous "city plan of Paris" by Mark Twain’s hand from the story of the same name was used in the article’s cap.

Source: https://habr.com/ru/post/201474/

All Articles

We draw a map with one select or about the benefits of multi-indexes

Indexing

We measure the performance of the index.

Draw a map

findings

PS:

More articles: