Last time we came to the conclusion that an effective spatial index based on the Z-order needs two things:
- an efficient algorithm for splitting a query into subintervals
- low-level access to the B-tree

That is exactly what we will do under the cut. Let's start with the more interesting part, the algorithm.
Splitting the query into subqueries
We will consider the 2-dimensional algorithm (indexing of points) because of its relative simplicity. However, it generalizes easily to higher dimensions.
Also for simplicity, we will use non-negative integer coordinates. Each coordinate is limited to 32 bits, so the value of the index key fits in a uint64.
We need the following simple properties of Z-numbering:
- Suppose a rectangle is marked on the plane. Then among all points of the rectangle, the smallest Z-number belongs to its lower left corner (we will call that value "z") and the largest to its upper right corner ("Z"). This property follows directly from the way Z-numbers are constructed.
- Every rectangle can be divided in exactly one way into two rectangles (by a vertical or a horizontal cut) so that all Z-numbers of the first rectangle are less than all Z-numbers of the second. This follows from the self-similarity of the Z-curve: the elementary square of four cells is split in half by one cut and into quarters by two, and the same happens at every level of the hierarchy.
How exactly must the extent be cut so that the intervals remain contiguous?

It follows from the self-similarity of the curve that one may cut only along a lattice whose step is a power of two and which has a node at the origin of coordinates.

But which of the 64 available lattices should we choose? Quite simple. The extent being cut must occupy more than one cell of the chosen grid, otherwise there is simply nothing to cut. On the other hand, along either coordinate its size must not exceed 2 cells, and along at least one coordinate it must be exactly 2. If the extent occupies 2 cells in both dimensions, we cut along the coordinate whose bit has the higher rank in the construction of the Z-value.

How do we find such a grid? This is also easy. It is enough to compare the Z-values of the corners, z and Z. We compare starting from the high-order bits and find the bit where their values diverge. That bit identifies the desired grid.
How is the cut made? Recall how the Z-value is constructed, namely that the bits of the x and y coordinates alternate. Consequently:
- Let z and Z first differ at bit m
- We cut along the coordinate that bit m belongs to, whether that is x or y. Suppose below that it is x; everything works the same way for y
- From the extent (x0, y0, x1, y1) we must obtain two: (x0, y0, x2-1, y1) and (x2, y0, x1, y1)
- To get x2, it is enough to zero all the bits of the x coordinate below m (in the key these are every other bit)
- x2-1 is obtained by zeroing bit m and setting all the lower-order x bits to 1
So what does the subquery generation algorithm look like? Pretty simple:
- Start with a queue of subqueries that initially contains a single element, the query extent
- While the queue is not empty:
  - Take the item from the top of the queue
  - If the stop criterion is not triggered for this subquery:
    - Compute z and Z, the Z-values of its corners
    - Compare z and Z to find the bit m at which to cut
    - Obtain two subqueries in the way described above
    - Put them back on the queue, first the one with the larger Z-values, then the other
This method guarantees that the Z-values of the final subqueries (those for which the stop criterion fired) are produced in strictly increasing order; no backtracking occurs.
Stop criterion
The stop criterion is very important: if it is neglected, subquery generation continues until everything has been cut into unit squares.

Note that such a criterion cannot rely only on the parameters of the query: its area, geometric properties and so on. In real data the distribution of points can be very uneven, for example cities versus empty spaces, and the population of an area is not known to us in advance.
This forces integration with the search in the index tree which, as we remember, is a B-tree.

What does a subquery look like from the index's point of view? It is a set of data located on some number of pages. If the subquery is empty, the data interval is empty too, but it still points somewhere: to find out that there is no data, you must try to read it and descend from the root of the tree to a leaf page. It may even happen that an empty subquery points beyond the tree.
In general, reading a subquery's data consists of two phases:
- Probing the tree. Starting from the root page, we look for a key less than or equal to the Z-value of the subquery's lower left corner (z) and descend to a leaf page. At the end we have a stack of pages and a leaf page "in hand". On that page we look for an element greater than or equal to the probe key.
- Reading ahead to the end of the subquery. In PostgreSQL the leaf pages of a B-tree are linked into a list; if that were not the case, then to get the next page one would have to climb back up the page stack in order to descend to it.
So after the probe we hold a leaf page on which our data presumably begins. Several situations are possible:
- The element found on the page is greater than the upper right corner of the subquery (Z). I.e. there is no data.
- The element found is less than Z, and the last element of the page is also less than Z. I.e. the subquery starts on this page and ends somewhere further on.
- The element found is less than Z, and the last element of the page is greater than Z. I.e. the entire subquery lies on this page.
- The element found is less than Z, and the last element of the page equals Z. I.e. the subquery begins on this page but probably ends on the next one (several elements with the same coordinates), and possibly much further if there are many duplicates.
Option 1 requires no action. For option 2 a natural stop criterion (for splitting subqueries) suggests itself: keep cutting until we arrive at option 3 or 4. Option 3 is clear. In case 4 there may be several data pages, but splitting the subquery is pointless: the following page(s) can only contain data with a key equal to Z, and after a cut we would find ourselves in the same situation. To cope with this, it is enough simply to read all the data with key equal to Z from the following page(s). There may be none at all; in general, option 4 is a rather exotic case.
Work with B-tree
Low-level work with the B-tree presents no special difficulties, but you do have to create an extension. The general logic is as follows: register an SRF function:

```sql
CREATE TYPE __ret_2d_lookup AS (c_tid TID, x integer, y integer);

CREATE FUNCTION zcurve_2d_lookup(text, integer, integer, integer, integer)
    RETURNS SETOF __ret_2d_lookup
    AS 'MODULE_PATHNAME'
    LANGUAGE C IMMUTABLE STRICT;
```

It receives the index name and the extent as input, and returns a set of records, each containing a pointer to a table row and the coordinates.
Access to the tree itself looks like this:

```c
const char *relname;
...
List       *relname_list;
RangeVar   *relvar;
Relation    rel;
...
relname_list = stringToQualifiedNameList(relname);
relvar = makeRangeVarFromNameList(relname_list);
rel = index_open(RangeVarGetRelid(relvar, AccessShareLock, false),
                 AccessShareLock);
...
index_close(rel, AccessShareLock);
```
Getting the root page:

```c
int    access = BT_READ;
Buffer buf;
...
buf = _bt_getroot(rel, access);
```
In general, searching the index is done like a normal B-tree search (see postgresql/src/backend/access/nbtree/nbtsearch.c). The changes are due to the specifics of the key; perhaps it could have been done without them, at the cost of being a bit slower.
Searching inside a page looks like this:

```c
Page          page;
BTPageOpaque  opaque;
OffsetNumber  low, high;
int32         result, cmpval;
Datum         datum;
bool          isNull;
...
page = BufferGetPage(buf);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
low = P_FIRSTDATAKEY(opaque);
high = PageGetMaxOffsetNumber(page);
...
...
if (P_ISLEAF(opaque))
    return low;
return OffsetNumberPrev(low);
```
Fetching a page element:

```c
OffsetNumber offnum;
Page         page;
Relation     rel;
TupleDesc    itupdesc = RelationGetDescr(rel);
IndexTuple   itup;
Datum        datum;
bool         isNull;
uint64       val;
...
itup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offnum));
datum = index_getattr(itup, 1, itupdesc, &isNull);
val = DatumGetInt64(datum);
```
Summary algorithm
- Start with a queue of subqueries that initially contains a single element, the query extent
- While the queue is not empty:
  - Take the item from the top of the queue
  - Perform a probe query in the index for z (the lower left corner). To save work, the probe need not be done every time, only when the last value found (which is initially 0) is less than z
  - If the minimum value found exceeds Z (the upper right corner), processing of this subquery is finished; return to step 2
  - Check the last element of the leaf page of the B-tree where the probe stopped:
    - If it is greater than or equal to Z, take the page elements, filter them for membership in the search extent and record the resulting points
    - If it is equal to Z, also read the index forward until the keys with value equal to Z are exhausted, and record those points as well
    - Otherwise the last value of the page is less than Z:
      - Compare z and Z to find the bit m at which to cut
      - Obtain two subqueries in the way described above
      - Put them on the queue, first the one with the larger Z-values, then the other
Preliminary results
The title picture of the article shows the breakdown of a real query into subqueries and the resulting points; the results found by the R-tree are compared with those obtained by the algorithm described above. The picture comes from a debugging run, and it is clear that a single data point is not enough.
But pictures are pictures; what we really want is a performance comparison. On our side there will be the same table:
```sql
create table test_points (x integer, y integer);
COPY test_points from '/home/.../data.csv';
create index zcurve_test_points on test_points(zcurve_val_from_xy(x, y));
```
And queries like:

```sql
select count(1)
from zcurve_2d_lookup('zcurve_test_points', 500000, 500000, 501000, 501000);
```
We will compare against the R-tree as the de facto standard. Moreover, unlike in the previous article, we want an index-only scan from the R-tree, since our Z-index no longer consults the table.
```sql
create table test_pt as (select point(x,y) from test_points);
create index test_pt_idx on test_pt using gist (point);
vacuum test_pt;
```
On this data the query:

```sql
explain (analyze, buffers)
select * from test_pt
where point <@ box(point(500000, 500000), point(510000, 510000));
```

produces:

```
                                  QUERY PLAN
---------------------------------------------------------------------------
 Index Only Scan using test_pt_idx on test_pt
     (cost=0.42..539.42 rows=10000 width=16)
     (actual time=0.075..0.531 rows=968 loops=1)
   Index Cond: (point <@ '(510000,510000),(500000,500000)'::box)
   Heap Fetches: 0
   Buffers: shared hit=20
 Planning time: 0.088 ms
 Execution time: 0.648 ms
(6 rows)
```
as required.
The comparison itself:
| Request type | Index type | Time, ms | Shared reads | Shared hits |
|---|---|---|---|---|
| 100×100, ~1 point | R-tree | 0.4* | 1.8 | 3.8 |
| | Z-curve | 0.34* | 1.2 | 3.8 |
| 1000×1000, ~100 points | R-tree | 0.5 ... 7** | 6.2 | 4.9 |
| | Z-curve | 0.41* | 2.8 | 37 |
| 10000×10000, ~10,000 points | R-tree | 4 ... 150*** | 150 | 27 |
| | Z-curve | 6.6**** | 43.7 | 2900 |
* - data obtained by averaging over a series of 100,000 queries
** - data obtained by averaging over series of different lengths, 10,000 and 100,000 queries
*** - data obtained by averaging over series of different lengths, 1,000 and 10,000 queries
**** - data obtained by averaging over a series of 10,000 queries
The measurements were made on a modest virtual machine with two cores and 4 GB of RAM, so the times have no absolute value, but the numbers of pages read can be trusted.

The times shown are from second runs, on a warmed-up server and virtual machine. The numbers of buffers read were taken on a freshly started server.
Findings
- in any case, on small queries the Z-index is faster than the R-tree
- and at the same time it reads noticeably fewer pages
- the R-tree starts massively missing the cache much earlier
- meanwhile the Z-index itself is four times smaller, so the cache works more efficiently for it
- moreover, it is possible that the misses go all the way past the host machine's disk cache; otherwise such a difference is hard to explain
What's next
The two-dimensional point index considered here is intended only as a proof of concept; in real life it is of little use. Even for a three-dimensional index, a 64-bit key is no longer enough (or only just enough) for a useful resolution.

So here is what lies ahead:
- switching to a different key type, numeric
- a 3D variant, including points on a sphere
- checking how it works with the Hilbert curve
- full measurements
- a 4D variant: rectangles
- a 6D variant: rectangles on a sphere
- ...
PS: The sources are available here under the BSD license; what is described in this article corresponds to the "raw-2D" branch.

PPS: The algorithm itself was developed in ancient times (2004-2005) in collaboration with Alexander Artyushin.

PPPS: Many thanks to the guys from PostgresPro for encouraging me to implement this algorithm in PostgreSQL.

PPPPS: Continued here.