📜 ⬆️ ⬇️

Own index types in Caché DBMS

In the object and relational data models of the Caché DBMS, there are three types of indices - regular, bitmap and bitslice . If for some reason these indexes are not enough, starting from version 2013.1 the programmer can determine his type of indexes and use it in any classes.

Details under the cut (if you are not afraid of words like method-generator).

“Own type of indexes” is a class that implements the methods of the % Library.FunctionalIndex interface for inserting / deleting / changing values ​​in the index. This class can be specified as an index type in the index definition.

For example:
')
Property A As %String; Property B As %String; Index someind On (A,B) As CustomPackage.CustomIndex; 

The CustomPackage.CustomIndex class is just the implementation of its type of indexes.

As an example, consider a small prototype of the quad-tree index for spatial data, created on the hackathon by a team composed of Andrei ARechitsky Rechitsky, Alexander Pogrebnikov and the author of these lines. The hackathon was held as part of the annual school of developers InterSystems (special thanks to the mastermind of the hackathon tsafin ). The materials of the school, by the way, are available on our website .

In this article we will not touch on what a quad tree is and how to work with it.

Let us dwell on the creation of a class that implements the % Library.FunctionalIndex interface for the current implementation of a quadtree. She was engaged in our hackathon team Andrei. Andrei created the class SpatialIndex.Indexer , which knew how two methods - Insert (x, y, id) and Delete (x, y, id) . When creating an object of the SpatialIndex.Indexer class , it was necessary to specify the global node in whose subnodes an index was written. It remained for me to create the SpatialIndex.Index class that implements the InsertIndex , UpdateIndex , DeleteIndex, and PurgeIndex methods . The first three of these methods take as input the Id of the row to be changed and the values ​​to be indexed in the same order as in the definition of the index in the class where this index is used. In our example, pArg (1) is A , pArg (2) is B.

 Class SpatialIndex.Index Extends %Library.FunctionalIndex [ System = 3 ] { ClassMethod InsertIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) $$$GENERATE($C(9)_"set indexer = ##class(SpatialIndex.Indexer).%New($Name("_IndexGlobal_"))") $$$GENERATE($C(9)_"do indexer.Insert(pArg(1),pArg(2),pID)") } } ClassMethod UpdateIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) $$$GENERATE($C(9)_"set indexer = ##class(SpatialIndex.Indexer).%New($Name("_IndexGlobal_"))") $$$GENERATE($C(9)_"do indexer.Delete(pArg(3),pArg(4),pID)") $$$GENERATE($C(9)_"do indexer.Insert(pArg(1),pArg(2),pID)") } } ClassMethod DeleteIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) $$$GENERATE($C(9)_"set indexer = ##class(SpatialIndex.Indexer).%New($Name("_IndexGlobal_"))") $$$GENERATE($C(9)_"do indexer.Delete(pArg(1),pArg(2),pID)") } } ClassMethod PurgeIndex() [ CodeMode = generator, ServerOnly = 1 ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) $$$GENERATE($C(9)_"kill " _ IndexGlobal) } } ClassMethod IndexLocation(className As %String, indexName As %String) As %String { set storage = ##class(%Dictionary.ClassDefinition).%OpenId(className).Storages.GetAt(1).IndexLocation quit $Name(@storage@(indexName)) } } 

The IndexLocation method is auxiliary, by the name of the class and the index it returns the name of the global node in which to store the index values.

Consider a test class with an index of type SpatialIndex.Index :

 Class SpatialIndex.Test Extends %Persistent { Property Name As %String(MAXLEN = 300); Property Latitude As %String; Property Longitude As %String; Index coord On (Latitude, Longitude) As SpatialIndex.Index; } 

When compiling the SpatialIndex.Test class , for each index of the SpatialIndex.Index type, the following methods are generated in the INT code:

 zcoordInsertIndex(pID,pArg...) public {   set indexer = ##class(SpatialIndex.Indexer).%New($Name(^SpatialIndex.TestI("coord")))   do indexer.Insert(pArg(1),pArg(2),pID) } zcoordPurgeIndex() public {   kill ^SpatialIndex.TestI("coord") } zcoordSegmentInsert(pIndexBuffer,pID,pArg...) public {   do ..coordInsertIndex(pID, pArg...) } zcoordUpdateIndex(pID,pArg...) public {   set indexer = ##class(SpatialIndex.Indexer).%New($Name(^SpatialIndex.TestI("coord")))   do indexer.Delete(pArg(3),pArg(4),pID)   do indexer.Insert(pArg(1),pArg(2),pID) } 

And the % SaveData , % DeleteData , % SQLInsert , % SQLUpdate , % SQLDelete methods call index methods. For example, in % SaveData :

  if insert {    // ...    do ..coordInsertIndex(id,i%Latitude,i%Longitude,"")    // ... } else {    // ...    do ..coordUpdateIndex(id,i%Latitude,i%Longitude,zzc27v3,zzc27v2,"")    // ... } 

The fun example is to look at a working example - download files from the repository https://github.com/intersystems-ru/spatialindex/tree/no-web-interface . This is a link to a branch without a web interface. Import the classes themselves, unzip RuCut.zip and download the data:

 do $system.OBJ.LoadDir("c:\temp\spatialindex","ck") do ##class(SpatialIndex.Test).load("c:\temp\rucut.txt") 

The file rucut.txt stores data about 100'000 locations in Russia - the name and coordinates. The load method reads each line from the file and saves it as an object of the SpatialIndex.Test class. After its execution in the global ^ SpatialIndex.TestI (“coord”), the quadtree will be stored in Latitude and Longitude coordinates.

And now requests


Build index - half the battle. The most interesting thing is when queries can use this index. For indexes of non-standard types, there is a standard syntax for using them, which looks like this:

 SELECT * FROM SpatialIndex.Test WHERE %ID %FIND search_index(coord, 'window', 'minx=56,miny=56,maxx=57,maxy=57') 

Here % ID% FIND search_index is the fixed part. Next comes the name of the index, note without quotes. All other parameters ('window', 'minx = 56, miny = 56, maxx = 57, maxy = 57) are passed to the Find method, which also needs to be defined in the class describing the type of index (in our case, SpatialIndex.Index ):

 ClassMethod Find(queryType As %Binary, queryParams As %String) As %Library.Binary [ CodeMode = generator, ServerOnly = 1, SqlProc ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) set IndexGlobalQ = $$$QUOTE(IndexGlobal) $$$GENERATE($C(9)_"set result = ##class(SpatialIndex.SQLResult).%New()") $$$GENERATE($C(9)_"do result.PrepareFind($Name("_IndexGlobal_"), queryType, queryParams)") $$$GENERATE($C(9)_"quit result") } } 

There are two parameters here - queryType and queryParams , but this is absolutely not necessary, there may be more or less.

The Find method when compiling a class in which the SpatialIndex.Index index is used generates a helper method z <IndexName> Find , which is called when executing SQL queries:

 zcoordFind(queryType,queryParams) public { Set:'$isobject($get(%sqlcontext)) %sqlcontext=##class(%Library.ProcedureContext).%New()   set result = ##class(SpatialIndex.SQLResult).%New()   do result.PrepareFind($Name(^SpatialIndex.TestI("coord")), queryType, queryParams)   quit result } 

The Find method must return an instance of the class that implements the % SQL.Abstract interface. The methods of this interface are NextChunk , PreviousChunk return bit strings in chunks of 64000 bits each . If the entry with the ID number satisfies the sample conditions, then the corresponding bit (number_bit * 64000 + position_number_internal_bit) is set to 1.

 Class SpatialIndex.SQLResult Extends %SQL.AbstractFind { Property ResultBits [ MultiDimensional, Private ]; Method %OnNew() As %Status [ Private, ServerOnly = 1 ] { kill i%ResultBits kill qHandle quit $$$OK } Method PrepareFind(indexGlobal As %String, queryType As %String, queryParams As %Binary) As %Status { if queryType = "window" { for i = 1:1:4 { set item = $Piece(queryParams, ",", i) set param = $Piece(item, "=", 1) set value = $Piece(item, "=" ,2) set arg(param) = value } set qHandle("indexGlobal") = indexGlobal do ##class(SpatialIndex.QueryExecutor).InternalFindWindow(.qHandle,arg("minx"),arg("miny"),arg("maxx"),arg("maxy")) set id = "" for { set id = $O(qHandle("data", id),1,idd) quit:id="" set tChunk = (idd\64000)+1, tPos=(idd#64000)+1 set $BIT(i%ResultBits(tChunk),tPos) = 1 } } quit $$$OK } Method ContainsItem(pItem As %String) As %Boolean { set tChunk = (pItem\64000)+1, tPos=(pItem#64000)+1 quit $bit($get(i%ResultBits(tChunk)),tPos) } Method GetChunk(pChunk As %Integer) As %Binary { quit $get(i%ResultBits(pChunk)) } Method NextChunk(ByRef pChunk As %Integer = "") As %Binary { set pChunk = $order(i%ResultBits(pChunk),1,tBits) quit:pChunk="" "" quit tBits } Method PreviousChunk(ByRef pChunk As %Integer = "") As %Binary { set pChunk = $order(i%ResultBits(pChunk),-1,tBits) quit:pChunk="" "" quit tBits } } 

The InternalFindWindow method of the SpatialIndex.QueryExecutor class in the example above is an implementation of searching for points that fall into a given rectangle. Further, in the FOR loop, the IDs of the matching strings are written into bit sets.

In our hackathon project, in addition to searching in a rectangle, Andrey implemented a search inside the oval:

 SELECT * FROM SpatialIndex.Test WHERE %ID %FIND search_index(coord,'radius','x=55,y=55,radiusX=2,radiusY=2') and name %StartsWith 'Z' 

A bit about % FIND predicate


This predicate has an additional parameter SIZE , which can prompt the query optimizer an approximate order of the number of rows that will satisfy the predicate. Based on this parameter, the optimizer will choose whether or not to use the index to which % FIND refers.

For example, let's add the following index to the SpatialIndex.Test class:

 Index ByName on Name; 

Recompile the class and build this index:

 write ##class(SpatialIndex.Test).%BuildIndices($LB("ByName")) 

And, of course, run TuneTable:

 do $system.SQL.TuneTable("SpatialIndex.Test", 1) 

Consider the query plan:

 SELECT * FROM SpatialIndex.Test WHERE name %startswith 'za' and %ID %FIND search_index(coord,'radius','x=55,y=55,radiusX=2,radiusY=2') size ((10)) 



The coord index is supposed to return few rows, so the optimizer will not apply to the index by the Name field.

Another picture for the request:

 SELECT * FROM SpatialIndex.Test WHERE name %startswith 'za' and %ID %FIND search_index(coord,'radius','x=55,y=55,radiusX=2,radiusY=2') size ((1000)) 



When executing this query, both indexes will be used.

As a final example, a query that uses only an index across the Name field — using the coord index, if it is expected that it returns about 100,000 lines, is useless:

 SELECT * FROM SpatialIndex.Test WHERE name %startswith 'za' and %ID %FIND search_index(coord,'radius','x=55,y=55,radiusX=2,radiusY=2') size ((100000)) 



Thanks to all who read or at least looked through this article to the end.

In addition to the documentation, links to which are slightly lower, other implementations of the % Library.FunctionalIndex and % SQL.AbstractFind interfaces will be of great help. To see these implementations, open one of these classes in the studio and select Class -> Inherited Classes from the menu.

References:

Source: https://habr.com/ru/post/272689/


All Articles