
In the object and relational data models of the Caché DBMS, there are three types of indices - regular,
bitmap and
bitslice . If for some reason these indexes are not enough, starting from version 2013.1 the programmer can determine his type of indexes and use it in any classes.
Details under the cut (if you are not afraid of words like method-generator).
“Own type of indexes” is a class that implements the methods of the
% Library.FunctionalIndex interface for inserting / deleting / changing values in the index. This class can be specified as an index type in the index definition.
For example:
')
Property A As %String; Property B As %String; Index someind On (A,B) As CustomPackage.CustomIndex;
The
CustomPackage.CustomIndex class is just the implementation of its type of indexes.
As an example, consider a small prototype of the quad-tree index for spatial data, created on the
hackathon by a team composed of Andrei
ARechitsky Rechitsky,
Alexander Pogrebnikov and the author of these lines. The hackathon was held as part of the annual school of developers InterSystems (special thanks to the mastermind of the hackathon
tsafin ). The materials of the school, by the way, are available on
our website .
In this article we will not touch on what a quad tree is and how to work with it.
Let us dwell on the creation of a class that implements the
% Library.FunctionalIndex interface for the current implementation of a quadtree. She was engaged in our hackathon team Andrei. Andrei created the class
SpatialIndex.Indexer , which knew how two methods -
Insert (x, y, id) and
Delete (x, y, id) . When creating an object of the
SpatialIndex.Indexer class
, it was necessary to specify the global node in whose subnodes an index was written. It remained for me to create the
SpatialIndex.Index class that implements the
InsertIndex ,
UpdateIndex ,
DeleteIndex, and
PurgeIndex methods . The first three of these methods take as input the
Id of the row to be changed and the values to be indexed in the same order as in the definition of the index in the class where this index is used. In our example,
pArg (1) is
A ,
pArg (2) is
B. Class SpatialIndex.Index Extends %Library.FunctionalIndex [ System = 3 ] { ClassMethod InsertIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) $$$GENERATE($C(9)_"set indexer = ##class(SpatialIndex.Indexer).%New($Name("_IndexGlobal_"))") $$$GENERATE($C(9)_"do indexer.Insert(pArg(1),pArg(2),pID)") } } ClassMethod UpdateIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) $$$GENERATE($C(9)_"set indexer = ##class(SpatialIndex.Indexer).%New($Name("_IndexGlobal_"))") $$$GENERATE($C(9)_"do indexer.Delete(pArg(3),pArg(4),pID)") $$$GENERATE($C(9)_"do indexer.Insert(pArg(1),pArg(2),pID)") } } ClassMethod DeleteIndex(pID As %CacheString, pArg... As %Binary) [ CodeMode = generator, ServerOnly = 1 ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) $$$GENERATE($C(9)_"set indexer = ##class(SpatialIndex.Indexer).%New($Name("_IndexGlobal_"))") $$$GENERATE($C(9)_"do indexer.Delete(pArg(1),pArg(2),pID)") } } ClassMethod PurgeIndex() [ CodeMode = generator, ServerOnly = 1 ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) $$$GENERATE($C(9)_"kill " _ IndexGlobal) } } ClassMethod IndexLocation(className As %String, indexName As %String) As %String { set storage = ##class(%Dictionary.ClassDefinition).%OpenId(className).Storages.GetAt(1).IndexLocation quit $Name(@storage@(indexName)) } }
The
IndexLocation method is auxiliary, by the name of the class and the index it returns the name of the global node in which to store the index values.
Consider a test class with an index of type
SpatialIndex.Index :
Class SpatialIndex.Test Extends %Persistent { Property Name As %String(MAXLEN = 300); Property Latitude As %String; Property Longitude As %String; Index coord On (Latitude, Longitude) As SpatialIndex.Index; }
When compiling the
SpatialIndex.Test class
, for each index of the
SpatialIndex.Index type, the
following methods are generated in the INT code:
zcoordInsertIndex(pID,pArg...) public { set indexer = ##class(SpatialIndex.Indexer).%New($Name(^SpatialIndex.TestI("coord"))) do indexer.Insert(pArg(1),pArg(2),pID) } zcoordPurgeIndex() public { kill ^SpatialIndex.TestI("coord") } zcoordSegmentInsert(pIndexBuffer,pID,pArg...) public { do ..coordInsertIndex(pID, pArg...) } zcoordUpdateIndex(pID,pArg...) public { set indexer = ##class(SpatialIndex.Indexer).%New($Name(^SpatialIndex.TestI("coord"))) do indexer.Delete(pArg(3),pArg(4),pID) do indexer.Insert(pArg(1),pArg(2),pID) }
And the
% SaveData ,
% DeleteData ,
% SQLInsert ,
% SQLUpdate ,
% SQLDelete methods call index methods. For example, in
% SaveData :
if insert { // ... do ..coordInsertIndex(id,i%Latitude,i%Longitude,"") // ... } else { // ... do ..coordUpdateIndex(id,i%Latitude,i%Longitude,zzc27v3,zzc27v2,"") // ... }
The fun example is to look at a working example - download files from the repository
https://github.com/intersystems-ru/spatialindex/tree/no-web-interface . This is a link to a branch without a web interface. Import the classes themselves, unzip RuCut.zip and download the data:
do $system.OBJ.LoadDir("c:\temp\spatialindex","ck") do ##class(SpatialIndex.Test).load("c:\temp\rucut.txt")
The file rucut.txt stores data about 100'000 locations in Russia - the name and coordinates. The
load method reads each line from the file and saves it as an object of the
SpatialIndex.Test class. After its execution in the global
^ SpatialIndex.TestI (“coord”), the quadtree will be stored in
Latitude and
Longitude coordinates.
And now requests
Build index - half the battle. The most interesting thing is when queries can use this index. For indexes of non-standard types, there is a standard syntax for using them, which looks like this:
SELECT * FROM SpatialIndex.Test WHERE %ID %FIND search_index(coord, 'window', 'minx=56,miny=56,maxx=57,maxy=57')
Here
% ID% FIND search_index is the fixed part. Next comes the name of the index, note without quotes. All other parameters
('window', 'minx = 56, miny = 56, maxx = 57, maxy = 57) are passed to the
Find method, which also needs to be defined in the class describing the type of index (in our case,
SpatialIndex.Index ):
ClassMethod Find(queryType As %Binary, queryParams As %String) As %Library.Binary [ CodeMode = generator, ServerOnly = 1, SqlProc ] { if %mode'="method" { //' set IndexGlobal = ..IndexLocation(%class,%property) set IndexGlobalQ = $$$QUOTE(IndexGlobal) $$$GENERATE($C(9)_"set result = ##class(SpatialIndex.SQLResult).%New()") $$$GENERATE($C(9)_"do result.PrepareFind($Name("_IndexGlobal_"), queryType, queryParams)") $$$GENERATE($C(9)_"quit result") } }
There are two parameters here -
queryType and
queryParams , but this is absolutely not necessary, there may be more or less.
The
Find method when compiling a class in which the
SpatialIndex.Index index is used generates a helper method
z <IndexName> Find , which is called when executing SQL queries:
zcoordFind(queryType,queryParams) public { Set:'$isobject($get(%sqlcontext)) %sqlcontext=##class(%Library.ProcedureContext).%New() set result = ##class(SpatialIndex.SQLResult).%New() do result.PrepareFind($Name(^SpatialIndex.TestI("coord")), queryType, queryParams) quit result }
The
Find method must return an instance of the class that implements the
% SQL.Abstract interface. The methods of this interface are
NextChunk ,
PreviousChunk return bit strings in chunks of 64000 bits
each . If the entry with the
ID number satisfies the sample conditions, then the corresponding bit (number_bit * 64000 + position_number_internal_bit) is set to 1.
Class SpatialIndex.SQLResult Extends %SQL.AbstractFind { Property ResultBits [ MultiDimensional, Private ]; Method %OnNew() As %Status [ Private, ServerOnly = 1 ] { kill i%ResultBits kill qHandle quit $$$OK } Method PrepareFind(indexGlobal As %String, queryType As %String, queryParams As %Binary) As %Status { if queryType = "window" { for i = 1:1:4 { set item = $Piece(queryParams, ",", i) set param = $Piece(item, "=", 1) set value = $Piece(item, "=" ,2) set arg(param) = value } set qHandle("indexGlobal") = indexGlobal do ##class(SpatialIndex.QueryExecutor).InternalFindWindow(.qHandle,arg("minx"),arg("miny"),arg("maxx"),arg("maxy")) set id = "" for { set id = $O(qHandle("data", id),1,idd) quit:id="" set tChunk = (idd\64000)+1, tPos=(idd#64000)+1 set $BIT(i%ResultBits(tChunk),tPos) = 1 } } quit $$$OK } Method ContainsItem(pItem As %String) As %Boolean { set tChunk = (pItem\64000)+1, tPos=(pItem#64000)+1 quit $bit($get(i%ResultBits(tChunk)),tPos) } Method GetChunk(pChunk As %Integer) As %Binary { quit $get(i%ResultBits(pChunk)) } Method NextChunk(ByRef pChunk As %Integer = "") As %Binary { set pChunk = $order(i%ResultBits(pChunk),1,tBits) quit:pChunk="" "" quit tBits } Method PreviousChunk(ByRef pChunk As %Integer = "") As %Binary { set pChunk = $order(i%ResultBits(pChunk),-1,tBits) quit:pChunk="" "" quit tBits } }
The
InternalFindWindow method of the
SpatialIndex.QueryExecutor class in the example above is an implementation of searching for points that fall into a given rectangle. Further, in the FOR loop, the
IDs of the matching strings are written into bit sets.
In our hackathon project, in addition to searching in a rectangle, Andrey implemented a search inside the oval:
SELECT * FROM SpatialIndex.Test WHERE %ID %FIND search_index(coord,'radius','x=55,y=55,radiusX=2,radiusY=2') and name %StartsWith 'Z'
A bit about % FIND predicate
This predicate has an additional parameter
SIZE , which can prompt the query optimizer an approximate order of the number of rows that will satisfy the predicate. Based on this parameter, the optimizer will choose whether or not to use the index to which
% FIND refers.
For example, let's add the following index to the
SpatialIndex.Test class:
Index ByName on Name;
Recompile the class and build this index:
write ##class(SpatialIndex.Test).%BuildIndices($LB("ByName"))
And, of course, run TuneTable:
do $system.SQL.TuneTable("SpatialIndex.Test", 1)
Consider the query plan:
SELECT * FROM SpatialIndex.Test WHERE name %startswith 'za' and %ID %FIND search_index(coord,'radius','x=55,y=55,radiusX=2,radiusY=2') size ((10))

The coord index is supposed to return few rows, so the optimizer will not apply to the index by the
Name field.
Another picture for the request:
SELECT * FROM SpatialIndex.Test WHERE name %startswith 'za' and %ID %FIND search_index(coord,'radius','x=55,y=55,radiusX=2,radiusY=2') size ((1000))

When executing this query, both indexes will be used.
As a final example, a query that uses only an index across the
Name field — using the
coord index, if it is expected that it returns about 100,000 lines, is useless:
SELECT * FROM SpatialIndex.Test WHERE name %startswith 'za' and %ID %FIND search_index(coord,'radius','x=55,y=55,radiusX=2,radiusY=2') size ((100000))

Thanks to all who read or at least looked through this article to the end.
In addition to the documentation, links to which are slightly lower, other implementations of the
% Library.FunctionalIndex and
% SQL.AbstractFind interfaces will be of great help. To see these implementations, open one of these classes in the studio and select Class -> Inherited Classes from the menu.
References: