Suppose that you have a table with a large number of records and you need to add one or more indexes to it with the following conditions:
- their generation should be as fast as possible
- so that generation can be done in portions.
For example, if there is a table for 300M records and working with it can only be done during off-hours, then the whole process can be broken up into three nights of 100M records. - the emergence of new indexes and the process of their generation should not interfere with the current work with the class / table
For this, we could use the already known
% BuildIndices () method, but in this case it will not satisfy our conditions.
What is the way out?
Theory
The new class
% Library.IndexBuilder with the one, but powerful
% ConstructIndicesParallel () method has been added to the Caché 2013.1 database version.
From the name it is already becoming clear that generation will occur in parallel with the involvement of all processor cores.
')
So, consider the parameters of this method in more detail:
ClassMethod % ConstructIndicesParallel ( pTaskId = "" , pStartId As% Integer = 0 , pEndId As% Integer = -1 , pSortBegin As% Integer = 1 , pDroneCount As% Integer = 0 , pLockFlag As% Integer = 1 , pJournalFlag%% 1 ) as% Status- pTaskId - ID of the background process. Leave blank / undefined for interactive call
- pStartId - ID from which to start the generation. Default 1
- pEndId - ID to complete the generation on. The default is -1, meaning the last ID in the table
- pSortbegin is a 1/0 flag that determines whether to use $ SortBegin when generating.
- pDroneCount - the number of background processes for generating indices.
The default is 0. In this case, the code will independently determine the optimal number of processes based on the number of available cores / processors and the number of records processed. - pLockFlag - a flag that determines the behavior of the lock during generation:
- 0 = No lock
- 1 = Extent locking - Gets an exceptional lock on the entire extent during generation.
- 2 = Row level locking - Gets a shared lock on each row being processed and the index node for the item. When the generation of an index for a particular row is completed, the row is immediately unlocked.
- pJournalFlag - 0/1 flag defining the use of journaling:
1 - index generation will be logged, 0 - will not.
Practice
Now consider an example of applying a new class.
To begin with, we will create an educational class in the
USER area, fill it with 1M records with variable length lines [1-100] and construct an index using the classic
% BuildIndices () , so that we can compare:
Class demo.test Extends% Persistent
{
Index idxn On n As SQLUPPER ( 6 );
Property n As% String ( MAXLEN = 100 );
ClassMethod Fill ( n As% Integer = 10,000,000 )
{
set data = $ Replace ( $ Justify ( "" , 100), "" , "a" )
set time = $ ZHorolog
do DISABLE ^% NOJRN
do .. % KillExtent ()
set ^ demo.testD = n
set ^ demo.testD (1) = $ ListBuild ( "" , $ Extract ( data , 1, $ Random (100) +1))
for i = 2: 1: n set ^ ( i ) = $ ListBuild ( "" , $ Extract ( data , 1, $ Random (100) +1))
do ENABLE ^% NOJRN
write "insert =" , $ ZHorolog - time , "sec." !
}
ClassMethod BIndex ()
{
set time = $ ZHorolog
do .. % BuildIndices (, 1,1)
write "reindex =" , $ ZHorolog - time , "sec." !
}
}
My results:
USER>do ##class(demo.test).Fill()
= 9.706935 .
USER>do ##class(demo.test).BIndex()
= 71.966953 .
Now use the new
% IndexBuilder class. To do this, follow these steps:
- First, clear the index data from the previous test using the % PurgeIndices () method (optional step)
- inherit our class from % IndexBuilder
- we list the indices separated by commas in the INDEXBUILDERFILTER parameter.
If this parameter is left empty, all indexes will be regenerated. - we will make our index invisible for SQL, so that the optimizer does not use an index that is not yet ready for work.
For this, we use the $ SYSTEM.SQL.SetMapSelectability () method:
ClassMethod SetMapSelectability ( pTablename As% Library.String = "" , pMapname As% Library.String = "" , pValue As% Boolean = "" ) as% Library.String
Argument Description:
- pTablename - table name
- pMapname - index name
- pValue - 0/1 flag, defining the visibility (1) or invisibility (0) of the index for the SQL optimizer
Note: You can make the index invisible long before it is added to the class.
- call the % ConstructIndicesParallel () method
- make our index visible for SQL
- Profit!
As a result, our class will take the following form:
Class demo.test Extends ( % Persistent , % IndexBuilder )
{
Parameter INDEXBUILDERFILTER = "idxn" ;
Parameter BITMAPCHUNKINMEMORY = 0 ;
Index idxn On n As SQLUPPER ( 6 );
Property n As% String ( MAXLEN = 100 );
...
ClassMethod FastBIndex ()
{
do .. % PurgeIndices ( $ ListBuild ( "idxn" ))
do $ SYSTEM .SQL . SetMapSelectability ( $ classname (), "idxn" , $$$ NO )
do .. % ConstructIndicesParallel (,,, 1,, 2.0)
do $ SYSTEM .SQL . SetMapSelectability ( $ classname (), "idxn" , $$$ YES )
}
}
My results:
USER>do ##class(demo.test).FastBIndex()
Building 157 chunks and will use parallel build algorithm with 4 drone processes.
SortBegin is requested.
Started drone process: 3812
Started drone process: 4284
Started drone process: 7004
Started drone process: 7224
Expected time to complete is 43 secs to build 157 chunks of 64,000 objects using 4 processes.
Waiting for processes to complete....done.
Elapsed time using 4 processes was 34.906643.
As you can see, the speed has doubled.
On your hardware and on your data, the results can get even better.
Even faster?
But is it possible to further accelerate the regeneration of indices?
If you have a lot of RAM left, then yes.
In the process of generating indices by the designer for internal needs, so-called bitmap blocks are temporarily formed. By default, they are written to private globals, but using the
BITMAPCHUNKINMEMORY boolean parameter
you can specify that they are formed in RAM. To do this, assign the parameter to 1.
Note that if RAM is allocated a little, and the indices are large, then you can get the error
<STORE> .
By default,
BITMAPCHUNKINMEMORY is 0.