Indexing nonatomic attributes

Wikipedia Quotes ( 1NF ):

Each row and column intersection contains exactly one value from the corresponding domain (and nothing more).

The same value can be atomic or non-atomic, depending on the meaning of this value. For example, the value "4286" is
atomic , if its meaning is “pin-code of a credit card” (when splitting or re-ordering, the meaning is lost)
non-atomic if its meaning is “a set of numbers” (no sense is lost when splitting or reordering)

This article will discuss the standard methods for accelerating SQL queries for the following field types: string, date, simple list (in $ LB format), list collections and collection arrays.

Table of contents:

Introduction

Consider first the classic version of the example of the list of phones.
Create test data:

create table cl_phones(tname varchar2(100), phone varchar2(30)); insert into cl_phones(tname,phone) values ('','867-843-25'); insert into cl_phones(tname,phone) values ('','830-044-35'); insert into cl_phones(tname,phone) values ('','530-055-35'); insert into cl_phones(tname,phone) values ('','530-055-35'); insert into cl_phones(tname,phone) values ('','555-011-35'); insert into cl_phones(tname,phone) values ('','530-055-31'); insert into cl_phones(tname,phone) values ('','531-051-32'); insert into cl_phones(tname,phone) values ('','532-052-33'); insert into cl_phones(tname,phone) values ('','533-053-35');

Now, we list the phones separated by a comma for each name:

SELECT
% exact ( tname ) tname ,
LIST ( phone ) phonestr
FROM cl_phones
GROUP BY tname

or so:

SELECT
distinct % exact ( tname ) tname ,
LIST ( phone % foreach ( tname )) phonestr
FROM cl_phones

Result:

tname	phonestr
Andrew	867-843-25,830-044-35,530-055-35
Vania	530-055-31,531-051-32,532-052-33,533-053-35
Maksim	530-055-35,555-011-35

By building an index on the phone , we can do a very quick search for a specific phone. The only disadvantage of such a solution is in the duplication of names: the more elements in the list, the more our database will swell.
')
Therefore, sometimes it is useful to store multiple values in one field at once - it can be a list of phones or its parts, passwords, etc., - as a string with a separator, and at the same time have the ability to do a quick search for individual values. Of course, you can create an ordinary index on such a field and do a search for a substring in this large string, but, first, since there can be quite a few elements, the index length will be significant, and second, such an index will still will not help speed up our search.

So how to be?

Especially for such cases, a special kind of index was introduced for fields containing collections.
Collections can be either "real" (built-in list of <...> and array of <...> ) or "virtual."
In the case of built-in collections, the system is responsible for forming such an index on them, and the programmer cannot change this process; in the case of virtual ones, the responsibility for forming the index lies with the programmer.
A simple delimited string, a date, a simple list — these are examples of such “virtual” collections.

So, the index syntax on the collection is as follows:

INDEX idx1 ON (MyField (ELEMENTS));

INDEX idx1 ON (MyField (KEYS));

The process of forming such an index is the responsibility of the method named propertyname BuildValueArray , which the developer must implement independently.
The general signature of the method is as follows:

ClassMethod propertynameBuildValueArray ( value , ByRef valueArray ) As% Status

Where:

value - the value of the field for splitting into elements;
valueArray is the resulting array containing individual elements.
An array is a set of key / value of the form:
array (key1) = value1
array (key2) = value2
etc.

As mentioned above, for embedded collections, this method is automatically generated by the system and has the [Final] attribute, which prevents the developer from overriding it.

Let's build such indexes and see how to use them in our SQL queries.

Note: in order to avoid artifacts from the previous example, it is recommended to completely clean the globals and the storage scheme of the class before each new one.

Delimited string

Create the following class:

Class demo.test Extends% Persistent
{

Index iPhones On Phones (ELEMENTS);

Property Phones As% String ;

ClassMethod PhonesBuildValueArray (
value ,
ByRef array ) As% Status
{
If value = "" {
Set array (0) = value
} else {
Set list = $ ListFromString ( value , "," ), ptr = 0
While $ ListNext ( list , ptr , item ) {
Set array ( ptr ) = item
}
}
Quit $$$ OK
}

ClassMethod Fill ()
{
& sql ( truncate table demo . test )
& sql ( insert into demo . test ( phones )
select null union all
select 'a' union all
select 'b, a' union all
select 'b, b' union all
select 'a, c, b' union all
select ',,'
)

ZWrite ^ demo.testD
ZWrite ^ demo.testI
}

}

Run the Fill () method in the terminal:

USER>do ##class(demo.test).Fill()
^demo.testD=6
^demo.testD(1)=$lb("","")
^demo.testD(2)=$lb("","a")
^demo.testD(3)=$lb("","b,a")
^demo.testD(4)=$lb("","b,b")
^demo.testD(5)=$lb("","a,c,b")
^demo.testD(6)=$lb("",",,")
^demo.testI("iPhones"," ",1)=""
^demo.testI("iPhones"," ",6)=""
^demo.testI("iPhones"," A",2)=""
^demo.testI("iPhones"," A",3)=""
^demo.testI("iPhones"," A",5)=""
^demo.testI("iPhones"," B",3)=""
^demo.testI("iPhones"," B",4)=""
^demo.testI("iPhones"," B",5)=""
^demo.testI("iPhones"," C",5)=""

As you can see, the index got a string not entirely, but its separate parts. Thus, you decide how to split one large string into substrings. In addition to the delimited lines, this can be xml, json, or something else.

The contents of our table will be as follows:

ID	Phones
one	(null)
2	a
3	b, a
four	b, b
five	a, c, b
6	,,

Now we will try to find all the lines containing the value "a". For this, predicates like '% xxx%' or ['xxx' are usually used, for example:

select * from demo . test where Phones [ 'a'
select * from demo . test where Phones like '% a%'

But in this case our iPhones index will not be used. To use it, you need to use a special predicate.

FOR SOME %ELEMENT() (%VALUE = )

In view of the above, our request will look as follows:

select * from demo . test where for some% element ( Phones ) ( % value = 'a' )

As a result, due to the use of a specialized index, the speed of this query compared with the previous options will be much higher.
Of course, more complicated conditions are permissible, for example:

( % Value % STARTSWITH 'a' )
( % Value [ 'a' and % Value [ 'b' )
( % Value in ( 'c' , 'd' ))
( % Value is null )

And now a little magic ...

Hiding sensitive data

In the BuildValueArray method , we usually fill in an array based on the value of value .

But what happens if we do not follow this rule?

Let's try this example:

Class demo.test Extends% Persistent
{

Index iLogin On Login (ELEMENTS);

Property Login As% String ;

ClassMethod LoginBuildValueArray (
value ,
ByRef array ) As% Status
{
If value = "Vasya" {
Set array (0) = "test1"
Set array (1) = "test2"
Set array (2) = "test3"
} ElseIf value = "Peter" {
Set array ( "-" ) = "111"
Set array ( "5.4" ) = "222"
Set array ( "fg" ) = "333"
} else {
Set array ( "key" ) = "value"
}
Quit $$$ OK
}

ClassMethod Fill ()
{
& sql ( truncate table demo . test )
& sql ( insert into demo . test ( login )
select 'Vasya' union all
select 'Vasya' union all
select 'Peter' union all
select 'Peter' union all
select 'Ivan' union all
select 'Ivan'
)

ZWrite ^ demo.testD
ZWrite ^ demo.testI
}

}

After filling, the table contents will be as follows:

ID	Login
one	Vasya
2	Vasya
3	Petya
four	Petya
five	Ivan
6	Ivan

And now - attention! - try to execute the following query:

select * from demo . test where for some% element ( Login ) ( % value = '111' )

We will return:

ID	Login
3	Petya
four	Petya

As a result, we see that part of the data is visible in the table, and part is hidden in the index, but by which we can nonetheless do a search.

Where can it be useful to us?

For example, in the index you can hide not one, as is customary, but a whole set of available passwords for a given user or any other sensitive information that is undesirable to open with SQL. Of course, there are other options for this, for example GRANT column-privilege . But in this case, you will have to use stored procedures to access protected fields.

Hiding sensitive data (continued)

If you look at the globals in which data and indexes are stored for our table, then we will not see the values of our keys there: "5.4", "fg", etc .:

^demo.testD=6
^demo.testD(1)=$lb("","")
^demo.testD(2)=$lb("","")
^demo.testD(3)=$lb("","")
^demo.testD(4)=$lb("","")
^demo.testD(5)=$lb("","")
^demo.testD(6)=$lb("","")
^demo.testI("iLogin"," 111",3)=""
^demo.testI("iLogin"," 111",4)=""
^demo.testI("iLogin"," 222",3)=""
^demo.testI("iLogin"," 222",4)=""
^demo.testI("iLogin"," 333",3)=""
^demo.testI("iLogin"," 333",4)=""
^demo.testI("iLogin"," TEST1",1)=""
^demo.testI("iLogin"," TEST1",2)=""
^demo.testI("iLogin"," TEST2",1)=""
^demo.testI("iLogin"," TEST2",2)=""
^demo.testI("iLogin"," TEST3",1)=""
^demo.testI("iLogin"," TEST3",2)=""
^demo.testI("iLogin"," VALUE",5)=""
^demo.testI("iLogin"," VALUE",6)=""

Then why did we ask them?

To answer this question, we slightly modify our index and reload the table.

Index iLogin On (Login (KEYS), Login (ELEMENTS));

Globals will take a different form (I will only give global with indexes):

^demo.testI("iLogin"," -"," 111",3)=""
^demo.testI("iLogin"," -"," 111",4)=""
^demo.testI("iLogin"," 0"," TEST1",1)=""
^demo.testI("iLogin"," 0"," TEST1",2)=""
^demo.testI("iLogin"," 1"," TEST2",1)=""
^demo.testI("iLogin"," 1"," TEST2",2)=""
^demo.testI("iLogin"," 2"," TEST3",1)=""
^demo.testI("iLogin"," 2"," TEST3",2)=""
^demo.testI("iLogin"," 5.4"," 222",3)=""
^demo.testI("iLogin"," 5.4"," 222",4)=""
^demo.testI("iLogin"," FG"," 333",3)=""
^demo.testI("iLogin"," FG"," 333",4)=""
^demo.testI("iLogin"," KEY"," VALUE",5)=""
^demo.testI("iLogin"," KEY"," VALUE",6)=""

Ok, now we have both the key values and the element values themselves. How can it help us in the future?

For example, in the previously proposed option with passwords, besides the passwords themselves, you can store the date up to which the password will be valid, or something else. And then in our request this fact can be used as follows:

select * from demo . test where for some% element ( Login ) ( % key = '-' and % value = '111' )

Where and what to store - you decide, the only thing you need to remember is that the keys are unique, but the values are not.
In addition, in the "collection" index as well as in the standard index, you can store additional data:

Index iLogin On (Login (KEYS), Login (ELEMENTS)) [ Data = ( Login , Login (ELEMENTS))];

With this option, the above query will no longer access the data, but take everything from the index, which will also save time.

Date (time, etc.)

It would seem, but what do dates have to do with collections? The most direct, because you often have to do a search only by day, month or year. A regular index will not help us here, but the “collection” one will be very relevant.

Let's consider the following example:

Class demo.test Extends% Persistent
{

Index iBirthDay On (BirthDay (KEYS), BirthDay (ELEMENTS));

Property BirthDay As% Date ;

ClassMethod BirthDayBuildValueArray (
value ,
ByRef array ) As% Status
{
If value = "" {
Set array (0) = value
} else {
Set d = $ zd ( value , 3)
Set array ( "yy" ) = + $ p ( d , "-" , 1)
Set array ( "mm" ) = + $ p ( d , "-" , 2)
Set array ( "dd" ) = + $ p ( d , "-" , 3)
}
Quit $$$ OK
}

ClassMethod Fill ()
{
& sql ( truncate table demo . test )
& sql ( insert into demo . test ( birthday )
select { d '2000-01-01' } union all
select { d '2000-01-02' } union all
select { d '2000-02-01' } union all
select { d '2001-01-01' } union all
select { d '2001-01-02' } union all
select { d '2001-02-01' }
)

ZWrite ^ demo.testD
ZWrite ^ demo.testI
}

}

The contents of our table after filling will take the following form:

ID	Birthday
one	01/01/2000
2	01.02.2000
3	02/01/2000
four	01.01.2001
five	01.02.2001
6	01.02.2001

Now it is very simple, and most importantly very quickly , you can do a search for specific parts of the date, for example, to output all February birthdays:

select * from demo . test where for some% element ( BirthDay ) ( % key = 'mm' and % value = 2)

Result:

ID	Birthday
3	02/01/2000
6	01.02.2001

Simple list

In Caché, there is a special data type for a simple list ( % List ), which can be used instead of a string, if there are difficulties with the choice of a separator.
Using such a field is not much different from working with a string.

Consider a small example:

Class demo.test Extends% Persistent
{

Index iList On List (ELEMENTS);

Property List As% List ;

ClassMethod ListBuildValueArray (
value ,
ByRef array ) As% Status
{
If value = "" {
Set array (0) = value
} else {
Set ptr = 0
While $ ListNext ( value , ptr , item ) {
Set array ( ptr ) = item
}
}
Quit $$$ OK
}

ClassMethod Fill ()
{
& sql ( truncate table demo . test )
& sql ( insert into demo . test ( list )
select null union all
select $ LISTBUILD ( 'a' ) union all
select $ LISTBUILD ( 'b' , 'a' ) union all
select $ LISTBUILD ( 'b' , 'b' ) union all
select $ LISTBUILD ( 'a' , 'c' , 'b' ) union all
select $ LISTBUILD ( 'a ,,' , null , null )
)

ZWrite ^ demo.testD
ZWrite ^ demo.testI
}

}

The contents of the table in the ODBC display mode will look like this:

Note: Caché has three modes of data presentation: logical, ODBC and data display options .

ID	List
one	(null)
2	a
3	b, a
four	b, b
five	a, c, b
6	"A ,," ,,

In this case, the element separator is not involved, so we can use any characters in the elements.

When outputting a field of type % List in ODBC mode, the ODBCDELIMITER parameter is used as a delimiter, which by default is equal to ",".

For example, with this field, our table will look like this:

Property List As% List ( ODBCDELIMITER = "^" );

ID	List
one	(null)
2	a
3	b ^ a
four	b ^ b
five	a ^ c ^ b
6	a ,, ^^

Search for items is no different from the delimited string:

select * from demo . test where for some% element ( List ) ( % value = 'a ,,' )

ID	List
6	"A ,," ,,

It is worth noting that the option with % INLIST does not yet use “collection” indexes, and therefore it will be slower than the above proposed:

select * from demo . test where 'a ,,' % inlist List

Collection list

Let's rewrite the example above, but instead of a simple list, we use the list collection:

Class demo.test Extends% Persistent
{

Index iListStr On ListStr (ELEMENTS);

Property ListStr As a list of% String ;

ClassMethod Fill ()
{
& sql ( truncate table demo . test )
& sql ( insert into demo . test ( liststr )
select null union all
select $ LISTBUILD ( 'a' ) union all
select $ LISTBUILD ( 'b' , 'a' ) union all
select $ LISTBUILD ( 'b' , 'b' ) union all
select $ LISTBUILD ( 'a' , 'c' , 'b' ) union all
select $ LISTBUILD ( 'a ,,' , null , null )
)

ZWrite ^ demo.testD
ZWrite ^ demo.testI
}

}

In this example, almost everything is the same, but not quite. Special attention should be paid to the following things:

on the COLLATION values of our fields, keys, and index elements in the array , which undergo appropriate transformations before saving to the global.
Compare the values in the global index in both examples, especially the representation of a NULL value;
the BuildValueArray method is missing, so we cannot use keys, but only element values;
the type of our field is a special collection class ( % ListOfDataTypes ).

Array collection

As noted above, the collection list does not allow us to use keys. The array collection corrects this flaw.

Create the following class:

Class demo.test Extends% Persistent
{

Index iArrayStr On (ArrayStr (KEYS), ArrayStr (ELEMENTS));

Property str As% String ;

Property ArrayStr As array Of% String ;

ClassMethod Fill ()
{
& sql ( truncate table demo . test )
& sql ( insert into demo . test ( str )
select null union all
select 'aaa' union all
select 'bbb' union all
select 'bbb' union all
select 'ccc' union all
select null
)
& sql ( insert into demo . test_ArrayStr ( test , element_key , arraystr )
select 1, '0' , 'test1' union all
select 1, '1' , 'test2' union all
select 1, '2' , 'test3' union all
select 2, '0' , 'test1' union all
select 2, '1' , 'test2' union all
select 2, '2' , 'test3' union all
select 3, '-' , '111' union all
select 3, '5.4' , '222' union all
select 3, 'fg' , '333' union all
select 4, '-' , '111' union all
select 4, '5.4' , '222' union all
select 4, 'fg' , '333' union all
select 5, 'key' , 'value' union all
select 6, 'key' , 'value'
)

ZWrite ^ demo.testD
ZWrite ^ demo.testI
}

}

Here you need to make a few explanations:

our data is still stored in two globals: ^ the name of the class D (the data itself) and ^ the name of the class I (the indices);
having one class, however, we already have two tables: as usual demo.test and one auxiliary demo.test_ArrayStr ;
The demo.test_ArrayStr table provides convenient SQL access to the array data and has the following fields, some of which are predefined in advance:
- element_key - key value (the field name is predefined);
- ArrayStr - element value;
- test - a link to the parent demo.test table;
- ID - service primary key, which has the format test || element_key (the field name is predefined);
the type of our field is a special collection class ( % ArrayOfDataTypes ).

So, the contents of our tables after the Fill () method will be as follows:

Demo.test table

ID	str
one	(null)
2	aaa
3	bbb
four	bbb
five	ccc
6	(null)

Table demo.test_ArrayStr

ID	test	element_key	ArrayStr
1 \|\| 0	one	0	test1
1 \|\| 1	one	one	test2
1 \|\| 2	one	2	test3
2 \|\| 0	2	0	test1
2 \|\| 1	2	one	test2
2 \|\| 2	2	2	test3
3 \|\| 5.4	3	5.4	222
3 \|\| -	3	-	111
3 \|\| fg	3	fg	333
4 \|\| 5.4	four	5.4	222
4 \|\| -	four	-	111
4 \|\| fg	four	fg	333
5 \|\| key	five	key	value
6 \|\| key	6	key	value

It seems that having instead of one table two, we are now forced to do JOIN between them, but it is not.

Given the object extensions for SQL provided by the Caché DBMS, our test query, which displays the str field from demo.test for strings with the "-" key and the value of the "111" element, will take the following form:

select test ID , test -> str from demo . test_ArrayStr where element_key = '-' and arraystr = '111'

or such

select % ID , str from demo . test where test_ArrayStr -> element_key = '-' and test_ArrayStr -> arraystr = '111'

Result:

ID	str
3	bbb
four	bbb

As you can see, nothing complicated and no JOIN , since all our data is actually stored in one global and Caché knows about the “relatedness” of these tables. Therefore, the fields can be referenced from both tables. The test_ArrayStr field in the demo.test table does not actually exist, although we can access the related table through it.

Finally

The indexing mechanism described here is widely used in some system classes, such as, for example, % Stream.GlobalCharacterSearchable , which provides for the indexing of a text stream and searching it through SQL. The article does not specifically cover examples of indexing collections into classes, because of their wide variety: embedded, stored, streams, custom, with collections of collections and others. In addition, it is not always convenient and efficient to work with many of them through SQL, because the author does not see any particular need for such collections except for some very rare cases. Also, full-text search is not covered here, since this is another area with its own indexes and approach to work through SQL. In addition, examples of using properties such as SqlListType and SqlListDelimiter were omitted , but an inquisitive reader, I hope, can independently try them in action.

Useful links:

Source: https://habr.com/ru/post/182520/

All Articles

Indexing nonatomic attributes

Introduction

Delimited string

Hiding sensitive data

Hiding sensitive data (continued)

Date (time, etc.)

Simple list

Collection list

Array collection

Finally

More articles: