Have you switched from $Increment to $Sequence?

If this were a Twitter post, it would read: “Caché ObjectScript programmers! Use $Sequence instead of $Increment to generate Ids.” But this is Habr, so the thought needs developing. Read on.

A small digression for readers who have nothing to switch yet and are seeing the word $Increment for the first time. $Increment is a built-in Caché ObjectScript function that performs an atomic operation. Its argument must be a variable, not an expression. $Increment implicitly locks the variable, increases its value by 1, unlocks the variable, and returns the new value. It is widely used to assign counter-style numeric identifiers to new objects or records; in that case the argument is the name of a global. It looks like this:

for i=1:1:10000 {
    set id = $Increment(^Person)
    set surname = ##class(%PopulateUtils).LastName() ; random surname
    set name = ##class(%PopulateUtils).FirstName() ; random first name
    set ^Person(id) = $ListBuild(surname, name)
}

What is $Sequence? This function appeared in version 2015.1. Like $Increment, it performs an atomic operation and returns the value of its argument increased by 1. ~~Unlike $Increment, the $Sequence argument can only be a global (not a local variable).~~ When a process first calls $Sequence on a given global, $Sequence reserves a set of values for that process and returns values from this cache on subsequent calls. The value of the global is increased by the number of cached values. When the cached values run out, $Sequence reserves a new set, again increasing the value of the global. $Sequence determines automatically how many values to cache: the more often a process calls $Sequence, the more values are cached:

USER>set $Sequence(^myseq)=""
USER>for i=1:1:15 {write "increment:",$Seq(^myseq)," allocated:",^myseq,! }
increment:1 allocated:1
increment:2 allocated:2
increment:3 allocated:4
increment:4 allocated:4
increment:5 allocated:8
increment:6 allocated:8
increment:7 allocated:8
increment:8 allocated:8
increment:9 allocated:16
increment:10 allocated:16
increment:11 allocated:16
increment:12 allocated:16
increment:13 allocated:16
increment:14 allocated:16
increment:15 allocated:16

As you can see, by the time $Sequence(^myseq) returned 9, a chunk of 8 values (9 through 16) had already been reserved for the current process. A parallel process calling $Sequence(^myseq) at that point would get the value 17.
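The trace above can be mimicked with a toy model. The sketch below is written in Python purely as a neutral modelling language; the class names and the chunk-sizing rule (reserve as many values as the process has already consumed) are assumptions chosen only because they reproduce the "allocated" column shown above. The real $Sequence sizes its chunks from call frequency, which this sketch does not attempt to model.

```python
class ToyGlobal:
    """Stand-in for the ^myseq global: holds the highest value handed out."""

    def __init__(self):
        self.allocated = 0


class ToySequence:
    """Per-process cache of values reserved from a shared ToyGlobal."""

    def __init__(self, shared):
        self.shared = shared
        self.next_id = 1   # next value to return from the local cache
        self.limit = 0     # last value reserved for this process
        self.served = 0    # how many values this process has consumed

    def next(self):
        if self.next_id > self.limit:
            # Cache exhausted: reserve a new chunk from the shared counter.
            # Assumed rule (not the real algorithm): chunk = values served so far.
            chunk = max(1, self.served)
            self.next_id = self.shared.allocated + 1
            self.shared.allocated += chunk
            self.limit = self.shared.allocated
        value = self.next_id
        self.next_id += 1
        self.served += 1
        return value


def trace(calls):
    """Return (rows, shared) where rows is a list of (value, allocated) pairs."""
    shared = ToyGlobal()
    proc = ToySequence(shared)
    rows = [(proc.next(), shared.allocated) for _ in range(calls)]
    return rows, shared


rows, shared = trace(15)
for value, allocated in rows:
    print(f"increment:{value} allocated:{allocated}")

# A second "process" starts after everything already reserved by the first.
print("parallel process gets:", ToySequence(shared).next())
```

The point of the model is only the bookkeeping: the shared counter moves in jumps, while each process consumes its reserved range privately.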

$Sequence is intended for processes that increment the same global in parallel. Because $Sequence hands out values in chunks, the identifiers may have gaps if a process does not use all the values reserved for it. The main purpose of $Sequence is to generate unique counter values; in this sense $Increment is a slightly more general function.

To compare $Increment and $Sequence, let's run a small example:

Class Habr.IncSeq.Test
{

ClassMethod filling()
{
    lock +^P:"S"
    set job = $job
    for i=1:1:200000 {
        set id = $Increment(^Person)
        set surname = ##class(%PopulateUtils).LastName()
        set name = ##class(%PopulateUtils).FirstName()
        set ^Person(id) = $ListBuild(job, surname, name)
    }
    lock -^P:"S"
}

ClassMethod run()
{
    kill ^Person
    set z1 = $zhorolog
    for i=1:1:10 {
        job ..filling()
    }
    lock ^P
    set z2 = $zhorolog - z1
    lock
    write "done:",z2,!
}

}

The run method starts ten processes, each of which inserts 200,000 records into the ^Person global. The lock on ^P is needed only so that the parent process waits for the child processes to finish: the parent tries to take an exclusive lock on ^P and obtains it only once all child processes have completed their work and released their shared locks. Immediately after that we read the timestamp ($zhorolog) again, release the lock on ^P, and see how many seconds the inserts took. On my quad-core laptop the run method took 21 seconds (for the pedants: this was the fifth run of the same method):

USER>do ##class(Habr.IncSeq.Test).run()
done:21.40948

It is interesting to see where these 21 seconds went. Running ^%SYS.MONLBL (which, by the way, has already been covered in an article on Habr), we get the following picture:

 ; ** Source for Method 'filling' **
1  10       .000433     lock +^P:"S"
2  10       .000013     set job = $job
3  10       .000038     for i=1:1:200000 {
4  1999991  13.222959       set id = $Increment(^Person)
5  1997246  7.029486        set surname = ##class(%PopulateUtils).LastName()
6  1995420  4.766967        set name = ##class(%PopulateUtils).FirstName()
7  1999680  208.226093      set ^Person(id) = $ListBuild(job, surname, name)
8  1999790  1.69106     }
9  10       .000205     lock -^P:"S"
 ; ** End of source for Method 'filling' **
 ;
 ; ** Source for Method 'run' **
1  1        .01005      kill ^Person
2  1        .000003     set z1 = $zhorolog
3  1        .000004     for i=1:1:10 {
4  10       .056381         job ..filling()
5  0        0           }
6  1        26.244814   lock ^P
7  1        .000003     set z2 = $zhorolog - z1
8  1        .000006     lock
9  1        .000009     write "done:",z2,!
 ; ** End of source for Method 'run' **

The first column of the ^%SYS.MONLBL report is the line number within the method, the second is the number of times the line was executed, and the third is the total number of seconds spent executing it.

In total, 13.2 seconds were spent obtaining Ids. Dividing by the number of processes, each process spent 1.32 seconds obtaining new Ids, about 1.18 seconds generating the first name and surname, and 20.8 seconds writing data to the global. The total time (26.24 seconds) is about 5 seconds longer than the earlier run because of the profiler.
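The per-process figures are simply the profiler totals divided by the ten worker processes. As a quick check, here is that arithmetic in Python (used only as a calculator; the dictionary keys are labels made up for this sketch, while the numbers are copied from the report above):

```python
PROCESSES = 10  # number of jobs started by run()

# Third column of the ^%SYS.MONLBL report for lines 4-7 of filling(), in seconds.
profile_seconds = {
    "get id ($Increment)": 13.222959,
    "random surname": 7.029486,
    "random first name": 4.766967,
    "write to ^Person": 208.226093,
}


def per_process(total_seconds, processes=PROCESSES):
    """Average time a single process spent on a line summed over all processes."""
    return total_seconds / processes


per_proc = {step: per_process(t) for step, t in profile_seconds.items()}

for step, seconds in per_proc.items():
    print(f"{step:22s} {seconds:6.2f} s per process")

names = per_proc["random surname"] + per_proc["random first name"]
print(f"{'both names together':22s} {names:6.2f} s per process")
```

This assumes the work was spread evenly across the ten processes, which the near-equal execution counts in the report suggest but do not prove.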

Let's replace $Increment(^Person) with $Sequence(^Person) in our test (namely, in the filling() method) and run it again:

USER>do ##class(Habr.IncSeq.Test).run()
done:3.324123

The result is surprising. Granted, $Sequence reduced the time needed to obtain an Id, but where did the 20.8 seconds spent writing the data go? Let's look at the ^%SYS.MONLBL results:

 ; ** Source for Method 'filling' **
1  10       .000523     lock +^P:"S"
2  10       .000017     set job = $job
3  10       .000048     for i=1:1:200000 {
4  1911382  1.69533         set id = $Sequence(^Person)
5  1753050  3.783609        set surname = ##class(%PopulateUtils).LastName()
6  1830006  3.407867        set name = ##class(%PopulateUtils).FirstName()
7  1827874  21.544164       set ^Person(id) = $ListBuild(job, surname, name)
8  1879819  .843424     }
9  10       .00023      lock -^P:"S"
 ; ** End of source for Method 'filling' **
 ;
 ; ** Source for Method 'run' **
1  1        .010926     kill ^Person
2  1        .000004     set z1 = $zhorolog
3  1        .000004     for i=1:1:10 {
4  10       .049543         job ..filling()
5  0        0           }
6  1        5.090719    lock ^P
7  1        .000003     set z2 = $zhorolog - z1
8  1        .000007     lock
9  1        .00001      write "done:",z2,!
 ; ** End of source for Method 'run' **

Each process now spends 0.17 seconds instead of 1.32 obtaining Ids. But why does writing to the database now take only 2.15 seconds per process? How can this be? The reason is that globals are stored in blocks of (usually) 8 kilobytes each. Before changing a global (for example, set ^Person(id) = ...), a process takes an internal lock on the block. If several processes try to change the same block, each has to wait until the other releases it; with ten such processes, nine wait for one. If you look at the ^Person global created with $Increment, you can see that two adjacent records are almost never created by the same process:

1: ^Person(100000) = $lb("12950","Kelvin","Lydia")
2: ^Person(100001) = $lb("12943","Umansky","Agnes")
3: ^Person(100002) = $lb("12945","Frost","Natasha")
4: ^Person(100003) = $lb("12942","Loveluck","Terry")
5: ^Person(100004) = $lb("12951","Russell","Debra")
6: ^Person(100005) = $lb("12947","Wells","Chad")
7: ^Person(100006) = $lb("12946","Geoffrion","Susan")
8: ^Person(100007) = $lb("12945","Lennon","Roberta")
9: ^Person(100008) = $lb("12944","Beatty","Mark")
10: ^Person(100009) = $lb("12946","Kovalev","Nataliya")
11: ^Person(100010) = $lb("12947","Klingman","Olga")
12: ^Person(100011) = $lb("12942","Schultz","Alice")
13: ^Person(100012) = $lb("12949","Young","Filomena")
14: ^Person(100013) = $lb("12947","Klausner","James")
15: ^Person(100014) = $lb("12945","Ximines","Christine")
16: ^Person(100015) = $lb("12948","Quine","Mary")
17: ^Person(100016) = $lb("12948","Rogers","Sally")
18: ^Person(100017) = $lb("12950","Ueckert","Thelma")
19: ^Person(100018) = $lb("12944","Xander","Kim")
20: ^Person(100019) = $lb("12948","Ubertini","Juanita")

The parallel processes fought over the same block and spent more time waiting for their turn to write than actually changing data. With $Sequence, Ids are issued in large chunks, which spreads different processes across different blocks:

1: ^Person(100000) = $lb("12963","Yezek","Amanda")
// 351 more records written by process 12963
353: ^Person(100352) = $lb("12963","Young","Lola")
354: ^Person(100353) = $lb("12967","Roentgen","Barb")
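The effect of chunked Ids on block contention can be illustrated with a small simulation. This is a deliberately idealised sketch in Python, not a model of the Caché storage engine: the function names are invented, strict round-robin stands in for the interleaving produced by $Increment, and both the number of records per block (200) and the chunk size (1000) are made-up figures; real values depend on record size, block splits, and how $Sequence sizes its chunks.

```python
def owners_round_robin(n_ids, n_procs):
    """Idealised $Increment: consecutive Ids go to different processes in turn."""
    return [i % n_procs for i in range(n_ids)]


def owners_chunked(n_ids, n_procs, chunk):
    """Idealised $Sequence: each process receives `chunk` consecutive Ids at a time."""
    return [(i // chunk) % n_procs for i in range(n_ids)]


def avg_writers_per_block(owners, ids_per_block):
    """Average number of distinct processes writing into each block of adjacent Ids."""
    blocks = [owners[i:i + ids_per_block]
              for i in range(0, len(owners), ids_per_block)]
    return sum(len(set(block)) for block in blocks) / len(blocks)


N_IDS = 20_000        # records in the simulated global (hypothetical)
N_PROCS = 10          # same number of workers as in the test above
IDS_PER_BLOCK = 200   # hypothetical number of records that fit in one block
CHUNK = 1_000         # hypothetical $Sequence chunk size

interleaved = avg_writers_per_block(owners_round_robin(N_IDS, N_PROCS), IDS_PER_BLOCK)
chunked = avg_writers_per_block(owners_chunked(N_IDS, N_PROCS, CHUNK), IDS_PER_BLOCK)

print("writers per block, interleaved Ids:", interleaved)
print("writers per block, chunked Ids:    ", chunked)
```

Under these assumptions every block is shared by all ten writers when Ids are interleaved, and by a single writer when Ids arrive in chunks larger than a block, which is the situation the two listings above show.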

“That's all very well,” the reader will say, “but with object and SQL access Caché calls $Increment for us to generate new Ids. How do we use $Sequence there?” Starting with version 2015.1, the storage parameter IDFunction defines the function that generates Ids. By default it is “increment”, but you can change it to “sequence” (in the Studio inspector, select Storage > Default > IDFunction).

Finally:

Do not believe anything written here. I have deliberately left out the specifications of the computer and the settings of the Caché instance on which I ran this test: better to run it yourself.

Bonus

As another test, I put together a small ECP configuration with the database server on a laptop and the application server in a virtual machine on the same laptop. I mapped the ^Person global to the remote database. This test makes no claim to being representative. $Increment should be used with care over ECP. Still, here are the results:

$Increment

USER>do ##class(Habr.IncSeq.Test).run()
done:163.781288

^%SYS.MONLBL:

 ; ** Source for Method 'filling' **
1  10       .000503     lock +^P:"S"
2  10       .000016     set job = $job
3  10       .000044     for i=1:1:200000 {
4  1843745  1546.57015      set id = $Increment(^Person)
5  1880231  6.818051        set surname = ##class(%PopulateUtils).LastName()
6  1944594  3.520858        set name = ##class(%PopulateUtils).FirstName()
7  1816896  16.576452       set ^Person(id) = $ListBuild(job, surname, name)
8  1933736  .895912     }
9  10       .000279     lock -^P:"S"
 ; ** End of source for Method 'filling' **

$Sequence

USER>do ##class(Habr.IncSeq.Test).run()
done:13.826716

^%SYS.MONLBL:

 ; ** Source for Method 'filling' **
1  10       .000434     lock +^P:"S"
2  10       .000014     set job = $job
3  10       .000033     for i=1:1:200000 {
4  1838247  98.491738       set id = $Sequence(^Person)
5  1712000  3.979588        set surname = ##class(%PopulateUtils).LastName()
6  1809643  3.522974        set name = ##class(%PopulateUtils).FirstName()
7  1787612  16.157567       set ^Person(id) = $ListBuild(job, surname, name)
8  1862728  .825769     }
9  10       .000255     lock -^P:"S"
 ; ** End of source for Method 'filling' **

The $Sequence function has some limitations: read the documentation before using it.

Thanks for your attention!

Source: https://habr.com/ru/post/263793/
