📜 ⬆️ ⬇️

CombGuid. Generating SQL-Friendly Guid Server Values ​​in .net Applications

Using uuid as a primary key for tables has many advantages, one of which is the ability to get identifiers for objects created in the client application without having to contact the database server. But using uuid as a primary key also has a drawback: guids generated by the client application may not be “friendly” to the SQL server, which in turn can lead to an overhead when adding a new record. The possible increase in the cost of the insert operation results from the fact that the SQL server for storing the tables for which the primary key is specified usually uses structures known as b-trees . When adding a new record to a table, the SQL server, in accordance with the sorting according to the primary key, searches for the sheet on which the inserted record should be placed. Given the pseudo-random algorithm for generating uuid, the sorting order of the new record is also random and it is possible that the sheet on which the record should be placed is completely filled. In such cases, the SQL server has to split the sheet in two and rebuild the branches of the b-tree leading to this sheet. In order not to push the SQL server with the need to constantly rebuild the cluster index when adding new records, you can generate the primary key values ​​in an incremental sequence. One of the options for generating Guid in increasing order is to tie the sorting order generated by Guid to the current time. The similarly generated identifiers are often called CombGuid, hinting that they are made up of two halves — a pseudo-random part, like the usual Guid, and a string tied to time.

How SQL Server compares uuids

SQL Server sorts uuid values ​​in a way other than .net. Comparison is conducted by byte groups from right to left. Inside the byte group, the comparison is already left-to-right. (A byte group is a sequence bounded by the '-' character.) If you compare the two uuid values,
@ u1 = '206AEBE7-ABF0-47A8-8AA5-6FDDF39B9E4F'
and
@ u2 = '0F8257A1-B40C-4DA0-8A37-8BBC55183CAE', the output is that @ u2> @ u1, because, as mentioned above, the SQL server starts the comparison with the rightmost byte groups, where 6FDDF39B9E4F <8BBC55183CAE. More technically, bytes 9 through 15 have the greatest effect on the sorting order of uuid in databases, in descending order.

Implementing CombGuid in the Magnum Library

In our project, we use the Magnum library, part of which is the static CombGuid class with the only Generate () method that creates time-bound Guids. Magnum is an open source library hosted on GitHub . I was not lazy and looked at how the implementation of the Guid creation method looks in this library.

public static class CombGuid { static readonly DateTime _baseDate = new DateTime(1900, 1, 1); public static Guid Generate() { byte[] guidArray = Guid.NewGuid().ToByteArray(); DateTime now = DateTime.Now; // Get the days and milliseconds which will be used to build the byte string var days = new TimeSpan(now.Ticks - _baseDate.Ticks); TimeSpan msecs = now.TimeOfDay; // Convert to a byte array // Note that SQL Server is accurate to 1/300th of a millisecond so we divide by 3.333333 byte[] daysArray = BitConverter.GetBytes(days.Days); byte[] msecsArray = BitConverter.GetBytes((long)(msecs.TotalMilliseconds/3.333333)); // Reverse the bytes to match SQL Servers ordering Array.Reverse(daysArray); Array.Reverse(msecsArray); // Copy the bytes into the guid Array.Copy(daysArray, daysArray.Length - 2, guidArray, guidArray.Length - 6, 2); Array.Copy(msecsArray, msecsArray.Length - 4, guidArray, guidArray.Length - 4, 4); return new Guid(guidArray); } } 

The algorithm is pretty simple.
The number of days since January 1, 1900 is encoded in 9-10 bytes. We must not forget to re-compile the source in 2079, when the number of past days will no longer fit into two bytes. 11-15 bytes are used to encode milliseconds from the beginning of the day, for some reason divided into 3.333333. The comments in the code indicate that this operation is related to the fact that the accuracy of storing timestamps in SQL Server is 1/300 of a second. A rather strange decision, considering that in the process of generating uuid, it doesn’t matter how much the SQL server stores timestamps, we use milliseconds only to create uuid. I googled this question a little, but I understood only that Magnum library author Chris Patterson copied the CombGuid generation code from Nhibernate. As you can see here , the GenerateComb method contains the same code. To be fair, it should be noted that the division of milliseconds by 3.333333 does not have a special effect on the operation of the algorithm, it is just an extra, optional step.
')
Guid vs CombGuid. Compare the speed of insertion in the database

Finally, we came to what it all was about, compared to how many uuids generated by the Guid.NewGuid () method are slower than their counterparts created through CombGuid.Generate () in the context of inserting records into the SQL server table.
For the test, I created two scripts that create tables on the SQL server and insert 100000 rows into these tables. The first script inserts rows with Id created using the CombGuid.Generate () method into the database, the second using the Guid.NewGuid () method.

A piece of test script.
 USE [CombIdTest] GO --   DBCC DROPCLEANBUFFERS; DBCC FREEPROCCACHE; CREATE TABLE [dbo].[CombId]( [ID] [uniqueidentifier] NOT NULL, [Value] [varchar](4000) NOT NULL, CONSTRAINT [PK_CombId] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO --      begin transaction insert into CombId Values ('5cb31d3d-3793-428e-beb0-a2e4047e255c','somevalue'); insert into CombId Values ('1e905fa1-e4d4-4a2c-a185-a2e4047e255d','somevalue'); --  99998  insert commit transaction 

Before performing the insertion, the buffer caches are reset and the insertion itself is performed in one transaction in order to reduce the number of accesses to the transaction log. Each script was launched three times, the “Total execution time” parameter from the client statistics was taken as the runtime. Measurements were made on MSSQL Server 2012.

Measurement results (in milliseconds).
one23The average
Combguid2795288228602845,667
Randomguid3164312931113134,667

The advantage of a script that inserts records containing CombGuid is just over 10 percent over a script with “regular” uuid. The use of CombGuid also had a positive effect on the size of the table - its size was almost one and a half times smaller: 3.75 MB vs. 5.25 MB.

Well, a couple of questions for last

What do you use as primary keys in your databases?
If you use uuid or similar byte structures, how do you generate them?

Source: https://habr.com/ru/post/214667/


All Articles