⬆️ ⬇️

Questions about indexes that you do not need to ask





After answering 14 questions about the indexes that you were embarrassed to ask , I had a lot more comments, clarifications and corrections. To compile from all this article looked like a venture with a minimum of benefit. And it made me wonder, and why should we be “ashamed to ask” such questions at all? Shame not to know? Is there a way to figure it out without driving yourself into the paint? There is. And he will relieve from numerous inaccuracies that abound in many "answers." You will feel literally every byte of your base with your fingertips.



To do this, I propose to “raise the hood” from SQL Server and plunge into the sweet world of hexadecimal dumps. It may be that everything inside is much easier than you thought.



Preparing a workplace



In my experiments, I will use the free SQL Server 2014 Express Edition along with the usual Management Studio. But almost all of the knowledge gained is rooted in SQL Server 7.0, originally from the nineties, and it will not be difficult to use the acquired skills in old installations.

')

For most of the examples, we will need the included 3604. trace flag. I will not replicate it in each of them, but I assume that it is always on. Just remember - if neither the dump nor the errors are visible, then add to the beginning of your code:



DBCC TRACEON(3604) GO 


Some commands require an explicit database name. In my case, it's “Lab.” If you have chosen a different identifier, do not forget to make the appropriate changes. The same applies to some other values. For example, the physical addresses of the pages may well be different. I will try to remind you of this, but try to keep abreast of your code.



Server responses often return a lot of fields or other data, of which only a small part of them will be of lively interest. Since this is not about the SELECT command - adjusting the output is not easy. Therefore, I will periodically cut off the excess, here and there. Do not be surprised if you see a lot more data than me.



Page organization



In SQL Server, the basic structure for organizing data is a page of 8192 bytes (8KB or 8KiB, as you like). It can be addressed by specifying the file code (FID - File ID), of which it is part (below we will see it in sys.database_files), and the page code (PID - Page ID) in this file. We will write this address in the form of “FID: PID”. For example, 1: 142 would mean page with code 142 in file with code 1.



 SELECT FID = file_id, name, physical_name FROM sys.database_files 


 FID name physical_name ----------- --------------- ------------------- 1 Lab D:\Lab.mdf 2 Lab_log D:\Lab_log.ldf 


Pages of both cluster and ordinary indexes are organized in the form of trees (B-Tree). Like any tree, it has a root (root node), leaves (leaf nodes) and intermediate nodes (intermediate nodes, but we can call them branches, continuing the vegetable analogy). To make it easy to distinguish these elements from each other without visualizing, there is the concept of an index level. Zero value means the level of leaves, the maximum - the root of the tree. All that is between the branches.







Let's create a simple table for 1000 entries with a clustered index. The first column will be a simple auto-increment field, the second will be the same value, but with a minus sign, the third will be a binary value 0x112233445566778, which is convenient to use as a marker in the dump.



 CREATE TABLE [dbo].[ClusteredTable] ( [ID] [int] IDENTITY(1,1) PRIMARY KEY CLUSTERED, [A] AS (-[ID]) PERSISTED NOT NULL, [B] AS (CONVERT([binary](8), 0x1122334455667788)) PERSISTED NOT NULL ) GO INSERT INTO [dbo].[ClusteredTable] DEFAULT VALUES; GO 1000 


Now we will use the undocumented, but immortal DBCC IND command for paging autopsies. The first argument is to specify the name or code of the database where the table is located. The second is the name of this table (or the code of the corresponding object), the third is the index code. There is a fourth optional parameter - partition code. But it does not interest us in this context, in contrast to the index code, which I will discuss in more detail. A value of 0 indicates that the Heap level is being requested. In essence, this is a data layer with no indexes. Number 1 is reserved for the cluster index (displayed along with the data, as in Heap). In all other cases, you simply specify the code of the corresponding index. Let's look at the cluster index of our table.



Since 2012, the DMV function sys.dm_db_database_page_allocations has been introduced. It is somewhat more convenient to use, gives more detailed information and has a lot of other advantages. All our examples can be easily reproduced using it.



 DBCC IND('Lab', 'ClusteredTable', 1) GO 


 PageFID PagePID IndexID PageType IndexLevel ------- ----------- ----------- -------- ---------- 1 78 1 10 NULL 1 77 1 1 0 1 79 1 2 1 1 80 1 1 0 1 89 1 1 0 1 90 1 1 0 


As a server response, we get a list of index pages found. The first columns are already familiar to us FID and PID, which together give the address of the page. IndexID once again confirms that we are working with a cluster index. The page type (PageType) with a value of 1 is data, 2 is index, 10 is IAM (Index Allocation Map), which has an indirect relation to indexes and we will ignore it. Another friend of ours is the index level (IndexLevel). By its maximum value, we see the root of the clustered index - page 1:79. According to zero values ​​- leaves 1:77, 1:80, 1:89, 1:90.



Page structure



Mentally, we can already draw a tree of pages. But this is easy to do only when we have two levels. And they could be much more. Therefore, researching the page itself is indispensable. We start with the root and use another undocumented, but the same immortal command - DBCC PAGE.



As the first argument, as well as DBCC IND, it takes the name or code of the base. Next comes a pair of FID and PID of the page you are looking for. The last value is the output format of the following available values:





In order not to explain the meaning of some terms in the abstract, just take a look at the root page dump (do not forget to substitute your page addresses):



 DBCC PAGE('Lab', 1, 79, 2) GO 


 PAGE: (1:79) ... 000000001431A000: 01020001 00800001 00000000 00000b00 00000000 .................... 000000001431A014: 00000400 78000000 6c1f8c00 4f000000 01000000 ....x...l.?.O....... 000000001431A028: 23000000 cc000000 0e000000 00000000 00000000 #...I............... 000000001431A03C: 9dcd3779 01000000 00000000 00000000 00000000 .I7y................ 000000001431A050: 00000000 00000000 00000000 00000000 06000000 .................... 000000001431A064: 074d0000 00010006 44010000 50000000 01000687 .M......D...P......? 000000001431A078: 02000059 00000001 0006ca03 00005a00 00000100 ...Y......E...Z..... 000000001431A08C: 00002121 21212121 21212121 21212121 21212121 ..!!!!!!!!!!!!!!!!!! ... 000000001431BFE0: 21212121 21212121 21212121 21212121 21212121 !!!!!!!!!!!!!!!!!!!! 000000001431BFF4: 21212121 81007600 6b006000 !!!!..vk`. OFFSET TABLE: Row - Offset 3 (0x3) - 129 (0x81) 2 (0x2) - 118 (0x76) 1 (0x1) - 107 (0x6b) 0 (0x0) - 96 (0x60) 


At this stage, I want to show the structure of the page as a whole, so I cut out most of the dump that does not affect this.



It all starts with a 96-byte header, the end of which I noted in the output above with a break, so that it is more readable. In everyday life it is not there. Behind the heading are records with data called “slots” (I did not begin to translate this term). More precisely, they try to go after the title. But the data in the database - a variable value. Tuples are added, deleted, updated. Therefore, they can be placed not immediately after the title, but completely in a chaotic order, with an interval after the title, and between the slots too.



In order to control their current location, at the very end of the page is the index of these slots. Each element occupies two bytes, stores the slot offset on the page as little-endian (that is, the latest value 0x6000 in our dump we read byte by bye backwards - 0x0060) and are numbered from the end. The most recent is slot 0, in front of it slot 1 and so on. The index of the slots and the data seem to move towards each other from different sides of the page. The decryption of their index is given in the dump itself after the OFFSET TABLE header. Compare it with the index dump at the end of the page - 81007600 6b006000.







Record structure



Now let's take a look at the slots themselves in detail and for this we apply the output mode 1 for the DBCC PAGE instruction.



 DBCC PAGE('Lab', 1, 79, 1) GO 


 ... Slot 0, Offset 0x60, Length 11, DumpStyle BYTE Record Type = INDEX_RECORD Record Attributes = Record Size = 11 Memory Dump @0x0000000014B1A060 0000000000000000: 06000000 074d0000 000100 .....M..... Slot 1, Offset 0x6b, Length 11, DumpStyle BYTE Record Type = INDEX_RECORD Record Attributes = Record Size = 11 Memory Dump @0x0000000014B1A06B 0000000000000000: 06440100 00500000 000100 .D...P..... Slot 2, Offset 0x76, Length 11, DumpStyle BYTE Record Type = INDEX_RECORD Record Attributes = Record Size = 11 Memory Dump @0x0000000014B1A076 0000000000000000: 06870200 00590000 000100 .?...Y..... Slot 3, Offset 0x81, Length 11, DumpStyle BYTE Record Type = INDEX_RECORD Record Attributes = Record Size = 11 Memory Dump @0x0000000014B1A081 0000000000000000: 06ca0300 005a0000 000100 .E...Z..... ... 


These are the same four slots that we saw before, only their binary representation is cut from the page dump and served on a platter, which obviously will facilitate their further research.



The first byte of each of them (0x06) contains information about the type of record. More precisely, only bits 1–3 (in our case, 011b = 3) are responsible for it, which can take one of the following values:



  1. 0 - data;
  2. 1/2 - forwarded / forwarding records;
  3. 3 - index;
  4. 4 - binary data (blob);
  5. 5/6/7 - ghost entries for index / data / versions.


Since in our tests we will not see the zero bit set, we can safely say that the record describes the index if the low part of the byte is 0x06.



In our case, the first byte is the key value for which the clustered index is built, followed by the page address in the slightly inverted PID: FID format. That is, for slot 3 the key ID value will be 0xca030000 (970 in decimal form after the little-endian coup), the PID is 0x5a000000 (90), the FID is 0x0100 (1). What can be translated as: "to search for records with ID from 970, go to page 1:90". For slot 2, we get ID = 0x87020000 (647), PID = 59000000 (89), FID = 0x0100 (1). And we read: "for entries with ID from 647 to 970 (which slot 3 already serves), follow page 1:89". And so, all four slots send us to four pages, each for its own range of values. We have already seen these four pages at the very beginning, when we have speculatively built a tree on the list of pages. Look at the very first output of the DBCC IND command. These are the same addresses as pages with leaf level data (PageType = 1, IndexLevel = 0).



If our table were much larger, these records would refer to the pages of the intermediate level index (Index Level of the root minus one level). And if we look for, say, the value ID = 743794, then in the root page we would get a slot, which is responsible for a sufficiently wide range of 700000-750000. It would send us to a similar page, with the exception that there we would narrow the search range to slot 743000-744000. Which, in turn, would have sent already to the page with the data where the records 743750-743800 are stored.







After talking about the data, let's go to page 1:89, serving values ​​from 657 to 970, which we just discovered.



 DBCC PAGE('Lab', 1, 89, 1) GO 


 ... Slot 0, Offset 0x60, Length 23, DumpStyle BYTE Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP Record Size = 23 Memory Dump @0x0000000011F1A060 0000000000000000: 10001400 87020000 79fdffff 11223344 55667788 ....?...yyyy."3DUfw. 0000000000000014: 030000 ... Slot 1, Offset 0x77, Length 23, DumpStyle BYTE Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP Record Size = 23 Memory Dump @0x0000000011F1A077 0000000000000000: 10001400 88020000 78fdffff 11223344 55667788 ........xyyy."3DUfw. 0000000000000014: 030000 ... ... 


The data page is not fundamentally different from the pages with an index. It's all the same title and the same slots with their index at the bottom of the page. For example, I was limited to only a couple of first entries, on the dump of which we will go through a sequence.



  1. 0x10. Record type with which you are already familiar. Bits 1–3 give us a value of 0, which indicates a data page. But the set bit 4 still pops up. It signals that the record has a NULL BITMAP bitmap, where one bit is assigned to each field of the table under study. If this bit is set, it is considered that the value in the corresponding field is NULL.
  2. 0x00. The second byte is responsible for tricky ghosted-forwarded records, starting with SQL Server 2012, and we are not interested in it now.
  3. 0x1400. All types can be classified as types with fixed length and variable. The most obvious example is binary (n) and varbinary (n). Immediately after this, the values ​​will go directly to the data with fixed-type types. And it itself indicates what offset in the slot will begin the next piece of information. The given value (namely, 20 in decimal form after a reversal from little-endian) = 2 (first bytes with types) + 2 (these bytes) + 4 (bytes to type int in the ID field) + 4 (bytes to the same type in field A) + 8 (byte on binary field B).
  4. 0x87020000 / 88020000 are the values ​​of 647 and 648, which are, nothing more than the values ​​of the ID field. Remember, earlier we wrote that slot 2 maintains records starting with ID = 647. That’s what they are.
  5. 0xFFFFFD79 / 0xFFFFFD78 - and these are already the values ​​of the field "A" (-647, -648), which we calculated using the formula -ID.
  6. 0x11223344 55667788 - a portion of the data logically completes our binary marker, which we defined for field B.
  7. 0x0300 - the number of columns. As expected - there are three (ID, A, B).
  8. 0x00 - and this is the same NULL BITMAP, which I mentioned earlier. Its length depends on the number of columns (bit per each) and is rounded to the nearest byte. In our case, one byte will be enough to store three bits (for three ID, A, B fields). Since we do not have NULL values, all bits are cleared.




Normal indexes



Now, let's create in our table a regular, non-clustered index across field A and see how it differs from the cluster index.



 CREATE INDEX [IX_ClusteredTable] ON [dbo].[ClusteredTable] ( [A] ASC ) GO 


Let's see which pages are involved and for this we indicate the code of the new index - 2.



 DBCC IND('Lab', 'ClusteredTable', 2) GO 


 PageFID PagePID IndexID PageType IndexLevel ------- ----------- ----------- -------- ---------- 1 94 2 10 NULL 1 93 2 2 0 1 118 2 2 0 1 119 2 2 1 


Very familiar picture. There are leaves-level pages, which we recognize by IndexLevel = 0. Only now they have type PageType = 2, that is, they are index pages, not pages with data. There is a root node with the same IndexLevel = 1, PageType = 2. At its address 1: 119 we will continue further research.



 DBCC PAGE('Lab', 1, 119, 1) GO 


 Slot 0, Offset 0x60, Length 15, DumpStyle BYTE Record Type = INDEX_RECORD Record Attributes = Record Size = 15 Memory Dump @0x000000001095A060 0000000000000000: 0618fcff ffe80300 005d0000 000100 ..uyye...]..... Slot 1, Offset 0x6f, Length 15, DumpStyle BYTE Record Type = INDEX_RECORD Record Attributes = Record Size = 15 Memory Dump @0x000000001095A06F 0000000000000000: 065afeff ffa60100 00760000 000100 .Z?yy¦...v..... 


And again, nothing fundamentally new. The same title, the same slots. Are these two slots that refer to the two pages (1:93, 1: 118) found above? We decompose on the shelves. The first byte with type 0x06 says that it is an index. The last 6 bytes are PID: FID, which we already know how to disassemble. After transformation into the usual format, we get exactly 1:93 (0x5d000000 0100) and 1: 118 (0x7600000000 0100). But between them we are waiting for an interesting surprise. For example, for slot 1, the values ​​0x5AFEFFFF and 0xA6010000 are nothing but -422 and 422. That is, in addition to the value of the field A, which we included in the index, it also contains the value of the cluster key ID.



The rest of the scheme is similar to that observed in a cluster index. The range from -1000 to -422 is served by page 1:93, from -422 to page 1: 118. With the only exception that the clustered index we have referred to pages with data, and where does this one lead? Let's go, we'll see.



 DBCC PAGE('Lab', 1, 118, 1) GO 


 ... m_pageId = (1:118) m_headerVersion = 1 m_type = 2 ... Slot 0, Offset 0x60, Length 12, DumpStyle BYTE Record Type = INDEX_RECORD Record Attributes = NULL_BITMAP Record Size = 12 Memory Dump @0x000000001331A060 0000000000000000: 165afeff ffa60100 00020000 .Z?yy¦...... Slot 1, Offset 0x6c, Length 12, DumpStyle BYTE Record Type = INDEX_RECORD Record Attributes = NULL_BITMAP Record Size = 12 Memory Dump @0x000000001331A06C 0000000000000000: 165bfeff ffa50100 00020000 .[?yy?...... ... 


I deliberately left one line from the heading so that we could once again see that fate brought us to the index page (page type m_type = 2). Everything else you probably already read joking. Should not confuse even a new record type - 0x16, because it is 0x10 + 0x06. The first term is bit 4 of the presence of NULL_BITMAP, which we saw on the data pages. The second is the type of index.



They are surely followed by the values ​​-422 (0x5AFEFFFF for slot 0) and -421 (0x5BFEFFFF for slot 1) from column A. We believe that the previous index level said that page 1: 118 works from -422. These values ​​are followed by cluster keys (422, 421), which also do not surprise us. And then - only the number of columns (0x0200) and NULL_BITMAP (0x00), as on data pages. Note that there are no links to the data pages themselves.



Did you know before that the usual index works through a cluster (if available, of course)? And if you clustered on a large (especially natural) key, then you have the risk of getting bloated secondary indexes? I hope no. Also, they didn’t know much of this article, because then there is a lot of sense in it :)



I want to think that now, you will be wary of statements with style: “An important characteristic of a clustered index is that all values ​​are sorted in a certain order, either increasing or decreasing” from the article mentioned at the beginning. Knowing that the location of the pages is unknown, that orderliness on the page is determined by the slots index, it is a stretch to talk only about quasi-orderliness. And better that the cluster index ... clusters. Yes, he does it perfectly. You know exactly which pages have a data cluster from X to Y.



I would like to hope that now you can, with pleasure from the work done, explore the work of the indices with Heap. Find answers to the questions: “is the GUID good for a clustered index,” “how does page split work in such cases,” “does UNIQUIFIER look like,” “is the Heap true faster for insertion?” And many others on your own. conditions. Returning to the title of the article, "you will not need to ask them."



See you soon under the hood, I hope you like this process.

Source: https://habr.com/ru/post/247949/



All Articles