Hi, Habr! I present to you the translation of the article
SQL Server Insert Performance for Clustered Indexes vs. Heap tablesQuestion
I have read many different documentation on modern SQL Server methods, stating that each table should have a clustered index and not be a bunch of non-clustered indexes. Most sources point out the administrative benefits of using clustered indexes. But is there any impact on performance and other positive or negative sides?
Decision
To check the performance of each option, we will create two identical tables with one difference: in one there will be a primary key created on the basis of a clustered index, and in the other a primary key will be created on the basis of a nonclustered index and the data will remain as a heap. Below is a script to create test tables and fill them with data.
The first thing I noticed at once was that the heap used more space, since table and index are separate structures. With a clustered index, data is stored in the leaves of the index and therefore uses less space. Below is a table showing the amount of space used by each object at the end of the data load.
Table | Index | Used (KB) | Reserved (KB) | Number of lines |
---|
testtable | PK__testtabl__357D0D3E3D086A66 | 257952 | 257992 | 4999999 |
testtable2 | HEAP | 256992 | 257032 | 4999999 |
testtable2 | PK__testtabl__357D0D3F2CBA35D8 | 89432 | 89608 | 4999999 |
After the primary data has been loaded, we proceed to the second stage of loading and look at the performance using the SQL Profiler, and also check the used space. Below is a script that loads an additional 100,000 records into each table.
')
To begin with, we will look at the used space, which, as expected (after all, the pages were filled to the end and nothing was deleted), increased proportionally for each object. Both tables increased in size about the same.
Table | Index | Used (KB) | Reserved (KB) | Number of lines |
---|
testtable | PK__testtabl__357D0D3E3D086A66 | 263128 | 263176 | 5099999 |
testtable2 | HEAP | 262392 | 262472 | 5099999 |
testtable2 | PK__testtabl__357D0D3F2CBA35D8 | 91216 | 91272 | 5099999 |
But the SQL Profiler shows us more interesting things. Since when inserting data into a heap, two objects need to be updated - a nonclustered index and the table itself, this also requires additional resources from the CPU, plus more read and write operations are performed at this moment than when inserting data into a table with a clustered index, therefore It will take a little longer.
Index type | CPU (ms) | Read operations | Write operations | Duration (ms) |
---|
Clustered | 3500 | 304919 | 654 | 11288 |
A pile | 3890 | 406083 | 904 | 11438 |
Now we will randomly delete some of the data and insert another 100,000 entries to check how the presence of white space on some pages will affect performance. The script that performs the removal and insertion is shown below.
Script 3
This time, SQL Profiler shows an increase in the performance gap between the two tables in favor of a clustered index.
Index type | CPU (ms) | Read operations | Write operations | Duration (ms) |
---|
Clustered | 3562 | 304859 | 653 | 10334 |
A pile | 4973 | 422142 | 7053 | 13042 |
Most likely, this difference arose from the fact that when inserting records into a heap, the DBMS will look for empty space on each page to place data in it. The reason for this is the fact that the data in the heap is not sorted, which means that the inserted records can be placed anywhere. When it comes to a table with a clustered index, when increasing the value of the primary key, the insertion is always done at the end of the table. Also pay attention to the space used by the tables. In the case of a heap, when reusing space, the amount of space occupied by the table should not increase, because we inserted as much as we deleted. Below is information about the tables, recorded after the last script, in which it is clear that the space occupied by the heap has not changed and this confirms the earlier suggestion.
Table name | Index name | Used (KB) | Reserved (KB) | Number of lines |
---|
testtable | PK__testtabl__357D0D3E3D086A66 | 268304 | 268360 | 4199999 |
testtable2 | HEAP | 262392 | 262472 | 4199999 |
testtable2 | PK__testtabl__357D0D3F2CBA35D8 | 93008 | 93064 | 4199999 |
Conclusion
In this study, we have seen that using a clustered index, as compared to a heap, gives the best results for all the categories measured: CPU, I / O, and execution time. The only side effect is that using a clustered index requires a bit more space. This space can be reduced by rebuilding the index. In any case, before making any changes to the indexes on the production system, you need to check everything on your equipment to make sure that the performance has improved.