Indexes are the first thing that needs to be well understood in the work of SQL Server , but in a strange way, basic questions are not often asked in forums and receive not so many answers.
Rob Sheldon answers these embarrassing professional circles, questions about indexes in SQL Server : some of them we are just too shy to ask, and before asking others we first think twice.
From translatorThis post is a compilation of two articles by Rob Sheldon:
If you write queries in
T-SQL , but you don’t understand where the data comes from, then you should read this translation.
If you want to know more, then at the end of the translation I give three books from which to move on.
Used terminology in Russian translationindex | index |
heap | a bunch |
table | table |
view | representation |
B-tree | balanced tree |
clustered index | clustered index |
nonclustered index | nonclustered index |
composite index | composite index |
covering index | covering index |
primary key constraint | primary key constraint |
unique constraint | uniqueness constraint |
query | request |
query engine | query subsystem |
database | database |
database engine | storage subsystem |
fill factor | index fill factor |
surrogate primary key | surrogate primary key |
query optimizer | query optimizer |
index selectivity | index selectivity |
filtered index | filtered index |
execution plan | execution plan |
Skip reading the basics and go directly to 14 questions.')
The Basics of Indexes in SQL Server
One of the most important ways to achieve high performance in
SQL Server is to use indexes. The index speeds up the query process by providing quick access to the data rows in the table, just as the pointer in the book helps you quickly find the necessary information. In this article, I will provide a brief overview of indexes in
SQL Server and explain how they are organized in the database and how they help speed up the execution of database queries.
Index structure
Indexes are created for table and view columns. Indexes provide a way to quickly find data based on the values in these columns. For example, if you create an index on a primary key and then search for a row with data using primary key values,
SQL Server will first find the index value and then use the index to quickly find the entire row with data. Without an index, a full scan (scan) of all rows of the table will be performed, which can have a significant performance impact.
You can create an index on most columns of a table or view. The exception is mostly columns with data types for storing large objects (
LOB ), such as
image ,
text, or
varchar (max) . You can also create indexes on columns intended for storing data in
XML format , but these indexes are arranged a little differently than standard ones and their consideration is beyond the scope of this article. The article also does not
cover columnstore indexes. Instead, I focus on the indexes most commonly used in
SQL Server databases .
An index consists of a set of pages, index nodes, which are organized in the form of a tree structure - a
balanced tree . This structure is hierarchical in nature and begins with the root node at the top of the hierarchy and leaf nodes, at the bottom, as shown in the figure:

When you form a query on an indexed column, the query subsystem starts going up from the root node and gradually moves down through the intermediate nodes, with each layer of the intermediate level containing more detailed information about the data. The query subsystem continues to move through the nodes of the index until it reaches the lowest level with the leaves of the index. For example, if you are looking for the value 123 in the indexed column, then the query subsystem will first determine the page at the first intermediate level at the root level. In this case, the first page indicates the value from 1 to 100, and the second from 101 to 200, thus the query subsystem will turn to the second page of this intermediate level. Further it will be clarified that you should refer to the third page of the next intermediate level. From here, the query subsystem reads at the lower level the value of the index itself. The leaves of the index can contain both the data of the table itself, and just a pointer to the rows with data in the table, depending on the type of the index: clustered index or nonclustered.
Clustered index
A clustered index stores the actual data rows in the leaves of the index. Returning to the previous example, this means that the data string associated with the key value of 123 will be stored in the index itself. An important characteristic of a clustered index is that all values are sorted in a specific order, either increasing or decreasing. Thus, a table or view can have only one clustered index. In addition, it should be noted that the data in the table is stored in a sorted form only if a clustered index is created on this table.
A table that does not have a clustered index is called a heap.
Nonclustered index
Unlike a clustered index, the leaves of a nonclustered index contain only those columns (
key ) for which this index is determined, and also contains a pointer to rows with real data in the table. This means that an additional operation is necessary for the subquery system to detect and obtain the required data. The content of the data pointer depends on the way the data is stored: a clustered table or a heap. If the pointer refers to a clustered table, then it leads to a clustered index, using which real data can be found. If the pointer refers to a heap, then it leads to a specific identifier of the data string. Nonclustered indexes cannot be sorted in contrast to clusterized ones, however you can create more than one nonclustered indexes on a table or view, up to 999. This does not mean that you should create as many indices as possible. Indexes can both improve and degrade system performance. In addition to the ability to create multiple nonclustered indexes, you can also include additional columns (
included column ) in your index: not only the value of the indexed columns themselves will be stored on the index leaves, but also the values of these non-indexed additional columns. This approach will allow you to bypass some of the restrictions imposed on the index. For example, you can include a non-indexable column or bypass the restriction on the length of the index (900 bytes in most cases).
Types of indexes
In addition to the fact that the index can be either clustered or non-clustered, it is possible to further configure it as a composite index, a unique index or a covering index.
Composite index
Such an index may contain more than one column. You can include up to 16 columns in the index, but their total length is limited to 900 bytes. Both clustered and nonclustered indexes can be composite.
Unique index
Such an index ensures the uniqueness of each value in the indexed column. If the index is composite, then uniqueness applies to all the columns in the index, but not to each individual column. For example, if you create a unique index on the columns
NAME and
NAME , then the full name must be unique, but duplicates in the name or surname are possible separately.
A unique index is automatically created when you define a column constraint: a primary key or a uniqueness constraint:
- Primary key
When you define a primary key constraint on one or more columns, then SQL Server automatically creates a unique clustered index if the clustered index has not been created previously (in this case, a unique non-clustered index is created on the primary key) - Uniqueness of values
When you define a unique constraint on values, then SQL Server automatically creates a unique nonclustered index. You can specify that a unique clustered index be created if the clustered index has not yet been created on the table.
Covering index
This index allows a specific query to immediately get all the necessary data from the leaves of the index without additional reference to the records of the table itself.
Index Design
How useful indexes can be, how carefully they should be designed. Since indexes can take up considerable disk space, you don’t want to create indexes more than necessary. In addition, indexes are automatically updated when the data row itself is updated, which can lead to additional resource overhead and performance degradation. When designing indexes, several considerations regarding the database and its queries should be taken into account.
Database
As noted earlier, indexes can improve system performance, since they provide the query subsystem in a fast way to locate data. However, you should also take into account how often you are going to insert, update or delete data. When you change data, the indexes should also be changed to reflect the corresponding actions on the data, which can significantly reduce system performance. Consider the following guidelines when planning your indexing strategy:
- For tables that are frequently updated, use as few indices as possible.
- If the table contains a large amount of data, but their changes are minor, then use as many indices as necessary to improve the performance of your queries. However, think carefully before using indexes on small tables, since perhaps using index search may take longer than scanning all the rows.
- For clustered indexes, try to use as short fields as possible. Best practice is to use a clustered index on columns with unique values and not allowing the use of NULL. This is why the primary key is often used as a clustered index.
- The uniqueness of the values in the column affects the performance of the index. In general, the more duplicates you have in a column, the worse the index works. On the other hand, the more unique values, the higher the performance index. When possible use a unique index.
- For a composite index, take into account the order of the columns in the index. Columns that are used in WHERE expressions (for example, WHERE FirstName = 'Charlie' ) must be in the index first. Subsequent columns should be listed taking into account the uniqueness of their values (the columns with the highest number of unique values go first).
- You can also specify an index on the calculated columns, if they meet certain requirements. For example, an expression that is used to get the value of a column must be deterministic (always return the same result for a given set of input parameters).
Database Requests
Another consideration to consider when designing indexes is what queries are being performed on the database. As mentioned earlier, you should consider how often data changes. Additionally, the following principles should be used:
- Try to insert or modify as many lines in a single query as possible, rather than do it in several single queries.
- Create a nonclustered index on the columns that are often used in your queries as search conditions in WHERE and joins in JOINs .
- Consider indexing the columns used in the string search queries for exact match values.
And now, actually:
14 index questions in SQL Server that you feel shy about asking
Why can't a table have two clustered indexes?If a clustered table has many advantages, then why use a bunch?How to change the default index fill factor?Is it possible to create a clustered index on a column containing duplicates?How is the table stored if a clustered index was not created?What is the relationship between uniqueness constraints and the primary key with the table indexes?Why in SQL Server are clustered and non-clustered indexes called a balanced tree?How can an index improve query performance in general if you have to go through all these index nodes?If indexes are so wonderful, why not just create them on each column?Is it necessary to create a clustered index on a column with a primary key?So what if you indexed a view, will it still be a view?Why use a covering index instead of a composite index?Does the number of duplicates in the key column matter?Is it possible to create a nonclustered index only for a specific subset of the key column data?Why can't a table have two clustered indexes?
Want a short answer? The clustered index is the table. When you create a clustered index on a table, the storage subsystem sorts all the rows in the table in ascending or descending order, according to the index definition. A clustered index is not a separate entity like other indexes, but a mechanism for sorting data in a table and facilitating quick access to data rows.
Imagine that you have a table containing a history of sales operations. The Sales table includes information such as an order ID, product position in the order, product number, quantity, product number and order date, etc. You create a clustered index on the
OrderID and
LineID columns , sorted in ascending order, as shown in the following
T-SQL code:
CREATE UNIQUE CLUSTERED INDEX ix_oriderid_lineid ON dbo.Sales(OrderID, LineID);
When you run this script, all rows in the table will be physically sorted first by the OrderID column and then by the LineID, but the data itself will remain in a single logical block in the table. For this reason, you cannot create two clustered indexes. There can be only one table with the same data and this table can be sorted only once in a certain order.
If a clustered table has many advantages, then why use a bunch?
You're right. Clustered tables are different and most of your queries will execute better to tables with a clustered index. But in some cases, you may want to leave the tables in their natural, original state, i.e. in the form of a heap, and create only nonclustered indexes to maintain the health of your queries.
A heap, as you remember, stores data in a random order. Typically, the storage subsystem adds data to the table in the order in which they are inserted, but the subsystem also likes to move rows for more efficient storage. As a result, you have no chance to predict in what order the data will be stored.
If the query engine must find the data without the benefits of a nonclustered index, it will do a full table scan to find the rows it needs. On very small tables, this is usually not a problem, but as soon as the heap grows in size, performance drops rapidly. Of course, a nonclustered index can help by using a pointer to the file, the page and the row where the necessary data is stored - this is usually a much better alternative to scanning the table. But even in this case, it is difficult to compare with the advantages of a clustered index when considering query performance.
However, a bunch can help improve performance in certain situations. Consider a table with a large number of inserts, but rare updates or deletion of data. For example, a table that stores a log is primarily used to insert values until it is archived. In the heap, you will not see page splitting and data fragmentation, as is the case with a clustered index, because rows are simply added to the end of the heap. Too much page separation can have a significant impact on performance and in a not very good way. In general, the heap allows you to insert data relatively painlessly and you do not have to struggle with the overhead of storage and maintenance, as is the case with a clustered index.
But the lack of updating and deleting data should not be considered as the only reason. Data sampling is also an important factor. For example, you should not use a heap if you frequently perform data range queries or the requested data often must be sorted or grouped.
All this means that you should consider using a heap only when working with very small tables or your entire interaction with the table is limited to inserting data and your queries are extremely simple (and you still use nonclustered indexes). Otherwise, hold a well-designed clustered index, for example, defined on a simple incremental key field, as a widely used column with
IDENTITY .
How to change the default index fill factor?
Changing the default index fill factor is one thing. Understanding how the default factor works is another. But first, a couple of steps back. The fill factor of the index determines the amount of space on the page for storing the index at the lower level (leaf level) before starting to fill a new page. For example, if the coefficient is set to 90, then with growth the index will take 90% on a page, and then go to the next page.
By default, the value of the index fill factor in
SQL Server is 0, which is equivalent to a value of 100. As a result, all new indexes automatically inherit this setting unless you specifically specify a value other than the system default for the system or change the default behavior. You can use
SQL Server Management Studio to adjust the default value or run the
sp_configure system saved procedure. For example, the following set of
T-SQL commands sets the value of the coefficient equal to 90 (you must first switch to the advanced settings mode):
EXEC sp_configure 'show advanced options', 1; GO RECONFIGURE; GO EXEC sp_configure 'fill factor', 90; GO RECONFIGURE; GO
After changing the index fill factor, you must restart the
SQL Server service. Now you can check the set value by running sp_configure without the specified second argument:
EXEC sp_configure 'fill factor' GO
This command should return a value of 90. As a result, all newly created indexes will use this value. You can check this by creating an index and querying the fill factor value:
USE AdventureWorks2012;
In this example, we created a nonclustered index in the
Person table in the
AdventureWorks2012 database. After creating the index, we can get the fill factor value from the sys.indexes system tables. The request should return 90.
However, imagine that we deleted the index and created it again, but now we have indicated a specific fill factor value:
CREATE NONCLUSTERED INDEX ix_people_lastname ON Person.Person(LastName) WITH (fillfactor=80); GO SELECT fill_factor FROM sys.indexes WHERE object_id = object_id('Person.Person') AND name='ix_people_lastname';
This time, we added the
WITH statement and the
fillfactor option for our
CREATE INDEX index creation operation and set the value to 80. The
SELECT statement now returns the corresponding value.
Until now, everything was pretty straightforward. Where you can really get stuck in this whole process is when you create an index that uses the default coefficient value, meaning that you know this value. For example, someone clumsily picks at the server settings and he is so upset that he sets the index fill factor to 20. Meanwhile, you continue to create indexes, assuming a default value of 0. Unfortunately, you have no way to know the value of the coefficient as long as you do not create an index, then check the value, as we did in our examples. Otherwise, you will have to wait for the moment when the query performance will drop so much that you begin to suspect something.
Another problem you should keep in mind is rebuilding indexes. As with the creation of an index, you can specify the value of the index fill factor when you rebuild it. However, unlike the index creation command, the restructuring does not use the default server settings, even though it may seem so. Even more, if you do not specifically specify the value of the index fill factor,
SQL Server will use the value of the coefficient with which this index existed prior to its restructuring. For example, the following
ALTER INDEX operation rebuilds the index we just created:
ALTER INDEX ix_people_lastname ON Person.Person REBUILD; GO SELECT fill_factor FROM sys.indexes WHERE object_id = object_id('Person.Person') AND name='ix_people_lastname';
When we check the value of the fill factor, we get a value of 80, because we specified it during the last creation of the index. The default value is ignored.
As you can see, changing the value of the index fill factor is not that difficult. It is much more difficult to know the current value and understand when it is applied. If you always specifically specify the ratio when creating and rebuilding indexes, then you always know the specific result. Is that you have to take care that someone else did not screw up in the server settings again, causing the restructuring of all indexes with a ridiculously low index fill factor.
Is it possible to create a clustered index on a column containing duplicates?
Yes and no. Yes, you can create a clustered index on a key column containing duplicate values. No, the value of the key column cannot remain in a state of uniqueness. Let me explain. If you create a non-unique clustered index (non-unique clustered index) on a column, the storage subsystem adds an integer value (uniquifier) to the duplicate value to verify the uniqueness and, accordingly, to ensure the ability to identify each row in a clustered table.
For example, you can decide to create in the table with customer data a clustered index on the
LastName column storing the last name. The column contains values such as Franklin, Hancock, Washington and Smith. Then you insert the values Adams, Hancock, Smith, and again Smith. But the value of the key column must be unique, so the storage subsystem changes the value of duplicates so that they look something like this: Adams, Franklin, Hancock, Hancock1234, Washington, Smith, Smith4567 and Smith5678.
At first glance, this approach seems normal, but the integer value increases the key size, which can be a problem with a large number of duplicates, and these values will become the basis of a nonclustered index or a foreign key reference. For these reasons, you should always try to create unique clustered indexes whenever possible. If this is not possible, then at least try to use columns with very high content of unique values.
How is the table stored if a clustered index was not created?
SQL Server supports two types of tables: clustered tables that have a clustered index and heap tables or simply heaps. Unlike clustered tables, the data in the heap is not sorted in any way. In essence, this is the jumble (pile) of data. If you add a row to such a table, the storage subsystem simply adds it to the bottom of the page. When the page is filled with data, they will be added to the new page. In most cases, you will want to create a clustered index on a table in order to take advantage of the ability to sort and speed up queries (try to imagine finding a phone number in your address book that is not sorted by any principle). However, if you decide not to create a clustered index, you can still create a nonclustered index at the heap. In this case, each index line will have a pointer to the heap line. The pointer includes the file ID, page number and line number of the data.
What is the relationship between uniqueness constraints and the primary key with the table indexes?
The primary key and the uniqueness constraint ensure that the values in the column are unique. You can create only one primary key for the table and it cannot contain
NULL values. You can create several constraints on the uniqueness of the value of the table and each of them can have a single entry with
NULL .
When you create a primary key, the storage subsystem also creates a unique clustered index, in case the already clustered index was not created. However, you can override the default behavior and then a nonclustered index will be created. If a clustered index exists when you create a primary key, a unique nonclustered index will be created.
When you create a unique constraint, the storage subsystem creates a unique nonclustered index. But you can specify the creation of a unique clustered index, if it was not created earlier.
In general, the uniqueness constraint value and unique index are one and the same.
Why in SQL Server are clustered and non-clustered indexes called a balanced tree?
Base indexes in SQL Server, clustered or non-clustered, are distributed across page sets — index nodes. These pages are organized as a specific hierarchy with a tree structure called a balanced tree. At the top level is the root node, at the bottom, leaf nodes, with intermediate nodes between the top and bottom levels, as shown in the figure:
The root node provides the main entry point for queries attempting to retrieve data through an index. Starting from this node, the query subsystem initiates the transition through the hierarchical structure down to the appropriate end node containing the data., , , 82. , , 1-100. 1-100 51-100, 76-100. , , 82. , .
, ?
-, . . , , .
,
SQL Server (, ). ,
. ( , ). , , , (), ().
,
SQL Server , . , , , , . , .
, ?
. , . , ,
SELECT ,
INSERT ,
UPDATE DELETE , .
SELECT , , . ? , ,
UPDATE . , – . , , . , , , , – , , , .
DELETE . , .
INSERT , : , .
So consider the types of queries to your database when thinking about what type of indexes and how much should be created. No longer means better. Before adding a new index to a table, calculate the cost of not only basic queries, but also the amount of disk space occupied, the cost of maintaining health and indexes, which can lead to a domino effect for other operations. Your index design strategy is one of the most important aspects of implementation and should include many considerations: from the size of the index, the number of unique values, to the type of index supported by the query.Is it necessary to create a clustered index on a column with a primary key?
, . , , , , , . , , , .
, . .
The primary key of a table can be a good choice, because it uniquely identifies each row in the tables without the need to add additional data. In some cases, the best choice would be a surrogate primary key, which has not only a sign of uniqueness, but also small size, and whose values increase sequentially, which makes nonclustered indexes based on this value more efficient. The query optimizer also likes this combination of a clustered index and a primary key, because the joining of tables is faster than connecting in a different way that does not use the primary key and the associated clustered index. As I said, this is a marriage made in heaven., , , : , . , . , , . , , , . , . , .
, - ?
– , . , , , . , , , , , .
( ), . , ,
SELECT , . . , , . , , ( , , ). , — , , .
Before you can create an index on a view, it must meet several restrictions. For example, a view can refer only to base tables, but not to other views, and these tables must be in the same database. There are actually many other limitations, so be sure to consult the SQL Server documentation for all the dirty details.Why use a covering index instead of a composite index?
-, , . , . , , , , . , , , , .
, , , . , .. ( ). . , .
. , . , ,
OrderID OrderDate in the Sales table : SELECT OrderID, OrderDate FROM Sales WHERE OrderID = 12345;
You can create a composite nonclustered index on both columns, but the OrderDate column only adds to the overhead of maintaining the index, but it still cannot serve as a particularly useful key column. The best solution would be to create a covering index with the OrderID key column and the optional OrderDate column included : CREATE NONCLUSTERED INDEX ix_orderid ON dbo.Sales(OrderID) INCLUDE (OrderDate);
In doing so, you avoid the drawbacks that arise when indexing redundant columns, while at the same time retaining the advantages of storing data on the leaves when performing queries. The included column is not part of the key, but the data is stored on the end node, the index sheet. This can improve query performance without incurring any additional costs. In addition, the columns included in the covering index are less constrained than the key columns of the index.Does the number of duplicates in the key column matter?
, . : , .
, . , . ,
FirstName LastName , John Doe, John Doe, John Doe.
The coefficient of uniqueness of key column values is called index selectivity. The more unique values, the higher the selectivity: the unique index has the highest possible selectivity. The query subsystem is very fond of columns with a high selectivity value, especially if these columns participate in the WHERE selection conditions of your most frequently performed queries. The higher the index selectivity, the faster the query subsystem can reduce the size of the resulting data set. The downside, of course, is that columns with a relatively small number of unique values are rarely good candidates for indexing.Is it possible to create a nonclustered index only for a specific subset of the key column data?
, . , , , . , « » , ,
SQL Server 2008 , , . , .. , , , — . . .
, .
CREATE INDEX WHERE . , , NULL, :
CREATE NONCLUSTERED INDEX ix_trackingnumber ON Sales.SalesOrderDetail(CarrierTrackingNumber) WHERE CarrierTrackingNumber IS NOT NULL;
, , , . , ..
SQL Server , , , .
, , . , , . -. .