📜 ⬆️ ⬇️

Optimize SQL Server performance using indexes

Introduction


As you know, indexes increase productivity similarly to the table of contents or subject index in the book. After reading a few articles on the Internet and a couple of chapters from books, I would like to know how indices help increase the speed of data retrieval from SQL Server. Consider an example.
For this we need two tables linked by a foreign key. There will be 10,000 rows in the main table, 1,000,000 in the second. The first table will contain a primary key and 2 columns with text, the second table will contain a primary key, 2 text fields, 2 numeric fields, a date, a calculated column and a foreign key.

Database structure

DB structure
CREATE TABLE "maintable"( "ID" INT NOT NULL IDENTITY (1, 1) PRIMARY KEY, "name" NVARCHAR (50) NULL, "description" NVARCHAR (100) NULL ) CREATE TABLE "secondtable" ( "ID" INT NOT NULL IDENTITY (1, 1) PRIMARY KEY, "sname" NVARCHAR (50) NULL, "sdescr" NVARCHAR (100) NULL, "somefirstnumber" INT DEFAULT 0, "somesecondnumber" INT DEFAULT 1, "somedatetime" DATETIME NOT NULL, "howmanyhours" INT NULL DEFAULT 0, "newdate" AS DATEADD(HOUR, "howmanyhours", "somedatetime"), "mainID" INT NOT NULL CONSTRAINT FK_SecondToMain FOREIGN KEY ("mainID") REFERENCES "maintable" ("ID") ); --  CREATE PROCEDURE "insertintomain" @Name NVARCHAR (50), @Descr NVARCHAR (100) AS INSERT INTO "SimpleIndex"."dbo"."maintable" ("name", "description") VALUES (@Name, @Descr); GO CREATE PROCEDURE "insertintosecond" @Sname NVARCHAR(50), @Sdescr NVARCHAR(100), @Firstnumber INT, @Secondnumber INT, @SomeDT DATETIME, @Hours INT, @MainID INT AS INSERT INTO "SimpleIndex"."dbo"."secondtable" ( "sname", "sdescr", "somefirstnumber", "somesecondnumber", "somedatetime", "howmanyhours", "mainID") VALUES ( @Sname, @Sdescr,@Firstnumber, @Secondnumber, @SomeDT, @Hours, @MainID) GO 



Insert data

It is clear that just do not insert a million lines, use C #.
Create a simple console application.
 using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using System.Data; using System.Data.SqlClient; namespace SimpleAddingRandomRecordsToDatabase { class Program { static string alphabet() { //      string alphabet = "abcdefgh ijklmnopqrst uvwxyz    "; //    Encoding enc = Encoding.UTF8; var bytes = enc.GetBytes(alphabet); alphabet = enc.GetString(bytes); return alphabet; } static int alphlen = alphabet().Length; static string getRandomString(int lenth, int rndparam, bool allowspacebars) { string str = ""; Random rand = new Random(rndparam); for (int i = 0; i < lenth; i++) { int num = rand.Next(alphlen); str = str + alphabet()[num]; } if (!allowspacebars) { str = str.Replace(" ", ""); } return str; } static void Main(string[] args) { string result; Console.WriteLine(" 1   "); result = Console.ReadLine(); if (result.Equals("1")) { //   SqlConnection cn = new SqlConnection(); SqlConnectionStringBuilder cs = new SqlConnectionStringBuilder(); cs.IntegratedSecurity = true; cs.DataSource = @"KONSTANTIN\KONSTANTINSQL"; cs.InitialCatalog = "SimpleIndex"; cn.ConnectionString = cs.ToString(); cn.Open(); //  10 000     int i2 = 0; for (int i = 0; i < 10000; i++) { SqlCommand cm = cn.CreateCommand(); cm.CommandType = CommandType.StoredProcedure; cm.CommandText = "insertintomain"; i2++; cm.Parameters.Add(new SqlParameter("Name", getRandomString(50, i2, false))); i2++; cm.Parameters.Add(new SqlParameter("Descr", getRandomString(100, i2, true))); cm.ExecuteNonQuery(); } // 1 000 000 ,     i2 = 0; for (int i = 0; i < 1000000; i++) { Random rand = new Random(); DateTime dt = DateTime.Now; dt = dt.AddHours((-1) * rand.Next(2000)); SqlCommand cm = cn.CreateCommand(); cm.CommandType = CommandType.StoredProcedure; cm.CommandText = "insertintosecond"; i2++; cm.Parameters.Add(new SqlParameter("Sname", getRandomString(50, i2, false))); i2++; cm.Parameters.Add(new SqlParameter("Sdescr", getRandomString(100, i2, true))); i2++; Random rand2 = new Random(i2); cm.Parameters.Add(new SqlParameter("Firstnumber", rand2.Next(10000))); i2++; rand2 = new Random(i2); cm.Parameters.Add(new SqlParameter("Secondnumber", rand2.Next(100))); cm.Parameters.Add(new SqlParameter("SomeDT", dt)); i2++; rand = new Random(i2); cm.Parameters.Add(new SqlParameter("Hours", rand.Next(30))); i2++; rand = new Random(i2); cm.Parameters.Add(new SqlParameter("MainID", rand.Next(9999)+1)); cm.ExecuteNonQuery(); } cn.Close(); Console.WriteLine(" ."); } } } } 


Run the console application, click "1" and wait. It took me 15-20 minutes to insert.
')

Description of Results


In this section, we will look at sample queries in indexed and non-indexed columns.
One clustered index is created implicitly - this is the primary key.

Primary Key Search

Perform a search for a row by the primary key, pointing to a specific row number of ten thousand:
 SELECT "name", "description" FROM "maintable" WHERE "ID" = 3823; 
Time spent searching: 1,523 seconds .
The second request took 0.9 seconds , and the third - 0.1 seconds .

Perform the second query with high selectivity from the same table:
 SELECT "name", "description" FROM "maintable" WHERE "ID" BETWEEN 5000 AND 6000; 
The first request took 0.3 seconds , the second 0.26 seconds , the third 0.25 seconds .

Now we will execute similar requests to the table containing one million records. For a specific line number:
 SELECT * FROM "secondtable" WHERE "ID" = 728472; 
Time spent on search: 1.4 seconds , the second query showed 1.36 seconds. . Next will be the average request time.

The primary key columns already have an index, so it’s likely that the performance increase will not work. Based on the results, we can say that repeated queries with the same parameters increase the search speed.

Pattern Search

Consider a text search by pattern with indexed and non-indexed columns:
 SELECT * FROM "maintable" WHERE "name" LIKE '%'; 
Without an index, such a query is executed in 0.172 seconds out of 10,000 rows. With index
 CREATE NONCLUSTERED INDEX iName ON "maintable" ("name") 
this is done in 0.112 seconds , and with a composite index containing the “name” and “description” fields, the request took 0.09 seconds .

Request
 SELECT * FROM "maintable" WHERE "name" LIKE '%dc%'; 
runs in 0.172 seconds with no index and 0.112 seconds with an index .

Perform a pattern search on a table containing a million entries.
 SELECT * FROM "secondtable" WHERE "sname" LIKE '%'; 
This query without indexing would take an average of 1.7 seconds , whereas with a column index it would take 0.15 seconds .

Search by date

To begin with we will execute request of small selectivity
 SELECT * FROM "secondtable" WHERE "somedatetime" BETWEEN '2012-26-11 11:30:00' AND '2012-26-11 11:32:00' 
Without an index, it takes an average of 0.7 seconds , indexed - 0.22 seconds .

Request high selectivity (420600 rows)
 SELECT * FROM "secondtable" WHERE "somedatetime" BETWEEN '2012-26-11 11:30:00' AND '2012-31-12 11:32:00' 
Without an index - 19.219 seconds , with an index of the date field only - 9.9 seconds .

Query by date on calculated column

 SELECT * FROM "secondtable" WHERE "newdate" BETWEEN '2012-26-11 11:30:00' AND '2012-31-12 11:32:00' 
This query without an index will take 12.2 seconds , whereas with an index - 9.75 . It is important to note that the original column containing the date has already been indexed.

Join tables

When combining tables, it is advised to always index a column that is a foreign key. In the following example, we will verify that this is indeed the case.
 CREATE NONCLUSTERED INDEX iFK ON "secondtable" ("mainID"); SELECT "name", "sname", "somefirstnumber" FROM "maintable" M INNER JOIN "secondtable" S ON M."ID" = S."mainID" WHERE M."ID" = 8271 
Such a request returns 101 lines. Without an index - 0.422 seconds , and with an index - 0.122 seconds . As we see, the query speed has increased more than three times. Consider the remaining queries related to the merge.

But the index does not always increase performance. For example, a query (420,000 lines)
 SELECT "name", "sname", "somefirstnumber" FROM "maintable" M INNER JOIN "secondtable" S ON M."ID" = S."mainID" WHERE S."somedatetime" BETWEEN '2012-26-11 11:30:00' AND '2012-31-12 11:32:00'; 
showed identical results without an index, with an index of only a foreign key, a foreign key and a date, with a composite index simultaneously including these two columns - approximately 6.8 seconds .

Combining with pattern matching

Perform a merge with a pattern search among a million entries:
 SELECT "name", "sname", "somefirstnumber" FROM "maintable" M INNER JOIN "secondtable" S ON M."ID" = S."mainID" WHERE "sname" LIKE '%'; 
Without indices, the query will take 0.766 seconds and return 5 rows.
With a foreign key index only - 0.4 sec. ;
With a text field index only - 0.125 sec. ;
With foreign key index and text field index - 0.109 sec. ;
With a composite index containing foreign key and text field - 0.35 sec. .

Comparison of results


Consider the results in the form of graphs. The vertical axis shows the elapsed time ( and not the speed! ).
1. Sampling on the primary key. Since there is already a clustered index, consider a large and small selective sample from one table:

From this graph it is clear that sampling a large amount of data is processed faster.

2. Search by primary key with join of tables. The following graph shows that foreign key indexing speeds up a search operation by more than three times:


3. Search by text template. From the graph it is clear that the performance grows many times, if you use an index with a large amount of data:


4. Search by date. Indices significantly increase performance for both small and large samples within the same table:

Combining tables and searching by date showed similar results with and without indices.

5. Search by calculated column. The index also reduces search time.


6. Search by pattern with association. The text field index and both indexes significantly increase performance:


Let's sum up


After analyzing the results, we can say that:


So, we see that indexing significantly increases SELECT queries, however, indexes take up additional memory and periodically need to be rebuilt. The optimal solution is difficult to find, so developers and database administrators spend a lot of time searching for the golden mean, and besides, indices need to be reorganized. But still, if you have a large amount of data, the answer, whether to use indexes or not, is obvious.

Source: https://habr.com/ru/post/164717/


All Articles