📜 ⬆️ ⬇️

Why a lot of JOIN in the query is bad or do not interfere with the optimizer

image

Recently faced with one application that generated queries to the database. I understand that this will not surprise anyone, but when the application began to slow down and I was asked to understand what was the reason, I was very surprised to find these requests. This is what SQL Server sometimes has to deal with:

SELECT COUNT(DISTINCT "pr"."id") FROM ((((((((((((((((("SomeTable" "pr" LEFT OUTER JOIN "SomeTable1698" "uf_pr_id_698" ON "uf_pr_id_698"."request" = "pr"."id") LEFT OUTER JOIN "SomeTable1700" "ufref3737_i2" ON "ufref3737_i2"."request" = "pr"."id") LEFT OUTER JOIN "SomeTable1666" "x0" ON "x0"."request" = "ufref3737_i2"."f6_callerper") LEFT OUTER JOIN "SomeTable1666" "uf_ufref4646_i3_f58__666" ON "uf_ufref4646_i3_f58__666"."request" = "ufref3737_i2"."f58_") LEFT OUTER JOIN "SomeTable1694" "x1" ON "x1"."request" = "ufref3737_i2"."f38_servicep") LEFT OUTER JOIN "SomeTable3754" "ufref3754_i12" ON "pr"."id" = "ufref3754_i12"."request") LEFT OUTER JOIN "SomeTable1698" "uf_ufref3754_i12_reference_698" ON "uf_ufref3754_i12_reference_698"."request" = "ufref3754_i12"."reference") LEFT OUTER JOIN "SomeTable1698" "x2" ON "x2"."request" = "ufref3737_i2"."f34_parentse") LEFT OUTER JOIN "SomeTable4128" "ufref3779_4128_i14" ON "ufref3737_i2"."f34_parentse" = "ufref3779_4128_i14"."request") LEFT OUTER JOIN "SomeTable1859" "x3" ON "x3"."request" = "ufref3779_4128_i14"."reference") LEFT OUTER JOIN "SomeTable3758" "ufref3758_i15" ON "pr"."id" = "ufref3758_i15"."request") LEFT OUTER JOIN "SomeTable1698" "uf_ufref3758_i15_reference_698" ON "uf_ufref3758_i15_reference_698"."request" = "ufref3758_i15"."reference") LEFT OUTER JOIN "SomeTable3758" "ufref3758_i16" ON "pr"."id" = "ufref3758_i16"."request") LEFT OUTER JOIN "SomeTable4128" "ufref3758_4128_i16" ON "ufref3758_i16"."reference" = "ufref3758_4128_i16"."request") LEFT OUTER JOIN "SomeTable1859" "x4" ON "x4"."request" = "ufref3758_4128_i16"."reference") LEFT OUTER JOIN "SomeTable4128" "ufref4128_i17" ON "pr"."id" = "ufref4128_i17"."request") LEFT OUTER JOIN "SomeTable1859" "uf_ufref4128_i17_reference_859" ON "uf_ufref4128_i17_reference_859"."request" = "ufref4128_i17"."reference") LEFT OUTER JOIN "SomeTable1666" "uf_ufref4667_i25_f69__666" ON "uf_ufref4667_i25_f69__666"."request" = "uf_pr_id_698"."f69_" WHERE ("uf_pr_id_698"."f1_applicant" IN (248,169,180,201,203,205,209,215,223,357,371,379,3502,3503,3506,3514,3517,3531,3740,3741) OR "x0"."f24_useracco" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "uf_ufref4646_i3_f58__666"."f24_useracco" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "uf_ufref4667_i25_f69__666"."f24_useracco" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR ("uf_pr_id_698"."f10_status" Is Null OR "uf_pr_id_698"."f10_status" <> 111) AND "ufref3737_i2"."f96_" = 0 AND (("ufref3737_i2"."f17_source" Is Null OR "ufref3737_i2"."f17_source" <> 566425) AND ("ufref3737_i2"."f17_source" Is Null OR "ufref3737_i2"."f17_source" <> 566424) OR ("uf_pr_id_698"."f10_status" Is Null OR "uf_pr_id_698"."f10_status" <> 56) ) AND ("uf_pr_id_698"."f12_responsi" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "x1"."f19_restrict" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "uf_ufref3754_i12_reference_698"."f12_responsi" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "x2"."f12_responsi" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "x3"."f5_responsib" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "uf_ufref3758_i15_reference_698"."f12_responsi" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "x4"."f5_responsib" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136) OR "uf_ufref4128_i17_reference_859"."f5_responsib" IN (578872,564618,565084,566420,566422,566936,567032,567260,567689,579571,580813,594452,611522,611523,615836,621430,628371,633044,634132,634136)) AND ("uf_pr_id_698"."f12_responsi" Is Null OR "uf_pr_id_698"."f12_responsi" <> 579420) ) AND "pr"."area" IN (700) AND "pr"."area" IN (700) AND "pr"."deleted_by_user"=0 AND "pr"."temporary" = 0 

The name of the objects has been changed.

Most striking is that the same table is used many times, and the number of brackets is crazy. But not only did I not like this code, SQL Server is also not thrilled and spends a lot of resources on creating a plan for it. The request can be performed from 50 to 150 ms, and the construction of the plan can take up to 2.5 seconds. Today I will not consider options for correcting the situation, I will just say that in my case it was impossible to correct the generation of the request in the application.

I would like to consider why SQL Server has been building a query plan for so long. In any DBMS, SQL Server is not an exception, the main issue of optimization is the way of connecting tables with each other. Besides the very method of joining, the order of joining tables is very important.
')
Let's talk about the order of joining tables in more detail. In this question, it is very important to understand that the possible number of table joints grows exponentially rather than linearly . For example, for 2 tables the number of possible connection options is only 2, for 3 this number can go up to 12. A different join order may have a different query cost and the SQL Server optimizer must choose the “best” method, but with more tables, this becomes a resource-intensive task. If SQL Server starts to sort through all possible options, such a query can never be executed, for this reason SQL Server never does this and always looks for a “good enough plan” rather than the “best” one. SQL Server is always trying to find a compromise between the execution time and the quality of the plan.

Here is a vivid example of the growth in the number of connection options exponentially. SQL Server can choose different join methods (left-deep, right-deep, bushy trees).

Visually, it looks like this:

image

The table shows the possible number of connection options when increasing the number of tables:

image

You can get these values ​​yourself:

For left-deep: 5! = 5 x 4 x 3 x 2 x 1 = 120

For bushy tree: (2n – 2)! / (N – 1)!

Conclusion: Be more careful with the number of JOINs in the query and do not interfere with the optimizer. If you fail to achieve the desired result in the query, where there is a lot of JOIN, break it into several smaller queries and you will be surprised how much better the result can be.

PS Of course, we must understand that in addition to determining the order of joining tables, the query optimizer must also choose which type of connection to use, choose the method of accessing data (Scan, Seek), and so on.

Source: https://habr.com/ru/post/325738/


All Articles