This post is the third part of the series on connection operators (be sure to read part 1 - nested loops joins , and part 2 - merge joins ). The translation of the article was prepared specifically for students of the “MS SQL Server Developer” course.
Hash Match Joins are reliable workhorses of physical connection operators.
While Nested Loops Join will fail, if there is too much data to put in memory, and Merge Join will require the input data to be sorted, Hash Match will connect any data that you submit to the input (provided that the equality predicate is executed for the connection and so far there is enough free space in your tempdb).
Watch a video on the topic on YouTube
The hash match algorithm consists of two stages, which work as follows:
During the first “Build” phase, SQL Server creates a hash table in memory from one of the input tables (usually the smallest of the two). The hashes are calculated based on the keys of the input data and then stored together with the string in the hash table in the appropriate block. In most cases, there is only one row of data in each block, except when:
After creating the hash table, the “Probe” stage begins (validation). In the second stage, SQL Server calculates the hash of the key for each row in the second input table and checks if it exists in the hash table created in the first stage. If there is a match for this hash, then it is checked whether the keys of the string (s) in the hash table and the rows from the second table match (this check must be performed due to possible collisions).
A common variant of the hash match algorithm occurs when the build phase fails to create a hash table that can be fully stored in memory:
This happens when there is more data than can be allocated in memory, or when SQL Server provides insufficient memory for a hash match connection.
When SQL Server does not have enough memory to store the hash table during the build phase, it continues to work, storing some blocks in memory and placing other blocks in tempdb.
During the validation phase, SQL Server connects the data rows from the second table to blocks in the build phase, which are in memory. If the block to which this string potentially matches is currently not in memory, SQL Server writes this string to tempdb for later comparison.
When matches for one block are completed, SQL Server clears this data from memory and loads the following blocks into memory. It then compares the rows of the second table (currently in tempdb) with the new blocks in memory.
As is the case with each physical connection operator in this series, details about the hash match operator can be found in Hugo Cornelis help (Hugo Kornelis) about hash match .
Knowing the intrinsic features of how the match join hash works allows us to determine what the optimizer thinks about our data and higher-level join operators, helping us focus on tuning performance.
Here are a few scenarios that should be considered the next time you see that the hash match join is used in your execution plan:
Thank you for reading the article. You might also like my twitter .
We covered this topic in the previous open lesson . We are waiting for your comments!
Source: https://habr.com/ru/post/459314/
All Articles