
Hardware support for transactional memory announced for Haswell

Haswell will be a very innovative tock. Last year, a description of the new integer operations in AVX became available. And this week another extension of the x86 architecture was published: Haswell will have hardware support for transactional memory! English-language sites are buzzing with discussion: ISN, Ars Technica, LWN, Engadget.

I think this is the most non-trivial extension of the x86 architecture in many, many years. The feature is called Transactional Synchronization Extensions (TSX from here on) and consists of two parts: Hardware Lock Elision (HLE) and Restricted Transactional Memory (RTM). Pay attention to the word "Restricted". That's right, there are limits on the size, granularity and nesting depth of transactions.

These restrictions, and how it all will work, are covered in more detail below the cut. (No pictures, boring technical text.)

Software Transactional Memory (STM) is a fairly well-known concept. In a nutshell, you define transactions, and all changes made to memory inside a transaction are either committed atomically, if there are no conflicts with other threads, or rolled back completely if there is a conflict. STM has two key advantages over lock-based concurrency: potentially better scalability, and programming convenience (transactions are much simpler to use than locks and, unlike locks, they compose easily).
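To make the commit-or-roll-back idea concrete, here is a minimal sketch of an optimistic transaction in plain C++. It is not how any particular STM library works: the global version counter `g_version` and the function `transfer` are hypothetical names invented for this illustration, and a single global counter is the crudest possible form of conflict detection.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical global version counter: even = idle, odd = a commit is in progress.
std::atomic<std::uint64_t> g_version{0};

// Move `amount` between two shared cells as one transaction.
// Returns how many times the attempt was discarded and retried.
int transfer(std::atomic<long>& from, std::atomic<long>& to, long amount) {
    assert(&from != &to);
    int aborts = 0;
    for (;;) {
        std::uint64_t snapshot = g_version.load();
        if (snapshot & 1) continue;   // another thread is committing; start over

        // Speculative phase: read shared state, compute new values privately.
        long new_from = from.load() - amount;
        long new_to   = to.load() + amount;

        // Commit phase: succeeds only if nobody has committed since the snapshot.
        if (g_version.compare_exchange_strong(snapshot, snapshot + 1)) {
            from.store(new_from);
            to.store(new_to);
            g_version.store(snapshot + 2);   // publish the changes
            return aborts;
        }
        ++aborts;   // conflict: throw the private results away and retry
    }
}
```

Nothing becomes visible to other threads until the commit phase, and a conflicting commit by another thread simply invalidates the attempt. Real STM implementations track conflicts per location rather than globally, which is where most of their overhead comes from.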
There are many software implementations, for example the Intel STM compiler or TBoost.STM. There is a proposal to add transactional memory support to C++. In my favorite language, Clojure, STM is essentially the only idiomatic mechanism for multithreaded work with shared memory.

Unfortunately, mainly because of performance problems, STM for imperative languages has not gained much popularity.

Processor manufacturers have long been eyeing transactional memory support in hardware. It could potentially solve the performance problems. For example, there was the UltraSPARC Rock processor with hardware support for transactional memory, but unfortunately it never took off. Azul and IBM also have working hardware that supports transactional memory.

So how does it all look in Haswell? Explanations usually start with HLE, then move on to RTM, and only then cover how it works inside. I'll start with the internals, then go to RTM, and leave HLE for last; it will be more interesting that way.

In Haswell's cache implementation, some magic will appear that tracks, for each cache line, whether it belongs to the read-set or the write-set of an active transaction. There are also several new micro-ops needed to start and finish a transaction, to find out the reason for a rollback, and for diagnostics.
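The bookkeeping described above can be modelled in a few lines. This is a toy software model, not a description of the actual hardware: the type `ToyTransaction` and its methods are hypothetical names, and the 64-byte line size is taken from the list of limitations later in this article.

```cpp
#include <cstdint>
#include <set>

constexpr unsigned kLineShift = 6;   // 64-byte cache lines: address >> 6 = line index

// Toy model of one active transaction's per-line bookkeeping.
struct ToyTransaction {
    std::set<std::uintptr_t> read_set;    // lines read inside the transaction
    std::set<std::uintptr_t> write_set;   // lines written inside the transaction

    void on_read(std::uintptr_t addr)  { read_set.insert(addr >> kLineShift); }
    void on_write(std::uintptr_t addr) { write_set.insert(addr >> kLineShift); }

    // A write by another core conflicts with any line we have read or written.
    bool aborted_by_remote_write(std::uintptr_t addr) const {
        std::uintptr_t line = addr >> kLineShift;
        return read_set.count(line) != 0 || write_set.count(line) != 0;
    }

    // A read by another core conflicts only with lines we have written.
    bool aborted_by_remote_read(std::uintptr_t addr) const {
        return write_set.count(addr >> kLineShift) != 0;
    }
};
```

Because membership is tracked per line, a remote write to address `0x1008` conflicts with a transaction that only read `0x1000`: both addresses fall into the same 64-byte line.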

RTM is simply the use of these new instructions by the programmer. The XBEGIN instruction starts a transaction and sets the handler to run if the transaction is aborted, XEND commits the transaction, and XABORT aborts it explicitly. Naturally, if a program using RTM runs on an x86 processor that does not support TSX, an invalid-opcode exception (#UD) will occur.
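In C and C++ these instructions are usually reached through the compiler intrinsics `_xbegin`, `_xend` and `_xabort` from `<immintrin.h>`. Below is a sketch of the common pattern of a transactional fast path with a lock-based fallback. The names `g_fallback_locked`, `g_counter` and `increment` are made up for this example, and the RTM path is only compiled when the compiler defines `__RTM__` (GCC and Clang do so with `-mrtm`); a binary built that way still needs a TSX-capable processor at run time, as noted above.

```cpp
#include <atomic>
#if defined(__RTM__)
#include <immintrin.h>   // _xbegin, _xend, _xabort, _XBEGIN_STARTED
#endif

std::atomic<bool> g_fallback_locked{false};   // hypothetical fallback spinlock
long g_counter = 0;                           // shared data being protected

void increment() {
#if defined(__RTM__)
    // XBEGIN: if the transaction aborts, execution resumes here and
    // _xbegin() returns a status code other than _XBEGIN_STARTED.
    if (_xbegin() == _XBEGIN_STARTED) {
        // Reading the lock puts it into the read-set, so a thread that takes
        // the fallback lock aborts this transaction.
        if (g_fallback_locked.load(std::memory_order_relaxed)) {
            _xabort(0xff);    // XABORT: explicit abort with an 8-bit code
        }
        ++g_counter;
        _xend();              // XEND: commit
        return;
    }
#endif
    // Fallback path: an ordinary spinlock. Also used on builds without RTM.
    while (g_fallback_locked.exchange(true, std::memory_order_acquire)) {
    }
    ++g_counter;
    g_fallback_locked.store(false, std::memory_order_release);
}
```

The fallback matters because a transaction is never guaranteed to commit; a production version would usually retry the transactional path a few times before taking the lock.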

HLE is much trickier inside, but simple for the programmer. Imagine a spec like this: using the primitives described above, improve the performance of already written multithreaded software that uses locks (recompiling is allowed, changing the code is not). And, unlike with RTM, the resulting binary must run correctly both on processors with TSX and on those without.

How was this done? In a sense, HLE can be thought of as a kind of rw-lock on the lock variable, plus a transaction :). Two prefixes are introduced, XACQUIRE and XRELEASE. These are not instructions but prefixes (like LOCK or REP)! A processor that does not support TSX ignores them. The prefixes are meant to be placed on the instructions that take and release a lock, respectively. When the lock is taken, the write to the lock variable does not actually happen; instead, the variable is placed in the read-set and a transaction begins. Thanks to this magic (or hack, if you prefer), other threads can also execute the locked code until a conflict over the write-set occurs. Then, at XRELEASE and the write to the lock (which is also elided), the transaction commits if it ran without conflicts. If there were conflicts, execution falls back to the usual locking semantics.
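As an illustration, GCC on x86 exposes the two prefixes as extra memory-order flags for its `__atomic` builtins, `__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE`. The sketch below assumes a GCC-compatible compiler; the names `g_lock`, `hle_lock` and `hle_unlock` are hypothetical, and where the HLE flags are not available the code degrades to an ordinary spinlock.

```cpp
// The HLE hint flags are a GCC extension on x86; elsewhere use plain orderings.
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
#define LOCK_ACQUIRE (__ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)   // emits XACQUIRE
#define LOCK_RELEASE (__ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE)   // emits XRELEASE
#else
#define LOCK_ACQUIRE __ATOMIC_ACQUIRE
#define LOCK_RELEASE __ATOMIC_RELEASE
#endif

int g_lock = 0;   // 0 = free, 1 = held

void hle_lock() {
    // Prefixed exchange: with HLE the write is elided and a transaction starts.
    while (__atomic_exchange_n(&g_lock, 1, LOCK_ACQUIRE)) {
        // Lock is really held: wait with plain reads before trying again.
        while (__atomic_load_n(&g_lock, __ATOMIC_RELAXED)) {
        }
    }
}

void hle_unlock() {
    // Prefixed store: with HLE this commits the elided critical section.
    __atomic_store_n(&g_lock, 0, LOCK_RELEASE);
}
```

The calling code looks exactly like code using a normal spinlock, which is the whole point: only the lock implementation changes, and the same binary runs on processors that simply ignore the prefixes.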

I will list the limitations of TSX.
1. Since the granularity is 64 bytes, careless data placement in memory will now cause false transaction aborts, on top of false sharing. And since read-sets and write-sets are tracked in the cache, this limits the amount of memory a transaction can successfully work with.
2. There are a bunch of conditions under which transactions abort (an incoming interrupt, use of instructions such as PAUSE or x87, a context switch, debugging, and so on).
3. Nesting. There is a limit on the nesting level. (Sorry, I can't say what it is.) An abort at any nesting level, for any reason, rolls back all the nested transactions!
When a transaction aborts, you can find out the reason. Even so, debugging and profiling all this will be a lot of fun! Of course, for performance analysis, new CPU counters for TSX-related events will appear.
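The first limitation is the one most directly under the programmer's control. A small sketch of the usual remedy, padding independent data out to separate 64-byte lines, follows; the struct names and the helper `same_cache_line` are hypothetical, and the line size is the figure quoted in the list above.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLineSize = 64;   // granularity quoted in the list above

// True if both addresses fall into the same 64-byte line.
bool same_cache_line(const void* a, const void* b) {
    return reinterpret_cast<std::uintptr_t>(a) / kLineSize ==
           reinterpret_cast<std::uintptr_t>(b) / kLineSize;
}

// Two independent counters side by side: they share one line, so transactions
// touching different counters still conflict with each other.
struct alignas(kLineSize) Packed {
    long hits;
    long misses;
};

// Each counter on its own line: no false conflicts, at the cost of space.
struct Padded {
    alignas(kLineSize) long hits;
    alignas(kLineSize) long misses;
};
```

The padded layout spends 128 bytes on two counters, and every extra line a transaction touches also counts against the capacity limit from the same list item, so padding is a trade-off rather than a free fix.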

Software support has already started to appear; for example, the Binutils changes have already been committed. Next up is HLE in pthreads :), and then, I hope, RTM in STM implementations.
As I wrote above, the details are available in a document on the Intel site.

Source: https://habr.com/ru/post/137567/

