Topology of seventh-generation Intel Core processors (formerly codenamed Kaby Lake), which will be available in late 2016. Photo: Intel

A team of researchers from North Carolina State University and Intel has developed CAF (Core to Core Communication Acceleration Framework), a hardware acceleration technology that can significantly speed up data exchange between processor cores. With this bottleneck removed, manufacturers will finally be able to increase the number of CPU cores without an exponential increase in the overhead traffic between them.
When implementing parallel programs, the hardest part is properly coordinating the resources that processes share. On modern processors, cores working in parallel are synchronized in one of two ways: by message passing or through shared memory.
In the first case, each core runs a single-threaded process that communicates with the processes running on other cores.
With shared memory, each processor in a multiprocessor system runs a thread of execution that belongs to a single process. The threads exchange data through a memory region shared within that process. The number of threads matches the number of processors.
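As a rough software illustration of the two models, here is a minimal sketch using Python's standard `threading` and `queue` modules. The function names and the sum-of-squares workload are hypothetical and chosen only for this example; they are not taken from the paper.

```python
import queue
import threading


def sum_squares_message_passing(values, workers=4):
    """Workers share no state: input and output travel only through queues."""
    inbox = queue.Queue()
    outbox = queue.Queue()

    def worker():
        while True:
            item = inbox.get()
            if item is None:  # sentinel: no more work for this worker
                break
            outbox.put(item * item)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for v in values:
        inbox.put(v)
    for _ in threads:
        inbox.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # After join(), outbox holds exactly one result per input value.
    return sum(outbox.get() for _ in values)


def sum_squares_shared_memory(values, workers=4):
    """Workers update one shared accumulator, guarded by a lock."""
    total = [0]  # shared memory location
    lock = threading.Lock()

    def worker(chunk):
        for v in chunk:
            with lock:  # every update pays a synchronization cost
                total[0] += v * v

    chunks = [values[i::workers] for i in range(workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Both functions return the same result for a list of integers. The difference is where the coordination cost lands: in the queue operations in the first case, and in the lock around the shared accumulator in the second.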
The researchers propose to implement this coordination of resources at the hardware level. In the abstract of their paper, they note that communication through shared memory inherently involves cache-coherence traffic and cache misses, which adds considerable overhead and generates a large amount of extra network traffic.
Many important workloads exchange a large amount of data between cores, so a significant rise in overhead noticeably hurts performance. This also applies to pipelined computation, which software widely uses to exploit instruction-level parallelism.
Hardware acceleration coordinates the work of the cores in parallel computing. According to the researchers, the proposed hardware coordination of cores is far more efficient than the software synchronization methods in use today.
"This approach, which we called the Core to Core Communication Acceleration Framework (CAF), improves the data exchange speed by a factor of 2 to 12," said Yan Solihin, a professor of electrical and computer engineering at North Carolina State University and a co-author of the paper. "In other words, execution, from beginning to end, is at least twice as fast."
In the paper, the authors analyze the overhead (excess network traffic between cores) that arises when parallel computations are synchronized with existing software methods, and they propose an alternative.
The key element of the new framework is a hardware queue-management module, the Queue Management Device (QMD). It can perform simple computing functions and is wired directly into the communication subsystem, that is, the NoC (network on a chip, a mini-Internet inside the processor).
Illustration from the article "Network on a Chip: a mini-Internet inside the processor"

The QMD module takes over queue management and the synchronization of parallel interaction between cores, without any additional software instructions on the CPU cores. It works like a hardware router on a network.
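To make the division of labor concrete, here is a toy software model of the idea, not the hardware interface described in the paper. The class `ToyQueueDevice`, its `enqueue` and `dequeue` methods, and the `"stage1"` queue name are all hypothetical. The point of the sketch is only that the producer and consumer code contain no locks of their own: all queue bookkeeping and synchronization lives inside the device object.

```python
import threading
from collections import deque


class ToyQueueDevice:
    """Toy stand-in for a queue manager: it alone owns the queues and their locking."""

    def __init__(self):
        self._queues = {}
        self._cond = threading.Condition()

    def enqueue(self, queue_id, item):
        with self._cond:
            self._queues.setdefault(queue_id, deque()).append(item)
            self._cond.notify_all()  # wake any core waiting on a queue

    def dequeue(self, queue_id):
        with self._cond:
            # Block until the requested queue exists and is non-empty.
            while not self._queues.get(queue_id):
                self._cond.wait()
            return self._queues[queue_id].popleft()


def run_pipeline(items):
    """Two 'cores' (threads) connected by one device-managed queue."""
    device = ToyQueueDevice()
    results = []

    def producer():
        for item in items:
            device.enqueue("stage1", item)
        device.enqueue("stage1", None)  # sentinel: end of stream

    def consumer():
        while True:
            item = device.dequeue("stage1")
            if item is None:
                break
            results.append(item + 1)  # stand-in for real work

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this model the synchronization still runs as software on the same cores, so it shows the interface shape rather than the performance benefit; the article's claim is that moving this work into dedicated hardware attached to the NoC removes it from the cores entirely.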
This work matters because the number of cores in modern processors keeps growing. The trend is a natural one: clock frequency growth has almost stopped because of physical limits, so manufacturers have no choice but to parallelize computation.
Under these conditions, synchronizing parallel interaction between cores becomes the bottleneck that limits system performance. With efficient routing of traffic between cores, the CPU architecture can keep scaling, making it possible to build processors with tens or hundreds of cores and almost linear performance scaling.
Efficient routing of traffic between cores is a key technology for further scaling the multicore CPU architecture.

Besides accelerating data exchange between cores during the synchronization of parallel computations, the QMD module can be useful for aggregating data from multiple cores. The researchers believe it will speed up some basic computational operations by up to 15%.
The paper, "CAF: Core to Core Communication Acceleration Framework", will be presented at PACT '16, the 25th International Conference on Parallel Architectures and Compilation Techniques, which will be held September 11-15, 2016, in Haifa, Israel.
The authors are Yipeng Wang (North Carolina State University); Ren Wang, Andrew Herdrich and James Tsai (all of Intel Corp.); and the lead author, Yan Solihin, of North Carolina State University and the United States National Science Foundation.
The paper is included in the Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, pp. 351-362, doi: 10.1145/2967938.29679595. The proceedings will likely be distributed to conference participants and published online.