Life in the era of "dark" silicon. Part 3

Other parts: Part 1. Part 2.

This post continues the story "Life in the era of 'dark' silicon". The previous part discussed using general-purpose logic in the dark areas of silicon. This time we will look at using specialized logic.

"The Specialized Horseman" or the use of specialized solutions.

“We will use all of that dark silicon area to build specialized cores, each of them tuned for the task at hand (10-100x more energy efficient), and only turn on the ones we need ...”
As more and more of a microprocessor's transistors go "dark", the area they occupy becomes exponentially cheaper relative to the power and heat-dissipation budget. One possible way to use this area to improve energy efficiency through parallelization was described earlier. However, that approach has several limitations. Even under ideal conditions it reduces energy consumption by only 2-2.5 times, while increasing the occupied area by 2-3 times. In addition, not every unit can be parallelized in principle, and not every program exposes enough data parallelism ...
Another approach, which trades area for energy efficiency more effectively, is to use dark silicon to implement specialized units (coprocessors), each of which is either much faster or much more energy efficient (by roughly 100-1000 times) than a general-purpose processor on its particular task [1]. The work can then be split between the coprocessors and the general-purpose cores in whatever way is most advantageous, while coprocessor units that are not currently in use are switched off completely to save energy.
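To make the idea concrete, here is a minimal Python sketch of such a split under a deliberately simple, assumed energy model: every task type has a fixed, made-up energy cost on the general-purpose core, each hypothetical accelerator handles exactly one task type at a much lower cost, and a power-gated unit consumes nothing. The names `Accelerator`, `run_workload`, `CPU_ENERGY` and the task kinds are illustrative only and do not come from any real system.

```python
from dataclasses import dataclass

# Hypothetical energy cost (arbitrary units) of each task type on a
# general-purpose core. The numbers are illustrative assumptions only.
CPU_ENERGY = {"video_decode": 100.0, "jpeg_encode": 80.0, "parse_json": 10.0}


@dataclass
class Accelerator:
    name: str
    task_kind: str            # the one task type this unit is specialized for
    energy_per_task: float    # assumed cost of one task on this unit
    powered_on: bool = False  # units start power-gated ("dark")


def run_workload(tasks, accelerators):
    """Send each task to a matching accelerator if one exists, else to the CPU.

    Returns the total energy and the chosen unit for every task. Only
    accelerators that actually receive work are switched on; in this toy
    model a unit that stays off consumes no energy at all.
    """
    by_kind = {acc.task_kind: acc for acc in accelerators}
    total = 0.0
    placement = []
    for kind in tasks:
        acc = by_kind.get(kind)
        if acc is not None:
            acc.powered_on = True
            total += acc.energy_per_task
            placement.append((kind, acc.name))
        else:
            total += CPU_ENERGY[kind]
            placement.append((kind, "cpu"))
    return total, placement


units = [Accelerator("video_unit", "video_decode", 1.0),
         Accelerator("jpeg_unit", "jpeg_encode", 0.8)]
energy, placement = run_workload(["video_decode", "parse_json", "video_decode"], units)
print(energy, placement)
print([(u.name, u.powered_on) for u in units])
```

In this run the JPEG unit never receives work, so it stays dark: the area is spent on it, but it costs energy only when it is actually needed, while the task without a matching unit falls back to the general-purpose core.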
The prospects for widespread use of specialized units are easy to see: specialized accelerators for tasks such as graphics processing, computer vision, video coding, and others are already common today. These accelerators increase performance and energy efficiency by orders of magnitude, especially for highly parallel computations.
Researchers [2] have extrapolated this trend and expect that in the near future we will see systems consisting mostly of specialized units rather than general-purpose cores. In the literature, such systems are called Coprocessor-Dominated Architectures, or CoDAs.
If you look at the Intel Medfield block diagram below, you can see that this "near future" is much closer than it seems :). This is the processor used in the Intel Mint, the first smartphone based on the x86 architecture. As you can see, besides the processor core itself, the die contains a wide variety of specialized blocks.

Intel Medfield Platform

However, the growing use of specialized units to cope with dark silicon has confronted developers (and not only them) with many problems, which together have become known as the "Tower of Babel problem". The name refers to the biblical story of the Tower of Babel, in which the confusion of languages left people unable to understand each other, so they could not continue building. With accelerators, our notion of general-purpose computing is becoming increasingly fragmented, and the traditionally clear boundaries between programmers, software, and the hardware that ultimately performs the computation are becoming increasingly blurred. Here are some examples.
Specialized languages such as CUDA are already gaining popularity, even though CUDA is effectively controlled by a single company, targets specific hardware, and cannot be ported even to similar architectures (such as AMD's). (CUDA does have alternatives, but that is beside the point.)
Accelerators can also be over-specialized, which makes them unusable even for tasks closely related to their main purpose. For example, there have been cases where double-precision calculations performed for scientific purposes gave incorrect results on GPUs, whose floating-point units are specialized for graphics workloads.
Adoption also suffers because programming heterogeneous hardware takes excessive effort. For example, the Sony PlayStation 3 grew in popularity slowly because games were difficult to port and the capabilities of the Cell processor architecture were hard to exploit in software.
Finally, specialized hardware units risk becoming obsolete, because standards are sometimes revised (for example, the JPEG standard has been updated), and there is no way to change their hardware implementation short of replacing the device.
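The specific GPU double-precision failures mentioned above are not reproduced here, but the general effect of running a computation on a unit with less precision than it needs is easy to show. The sketch below assumes single precision (float32) as a stand-in for a unit tuned for graphics: it emulates float32 arithmetic by rounding every intermediate result through Python's `struct` module and compares it with ordinary double precision when many tiny increments are accumulated. The helper names `to_f32` and `accumulate` are illustrative.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(start: float, step: float, n: int, single: bool) -> float:
    """Add `step` to `start` n times; if `single`, round to float32 after each add."""
    total = to_f32(start) if single else start
    inc = to_f32(step) if single else step
    for _ in range(n):
        total = total + inc
        if single:
            total = to_f32(total)  # emulate a float32 adder: round every partial sum
    return total


f64 = accumulate(1.0, 1e-8, 100_000, single=False)
f32 = accumulate(1.0, 1e-8, 100_000, single=True)
print(f64)  # close to 1.001
print(f32)  # stays at 1.0: each 1e-8 is below half a float32 ulp of 1.0
```

In single precision every increment is lost to rounding, so the result is silently wrong rather than merely a little inaccurate, which is exactly the kind of error that is hard to notice in scientific code.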

Shielding people from system complexity. All of the factors listed above point to a potentially exponential growth in the human effort required to design CoDAs and to program them. Overcoming the Tower of Babel crisis requires new approaches to how specialization is expressed and used in future computing systems. We need new scalable architectural solutions that use specialized units throughout to minimize power consumption and maximize performance.

Overcoming the limits that Amdahl's law places on specialization. Amdahl's law shows that adding processing units improves a system's performance only up to a limit set by the part of the work that cannot be parallelized. Applied to specialization and energy consumption, it means that if, for example, only half of the computation can be moved to accelerators, then the accelerators can reduce energy consumption by at most a factor of two. This is a further obstacle to specialization, and it forces us to look for approaches that save energy on most of the computation: not only regular, parallel, and predictable code, but irregular code as well.
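The bound follows the same arithmetic as Amdahl's law. Assuming that f is the share of the original energy spent in code that can be offloaded and that the accelerators are s times more energy efficient on that code, the remaining energy is (1 - f) + f/s of the original, so the reduction can never exceed 1/(1 - f), however large s becomes. A short Python sketch of this arithmetic (the function names are illustrative):

```python
def energy_ratio(f: float, s: float) -> float:
    """Fraction of the original energy left after offloading.

    f: assumed share of the original energy spent in offloadable code (0..1)
    s: how many times more energy efficient the accelerator is on that code
    The remaining (1 - f) share still runs on general-purpose cores unchanged.
    """
    if not 0.0 <= f <= 1.0 or s <= 0:
        raise ValueError("need 0 <= f <= 1 and s > 0")
    return (1.0 - f) + f / s


def energy_reduction(f: float, s: float) -> float:
    """How many times total energy drops; bounded above by 1 / (1 - f)."""
    return 1.0 / energy_ratio(f, s)


for s in (10, 100, 1000):
    print(s, round(energy_reduction(0.5, s), 3))
```

With f = 0.5 the reduction approaches, but never reaches, a factor of two even for an accelerator that is 1000 times more efficient, because the half that stays on general-purpose cores dominates.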
Research is currently under way on automatically generating accelerators from fragments of program code. The goal is to find frequently executed, slow, or energy-hungry sections of a program ("hot spots") and then to synthesize a description of a specialized core that performs the same operations much faster or with much less energy. Such specialized cores are called conservation cores, or c-cores.
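The first step, finding hot spots, can be illustrated with a simple greedy selection over profiling data. This is a minimal sketch under assumed inputs: the profile is a hypothetical mapping from code-region names to measured cost (time or energy), and the region names, costs, and the 80% coverage target are made up. It does not model how a real toolchain chooses regions or synthesizes c-cores from them.

```python
def select_hot_spots(profile: dict[str, float], coverage: float) -> list[str]:
    """Greedily pick the costliest code regions until they cover `coverage` of the total.

    profile maps a code-region name to its measured cost (time or energy);
    coverage is the target fraction of the total cost, e.g. 0.8 for 80%.
    """
    total = sum(profile.values())
    chosen, covered = [], 0.0
    for region, cost in sorted(profile.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break  # target reached: the remaining regions stay on the CPU
        chosen.append(region)
        covered += cost
    return chosen


# Hypothetical profile: share of total energy per code region.
profile = {"decode_frame": 45.0, "inflate": 25.0, "layout": 15.0, "gc": 10.0, "misc": 5.0}
hot = select_hot_spots(profile, 0.8)
print(hot)  # → ['decode_frame', 'inflate', 'layout']
```

Picking regions in descending order of cost keeps the number of specialized cores small for a given coverage, which matters because each selected region would cost silicon area.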

An example of this approach to building CoDA systems, aimed at both regular and irregular computation, is the UCSD GreenDroid processor [3]. It detects "hot spots" in the Android mobile environment and uses hundreds of conservation cores, to which the compiler offloads the execution of the "hot" code segments. This yields an 8-10x gain in energy efficiency without any additional effort from the programmer. (Of course, this topic deserves a separate post :))

Lifecycle of conservation cores in GreenDroid

Unlike NTV (near-threshold voltage) processors, this approach does not require finding extra parallelism to make up for the performance lost in sequential code. As a result, c-cores are likely to be applicable to a wider range of tasks, including sequential ones. For highly parallel workloads, however, NTV processors may have the advantage.

This concludes the story of what "dark" silicon is and the basic approaches to using it. Although silicon gets darker with each generation of Moore's law, researchers in this field see a bright and exciting future. Over time, dark silicon will change the entire computing stack, and these changes will bring many opportunities for further research and improvement.
* There is also a less "classical" topic connected with "dark" silicon, energy, and temperature: system-level optimization. If there is interest, I will write a continuation about it.
UPD: here is the sequel.

Sources

1. G. Venkatesh, J. Sampson, N. Goulding, S. Garcia, V. Bryksin, J. Lugo-Martinez, S. Swanson, and M. B. Taylor. “Conservation cores: Reducing the energy of mature computations.” In ASPLOS, 2010.
2. N. Hardavellas, M. Ferdman, B. Falsafi, and A. Ailamaki. “Toward dark silicon in servers.” IEEE Micro, 2011.
3. N. Goulding-Hotta et al. “The GreenDroid mobile application processor: An architecture for silicon's dark future.” IEEE Micro, March 2011.

Source: https://habr.com/ru/post/160919/
