High performance computing on the x86 architecture for an era of presence technology

Search, machine learning, data analysis, content creation and delivery, augmented and virtual reality, computer games - the list of resource-intensive tasks for computer systems grows every day. On the technology side, the discussion comes down to two topics.

The first is the ever-growing need for more computing performance. Whatever the semiconductor industry delivers, end users quickly adapt to the new products and ask for more.

The second is, of course, market change. I have worked in this field for more than 30 years, and a huge number of changes have occurred in that time. Consider just one example: the emergence of the World Wide Web and the graphical user interface turned the PC from a word processing tool into a portal to the whole world, which radically changed people's lives. As a result, we could find and share information with an ease that previously seemed unattainable. Then devices became portable and gained a permanent network connection - first laptops, then smartphones and tablets. Soon there was a real explosion in the popularity of applications, and the huge amount of data that had to be stored, processed, and analyzed required even more computing performance.
Four years ago, AMD began redesigning its high-performance x86 core architecture to meet these growing needs. Our previous family of processors achieved significant success in energy efficiency, but those chips could not deliver the level of performance that demanding applications require. With that goal in mind, we designed a new core, codenamed “Zen”, from scratch.

When designing Zen, we wanted a core that was new and modern in every respect. The architecture has been optimized for higher performance, throughput and energy efficiency so that processors can cope with the most demanding applications. As a result, Zen is a much higher-performing design that marks AMD's return to the high-performance market, executing 40% more instructions per clock cycle without increasing power consumption¹. We achieved this ambitious goal through a sustained focus on both performance and power consumption. The newly developed micro-architecture includes significant changes to instruction processing, the execution units and the cache subsystem, so that work completes faster and more of it runs in parallel. As we showed at the Hot Chips 2016 conference at Stanford University, Zen features improved branch prediction for selecting the right instructions, and a micro-op cache for executing those instructions more efficiently. The new architecture also has a 75% larger instruction scheduling window, which increases the number of instructions available for scheduling, and it executes more instructions in parallel thanks to a 50% wider instruction issue than the previous generation of cores. This combination provides a huge increase in computing power per clock.
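To make the 40% figure concrete, here is a minimal arithmetic sketch. It assumes the common first-order model in which single-thread performance scales as instructions per clock times clock frequency; the function names and the equal-frequency, equal-power defaults are illustrative assumptions, not AMD's measurement method.

```python
def relative_performance(ipc_gain, freq_ratio=1.0):
    """First-order model: performance ~ IPC x clock frequency.

    ipc_gain   -- fractional IPC improvement (0.40 means +40%)
    freq_ratio -- new clock / old clock (1.0 means unchanged, an assumption)
    """
    return (1.0 + ipc_gain) * freq_ratio


def relative_perf_per_watt(ipc_gain, freq_ratio=1.0, power_ratio=1.0):
    """Performance per watt relative to the old core.

    power_ratio -- new power / old power (1.0 models "no increase in power")
    """
    return relative_performance(ipc_gain, freq_ratio) / power_ratio


if __name__ == "__main__":
    print(f"relative performance: {relative_performance(0.40):.2f}x")
    print(f"relative perf/watt:   {relative_perf_per_watt(0.40):.2f}x")
```

Under these assumptions, a 40% IPC gain at the same clock and the same power gives 1.4 times the performance and 1.4 times the performance per watt.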

But a powerful engine needs fuel. We informally call this “feeding the beast”, and in our case the fuel is the data and instructions fetched from memory. We reworked the cache hierarchy, giving the chip 8 MB of L3 cache, a unified L2 cache for instructions and data, and separate low-latency L1 caches for instructions and data. A single core can now draw on up to five times the cache bandwidth of the previous architecture.

Zen's prefetcher plays a critical role in throughput and embodies one of the most complex algorithms ever created for processors. The prefetcher anticipates which instructions and data will be needed next, based on what the current task is doing. How well you predict, and how quickly you recover from a wrong prediction, is as much art as science, and with Zen we achieved impressive results here.
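The general idea of prefetching can be illustrated with a toy stride prefetcher. This is a deliberately simple textbook technique, not Zen's actual algorithm (which is not described here); the function name and the address traces are hypothetical.

```python
def stride_prefetch_hits(addresses):
    """Count accesses that a toy stride prefetcher would have fetched early.

    The prefetcher predicts the next address only after it has seen the
    same non-zero stride twice in a row. This is an illustrative model,
    not a description of any real hardware prefetcher.
    """
    last_addr = None
    last_stride = None
    predicted = None
    hits = 0
    for addr in addresses:
        # Did the previous prediction match this access?
        if predicted is not None and addr == predicted:
            hits += 1
        predicted = None
        if last_addr is not None:
            stride = addr - last_addr
            # Same stride twice in a row: predict the next address.
            if stride == last_stride and stride != 0:
                predicted = addr + stride
            last_stride = stride
        last_addr = addr
    return hits


if __name__ == "__main__":
    regular = [0, 64, 128, 192, 256]   # steady 64-byte stride
    irregular = [0, 10, 50, 7, 300]    # no repeating stride
    print(stride_prefetch_hits(regular), stride_prefetch_hits(irregular))
```

A regular access pattern is predicted after a short warm-up, while an irregular one yields no useful predictions; handling the cases in between is where real prefetchers become complex.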

Much of Zen's throughput gain over the previous generation of processors comes from the move to simultaneous multithreading (SMT). This approach lets a core track multiple threads of a program: when one thread stalls while waiting for an instruction or data to arrive, the core picks up another thread that is ready to run. From the software's point of view, enabling SMT makes additional processor resources available.
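The effect of filling stall cycles with work from another thread can be sketched with a toy simulation. The model below is an assumption-heavy simplification: one issue slot per cycle, and each instruction is represented only by the number of cycles before its thread can issue again. Real SMT cores issue from several threads in the same cycle, so this only illustrates the principle.

```python
def cycles_sequential(threads):
    """Run each thread alone, one after another; stalls leave the core idle."""
    return sum(sum(latencies) for latencies in threads)


def cycles_smt(threads):
    """Toy SMT core: each cycle, issue from the first thread that is ready.

    threads -- list of threads; each thread is a list of latencies (>= 1),
               i.e. cycles until that thread can issue its next instruction.
    """
    pending = [list(t) for t in threads]  # remaining work per thread
    ready_at = [0] * len(threads)         # first cycle each thread may issue
    finish = 0
    cycle = 0
    while any(pending):
        for i, work in enumerate(pending):
            if work and ready_at[i] <= cycle:
                latency = work.pop(0)
                ready_at[i] = cycle + latency
                finish = max(finish, ready_at[i])
                break  # only one issue slot per cycle in this model
        cycle += 1
    return finish


if __name__ == "__main__":
    # Hypothetical workloads: a 1-cycle op followed by a 4-cycle stall, twice.
    thread_a = [1, 4, 1, 4]
    thread_b = [1, 4, 1, 4]
    print("sequential:", cycles_sequential([thread_a, thread_b]))
    print("toy SMT:   ", cycles_smt([thread_a, thread_b]))
```

In this model the second thread issues during cycles the first thread would otherwise spend stalled, so both finish sooner than they would running back to back.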

Finally, energy efficiency. The Zen CPU was designed for use in a wide range of devices - from fanless laptops to supercomputers - and all of them require high energy efficiency. In a world where a 10% performance gain is considered significant, our goal of a 40% increase in performance with no increase in power seemed impossible at first glance. However, AMD engineers focused on the task, found new ways to reduce power consumption and optimize the micro-architecture, and applied a more advanced clock gating scheme.

Energy efficiency was built into the product from the very beginning: in creating the new design, engineers tried to save every microwatt, and each circuit was optimized for power. As a result, any part of the processor that is not doing active work, however small, is gated off to prevent unnecessary energy consumption, and when the clock frequency ramps up and the load increases, the processor delivers very high performance per watt. In addition, Zen is manufactured on the new 14nm FinFET process. FinFET transistors are smaller, more power-efficient and faster than their previous-generation counterparts. This technology from our manufacturing partner let us get the most out of the new core micro-architecture. And because FinFET transistors can be tuned, the process supports a wide range of products: from low-power, low-consumption chips to chips for heavy workloads with higher frequency and performance.

What all these innovations ultimately deliver will become clear next year, when the first products based on Zen cores reach the market, but we can already say that the laboratory results are impressive. We recently demonstrated Summit Ridge, an 8-core, 16-thread desktop processor, and Naples, a 32-core, 64-thread server processor. Both of these Zen-based processors let us look to the future with great optimism. It is also important to understand that Zen is only one step toward the future of high-performance AMD x86 computing. Our roadmap includes next-generation chips with further improvements, and our teams are already working on new projects, because constant change and the steady demand for more performance continue to set the pace for the industry.

Mark Papermaster, Senior Vice President and CTO at AMD

¹ Based on internal estimates of the performance of the “Zen” x86 core architecture compared with the “Excavator” x86 cores.

Source: https://habr.com/ru/post/399941/
