Let's take a look inside Apple’s own GPU chip used in the iPhone.

For the first iPhone and iPad models, Apple licensed a PowerVR GPU from Imagination Technologies for graphics output. Apple even acquired about 10% of Imagination and is its largest customer, accounting for about 30% of its revenue. And just as Apple started out with licensed ARM CPU cores and now uses its own designs, it seems to have moved from PowerVR to developing its own GPU. That GPU first appeared in the A8 processor used in the iPhone 6, and its descendants are in the A9 and A10 Fusion, used in the iPhone 6S and 7.

A modern GPU, such as those found inside the iPhone and iPad, has three main components that must work in close coordination to put a picture on the screen. The first is the fixed-function graphics hardware, responsible for processing API commands, rasterizing triangles, and raster output. The second is the shader core, the heart of the GPU, which executes the programmable shaders (vertex, geometry, pixel, and compute shaders). The third is software: the graphics driver, which runs on the CPU and coordinates everything that controls the operation of the GPU. The driver translates graphics applications written against the Metal or OpenGL ES API into a set of commands for the fixed-function hardware and programmable shaders that run on the shader cores. One of the most important components of a driver is the compiler that generates machine code for the shader cores.

In older generations, the fixed-function hardware, the shader cores, and the driver were all licensed by Apple from Imagination Technologies. But over the past 6-7 years, Apple has aggressively hired graphics architects and compiler and driver programmers from companies such as AMD, Intel, Google, and Nvidia to develop its own GPU. For example, Mike Wuerthele of Apple Insider wrote that this year about 25 people moved from Imagination Technologies to Apple. Apple's GPU apparently still uses fixed-function hardware from PowerVR. But based on the various evidence available in the public domain, it is clear that Apple replaced the programmable shader cores with its own, which are faster and more efficient. To take advantage of them, Apple has also developed its own driver and a compiler that generates code for its architecture. The overall result is a unique, proprietary GPU design, despite some legacy from PowerVR. This is a world-class design with impressive speed and energy efficiency. The A9 processor posts the best results in all speed benchmarks, and the A10 Fusion is another 40-50% faster.
Apple has never published documentation for its GPU architecture. To take advantage of the GPU, developers have to work out for themselves how to write shader programs for the Metal and OpenGL compilers. At the WWDC 2016 conference, Apple engineers presented “Advanced Metal Shader Optimization”, which contains the most detailed tuning guidance and architectural details of their GPUs to date. The PowerVR Series 6 GPU architecture also suffers from a lack of documentation, but Imagination Technologies has shared some simple optimization guidelines. Comparing the available information on these two chips, we can conclude that they are very different. In particular, Apple’s register file and data conversion functions are better designed for speed and energy efficiency, and they are easier to compile for.

Apple improves performance and energy efficiency with smaller registers

The OpenGL ES and Metal mobile graphics APIs support a 16-bit, half-precision floating-point format for computing and storing image data, which consumes less power than 32-bit single-precision calculations. Half-precision calculations in some cases lose accuracy faster than single-precision calculations. But for many applications in graphics, image processing, and machine learning, half precision is enough to produce the right results, especially because most displays have a dynamic range of 8 to 12 bits per pixel.
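As a rough illustration of why half precision is usually enough for display output, here is a minimal Python sketch (not taken from Apple's or Imagination's materials). It rounds values to half precision using the standard `struct` module's `e` format and checks that every 8-bit color level survives the round trip. The helper names `to_half` and `to_8bit` are made up for this example.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_8bit(c: float) -> int:
    """Quantize a color component in [0, 1] to an 8-bit display level."""
    return max(0, min(255, round(c * 255)))

# Every one of the 256 display levels is recovered exactly after the
# normalized color has been stored in half precision.
levels_preserved = all(to_8bit(to_half(i / 255)) == i for i in range(256))
print(levels_preserved)   # True

# Accuracy runs out sooner for larger magnitudes: half precision has an
# 11-bit significand, so 2049 is not representable and rounds to 2048.
print(to_half(2049.0))    # 2048.0
```

The second print shows the flip side mentioned above: once values grow beyond the range of normalized colors, half precision starts dropping information that single precision would keep.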

Apple's GPU register file consists of 16-bit registers, ideally suited to half-precision data, judging by the presentations available in the public domain [1]. Single-precision floating-point and other 32-bit data require two registers. As a result, the register file can hold twice as many 16-bit variables as 32-bit ones. Apple engineers emphasize that using half-precision calculations leads to a significant increase in speed and energy savings compared to single precision, which means that their architecture treats half precision as a central design concept.
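The register arithmetic above can be sketched in a few lines of Python. This is only an illustration: Apple has not published the size of its register file, so the count of 128 registers below is a hypothetical number, and the function names are invented for the example.

```python
REGISTER_BITS = 16                  # register width described in [1]
HYPOTHETICAL_REGISTER_COUNT = 128   # assumption: the real size is not public

def registers_per_value(value_bits: int) -> int:
    """Number of 16-bit registers one value occupies (ceiling division)."""
    return -(-value_bits // REGISTER_BITS)

def max_live_values(register_count: int, value_bits: int) -> int:
    """How many values of the given width fit in the register file at once."""
    return register_count // registers_per_value(value_bits)

print(max_live_values(HYPOTHETICAL_REGISTER_COUNT, 16))  # 128
print(max_live_values(HYPOTHETICAL_REGISTER_COUNT, 32))  # 64
```

Whatever the real register count is, the ratio stays the same: a shader that keeps its variables in half precision can hold twice as many of them in registers.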

In contrast, PowerVR Series 6 and 7 GPUs use 32-bit registers and are designed for single-precision calculations, judging by the documentation from Imagination Technologies [2]. In Series 6, the most frequently used instructions, FMAD, FMUL, and FADD, can work in half precision, but only by zeroing a few bits of the source and result registers. Some instructions can operate on two 16-bit SIMD elements within one register (and Series 7 extends this capability to more instructions), but the SIMD approach differs greatly from a scalar design built on 16-bit registers. On PowerVR, storing a single 16-bit value in a register wastes half of it, and the maximum number of stored variables is not automatically doubled. Therefore, using 16-bit data should reduce memory traffic and energy consumption, but it will not necessarily increase speed or energy efficiency the way it does on Apple's GPU.
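To make the idea of two 16-bit SIMD elements sharing one 32-bit register more concrete, here is a small Python sketch of the packing itself. It only shows the bit layout on the CPU using the standard `struct` module; it is not PowerVR's actual instruction encoding, and the names `pack_half2` and `unpack_half2` are invented for this example.

```python
import struct

def pack_half2(a: float, b: float) -> int:
    """Pack two half-precision values into one 32-bit word.

    a goes into the low 16 bits, b into the high 16 bits.
    """
    return struct.unpack('<I', struct.pack('<2e', a, b))[0]

def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two half-precision values."""
    return struct.unpack('<2e', struct.pack('<I', word))

word = pack_half2(1.0, 2.0)
print(hex(word))           # 0x40003c00 (2.0 is 0x4000, 1.0 is 0x3c00)
print(unpack_half2(word))  # (1.0, 2.0)
```

A shader only gets the benefit of this layout when the compiler can pair up two independent half-precision values for the same instruction, which is why a scalar design with 16-bit registers is the simpler target.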

Simple conversion gives programmers access to half precision

One of the common problems with 16-bit data is that although most calculations tolerate the reduced accuracy, some still require high accuracy. For example, a shader that computes the color of a large block of pixels and then averages them can make do with 16 bits for each individual pixel, but may need 32 bits when summing the data to get an exact result. If converting pixel data from 16 to 32 bits is too expensive, the shader will simply use 32 bits throughout to produce an accurate result.
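The averaging problem can be reproduced on the CPU with a minimal Python sketch. It emulates 16-bit and 32-bit arithmetic by rounding after every operation with the standard `struct` module; the function names and the choice of 8192 identical pixels are assumptions made for the illustration, not something taken from the presentations.

```python
import struct

def to_half(x: float) -> float:
    """Round x to the nearest half-precision (16-bit) value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_single(x: float) -> float:
    """Round x to the nearest single-precision (32-bit) value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def mean_half_accumulator(pixels):
    """Average 16-bit pixels while keeping the running sum in 16 bits."""
    total = 0.0
    for p in pixels:
        total = to_half(total + to_half(p))
    return to_half(total / len(pixels))

def mean_single_accumulator(pixels):
    """Average 16-bit pixels while keeping the running sum in 32 bits."""
    total = 0.0
    for p in pixels:
        total = to_single(total + to_half(p))
    return to_single(total / len(pixels))

pixels = [0.5] * 8192  # a block of identical mid-gray pixels
print(mean_half_accumulator(pixels))    # 0.125
print(mean_single_accumulator(pixels))  # 0.5
```

The 16-bit sum stops growing at 1024, because from that point on the spacing between adjacent half-precision values is 1 and adding 0.5 rounds back to the same number. Each pixel is stored perfectly well in 16 bits; only the accumulator needs the wider type.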

Apple's GPU offers very fast conversion between data types, which encourages mixing precisions and creates more opportunities for fast, low-power 16-bit calculations. According to the presentation, data type conversion is “free”, so apparently there is a hardware converter somewhere in the data path. From a hardware point of view this approach is more expensive, but among other things it greatly simplifies the compiler and makes programmers' work easier.

PowerVR Series 6 and 7 can also convert between precisions, but certainly not for free. The optimization guidelines clearly state that each conversion (whether narrowing or widening) is expensive, and recommend that programmers write shaders with a minimum number of conversions [3].

Apple's GPU: the difference in technology

The difference between Apple’s GPU and Imagination’s GPU in register file design and data conversion is huge. The organization of the register file is the foundation of the shader core; it affects the design of almost everything, from the shader core's instruction set architecture to the execution units and dispatch logic. For example, the register size determines the data paths and the layout of nearly the entire shader core. Data conversion has less of an impact on the design, but the difference matters a great deal to the compiler and to developers. The PowerVR Series 7 GPU is quite similar to the previous Series 6 and uses 32-bit registers. Based on this difference, we can conclude that Apple's GPU uses shader cores developed by Apple itself. And this means that Apple has developed its own shader compiler for the OpenGL ES and Metal APIs and, most likely, its own graphics driver.

Even some benchmarking programs see the difference. A GFXBench result somehow made it into the results table with the iPhone 7's GPU described as a G9.

But this result was soon scrubbed from the public database, and all references to the G9 disappeared.

There are many other differences between Apple's GPU and PowerVR that can be detected by running special tests with Metal shaders and comparing the results with similar OpenGL ES shaders on PowerVR GPUs. Not all of the differences will be in the hardware. For example, Apple's GPU supports OpenGL ES versions up to 3.0, while PowerVR GPUs work with later ones. But such differences may come from the software and drivers.

Strategic advantages of an in-house design

Apple's vertical integration is unique in consumer electronics. In the case of the iPhone and iPad, the company controls almost everything, from the design of the basic processor circuits to the OS and user-facing services such as Maps, iMessage, and Camera. This lets Apple exploit hardware and software working together in a way that its competitors cannot match.

The overall trend is clear: at every step Apple increases its control over the platform and the ecosystem. Initially, Apple used standard ARM processors and outsourced most of the work to Samsung, but it eventually developed its own ARMv8-compatible CPUs, ahead of its rivals. Similarly, the company bought Anobit and used its team and technology to create its own flash storage controller. Developing its own GPU is just the next step in creating strategic advantages.

The most obvious advantage is that Apple's GPU is better (faster and more efficient) than its rivals, including those from ARM, Imagination, and Qualcomm. Leadership in speed means greater user satisfaction and less battery drain, in games as well as in image processing and machine learning.

The Metal Performance Shaders library includes dozens of well-optimized shaders that run on the GPU and give developers a rich set of tools [4]. These include neural networks for classification and image-processing routines. Instagram uses the GPU for tone mapping and photo contrast enhancement. It is even possible that Apple's Camera app uses the GPU for various effects. Half-precision performance is ideal for image processing and neural networks, and Apple's shader architecture handles it better than PowerVR does.

The second advantage is that Apple can create new features and fix bugs in the GPU without those improvements also benefiting its competitors.

The third is time to market and planning. The A-series processors come out on an aggressive annual cycle set by the iPhone. Since the iPhone is a premium product, Apple must impress users with performance and show solid progress to stimulate demand. As a result, Apple often becomes the lead customer for new technologies (for example, TSMC's 10nm process), which carries a lot of risk. With its own GPU, the company can decide to spend as much time and effort on reaching its goal as it needs. Imagination simply has less money and fewer staff.

Apple has to carefully tie together design, testing, manufacturing, and the software ecosystem to launch the millions of phones and tablets that end up in users' hands. The months before a launch consist of frantic cycles of finding and fixing bugs and updating graphics software and hardware. Since the GPU is now Apple's own, this cycle is under the company's direct control with few external dependencies, which helps it launch on time.

As a hypothetical example, if Apple engineers find a serious bug in the shader core, they can fix and verify it the same day. Involving a third party means that the third party must first assess the priority of the bug and approve changes or workarounds, which may take some time because of the need to coordinate with other customers. Samsung's Exynos 5410 is a textbook example of the danger of working with third-party intellectual property. It was built on the ARM Cortex A15 and A7 in a big.LITTLE configuration in order to save energy, but because of a cache coherency bug Samsung had to disable the power-saving features. In-house development greatly reduces such risks, since there is no conflict of interest, and sharing information within a company is much easier than between companies.

The last advantage of a proprietary GPU is reduced dependence on suppliers, which strengthens the company's bargaining position and reduces business risk. Creating alternatives to key suppliers, whether internal or external, is one of Apple's long-standing principles. For example, Apple used to depend on Qualcomm and its LTE modems for the iPhone. When Intel developed a competing modem, Apple adopted it for most GSM carriers. In the long run, this reduces costs and creates interesting opportunities.

A proprietary GPU creates a hidden alternative to using Imagination's intellectual property in the future. The company has already built a GPU design team and a driver team, which together developed most of the GPU in the A8, A9, and A10 processors. If Imagination Technologies were acquired or fell behind technically, Apple could simply develop its own fixed-function graphics hardware to replace PowerVR.

Apple's next steps

After many years of hiring graphics architects, Apple has developed its own GPU, which already ships in the A8, A9, and A10 processors that power the iPhone 6, 6S, and 7. The GPU still contains PowerVR hardware, but it’s clear that the shader cores are very different from those of Imagination Technologies. This means that Apple has built its own compilers for Metal and OpenGL ES and, most likely, its own driver.

Judging by the company's history, there is nothing surprising about it developing its own GPU. In addition to the obvious advantages in speed, there are less obvious ones: better control over the ecosystem, shorter time to market, and fewer bugs.

Apple has three options ahead. The first is the status quo: licensing fixed-function hardware from Imagination Technologies to complement its own components. In this case, Apple will move to the next version of PowerVR while negotiating better terms and royalties in parallel. The second option is to buy Imagination Technologies. But that would bring unnecessary side projects with it (the MIPS line), and Apple already passed up this opportunity in 2016. The third option is for Apple to continue developing its own GPU, which would eventually displace Imagination Technologies altogether. The company will have to decide whether it can do better on its own, but so far it has been very good at reaching world-class competence in new areas.

Links

[1] Alex Kan and Fiona, Advanced Metal Shader Optimization. WWDC 2016. Parts one and two
[2] PowerVR Series6 Compiler Instruction Set Reference. March 17, 2016
[3] PowerVR Performance Recommendations. March 17, 2016
[4] Metal Performance Shaders Framework

Source: https://habr.com/ru/post/398851/
