Features of using the real-number registers of the x86 architecture

In this article, I describe my experience of running into the way real (floating-point) numbers are implemented at the hardware level. Many modern IT professionals work at high levels of data abstraction, so the article may open their eyes to some interesting things.

A long time ago, at the lectures on programming in high-level languages, we were told about real numbers. That first introduction was superficial. I got better acquainted with them only after graduating from university, and that acquaintance made me think hard. It happened when the double data type and I failed to get along.

I was given a program written in C++ and built with the Borland Turbo C++ compiler. For its calculations it used the double data type, i.e. double-precision real numbers. At certain moments this double overflowed and the program crashed. The program calculated factorials, and the largest factorial that fits in a double is 170! ≈ 7.3 × 10³⁰⁶. Calculating 171! ≈ 1.2 × 10³⁰⁹ overflowed the double type. It was this overflow problem that prompted me to study the current state of calculations with real numbers. More on this later in the article.
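
The limit mentioned above can be checked with a minimal sketch. It assumes that double is the IEEE 754 64-bit format and that overflow produces infinity rather than terminating the program, which is the default on typical modern toolchains. The function name largest_factorial_in_double is an illustrative choice, not something taken from the original program.

```cpp
#include <cmath>

// Returns the largest n for which n! is still a finite double.
// The product is built step by step; the loop stops as soon as
// the next multiplication overflows to infinity.
int largest_factorial_in_double() {
    double f = 1.0;  // holds n!, starting with 0! = 1
    int n = 0;
    while (true) {
        double next = f * (n + 1);  // candidate value of (n + 1)!
        if (std::isinf(next)) {
            return n;
        }
        f = next;
        ++n;
    }
}
```

With a 64-bit IEEE 754 double, the function returns 170, which matches the limit the program ran into.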

Overflow of double-precision real numbers is a broad problem with three components: support by the programming language, support by the compiler, and the architecture of the processor on which the program runs.
With the programming language, everything is simple and standardized. Our ~~hated~~ favorite language, C++, has three real data types: float, double and long double, nominally single, double and more-than-double precision. Moreover, the language standard says that “the type long double provides at least as much precision as double”. In other words, long double only has to be at least as precise as double. The developers of the Borland Turbo C++ compiler used this loophole in the standard and made long double the same as double.
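
What a particular compiler actually provides for the three types can be inspected through std::numeric_limits. The sketch below is only an illustration: the helper name describe and the output format are assumptions made for this example.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Builds a one-line summary of a floating-point type:
// storage size in bytes, number of mantissa bits,
// and the largest decimal exponent it can represent.
template <typename T>
std::string describe(const char* name) {
    std::ostringstream out;
    out << name << ": " << sizeof(T) << " bytes, "
        << std::numeric_limits<T>::digits << " mantissa bits, "
        << "max ~1e" << std::numeric_limits<T>::max_exponent10;
    return out.str();
}
```

Calling describe<long double>("long double") typically reports 64 mantissa bits with GCC on x86-64 Linux, while a compiler that equates long double with double reports the same 53 bits as for double.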

Things are not entirely smooth with the x86 architecture either. For basic mathematical operations (addition, subtraction, multiplication, division, shifts, evaluation of functions such as sin and cos), processor designers provide dedicated registers. These can be divided into registers that work with integers and registers that work with real numbers. Some processors have no registers for real numbers at all, for example ARMv7 cores without a floating-point unit. In such cases operations on real numbers take several orders of magnitude longer, because they have to be emulated in software using integer registers and addition, subtraction and shift operations. Trigonometric functions suffer in particular, since in software they are approximated with mathematical series.
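
To give a feeling for what "approximated with mathematical series" means, here is a minimal sketch of sin(x) computed from its Maclaurin series. It is not how any particular math library or emulator works. The name sin_series and the default of 12 terms are assumptions for this example, and the approximation is only intended for arguments roughly within [-π, π].

```cpp
#include <cmath>

// Approximates sin(x) with the Maclaurin series
//   x - x^3/3! + x^5/5! - ...
// using only additions, multiplications and divisions.
double sin_series(double x, int terms = 12) {
    double term = x;  // current series term, starts at x^1 / 1!
    double sum = x;
    for (int k = 1; k < terms; ++k) {
        // Each term equals the previous one times -x^2 / ((2k)(2k + 1)).
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
}
```

A single sine then costs dozens of multiplications and divisions, and on hardware without real-number registers each of those is itself emulated with integer instructions.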

A lyrical digression. This is exactly the problem we ran into on one project. We needed to count the people passing under a camera and used an embedded ARMv7 solution for real-time video processing: it recognized and counted the people walking past. But image processing means working with real numbers, which the chosen architecture simply did not support. We had to switch to a more advanced hardware solution, but that is another story. Back to the topic.

The widely used x86 architecture also had no real-number registers before the 80486 processor was released. Old-timers probably remember the math coprocessor, which was installed next to the ordinary processor, carried a matching designation (8087, 80287 or 80387) and worked without active cooling or even a heat sink. The appearance of the 8087 coprocessor was the impetus for the IEEE 754-1985 standard, which we will come back to later.

These coprocessors added three abstract real data types, eight 80-bit registers and a set of assembler instructions for working with them. Now, notionally in a single clock cycle, real numbers could be added, subtracted, multiplied and divided, a root could be extracted, a trigonometric function could be evaluated, and so on. The speedup reached 500% on specific tasks. Word processing gained nothing, so the coprocessor was sold as a $150 option. Back then few people listened to music on a computer, and video was not available to the general user at all.

Starting with the 80486 series, the coprocessor was integrated into the processor itself. The exception was the Intel486SX, which came out later and had the coprocessor disabled. Physically it hardly differed from the other processors. Apparently, this was Intel's way of selling chips with defects in the coprocessor area.

Let us look at the real-number registers of the math coprocessor in more detail. In fact, there is only one type of register: large, 80 bits wide, with eight of them organized as a stack. The programmer, however, sees three abstractions of real numbers: the short format (single precision), the long format (double precision) and the extended format (extended precision). The terms follow the Russian translation in the book [1]. The characteristics of the real-number formats are presented in the table:

If the programmer chose, for example, the short format (32 bits), the coprocessor loaded the number into an 80-bit register, performed the operations on it, and then converted the result back to the smaller size. If the result no longer fitted into the short format, a special value was returned instead of an ordinary number: infinity on overflow, or NaN (not a number) after an invalid operation.

Further development of the x86 architecture added a number of extensions (MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE5, AVX, AVX2, AVX-512, etc.), and with them new 128-, 256- and 512-bit registers [2] and many new assembler instructions for working with them. These extensions work only with single- and double-precision real numbers: for example, each 512-bit register can hold either eight 64-bit double-precision numbers or sixteen 32-bit single-precision numbers.
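
How such a wide register is used can be sketched with the 128-bit SSE2 registers, each of which holds two doubles. The example assumes a GCC or Clang style compiler that defines the __SSE2__ macro; on other targets it falls back to a plain loop. The function name add_arrays is an illustrative choice.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Adds two arrays of doubles element by element and stores the sums in out.
void add_arrays(const double* a, const double* b, double* out, std::size_t n) {
    std::size_t i = 0;
#if defined(__SSE2__)
    // One 128-bit register holds two 64-bit doubles,
    // so each iteration performs two additions at once.
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    // Scalar loop for the remaining elements
    // (or for all of them when SSE2 is not available).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The same idea scales to 256-bit and 512-bit registers, which process four or eight doubles per instruction, but none of these registers treats its contents as one wider real number.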

From architecture, let us turn to compilers. In C++ on x86, the float type corresponds to 32-bit real numbers and double to 64-bit ones, but long double is much more interesting. As mentioned above, many compiler developers take advantage of the latitude in the standard and make long double equal to double. Yet x86 hardware can operate on the extended 80-bit format, and there are compilers that let you use it.

Oddly enough, the compilers that ignore the 80-bit extended format include many well-known and widely used ones. Here is an incomplete list: Microsoft Visual C++, C++ Builder, Watcom C++, Comeau C/C++. The list of compilers that support the extended format is quite interesting too: Intel C++, GCC, Clang, Oracle Solaris Studio. Let us look at them in more detail.

The Intel compiler could hardly lack the extended format: how could the manufacturer leave its own hardware without a matching tool? The compiler is not free. It is widely used in scientific computing and in high-performance multiprocessor systems.

The free GCC compiler supports the extended format without any trouble under Linux. With Windows, things are more interesting. There are two ports of the compiler to Windows: MinGW and Cygwin. Both can work with the extended format, but MinGW uses Microsoft's runtime library, which means that real numbers exceeding the 64-bit double cannot be printed or otherwise output. With Cygwin things are somewhat better, because the port is more thorough.

Clang is similar to GCC and supports the extended format.

Finally, a little about Oracle Solaris Studio, formerly Sun Studio. Toward the end of its existence, Sun made many of its technologies publicly available, including its own compiler. It was originally designed for the Solaris OS on SPARC processors. Later the operating system, together with the compiler, was ported to the x86 architecture. The compiler and its IDE are also available for Linux. Unfortunately, this compiler has been “forgotten” and does not support the latest developments in the C++ language.

To solve the double overflow problem described at the beginning of the article, after much thinking, suffering and searching, I decided to rewrite the code completely and use the capabilities of GCC under Cygwin. The long double type was used to store the data. Otherwise similar systems perform differently with 64-bit and 80-bit real numbers. With 64-bit numbers, the compiler tries to optimize everything and use the fastest, “newest” extensions of the x86 architecture. After switching to 80-bit numbers, the “ancient” coprocessor part of the architecture comes into play.

Of course, the overflow problem could have been solved by processing large real numbers in software, but the performance drop would have been significant, since the program evaluated mathematical models containing trigonometric functions, roots and factorials. Even with the extended format, calculating one model took about 8 to 12 hours of CPU time, depending on the input parameters.
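
The effect of switching the storage type can be sketched as follows. This is not the code of the original program: the names factorial_ld and has_extended_range are hypothetical, and whether 171! fits depends on the compiler actually giving long double a wider exponent range than double.

```cpp
#include <cmath>
#include <limits>

// Computes n! in long double by repeated multiplication.
// The result becomes infinity once it no longer fits the type.
long double factorial_ld(int n) {
    long double f = 1.0L;
    for (int i = 2; i <= n; ++i) {
        f *= i;
    }
    return f;
}

// True when long double has a larger exponent range than double,
// as with the 80-bit x87 extended format; false when the compiler
// treats long double as a synonym for double.
bool has_extended_range() {
    return std::numeric_limits<long double>::max_exponent >
           std::numeric_limits<double>::max_exponent;
}
```

Where has_extended_range() returns true, factorial_ld(171) is a finite number of about 1.2 × 10³⁰⁹; where it returns false, the same call overflows exactly as the double version did.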

To conclude, let us reflect a little on the IEEE 754 standard [3,4,5]. As noted, its first version appeared thanks to the 8087 math coprocessor. Subsequent versions were released in 1997 and 2008. It is the 2008 standard that is most interesting: it describes quadruple-precision real numbers (the quadruple-precision floating-point format) [6]. This storage format would have suited the task above best, but it is not implemented in the processor architectures of popular computers. On the other hand, the x86 architecture has long had registers of the required size (128, 256 and 512 bits), but they are used for fast work on several single- or double-precision numbers at once. I have come across claims on the Internet that Intel was going to add quadruple-precision support to future processors, but apparently that remained on paper.

Among modern architectures that support quadruple precision, SPARC V8 and V9 stand out. Although they appeared back in 1990 and 1993 respectively, a physical implementation of quadruple precision appeared only in 2004. In 2015, IBM released the specification of the POWER9 CPU (ISA 3.0), which supports quadruple-precision real numbers.

The precision of quadruple real numbers is excessive for most users. It is needed mainly in scientific computing, for example in astrophysics. This may explain why the IBM 360 computers produced in the 70s and 80s supported 128-bit real numbers, although, of course, these did not conform to the modern IEEE 754 standard. Those machines were used mainly for scientific calculations.

A few words about the Russian processor developer MCST. This company designs and manufactures SPARC processors. Interestingly, it first developed and released processors of the “old” SPARC V8 architecture (MCST-R150 in 2001 and MCST R500 in 2004) without support for quadruple-precision real numbers, although the newer SPARC V9 architecture had existed for a long time. Only in 2011 did it release the MCST R1000 processor, based on SPARC V9, with support for quadruple-precision real numbers.

A few more words about the IEEE 754 standard. There is an interesting article [3] on the Internet that describes, quite emotionally, the problems and shortcomings of the existing standard. Article [4] also describes the standard and its problems and, in addition, argues that new approaches to representing real numbers are needed. Those two articles cover many shortcomings of the representation, so I will add just one thing. Programmers have the term “crutch”: something wrong that nevertheless helps to solve the problem at hand right now, though not in the best way. Well, real numbers conforming to the IEEE 754 standard are a crutch.

Here is why I came to this conclusion. There is no such thing as negative zero, yet the format has one; conversion to decimal and back is ambiguous; when working with real numbers, the programmer must always keep in mind how dangerously they behave near the limits of the representable range; and real numbers cannot simply be tested for equality but have to be compared within an acceptable tolerance.
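
Two of these points are easy to demonstrate in code. The sketch below assumes IEEE 754 doubles. The helper names nearly_equal and is_negative_zero, as well as the default relative tolerance of 1e-9, are illustrative assumptions, and a suitable tolerance always depends on the task.

```cpp
#include <algorithm>
#include <cmath>

// Compares two doubles with a relative tolerance instead of ==.
// The difference is measured against the larger of the two magnitudes.
bool nearly_equal(double a, double b, double rel_tol = 1e-9) {
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}

// True when x is the IEEE 754 negative zero:
// it compares equal to 0.0, but its sign bit is set.
bool is_negative_zero(double x) {
    return x == 0.0 && std::signbit(x);
}
```

For example, 0.1 + 0.2 == 0.3 is false for doubles, while nearly_equal(0.1 + 0.2, 0.3) is true. Likewise -0.0 == 0.0 is true, yet is_negative_zero tells the two zeros apart.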

Fascinating materials and sources:

[1] Yurov V.I. Assembler. Textbook for universities. 2nd ed. SPb.: Peter, 2005.
[2] x86.
[3] Yurovitsky V.M. IEEE754-tick threatens humanity.
[4] Yashkardin V. IEEE 754: standard for binary floating-point arithmetic.
[5] Wikipedia articles on the IEEE 754 standard.
[6] Wikipedia article on quadruple-precision real numbers.

Source: https://habr.com/ru/post/353182/
