Processor (self-)identification. Part two. Hairy CPUID

In the first part, I talked about the need to identify the extensions present on a specific processor. This is necessary so that executable code (an operating system, a compiler, or a user application) can reliably determine which hardware features it can use. In that article I also compared several popular general-purpose CPU architectures. Their identification capabilities vary greatly: some provide complete information about ISA extensions, while others are limited to a couple of numbers that distinguish the vendor and the revision.
In this part I will talk about a single instruction of the Intel IA-32 architecture, CPUID, which was introduced specifically for enumerating the extensions a processor declares. I will cover a little of what existed before it appeared, what it can tell you, what surprises to expect, and what software helps interpret its output.

Image source: [1]

History

As I noted in the first part, there is a tendency: the more “embedded” a processor is, the fewer identification facilities are built into its architecture. For some reason, the creators of embedded systems do not worry much about the portability of binary code.
The Intel 8086, a microprocessor of the 1970s that grew out of the 8008, 8080 and 8085 “calculator” series, was no exception. Initially, it had no means of identification at all.
Starting with the 80386, information about the family, model and stepping was reported in the EDX register immediately after a reset (on receiving the RESET signal). The CPUID instruction, encoded by the bytes 0x0f 0xa2, was introduced in the 80486 processors. Its presence could be detected by checking whether bit 21 of the flags register was writable. To support older CPUs, one had to resort to rather sophisticated tricks to tell apart the processors of the 8086 to 80386 series.

CPUID algorithm wars (1996)

In a 1996 paper [2], Robert Collins proposed an algorithm for distinguishing between all IA-32 products from Intel that existed at the time. He was not satisfied with Intel's official identification method, which was based only on differences in the behavior of the PUSH SP instruction (more on this below), because it was not universal. In his work, Robert proposed the following additional tricks.

Execute instructions that are not present on all processors and catch the #UD exception. Knowing which instructions did not raise an exception, you can determine the processor family. However, this approach would not work for the 8086/8088, since they did not define the behavior of unsupported opcodes.
The PUSH SP instruction works differently on the 8086 and the 80286. On the former, the value of SP is stored on the stack after it has already been decremented. On the 80286 this was fixed, and the stored value is the one SP had before the instruction; Intel's documentation notes that in this respect the iAPX 286 differs from the iAPX 86/88.
The 80186 also puts the incorrect SP value on the stack, so this test alone does not separate it from the 8086.
Writing a 16-bit word to a segment at offset 0xffff (that is, starting at the last addressable byte) on an 8086 makes the second byte of the word wrap around to offset 0, while on the 80186 this byte lands outside the segment, at offset 0x10000.
Some Intel Pentium clones supported CPUID but, contrary to the documentation, did not report this through bit 21 of the flags register, or allowed support for the instruction to be enabled dynamically after boot.
Distinguishing processor models with the same “digit” (for example, the 80386 DX from the 80386 SX, CX, EX and SL, or the Intel Pentium P5 from the P54C and OverDrive) also required careful study of the differences in supported extensions.
The 80386 DX and SX can be told apart by the number of modifiable bits in the CR0 register.
Some identification information could be obtained through a documented series of operations on I/O ports (IN/OUT instructions).
The 80486 models could be told apart by checking for the presence of the 80487 math coprocessor.
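The PUSH SP trick from the list above is easier to follow as a tiny model. This is only a sketch in Python, not real detection code: the function names and the `pre_286` switch are hypothetical, and the stack is modelled as a dictionary.

```python
MASK16 = 0xFFFF

def push_sp(sp, memory, pre_286):
    """Model PUSH SP and return the new SP; `memory` maps offsets to 16-bit words."""
    new_sp = (sp - 2) & MASK16
    # 8086/8088 store the already decremented SP; 80286 and later store the original value.
    memory[new_sp] = new_sp if pre_286 else sp
    return new_sp

def pop(sp, memory):
    """Model POP: return the popped word and the new SP."""
    return memory[sp], (sp + 2) & MASK16

def looks_like_8086(pre_286, sp=0x0100):
    """The classic test: PUSH SP, POP AX, then compare AX with SP."""
    memory = {}
    sp = push_sp(sp, memory, pre_286)
    ax, sp = pop(sp, memory)
    return ax != sp
```

On the modelled 80286 the popped value equals SP again, so the comparison succeeds; on the modelled 8086 the two values differ by 2.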

To get the stepping information on the 80386, you need to read EDX immediately after RESET. But at that moment the BIOS starts running, and it will surely overwrite the register long before control reaches user code! Here, however, Robert devises and describes a cunning scheme that manipulates the infamous A20 line in order to fool the reset process and gain control.

The techniques listed in the original article were tested primarily on Intel CPUs. In the article, the author acknowledges that they do not allow for the reliable classification of x86 clones from other manufacturers.

Interface

For a system programmer, identifying an extension usually consists of setting the input values in the EAX (leaf) and ECX (subleaf) registers, executing CPUID and reading the result from four registers: EAX, EBX, ECX and EDX. Individual bit fields of the output registers carry the values of the corresponding architectural parameters of the specific processor core.

All valid leaf-subleaf input combinations, together with the four output registers, form the CPUID table. For modern processors it contains about two dozen rows of four 32-bit columns.
I will not describe every officially documented field of this table. Those interested can always find them in the Intel SDM [1] (be patient: there are about 40 pages of text on CPUID alone). Moreover, for ISA extensions that have been announced but not yet shipped in physical products, the corresponding new CPUID fields can be found in [3]. Instead, I will classify the information that can be extracted from the output of this instruction. To refer to the bit fields of the table I will use the customary notation CPUID.leaf.subleaf.reg[bitstart:bitend]. For example, CPUID.0.EBX[31:0] denotes bits 0 to 31 of the output register EBX after executing CPUID with leaf 0 (EAX = 0) as input; the subleaf (the input value of ECX) is ignored here, so it is not specified.
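To make the notation concrete, here is a minimal sketch of a helper that cuts a bit field out of a 32-bit register value. The helper name `bits` and the sample EBX value are made up for illustration; real values would come from executing CPUID.

```python
def bits(value, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit register value."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)

# Hypothetical EBX value, as if returned for leaf 1 (EAX = 1).
ebx = 0x05100800

# CPUID.1.EBX[31:24] in the notation above.
top_byte = bits(ebx, 31, 24)
```

The same helper works for single-bit flags: a field written as reg[n] is simply `bits(reg, n, n)`.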

Leaf ranges

Unsupported input values of EAX and ECX do not cause exceptions; instead, CPUID returns either zeros in all four registers or “garbage” (according to the specification, the values of another leaf). The valid combinations of leaves and subleaves form three contiguous ranges.

Basic range - all leaves numbered from zero up to the maximum value given in CPUID.0.EAX[31:0]. The maximum leaf number keeps growing and has long since passed ten.
Extended range - all leaves from 0x80000000 up to the maximum value given in CPUID.0x80000000.EAX[31:0]. This maximum has stayed at 0x80000008 for quite a long time. I have not found documentary evidence, but I have a feeling that the very appearance of the extended leaf range is related to AMD's introduction of the 64-bit extension of the IA-32 architecture.
The leaf range 0x40000000-0x4fffffff is considered reserved; it is promised that the CPUID values returned for it will always be zero. However, this does not prevent some from using it for their own needs. For example, KVM virtual machines return four values in leaf 0x40000000: [0, 0x4b4d564b, 0x564b4d56, 0x4d].
This is the ASCII string "KVMKVMKVM"
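The decoding can be checked with a few lines of Python. This is a sketch that works on the register values quoted above rather than on a live CPUID call; the helper name `regs_to_ascii` is hypothetical.

```python
import struct

def regs_to_ascii(*regs):
    """Concatenate 32-bit register values as little-endian bytes and decode them as ASCII."""
    raw = struct.pack("<%dI" % len(regs), *regs)
    # Trailing zero bytes are padding, not part of the text.
    return raw.rstrip(b"\x00").decode("ascii")

# EBX, ECX and EDX values from the KVM example above.
signature = regs_to_ascii(0x4B4D564B, 0x564B4D56, 0x4D)
```

The least significant byte of each register holds the first character, which is why the hexadecimal values look reversed when read left to right.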

ISA

From a practical point of view, the most important data encoded in CPUID are the flags of supported instruction sets. Traditionally, bits in CPUID.1.ECX and CPUID.1.EDX were set aside for this. A few examples follow.

CPUID.1.ECX [0] - SSE3 - vector instructions.
CPUID.1.ECX [9] - SSSE3 - other vector instructions.
CPUID.1.ECX [7] - EIST - Enhanced Intel SpeedStep®, dynamically changing the frequency of the processor.
CPUID.1.EDX [25] - SSE - also vector instructions.
CPUID.1.EDX [26] - SSE2 - again vector instructions.
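As an illustration, the flags listed above can be decoded from the two output registers like this. It is a sketch only: the table covers just the five bits named in the list, and the sample ECX/EDX values are invented, not read from a real processor.

```python
# Flag name -> (output register of leaf 1, bit position), as listed above.
LEAF1_FLAGS = {
    "SSE3": ("ecx", 0),
    "EIST": ("ecx", 7),
    "SSSE3": ("ecx", 9),
    "SSE": ("edx", 25),
    "SSE2": ("edx", 26),
}

def decode_leaf1(ecx, edx):
    """Map each known flag name to True or False for the given register values."""
    regs = {"ecx": ecx, "edx": edx}
    return {name: bool((regs[reg] >> bit) & 1)
            for name, (reg, bit) in LEAF1_FLAGS.items()}

# Hypothetical register values: ECX has bits 0 and 9 set, EDX has bits 25 and 26 set.
flags = decode_leaf1(ecx=0x00000201, edx=0x06000000)
```

A real tool would need a far larger table, which is one reason such tools age quickly.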

Nowadays, however, ISA extensions may be indicated by bits in other leaves, since the capacity of leaf 1 has been exhausted.

CPUID.6.EAX [1] - Intel Turbo Boost, out-of-the-box overclocking.
CPUID.7.0.EBX [4] - Hardware Lock Elision, CPUID.7.0.EBX [11] - Restricted Transactional Memory - two Intel extensions supporting transactional memory.
CPUID.0x80000001.ECX [5] - LZCNT, an instruction that counts the number of leading zero bits, similar (even too similar) to BSR.

Brand string

Of course, no vendor will miss the opportunity to immortalize its name in the identification data of its product. Preferably not just as a number but spelled out as an ASCII string (well, at least it is not Unicode).
In IA-32 CPUID, text can be found in at least two groups of leaves. CPUID.0.EBX, ECX and EDX contain a 12-byte ASCII string specific to each vendor. For Intel this is, of course, "GenuineIntel". The three leaves CPUID.0x80000002-0x80000004 provide as many as 48 bytes for an ASCII-encoded, so-called brand string. This is the string you see when running cat /proc/cpuinfo on Linux. Although its format is more or less standardized (“vendor brand series CPU @ frequency”), I strongly advise against making decisions in program code based on its contents. It can vary too much: the frequency may be given in MHz or in GHz (and the actual frequency may be quite different because of dynamic scaling), spaces may move around, and a simulator or virtual machine may put anything at all there. All the information in the brand string can be obtained programmatically in more reliable ways.
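One detail worth showing in code is the register order: the vendor string is laid out as EBX, EDX, ECX, while each brand string leaf uses the natural order EAX, EBX, ECX, EDX. The sketch below assumes the register values have already been obtained somehow; the function names and the sample brand text are hypothetical.

```python
import struct

def vendor_string(ebx, ecx, edx):
    """Assemble the 12-byte vendor string of leaf 0; note the EBX, EDX, ECX order."""
    return struct.pack("<3I", ebx, edx, ecx).decode("ascii")

def brand_string(leaves):
    """Assemble the brand string from three (eax, ebx, ecx, edx) tuples
    taken from leaves 0x80000002, 0x80000003 and 0x80000004."""
    raw = b"".join(struct.pack("<4I", *regs) for regs in leaves)
    # The text is zero-terminated and may carry extra spaces.
    return raw.split(b"\x00", 1)[0].decode("ascii").strip()

vendor = vendor_string(ebx=0x756E6547, ecx=0x6C65746E, edx=0x49656E69)

# Build sample register values from a made-up brand text, then decode them back.
sample = b"Example CPU @ 1.00GHz".ljust(48, b"\x00")
sample_leaves = [struct.unpack("<4I", sample[i:i + 16]) for i in range(0, 48, 16)]
brand = brand_string(sample_leaves)
```

Stripping and splitting on the zero byte already hints at why parsing this string for anything beyond display is fragile.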

Cache

Information about caches, such as their type, number, capacity, geometry and sharing between cores, is useful for tuning high-performance mathematical software, for example BLAS (Basic Linear Algebra Subprograms) libraries.
Initially, the cache configuration was described by leaf 2. It was not designed with much foresight: its encoding was not very flexible and could not keep up with the constant changes in the size and configuration of multiple cache levels. Using leaf 2 is currently not recommended; it may contain just the value 0xFF, a descriptor that redirects the reader to leaf 4.
Judging by the fact that leaf 0x80000006 lies in the extended range (although I am not sure, as I have not yet found documentary evidence), it was not added by Intel. It was an attempt to supplement leaf 2 with the cache structure data that software developers needed. Once again, no room for growth was provided.
Leaf 4 is the latest and so far the most flexible representation of cache data. The price for this is the introduction of the subleaf, encoded in ECX. Each subleaf describes one cache: data, instruction or unified; it gives its level, capacity and so on. Whether leaf 4 will suffice for long remains to be seen.
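Below is a sketch of how one leaf 4 subleaf might be decoded. The field layout follows my reading of the leaf 4 description in the SDM [1] (cache type in EAX[4:0], level in EAX[7:5], line size, partitions and ways in EBX, number of sets in ECX, each stored minus one), so treat it as an assumption to verify against the manual. The register values are hypothetical and chosen to describe a 32 KiB, 8-way L1 data cache.

```python
CACHE_TYPES = {1: "data", 2: "instruction", 3: "unified"}

def bits(value, hi, lo):
    """Return bits hi..lo (inclusive) of a 32-bit register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_leaf4(eax, ebx, ecx):
    """Decode one subleaf of leaf 4; returns None when no more caches are described."""
    cache_type = bits(eax, 4, 0)
    if cache_type == 0:
        return None
    line_size = bits(ebx, 11, 0) + 1
    partitions = bits(ebx, 21, 12) + 1
    ways = bits(ebx, 31, 22) + 1
    sets = ecx + 1
    return {
        "type": CACHE_TYPES.get(cache_type, "reserved"),
        "level": bits(eax, 7, 5),
        "size_bytes": ways * partitions * line_size * sets,
    }

# Hypothetical register values for illustration only.
l1d = decode_leaf4(eax=0x00000021, ebx=0x01C0003F, ecx=0x3F)
```

A caller would iterate over subleaves 0, 1, 2 and so on until the function returns None.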

Topology

“Topology” here means, of course, not the branch of mathematics but information about the mutual arrangement of individual cores and hyper-threads (if Intel HyperThreading is supported) within the current processor. For modern Intel server processors, the following levels of hierarchy are distinguished.

SMT is the level of a hyper-thread: an entity that has its own architectural state (registers) but may share execution units with other threads of the same core.
A core is an entity with its own set of execution units (adders, multipliers, etc.). One core may contain one, two (for CPUs with HyperThreading) or four (for Xeon Phi) hyper-threads.
A package is the whole piece of hardware that is bought in a store and inserted into a socket on the motherboard. It has at least one core. Multiprocessor server systems may contain several packages.

The notion of a “logical processor” corresponds to the lowest of the levels present in the system. Logical processors are what the operating system sees. The cost of migrating processes between them, data transfer latencies, cache effects, the NUMA memory configuration and so on depend on whether two logical processors are siblings (that is, whether they belong to the same core or package). That is why topology data is exposed in CPUID leaf 0xB and its subleaves.
In addition, for addressing when delivering interrupts from peripherals and other processors, each logical processor has a so-called APIC ID, a number unique within the system. The topology affects the rule by which these numbers are assigned to the active cores. They are not always consecutive; for example, on a system with HyperThreading disabled, all APIC IDs will be even.
The classic APIC ID is stored in the CPUID.1.EBX[31:24] field. That is only 8 bits, which limits the number of logical processors to 256, and that is of course not enough nowadays. Hence its extension, the x2APIC ID, stored in CPUID.0xB.EDX[31:0]. I think these 32 bits will last longer.
The “coordinates” of each logical processor in the topology of its package are unique. For this reason, it is wise to pin a thread that reads several CPUID leaves in a row to one logical processor; otherwise it risks getting values from different cores.
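To show how the “coordinates” relate to the x2APIC ID, here is a sketch that splits an ID into package, core and SMT parts. It assumes the two shift widths were read beforehand from leaf 0xB (as I understand the SDM [1], EAX[4:0] of subleaf 0 for the SMT level and of subleaf 1 for the core level); the function name and the sample numbers are hypothetical.

```python
def split_x2apic_id(x2apic_id, smt_shift, core_shift):
    """Split an x2APIC ID into (package, core, smt) indices.

    smt_shift  - number of low bits that select the hyper-thread within a core
    core_shift - number of low bits that select the logical processor within a package
    """
    smt_id = x2apic_id & ((1 << smt_shift) - 1)
    core_id = (x2apic_id >> smt_shift) & ((1 << (core_shift - smt_shift)) - 1)
    package_id = x2apic_id >> core_shift
    return package_id, core_id, smt_id

# Hypothetical layout: 1 bit for SMT, 4 bits in total below the package number.
coords = split_x2apic_id(0b10101, smt_shift=1, core_shift=4)

# With HyperThreading disabled only SMT index 0 is in use, hence the even IDs.
even_ids_smt = [split_x2apic_id(i, 1, 4)[2] for i in (0, 2, 4, 6)]
```

On Linux, a thread could be pinned before such a sequence of reads with `os.sched_setaffinity`, so that all leaves describe the same logical processor.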

Variable Fields

If the problems with topology seem minor, I hasten to add that the contents of the CPUID table itself may change dynamically while the system is running. Of course, not all fields can change; and yet from the BIOS settings you can directly influence whether the OS sees some of the capabilities of the CPU. I will cite only a few examples.

Bit 18 of the CR4 register affects CPUID.1.ECX[27] (OSXSAVE), which indicates that the XSAVE instruction has been enabled by the OS.
The fields of the IA32_MISC_ENABLE register affect several CPUID fields at once: bit 3 affects the TM1 and TM2 fields, bit 16 the EIST field, bit 34 the XD (execute disable) field, and so on.
Setting bit 22 of the IA32_MISC_ENABLE register “cuts off” all CPUID leaves above the third (apparently this was done for compatibility with Windows NT4; it is no accident that the bit is called NT4).

Miscellanea

In this section I have collected other interesting points related to the history and operation of the CPUID instruction.

Processor Serial Number

In the Pentium III era, each processor had a unique serial number available in CPUID.3.ECX and CPUID.3.EDX [7]. It is easy to imagine how convenient such a feature would be for software copy protection. However, in 1999 the European Community protested, reasonably fearing that such functionality would harm the privacy of users of such systems. The serial number was removed as early as the Intel Pentium 4, and leaf 3 now returns zeros.

Vendors and CPUID

A very interesting table [5] shows what different vendors store (or used to store) in various CPUID leaves. For example, it describes a mysterious leaf 0x8fffffff in which AMD K8 processors returned the string IT'S HAMMER TIME.

Agner Fog about ISA wars

This is the story of how IA-32 instruction set extensions emerged in a competitive environment of several companies [4]. Adding new instructions has always affected CPUID, and the parties could not always agree on how to do it properly.

They messed up the CPUID! IA32_BIOS_SIGN_ID

I always liked the CPUID instruction for the conciseness of its interface and the lack of surprises in its behavior: one register in, four registers out. It raises no exceptions, performs no memory accesses, neither reads nor modifies the flags register, is not affected by prefixes, and works in all processor modes. Compared with the zoo of IA-32 CISC instructions, it was almost ideal.
... until it turned out that sometimes two input registers are needed, to encode a leaf and a subleaf. Okay, not so neat. Well, at least the output registers are known in advance and are always the ones that change...
And then it turned out that CPUID sometimes changes yet another register, namely IA32_BIOS_SIGN_ID, storing in it the signature of the processor's current microcode. This happens if the processor's microcode has been updated earlier. For some reason, the information about this procedure is scattered across the thousands of pages of the manual [1], and so it eluded me for a very long time.

Software for reading CPUID

Unlike on some other architectures, the CPUID instruction in IA-32 is unprivileged, i.e. it can be executed by user-level software, not just by the OS kernel. Therefore, ordinary programs can freely explore the features of the CPU they run on. Naturally, many tools have been written to present the confusing binary CPUID information in a human-friendly form. I will list some of them here.

CPU-Z: www.cpuid.com/softwares/cpu-z.html . A very popular identification application for Windows. Too terse for my taste.
The CPUID Explorer: www.flounder.com/cpuid_explorer2.htm . A more detailed and therefore more convenient application for Windows. Unfortunately, it has not been updated for a long time, so it does not know about modern CPUID fields. This, by the way, is a common problem of all programs of this kind: they become obsolete very quickly.
Intel® Processor Identification Utility for Windows: www.intel.com/support/processors/tools/piu/sb/CS-014921.htm . The official application from Intel; however, it does not report much.
msr-tools from Intel Open-source Technology Center: 01.org/msr-tools . Programs for reading CPUID values and MSR registers. For reasons I do not understand, reading CPUID requires root rights; in addition, instead of invoking the instruction directly, it goes through Linux kernel interfaces that are not the most reliable.
Another cpuid for Linux: www.etallen.com/cpuid.html . The best specimen I could find. It prints detailed information about all flags on all logical processors.
I have also started reinventing the wheel: ggg-cpuid [6]. Unlike other applications, my project aims to collect identification information on processors of different architectures, not just IA-32. It currently works on IA-32, IA-64 and ARM. As time permits, I will add other systems.

References

[1] Intel Corporation. Intel® 64 and IA-32 Architectures Software Developer's Manual. Volumes 1-3, 2014. www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
[2] Robert R. Collins. CPUID algorithm wars. Dr. Dobbs Journal, November 1996. www.drdobbs.com/database/cpuid-algorithm-wars/184410005
[3] Intel® Architecture Instruction Set Extensions Programming Reference. software.intel.com/en-us/intel-isa-extensions
[4] Agner Fog. Stop the instruction set war. Agner's CPU blog. www.agner.org/optimize/blog/read.php?i=25
[5] x86 architecture CPUID. sandpile.org/x86/cpuid.htm
[6] Grigory Rechistov. A set of CPU identification tools for Intel IA-32, IA-64 and other systems. github.com/grigory-rechistov/ggg-cpuid
[7] Intel Processor Identification and the CPUID Instruction. AP-485 Application Note, 1999. netwinder.osuosl.org/pub/misc/docs/i386/24161812.pdf

Source: https://habr.com/ru/post/220851/
