Interrupts in pipelined processors

You surely know what interrupts are, and perhaps you are even interested in how processors work internally. Yet you have almost certainly never seen a clear account of how a processor detects an interrupt, jumps to the handler and, most importantly, returns from it to exactly the right place.

I spent a year writing this article. It was originally aimed at hardware engineers. The realization that I would never finish it, together with a thirst for fame and a desire to be read by more than ten people, made me adapt it for a relatively wide audience by throwing out the block diagrams, the pieces of Verilog code and the kilometers of timing diagrams.

If you have ever wondered what the words “the processor supports precise aborts” in a datasheet mean, read on.

Some terminology: processor, processes and interrupts

To avoid trying to cover everything at once, I will not consider:

Processors with exotic architectures (stack, stream, asynchronous and so on), because their market share is very small and it makes more sense to use a common architecture as the example. I chose RISC purely for religious reasons
Multi-core processors, because each core handles its interrupts independently of the other cores
Superscalar, multi-threaded and VLIW processors, because from the point of view of interrupt handling they are similar to scalar processors (although, of course, much more complicated)

Thus, by “processor” I will mean only a single-core, single-threaded scalar RISC processor. I assume the reader has at least a general idea of how such processors work.
So, a processor is a device that executes a sequence of instructions (a program) to solve some task. For each instruction the processor performs a sequence of operations called the instruction cycle, which consists of the following steps:

Fetching the instruction from memory
Decoding the instruction
Executing the instruction
Writing the results to registers and/or memory

A processor with sequential instruction execution begins the next instruction cycle only after the previous one has completed, so only one instruction is in progress at any time.

A processor with parallel instruction execution can work on several instructions at once. For example, a processor with a four-stage instruction pipeline can simultaneously write back the results of the first instruction, execute the second, decode the third and fetch the fourth from memory.

A process is a program in execution. A process must produce the same results whether it runs on a processor with sequential or with parallel instruction execution. The state of a process is determined by the contents of:

the program counter (also known as the instruction pointer)
the processor registers (general-purpose, status, flags and so on)
main memory (RAM)

In real-time systems, the effects of cache memory, the MMU's translation lookaside buffer (TLB) and dynamic branch prediction tables must also be taken into account.

Each executed instruction updates the state of the process in some way:

arithmetic and logic instructions update the registers and the program counter
branch instructions update the program counter and the dynamic branch prediction table
load instructions update the registers, the program counter and the cache (on a cache miss; if a cache line has to be replaced, main memory as well)
store instructions update main memory (or the cache) and the program counter
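
To make the notion of process state concrete, here is a minimal Python sketch. The instruction tuples, the eight-register file and the 4-byte instruction size are assumptions made for illustration only; caches and branch prediction tables are left out.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessState:
    """Process state as described above: program counter, registers, memory."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 8)  # hypothetical 8 registers
    mem: dict = field(default_factory=dict)              # address -> value


def step(state, instr):
    """Apply one instruction to the state (hypothetical tuple encoding)."""
    op = instr[0]
    if op == "add":        # ("add", rd, rs1, rs2): registers and program counter
        _, rd, rs1, rs2 = instr
        state.regs[rd] = state.regs[rs1] + state.regs[rs2]
        state.pc += 4
    elif op == "jump":     # ("jump", target): program counter only
        state.pc = instr[1]
    elif op == "load":     # ("load", rd, addr): registers and program counter
        _, rd, addr = instr
        state.regs[rd] = state.mem.get(addr, 0)
        state.pc += 4
    elif op == "store":    # ("store", rs, addr): memory and program counter
        _, rs, addr = instr
        state.mem[addr] = state.regs[rs]
        state.pc += 4
    else:
        raise ValueError(f"unknown opcode: {op}")
```

Every branch of step() touches exactly the parts of the state named in the list above, which is why saving and restoring those parts is enough to suspend and resume a process.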

An interrupt is an event that requires the processor to suspend the current process, save its state and start executing another process, called the interrupt handler. When the handler finishes, the state of the interrupted process must be restored; in the case of a fatal interrupt (for example, one caused by a hardware failure), the processor must instead be restarted or halted.

Depending on its source, an interrupt can be:

Internal, if it is caused by the execution of an instruction inside the processor:
- A software interrupt, if it is caused by a special instruction
- An exception (also called a fault or an abort; they are all the same thing), if it is caused by an error while executing an instruction
External, if it is caused by an event outside the processor

For brevity, the instruction that was executing when any of these interrupts occurred will be called the interrupted instruction.

Saving and restoring the process state can be implemented in hardware, in software, or in a combination of the two. From here on I will consider the simplest combined hardware/software scheme, in which:

the processor saves the program counter in a dedicated return address register (PAB) and at the same time writes the interrupt vector to the program counter, thereby starting the interrupt handler
all other elements of the process state are saved by the interrupt handler as needed (for example, before using any registers it must save their contents on the stack)
before finishing, the interrupt handler must restore every element of the process state that it changed (for example, restore the register contents saved on the stack)
the interrupt handler ends with a return-from-interrupt instruction, which writes the contents of the PAB back to the program counter, that is, returns control to the interrupted process
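
The four steps above can be sketched in Python as follows. The Cpu class, its method names, the four-register file and the vector address used in the usage note are all hypothetical; the sketch only shows the division of labor between the hardware (the PAB) and the handler (everything else).

```python
class Cpu:
    """Toy CPU with a program counter, a return address register and a stack."""

    def __init__(self):
        self.pc = 0
        self.pab = 0           # return address register
        self.regs = [0] * 4    # hypothetical register file
        self.stack = []

    def enter_interrupt(self, vector, return_address):
        # Hardware part: save the return address and jump to the vector.
        self.pab = return_address
        self.pc = vector

    def return_from_interrupt(self):
        # Hardware part: the return-from-interrupt instruction.
        self.pc = self.pab


def handler(cpu):
    """Software part: a handler that needs register 0 for its own work."""
    cpu.stack.append(cpu.regs[0])    # save the register before using it
    cpu.regs[0] = 0xDEAD             # the handler's own work clobbers it
    cpu.regs[0] = cpu.stack.pop()    # restore it before returning
    cpu.return_from_interrupt()
```

For example, after cpu.enter_interrupt(vector=0x08, return_address=0x24) followed by handler(cpu), the program counter is 0x24 again and register 0 holds whatever it held before the interrupt.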

After control returns to the interrupted process, it must be able to continue as if it had never been interrupted. The requirement sounds trivial, but in most modern processors it is quite hard to meet, so hard that it is sometimes abandoned. Interrupts that guarantee this requirement are called precise; all others are called imprecise.

Precise and imprecise interrupts

Formally, an interrupt is precise if all of the following conditions hold:

all instructions preceding the interrupted one have been fully executed and have correctly updated the state of the process
all instructions following the interrupted one have not been executed and have not changed the state of the process in any way
the interrupted instruction, depending on the type of interrupt, has either been executed completely or not been executed at all

The first two conditions need no comment. The third exists for the following reasons:

An instruction that was executing when an external interrupt arrived must update the state of the process before that state is saved. The same applies to an instruction that caused a software interrupt. In both cases the PAB will point to the instruction that would have executed next had there been no interrupt; it will be executed immediately after the return from the interrupt handler.
An instruction that caused an exception is a “bad” instruction. Its results are most likely wrong, so it must not update the state of the process. Instead, its own address is saved in the PAB and the interrupt handler is called to try to fix the error. After the return from the handler the instruction is executed again. If it causes the same exception again, the error is unrecoverable and the processor raises a fatal interrupt.
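
The third condition boils down to a small decision rule, sketched below. The function name and the string labels for the interrupt kinds are hypothetical.

```python
def interrupted_instruction_outcome(kind, instr_addr, next_addr):
    """Return (commits, pab) for the interrupted instruction.

    kind       -- "external", "software" or "exception" (hypothetical labels)
    instr_addr -- address of the interrupted instruction
    next_addr  -- address of the instruction that would have run next
    """
    if kind in ("external", "software"):
        # The instruction completes; execution resumes with its successor.
        return True, next_addr
    if kind == "exception":
        # The instruction is discarded; it is retried after the handler.
        return False, instr_addr
    raise ValueError(f"unknown interrupt kind: {kind}")
```

Note that next_addr is not necessarily instr_addr + 4: if the interrupted instruction is a taken branch, it is the branch target.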

Obviously, external interrupts must always be precise. Who needs a processor that cannot correctly resume a process after handling a timer interrupt?

Software interrupts and exceptions may be either precise or imprecise. In some cases precise exceptions are indispensable, for example when the processor has an MMU: on a TLB miss, control passes to the corresponding exception handler, which adds the page to the TLB in software, after which it must be possible to re-execute the instruction that caused the miss.

In microcontrollers, exceptions may be imprecise. For example, if a store instruction caused an exception because of a memory error, then instead of trying to fix the error somehow and re-execute the instruction, you can simply reset the microcontroller and start the program over (that is, do the same thing a watchdog timer does when the program hangs).

In most textbooks on computer architecture (including classics such as Patterson & Hennessy and Hennessy & Patterson), precise interrupts are passed over. Imprecise interrupts, for their part, are of no interest. In my opinion these are excellent reasons to continue the story of precise interrupts.

Precise interrupts in processors with sequential instruction execution

For processors with sequential instruction execution, implementing precise interrupts is fairly simple, so it seems logical to start with them. Since only one instruction is executed at a time, at the moment an interrupt is detected all instructions preceding the interrupted one have already been executed, and the following ones have not even been started.

Thus, to implement precise interrupts in such processors, it is enough to make sure that the interrupted instruction never updates the state of the process until it is clear whether it has caused an exception.

The place where the processor decides whether to allow an instruction to update the process state is called the commit point. If the processor keeps the results of an instruction, that is, the instruction did not cause an exception, the instruction is said to be committed.

To understand where the commit point should be, it helps to recall the stages of the instruction cycle:

Fetching the instruction from memory
Decoding the instruction
Executing the instruction
Writing the results to registers and/or memory

By definition, the commit point must come before the results are written, yet by that point it must already be known whether the instruction caused an exception. An exception can occur at any of the four stages, for example:

a memory error while fetching the instruction
an unknown opcode during decoding
division by zero during execution
a memory error while writing the results

Obviously, precise interrupts cannot be implemented until the problem of writing results to memory is solved:

an instruction cannot be committed and allowed to write its results to memory until it is clear that it did not cause an exception
it is impossible to know that no exception was caused without writing the results to memory (this requires a confirmation from the memory controller that the write succeeded)

As you can guess, this problem is rather hard to solve, so many processors implement “almost precise” interrupts: all interrupts are precise except exceptions caused by memory errors when writing results. In that case the commit point lies between the third and fourth stages of the instruction cycle.
Important! Remember that the program counter must also be updated strictly after the commit point. It changes regardless of whether the instruction is committed or not: it receives either the address of the next instruction, the interrupt vector, or the PAB.
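
Here is a rough Python sketch of a sequential processor with the commit point between the third and fourth stages. The tuple instruction format, the dictionary holding the CPU state and the vector address 0x08 are assumptions for illustration. The key idea is that execute() only computes a result, and nothing touches the registers or the program counter until the commit point has been passed.

```python
class Fault(Exception):
    """Raised when an instruction causes an exception before the commit point."""


def execute(regs, instr):
    """Stages 2 and 3: decode and execute WITHOUT modifying any state.

    instr is a hypothetical 4-tuple (op, rd, rs1, rs2).
    Returns the destination register and the value to be written to it.
    """
    op, rd, rs1, rs2 = instr
    if op == "add":
        return rd, regs[rs1] + regs[rs2]
    if op == "div":
        if regs[rs2] == 0:
            raise Fault("division by zero")
        return rd, regs[rs1] // regs[rs2]
    raise Fault("unknown opcode")


def run_one(cpu, instr, exception_vector=0x08):
    """Run one instruction cycle; return True if the instruction committed."""
    try:
        rd, value = execute(cpu["regs"], instr)
    except Fault:
        # Not committed: registers stay untouched, and the PAB receives
        # the address of the faulting instruction so it can be retried.
        cpu["pab"] = cpu["pc"]
        cpu["pc"] = exception_vector
        return False
    # Commit point passed: stage 4 writes the result and the program counter.
    cpu["regs"][rd] = value
    cpu["pc"] += 4
    return True
```

Memory errors during the write itself are not modeled, which makes this exactly the “almost precise” variant described above.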

Precise interrupts in processors with parallel instruction execution

Today there are almost no processors with sequential instruction execution left (the only ones I can recall are clones of the Intel 8051). They have been supplanted by processors with parallel instruction execution, which, all else being equal, deliver higher performance. The simplest such processor is one with an instruction pipeline.
Despite its many advantages, the instruction pipeline considerably complicates the implementation of precise interrupts, which has been a source of great sorrow for designers for decades.

In a processor with sequential instruction execution, the stages of the instruction cycle depend on one another. The simplest example is the program counter. It is first used at the fetch stage (as the memory address to read the instruction from), then at the execute stage (to compute its next value), and then, if the instruction is committed, it is updated at the write-back stage. As a result, the next instruction cannot be fetched until the previous one has completed the last stage and updated the program counter. The same applies to every other signal inside the processor.

A pipelined processor can be derived from a sequential one by making each stage of the instruction cycle independent of the stages before and after it.

To achieve this, the results of every stage except the last are stored in auxiliary storage elements (registers) placed between the stages:

The result of the fetch stage, the encoded instruction, is stored in a register between the fetch and decode stages
The results of decoding (the operation type, the operand values and the destination address) are stored in registers between the decode and execute stages
The results of execution (the new program counter value for a conditional branch, the result of an arithmetic operation computed by the ALU, and so on) are stored in registers between the execute and write-back stages
At the last stage the results are written to the registers and/or memory, so no auxiliary registers are needed

This is how the resulting pipeline works:

 Cycle  PC    Fetch   Decode  Execute  Write-back
 1      0x00  Instr1  -       -        -
 2      0x04  Instr2  Instr1  -        -
 3      0x08  Instr3  Instr2  Instr1   -
 4      0x0C  Instr4  Instr3  Instr2   Instr1
 5      0x10  Instr5  Instr4  Instr3   Instr2

Pay attention to the program counter column (the second one, right after the cycle number). Its value changes every clock cycle and determines the memory address the instruction is fetched from.
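
The occupancy of such an ideal pipeline is easy to generate programmatically. The following Python sketch assumes 4-byte instructions, a program starting at address 0 and no stalls or branches; the function and stage names are made up for the example.

```python
STAGES = ["fetch", "decode", "execute", "writeback"]


def pipeline_rows(n_cycles):
    """Return one dict per clock cycle describing an ideal 4-stage pipeline."""
    rows = []
    for cycle in range(1, n_cycles + 1):
        # The program counter advances by one 4-byte instruction per cycle.
        row = {"cycle": cycle, "pc": 4 * (cycle - 1)}
        for depth, stage in enumerate(STAGES):
            # The instruction fetched in cycle n reaches stage `depth`
            # in cycle n + depth.
            n = cycle - depth
            row[stage] = f"Instr{n}" if n >= 1 else "-"
        rows.append(row)
    return rows


for row in pipeline_rows(5):
    print(row["cycle"], hex(row["pc"]), *(row[s] for s in STAGES))
```

From the fourth cycle on, every stage is busy, so one instruction completes per cycle even though each one still takes four cycles from fetch to write-back.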
The attentive reader will have noticed a small inconsistency: for interrupts to be precise, the first instruction has no right to change the program counter before the fourth clock cycle. To fix this, we have to move the program counter update behind the commit point (let us assume it lies between the third and fourth stages):

 Cycle  Fetch   Decode  Execute  Write-back  PC
 1      Instr1  -       -        -           0x00
 2      -       Instr1  -        -           0x00
 3      -       -       Instr1   -           0x00
 4      Instr2  -       -        Instr1      0x04
 5      -       Instr2  -        -           0x04

Processor performance has dropped a bit, hasn't it? In fact the solution is obvious: we need two program counters! One sits at the start of the pipeline and says where to fetch instructions from; the other sits at the end and points to the instruction that should be committed next.
The first is called “speculative”, the second “architectural”. Most often the speculative program counter does not exist on its own but is built into the branch predictor. It looks like this:

 Cycle  Spec. PC  Fetch   Decode  Execute  Write-back  Arch. PC
 1      0x00      Instr1  -       -        -           0x00
 2      0x04      Instr2  Instr1  -        -           0x00
 3      0x08      Instr3  Instr2  Instr1   -           0x00
 4      0x0C      Instr4  Instr3  Instr2   Instr1      0x04
 5      0x10      Instr5  Instr4  Instr3   Instr2      0x08

Here is what happens next. As an instruction moves from stage to stage, it carries along the address it was fetched from (that is, its speculative program counter value). At the commit point, the processor checks whether an external interrupt has arrived and whether the instruction has caused an exception, and also compares the instruction's address with the architectural program counter:

If an external interrupt has arrived, the instruction is committed, but the address of the next instruction is written to the PAB rather than to the architectural program counter. The address of the interrupt vector is written to the architectural program counter.
If an exception has occurred, the instruction is not committed. Instead, the address of the corresponding exception vector is written to the architectural program counter, and the address of the instruction is written to the PAB.
If the address of the instruction is not equal to the architectural program counter, the instruction is not committed either (more on that below). If the address is equal to the architectural program counter and no exception has occurred, the processor commits the instruction and updates the architectural program counter (writing the branch target for a branch instruction, or simply incrementing it for any other instruction).
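
The three rules can be folded into one small decision function, sketched below in Python. The function name and parameters are hypothetical. The text does not say which check takes priority when several conditions hold at once, so the ordering here (address mismatch first, then exception, then external interrupt) is my assumption, not a documented rule.

```python
def commit_stage(arch_pc, instr_addr, next_addr,
                 exception_vector=None, external_vector=None):
    """Decide the fate of the instruction arriving at the commit point.

    arch_pc          -- architectural program counter
    instr_addr       -- address the instruction was fetched from
    next_addr        -- branch target, or instr_addr + 4 for other instructions
    exception_vector -- vector address if the instruction raised an exception
    external_vector  -- vector address if an external interrupt is pending

    Returns (committed, new_arch_pc, new_pab); new_pab is None if unchanged.
    """
    if instr_addr != arch_pc:
        # Wrong-path instruction: discard it, change nothing (assumed priority).
        return False, arch_pc, None
    if exception_vector is not None:
        # Exception: do not commit; the PAB points at the faulting instruction.
        return False, exception_vector, instr_addr
    if external_vector is not None:
        # External interrupt: commit; the PAB points at the next instruction.
        return True, external_vector, next_addr
    # Normal case: commit and advance the architectural program counter.
    return True, next_addr, None
```

For instance, commit_stage(0x04, 0x04, 0x08, external_vector=0x18) commits the instruction, redirects the architectural program counter to 0x18 and saves 0x08 as the return address.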

Why might the address of an instruction differ from the architectural program counter? Take my favorite example: the processor has just been switched on and fetches its first instruction from the interrupt vector table, which is nothing but a jump to a faraway address (0x1234):

 Cycle  Spec. PC  Fetch        Decode       Execute      Write-back   Arch. PC
 1      0x00      jump 0x1234  -            -            -            0x00
 2      0x04      Instr2       jump 0x1234  -            -            0x00
 3      0x08      Instr3       Instr2       jump 0x1234  -            0x00
 4      0x0C      Instr4       Instr3       Instr2       jump 0x1234  0x1234
 *** In cycle 4 the address of Instr2 (0x04) is not equal to the architectural program counter, because the branch was predicted incorrectly ***
 5      0x1234    Instr666     -            -            -            0x1234
 6      0x1238    Instr667     Instr666     -            -            0x1234
 7      0x123C    Instr668     Instr667     Instr666     -            0x1234
 8      0x1240    Instr669     Instr668     Instr667     Instr666     0x1238
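
The trace above can be reproduced with a few lines of Python. This is only a sketch under simplifying assumptions of mine: instructions are 4 bytes long, the “predictor” always guesses the next sequential address, every address not listed in the program holds a no-op, and interrupts and exceptions are left out. Each pipeline stage holds just the address of the instruction in it, or None.

```python
def simulate(program, n_cycles):
    """Simulate a 4-stage pipeline with speculative and architectural PCs.

    program maps an address to ("jump", target); any other address is a no-op.
    Returns one tuple per cycle:
    (cycle, spec_pc, decode, execute, writeback, arch_pc),
    where the instruction being fetched is the one at spec_pc.
    """
    spec_pc = arch_pc = 0
    decode = execute = writeback = None
    trace = []
    for cycle in range(1, n_cycles + 1):
        fetch = spec_pc
        trace.append((cycle, spec_pc, decode, execute, writeback, arch_pc))
        # Commit point: between the execute and write-back stages.
        committed = None
        flush = False
        if execute is not None:
            if execute == arch_pc:
                op = program.get(execute, ("nop",))
                arch_pc = op[1] if op[0] == "jump" else execute + 4
                committed = execute
            else:
                flush = True   # wrong-path instruction reached the commit point
        if flush:
            # Discard everything in flight and refetch from the architectural PC.
            decode = execute = writeback = None
            spec_pc = arch_pc
        else:
            writeback, execute, decode = committed, decode, fetch
            spec_pc += 4
    return trace


for row in simulate({0x00: ("jump", 0x1234)}, 8):
    print(*(hex(x) if isinstance(x, int) and i > 0 else x
            for i, x in enumerate(row)))
```

The flush in cycle 4 is what empties the decode, execute and write-back columns in cycle 5: the three instructions fetched after the jump are thrown away without ever touching the architectural state.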

That's all. Of course, the four-stage pipeline shown here is simple to the point of being unrealistic. In reality some instructions take more than one clock cycle to execute, and even a simple microcontroller may complete instructions out of the order in which it issued them while still keeping interrupts precise. The general principle of interrupt handling, however, remains the same, I assure you.

For those who want to make their heads explode even further, I recommend reading “Implementation of precise interrupts in pipelined processors”. Yes, your brand-new Intel Core i7 works exactly as described in that article twenty-five years ago. Welcome to the eighties!

Source: https://habr.com/ru/post/188002/
