
Interrupts in pipelined processors

You surely know what interrupts are. Perhaps you are even interested in how processors work inside. Yet you have almost certainly never seen a clear account of how a processor detects an interrupt, jumps to the handler and, most importantly, returns from it to exactly the right place.

I spent a year writing this article. It was originally aimed at hardware engineers. The realization that I would never finish it, together with a thirst for fame and the desire to be read by more than ten people, made me adapt it for a relatively wide audience, throwing out the block diagrams, the Verilog snippets and kilometers of timing diagrams.

If you have ever wondered what the words “the processor supports precise aborts” in a datasheet mean, read on.

Some terminology: processor, processes and interrupts


To keep the scope manageable, by processors I will mean only single-core, single-threaded, scalar RISC processors. I assume that the reader is at least broadly familiar with how they are built.

So, a processor is a device that executes a sequence of instructions (a program) to solve some task. For each instruction, the processor performs a sequence of operations called the instruction cycle, which consists of the following steps:
  1. Fetching the instruction from memory
  2. Decoding the instruction
  3. Executing the instruction
  4. Writing the results to registers and/or memory

A processor with sequential instruction execution starts the next instruction cycle only after the previous one has completed, so only one instruction is in progress at any time.

A processor with parallel instruction execution can work on several instructions simultaneously. For example, a processor with a four-stage instruction pipeline can, in the same clock cycle, write the results of the first instruction, execute the second, decode the third and fetch the fourth from memory.

A process is a program in execution. A process must produce the same results regardless of whether it runs on a processor with sequential or parallel instruction execution. The state of the process is determined by the contents:

In real-time systems, the effects of cache memory, the MMU's translation lookaside buffer (TLB) and dynamic branch prediction tables must also be taken into account.

Each executed instruction updates the state of the process in some way:

An interrupt is an event upon which the processor must suspend the current process, save its state and start executing another process, called the interrupt handler. When the handler completes, the state of the interrupted process must be restored; in the case of a fatal interrupt (for example, one caused by a hardware failure), the processor must instead be restarted or halted.

Depending on its source, an interrupt can be:
  1. Internal, if caused by the execution of an instruction in the processor:
    • A software interrupt, if caused by a special instruction
    • An exception (also called a fault or an abort), if caused by an error while executing an instruction

  2. External, if caused by an event outside the processor

For brevity, the instruction that was being executed when any of these interrupts occurred will be called the interrupted instruction.

Saving and restoring the state of a process can be implemented in hardware, in software, or in a combination of the two. From here on, I will consider the simplest combined hardware/software variant, in which:

After control returns to the interrupted process, it must be able to continue as if it had never been interrupted. This requirement sounds trivial, but for most modern processors it is quite hard to meet, so hard that it is sometimes abandoned. Interrupts that guarantee this requirement are called precise, and all others imprecise.

Precise and imprecise interrupts


Formally, an interrupt is called precise if all of the following conditions are met:
  1. All instructions preceding the interrupted one have been fully executed and have correctly updated the state of the process
  2. All instructions following the interrupted one have not been executed and have not changed the state of the process in any way
  3. The interrupted instruction, depending on the type of interrupt, has either been executed completely or not been executed at all
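
The three conditions can be written down as a small check. This is only an illustrative sketch in Python: the status labels (`done`, `partial`, `not_started`) and the function name `is_precise` are my own assumptions, not something a real processor exposes.

```python
def is_precise(statuses, interrupted):
    """Check the three conditions for one interrupt.

    statuses    -- per-instruction status in program order; each entry is
                   "done", "partial" or "not_started" (hypothetical labels)
    interrupted -- index of the interrupted instruction
    """
    before = statuses[:interrupted]
    after = statuses[interrupted + 1:]
    # Condition 1: everything older has fully executed.
    older_done = all(s == "done" for s in before)
    # Condition 2: nothing younger has touched the process state.
    younger_untouched = all(s == "not_started" for s in after)
    # Condition 3: the interrupted instruction is all-or-nothing.
    all_or_nothing = statuses[interrupted] in ("done", "not_started")
    return older_done and younger_untouched and all_or_nothing


# The interrupted instruction (index 2) has not started, older ones are done.
print(is_precise(["done", "done", "not_started", "not_started"], 2))
# A younger instruction has already partially updated the state.
print(is_precise(["done", "done", "not_started", "partial"], 2))
```

The second call fails condition 2, which is exactly the situation a pipeline can get into when several instructions are in flight at once.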

The first two conditions need no comment. The third condition is due to the following:

Obviously, external interrupts must always be precise. Who needs a processor that cannot correctly resume a process after handling a timer interrupt?

Software interrupts and exceptions may be precise or imprecise. In some cases precise exceptions are indispensable, for example when the processor has an MMU: on a TLB miss, control is transferred to the corresponding exception handler, which adds the page to the TLB in software, after which it must be possible to re-execute the instruction that caused the miss.

In microcontrollers, exceptions may be imprecise. For example, if a store instruction causes an exception because of a memory error, then instead of trying to fix the error and re-execute the instruction, one can simply reset the microcontroller and start the program again (that is, do the same thing the watchdog timer does when the program hangs).

Most computer architecture textbooks (including classics such as Patterson & Hennessy and Hennessy & Patterson) gloss over precise interrupts. Imprecise interrupts, for their part, are of no interest to anyone. In my opinion, these are excellent reasons to continue the story about precise interrupts.

Precise interrupts in processors with sequential instruction execution


For processors with sequential instruction execution, implementing precise interrupts is fairly simple, so it makes sense to start there. Since only one instruction is executed at a time, at the moment an interrupt is detected all instructions preceding the interrupted one have already completed, and the following ones have not even been started.

Thus, to implement precise interrupts in such processors, it is enough to ensure that the interrupted instruction never updates the state of the process until it is known whether it caused an exception.

The place where the processor must decide whether or not to let an instruction update the process state is called the commit point. If the processor saves the results of the instruction, that is, the instruction did not cause an exception, the instruction is said to be committed.

To understand where the commit point should be located, it helps to recall the steps of the instruction cycle:
  1. Fetching the instruction from memory
  2. Decoding the instruction
  3. Executing the instruction
  4. Writing the results to registers and/or memory

By definition, the commit point must come before the results are written, yet by that point it must already be known whether the instruction caused an exception. An exception can occur at any of the four steps, for example:
  1. Memory error while fetching the instruction
  2. Unknown opcode during decoding
  3. Division by zero during execution
  4. Memory error while writing the results

Obviously, precise interrupts cannot be implemented until the problem of writing results to memory is solved:


As you might guess, this problem is quite hard to solve, so many processors implement “almost precise” interrupts: all interrupts are precise except for exceptions caused by memory errors while writing results. In this case the commit point lies between the third and fourth steps of the instruction cycle.
Important! Remember that the program counter must also be updated strictly after the commit point. It changes regardless of whether the instruction is committed: either the address of the next instruction, the interrupt vector, or the PAB is written to it.
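
To make the commit point tangible, here is a minimal sketch of a sequential processor with “almost precise” interrupts, written in Python rather than Verilog. Everything specific in it is an assumption made for illustration: the two-opcode instruction set, the `INTERRUPT_VECTOR` address, and the `faulting_pc` field that stands in for whatever the hardware really uses to remember the interrupted instruction. Memory errors during step 4 are deliberately not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional

INTERRUPT_VECTOR = 0x100  # hypothetical handler address


class InstructionFault(Exception):
    """Raised by any of the first three steps of the instruction cycle."""


@dataclass
class Cpu:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])
    faulting_pc: Optional[int] = None  # hypothetical: address of the interrupted instruction


def step(cpu, memory):
    """Run one instruction cycle; return True if the instruction was committed."""
    try:
        # Step 1: fetch.
        if cpu.pc not in memory:
            raise InstructionFault("memory error while fetching")
        op, rd, ra, rb = memory[cpu.pc]
        # Step 2: decode.
        if op not in ("add", "div"):
            raise InstructionFault("unknown opcode")
        # Step 3: execute into a temporary, without touching the registers.
        a, b = cpu.regs[ra], cpu.regs[rb]
        if op == "div" and b == 0:
            raise InstructionFault("division by zero")
        result = a + b if op == "add" else a // b
    except InstructionFault:
        # Commit point reached with an exception pending: the registers stay
        # untouched, but the program counter still changes (to the vector).
        cpu.faulting_pc = cpu.pc
        cpu.pc = INTERRUPT_VECTOR
        return False
    # Commit point passed. Step 4: write the results and advance the counter.
    cpu.regs[rd] = result
    cpu.pc += 4
    return True


cpu = Cpu(regs=[0, 6, 3, 0])
program = {0x00: ("add", 0, 1, 2), 0x04: ("div", 1, 1, 3)}
step(cpu, program)  # add r0 = r1 + r2 is committed
step(cpu, program)  # div r1 = r1 / r3 faults: r3 is zero
print(cpu.regs, hex(cpu.pc), cpu.faulting_pc)
```

The point of the design is that `result` lives in a temporary until the commit point: the faulting division leaves every register exactly as the previous instruction left it.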

Precise interrupts in processors with parallel instruction execution


Today there are almost no processors left with sequential instruction execution (I can only think of clones of Intel's 8051). They have been displaced by processors with parallel instruction execution, which, all else being equal, deliver higher performance. The simplest processor with parallel instruction execution is one with an instruction pipeline.
Despite its many advantages, the instruction pipeline considerably complicates the implementation of precise interrupts, which has been a source of great sorrow for designers for many decades.

In a processor with sequential instruction execution, the steps of the instruction cycle depend on each other. The simplest example is the program counter. It is first used at the fetch stage (as the memory address from which the instruction is read), then at the execute stage (to compute its next value), and then, if the instruction is committed, it is updated at the write-back stage. As a result, the next instruction cannot be fetched until the previous one has completed the last stage and updated the program counter. The same applies to all other signals inside the processor.

A pipelined processor can be obtained from a sequential one by making each stage of the instruction cycle independent of the preceding and following stages.

To achieve this, the results of every stage except the last are stored in auxiliary storage elements (registers) placed between the stages:
  1. The result of the fetch, the encoded instruction, is stored in a register between the fetch and decode stages
  2. The results of decoding (the operation type, the operand values, the destination address) are stored in registers between the decode and execute stages
  3. The results of execution (the new program counter value for a conditional branch, the result of an arithmetic operation computed in the ALU, and so on) are stored in registers between the execute and write-back stages
  4. At the last stage, the results are written directly to registers and/or memory, so no auxiliary registers are needed
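
The role of the registers between the stages can be sketched in a few lines of Python. This is a toy model under my own assumptions: instructions are just names, they are four bytes apart, and the three variables `if_id`, `id_ex` and `ex_wb` (hypothetical names) stand for the register groups listed above.

```python
def run_pipeline(instructions, cycles):
    """Return one (cycle, pc, fetch, decode, execute, write_back) tuple per cycle."""
    if_id = id_ex = ex_wb = None  # the three inter-stage registers, initially empty
    pc = 0x00
    history = []
    for cycle in range(1, cycles + 1):
        index = pc // 4
        fetched = instructions[index] if index < len(instructions) else None
        # During this cycle each stage works on what its input register holds.
        history.append((cycle, pc, fetched, if_id, id_ex, ex_wb))
        # Clock edge: every register latches the output of the stage before it.
        ex_wb, id_ex, if_id = id_ex, if_id, fetched
        pc += 4
    return history


program = ["Instr1", "Instr2", "Instr3", "Instr4", "Instr5"]
for row in run_pipeline(program, 5):
    print(row)
```

Because every stage reads only its own input register, all four stages can do useful work in the same cycle once the pipeline has filled up.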

This is how the resulting pipeline works:

 Cycle  PC    Fetch   Decode  Execute  Write-back
 1      0x00  Instr1  -       -        -
 2      0x04  Instr2  Instr1  -        -
 3      0x08  Instr3  Instr2  Instr1   -
 4      0x0C  Instr4  Instr3  Instr2   Instr1
 5      0x10  Instr5  Instr4  Instr3   Instr2

Note the second column, the program counter. Its value changes every clock cycle and determines the memory address from which an instruction is fetched.
The attentive reader will have noticed a small inconsistency: for interrupts to be precise, the first instruction has no right to change the program counter before the fourth clock cycle. To fix this, we need to move the program counter update past the commit point (let us assume it lies between the third and fourth stages):

 Cycle  Fetch   Decode  Execute  Write-back  PC
 1      Instr1  -       -        -           0x00
 2      -       Instr1  -        -           0x00
 3      -       -       Instr1   -           0x00
 4      Instr2  -       -        Instr1      0x04
 5      -       Instr2  -        -           0x04
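
How much this costs is easy to count. The sketch below is plain arithmetic in Python under the assumptions used so far: a four-stage pipeline, one cycle per stage, and the commit point between the third and fourth stages. The function names are mine.

```python
def write_back_cycle_single_pc(k, stages=4):
    """Cycle in which the k-th instruction (1-based) is in write-back when the
    only program counter is updated after the commit point."""
    # The next fetch has to wait until the previous instruction reaches the
    # last stage, so a new instruction starts only every (stages - 1) cycles.
    return (k - 1) * (stages - 1) + stages


def write_back_cycle_pipelined(k, stages=4):
    """Same question for a pipeline that fetches a new instruction every cycle."""
    return (k - 1) + stages


for k in (1, 2, 100):
    print(k, write_back_cycle_single_pc(k), write_back_cycle_pipelined(k))
```

For a long program the single-counter scheme is roughly three times slower, which is the entire gain of the four-stage pipeline minus the one stage that still overlaps.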

Processor performance has dropped a bit, hasn't it? In fact, the solution is obvious: we need two program counters! One sits at the start of the pipeline and says where to fetch instructions from; the other sits at the end and points to the instruction that should be committed next.
The first is called “speculative”, the second “architectural”. Most often the speculative program counter does not exist on its own but is built into the branch predictor. It looks like this:

 Cycle  Spec. PC  Fetch   Decode  Execute  Write-back  Arch. PC
 1      0x00      Instr1  -       -        -           0x00
 2      0x04      Instr2  Instr1  -        -           0x00
 3      0x08      Instr3  Instr2  Instr1   -           0x00
 4      0x0C      Instr4  Instr3  Instr2   Instr1      0x04
 5      0x10      Instr5  Instr4  Instr3   Instr2      0x08

Here is what happens next. As an instruction moves from stage to stage, it carries along the address it was fetched from (that is, its speculative program counter value). Before the commit point, the processor checks whether an external interrupt has arrived and whether the instruction has caused an exception, and also compares the instruction's address with the architectural program counter:

Why might the address of an instruction differ from the architectural program counter? Take my favorite example: the processor has just been powered on and fetches the first instruction from the interrupt table, which is nothing more than a jump to a faraway address (0x1234):

 Cycle  Spec. PC  Fetch        Decode       Execute      Write-back   Arch. PC
 1      0x00      jump 0x1234  -            -            -            0x00
 2      0x04      Instr2       jump 0x1234  -            -            0x00
 3      0x08      Instr3       Instr2       jump 0x1234  -            0x00
 4      0x0C      Instr4       Instr3       Instr2       jump 0x1234  0x1234
 *** In cycle 4 the address of Instr2 (0x04) is not equal to the architectural program counter, because the jump was predicted incorrectly ***
 5      0x1234    Instr666     -            -            -            0x1234
 6      0x1238    Instr667     Instr666     -            -            0x1234
 7      0x123C    Instr668     Instr667     Instr666     -            0x1234
 8      0x1240    Instr669     Instr668     Instr667     Instr666     0x1238
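
The same sequence of events can be played out in a short Python sketch. It rests on assumptions of mine that go beyond the text: instructions are identified only by their addresses and are four bytes long, the `jumps` dictionary maps the address of a jump to its target, the fetch stage always guesses “next sequential address”, and an instruction whose address differs from the architectural program counter is discarded together with everything behind it, after which fetching restarts from the architectural counter. Exceptions and external interrupts are left out.

```python
def simulate(jumps, cycles, step=4):
    """Return (cycle, spec_pc, fetch, decode, execute, write_back, arch_pc) rows.

    Stage entries are instruction addresses, or None for an empty stage.
    """
    spec_pc = arch_pc = 0x00
    fetch, decode, execute, write_back = spec_pc, None, None, None
    rows = []
    for cycle in range(1, cycles + 1):
        rows.append((cycle, spec_pc, fetch, decode, execute, write_back, arch_pc))
        if execute is not None and execute != arch_pc:
            # Wrong-path instruction at the commit point: squash it and all
            # younger instructions, then refetch from the architectural counter.
            spec_pc = arch_pc
            fetch, decode, execute, write_back = spec_pc, None, None, None
            continue
        if execute is not None:
            # Commit: the architectural counter moves to the next instruction,
            # which is the jump target if the committed instruction is a jump.
            arch_pc = jumps.get(execute, execute + step)
        # Clock edge: every instruction advances by one stage.
        spec_pc += step
        fetch, decode, execute, write_back = spec_pc, fetch, decode, execute
    return rows


for row in simulate({0x00: 0x1234}, 8):
    print([hex(v) if isinstance(v, int) and v > 9 else v for v in row])
```

Note that the speculative counter never has to be right. It only has to be checked against the architectural one before anything becomes visible to the process.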

That's all. Of course, the four-stage pipeline shown here is about as simple as a pipeline can be. In reality, some instructions take more than one clock cycle to execute, and even a simple microcontroller may complete instructions in a different order from the one in which it issued them, while still keeping interrupts precise. However, the general principle of organizing interrupts, I assure you, remains the same.

For those who want to blow their minds even further, I recommend the paper “Implementation of precise interrupts in pipelined processors”. Yes, your brand-new Intel Core i7 works exactly as described in that paper twenty-five years ago. Welcome to the eighties!

Source: https://habr.com/ru/post/188002/

