Reversing simulated time

I’m sure that many of us, when debugging applications, periodically have a desire to step back (or two, ten ...) back from the current line to see the reasons for the wrong behavior happening in it. Most often, for this you have to restart debugging from the beginning and try to stop the execution of the program a little earlier than on the previous attempt. Then you have to step by step closer to the intended place with the problem ... oops, again stepped over! We are starting on a new one, because in the debugger it is no longer possible to step back even one step. Or can it?

In one of my previous posts I planned to describe how software simulation of digital systems allows you to do what is impossible in reality, namely, to reverse the passage of time for debugging needs. This will be discussed in this article.

^{Author photo: Ivan Andreev}

')
First, a few general considerations. Why do we need a direct, i.e. "Normal" execution of programs? To get the result of some calculation that is undesirable / difficult / impossible to perform manually. The calculation of the flight of the rocket is a calculation. Watching the seals on the site are calculations. The computation process can be formalized using its representation in the form of an algorithm.

Why pay execution

What can reverse execution be necessary for? In order to make it easier for you to understand how the calculations inside the program led to a particular result, right or wrong.

Supporting the flow reversal function of a program may be included in various tools used by programmers. Next, we will mainly discuss simulators. In this context, this idea gets two options for development. The first is debugging of the simulated BIOS / Firmware code, operating system, OS, applications running inside the model, which is interesting for the end user of the simulator - the software developer. The second application is to debug the simulator directly (it is also a program). This is important already for the developer of device models.

Debugging

The execution of any algorithm (in the traditional sense of the term), written in one form or another, can be divided into separate steps, such as high-level language operators, machine instructions, movements of the Turing carriage, etc. The debugging process begins with a search for a step in the implementation of the algorithm, in which something went wrong, as expected by the programmer. As a rule, two points of execution are known: on the first, the state is still “good”, expected, and on the second, incorrect behavior or state is already observed.

It is necessary to minimize the distance between points with a known "good" and "bad" state to one step. This process can be carried out in different ways. For example, in a linear way to go through all the steps inside a segment, each time starting from the beginning of the algorithm. This is justified only if the segment itself is very short.

You can resort to the dichotomy, cutting the length of the segment in half at each approach.

At the same time, the number of iterations will be substantially less than with linear iteration. However, it remains necessary to restart the program from the beginning. Well, if this bug manifests itself almost immediately after the start. And if before its occurrence each time passes a few hours? Productivity debugging while still leaves much to be desired.

Debugging inside the simulation

How can simulation of digital systems help with debugging? I will highlight two techniques, which, as will be shown later, are closely related to each other.

Create save points. The simulator allows you to save the state of all device models: the contents of memory, CPU registers and devices - to disk as a file. Then, from such a savepoint (English checkpoint or savepoint), you can repeatedly restore the execution script, saving time on “running” from the beginning of the program to the region of interest. I am sure that computer games players who read this have used the save mechanism more than once in order not to start a difficult level from the beginning.

Postal 2

Dude: "Even my grandmother could have gone through this game if she had survived as much as you!"

Handling time. Restarting the simulation from the save point is necessary if it is found that during the search we crossed the point where the problem occurred, and it has already manifested itself. That would be great if you could step back a few steps! This would further speed up the process. In this case, the search for the erroneous step looks like this:

Simulation Requirements

Not all processes in the real world can be reversed. Anyone who tried to press the toothpaste back into the tube will confirm this. Therefore, not every simulation is possible: some properties are required from the participating software models and the efforts of their authors aimed at providing a number of guarantees.

1. The ability to inspect the state of the model

To create savepoints suitable for subsequent restoration, each model of the device included in the simulation must be able to represent its internal state in some format in sufficient detail. In addition to the architectural state (registers, memory) during restoration, it is required to know how the individual devices were connected to each other.

Some elements of the model state can be not saved if they do not reflect the architecture of the device it represents and can be uniquely restored in the future. For example, the contents of the simulator caches (but not the simulated device caches!) Are used only to speed up the operation of the model and therefore can most often be discarded when saved. In addition, when restoring a simulation, it is necessary to ensure that such caches will not contain outdated information related to the architectural state of the model prior to recovery.

2. Determinism

This strange word means the repeatability of the execution of a process for any two of its runs with the same initial state. For a single-threaded program, repeatability is considered an expected property if it is written without the use of external random sequence generators and does not have “recourse to uninitialized data” errors. For parallel applications, repeatability is also possible, but requires particularly careful programming. For software parallel simulators, I described the conditions in more detail here .

Determinism in combination with the points of conservation allows for the reverse execution.

How it works

The idea is very simple. In the process of direct execution, the simulator periodically creates save points (not necessarily on disk, they can be stored in memory, that is, to be snapshots of a state, English snapshot). If necessary, roll back a few steps back, the point nearest to the desired position is restored first, and a direct execution from it to the required place occurs. After that, thanks to the determinism of the model, its state looks as if the simulated time has really rolled back.

Suppose the simulation state is at a point with Tsim = 100 (we measure time in steps of the algorithm; these can be measures, instructions, or other discrete units). From this point it is necessary to simulate the time back 20 steps. At the same time, the nearest saving was made at the moment Ts = 50.

The save point is restored. After that Tsim = Ts = 50.
There is a direct simulation in 30 steps. After that Tsim = 80.

Features and tradeoffs

The practical value of reverse execution as a tool for debugging acceleration during simulation depends on the frequency of snapshots. The more often they are done, the reverse execution will work faster there. In this case, the direct simulation will need to be interrupted as often in order to maintain its state, which will negatively affect its speed. In addition, the dots themselves need to be stored somewhere, in memory or on disk, and they are limited resources.
Here, before programmers creating reverse execution, there are plenty of opportunities for implementing all kinds of heuristics: incremental save points, deletion of “old” points, changing interval between images, transparent data compression to save space, etc.

Struggling with unrepeatability

The two requirements indicated above are sufficient if all the data necessary for the simulation is already contained within it, i.e. she is self-sufficient. But it often turns out that in the process of work, the simulation interacts with the outside world, which is not at all inclined to repeatability.
An example of such a "unique" external source is man. That is, the operator interacting with the computer and pressing the buttons on the keyboard when he pleases. There is no need to talk about any reproducibility of the input data; it is not necessary to hope for the constancy of the lengths of time intervals between the individual presses, etc. However, repeatability is critical for many simulation scenarios — interrupts coming in a different order will cause a completely different execution of the OS code and user applications. Another source is network packets in situations where the real and simulated networks have points of contact.
The solution is to record the data flow trail from all external sources during the first direct simulation. In subsequent plays, external events are taken from this route, and not from reality. The user / network is disconnected from input systems.

Practical implementation

Simulators

I emphasize once again: reverse execution is possible if the simulator 1) can create conservation points and 2) is deterministic. If the model has to interact with the outside world, its inherent indeterminism can be “compensated” by recording the trace.

Many of the famous simulators can create save points / snapshots. Just a few examples: Oracle VirtualBox, Qemu, Bochs, FCEUX (NES / Dendy emulator). In the case of virtual machines, snapshots are often used to enable migration between host systems.
The determinism of the simulation is guaranteed by a significantly smaller number of software simulators. From open source projects, only FCEUX comes to my mind. I would appreciate if in the comments I point to other examples.

The mechanism of reverse execution, as described in this article, is implemented in the full-platform functional simulator Wind River Simics. In the next video, I demonstrate the loading of the Linux kernel inside Simics, then use the reverse execution to return the simulation state to a point in the past [In order for the video to be parsed, it is recommended to watch it with 720p settings ].

Classic debuggers

Time reversal is a fairly general idea, applicable for debugging needs, so it can no doubt be implemented in classical debuggers as well. The time reversal options for debugging are available in the RogueWave TotalView debugger ( article ), although I personally did not use it.
If readers have information about other tools that support something similar, please write about it in the comments.

PS I myself look forward to the emergence of similar functionality in GDB. And it seems there is even hope .

Source: https://habr.com/ru/post/208908/

All Articles