WIN.COM, and changing a single byte is enough to make the AARD code run again on every startup.
Bypassing the code by inserting a JMP instruction is fairly safe; deleting it, on the other hand, would shift the offsets of the functions in the remaining code, which would make it new, untested code. Beta testing was already over, so the developers tried not to replace tested code with untested code.
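The difference between the two options can be illustrated with a toy example. This is only a sketch: the byte blocks and the names init, check, func_a and func_b are hypothetical placeholders, not the real contents of WIN.COM. The only real x86 detail used is that 0xEB is the opcode of a short JMP, whose one-byte operand is counted from the end of the two-byte instruction.

```python
def build(parts):
    """Concatenate named byte blocks; return the image and each block's offset."""
    image, offsets = bytearray(), {}
    for name, code in parts:
        offsets[name] = len(image)
        image += code
    return image, offsets

# Hypothetical layout: placeholder bytes, not real WIN.COM code.
PARTS = [
    ("init", b"\x90\x90"),
    ("check", b"\x90" * 6),   # stand-in for the code to be disabled
    ("func_a", b"\xC3"),
    ("func_b", b"\xC3"),
]

original, original_offsets = build(PARTS)

# Option 1: patch in place. Overwrite the first two bytes of the block with
# a short JMP (0xEB) that skips the rest of the block.
patched = bytearray(original)
start = original_offsets["check"]
skip = len(dict(PARTS)["check"]) - 2          # bytes left after the JMP itself
patched[start:start + 2] = bytes([0xEB, skip])
jump_target = start + 2 + skip                # where execution continues

# Option 2: delete the block and rebuild the image.
rebuilt, rebuilt_offsets = build([p for p in PARTS if p[0] != "check"])

print("patched in place:", len(patched), "bytes, func_a at", original_offsets["func_a"])
print("block deleted:   ", len(rebuilt), "bytes, func_a at", rebuilt_offsets["func_a"])
```

In the patched image every later routine stays at the offset it had during testing, and the jump lands exactly on func_a. In the rebuilt image everything after the deleted block moves, which is the kind of change the developers wanted to avoid.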
But why would such a minor code change affect anything? Chris Pratley gives an example:
Even re-linking the code, not just recompiling it, can introduce unexpected bugs. A few years ago, when we were working on the Asian release of Word 97 and already considered all the code ready, we started on its final optimization. We have a tool that collects performance statistics for individual functions and reorders them in the file in an optimal way; the code of the functions themselves does not change. After the optimization we handed the code over for final testing, and, bang, the testers found a bug. When they used a certain feature of the program, the optimized code crashed on some machines. On those same machines, the same feature had worked fine before the optimization.
We started debugging, but once we added debug information to the optimized build, it no longer crashed. We tried debugging without the extra information, but even just running the program under the debugger was enough to stop the crash. Whatever we tried in order to find the cause of the crashes, the program would not crash; yet it crashed every single time we left it alone.
We were about to resort to an ICE (a hardware debugger) when we noticed a pattern: the program crashed only on Pentium processors clocked at 150 MHz or below, though not on all of them. That was a clue. We went to the Intel site and looked through the list of "errata" (as they call their bugs). Bingo! Pentium processors had an erratum that led to a crash under certain conditions. Very specific conditions: a conditional jump located 33 bytes after a JMP, with the JMP itself sitting at the edge of a memory page. This erratum has been fixed since the release of the 150 MHz Pentium.
In fact, chips have bugs fairly often, although few of those bugs become well known. After all, a chip's microcode is written by people, not gods. Usually, as soon as chip manufacturers find a bug, they report it to the compiler vendors, so compilers generate code that works around the known bugs, and ordinary programmers never run into them. It turned out that our compiler was slightly out of date and did not yet know about this particular bug.
When a representative from Intel confirmed that the mysterious crashes could be caused by this very bug, we went through the optimized code, found the offending instruction sequence, and manually rearranged three bytes so that the distance between the two jumps became 34 bytes. The crashes stopped.
Now, whenever someone assures me that their fix is “absolutely safe,” I tell this story. No code change can be completely safe.
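The condition in Chris's story can be written down as a small check. The sketch below rests on several assumptions that the story does not spell out: pages are 4 KiB, "at the edge of a page" means the last byte of the JMP is the last byte of a page, and the 33 bytes are measured from the start of the JMP to the start of the conditional jump. The input format, a list of already decoded instructions, is hypothetical; no real disassembler is involved.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB x86 pages
DANGEROUS_DISTANCE = 33   # the distance named in the story

def jmp_on_page_edge(offset, length):
    """Assumed reading of 'edge of a page': the JMP ends exactly at a page end."""
    return (offset + length) % PAGE_SIZE == 0

def find_risky_pairs(instructions):
    """Return (jmp_offset, jcc_offset) pairs that match the described pattern.

    instructions: list of (offset, length, kind) tuples, where kind is
    'jmp', 'jcc' (conditional jump) or 'other'. Hypothetical format.
    """
    kind_at = {offset: kind for offset, _, kind in instructions}
    risky = []
    for offset, length, kind in instructions:
        if kind != "jmp" or not jmp_on_page_edge(offset, length):
            continue
        target = offset + DANGEROUS_DISTANCE
        if kind_at.get(target) == "jcc":
            risky.append((offset, target))
    return risky

# A 2-byte JMP ending at the page boundary, with a conditional jump 33 bytes on.
before = [(4094, 2, "jmp"), (4127, 2, "jcc")]
# The same code after moving the conditional jump one byte further (34 bytes).
after = [(4094, 2, "jmp"), (4128, 2, "jcc")]

print(find_risky_pairs(before))
print(find_risky_pairs(after))
```

Under these assumptions the first layout is flagged and the second is not, which mirrors the manual fix in the story: nothing about the instructions changed except the distance between them.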
I ran into a similar bug myself: when we were working on Exchange 5.5, one of the builds kept crashing on a single test machine, always the same one. We spent several days trying to find the cause, without success: the bug vanished at the slightest code change, yet it showed up every time we stopped debugging. In the end we, like Chris, found the list of "errata", and sure enough, our code was hitting one of them. Not content with fixing that specific build, we found a set of compiler options under which the bug could not occur.
So it is not surprising that the Windows developers left the AARD code in the release: they had been testing Windows with this code for many weeks, if not months, and they knew for certain that Windows worked with it in place. Whether Windows would work without it was something they did not dare find out right before the release.
Source: https://habr.com/ru/post/103903/