
Why was the AARD code not removed from the release?

Beta versions of Windows 3.1 contained hidden, encrypted code that, when Windows was launched on DR-DOS, displayed a cryptic message about a fictitious error.

Microsoft decided not to play such tricks in the release, but the checking code and the message itself were not removed: they remained inside WIN.COM, and changing a single byte is enough to make the AARD code run again on every startup.

Why was it left in? Did Microsoft expect to re-enable these questionable checks one day?
Of course not. Even the message in the release remained unchanged: “Please contact Windows 3.1 beta support.” If the message had really been meant to be shown, it would have been updated once beta testing ended.
So why leave meaningless code in the release if it is never executed?

Larry Osterman explains:
Bypassing the code by inserting a JMP instruction is fairly safe; but if you delete the code, the offsets of the functions that remain will change, which makes the result new, untested code. Beta testing was already over, so the developers tried not to replace tested code with untested code.
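The difference between the two options can be illustrated with a toy model. The sketch below is not based on the real WIN.COM: the function names and sizes are invented, and it only shows that deleting a block moves everything placed after it, while skipping the block leaves the layout untouched.

```python
# Hypothetical layout of a binary: (function name, size in bytes).
# All names and sizes are invented for illustration.
LAYOUT = [
    ("startup", 40),
    ("aard_check", 120),
    ("init_memory", 64),
    ("main_loop", 200),
]


def offsets(layout):
    """Return {name: offset} for functions laid out back to back."""
    result = {}
    position = 0
    for name, size in layout:
        result[name] = position
        position += size
    return result


original = offsets(LAYOUT)

# Option A: keep the bytes and jump over them. The layout does not change.
bypassed = offsets(LAYOUT)

# Option B: delete the dead code. Everything placed after it moves.
removed = offsets([entry for entry in LAYOUT if entry[0] != "aard_check"])

moved = sorted(name for name in removed if removed[name] != original[name])
print(moved)  # ['init_memory', 'main_loop']
```

In this model every function located after the deleted block ends up at a new address, which is exactly the kind of change that was never exercised during beta testing.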

But why would such a minor code change affect anything? Chris Pratley gives an example:
Even re-linking the code, not just recompiling it, can introduce unexpected bugs. A few years ago, when we were working on the Asian release of Word 97 and considered all the code finished, we started on its final optimization. We have a tool that collects performance statistics for individual functions and reorders them in the file optimally; the code of the functions themselves does not change. After the optimization we handed the build over for final testing and, lo and behold, a bug turned up. When testers used a certain feature of the program, the optimized code crashed on some machines. On the same machines, the same feature had worked fine before the optimization.

We tried to debug it; but if we added debug information to the optimized version, it no longer crashed. We tried debugging without the extra information; but even when we merely ran the program under a debugger, it stopped crashing. Whatever we did to find the cause of the crashes, the program did not crash; yet it crashed every single time we left it alone.

We were about to resort to an ICE (a hardware debugger) when we noticed a pattern: the program crashed only on Pentium processors running at 150 MHz or below, although not on all of them. That was a clue. We went to the Intel site and looked at the list of "errata" (as they call their bugs). Bingo! Pentium processors had an erratum that led to a crash under certain conditions. Very specific conditions: a conditional jump located 33 bytes after a JMP, with the JMP itself sitting at the edge of a memory page. This erratum has been fixed since the release of the 150 MHz Pentium.

In fact, chips have bugs quite often, although few of them become well known. After all, a chip's microcode is written by people, not gods. Usually, as soon as chip manufacturers find a bug, they report it to compiler vendors, so compilers generate code that works around the known bugs, and ordinary programmers never run into them. It turned out that our compiler was slightly out of date and did not yet know about this particular bug.

When a representative from Intel confirmed that the mysterious crashes could be caused by this very bug, we went through the optimized code, found the offending instruction sequence, and manually rearranged three bytes so that the distance between the two jumps became 34 bytes. The crashes stopped.

Now, whenever someone assures me that their fix is “absolutely safe,” I tell this story. No code change can be completely safe.
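The condition Chris describes can be written down as a small check. This is only a sketch of the story as told here, not Intel's actual erratum text: the 4 KiB page size, the reading of "edge of the page" as "the JMP ends exactly on a page boundary", the function name and the addresses are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB x86 pages


def matches_pattern(jmp_end, jcc_addr, page_size=PAGE_SIZE):
    """Check the pattern from the story (hypothetical interpretation).

    jmp_end  -- address of the first byte after the JMP instruction
    jcc_addr -- address of the conditional jump that follows
    """
    at_page_edge = jmp_end % page_size == 0
    return at_page_edge and (jcc_addr - jmp_end) == 33


# Invented addresses: the JMP ends exactly at the start of a new page.
before_fix = matches_pattern(0x2000, 0x2000 + 33)
# After moving the code so that the distance is 34 bytes.
after_fix = matches_pattern(0x2000, 0x2000 + 34)

print(before_fix, after_fix)  # True False
```

A shift of a single byte is enough to leave the pattern, which is why reordering functions or adding debug information could make the crash appear or disappear.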

I ran into a similar bug myself: when we were working on Exchange 5.5, one of the builds kept crashing on a test machine, and always on the same one. We spent several days trying to find the cause, without success: the bug disappeared at the slightest code change, yet it showed up every time we stopped debugging. In the end we, like Chris, turned to the errata list; and indeed, our code was hitting one of the errata. Not content with fixing that one build, we found a set of compiler options with which the bug could not occur.

So it is not surprising that the Windows developers left the AARD code in the release: they had been testing Windows with this code for many weeks, if not months, and they knew for certain that with this code in place, Windows worked. Whether Windows would work without it was something they did not dare find out right before the release.

The AARD code was not the only interesting feature of the Windows 3.1 beta.
For the first time, Windows intercepted Ctrl-Alt-Del and showed its own “let me help you close a hung program” screen. Those who used Windows 3.x remember that this screen was blue; in the beta, it was plain black.

If the user confirmed closing the hung application but Windows could not do it (for example, because there were no hung applications in the system), the only thing Windows could offer was a reboot.

In the release of Windows 3.1 this oddity was fixed: now, if there are no hung applications, Windows offers to leave everything as it is.

I have read another similar story, by Raymond Chen if I remember correctly; I could not find the source this time. Chen said that an unused variable was found in one of the ancient Windows builds. It was removed, and a function in a completely different place in the code stopped working. The variable was put back, and the bug was gone.

This time the chips were fine: the fault really lay with the programmers. The broken function contained a variable that was not always initialized before use. The value of the uninitialized variable was whatever garbage happened to be in the stack slot allocated for it; and it so happened that this slot always contained zero. Zero was a suitable value for that variable, so the program kept working.

When the unused variable was deleted, all the other variables shifted along the stack, and the slot allocated to the uninitialized variable now held some other value. This time the program crashed.
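The mechanism can be imitated with a toy model of the stack. This is a simulation, not the code from Chen's story: the stale stack contents, the variable names and the slot assignment are invented, and real compilers lay out stack frames in more complicated ways.

```python
# Stale values left on the stack by earlier calls (invented for illustration).
STALE_STACK = [0x5A, 0x00]


def slot_of(name, local_vars):
    """Locals get stack slots in declaration order (a simplifying assumption)."""
    return local_vars.index(name)


def call_function(local_vars):
    """Use 'status' without initializing it: it holds whatever is in its slot."""
    status = STALE_STACK[slot_of("status", local_vars)]
    return "works" if status == 0 else "crashes"


# With the unused variable present, 'status' lands on a slot that holds zero.
with_unused = call_function(["unused", "status"])
# Deleting the unused variable shifts 'status' onto a slot that holds garbage.
without_unused = call_function(["status"])

print(with_unused, without_unused)  # works crashes
```

The function is equally wrong in both cases; only the accidental contents of the slot it reads decide whether the bug is visible.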

To the surprise of Chen's colleagues, the ill-fated function had been written many months earlier and had even shipped in the previous release of Windows! Thanks to a lucky coincidence it had always worked correctly despite the bug, which is why the bug went unnoticed for so long.

Chen told this story to explain why Windows XP contains code fragments that no one has touched since Windows 3.0. No one knows how many invisible bugs are hiding there; but everyone knows one thing for certain: this code works.

Source: https://habr.com/ru/post/103903/
