
Microsoft recently made a gift to every programmer who likes digging into something interesting: it published the source code of MS-DOS v 1.1, MS-DOS v 2.0 and Word for Windows 1.1a. MS-DOS is written in assembler, so the analyzer cannot be applied to it. Word, however, is written in C. The Word 1.1a sources are almost 25 years old, but we still managed to check them. There is, of course, no practical value in this check. We did it just for fun.
Where to get the source code
Many readers will probably be less interested in this article than in the fact that the source code of MS-DOS v 1.1, v 2.0 and Word for Windows 1.1a can be downloaded. If you want to dig into the sources yourself, I refer you to the original source.
Press release: MS-DOS and Word for Computer History Museum.
Checking Word 1.1a

Figure 1. Word for Windows 1.1a.
Word for Windows 1.1a was released in 1990. On March 25, 2014, the code of this product became available to the public. Word was and remains one of Microsoft's flagship products. Like many others, I was curious to look inside a software product that contributed so much to Microsoft's commercial success.
I decided to check the Word 1.1a code with our tool, PVS-Studio, a static analyzer for C/C++ code. Naturally, this is not so easy. The analyzer is designed to work with projects developed in Visual Studio 2005 or later, and here I had C sources that are more than 20 years old. You could call those prehistoric times. Back then the C language standard was only just emerging (ANSI C was ratified in 1989), and each compiler went its own way. Fortunately, the Word 1.1a sources contained no exotic constructs and made little use of non-standard compiler extensions.
The analysis requires preprocessed files (*.i). Once you have them, you can use the PVS-Studio Standalone tool to run the analysis and study the diagnostic messages. Of course, the analyzer is not designed for 16-bit programs, but the results are quite enough to satisfy curiosity. There is no practical sense in thoroughly analyzing a 24-year-old project.
So the main obstacle was getting the preprocessed files. I asked a colleague to work some magic here, and he approached the problem very creatively: he ran the preprocessing with GCC 4.8.1. Probably nobody has ever abused the Word 1.1a sources this way before. Using GCC for this takes some imagination.
The most interesting part is that it worked out quite well. He wrote a small utility that ran the GCC 4.8.1 preprocessor on every file in the directory it was started from. As errors about missing header files came up, -I switches with the paths to the required files were added to the launch parameters. A couple of header files that could not be found were created as empty files. All the remaining #include problems were related to including resources, so those lines were commented out. The WIN macro was defined during preprocessing, because the code has separate branches for WIN and MAC.
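The utility itself is not shown in the article, so the following is only a rough sketch of what one preprocessing step might look like. The helper name build_preprocess_cmd, the include directory and the file names are all hypothetical; only the GCC switches (-E, -D, -I, -o) are real.

```c
#include <stdio.h>

/* Hypothetical helper: formats the GCC command line that preprocesses
 * one source file into a .i file. The include directory and file names
 * are placeholders, not paths from the real Word 1.1a tree. */
int build_preprocess_cmd(char *out, size_t cap,
                         const char *include_dir,
                         const char *src, const char *dst)
{
    /* -E     stop after the preprocessing stage
     * -DWIN  define the WIN macro (the code branches on WIN and MAC)
     * -I     add a directory to the header search path
     * -o     write the preprocessed output to dst */
    int n = snprintf(out, cap, "gcc -E -DWIN -I%s %s -o %s",
                     include_dir, src, dst);

    /* 1 on success, 0 if the command did not fit into the buffer */
    return n >= 0 && (size_t)n < cap;
}
```

A driver could call this for every .c file in a directory and pass the resulting string to system(); that part is left out here to keep the sketch short.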
After that, PVS-Studio Standalone and yours truly got down to business. I wrote out the suspicious code fragments and am ready to show them to you. But first, a few more words about the project.
Miscellaneous facts about the Word 1.1a code
The most complex functions
The following functions have the highest cyclomatic complexity:
- CursUpDown - 219;
- FIdle - 192;
- CmdDrCurs1 - 142.
#ifdef WIN23
While looking through the source code I came across "#ifdef WIN23" and smiled. I even wrote the fragment down. I thought it was a typo and that it should read #ifdef WIN32.
When I saw WIN23 a second time, I began to doubt. Then it dawned on me that I was looking at sources from 24 years ago. WIN23 apparently stands for Windows version 2.3.
Harsh times
In the code I came across this interesting line:
Assert((1 > 0) == 1);
It seems incredible that this condition could ever fail. However, if the check exists, someone had a reason to write it. In those days the language standard was not yet something you could rely on. As I understand it, it was considered good practice to verify that the compiler behaved the way programmers expected.
Of course, if we treat K&R as a standard, then in theory the condition ((1 > 0) == 1) always holds. But K&R was only a de facto standard and nothing more. This is a sanity check of the compiler.
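For comparison, standard C (from C89 onward) guarantees that a relational operator yields an int equal to exactly 0 or 1. A minimal sketch of the same sanity check as a function (the name comparison_yields_one is made up for this example):

```c
/* Returns 1 when the compiler evaluates a true comparison to exactly 1,
 * which is what standard C requires of relational operators. */
int comparison_yields_one(void)
{
    return (1 > 0) == 1;
}
```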
Analysis results
Now let's talk about the suspicious fragments I found in the code. I suspect that is why you are reading this article. Let's get started.
Infinite loop
void GetNameElk(elk, stOut)
ELK elk;
unsigned char *stOut;
{
  unsigned char *stElk = &rgchElkNames[mpelkichName[elk]];
  unsigned cch = stElk[0] + 1;
  while (--cch >= 0)
    *stOut++ = *stElk++;
}
PVS-Studio warning: V547 Expression '--cch >= 0' is always true. Unsigned type value is always >= 0. mergeelx.c 1188
The "while (--cch >= 0)" loop will never terminate. The variable 'cch' has an unsigned type, so no matter how many times it is decremented, it will always remain >= 0: once it reaches zero, the next decrement wraps it around to the largest unsigned value.
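One possible way to make the loop terminate is to test the counter before decrementing it. This is only a sketch: it assumes the string is length-prefixed (the first byte holds the character count, as stElk[0] + 1 suggests), and copy_st is a name invented for the example, not a function from the Word sources.

```c
/* Copies a length-prefixed string: src[0] holds the character count,
 * followed by that many characters. The count byte is copied as well. */
void copy_st(unsigned char *dst, const unsigned char *src)
{
    unsigned cch = src[0] + 1u;   /* length byte plus the characters */

    /* Compare first, then decrement: the loop stops when cch is 0
     * instead of wrapping around to the largest unsigned value. */
    while (cch-- > 0)
        *dst++ = *src++;
}
```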
Array overrun caused by a typo
uns rgwSpare0 [5];
DumpHeader()
{
  ....
  printUns ("rgwSpare0[0] = ", Fib.rgwSpare0[5], 0, 0, fTrue);
  printUns ("rgwSpare0[1] = ", Fib.rgwSpare0[1], 1, 1, fTrue);
  printUns ("rgwSpare0[2] = ", Fib.rgwSpare0[2], 0, 0, fTrue);
  printUns ("rgwSpare0[3] = ", Fib.rgwSpare0[3], 1, 1, fTrue);
  printUns ("rgwSpare0[4] = ", Fib.rgwSpare0[4], 2, 2, fTrue);
  ....
}
PVS-Studio warning: V557 Array overrun is possible. The '5' index is pointing beyond array bound. dnatfile.c 444
For some reason the first line reads Fib.rgwSpare0[5]. That is incorrect. The array has only 5 elements, so the largest valid index is 4. The value '5' is the result of a typo. Most likely the first line should use index zero:
printUns ("rgwSpare0[0] = ", Fib.rgwSpare0[0], 0, 0, fTrue);
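Typos of this kind are easier to avoid when the label and the index come from the same loop variable. The sketch below only illustrates that idea: format_spares and SPARE_COUNT are made-up names, and the extra arguments that the real printUns calls pass are deliberately ignored.

```c
#include <stdio.h>

#define SPARE_COUNT 5

/* Formats "rgwSpare0[i] = value" for every element into out.
 * The label and the array index are derived from the same i,
 * so they cannot drift apart the way hand-copied lines can. */
int format_spares(char *out, size_t cap, const unsigned spare[SPARE_COUNT])
{
    size_t used = 0;
    int i;

    for (i = 0; i < SPARE_COUNT; ++i) {
        int n = snprintf(out + used, cap - used,
                         "rgwSpare0[%d] = %u\n", i, spare[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return 0;               /* buffer too small */
        used += (size_t)n;
    }
    return 1;
}
```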
Uninitialized variable
FPrintSummaryInfo(doc, cpFirst, cpLim)
int doc;
CP cpFirst, cpLim;
{
  int fRet = fFalse;
  int pgnFirst = vpgnFirst;
  int pgnLast = vpgnLast;
  int sectFirst = vsectFirst;
  int sectLast = sectLast;
  ....
}
PVS-Studio warning: V573 Uninitialized variable 'sectLast' was used. The variable was used to initialize itself. print2.c 599
The variable 'sectLast' is initialized with its own value:
int sectLast = sectLast;
It looks like the variable 'vsectLast' should have been used for the initialization:
int sectLast = vsectLast;
There is one more identical error, apparently the result of copy-paste:
V573 Uninitialized variable 'sectLast' was used. The variable was used to initialize itself. print2.c 719
Undefined behavior
CmdBitmap()
{
  static int iBitmap = 0;
  ....
  iBitmap = ++iBitmap % MAXBITMAP;
}
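PVS-Studio warning: V567 Undefined behavior. The 'iBitmap' variable is modified while being used twice between sequence points. ddedit.c 107
I don't know how such code was regarded 20 years ago, but today it is considered bad practice, because it leads to undefined behavior: 'iBitmap' is modified twice within one expression.
Similar fragments:
- V567 Undefined behavior. The 'iIcon' variable is modified while being used twice between sequence points. ddedit.c 132
- V567 Undefined behavior. The 'iCursor' variable is modified while being used twice between sequence points. ddedit.c 150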
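A well-defined way to advance a cyclic index is to modify the variable only once per expression. In this sketch the value of MAXBITMAP is a placeholder (the real constant is not shown in the article) and next_bitmap_index is a hypothetical helper name.

```c
#define MAXBITMAP 4   /* placeholder value; the real constant is not shown */

/* Computes the next cyclic index without modifying its argument,
 * so the caller performs exactly one assignment. */
int next_bitmap_index(int iBitmap)
{
    return (iBitmap + 1) % MAXBITMAP;
}
```

The original statement would then read iBitmap = next_bitmap_index(iBitmap), or simply iBitmap = (iBitmap + 1) % MAXBITMAP.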
Incorrect call to the printf() function
ReadAndDumpLargeSttb(cb,err)
int cb;
int err;
{
  ....
  printf("\n - %d strings were read, "
         "%d were expected (decimal numbers) -\n");
  ....
}
PVS-Studio warning: V576 Incorrect format. A different number of actual arguments is expected while calling 'printf' function. Expected: 3. Present: 1. dini.c 498
printf() is a function with a variable number of arguments. You may pass it arguments after the format string, or you may pass none. Here the arguments were forgotten, so garbage will be printed instead of the two numbers.
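A corrected call simply has to supply one int for each %d. The snippet does not show which variables hold the two counts, so cRead and cExpected below are hypothetical names, and snprintf is used instead of printf only so that the result is easy to inspect.

```c
#include <stdio.h>

/* Formats the message with both values actually passed in.
 * cRead and cExpected are hypothetical names: the original snippet
 * does not show which variables hold the two counts. */
int format_sttb_message(char *out, size_t cap, int cRead, int cExpected)
{
    return snprintf(out, cap,
                    "\n - %d strings were read, "
                    "%d were expected (decimal numbers) -\n",
                    cRead, cExpected);
}
```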
Uninitialized pointers
One of the auxiliary utilities included in the Word sources contains something truly puzzling.
main(argc, argv)
int argc;
char * argv [];
{
  FILE * pfl;
  ....
  for (argi = 1; argi < argc; ++argi)
  {
    if (FWild(argv[argi]))
    {
      FEnumWild(argv[argi], FEWild, 0);
    }
    else
    {
      FEWild(argv[argi], 0);
    }
    fclose(pfl);
  }
  ....
}
PVS-Studio warning: V614 Uninitialized pointer 'pfl' used. Consider checking this function. eldes.c 87
The variable 'pfl' is initialized neither before the loop nor inside it, yet fclose(pfl) is called on every iteration. Formally this is undefined behavior, but in practice it could well have worked: the function would return an error status and the program would carry on.
Here is another dangerous function. Calling it will most likely crash the program.
FPathSpawn( rgsz )
char *rgsz[];
{
  char *rgsz0;
  strcpy(rgsz0, szToolsDir);
  strcat(rgsz0, "\\");
  strcat(rgsz0, rgsz[0]);
  return FSpawnRgsz(rgsz0, rgsz);
}
PVS-Studio warning: V614 Uninitialized pointer 'rgsz0' used. Consider checking the strcpy function. makeopus.c 961
The pointer 'rgsz0' is not initialized with anything, which does not stop the code from copying a string to wherever it happens to point.
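One way to avoid writing through an uninitialized pointer is to build the path in a buffer the caller owns and to check that it fits. This is a sketch under assumptions, not the fix the Word developers would have used: build_tool_path is a made-up name, and snprintf did not exist in the compilers of that era.

```c
#include <stdio.h>

/* Joins a tools directory and a program name with a backslash into a
 * caller-provided buffer. Returns 1 on success, 0 if the result would
 * not fit (in which case the buffer holds a truncated string). */
int build_tool_path(char *out, size_t cap,
                    const char *toolsDir, const char *name)
{
    int n = snprintf(out, cap, "%s\\%s", toolsDir, name);
    return n >= 0 && (size_t)n < cap;
}
```

In FPathSpawn the caller would declare a char array of sufficient size, fill it with such a helper, and only then pass it on to FSpawnRgsz.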
A typo in the condition
....
#define wkHdr 0x4000
#define wkFtn 0x2000
#define wkAtn 0x0008
....
#define wkSDoc (wkAtn+wkFtn+wkHdr)
CMD CmdGoto (pcmb)
CMB * pcmb;
{
  ....
  int wk = PwwdWw(wwCur)->wk;
  if (wk | wkSDoc)
    NewCurWw((*hmwdCur)->wwUpper, fTrue);
  ....
}
PVS-Studio warning: Consider inspecting the condition. The '(0x0008 + 0x2000 + 0x4000)' argument of the '|' bitwise operation contains a non-zero value. dlgmisc.c 409
The condition (wk | wkSDoc) is always true, because OR-ing any value with the non-zero constant wkSDoc gives a non-zero result. Most likely the intended code was:
if (wk & wkSDoc)
In other words, the operators | and & were mixed up.
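The difference between the two operators is easy to see on the constants from the snippet. In this sketch the macros are copied from the fragment above, while has_sdoc_bit and always_true are helper names invented for the illustration.

```c
#define wkHdr  0x4000
#define wkFtn  0x2000
#define wkAtn  0x0008
#define wkSDoc (wkAtn + wkFtn + wkHdr)

/* Non-zero only when wk has at least one of the wkSDoc bits set. */
int has_sdoc_bit(int wk)
{
    return (wk & wkSDoc) != 0;
}

/* The expression from the original code: non-zero for every wk,
 * because wkSDoc itself is non-zero. */
int always_true(int wk)
{
    return (wk | wkSDoc) != 0;
}
```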
And finally, a long but simple example.
int TmcCharacterLooks(pcmb)
CMB * pcmb;
{
  ....
  if (qps < 0)
  {
    pcab->wCharQpsSpacing = -qps;
    pcab->iCharIS = 2;
  }
  else if (qps > 0)
  {
    pcab->iCharIS = 1;
  }
  else
  {
    pcab->iCharIS = 0;
  }
  ....
  if (hps < 0)
  {
    pcab->wCharHpsPos = -hps;
    pcab->iCharPos = 2;
  }
  else if (hps > 0)
  {
    pcab->iCharPos = 1;
  }
  else
  {
    pcab->iCharPos = 1;
  }
  ....
}
PVS-Studio warning: V523 The 'then' statement is equivalent to the 'else' statement. dlglook1.c 873
When handling the variable 'qps', the code writes the following values into 'pcab->iCharIS': 2, 1, 0.
The variable 'hps' is handled in the same way, but the values written into 'pcab->iCharPos' look suspicious: 2, 1, 1.
Most likely this is a typo, and the last branch should have assigned zero.
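If both blocks were indeed meant to follow the same 2/1/0 pattern (an assumption, since only the authors knew the intent), a shared helper would rule out this kind of copy-paste slip. The name sign_to_index is invented for the example.

```c
/* Maps a signed value to the index pattern used for qps:
 * negative -> 2, positive -> 1, zero -> 0. */
int sign_to_index(int value)
{
    if (value < 0)
        return 2;
    if (value > 0)
        return 1;
    return 0;
}
```

Both branches could then assign sign_to_index(qps) and sign_to_index(hps) respectively, keeping only the negation of the stored value in the "less than zero" case.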
Conclusion
Rather few strange fragments were found. There are two reasons for that. First, the code struck me as well written and quite clear. Second, the analysis was incomplete, and there is no practical need to teach the analyzer the peculiarities of old C.
I hope I gave you a few minutes of interesting reading. Thank you for your attention. And do try the PVS-Studio analyzer on your own code.
This article is also available in English.
If you want to share it with an English-speaking audience, please use the link to the translation: Andrey Karpov. Archeology for Entertainment, or Checking Microsoft Word 1.1a with PVS-Studio.