Real-Mode Windows Memory Management

Recently, Raymond Chen completed a series of posts, begun a year and a half earlier, about managing virtual memory without any support from the processor: Windows up to and including version 3.0 supported the real mode of the 8086. In this mode, translating a “virtual” address (the one the program sees) into a physical one (the one placed on the system bus) is done by simply adding the segment, shifted left by four bits, to the offset: no “access checks”, no “invalid addresses”. Every address is accessible to everyone. Even so, several programs could run in Windows at once without getting in each other's way; Windows could move their segments around in memory, discard unused ones, and load them back as needed, possibly at different addresses.
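The address translation described above can be sketched in a few lines of C. This is only an illustration: the function names `physical_address` and `same_byte` are made up for this example, and the 20-bit wraparound models the original 8086 with its 20 address lines.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode translation: the 16-bit segment is shifted left by four bits
   (multiplied by 16) and added to the 16-bit offset. Nothing is checked.
   The mask models the 8086's 20 address lines: the sum wraps at 1 MB. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Many different segment:offset pairs name the same physical byte,
   which is one reason no per-program protection is possible here. */
int same_byte(uint16_t seg_a, uint16_t off_a, uint16_t seg_b, uint16_t off_b)
{
    return physical_address(seg_a, off_a) == physical_address(seg_b, off_b);
}
```

For instance, 1234:0010 and 1235:0000 both resolve to physical address 0x12350.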

(I wonder whether the usual holy warriors who insist “it was a graphical shell, not an operating system” are aware of these remarkable abilities?)

So how did it manage this?

Data management

There was no swapping in real-mode Windows. Unmodifiable data (for example, resources) was simply discarded from memory and, when needed again, reloaded from the executable file. Modified data could not be discarded, but it could (like any other data) be moved: an application refers to memory blocks not by address but by handle; when it needs to access the data, it “locks” the block, obtaining its address, and afterwards “unlocks” it so that Windows can move it if necessary. Something similar appeared a dozen years later in .NET, under the name pinning.
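The handle-and-lock discipline can be illustrated with a toy movable heap. This is a minimal sketch, not the Windows implementation: the names `toy_alloc`, `toy_lock`, `toy_unlock`, `toy_free`, and `toy_compact`, the fixed arena size, and the compaction strategy are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE  64
#define MAX_HANDLES 8

typedef struct {
    size_t offset;      /* current position of the block in the arena */
    size_t size;
    int    lock_count;  /* > 0 means the block must not be moved */
    int    in_use;
} Block;

static unsigned char arena[ARENA_SIZE];
static Block  blocks[MAX_HANDLES];
static size_t arena_top;            /* first free byte above all blocks */

/* Allocates a movable block; returns a handle, or -1 on failure. */
int toy_alloc(size_t size)
{
    if (size == 0 || arena_top + size > ARENA_SIZE)
        return -1;
    for (int h = 0; h < MAX_HANDLES; h++) {
        if (!blocks[h].in_use) {
            blocks[h] = (Block){ arena_top, size, 0, 1 };
            arena_top += size;
            return h;
        }
    }
    return -1;
}

void toy_free(int h) { blocks[h].in_use = 0; }

/* Locking yields an address that stays valid until the matching unlock. */
void *toy_lock(int h)
{
    blocks[h].lock_count++;
    return arena + blocks[h].offset;
}

void toy_unlock(int h)
{
    assert(blocks[h].lock_count > 0);
    blocks[h].lock_count--;
}

/* Compaction: visit blocks in address order and slide every unlocked
   block down over the free space below it. Locked blocks stay put. */
void toy_compact(void)
{
    size_t cursor = 0;      /* lowest byte not yet claimed by a block */
    size_t done_below = 0;  /* blocks that started below this are done */
    for (;;) {
        int next = -1;
        for (int h = 0; h < MAX_HANDLES; h++)
            if (blocks[h].in_use && blocks[h].offset >= done_below &&
                (next < 0 || blocks[h].offset < blocks[next].offset))
                next = h;
        if (next < 0)
            break;
        Block *b = &blocks[next];
        done_below = b->offset + b->size;
        if (b->lock_count == 0 && b->offset != cursor) {
            memmove(arena + cursor, arena + b->offset, b->size);
            b->offset = cursor;
        }
        cursor = b->offset + b->size;
    }
    arena_top = cursor;
}
```

The point of the design is that the application keeps only the handle between accesses: an address obtained from `toy_lock` may be stale after the block is unlocked and the heap is compacted, while the handle stays valid.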
The GlobalLock / GlobalUnlock and LockResource / FreeResource functions have been preserved in the Win32 API for compatibility with those ancient times, although in Win32 memory blocks (including resources) are never moved.

LockSegment and UnlockSegment (which lock / unlock memory by address rather than by handle) lingered in the documentation for a while, marked “obsolete, do not use”, but by now not even a memory of them remains.

For those who needed to lock memory for a long time there was also the GlobalWire function: “so that the block does not sit in the middle of the address space, move it to the low end of memory and lock it there”. Its counterpart, GlobalUnwire, was completely equivalent to GlobalUnlock. Surprisingly, this pair of functions is still alive in kernel32.dll, although it has already been removed from the documentation. Today they simply call GlobalLock / GlobalUnlock.

In protected-mode Windows, the GlobalLock function was replaced with a stub: Windows can now shuffle memory blocks without changing the “virtual address” (selector:offset) that the application sees, which means the application no longer needs to lock non-discardable objects. In other words, locking now prevents a block from being discarded, but does not prevent it from being moved (invisibly to the application). So for those who really do need data pinned in physical memory (for example, to work with external devices), a pair of functions, GlobalFix / GlobalUnfix, was added. Just like GlobalWire / GlobalUnwire, these functions became useless in Win32; they too have been removed from the documentation but remain in kernel32.dll, where they simply call GlobalLock / GlobalUnlock.

Code management

This is where the trickiest part begins. Blocks of code, just like unmodifiable data, were discarded from memory and later reloaded from the executable file. But how did Windows make sure that programs did not try to call functions in discarded blocks? Functions, too, could have been accessed through handles, with a hypothetical LockFunction called before every function call. But recall that many functions run a message loop (for example, to show a window or execute DDE commands), and they too could be discarded in the meantime, because their code is not actually needed while the loop runs. With “function handles”, however, a function's segment would not be released until the function returned control to its caller.

Instead, Windows starts from the assumption that any function that is not executing right now can be discarded; and if the code executing right now is the Windows memory manager itself, then any function at all can be discarded. References to a discarded function can remain either in program code or on the stack, if the function had not yet returned when it was discarded.

So Windows walks the stacks of all running tasks (that is what execution contexts were called in Windows before processes and threads were separated), finds return addresses that point into discarded segments, and replaces them with the addresses of reload thunks: stubs that load the needed segment from the executable file and transfer control into it as if nothing had happened.

For Windows to be able to walk the stack, programs must keep it in the correct format: no FPO, and each stack frame must start with a BP pointer to the frame of the calling function. (Since the stack consists of 16-bit words, a BP value is always even.) In addition, Windows has to distinguish near (intra-segment) calls from far (inter-segment) calls on the stack, and it can ignore near calls, since they certainly do not lead into a discarded segment. So it was decided that an odd BP value on the stack marks a far call; that is, every far function must begin with the prologue INC BP; PUSH BP; MOV BP,SP and end with the epilogue POP BP; DEC BP; RETF. (In fact, the prologue and epilogue were more complicated, but that is beside the point here.)
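The stack walk can be sketched on a simulated 16-bit stack. This is a simplified model under stated assumptions: the `Stack16` type and the `patch_far_returns` function are hypothetical, the BP chain is assumed to end with a saved value whose even part is 0, and a single thunk address stands in for all patched returns (a real reload thunk would also have to remember where to resume).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 32

/* A toy 16-bit stack: the word at byte address A lives in words[A / 2]. */
typedef struct {
    uint16_t words[STACK_WORDS];
} Stack16;

/* Returns a pointer to the word at an even byte address in the stack. */
static uint16_t *word_at(Stack16 *s, uint16_t addr)
{
    assert(addr % 2 == 0 && addr / 2 < STACK_WORDS);
    return &s->words[addr / 2];
}

/* Walks the BP chain starting at bp. Frame layout assumed here:
     [bp]     saved caller BP (odd if this frame was entered by a far call)
     [bp + 2] return offset
     [bp + 4] return segment (far frames only)
   Far return addresses into discarded_seg are redirected to the thunk.
   Returns the number of return addresses patched. */
int patch_far_returns(Stack16 *s, uint16_t bp, uint16_t discarded_seg,
                      uint16_t thunk_seg, uint16_t thunk_off)
{
    int patched = 0;
    while (bp != 0) {
        uint16_t saved_bp = *word_at(s, bp);
        if (saved_bp & 1) {                      /* odd: far frame */
            uint16_t *ret_off = word_at(s, (uint16_t)(bp + 2));
            uint16_t *ret_seg = word_at(s, (uint16_t)(bp + 4));
            if (*ret_seg == discarded_seg) {
                *ret_off = thunk_off;
                *ret_seg = thunk_seg;
                patched++;
            }
        }
        /* Near frames are skipped: the word at bp + 4 is not a segment. */
        bp = (uint16_t)(saved_bp & 0xFFFE);      /* strip the far marker */
    }
    return patched;
}
```

The odd/even marker is what lets the walker skip a near frame even when the word at bp + 4 happens to equal the discarded segment's value: without it, an ordinary local variable or argument could be mistaken for a return segment and corrupted.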

That settles references from the stack; but what about references from other code segments? Of course, Windows cannot scan all of memory, find every call to a discarded function, and replace them all with reload thunks. Instead, inter-segment calls are compiled on the assumption that the called function may not be in memory, and they actually call a stub in the module's entry table. The stub consists of an int 3fh instruction and three more service bytes that say where to find the function. The int 3fh handler finds these service bytes at its return address; determines the needed segment; loads it into memory if it is not already loaded; and finally overwrites the stub in the entry table with an absolute jump, jmp xxxx:yyyy, to the function body, so that subsequent calls to the function are slowed down only by one extra inter-segment jump, with no interrupt involved.

Now, when Windows discards a function, all it has to do is replace the patched-in jump with the int 3fh stub again in the module's entry table. The system has no need to hunt for every call to the discarded function: they were all found at compile time! The module's entry table lists all far functions for which the compiler knows inter-segment calls exist (this includes, for example, exported functions and WinMain), as well as all far functions whose pointer was passed somewhere, which means they can be called from anywhere, even from outside the program's code (this includes WndProc, EnumFontFamProc, and other callback functions).
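The life cycle of a stub (int 3fh, patched to jmp, patched back on discard) can be modeled as a small state machine. This is a sketch under assumptions: the `EntryStub` and `Segment` types, the function names, and the interpretation of the three service bytes as a one-byte segment number plus a two-byte offset are illustrative choices for this example, and “loading” a segment only picks a new base address.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SEGMENTS 4
#define NUM_STUBS    3

typedef struct {
    int      loaded;
    uint16_t base;        /* where the segment currently sits in memory */
    int      load_count;  /* how many times it was read from the file */
} Segment;

typedef enum { STUB_INT3F, STUB_JMP } StubKind;

typedef struct {
    StubKind kind;
    uint8_t  seg_no;   /* service byte: which segment holds the function */
    uint16_t offset;   /* service bytes: offset of the function in it */
    uint16_t jmp_seg;  /* target segment, valid when kind == STUB_JMP */
} EntryStub;

Segment segments[NUM_SEGMENTS];
EntryStub entry_table[NUM_STUBS] = {
    { STUB_INT3F, 1, 0x0010, 0 },
    { STUB_INT3F, 1, 0x0200, 0 },
    { STUB_INT3F, 2, 0x0040, 0 },
};

/* Pretends to read the segment from the executable; each reload lands
   at a different base to mimic "possibly at different addresses". */
static void load_segment(uint8_t seg_no)
{
    Segment *s = &segments[seg_no];
    s->base = (uint16_t)(0x1000 * (seg_no + 1) + 0x10 * s->load_count);
    s->loaded = 1;
    s->load_count++;
}

/* Calls through stub i and returns the far address (segment:offset packed
   into 32 bits) that control ends up at. An int 3fh stub is resolved
   once and then overwritten with a direct jump. */
uint32_t call_through_stub(int i)
{
    assert(i >= 0 && i < NUM_STUBS);
    EntryStub *st = &entry_table[i];
    if (st->kind == STUB_INT3F) {
        if (!segments[st->seg_no].loaded)
            load_segment(st->seg_no);
        st->kind = STUB_JMP;
        st->jmp_seg = segments[st->seg_no].base;
    }
    return ((uint32_t)st->jmp_seg << 16) | st->offset;
}

/* Discarding a segment only touches the entry table, never the callers. */
void discard_segment(uint8_t seg_no)
{
    segments[seg_no].loaded = 0;
    for (int i = 0; i < NUM_STUBS; i++)
        if (entry_table[i].seg_no == seg_no)
            entry_table[i].kind = STUB_INT3F;
}
```

Callers always target the stub's fixed location, so the number of places that need patching on discard is bounded by the size of the entry table, no matter how many call sites exist.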

Wherever a pointer to a far function is needed, a pointer to its stub is passed instead; so the addresses obtained from GetWindowLong(GWL_WNDPROC) and similar calls also point to the stub rather than to the function body. Even GetProcAddress plays along: instead of the function's address, it returns the address of its stub in the DLL's entry table. (In Win32, an analogue of the entry table survives only for DLLs, where it is called the “export table”.) Static inter-module calls (calls to functions imported from a DLL) are resolved using the same GetProcAddress, and so they end up calling a stub in exactly the same way. Either way, when a function is discarded it is enough to patch the stub, and the calling code itself does not need to be touched.

All this machinery with relocatable code segments was inherited by Windows from the overlay linker for DOS. The whole scheme is said to have first appeared, in exactly this form, in the Zortech C compiler, and then in Microsoft C. When the executable file format for Windows was created, the existing overlay format for DOS was taken as its basis.

But how does Windows choose which segment to discard? Choosing at random would be risky: we might hit code that has only just executed and would have to be loaded right back. So Windows keeps something like “accessed bits” for code segments: knowing that every inter-segment function call passes through its stub, the designers came up with the idea of inserting, before the int 3fh (or the jmp that replaces it), the instruction sar byte ptr cs:[xxx], 1, which resets a counter byte from 1 to 0 on every call of the function. This instruction happens to take exactly five bytes, so the existing executable file format can be kept, with the int 3fh stubs loaded into every other slot, interleaved with the counter instructions.

The counters for all code segments are initialized to 1, and every 250 ms Windows goes around all modules, collects the updated values, and reorders the code segments in its LRU list. Accesses to data segments can be tracked without any tricks: every such access is marked by an explicit call to GlobalLock or a similar function. So when the time comes to discard a segment in order to free up memory, Windows tries to discard the segment that has gone unused the longest: either the code segment whose counter has gone longest without being reset to 0, or the data segment that has gone longest without being locked.
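The counter-byte scheme can be sketched as follows. This is a simplified model, not the actual Windows bookkeeping: instead of reordering an LRU list, it keeps a per-segment count of idle sweeps, and the names `stub_called`, `sweep`, and `pick_victim` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CODE_SEGS 4

/* One counter byte per code segment: 1 = not called since last sweep. */
uint8_t  access_byte[NUM_CODE_SEGS] = { 1, 1, 1, 1 };

/* Number of consecutive sweeps in which the segment was not called. */
unsigned idle_sweeps[NUM_CODE_SEGS];

/* What "sar byte ptr cs:[xxx], 1" does on every call through a stub:
   shifting right turns 1 into 0 and leaves 0 unchanged. */
void stub_called(int seg)
{
    assert(seg >= 0 && seg < NUM_CODE_SEGS);
    access_byte[seg] = (uint8_t)(access_byte[seg] >> 1);
}

/* The periodic pass (every 250 ms in the article): collect the bytes,
   update each segment's age, and re-arm the counters to 1. */
void sweep(void)
{
    for (int i = 0; i < NUM_CODE_SEGS; i++) {
        if (access_byte[i] == 0)
            idle_sweeps[i] = 0;     /* used during the last interval */
        else
            idle_sweeps[i]++;       /* idle for one more interval */
        access_byte[i] = 1;
    }
}

/* Picks the segment that has been idle longest (lowest index on ties). */
int pick_victim(void)
{
    int best = 0;
    for (int i = 1; i < NUM_CODE_SEGS; i++)
        if (idle_sweeps[i] > idle_sweeps[best])
            best = i;
    return best;
}
```

The per-call cost is a single shift instruction with no branch, which is why a one-byte flag is used rather than a real call counter: the expensive work of aging and ordering is deferred to the periodic sweep.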

The Windows 1.0-2.1 advertisements are taken from GUIdebook.

Source: https://habr.com/ru/post/154713/
