
Deep in the Win32 API there is a SecureZeroMemory function with a very terse description: it overwrites a memory region with zeros, and it is designed so that the compiler never removes the call when optimizing the code. The description also says that this function should be used to overwrite memory that previously held passwords and cryptographic keys.
One question remains: why? You can find lengthy arguments about the risk of the program's memory ending up in the paging file, a hibernation file, or a crash dump, where an attacker could find it. That sounds like paranoia; not every attacker can get hold of those files.
In fact, there are far more ways to get at data the program forgot to overwrite, and sometimes no access to the machine is needed at all. Below we walk through an example, and everyone can decide for themselves how justified the paranoia is.
All examples are in pseudocode suspiciously similar to C++. The code is verbose and not particularly clean, but it will become clear that cleaner code would not fare much better.
So: in a function far, far away we obtain an encryption key, a password, or a credit card number (from here on, simply "the secret"), use it, and do not overwrite it:
```cpp
{
    const int secretLength = 1024;
    WCHAR secret[secretLength] = {};
    obtainSecret( secret, secretLength );
    processWithSecret( what, secret, secretLength );
}
```
In another function, completely unrelated to the previous one, one instance of our program requests a file with a given name from another instance. It uses RPC, a technology as ancient as the dinosaurs, available on many platforms and used extensively by Windows for inter-process and machine-to-machine communication.
Usually, to use RPC, you write an interface description in IDL, declaring a method of roughly the following form.
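A declaration consistent with the generated header shown below might look roughly like this (the BYTE_PIPE typedef and the direction attributes are assumptions, not the article's original IDL):

```idl
typedef pipe byte BYTE_PIPE;           // a stream of bytes of arbitrary length

error_status_t rpcRetrieveFile(
    [in]  const WCHAR fileName[1024],  // file name, fixed-size character array
    [out] BYTE_PIPE   filePipe );      // stream carrying the file contents
```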
Here the second parameter has a special pipe type that allows transmitting data streams of arbitrary length; the first parameter is a character array holding the file name.
This description is compiled by the MIDL compiler, and we get a header file (.h) with the function declaration:
```cpp
error_status_t rpcRetrieveFile(
    handle_t IDL_handle,
    const WCHAR fileName[1024],
    BYTE_PIPE filePipe );
```
Here MIDL has added a service parameter (the binding handle), while the second and third parameters match those in the IDL description.
Call this function:
```cpp
void retrieveFile( handle_t binding )
{
    WCHAR remoteFileName[MAX_FILE_PATH];
    retrieveFileName( remoteFileName, MAX_FILE_PATH );

    CBytePipeImplementation pipe;
    rpcRetrieveFile( binding, remoteFileName, pipe );
}
```
Everything is fine: retrieveFileName() produces a string no longer than MAX_FILE_PATH − 1 characters, terminated with a null character (the terminator is not forgotten); the called side receives the string and works with it, builds the full path to the file, opens it, and sends its contents.
Everyone is happy; several releases of the product ship with this code, and nobody has noticed the elephant yet. Here it is: from the C++ point of view, the function parameter
const WCHAR fileName[1024]
is not an array but a pointer to the array's first element. The rpcRetrieveFile() function is merely a stub, also generated by MIDL. It packs all of its parameters and always calls the same WinAPI function, NdrClientCall2(), whose meaning is "Windows, please perform an RPC call with these parameters", passing them to NdrClientCall2() as a variable argument list. One of the first parameters is a format string generated by MIDL from the IDL description, very much like good old printf().
NdrClientCall2() carefully inspects the format string it receives and packs the parameters for transmission to the other side (this is called marshalling). The type of each parameter is specified next to it, and each parameter is packed according to its type. In our case the fileName parameter is the address of the first element of the array, and its type is "an array of 1024 WCHAR elements".
Now suppose the code makes two calls in a row:
```cpp
processWithSecret( whatever );
retrieveFile( binding );
```
The processWithSecret() function uses 2 kilobytes of the stack to store the secret and, on exit, forgets about them. Then retrieveFile() is called; it obtains a file name 18 characters long (18 characters plus the terminating null makes 19 WCHARs, i.e. 38 bytes). The file name is again stored on the stack, and most likely it lands in exactly the same memory area that held the secret in the first function.
Then the remote call happens, and the packing function faithfully packs the entire array into a packet: not 38 bytes but all 2048, and that packet is then transmitted over the network.
EXTREMELY UNEXPECTEDLY
The secret is transmitted over the network. The program never planned to transmit the secret anywhere, yet there it goes. Such a defect can be far more convenient to exploit than digging through a paging file. Who is being paranoid now?
The example above looks rather elaborate. Here is similar code that you can try on codepad.org:
```cpp
#include <cstdio>
#include <cstring>

const int bufferSize = 32;

void first()
{
    char buffer[bufferSize];
    memset( buffer, 'A', sizeof( buffer ) );
}

void second()
{
    char buffer[bufferSize];
    memset( buffer, 'B', bufferSize / 2 );
    printf( "%s", buffer ); // reads past the data second() wrote
}

int main()
{
    first();
    second();
}
```
Its behavior is undefined. At the time this post was written, the output was a string of 16 'B' characters followed by 16 'A' characters.
Now is the time for pitchforks, torches, and angry cries that nobody in their right mind uses plain arrays, that you must use std::vector, std::string, and a UniversalEverything class that handles memory "correctly", followed by holy wars stretching to at least nine thousand comments.
In fact, that would not have helped here: deep inside, the RPC packing function would still read more data than the calling code wrote. As a result it would read data at the adjacent addresses, or, in some cases, crash on an invalid memory access. And the data at those adjacent addresses could, once again, end up transmitted over the network.
Who is to blame? As usual, the developer: he misunderstood how rpcRetrieveFile() works with the parameters it receives. The result is undefined behavior, which in this case leads to uncontrolled transmission of data over the network. It is fixed either by changing the RPC interface and editing the code on both sides, or by using an array of the full expected size and overwriting it completely before copying the parameter into it.
SecureZeroMemory() would have helped here too: if the first function had overwritten the secret before returning, the bug in the second one would at worst transmit an already overwritten array. That makes it harder to earn a Darwin Award.
Dmitry Mescheryakov,
product department for developers