Developing your own PE packer

Today we will talk about developing our own packer for Windows executable files in C++.

A long time ago, before Windows XP even existed, we went looking for information about packers and ended up deep in the source code of the then-young UPX. But either our brains were synthesizing less acetylcholine than they needed, or UPX was already very dull by then; either way, we got nothing out of those sources. Matt Pietrek helped more. Today information is much easier to come by. Almost everything is out there. Even the source code of a perfectly real banking Trojan can be downloaded (Zeus 2.0.8.9). For that matter, even the Windows source code has long been public (Windows 2000).
There is information about packers too, but it is mostly research that approaches development from a different angle than we would like. An excellent example is the two-part article “About Packers for the Last Time”, written by the well-known gurus Volodya and NEOx.
We, in turn, will try to give specific, step-by-step information about developing the simplest possible, yet easily modified, PE packer.

Algorithm

Take notepad.exe, for example. In its usual 32-bit form it weighs about 60 KB. We want to shrink it significantly while keeping all of its functionality. What do we do? For a start, we read the file from the first byte to the last into an array. Now we can do anything with it, and what we want is to compress it. We hand it to some simple compressor and get back an array of, say, 20 KB instead of 60 KB. That is nice, but in compressed form the image of our Notepad is just a set of high-entropy bytes: it is not an executable, and it cannot be launched by writing it to a file and double-clicking. The compressed array needs a carrier (a loader): a very small executable file to which we attach the array and which unpacks and runs it. We write the loader, compile it, and then append the compressed Notepad to its end. When the resulting file (only slightly larger than the compressed Notepad alone) is launched, it finds the packed image inside itself, unpacks it, parses its structure, and runs it.
As you can see, the process we have to automate is not too complicated. We just need to write two programs: a loader and the packer itself.
The packer's algorithm:

read the PE file into an array;
compress the array with some lossless compression algorithm;
following the PE format, attach the compressed array to the loader template.

The loader's algorithm:

find the array with the compressed PE file at the end of its own image;
decompress it;
parse the headers of the PE file, set up memory access rights, allocate memory, and finally run it.

We will start development with the loader, since it is the loader that the packer will later manipulate.

Loader

So, the first thing our loader has to do is find, in its own body, the address of the array with the compressed image of the PE file. How it searches depends on how the packer embedded this array into the loader.
For example, if the packer simply added a new data section, the search would look like this:

Search for a compressed image in the last section

// Get the address of the loader's own PE headers
HMODULE hModule = GetModuleHandle(NULL);
PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)hModule;
// MakePtr is a helper macro: it adds an offset to a pointer and casts the result
PIMAGE_NT_HEADERS pNTHeaders = MakePtr(PIMAGE_NT_HEADERS, hModule, pDosHeader->e_lfanew);
PIMAGE_SECTION_HEADER pSections = IMAGE_FIRST_SECTION(pNTHeaders);
// The header of the last section, the one added by the packer
PIMAGE_SECTION_HEADER pLastSection = &pSections[pNTHeaders->FileHeader.NumberOfSections - 1];
// The address of the packed image in memory
LPBYTE pbPackedImage = MakePtr(LPBYTE, hModule, pLastSection->VirtualAddress);
// Its size
DWORD dwPackedImageSize = pLastSection->SizeOfRawData;
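For readers who want to try the same header walk without the Windows SDK, here is a minimal portable sketch that works on a PE file held in a byte vector. The names SectionInfo, read_u16, read_u32 and last_section are our own hypothetical helpers, not part of the article's source; the field offsets are the ones defined by the PE format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical summary of the section-header fields the loader cares about.
struct SectionInfo {
    uint32_t virtualSize;
    uint32_t virtualAddress;
    uint32_t sizeOfRawData;
    uint32_t pointerToRawData;
};

// Little-endian reads, independent of the host byte order.
uint16_t read_u16(const std::vector<uint8_t>& b, size_t off) {
    assert(off + 2 <= b.size());
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

uint32_t read_u32(const std::vector<uint8_t>& b, size_t off) {
    assert(off + 4 <= b.size());
    return static_cast<uint32_t>(b[off]) |
           (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) |
           (static_cast<uint32_t>(b[off + 3]) << 24);
}

// Fills `out` with the last section header; returns false on a malformed image.
bool last_section(const std::vector<uint8_t>& image, SectionInfo& out) {
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return false;
    const size_t pe = read_u32(image, 0x3C);                 // e_lfanew
    if (pe + 24 > image.size()) return false;                // signature + file header
    if (read_u32(image, pe) != 0x00004550u) return false;    // "PE\0\0"
    const uint16_t count = read_u16(image, pe + 6);          // NumberOfSections
    const uint16_t optSize = read_u16(image, pe + 20);       // SizeOfOptionalHeader
    if (count == 0) return false;
    const size_t table = pe + 24 + optSize;                  // first section header
    const size_t last = table + static_cast<size_t>(count - 1) * 40; // 40 bytes each
    if (last + 40 > image.size()) return false;
    out.virtualSize      = read_u32(image, last + 8);
    out.virtualAddress   = read_u32(image, last + 12);
    out.sizeOfRawData    = read_u32(image, last + 16);
    out.pointerToRawData = read_u32(image, last + 20);
    return true;
}
```

Inside a running loader the packed image would then sit at the module base plus virtualAddress, exactly as in the Windows version above.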

But, in our opinion, the loader can do without this search code. In general, anything the packer can do, the packer should do. The address of the image in the loader's address space can be calculated in advance at packing time and then simply written into the right place. For this, we leave two markers in our program:

LPBYTE pbPackedImage = (LPBYTE) 0xDEADBEEF;
DWORD dwPackedImageSize = 0xBEEFCACE;

When the packer embeds the compressed array into the loader, it will scan the loader body for these signatures and replace 0xDEADBEEF with the address of the array and 0xBEEFCACE with its size.
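The signature replacement can be sketched in portable C++ as follows. This is only an illustration under our own assumptions: patch_marker and patch_loader are hypothetical names, the loader body is held in a std::vector, and the host is assumed to use the same byte order as the target (little-endian for x86).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Replaces the first occurrence of a 32-bit marker with `value`.
// Returns false if the marker is not present in the body.
bool patch_marker(std::vector<uint8_t>& body, uint32_t marker, uint32_t value) {
    for (size_t i = 0; i + sizeof(marker) <= body.size(); ++i) {
        uint32_t word;
        std::memcpy(&word, &body[i], sizeof(word));   // safe unaligned read
        if (word == marker) {
            std::memcpy(&body[i], &value, sizeof(value));
            return true;
        }
    }
    return false;
}

// Patches both markers; a missing marker means the loader template is wrong.
bool patch_loader(std::vector<uint8_t>& body, uint32_t imageAddress, uint32_t imageSize) {
    assert(!body.empty());
    const bool addressOk = patch_marker(body, 0xDEADBEEFu, imageAddress);
    const bool sizeOk    = patch_marker(body, 0xBEEFCACEu, imageSize);
    return addressOk && sizeOk;
}
```

Going through memcpy instead of casting the byte pointer to a DWORD pointer avoids unaligned accesses, and the loop bound keeps the last read inside the buffer.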

Now that we have decided how to find the address, we can choose a ready-made implementation of a compression algorithm for our packer.
A good option is aPLib, a small library with neat and very compact code that implements compression based on the Lempel-Ziv (LZ) algorithm. On any other day we would definitely choose it, but today we are in the mood for an even simpler and more compact solution: functions built into Windows!

Since XP, our favorite ntdll.dll has exported two great functions:

NTSTATUS RtlCompressBuffer(
    __in USHORT CompressionFormatAndEngine,
    __in PUCHAR UncompressedBuffer,
    __in ULONG UncompressedBufferSize,
    __out PUCHAR CompressedBuffer,
    __in ULONG CompressedBufferSize,
    __in ULONG UncompressedChunkSize,
    __out PULONG FinalCompressedSize,
    __in PVOID WorkSpace
);

NTSTATUS RtlDecompressBuffer(
    __in USHORT CompressionFormat,
    __out PUCHAR UncompressedBuffer,
    __in ULONG UncompressedBufferSize,
    __in PUCHAR CompressedBuffer,
    __in ULONG CompressedBufferSize,
    __out PULONG FinalUncompressedSize
);

Their names speak for themselves: one function compresses, the other decompresses. Of course, if we were developing a really serious product, we would not touch these functions, since there are still computers running Windows 2000 and even NT 4.0 ;) but for our modest goals RtlCompressBuffer and RtlDecompressBuffer are fine.
These functions are not declared in the Platform SDK headers and we cannot link them statically, so we have to use GetProcAddress:

Getting the address of the decompression function

// Declare RtlDecompressBuffer as a pointer to a function
DWORD (__stdcall *RtlDecompressBuffer)(ULONG,PVOID,ULONG,PVOID,ULONG,PULONG);
// Get the address of RtlDecompressBuffer from ntdll.dll
(FARPROC&)RtlDecompressBuffer = GetProcAddress(LoadLibrary("ntdll.dll"), "RtlDecompressBuffer");

Once we have something to unpack and something to unpack it with, we can finally do it. To do this, we allocate memory with a margin (since we do not know the size of the unpacked file) and call the function obtained above:

DWORD dwImageSize = 0;
DWORD dwImageTempSize = dwPackedImageSize * 15;
// Allocate memory for the unpacked image, with a margin
LPVOID pbImage = VirtualAlloc(NULL, dwImageTempSize, MEM_COMMIT, PAGE_READWRITE);
// Decompress
RtlDecompressBuffer(COMPRESSION_FORMAT_LZNT1, pbImage, dwImageTempSize, pbPackedImage, dwPackedImageSize, &dwImageSize);

The COMPRESSION_FORMAT_LZNT1 parameter means that we want to use classic LZ compression. The function can work with other algorithms as well, but this one is enough for us.
Now we have a raw image of the PE file in memory (pbImage). To run it, we need to carry out a series of manipulations that are normally performed by the native Windows PE loader. We will cut the list down to the bare essentials:

Place the beginning of the image (the headers) at the address specified in the ImageBase field of the optional header (OPTIONAL_HEADER).
Place the sections of the PE file at the addresses specified in the section table.
Parse the import table, find the addresses of all imported functions, and write them into the corresponding cells.
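The first two steps can be illustrated with a small portable sketch. It is not the LoadExecutable function from the article: the Section struct and map_image are hypothetical, a std::vector stands in for the memory that a real loader would reserve at ImageBase with VirtualAlloc, and import resolution is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, trimmed-down view of one section header.
struct Section {
    uint32_t virtualAddress;    // RVA of the section in memory
    uint32_t virtualSize;       // bytes the section occupies in memory
    uint32_t pointerToRawData;  // offset of the section data in the file
    uint32_t sizeOfRawData;     // bytes stored in the file
};

// Lays out a raw file image the way it should look in memory.
std::vector<uint8_t> map_image(const std::vector<uint8_t>& file,
                               uint32_t sizeOfHeaders,
                               uint32_t sizeOfImage,
                               const std::vector<Section>& sections) {
    assert(sizeOfHeaders <= file.size() && sizeOfHeaders <= sizeOfImage);
    std::vector<uint8_t> mem(sizeOfImage, 0);         // zero-filled, like fresh pages
    if (sizeOfHeaders != 0)
        std::memcpy(mem.data(), file.data(), sizeOfHeaders);   // step 1: headers
    for (const Section& s : sections) {                        // step 2: sections
        // Copy only the meaningful bytes; the rest of the raw data is padding.
        uint32_t n = s.sizeOfRawData;
        if (s.virtualSize != 0 && s.virtualSize < n) n = s.virtualSize;
        if (n == 0) continue;                                  // e.g. an empty .bss
        assert(static_cast<uint64_t>(s.pointerToRawData) + n <= file.size());
        assert(static_cast<uint64_t>(s.virtualAddress) + n <= mem.size());
        std::memcpy(&mem[s.virtualAddress], &file[s.pointerToRawData], n);
    }
    return mem;
}
```

The point of the sketch is the difference between file offsets (pointerToRawData) and memory offsets (virtualAddress): the same bytes end up at different positions once the image is mapped.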

Naturally, the standard PE loader performs a whole bunch of other actions, and by brushing them aside we limit the compatibility of our packer with some PE files. But for the vast majority of files these actions are enough: we can skip fixing relocations (fixups) and other rare and nasty stuff.
If you do want serious compatibility, either write a full-featured PE loader yourself or find the most complete implementation on the Web. We were too lazy to write our own and used the work of gr8 from hellknights, throwing out everything we did not understand. ;) Even in reduced form, the PE loader function is at least a hundred lines long, so here we give only its prototype (the full code is on the disk):

 HMODULE LoadExecutable (LPBYTE image, DWORD* AddressOfEntryPoint)

It takes a pointer to our unpacked image and returns the handle of the loaded module (equivalent to the address at which the PE file is loaded) and, through the AddressOfEntryPoint pointer, the address of the entry point. This function does everything needed to place the image in memory correctly, but not everything needed to finally transfer control to it.
The fact is that the system still knows nothing about the module we have loaded. If we call the entry point of the unpacked program right now, a number of problems may arise. The program will work, but not quite right.
For example, GetModuleHandle(NULL) will return the image base of the loader module rather than that of the unpacked program. The FindResource and LoadResource functions will rummage through our loader, which has no resources at all. There may be more specific glitches too. To prevent this, we need to update the information in the system structures of the process wherever possible, replacing the addresses of the loader module with the addresses of the loaded module.
First of all, we need to fix the PEB (Process Environment Block), which holds the old image base. The PEB address is very easy to get: in user mode on 32-bit Windows, a pointer to it is always stored at offset 0x30 in the FS segment.

PPEB Peb;
__asm {
    push eax
    mov eax, FS:[0x30]
    mov Peb, eax
    pop eax
}
// hModule is the base address of the PE file we have just loaded
Peb->ImageBaseAddress = hModule;

It also does not hurt to fix the module lists in the LDR_DATA structure referenced by the PEB. There are three lists in total:

InLoadOrderModuleList - the modules in load order;
InMemoryOrderModuleList - the modules in the order of their location in memory;
InInitializationOrderModuleList - the modules in initialization order.

We need to find the address of our loader in each list and replace it with the address of the loaded module. Something like this:

// The first entry in the load-order list
// describes the main module of the process, that is, our loader
PLDR_DATA_TABLE_ENTRY pLdrEntry = (PLDR_DATA_TABLE_ENTRY)(Peb->Ldr->ModuleListLoadOrder.Flink);
pLdrEntry->DllBase = hModule;
...

Now we can safely call the entry point of the loaded module. It will behave as if it had been started in the usual way.

LPVOID entry = (LPVOID)((DWORD)hModule + AddressOfEntryPoint);
__asm call entry;

AddressOfEntryPoint is the relative virtual address (RVA) of the entry point, taken from the optional header inside the LoadExecutable function. To get the absolute address, we simply add the base address (that is, the address of the newly loaded module) to the RVA.

Reducing the size of the loader

If our loader is compiled and linked in VS 2010 with the default flags, we get not a two-kilobyte utility but a monster of more than 10 KB. The studio puts a whole pile of extras in there, and we need to get rid of all of it.
So, in the compiler properties of the loader project (the C/C++ tab), we do the following:

In the "Optimization" section, select "Minimize Size (/O1)" so that the compiler tries to make all functions more compact.
In the same place, give size priority over speed (the /Os flag).
In the "Code Generation" section, turn off C++ exceptions; we do not use them.
We do not need the buffer security check either (/GS-). It is a good thing, but not in our case.

In the linker properties:

Turn off the manifest completely. It is big, and because of it a .rsrc section is created in the loader, which we absolutely do not need. In general, every extra section in a PE file costs at least 512 completely unnecessary bytes because of alignment.
Disable the generation of debug information.
Go to the "Advanced" tab. Turn off "Randomized Base Address" (/DYNAMICBASE:NO), otherwise the linker will create a relocation section (.reloc).
Specify the base address. Choose a non-standard, higher one, for example 0x02000000. This is the value GetModuleHandle(NULL) will return in the loader. You can even hard-code it.
Specify our own entry point instead of the CRT's: /ENTRY:WinMain. We usually do this with a pragma directive right in the code, but since we are already in the properties, we can do it here.

The remaining settings for the linker are set directly from the code:

 #pragma comment(linker,"/MERGE:.rdata=.text")

Here we have merged the .rdata section, which contains read-only data (strings, the import table, and so on), into the .text code section. If we used global variables, we would also need to merge the .data section into the code section.

#pragma comment(linker,"/MERGE:.data=.text")
// The .data section must be writable,
// so the merged code section needs write access too
#pragma comment(linker,"/SECTION:.text,EWR")

All of the above is enough to get a loader size of 1.5 KB.

Packer

It remains for us to develop a console utility that compresses the file given to it and attaches it to the loader. According to the algorithm described at the beginning of the article, the first thing it should do is read the file into an array. A task any student can handle:

HANDLE hFile = CreateFile(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
DWORD dwImageSize = GetFileSize(hFile, 0);
LPBYTE lpImage = new BYTE[dwImageSize], lpCompressedImage = new BYTE[dwImageSize];
DWORD dwReaded;
ReadFile(hFile, lpImage, dwImageSize, &dwReaded, 0);
CloseHandle(hFile);
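The snippet above checks nothing, which is fine for a demo. For comparison, here is a minimal sketch of the same step in standard C++ that reports a failure instead of crashing. The function name read_file is our own, and the standard library is swapped in for the Win32 calls purely for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads a whole file into `out`. Returns false if the file
// cannot be opened or a read error occurs.
bool read_file(const std::string& path, std::vector<uint8_t>& out) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return !in.bad();
}
```

With the Win32 version the equivalent checks would be comparing the handle with INVALID_HANDLE_VALUE and comparing the number of bytes actually read with the file size.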

Next, our packer must compress the file it has read. We will not check whether this is really a PE file, whether its headers are correct, and so on. We leave all that on the user's conscience and compress right away. To do this, we use the RtlCompressBuffer and RtlGetCompressionWorkSpaceSize functions. We have already described the first one: it compresses the buffer. The second is needed to calculate how much memory the compression engine needs for its work. We assume that both functions have already been resolved dynamically (as in the loader), so all that remains is to call them:

DWORD format = COMPRESSION_FORMAT_LZNT1|COMPRESSION_ENGINE_STANDARD;
DWORD dwCompressedSize, dwBufferWsSize, dwFragmentWsSize;
RtlGetCompressionWorkSpaceSize(format, &dwBufferWsSize, &dwFragmentWsSize);
LPBYTE workspace = new BYTE [dwBufferWsSize];
RtlCompressBuffer(format,    // compression format and engine
    lpImage,                 // buffer to compress
    dwImageSize,             // its size
    lpCompressedImage,       // buffer for the compressed data
    dwImageSize,             // its size
    4096,                    // chunk size
    &dwCompressedSize,       // receives the size of the compressed data
    workspace);              // working buffer for the engine

As a result, we have a compressed buffer and its size, and we can attach them to the loader. To do this, we first need to embed the compiled code of our loader into the packer. The most convenient way to get it into the program is the bin2h utility. It converts any binary into a handy header file, in which the data looks something like this:

unsigned int loader_size=1536;
unsigned char loader[] = { 0x4d,0x5a,0x00,0x00,0x01,0x00,0x00, ...
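If bin2h is not at hand, the conversion is easy to reproduce. The sketch below is our own approximation of what such a tool emits, not the real utility: the function name to_header and the layout of twelve bytes per line are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Renders a byte array as C source: a size variable plus an initialized array.
std::string to_header(const std::string& name, const std::vector<uint8_t>& data) {
    assert(!data.empty());   // an empty initializer list would not compile
    std::string out = "unsigned int " + name + "_size=" +
                      std::to_string(data.size()) + ";\n";
    out += "unsigned char " + name + "[] = {";
    for (size_t i = 0; i < data.size(); ++i) {
        char hex[8];
        std::snprintf(hex, sizeof(hex), "0x%02x", static_cast<unsigned>(data[i]));
        if (i != 0) out += ",";
        if (i % 12 == 0) out += "\n  ";   // start a new line every 12 bytes
        out += hex;
    }
    out += "\n};\n";
    return out;
}
```

Calling such a helper from a pre-build step keeps the embedded loader in sync with its latest build.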

Creating a header with bin2h can be automated

We feed bin2h the file with our loader and get everything we need for further tinkering. Now, following the algorithm described at the beginning of the article, we must attach the compressed image to the loader. Here we will have to remember the 90s and our virus-writing past ;). The fact is that embedding data or code into someone else's PE file is a purely viral topic. Embedding can be done in many different ways, but the most trivial and popular are extending the last section and adding a new one. Adding a section, in our opinion, wastes bytes on alignment, so to embed the compressed image into our loader we will extend the loader's last section. Or rather, its only section: we got rid of everything else. ;)
The algorithm is as follows:

Find the only section (.text) in the loader.
Change its physical size, that is, its size on disk (SizeOfRawData). It must equal the sum of the old size and the size of the compressed image, aligned to the file alignment (FileAlignment).
Change its virtual size, that is, its size in memory (Misc.VirtualSize), by adding the size of the compressed image to it.
Change the size of the entire loader image (OptionalHeader.SizeOfImage) using the ancient formula [virtual size of the last section] + [virtual address of the last section], not forgetting to align the value, this time to SectionAlignment, since SizeOfImage must be a multiple of the section alignment.
Copy the compressed image to the end of the section.

There is a little trick here. Visual Studio sets the virtual size (Misc.VirtualSize) of the code section (.text) to the real, unaligned size of the code, that is, to a value smaller than the physical size. So there is a chance to save up to 511 bytes.
Normally we would write the data after the run of alignment zeros, but knowing this trick we can write over those zeros instead.
Here is how all these ideas look in code:

Extending the code section

// Work on a copy of the loader, with room for the compressed image
PBYTE pbLoaderCopy = new BYTE[simple_packer_size + dwCompressedSize + 0x1000];
memcpy(pbLoaderCopy, (LPBYTE)&simple_packer, simple_packer_size);
// Locate the headers
PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)pbLoaderCopy;
PIMAGE_NT_HEADERS nt = MakePtr(PIMAGE_NT_HEADERS, pbLoaderCopy, dos->e_lfanew);
// The only section
PIMAGE_SECTION_HEADER text = IMAGE_FIRST_SECTION(nt);
// Copy the compressed image right after the code, over the alignment zeros
memcpy(&pbLoaderCopy[text->PointerToRawData + text->Misc.VirtualSize], lpCompressedImage, dwCompressedSize);
// New physical size, computed from Misc.VirtualSize
// (ALIGN(x, a) is assumed to round x up to a multiple of a)
text->SizeOfRawData = ALIGN(text->Misc.VirtualSize + dwCompressedSize, nt->OptionalHeader.FileAlignment);
// New virtual size (the markers must be replaced before this line)
text->Misc.VirtualSize += dwCompressedSize;
// New image size, aligned to the section alignment
nt->OptionalHeader.SizeOfImage = ALIGN(text->Misc.VirtualSize + text->VirtualAddress, nt->OptionalHeader.SectionAlignment);
// Size of the resulting file
DWORD dwNewFileSize = text->SizeOfRawData + text->PointerToRawData;
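To check the arithmetic with concrete numbers, here is a small portable sketch of the same calculations. The Layout struct, align_up and extend_section are hypothetical names of our own, and the figures in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rounds `value` up to the next multiple of `alignment` (a power of two).
constexpr uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bundle of the values the packer has to write back.
struct Layout {
    uint32_t payloadOffset;  // file offset where the compressed image goes
    uint32_t virtualSize;    // new Misc.VirtualSize
    uint32_t sizeOfRawData;  // new SizeOfRawData
    uint32_t sizeOfImage;    // new OptionalHeader.SizeOfImage
    uint32_t fileSize;       // size of the resulting file
};

Layout extend_section(uint32_t pointerToRawData, uint32_t virtualAddress,
                      uint32_t virtualSize, uint32_t payloadSize,
                      uint32_t fileAlignment, uint32_t sectionAlignment) {
    assert(fileAlignment != 0 && (fileAlignment & (fileAlignment - 1)) == 0);
    assert(sectionAlignment != 0 && (sectionAlignment & (sectionAlignment - 1)) == 0);
    Layout l;
    // The payload starts right after the real code, on top of the padding zeros.
    l.payloadOffset = pointerToRawData + virtualSize;
    l.virtualSize   = virtualSize + payloadSize;
    l.sizeOfRawData = align_up(l.virtualSize, fileAlignment);
    l.sizeOfImage   = align_up(virtualAddress + l.virtualSize, sectionAlignment);
    l.fileSize      = pointerToRawData + l.sizeOfRawData;
    return l;
}
```

For a loader whose section starts at file offset 0x200 and RVA 0x1000 with 0x3A4 bytes of code, a 20,000-byte payload, FileAlignment 0x200 and SectionAlignment 0x1000, this gives a payload offset of 0x5A4, a virtual size of 0x51C4, a raw size of 0x5200, an image size of 0x7000 and a file size of 0x5400.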

Oh, we almost forgot to replace the markers 0xDEADBEEF and 0xBEEFCACE left in the loader with real values! 0xBEEFCACE is replaced with the size of the compressed image, and 0xDEADBEEF with its absolute address. The address is calculated by the formula [image base] + [virtual address of the section] + [offset of the image from the start of the section]. Note that the replacement must be made before the value of Misc.VirtualSize is updated, otherwise the resulting file will not work.
We search for and replace the markers with a very simple loop:

for (DWORD i = 0; i + sizeof(DWORD) <= simple_packer_size; i++)
    if (*(DWORD*)(&pbLoaderCopy[i]) == 0xBEEFCACE)
        *(DWORD*)(&pbLoaderCopy[i]) = dwCompressedSize;
    else if (*(DWORD*)(&pbLoaderCopy[i]) == 0xDEADBEEF)
        *(DWORD*)(&pbLoaderCopy[i]) = nt->OptionalHeader.ImageBase + text->VirtualAddress + text->Misc.VirtualSize;

That's all. Now we have a packed, ready-to-use file in memory; all that is left is to save it to disk with the CreateFile and WriteFile functions.

The process of debugging a huge file in OllyDbg

Conclusions

If we compare the compression efficiency of our packer with UPX, using notepad.exe as an example, we win by about 1.5 KB: 46,592 bytes for ours against 48,128 for UPX. However, our packer is far from perfect, and it shows.
The fact is that we deliberately ignored such an important thing as carrying over resources. The resulting file will lose its icon! You will have to implement the missing functionality yourself. With the knowledge gained from this article, you should have no difficulty doing so.

Source code for the article.

Our packer compressed notepad.exe better than UPX!

Turning it into a cryptor

In fact, our packer differs very little from a cryptor: it only lacks an encryption function and anti-emulation techniques. The simplest thing to do right away is to add an XOR pass over the whole image immediately after unpacking in the loader. But that is not enough to make antivirus emulators choke. The task has to be made harder somehow. For example, do not store the XOR key in the body of the loader. That is, the loader will not know which key it needs to decrypt the code and will have to search for it within a range we define. This may take a while, time that the user has and the antivirus does not.
The key can also be made dependent on some function or structure that is not emulated. Those still have to be found, though.
To keep the loader code from being caught by signatures, you can attach to the packer any of the advanced virus engines that generate junk code and modify the code in all sorts of ways; there are plenty of them on the Web.

After executing the LoadExecutable function in the loader, it would be nice to free the memory allocated for unpacking - it will not be useful to us anymore.