Improved UPX compression ratio

From the translator:

Under the "hood" is the translation of a small but extremely useful text file "% UPX_SOURCE% \ doc \ filter.txt". In the above path, UPX_SOURCE refers to the file path to the source codes for UPX version 3.91.

The document describes a rather important aspect of UPX operation called “filtering” and it is crucial to understand how it works when analyzing UPX-packed files. Everything described about UPX also applies to other packers.
')
The main goal of the translation is an attempt to help those programmers who write static unpackers of executable files. In other words, this information will be useful for practicing reverse-engineers. By static unpacker, I mean a program which, when submitted to a compressed or packed executable file, analyzes and creates an output file, as if it was created by a compiler. A special feature of this type of unpackers is that it works solely on the knowledge of the structure of protection or the packing of a file, without the use of "dump dump", "restoration of imports" and other types of "cheating".

Understanding the filtering process helps in studying packed files, for example, using UPX, RLPack, etc. In the packed file, you can find code where some “magic actions” are done with the machine. transition instructions bytes 0xE8, 0xE9, etc. This "magic" is precisely "filtering". It aims to improve the compression ratio of the executable file.

Also, knowing how filtering works can save a specialist's time in very difficult situations. It so happens that it is completely impossible to get a piece of code with filtering for a reasonable time, for example when working with polymorphics or with files where code virtualization is applied. Then the knowledge of how filtering works will allow solving the problem of writing a defiltration code without having an exact original piece of code.

Go…

This document explains the concept of “filtering” in UPX. Basically, filtering is data pre-processing, which can improve the file compression ratio using UPX.

Currently, UPX filters use a method based on one very specific algorithm. It is well suited for i386 executables. In UPX, it is known as the “naive” implementation. There is also a “clever” method, which is only suitable for 32-bit executable files and was first implemented in UPX.

Let's take an example and consider a code fragment (this is a place from a 32-bit file):

00025970: E877410600 calln FatalError 00025975: 8B414C mov eax,[ecx+4C] 00025978: 85C0 test eax,eax 0002597A: 7419 je file:00025995 0002597C: 85F6 test esi,esi 0002597E: 7504 jne file:00025984 00025980: 89C6 mov esi,eax 00025982: EB11 jmps file:00025995 00025984: 39C6 cmp esi,eax 00025986: 740D je file:00025995 00025988: 83C4F4 add (d) esp,F4 0002598B: 68A0A91608 push 0816A9A0 00025990: E857410600 calln FatalError 00025995: FF45F4 inc [ebp-0C]

In the above code snippet, you might notice two CALL instructions calling FatalError. You may have guessed that the compression ratio would be better if the compressor “engine” finds more sequences of repeating lines. In our case, the engine has the following two bytes of sequences:

 E877 410600 8B and E857 410600 FF.

Therefore, it can find 3-byte matches.

Now apply the trick. For the i386 architecture, near calls (“near calls”) are encoded as 0xE8, followed by a 32-bit relative offset to the transition address. Now let's see what happens if the value of the call position is added to the offset value:

 0x64177 + 0x25970 = 0x89AE7 0x64157 + 0x25990 = 0x89AE7 E8 E79A0800 8B E8 E79A0800 FF

Now the “engine” of the compressor finds 5-byte matches. This allows us to save 2 bytes of compressible data. Not bad.

This is the main idea of the “naive” implementation method. All you have to do is use the “filter” method before compression and “unfilter” after decompression. Just go to the memory by finding the 0xE8 bytes and process the next 4 bytes as stated above.

Of course, there are several possibilities where this scheme can be improved. First, not only CALLs can be processed, but near jmps (0xE9 + 32-bit offset) works the same way.

The second improvement may be, if we limit this filtering only to the area occupied by the actual code, there is no point in processing the main data.

Another improvement would be if you change the order of the 32-bit offset bytes. Why? Here is another CALL that follows in the snippet above:

 000261FA: E8C9390600 calln ErrorF 0x639C9 + 0x261FA = 0x89BC3 E8 C39B 0800   E8 E79A 0800

You may notice that these two functions are close enough to each other, but the compressor is not able to use this information (2-byte matches, as a rule, are not used) if the contractor bytes of displacements are rotated. In this case:

 E8 0008 9AE7 E8 0008 9BC3

Thus, the "engine" of the compressor is looking for such 3-byte matches. This is a good improvement, now in the "engine" are also used the coincidence of nearby displacements.

That's good, but what happens when we find a 'fake' CALL? In other words, such a 0xE8, which is part of another instruction? For example:

 0002A3B1: C745 E8 00000000 mov [ebp-18],00000000

Then in this case, these wonderful 0x00 bytes will be overwritten by a few less compressible data. This is a disadvantageous “naive” implementation.

Let's get smarter and try to detect and process only “valid” CALLs. UPX uses a simple method to search for these CALLs. We simply check the addressing (destination) of these CALLs within a certain area as well as the CALLs themselves (therefore, the above code is false, but this generally helps). The best method would be to disassemble the code, we will be happy to help :)

But this is only part of the job. We cannot simply process one CALL, then boil the other, the filtering process needs some information to be able to roll back the filtering.

UPX uses the following idea, which works well. First, we assume that the size of the area to be filtered is less than 16 MB. UPX then scans this area and saves the bytes that follow 0xE8 bytes. If we are lucky, there are bytes that do not follow the next 0xE8. These bytes are our candidates for use as markers.

Remember that we assumed the size of the scan area was less than 16 MB? Well, this means that we process a real CALL, as a result there will be an offset also less than 0x00FFFFFF. Therefore, MSB is always 0x00. What a great place to store our marker. Of course, we reverse the order of bytes in the resulting offset, so this marker will appear only after 0xE8 bytes and not in 4 bytes after it.

That's all! Just work with the memory area, identifying "real" CALLs, and use this method to mark them. After that, the defiltration task is greatly simplified; they just look for the 0xE8 + marker sequence and will filter if they found it. It's smart, right? :)

In truth, this is not so easy in UPX. An additional parameter (“add_value”) can be used, which makes things a bit more complicated (for example, an unserviceable marker can be found because of some overflow during addition).

And the algorithm is generally optimized for ease of filtering (as short and fast as possible to assemble, see stub / macros.ash), which makes the filtering process less complicated (fcto_ml.ch, fcto_ml2.ch, filteri.cpp).

As it can be seen in filteri.cpp, there are many variants of these filtering implementations: - native / clever, calls / jumps / calls & jumps, with offset rotation / or without - in the amount of about 18 different filters (and 9 other options for 16 -bit programs).

You can choose one of them using the command line option "--filter =" or try most of them with "--all-filters". Or just let UPX use the default we set for executable formats.

From the translator:
I am pleased to review the bug reports about errors in my personal message list:
* Related to translation, spelling or grammar;
* Technical nature allowed in the translation, make sure that the original is all right!

Source: https://habr.com/ru/post/208446/

All Articles

Improved UPX compression ratio

More articles: