
Imagine that your application is being brazenly stolen and spread across the network, and there is no way to tell which of your most honest customers is the leak. The way out is obvious: give each client a slightly different version of the application and identify the leak by version.
But what if things get more complicated: this time your program is stolen by a hacker who takes care to scrub every trace that identifies it. For such cases, universal methods have been developed for embedding secret data into an application, the so-called watermarks.
Here we consider only watermarks whose purpose is to survive under any circumstances, so that the application's creator can read them after any attack by a potential intruder, while the application's users never suspect they exist. There are other kinds of watermarks, designed, for example, to track changes to the application (hidden checksums of a sort); they must also be hard to remove, but that is another story.
Best Watermark
A great way to embed a watermark is to use your imagination and pick a place where no hacker will ever look: he will simply be afraid of drowning in tons of code and will give up. If you are developing a visual application, nothing stops you from changing the color of a single pixel tucked into the corner of a button in some godforsaken dialog box. The color of that pixel is the watermark. Unfortunately, this approach is not always acceptable, and developers find it more convenient to use a universal solution that embeds a watermark into an already compiled application. Traditionally, this feature is built into obfuscators.
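As a rough illustration of the pixel idea, here is a minimal Python sketch. The function names, the 6-bit ID range, and the choice of the two lowest bits per color channel are all assumptions made for this example; the article does not prescribe any particular encoding.

```python
def embed_id_in_pixel(pixel, customer_id):
    """Hide a customer ID (0..63) in the two lowest bits of each RGB channel.

    Hypothetical encoding: 2 bits per channel, 6 bits in total.
    """
    if not 0 <= customer_id < 64:
        raise ValueError("customer_id must fit in 6 bits")
    r, g, b = pixel
    return (
        (r & ~0b11) | ((customer_id >> 4) & 0b11),  # top two bits of the ID
        (g & ~0b11) | ((customer_id >> 2) & 0b11),  # middle two bits
        (b & ~0b11) | (customer_id & 0b11),         # low two bits
    )


def read_id_from_pixel(pixel):
    """Recover the customer ID written by embed_id_in_pixel."""
    r, g, b = pixel
    return ((r & 0b11) << 4) | ((g & 0b11) << 2) | (b & 0b11)
```

Each channel changes by at most 3 out of 255, so the tint is hard to notice by eye, but such a mark only survives as long as the image resource itself is left untouched.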
How will a hacker fight the watermark?
So, we need a universal solution for inserting watermarks into an application. Universality imposes serious limitations, because we are not going to write an artificial intelligence that decides, say, which pixel can be tinted. At first glance you might think: “There are so many places where data can be written unnoticed. Pick any!” But let's not rush.
Imagine that the hacker is smart and knows for certain that the application contains watermarks, which must be found and neutralized while leaving the application working. Suppose also that after each attempt to remove a watermark he can somehow find out whether the developers were still able to read it (we are feeling generous). The hacker will then mount the following attacks, one after another:
- Disassemble and reassemble the application. This defeats every method that scatters the watermark across various secret places in the file in the hope that nobody will guess. Nobody will guess, indeed; the data will simply be discarded.
- Pack or obfuscate the application. This defeats every method that does not involve actually running the program, because there are plenty of protectors that completely rebuild all the headers and erase all the original data. Only the big anti-virus companies have universal unpackers, and we are not going to write one.
- Apply an obfuscator that actively resists a debugger interfering with the program's execution. Attaching to the application, setting breakpoints in it, and so on will not work.
- Find our watermark (he is sure to find it!) and plant a pile of his own watermarks to overwrite the existing ones.
- Insert a watermark into his own typical application consisting of a thousand buttons and see through our algorithm.
- Proceed to manual disassembly and, since he has been doing this for many years, figure out the algorithm. I suggest we accept that there is no cure for this; such hackers exist.
- Write a static analyzer that finds all watermarks and rips them out.
As the last item shows, there is no complete protection, but we would like to arrange things so that few hackers get that far.
What to do?
Is there a cure-all? There is no final answer to this question; the problem is very broad and has been studied for a long time. See, for example, the following survey of academic research on the topic:
citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.103.8892&rep=rep1&type=pdf .
Nevertheless, there are two places that are relatively resistant to the changes a hacker mercilessly inflicts on the application.
- First: the application's data. Not strings, of course, since any obfuscator will encrypt them and they will appear in memory only for a moment.
- Second: the logic of the application's code. Note that it is the logic, since the hacker's obfuscators will mangle the code itself.
At this point it becomes clear why watermarking tools come bundled with obfuscators: few other kinds of programs interfere with application code so deeply.
Strategy
From what was said above about the hacker's obfuscators and protectors (at least from the second attack), it is clear that to read the watermark we will have to start the application and somehow attach to it while it runs. We assume the hacker may add a protector that prohibits debugging, which means we must touch the application very gently to extract the watermark. The following options come to mind.
- Insert code that creates some globally accessible object, a pipe for example, and sends the secret message through it. Many applications have only a handful of such objects, so the hacker will easily work out which one is the watermark (remember: the hacker has already figured out our algorithm and knows where to look!).
- Insert code that creates a hidden file somewhere in the system, a registry entry, or something similar. This has the drawbacks of the previous method, and it is easy to see that adding such logic to an application invites all sorts of unpleasant consequences.
- Insert code that creates an array in the application's memory that the garbage collector never reclaims, and later find it by scanning the application's memory. Many kinds of objects can play the role of the array (MemoryWriter, for example), because a blind memory read cannot tell what the data originally was. This is our method.
In general the algorithm is simple: we add static “arrays” to random types (recall that they need not literally be arrays), pick random methods, and insert into them code that writes the watermark message into the “array”. We make several watermarks so that the hacker has to sweat to find them all. But several important problems must be solved.
- The code that generates the array is dead, since the array is never used afterwards, so the hacker can detect it statically, that is, without running the program. The array must be revived.
- Methods with a watermark can be detected statistically by the large number of constant assignments to an array. Solution: fill some parts of the watermark using loops.
- If the watermark-generating code lands in a performance-critical method, the application may slow down significantly. Solution: insert watermarks only into relatively large methods.
- We must make sure that, when reading memory blindly, we never mistake a random block of data for a watermark.
- We must somehow counter the hacker's attempts to overwrite our watermark by planting one of his own.
- If the hacker finds one watermark, he must not be able to litter the application with pseudo-watermarks carrying garbage text.
That is a lot of problems; let's get started.
Message resilience
Our watermark encoder/decoder stores an array of static signatures sig[i] (a hundred of them, for example), each a random array of 100 bytes. They do not change from version to version and are not secret. We rely on the SHA256 hash algorithm (like MD5 and SHA1, only better): it ensures there are no false matches in memory and solves several other problems. Recall that no fast algorithm is known that, given a hash HASH, finds a string STR such that SHA256(STR) = HASH. The second pillar the watermark rests on is the Rijndael (AES) algorithm. Recall that if KEY is the secret encryption/decryption key, no efficient algorithm is known that recovers STR from AES(STR, KEY) without knowing KEY.
The developer of the protected application must supply a password, which is secret and should be kept carefully in a safe. When the user enters the password PASS and the message MSG to create a watermark, we generate the following in sequence:
- Individual signatures: indSig[i] = SHA256(sig[i] + PASS + salt), where sig[i] is the i-th signature and salt is a bit of salt to strengthen the password.
- A short message signature: MsgSig = SHA256(PASS + salt)[0..3] (only the first four bytes of the hash).
- An individual key: KEY[i] = SHA256(indSig[i] + PASS + salt).
- For each individual signature indSig[i], the message encrypted with AES: indMsg[i] = AES(MsgSig + MSG, KEY[i]).
- The watermark: Watermark[i] = indSig[i] + <indMsg[i].Length> + indMsg[i], where <indMsg[i].Length> is four bytes holding the length of indMsg[i].
That's it! The watermarks Watermark[i] are ready to be placed at randomly selected locations in the program.
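The generation steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the tool's actual code: the function names are made up, the length field is assumed to be a little-endian 32-bit integer, and, because the Python standard library has no AES, the cipher is a clearly non-AES stand-in (a SHA-256 counter keystream XOR) that should be replaced with a real AES implementation.

```python
import hashlib
import struct


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def standin_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """NOT AES. A SHA-256 counter keystream XOR, used only so that the sketch
    runs with the standard library. Applying it twice decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += sha256(key + struct.pack("<I", counter))
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


def build_watermarks(sigs, password: bytes, salt: bytes, msg: bytes,
                     encrypt=standin_encrypt):
    """Build one watermark per public signature sig[i]."""
    msg_sig = sha256(password + salt)[:4]           # MsgSig: first four bytes
    watermarks = []
    for sig in sigs:
        ind_sig = sha256(sig + password + salt)     # indSig[i]
        key = sha256(ind_sig + password + salt)     # KEY[i]
        ind_msg = encrypt(msg_sig + msg, key)       # indMsg[i]
        length = struct.pack("<I", len(ind_msg))    # assumed little-endian
        watermarks.append(ind_sig + length + ind_msg)
    return watermarks
```

Because every indSig[i] and KEY[i] depends on both the public sig[i] and the secret password, each of the resulting watermarks looks unrelated to the others to anyone who lacks the password.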
To read watermarks from memory, we again generate indSig[i] from the password and search the memory of the watermarked process for indSig[i]. Then, using the key KEY[i] = SHA256(indSig[i] + PASS + salt), we decrypt the secret message that follows indSig[i]. Do not forget to check that the first four bytes of the decrypted message equal MsgSig = SHA256(PASS + salt)[0..3].
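A minimal Python sketch of the reading side might look like this. It assumes the memory dump is already available as a bytes object, that the four length bytes are little-endian, and that the caller supplies the derived indSig[i], KEY[i], MsgSig, and a decrypt function; the name find_watermark and its signature are hypothetical.

```python
import struct


def find_watermark(memory: bytes, ind_sig: bytes, key: bytes,
                   msg_sig: bytes, decrypt):
    """Scan a memory dump for ind_sig and return the decoded message,
    or None if no candidate passes the MsgSig check."""
    pos = memory.find(ind_sig)
    while pos != -1:
        header_end = pos + len(ind_sig) + 4
        if header_end <= len(memory):
            # Four bytes after the signature hold the payload length.
            (length,) = struct.unpack("<I", memory[header_end - 4:header_end])
            payload = memory[header_end:header_end + length]
            if len(payload) == length:
                plain = decrypt(payload, key)
                # Accept only if the plaintext starts with MsgSig.
                if plain[:len(msg_sig)] == msg_sig:
                    return plain[len(msg_sig):]
        # False match or truncated data: keep scanning.
        pos = memory.find(ind_sig, pos + 1)
    return None
```

With a real AES implementation, decrypting a corrupted candidate may raise a padding error rather than return garbage, so the decrypt call would probably need to treat such an error as "no match" and continue the scan.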
SHA256's collision resistance ensures that a random data block will not be mistaken for a watermark. Without the password the hacker cannot overwrite our watermark with his own, which again follows from the properties of SHA256. Having found one watermark without the password, the hacker can neither craft a garbage message nor even read the watermark's message, thanks to Rijndael and the short message signature. Nor can he use a found watermark to locate the remaining ones without the password, which is why the signatures and keys are individual. As a result, we have solved the fourth, fifth, and sixth problems listed earlier.
Reviving the Array
To stop the watermark arrays from being dead data, we involve them in calculations performed in random methods. Then even a watermark that has been found cannot simply be deleted without breaking the application. For example, suppose some method contained:
return baseOffset + 55;
Suppose the watermark array MyClass.WatermarkArr holds the number 55 in cell MyClass.WatermarkArr[42]. Then the code above turns into:
return baseOffset + MyClass.WatermarkArr[42];
Everything looks good, but who can guarantee that the watermark array has already been created by the time this code runs? To find out, we build a graph of control flow between methods. Building it is fraught with difficulties, because methods are invoked through WPF, Reflection, virtual calls, static constructors, and so on. We try to analyze as many cases as possible and err on the side of caution wherever we can, using our own .NET code emulator for this.
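The constant substitution itself can be illustrated with a toy Python sketch that works on source text. This is only an assumption-laden illustration: a real tool would rewrite compiled code rather than text, and the function name revive_constant is invented for the example. It also deliberately ignores the hard part described above, namely proving that the array is initialized before the rewritten code runs.

```python
import re


def revive_constant(source_line, watermark, array_name):
    """Replace the first integer literal whose value occurs in `watermark`
    with a read from the watermark array; return the line unchanged if no
    literal matches."""
    cells = list(watermark)  # works for bytes or a list of ints
    for match in re.finditer(r"\b\d+\b", source_line):
        value = int(match.group())
        if value in cells:
            index = cells.index(value)
            return (source_line[:match.start()]
                    + f"{array_name}[{index}]"
                    + source_line[match.end():])
    return source_line
```

For instance, with a watermark whose cell 42 is the first one holding 55, the line `return baseOffset + 55;` becomes `return baseOffset + MyClass.WatermarkArr[42];`, so deleting the array would now change the program's behavior.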
Conclusion
A watermark is a very effective way to identify a user and/or a specific build. Of course, like any other method, it has its limitations. In particular, you cannot prepare a single distribution package in advance for all your users (although with a SaaS obfuscator this is less of a problem: you can obfuscate the program on the fly, right before download). On the other hand, for large custom products such protection is more than justified, because users who know in advance that they are being watched will be far more reluctant to pass your intellectual property to third parties.
Publication author: Dmitry Kosolobov, Appfuscator developer.