I recover PHP source code from encoded files.
In this article, I will discuss how PHP code is encoded and decoded today.
A very brief primer on the internal structure of the PHP interpreter.
When a PHP script is executed, it is first parsed and compiled into opcodes of PHP's internal virtual machine.
From each PHP file we get:
- an array of classes, each holding information about the class, its properties, and an array of its methods
- an array of functions
- the “script body”: code outside classes and functions
For brevity, in this article I call the entire internal structure of a compiled file, ready for execution, "opcodes".
The opcodes themselves (operations of the internal PHP virtual machine) inside some function look like this:
[0000] ZEND_INIT_FCALL_BY_NAME -, "defined" -> -
[0001] ZEND_SEND_VAL (61) "MVMMALL", - -> -
[0002] ZEND_DO_FCALL_BY_NAME (1) -, - -> $_z_var_120
[0003] ZEND_JMPZ $_z_var_120, #0008 -> -
[0004] ZEND_INIT_FCALL_BY_NAME -, "defined" -> -
[0005] ZEND_SEND_VAL (61) "IN_ADMINCP", - -> -
[0006] ZEND_DO_FCALL_BY_NAME (1) -, - -> $_z_var_120
[0007] ZEND_JMPNZ $_z_var_120, #0009 -> -
[0008] ZEND_EXIT "Access Denied", - -> -
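For orientation, here is PHP source that would plausibly compile to the listing above. It is only my reading of the two jumps (JMPZ at 0003, JMPNZ at 0007), not the original file; the function name `guardAccess` is made up, and the author of the real code may well have written the check differently.

```php
<?php
// Hypothetical reconstruction of the opcode listing; not the original source.
function guardAccess(): void
{
    // 0000-0003: defined("MVMMALL") is false    -> jump to 0008 (exit)
    // 0004-0007: defined("IN_ADMINCP") is true  -> jump to 0009 (skip the exit)
    if (!defined('MVMMALL') || !defined('IN_ADMINCP')) {
        exit('Access Denied'); // 0008: ZEND_EXIT
    }
}
```

Decompiling is exactly this step in reverse: recognizing that a pair of conditional jumps around an exit is a single `if` with `||`.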
An important point: compiled files differ considerably even between minor versions of the PHP interpreter. This is understandable: the interpreter compiles the code for itself and executes it itself, so the format never had to stay stable.
How encoders work
There are two fundamentally different types of encoders.
Encoders of the first type work exclusively by means of the language itself. They make the code unreadable using base64 encoding, compression, and various string manipulations, and all of them eventually call the eval() function. This is very similar to JavaScript obfuscators. It looks something like this:
eval(base64_decode("DQplcnJvcl9yZXBvcnRpbmcoMCk7DQokcWF6c --- [cut] --- KfQ0KfQ=="));
Such protection is removed very easily; even the hardest cases take a few hours. Another major drawback is that performance suffers badly. For serious use, such protection is therefore not recommended.
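To show why this kind of protection comes off so easily, here is a minimal sketch of a toy encoder of the first type and its removal. The names `toyEncode` and `toyDecode` are invented for this example, and real encoders mix in compression and string tricks on top of base64, but the idea is the same: decode the payload instead of letting eval() run it, and repeat until no wrapper is left.

```php
<?php
// Toy encoder (assumption: base64 only): wrap the code in eval(base64_decode("...")) N times.
function toyEncode(string $phpCode, int $layers = 1): string
{
    for ($i = 0; $i < $layers; $i++) {
        $phpCode = 'eval(base64_decode("' . base64_encode($phpCode) . '"));';
    }
    return $phpCode;
}

// Toy decoder: peel the wrappers off one by one without ever executing the payload.
function toyDecode(string $code): string
{
    $pattern = '/^eval\(base64_decode\("([A-Za-z0-9+\/=]+)"\)\);$/';
    while (preg_match($pattern, $code, $m)) {
        $code = base64_decode($m[1]);
    }
    return $code;
}

$original = 'error_reporting(0); echo "hello";';
$encoded  = toyEncode($original, 3);

echo $encoded, PHP_EOL;            // unreadable wrapper
echo toyDecode($encoded), PHP_EOL; // the original code again
```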
The second type of encoder relies on its own extensions for the PHP interpreter, called loaders. In this case, as a rule, it is not the source code itself that is encoded but the result of its compilation, i.e. the internal structures and opcodes. This is much more serious protection: even after the opcodes are decoded, the original PHP code still has to be reconstructed from them. As for performance, the extra cost of decoding is often offset by the time saved on compilation, so encoded scripts often run even faster than the source code.
When the PHP interpreter starts, the encoder's loader hooks the functions that load, compile, and execute PHP files, so that working with encoded files is transparent to the interpreter itself.
The main difficulty for encoders is making opcodes that were compiled under one PHP version at encoding time work under another PHP version at decoding time. Almost all loaders, for all encoders, patch the opcodes after decoding to ensure this compatibility. The main player in this market, IonCube, at one time put a great deal of effort into this problem, and its loaders can correctly execute opcodes compiled under PHP 4.x on PHP 5.x on the fly and, where possible, even the other way round!
Obfuscation
For additional protection, most encoders can also obfuscate identifiers: the names of variables, functions, and classes. This process is usually one-way, like hashing, and the result is often a name with non-printable characters that works fine at run time but cannot be used directly in decompiled text. For example, how do you write a function whose name is (I dictate it byte by byte) 0x0D, 0x07, 0x03, 0x0B, 0x02, 0x04, 0x06?
Special care is taken to make obfuscated names work correctly. For example, when the code calls the function checkLicense, the loader obfuscates the name on the fly, gets {0x0D, 0x07, 0x03, 0x0B, 0x02, 0x04, 0x06}, and then looks up this key in the hash table of function names.
Zend Guard even provides the zend_obfuscate_function_name and zend_obfuscate_class_name run-time functions, which allow you to calculate obfuscated names for functions and classes in order to make it easier to associate encoded files with unencrypted ones.
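The lookup by obfuscated name described above can be sketched roughly as follows. Everything here is an assumption for illustration: `obfuscateName` uses truncated raw MD5 simply as a stand-in one-way function, and the "function table" is a plain PHP array. Real encoders use their own schemes.

```php
<?php
// Hypothetical one-way obfuscation (stand-in, not any real encoder's algorithm).
function obfuscateName(string $name): string
{
    // PHP function names are case-insensitive, so normalize the case first.
    // Raw (binary) MD5, truncated: the result usually contains non-printable bytes.
    return substr(md5(strtolower($name), true), 0, 7);
}

// The function table of an "encoded file" is keyed by obfuscated names only.
$functionTable = [
    obfuscateName('checkLicense') => function (): string {
        return 'license ok';
    },
];

// A call by clear name: obfuscate on the fly, then look the key up in the table.
function callByName(array $table, string $name): string
{
    $key = obfuscateName($name);
    if (!isset($table[$key])) {
        throw new RuntimeException("Call to undefined function $name()");
    }
    return $table[$key]();
}

echo callByName($functionTable, 'checkLicense'), PHP_EOL;
echo bin2hex(obfuscateName('checkLicense')), PHP_EOL; // the name as hex bytes
```

The clear name never appears in the table, which is why a decompiler sees only the bytes.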
Decoders Strike Back
Two things are needed to create a decoder: you have to obtain the decoded opcodes, and you have to be able to decompile them into PHP source code.
To obtain the opcodes, someone came up with a bright idea: make your own build of the PHP interpreter which, instead of executing a decoded script, hands it over for decompilation. There is no need to bother with the encoder's file format and its defenses, because the encoder's own loader does all the necessary work!
This worked well for a while. Then the authors of some encoders thought of replacing the decoded functions with stubs, hiding the actual code and retrieving each function only at the moment it is actually executed.
In response, the authors of decoders began to patch the encoders' loaders so that they would not use such stubs.
A rather big drawback turned out to be that each encoder had its own loader for each PHP version, and these were updated frequently. A lot of patching was needed, and often, although each patch is easy: just disable a function call or two.
And finally, the authors of one popular encoder took the next step: they began to additionally encode individual operands in some instructions and to install their own handlers for the corresponding commands of the PHP virtual machine. For example, the code
$a = 0;
turned into
$a = 5;
and at execution time the custom handler turned the 5 back into 0.
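A toy model makes the effect clearer. The sketch below is not how any real loader is implemented: the "program" is just an array of instructions, `OPERAND_KEY` and the XOR scheme are assumptions chosen so that 0 becomes 5, and all function names are invented. It only shows why opcodes that were dumped correctly still decompile into wrong code.

```php
<?php
// Toy model: an instruction is [opcode, variable name, literal operand].
const OPERAND_KEY = 5; // hypothetical key baked into the loader

// Encoder side: literal operands of ASSIGN are stored XOR-ed with the key.
function encodeOperands(array $program): array
{
    foreach ($program as $i => [$op, $var, $operand]) {
        if ($op === 'ASSIGN') {
            $program[$i][2] = $operand ^ OPERAND_KEY;
        }
    }
    return $program;
}

// Loader side: the custom ASSIGN handler undoes the encoding right before executing.
function runProgram(array $program): array
{
    $vars = [];
    foreach ($program as [$op, $var, $operand]) {
        if ($op === 'ASSIGN') {
            $vars[$var] = $operand ^ OPERAND_KEY;
        }
    }
    return $vars;
}

// A decompiler that trusts the stored operands prints the wrong source.
function naiveDecompile(array $program): string
{
    $lines = [];
    foreach ($program as [$op, $var, $operand]) {
        $lines[] = "\$$var = $operand;";
    }
    return implode("\n", $lines);
}

$encoded = encodeOperands([['ASSIGN', 'a', 0]]);
echo naiveDecompile($encoded), PHP_EOL; // what the decompiler sees
var_dump(runProgram($encoded));         // what actually runs
```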
This held back those who “patch loaders” for a long time. First, it took a long time to figure out why opcodes that seemed to have been extracted correctly were decompiled with errors. Second, it was no longer possible to simply change a couple of bytes in the loader.
That left the field to the few who were willing to put in more effort: reverse-engineering and understanding the format of the encoded files.
The second part of the decoder is decompiling. This is a complex, but interesting, purely algorithmic problem.
At some point, a few bright minds wrote a couple of good decompilation algorithms for PHP. Most of those who decode PHP today cannot write their own decompiler, so they use the existing ones with minimal changes.
All open source decompilers correctly restore only 90-95% of the code. The rest has to be corrected manually, and here experience in PHP programming and in decompiling plays a very important role, since the errors are usually typical.
Summing up: there is no fully automatic decoding for the main commercial encoders yet.
How to protect against decoding
It is clear that sooner or later any encoded code will be cracked if someone needs it badly enough. But knowing how the decoders work, you can seriously complicate this process.
Legal aspects
Generally speaking, decoding PHP files protected by commercial encoders is illegal. The technical reason is that full decoding requires decompiling and analyzing the encoders themselves, which the law and the user agreements expressly prohibit.
In the European Union there is a loophole: it is allowed to “ensure compatibility of software copies that you own, and for this, if necessary, bypass the built-in protection systems”. At the same time, the direct ban on reverse engineering that each encoder imposes still takes precedence.
It turns out that “I downloaded a program from the Internet that got me unencrypted opcodes” or “I used a special build of the PHP interpreter that saves decrypted opcodes” are conditionally legal decoding methods. “Conditionally”, because if a case does reach court, it is still unclear who would be found right.
Clearly, the creators of encoders would prefer that no one could ever decode encoded files. But those who were left with encoded code by unscrupulous freelancers, or after the developer company disappeared (which happens very often), hold the diametrically opposite opinion about decoding.
Interesting facts and tales
Over the last couple of years, most encoders have only slightly changed the file format “under the hood” and released the result as a new version.
Obfuscating short names often causes collisions. Apparently, in such cases the encoders' tech support simply advises not to use obfuscation.
Freelancers use pieces of code from the PHP documentation and from StackOverflow so often that a dictionary built from the identifiers in those examples usually makes it possible to de-obfuscate up to 90% of all names in an average project.
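The dictionary approach can be sketched as follows. This is an illustration under assumptions, not a working tool: `toyObfuscate` stands in for whatever one-way function a given encoder uses, and the word list is a tiny made-up sample. The point is that a one-way function is only as strong as the names are unpredictable.

```php
<?php
// Stand-in for an encoder's one-way name hashing (assumption, not a real scheme).
function toyObfuscate(string $name): string
{
    return substr(md5($name, true), 0, 7);
}

// Hash every dictionary word once and build a reverse lookup table.
function buildReverseMap(array $dictionary): array
{
    $map = [];
    foreach ($dictionary as $name) {
        $map[toyObfuscate($name)] = $name;
    }
    return $map;
}

// Restore every name found in the dictionary; otherwise emit a printable placeholder.
function recoverNames(array $obfuscated, array $reverseMap): array
{
    $out = [];
    foreach ($obfuscated as $key) {
        $out[] = $reverseMap[$key] ?? ('_obf_' . bin2hex($key));
    }
    return $out;
}

// Identifiers of the kind that appear in documentation examples.
$dictionary = ['result', 'row', 'conn', 'query', 'i'];

// Names as they would be found in an encoded file; the last one is not in the dictionary.
$seen = [toyObfuscate('row'), toyObfuscate('query'), toyObfuscate('licenseKey')];

print_r(recoverNames($seen, buildReverseMap($dictionary)));
```

Names that are not in the dictionary stay unrecoverable, so the decompiled text gets a placeholder built from the raw bytes instead.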
In all this time I have come across only five different PHP decompilers. Three of them were written by Russian-speaking programmers, one by a Chinese programmer, and one was put together by a Frenchman. A small thing, but nice: I'm proud of "ours" :)
At the same time, most Russian-speaking clients ask, as "one of their own", to have the work done for free :)
And finally, a couple of stories.
One Arab client, after a lengthy discussion of his project, said: "my budget is $15, but we all understand... it is a lot of work, so just send over all your programs, and we will somehow decode everything here ourselves."
Several times it turned out that only I could decode a specific file format, and the same files reached me through several different intermediaries at the same time.
I was especially amused by this story: a Black man with an African name and Swiss citizenship quarreled with a freelance programmer from Australia, did not pay him for his work, and was left with a couple of encoded, unfinished files on his website. He searched the freelance marketplaces for a long time for someone who could decode them, until at last an Indian freelancer offered him his services.
For three weeks this Indian freelancer kept putting the customer off with promises while frantically searching for someone who could actually do the job. In parallel, the customer (quite the sly one), under a different name, kept looking for other decoders on the same freelance marketplaces himself. He found me, gave me the project ... and then, literally half an hour later, the Indian freelancer messaged me and, with obvious relief, began persuading me to take on his project too. I compared the files, and ...
Of course, for educational purposes, it would have been worth taking a 100% prepayment from both of them ... but I simply got them to talk to each other and sort things out.
As a result, the Indian freelancer still never forgets to wish me a happy birthday.
The customer even gave me a bonus. He has since moved to Estonia (!) because it is cheaper to live there, and from time to time he tries to talk me into taking part in some of his dubious projects.
UPD. I had to cut out part of the eval-encoded example, because Kaspersky raised a warning on it. Thanks, nokimaro!