
Programmers can be deanonymized not only from source code, but also from compiled binaries

It is no secret that many software developers, in open source and elsewhere, wish to remain anonymous for various reasons. A group of researchers recently published a paper describing methods for deanonymizing programmers by their coding style through analysis of their source code. The authors claim an average identification accuracy of 94%.

By building abstract syntax trees from the parsed source text, they identified persistent distinguishing features of a programmer's code that are hard to hide even deliberately. Using machine learning and a set of heuristics, they achieved impressive accuracy in attributing authorship among a sample of 1,600 Google Code Jam programmers.
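As a rough illustration of what AST-based stylistic features can look like, the sketch below uses Python's standard `ast` module as a stand-in parser. It is not the researchers' feature set or tooling, which this article does not describe. The function names `ast_node_features` and `ast_max_depth` and both sample snippets are hypothetical, chosen only for this example.

```python
import ast
from collections import Counter


def ast_node_features(source: str) -> dict:
    """Return the relative frequency of each AST node type in `source`."""
    tree = ast.parse(source)
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def ast_max_depth(node: ast.AST, depth: int = 0) -> int:
    """Return the depth of the deepest node below `node` (root has depth 0)."""
    children = list(ast.iter_child_nodes(node))
    if not children:
        return depth
    return max(ast_max_depth(child, depth + 1) for child in children)


# Two hypothetical solutions to the same task, written in different styles.
sample_a = "total = 0\nfor i in range(10):\n    total += i\n"
sample_b = "total = sum(i for i in range(10))\n"

features_a = ast_node_features(sample_a)
features_b = ast_node_features(sample_b)

# Node types that appear in one style but not in the other.
print(sorted(set(features_a) ^ set(features_b)))
print(ast_max_depth(ast.parse(sample_a)), ast_max_depth(ast.parse(sample_b)))
```

Both snippets compute the same sum, yet one yields `For` and `AugAssign` nodes and the other a `GeneratorExp`. Differences of this kind, accumulated over many files, are what make a style recognizable.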


In their new work, the researchers demonstrated that deanonymization is also possible by analyzing compiled binaries when no source code is available (a video of the presentation is available). This time, the source code of 600 Google Code Jam participants was used for the study; it was compiled into executables, which were then analyzed. Because the competition tasks were the same for everyone, the files differed largely in programming style rather than in algorithm. Initially, the binaries were compiled with compiler optimizations turned off and without source code obfuscation. According to the authors, however, some distinctive features survive even when these methods of concealing authorship are applied, although the accuracy of deanonymization drops to 65%.


Using disassembly and decompilation, and applying the same abstract syntax trees, the researchers analyze the control flow graph, extract distinctive coding features, and train a classifier on the resulting feature vectors.
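The last step, training a classifier on feature vectors, can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier from the paper: the article does not say which model the authors used, so a simple nearest-centroid classifier with cosine similarity is swapped in here. The authors `alice` and `bob`, the instruction mnemonics used as features, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def to_vector(tokens):
    """Turn a list of feature tokens into a relative-frequency vector."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(tok, 0.0) for tok, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(samples):
    """Average the feature vectors of each author into one centroid."""
    sums = defaultdict(Counter)
    seen = Counter()
    for author, tokens in samples:
        for tok, weight in to_vector(tokens).items():
            sums[author][tok] += weight
        seen[author] += 1
    return {
        author: {tok: w / seen[author] for tok, w in vec.items()}
        for author, vec in sums.items()
    }


def predict(centroids, tokens):
    """Return the author whose centroid is most similar to the sample."""
    vec = to_vector(tokens)
    return max(centroids, key=lambda author: cosine(vec, centroids[author]))


# Hypothetical training data: instruction mnemonics from disassembled binaries.
training = [
    ("alice", ["mov", "mov", "cmp", "jne", "call"]),
    ("alice", ["mov", "cmp", "jne", "jne", "call"]),
    ("bob", ["push", "pop", "lea", "call", "ret"]),
    ("bob", ["push", "lea", "lea", "call", "ret"]),
]
centroids = train(training)
print(predict(centroids, ["mov", "cmp", "jne", "mov"]))
```

A real pipeline would use far richer features (for example, those derived from the decompiled syntax tree and the control flow graph) and a stronger model, but the structure stays the same: extract features, build one vector per sample, then train and query a classifier.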




Interestingly, more experienced programmers turned out to be much easier to deanonymize than their less experienced colleagues, because they have a more pronounced and individual programming style.

The authors are confident that such methods will someday reveal the real authors of projects such as Bitcoin and TrueCrypt, as well as the developers of well-known malware.

Source: https://habr.com/ru/post/274533/

