Code Obfuscation Techniques with LLVM

Habr has many great articles about what LLVM can do and how to use it. Here I want to go into more detail on popular obfuscation techniques that can be implemented with LLVM to make an application harder to analyze.

Introduction


This article is more theory than practice. It assumes some background knowledge on the reader's part, as well as a willingness to work through interesting problems without being handed ready-made solutions. Most LLVM IR instructions follow the three-address code form: they take two operands and produce one result. The set of instructions available to us is also limited.
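To make the three-address form concrete, here is a small sketch in plain C++ (not LLVM API code). The function names are made up for illustration, and the IR shown in the comments is only an approximation of what a compiler might emit for the expression a - b + c * 2.

```cpp
#include <cassert>

// The expression as a programmer writes it.
int expr(int a, int b, int c) {
    return a - b + c * 2;
}

// The same expression split into three-address steps:
// each step takes two operands and produces one result.
int expr_three_address(int a, int b, int c) {
    int t1 = a - b;    // roughly: %t1 = sub i32 %a, %b
    int t2 = c * 2;    // roughly: %t2 = mul i32 %c, 2
    int t3 = t1 + t2;  // roughly: %t3 = add i32 %t1, %t2
    assert(t3 == expr(a, b, c)); // both forms must agree
    return t3;         // roughly: ret i32 %t3
}
```

Every obfuscation below ultimately works on steps of this shape, which is why a small, fixed instruction set is convenient to transform.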

Software used:
- GCC 4.8.2 (mingw64)
- IDA DEMO
- Clang 3.4
- LLVM 3.4

What can be implemented with LLVM?
1) Random CFG
This method modifies the program's control-flow graph by adding basic blocks to it; the original entry block can be moved and diluted with garbage.
Examples
Original
#include <stdio.h>
#include <stdlib.h>

int rand_func() {
    return 5 + rand();
}

int main() {
    int a = rand_func();
    goto test;
    exit(0);
test:
    int b = a + a;
}

Original graph
image
Obfuscation, first run: main function
image
Obfuscation, second run: main function
image
Obfuscation, third run: main function
image

2) Inserting a large number of basic blocks into the CFG, with optional execution (see the screenshots from item 1).
A random basic block is taken and its terminator (the instruction that ends the basic block) is changed; many new basic blocks are created and shuffled in with the existing ones. Whether they ever execute is up to the author's imagination.
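One common way to make added blocks "optionally executed" is an opaque predicate: a condition whose value is fixed but not obvious at a glance. The article does not name this technique, so treat the sketch below as one possible illustration in plain C++ rather than an LLVM pass; always_true and doubled_obfuscated are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Opaque predicate: x * (x + 1) is a product of two consecutive integers,
// so it is always even. Unsigned wrap-around modulo 2^32 keeps the parity.
bool always_true(std::uint32_t x) {
    return (x * (x + 1u)) % 2u == 0u;
}

// Computes a + a, but hides it behind a branch with a bogus block.
std::uint32_t doubled_obfuscated(std::uint32_t a, std::uint32_t noise) {
    // Invariant the transformation relies on (checked in debug builds only).
    assert(always_true(noise));
    std::uint32_t result;
    if (always_true(noise)) {
        result = a + a;           // real block
    } else {
        result = a * 7u - noise;  // bogus block, never executed
    }
    return result;
}
```

In a real pass the predicate and the bogus block would be emitted as IR after the terminator of the chosen basic block is rewritten into a conditional branch.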
3) Littering the code.
Suppose we have some code; it is diluted with garbage instructions that pretend to be useful. They can read and modify our data without affecting the program's overall behavior. The goal is to make analysis of the application as hard as possible with minimal performance loss.
One of the many possible obfuscation variants:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void test() {
    int a = 32;
    int b = time(0);
    int c = a + a + b;
    int d = a - b + c * 2;
    printf("%d", d);
}

int main() {
    test();
}

image
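As a rough source-level picture of what littering does to test(), the sketch below interleaves the real computation with junk statements that touch live data but cancel out. It is an assumption-laden illustration: the junk patterns and the names compute, compute_littered and check_same are invented here, and b is passed as a parameter instead of being read from time(0) so the result is reproducible.

```cpp
#include <cassert>
#include <cstdint>

// The computation from test(): d = a - b + c * 2 with a = 32, c = a + a + b.
int compute(int b) {
    int a = 32;
    int c = a + a + b;
    return a - b + c * 2;
}

// Same computation diluted with junk that reads and writes real data
// but has no effect on the final value.
int compute_littered(int b) {
    int a = 32;
    std::uint32_t junk = static_cast<std::uint32_t>(b) ^ 0x5A5A5A5Au; // junk: looks like a key
    a += 11;                                 // junk: temporarily changes a
    junk = (junk << 3) | (junk >> 29);       // junk: 32-bit rotate
    a -= 11;                                 // junk: restores a
    int c = a + a + b;                       // real
    junk += static_cast<std::uint32_t>(c);   // junk: depends on real data
    int d = a - b + c * 2;                   // real
    d ^= static_cast<int>(junk & 0u);        // junk: xor with zero
    return d;
}

// The junk must never change the result.
void check_same(int b) {
    assert(compute(b) == compute_littered(b));
}
```

An optimizing compiler would strip most of this junk from source code, which is why such instructions are normally inserted at the IR level by a pass that runs after the optimizations.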

4) Hiding constants and data.
Suppose we have the constant 15h. We arrange for it to be computed at run time in the native code, so that it never appears in plain form. Constant data can be hidden in the same way using any encryption algorithm.
Example
#include <stdio.h>
#include <stdlib.h>

int main() {
    const char *habr = "habrahabr";
    printf("%s", habr);
}

We locate the constant data, namely habrahabr, and insert our own decryptor for it. The image shows an example with xor, but any encryption algorithm can be plugged in (AES, RC4, etc.). After use (printf), the data on the stack is encrypted again with a random key.
image
Suppose you want to add data encryption. What is the easiest way to do it?
LLVM can generate, from a cpp file, the C++ code that builds the corresponding IR, and you can insert that code into your project.
See the hint in the questions and answers section.
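The xor variant of data hiding can be sketched at the source level as follows. This is only an illustration under stated assumptions: the key 0x5A, the pair 0x3C / 0x29 used to form 15h, and all function names are hypothetical, and a real pass would generate the equivalent IR and pick random keys.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

constexpr unsigned char kKey = 0x5A;  // hypothetical fixed key

// "habrahabr" xor-ed with kKey at compile time, so the plain text
// is not stored in the binary.
constexpr unsigned char kEncrypted[] = {
    'h' ^ kKey, 'a' ^ kKey, 'b' ^ kKey, 'r' ^ kKey, 'a' ^ kKey,
    'h' ^ kKey, 'a' ^ kKey, 'b' ^ kKey, 'r' ^ kKey
};

// xor is its own inverse, so one routine both decrypts and re-encrypts.
void xor_in_place(std::string& data, unsigned char key) {
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = static_cast<char>(static_cast<unsigned char>(data[i]) ^ key);
}

// Produces the plain string in a local (run-time) copy.
std::string decrypt_habr() {
    std::string out(reinterpret_cast<const char*>(kEncrypted), sizeof(kEncrypted));
    xor_in_place(out, kKey);
    return out;
}

// Forms the constant 15h at run time instead of storing it directly.
unsigned build_15h() {
    volatile unsigned a = 0x3Cu;  // volatile keeps the optimizer from folding
    volatile unsigned b = 0x29u;
    unsigned v = a ^ b;
    assert(v == 0x15u);
    return v;
}
```

After the string has been used, calling xor_in_place on the copy with a fresh key mirrors the re-encryption step described above.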

5) Cloning functions and calling the clones in random order.
The same function is cloned several times (possibly with modifications), a handler that picks one of the clones is inserted at the call site, and the clones are called in random order.
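A minimal source-level sketch of cloning, assuming a toy function that computes 2 * x + 1. The clone bodies and the dispatcher f_dispatch are hypothetical; a real pass would clone functions in IR and rewrite the call sites.

```cpp
#include <cassert>
#include <cstdlib>

// The original function and two clones with cosmetic changes.
// All three compute 2 * x + 1.
int f_orig(int x)   { return x * 2 + 1; }
int f_clone1(int x) { return x + x + 1; }
int f_clone2(int x) { return (x + 1) * 2 - 1; }

// Handler placed at the call site: picks a clone at random on every call.
int f_dispatch(int x) {
    static int (*const clones[])(int) = { f_orig, f_clone1, f_clone2 };
    const int count = static_cast<int>(sizeof(clones) / sizeof(clones[0]));
    const int pick = std::rand() % count;
    assert(pick >= 0 && pick < count);
    return clones[pick](x);
}
```

Whichever clone is picked, the caller sees the same result, while an analyst has to recognize that the three bodies are equivalent.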
6) Merging functions.
All functions and their code are moved into a single function. In some cases this method can cause problems.
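As a sketch of merging, two small functions can be folded into one body that is selected by an extra argument. The selector enum and the function names are assumptions made for this example. Note that the merged functions here must share one signature, which hints at why the approach does not always apply cleanly.

```cpp
#include <cassert>

// Two separate functions before merging.
int add_five(int x) { return 5 + x; }
int twice(int x)    { return x + x; }

// Hypothetical selector for the merged function.
enum Which { ADD_FIVE, TWICE };

// Both bodies moved into a single function.
int merged(Which which, int x) {
    switch (which) {
    case ADD_FIVE: return 5 + x;
    case TWICE:    return x + x;
    }
    assert(false && "unknown selector");
    return 0;
}
```

A call such as twice(add_five(16)) then becomes merged(TWICE, merged(ADD_FIVE, 16)).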
7) Restructuring the code into a state machine or a switch.
A new basic block is created and becomes the entry point of the main function; from it, branches are created (possibly driven by a variable) to the other basic blocks.
Example
Original code.
#include <stdio.h>
#include <stdlib.h>

int rand_func() {
    return 5 + rand();
}

int main() {
    const char *habr = "habrahabr";
    printf("%s", habr);
    int a = rand_func();
    goto test;
    exit(0);
test:
    int b = a + a;
}

First run.
image
Second run.
image
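The switch-based form can be sketched at the source level as below. The example function, its state numbering and the names original and flattened are assumptions for illustration; a real pass would do the same thing to basic blocks in IR and could randomize the state values on each run.

```cpp
#include <cassert>

// Straightforward version: b = a + a; if (b > 10) b -= 10; return b.
int original(int a) {
    int b = a + a;
    if (b > 10)
        b -= 10;
    return b;
}

// Flattened version: every block becomes a switch case, and a state
// variable decides which block runs next.
int flattened(int a) {
    int state = 0;  // the new entry block dispatches on this variable
    int b = 0;
    for (;;) {
        switch (state) {
        case 0:  b = a + a;  state = 2;              break; // first block
        case 2:  state = (b > 10) ? 1 : 3;           break; // condition
        case 1:  b -= 10;    state = 3;              break; // "then" block
        case 3:  return b;                                  // exit block
        default: assert(false && "unreachable state"); return b;
        }
    }
}
```

The original control flow is no longer visible in the graph: every block jumps back to the dispatcher, and the real order is encoded only in the values assigned to state.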

8) Creating pseudo-loops from code.
This applies to functions: a basic block of a specific function is taken and several more blocks are added around it to form a loop that executes only once.
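A pseudo-loop can look like this at the source level. The loop condition and the name doubled_pseudo_loop are made up for the sketch; the only requirement is that the condition is false after the first pass.

```cpp
#include <cassert>

// Computes a + a inside a loop that always runs exactly once.
int doubled_pseudo_loop(int a) {
    int b = 0;
    int passes = 0;
    do {
        b = a + a;   // body of the original basic block
        ++passes;
    } while (passes * passes < passes);  // 1 < 1 is false after the first pass
    assert(passes == 1);
    return b;
}
```

In the control-flow graph this adds a back edge and a loop header, so the block looks like part of an iteration even though it is executed once.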
9) A random virtual machine is generated and all existing code is translated to run on it. For me, this item is so far possible only in theory.
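To give a feel for the idea, here is a toy stack machine that runs a + a as bytecode. It is purely a hypothetical sketch and not a description of any existing implementation: the opcode set, the names run and doubled_via_vm, and the program encoding are all invented here. A "random" VM would additionally shuffle the opcode numbering for each build.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcode set of the toy VM.
enum Op : unsigned char { PUSH_ARG, DUP, ADD, RET };

// Interprets the bytecode with a simple operand stack.
int run(const std::vector<unsigned char>& code, int arg) {
    std::vector<int> stack;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        switch (code[pc]) {
        case PUSH_ARG:
            stack.push_back(arg);
            break;
        case DUP:
            assert(!stack.empty());
            stack.push_back(stack.back());
            break;
        case ADD: {
            assert(stack.size() >= 2);
            int rhs = stack.back();
            stack.pop_back();
            stack.back() += rhs;
            break;
        }
        case RET:
            assert(!stack.empty());
            return stack.back();
        default:
            assert(false && "unknown opcode");
            return 0;
        }
    }
    return 0;  // program ended without RET
}

// "int b = a + a;" translated to the VM's bytecode.
int doubled_via_vm(int a) {
    static const std::vector<unsigned char> program = { PUSH_ARG, DUP, ADD, RET };
    return run(program, a);
}
```

The native code then contains only the interpreter loop, while the program logic lives in a byte array that must be reverse engineered separately.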

How to start learning?


Start by looking through the list of available instructions.
//===-- llvm/Instruction.def - File that describes Instructions -*- C++ -*-===//
//
//                     The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains descriptions of the various LLVM instructions.  This is
// used as a central place for enumerating the different instructions and
// should eventually be the place to put comments about the instructions.
//
//===----------------------------------------------------------------------===//

FIRST_TERM_INST (1)
HANDLE_TERM_INST (1, Ret, ReturnInst)
HANDLE_TERM_INST (2, Br, BranchInst)
HANDLE_TERM_INST (3, Switch, SwitchInst)
HANDLE_TERM_INST (4, IndirectBr, IndirectBrInst)
HANDLE_TERM_INST (5, Invoke, InvokeInst)
HANDLE_TERM_INST (6, Resume, ResumeInst)
HANDLE_TERM_INST (7, Unreachable, UnreachableInst)
LAST_TERM_INST (7)

// Standard binary operators ...
FIRST_BINARY_INST (8)
HANDLE_BINARY_INST (8, Add, BinaryOperator)
HANDLE_BINARY_INST (9, FAdd, BinaryOperator)
HANDLE_BINARY_INST (10, Sub, BinaryOperator)
HANDLE_BINARY_INST (11, FSub, BinaryOperator)
HANDLE_BINARY_INST (12, Mul, BinaryOperator)
HANDLE_BINARY_INST (13, FMul, BinaryOperator)
HANDLE_BINARY_INST (14, UDiv, BinaryOperator)
HANDLE_BINARY_INST (15, SDiv, BinaryOperator)
HANDLE_BINARY_INST (16, FDiv, BinaryOperator)
HANDLE_BINARY_INST (17, URem, BinaryOperator)
HANDLE_BINARY_INST (18, SRem, BinaryOperator)
HANDLE_BINARY_INST (19, FRem, BinaryOperator)

// Logical operators (integer operands)
HANDLE_BINARY_INST (20, Shl, BinaryOperator) // Shift left (logical)
HANDLE_BINARY_INST (21, LShr, BinaryOperator) // Shift right (logical)
HANDLE_BINARY_INST (22, AShr, BinaryOperator) // Shift right (arithmetic)
HANDLE_BINARY_INST (23, And, BinaryOperator)
HANDLE_BINARY_INST (24, Or, BinaryOperator)
HANDLE_BINARY_INST (25, Xor, BinaryOperator)
LAST_BINARY_INST (25)

// Memory operators ...
FIRST_MEMORY_INST (26)
HANDLE_MEMORY_INST (26, Alloca, AllocaInst) // Stack management
HANDLE_MEMORY_INST (27, Load, LoadInst) // Memory manipulation instrs
HANDLE_MEMORY_INST (28, Store, StoreInst)
HANDLE_MEMORY_INST (29, GetElementPtr, GetElementPtrInst)
HANDLE_MEMORY_INST (30, Fence, FenceInst)
HANDLE_MEMORY_INST (31, AtomicCmpXchg, AtomicCmpXchgInst)
HANDLE_MEMORY_INST (32, AtomicRMW, AtomicRMWInst)
LAST_MEMORY_INST (32)

// Cast operators ...
// NOTE: The order matters here because CastInst::isEliminableCastPair
// NOTE: (see Instructions.cpp) encodes a table based on this ordering.
FIRST_CAST_INST (33)
HANDLE_CAST_INST (33, Trunc, TruncInst) // Truncate integers
HANDLE_CAST_INST (34, ZExt, ZExtInst) // Zero extend integers
HANDLE_CAST_INST (35, SExt, SExtInst) // Sign extend integers
HANDLE_CAST_INST (36, FPToUI, FPToUIInst) // floating point -> UInt
HANDLE_CAST_INST (37, FPToSI, FPToSIInst) // floating point -> SInt
HANDLE_CAST_INST (38, UIToFP, UIToFPInst) // UInt -> floating point
HANDLE_CAST_INST (39, SIToFP, SIToFPInst) // SInt -> floating point
HANDLE_CAST_INST (40, FPTrunc, FPTruncInst) // Truncate floating point
HANDLE_CAST_INST (41, FPExt, FPExtInst) // Extend floating point
HANDLE_CAST_INST (42, PtrToInt, PtrToIntInst) // Pointer -> Integer
HANDLE_CAST_INST (43, IntToPtr, IntToPtrInst) // Integer -> Pointer
HANDLE_CAST_INST (44, BitCast, BitCastInst) // Type cast
LAST_CAST_INST (44)

// Other operators ...
FIRST_OTHER_INST (45)
HANDLE_OTHER_INST (45, ICmp, ICmpInst) // Integer comparison instruction
HANDLE_OTHER_INST (46, FCmp, FCmpInst) // Floating point comparison instr.
HANDLE_OTHER_INST (47, PHI, PHINode) // PHI node instruction
HANDLE_OTHER_INST (48, Call, CallInst) // Call a function
HANDLE_OTHER_INST (49, Select, SelectInst) // select instruction
HANDLE_OTHER_INST (50, UserOp1, Instruction) // May be used internally in a pass
HANDLE_OTHER_INST (51, UserOp2, Instruction) // Internal to passes only
HANDLE_OTHER_INST (52, VAArg, VAArgInst) // vaarg instruction
HANDLE_OTHER_INST (53, ExtractElement, ExtractElementInst) // extract from vector
HANDLE_OTHER_INST (54, InsertElement, InsertElementInst) // insert into vector
HANDLE_OTHER_INST (55, ShuffleVector, ShuffleVectorInst) // shuffle two vectors.
HANDLE_OTHER_INST (56, ExtractValue, ExtractValueInst) // extract from aggregate
HANDLE_OTHER_INST (57, InsertValue, InsertValueInst) // insert into aggregate
HANDLE_OTHER_INST (58, LandingPad, LandingPadInst) // Landing pad instruction.
LAST_OTHER_INST (58)

You should read the following documentation:
LLVM-CheatSheet
LLVM Programmers Manual
LLVM-CheatSheet 2
LLVMBackendCPU
Obfuscating C++ programs via CFF

It is also worth studying publicly available obfuscator implementations.
1) Obfuscator-llvm
Implements instruction substitution and flattening of the control-flow graph.
2) Kryptonite
Implements replacing instructions with equivalents / decomposing instructions.

Snippets


To insert asm instructions, you can use llvm::InlineAsm or a MachinePass; machine passes let you change and add instructions. A good example is here.

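Some useful code to get you started
How to read a bitcode file?
// LLVM 3.4 API
std::string file = "1.bc";
std::string error;
llvm::LLVMContext context;
// Buffer that receives the file contents (OwningPtr is the LLVM 3.4 smart pointer)
llvm::OwningPtr<llvm::MemoryBuffer> bytecode;
llvm::MemoryBuffer::getFile(file.c_str(), bytecode);
llvm::Module *module = llvm::ParseBitcodeFile(bytecode.get(), context, &error);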

How to iterate over the functions in a module?
for (auto i = module->getFunctionList().begin(); i != module->getFunctionList().end(); ++i) {
    // getName() returns a StringRef; printf needs a C string
    printf("Function %s\n", i->getName().str().c_str());
}

How to check whether an instruction is of a particular kind?
if (llvm::isa<llvm::BranchInst>(currentInstruction))
    printf("BranchInst!");

How to replace the terminator with another instruction?
llvm::BasicBlock *block = /* ... */;
block->replaceAllUsesWith(/* ... */);

How to cast one instruction type to another?
llvm::Instruction *test = basicBlock->getTerminator();
// dyn_cast returns a null pointer if the terminator is not a BranchInst
llvm::BranchInst *branchInst = llvm::dyn_cast<llvm::BranchInst>(test);

How to get the first non-PHI instruction in a basic block?
llvm::Instruction *inst = currentInstruction->getParent()->getFirstNonPHI();

How to iterate over the instructions in a function?
// inst_iterator is declared in llvm/Support/InstIterator.h (LLVM 3.4)
for (llvm::inst_iterator i = inst_begin(function); i != inst_end(function); ++i) {
    llvm::Instruction *inst = &*i;
}

How to find out whether an instruction is used outside its parent block?
bool IsUsedOutsideParentBlock(llvm::Instruction *inst) {
    // In LLVM 3.4, dereferencing a use_iterator yields the User
    for (llvm::Value::use_iterator i = inst->use_begin(); i != inst->use_end(); ++i) {
        llvm::User *user = *i;
        if (llvm::cast<llvm::Instruction>(user)->getParent() != inst->getParent())
            return true;
    }
    return false;
}

How to get / change the basic blocks that InvokeInst and other terminators refer to?
invokeInst->getSuccessor(0);             // get the successor basic block
invokeInst->setSuccessor(0, basicBlock); // set the successor basic block



Questions and answers


Q: What about deobfuscation?
A: That is up to you; there is an LLVM-based project for removing obfuscation.
Q: How to generate a bitcode file from source?
A: clang -emit-llvm -o 1.bc -c 1.c
Q: How to compile bitcode?
A: clang -o 1 1.bc
Q: How to generate an asm file from LLVM IR?
A: llc foo.ll
Q: How to generate an IR file from source?
A: clang -S -emit-llvm 1.c
Q: How to compile an .s (assembly) file?
A: gcc -o exe 1.s
Q: How to get an object file from bitcode?
A: llc -filetype=obj 1.bc
Q: How to get, for a cpp file, the C++ source that rebuilds it through the LLVM API?
A: clang++ -c -emit-llvm 1.cpp -o 1.ll then llc -march=cpp -o 1.ll.cpp 1.ll
Q: I built Clang on Windows, but it cannot find the header files. How do I fix this?
A: Find InitHeaderSearch.cpp and add the necessary paths; look at AddMinGWCPlusPlusIncludePaths and AddMinGW64CXXPaths.
Q: Does Clang built with Visual Studio work well?
A: Not at the moment; it can only compile the simplest C code.
Q: With optimization enabled, Clang strips the generated instructions / functions. What should I do?
A: Your passes must be built into clang, in this cpp file. You can also make the compiler believe that the added code is needed: the code has to be used, and in the case of functions they have to be called. For testing, you can use -O0.

Source: https://habr.com/ru/post/213259/

