Control Flow Unflattening

Target

Recently I analyzed a RASP solution called Approov. Although it has some novel detection techniques, overall it’s not that interesting. Instead, I will focus on the obfuscation used in its native library: control flow flattening. You can find a host app with a little googling :)

CFF

Control flow flattening (CFF), in a nutshell, is a technique that obfuscates code flow by rearranging code blocks (and creating new basic blocks) into switch cases. Visualizing it is more straightforward:

From a dynamic analysis perspective, the execution flow is the same as in the first case: the dispatcher (gray block) executes the blocks in the same order as the original code. Static analysis is harder, because a reverse engineer has to follow the state variable after each “case” executes, and sometimes the state variable is obfuscated as well.
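To make the idea concrete, here is a minimal hand-flattened function in Python. It is only a toy: the function names and state constants are made up for illustration and have nothing to do with the analyzed library.

```python
def original(n: int) -> int:
    """Ordinary version: sum 0..n-1 with a normal loop."""
    total = 0
    i = 0
    while i < n:
        total += i
        i += 1
    return total


def flattened(n: int) -> int:
    """Same logic after a hand-made flattening pass.

    Every basic block becomes a "case" and the state variable selects
    the next one. The state constants are arbitrary (hypothetical).
    """
    state = 0x10
    total = i = 0
    while True:                  # dispatcher
        if state == 0x10:        # entry block
            total, i = 0, 0
            state = 0x2A
        elif state == 0x2A:      # loop condition
            state = 0x07 if i < n else 0x33
        elif state == 0x07:      # loop body
            total += i
            i += 1
            state = 0x2A
        elif state == 0x33:      # exit block
            return total
```

Both functions return the same result for every input, but in the flattened one the order of the blocks can only be recovered by following `state`.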

After seeing the CFF’ed binary I looked up earlier research on CFF. I first tried stadeo, then found eShard’s d810. Stadeo uses miasm to lift the binary and then analyzes the CFF, whereas d810 works on Hex-Rays microcode (the intermediate representation of IDA’s decompiler). D810 can track variables backward, emulate microcode, and has very easy-to-extend config/rule sets for opaque predicates, etc. So I stuck with d810. Make sure to read their blog posts about d810 [1][2].

Microcode

D-810 hooks IDA’s decompilation process and modifies the microcode, which is made of micro-instructions (minsn_t), micro-operands (mop_t), and blocks (mblock_t). Before giving execution back to IDA, the final mba needs to pass verification.

Oh boy. Before realizing that all the microcode internal errors (INTERR) are documented in hexrays_sdk/verifier/verify.cpp, I tried to work out the meaning of each INTERR by dumping microcode. I used d810’s dump_microcode_for_debug function to dump the microcode, which helped a lot with debugging. To debug D810 (or any IDA Pro plugin) I followed the “Debugging IDA Pro plugins” blog post listed in the references.

Unflatten

To unflatten, we need to find the state variable values in the dispatcher fathers. A dispatcher father is a block that jumps directly into the dispatcher. If we know the state variable value in a dispatcher father, we can compute which block the dispatcher will pick next and replace the father’s successor with that block.

Let’s take an example from the sample. Each switch case body, as well as the initial state variable assignment, is a dispatcher father.

The switch starts with the initial state value 1010106. We know that if the state is 1010106 (X), the dispatcher (Y) goes into the code block (Z) containing v1 = (qword_470e8 == 0) + 1010107. So we can jump directly into block Z from the assignment of 1010106 (X). Instead of X->Y->Z we get X->Z.
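The X->Y->Z to X->Z rewrite can be sketched on a toy CFG. This is not d810’s data model: the dictionary layout and the unflatten helper are assumptions made for illustration, only the block names and the constant 1010106 come from the example above.

```python
DISPATCHER = "Y"

# state value -> case block chosen by the dispatcher
JUMP_TABLE = {1010106: "Z"}

# Toy blocks: "state_out" is the constant written to the state variable
# before leaving the block (None when it is not a single constant).
BLOCKS = {
    "X": {"state_out": 1010106, "succ": "Y"},  # initial assignment
    "Y": {"state_out": None, "succ": None},    # dispatcher
    "Z": {"state_out": None, "succ": "Y"},     # state computed at run time
}


def unflatten(blocks, dispatcher, jump_table):
    """Point every father with a known constant state at its real target."""
    patched = {}
    for name, blk in blocks.items():
        new_blk = dict(blk)
        is_father = blk["succ"] == dispatcher
        if is_father and blk["state_out"] in jump_table:
            new_blk["succ"] = jump_table[blk["state_out"]]
        patched[name] = new_blk
    return patched


patched = unflatten(BLOCKS, DISPATCHER, JUMP_TABLE)
print(patched["X"]["succ"])  # X now skips the dispatcher
```

Block Z is left untouched in this sketch because its state value is not a single constant.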

In d810 there is a rule for CFF with a switch table called default_unflattening_switch_case.json. Unfortunately, it does not work for this case. Why? Because of v1 = (qword_470e8 == 0) + 1010107: a single block can produce two different state values.

Actually, the d810 authors tried to solve this problem. If a block can have more than one state variable value (think of two if branches with different state assignments), they “duplicate any block which appears in (at least) 2 paths with (at least) 2 different predecessors.” But here there is no such predecessor: both values come from the same block. So how did I solve it? By creating a jz condition with two new blocks.

Let’s look at the microcode representation of these blocks to make this easier to follow. To show the microcode I used a plugin called lucid.

  • Take the comparison part of block 5
  • Turn it into a jz condition on that comparison
  • Create two new blocks, one for each computed state value
  • Fix the inbounds/outbounds of the blocks

block 5:
jz qword_470e8 == 0 @7

block 6:
mov 1010107,eax
goto @4

block 7:
mov 1010108,eax
goto @4

block 4:
jtbl eax ..
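The split can be modeled outside IDA with a few lines of Python. This is a sketch under assumptions: the Block class and split_state_block are hypothetical helpers that work on strings, not on real minsn_t/mblock_t objects, and the instruction text only mimics the simplified listing above.

```python
from dataclasses import dataclass


@dataclass
class Block:
    serial: int
    insns: list
    succs: list


def split_state_block(blk, cond, base, dispatcher, next_serial):
    """Rewrite 'state = (cond) + base; goto dispatcher' as a jz plus
    two blocks that each assign one constant state value."""
    # cond is false -> (0) + base
    fall = Block(next_serial,
                 [f"mov {base},eax", f"goto @{dispatcher}"],
                 [dispatcher])
    # cond is true -> (1) + base
    taken = Block(next_serial + 1,
                  [f"mov {base + 1},eax", f"goto @{dispatcher}"],
                  [dispatcher])
    # The original block keeps its serial and only holds the jump.
    head = Block(blk.serial,
                 [f"jz {cond} @{taken.serial}"],
                 [fall.serial, taken.serial])
    return head, fall, taken


block5 = Block(5, ["mov (qword_470e8 == 0) + 1010107,eax", "goto @4"], [4])
for b in split_state_block(block5, "qword_470e8 == 0", 1010107, 4, 6):
    print(f"block {b.serial}:", *b.insns, sep="\n  ")
```

Each of the two new blocks now has exactly one state value, which is what a constant-based unflattener needs.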

After finding the dispatcher fathers, I used d810’s mop tracker history feature to collect the history of every father, and fixed the instructions that can produce two state values before running d810’s generic unflattening.
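Backward tracking can be illustrated with a toy tracker. This is not d810’s mop tracker API: track_back, the block names, and the graph layout are all hypothetical, and only the three state constants come from the sample.

```python
def track_back(var, block, preds, assigns, seen=()):
    """Return the set of constants `var` may hold when leaving `block`."""
    if block in seen:                      # guard against loops
        return set()
    if var in assigns.get(block, {}):      # nearest assignment wins
        return set(assigns[block][var])
    values = set()
    for pred in preds.get(block, []):
        values |= track_back(var, pred, preds, assigns, seen + (block,))
    return values


# Hypothetical graph: predecessors of each block.
PREDS = {"dispatcher": ["init", "tail"], "tail": ["case_a"]}

# Possible constants assigned to the state variable in each block.
ASSIGNS = {
    "init": {"eax": [1010106]},
    "case_a": {"eax": [1010107, 1010108]},  # (cond) + 1010107
}

fathers = PREDS["dispatcher"]
history = {f: track_back("eax", f, PREDS, ASSIGNS) for f in fathers}

# Fathers with more than one possible value must be fixed first.
needs_fix = [f for f in fathers if len(history[f]) > 1]
print(history, needs_fix)
```

Any father that ends up in needs_fix is one where an instruction has to be rewritten before a constant-based solver can run.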

After these changes, every dispatcher father has a constant state value, so we can hand over to d810’s CFF solver and continue.

Bad While Loops

After solving the jump tables I saw bad while loops. They are not that important, but we can simplify them too.

The microcode representation is as follows:

Block 14 is a dispatcher, and its exit blocks (like the jtbl targets) are 16, 19, and 21 (the outbounds of the jump conditions). The dispatcher fathers can be computed from the inbounds of block 14: 13 and 18. I created a new flattening optimizer class in d810 and used it to clear the bad while loops.
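A rough model of that step, using the block numbers above. Everything else is an assumption for illustration: the state values, the comparison chain, and which father reaches which exit are made up, and the helpers are not part of d810.

```python
def predecessors(succs, target):
    """Inbounds of `target`: every block that lists it as a successor."""
    return sorted(b for b, outs in succs.items() if target in outs)


def dispatch(state, chain, default):
    """Model a dispatcher built from a chain of equality jumps."""
    for value, target in chain:
        if state == value:
            return target
    return default


def redirect_fathers(succs, dispatcher, father_state, chain, default):
    """Return a new successor map where fathers bypass the dispatcher.

    Assumes each father has the dispatcher as its only successor.
    """
    patched = {b: list(outs) for b, outs in succs.items()}
    for father in predecessors(succs, dispatcher):
        patched[father] = [dispatch(father_state[father], chain, default)]
    return patched


SUCCS = {13: [14], 18: [14], 14: [16, 19, 21]}
CHAIN = [(1, 16), (2, 19)]     # hypothetical comparisons in block 14
DEFAULT = 21                   # hypothetical fall-through exit
FATHER_STATE = {13: 2, 18: 3}  # hypothetical constant states

print(predecessors(SUCCS, 14))
print(redirect_fathers(SUCCS, 14, FATHER_STATE, CHAIN, DEFAULT))
```

Once every father points straight at an exit block, block 14 has no inbounds left and the loop around it disappears.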

Conclusion

Hex-Rays microcode is a powerful beast and playing with it was fun. And the feeling of seeing 2k lines of code unflattened is 👌. If you have any questions you can ping me at @0xabc0

References

CFF Example image taken from jscrambler.com
Rolf Rolles blog post
Sophos Attacking Emotet’s Control Flow Flattening
eShard d810 cff post
eShard d810 deobf post
TAU Defeating Compiler-Level Obfuscations
Quarkslab Deobfuscation: recovering an OLLVM-protected program
HexRaysDeob
IDA Plugin: lucid
IDA Plugin: d810
Debugging IDA Pro plugins