Title: Paper notes - Chosen-Instruction Attack Against Commercial Code Virtualization Obfuscators
Date: 2022-10-01 21:30

- PDF: [2afff887eae2ec3b9485b82e38df4451bfe9783f4b026be82c67eab032968530]({static}/files/papers/2afff887eae2ec3b9485b82e38df4451bfe9783f4b026be82c67eab032968530_chosen_instruction.pdf)

Commercial virtualization-based
[obfuscators](https://en.wikipedia.org/wiki/Obfuscator) are hard to… well…
devirtualize in a generic way. The idea of the paper is to use code like this:

```C
void KnowledgeLeaking() {
  VIRTUALIZER_START // VM macro
  __asm(
    "cmpxchg eax, eax;" // anchor
    "mov rax, 0x1337;"  // knowledge leaking code
    "cmpxchg eax, eax;" // anchor
  );
  VIRTUALIZER_END // VM macro
}
```

and to throw it repeatedly at virtualizers, with the `anchor` being an
instruction that isn't or can't be virtualized by the virtualizer, like atomic swaps,
`syscall`, `cpuid`, … meaning that it is emitted as-is, unobfuscated
(format-preserving). This construction also forces the VM to start, suspend or
terminate so that the anchor can execute natively, resume or restart, and then
finally terminate, effectively creating a "self-contained" obfuscation of the
"knowledge leaking code".
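
Since the anchors survive virtualization verbatim, they can serve as delimiters
in a recorded execution trace. A minimal sketch, assuming the trace is already
available as an array of disassembled instruction strings (the `Window` type and
`find_leak_window` are hypothetical names for illustration, not taken from the
paper's tooling):

```C
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Half-open range [begin, end) of trace indices between two anchors. */
typedef struct {
  size_t begin;
  size_t end;
  int found; /* 1 if both anchors were seen, 0 otherwise */
} Window;

/* Scan the trace for the first two occurrences of the anchor and return
 * the indices of the instructions strictly between them. */
Window find_leak_window(const char *const *trace, size_t n, const char *anchor) {
  assert(trace != NULL && anchor != NULL);
  Window w = {0, 0, 0};
  size_t first = n; /* n means "no anchor seen yet" */
  for (size_t i = 0; i < n; i++) {
    if (strcmp(trace[i], anchor) != 0) {
      continue;
    }
    if (first == n) {
      first = i; /* opening anchor */
    } else {
      w.begin = first + 1; /* closing anchor */
      w.end = i;
      w.found = 1;
      return w;
    }
  }
  return w;
}
```

The returned range then holds everything the VM executed on behalf of the
knowledge leaking code, context switches included.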

Afterwards, backward and forward slicing can be used on the trace of the
function, since all the input/output registers/memory values of the knowledge
leaking code are known. Moreover, by using a `nop` instruction, the
context-switch instructions can be precisely identified.
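
To illustrate the backward half, here is a toy slicer over a trace whose entries
have already been reduced to read/write sets. The `TraceEntry` layout and the
one-bit-per-location encoding are assumptions made for this sketch; the paper's
implementation works on real instruction traces.

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One executed instruction, abstracted to the locations it touches.
 * Each bit stands for one register or memory cell (hypothetical encoding). */
typedef struct {
  uint32_t reads;
  uint32_t writes;
} TraceEntry;

/* Walk the trace backwards from the end, starting from the locations in
 * `criterion` (e.g. the known output register). An entry belongs to the
 * slice if it defines a location that is currently live. Sets in_slice[i]
 * to 1 or 0 for every entry and returns the number of entries kept. */
size_t backward_slice(const TraceEntry *trace, size_t n, uint32_t criterion,
                      uint8_t *in_slice) {
  assert(trace != NULL && in_slice != NULL);
  uint32_t live = criterion;
  size_t kept = 0;
  for (size_t i = n; i-- > 0;) {
    in_slice[i] = 0;
    if (trace[i].writes & live) {
      in_slice[i] = 1;
      kept++;
      live &= ~trace[i].writes; /* these locations are defined here */
      live |= trace[i].reads;   /* their inputs become live instead */
    }
  }
  return kept;
}
```

Entries that never feed the chosen output, such as junk instructions, end up
outside the slice.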

Slicing the trace this way makes it possible to leak what the paper calls
"mapping rules": `instruction` → `corresponding obfuscated code`, with their
associated `transformation strategy`
(like `xor edx, ecx` → `nor(nor(edx, ecx), and(edx, ecx))`); the
main hypothesis being that the different transformation strategies can be
enumerated.
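
A candidate rule is only useful if it is semantically equivalent to the original
instruction. As a lightweight stand-in for an SMT query, a purely bitwise rule
can be checked exhaustively on 8-bit operands, since every bit position is
computed independently of the others. All function names below are hypothetical,
and the NOR-based encoding of `xor` is just one example chosen for this sketch:

```C
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*binop_fn)(uint8_t, uint8_t);

/* Primitive operations assumed to be available to the VM. */
uint8_t vm_or(uint8_t a, uint8_t b) { return (uint8_t)(a | b); }
uint8_t vm_and(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }
uint8_t vm_nor(uint8_t a, uint8_t b) { return (uint8_t)~(a | b); }

/* Semantics of the original instruction: xor dst, src. */
uint8_t original_xor(uint8_t a, uint8_t b) { return (uint8_t)(a ^ b); }

/* Candidate rule: nor(nor(a, b), and(a, b)). */
uint8_t candidate_xor(uint8_t a, uint8_t b) {
  return vm_nor(vm_nor(a, b), vm_and(a, b));
}

/* Malformed rule: nor(or(a, ~b), or(~a, b)) always yields 0. */
uint8_t broken_xor(uint8_t a, uint8_t b) {
  return vm_nor(vm_or(a, (uint8_t)~b), vm_or((uint8_t)~a, b));
}

/* Returns 1 if both functions agree on all 65536 operand pairs, else 0. */
int rule_is_equivalent(binop_fn original, binop_fn obfuscated) {
  assert(original != NULL && obfuscated != NULL);
  for (unsigned a = 0; a <= 0xFF; a++) {
    for (unsigned b = 0; b <= 0xFF; b++) {
      if (original((uint8_t)a, (uint8_t)b) != obfuscated((uint8_t)a, (uint8_t)b)) {
        return 0;
      }
    }
  }
  return 1;
}
```

Rules involving arithmetic carries or flags would need full-width operands, at
which point a solver becomes the more practical choice.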

The authors threw their machinery at [VMProtect](https://vmpsoft.com/), [Code
Virtualizer](https://www.oreans.com/codevirtualizer.php),
[Themida](https://www.oreans.com/themida.php), and
[Obsidium](https://www.obsidium.de/product/sps/about). They found 760
anchor instructions and extracted 1915 customized mapping rules, validated with
[Z3](https://www.microsoft.com/en-us/research/project/z3-3/).

Surprisingly, they didn't write a pattern-matching-based deobfuscator ("[…]
designed to assist analysts in extracting knowledge from commercial VM-based
obfuscators, rather than directly simplifying virtualized malware. We leave it
to future work."), but produced a benchmark to see which instructions other
tools like [Syntia](https://github.com/rub-sysSec/syntia),
[VMHunt](https://github.com/s3team/VMHunt) and
[generic-deobfuscator](https://www.sysnet.ucsd.edu/~bjohanne/assets/papers/2015oakland.pdf)
could successfully devirtualize.

Amusingly:

> The different levels
of obfuscation (i.e., white, black, and red) provided by Code
Virtualizer and Themida only change the numbers of inserted
junk instructions but will not influence the complexity of
the mapping rules between original instructions and kernel
virtualized instructions.

The code has of course [been
published](https://github.com/chosen-instruction-attack), and the paper was
part of the [29th Network and Distributed System Security
Symposium](https://www.ndss-symposium.org/ndss2022/).
