Title: Paper notes: Breaking the x86 ISA
Date: 2017-10-1 10:10

- Complete title: Breaking the x86 ISA
- PDF: [31dd6431a3dd0d26814b4ff4d453c20e8d1292eef2f3ec1f777b8e4ee1852d08_domas_breaking_the_x86_isa_wp.pdf]({static}/files/papers/31dd6431a3dd0d26814b4ff4d453c20e8d1292eef2f3ec1f777b8e4ee1852d08_domas_breaking_the_x86_isa_wp.pdf)

The paper Christopher Domas (that wrote the
[movfuscator](https://github.com/xoreaxeaxeax/movfuscator),
[REpsych](https://github.com/xoreaxeaxeax/REpsych) and [cantor
dust](https://sites.google.com/site/xxcantorxdustxx/home)) is about the
**exhaustive** exploration of the x86 ISA ([Instruction Set
Architecture](https://en.wikipedia.org/wiki/Instruction_set_architecture))
using page-fault and guided fuzzing.

# Fuzzing

The set of every possible x86 instruction is too large to be exhausted by pure
brute-force (instructions from 1 to 15 bytes long, so
`1329227995784915872903807060280344576` possibilities), so to achieve
exhaustive coverage, the following approach is used, on a 15 bytes buffer,
initially filled with `0`:

1. Execute the instruction in the buffer, and observe its effective length
2. The byte at the end of the instruction is incremented,
3. Repeat the process with the resulting new instruction, 
4. In case of a change of the length or an exception, the incrementation will
   start from the new end.
5. If the increment reaches `256`, the incrementation moves to the *previous*
  byte.

For example, for `0…0` the observed length is 2, so the next instruction to be
executed will be `010…0`.

This approach skips less significant portions of the instruction, such
as immediate value or displacements

# Measuring the length of the instruction

Using the `trap` instruction and comparing `eip` doesn't work for instructions
that are generating exceptions, since they do not execute.

A more reliable way to measure the length of the instruction is to place the
**first** byte of the instruction on an **executable** page, and the rest of it
in a **non-executable** one. If a *page fault* occur during fetching, the processor
trigger an exception, and the address of the page boundary is reported in a register,
allowing to iteratively find the size of the instruction (either until there is
not exception, or if the reported address isn't the boundary). Valid and
invalid instructions can be told apart because the later will raise an `UD` (invalid
opcode) exception upon execution. Privileged ones will raise a `GP` one.

To prevent self-corruption, the process is running in ring 3, with every
register set to zero and every possible exception hooked. A side-effect of
vastly ignoring different immediate values (like `inc [0x0804a10c]`) is that
this reduces the odds of corruption, since there is nothing mapped at low level
addresses.

To find *interesting* instructions, the length of the instruction is compared with the length
of the one decoded by [capstone](http://www.capstone-engine.org/).

# Results

- QEMU, IDA, Objdump, capstone, VS, … are getting some instructions wrong, allowing to
hide malicious code from the disassembly listing since it's interpreted wrongly.
- The tool (called sandsifter) rediscovered the famous [`lock cmpxchg8b eax`](https://en.wikipedia.org/wiki/Pentium_F00F_bug) bug
- Disparities were found against AMD an Intel processors
- An `halt and catch fire` instruction that could be triggered from ring3 
  was found in an unnamed processor

As usual with decent research paper, the code related to the publication is [openly
available and documented](https://github.com/xoreaxeaxeax/sandsifter)
([local mirror]({static}/files/papers/sandsifter.zip), `2017-09-13`).