Title: Paper Notes: FineIBT
Date: 2022-12-07 23:15

[FineIBT](https://www.openwall.com/lists/kernel-hardening/2021/02/11/1) is a
proposal by Intel's [Joao Moreira](https://twitter.com/lvwr) for a fine-grained
forward-edge CFI scheme. It was
[presented](https://static.sched.com/hosted_files/lssna2021/8f/LSS_FINEIBT_JOAOMOREIRA.pdf)
at the Linux Security Summit 2021.

One of [Intel
CET](https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-introduction-intel-cet-844137.pdf)'s
shortcomings, as highlighted in grsecurity's [Close, but No Cigar: On the
Effectiveness of Intel's CET Against Code Reuse
Attacks](https://grsecurity.net/effectiveness_of_intel_cet_against_code_reuse_attacks)
blog post, is that every function is a valid indirect `call`/`jmp` target.
This isn't a theoretical issue: it was explicitly called out in Qualys'
[Baron Samedit](https://qualys.com/2021/01/26/cve-2021-3156/baron-samedit-heap-based-overflow-sudo.txt) exploit.

Intel CET works like this, with the
[`endbr64`](https://www.felixcloutier.com/x86/endbr64) instruction marking valid
targets:

```nasm
main:
...
mov rax, bar
call *rax
...

bar:
endbr64
...
```

The main improvement of FineIBT is to cluster functions and pointers by
prototype to reduce the number of valid targets for a given `call`/`jmp`. This
isn't a new idea: it was already described in 2003 in
[pax-future.txt](https://pax.grsecurity.net/docs/pax-future.txt),
implemented in [PaX' RAP](https://grsecurity.net/rap_faq) in 2015
and in [Microsoft's XFG](https://www.offensive-security.com/offsec/extended-flow-guard/) in 2019.
This is done by embedding a hash of the target's type and checking it at runtime,
which has the nice advantage of not depending on LTO. For FineIBT, it looks like this:

```nasm
main:
...
mov rax, bar
mov r11, 0xcafecafe
call *rax
...
call bar_oep  # direct calls can skip the prologue.

bar:
endbr64
xor 0xcafecafe, r11  # this has the nice side-effect of nuking r11.
je bar_oep
hlt
bar_oep:
```
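To make the clustering concrete, here is a small Python model of the check above. It is only a sketch: the function names, the prototype strings and the use of a truncated SHA-256 as the type hash are all assumptions made for illustration, not what the FineIBT prototypes actually do.

```python
import hashlib

# Hypothetical functions and their prototypes (illustrative only).
PROTOTYPES = {
    "bar": "void (int)",
    "baz": "void (int)",
    "system": "int (const char *)",
}


class Halt(Exception):
    """Stands in for the `hlt` reached when the tags don't match."""


def type_hash(prototype):
    # Assumption: a 32-bit tag taken from SHA-256 of the prototype string.
    digest = hashlib.sha256(prototype.encode()).digest()
    return int.from_bytes(digest[:4], "little")


def indirect_call(target, expected_prototype):
    # Caller side: `mov r11, <hash>` right before `call *rax`.
    r11 = type_hash(expected_prototype)
    # Callee side: `xor <hash>, r11`, then `je bar_oep`, otherwise `hlt`.
    r11 ^= type_hash(PROTOTYPES[target])
    if r11 != 0:
        raise Halt(target + ": prototype tag mismatch")
    return "entered " + target


def valid_targets(expected_prototype):
    # Every function sharing the call site's tag stays reachable.
    tag = type_hash(expected_prototype)
    return sorted(n for n, p in PROTOTYPES.items() if type_hash(p) == tag)
```

In this model, a call site expecting `void (int)` can still reach both `bar` and `baz`, but `indirect_call("system", "void (int)")` raises `Halt`, whereas with plain `endbr64` marking all three would be valid targets.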

The loader checks that all DSOs support FineIBT, and if so enables it via
a flag stored in `fs:0x48`, making the prologue look like this:

```nasm
bar:
endbr64
xor 0xcafecafe, r11
je bar_oep
testb 0x11, fs:0x48
jne bar_oep
hlt
bar_oep:
```

Unfortunately, this means that an attacker with arbitrary r/w will be able to
disable FineIBT, which is a bit weird, since the threat model for CFI is
usually "arbitrary r/w". Moreover, this adds two instructions per function,
hurting performance and binary size even more.
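The bypass can be illustrated with a toy Python model of the flagged prologue. This is a sketch under assumptions: the `Thread` class is hypothetical and simply stands in for the byte at `fs:0x48`, and the polarity of the test is copied literally from the listing above rather than from the actual loader.

```python
class Halt(Exception):
    """Stands in for the `hlt` at the end of the prologue."""


class Thread:
    """Hypothetical per-thread state; `flag` models the byte at fs:0x48."""

    def __init__(self, flag):
        self.flag = flag


def prologue(thread, caller_tag, callee_tag):
    # xor <tag>, r11 ; je bar_oep
    if caller_tag ^ callee_tag == 0:
        return "bar_oep"
    # testb 0x11, fs:0x48 ; jne bar_oep (polarity taken from the listing)
    if thread.flag & 0x11:
        return "bar_oep"
    # hlt
    raise Halt("tag mismatch")


victim = Thread(flag=0x00)
try:
    prologue(victim, 0xDEADBEEF, 0xCAFECAFE)
    blocked = False
except Halt:
    blocked = True

# A single attacker-controlled write to the flag turns the check off.
victim.flag = 0x11
bypassed = prologue(victim, 0xDEADBEEF, 0xCAFECAFE) == "bar_oep"
```

Here `blocked` and `bypassed` both end up `True`: the mismatching call is stopped until the flag byte is overwritten, after which the same call goes through.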

The overhead of this scheme is somewhere between negligible and a dozen
percent, depending on the workload, both in performance and in binary size.

There are a couple of prototypes floating around, in
[llvm/ld](https://github.com/intel/fineibt_llvm),
[glibc](https://github.com/intel/fineibt_glibc),
musl, … The whole thing is still a work in progress, with open questions like
how to handle C++ constructs like vtables and polymorphism. Rereading [PaX'
RAP](https://pax.grsecurity.net/docs/PaXTeam-H2HC15-RAP-RIP-ROP.pdf) would likely
help a lot.

A couple of things/improvements/details aren't mentioned, but I guess they
might be during the next iterations, since they're already being discussed in
*private circles*:

- *keyed hashing* for binary diversification
- only keep `endbr64` instructions on functions that can be indirectly
  called, cross-DSO included, likely by runtime-patching the unneeded ones away
  and via stub tables.
- *hash value range* to segregate functions even more, like for
  exceptions-related magic (`setjmp`/`longjmp`/…).
- *type diversification* to restrict valid targets even more, especially for
  common/dangerous function types like `int system(const char *)`
- getting rid of the `fs:0x48` hack completely.
- `hlt` vs. `ud2`: while the former only takes one byte, the latter works in
  kernel-land as well, and is used by clang and gcc to implement
  `__builtin_trap`. Moreover, there are fewer chances of having a handler on
  `SIGILL` than on `SIGSEGV`.
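
As an illustration of the *keyed hashing* item, here is a minimal Python sketch. Using HMAC-SHA256 truncated to 32 bits and a per-build key are assumptions of mine, chosen only to show the idea: the same prototype gets a different tag in every build, so tags leaked from one binary don't carry over to another.

```python
import hashlib
import hmac


def keyed_type_hash(key, prototype):
    # Assumption: a 32-bit tag from HMAC-SHA256(key, prototype string).
    mac = hmac.new(key, prototype.encode(), hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "little")


# Hypothetical per-build keys; a real scheme would pick them at build time.
build_a_key = b"build-a-secret"
build_b_key = b"build-b-secret"

prototype = "int (const char *)"
tag_a = keyed_type_hash(build_a_key, prototype)
tag_b = keyed_type_hash(build_b_key, prototype)
```

Within one build the tag stays stable for a given prototype, so caller and callee still agree, while `tag_a` and `tag_b` differ between builds.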
