Paper notes: RetSpill

Full title: RetSpill: Igniting User-Controlled Data to Burn Away Linux Kernel Protections
PDF: ACM — mirror — local mirror
Authors: Kyle "kylebot" Zeng, Ruoyu Wang, Yan Shoshitaishvili, and Adam Doupé from Shellphish, along with Zhenpeng Lin, Kangjie Lu, Xinyu Xing and Tiffany Bao.

The idea of the paper is to store a ROP-chain in user-controlled data that is, by design, copied into kernel-land when exercising syscalls, via 4 main avenues:

Valid Data directly copied onto the kernel stack for performance reasons, like when calling poll;
Preserved Registers, restored upon returning from kernel-land to userland;
Calling Convention compliant functions will save/restore registers, and apparently, system call handlers are calling convention compliant even though the kernel is already taking care of those, and syscalls can only be called from userland. But even if the syscall handlers weren't compliant, registers still contain userland values when they're called, and sub-functions might store/restore those registers, since those do need to be compliant;
Uninitialized Memory, since the per-thread kernel stack is reused between syscalls, and not erased (unless PAX_MEMORY_STACKLEAK is used).
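To make the Preserved Registers avenue a bit more concrete, here is a minimal Python sketch listing which slots of the saved register frame a userland caller can fill, and how they group into contiguous runs. The field order is the x86_64 struct pt_regs one from the kernel's ptrace.h; the set of slots treated as "not freely controlled" (rax/orig_rax for the syscall number, rcx and r11 clobbered by the syscall instruction, plus the ip/cs/flags/sp/ss frame) is a simplifying assumption of this sketch, and the helper name controlled_runs is made up for the example.

```python
# Field order of struct pt_regs on x86_64, lowest address first.
# Every slot is 8 bytes wide.
PT_REGS_FIELDS = [
    "r15", "r14", "r13", "r12", "bp", "bx",
    "r11", "r10", "r9", "r8",
    "ax", "cx", "dx", "si", "di",
    "orig_ax", "ip", "cs", "flags", "sp", "ss",
]

# Assumption of this sketch: slots that userland cannot set freely
# when entering the kernel through the syscall instruction.
NOT_FREELY_CONTROLLED = {
    "ax", "orig_ax",   # tied to the syscall number
    "cx", "r11",       # clobbered by the syscall instruction itself
    "ip", "cs", "flags", "sp", "ss",  # treated as fixed here
}


def controlled_runs(fields=PT_REGS_FIELDS,
                    excluded=NOT_FREELY_CONTROLLED,
                    slot_size=8):
    """Group user-fillable slots into contiguous runs.

    Returns a list of (byte offset inside pt_regs, [register names]).
    """
    runs = []
    current = None
    for index, name in enumerate(fields):
        if name in excluded:
            # A non-controlled slot ends the current run, if any.
            if current is not None:
                runs.append(current)
                current = None
            continue
        if current is None:
            current = (index * slot_size, [])
        current[1].append(name)
    if current is not None:
        runs.append(current)
    return runs


if __name__ == "__main__":
    for offset, regs in controlled_runs():
        print(f"offset {offset:#04x}: {len(regs)} slots ({', '.join(regs)})")
```

Under these assumptions there are three separate runs rather than one big buffer, and rdi, rsi, rdx, r10, r8 and r9 double as syscall arguments, so their values also have to keep the triggering syscall valid.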

Then, only a KASLR leak, a CFHP (control-flow hijacking primitive) and an add rsp, X; ret-like gadget are required to ROP all the things. Nowadays, most™ CFHPs are created by corrupting the heap to hijack function pointers, and since every kernel thread shares the same heap, once it is properly shaped, the control-flow hijacking primitive can likely be triggered again and again from different threads. Moreover, changing the exploit is simply a matter of re-invoking a syscall with a different data spill, instead of having to reshape the heap every single time. One doesn't have to worry about crashes (enabling lame bruteforcing), since no major Linux distribution (except CentOS, kudos) has panic_on_oops enabled, so having a ROP-chain crash is no big deal, because the CFHP is still on the heap, one syscall away.
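The role of the add rsp, X; ret-like gadget boils down to simple arithmetic: after the shift, ret must pop its target from a slot that holds spilled data. A minimal sketch of that check follows, assuming the stack pointer value at gadget entry and the controlled regions are already known; all addresses, shift values and the usable_shifts name are hypothetical and only serve the illustration.

```python
def usable_shifts(rsp_at_gadget, regions, gadget_shifts, slot_size=8):
    """Return (shift, landing address) pairs for which `add rsp, shift; ret`
    would pop its return address from inside a controlled region.

    regions is a list of (start address, size in bytes) tuples describing
    contiguous user-controlled data on the kernel stack.
    """
    hits = []
    for shift in sorted(gadget_shifts):
        landing = rsp_at_gadget + shift
        for start, size in regions:
            # The full 8-byte slot read by `ret` must be controlled.
            if start <= landing and landing + slot_size <= start + size:
                hits.append((shift, landing))
                break
    return hits


if __name__ == "__main__":
    # Hypothetical values, not taken from a real kernel.
    rsp = 0x3E00
    controlled = [(0x3F58, 48)]           # e.g. six spilled registers
    available = {0x58, 0x158, 0x160, 0x188}
    for shift, landing in usable_shifts(rsp, controlled, available):
        print(f"add rsp, {shift:#x}; ret -> lands at {landing:#x}")
```

With these made-up numbers only the 0x158 and 0x160 shifts land inside the controlled region; the other two would make ret pop data the attacker does not control.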

Since the space available to store gadgets might be too small for a full payload, one trick is to invoke do_task_dead at the end of every ROP-chain to gracefully terminate the current task, then trigger the CFHP again and again, each time with a different chain.

Mitigation-wise:

SMEP, SMAP and KPTI are irrelevant.
RANDKSTACK mitigates data spillage from Preserved Registers and Uninitialized Memory, but since it only provides 5 bits of randomness, a ret-sled is enough to bypass it (25.44% of the time if using gadgets from Preserved Registers or Uninitialized Memory, 100% otherwise), and in the absence of panic_on_oops it can quickly be bruteforced anyway.
STACKLEAK, STRUCTLEAK, and CONFIG_INIT_STACK_* only mitigate data spillage from Uninitialized Memory.
FG-KASLR is useless since it doesn't randomize everything, leaving a couple of gadgets (42631 according to the paper) at position-invariant addresses, which are enough to perform arbitrary reads and derandomize everything.
KCFI and IBT also (currently) don't cover everything, but don't really matter much here anyway, since we only care about backward-edges, and as for the CFHP:
There are ways to obtain one in the presence of perfect forward-edge CFI with a heap corruption.
Using __x86_indirect_thunk_rdi makes it possible to transform a forward-edge control-flow transfer into a backward-edge one.
Shadow stacks and perfect CFI are a pipe dream that would mitigate RetSpill, but PaX's RAP comes really close to it, likely making exploitation insanely hard, with its type-based CFI, and, for backward edges, its register-stored cookie changing on every syscall/task/… paired with unreadable kernel stacks, on top of CFI.
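To put a number on "quickly bruteforced" for RANDKSTACK, here is a back-of-the-envelope sketch using the 25.44% per-attempt success rate quoted above. It assumes that attempts are independent (a fresh random offset on every syscall) and that a failed attempt costs nothing but a retry, which only holds when panic_on_oops is disabled; the function names are made up for the example.

```python
import math


def expected_attempts(p_success):
    """Mean number of tries until the first success (geometric law)."""
    return 1.0 / p_success


def attempts_for_confidence(p_success, confidence):
    """Smallest n such that 1 - (1 - p_success)**n >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_success))


if __name__ == "__main__":
    p = 0.2544  # per-attempt success rate of the ret-sled, from the notes above
    print(f"expected attempts: {expected_attempts(p):.2f}")
    for target in (0.90, 0.99, 0.999):
        n = attempts_for_confidence(p, target)
        print(f"{target:.1%} chance of success within {n} attempts")
```

Under those assumptions, a handful of retries on average and a couple dozen at most are enough, which is why 5 bits of randomness don't buy much here.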

To showcase how cool all of this is, the paper comes with a semi-automated tool outputting the address of a stack-shifting gadget and a function that performs data spillage, invokes the triggering system call, and yields a root shell via a classic commit_creds(init_cred) + returning back to user space. It works by:

taking full snapshots of a vm to locate the syscall leading to CFHP by using a binary-search-like heuristic;
mutating userland inputs (registers, copy_from_user/get_user parameters, …), continuing the execution of the vm, marking them as user-controllable data if the CFHP still happens after modifications, and doing taint analysis to find how to modify them;
generating a ROP-chain, which isn't that easy, given that:
it's done over discrete controlled regions
there are some constraints, like "eax contains the syscall number", or "edx comes from both Preserved Registers and Calling Convention spillages".
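The first step can be pictured as a plain bisection over the syscalls issued by the proof-of-concept. The sketch below is a toy model, not the paper's tool: it assumes an oracle hijacked_after(n) that restores a snapshot, replays the first n syscalls and reports whether the CFHP has fired, and that this answer is monotonic in n. Both the oracle and the function name are hypothetical.

```python
def first_hijacking_syscall(syscalls, hijacked_after):
    """Return the index of the syscall that triggers the CFHP.

    hijacked_after(n) must return True if the control flow has been
    hijacked once the first n syscalls have been replayed. It is assumed
    to be False for n == 0 and True for n == len(syscalls).
    """
    lo, hi = 0, len(syscalls)   # invariant: not hijacked after lo, hijacked after hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hijacked_after(mid):
            hi = mid
        else:
            lo = mid
    return hi - 1               # the hi-th syscall (index hi - 1) is the trigger


if __name__ == "__main__":
    # Toy trace: the hijack fires during the fifth syscall (index 4).
    trace = ["open", "ioctl", "mmap", "write", "close", "read", "exit"]
    oracle = lambda n: n > 4
    index = first_hijacking_syscall(trace, oracle)
    print(f"CFHP triggered by syscall #{index}: {trace[index]}")
```

Each oracle call stands for a snapshot restore plus a partial replay, so the bisection keeps the number of replays logarithmic in the length of the trace.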

Of course, given that some authors are angr developers, angrop was used to knit the ROP-chains, and the results are pretty impressive:

The abundance of data spillage allows 20 out of 22 proof-of-concept programs that manifest CFHP to be semi-automatically turned into full privilege escalation exploits.

To kill this technique, the authors suggest:

Preserved Registers: RANDKSTACK helps, but storing userspace registers somewhere other than on the stack would be even better, e.g. in task_struct.
Uninitialized Memory: enable STACKLEAK/STRUCTLEAK/CONFIG_INIT_STACK_*, but the performance impact is pretty steep.
Calling Convention and Valid Data: an improved version of RANDKSTACK, adding a random offset at the bottom of each stack frame, between rsp and user data. This technique also mitigates Preserved Registers and Uninitialized Memory, with an average performance overhead of 0.61%.

Like all good papers, it comes with code.

Amusingly:

RetSpill completely bypasses OpenBSD's MAP_STACK mitigation, should it ever be implemented in kernel-land;
The Organizers CTF team used the pt_regs structure to store their ROP-chain for the 0CTF/TCTF 2021 Finals' Kernote pwn challenge.

Artificial truth

Thu 18 January 2024