- Full title: RetSpill: Igniting User-Controlled Data to Burn Away Linux Kernel Protections
- PDF: ACM — mirror — local mirror
- Authors: Kyle "kylebot" Zeng, Ruoyu Wang, Yan Shoshitaishvili, and Adam Doupé from Shellphish, along with Zhenpeng Lin, Kangjie Lu, Xinyu Xing and Tiffany Bao.
The idea of the paper is to use user-controlled data that is copied by design into kernel-land when exercising syscalls to store a ROP-chain, via 4 main avenues:
- Valid Data directly copied onto the kernel stack for performance reasons, like when calling `poll`.
- Preserved Registers, restored upon returning from kernel-land to userland.
- Calling Convention: compliant functions save/restore registers, and apparently system call handlers are calling-convention compliant, even though the kernel entry code already takes care of saving registers and syscalls can only be invoked from userland. But even if the syscall handlers weren't compliant, registers would still contain userland values when they're called, and sub-functions might save/restore those registers, since those do need to be compliant.
- Uninitialized Memory, since the per-thread kernel stack is reused between syscalls, and not erased (unless `PAX_MEMORY_STACKLEAK` is used).
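
To make the Preserved Registers avenue more concrete, here is a minimal Python sketch of the x86-64 `struct pt_regs` layout that the syscall entry code saves at the top of the kernel stack. The field order is assumed to match the upstream kernel header, and the split between "free" and other registers is a simplification of mine: which argument registers are actually usable depends on the syscall being invoked.

```python
import ctypes

# x86-64 `struct pt_regs`, lowest address first (order assumed from
# arch/x86/include/asm/ptrace.h; double-check against your kernel).
PT_REGS_FIELDS = [
    "r15", "r14", "r13", "r12", "bp", "bx",      # callee-saved
    "r11", "r10", "r9", "r8", "ax", "cx", "dx", "si", "di",
    "orig_ax",                                   # syscall number
    "ip", "cs", "flags", "sp", "ss",             # return frame
]


class PtRegs(ctypes.Structure):
    _fields_ = [(name, ctypes.c_uint64) for name in PT_REGS_FIELDS]


# Registers that mean nothing to the `syscall` instruction itself: not
# arguments, not clobbered (rcx/r11), not the syscall number (rax).
FREE_REGS = ["r15", "r14", "r13", "r12", "bp", "bx"]


def slot_offsets(names):
    """Map each register name to its byte offset inside pt_regs."""
    return {name: getattr(PtRegs, name).offset for name in names}


if __name__ == "__main__":
    for name, off in slot_offsets(FREE_REGS).items():
        print(f"{name:>4} -> pt_regs+{off:#04x}")
```

Under these assumptions, the six callee-saved registers form a contiguous block of attacker-chosen qwords at the start of `pt_regs`, before any syscall argument is even considered.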
Then, only a KASLR leak, a CFHP (control-flow hijacking primitive), and an `add rsp, X; ret`-like gadget are required to ROP all the things.
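
The role of the stack-shifting gadget boils down to a bit of arithmetic: after `add rsp, X; ret`, the `ret` pops its target from the old `rsp + X`, so `X` has to land inside a region holding spilled data. The sketch below picks suitable gadgets and rebases them with a leaked kernel base. The gadget offsets, the region and the base address are all hypothetical values chosen for illustration, not numbers from the paper.

```python
from typing import NamedTuple


class Gadget(NamedTuple):
    offset: int   # offset from the kernel text base (made-up values below)
    shift: int    # X in `add rsp, X; ret`


class Region(NamedTuple):
    start: int    # distance in bytes from rsp at hijack time
    size: int     # number of attacker-controlled bytes


def usable_pivots(gadgets, regions, min_slots=1):
    """Return (gadget, region, slots) for gadgets landing in controlled data.

    `slots` is the number of 8-byte chain entries available from the
    landing point to the end of the region.
    """
    hits = []
    for g in gadgets:
        if g.shift % 8:                # keep the stack qword-aligned
            continue
        for r in regions:
            if r.start <= g.shift < r.start + r.size:
                slots = (r.start + r.size - g.shift) // 8
                if slots >= min_slots:
                    hits.append((g, r, slots))
    return hits


def rebase(kernel_base, gadget):
    """Turn a gadget offset into a runtime address using the KASLR leak."""
    return kernel_base + gadget.offset


if __name__ == "__main__":
    gadgets = [Gadget(0x1234, 0x40), Gadget(0x5678, 0x150), Gadget(0x9ABC, 0x158)]
    regions = [Region(start=0x148, size=0x30)]
    for g, r, slots in usable_pivots(gadgets, regions, min_slots=4):
        print(hex(rebase(0xFFFFFFFF81000000, g)), f"shift={g.shift:#x}", f"slots={slots}")
```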
Nowadays, most™ CFHP are created by corrupting the heap to hijack function
pointers, and since every kernel thread shares the same heap,
once it is properly shaped, the control-flow hijacking primitive can likely
be triggered again and again from different threads.
Moreover, changing the exploit is simply a matter of re-invoking a syscall with
a different data spill, instead of having to reshape the heap every single time.
One doesn't have to worry about crashes (enabling lame bruteforcing), since no
major Linux distribution (except CentOS, kudos) has `panic_on_oops` enabled,
so having a ROP-chain crash is no big deal: the CFHP is still on the
heap, one syscall away.
Since the space afforded to store gadgets might be too small, one trick is to
invoke `do_task_dead` at the end of every ROP-chain to terminate it gracefully,
and trigger the CFHP again and again.
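
A minimal sketch of that trick, assuming each stage is a self-contained chain (no register state survives between triggers) and using placeholder values instead of real addresses: every stage gets the address of `do_task_dead` appended, is checked against the available space, and is packed as little-endian qwords ready to be spilled. The function name and parameters are mine, not the paper's.

```python
import struct


def build_stages(stages, do_task_dead, capacity_slots):
    """Terminate each stage with do_task_dead and pack it as qwords.

    stages: list of lists of 64-bit values (gadget addresses / immediates).
    do_task_dead: runtime address of do_task_dead (placeholder here).
    capacity_slots: number of 8-byte slots the spilled region offers.
    """
    payloads = []
    for i, stage in enumerate(stages):
        chain = list(stage) + [do_task_dead]
        if len(chain) > capacity_slots:
            raise ValueError(
                f"stage {i} needs {len(chain)} slots, only {capacity_slots} available"
            )
        payloads.append(struct.pack(f"<{len(chain)}Q", *chain))
    return payloads
```

Since `do_task_dead` never returns to the caller, each payload would presumably be spilled and triggered from a throwaway thread.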
Mitigation-wise:
- SMEP, SMAP and KPTI are irrelevant.
- RANDKSTACK mitigates data spillage from Preserved Registers and Uninitialized Memory, but since it only provides 5 bits of randomness, a `ret`-sled is enough to bypass it (25.44% of the time if using gadgets from Preserved Registers or Uninitialized Memory, 100% otherwise), and in the absence of `panic_on_oops` it can quickly be bruteforced anyway.
- STACKLEAK, STRUCTLEAK, and CONFIG_INIT_STACK_* only mitigate data spillage from Uninitialized Memory.
- FG-KASLR is useless since it doesn't randomize everything, leaving a couple (42631 according to the paper) of gadgets at position-invariant positions, which are enough to perform arbitrary reads and derandomize everything.
- KCFI and IBT also (currently) don't cover everything, but don't really matter much here anyway, since we only care about backward edges, and as for the CFHP:
  - There are ways to obtain one in the presence of perfect forward-edge CFI with a heap corruption.
  - Using `__x86_indirect_thunk_rdi` makes it possible to transform a forward-edge control-flow transition into a backward-edge one.
- Shadow stack and perfect CFI are a pipe dream that would mitigate RetSpill, but PaX's RAP comes really close, likely making exploitation insanely hard: type-based CFI for forward edges and, for backward edges, a register-stored cookie that changes on every syscall/task/…, paired with unreadable kernel stacks.
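
The bruteforce remark about RANDKSTACK is plain geometric-distribution arithmetic. The sketch below assumes independent attempts with a fixed per-attempt success probability, and that a failed attempt only costs an oops. The 25.44% figure is the one quoted above, while the 1/32 case is my own naive model of guessing 5 bits of randomness without a `ret`-sled.

```python
import math


def expected_attempts(p):
    """Mean number of tries until the first success (geometric law)."""
    return 1.0 / p


def attempts_for_confidence(p, confidence=0.99):
    """Smallest n such that P(at least one success in n tries) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    cases = [("ret-sled, 25.44%", 0.2544), ("blind guess, 1/32", 1 / 32)]
    for label, p in cases:
        print(label, round(expected_attempts(p), 2), attempts_for_confidence(p))
```

Either way, the number of triggers needed stays small as long as a failed attempt doesn't panic the machine.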
To showcase how cool all of this is, the paper comes with a semi-automated tool
that outputs the address of a stack-shifting gadget and a function that performs
the data spillage, invokes the triggering system call, and yields a root shell via a
classic `commit_creds(init_cred)` + return to user space. It works by:
- taking full snapshots of a VM to locate the syscall leading to the CFHP by using a binary-search-like heuristic;
- mutating userland inputs (registers, `copy_from_user`/`get_user` parameters, …), continuing the execution of the VM, marking them as user-controllable data if the CFHP still happens after modifications, and doing taint analysis to find how to modify them;
- generating a ROP-chain, which isn't that easy, given that:
  - it's done over discrete controlled regions;
  - there are some constraints, like "`eax` contains the syscall number", or "`edx` comes from both Saved Registers and Calling Convention spillages".
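
The mutation step can be approximated in a few lines: seed each userland input with a unique marker, grab the kernel stack at the CFHP, and look for the markers. This is only a rough stand-in for what the tool does (it works on VM snapshots and adds taint analysis on top); the marker format, the function names and the idea of scanning a raw stack dump are all made up for illustration.

```python
import struct


def make_markers(names, tag=0x5350494C):  # tag is an arbitrary made-up constant
    """Give every input a unique, recognizable 64-bit value."""
    return {name: (tag << 32) | idx for idx, name in enumerate(names)}


def find_spills(stack, markers, rsp_offset=0):
    """Return {input name: [offsets from rsp]} for markers found in `stack`.

    stack: raw bytes of a kernel stack dump, starting at rsp - rsp_offset
           and assumed to be 8-byte aligned.
    """
    found = {}
    for name, value in markers.items():
        needle = struct.pack("<Q", value)
        pos = stack.find(needle)
        while pos != -1:
            if pos % 8 == 0:             # only qword-aligned slots are useful
                found.setdefault(name, []).append(pos - rsp_offset)
            pos = stack.find(needle, pos + 1)
    return found
```

The resulting offsets are the "discrete controlled regions" a chain generator then has to work with.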
Of course, given that some authors are angr developers, angrop was used to knit the ROP-chains, and the results are pretty impressive:
The abundance of data spillage allows 20 out of 22 proof-of-concept programs that manifest CFHP to be semi-automatically turned into full privilege escalation exploits.
To kill this technique, the authors suggest:
- Preserved Registers: `RANDKSTACK` helps, but storing userspace registers somewhere other than on the stack would be even better, e.g. in `task_struct`.
- Uninitialized Memory: enable `STACKLEAK`/`STRUCTLEAK`/`CONFIG_INIT_STACK_*`, but the performance impact is pretty steep.
- Calling Convention and Valid Data: an improved version of `RANDKSTACK`, adding a random offset at the bottom of each stack frame, between `rsp` and user data. This technique also mitigates Preserved Registers and Uninitialized Memory, with an average performance overhead of 0.61%.
Like all good papers, it comes with code.
Amusingly:
- RetSpill completely bypasses OpenBSD's MAP_STACK mitigation, should it ever be implemented in kernel-land,
- The Organizers CTF team used the `pt_regs` structure to store their ROP chain for the 0CTF/TCTF 2021 Finals' Kernote pwn challenge.