Title: Some notes on "Randomized slab caches for kmalloc()"
Date: 2023-09-11 01:45

Ruiqi Gong and Xiu Jianfeng got their
[Randomized slab caches for kmalloc()](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c6152940584290668b35fa0800026f6a1ae05fe)
patch series merged upstream, and I've had enough discussions about it to
warrant summarising them into a small blogpost.

The main idea is to have multiple slab caches, and pick one at random based on
the address of code calling `kmalloc()` and a per-boot seed, to make heap-spraying harder.
It's a great idea, but comes with some shortcomings for now:

- Objects being allocated via wrappers around `kmalloc()`, like `sock_kmalloc`,
  `f2fs_kmalloc`, `aligned_kmalloc`, … will end up in the same slab cache. A
  possible improvement would be to mix the callsite address and the parent
  caller's address.
- The slabs needs to be pinned, otherwise an attacker could [feng-shui](https://en.wikipedia.org/wiki/Heap_feng_shui) their way
  into having the whole slab free'ed, garbage-collected, and have a slab for
  another type allocated at the same VA. [Jann Horn](https://thejh.net/) and [Matteo Rizzo](https://infosec.exchange/@nspace) have a [nice
  set of patches](https://github.com/torvalds/linux/compare/master...thejh:linux:slub-virtual-upstream),
  discussed a bit in [this Project Zero blogpost](https://googleprojectzero.blogspot.com/2021/10/how-simple-linux-kernel-memory.html),
  for a feature called [`SLAB_VIRTUAL`]( https://github.com/torvalds/linux/commit/f3afd3a2152353be355b90f5fd4367adbf6a955e),
  implementing precisely this.
- There are 16 slabs by default, so one chance out of 16 to end up in the same
  slab cache as the target.
- There are no guard pages between caches, so inter-caches overflows are
  possible.
- As pointed by [andreyknvl](https://twitter.com/andreyknvl/status/1700267669336080678)
  and [minipli](https://infosec.exchange/@minipli/111045336853055793),
  the fewer allocations hitting a given cache means less noise,
  so it might even help with some heap feng-shui.
- minipli also pointed that "randomized caches still freely
  mix kernel allocations with user controlled ones (`xattr`, `keyctl`, `msg_msg`, …).
  So even though merging is disabled for these caches, i.e. no direct overlap
  with `cred_jar` etc., other object types can still be targeted (`struct
  pipe_buffer`, BPF maps, its verifier state objects,…). It’s just a matter of
  probing which allocation index the targeted object falls into.",
  but I considered this out of scope, since it's much more involved;
  albeit something like Jann Horn's [`CONFIG_KMALLOC_SPLIT_VARSIZE`](https://github.com/thejh/linux/blob/slub-virtual/MITIGATION_README)
  wouldn't significantly increase complexity.

Also, while code addresses as a source of entropy has historically be a great
way to provide [KASLR](https://lwn.net/Articles/569635/) bypasses, `hash_64(caller ^
random_kmalloc_seed, ilog2(RANDOM_KMALLOC_CACHES_NR + 1))` shouldn't trivially
leak offsets.

The segregation technique is a bit like a weaker version of grsecurity's
[AUTOSLAB](https://grsecurity.net/how_autoslab_changes_the_memory_unsafety_game),
or a weaker kernel-land version of
[PartitionAlloc](https://chromium.googlesource.com/chromium/src/+/master/base/allocator/partition_allocator/PartitionAlloc.md),
but to be fair, making use-after-free exploitation harder, and significantly
harder once pinning lands, with only ~150 lines of code and negligible
performance impact is amazing and should be praised. Moreover, I wouldn't be
surprised if this was backported in [Google's KernelCTF](https://google.github.io/security-research/kernelctf/rules.html)
soon, so we should see if my analysis is correct.
