Using Corellium Kernel Hooks to Disable Exploit Mitigations
In this technical article, we'll take a look at a vulnerability in XNU, the kernel used by iOS and macOS, and explore how Corellium kernel hooks can help to disable exploit mitigations.
In October 2020, Ian Beer of Google Project Zero disclosed a vulnerability in XNU (CVE-2020-27932) that had been exploited in the wild. It was used as the privilege escalation component of an exploit chain, alongside a kernel memory disclosure (CVE-2020-27950) and a Safari RCE (CVE-2020-27930).
We won't go into the details of the vulnerability here, as the proof-of-concept and a root-cause analysis are available directly from Google Project Zero. Instead, we'll take the proof-of-concept and attempt to run it to see what happens.
This produces an ad-hoc signed command-line binary that we can run on a Corellium virtual device. We'll use an iPhone 7 running iOS 14.1 (18A395), the version right before the vulnerability was patched in iOS 14.2.
After creating the virtual device, we can upload the turnstiles binary, for example into /tmp/ and run it:
In iOS 13.0, Apple introduced the zone_require mitigation. It is intended to defeat a common iOS kernel exploitation technique, the zone transfer, which was often used to turn a use-after-free bug into a type confusion and, from there, into another primitive such as arbitrary read/write. XNU uses the zone allocator to slice up a memory page into elements of a specific type. For example, socket objects are allocated in the socket zone, and Mach ports are allocated in the ipc.ports zone. We can see a list of zones by running zprint:
default.kalloc.16     16   736K   736K   47104   47104   45673   16K   1024   C
default.kalloc.32     32   400K   400K   12800   12800   12433   16K    512   C
default.kalloc.48     48   400K   400K    8533    8533    6641   16K    341   C
default.kalloc.64     64   416K   416K    6656    6656    6568   16K    256   C
default.kalloc.80     80   320K   368K    4096    4710    2617   16K    204   C
default.kalloc.96     96   240K   240K    2560    2560    2469   16K    170   C
default.kalloc.128   128   320K   320K    2560    2560    2425   16K    128   C
Note the existence of the kalloc zones. kalloc builds on top of the zone allocator for objects that do not have a dedicated zone. These objects are allocated by size, and placed into the smallest bin available for the requested size. An exploit developer can use these to control the sizes of their allocations, allowing the heap to be "groomed" and filled with arbitrary attacker-controlled data. In order to control the data being type-confused, the exploit developer typically wants to transfer the page containing the target object from a type-specific zone to a kalloc zone. Generally, that requires controlling every allocation of a page, where one of the allocations is the target object.
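The "smallest bin available" rule can be illustrated with a short Python sketch. This is a toy model, not XNU code: the bin list contains only the sizes visible in the zprint excerpt, and kalloc_zone_for is a hypothetical helper name. A real kernel has more bins and may apply extra rules that are not modelled here.

```python
# Toy model of size-class selection; not XNU code.
# Only the bin sizes shown in the zprint excerpt are listed.
KALLOC_BINS = [16, 32, 48, 64, 80, 96, 128]


def kalloc_zone_for(size):
    """Return the name of the smallest listed bin that fits `size` bytes."""
    for bin_size in KALLOC_BINS:
        if size <= bin_size:
            return f"default.kalloc.{bin_size}"
    raise ValueError(f"{size} bytes exceeds the bins modelled here")


# Requests of different sizes land in different zones.
for requested in (8, 16, 33, 72, 100):
    print(requested, "->", kalloc_zone_for(requested))
```

In this model a 33-byte request lands in default.kalloc.48, which is why choosing the allocation size lets an exploit developer choose the zone that their data ends up in.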
For example, suppose there's a use-after-free of a socket object. The attacker will want to perform a zone transfer so that the dangling pointer's data is entirely attacker-controlled. To do this, a standard flow might be:
1. Allocate ("spray") a large number of socket objects. This ensures that any holes in the pages already in the socket zone are filled in, and then starts allocating one or more fresh pages containing objects whose creation was initiated by the attacker (and therefore the attacker can free them at any time).
2. Trigger the "free" part of the use-after-free bug. This stage will depend on the specifics of the bug in question, but the end result is that the target object is freed, but can still be accessed through a dangling pointer.
3. Free the sprayed objects, in the hope that the page containing the target object will no longer contain any allocations. At this point, the page is empty but still considered part of the socket zone.
4. Cause a garbage collection by creating memory pressure, such as allocating and then freeing a large amount of memory in userspace. This will mark the page that formerly contained the target object as free, allowing another zone to claim the page.
5. Attempt to reallocate the target object as a different type, such as entirely attacker-controlled data via kalloc.
6. Trigger the "use" part of the use-after-free bug. This will perform some action on the target object, which has had its data changed. For example, it may call a function pointer that is now attacker-controlled.
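The six steps above can be walked through with a toy Python simulation. Everything in it is an assumption made for illustration: Page, ToyAllocator and zone_transfer_demo are hypothetical names, a page holds four elements, and "objects" are plain strings. It models only page ownership, not real kernel memory management, and step 1 is simplified to filling a single fresh page.

```python
# Toy model of a zone transfer; nothing here resembles real kernel memory.
SLOTS_PER_PAGE = 4


class Page:
    def __init__(self, zone):
        self.zone = zone                      # zone that currently owns the page
        self.slots = [None] * SLOTS_PER_PAGE  # element contents, None = free


class ToyAllocator:
    def __init__(self):
        self.pages = []

    def alloc(self, zone, data):
        """Store data in a page owned by zone and return (page, slot index)."""
        for page in self.pages:
            if page.zone == zone and None in page.slots:
                idx = page.slots.index(None)
                page.slots[idx] = data
                return page, idx
        # No room in the zone: claim an unowned page, or create a new one.
        for page in self.pages:
            if page.zone is None:
                page.zone = zone
                page.slots[0] = data
                return page, 0
        page = Page(zone)
        self.pages.append(page)
        page.slots[0] = data
        return page, 0

    def free(self, page, idx):
        page.slots[idx] = None

    def garbage_collect(self):
        """Release ownership of every page that holds no live elements."""
        for page in self.pages:
            if all(slot is None for slot in page.slots):
                page.zone = None


def zone_transfer_demo():
    heap = ToyAllocator()
    # Step 1: spray socket objects until a whole page is attacker-created.
    sprayed = [heap.alloc("socket", f"socket{i}") for i in range(SLOTS_PER_PAGE)]
    target_page, target_idx = sprayed[0]  # the bug keeps this dangling pointer
    # Steps 2 and 3: free the target, then the rest of the spray.
    for page, idx in sprayed:
        heap.free(page, idx)
    # Step 4: garbage collection releases the now-empty page.
    heap.garbage_collect()
    # Step 5: a kalloc zone claims the page and fills it with attacker data.
    heap.alloc("default.kalloc.64", "attacker data")
    # Step 6: the dangling pointer now reads data from a different zone.
    return target_page.zone, target_page.slots[target_idx]


print(zone_transfer_demo())  # ('default.kalloc.64', 'attacker data')
```

The dangling pointer still refers to the same page and slot, but the page has changed owner and the slot now holds whatever the attacker chose to allocate.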
The purpose of zone_require is to prevent this entire technique from working. When the dangling socket pointer is referenced after having its contents replaced, a zone check will occur to validate that the page is still owned by the correct zone. Here's an example usage where an object kmsg is checked to ensure that its allocation is inside the correct zone:
In the case of CVE-2020-27932, we see the panic message: "zone_require failed: address in unexpected zone id 107 (host_notify) (addr: 0xffffffe19c7a54d0, expected: ipc ports)". Helpfully, Ian Beer's write-up mentions that "there are presumably some more tricks to get around that". The vulnerability exists as far back as iOS 12.0, so we could simply go back in time to before zone_require was introduced in order to experiment with this vulnerability, but Corellium offers a better way by using Kernel Hooks to disable the mitigation altogether.
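To make the check concrete, here is a toy Python model of a zone ownership test that fails with a message shaped like the panic above. It is a sketch under stated assumptions, not the XNU implementation: the page_owner dictionary stands in for the allocator's real metadata, the addresses and the 16K page size are made up, and ZonePanic is a hypothetical exception that stands in for a kernel panic.

```python
# Toy model of a zone ownership check; not the XNU implementation.
PAGE_SIZE = 0x4000  # assumed page size for this sketch


class ZonePanic(Exception):
    """Stands in for a kernel panic."""


# Hypothetical metadata: page base address -> name of the owning zone.
page_owner = {
    0x10000: "ipc ports",
    0x14000: "host_notify",
}


def zone_require(expected_zone, addr):
    """Raise ZonePanic unless addr lies in a page owned by expected_zone."""
    page_base = addr & ~(PAGE_SIZE - 1)
    actual = page_owner.get(page_base)
    if actual != expected_zone:
        raise ZonePanic(
            f"zone_require failed: address in unexpected zone ({actual}) "
            f"(addr: {addr:#x}, expected: {expected_zone})"
        )


zone_require("ipc ports", 0x10040)      # page owned by the right zone: passes
try:
    zone_require("ipc ports", 0x144d0)  # page owned by host_notify: "panics"
except ZonePanic as panic:
    print(panic)
```

The check looks only at which zone owns the page, not at the object's contents, which is why swapping the page to another zone is caught even when the replacement data looks like a valid object.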
Introduction to Kernel Hooks
Corellium Kernel Hooks allow us to introspect and modify the kernel at runtime, similar to using a Python script attached to a breakpoint in lldb. Kernel Hooks, however, have some significant advantages:
- Able to be set/modified from within the Corellium web interface without connecting the kernel debugger, including executing on every boot.
- Hooks execute without locking, allowing race conditions to be investigated (which a traditional debugger might prevent from triggering by pausing all cores whenever a breakpoint triggers).
- Hooks are written in a C-like language.
At its most basic, a hook can print to the console when a certain instruction is reached, for example by placing a hook at the first instruction of a function at some address (for a made-up example, 0xfffffff007738eb0):
This will log to the console in purple text, showing when the function is called, and printing the value of the X0 register.
To disable zone_require, we'll need to locate the function that enforces the check and causes a kernel panic. We can do this by opening the kernelcache in Binary Ninja, locating the string used in the panic message ("zone_require failed"), and then following the cross-references to the relevant function.
We can disable this mitigation entirely by simply returning from this function as soon as it is entered. This is done in the hooks language by setting the PC register to the contents of the LR register (also known as X30):
cpu.pc = cpu.x[30];
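Why this amounts to a clean return can be seen in a toy Python model of the registers involved. On AArch64, a branch-with-link instruction stores the return address in X30 before jumping, and at a function's first instruction nothing has yet modified X30 or the stack. The model below is only a sketch: ToyCpu, call and early_return_hook are hypothetical names, the caller address is made up, and it assumes the hook runs before the first instruction of the hooked function executes.

```python
# Toy register file; models only PC and X0..X30.
class ToyCpu:
    def __init__(self):
        self.pc = 0
        self.x = [0] * 31


def call(cpu, target, return_addr):
    """Model a branch-with-link: X30 (LR) receives the return address."""
    cpu.x[30] = return_addr
    cpu.pc = target


def early_return_hook(cpu):
    """Same effect as the hook body: jump straight back to the caller."""
    cpu.pc = cpu.x[30]


HOOKED_FUNCTION = 0xfffffff007768fb8  # hook address used in this article
CALLER_NEXT = 0xfffffff007700004      # made-up address of the instruction after the call

cpu = ToyCpu()
call(cpu, HOOKED_FUNCTION, CALLER_NEXT)
early_return_hook(cpu)  # assumed to fire before the function body runs
print(hex(cpu.pc))      # 0xfffffff007700004
```

Execution resumes in the caller as if the function had returned normally, so the check and its panic are never reached.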
On the Kernel Hooks tab, add a new hook and input the correct address for the beginning of the function (fffffff007768fb8, note that 0x should not be entered) and the contents of our hook, then click Create hook. Leave the patch type as csmfcc to use the C-like hooks language.
If we run the proof-of-concept again with the hook in place, we'll see the "zone_require called" message in purple, and then a different panic message:
From here we can continue to explore this vulnerability as if the mitigation didn't exist and begin implementing an exploit for it, and then deal with the mitigation later.