Using Corellium Kernel Hooks to Disable Exploit Mitigations

In this technical article, we'll take a look at a vulnerability in XNU, the kernel used by iOS and macOS, and explore how Corellium kernel hooks can help to disable exploit mitigations.
In October 2020, Ian Beer of Google Project Zero disclosed a vulnerability in XNU, the kernel used by iOS and macOS, that had been exploited in the wild. The vulnerability, CVE-2020-27932, was used as the privilege escalation component of an exploit chain, alongside a kernel memory disclosure (CVE-2020-27950) and a Safari RCE (CVE-2020-27930).

We won't go into the details of the vulnerability here, as the proof-of-concept and a root-cause analysis are available directly from Google Project Zero. Instead, we'll take the proof-of-concept and attempt to run it to see what happens.

First we build the proof-of-concept:

$ xcrun -sdk iphoneos cc -arch arm64 -o turnstiles host_not.c -Wall -O3 -framework CoreFoundation

$ codesign -s - turnstiles

This produces an ad-hoc signed command-line binary that we can run on a Corellium virtual device. We'll use an iPhone 7 running iOS 14.1 (18A395), the version right before the vulnerability was patched in iOS 14.2.

After creating the virtual device, we can upload the turnstiles binary (for example, into /tmp/) and run it:

[Image: zone_require_panic]

zone_require

In iOS 13.0, Apple introduced the zone_require mitigation. It is intended to defeat a common iOS kernel exploitation technique, the zone transfer, which was commonly used to turn a use-after-free bug into a type confusion and from there build another primitive, such as arbitrary read/write. XNU uses the zone allocator to slice memory pages into elements of a specific type: for example, socket objects are allocated in the socket zone, and Mach ports are allocated in the ipc.ports zone. We can see a list of zones by running zprint:

                          elem       cur       max       cur       max       cur  alloc  alloc
zone name                 size      size      size     #elts     #elts     inuse   size  count
-------------------------------------------------------------------------------------------------
vm.permanent                 1       32K       32K     32768     32768     28590    16K  16384
vm.permanent.percpu          2       32K       32K     16384     16384      7592    32K  16384
ipc.ports                  168     2304K     2304K     14043     14043     13072    16K     97 C
ipc.port.sets               96       32K       32K       341       341       240    16K    170 C
ipc.vouchers                80       48K       64K       614       819        86    16K    204 C
tasks                     1576      320K      320K       207       207       194    16K     10 C
proc                      1072      208K      208K       198       198       192    16K     15 C
VM.map.copies               80       64K       64K       819       819       653    16K    204 C
pmap                       232       48K       48K       211       211       195    16K     70 C
vm.objects                 256     3456K     3456K     13824     13824     13572    16K     64 C
maps                       280       64K       64K       234       234       207    16K     58
VM.map.entries              80     1872K     1872K     23961     23961     23593    16K    204 C
Reserved.VM.map.entries     80      160K      160K      2048      2048       260    16K    204
VM.map.holes                32      160K      160K      5120      5120      1381    16K    512 C
vm.pages                    64      160K      160K      2560      2560      2141    16K    256 XC
default.kalloc.16           16      736K      736K     47104     47104     45673    16K   1024 C
default.kalloc.32           32      400K      400K     12800     12800     12433    16K    512 C
default.kalloc.48           48      400K      400K      8533      8533      6641    16K    341 C
default.kalloc.64           64      416K      416K      6656      6656      6568    16K    256 C
default.kalloc.80           80      320K      368K      4096      4710      2617    16K    204 C
default.kalloc.96           96      240K      240K      2560      2560      2469    16K    170 C
default.kalloc.128         128      320K      320K      2560      2560      2425    16K    128 C

[snip]

Note the existence of the kalloc zones. kalloc builds on top of the zone allocator to serve objects that do not have a dedicated zone. These objects are allocated by size and placed into the smallest bin that fits the requested size. An exploit developer can use these zones to control the sizes of their allocations, allowing the heap to be "groomed" and filled with arbitrary attacker-controlled data. To control the data being type-confused, the exploit developer typically wants to transfer the page containing the target object from a type-specific zone to a kalloc zone. Generally, that requires controlling every allocation on the page that holds the target object.
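The size-to-bin rule described above can be sketched in a few lines of Python. This is only an illustration: the bin list is copied from the zprint excerpt in this article (which is cut off after 128 bytes), and the function name kalloc_bin is made up for the example.

```python
# Bin sizes taken only from the zprint excerpt in this article; the real list
# continues past 128 bytes, so this table is deliberately incomplete.
KALLOC_BINS = [16, 32, 48, 64, 80, 96, 128]


def kalloc_bin(size):
    """Return the name of the smallest listed kalloc zone that fits `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    for bin_size in KALLOC_BINS:
        if size <= bin_size:
            return f"default.kalloc.{bin_size}"
    raise ValueError("size is larger than any bin in this excerpt")


for request in (1, 16, 17, 72, 100):
    print(request, "->", kalloc_bin(request))
```

Because the bin is chosen purely by size, an attacker who can pick the length of an allocation can also pick which kalloc zone it lands in.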

For example, suppose there's a use-after-free of a socket object. The attacker will want to perform a zone transfer so that the dangling pointer's data is entirely attacker-controlled. To do this, a standard flow might be:

1. Allocate ("spray") a large number of socket objects. This fills any holes in the pages already in the socket zone, after which the allocator starts handing out one or more fresh pages whose objects were all created by the attacker (and can therefore be freed by the attacker at any time).
2. Trigger the "free" part of the use-after-free bug. This stage will depend on the specifics of the bug in question, but the end result is that the target object is freed, but can still be accessed through a dangling pointer.
3. Free the sprayed objects, in the hope that the page containing the target object will no longer contain any allocations. At this point, the page is empty but still considered part of the socket zone.
4. Cause a garbage collection by creating memory pressure, such as allocating and then freeing a large amount of memory in userspace. This will mark the page that formerly contained the target object as free, allowing another zone to claim the page.
5. Attempt to reallocate the target object as a different type, such as entirely attacker-controlled data via kalloc.
6. Trigger the "use" part of the use-after-free bug. This will perform some action on the target object, which has had its data changed. For example, it may call a function pointer that is now attacker-controlled.
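The six steps above can be walked through with a toy model. The sketch below is not XNU's allocator: the ToyHeap class, the four-element page size, and the zone names used for the replacement data are assumptions chosen to keep the example small. It only shows how a page can change owners while a dangling pointer still refers to it.

```python
PAGE_ELEMS = 4  # elements per page in this toy model (hypothetical value)


class ToyHeap:
    """Toy model: each page is owned by at most one zone at a time."""

    def __init__(self):
        # Each page is {"zone": name or None, "slots": [data or None, ...]}.
        self.pages = []

    def alloc(self, zone, data):
        # Prefer a free slot in a page the zone already owns.
        for page_no, page in enumerate(self.pages):
            if page["zone"] == zone and None in page["slots"]:
                slot = page["slots"].index(None)
                page["slots"][slot] = data
                return (page_no, slot)
        # Otherwise claim an unowned page, or grow the heap by one page.
        for page_no, page in enumerate(self.pages):
            if page["zone"] is None:
                page["zone"] = zone
                page["slots"][0] = data
                return (page_no, 0)
        self.pages.append({"zone": zone, "slots": [data] + [None] * (PAGE_ELEMS - 1)})
        return (len(self.pages) - 1, 0)

    def free(self, ptr):
        page_no, slot = ptr
        self.pages[page_no]["slots"][slot] = None

    def gc(self):
        # Release ownership of pages that hold no live allocations.
        for page in self.pages:
            if all(slot is None for slot in page["slots"]):
                page["zone"] = None

    def read(self, ptr):
        page_no, slot = ptr
        return self.pages[page_no]["zone"], self.pages[page_no]["slots"][slot]


heap = ToyHeap()
# Pre-existing state: a socket page with one hole left in it.
system_sockets = [heap.alloc("socket", "system socket") for _ in range(3)]

# Step 1: spray. The first object fills the hole, the rest fill a fresh page.
spray = [heap.alloc("socket", "sprayed socket") for _ in range(5)]
dangling = spray[2]  # the target object lives on the fresh page

# Step 2: the bug frees the target but leaves `dangling` behind.
heap.free(dangling)
# Step 3: free the rest of the spray so the fresh page becomes empty.
for ptr in spray:
    if ptr != dangling:
        heap.free(ptr)
# Step 4: garbage collection releases the empty page from the socket zone.
heap.gc()
# Step 5: reallocate the page with attacker-controlled kalloc data.
for _ in range(PAGE_ELEMS):
    heap.alloc("default.kalloc.64", "attacker data")
# Step 6: the "use" now sees attacker data in a page owned by another zone.
owner, contents = heap.read(dangling)
print(owner, contents)
```

In the model, the dangling pointer ends up referring to attacker data inside a page that now belongs to a kalloc zone, which is exactly the condition a page-ownership check can detect.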

The purpose of zone_require is to prevent this entire technique from working. When the dangling socket pointer is used after its contents have been replaced, a zone check validates that the page is still owned by the expected zone. Here's an example usage where an object kmsg is checked to ensure that its allocation is inside the correct zone:

[Image: zone_require_example]
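To make the idea of the check concrete, here is a minimal Python model of a page-ownership test. It is a sketch under stated assumptions, not XNU's implementation: the page_owner table, the addresses, the argument order, and the KernelPanic exception are all invented for illustration, and the message only loosely imitates the panic text quoted in this article.

```python
class KernelPanic(Exception):
    """Stands in for a kernel panic in this toy model."""


PAGE_SHIFT = 14  # 16 KiB pages (an assumption for this model)

# Hypothetical ownership table: page number -> name of the owning zone.
page_owner = {0x10: "ipc.ports", 0x11: "host_notify"}


def zone_require(expected_zone, addr):
    """Panic unless the page containing `addr` belongs to `expected_zone`."""
    actual = page_owner.get(addr >> PAGE_SHIFT)
    if actual != expected_zone:
        raise KernelPanic(
            f"zone_require failed: address in unexpected zone ({actual}) "
            f"(addr: {addr:#x}, expected: {expected_zone})"
        )


# A genuine port allocation passes the check silently.
real_port = (0x10 << PAGE_SHIFT) + 0xA8
zone_require("ipc.ports", real_port)

# A type-confused pointer into another zone's page trips the check.
confused_ptr = (0x11 << PAGE_SHIFT) + 0x4D0
try:
    zone_require("ipc.ports", confused_ptr)
    message = None
except KernelPanic as exc:
    message = str(exc)
print(message)
```

The check does not care what the memory contains, only which zone owns the page, so replacing an object's contents through a zone transfer is caught regardless of how convincing the fake object is.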

In the case of CVE-2020-27932, we see the panic message: "zone_require failed: address in unexpected zone id 107 (host_notify) (addr: 0xffffffe19c7a54d0, expected: ipc ports)". Helpfully, Ian Beer's write-up mentions that "there are presumably some more tricks to get around that". The vulnerability exists as far back as iOS 12.0, so we could simply go back in time to before zone_require was introduced in order to experiment with this vulnerability, but Corellium offers a better way by using Kernel Hooks to disable the mitigation altogether.

Introduction to Kernel Hooks

Corellium Kernel Hooks allow us to introspect and modify the kernel at runtime, similar to using a Python script attached to a breakpoint in lldb. Kernel Hooks, however, have some significant advantages:

- Hooks can be set or modified from within the Corellium web interface without connecting the kernel debugger, and can be configured to execute on every boot.
- Hooks execute without locking, allowing race conditions to be investigated (a traditional debugger might prevent them from triggering by pausing all cores whenever a breakpoint is hit).
- Hooks are written in a C-like language.

At its most basic, a hook can print to the console when a certain instruction is reached. For example, we can place a hook at the first instruction of a function at some address (for a made-up example, 0xfffffff007738eb0):

print_int("Reached hooked function, x0=", cpu.x[0]);

This will log to the console in purple text, showing when the function is called, and printing the value of the X0 register.

To disable zone_require, we'll need to locate the function that enforces the check and causes the kernel panic. We can do this by opening the kernelcache in Binary Ninja, locating the string used in the panic message ("zone_require failed"), and then following the cross-references to the relevant function.
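As a quick first pass before opening a disassembler, the panic string can also be located with a short script. The sketch below assumes an already decompressed kernelcache on disk; the helper names are made up for this example. It reports file offsets only, so mapping an offset to a virtual address and following cross-references still requires a tool such as Binary Ninja.

```python
def find_offsets(data, needle):
    """Return every offset in `data` at which the byte string `needle` occurs."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets


def find_in_file(path, needle):
    """Read a file (for example a decompressed kernelcache) and search it."""
    with open(path, "rb") as f:
        return find_offsets(f.read(), needle)


# Small in-memory demonstration with fabricated bytes.
sample = b"\x00\x00zone_require failed\x00abc zone_require failed"
print(find_offsets(sample, b"zone_require failed"))
```

For a real image, a call such as find_in_file(path_to_kernelcache, b"zone_require failed") would return the offsets to look up in the disassembler, where path_to_kernelcache is wherever the decompressed file was saved.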

[Image: disassembly]

We can disable this mitigation entirely by simply returning from this function as soon as it is entered. In the hooks language, this is done by setting the PC register to the contents of the LR register (also known as X30):

print("zone_require called\n");

cpu.pc = cpu.x[30];
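Why assigning LR to PC amounts to an immediate return can be seen in a toy model of an ARM64 call. The Python sketch below is an illustration only: the addresses are hypothetical, the ToyCPU class is invented, and it assumes the hook runs when the hooked instruction is reached, before that instruction executes.

```python
FUNC_ENTRY = 0x1000  # hypothetical address of the hooked function
CALL_SITE = 0x2000   # hypothetical address of the BL instruction in the caller


class ToyCPU:
    def __init__(self):
        self.x = [0] * 31  # X0..X30; X30 doubles as the link register (LR)
        self.pc = 0


def early_return_hook(cpu, log):
    # Mirrors the hook in the article: log, then jump to the return address.
    log.append("zone_require called")
    cpu.pc = cpu.x[30]


def call(cpu, hooks, log):
    # BL stores the address of the next instruction in X30 and branches.
    cpu.x[30] = CALL_SITE + 4
    cpu.pc = FUNC_ENTRY
    # Assumption: a hook at this address runs before the instruction does.
    hook = hooks.get(cpu.pc)
    if hook is not None:
        hook(cpu, log)
    # If PC still points at the function, its body runs and panics.
    if cpu.pc == FUNC_ENTRY:
        log.append("panic: zone_require failed")
    return cpu.pc


unhooked_log = []
call(ToyCPU(), {}, unhooked_log)

hooked_log = []
resume_pc = call(ToyCPU(), {FUNC_ENTRY: early_return_hook}, hooked_log)
print(unhooked_log)
print(hooked_log, hex(resume_pc))
```

With the hook installed, execution resumes at the instruction after the call and the function body never runs. A function whose caller relies on a return value would likely also need X0 set before returning this way.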

On the Kernel Hooks tab, add a new hook, enter the address of the beginning of the function (fffffff007768fb8; note that the 0x prefix should not be entered) and the contents of our hook, then click Create hook. Leave the patch type as csmfcc to use the C-like hooks language.

[Image: kernel_hook]

If we run the proof-of-concept again with the hook in place, we'll see the "zone_require called" message in purple, and then a different panic message:

[Image: mitigation_bypassed]

From here, we can continue to explore this vulnerability as if the mitigation didn't exist, begin implementing an exploit for it, and deal with the mitigation later.