How to Debug the Kernel

Let's debug a kernel!

One of the major benefits of using a virtual environment is the ability to inspect and modify the state of the whole system under user control. The primary interface to these features is a TCP-based debug stub that is compatible with the GDB remote protocol.

Important Notes on the Kernel Debug Stub

  • The kernel debug stub presents the multiple CPUs in the system as 'threads' of the process being debugged. For instance, on an iPhone 6, there are two threads: 1 and 2, corresponding to CPU 0 and CPU 1. (Numbering starts at 1 because the GDB protocol reserves thread ID 0 to mean "any thread".)
  • Only all-stop mode is supported: if one CPU stops, the others stop as well.
  • Single-stepping is supported by using AArch64 single-stepping features. With vCont packets, if one CPU single-steps and no action is specified for the other CPUs, those CPUs do not perform a step.
  • Hardware breakpoints and watchpoints are supported. (Up to 4 of each at a time.) Software breakpoint packets issued to the stub are converted into hardware breakpoints.
  • Memory addresses from the debugger are passed through virtual-to-physical address translation, which is required for memory access to work. Only actual RAM is visible to the debugger: reads from MMIO regions return 0 and writes to them are ignored.
  • Only one concurrent debug stub connection is supported per VM.
  • High latency can cause the VM to visually "freeze" or "stutter" while the underlying kernel is stopped at a breakpoint and communicating with your GDB. If you use watch / awatch / rwatch with conditions, every hit is sent to your local machine, where GDB evaluates the condition and resumes the VM if the condition is not met. This is how GDB normally works, but the delay is typically not noticeable when debugging locally. On the local GDB end, nothing indicates that this is happening until the condition is met, at which point you receive a prompt.
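
The thread numbering and single-step behavior in the notes above can be illustrated with a short Python sketch. It frames a GDB remote protocol packet ($payload#checksum, where the checksum is the modulo-256 sum of the payload bytes) and builds a vCont packet that steps one CPU while giving the other CPUs no action. The helper names are illustrative only, and GDB normally generates these packets for you.

```python
def cpu_to_thread(cpu_index: int) -> int:
    """Map a zero-based CPU index to the stub's one-based thread ID."""
    if cpu_index < 0:
        raise ValueError("CPU index must be non-negative")
    return cpu_index + 1


def frame_packet(payload: str) -> bytes:
    """Wrap a payload as $<payload>#<two-digit hex checksum>.

    This sketch does not escape the special characters '$', '#', '}' or '*',
    so it is only suitable for simple payloads like the ones used here.
    """
    data = payload.encode("ascii")
    checksum = sum(data) % 256
    return b"$" + data + b"#" + f"{checksum:02x}".encode("ascii")


def step_cpu_packet(cpu_index: int) -> bytes:
    """Build a vCont packet that single-steps one CPU and names no other action."""
    thread_id = cpu_to_thread(cpu_index)
    # Thread IDs are written in hexadecimal in the remote protocol.
    return frame_packet(f"vCont;s:{thread_id:x}")
```

For example, step_cpu_packet(0) addresses thread 1, which is CPU 0.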

Initial Setup

Connecting to kernel debug stub

Using GDB

You will need an AArch64-compatible GDB. The versions that ship with Ubuntu 16.04 on AArch64, as well as the Linaro public releases since 7.1.1 (x86_64 Linux), have been tested to work. The AArch64 debug protocol was somewhat in flux until 2015, so very old GDB versions may not interoperate correctly.

If you are using the cloud product, you will need to be connected to VPN.

To connect to the stub

Note: The address and port provided here are for example purposes only. You will need to use the address and port for your particular virtual device. You can find the address and port for your device at the end of the "kernel gdb" link, located at the bottom of the virtual device page.

Example

(gdb) target remote 10.11.1.1:44219
Remote debugging using 10.11.1.1:44219
warning: No executable has been specified and target does not support
determining executable automatically. Try using the "file" command.
0x00000008030e40c8 in ?? ()
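
If you reconnect often, it can help to generate the GDB commands from the address shown on the virtual device page. The following Python sketch does this; the function names are illustrative, and the endpoint is the same example value used above, not a real device.

```python
from typing import Tuple


def parse_endpoint(endpoint: str) -> Tuple[str, int]:
    """Split "host:port" into its parts and validate the port."""
    host, sep, port_text = endpoint.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {endpoint!r}")
    port = int(port_text)  # raises ValueError if the port is not a number
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port


def gdb_script(endpoint: str, cpu_index: int = 0) -> str:
    """Return GDB commands that connect to the stub and select a CPU."""
    host, port = parse_endpoint(endpoint)
    lines = [
        f"target remote {host}:{port}",
        # The stub numbers threads from 1, so CPU n is thread n + 1.
        f"thread {cpu_index + 1}",
    ]
    return "\n".join(lines) + "\n"
```

You could write the result of gdb_script("10.11.1.1:44219") to a file and pass that file to GDB with the -x option.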

To switch CPUs (in this case, to CPU 1)

(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0 0x00000008030e40c8 in ?? ()

To access monitor commands

(gdb) monitor sr ttbr1_el1=0x0000000034d4593d
CPU 1, ttbr1_el1 := 0x0000000034d4593d (before: 0x0000000000000000)

Otherwise, use regular GDB commands to control the debug stub.

Using IDA

The following instructions apply to IDA 7.0.

  • Select Debugger | Switch debugger... from main menu, then pick Remote GDB debugger in the dialog box.
  • Then, again from main menu, select Debugger | Debugger options.... Click the Set specific options button and make sure the Use stepping support checkbox is checked.
  • Finally, select Debugger | Process options... from main menu, enter the stub's address in the Hostname and Port fields. After this setup, which is saved in the IDA database, select Debugger | Attach to process... to attach to the running VM.

To access monitor commands from IDA, locate the GDB command line bar at the bottom of the window (just above the status bar, next to a GDB button). Enter the monitor commands there, without the word monitor itself. For instance, instead of monitor sr, simply write sr and press Enter. The output will appear in IDA's text output window above.
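
If you keep your monitor commands in GDB form, a small helper can translate them for IDA's command bar. This is a minimal Python sketch; the function name to_ida_command is illustrative and is not part of IDA or GDB.

```python
def to_ida_command(gdb_command: str) -> str:
    """Strip a leading "monitor" or "mon" so the command can be typed into IDA."""
    stripped = gdb_command.strip()
    words = stripped.split(None, 1)
    if words and words[0] in ("monitor", "mon"):
        # Return only what follows the prefix; empty if nothing follows.
        return words[1] if len(words) > 1 else ""
    return stripped
```

With this helper, "monitor sr ttbr1_el1=0x0000000034d4593d" becomes the text you would enter in IDA, starting at sr.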

Using LLDB

The platform supports the version of LLDB that ships with Xcode on macOS. Debugging with lldb is beneficial for virtual devices running iOS because lldb already has built-in knowledge of the Mach-O format and the fact that it is dealing with an XNU kernel. It is possible to use the debugger with or without the Mach-O kernel binary. Using the binary slows down load time, but gives you access to the symbols that are in the binary.

To connect to the stub without the binary

Note: The address and port provided here are for example purposes only. You will need to use the address and port for your particular virtual device. You can find the address and port for your device at the end of the "kernel gdb" link, located at the bottom of the virtual device page.

Example

$ lldb
(lldb) gdb-remote 10.11.1.1:44219
Kernel UUID: B99BA98C-3AAA-30E9-B733-07845A9EC56B
Load Address: 0xfffffff007004000
WARNING: Unable to locate kernel binary on the debugger system.
Process 1 stopped
* thread #1, stop reason = signal SIGINT
    frame #0: 0xfffffff0071a513c
->  0xfffffff0071a513c: bl     0xfffffff00709fc00
    0xfffffff0071a5140: mrs    x8, TPIDR_EL1
    0xfffffff0071a5144: ldr    x8, [x8, #0x428]
    0xfffffff0071a5148: str    xzr, [x8, #0xf0]
Target 0: (No executable module.) stopped.
(lldb)

To connect to the stub with the binary

Example

$ lldb ~/Documents/Work/kerneli61125
(lldb) gdb-remote 10.11.1.31:39535
Kernel UUID: B99BA98C-3AAA-30E9-B733-07845A9EC56B
Load Address: 0xfffffff007004000
Kernel slid 0x0 in memory.
Loaded kernel file /Users/planetbeing/Documents/Work/kerneli61125
Loading 140 kext modules warning: Can't find binary/dSYM for com.apple.kec.corecrypto (A6668145-C49A-3D84-96E6-80AE4AA9E4B4)
.warning: Can't find binary/dSYM for com.apple.kec.Libm (51AFA03E-8041-3D11-BD40-A6D1AED1C667)
.warning: Can't find binary/dSYM for com.apple.kec.pthread (75DF2E44-845A-3C15-987F-D53AC36CFD72)

...

.warning: Can't find binary/dSYM for com.apple.driver.usb.cdc (0B64EFC9-CF03-37AF-8C86-4A0C419BC87B)
.warning: Can't find binary/dSYM for com.apple.driver.AppleUSBDeviceMux (84B68AED-C037-3166-BA42-906FE26FA76F)
. done.
Process 1 stopped
* thread #1, stop reason = signal SIGINT
    frame #0: 0xfffffff0071a513c kerneli61125`___lldb_unnamed_symbol1665$$kerneli61125 + 296
kerneli61125`___lldb_unnamed_symbol1665$$kerneli61125:
->  0xfffffff0071a513c <+296>: bl     0xfffffff00709fc00        ; ___lldb_unnamed_symbol102$$kerneli61125
    0xfffffff0071a5140 <+300>: mrs    x8, TPIDR_EL1
    0xfffffff0071a5144 <+304>: ldr    x8, [x8, #0x428]
    0xfffffff0071a5148 <+308>: str    xzr, [x8, #0xf0]
Target 0: (kerneli61125) stopped.
(lldb)

The gdb stub represents CPU cores as threads.

(lldb) thread list
Process 1 stopped
* thread #1: tid = 0x0001, 0xfffffff0071a513c, stop reason = signal SIGINT
  thread #2: tid = 0x0002, 0xfffffff0071a513c
(lldb) thread select 2
* thread #2
    frame #0: 0xfffffff0071a513c
->  0xfffffff0071a513c: bl     0xfffffff00709fc00
    0xfffffff0071a5140: mrs    x8, TPIDR_EL1
    0xfffffff0071a5144: ldr    x8, [x8, #0x428]
    0xfffffff0071a5148: str    xzr, [x8, #0xf0]
(lldb)

You can use the regular lldb commands to control the debug stub.
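
Because threads map to CPU cores, the output of thread list can be turned into a per-CPU view. The following Python sketch parses output in the format shown above. It assumes that thread N corresponds to CPU N-1, as described in the notes on the debug stub, and the function name is illustrative.

```python
import re

# Matches lines such as:
#   * thread #1: tid = 0x0001, 0xfffffff0071a513c, stop reason = signal SIGINT
#     thread #2: tid = 0x0002, 0xfffffff0071a513c
THREAD_LINE = re.compile(
    r"^\*?\s*thread #\d+: tid = (?P<tid>0x[0-9a-fA-F]+), (?P<pc>0x[0-9a-fA-F]+)"
)


def parse_thread_list(output: str) -> dict:
    """Return a mapping of zero-based CPU index to program counter."""
    cpus = {}
    for line in output.splitlines():
        match = THREAD_LINE.match(line.strip())
        if match is None:
            continue  # prompt lines and "Process 1 stopped" are skipped
        thread_id = int(match.group("tid"), 16)
        cpus[thread_id - 1] = int(match.group("pc"), 16)
    return cpus


SAMPLE = """(lldb) thread list
Process 1 stopped
* thread #1: tid = 0x0001, 0xfffffff0071a513c, stop reason = signal SIGINT
  thread #2: tid = 0x0002, 0xfffffff0071a513c
"""
```

Calling parse_thread_list(SAMPLE) yields one entry per CPU, keyed by CPU index rather than thread number.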

Your debugger will work as if it were attached to a hardware debugger (think OpenOCD).

Hook into Kernel Functions

The hypervisor can trap certain instructions in the guest kernel and execute short pieces of user-supplied code before either executing or skipping the trapped instruction.

The hooks are written in a simple programming language and run in the hypervisor environment without pausing the VM or engaging the debugger.

The monitor command form shown here is appropriate for GDB. In IDA, you can enter monitor commands without the leading mon. In LLDB they are generally not available because of LLDB deficiencies.

Basic knowledge

How to set a hook:

mon patch 0xfffffff0061b8ca4 print("hello world\n");

A hook can be deactivated with:

mon patch 0xfffffff0061b8ca4 -

The virtual address is translated to a physical address using the current page table. Before the guest kernel has booted, translation is disabled, so you have to use a physical address instead.

Not all instructions can be hooked. The set of legal instructions includes all branches, most typical function prologue instructions and simple register MOVs.

Hooks run without locking on multiple cores. This is a major difference from using GDB breakpoint scripts, and (apart from the obviously better performance) allows investigating race issues in the guest.

Example

Let's say the function at 0xfffffff001234570 is a memory allocator, with the interface of malloc, and we want to tag every allocation with the calling address. This can be done with kernel hooks with fairly low effort.

First, we need to enlarge the allocation by 8 bytes to leave space for the tag, and save the size for later:

mon patch 0xfffffff001234570 global[0] = cpu.x[0]; cpu.x[0] += 8;

Then, we find the RET opcode the allocator function uses - let's say it's at 0xfffffff0012346e4. We can patch this opcode to write LR (return address) before actually returning:

mon patch 0xfffffff0012346e4 if(cpu.x[0]) { *(u64 *)(cpu.x[0] + global[0]) = cpu.x[30]; }

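This is, however, not fully reentrant: global[0] is shared between all hooks and hooks run without locking on multiple cores, so a second call into the allocator can overwrite the saved size before the first call reaches its RET. The issue can be solved by adding a stack frame (remember that AArch64 SP must be 16-byte aligned) and not using global[0]: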

mon patch 0xfffffff001234570 cpu.sp -= 16; *(u64 *)cpu.sp = cpu.x[0]; cpu.x[0] += 8;
mon patch 0xfffffff0012346e4 size = *(u64 *)cpu.sp; if(cpu.x[0]) { *(u64 *)(cpu.x[0] + size) = cpu.x[30]; } cpu.sp += 16;
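
To see the memory layout these two hooks produce, here is a toy model in Python. It is not the hook language and does not talk to a VM: the ToyGuest class, its bump allocator, and the little-endian tag encoding are all assumptions made for illustration. The Python list stands in for the 16-byte stack frame that the entry hook pushes.

```python
import struct


class ToyGuest:
    """A tiny stand-in for guest memory with a hooked allocator."""

    def __init__(self, heap_size: int = 256):
        self.memory = bytearray(heap_size)
        self.next_free = 16  # address 0 stays unused so 0 can mean "failed"
        self.stack = []      # models the frame pushed by the entry hook

    def _allocate(self, size: int) -> int:
        """Bump allocator standing in for the guest's real allocator."""
        if self.next_free + size > len(self.memory):
            return 0
        address = self.next_free
        self.next_free += size
        return address

    def malloc_with_hooks(self, size: int, return_address: int) -> int:
        # Entry hook: save the requested size, then enlarge the request by 8.
        self.stack.append(size)
        address = self._allocate(size + 8)
        # RET hook: recover the size and, on success, store the caller's
        # address in the 8 extra bytes that follow the payload.
        saved_size = self.stack.pop()
        if address:
            struct.pack_into("<Q", self.memory, address + saved_size, return_address)
        return address

    def tag_of(self, address: int, size: int) -> int:
        """Read back the tag stored after a payload of the given size."""
        return struct.unpack_from("<Q", self.memory, address + size)[0]
```

The tag lives at address + size, so the caller needs to know the original allocation size to find it, exactly as in the hooks above.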

Language features

The language used to write hooks is superficially similar to C, but it differs from C in the following ways:

  • the only types supported are integer types and pointers to them,
  • there is no way to define functions; each hook is written like a function body,
  • variables may be automatically declared at assignment if the type can be fully derived from the right-hand side,
  • variables can be declared anywhere before use, including inside for(;;) loop initial statements,
  • variables are globally scoped,
  • character string literals are not of pointer type, instead they are a special strlit type (because pointers access VM memory instead, literal strings can't be pointers).

The supported control structures are: if, if-else, while, do-while, break and for.

Built-in integer types can be referred to with standard C names (unsigned long etc.), stdint-style names (uint64_t) or short names (u64).

The unary & (reference) operator is not supported.

Variables declared with static keep their values between invocations of the hook, just like static variables inside a C function. There is a limit of 8 such variables in a given hook, imposed by the execution environment.

Accessing VM state

Accessing processor state is done by using a pseudo-struct cpu. The supported fields in that structure are:

  • cpu.x[0] to cpu.x[30] for 64-bit GPRs (note that register index must be a constant),
  • cpu.w[0] to cpu.w[30] for 32-bit GPRs,
  • cpu.pc, cpu.sp and cpu.cpsr for the processor state,
  • cpu.midr_el1, cpu.mpidr_el1, cpu.sctlr_el1, cpu.ttbr0_el1, cpu.ttbr1_el1, cpu.tcr_el1, cpu.spsr_el1, cpu.elr_el1, cpu.esr_el1, cpu.far_el1, cpu.par_el1, cpu.vbar_el1, cpu.isr_el1, cpu.contextidr_el1, cpu.tpidr_el1, cpu.tpidr_el0 and cpu.tpidrro_el0 for EL1-visible system registers.

Writing these fields will modify the corresponding processor state upon return to VM.

Accessing VM memory (kernel virtual address view) happens by dereferencing pointers. For instance,

print_int("foo", *(u64 *)0xfffffff001234568);

will print the value of a 64-bit word at 0xFFFFFFF001234568 in the VM.

Built-in functions

The following functions are available:

void print(strlit s);

Prints a literal string.

void print_int(strlit s, u64 a);

Prints an integer with an optional string prefix s (pass 0 to not print a string).

void print_str(strlit s, void *p, u64 z);

Prints a zero-terminated string from VM memory at p, at most z bytes, with an optional prefix s.

void print_buf(strlit s, void *p, u64 z);

Prints a buffer from VM memory at p, at most z bytes, with an optional prefix s.

void print_thread(strlit s, u64 t);

Prints information about thread t (pass 0 for the current thread) with an optional string prefix s.

void print_backtrace(strlit s, u64 d);

Prints a backtrace of maximum depth d, with an optional string prefix s.

void usleep(u32 usec);

Delays for usec microseconds.

void debug(void);

Causes the VM to take a debug trap immediately after the hook returns.

u64 mapped(type *p);

Checks whether a VM memory location is valid; the pointed-to type determines the size of the location.

u64 mapped(void *p, u64 z);

Checks whether a VM memory range is valid; z gives the size of the range.

u64 min(u64 a, u64 b); u64 max(u64 a, u64 b);
s64 min(s64 a, s64 b); s64 max(s64 a, s64 b);

Return the minimum/maximum of two unsigned or signed values.

Additionally, the constant NULL is defined as 0.

The variables global[0] to global[63] are 64-bit integers shared between all hooks installed on the VM.

OS-independent breakpoints

Sometimes it's desirable to break into the kernel debugger from user software running in EL0. Inserting a BRK opcode usually results in it being intercepted by the operating system kernel. But if you want to end up directly in the kernel debugger instead, you can use one of the following opcode sequences, depending on where the code runs:

EL1 (64-bit)

hvc #0x7242

iOS EL0 (64-bit)

mrs xzr, cntpct_el0
hvc #0x7242

Linux EL0 (64-bit), Android EL0 (64-bit)

mrs xzr, pmcr_el0
hvc #0x7242

Linux EL0 (32-bit), Android EL0 (32-bit)

mrc p15,#0,r0,c9,c12,#0
hvc #0x7242

After running this code, the VM will enter a paused state. If a debugger is attached, this will result in the debugger registering a debug event (similar to a breakpoint).
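
If you are patching a binary by hand rather than assembling it, you need the raw encoding of the HVC instruction. As a sketch, the 64-bit (A64) HVC instruction places its 16-bit immediate in bits 5 to 20 of the base opcode 0xD4000002. The helper names below are illustrative. The 32-bit encodings are different and are not covered here, and the preceding MRS instruction required for EL0 still has to be emitted separately.

```python
import struct

HVC_BASE = 0xD4000002  # A64 "hvc #0"


def encode_hvc(imm16: int) -> int:
    """Return the 32-bit A64 instruction word for "hvc #imm16"."""
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("HVC immediate must fit in 16 bits")
    return HVC_BASE | (imm16 << 5)


def hvc_bytes(imm16: int) -> bytes:
    """Return the instruction as it is laid out in little-endian memory."""
    return struct.pack("<I", encode_hvc(imm16))


if __name__ == "__main__":
    # The debugger-break immediate used in this section.
    print(hex(encode_hvc(0x7242)))
    print(hvc_bytes(0x7242).hex())
```

Checking the generated word against your assembler's output for hvc #0x7242 is a quick way to confirm the patch bytes before writing them.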

Console output from anywhere in the VM

Sometimes, when writing patches to kernel or user applications, it's nice to be able to print output without having to ask the operating system to do so. In fact, it can be fairly hard to get to debug output from some places, such as EL0 software that has no standard output and disabled debugging calls.

This feature helps you get debug output from patches, shellcodes or maybe just helper programs without having to deal with operating system constraints.

Both EL0 and EL1 code can print directly to the VM console via a special HVC (hypervisor call). The form of the HVC itself depends on the environment:

EL1 (64-bit)

hvc #0x6C43

iOS EL0 (64-bit)

mrs xzr, cntpct_el0
hvc #0x6C43

Linux EL0 (64-bit), Android EL0 (64-bit)

mrs xzr, pmcr_el0
hvc #0x6C43

Linux EL0 (32-bit), Android EL0 (32-bit)

mrc p15,#0,r0,c9,c12,#0
hvc #0x6C43

To use the HVC, set registers x0 .. x2 (or r0 .. r2 for 32-bit code) to the following values:

CONSLOG_REQ_STR: prints a zero-terminated (C) string

x0 = 0xFFFF0000
x1 = pointer to string

CONSLOG_REQ_U64: prints a number as hex

x0 = 0xFFFF0001
x1 = number

CONSLOG_REQ_S64: prints a number as signed decimal

x0 = 0xFFFF0002
x1 = number

CONSLOG_REQ_HEX: prints a buffer as hex dump

x0 = 0xFFFF0003
x1 = pointer to buffer
x2 = size in bytes

The memory printing calls (CONSLOG_REQ_STR and CONSLOG_REQ_HEX) return a status in x0: a negative value means the call failed, and a non-negative value is the number of bytes retrieved from the buffer (or string). The other calls simply return zero in x0.
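
The register setup and return-value convention can be summarized in a short Python model. This is only a sketch for reasoning about the interface: the function names are illustrative, nothing here issues an HVC, and the sign handling assumes 64-bit code, where x0 is a 64-bit register.

```python
CONSLOG_REQ_STR = 0xFFFF0000
CONSLOG_REQ_U64 = 0xFFFF0001
CONSLOG_REQ_S64 = 0xFFFF0002
CONSLOG_REQ_HEX = 0xFFFF0003

# Number of arguments each request takes after the request code in x0.
ARG_COUNT = {
    CONSLOG_REQ_STR: 1,  # x1 = pointer to string
    CONSLOG_REQ_U64: 1,  # x1 = number
    CONSLOG_REQ_S64: 1,  # x1 = number
    CONSLOG_REQ_HEX: 2,  # x1 = pointer to buffer, x2 = size in bytes
}


def request_registers(request: int, *args: int) -> dict:
    """Return the register values to load before issuing the HVC."""
    if len(args) != ARG_COUNT[request]:
        raise ValueError("wrong number of arguments for this request")
    registers = {"x0": request}
    for index, value in enumerate(args, start=1):
        registers[f"x{index}"] = value
    return registers


def as_signed64(value: int) -> int:
    """Interpret a raw 64-bit register value as a signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value & (1 << 63) else value


def describe_result(request: int, x0: int) -> str:
    """Explain the value returned in x0 for a given request."""
    status = as_signed64(x0)
    if request in (CONSLOG_REQ_STR, CONSLOG_REQ_HEX):
        if status < 0:
            return f"failed with status {status}"
        return f"retrieved {status} bytes"
    return "ok"
```

In real guest code the same check is a signed comparison of x0 against zero after the HVC returns.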

Text printed through this output path will show up in the VM console, in cyan color using ANSI control characters.

An example for the string printing function, including handling EL0 page faults, is in the Corellium GitHub repository at guest-tools - the conslog program will echo its standard input to VM console (in the highly visible cyan color of hypervisor messages).