Bug hunting in the Linux kernel

oops

Continuing on from our last post on dynamic program analysis, today our focus is on what to do after we’ve discovered a bug. This post is a write up of my notes on the following articles:

Debugging Analysis of Kernel panics and Kernel oopses using System Map

Understanding a Kernel Oops!

Bug hunting — The Linux Kernel documentation

Tainted kernels — The Linux Kernel documentation

First let’s go over some terminology. Kernel bugs refer to an internal error in the kernel, kernel oops refers to a recoverable error which invokes the kernel exception handler, and kernel panic refers to a non-recoverable error that causes the operating system to do a controlled hang or reboot.

OOPS messages are useful for debugging because they give context to the kernel error such as the values of the CPU registers (which you could presumably restore while debugging), the address of the errant function, and the name of the executing process.

Triggering an oops

Taking reference from Prabhakar’s post, we write a simple kernel module that purposely dereferences a NULL pointer:

// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>

void create_oops(void)
{
	*(int *)0 = 0;
}

int init_module(void)
{
	pr_debug("oops from the module\n");
	create_oops();
	return 0;
}

void cleanup_module(void)
{
}

MODULE_LICENSE("GPL");

This code can be compiled outside the kernel with the following Makefile (note that while this Makefile strives to be as general as possible, you may have to edit the obj-m line depending on the name of your module, as well as the KDIR line depending on your Linux distribution):

ccflags-y := -DDEBUG=1

ifneq ($(KERNELRELEASE),)
obj-m += oops.o

else
KDIR ?= /lib/modules/$(shell uname -r)/build
default:
	make -C $(KDIR) M=$(PWD) modules

clean:
	make -C $(KDIR) M=$(PWD) clean
endif

We can now build and load the module by running make followed by sudo insmod oops.ko. Immediately the console tells us that the module is killed. Calling dmesg, we see the following oops dump:

[107618.113697] oops: loading out-of-tree module taints kernel.
[107618.113752] oops: module verification failed: signature and/or required key missing - tainting kernel
[107618.114352] oops from the module
[107618.114361] BUG: kernel NULL pointer dereference, address: 0000000000000000
[107618.114366] #PF: supervisor write access in kernel mode
[107618.114367] #PF: error_code(0x0002) - not-present page
[107618.114369] PGD 0 P4D 0 
[107618.114372] Oops: 0002 [#1] PREEMPT SMP PTI
[107618.114375] CPU: 3 PID: 165904 Comm: insmod Tainted: G           OE     5.9.14-arch1-1 #1
[107618.114376] Hardware name: LENOVO 253724U/253724U, BIOS 6IET72WW (1.32 ) 08/27/2010
[107618.114381] RIP: 0010:init_module+0x1d/0x30 [oops]
[107618.114384] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 66 66 66 66 90 66 66 66 66 90 48 c7 c6 54 f0 2b c1 48 c7 c7 00 10 2c c1 e8 93 1d 03 d8 <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c3 0f 1f 44 00 00 66 66 66
[107618.114386] RSP: 0018:ffff9e7c874cfc90 EFLAGS: 00010246
[107618.114388] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[107618.114390] RDX: 0000000000000000 RSI: ffffffff9a159b0f RDI: 00000000ffffffff
[107618.114391] RBP: ffffffffc12be020 R08: 0000000000000ef6 R09: 0000000000000004
[107618.114392] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9e7c874cfc98
[107618.114394] R13: ffffffffc12c1200 R14: ffffffffc12c1250 R15: 0000000000000000
[107618.114396] FS:  00007f9b60ed4740(0000) GS:ffff900c73d80000(0000) knlGS:0000000000000000
[107618.114398] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[107618.114399] CR2: 0000000000000000 CR3: 0000000027a10001 CR4: 00000000000206e0
[107618.114401] Call Trace:
[107618.114409]  do_one_initcall+0x59/0x240
[107618.114418]  do_init_module+0x5c/0x260
[107618.114422]  load_module+0x21a7/0x2450
[107618.114429]  __do_sys_finit_module+0xbd/0x120
[107618.114438]  do_syscall_64+0x33/0x40
[107618.114442]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Note that the “oops from the module” line comes from the module that was loaded in. Let’s break down the rest of the oops message.

BUG: kernel NULL pointer dereference, address: 0000000000000000: this tells us that the error was caused by a NULL pointer dereference. This is bad because we’re trying to get the value of whatever’s stored at 0x0, but there’s nothing there.

Oops: 0002 [#1] PREEMPT SMP PTI: here we get a little more context about the error. Bit 0 tells us if no page was found (0), or there was a protection fault (1). Bit 1 tells us if this was a read (0) or a write (1). Bit 2 tells us if the error occurred in kernel-mode (0), or in user-mode (1). [#n] tells us that the error occurred n times.

CPU: 3 PID: 165904 Comm: insmod Tainted: G OE 5.9.14-arch1-1 #1: since there was a kernel error, our kernel is now tainted. Tainting just means that the kernel is marked for future investigation. It’s fine to continue running a tainted kernel, and even if the cause of the error is undone, the kernel remains tainted. From this log, we can see that the error occurred on CPU 3, with process ID 165904, on the insmod (insert module) command. “G OE” tells us that this was an unsigned proprietary module built out-of-tree (i.e. outside the Linux kernel source tree). More definitions can be found here.

RIP: 0010:init_module+0x1d/0x30 [oops]: the RIP register stores the 64-bit instruction pointer that tells us the address of the instruction being executed. init_module is a symbol and needs to be translated into an address. +0x1d/0x30 refers to the offset from this translated address, and the length of the function respectively.

Code:...: contains the hex dump of the machine code being executed.

RSP, RAX, RBX, RCX, RDX, RSI, RDI, RBP, etc.: list the contents of various CPU registers.

Call Trace: gives us the sequence of functions that were called leading up to the oops.

Debugging the oops

To find out what went wrong, we open up a debugger on the offending module by calling gdb oops.ko. GDB is able to disassemble the function that caused our kernel oops:

(gdb) disassemble init_module
Dump of assembler code for function init_module:
   0x0000000000000060 <+0>:     call   0x65 <init_module+5>
   0x0000000000000065 <+5>:     nopl   0x0(%rax,%rax,1)
   0x000000000000006a <+10>:    mov    $0x0,%rsi
   0x0000000000000071 <+17>:    mov    $0x0,%rdi
   0x0000000000000078 <+24>:    call   0x7d <init_module+29>
   0x000000000000007d <+29>:    movl   $0x0,0x0
   0x0000000000000088 <+40>:    xor    %eax,%eax
   0x000000000000008a <+42>:    ret    
End of assembler dump.

+0x1d = +29, and so we see the error comes from the instruction movl $0x0,0x0. $0x0 is a NULL pointer dereference. Voilà.

We can also find the specific offending line in the C code. This can be done either with some quick math (the function’s address is 0x60, the offset is at 0x1d, so the line at 0x7d is the culprit), or we can get GDB to do the translation and computation for us:

(gdb) list *(init_module+0x1d)
0x7d is in init_module (/home/desmond/linux_work/linux_misc/learning_oopses/oops.c:8).
3       #include <linux/module.h>
4       #include <linux/init.h>
5
6       void create_oops(void)
7       {
8               *(int *)0 = 0;
9       }
10
11      int init_module(void)
12      {

Line 8 was indeed the line that dereferenced a NULL pointer, so we’ve found the bug again!

Other tidbits

Loading additional symbol files into GDB

If you need to find the address of the module’s text section, that can be found in sysfs:

$ sudo cat /sys/module/oops/sections/.text
0xffffffffc12be000

If GDB needed additional symbol table information to perform our translations, we could have done it like so:

(gdb) add-symbol-file oops.o 0xffffffffc12be000
add symbol table from file "oops.o" at
        .text_addr = 0xffffffffc12be000
(y or n) y
Reading symbols from oops.o...

objdump

We can also disassemble our object files to view their assembly code directly.

$ objdump -dr oops.o

oops.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <create_oops>:
   0:   e8 00 00 00 00          callq  5 <create_oops+0x5>
                        1: R_X86_64_PLT32       __fentry__-0x4
   5:   c7 04 25 00 00 00 00    movl   $0x0,0x0
   c:   00 00 00 00 
  10:   c3                      retq   
  11:   66 66 2e 0f 1f 84 00    data16 nopw %cs:0x0(%rax,%rax,1)
  18:   00 00 00 00 
  1c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000000020 <init_module>:
  20:   e8 00 00 00 00          callq  25 <init_module+0x5>
                        21: R_X86_64_PLT32      __fentry__-0x4
  25:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  2a:   48 c7 c6 00 00 00 00    mov    $0x0,%rsi
                        2d: R_X86_64_32S        .rodata.str1.1
  31:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        34: R_X86_64_32S        __dyndbg
  38:   e8 00 00 00 00          callq  3d <init_module+0x1d>
                        39: R_X86_64_PLT32      __dynamic_pr_debug-0x4
  3d:   c7 04 25 00 00 00 00    movl   $0x0,0x0
  44:   00 00 00 00 
  48:   31 c0                   xor    %eax,%eax
  4a:   c3                      retq   
  4b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

0000000000000050 <cleanup_module>:
  50:   e8 00 00 00 00          callq  55 <cleanup_module+0x5>
                        51: R_X86_64_PLT32      __fentry__-0x4
  55:   c3                      retq

If you’re trying to find the offending symbol in the Linux kernel itself, you can run objdump on vmlinux, which is a statically linked executable (ELF) file containing the object code of the kernel itself.

DesmondCheong