# Kernel Debugging References - Kernel OOPS

[TOC]

## Notes

The kernel prints an oops to the kernel log when something goes wrong. This note uses the [`dmesg` output](https://launchpadlibrarian.net/730774504/CurrentDmesg.txt) attached to [LP: #2066126](https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535/+bug/2066126) as an example.

## Kernel oops checklist

A kernel oops may seem overwhelming at first, but here are a few things worth checking.

### Summary of the bug itself

The first line after the `----[ cut here ]----` line in a kernel oops is usually a summary of what caused the bug. Here, for example, the summary comes from UBSAN (the Undefined Behavior Sanitizer). The compiler inserts runtime checks at build time, and when one of them detects undefined behavior, UBSAN emits a report like this one:

```
[ 28.514734] UBSAN: array-index-out-of-bounds in build/nvidia/535.171.04/build/nvidia-uvm/uvm_pmm_gpu.c:2364:28
```

### The instruction pointer register

Near the bottom of an oops is the register dump. It lists the registers and their contents at the moment the error happened. Because registers are architecture-specific, this part of the oops varies between architectures.
For example, in the first kernel oops from the [`dmesg` output](https://launchpadlibrarian.net/730774504/CurrentDmesg.txt) of LP: #2066126, the register dump is:

```
[ 28.515091] RIP: 0033:0x74fa66324ded
[ 28.515105] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 28.515106] RSP: 002b:00007ffd17f5ee10 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 28.515108] RAX: ffffffffffffffda RBX: 000074fa56300860 RCX: 000074fa66324ded
[ 28.515109] RDX: 00007ffd17f5eeb0 RSI: 0000000000000025 RDI: 0000000000000008
[ 28.515109] RBP: 00007ffd17f5ee60 R08: 000074fa563008f0 R09: 0000000000000000
[ 28.515110] R10: 000074fa6620d630 R11: 0000000000000246 R12: 00005fd35677ba16
[ 28.515110] R13: 000074fa563008f0 R14: 00007ffd17f5eeb0 R15: 0000000000000008
```

Note that RIP is the instruction pointer register on x86_64. See [*x86-64 General Purpose Registers - Architecture 1001: x86-64 Assembly*](https://youtu.be/XvJzR3eb0b4). The instruction pointer is useful when

### Comm: the offending process

`Comm` is short for "command". It is the name of the task that was running when the oops happened, truncated to 15 characters. Here it is `gst-plugin-scan`:

```
[ 28.514739] CPU: 6 PID: 2315 Comm: gst-plugin-scan Tainted: P O 6.8.0-31-generic #31-Ubuntu
```

### The tainted flags

The same line also shows `Tainted:` followed by the flags `P` and `O`. `P` means that a proprietary module is loaded. This is expected, since the log was collected for a bug in the [`nvidia-graphics-drivers-535`](https://launchpad.net/ubuntu/+source/nvidia-graphics-drivers-535) package, the proprietary NVIDIA driver package for Ubuntu. `O` means that an out-of-tree module is loaded. There are also other flags besides `P` and `O`.
All those flags are defined in [`kernel/panic.c`](https://elixir.bootlin.com/linux/latest/source/kernel/panic.c#L478):

```c
/* Each entry is { c_true, c_false, module }: the letter printed when the
 * taint bit is set, the letter printed when it is clear, and whether the
 * flag is also shown as a per-module taint flag. */
const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
	[ TAINT_PROPRIETARY_MODULE ] = { 'P', 'G', true },
	[ TAINT_FORCED_MODULE ] = { 'F', ' ', true },
	[ TAINT_CPU_OUT_OF_SPEC ] = { 'S', ' ', false },
	[ TAINT_FORCED_RMMOD ] = { 'R', ' ', false },
	[ TAINT_MACHINE_CHECK ] = { 'M', ' ', false },
	[ TAINT_BAD_PAGE ] = { 'B', ' ', false },
	[ TAINT_USER ] = { 'U', ' ', false },
	[ TAINT_DIE ] = { 'D', ' ', false },
	[ TAINT_OVERRIDDEN_ACPI_TABLE ] = { 'A', ' ', false },
	[ TAINT_WARN ] = { 'W', ' ', false },
	[ TAINT_CRAP ] = { 'C', ' ', true },
	[ TAINT_FIRMWARE_WORKAROUND ] = { 'I', ' ', false },
	[ TAINT_OOT_MODULE ] = { 'O', ' ', true },
	[ TAINT_UNSIGNED_MODULE ] = { 'E', ' ', true },
	[ TAINT_SOFTLOCKUP ] = { 'L', ' ', false },
	[ TAINT_LIVEPATCH ] = { 'K', ' ', true },
	[ TAINT_AUX ] = { 'X', ' ', true },
	[ TAINT_RANDSTRUCT ] = { 'T', ' ', true },
	[ TAINT_TEST ] = { 'N', ' ', true },
};
```

Alternatively, the [*Tainted kernels*](https://docs.kernel.org/admin-guide/tainted-kernels.html) page in the kernel documentation also describes those flags.

## References

There is also a tool called `decode_stacktrace.sh` (`scripts/decode_stacktrace.sh` in the Linux kernel source). As its name suggests, it decodes the stack trace in a kernel oops into source files and line numbers. Under the hood it uses `addr2line`. See the following:

1. [*decode_stacktrace: make stack dump output useful again*](https://lwn.net/Articles/592724/)
2. [*Linux Kernel Bug Fixing Mentorship*](https://himadripandya.me/post/634481719919165440/linux-kernel-bug-fixing-mentorship)
3. [*Decoding stack traces in the Linux kernel*](https://www.desmondcheong.com/blog/2021/06/02/decoding-stack-traces-in-the-linux-kernel/)
4. [*Debugging Analysis of Kernel panics and Kernel oopses using System Map*](https://sanjeev1sharma.wordpress.com/tag/debug-kernel-panics/)
5. [*Understanding a Kernel Oops!*](https://www.opensourceforu.com/2011/01/understanding-a-kernel-oops/)

### [Tools and Techniques to Debug an Embedded Linux System](https://youtu.be/4rBrS65P-yM)

{%youtube 4rBrS65P-yM %}