# Linux Kernel Debugging

###### tags: `GNU` `Linux` `Kernel` `Debugging` `Tracing` `C` `C++` `x86` `ASM` `GCC` `GDB` `Operating System`

A comprehensive, assorted collection of Linux user-level/kernel-level debugging tools such as *GDB*, *KGDB*, *Valgrind*, *runtime sanitizers*, *program analysis tools*, etc.

![image](https://hackmd.io/_uploads/BJcOrrNfA.png)

> [!Warning]
> In ++Little Endian (LSB -> MSB)++ byte order, a value such as:
> ```arm!
> (long long)(0x0123456789abcdef)
> ```
> is laid out in actual memory as:
> ```!
> memory low to high: [ef][cd][ab][89][67][45][23][01]
> ```

> [!Note] **Related articles**
> - [Assembly & GNU C - shibarashinu](https://hackmd.io/@shibarashinu/rkIVYJZmA)
> - [Compiler: The Program, the Language, & the Computer Work - shibarashinu](https://hackmd.io/@shibarashinu/SyEHz-JHC)

:::info
:arrow_right: For runtime tracing/profiling tools for debugging, please visit: [System Performance Analysis - shibarashinu](https://hackmd.io/@shibarashinu/r1Ww_zvmR).
:::

## Overview

- **testing**
  - **automated testing:** regression tests, TDD, CI/CD integration, existing unit-test frameworks: Qt QTestLib, Google Test, ...
  - **code coverage tools:** test-coverage analysis of critical code: Linux gcov with the gcc/clang `--coverage` flag, Windows VS tools, ...
- **debugging**
  - **static code analysis:** compiler warnings, clang-tidy, PVS-Studio, tools run regularly in CI/CD, [CodeQL](https://codeql.github.com/), ...
  - **logging systems:** easy-to-use logs that can be turned on/off automatically for developing & debugging on the target units.
  - **assertions:**
    - **runtime assertion:** error-detection helpers during development.
    - **static assertion:** compile-time error-detection helpers.
  - **runtime program analysis**
    - **general non-intrusive analysis tools:** valgrind, helgrind, callgrind, ...
    - **specialized intrusive runtime sanitizers:** compiler sanitizers for memory usage, memory leaks, uninitialized memory, thread race conditions, undefined behavior, ...
  - **runtime tracing**
    - **application tracing tools:** span from low-level shared-library/binary checking tools to high-level tracing frontends: strace, perf, eBPF, ... (available without source code & debugging symbols).
  - **debuggers:** interactive runtime debugging tools: gdb, lldb, rr, ...

[An Overview of Debugging Tools for C and C++ Applications - Qt Developer](https://www.kdab.com/c-cpp-debugging-tools/)

## Logging Systems

### User-Level Logs

- **Systemd/Journal**

  [systemd/Journal - ArchWiki](https://wiki.archlinux.org/title/Systemd/Journal)

  ```clike=
  #include <systemd/sd-journal.h>

  int main() {
    sd_journal_send("MESSAGE=hello from user program",
                    "PRIORITY=%i", LOG_INFO,
                    NULL);
    return 0;
  }
  ```

  Build, run, & view the log messages:
  ```sh
  apt install libsystemd-dev

  # link the library from libsystemd-dev package
  gcc <source-code> \
    $(pkg-config --cflags --libs libsystemd) \
    -o <program>

  ./<program>

  journalctl -xe
  ```

  Helper:
  ```sh
  man sd-journal
  ```

  :::info
  - Python
    ```python=
    from systemd import journal
    # send() prepends "MESSAGE=" to its first argument by itself
    journal.send("hello from user program", PRIORITY=journal.LOG_INFO)
    ```
  - Node.js
    ```javascript=
    const journal = require("systemd-journal");
    journal.log("hello from user program", journal.INFO);
    ```
  :::

### Kernel-Level Logs

- **Printk**

  *Kernel Logging Daemon*

  > **Note:** *Kernel messages are stored in a circular buffer, so older messages are overwritten once the buffer fills up.*

  ```c=
  #include <linux/kernel.h>

  void func(void) {
    printk(KERN_INFO "my_driver: initialized\n");
  }
  ```

  - [/include/linux/kern_levels.h](https://elixir.bootlin.com/linux/v6.11.1/source/include/linux/kern_levels.h)
    ```c=
    #define KERN_SOH     "\001"        /* ASCII Start Of Header */

    #define KERN_EMERG   KERN_SOH "0"  /* system is unusable */
    #define KERN_ALERT   KERN_SOH "1"  /* action must be taken immediately */
    #define KERN_CRIT    KERN_SOH "2"  /* critical conditions */
    #define KERN_ERR     KERN_SOH "3"  /* error conditions */
    #define KERN_WARNING KERN_SOH "4"  /* warning conditions */
    #define KERN_NOTICE  KERN_SOH "5"  /* normal but significant condition */
    #define KERN_INFO    KERN_SOH "6"  /* informational */
    #define KERN_DEBUG   KERN_SOH "7"  /* debug-level messages */

    #define KERN_DEFAULT ""            /* the default kernel loglevel */
    ```

  View the log messages:
  ```sh
  # Display kernel messages via journalctl.
  journalctl -k | grep "keywords"

  # Display kernel messages via dmesg.
  sudo dmesg -w

  watch -n 1 "dmesg | grep 'keywords'"
  ```

## User-Level Debugging

### Info of Programs / Shared Libraries

#### Basic Information

```sh
file <program>
```
```sh
size <program>
```
```sh
# list shared libraries
ldd <program>
```

#### Inspect ELF Files

```sh
# ELF's file header info
readelf -hW <program>

# ELF's program (segment) headers info
readelf -lW <program>

# ELF's section headers info
readelf -SW <program>
# Address: runtime virtual memory address (VMA)
# Off: section offset in the binary file
# Size: section size

objdump -hw <program>

# All symbols (of all sections)
readelf -sW <program> | c++filt # c++filt: demangle C++ symbol names

nm -n <program> # <program.debug>
# R/r: readonly symbols
# U: undefined (dynamic linked) symbols
# u: unique global symbols (GNU extension)
# T/t: text code symbols
# B/b: bss uninitialized symbols
# D/d: data initialized symbols
# W/w: weak symbols

# All shared library relocation symbols (of all sections)
readelf -rW <program>

objdump -R <program>
```
```sh
# disassemble all functions in all text sections
objdump -dM intel <program>

# dump the full contents of all non-empty sections
objdump -s <program>
```

> Alternatively, if we have the source code, we can just compile it into assembly & inspect it directly: e.g., `g++ <source-code> -S`, with name demangling via `c++filt < <input> > <output>`.

> [!Note] **Name Demangling**
> Transforming C++ ABI identifiers (like RTTI symbols) back into the original C++ source identifiers.
> - [Demangling - The GNU C++ Library - GNU GCC](https://gcc.gnu.org/onlinedocs/libstdc++/manual/ext_demangling.html)
> - [C/C++ Mangle, Demangle, ABI, RTTI, & Debugging - 巴黎河畔](https://www.cnblogs.com/robinex/p/7892795.html)
> - [In C++ what is name mangling? - Nathan Baggs](https://www.youtube.com/watch?v=Wvj5w-b6h1Q)

:::info
**ELF Headers**

[Acronyms relevant to Executable and Linkable Format (ELF) - buffalo.edu](https://web.archive.org/web/20190428202733/https://www.cs.stevens.edu/~jschauma/631/elf.html)

```C=
// abridged & simplified; see <elf.h> for the exact field names, types, & order
typedef struct {
  uchar_t ident[16]; // elf info
  HWORD type;        // Executable (no PIE/ASLR), Shared Object (so, exe with PIE), core_dump ...
  HWORD ehsize;      // struct size of this elf_header
  Addr entry;        // entry of virtual address for starting the process, or shared library's constructor
  Offset phoff;      // file offset of program (segment) header table
  Offset shoff;      // file offset of section header table
  HWORD phentsize;   // struct size of program (segment) header
  HWORD phnum;       // number of program (segment) headers
  HWORD shentsize;   // struct size of section headers
  HWORD shnum;       // number of section headers
  HWORD shstrndx;    // section header index of the section-name string table
} Elf32_Ehdr;        // at file's offset: 0
```

Intro:
- [In-depth: ELF - The Extensible & Linkable Format - stacksmashing](https://www.youtube.com/watch?v=nC1U1LJQL8o)
- [Differentiate an ELF executable from a shared library - serializethoughts](https://serializethoughts.com/2019/06/29/elf-pic-pie)
:::

:::warning
**ELF ABI Debugging Tools**

*Library to analyze and compare ELF ABIs.*

`libabigail` includes tools to analyze the debugging data of each binary, to infer fine-grained ABI data on both sides of an interface: what a caller uses, and what the callee provides.

Another tool compares two alternative versions of the same binary or library, to answer whether they are substitutable, i.e., whether they provide the same interface.

> **Note:** `libabigail` compatibility checks currently focus on ++function presence++, and ++type compatibility++ related to ==the layout of complex variables in memory==, which are the most common sources of incompatibility that occur with evolving software. (Other aspects of ABI compatibility are also being investigated.)
[Application binary interface compatibility testing with libabigail](https://developers.redhat.com/articles/2024/05/20/application-binary-interface-compatibility-testing-libabigail#)

[libabigail - Ubuntu Manuals](https://manpages.ubuntu.com/manpages/bionic/man7/libabigail.7.html)
:::

### Process Info

*Pseudo-File-System Process Monitor: [proc(5)](https://man7.org/linux/man-pages/man5/proc.5.html)*

![image](https://hackmd.io/_uploads/HJCxnU_-C.png =500x)

### GDB

[[Book] Debugging with GDB: the GNU Source-Level Debugger - sourceware.org](https://sourceware.org/gdb/current/onlinedocs/gdb.html/)

:::warning
**Debugging Symbols x Source Code Mapping**

To enable debugging & inspecting variables & functions, both executables & shared libraries need debugging symbols, either embedded in the ELF's debug sections or stored in detached `.debug` files.

1. **Debugging Symbols**
   - embedded within the same executable/shared library:
     ```sh!
     # for the main executable
     gcc <src-code> -Og -g \
       -L. -l<linked-so-lib> ... \
       -o <program>
     ```
     ```sh!
     # for shared libraries (no -c: compile & link into the .so in one step)
     gcc <src-code> -Og -g \
       -fPIC \
       -shared \
       -o <so-lib>.so
     ```
     > Compiling with the `-g` flag stores all debugging info (additional description of how the source code text relates to the binary code) & the source code path in ELF sections that follow the *DWARF* format.
   - stored in a detached `.debug` file (with the binary itself stripped):
     ```sh!
     gcc <src-code> -Og -g -o <program>

     # separate the debug info & store it in a different file
     objcopy --only-keep-debug <program> <program>.debug
     strip --strip-debug --strip-unneeded <program>
     objcopy --add-gnu-debuglink=<program>.debug <program>

     # (optional) add .gdb_index section to the debug file to speed up GDB's debug symbol indexing
     gdb-add-index <program>.debug
     ```
     then GDB will auto-load the detached `.debug` file listed in the executable's `.gnu_debuglink` section; alternatively, set a `.debug` file root directory for GDB to search:
     ```sh!
     (gdb) set debug-file-directory <debuginfo-root-path>
     (gdb) show debug-file-directory # default: /usr/lib/debug
     ```
     or manually load the `.debug` file:
     ```sh!
     # for the main executable
     (gdb) symbol-file <program>.debug
     ```
     ```sh!
     # for shared libraries
     (gdb) add-symbol-file <so-lib>.debug
     ```
     :::info
     **Glibc (detached) debugging info?**
     Install Glibc's `.debug` files into the system default debug symbol root directory `/usr/lib/debug`:
     ```sh!
     apt install libc6-dbg
     ```
     :::
2. **Source Code Mapping**
   - the source code path is stored in the executable's *DWARF* debug info section.
     If the source code can't be found at the recorded path, manually add a source code path via:
     ```sh!
     (gdb) dir /root/path # can do multiple times to add different directories
     ```
     or change the source root path entirely:
     ```sh!
     (gdb) set substitute-path /old/root/path /replaced/root/path
     ```
     > [!Warning]
     > This tip is especially useful when cross compiling and debugging on a different target system.
     :::info
     **Glibc source code?**
     Download under the current directory:
     ```sh!
     cd <target-path>
     apt source glibc
     ```
     > [!Warning]
     > Glibc debugging info uses ==relative paths== to locate the source files (e.g., `./libio/ioputs.c`).
     >
     > So the relative root path should be pointed to the root of the glibc source:
     > ```sh!
     > (gdb) set substitute-path . < # glibc debug info uses relative path
     > ```
     :::

Finally, use `(gdb) info line`, `(gdb) info sharedlibrary`, `(gdb) info source` to inspect.
:::

> [!Note] **GDB Config & Plugins**
> - `~/.gdbinit`
>   ```sh=
>   set disassembly-flavor intel
>   set print pretty on
>   ...
>   ```
> - [gdb-dashboard](https://github.com/cyrus-and/gdb-dashboard)
>   - source highlight plugin:
>     :::warning
>     **[Pygments](https://pygments.org/docs/): Python syntax highlighter**
>
>     install on ubuntu/debian systems:
>     ```sh!
>     apt install python3-pygments
>     ```
>     or
>     ```sh!
>     pip install Pygments
>     ```
>     or
>     ```sh!
>     python3 -m pip install --user Pygments
>     ```
>     :::
> - [gdb-peda](https://github.com/longld/peda)
>   the extended `.gdbinit` config for exploit development.

#### Start GDB

```sh!
# load the designated executable & its debugging symbols
gdb <program>
(gdb) file <program> # same

# load the designated executable only
(gdb) target exec <program>
```
```sh!
# provide the core dump (RAM image)
(gdb) target core <core>
```
```sh!
# debug on the designated process
gdb -p <pid>
(gdb) attach <pid> # same
```
```sh!
# mix
gdb <program> <core>
gdb <program> <pid>
```
```sh!
# redirect the program's stdin/stdout to another terminal (/dev/pts/*)
gdb -tty <pts>
(gdb) tty <pts> # same
```
```sh!
# remote debugging
sudo apt install gdbserver

# server: gdbserver runs the program
gdbserver :1234 <program> # [<host>]:<port> (e.g., the server listens on TCP port 1234)

# client: gdb frontend connects to the server
gdb <program> \
  -ex "target remote :1234"
(gdb) target remote :1234 # same
```

> [!Note] **GDB Remote Debugging**
> ![image](https://hackmd.io/_uploads/rkju_Stnxg.png)
>
> (Source: [Debugging ARM programs inside QEMU - Balau](https://balau82.wordpress.com/2010/08/17/debugging-arm-programs-inside-qemu/))

#### Basic Usage

[Controlling GDB - Debugging with GDB](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Controlling-GDB.html)

```sh!
# Turn on native TUI
(gdb) layout next

# Or using dashboard UI plugin
(gdb) help dashboard

# Run/Restart the entire program
(gdb) [r]un
(gdb) starti # run & break at the 1st insn

# List out the source code
(gdb) [l]ist <func>

# Switch to the different function frame on the stack
(gdb) bt
      [f]rame <id>
(gdb) up # go to caller frame
(gdb) down # go to callee frame

# Control
(gdb) [n]ext
(gdb) [n]ext[i]
(gdb) [s]tep
(gdb) [s]tep[i]
(gdb) [c]ontinue
      [c]ontinue -a
      [c]ontinue & # run in background, allowing further gdb commands
(gdb) interrupt # stop process/thread ("continue &" only)
      interrupt -a
(gdb) until # end loops
(gdb) finish # end frame
(gdb) return <val> # return frame (prematurely)

# Reversed control (single-thread works only)
(gdb) record # init instruction-recording target
(gdb) rn # reverse-next
(gdb) rsi # reverse-stepi

# Shell functions
(gdb) shell <commands>
(gdb) !<commands>
```

> [!Tip] **GDB Record & Reverse Debugging**
> - [GDB and Reverse Debugging - sourceware.org](https://www.sourceware.org/gdb/news/reversible.html)
> - [rr: lightweight recording & deterministic debugging](https://rr-project.org/): records an execution once & replays it deterministically, so the entire program can be debugged from start to crash.

#### Examining & Manipulating Program

```sh!
# Breakpoints (code)
(gdb) [b]reak main
              Web::HTML::HTMLParser::run
              *0x1234 if i > 10
              main.cpp:12
(gdb) [tb]reak ... # one-time (temporary) breakpoint
(gdb) [hb]reak ... # hardware-support breakpoint
(gdb) [thb]reak ... # one-time hardware breakpoint

# Watchpoints (data)
(gdb) watch <var> # write watcher
            *<addr>
(gdb) rwatch <var> # read watcher
             *<addr>
(gdb) awatch <var> # read/write watcher
             *<addr>
             i # monitor if i's value changed
             *(int *) 0x1000 # monitor if [0x1000]~[0x1003] changed

# Catchpoints (interrupt)
(gdb) catch signal <signal> # e.g., SIGINT, SIGKILL, SIGILL, ...
            signal # all standard signals
            # break before/after the syscall
            syscall <syscall> # e.g., mmap, ioctl, ...
            syscall # all syscalls
            # break at the process' events
            <event> # e.g., (v)fork, C++ throw/catch, ...

# Fine-tune how the signals shall perform
(gdb) handle <signal> nostop noprint pass
                      ignore # ignore: nopass to program

(gdb) info breakpoints
      [d]elete <id>
      disable <id>
      enable <id>
```
```sh!
# Show variable type
(gdb) whatis <var>

# Print variable's type
(gdb) [pt]ype <var> # like decltype(<var>)

# Print variable's value
(gdb) p/t <var> # in bin
      p/x <var> # in hex
(gdb) p <var> # in dec
        'a.c'::<static-var>
        <expression>
        *(char *)($esp + $eax + arr[13])
        (int) $xmm0.v4_float
        $fs_base # thread-local storage base (e.g., fs:[0x0])
        $cr3 # kernel used only (page table base)

# Set variable's values
(gdb) p *arr = b
        $pc = main+5 # $pc == $rip
        $pc = $2 # $2: the 2nd entry of the value history (print results)
        $var = (struct A) {} # GDB self-defined vars
        $ptr = (Elf64_Dyn*) <addr> # reinterpret a pointer (e.g., p ptr[5])
        (int)(add(22, 55)) # call function

# Disassemble the function
(gdb) [disas]semble func
                    $pc

# Examine the memory
(gdb) x/3cb 0x2000 # 3 [c]har [B]YTE (1B)
(gdb) x/3th 0x2000 # 3 binary (base [t]wo) [H]WORD (2B)
(gdb) x/3dw 0x2000 # 3 [d]ec [W]ORD (4B)
(gdb) x/3xg 0x2000 # 3 he[x] [G]IANT (8B)
(gdb) x/5s ((char **) argv)[0] # 5 [s]trings (NUL-terminated: ...'\0')
(gdb) x/i $pc # 1 [i]nstruction
(gdb) x/4i func

# Show info of current execution context
(gdb) info reg # CPU's basic registers
           all-reg # CPU's registers (including FPU, SIMD)
(gdb) info auxv # program's startup auxiliary vector
(gdb) info file # all sections info (program view)
(gdb) info proc map # all segments info (process view)
                stat # current process info (user view)
                status # current process info (os view)
(gdb) info local # current func frame's vars
           arg # current func frame's args
           frame # current func frame's info
(gdb) info func /regex/ # query func symbols
           var /regex/ # query var symbols
           sym <addr> # query func/var symbol at <addr>
```

#### Advanced Usage

```sh!
# Load shared library's debug symbols
(gdb) set auto-solib-add off # disable auto loading *.so debug symbols
(gdb) info sharedlibrary # all shared library info
      sharedlibrary /regex/ # load symbols of the matching libraries (for this run only)
(gdb) set stop-on-solib-events 1 # break on loading shared libraries
```
```sh!
# Run with ASLR (address space layout gets shuffled every run)
(gdb) set disable-randomization off

# on gdbserver
gdbserver --no-disable-randomization ...
```

#### Multi-Process/Thread Debugging

[Stopping and Starting Multi-thread Programs - Debugging with GDB](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Thread-Stops.html)

```sh!
# Non-stop control
set non-stop on # truly non-blocking processes/threads (only break on explicit breakpoints or commands)
```
```sh!
# Multi-process debugging
(gdb) info inferiors # list processes
(gdb) inferior <id> # switch between processes
# or
(gdb) catch fork # stop on every fork (use "catch vfork" for vfork)

# set on fork() behavior
(gdb) set follow-fork-mode child
                           parent
(gdb) set detach-on-fork on # shall let go the other process
                         off
(gdb) set schedule-multiple on # shall resume the other process
                            off
```

> [!Warning] **GDB Can't Hold Parent Process while Halting the Child**
> I.e., if `set follow-fork-mode parent` && `set detach-on-fork off`, must `set schedule-multiple on`.

```sh!
# Multi-thread debugging
(gdb) info threads # list threads
      thread <id> # switch to the thread
(gdb) [b]reak <func> thread <id>
(gdb) [b]reak main.c:33 thread 5 if <bool-expression>

# set multithread tracing behavior
(gdb) set scheduler-locking off # all threads may run on every next/step/continue
                            on # only the current thread runs, even on continue
                            step # only the current thread runs while stepping; others may run on continue
```

> [!Note] **Multi-Thread Debugging in GDB**
> - **Logging Events**
>   When isolating a bug in a multithreaded application, having a log of the event sequences leading up to failure is crucial.
>   :::warning
>   A trace buffer is a simple mechanism for storing this event information.
>   :::
> - **Event Logging Techniques**
>   Bracket events in the trace buffer with "before" and "after" messages to clearly determine the order of events.
>   :::warning
>   It is a good practice to utilize tracepoints to log or record the sequence of events as they occur.
>   :::
> - **Multi-Thread Executions against Debugging**
>   Be aware that running the application in a debugger may alter runtime timing conditions, potentially masking race conditions.
>   :::warning
>   **Multi-Thread Debugging Issues**
>
>   *GDB may interfere with system calls.*
>
>   [Interrupted System Calls - Debugging with GDB](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Interrupted-System-Calls.html)
>
>   1. **Debugging influences certain syscalls**
>      In multi-thread debugging, syscalls may return prematurely because GDB uses signals to handle breakpoints.
>
>      To mitigate this, developers should check the return values of all system calls and handle those cases.
>
>      For example, if a thread is in `sleep` and gets signalled because another thread hit a breakpoint, we should record the remaining `sleep` time & "catch up on sleep" in a loop:
>      ```C=
>      unsigned int remaining = 30; // sleep() returns the unslept seconds
>      do {
>        remaining = sleep(remaining);
>      } while (remaining > 0);
>      ```
>   2. **Threads might not be *lockstep* while Debugging**
>      Threads might not interleave the way they do in a normal run; use `set scheduler-locking` to fine-tune the other threads' behavior.
>   :::
> For more advanced debugging techniques, consider proprietary debugging tools (e.g., `Intel system debugger`, `Intel thread checker`, `Intel thread profiler`, ...).
>
> [Multithreaded Debugging Techniques - Dr. Dobb's](https://drdobbs.com/cpp/multithreaded-debugging-techniques/199200938?pgno=6)

#### Other Usage

```sh!
# Run arbitrary instructions on heap
(gdb) p !!($cbase = (void*) calloc(100, sizeof(void*))) & !!($cptr = $cbase)
(gdb) p !!($page_size = sysconf(_SC_PAGESIZE)) & !!($cbase_page = ((uintptr_t) $cbase & ~($page_size - 1))) & !!((int)(mprotect($cbase_page, $page_size, 0x7)))
# alternatives:
# - sysconf(_SC_PAGESIZE) vs. getpagesize()
# - mprotect(,,0x7) vs. mprotect(,,PROT_READ|PROT_WRITE|PROT_EXEC)
# - (int)mprotect(...) vs. (void*)mmap(...)
(gdb) info proc map
(gdb) p *((<insn_size_type>*)$cptr)++ = <insn>
# e.g., x86 insn: mov eax,0x0
# p *((char[5]*)$cptr)++ = {0xb8}
...
(gdb) x/20i $cbase
(gdb) p $pc = $cbase
(gdb) ni
...
```

#### Frame Filter

Frame filters are Python-based utilities to manage and decorate the output of frames. A filter takes the backtrace's frames as input and may return a modified presentation of them.

- [Management of Frame Filters - GDB Docs - sourceware.org](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Frame-Filter-Management.html)
- [Writing a GDB Frame Filter - Min-Yih Hsu - Medium](https://medium.com/@mshockwave/writing-a-gdb-frame-filter-43bef88c9a53)

#### Core Dump

*Memory Snapshot at the Crash*

```sh
ulimit -c unlimited

# change the default name of the core dump file
sysctl -w kernel.core_pattern=./my_core_dump # original: core
```
```sh
gcc <src-code> -g -Og -o <program> # -g: with debugging info

gdb <program> <core>
(gdb) bt
(gdb) frame <frame-id>
(gdb) ...
```

[Debugging TMUX Crashes - TMUX - GitHub](https://github.com/tmux/tmux/wiki/FAQ#tmux-exited-with-server-exited-unexpectedly-or-lost-server-what-does-this-mean)

> **Why Called Core?**
> In older references, *core* meant *memory* (magnetic-core memory, 1950~1970, before the birth of semiconductor memory), so "dumping core" actually means writing the current state of memory into a file.
>
> [How to get a core dump for a segfault on Linux - Julia Evans](https://jvns.ca/blog/2018/04/28/debugging-a-segfault-on-linux/)

### Valgrind

*General Non-Intrusive Program Analysis Tools*

```sh
valgrind <program> # -q: quiet, -v: verbose
```

> [!Warning] **Limitation**
> Valgrind is unable to detect all cases of bounds errors in the use of static or stack-allocated data.

:::warning
For those cases, it is better to use *language-/system-specific sanitizers*. See the following topic: [Compiler Sanitizers](#Compiler-Sanitizers).
:::

:::info
:arrow_right: For more detailed info about Valgrind debugging tech, please visit: [Valgrind - Compiler: The Program, the Language, & the Computer Work - shibarashinu](https://hackmd.io/@shibarashinu/SyEHz-JHC#Valgrind).
:::

### Callgrind

*Call-Graph Generating Cache & Branch Prediction Profiler (integrated inside Valgrind).*

- [Callgrind - Valgrind User Manual](https://valgrind.org/docs/manual/cl-manual.html)
- [性能优化之 vallgrind 之 callgrind 分析瓶颈 - 懒人李冰](https://lazybing.github.io/blog/2019/04/15/profiler/)

Start tracing:
```sh
valgrind --tool=callgrind <program>
```

Inspect the running program (event counters & backtrace) while tracing:
```sh
(watch) callgrind_control -e -b <target> # target: <pid> or <program-name>
```

View the profiling report:
```sh
callgrind_annotate <output-file> # output-file: callgrind.out.<pid>
```

### Compiler Sanitizers

*Intrusive Native Specialized Runtime Program Analysis Tools*

- **AddressSanitizer (ASan):** memory errors.
  - heap/stack/global buffer overflows.
    > E.g., `auto c = new int[10]; int x = c[11];` (overflow), `auto c = new int[10]; int x = c[-1];` (underflow).
  - use-after-free.
    > E.g., `delete ptr; *ptr;` (dereferencing a dangling pointer).
  - mismatched new-delete, ...
    > E.g., `~VirtualBase() = default;` vs. `virtual ~VirtualBase() = default;` (runtime polymorphism's base not calling derived destructors).

  ```sh!
  clang++ <source-code> \
    -fsanitize=address \
    -o <program>

  export ASAN_OPTIONS=new_delete_type_mismatch=0 # disable the new-delete type mismatch check
  ./<program>
  ```

  :::info
  **Address Sanitizer Example in TCMalloc**
  The GWP-ASan implementation in TCMalloc turns a sampled fraction of `malloc` calls into page-based allocations surrounded by guard pages & makes the matching `free` mark the page inaccessible.

  It is able to catch a bug such as the following:
  ```Cpp=
  #include <iostream>
  #include <string>
  #include <string_view>

  int main() {
    std::string s = "Hello ";
    std::string_view sv = s + "World\n"; // temporary std::string, destroyed at the end of this statement
    std::cout << sv;                     // use-after-free
  }
  ```
  [2019 LLVM Developers’ Meeting: M. Morehouse “GWP-ASan: Zero-Cost Detection of MEmory Safety...” - LLVM](https://www.youtube.com/watch?v=RQGWMLkwrKc)
  :::
- **LeakSanitizer (LSan):** memory leaks.
  > E.g., `auto p = malloc(10); p = nullptr;` (lost heap reference).

  :::info
  Can be used with/without ASan.
  :::
  ```sh!
  clang++ <source-code> \
    -fsanitize=leak \
    -o <program>

  ./<program>
  ```
  Or with ASan:
  ```sh!
  clang++ <source-code> \
    -fsanitize=address \
    -o <program>

  export ASAN_OPTIONS=detect_leaks=1 # enable leak sanitizer
  ./<program>
  ```
- **MemorySanitizer (MSan):** use-before-init.
  > E.g., `new int[10]` (not `new int[10]()`) then read `arr[i]` value.

  [MemorySanitizer - Clang Docs](https://clang.llvm.org/docs/MemorySanitizer.html)
  ```sh!
  clang++ <source-code> \
    -fsanitize=memory \
    -o <program>

  ./<program>
  ```
  :::warning
  **Debug with Sanitizer Library**
  ```C++=
  #include <sanitizer/msan_interface.h>

  // Dump shadow for a memory range. Shadow bit of 0 corresponds to initialized memory, 1 - to uninitialized memory.
  __msan_print_shadow(ptr, size);

  // Make memory range fully initialized. Does not change actual memory contents, but only MemorySanitizer perception of them.
  __msan_unpoison(ptr, size);

  // Source code ...
  ```
  Embed these functions in the source code, or call them in gdb:
  ```sh!
  (gdb) b __msan_warning
  (gdb) b __msan_warning_noreturn
  ```
  ```sh!
  (gdb) p $p = &obj
  (gdb) p $s = sizeof(obj)
  (gdb) call __msan_print_shadow($p, $s)
  ```
  [MemorySanitizer - Google - GitHub](https://github.com/google/sanitizers/wiki/MemorySanitizer)

  [MemorySanitizer (MSan) - Chromium Docs](https://www.chromium.org/developers/testing/memorysanitizer/)
  :::
- **UndefinedBehaviorSanitizer (UBSan):** undefined behavior.
  > E.g., `int *p = nullptr; int i = *p;`.

  ```sh!
  clang++ <source-code> \
    -fsanitize=undefined \
    -o <program>

  ./<program>
  ```
- **ThreadSanitizer (TSan):** race conditions.
  > E.g., `void thread_func() { static int i = 0; ++i; }` (need exclusive data access).

  ```sh!
  clang++ <source-code> \
    -fsanitize=thread \
    -o <program>

  ./<program>
  ```

### Radare2

*A reverse-engineering investigation tool*

> Can inspect ROP attacks.

[radare2 - radareorg - GitHub](https://github.com/radareorg/radare2)

```sh
radare2 -d <program> # -d: debugging
```

#### Basic Usage

```sh
help # hint of general commands
? # hint of advanced commands

aa, aaa, ... # analyze the whole program
af # analyze the function
afl # list all accessible functions
pdf # disassemble the functions

s <addr> # seek to a different address

s # step into the disassembly function (visual mode; "ds" on the command line)
S # step over the disassembly function (visual mode; "dso" on the command line)
dsf # step out (debug step until frame exit)

db <addr> # set a break point
dc # start the program (debug continue)

/ # search for string patterns
/x # search for hexadecimal patterns

wa # write assembly
```

![image](https://hackmd.io/_uploads/B1PNMvsekg.png)

#### In UI View's Commands

```sh
v # change UI view
ENTER # show/hide the side panel
ARROW_KEYS # move disassembly function's window
CLICK <regs> # head to the address storing in that register
: <cmd> # enter commands
```

![image](https://hackmd.io/_uploads/rJwpEuixJg.png)

### Ghidra

*A software reverse engineering (SRE) framework.*

[ghidra - NationalSecurityAgency - GitHub](https://github.com/NationalSecurityAgency/ghidra/)

### Binwalk

*Digs into binary images & automatically analyzes useful information: embedded files, text, information entropy, ...*
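
The "dig into & auto analyze" idea can be sketched as a signature scan: walk the raw image & report every offset where a known magic number appears. This is only a minimal illustration of the technique, not Binwalk's actual implementation; the two-entry table `SIGS` and the helper names `find_signature` & `scan_image` are hypothetical, made up for this sketch (only the ELF & gzip magic bytes themselves are real).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, tiny signature table (illustration only):
 * real tools ship far larger signature databases. */
struct magic_sig {
  const char *name;
  const unsigned char *bytes;
  size_t len;
};

static const unsigned char ELF_MAGIC[] = {0x7f, 'E', 'L', 'F'};
static const unsigned char GZIP_MAGIC[] = {0x1f, 0x8b, 0x08};

static const struct magic_sig SIGS[] = {
  {"ELF executable/object", ELF_MAGIC, sizeof ELF_MAGIC},
  {"gzip (deflate) stream", GZIP_MAGIC, sizeof GZIP_MAGIC},
};

/* Offset of the first match of `sig` at or after `from`, or -1 if none. */
long find_signature(const unsigned char *buf, size_t len, size_t from,
                    const struct magic_sig *sig) {
  for (size_t off = from; off + sig->len <= len; ++off)
    if (memcmp(buf + off, sig->bytes, sig->len) == 0)
      return (long)off;
  return -1;
}

/* Print every signature hit in the image & return the number of hits. */
size_t scan_image(const unsigned char *buf, size_t len) {
  size_t hits = 0;
  for (size_t i = 0; i < sizeof SIGS / sizeof SIGS[0]; ++i) {
    long off = find_signature(buf, len, 0, &SIGS[i]);
    while (off >= 0) {
      printf("0x%08lx  %s\n", (unsigned long)off, SIGS[i].name);
      ++hits;
      off = find_signature(buf, len, (size_t)off + 1, &SIGS[i]);
    }
  }
  return hits;
}
```

To try it, read a firmware image into a buffer (e.g., with `fread`) & pass it to `scan_image()`; each printed line is an offset worth inspecting further. Short magics like these can match by coincidence, so hits are candidates rather than confirmed embedded files.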
[binwalk - ReFirmLabs - GitHub](https://github.com/ReFirmLabs/binwalk)

## Kernel-Level Debugging

### Syscalls & Unix Philosophy

Syscalls are the interface through which user programs enter the protected in-kernel subsystems & access system resources.

- **Modern Fast `syscall` Specialized Instruction:** Traps into the kernel almost like calling a normal function, replacing the legacy `int 0x80` software interrupt, which carries enormous overhead: the CPU has to look up & go through the interrupt handler to change mode & execution context from user space to kernel space.
  - :+1: [The Definitive Guide to Linux System Calls - Joe Damato](https://blog.packagecloud.io/the-definitive-guide-to-linux-system-calls/)
- **Unix Philosophy:** Dedicated to producing simple, compact, clear, modular, composable, & extensible code.
  - [Unix system calls (1/2) - Brian Will](https://www.youtube.com/watch?v=xHu7qI1gDPA)
  - [Unix system calls (2/2) - Brian Will](https://www.youtube.com/watch?v=2DrjQBL5FMU)
  - [[Online Book] Basics of the Unix Philosophy](https://cscie2x.dce.harvard.edu/hw/ch01s06.html)
  - [Book] The Design of the UNIX Operating System - Maurice J. Bach: 1980s UNIX System V
  - [Book] Unix Programming Environment - Brian W. Kernighan, Rob Pike
- **Operating System / Supervisor History:** From the earliest implementations of the operating system concept: [Multics](https://www.multicians.org/history.html) ([MIT](https://web.mit.edu/) / [Bell Tele Labs (BTL)](https://www.bell-labs.com/) / [GE](https://www.ge.com/), 1964~, time-sharing, :+1: [influence on UNIX](https://en.wikipedia.org/wiki/Multics#Novel_ideas)), [OS/360](https://en.wikipedia.org/wiki/OS/360_and_successors) ([IBM](https://www.ibm.com/history/system-360), 1966~, batch processing), to the modern operating systems (UNIX, Linux, [Plan 9](https://www.bell-labs.com/institute/blog/plan-9-bell-labs-cyberspace/)), & PC operating systems ([CP/M](https://en.wikipedia.org/wiki/CP/M), DOS, Windows).
  - [Compatible Time-Sharing System (CTSS) - Wiki](https://en.wikipedia.org/wiki/Compatible_Time-Sharing_System)
  - [The UNIX Operating System: Making Computers More Productive - AT&T Tech Channel](https://www.youtube.com/watch?v=tc4ROCJYbm0)
    > **UNIX: A User-Friendly Out-of-the-Box Operating System**
    > E.g., hierarchical file systems & advanced management, shell / pipeline / unified shell interpreter interface / shell scripts, every resource / device is a file.
  - [Linux Protection Rings - DJ Ware](https://www.youtube.com/watch?v=BGA019kHhwU)
    > **Back in the 60s did OS/360 have the protection ring?**
    > Yes, but actually no. You simply set a bit that made your application look like it was part of the operating system, and you could take over the entire machine. 🤣

  :::info
  **Daemons vs. Demons:**
  - **Daemons:** System services running as user-space processes.
  - **Demons:** [Kernel drivers & kernel modules](https://www.baeldung.com/linux/kernel-drivers-modules-difference) running as in-kernel kthreads.
  :::

  :::warning
  **Nowadays Rings 0 to 3 Alone Seem Not Enough:**
  > ![image](https://hackmd.io/_uploads/SyuI7iBLkx.png =300x)
  >
  > (Source: [Authorization - xkcd](https://xkcd.com/1200/))

  For example:
  - [CVE-2022-0185 - NVD](https://nvd.nist.gov/vuln/detail/cve-2022-0185): A heap-based buffer overflow flaw was found in the way the `legacy_parse_param` function in the Filesystem Context functionality of the Linux kernel verified the supplied parameters length. ++An unprivileged (in case of unprivileged user namespaces enabled, otherwise needs namespaced `CAP_SYS_ADMIN` privilege) local user able to open a filesystem that does not support the Filesystem Context API (and thus fallbacks to legacy handling) could use this flaw to escalate their privileges on the system++.
  :::
- SysVinit / UpStart / Systemd
  - [init演化歷程 – [轉貼] 淺析 Linux 初始化 init 系統,第 1 部分: sysvinit](http://felix-lin.com/linux/init演化歷程-轉貼-淺析-linux-初始化-init-系統,第-1-部分-sysvinit/)
  - `/etc/systemd/system/ssh-agent.service`
    ```sh
    [Unit]
    Description=SSH Key Agent
    After=network.target

    [Service]
    Type=simple
    Environment=SSH_AUTH_SOCK=%t/ssh-agent.socket
    ExecStart=/usr/bin/ssh-agent -D -a $SSH_AUTH_SOCK

    [Install]
    WantedBy=default.target
    ```
- [Unix vs Linux - Gary Explains](https://www.youtube.com/watch?v=jowCUo_UGts)
- [UNIX、BSD 與 Linux 的愛恨情仇 - 陳毅](https://hackmd.io/@czPKboGUQZi6-txq9HcDqw/rJcbA-gWu)

#### Syscall Practices

- **Tracing a Write Syscall:** [Syscalls, Kernel vs. User Mode and Linux Kernel Source Code - bin 0x09 - LiveOverflow](https://www.youtube.com/watch?v=fLS99zJDHOc)
- **Custom Syscall with Custom *Kernel Image* + *initramfs* Setup**: [Adding Simple System Call in Linux Kernel - Nir Lichtman](https://www.youtube.com/watch?v=Kn6D7sH7Fts)
- **Custom Environment Setup for Child Processes:**
  ```c=
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/stat.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(int argc, char *argv[]) {
    if (argc < 2) {
      fprintf(stderr, "Usage: %s <program>\n", argv[0]);
      exit(1);
    }

    pid_t pid = fork();
    if (pid > 0) { // parent process
      int status;
      // wait for child process end
      pid_t waited_pid = wait(&status);
      if (waited_pid < 0) {
        perror("Wait failed");
        exit(1);
      }
    } else if (pid == 0) { // child process
      // replace STDOUT with the custom output file
      close(STDOUT_FILENO); // fd: 1
      // this will take the smallest available fd (i.e. 1)
      if (open("./output.txt",
               // If pathname does not exist, create one
               O_CREAT
               // O_RDONLY, O_WRONLY, O_RDWR
               | O_WRONLY,
               // gnu.org/software/libc/manual/html_node/Permission-Bits.html
               S_IRWXU) < 0) {
        perror("Open failed");
        exit(1);
      }

      char *const child_argv[] = { argv[1], NULL };
      execvp(child_argv[0], child_argv);
      // only reached if execvp failed
      // same as: fprintf(stderr, "Execution failed: %s\n", strerror(errno));
      perror("Execution failed");
      exit(1);
    } else {
      perror("Fork failed");
      exit(1);
    }
    return 0;
  }
  ```
- **My Shell Process:**
  ```c=
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/types.h>
  #include <sys/wait.h>
  #include <unistd.h>

  int main(void) {
    for (;;) {
      printf("[shell]: ");
      fflush(stdout);

      char *input_str = NULL;
      size_t size = 0;
      ssize_t input_len = getline(&input_str, &size, stdin);
      if (input_len < 0) { // EOF or read error: leave the shell
        free(input_str);
        break;
      }

      // replace the trailing '\n' with '\0'
      if (input_len > 0 && input_str[input_len - 1] == '\n')
        input_str[--input_len] = '\0';

      if (input_len > 0) {
        // no argument splitting: the whole line is the program name
        char *const child_argv[] = { input_str, NULL };

        pid_t pid = fork();
        if (pid > 0) { // parent process
          int status;
          // wait for child process end
          if (wait(&status) < 0) {
            perror("Wait failed");
            exit(1);
          }
        } else if (pid == 0) { // child process
          execvp(child_argv[0], child_argv);
          // only reached if execvp failed
          perror("Execution failed");
          exit(1);
        } else {
          perror("Fork failed");
          exit(1);
        }
      }
      free(input_str);
    }
    return 0;
  }
  ```

### System.map

A symbol table that makes it possible to look up symbols by name in kernel built-ins (`CONFIG_DEBUG_INFO` build option required) & in kernel modules (`CONFIG_KALLSYMS` build option required).

[System.map - Wiki](https://en.wikipedia.org/wiki/System.map)

:::info
**Use Cases:**
When a function pointer shows up while debugging, whether in `crashdump` files or in runtime print info, and it isn’t clear which function is behind the reference, dump out the content of that pointer, then match the address against the *System.map* file.
:::

### Debugging with Kernel Core Dump

```sh!
# make sure that vmlinux matches the currently running kernel image
gdb vmlinux /proc/kcore
```

### Enable Kernel Debugging

To enable GDB debugging on the kernel, make sure of the following:

- **The kernel should be built (compiled & installed) with additional debugging options in its configuration:**

  Resources:
  - [Using kgdb, kdb and the kernel debugger internals - Linux Kernel Docs](https://www.kernel.org/doc/html/v6.3/dev-tools/kgdb.html)
  - [How Linux Kernel Runs Executables - Nir Lichtman](https://www.youtube.com/watch?v=ZlZDWeVL2LI)

  (Optional) Build the kernel for a specific architecture & toolchain.
  ```sh
  export ARCH=arm64
  export CROSS_COMPILE=aarch64-none-elf-

  # generate default .config
  make defconfig
  ```
  ```sh
  # make with customized config
  make menuconfig
  ```
  Through the menu UI, we can enable `CONFIG_KGDB`, `CONFIG_KGDB_KDB`, `CONFIG_KGDB_SERIAL_CONSOLE`, `CONFIG_MAGIC_SYSRQ`, debugging information, ... & disable `KASLR` (KGDB assumes kernel code at a fixed address), virtualization, loadable kernel modules, networking, ...

  Or simply write them into the `.config` file & build with it.
  ```config=
  # Enable in-kernel GDB stubs (for cross-machine debugging)
  CONFIG_KGDB=y

  # Build the kernel (vmlinux file) with debugging symbols (the symbol source for GDB & emulators)
  CONFIG_DEBUG_INFO=y
  ...
  ```
  Build the custom Linux kernel.
  ```sh
  make -j <CPU-core-number> # build with parallel jobs
  ```
  (Optional) Install kernel modules for the target kernel image.
  ```sh
  make INSTALL_MOD_PATH=/path/to/install modules_install
  ```
  Generate GDB-related scripts & config files that facilitate debugging the kernel using GDB (requires `CONFIG_GDB_SCRIPTS`).
  ```sh
  make scripts_gdb
  ```

  :::info
  **vmlinux vs. vmlinuz**
  - `vmlinux`: uncompressed kernel image used for development and debugging.
  - `vmlinuz`: compressed version used for booting.
  :::

- **[Cross-Machine Debugging] Configuring proper settings at runtime:**

  Resources:
  - [How to Debug your Linux Kernel - ByteSnap](https://www.bytesnap.com/news-blog/how-to-debug-your-linux-kernel/)

  Initialize *KGDB* at boot time. For example, in the *GRUB* bootloader configuration of the target kernel being debugged, specify:
  ```sh
  # Connect to serial port ttyS0 at a 115200 baud rate;
  # "kgdbwait" then pauses execution during boot
  # & hands the booting procedure over to the KDB debugger.
  kgdboc=ttyS0,115200 kgdbwait
  ```

  :::info
  **How Does the Kernel Perform Kernel Debugging like a Normal GDB?**

  The target kernel for debugging contains a ++GDB stub++ that talks ==GDB remote protocol across a serial port==. That is, GDB runs on the host machine & communicates with the target hardware's kernel via the KGDB service.

  This permits GDB to single step through the kernel, set breakpoints and trap exceptions that happen in kernel space and interrupt execution. It also permits the NMI (Non-Maskable Interrupt) button or serial port events to jump the kernel into the debugger.

  More:
  - [DEBUGGING FR-V LINUX - Linux Kernel Docs](https://www.kernel.org/doc/Documentation/frv/gdbstub.txt)
  - [Debugging Remote Programs - Debugging with GDB](https://sourceware.org/gdb/current/onlinedocs/gdb.html/Remote-Debugging.html)
  :::

  :::info
  **Applications Using GDB Stubs**

  Other applications like *Valgrind* also include a GDB stub to allow debugging of the target program as it runs in *Valgrind*, with "monitor commands" that allow querying the Valgrind tool for various information.

  Also, Valgrind relies on JIT dynamic recompilation: it translates binary code into a machine-independent, SSA-form IR, then recompiles it to run on host and target (or simulated) CPUs of the same architecture.
  [Valgrind - Wiki](https://en.wikipedia.org/wiki/Valgrind)
  :::

- **[Debugging with Emulator] Using QEMU or KVM-based VMs (with built-in GDB stubs), or JTAG-based hardware interfaces as target:**

  Resources:
  - :+1: [Debugging kernel and modules via gdb - Linux Kernel Docs](https://www.kernel.org/doc/html/v6.3/dev-tools/gdb-kernel-debugging.html)
  - [Debugging the Kernel with QEMU - Linux Kernel Exploitation 0x0 - Keith Makan](https://blog.k3170makan.com/2020/11/linux-kernel-exploitation-0x0-debugging.html)
  - [Linux 核心設計: 開發與測試環境 - Yiwei Lin](https://hackmd.io/@RinHizakura/SJ8GXUPJ6#%E7%B7%A8%E8%AD%AF%E4%B8%A6%E5%BB%BA%E7%BD%AE-kernel-%E6%98%A0%E5%83%8F%E6%AA%94(Linux))
  - [用 gdb debug 在 QEMU 上跑的 Linux Kernel - AusTinTin](https://blog.austint.in/2022/01/16/run-and-debug-linux-kernel-in-qemu-vm.html)
  - [Booting a Custom Linux Kernel in QEMU and Debugging It With GDB - Nick Desaulniers](https://nickdesaulniers.github.io/blog/2018/10/24/booting-a-custom-linux-kernel-in-qemu-and-debugging-it-with-gdb/)
  - [Embedded Linux System Development - Bootlin](https://bootlin.com/doc/training/embedded-linux-beagleplay/embedded-linux-beagleplay-labs.pdf)

  ```sh
  # QEMU with GDB support:
  #   -s: shorthand for "-gdb tcp::1234" (open a gdbserver on TCP port 1234)
  #   -S: do not start the CPU at startup (wait for the debugger to continue)
  qemu-system-x86_64 -kernel vmlinux -s -S
  ```
  ```sh
  # Start GDB with the vmlinux file
  gdb vmlinux

  # In GDB, attach to the booted guest
  (gdb) target remote :1234

  # On a KGDB-enabled target, the SysRq (system request) 'g' command
  # drops the running kernel into the debugger
  echo 'g' > /proc/sysrq-trigger
  ```

  :::warning
  Some distros may restrict auto-loading of gdb scripts to known safe directories. In case gdb refuses to load `vmlinux-gdb.py`, add the following command to `~/.gdbinit`:
  ```
  add-auto-load-safe-path /path/to/linux-build
  ```
  :::

The kernel comes with 2 different debugger frontends (*KGDB*, *KDB*) which interface to the debug core.

#### KGDB

KGDB is intended to be used as a *source level debugger* for the Linux kernel. It is used along with gdb to debug a Linux kernel.
The expectation is that gdb can be used to “break in” to the kernel to inspect memory, variables and look through call stack information similar to the way an application developer would use gdb to debug an application. It is possible to place breakpoints in kernel code and perform some limited execution stepping.

#### KDB

A simple shell-style interactive interface for kernel debugging, used from a system console with a keyboard or from a serial console. It can be used to inspect memory, registers, process lists, dmesg, or even set breakpoints & do further operations or inspections.

:::info
KDB is mainly aimed at doing some analysis to aid in development or diagnosing kernel problems.
:::

The main config option for KDB is `CONFIG_KGDB_KDB`, which is listed as "`KGDB_KDB`: include KDB frontend for KGDB" in the config menu. In theory, you would have already selected an I/O driver such as the `CONFIG_KGDB_SERIAL_CONSOLE` interface when configuring KGDB, if you plan on using KDB on a serial port.

## Hardware-Aided Debugging

Utilizing hardware ports & signals to perform low-level debugging, like *halting CPU*, *manipulating controllers*, *stepping into functions*, *doing I/O operations*, *establishing serial communications*, etc.
### x86 Debug Registers

> breakpoint (intrusively embedded `INT3` interrupt instruction):
> ![image](https://hackmd.io/_uploads/rk9PpHw4gg.png =500x)
>
> step:
> ![image](https://hackmd.io/_uploads/BkIl6SPVlg.png =500x)
>
> mem watch:
> ![image](https://hackmd.io/_uploads/HyYvoHPEle.png =500x)
> (Source: [一个动画搞懂调试器工作原理 - 轩辕的编程宇宙](https://www.youtube.com/watch?v=PFC9Qqcvi0M))

> mem watch debug registers:
> ![image](https://hackmd.io/_uploads/BJAZhBvEle.png =500x)
> (Source: [Debug Registers - Intel 80386 Reference Programmer's Manual - MIT](https://pdos.csail.mit.edu/6.828/2004/readings/i386/s12_02.htm))

### JTAG (Joint Test Action Group)

A unified interface for *testing*, *debugging*, *changing pin values*, or *loading firmware*, reaching a chip's internal logic from the boundary of the chip.

:::info
**Why JTAG?**

GDB remote debugging can communicate over the serial ports, or use JTAG as a medium to translate the GDB commands into target-machine debugging instructions.

> If the communication methods are not complete (e.g., I/O operations `getDebugChar`, `putDebugChar`, `flush_i_cache`, `exceptionHandler`, ... in the target-machine gdb stubs are not implemented) or a more general debugging interface is wanted, JTAG debugging will be a suitable alternative.

[進階 gdb - 用 Open Source 工具開發軟體: 新軟體開發關念](http://www.study-area.org/cyril/opentools/opentools/x1265.html)
:::

[JTAG TAP Controller Tutorial - TechSharpen](https://www.youtube.com/watch?v=PhaqHKyAvR4)

![image](https://hackmd.io/_uploads/BJfVLbaeyl.png =500x)

- **TAP Controller:** A state machine for the test access port (TAP)

  ![image](https://hackmd.io/_uploads/SJmr8W6eyl.png =500x)

- **TDI / TDO:** Test data in / out
- **TCK:** Test clock
- **TRST:** Test reset
- **TMS:** Test mode select

#### Debugging Mode

- **Software Breakpoint:** still monitored & handled by the OS.
- **Hardware Breakpoint:** uses ISA & circuit-level support to enter the debugging state.
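
The two breakpoint kinds listed above can be sketched for x86 as follows. This is a minimal, hedged illustration rather than how any particular debugger or JTAG probe implements them: `struct sw_breakpoint`, `sw_bp_insert`, `sw_bp_remove` & `dr7_local_enable` are hypothetical names introduced here. The software breakpoint is shown as a byte patch with the one-byte `INT3` opcode (`0xCC`), while the hardware breakpoint is shown only as the `DR7` control bits for one of the four slots `DR0`..`DR3`, following the debug register layout referenced in the x86 section above.

```c
#include <assert.h>
#include <stdint.h>

#define X86_INT3 0xCCu /* one-byte x86 breakpoint opcode */

/* Hypothetical bookkeeping for one software breakpoint. */
struct sw_breakpoint {
    uint8_t *addr;  /* patched location */
    uint8_t  saved; /* original byte, restored on removal */
};

/* Software breakpoint: overwrite the first instruction byte with INT3. */
void sw_bp_insert(struct sw_breakpoint *bp, uint8_t *addr)
{
    bp->addr = addr;
    bp->saved = *addr;
    *addr = X86_INT3;
}

/* Remove the breakpoint by restoring the saved byte. */
void sw_bp_remove(const struct sw_breakpoint *bp)
{
    *bp->addr = bp->saved;
}

/* DR7 R/W field: the access type that triggers the breakpoint. */
enum dr7_rw {
    DR7_RW_EXEC      = 0, /* instruction execution */
    DR7_RW_WRITE     = 1, /* data writes */
    DR7_RW_IO        = 2, /* I/O reads or writes (needs CR4.DE) */
    DR7_RW_READWRITE = 3  /* data reads or writes */
};

/* DR7 LEN field: size of the watched location. */
enum dr7_len {
    DR7_LEN_1 = 0,
    DR7_LEN_2 = 1,
    DR7_LEN_8 = 2, /* only defined on 64-bit capable CPUs */
    DR7_LEN_4 = 3
};

/*
 * Hardware breakpoint: build the DR7 bits that locally enable slot 0..3.
 * The watched address itself would go into DR0..DR3 (not done here).
 */
uint32_t dr7_local_enable(unsigned slot, enum dr7_rw rw, enum dr7_len len)
{
    assert(slot < 4);

    uint32_t dr7 = 0;
    dr7 |= 1u << (slot * 2);                 /* L0..L3 at bits 0, 2, 4, 6 */
    dr7 |= (uint32_t)rw  << (16 + slot * 4); /* R/W0..R/W3 */
    dr7 |= (uint32_t)len << (18 + slot * 4); /* LEN0..LEN3 */
    return dr7;
}
```

For example, `dr7_local_enable(0, DR7_RW_WRITE, DR7_LEN_4)` evaluates to `0xD0001`, i.e. a 4-byte write watchpoint on the address held in `DR0`. The sketch only computes values: actually loading the debug registers requires ring 0, or a debugger going through the OS (e.g. `ptrace` on Linux).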
#### Applications

- [From getting JTAG on the iPhone 15 to hacking Apple's USB-C Controller - Stacksmashing - DEFCONConference](https://www.youtube.com/watch?v=cFW0sYSo7ZM)

### Platform Debug Trigger Table (PDTT)

A system debugging table defined by the ACPI (Advanced Configuration & Power Interface) spec for capturing system info beyond the normal OS crash dump through hardware-level debug facilities.

:::info
**Example: OS Invoking Multiple Debug Triggers**

When the OS encounters a fatal crash, ==prior== to collecting a crash dump and rebooting the system, the OS may choose to invoke the debug triggers in the order listed in the PDTT.

The addresses of the doorbell register and the PCC general communication space (if needed) are retrieved from the PCCT, depending on the PCC subspace type.

![image](https://hackmd.io/_uploads/rJ1Yaix-1e.png =500x)
:::

Refs:
- [Platform Debug Trigger Table (PDTT) - ACPI Spec](https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/05_ACPI_Software_Programming_Model/ACPI_Software_Programming_Model.html#platform-debug-trigger-table-pdtt)
- [Platform Communications Channel (PCC) - ACPI Spec](https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html)
- [ACPI 體系中的重要名詞 - 初心者之家](https://boy-asmc.blogspot.com/2008/11/acpi.html)

#### Platform Communications Channel (PCC)

The platform communication channel (PCC) is a generic mechanism for OSPM (Operating System-directed configuration and Power Management) to communicate with an entity in the platform (e.g. a platform controller, or a Baseboard Management Controller (BMC)).

#### System Control Interrupt (SCI)

An interrupt mechanism used in ACPI-compliant systems. It enables the handling of power management and configuration events by notifying the OSPM layer when hardware events occur, such as device insertions. SCI interrupts allow multiple ACPI events to share the same interrupt vector, and can streamline event processing in the system.