[TOC]

## Introduction

For Ubuntu, [Kernel crash dump](https://ubuntu.com/server/docs/kernel-crash-dump) describes how to make a machine generate a dump file when it crashes. The instructions don't always work and usually need extra tweaks. Moreover, they only show how to generate the dump file, not how to prepare an environment for reading it. A few more steps are needed before you can read the dump file properly with `crash(8)`.

This article covers both aspects. The first half describes how to set up `kdump` on the crashing machine. The second half describes how to set up an environment for reading the dump file.

## Setup `kdump` on crashed machine

### Step 1: Install `kdump-tools`

This should be straightforward:

```
$ sudo apt install linux-crashdump
```

### (Issue 1) kdump is not enabled by default

Note that the documentation states:

*...Starting with 16.04, the kernel crash dump mechanism is enabled by default.*

In fact, it is not. At least not on my `lima` virtual machine, nor on Ubuntu 22.04 desktop for amd64. It has to be enabled manually. See the next step.

### Step 2: `dpkg-reconfigure`

If this is the first time you install these packages, you'll see a prompt asking whether to hand over reboots to `kexec-tools`:

```
|------------------------| Configuring kexec-tools |------------------------|
|                                                                           |
|                                                                           |
| If you choose this option, a system reboot will trigger a restart into a  |
| kernel loaded by kexec instead of going through the full system boot      |
| loader process.                                                           |
|                                                                           |
| Should kexec-tools handle reboots (sysvinit only)?                        |
|                                                                           |
|                    <Yes>                       <No>                       |
|                                                                           |
|---------------------------------------------------------------------------|
```

and another asking whether to enable `kdump`:

```
|------------------------| Configuring kdump-tools |------------------------|
|                                                                           |
|                                                                           |
| If you choose this option, the kdump-tools mechanism will be enabled. A   |
| reboot is still required in order to enable the crashkernel kernel        |
| parameter.                                                                |
|                                                                           |
| Should kdump-tools be enabled by default?                                 |
|                                                                           |
|                    <Yes>                       <No>                       |
|                                                                           |
|---------------------------------------------------------------------------|
```

To enable `kdump`, both options should be `yes`. You can also change these options after installation with `dpkg-reconfigure`:

```
$ sudo dpkg-reconfigure kexec-tools
```

And:

```
$ sudo dpkg-reconfigure kdump-tools
```

### (Issue 2): `grub` failed to update due to `memtest86+`

Here is one issue you may face. If you are running Ubuntu 22.04 on an x86-64 machine, chances are that `update-grub` fails because `memtest86+` does not support EFI boot:

```
...
Memtest86+ needs a 16-bit boot, that is not available on EFI, exiting
...
```

If that happens, `dpkg-reconfigure` will not take effect, and `kdump` will not be enabled. For background, see the discussion in [Update-grub broken, Memtest86+](https://askubuntu.com/questions/1421573/update-grub-broken-memtest86). You can work around it by:

```
$ sudo chmod -x /etc/grub.d/20_memtest86+
```

Then run the `dpkg-reconfigure` commands again. This should make it work on x86 machines.

### Step 3: reboot

After the reboot, run:

```
$ kdump-config show
```

You should see:

```
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.19.0-45-generic
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.19.0-45-generic
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-5.19.0-45-generic root=UUID=94ffbc88-f087-4bc9-8d8f-dc365280fc1c ro console=tty1 console=ttyS0 reset_devices systemd.unit=kdump-tools-dump.service nr_cpus=1 irqpoll nousb" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz
```

Note the `current state: ready to kdump` line. If `kdump` is not set up properly, it will show `current state: Not ready to kdump` instead.
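If you would rather check this from a script than by eye, the state line can be pulled out of the output. The following is only a minimal sketch: it assumes the `current state:` line looks like the sample output above, and the function names `kdump_state` and `kdump_is_ready` are made up for illustration (they are not part of `kdump-tools`).

```shell
#!/bin/sh
# Print the text that follows "current state:" in `kdump-config show`
# output read from stdin. Assumes the line format from the sample above.
kdump_state() {
    sed -n 's/^current state:[[:space:]]*//p'
}

# Succeed only if the reported state is exactly "ready to kdump".
# "Not ready to kdump" (or a missing state line) makes this fail.
kdump_is_ready() {
    [ "$(kdump_state)" = "ready to kdump" ]
}

# Usage on the machine being set up:
#   kdump-config show | kdump_is_ready && echo "kdump is armed"
```

Reading from stdin keeps the helpers usable on saved output as well, for example when comparing the state before and after a configuration change.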
### (Optional) Set dump file format in `/etc/`

If `kdump` is set up using these steps, the default output is a compressed format called `dumpfile`. `gdb` cannot interpret this format, however. Only `crash(8)` understands it. If you'd like to do a post-mortem with `gdb` (or `drgn`), you'll need to dump in ELF format. This is done by editing the `MAKEDUMP_ARGS` parameter in [`/etc/default/kdump-tools`](https://manpages.debian.org/unstable/kdump-tools/kdump-tools.5.en.html):

```
$ sudo vim /etc/default/kdump-tools
```

The `MAKEDUMP_ARGS` parameter specifies which options are passed to `makedumpfile` when a dump file is generated. By default, it is commented out:

```
# ---------------------------------------------------------------------------
# Makedumpfile options:
# MAKEDUMP_ARGS - extra arguments passed to makedumpfile (8). The default,
#     if unset, is to pass '-c -d 31' telling makedumpfile to use compression
#     and reduce the corefile to in-use kernel pages only.
#MAKEDUMP_ARGS="-c -d 31"
```

Uncomment that option and replace `-c` with `-E`, so that instead of the compressed (`-c`) `dumpfile` format, the dump is stored as an ELF file (`-E`):

```
# ---------------------------------------------------------------------------
# Makedumpfile options:
# MAKEDUMP_ARGS - extra arguments passed to makedumpfile (8). The default,
#     if unset, is to pass '-c -d 31' telling makedumpfile to use compression
#     and reduce the corefile to in-use kernel pages only.
MAKEDUMP_ARGS="-E -d 31"
```

### Step 4: increase reserved memory size

`kdump` reserves a portion of memory for the kdump kernel. Determining how much to reserve is an art of its own. Instead of starting with a low value (as the documentation does), I suggest making it generous at the very beginning, even if it seems too large, and then gradually decreasing it. It's better to make sure it works before optimizing the size.
So edit `/etc/default/grub.d/kdump-tools.cfg` and set the following (as the documentation suggests), which reserves 512M whenever the machine has at least 384M of RAM:

```
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=384M-:512M"
```

And then:

```
$ sudo update-grub
```

The new `crashkernel` value takes effect after the next reboot.

### Step 5: test `kdump`

Crash the kernel with SysRq:

```
$ sudo -s
[sudo] password for ubuntu:
# echo c > /proc/sysrq-trigger
```

After the reboot, if kdump is properly configured, there should be a directory named after the date and time of the crash:

```
$ tree /var/crash/
/var/crash/
├── 202306211421
│   ├── dmesg.202306211421
│   └── dump.202306211421
├── kdump_lock
├── kexec_cmd
└── linux-image-5.19.0-45-generic-202306211421.crash
```

`dmesg.<datetime>` is the `dmesg` log preserved at the time of the crash. `dump.<datetime>` is the dump file, which can be analyzed with `crash(8)` later on.

## Setup `crash` utilities

The steps above are sufficient for generating dump files. However, to *analyze* a dump file, you need:

1. An uncompressed kernel image, i.e. `vmlinux`, with kernel debug symbols. This is called the `NAMELIST` in `crash(8)` terminology.
2. A working `crash(8)` binary.

Usually, 1. can only be generated if certain configs are enabled during kernel compilation. Fortunately, on Ubuntu it can be installed as a separate package. As for 2., although it can be installed with `apt install`, that version doesn't always work (it doesn't on my Ubuntu 22.04), so you will likely have to compile it manually from [source](https://github.com/crash-utility/crash/releases).
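Both prerequisites can be checked up front before starting an analysis session. The sketch below is only an illustration: it assumes the debug kernel is installed as `/lib/debug/boot/vmlinux-<release>` (where the Ubuntu debug symbol package places it), and the helper names `debug_vmlinux_path`, `have_debug_vmlinux` and `have_command` are hypothetical.

```shell
#!/bin/sh
# Print the path where the debug vmlinux for a kernel release is expected.
# $1: kernel release of the crashed kernel, e.g. 5.19.0-45-generic
# $2: directory holding debug kernels (assumed default: /lib/debug/boot)
debug_vmlinux_path() {
    printf '%s/vmlinux-%s\n' "${2:-/lib/debug/boot}" "$1"
}

# Succeed if the debug vmlinux for the given release exists.
have_debug_vmlinux() {
    [ -f "$(debug_vmlinux_path "$1" "$2")" ]
}

# Succeed if a command (e.g. crash) can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Usage:
#   have_debug_vmlinux 5.19.0-45-generic || echo "debug symbols missing"
#   have_command crash || echo "crash not installed"
```

Note that `have_command crash` only tells you a binary exists, not that it can open your dump.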
### Step 1: Install kernel debug symbol

Follow the instructions in [CrashdumpRecipe](https://wiki.ubuntu.com/Kernel/CrashdumpRecipe):

```
$ echo "deb http://ddebs.ubuntu.com $(lsb_release -cs) main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com $(lsb_release -cs)-proposed main restricted universe multiverse" | \
sudo tee -a /etc/apt/sources.list.d/ddebs.list
```

```
$ sudo apt install ubuntu-dbgsym-keyring
```

```
$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F2EDC64DC5AEE1F6B9C621F0C8CAB6595FDFF622
```

Then refresh the package index with `sudo apt update` so that the new repository is picked up.

Finally, install the debug symbol package. Note that the dump file is particular about the kernel version: the `vmlinux` with debug symbols must have the same version as the kernel that generated the dump file. If the two are the same (say, you are debugging your own machine), it can be installed with:

```
$ sudo apt-get install linux-image-`uname -r`-dbgsym
```

Otherwise, replace the `uname -r` part with the proper kernel version:

```
$ sudo apt-get install linux-image-5.19.0-43-generic-dbgsym
```

If it succeeds, a `vmlinux-<uname -r>` file appears under `/lib/debug/boot/`, say:

```
$ ls /lib/debug/boot/
vmlinux-5.19.0-43-generic
```

As a side note, if the `vmlinux` version doesn't match that of the dump file, `crash(8)` will later complain:

```
crash: invalid kernel virtual address: 1fbc0 type: "current_task (per_cpu)"
crash: invalid kernel virtual address: 1fbc0 type: "current_task (per_cpu)"
crash: invalid kernel virtual address: 1fbc0 type: "current_task (per_cpu)"
crash: invalid kernel virtual address: 1fbc0 type: "current_task (per_cpu)"
crash: seek error: kernel virtual address: ffff892d3bc1fbc0 type: "current_task (per_cpu)"
crash: seek error: kernel virtual address: ffff892d3bc9fbc0 type: "current_task (per_cpu)"
crash: seek error: kernel virtual address: ffff892d3bd1fbc0 type: "current_task (per_cpu)"
crash: seek error: kernel virtual address: ffff892d3bd9fbc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff9463abc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff9463abc0 type: "current_task (per_cpu)"
crash: page excluded: kernel virtual address: ffffffff9463abc0 type: "current_task (per_cpu)"
```

### Step 2: get a working `crash(8)`

When you install `linux-crashdump`, a version of `crash(8)` is installed along with it. The basic usage of `crash(8)` is:

```
$ crash <debug kernel> <crash dump>
```

So in this case:

```
$ crash /lib/debug/boot/vmlinux-5.19.0-45-generic /var/crash/202306211421/dump.202306211421
```

However, this `crash` doesn't necessarily work. For example, on Ubuntu 22.04 with `crash 8.0.0`, `crash` itself crashes:

```
crash 8.0.0
Copyright (C) 2002-2021  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2021  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
Copyright (C) 2015, 2021  VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...

please wait... (gathering kmem slab cache data)
crash: invalid structure member offset: kmem_cache_s_num
       FILE: memory.c  LINE: 9619  FUNCTION: kmem_cache_init()

[/usr/bin/crash] error trace: 556099d8969e => 556099d5d2f4 => 556099e2b11b => 556099e2b09c
```

To deal with it, you have to compile a newer version of `crash(8)` from source. As of this writing, the latest available version is [8.0.3](https://github.com/crash-utility/crash/releases). To compile it, first install the dependencies:

```
$ sudo apt install bison texinfo libz-dev
```

Then get the source code and compile it:

```
$ wget "https://github.com/crash-utility/crash/archive/refs/tags/8.0.3.zip"
$ unzip 8.0.3.zip
$ cd crash-8.0.3/
$ make
```

Finally, use the compiled `crash`:

```
$ ./crash /lib/debug/boot/vmlinux-5.19.0-45-generic /var/crash/202306211421/dump.202306211421
```

If everything works well, the `crash> ` prompt will appear:

```
crash 8.0.3
Copyright (C) 2002-2022 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011, 2020-2022 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
Copyright (C) 2015, 2021 VMware, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 10.2
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
/home/lima.linux/.gdbinit:1: Error in sourced command file:
~/.gef-2b72f5d0d9f0f218a91cd1ca5148e45923b950d5.py:52: Error in sourced command file:
Undefined command: "import". Try "help".

      KERNEL: /lib/debug/boot/vmlinux-5.19.0-45-generic
    DUMPFILE: /var/crash/202306211421/dump.202306211421 [PARTIAL DUMP]
        CPUS: 4
        DATE: Wed Jun 21 14:20:48 UTC 2023
      UPTIME: 00:25:24
LOAD AVERAGE: 0.06, 0.04, 0.05
       TASKS: 220
    NODENAME: lima-default
     RELEASE: 5.19.0-45-generic
     VERSION: #46-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 09:08:58 UTC 2023
     MACHINE: x86_64 (1000 Mhz)
      MEMORY: 4 GB
       PANIC: "Kernel panic - not syncing: sysrq triggered crash"
         PID: 3753
     COMMAND: "bash"
        TASK: ffff8e51874bc8c0 [THREAD_INFO: ffff8e51874bc8c0]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)

crash>
```

And you are good to go!
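
For repeated analysis sessions, picking the newest dump and assembling the command line can be scripted. This is a minimal sketch under stated assumptions: it relies on the `/var/crash/<datetime>/dump.<datetime>` layout and the `/lib/debug/boot/vmlinux-<release>` path shown earlier, and `latest_dump` and `crash_command` are illustrative names, not part of `kdump-tools` or `crash`.

```shell
#!/bin/sh
# Print the newest dump file under a crash directory.
# Relies on the <datetime> directory names (e.g. 202306211421) sorting
# chronologically when the shell expands the glob in sorted order.
# $1: crash directory (assumed default: /var/crash)
latest_dump() {
    newest=
    for f in "${1:-/var/crash}"/*/dump.*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$f" ] && newest=$f
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Print the crash command line for a dump.
# $1: kernel release that produced the dump, e.g. 5.19.0-45-generic
# $2: path to the dump file
# $3: crash binary to use (defaults to "crash"; pass ./crash for a local build)
crash_command() {
    printf '%s /lib/debug/boot/vmlinux-%s %s\n' "${3:-crash}" "$1" "$2"
}

# Usage:
#   crash_command 5.19.0-45-generic "$(latest_dump)" ./crash
```

The kernel release is passed explicitly because it must be that of the crashed kernel, which is not necessarily the one currently running.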