# Enabling `kdump` on CoreOS

#### Overview of `kdumpctl` (executed by `kdump.service` in the real root)

1. check dump feasibility
   - checks whether kdump is supported on this kernel and whether memory is reserved
2. check config
   - checks whether `/etc/kdump.conf` has a valid configuration
3. check rebuild
   - setup initrd
     - sets the variables needed to build the initrd. Notably, it checks `/etc/sysconfig/kdump` to see whether a kernel is specified as the crash kernel; otherwise, it uses the currently running kernel as the kernel for the initrd.
     - sets the value of `TARGET_INITRD`, which is always `${KDUMP_BOOTDIR}/initramfs-${kdump_kver}kdump.img`
   - check system modified
     - checks whether `TARGET_INITRD` exists; if not, returns MODIFIED
     - checks whether the config files (`/etc/kdump.conf`) were modified
   - rebuilds the initrd via `mkdumprd` if forced to or if the system was modified
4. load the crash kernel
   - calls `kexec` to load the kernel specified in `/etc/sysconfig/kdump`. If none is specified, the currently running kernel is used as the crash kernel.
   - additional `kexec` args and kernel cmdline arguments for the crash kernel can be supplied by configuring `/etc/sysconfig/kdump`.
   - without configuration, `/proc/cmdline` (plus some additional arguments added by `kdump`) is used as the cmdline arguments for the crash kernel

#### Integrating into \*COS

1. My understanding is that we want all `kdump` configuration to be done in a declarative manifest, and to have it just "work" after boot without further manual configuration.
2. Kernel arguments need to be supplied to the main kernel to reserve memory for the crash kernel and the `kdump` initrd.
3. We would like to avoid letting `kdump` generate initrds behind `rpm-ostree`'s back, so that `ostree` can own `/boot` modifications.
4. Regarding https://github.com/coreos/fedora-coreos-tracker/issues/622#issuecomment-692362905, we need to make sure the `KDUMP_COMMANDLINE` variable is configured in `/etc/sysconfig/kdump`.

#### Proposed Flow

1. Include `kexec-tools` in the FCOS image.
2. Ignition writes the necessary configuration for kdump.
3. Ignition adds the karg `crashkernel=256M` for the real root. This would require changing the dracut module to reboot at the switch-root point if kargs have been modified.
4. Ignition enables `kdump.service`.
5. Reboot at switch-root.
6. `kdump.service` generates its kdump initrd (based on the currently running kernel) and places it alongside the main initramfs and kernel (`/boot/ostree/fedora-coreos-$bootchecksum`).
7. `kdump.service` loads the kdump initrd and kernel into memory via `kexec`.
8. kdump is now armed.
9. An update involving a new kernel happens. `BOOT_IMAGE` in `/proc/cmdline` changes to e.g. `/boot/ostree/fedora-coreos-$newbootchecksum`.
10. `kdump.service` detects that no kdump initrd exists in `/boot/ostree/fedora-coreos-$newbootchecksum` and repeats Step 6 and Step 7.
11. kdump is now armed.

#### Notes

In the above flow, OSTree/rpm-ostree does not seem to be heavily involved, and most of the work to enable kdump, particularly Step 3, actually involves the First Boot team. However, Luca also mentioned that something similar to Step 3 can be done through OSTree as well, thanks to a not-yet-merged PR by Robert.

Another approach would be to do everything after the switch-root through a systemd unit. I have verified that this works. However, Benjamin mentioned that it is unsafe to reboot from a systemd unit.

#### To Discuss

- Should we add `kexec-tools` to the base image or keep it as an extension?
- Should Ignition support adding kargs and rebooting as needed before the switch-root?
- Should we add sugar to fcct for kdump? How much?
- How much does OSTree want to own the `/boot` directory? (kdump generates a separate, independent initrd in the `/boot/ostree/$bootcsum` directory)
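
To make Steps 9 and 10 of the proposed flow (and the "check system modified" part of the overview) more concrete, here is a minimal Python sketch of the decision `kdump.service` would have to make on each boot. It is not how `kdumpctl` is actually implemented. The function names (`parse_cmdline`, `boot_dir_from_cmdline`, `target_initrd`, `kdump_plan`) and the returned labels are hypothetical, and the sketch assumes that `BOOT_IMAGE` may carry a GRUB device prefix such as `(hd0,gpt3)` and is relative to a boot partition mounted at `/boot`.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Set


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into {key: value}; bare flags map to ""."""
    args = {}
    for token in cmdline.split():  # simplification: ignores quoted values with spaces
        key, _, value = token.partition("=")
        args[key] = value
    return args


def boot_dir_from_cmdline(cmdline: str, boot_mount: str = "/boot") -> Optional[PurePosixPath]:
    """Derive the directory holding the booted kernel from BOOT_IMAGE."""
    image = parse_cmdline(cmdline).get("BOOT_IMAGE")
    if not image:
        return None
    # Assumption: drop a leading GRUB device prefix such as "(hd0,gpt3)".
    if image.startswith("(") and ")" in image:
        image = image.split(")", 1)[1]
    return PurePosixPath(boot_mount) / PurePosixPath(image.lstrip("/")).parent


def target_initrd(boot_dir: PurePosixPath, kver: str) -> PurePosixPath:
    """Mirror ${KDUMP_BOOTDIR}/initramfs-${kdump_kver}kdump.img."""
    return boot_dir / f"initramfs-{kver}kdump.img"


def kdump_plan(cmdline: str, kver: str, existing_files: Set[str]) -> str:
    """Return a hypothetical label describing what kdump.service should do."""
    if "crashkernel" not in parse_cmdline(cmdline):
        return "no-reserved-memory"  # the karg from Step 3 is missing
    boot_dir = boot_dir_from_cmdline(cmdline)
    if boot_dir is None:
        return "unknown-boot-dir"
    if str(target_initrd(boot_dir, kver)) not in existing_files:
        return "rebuild-and-load"  # Step 10: no kdump initrd next to this kernel
    return "load"  # initrd already present, only kexec loading is needed


if __name__ == "__main__":
    # Illustrative values only; checksums and kernel versions are made up.
    cmdline = "BOOT_IMAGE=(hd0,gpt3)/ostree/fedora-coreos-aaa/vmlinuz-5.8.10 crashkernel=256M rw"
    print(kdump_plan(cmdline, "5.8.10", existing_files=set()))
```

Passing the set of existing files in explicitly keeps the sketch free of filesystem access. A real check would stat `TARGET_INITRD` and also compare it against the modification time of `/etc/kdump.conf`, as described in the overview.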