# The kernel move plan
## Overview
* /boot is running out of space.
* For a safe upgrade path, the only feasible solution is to move the kernel image to /usr.
* Loading the kernel from elsewhere requires a new GRUB config:
  * UEFI: The GRUB image will be atomically upgraded.
  * BIOS: GRUB cannot be safely upgraded, so rely on the existing config loading grub.cfg from the OEM partition.
* The verity root hash can no longer be stored in the kernel image once the kernel lives on the verity device, since the hash would then have to cover itself.
* systemd can derive the root hash from the UUIDs of separate verity data and verity hash partitions, but Flatcar keeps both in a single partition.
* The root hash must therefore remain in /boot but must also be protected.
* Our GRUB patch to store it in the kernel image can therefore be dropped.
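To illustrate the single-partition problem: under systemd's Discoverable Partitions convention, as we understand it, a 256-bit root hash is split across two partition UUIDs, with the first 128 bits naming the data partition and the last 128 bits naming the verity partition. With only one partition, half of the hash has nowhere to go. The Python sketch below assumes a SHA-256 root hash and that the halves are used verbatim as UUIDs; the helper names are hypothetical and the example hash is a placeholder, not a real verity tree.

```python
import hashlib
import uuid


def uuids_from_root_hash(root_hash_hex: str) -> tuple[uuid.UUID, uuid.UUID]:
    """Split a 256-bit root hash into (data partition UUID, verity partition UUID).

    Assumption: the two 128-bit halves are used verbatim, following
    systemd's Discoverable Partitions convention.
    """
    raw = bytes.fromhex(root_hash_hex)
    if len(raw) != 32:
        raise ValueError("expected a 256-bit (SHA-256) root hash")
    return uuid.UUID(bytes=raw[:16]), uuid.UUID(bytes=raw[16:])


def root_hash_from_uuids(data_uuid: uuid.UUID, verity_uuid: uuid.UUID) -> str:
    """Reassemble the root hash from the two partition UUIDs."""
    return (data_uuid.bytes + verity_uuid.bytes).hex()


if __name__ == "__main__":
    # Placeholder hash: not computed from a real dm-verity hash tree.
    root_hash = hashlib.sha256(b"placeholder").hexdigest()
    data_uuid, verity_uuid = uuids_from_root_hash(root_hash)
    print("data partition UUID:  ", data_uuid)
    print("verity partition UUID:", verity_uuid)
    # A single partition only has room for one of these two UUIDs.
    assert root_hash_from_uuids(data_uuid, verity_uuid) == root_hash
```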
## The boot process for new deployments
* The shim and GRUB start as before.
* The partition selection is the same, either manual or via `gptprio`.
* The GRUB image cannot have the verity root hash baked in because it needs to support different releases.
* Use a separate executable script to apply the verity root hash.
* Also use this to support different kernel command lines in future without relying on a specific GRUB build.
* The initial script is /boot/flatcar/load-a; on upgrade, it is managed together with load-b.
* Due to GRUB's limited scripting capabilities, load-a actually defines a function for the main configuration to call.
## How the load script and verity root hash are secured
* `veritysetup` natively supports verifying the verity root hash using an S/MIME signature.
* The certificate to verify this against can be stored within the kernel.
* To avoid having to manage another certificate:
  * The Secure Boot signing certificate is being used to sign the verity root hash.
  * Its issuer, the shim vendor certificate, is being used to verify it.
* It turns out the same certificate is also needed in the kernel for verifying signed sysexts!
* This alone isn't strong enough if the verity root hash and its signature are simply loaded from files in /boot by GRUB.
* GRUB can check the signatures of arbitrary files but only using GPG. Appended signatures only work for PE binaries.
* A GPG public key can be embedded within the GRUB image, which in turn is signed for Secure Boot.
* Again, to avoid having to manage another key, the official Secure Boot signing certificate has been converted to a GPG key.
* gnupg-pkcs11-scd (already in Gentoo) can be used to make GPG sign with a private key in Azure Key Vault.
* Converting local X.509 certificates to GPG is a pain, so a totally new GPG key has been created for unofficial builds.
* load-a is signed (as load-a.sig), and GRUB verifies it when a menu entry is chosen, before executing it.
* load-a includes (among other things):
```
systemd.verity_usr_options=root-hash-signature=/boot/flatcar/verity-\$slot.sig usrhash=XXXX
```
* I initially included `dm_verity.require_signatures=1` above, but that would prevent the use of unsigned sysexts.
* Although the root hash signature can be given inline as base64, this is too long, especially when the leaf certificate is included. On riscv, the kernel command line is limited to 1024 characters.
* The signature therefore has to go in a separate file, and that S/MIME signature itself then needs to be GPG-signed.
* verity-a.sig is signed (as verity-a.sig.sig) and verified by GRUB in the load-a script.
* The root hash verification is performed by veritysetup during early boot.
* The boot partition is currently mounted at /sysroot/boot for Ignition's first_boot handling, but this happens too late for veritysetup to read the signature file.
* There is no point in mounting and unmounting the boot partition twice during early boot, so this is simplified to mounting at /boot just once and earlier than before.
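The arithmetic behind moving the signature into a file is easy to check. Base64 encodes every 3 bytes as 4 characters, so an inline signature costs roughly 4/3 of its size in command line characters before anything else is added. The Python sketch below assumes a 1,500-byte PKCS#7 signature (an illustrative figure, not a measured one), a placeholder 64-digit SHA-256 root hash and slot a, and relies on `root-hash-signature=` accepting either a path or `base64:` followed by the encoded signature, as we understand veritytab(5). The helper names are hypothetical.

```python
import base64
import math

RISCV_CMDLINE_LIMIT = 1024  # characters, per the riscv limit mentioned above


def base64_len(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes bytes."""
    return 4 * math.ceil(n_bytes / 3)


def verity_fragment(usrhash: str, signature: str) -> str:
    """Verity-related kernel parameters in the form load-a uses."""
    return (
        "systemd.verity_usr_options=root-hash-signature=" + signature
        + " usrhash=" + usrhash
    )


usrhash = "0" * 64  # placeholder SHA-256 root hash (64 hex digits)

# What load-a does: point at a file under /boot.
by_path = verity_fragment(usrhash, "/boot/flatcar/verity-a.sig")

# The rejected alternative: inline the signature. The size is an assumption.
fake_signature = bytes(1500)
inline = verity_fragment(
    usrhash, "base64:" + base64.b64encode(fake_signature).decode("ascii")
)

print("base64 of signature alone:", base64_len(len(fake_signature)), "chars")
for label, fragment in (("by path", by_path), ("inline", inline)):
    fits = len(fragment) <= RISCV_CMDLINE_LIMIT
    print(f"{label}: {len(fragment)} chars, fits in {RISCV_CMDLINE_LIMIT}: {fits}")
```

With these assumptions, the inline form overflows the limit on its own, before any other kernel parameters are counted, while the path form stays well within it.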
## Further notes about GRUB security
When you include any GPG keys in the GRUB image, it automatically sets `check_signatures=enforce`, which forces GPG verification of _any_ file loaded by GRUB. The kernel image already has an appended signature, so there is no need to GPG sign it as well. While this wouldn't hurt on UEFI systems where GRUB itself is packaged into a single file, it would require all the GRUB modules to be signed on BIOS systems. Perhaps we don't want to go that far. For now, `check_signatures` is set back to `no` in the memdisk.
GPG verification is all well and good, but it doesn't add much when you can interactively tell GRUB to bypass it or do something else entirely. Having said that, we are arguably no worse off than we were before either since you could always simply disable verity that way. Preventing all interactive GRUB usage is controversial, so perhaps this could be done only when Secure Boot is enabled. While this doesn't help users without Secure Boot, those users are already at risk of the files under /boot being silently modified.
## How the build process has changed
* coreos-sb-keys has been reworked significantly:
  * It now includes official (public) keys and certificates.
  * It now handles multiple dated shim vendor certificates for future rotation.
  * It now combines those DER-encoded shim vendor certificates into a single EFI Signature List (ESL) for you.
  * It now includes current and historical GPG keys for use with GRUB.
  * The `files` directory now includes two scripts for fetching/generating what is needed.
  * The [README](https://github.com/flatcar/scripts/blob/chewi/kernel-mv/sdk_container/src/third_party/coreos-overlay/coreos-base/coreos-sb-keys/README.md) has been rewritten to describe all of this in more detail.
* sys-kernel/coreos-kernel now has an `official` USE flag:
  * This is so it knows which certificate to point `CONFIG_SYSTEM_TRUSTED_KEYS` at.
  * This flag cannot be set via the profiles, so it is set globally using an environment variable in `common.sh`.
  * Now that the flag is global, we should probably rename it to `flatcar-official`.
* `finish_image` and `sbsign_image` have some minor adjustments:
  * The kernel must be signed and have its hashes generated _before_ applying verity and unmounting /usr.
  * Applying verity now has to be delayed for official builds until after the kernel is signed in `sbsign_image`.
  * Applying verity has therefore moved to a new common `setup_verity_and_load_scripts` function along with the writing and signing of load-a.
* Despite the configuration changes, GRUB is built much like before, except:
  * grub_install.sh doesn't need to know whether to apply verity or not.
  * grub-mkimage includes the current and historical GPG keys using `--pubkey`.
  * The gcry_rsa and gcry_sha256 modules are required for GPG verification.
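For reference, here is a minimal Python sketch of what combining the shim vendor certificates into an EFI Signature List involves. It is not the coreos-sb-keys script. It assumes the EFI_SIGNATURE_LIST layout from the UEFI specification: a header (signature type GUID, list size, header size, signature size) followed by entries of an owner GUID plus data. Because X.509 certificates differ in length, each certificate gets its own list and the lists are concatenated. The owner GUID and helper names are made-up placeholders.

```python
import struct
import uuid

# EFI_CERT_X509_GUID from the UEFI specification.
EFI_CERT_X509_GUID = uuid.UUID("a5c059a1-94e4-4aa7-87b5-ab155c2bf072")
# Hypothetical owner GUID; a real build would use its own.
OWNER_GUID = uuid.UUID("11111111-2222-3333-4444-555555555555")

# SignatureType (GUID), SignatureListSize, SignatureHeaderSize, SignatureSize
HEADER = struct.Struct("<16sIII")


def esl_for_cert(der: bytes, owner: uuid.UUID = OWNER_GUID) -> bytes:
    """Wrap one DER certificate in its own EFI_SIGNATURE_LIST."""
    signature_size = 16 + len(der)  # SignatureOwner GUID + certificate
    list_size = HEADER.size + signature_size  # no extra SignatureHeader
    # EFI GUIDs store their first three fields little-endian,
    # which is what uuid's bytes_le produces.
    header = HEADER.pack(EFI_CERT_X509_GUID.bytes_le, list_size, 0, signature_size)
    return header + owner.bytes_le + der


def combine(certs: list[bytes]) -> bytes:
    """Concatenate one list per certificate, as X.509 entries vary in size."""
    return b"".join(esl_for_cert(der) for der in certs)


def parse_x509_certs(blob: bytes) -> list[bytes]:
    """Walk concatenated signature lists and return the X.509 certificates."""
    certs = []
    offset = 0
    while offset < len(blob):
        sig_type, list_size, header_size, sig_size = HEADER.unpack_from(blob, offset)
        if list_size < HEADER.size + header_size or sig_size <= 16:
            raise ValueError("malformed EFI_SIGNATURE_LIST")
        first = offset + HEADER.size + header_size
        end = offset + list_size
        if sig_type == EFI_CERT_X509_GUID.bytes_le:
            for entry in range(first, end, sig_size):
                certs.append(blob[entry + 16:entry + sig_size])  # skip owner GUID
        offset = end
    return certs
```

The combined result can be written out as a single binary file, and `parse_x509_certs` gives a quick way to check that every dated certificate made it in.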
## The upgrade process
This part is admittedly still sketchy, but enough is currently known to suggest that upgrading to this new model should at least be possible.
### In general
* The new GRUB configuration needs to allow booting into an older release without these changes.
* update_engine has a flatcar-postinst shell script that already performs various fix-ups.
  * flatcar-postinst is run from the newly upgraded /usr partition, so we can rely on new code here.
* Apparently, the kernel image within the update payload is written to /boot, but this could easily be something else like a tarball.
* The load-a/b and verity-a/b files can simply be overwritten since you should have the existing slot to fall back on.
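The fallback argument only holds if an upgrade never touches the booted slot's files. A small Python sketch of that rule follows. The slot names `a` and `b` and the four file names come from the text above; keeping all four in /boot/flatcar (including load-b.sig) and the helper names are assumptions.

```python
from pathlib import PurePosixPath

BOOT_FLATCAR = PurePosixPath("/boot/flatcar")  # assumed location for all four files


def slot_files(slot: str) -> list[PurePosixPath]:
    """Files in /boot that belong to one slot (hypothetical helper)."""
    if slot not in ("a", "b"):
        raise ValueError(f"unknown slot: {slot!r}")
    names = (
        f"load-{slot}",
        f"load-{slot}.sig",
        f"verity-{slot}.sig",
        f"verity-{slot}.sig.sig",
    )
    return [BOOT_FLATCAR / name for name in names]


def other_slot(slot: str) -> str:
    """The slot an upgrade installs into, given the booted one."""
    if slot == "a":
        return "b"
    if slot == "b":
        return "a"
    raise ValueError(f"unknown slot: {slot!r}")


def files_to_write(booted_slot: str) -> list[PurePosixPath]:
    """Files an upgrade may overwrite: only those of the slot being installed."""
    target = slot_files(other_slot(booted_slot))
    # Never clobber the booted slot: it is what we fall back to.
    assert set(target).isdisjoint(slot_files(booted_slot))
    return target
```

For example, `files_to_write("a")` yields the load-b, load-b.sig, verity-b.sig and verity-b.sig.sig paths, leaving everything for slot a untouched.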
### UEFI systems
* Care is needed when handling the GRUB image. It should be atomically replaced with `mv`.
* Note that FAT cannot do truly atomic renames. 😢 See [this discussion](https://lore.kernel.org/linux-fsdevel/CAJCQCtQ38W2r7Cuu5ieKRQizeKF0tf--3Z8yOJeeR+ZZ4S6CVQ@mail.gmail.com/) about roughly the same use case. Or maybe it's actually a lot better since [this kernel change](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da87e1725ae2)? 🤔
* While that avoids partial writes, it alone doesn't provide a backup should GRUB just fail to load.
* The previous GRUB image should therefore be kept under another name:
  * The shim supports falling back to other EFI binaries if the default one doesn't load.
  * This is done using the shim's fb\*.efi, which we don't currently install, but we could.
  * The backup GRUB is referenced in a file called boot.csv.
  * fb\*.efi tries to create a boot entry pointing to the backup (via the shim).
  * Without rebooting, it then tries to load the backup, so it is not crucial that the boot entry works.
  * Alternatively, it might be simpler and more reliable to just name the backup GRUB as fb\*.efi. This has been briefly tested.
* This only guards against GRUB failing to load. Perhaps the new GRUB could have a menu entry for the old GRUB.
* An unresolved problem: The system won't know if the shim has fallen back. On the next upgrade, both copies might be broken.
* We might want to consider how the shim itself can be safely upgraded, but that is not necessary for the kernel move.
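Replacing the GRUB image while keeping the previous one as a backup could look roughly like the Python sketch below, using the usual write-to-temporary-file, fsync, rename pattern. The file names and helper names are placeholders (the backup name could be fb\*.efi, per the alternative above), and as noted, the rename is not guaranteed to be truly atomic on FAT.

```python
import os
import tempfile


def _fsync_dir(directory: str) -> None:
    """Flush the directory so the rename itself is persisted (Linux)."""
    fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def _replace_file(path: str, data: bytes) -> None:
    """Write data to a temporary file next to path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # not truly atomic on FAT
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    _fsync_dir(directory)


def install_grub(esp_dir: str, new_image: bytes, name: str, backup_name: str) -> None:
    """Keep the current GRUB image as a backup, then swap in the new one."""
    current = os.path.join(esp_dir, name)
    if os.path.exists(current):
        with open(current, "rb") as f:
            _replace_file(os.path.join(esp_dir, backup_name), f.read())
    _replace_file(current, new_image)
```

This sketch inherits the unresolved problem above: if the shim had already fallen back because the current image is broken, the broken image gets copied over the good backup.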
### BIOS systems
* Since GRUB cannot be upgraded safely, the new configuration needs to live outside the GRUB image.
* It could live in grub.cfg on the OEM partition, but it seems prudent to keep changes here minimal.
* OEM's grub.cfg should therefore just invoke `configfile ($root)/flatcar/grub/grub.cfg` without going into a loop.
* Unfortunately, the existing GRUB will lack the modules and keys needed for GPG verification:
  * That will therefore need to be made conditional or…
  * To avoid complexity and security risks in load-a, an alternative might be:
```
rmmod pgp
function verify_detached { true; }
```
## Other ideas
* Thilo suggested that USR-A/USR-B could be entire disk images, which might tie in well with systemd. Probably too big a change at this point, maybe later.
* A prior discussion about supporting kexec raised the question of how to obtain the kernel parameters. Reading them from load-a or a similar file, possibly appended to /proc/cmdline, seems ideal.
* It is possible to effectively bake kernel command line parameters consumed by systemd generators into the initrd, but there doesn't seem to be a need.
## Trust Model Diagram

## Discussion
We discussed the proposal in
* our dev sync call in June 2025 ([recording](https://www.youtube.com/live/f2h-VxoS76I?si=cavEfIwd9-M_3KOe&t=1578), [notes](https://github.com/flatcar/Flatcar/discussions/1762#discussioncomment-13575263))
### TODOs from discussions
- add a migration plan (timeline) to the proposal above
- ensure thorough testing, particularly on bare metal