- Scope/Goals?
- Who are "we"?
- "We" are the Fedora CoreOS team
- We need to be ready to embrace "bootable containers" when Fedora makes this technology available to us.
- What are bootable containers?
- we have a "core" thing in Fedora that is built and tested with each package addition
- new packages aren't added to the "core" without passing tests
- if a package gets in that causes instability it gets ejected?
- current editions of Fedora layer on top of this core thing
- editions ship updates using these containers
- users are able to layer on top of editions' output, all while maintaining the core flows
- What does the FCOS team need to do to embrace bootable containers?
- switch away from OSTree remotes
- making sure our container images get retained in the cluster
- https://github.com/coreos/fedora-coreos-tracker/issues/1367
- migration script to move existing hosts over to the new refspec (see the rebase sketch after this list)
- DWM: I think this is the most actionable item
- but it may be intertwined with some of the other pieces, which would make it hard to implement on its own
- what is the future of smart clients/update server?
- the biggest open question for me (DWM)
- https://github.com/coreos/fedora-coreos-tracker/issues/1263
- top level issue ^^
- working session notes/design: https://hackmd.io/jB7ILlitRnq13boj4pLSuQ
- what we decide here could have big implications for Red Hat's day 2 strategy for bootable containers
- rework build tooling to consume bootable container "core" layer
- output of higher level group in Fedora
- requires collaboration with that group when failures occur
- does not block the two earlier points in this section
- work through the workflows users will go through when deriving from the Fedora CoreOS image and ensure they're well-oiled
- related: https://github.com/coreos/fedora-coreos-tracker/issues/1219
- can do this via documented user stories?
- how are users expected to do their rebuilds?
- how are clients expected to use a rebuild and also keep things up to date when new builds exist?
- how can this be tightly or loosely coupled with future Fedora CoreOS releases
- tracker issue re. adding versioned tags in container registry repo
- https://github.com/coreos/fedora-coreos-tracker/issues/1367
- for this we have something documented already, but this could change based on the design discussion about the future of updates
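As a rough sketch of the migration item above (assuming the image ends up at quay.io/fedora/fedora-coreos and that we go through rpm-ostree's container rebase path; neither is settled here), the per-host step could be as small as:

```bash
# Illustrative migration from an OSTree remote refspec to the container image.
# Image location and signature verification policy are still open questions.
sudo rpm-ostree rebase ostree-unverified-registry:quay.io/fedora/fedora-coreos:stable
sudo systemctl reboot
```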
---
### User flows
- https://github.com/coreos/fedora-coreos-tracker/issues/1219
- Today:
- users who don't need any layering and just consume what we output as-is
- users who apply client-side package layers
- we plan for client side package layering to continue to work in the short term
- long term we may have a way to register with the node a Containerfile with changes to apply to future auto-updates
- Future:
- users who don't need any layering and just consume what we output as-is (coming from a container registry)
- users who set up e.g. GHA to rebuild our image on push and push the result to their Quay.io repo (a minimal derivation sketch follows below)
- users who have more elaborate pipelines and want more control over the rollout
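For the derivation-based flows above, the build itself is just a normal container build; a minimal sketch (base image reference and package choice are illustrative, and the exact best practices are part of the open workflow questions above):

```dockerfile
# Derive from the FCOS base image and layer one extra package.
FROM quay.io/fedora/fedora-coreos:stable
RUN rpm-ostree install htop && ostree container commit
```

A GHA or other CI job would then push the result to the user's own registry and point their nodes at that image.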
### Update properties
- Today:
- streams
- users get to preview content ahead of time and share ownership when problems aren't caught
- phased rollouts & rollbacks
- barriers
- deadends
### Proposals
- https://hackmd.io/jB7ILlitRnq13boj4pLSuQ
- https://hackmd.io/9TU9nmLPQfGvtSi9226y0w
- https://github.com/coreos/fedora-coreos-tracker/issues/1263
- 1. Follow a tag in a container registry. That's it.
- DWM: What happens when a node has updated and we roll back a tag in the registry? Will it downgrade itself?
- That's logic we'll have to add in bootc/rpm-ostree
- Pros
- Less maintenance overhead for us
- Simple to understand. Matches current container ecosystem.
- Where this holds up:
- Ephemeral Cloud IaaS where the nodes run and then terminate
- Where this breaks down:
- Long-running systems: an OS isn't an app container (app containers get replaced, never upgraded in place)
- Cons
- Users that need more stability need to:
- manage which release to rebase to on their own
- set up their own registry and maintain tags
- Users will not go through barriers
- loss of rollouts and deadends means that users may now have to resort to manual actions more often
- because we as OS distributors have fewer tools at our disposal for keeping things stable
- this applies to stable too, right?
- yes, but hopefully less frequently?
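In its simplest form, proposal 1 is just the stock bootc flow (image reference illustrative); note that this alone says nothing about downgrading when a tag is rolled back, which is the open question above:

```bash
# Point the host at a floating tag once...
sudo bootc switch quay.io/fedora/fedora-coreos:stable
# ...then periodically deploy whatever that tag points to now.
sudo bootc upgrade
```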
- 2. Follow a tag in a container registry, but add more info in additional symbolic tags
- Encode the barrier, streams, deadend properties as tags in the registry
- https://docs.google.com/document/d/1dsx-o2qe68GYHmhVpleOUZ7cvLSUzxouvy6NqnJXpio/edit?usp=sharing
- Pros:
- Everything is in a registry, everything is a tag
- ^^ means if any auth is used anywhere it can be re-used for this
- May be able to set tags from a Web UI
- Cons:
- complex tagging structure, prone to user error, would require lots of docs
- requires tag listing, which is an expensive operation for registries and may even be gated/limited
- keeps barriers and deadends, but not rollouts
- Example tag layout (each symbolic tag points at the same image digest):
- `stable` -> `sha:xyz`
- `39.20240112.3.0` -> `sha:xyz`
- `39.20240112.3.0-stable` -> `sha:xyz`
- `39.20230112.3.0-stable-barrier` -> `sha:xyz`
- `39.20220112.3.0-stable-deadend` -> `sha:xyz`
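For context on the tag-listing concern: under this proposal a client would have to enumerate tags to discover the barrier/deadend markers, e.g. with something like:

```bash
# Tag listing can be expensive and may be rate-limited by the registry.
skopeo list-tags docker://quay.io/fedora/fedora-coreos
```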
- 3. Control updates using config file (JSON) in a container registry (or http/git/s3 it's just JSON)
- https://hackmd.io/9TU9nmLPQfGvtSi9226y0w
- https://hackmd.io/jB7ILlitRnq13boj4pLSuQ
- Pros:
- Keeps the existing Zincati rollout logic, but moves it to the client
- You "only" need to rebuild derived container images for customizations
- Cons:
- Requires tag listing, which is an expensive operation for registries and may even be gated/limited
- Getting full control of the rollouts requires managing the update guidance data
- Important: your own custom JSON is purely opt-in; layering users should not be required to maintain a copy
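Purely as an illustration of the shape this could take (the actual schema is what the linked hackmd docs are designing, not this), the guidance file might look something like:

```json
{
  "stable": {
    "barriers": [
      { "version": "39.20230112.3.0", "reason": "required migration step" }
    ],
    "deadends": [
      { "version": "39.20220112.3.0", "reason": "known-bad release" }
    ],
    "rollouts": {
      "39.20240112.3.0": { "start": "2024-01-15T12:00:00Z", "duration_minutes": 2880 }
    }
  }
}
```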
- 4. Keep Cincinnati server
- "Conflicts" with one of the goals for bootable containers which is to only rely on a container registry for infrastructure
- Pros:
- could use this smart server for metrics in the future
- customers could register with cloud.redhat.com to control their own updates (a service we provide to them)
- Cons:
- Harder for users to own their updates (standing up and configuring a server is harder)
Zincati state machine:
https://github.com/coreos/zincati/blob/main/docs/images/zincati-fsm.png
https://coreos.github.io/zincati/development/agent-actor-system/
# Compared to Satellite/Insights
- Satellite/Insights assumes SSH access to the node. This may not be true in large organizations with firewalled departments, or in cases where the nodes may not be directly accessible due to e.g. intermittent connectivity, firewalling on the cloud side, etc.
- But the IT department still wants to be in charge of controlling updates. This is closer in architecture to e.g. OTA updates to Android/iPhones.
- Separate the decision of what the latest version is from when nodes actually reboot. This is similar to desired vs. current state in the k8s world: imagine the node as its own operator working towards convergence.
## GitOps flow
- Containerfile + rollout (templated) data in repo
- PR with a new base image version
- Builds the new test image
- Pushes to a test registry?
- Runs tests in an environment
- PR merged:
- Builds the updated base image or copies the test image
- Updates the rollout info to start the rollout once the image is built
- Update GitHub Pages with the rollout info (or push rollout info as OCI artifact)
Could also have multiple branches for different pools (e.g. `testing` and `stable` branches)
- Merges to `testing` deploy to a test fleet
- Merges to `stable` (e.g. promoted from `testing`) deploy to the prod fleet
- Could have different rollout policies for each (e.g. automatic for `testing` but manual for `stable`)
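A rough sketch of what the post-merge job could run (registry, image names, and the use of oras for the rollout data are all illustrative):

```bash
# Rebuild the derived image against the new base and publish it.
podman build --from quay.io/fedora/fedora-coreos:stable \
    --tag quay.io/example/my-os:stable .
podman push quay.io/example/my-os:stable

# Publish the rollout info; GitHub Pages or an OCI artifact both work.
oras push quay.io/example/my-os-rollout:latest rollout.json
```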
## Working with Satellite
- Satellite could have an interface for visualizing and editing the update metadata (e.g. barriers, deadends, and rollouts), and in that same interface is where you'd start a new rollout
- Underneath, Satellite is just maintaining the same JSON file format described in the proposal
# New implementation notes
- Write a wariness score (0-100) in /var/lib/zincati/wariness on first boot
- Currently, Zincati gets a full graph from Cincinnati:
- `curl -L 'https://updates.coreos.fedoraproject.org/v1/graph?basearch=x86_64&stream=testing&rollout_wariness=0'`
- The nodes of the graph come from https://builds.coreos.fedoraproject.org/prod/streams/stable/releases.json
- But now it would receive the JSON file describing rollouts and barriers, etc...
- Would need to do version comparisons itself?
- Zincati calls out to bootc instead of rpm-ostree (or make it configurable?)
# Thoughts from Jonathan
Some thoughts:
1. I think local derivation support makes sense; it takes the pressure off having to maintain a build pipeline even for simpler/single-node cases, and it will seamlessly work with our update guidance, including rollouts.
1. long-term, this would go in dnf
2. short-term, we could carry it in zincati maybe?
2. In the case where you do want to build in e.g. Quay and maybe have a bit of CI, I don't think we should try to retain rollout capabilities, and I also don't think we should make users edit and upload their own update guidance file (though they can if they want). Instead, the main requirement can just be that you must respect the tagging scheme; that allows us to use the canonical update guidance for barriers. We can emit a motd warning (which can be disabled) if we detect that you're not following this.
1. Have zincati / bootc use sha256sum to reference images
2. "Disables" bootc upgrade
3. The bootcd/zincati daemon does the update checks and tells bootc which image to rebase
^ all this is a workaround for https://github.com/containers/bootc/issues/337#issuecomment-2067029512
---
# Middle-ground approach
## Local derivation
- User configures Zincati to call out to $tool using Ignition
- User configures $tool itself using Ignition
- $tool can be whatever we want
- can be /etc/$tool.d/Containerfile or /var/lib/$tool.d/Containerfile
- can be /etc/$tool.d/config.toml
- git.repo = ...
- Zincati looks at the update guidance
- knows which stream it's on by looking at a LABEL on the booted image
- analogous to the commit metadata check it does today
- Zincati finds the next version to update to, taking into account the update guidance
### Proposal 1
- Zincati calls out to $tool --update-to VERSION
- $tool builds a local derived container image from the Containerfile and stores it at containers-storage:localhost/osimg:$VERSION
- $tool calls out to bootc to switch to that new container image
- Zincati reboots when ready
### Proposal 2
> [name=Jean-Baptiste Trystram] i don't see the difference between proposal 1 and 2
- Zincati calls out to $tool --build-for VERSION --no-deploy
- $tool builds derived container image, puts it at containers-storage:localhost/osimg:$VERSION
- Zincati calls bootc to switch to localhost/osimg:$VERSION
- Zincati reboots when ready
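Whichever of the two splits we pick, the core of $tool is roughly the same. A sketch under the assumptions above; the paths and names are hypothetical, and it also assumes versioned base tags exist in the registry (per the tracker issue above) and that bootc's `--transport containers-storage` switch path is available:

```bash
#!/bin/bash
# Hypothetical $tool, invoked by Zincati as: $tool --update-to <VERSION>
# /etc/mytool.d/ stands in for the /etc/$tool.d/ location from the notes.
set -euo pipefail
VERSION="$2"   # "$1" is the --update-to flag in this sketch

# Build the derived image against the requested base version and keep it
# in local container storage under a predictable name.
podman build \
    --from "quay.io/fedora/fedora-coreos:${VERSION}" \
    --tag "localhost/osimg:${VERSION}" \
    /etc/mytool.d/

# Proposal 1: $tool also does the switch; in proposal 2, Zincati would run
# this step itself after the build finishes.
bootc switch --transport containers-storage "localhost/osimg:${VERSION}"
```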
## Build system derivation
- if users maintain their own build system, they do their own rollouts
- you mostly don't care about barriers if you're keeping up to date
- we don't have to require users to worry about barriers by default
- i.e. they can just derive from e.g. :stable and push to their registry :newstable
- built at least once a week
- but we *can* detect on the client-side if the user would blow past a barrier and provide information about them ($reason field)
- blocks the update
- write motd
- Rollout info is ignored
- Worked example: the user derives from `:stable` and pushes `:newstable` (or `:foo`) to their own registry
  - the system is down for 2 months; in the meantime upstream releases A, B, C, where B is a barrier release
  - the booted image carries `LABEL stream=newstable`, so `bootc status` will report `:newstable`
  - the client detects the update would blow past barrier B: "hey, you need a `:B` in your registry"
  - automated option:
    - they build `:B`
    - zincati does `bootc switch` to `:B` and remembers that it used to be on `:newstable`
    - zincati then does `bootc switch` back to `:newstable`
  - manual option:
    - zincati tells the user they need to go through `:B`
    - the user builds for B and `bootc switch`es to `:B`
    - the user then `bootc switch`es back to `:newstable`
User experience for local layers:
- At a minimum a user can provide either of two things to get local layers on their system
- 1. a containerfile delivered via Ignition to a specific location
- 2. a URL to a git repo + possible subdir for context
- One other minimum requirement: if a user doesn't want local layers, zincati should not have to call out to $tool on every update just to check whether local layers are configured
Jonathan's idea:
- don't bake any tool as configured in the Zincati config
- bake a tool that auto-detects if a containerfile/URL is written to some location and registers itself as the "local builder" tool with Zincati
- podman-autobuilder
- looks in /etc/podman-autobuilder/
- if /etc/podman-autobuilder/context/ is non-empty, then at build time, we pass `podman build <contextdir>`
- if /etc/podman-autobuilder/buildargs.d/ is non-empty, then at build time, we pass the args in there to `podman build`
- they're not mutually exclusive: either one can be provided on its own, or both together
- zincati passes the base image (the `from`) to use to podman-autobuilder
- podman-autobuilder calls `podman build` with `$args --from=$from --tag=localhost/podman-autobuilder:latest`
- GC (see the sketch below):
- first read the current sha the tag points to
- do the build
- `podman rmi $old_image_sha`
- this keeps container building decoupled from Zincati, all while providing a nice UX on the FCOS side
- but also, I'd say this doesn't necessarily need to be a day-1 feature; let's focus on getting the Zincati semantics for all the other stuff first
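A sketch of what podman-autobuilder could look like, following the bullets above (everything here, including the tool itself, is hypothetical and undecided):

```bash
#!/bin/bash
# Hypothetical podman-autobuilder, invoked by Zincati with the base image to
# build from. Paths, tag name, and the GC approach follow the idea above.
set -euo pipefail
FROM_IMAGE="$1"
TAG="localhost/podman-autobuilder:latest"

# GC: remember what the tag currently points to (may not exist on first run).
old_image="$(podman image inspect --format '{{.Id}}' "$TAG" 2>/dev/null || true)"

# Extra args dropped into buildargs.d get appended; the context dir is used
# as the build context (the two are not mutually exclusive).
extra_args="$(cat /etc/podman-autobuilder/buildargs.d/* 2>/dev/null || true)"

podman build $extra_args --from "$FROM_IMAGE" --tag "$TAG" \
    /etc/podman-autobuilder/context/

# GC: drop the previous image now that the tag has moved on.
if [ -n "$old_image" ]; then podman rmi "$old_image" || true; fi
```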
## What do we need in order for bootc to be "feature complete" for us?
- dnf support for client-side layering
- local rebuild of initramfs
- local kernel arguments modifications
- (on the FCOS team): zincati integration
- any changes that zincati needs on the bootc side would have to be implemented