# 20230306_OSTree-Native-Container-Updates-Proposal

# Proposal:

The new updates strategy will not include an update server but it will include a client that can parse update guidance from a configured location. The update guidance can be disabled in order to instruct the client to not seek any update guidance and just use the latest from the current container image that is being followed.

The update guidance will consist of a single YAML file hosted locally or on the internet somewhere (file://, https://, docker://). The client knows how to pull the update guidance and parse it. In the update guidance we define rollouts, barriers, and deadends for each supported stream.

For Fedora CoreOS we'll store the update guidance as a single YAML file in S3 and also store it as a single file in a scratch container for mirroring convenience. These will be updated simultaneously and should always be in sync.

The file format:

```yaml
streams:
  next:
    rollouts:
      - version: 37.20230205.3.0
        start: <time>
        duration: <duration>
      - version: 37.20230122.3.0
        start: <time>
        duration: <duration>
    barriers:
      "31.20200517.3.0":
        reason: "https://github.com/coreos/fedora-coreos-tracker/issues/480#issuecomment-631724629"
      "32.20200615.3.0":
        reason: "https://github.com/coreos/fedora-coreos-tracker/issues/484"
      "36.20190718.3.0":
        reason: "https://github.com/coreos/fedora-coreos-tracker/issues/215"
    deadends:
      "30.20190715.3.0":
        reason: "https://github.com/coreos/fedora-coreos-tracker/issues/215"
  testing: ...
  stable: ...
```

alternative using digests:

```yaml
schema-version: 1.0
rollouts:
  - version: 37.20230205.3.0
    refspec: quay.io/fedora/fedora-coreos@sha....
    start: <time>
    duration: <duration>
barriers:
  "37.20230205.3.0":
    reason: ...
    refspec: ...
deadends:
  "30.20190715.3.0":
    reason: "https://github.com/coreos/fedora-coreos-tracker/issues/215"
    refspec: ...  # optional
```

---

Build graph by sorting by version string?
- need to think through scenarios when e.g.
you want to roll back
- also ^ but with the layering case where users want to keep updating their software
- A.1 -> B.1 -> C.1 -> D.1
  - user is on B.1, roll out C.1
  - there's a bug in C.1, want to roll back cluster to B.1
    - set rollout to B.1

How does Zincati react?
- the node is on C.1 which sorts higher than B.1
- Zincati would see that B.1 sorts older than C.1 and the node would stay on C.1
  - manually roll back to B.1
- User really wants their own software update
  - create a B.2
  - have to disregard ordering and still deploy B.1
  - release a 2:B.1
  - roll a C.2 that has a custom SELinux policy

---

## Derivation + update graph flow

User rebuilds FCOS oscontainers and pushes them to `quay.io/user/fcos:41.20250218.0.1`.

Modify the graph as follows:

```yaml
schema-version: 1.0
rollouts:
  - refspec: quay.io/user/fcos@sha256:...
    start: <time>
    duration: <duration>
    metadata:
      version: 41.20250218.0.1
barriers:
  "37.20230105.3.0.1":
    reason: ...
    refspec: quay.io/user/fcos@sha256:...
deadends:
  "37.20230102.3.0.1":
    reason: "https://github.com/coreos/fedora-coreos-tracker/issues/215"
    refspec: quay.io/user/fcos@sha256:...  # optional
```

```
$ update-graph --inherit --rollout --patch 1 [--barrier]
$ update-graph --inherit --rollout --tag 41.20250218.0=41.20250218.0.1 [--barrier]
$ update-graph --inherit-from 41.20250218.0 --rollout 41.20250218.0.1
```

### User scenarios

1. User wants to layer on top of FCOS and push to a different repo; doesn't care about rebuilds "overriding things", and follows the same tagging scheme as FCOS.
2. User wants to rebuild FCOS, and also wants to be able to change the version scheme or rebuild on the same FCOS base with different layered content, e.g. update their apps more often than FCOS releases.
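
As a rough illustration of the derivation flow above, the following sketch shows what an "inherit" step might do to digest-style guidance: append a patch component to every version and point every refspec at the user's repository. The function name `inherit_guidance` and the `digests` mapping (derived version to image digest) are hypothetical and are not part of the proposed `update-graph` tool; the sketch also assumes the guidance has already been parsed into plain dictionaries.

```python
import copy


def inherit_guidance(upstream: dict, digests: dict, repo: str, patch: int) -> dict:
    """Derive user guidance from upstream digest-style guidance.

    Assumptions (not from the proposal): `digests` maps each derived
    version string to the digest of the user's rebuilt image, and a
    rollout without a known digest is an error (KeyError).
    """
    def derive(version: str) -> str:
        # 41.20250218.0 -> 41.20250218.0.1 for patch=1
        return f"{version}.{patch}"

    derived = {
        "schema-version": upstream.get("schema-version"),
        "rollouts": [],
        "barriers": {},
        "deadends": {},
    }

    for rollout in upstream.get("rollouts", []):
        entry = copy.deepcopy(rollout)  # leave the upstream data untouched
        version = derive(entry.pop("version"))
        entry["refspec"] = f"{repo}@{digests[version]}"
        entry["metadata"] = {"version": version}
        derived["rollouts"].append(entry)

    for section in ("barriers", "deadends"):
        for version, info in upstream.get(section, {}).items():
            entry = dict(info)
            new_version = derive(version)
            if new_version in digests:
                entry["refspec"] = f"{repo}@{digests[new_version]}"
            else:
                # An upstream refspec would point at the wrong repository.
                entry.pop("refspec", None)
            derived[section][new_version] = entry

    return derived
```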

---

- an MVP would be a tool that just swaps the refspec by inspecting all the tags found in the update guidance

```
$ grapher --fcos-inherit --substitute quay.io/user/fcos
```

---

The client will fetch and read this update guidance (if configured to do so) and also inspect the currently configured registry location. It will determine if there is an update available and proceed to stage the update.

For derived container images there are two pieces of information that are used to determine if an update is available:

1. The update guidance (i.e. update to X.Y.Z)
2. The registry

We will find all containers in the registry with a VERSION label string that starts with X.Y.Z, then version sort the matches and find the highest version to apply.

For example, if FCOS releases a new X.Y.Z version and a user does derived container image builds, they could/would add a VERSION=X.Y.Z.1 label to the image. The update client would find this container in the registry and apply the update. If the user later decided that they wanted to do a new build (updated configuration, added software, etc.), then they would do a new build and apply the VERSION=X.Y.Z.2 label.

For example, if FCOS releases a new X.Y.Z version and a user does a derived container image build, they will inherit the VERSION=X.Y.Z label in their derived build. The client update software would follow the same logic as before and pick up the update.
- further, if the user did another derived container build and pushed to the same tag, it would still do another update as desired.
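
The matching rule described above (find images whose VERSION label starts with the guided version, version sort them, take the highest) can be sketched as follows. This is only an illustration under stated assumptions: `pick_update` and `version_key` are hypothetical names, the `labels` mapping (image reference to VERSION label) stands in for whatever registry inspection the client performs, and every version component is assumed to be numeric.

```python
from typing import Dict, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '37.20230205.3.0.2' into a tuple of ints so it sorts numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_update(target: str, labels: Dict[str, str]) -> Optional[str]:
    """Return the image whose VERSION label is the highest match for `target`.

    A label matches if it equals `target` or extends it with further
    dotted components. Matching on `target + "."` rather than a bare
    prefix keeps 'X.Y.Z0' from being mistaken for a build of 'X.Y.Z'.
    """
    prefix = target + "."
    matches = [
        (version_key(version), ref)
        for ref, version in labels.items()
        if version == target or version.startswith(prefix)
    ]
    if not matches:
        return None
    # Tuples compare element by element, so X.Y.Z.10 sorts above X.Y.Z.2.
    return max(matches)[1]
```

With guidance pointing at `37.20230205.3.0` and images labelled `37.20230205.3.0`, `37.20230205.3.0.2` and `37.20230205.3.0.10`, the sketch selects the `.10` build, which a plain string sort would rank below `.2`.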
```dockerfile
FROM quay.io/fedora/fedora-coreos:stable
RUN rpm-ostree install foobar foobaz
```

current update guidance is `X.Y.Z`

- push to -> quay.io/someuser/my-cool-os:X.Y.Y
- push to -> quay.io/someuser/my-cool-os:X.Y.Y.1
- push to -> quay.io/someuser/my-cool-os:X.Y.Z
- push to -> quay.io/someuser/my-cool-os:X.Y.Z.1
- push to -> quay.io/someuser/my-cool-os:X.Y.Z.2
- push to -> quay.io/someuser/my-cool-os:A.B.C.1

Today:
- quay.io/fedora/fedora-coreos:stable

Future:
- quay.io/fedora/fedora-coreos:X.Y.Y
- quay.io/fedora/fedora-coreos:X.Y.Y.1
- quay.io/fedora/fedora-coreos:X.Y.Z
- quay.io/fedora/fedora-coreos:X.Y.Z.1

Stream tags:
- quay.io/fedora/fedora-coreos:stable
- quay.io/podman/fedora-coreos:stable

## Option 3

- throw some JSON/YAML/TOML/HCL/XML?/ASN.1?? in S3
  - Do we want one of these files for each stream or a single file that defines all streams?
    - it's a tradeoff here
      - single stream per file looks simpler, but needs a templated location
      - +1 larger file is slightly more complicated, no templated location
- don't necessarily want to dictate user's tagging scheme
  - why not? It is FCOS after all.
  - Could we just force them to use ours and give them an optional component
    - 37.20230205.3.0.x
      - where `.x` is completely optional
  - I guess we can look for labels too, but things start to get more complicated
    - We add a VERSION=37.20230205.3.0 label to our image. When they derive they inherit it.
    - can use `skopeo inspect`-like functionality for this
- when a new user-derived image comes out Zincati needs to apply the update again
  - 37.20230205.3.0
  - 37.20230205.3.0.1
  - 37.20230205.3.0.2
- Do we require the updates.{yaml,toml} file?
  - DM: was thinking we require it, BG was thinking we shouldn't require it
  - requiring it poses a problem for disconnected installs
    - ship it in a container and allow it to be mirrored?
  - Maybe we shove a version of it into our image when we build
    - Doesn't help when you want to stop a rollout
    - Don't want to be in the business of pushing container updates for a rollout change
- What happens in rollouts if the N-1 derived image gets garbage collected in the container registry?
  - we could just wait until the local wariness value passes the test
    - potentially vulnerable systems coming online keep their security vulnerabilities longer
  - if the user doesn't want this they need to figure out how to not GC their latest images
    - possibly use multiple tags for the purpose of defeating GC

- Support for loose barriers
- Support for deadends
- Support for rollout windows

- As a user, not deriving container images
  - things should work as today (no user-facing changes)
- As a user, deriving container images, but no need for more control over rollout/etc
  - user does not need to create their own JSON (accessible via https)?
  - the client IDs the base FCOS version that is the target of the current upgrade
  - find the related container image in the user's defined repo via tags/labels?
- As FCOS, scheduling a rollout
- As FCOS, stopping a rollout
  - json file on the side to stop rollouts
  - also possibly control deadends/barriers
    - denylist upgrading from 35.*
    - deadend 36.20230111.1.0
- by default update to latest
- no update server
- put in place some policy for timing out systems to update
- generation based "{fedoraver}-{fcos-ver} > 37.1.2.3"

# Related topics

https://major.io/p/watchtower/ for container images

## Current controls that exist

- server side controls
  - deadends
  - barriers
  - rollout window start/duration
- fleet controls
  - reboot coordination
- client side controls
  - periodic window (control when reboots happen and updates are applied)
  - systemd reboot inhibitors also block rebooting for updates (and other reasons)
- checks
  - Zincati doesn't rerun an upgrade that's been rolled back

## Proposed Future Controls

- server side controls
  - deadends
  - barriers
  - rollout window start/duration
- client side controls
  - periodic window (control when reboots happen and updates are applied)
  - systemd reboot inhibitors also block rebooting for updates (and other reasons)
  - rollout window start/duration?
    - overriding rollout to upgrade immediately
  - require that if we stop a rollout the client nodes don't downgrade if they had already upgraded

## Lightweight layering (containers, git repos, tarballs?)

- https://github.com/containers/bootc/issues/22
- layers are lightweight files and get applied client side
- OS updates still follow canonical locations for the OS updates
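
Returning to the rollout window start/duration control and the local wariness value mentioned earlier, one way a client could evaluate a rollout window without an update server is sketched below. The linear ramp, the function names, and the `override` flag (for "overriding rollout to upgrade immediately") are assumptions made for illustration; the notes do not specify the exact rule. Times are plain seconds and `wariness` is taken to be a per-node value between 0.0 (eager) and 1.0 (cautious).

```python
def rollout_fraction(start: float, duration: float, now: float) -> float:
    """Fraction of the fleet allowed to update at time `now`.

    Assumes a linear ramp from 0.0 at `start` to 1.0 at `start + duration`.
    """
    if duration <= 0:
        # A zero-length window opens fully as soon as it starts.
        return 1.0 if now >= start else 0.0
    return min(1.0, max(0.0, (now - start) / duration))


def should_update(start: float, duration: float, now: float,
                  wariness: float, override: bool = False) -> bool:
    """Decide locally whether this node may take the rollout yet."""
    if override:
        # Client-side override: skip the window and upgrade immediately.
        return True
    if now < start:
        return False
    return wariness <= rollout_fraction(start, duration, now)
```

Because the decision only ever answers "may I move forward to this version", removing a rollout from the guidance stops further upgrades without asking already-upgraded nodes to downgrade.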