Suggestions for ways to avoid https://github.com/coreos/fedora-coreos-tracker/issues/1608 in the future:
- Restart Zincati periodically
  - Allows the process to get out of any stuck state it may be in
  - I think there have been at least two issues where this would have helped
  - Should have almost no risk / no cost
- Switch Zincati to a periodic systemd timer
  - Instead of having a permanently running background daemon, use a systemd timer to trigger Zincati checks at a regular interval
  - DWM: One problem with this approach may be the periodic timer handling for finalizing the update and rebooting.
    - TR: The timer would still be triggered every 5 minutes by default, which should cover this case.
  - JL: I think this would require a rework of Zincati, and we don't have much Zincati expertise currently. We would also need to sanity-check that it meshes well with other update strategies like fleet_lock. So overall, there are definite risks of regressions in trying to do this.
- Prepare Zincati for the container-first workflow
  - Something we need to do anyway
  - JL: That doesn't necessarily address this specific issue. The leak happened in sd_notify, which Zincati would presumably still call.
- Bake Zincati in the next stream first
- Add monitoring to the persistent systems our team uses
  - build nodes
  - archive-repo-manager
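
For the periodic restart suggestion, one possible shape is a systemd drop-in that caps how long the service may run before systemd restarts it. This is only a sketch: the drop-in file name and the 24h interval are assumptions, not something agreed on in these notes, and we would still want to check how a forced restart interacts with an update that is in progress.

```ini
# /etc/systemd/system/zincati.service.d/10-periodic-restart.conf
# Hypothetical drop-in: the file name and the 24h value are placeholders.
[Service]
# Terminate the service once it has been active for this long.
RuntimeMaxSec=24h
# Restart no matter how the service stopped, so hitting the
# runtime limit above leads to a fresh start.
Restart=always
```

A drop-in like this takes effect after `systemctl daemon-reload` and a restart of zincati.service. It leaves the permanently running daemon model in place, so it avoids the rework concerns raised for the timer approach.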