Suggestions for ways to avoid https://github.com/coreos/fedora-coreos-tracker/issues/1608 in the future:

- Restart Zincati periodically
  - Allows the process to get out of any stuck state it may be in
  - I think there have been at least two issues where this would have helped
  - Should have almost no risk and no cost
- Switch Zincati to a periodic systemd timer
  - Instead of having a permanently running background daemon, use a systemd timer to trigger Zincati checks at a regular interval
  - DWM: One problem with this approach may be the periodic timer logic for finalizing and rebooting into the update.
    - TR: The timer would still be triggered every 5 minutes by default, which should cover this case.
  - JL: I think this would require a rework of Zincati, and... we don't have much Zincati expertise currently. We would also need to sanity-check that it meshes well with other update strategies like fleet_lock. So overall, there are definite risks of regressions in trying to do this.
- Prepare Zincati for the container-first workflow
  - Something we need to do anyway
  - JL: That doesn't necessarily address this specific issue. The leak happened in sd_notify, which Zincati would presumably still call.
- Bake Zincati in `next` first
- Add monitoring to the persistent systems our team uses
  - build nodes
  - archive-repo-manager