#20210511 Networking + MCO
This turns out to be a bigger problem than we originally anticipated. The scenario
here is that you have a cluster that was installed some time ago. After the initial
install, you realize you need some slight networking tweak (like an added static route).
You add this static route to existing nodes via the MCO (i.e., by writing out
`/etc/sysconfig/network-scripts/route-bond0`).
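For concreteness, such a machine config might look roughly like the sketch below;
the MC name, role label, and route contents here are made up for illustration, and
the Ignition spec version should match what your cluster supports.

```yaml
# Illustrative sketch only: a MachineConfig that writes route-bond0 as a file.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-route-bond0        # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/sysconfig/network-scripts/route-bond0
          mode: 0644
          overwrite: true
          contents:
            # URL-encoded file body, e.g. "192.168.200.0/24 via 10.0.0.1 dev bond0"
            source: data:,192.168.200.0%2F24%20via%2010.0.0.1%20dev%20bond0%0A
```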
Later you try to deploy a new worker node. Since the `route-bond0` file gets
written out by Ignition in the initramfs, the initramfs network propagation
code sees networking configuration that was written by Ignition and decides not
to propagate the initramfs networking configuration (provided via kernel args)
to the real root. So you can't deploy new nodes.
We'd like not to have this problem in the future, so we are discussing it more widely
amongst our teams. For now there are a few possible workarounds that shouldn't
require manually running commands on individual nodes. Of course, manually running
commands on individual nodes is still an option if you prefer. Here is what we came up with:
A. Write a systemd unit that checks for the existence of the `route-bond0` file
   and creates it if it doesn't exist (a sketch of the MachineConfig and the
   commands follows this list).
- The systemd unit is delivered as a machine config.
- The systemd unit runs in the real root before NetworkManager is started.
   - You'll also remove the `route-bond0` file entry from the existing machine config.
- This works around the check for networking configuration that happens
at the end of the initramfs (allows karg networking to persist).
   - To minimize disruption (fewer reboots), the best way to do so is:
- pause the corresponding machineconfigpool
- delete the MC with the file entry
- add the new MC with the systemd unit
- unpause the machineconfigpool
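Here's a sketch of what option A could look like, reusing the hypothetical route
and MC names from the example above. The unit is a oneshot that runs in the real
root, is ordered before NetworkManager, and is skipped when the file already exists:

```yaml
# Illustrative sketch only: a MachineConfig delivering the systemd unit.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-create-route-bond0   # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    systemd:
      units:
        - name: create-route-bond0.service
          enabled: true
          contents: |
            [Unit]
            Description=Create route-bond0 if it does not exist
            # Run before NetworkManager brings up networking in the real root.
            Before=NetworkManager.service
            # Skip entirely when the file is already present.
            ConditionPathExists=!/etc/sysconfig/network-scripts/route-bond0

            [Service]
            Type=oneshot
            ExecStart=/bin/sh -c 'echo "192.168.200.0/24 via 10.0.0.1 dev bond0" > /etc/sysconfig/network-scripts/route-bond0'

            [Install]
            WantedBy=multi-user.target
```

And the pause/swap/unpause sequence, assuming the MC names from these sketches:

```sh
oc patch machineconfigpool worker --type merge -p '{"spec":{"paused":true}}'
oc delete machineconfig 99-worker-route-bond0
oc apply -f 99-worker-create-route-bond0.yaml
oc patch machineconfigpool worker --type merge -p '{"spec":{"paused":false}}'
```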
B. Machine Config Pool Musical Chairs: the idea here is to move all existing nodes
into a new custom machineconfigpool, which will have the `route-bond0` MC entry.
New nodes joining the cluster can then boot into the worker pool without that MC,
such that the karg-provided networking configuration gets propagated. The new
nodes can then also be moved into the custom pool to add `route-bond0`.
- This doesn't require any new scripts, but has more steps.
   - This requires you to move new nodes to the custom pool when you boot them.
- To minimize disruption, the best way to do so is:
      - Create a custom machineconfigpool (henceforth named custom1; see the
        example after this list).
- Add the same MC with `route-bond0` to the custom pool.
- Move all current worker nodes into the custom pool, by adding the
custom1 role label.
- Remove the MC with `route-bond0` from the worker pool.
- When you join a new node to the cluster, join it as worker, and
then move to custom pool.
- For details see https://github.com/openshift/machine-config-operator/blob/master/docs/custom-pools.md
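As a sketch following the pattern in that doc, the custom1 pool could look like
the below; the machineConfigSelector pulls in both worker and custom1 MCs so that
nodes in the custom pool keep the base worker configuration:

```yaml
# Illustrative sketch only: a custom pool for nodes labeled custom1.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: custom1
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, custom1]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/custom1: ""
```

Assuming the hypothetical `99-worker-route-bond0` MC from earlier, one way to do
the MC and node moves is:

```sh
# Relabel the route-bond0 MC out of the worker pool and into custom1
# (this covers both the "add to custom1" and "remove from worker" steps).
oc label machineconfig 99-worker-route-bond0 \
    machineconfiguration.openshift.io/role=custom1 --overwrite
# Move a node (existing, or newly joined as worker) into the custom pool.
oc label node <node-name> node-role.kubernetes.io/custom1=
```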