###### tags: `cloud-init` `networking`
# Network Hotplug support
Several scripts and programs[1][2] currently use udev hooks to invoke
programs that query the instance metadata server for updated network
configuration, process that data, and ultimately update the instance network
configuration.
I think cloud-init can provide some abstractions so each distro doesn't
need to include cloud-specific hooks for fetching metadata. In particular,
cloud-init already knows which cloud it's running on and how to read the
metadata, and in some cases cloud-init is already parsing network metadata and
rendering network configs on a per-distro basis.
As a first step, I think we can consolidate the existing knowledge into a few
cloud-init commands. We can bike-shed the subcommand names and parameters
as needed.
1. https://github.com/lorengordon/ec2-net-utils
2. https://github.com/SUSE/Enceladus/tree/master/cloud-netconfig
cloud-init refresh-metadata
---------------------------
> [name=Scott Moser]I would not expose this command. obj.pkl is internal state, there is no reason that someone would want to "refresh internal state" other than to subsequently "apply network", and no reason to "apply network" from invalid state. So... just make 'update-config' hit the md. Note, there is now /run/cloud-init/instance-data.json, but it currently doesn't include network config anyway.
> [name=Ryan Harper]I agree that the network-config update would implicitly go refresh before applying. I would certainly like our instance data to include all metadata that's available. w.r.t the obj.pkl; we can replace that with "determine the current datasource and refresh metadata"
> [name=Chad Smith] I could see us wanting to publish a subcommand that is different from apply. I can see consumers wanting to refresh data, or whatever we call it, to get updated /run/cloud-init/instance.data for their scripts if they know that metadata is stale. I guess I'd vote for some granularity between refresh and apply somehow. Maybe update-config without a specific --mode param could just re-crawl data.
> [name=Scott Moser] At best this is unrelated to network hotplug config.
> [name=Ryan Harper] While it may be unrelated, I don't see why we wouldn't expose it to enable other uses, even outside of cloud-init.
refresh-metadata will examine the current instance datasource and re-read the
vendor/user metadata and update the on-disk cache of this information. This is
roughly approximated by the following:
```
from cloudinit import stages

# Load the cached object, re-crawl the datasource, and write the cache back.
cloudobj = stages._pkl_load("/var/lib/cloud/instance/obj.pkl")
cloudobj.datasource._get_data()
stages._pkl_store(cloudobj, cloudobj.paths.get_ipath_cur("obj_pkl"))
```
cloud-init update-config --mode=network
----------------------------------------
> [name=Scott Moser] I think I prefer 'apply network'. And I think blue should be the color for this shed.
> [name=Chad Smith] I agree that cloud-init update-config X(or apply network) should do all the right thing by itself so two commands wouldn't need to be run to refresh MD and then apply MD. I'm just wondering about the use case for just refreshing MD and not making changes to the system.
update-config --mode=network will load the current instance datasource, and
use the distro interface to apply the network configuration stored in the
cached metadata. This looks like:
```
from cloudinit import stages
# Load the cached object and hand its network config to the distro layer.
cloudobj = stages._pkl_load("/var/lib/cloud/instance/obj.pkl")
cloudobj.distro.apply_network_config(cloudobj.datasource.network_config)
```
The update-config mode parameter would initially support 'network' but can be
extended in the future for other configuration items, such as vendor and user
data with specific config modules. Some of the current config modules are
not yet safe to be called a second time with updated configurations. We would
introduce a way to tag which modules have been fixed and tested to be
idempotent and allow policies for controlling behavior.
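
One way the tagging and policy idea could look is sketched below. This is only an illustration: the registry, the `idempotent` decorator, the `may_rerun` helper, and the policy names are all hypothetical and do not exist in cloud-init.

```python
from typing import Callable, Dict

# Hypothetical registry of config modules verified as safe to run again.
IDEMPOTENT_MODULES: Dict[str, Callable[[dict], None]] = {}


def idempotent(name: str):
    """Tag a config-module handler as fixed and tested to be idempotent."""
    def register(handler: Callable[[dict], None]):
        IDEMPOTENT_MODULES[name] = handler
        return handler
    return register


def may_rerun(name: str, policy: str = "tagged-only") -> bool:
    """Decide whether a module may run a second time under a policy.

    Assumed policies: 'tagged-only' allows only tagged modules,
    'always' allows every module, 'never' blocks every re-run.
    """
    if policy == "always":
        return True
    if policy == "never":
        return False
    if policy == "tagged-only":
        return name in IDEMPOTENT_MODULES
    raise ValueError(f"unknown policy: {policy}")


@idempotent("network")
def handle_network(cfg: dict) -> None:
    """Placeholder handler; a real one would apply the network config."""
    return None
```

With a registry like this, `update-config --mode=X` could refuse to re-run a module that has not been tagged unless the vendor or user policy says otherwise.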
Putting these commands together to implement updating network configuration on
hotplug would be something like:
> [name=Scott Moser] It should just update. If you wanted to '--stale' or '--no-refresh', i'm fine with that, but I think default case is "do the right thing now".
> [name=Ryan Harper] For the implementation, I suspect you're right. I do want to build separate subcommands so they can be combined/controlled independently.
```
cloud-init refresh-metadata
cloud-init update-config --mode=network
```
This can be called from a udev hook, or some other script injection method.
Clouds may have various ways to signal to the instance that it needs to refresh
its metadata and apply network changes, whether or not they implement hotplug.
It also enables clouds to modify network config independently of adding or
removing devices.
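
A hook or script could drive the two steps in order and stop if the refresh fails, so that network config is never applied from stale state. The sketch below assumes the two subcommands proposed in this note (they are not existing cloud-init CLI); the `run_hotplug_steps` helper is hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Subcommands proposed in this note; they are not existing cloud-init CLI.
HOTPLUG_STEPS: List[List[str]] = [
    ["cloud-init", "refresh-metadata"],
    ["cloud-init", "update-config", "--mode=network"],
]


def run_hotplug_steps(
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> bool:
    """Run each step in order; stop at the first non-zero exit status."""
    for cmd in HOTPLUG_STEPS:
        if run(cmd) != 0:
            return False
    return True
```

The `run` parameter exists only so the sequencing can be exercised without cloud-init installed; by default it shells out with `subprocess.call`.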
Pushing the metadata refreshing and parsing into the datasource class allows
per-datasource control over things like retries and timeouts. Calling into
distro objects for applying network configurations allows per-distro
customizations as needed.
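
As a rough illustration of per-datasource control, retries and timeouts could be class attributes that each datasource overrides. The class names, attribute names, and the `refresh_metadata` method below are hypothetical, not the existing cloud-init datasource API.

```python
import time
from typing import Callable, Optional


class ExampleDataSource:
    """Hypothetical datasource with tunable refresh behavior."""

    refresh_retries = 3      # attempts before giving up
    refresh_timeout = 5.0    # seconds passed to each fetch
    retry_delay = 0.0        # seconds to sleep between attempts

    def __init__(self, fetch: Callable[[float], dict]):
        self._fetch = fetch
        self.network_config: Optional[dict] = None

    def refresh_metadata(self) -> bool:
        """Re-read metadata, retrying on OSError; True on success."""
        for _ in range(self.refresh_retries):
            try:
                metadata = self._fetch(self.refresh_timeout)
            except OSError:
                time.sleep(self.retry_delay)
                continue
            self.network_config = metadata.get("network_config")
            return True
        return False


class SlowCloudDataSource(ExampleDataSource):
    """A cloud whose metadata service lags can simply raise the limits."""

    refresh_retries = 10
    refresh_timeout = 30.0
```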
Beyond udev/hotplug
-------------------
Hooking cloud-init from udev has some limitations. The biggest is that a
device must be added to or removed from the system to trigger cloud-init to
refresh metadata and apply changes. Hotplug only allows changes to occur
alongside a device change, so a network configuration change on existing
devices is not something that hotplug can handle. Long-running daemons or
scripts break the design of udev and can block processing of other events.
Due to the nature of hotplug, it is not always possible to ensure that the
metadata service is updated before the event occurs on the system. That
necessitates some form of retry and timeout, which can further run afoul of
udev's design.
Long-term I do think we'd like some form of daemon which can (in no order of
importance at this time):
- accept cli commands asynchronously (decouple execution from request)
- listen to various sockets (tcp, unix, pipe, vsocket)
- support polling of resources for events
  - udev, url, socket, netlink
- cloud-init driven configuration
  - clouds and users can control configuration of this daemon
    indicating what to do for various events, including custom behavior
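
The first item, decoupling execution from request, could be approximated with a queue and a single worker thread. This is a minimal sketch under that assumption; the `CommandQueue` class and its handler names are hypothetical and say nothing about sockets, polling, or configuration.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple


class CommandQueue:
    """Hypothetical sketch: accept requests now, execute them later."""

    def __init__(self, handlers: Dict[str, Callable[[], object]]):
        self._handlers = handlers
        self._pending: "queue.Queue[str]" = queue.Queue()
        self.results: List[Tuple[str, object]] = []
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def submit(self, name: str) -> None:
        """Queue a command and return immediately."""
        if name not in self._handlers:
            raise KeyError(name)
        self._pending.put(name)

    def _drain(self) -> None:
        # Single worker: commands execute one at a time, in request order.
        while True:
            name = self._pending.get()
            try:
                outcome = self._handlers[name]()
            except Exception as exc:  # record the failure, keep the worker alive
                outcome = exc
            self.results.append((name, outcome))
            self._pending.task_done()

    def wait(self) -> None:
        """Block until every queued command has been executed."""
        self._pending.join()
```

A caller such as a udev hook would only call `submit()` and exit, leaving retries and timeouts to the worker.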
This is not a complete list of requirements, and it is already a lot of work to achieve.
Hotplug Hooking Details
-----------------------
In the meantime, I'd like to focus on the near-term changes we can make to
address some of the existing user-stories that we can satisfy with hotplug
hooks.
- common udev hook
  - ordered very late, allow other handlers to run first
  - check if cloud-init is finished booting (don't attempt to reconfigure during first boot)
  - check if cloud-init hotplug-hook is enabled (it may be disabled by vendor/user)
  > [name=Chad Smith] **cloud-init status --wait** blocks until complete and exits 0 on success
  > [name=Ryan Harper] That's a nice touch; in general though checking for result.json in /run/cloud-init is going to be faster than invoking cloud-init at this time. But that's an implementation detail.
  - invoke cloud-init subcommands
- datasources will need
  - methods for parsing metadata service network config and storing them
    in the 'network_config' attribute
  - method for refreshing some or all metadata
- distro class will need
  - method for restarting interfaces (ifup/down, etc)
  > [name=Scott Moser] This is 'bringup=True' in distro.apply_network_config.
  > [name=Ryan Harper] I saw that; that looks promising. It needs some work to help determine what the current OS uses for network management; on systemd-networkd systems there are no 'ifup' equivalent instructions for bringing up a single interface.
  - method for restarting networking daemons (service restart)
- cloudinit net module
  - support for primary interface designation
  - handle routing table rules (ip rule, priority)
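
The two checks the common udev hook makes before invoking cloud-init could be as cheap as two file-existence tests. In the sketch below, `result.json` under `/run/cloud-init` is the marker mentioned in the comments above; the disable-file path and the `should_run_hotplug` helper are assumptions made up for illustration.

```python
import os
from typing import Callable


def should_run_hotplug(
    run_dir: str = "/run/cloud-init",
    # Hypothetical marker a vendor or user could create to disable the hook.
    disable_file: str = "/etc/cloud/hotplug-hook.disabled",
    exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Return True only if boot has finished and the hook is not disabled."""
    # result.json appears once cloud-init has finished booting.
    if not exists(os.path.join(run_dir, "result.json")):
        return False
    if exists(disable_file):
        return False
    return True
```

The `exists` parameter is there so the logic can be checked without touching the real filesystem.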