---
tags: cloud-init
---
# cloud-init refresh-metadata
Cloud-init will grow a CLI subcommand to trigger a refresh of the current instance metadata. Cloud-init will examine the current instance datasource, re-read the vendor/user metadata, and update the on-disk cache of this information. This is crudely approximated by the following sequence:
```
def load_cloud_object(object_path=OBJ_PKL):
    # Unpickle the cached datasource/cloud object from disk.
    print('loading object %s' % object_path)
    return _pkl_load(object_path)

def read_metadata_service(cloud):
    # Ask the datasource to re-crawl its metadata service.
    print('Reading metadata service')
    return cloud._get_data()

def metadata_service_results(cloud):
    # Fetch the raw network-data results directly from the metadata service.
    print('Fetching network-data metadata')
    return ds_openstack.read_metadata_service(cloud.metadata_address)

def write_cloud_object(obj):
    # Persist the refreshed object back to the instance cache path.
    _pkl_store(obj, obj.paths.get_ipath_cur("obj_pkl"))
```
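As a rough sketch, the subcommand would tie those helpers together along the lines below; the function name and error handling are illustrative assumptions, not an existing cloud-init entry point.
```
def refresh_metadata(object_path=OBJ_PKL):
    # Sketch only: the proposed 'cloud-init refresh-metadata' flow.
    cloud = load_cloud_object(object_path)    # unpickle the cached datasource
    if not read_metadata_service(cloud):      # re-crawl vendor/user metadata
        raise RuntimeError('datasource returned no data on refresh')
    write_cloud_object(cloud)                 # update the on-disk cache
    return cloud
```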
# cloud-init modules --mode=update
Many cloud-init config modules are not yet ready to be invoked multiple times. Specifically, many of the 'per-instance' modules do not check whether they need to perform their actions again; for example, adding users will fail on a second invocation because the users already exist on the system. To help transition existing configuration modules to support being called multiple times (idempotency), we will introduce a new module mode: 'update'. The 'update' mode will rerun only those config modules that have been marked idempotent.
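A minimal sketch of how the 'update' mode could select modules; the `idempotent` attribute is an assumed marker, not part of the existing module API.
```
def modules_for_update(all_modules):
    """Sketch: return only config modules that declare themselves idempotent.

    Assumes each module object exposes a boolean 'idempotent' attribute;
    anything without the marker is skipped so that
    'cloud-init modules --mode=update' never re-runs code that cannot
    safely be invoked twice.
    """
    return [mod for mod in all_modules if getattr(mod, 'idempotent', False)]
```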
# cloud-init datasource notification polling metadata
Some clouds which provide metadata via URL support a long-polling mode which allows a program to block/sleep while waiting for an update to the contents of the URL. Not all clouds support this sort of mode in their metadata service. As clouds indicate they have support for long polling, the datasource will accept configuration indicating the URI where cloud-init can long-poll to receive notification that some or all of the instance metadata has changed. The long-poll URI needs to support several modes.
The first mode is 'simple'. The simple mode is effectively a flag that indicates to cloud-init that it can invoke 'refresh-metadata' and trigger any of the configured subsystem commands ('network', 'storage', 'modules'). The second mode allows datasources to map specific URIs to handlers. For example, a datasource might specify a URI for network configuration changes (http://169.254.169.254/openstack/2017-02-22/network_data.json) and that will only trigger cloud-init's network update mode.
Datasources will have a new method/attribute to provide the URL to long-poll for the built-in update types ('network', 'storage', 'modules'):
```
Datasource.get_configchange_uri(configtype)
    """
    Return a URI which the caller can read to obtain an updated
    configuration for 'configtype' configurations.

    Supported 'configtype' values: ['network', 'storage', 'modules']
    """
```
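A hypothetical datasource implementation of that method might look like the following; the URI table is invented for illustration, and only the OpenStack network_data.json URL comes from the example above.
```
class DataSourceExample(object):
    """Sketch of a datasource exposing per-configtype long-poll URIs."""

    # Hypothetical mapping; a real datasource would derive these from its
    # metadata_address and supported metadata API version.
    _configchange_uris = {
        'network': 'http://169.254.169.254/openstack/2017-02-22/network_data.json',
        'storage': None,   # this cloud offers no long-poll URI for storage
        'modules': None,
    }

    def get_configchange_uri(self, configtype):
        if configtype not in ('network', 'storage', 'modules'):
            raise ValueError('unsupported configtype: %s' % configtype)
        return self._configchange_uris[configtype]
```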
In the absence of a long-pollable URL, a vendor may opt to configure cloud-init to allow hotplug events to trigger not just a network or storage config update, but also a call to cloud-init modules --mode=update, which would apply config changes to modules that are updatable.
# cloud-init hotplug --nic/--storage
Clouds can provide vendor configuration to enable cloud-init to be called when the instance receives a udev/hotplug event because a NIC or a disk has been added or removed. Cloud-init would add a udev rule to invoke:
* cloud-init refresh-metadata (network metadata only if separate)
* cloud-init network --mode=update
The network --mode=update would extract the latest network configuration data from the provided metadata, render network configuration to disk, and optionally invoke distro tools to apply the updated network configuration.
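The hook behind that udev rule could be as small as the wrapper below; both subcommands are the ones proposed in this document, not existing cloud-init CLI entry points.
```
import subprocess

def on_nic_hotplug():
    # Sketch: refresh the cached metadata, then re-render and (optionally)
    # apply network configuration via the proposed subcommands.
    subprocess.check_call(['cloud-init', 'refresh-metadata'])
    subprocess.check_call(['cloud-init', 'network', '--mode=update'])
```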
# cloud-init update handlers
Configure cloud-init to update network settings on NIC hotplug, update storage settings on disk hotplug, and reconfigure modules if the datasource user or metadata changes. The 'hotplug' key will configure cloud-init to render a udev script for either the 'net' or 'block' Linux kernel subsystem. For each udev hook, a list of handlers is invoked. The values 'network', 'storage' and 'modules' are reserved for cloud-init to invoke its built-in handlers for updating network, storage and cloud-init config modules. Alternatively, users may supply a path to a program which will be exec'ed inside the handler instead.
Events are of type: ['boot', 'boot-new-instance', 'udev', 'metadata-change', 'user-request']
Policy:
* never: renders never
* per-instance: renders on boot-new-instance
* boot: renders on boot-new-instance, boot, boot-change
* boot-change: renders on boot if network metadata or devices changed
* always: renders on any event type (ideally only re-rendering if there is a change)
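One way to encode that policy table is a simple lookup, sketched below; the policy and event names come from the lists above, while the exact membership (e.g. treating a new instance as a change for the boot-change policy) is an assumption.
```
# Sketch: which events each rendering policy reacts to.  'boot-change' as an
# event here means a boot where the network metadata or devices differ from
# the cached copy.
POLICY_EVENTS = {
    'never': set(),
    'per-instance': {'boot-new-instance'},
    'boot-change': {'boot-new-instance', 'boot-change'},
    'boot': {'boot-new-instance', 'boot', 'boot-change'},
    'always': {'boot', 'boot-new-instance', 'boot-change',
               'udev', 'metadata-change', 'user-request'},
}

def should_render(policy, event):
    """Return True if the configured policy reacts to the given event."""
    return event in POLICY_EVENTS.get(policy, set())
```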
```
## updated/suggested 2018-06-13 smoser
updates:
  policy-version: [1]  # default to 1
  network:
    when: [never, per-instance, boot, boot-change, always]
    watch-url: http://..../
  storage:
    when:

update-handlers:
  hotplug:
    net: [network]
    storage: [storage]
  watchurl:
    datasource: [modules]
```
# enable nic hotplug to update network config and modules, don't watch datasource for changes
```
update-handlers:
  hotplug:
    net: [network, modules]
```
> [smoser] changed 'nic:' to 'net:'
# only watch datasource for changes, including network and storage
'datasource' is a special keyword which indicates that cloud-init will watch the datasource-specific URL for config changes. This may be the normal metadata URL (e.g. http://169.254.169.254/latest) or a cloud may have configured a specific URL inside the datasource for long polling.
The value is a list of handlers which are called with the updated configuration. The values 'network', 'modules' and 'storage' are markers for invoking cloud-init subcommands.
```
update-handlers:
  watchurl:
    datasource: [network, modules, storage]
```
# watch a custom url and use a custom handler on change
Configure a watch on a custom URL and call a custom program. When the URL returns data, it is passed to the configured binary for handling.
```
update-handlers:
  watchurl:
    http://myhost.internal/v1/config-users: [/usr/local/sbin/updateusers]
```
# configure nic hotplug to call a custom program
Upon NIC hotplug, invoke a custom program in the udev hook. This tool will run under the udev environment and will need to read event values from that environment.
```
update-handlers:
  hotplug:
    net: [/usr/local/bin/my-network-config-changed]
```
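Tying these examples together, the hotplug dispatch could distinguish reserved handler names from executable paths roughly as sketched below; the built-in subcommand invocations are the ones proposed here, not existing CLI.
```
import os
import subprocess

# Reserved handler names map to the proposed cloud-init subcommands; any
# other value is treated as a path to a program run inside the udev hook.
BUILTIN_HANDLERS = {
    'network': ['cloud-init', 'network', '--mode=update'],
    'storage': ['cloud-init', 'storage', '--mode=update'],
    'modules': ['cloud-init', 'modules', '--mode=update'],
}

def dispatch_hotplug_handlers(handlers, udev_env):
    """Sketch: run each configured handler for a hotplug event."""
    for handler in handlers:
        if handler in BUILTIN_HANDLERS:
            subprocess.check_call(BUILTIN_HANDLERS[handler])
        elif os.access(handler, os.X_OK):
            # Custom programs read event details from the udev environment.
            subprocess.check_call([handler], env=udev_env)
        else:
            raise ValueError('unknown update handler: %s' % handler)
```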
# Triggering updates outside of hotplug
Without a metadata server with polling support, clouds will need to provide some mechanism to trigger cloud-init to refresh-metadata or update-config. If configured, on hotplug (or unplug) clouds can have cloud-init refresh all metadata and apply changes to network, storage and modules; however, when a change is not accompanied by a device change (updating an IP address, modifying NTP settings, etc.), a general notification mechanism is needed.
# Existing cloud-aware hotplug/netconfig tools
* https://github.com/SUSE/Enceladus/tree/master/cloud-netconfig (SUSE support in EC2/Azure)
* https://github.com/lorengordon/ec2-net-utils/tree/master/ec2-net-utils (Amazon Linux)
----------------------
OLD
# Cloud-init hotplug/event network config
Initial configuration for configuring event-channel
---------------------------------------------------
# ip-based (for clouds without an in-band channel)
```
reporting:
  'event-channel': {
    'level': 'DEBUG',
    'type': 'event-channel',
    'endpoint': 'http://169.254.169.254/events',
    'channels': ['/custom/foobar'],
  }
```
# ip-based (for clouds without an in-band channel)
```
reporting:
  'event-hook': {
    'level': 'DEBUG',
    'type': 'event-hook',
> [name=Chad Smith] Bikeshed alert: reporting makes me think of logging...
> Though I can't think of a better name. It seems reporting key 'event-hook'
> and type are duplicated in all reporting types. Is that necessary?
>
> [name=Ryan Harper] it's related in that we can publish the existing
> cloud-init events over the channel; The top-level key is arbitrary,
> the type field determines the handler in cloud-init; Channel based
> communication is bi-directional; event-hook is more of a callback;
> Clouds which don't or won't listen to a result (say like pci hotplug
> is today; the hypervisor injects the event but has no way of confirming
> that it happened); The event-hook can be configured during first boot
> telling cloud-init where to find updated metadata for particular events.
>
    'endpoint': 'http://169.254.169.254/events/hotplug/disk/serial/config',
    'channels': ['/instance-xz/hotplug/disk/serial/config'],
  }
```
# in-band (for hypervisors with VMCI/vsock capability)
```
reporting:
  'event-channel': {
    'level': 'DEBUG',
    'type': 'event-channel',
    'endpoint': 'vsock://events',
    'channels': ['/custom/foobar'],
  }
```
The reporting cloud-config will generate configuration for a
socket-activated event-channel handler.
```
class EventChannelHandler():
    """New Reporting Handler which accepts event messages and dispatches
    them to the appropriate handler, while also publishing events as they
    occur within cloud-init via the existing Reporting/Events methods.
    """
```
StandAloneHandler
- socket activated
- supports AF_UNIX, AF_INET, AF_INET6, AF_VSOCK
Cloud-init default channel subscriptions:
* event-notification/handling
  * /cloudinit/<instance_id>/hotplug/{network, block}
  * /cloudinit/<instance_id>/config/{get,set}/{<modulename>}
* boot-time event messages
  * /cloudinit/<instance_id>/init-local/{modulename}
  * /cloudinit/<instance_id>/init-network/{modulename}
  * /cloudinit/<instance_id>/modules-config/{modulename}
  * /cloudinit/<instance_id>/modules-final/
Management Server hotplugs a NIC into an instance
-------------------------------------------------
After injecting the hardware hotplug event (ACPI),
Linux guests with a udev event handler will capture
details from the udev event and forward them to the
local cloud-init socket as a message broadcast to the
instance. cloud-init will format and publish a
message over the hotplug channel.
```
[
{
"channel": "/cloudinit/instance-abc/hotplug/network",
"clientId": "d92f88b5-487d-473b-880a-96ce4454e012",
"data": { "subsystem": "net", "action": "add", "attrs": { "address": "00:11:22:33:44:55" } },
}
]
```
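A sketch of how the guest side could build that broadcast from udev values; the field names mirror the example payload above and none of this is an existing cloud-init API.
```
import json

def hotplug_message(instance_id, client_id, subsystem, action, address):
    """Sketch: format the hotplug broadcast shown above."""
    return json.dumps([{
        'channel': '/cloudinit/%s/hotplug/network' % instance_id,
        'clientId': client_id,
        'data': {
            'subsystem': subsystem,          # e.g. 'net'
            'action': action,                # 'add' or 'remove'
            'attrs': {'address': address},   # MAC address from the udev event
        },
    }])
```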
When the mgmt server listening to the instance hotplug channel receives the
message, this is an acknowledgement that the hardware event has occurred.
It may optionally validate that the information matches what was injected. The
mgmt server will now push a config change to configure the newly attached
network device by publishing a message to the 'config/set/network' channel
(prepending cloudinit/<instance_id>), specifying a new network configuration*
```
[
  {
    "channel": "/cloudinit/instance-abc/config/set/network",
    "clientId": "d92f88b5-487d-473b-880a-96ce4454e012",
    "data": {"network": {"version": 2, "ethernets": {"ens0": {
      "set-name": "ens0",
      "match": {"macaddress": "00:11:22:33:44:55"},
      "dhcp4": true}}}}
  }
]
```
* The management server is usually in the best position to determine
  what configuration has changed (or will change); therefore, instead of
  sending the entire configuration with the update embedded and leaving
  the client to do config-diff management, the client only needs to merge
  the delivered config with its existing settings and apply it.

When the client receives a config/set/network message, the
config/set/network subscription handler will consume the message and
apply the change to the system.
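A sketch of what that subscription handler might do with the payload; it assumes the cached cloud object exposes distro.apply_network_config() (as the POC later in this note uses) and leaves the merge with existing settings to the caller.
```
def handle_config_set_network(cloud, message):
    # Sketch: consume a config/set/network message and apply the change.
    netcfg = message['data']['network']       # netplan-style v2 config
    # Merging with existing settings is assumed to have happened already;
    # the bring_up behavior is an assumption about the distro helper.
    cloud.distro.apply_network_config(netcfg, bring_up=True)
```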
Management Side driven Password change
--------------------------------------
A user has requested that the Management server reset
a user's (possibly root, or otherwise) password. After
accepting the required input (username, etc.), the
server will publish a message to the config/set/cc_set_password
channel as follows:
```
[
{
"channel": "/cloudinit/instance-abc/config/set/cc_set_password",
"clientId": "d92f88b5-487d-473b-880a-96ce4454e012",
"data": {"chpasswd": {"list": ["root:$6$rL..$ej..."]}},
}
]
```
The instance receives the message and invokes the subscription
handler for cc_set_password with the data payload as input.
This re-runs the cc_set_password module, resulting in an updated
password and the following events published from the client:
```
[
{
"channel": "/cloudinit/instance-abc/modules-config/cc_set_password",
"clientId": "d92f88b5-487d-473b-880a-96ce4454e012",
"data": [{"description": "running config-set-passwords with frequency once-per-instance",
"event_type": "start",
"name": "modules-config/config-set-passwords",
"origin": "cloudinit",
"timestamp": 1477683507.146
}]
},
{
"channel": "/cloudinit/instance-abc/modules-config/cc_set_password",
"clientId": "d92f88b5-487d-473b-880a-96ce4454e012",
"data": [{"description": "config-set-passwords ran successfully",
"event_type": "finish",
"name": "modules-config/config-set-passwords",
"origin": "cloudinit",
"result": "SUCCESS",
"timestamp": 1477683507.148
}]
}
]
```
# /cloudinit/<instance-id>/<stage>/<event>
```
[
{
"channel": "/cloudinit/instance-abc/modules-config
"clientId": "d92f88b5-487d-473b-880a-96ce4454e012",
"data": [{
"description": "attempting to read from cache [check]",
"event_type": "start",
"name": "init-local/check-cache",
"origin": "cloudinit",
"timestamp": 1495494121.093
}]
},
{
"channel": "/cloudinit/instance-abc/modules-config
"clientId": "d92f88b5-487d-473b-880a-96ce4454e012",
"data": [{
"description": "no cache found",
"event_type": "finish",
"name": "init-local/check-cache",
"origin": "cloudinit",
"result": "SUCCESS",
"timestamp": 1495494121.094
}]
}
]
```
# Hotplug POC
I've put together an initial network hotplug POC which works with:
- Ubuntu Zesty image
- Cloud-init deb built from chad.smith-cloudinit/unify-datasource-get-data,
  but doesn't strictly require it, now that I know that 'networkdata' is
  available in the OpenStack metadata service
- A cloudinit-hotplug python program[A] which we'll likely want to integrate
  as a subcommand, which does the following:
  - load /var/lib/cloud/instance/obj.pkl
  - checks datasource for 'network_config' attr (and if it's not None)
  - calls datasource._get_data() method to pull and crawl metadata
  - extracts 'network_config' from the datasource
  - calls cloud.distro.apply_network_config()
  - exits
- This script is called from /lib/udev/ifupdown-hotplug and runs
  - before calling "ifup"
  - after calling "ifdown"
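A simplified reconstruction of that flow (not the actual paste[A]); the obj.pkl path and attribute names follow the description above.
```
import pickle

OBJ_PKL = '/var/lib/cloud/instance/obj.pkl'

def hotplug_update_network(object_path=OBJ_PKL):
    # Sketch of the cloudinit-hotplug steps listed above.
    with open(object_path, 'rb') as fp:
        ds = pickle.load(fp)                   # cached datasource object
    if getattr(ds, 'network_config', None) is None:
        return                                 # datasource exposes no netcfg
    ds._get_data()                             # re-crawl the metadata service
    ds.distro.apply_network_config(ds.network_config)  # render/apply config
```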
On an OpenStack bastion instance you can inject a second NIC like so:
- neutron port-create <name of one of your neutron subnets>
- capture PORT_UUID from the above command
- launch an instance in the same subnet where the port is available
- nova interface-attach --port-id $PORT_UUID $INSTANCE_UUID

In the image, after hotplug there should be an updated:
/etc/network/interfaces.d/50-cloud-init.cfg

And one can query how the interface came up once you know the name (say ens9):
systemctl status ifup@ens9

To unplug the NIC:
nova interface-detach $INSTANCE_UUID $PORT_UUID
TODOs
-----
1. Datasource classes should include a method for crawling their
   metadata; currently these are decoupled from the class itself
   (DataSourceOpenStack has a separate read_metadata_service method
   which is used inside _get_data() but is not available on the
   class itself). Ideally the hotplug code could generically call
   ds.crawl_metadata() and have that return the results dictionary
   (see the sketch after this list).
2. The DataSource class should allow for easier access to fetching
   network metadata; on OpenStack and EC2, for example, the network
   metadata is found at a specific URL, so instead of crawling *all* of
   the metadata we could fetch just the network metadata.
3. We likely need a mirror of this w.r.t storage
4. Datasources (OpenStack, for example) do not store the
   raw results of the metadata crawl; for example, 'networkdata' is
   present in the metadata service, but since its keys are not utilized
   when building the DataSourceOpenStack.metadata dictionary, it is not
   present at all.
5. Figure out reporting/event callbacks to post success
6. Attempt this hotplug hook on alternate clouds (like AWS for ipv6 addr)
A. http://paste.ubuntu.com/25516596/
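As a rough illustration of TODO 1, the crawl entry point might be shaped like this; the class and helper names below are placeholders, not existing cloud-init code.
```
class DataSourceWithCrawl(object):
    """Sketch of TODO 1: expose metadata crawling on the class itself."""

    metadata_address = 'http://169.254.169.254'

    def _read_metadata_service(self):
        # Placeholder for the cloud-specific reader (e.g. the module-level
        # read_metadata_service helper used by DataSourceOpenStack).
        raise NotImplementedError

    def crawl_metadata(self):
        """Return the raw crawl results so hotplug code can generically
        call ds.crawl_metadata()."""
        results = self._read_metadata_service()
        return {
            'metadata': results.get('metadata', {}),
            'userdata': results.get('userdata', b''),
            'networkdata': results.get('networkdata', {}),
        }
```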
General Issues to deal with
---------------------------
- Async Events
  Hotplug events may occur in parallel (or very nearly so), such that the
  "processing" hooks from a first event may not yet be complete before a
  second invocation of the processing hooks begins.
  We have the option of locking (and blocking) or somehow queueing subsequent
  events for processing. It is possible, via queue/dispatch, to handle hotplugs
  in parallel, but that may require some logical analysis of the updated
  metadata. For example, while not currently feasible in OpenStack, the
  updated network configuration may be a stacked/multi-NIC device, like a
  bridge over 3 NICs (which are hotplugged serially).
- Locking/Blocking
  In general, there shouldn't be a restriction on running in parallel since
  we're ingesting read-only metadata; however, it does imply that the
  processor of the metadata may need to "validate" the config found in the
  metadata to determine whether subsequent action can be taken, or whether it
  must be skipped. For example, if an updated configuration depends on a
  resource that's not yet available, cloud-init may defer any action.
  If cloud-init chooses to react, any actions that make modifications to the
  system will need to be protected and provide atomic updates.
Storage POC
-----------
Instead of modifying OpenStack layers to hack in metadata service updates
when attaching a cinder block, run an instance-local HTTP service which
will host a RESTful response at:
http://<local ip>/openstack/2017-09-12/storagedata
which would appear under the openstack metadata dict, at the same level that
'networkdata' does.
storagedata will be in MAAS's Storage v1 format, defined here[1]
(needs schema).
In general for each openstack "block" device, we'll emit:
- type: disk
  id: <block_uuid>
  path: /dev/vdc
  serial: <serial>
  model: <device model>
  size: 10737418240B
Optionally, storage config may include additional structures to define
what to *do* with the devices:
The canonical example[2] which injects, formats, and mounts would look
like:
storage:
  version: 1
  config:
    - id: <openstack block uuid>
      type: disk
      serial: <first 21 chars of block uuid>
      name: mysql_disk
    - id: <openstack block uuid>-fmt
      type: format
      fstype: ext4
      volume: <openstack block uuid>
    - id: <openstack block uuid>-mnt
      type: mount
      path: /mysql
      device: <openstack block uuid>-fmt
The storage config will be pulled and (for now) passed to
an invocation of:
curtin --config <storage cfg> block-meta custom
This will be missing a few things like:
- package deps install (do you have raid tools, the right mkfs)
- curtin block-meta needs some help for dir paths that don't exist when
  called directly (instead of via commands/install.py)
- only apply storage-config to items present (we already do this) but
  allow elements in the storage config to fail and continue to process.
  This allows a storage-config to apply, say, one disk config, and ignore
  config of a second whose disk is not yet present.
- idempotency in the case of re-runs:
  - we could look at the use of the preserve flag, which
    runs through the config but does not modify the devices
  - we need to see if the fstab line we want to append is already present
    (and skip it)
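A minimal sketch of the curtin hand-off described above; the temp-file handling is an assumption, and none of the open items in the list above are addressed here.
```
import json
import subprocess
import tempfile

def apply_storage_config(storage_cfg):
    # Sketch: write the fetched storage config to disk and hand it to curtin.
    with tempfile.NamedTemporaryFile('w', suffix='.yaml', delete=False) as fp:
        # curtin reads YAML config; JSON output is acceptable YAML here.
        json.dump({'storage': storage_cfg}, fp)
        cfg_path = fp.name
    subprocess.check_call(
        ['curtin', '--config', cfg_path, 'block-meta', 'custom'])
```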
## Discussion Notes:
* config options in cloud-init to allow cloud-init to subscribe and react to hardware changes for storage and networking
* config modules declare whether they are idempotent
* all idempotent modules will be re-run on metadata/vendordata recrawl
* require clouds to implement a long-poll endpoint (like GCE) to allow cloud-init to determine whether metadata/vendordata needs recrawling
* add a cloud-init sub-command alternative to inject **cloud-init event (network|storage|metadata)-(change|add|delete)**
## smoser ##
* network can get by fairly well with an "update-when" of "never", "per-instance", "boot-change", "boot", "always"
* Hotplug-related maintenance events describe the source generating the event, and datasources can set up masks to determine whether or not to react to a specific event class:
```
# Masks describing hotplug-related maintenance events; a datasource ORs
# these together to declare which event classes it reacts to.
class MaintenanceEvent(object):
    NONE = 0x0            # React to no maintenance events
    BOOT = 0x1            # Any system boot or reboot event
    DEVICE_ADD = 0x2      # Any new device added
    DEVICE_REMOVE = 0x4   # Any device removed
    DEVICE_CHANGE = 0x8   # Any device metadata change
    ANY = 0xF             # Match any defined MaintenanceEvents
```
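A short usage sketch of the mask idea using the class above; how a datasource would actually store its mask is an assumption.
```
# Sketch: a datasource opts in to device add/remove events only.
datasource_mask = MaintenanceEvent.DEVICE_ADD | MaintenanceEvent.DEVICE_REMOVE

def should_handle(event, mask=datasource_mask):
    """Return True when the incoming maintenance event is enabled by the mask."""
    return bool(event & mask)

# should_handle(MaintenanceEvent.DEVICE_ADD) -> True
# should_handle(MaintenanceEvent.BOOT)       -> False
```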