# CNI meeting notes
_note_: the notes are checked in to https://github.com/containernetworking/meeting-notes after every meeting
An editable copy is hosted at https://hackmd.io/jU7dQ49dQ86ugrXBx1De9w. Feel free
to add agenda items there
Time: 10AM EST, bi-weekly Monday
## 2025-11-24
- Casey, Tomo, and Marcelo on vacation. No meeting.
## 2025-11-10
- Konstantinos joining the team from Red Hat
- Review:
- https://github.com/containernetworking/plugins/pull/1210 -- nftables fix. Will need to apply for a CVE once released.
- https://docs.github.com/en/code-security/security-advisories/working-with-repository-security-advisories/about-repository-security-advisories
- https://github.com/containernetworking/cni/pull/1173
- Whoops, we need to bump to golangci-lint v2.
## 2025-10-27
- [EC] bug with nftables host-port (https://github.com/containernetworking/plugins/issues/1209)
- fix: https://github.com/containernetworking/plugins/pull/1210/files
- problem with abandoned nft chains
- solution: a new chain (`cni_hostports`?), and delete the old entries if present. In the next release, we can remove that cleanup code.
- [cdc] Still thinking about safer DEL.
## 2025-10-13
- [cdc] Owing to illness, may be absent
- [Lionel] Demo CNI-DRA-Driver
- [cdc] working on safer DEL. First pass: https://github.com/containernetworking/cni/compare/main...squeed:shortcut-error-codes?expand=1
- JSON key: `"effectState": "none | container | global"`
- if ADD fails with effectState: none, then DEL is optional
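The `effectState` rule can be sketched from the runtime's side. This is a minimal Go sketch, not the code on the draft branch: it assumes the plugin's error JSON carries the proposed `effectState` key next to the spec's `code`/`msg`/`details` fields, and that the runtime keeps the parsed ADD error around until it decides whether to call DEL. `pluginError` and `delRequired` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pluginError mirrors the CNI error result (code, msg, details) plus the
// proposed, hypothetical effectState key.
type pluginError struct {
	Code        uint   `json:"code"`
	Msg         string `json:"msg"`
	Details     string `json:"details,omitempty"`
	EffectState string `json:"effectState,omitempty"` // "none" | "container" | "global"
}

// delRequired reports whether the runtime must still call DEL after a failed ADD.
// Only an explicit "none" (nothing was created anywhere) makes DEL optional; a
// missing or unknown value is treated conservatively: DEL is still required.
func delRequired(addErrJSON []byte) (bool, error) {
	var e pluginError
	if err := json.Unmarshal(addErrJSON, &e); err != nil {
		return true, fmt.Errorf("unparseable ADD error, assuming DEL is required: %w", err)
	}
	return e.EffectState != "none", nil
}
```

Defaulting to "DEL required" keeps today's behavior for plugins that never set the key.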
## 2025-09-29
- PR triaging
- Casey trying to join, internet is acting up
- Containerd leaks https://github.com/containerd/containerd/issues/12130
## 2025-09-16
- Lionel, Marcelo and Tomo joined at 11AM EST...
- Did Casey start the call at 10AM EST?
- As far as we can see in the CNI Google Calendar, the start time is unchanged: 11AM EST.
- Casey, if you change the meeting time, please update the Google Calendar as well.
- Sorry, was stuck on a train!
## 2025-09-01
- sent new meeting invite to CNI-DEV: https://groups.google.com/g/cni-dev
- Every other Tuesday
- would like to move 1 hr earlier - does that work for Mike Z?
- PR review
- https://github.com/containernetworking/cni/pull/1168
- https://github.com/containernetworking/cni/pull/1169
- bump plugins to v1.25: https://github.com/containernetworking/plugins/pull/1201
- https://github.com/containernetworking/cni.dev/pull/134
- Casey wonders: are the Windows plugins being used?
- they are referenced in the k8s documentation
- We will ask Mike to reach out
- We cut v1.8.0 :partying_face:
## 2025-08-25
- Should we cut a release? https://github.com/containernetworking/plugins/compare/v1.7.1...main
- merged unreleased changes: bridge VLAN changes.
- Marcelo to file docs PR for bridge changes
- Casey to bump Go version
- DRA update:
- DRA is GA tomorrow
- Meeting scheduling:
- Switching to 1st and 3rd Monday of the month
- Would like to move 1 hr earlier, if Mike is OK
- Keep at 11:00 Eastern for now.
## 2025-08-18
- Consider moving to bi-weekly?
- Pushing on containerd's GC implementation
- Casey to write up from Cilium's perspective
- Discuss PR https://github.com/containernetworking/plugins/pull/1195, adding a bridge uplink
- This adds a "physical" interface to the bridge
- Q: is this in the plugin's wheelhouse? or should this be NMState (et al.)?
A: This seems in scope, so it's not out of the question
- Q: This is vlan-specific. What happens if there are no vlans?
- Q: What is the intended behavior?
- Marcelo to write up these questions to the submitter.
## 2025-07-28
- cdc on vacation Aug 04, Aug 11
## 2025-07-21
- PR reviews
- GC progress in containerd?
- cri-o does GC at startup: https://github.com/cri-o/cri-o/pull/8245/files
- we do lots of PR hygiene
## 2025-07-14
Regrets: Casey (Childcare snafu)
## 2025-07-07
The deadline for the KubeCon November maintainer track is approaching.
We discuss the ownership of Multus, now that Tomo and Doug have wandered off to more ecxAIting work.
We would like to focus on gRPC. Whether or not this is determined to be CNI 2.0 is somewhat unimportant.
Open question: did containerd add GC support?
## 2025-06-30
We discuss the CNI-DRA driver. Lionel mentions that determining the initial set of resources is difficult.
Question: should we add a new informational verb? Or should we expose them statically in the CNI configuration file?
Challenge: some resources, most notably the node's uplink, may be shared between, say, a macvlan and ipvlan device.
## 2025-06-23
Sebastian and Casey discuss DRA.
## 2025-06-16
- regrets: tomo
- [danwinship] returning errors from DEL
- [danwinship] did people actually talk about https://github.com/containernetworking/cni/issues/1162 (exec from container image) last week? I also added some thoughts to https://github.com/containernetworking/cni/issues/821#issuecomment-2944552357 (CNI 2.0 daemonization). Maybe if we try to solve a simpler problem we can make some progress on this...
## 2025-06-09
- regrets: casey
- [Doug] quick news (<5 minutes)
- Doug starting a new role with new upstreams! (especially: vLLM)
- https://github.com/containernetworking/cni/issues/1162 (just look at this one)
## 2025-06-02
- CI fixed (yay)
- let's do some reviews
- https://github.com/containernetworking/plugins/pull/1180, https://github.com/containernetworking/plugins/pull/1181
- https://github.com/containernetworking/cni/pull/1152/
- https://github.com/containernetworking/cni.dev/pull/147
- https://github.com/containernetworking/cni.dev/pull/146
- https://github.com/containernetworking/cni.dev/pull/138
- Why do we disable DAD (duplicate address detection)?
- we assume IPAM already prevents duplicate addresses, and DAD is slow
- https://github.com/containernetworking/plugins/pull/695
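For context, on Linux DAD is controlled per interface by the `accept_dad` sysctl (0 disables it). Below is a minimal Go sketch of that knob. It illustrates the sysctl only and is not a copy of what the plugins or PR 695 do; `dadSysctlPath` and `disableDAD` are hypothetical helpers, and the write must happen inside the target network namespace with enough privileges.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// dadSysctlPath returns the procfs path of the per-interface IPv6 accept_dad knob.
func dadSysctlPath(ifName string) string {
	return filepath.Join("/proc/sys/net/ipv6/conf", ifName, "accept_dad")
}

// disableDAD turns off IPv6 duplicate address detection for ifName by writing
// "0" to accept_dad, so new addresses do not wait in the tentative state.
// It must be called from within the interface's network namespace.
func disableDAD(ifName string) error {
	if err := os.WriteFile(dadSysctlPath(ifName), []byte("0"), 0o644); err != nil {
		return fmt.Errorf("disable DAD on %q: %w", ifName, err)
	}
	return nil
}
```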
## 2025-05-26
- regrets: tomo
## 2025-05-19
- Plan to fix the "doomed delete" problem:
1. List certain ADD error codes as "nothing was created"
2. Cache when an ADD fails with this error
3. On DEL, if the DEL fails for that specific plugin, swallow the error.
- additionally, flag certain DEL error codes as "deletion failed, but all resources are in the namespace".
- Discuss the confusing vlan situation: https://docs.google.com/document/d/1fHrZ6f2Syaq-jiS1pdTXRYxY29mu4dYakVCqoMZtS4U/edit?addon_store&tab=t.0
- proposal: `vlans` sub-field that disables all other vlan behavior when set
- `{"vlans": {"untagged": 1234, "tagged": [5, 6, 7]}}`
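A minimal Go sketch of how the proposed `vlans` sub-field might be parsed, with the rule that setting it overrides the older VLAN behavior. The `untagged`/`tagged` keys come from the example above; `netConf`, `effectiveVLANs`, and treating the legacy `vlan` value as the untagged VLAN are assumptions for illustration, not the bridge plugin's actual types.

```go
package main

import "encoding/json"

// VLANs is the proposed sub-field: one untagged VLAN plus a list of tagged VLANs.
type VLANs struct {
	Untagged int   `json:"untagged,omitempty"`
	Tagged   []int `json:"tagged,omitempty"`
}

// netConf is a hypothetical, trimmed-down bridge config with one legacy knob
// (vlan) and the proposed vlans sub-field.
type netConf struct {
	Vlan  int    `json:"vlan,omitempty"`
	VLANs *VLANs `json:"vlans,omitempty"`
}

// effectiveVLANs applies the proposed rule: when "vlans" is set, it alone
// decides the VLAN setup and the legacy field is ignored.
func effectiveVLANs(raw []byte) (VLANs, error) {
	var c netConf
	if err := json.Unmarshal(raw, &c); err != nil {
		return VLANs{}, err
	}
	if c.VLANs != nil {
		return *c.VLANs, nil
	}
	// Fallback (assumption): legacy "vlan" behaves like an untagged VLAN.
	return VLANs{Untagged: c.Vlan}, nil
}
```

Using a pointer for `VLANs` lets the parser distinguish "not set" from "set but empty", which is what the "disables all other vlan behavior when set" rule needs.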
## 2025-05-12
- regrets: cdc, tomo
- [zappa] containerd: only passing in a subset of labels
- everyone's querying for it
- looking into a single line change to instead pass all of the labels
- any issues?
- [Doug] sgtm, I need to check what's up on the crio side, and how we're leveraging it.
- There are limitations: a pod spec can only be a meg in etcd (configurable), and there's also a kernel limit, which is also a meg.
- https://github.com/containernetworking/plugins/pull/1175
- Still pending some other options
## 2025-05-05
- [zappa] We have an issue where the metaplugin fails and then the runtime keeps allocating an IP. What can we do on the CNI side to stop the bleeding here?
- We discuss this for some time. The conclusion:
- It is not safe to ignore an error on delete, as stale resources (e.g. IP allocations and firewall rules) may be reused. We need to delete *both* or neither.
- If there is a failure on ADD, we don't know if any resources were created, so a DEL is required
- STATUS is the immediate fix -- if a chained plugin is unavailable, it should fail STATUS
- TODO
- Write best-practices document
- File issue adding "no non-namespaced resources were added" error code set
- This would be used by runtimes to know that certain error corner cases could be skipped, e.g. failed ADD
- "Don't sweep the floor, I'm about to tear down the building!" h/t: [raymond chen](https://devblogs.microsoft.com/oldnewthing/20120105-00/?p=8683)
## 2025-04-28
- [cdc] Tagged plugins v1.7.0 (and v1.7.1, oops!)
- lesson learned: don't do `git push --tags`, create tag via releases page.
- [cdc] Thanks, Lionel, for dealing with the maintainers DBs
- anything else outstanding?
- [mlguerrero12] PR for review: https://github.com/containernetworking/plugins/pull/1175
## 2025-04-21
- [Doug] Show and tell
- Krang, a K8s-enabled CNI runtime.
- https://asciinema.org/a/DSNTIQIg5VM2mGh5oFK7YlGUW
## 2025-04-14
- https://github.com/containernetworking/plugins/pull/1168
- merged!
- https://github.com/opencontainers/runtime-spec/pull/1271
- CNI DRA Driver:
- Validation: https://github.com/containernetworking/cni/issues/1132
- Scheduling?
- update MAINTAINERS: https://github.com/containernetworking/cni/pull/1157
- push some people to Emeritus status
- DEL wording: https://github.com/containernetworking/cni/pull/1156
## 2025-04-07
- [Doug] v1.2.4 release for CNI?
- [Commits since v1.2.3](https://github.com/containernetworking/cni/compare/v1.2.3...main)
- Fairly minor, but, I'd like the safe subdirectory loading changes, please and thanks.
## 2025-03-24:
- time for plugins release
- fixing netlink
- https://github.com/containernetworking/plugins/pull/1154, 1156
- Go 1.24
## 2025-03-17:
- continue {DRA + NRI} CNI substitution
- casey's idea for "merging" CNI and DRA:
0. We give up chaining?
- We don't necessarily have to give up anything; can we make the simple case simpler, and the complicated case possible?
1. NRI returns IP addresses
2. "infinite" / "virtual" / "dynamic" device creation (i.e. how do you represent a fully virtual, "free" device to the scheduler) ([kep 5075](https://github.com/kubernetes/enhancements/issues/5075))
3. Some kind of formalized Primary Network
We discuss replacing CNI with NRI. We also discuss whether or not DRA needs to be involved at all.
## 2025-03-10:
- regrets:
- Tomo
- review https://github.com/google/dranet
- [Doug] Concerns about user experience, especially user classes like cluster admins and network plugin developers
- Replace CNI with DRA, or wrap CNI with DRA?
- [KEP 5075: DRA: Consumable Capacity](https://github.com/kubernetes/enhancements/issues/5075)
- Lionel brings up an example where you have a macvlan + vlan driver, and they're essentially maintaining two copies of the same resources.
- Other resources
- [sig-net ML post](https://groups.google.com/g/kubernetes-sig-network/c/f0GsD60zEYE/m/EDWQZJV6AAAJ)
- [sig-node 4817 resource claim device status](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/4817-resource-claim-device-status)
- Big question: How do you define a primary network in a DRA scenario?
- We also have node readiness in crio and containerd.
## 2025-03-03:
- DST: US DST is applied to this call
- Discussion: should we work on a first draft of gRPC?
- Tomo: meh
## 2025-02-24:
- regrets:
- Casey (on vacation)
- Tomo (national holiday)
## 2025-02-17:
- regrets:
- Tomo (will be back when DST comes. BTW, is CNI call aligned to US DST time?)
- Casey and Lionel chat about making DRA and CNI have similar APIs
- Plan for now is to lift the existing API into gRPC without big changes
## 2025-02-10
- Lionel: what's the status of CNI 2.0?
- Nobody's really taken it on
- Lionel: would like ability to report allocatable capacity
- Casey: What if we returned JSON from STATUS?
- problem: when STATUS returns an error, we lose that information
- See the example from the KEP:
```yaml
kind: ResourceSlice
spec:
  driver: cni.dra.networking.x-k8s.io
  deviceSources:
  - name: eth1
    provisionLimit: 1000
    basic:
      attributes:
        name:
          string: "eth1"
      capacity:
        bandwidth:
          quantity: 10Gi
```
How can we get this information from the CNI plugin? A CAPACITY verb?
Casey: is this the "straw that breaks the camel's back"? Should we move over to gRPC?
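If a CAPACITY verb (or a JSON body from STATUS) existed, a plugin could report per-device limits that a DRA driver then copies into a ResourceSlice like the KEP example above. A minimal Go sketch of such a result; the verb, `CapacityResult`, `DeviceCapacity`, `encodeCapacity`, and all JSON keys are hypothetical and are not part of the CNI spec.

```go
package main

import "encoding/json"

// CapacityResult is a hypothetical CAPACITY response: one entry per device the
// plugin can hand out, e.g. an uplink shared by macvlan and ipvlan attachments.
type CapacityResult struct {
	CNIVersion string           `json:"cniVersion"`
	Devices    []DeviceCapacity `json:"devices"`
}

// DeviceCapacity roughly mirrors one deviceSources entry in the KEP example.
type DeviceCapacity struct {
	Name           string            `json:"name"`
	ProvisionLimit int               `json:"provisionLimit"`     // max attachments
	Capacity       map[string]string `json:"capacity,omitempty"` // e.g. "bandwidth": "10Gi"
}

// encodeCapacity renders the result a plugin would print on stdout.
func encodeCapacity(r CapacityResult) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}
```

Note that a JSON body on STATUS would run into the problem raised above (an error result replaces it), which a separate verb avoids.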
## 2025-02-03:
- regrets: casey (on a plane)
- zappa oof today (containerd 2.0 status PR fix should be merged today)
- Tomo oof
## 2025-01-27:
- [mike] update on containerd 2.0
- [lionel] update on cni-dra-driver
- [cdc] should we have better embeddable types?
- [tomo] FYI: multus-cni supports CNI 1.1
- [Doug] Addressed comments on: https://github.com/containernetworking/plugins/pull/1143 (further review appreciated, and thanks for review!)
- Main note that the [changes for ErrDumpInterrupted](https://github.com/vishvananda/netlink/commit/084abd93d350e97ee5410b5b6311bcc211f7ea05) isn't in a tagged version of the `netlink` package yet.
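Could the DRA config type be something like this?
```go
package cnidra // hypothetical package name

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
)

type CNIConfig struct {
	metav1.TypeMeta `json:",inline"`
	// IfName represents the name of the network interface requested.
	IfName string `json:"ifName"`
	// Config represents the CNI Config.
	Config CNINetworkConfig `json:"config"`
	// Plugins holds each plugin's configuration in the chain, as raw JSON.
	Plugins []runtime.RawExtension `json:"plugins"`
}

// CNINetworkConfig holds the top-level keys of a CNI network configuration.
type CNINetworkConfig struct {
	CNIVersion   string `json:"cniVersion"`
	Name         string `json:"name"`
	DisableCheck bool   `json:"disableCheck,omitempty"`
	DisableGC    bool   `json:"disableGC,omitempty"`
}
```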
## 2025-01-20:
- [ormergi] port-isolation support in bridge CNI proposal
https://github.com/containernetworking/plugins/pull/1141
https://github.com/containernetworking/plugins/issues/1135
- Can someone from RH "weird networking" team(s) take on some Bridge PR reviews?
- maybe need CODEOWNERS files?
- Example [CODEOWNERS file from whereabouts](https://github.com/k8snetworkplumbingwg/whereabouts/blob/master/.github/CODEOWNERS)
- I thought there'd be more in [the commit that added it](https://github.com/k8snetworkplumbingwg/whereabouts/commit/79ded4c7ae7a5dde484710dc4fd0e80772a5f14f), there isn't. -Doug
## 2025-01-13:
- CI [failing](https://github.com/containernetworking/plugins/actions/runs/12636474981/job/35208789424?pr=1133) with `unshare: write failed /proc/self/uid_map: Operation not permitted`. Anyone have any clues?
- Should we just remove this for now to unblock CI?
- Tomo will skip this call due to holiday. (But please see PR1137 below)
- PR
- https://github.com/containernetworking/plugins/pull/1137 (just remove `scripts/release.sh` because it is no longer used, replaced with github action)
- GitHub CI fails the tests (even though they pass in my lab). I guess GitHub CI needs to be fixed...
- Review CNI v1.2 [ideas](https://github.com/containernetworking/cni/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22CNI+v1.2%22)
- discuss [VALIDATE](https://github.com/containernetworking/cni/issues/1132)
- Relevant to conversation about [cni-dra](https://github.com/kubernetes-sigs/cni-dra-driver/pull/1)
- wow, dra scheduling is *complicated*
- Jitsi died; moved to https://meet.google.com/gjm-mmmf-cra
## 2025-01-06:
- PR
- https://github.com/containernetworking/plugins/pull/1123
- and then file another PR to remove scripts/release.sh because it is no longer used.
- Split the Go version in go.mod from the Go version used in CI.