# CFP-23380: Ordered network policy
**SIG: sig-policy**
**Begin Design Discussion:** 2025-11-01
**Cilium Release:** 1.19
**Authors:** Casey Callendrello <cdc@isovalent.com>, Jarno Rajahlme <jarno@isovalent.com>
## Summary
Add explicit, user-supplied rule ordering to the policy engine.
## Motivation
Most firewall engines present an _iterative_ interface to their users. Users provide an _ordered_ list of rules. In comparison, Cilium policy is _commutative_, which is to say, the ordering of rules does not matter.
Iterative policy languages are more expressive. For example, consider the rules:
> 1. `env=qa, role=grafana`: allow access to `env=prod, role=artifacts`
> 2. `env=qa`: deny access to `env=prod`
These sorts of rules are awkward to express via the Cilium policy language, where Deny-based rules always take priority.
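The difference between the two models can be sketched in a few lines of Go. This is a toy illustration only: the `Rule` and `Flow` types and both evaluation functions are hypothetical and do not correspond to Cilium's real policy types. It encodes the two rules above and evaluates a connection from the QA Grafana workload under each model.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified rule: it matches on source and
// destination labels and either allows or denies. An empty role matches any.
type Rule struct {
	SrcEnv, SrcRole string
	DstEnv, DstRole string
	Deny            bool
}

// Flow describes one connection attempt.
type Flow struct {
	SrcEnv, SrcRole string
	DstEnv, DstRole string
}

func (r Rule) matches(f Flow) bool {
	return r.SrcEnv == f.SrcEnv && r.DstEnv == f.DstEnv &&
		(r.SrcRole == "" || r.SrcRole == f.SrcRole) &&
		(r.DstRole == "" || r.DstRole == f.DstRole)
}

// firstMatch models an iterative engine: the first matching rule decides.
func firstMatch(rules []Rule, f Flow) bool {
	for _, r := range rules {
		if r.matches(f) {
			return !r.Deny
		}
	}
	return false // nothing matched: default deny
}

// denyWins models a commutative engine: any matching deny beats every allow.
func denyWins(rules []Rule, f Flow) bool {
	allowed := false
	for _, r := range rules {
		if r.matches(f) {
			if r.Deny {
				return false
			}
			allowed = true
		}
	}
	return allowed
}

// exampleRules encodes the two rules from the text, in order.
var exampleRules = []Rule{
	{SrcEnv: "qa", SrcRole: "grafana", DstEnv: "prod", DstRole: "artifacts"},
	{SrcEnv: "qa", DstEnv: "prod", Deny: true},
}

func main() {
	f := Flow{SrcEnv: "qa", SrcRole: "grafana", DstEnv: "prod", DstRole: "artifacts"}
	fmt.Println("ordered:", firstMatch(exampleRules, f))
	fmt.Println("deny-wins:", denyWins(exampleRules, f))
}
```

Under ordered evaluation the Grafana flow is allowed by rule 1 before rule 2 is consulted; under deny-wins evaluation rule 2 blocks it regardless of position.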
## Goals
* Provide a foundation for the ClusterNetworkPolicy effort
* Allow expressing L3/L4 rules with order
* Add an explicit priority field to policy Rules
* Keep the datapath O(1)
## Non-Goals
* Add an ordered engine to the datapath
## Existing mechanism
Before outlining the proposed changes, it is important to understand the current policy resolution mechanism in detail. The basic shape:
1. From API objects, policy is lightly translated to an `[]PolicyEntry`
2. These rules are converted to an `L4Policy`, which is a `map[(proto,ports)]L4Filter`. The `L4Filter` is a `map[selector]PerSelectorPolicy` plus some bookkeeping information. The "policy judgement" is the `PerSelectorPolicy`. Informally, the whole structure can be considered a `map[(proto, port(s), selector)]PerSelectorPolicy`.
3. The `L4Policy` is then converted to a `MapState`, which is (logically) a `map[port+proto]map[identity]MapStateEntry`. The `MapStateEntry` is the actual configuration for the datapath.
### Existing logic: PerSelectorPolicy generation
When there are no existing entries, the translation from a `PolicyEntry` to its `L4Policy` and `PerSelectorPolicy` is straightforward. Most of the logic is in how to merge two `PerSelectorPolicies`. The `PerSelectorPolicy` merge logic primarily evaluates per-field precedence. For example, deny overrides all, auth overrides no-auth, L7 overrides no L7. In the event of a conflict (e.g. different L7 parsers for the same port), policy resolution fails. L7 rules are the exception: they are vectors that support merging.
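As a rough illustration of per-field precedence, the merge could be sketched as follows. The struct below is a heavily simplified stand-in for `PerSelectorPolicy` (the field names and the `merge` function are assumptions made for this example, not Cilium's actual code), and it only models the three rules named above plus the parser conflict.

```go
package main

import (
	"errors"
	"fmt"
)

// PerSelectorPolicy is a simplified, hypothetical stand-in for the real type.
type PerSelectorPolicy struct {
	IsDeny   bool
	Auth     bool     // true if authentication is required
	L7Parser string   // "" means no L7 policy
	L7Rules  []string // L7 rules are vectors and merge by concatenation
}

// ErrParserConflict is returned when two policies name different L7 parsers.
var ErrParserConflict = errors.New("conflicting L7 parsers")

// merge combines two policies for the same (proto, port, selector) key.
func merge(a, b PerSelectorPolicy) (PerSelectorPolicy, error) {
	// Deny overrides everything else.
	if a.IsDeny || b.IsDeny {
		return PerSelectorPolicy{IsDeny: true}, nil
	}
	// Different parsers on the same port cannot be reconciled.
	if a.L7Parser != "" && b.L7Parser != "" && a.L7Parser != b.L7Parser {
		return PerSelectorPolicy{}, ErrParserConflict
	}
	// Auth overrides no-auth; L7 overrides no L7.
	out := PerSelectorPolicy{Auth: a.Auth || b.Auth, L7Parser: a.L7Parser}
	if out.L7Parser == "" {
		out.L7Parser = b.L7Parser
	}
	out.L7Rules = append(append([]string{}, a.L7Rules...), b.L7Rules...)
	return out, nil
}

func main() {
	a := PerSelectorPolicy{L7Parser: "http", L7Rules: []string{"GET /a"}}
	b := PerSelectorPolicy{Auth: true, L7Parser: "http", L7Rules: []string{"GET /b"}}
	merged, err := merge(a, b)
	fmt.Println(merged, err)
	_, err = merge(a, PerSelectorPolicy{L7Parser: "kafka"})
	fmt.Println(err)
}
```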
Envoy directly consumes the `L4Policy`, the distilled per-selector policy. However, the BPF policy engine expects a different structure: the PolicyMap.
### Existing logic: Generating the PolicyMap
The policy map as stored in BPF is a map from `(identity, proto, port) -> MapStateEntry`. Thus, the `L4Policy` must be partially inverted to a per-identity policy result. This translation is not purely mechanical; it must also be precedence aware.
Why must PolicyMap generation be precedence aware? After all, `PerSelectorPolicy` merging *should* have included all the precedence logic. However, disjoint selectors may select the same identity. Thus, when resolving the `MapStateEntry` for a `(id, proto, port)` key, it may be that a tiebreaker is needed between two `MapStateEntries`.
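A minimal sketch of this inversion, assuming toy versions of the types involved: selectors are plain strings, `selections` is a hypothetical lookup from selector to the identities it selects, and the only tiebreaker modelled is "deny wins". None of these names are Cilium's real API.

```go
package main

import "fmt"

type Identity uint32

// PortProto is the outer key of the (simplified) L4Policy.
type PortProto struct {
	Proto string
	Port  uint16
}

// Entry is a stripped-down MapStateEntry carrying only the verdict.
type Entry struct{ IsDeny bool }

// Key is the per-identity key used by the policy map.
type Key struct {
	ID Identity
	PortProto
}

// L4Policy maps port/proto to selector to verdict.
type L4Policy map[PortProto]map[string]Entry

// invert turns the per-selector policy into a per-identity result. When two
// disjoint selectors pick the same identity, deny wins the tiebreak.
func invert(p L4Policy, selections map[string][]Identity) map[Key]Entry {
	out := map[Key]Entry{}
	for pp, filter := range p {
		for sel, e := range filter {
			for _, id := range selections[sel] {
				k := Key{ID: id, PortProto: pp}
				if prev, ok := out[k]; ok && prev.IsDeny {
					continue // an existing deny is never overwritten
				}
				out[k] = e
			}
		}
	}
	return out
}

func main() {
	policy := L4Policy{
		{Proto: "tcp", Port: 80}: {
			"app=web": {IsDeny: false},
			"team=a":  {IsDeny: true},
		},
	}
	selections := map[string][]Identity{"app=web": {5, 6}, "team=a": {5}}
	for k, e := range invert(policy, selections) {
		fmt.Printf("%+v -> %+v\n", k, e)
	}
}
```

Identity 5 is selected by both selectors, so its entry needs the tiebreaker; identity 6 is selected only by `app=web` and keeps the allow verdict.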
## Proposal
This proposal outlines a way to add explicit order-based precedence to L3/L4 policies.
There are two key sections for this proposal. The first is the technical implementation of policy ordering. The second is an outline of the algorithm for generating an endpoint-specific BPF PolicyMap from an ordered set of policies.
### Ordered PerSelectorPolicy resolution
Quick summary:
- Generate a partial order of PerSelectorPolicies, based on specified order and tie-breakers.
- Merge PerSelectorPolicies with the same order number, using existing precedence logic.
The first step is to generate an ordered list of PerSelectorPolicies. This should be sorted according to explicit order, then tie-broken according to the existing precedence logic.
Then, PerSelectorPolicies with the same order should be *merged*. This means there will be exactly one PerSelectorPolicy for the key `(proto, port, selector, priority)`. This prevents conflicts while determining L7 policies, and preserves existing behavior, where each policy can be considered to have priority 0.
Finally, the full set of PerSelectorPolicies is flattened, retaining the first PerSelectorPolicy per `(proto, port(s), selector)` key.
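The three steps (sort, merge equal orders, flatten) might look like the sketch below. It assumes that a lower `Order` value is evaluated first and reduces the "existing precedence logic" to deny-wins; the `Policy` and `Key` types and the `resolve` function are hypothetical names for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// Key identifies one PerSelectorPolicy slot.
type Key struct {
	Proto    string
	Port     uint16
	Selector string
}

// Policy is a simplified PerSelectorPolicy carrying a user-supplied order.
// Assumption: a lower Order value is evaluated first.
type Policy struct {
	Key
	Order  int
	IsDeny bool
}

// resolve sorts by order, merges entries that share a key and an order, and
// keeps only the first (earliest-order) result per key.
func resolve(in []Policy) map[Key]Policy {
	sorted := append([]Policy(nil), in...)
	sort.SliceStable(sorted, func(i, j int) bool {
		if sorted[i].Order != sorted[j].Order {
			return sorted[i].Order < sorted[j].Order
		}
		// Tie-break with the existing precedence: deny first.
		return sorted[i].IsDeny && !sorted[j].IsDeny
	})
	out := map[Key]Policy{}
	for _, p := range sorted {
		prev, seen := out[p.Key]
		switch {
		case !seen:
			out[p.Key] = p
		case prev.Order == p.Order:
			// Same order: merge using deny-wins.
			prev.IsDeny = prev.IsDeny || p.IsDeny
			out[p.Key] = prev
		}
		// A later order for an already-resolved key is dropped.
	}
	return out
}

func main() {
	k := Key{Proto: "tcp", Port: 443, Selector: "env=prod"}
	got := resolve([]Policy{
		{Key: k, Order: 2, IsDeny: true},
		{Key: k, Order: 1},
	})
	fmt.Printf("%+v\n", got[k])
}
```

When every policy has the same order, the function degenerates to the existing deny-wins merge, which is the backwards-compatibility property described above.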
### Generating the MapState
Quick summary: When inserting entries into the MapState, any longer-match, lower-priority entries must be deleted.
Once the flattened set of PerSelectorPolicies has been created, it must be translated to the MapState. Recall that the MapState is almost equivalent to `map[proto + port]map[identity]MapStateEntry`, where `MapStateEntry` is
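```go
// MapStateEntry is the per-key datapath configuration.
type MapStateEntry struct {
	ProxyPort uint16   // proxy redirect port
	Listener  string   // proxy listener name
	AuthType  AuthType // authentication mode
	IsDeny    bool     // true for a deny verdict
	Priority  uint32   // priority of this entry
}
```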
The MapState *isn't* a map, however. It is a longest-prefix-match trie, whose key is chosen to match the datapath. In other words, MapState's precedence matches the datapath's precedence. A longer-prefix MapState key will be preferred over a shorter-prefix match.
The sketch of the algorithm is:
- For every PerSelectorPolicy:
  - For every selected identity:
    - Insert into the MapState if valid
    - Delete any now-invalid entries
    - Propagate authentication mode to descendants
The existing MapState already has a form of this algorithm, in the implicit priorities brought by deny and proxy policies. This CFP proposes eliminating the split logic for deny and proxy priorities, unifying them in a single 32-bit Priority field that merges user order, proxy priority, and deny.
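The proposal does not specify a bit layout for the unified field. Purely as an illustration of how three criteria can collapse into one comparable integer, the sketch below packs them lexicographically: user order in the high bits (inverted, assuming a lower order number is evaluated first), then a deny bit, then an 8-bit proxy priority. The layout, the constants, and the `pack` function are all hypothetical.

```go
package main

import "fmt"

// Hypothetical layout, NOT taken from the proposal:
//   bits 9..31  inverted user order (earlier order => larger value)
//   bit  8      deny
//   bits 0..7   proxy priority
const (
	proxyBits  = 8
	denyShift  = proxyBits
	orderShift = proxyBits + 1
	maxOrder   = 1<<(32-orderShift) - 1
)

// pack builds a single priority where a numerically higher value wins.
func pack(order uint32, deny bool, proxyPrio uint8) uint32 {
	if order > maxOrder {
		order = maxOrder // clamp instead of overflowing into other fields
	}
	p := (maxOrder - order) << orderShift
	if deny {
		p |= 1 << denyShift
	}
	return p | uint32(proxyPrio)
}

func main() {
	fmt.Println(pack(0, false, 0) > pack(1, true, 255)) // order dominates
	fmt.Println(pack(1, true, 0) > pack(1, false, 255)) // then deny
	fmt.Println(pack(1, false, 2) > pack(1, false, 1))  // then proxy priority
}
```

With a packing of this kind, a single integer comparison replaces separate checks for order, deny, and proxy priority.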
### MapState conflict resolution
The key observation is this: **we cannot insert a lower-priority, higher-precedence entry into the MapState**. Since MapState precedence mirrors datapath precedence, determining this is relatively straightforward: to insert a given key, we scan all ancestor keys, checking if the same identity is selected. If a lower-precedence, higher-priority entry is found, we cannot insert the higher-precedence, lower-priority entry. Conversely, if any descendants of the newly proposed entry are higher-precedence and lower-priority, they must be deleted.
Note: higher numerical priority overrides lower numerical priority.
#### Example
A MapState contains the entry `(prio 2, tcp, port 0-1023, identity *)`.
**Example 1a:** Can we insert `(prio 1, tcp, port 0-1023, identity 5)`? No, because this is a longer-precedence match with lower priority that would cover a key with higher priority.
```mermaid
---
title: "Example 1a: invalid"
---
graph TD
rule1[prio 2, id *, tcp, 0-1023]
rule2[prio 1, id 5, tcp, 0-1023]
root[[root]]
root-->rule1
rule1-. invalid .-> rule2
```
**Example 1b:** Can we insert `(priority 1, tcp, port 80, identity *)`?
No, for the same reason.
```mermaid
---
title: "Example 1b: invalid"
---
graph TD
rule1[prio 2, id *, tcp, 0-1023]
rule2[prio 1, id *, tcp, 80]
root[[root]]
root-->rule1
rule1-. invalid .->rule2
```
**Example 1c:** Can we insert `(priority 1, tcp, port *, identity 5)`?
Yes, because this lower-priority entry has a strictly shorter precedence match, and thus the existing priority-2 entry remains.
```mermaid
---
title: "Example 1c: valid"
---
graph TD
rule1[prio 2, id *, tcp, 0-1023]
rule2[prio 1, id 5, tcp, *]
root[[root]]
root-->rule2
rule2-->rule1
```
**Example 1d:** Can we insert `(priority 1, udp, port 53, identity 5)`?
Yes, because this does not cover a higher-priority entry.
```mermaid
---
title: "Example 1d: valid"
---
graph TD
rule1[prio 2, id *, tcp, 0-1023]
id2[prio 1, id 5, udp, 53]
root[[root]]
root-->rule1
root-->id2
```
**Example 1e:** Can we insert `(priority 3, tcp, port *, identity *)`?
Yes, and we must delete the lower-priority entry.
```mermaid
---
title: "Example 1e: valid"
---
graph TD
rule1[prio 2, id *, tcp, 0-1023]
rule2[prio 3, id *, tcp, *]
root[[root]]
root-->rule2
rule2-. deletes .-> rule1
```
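Examples 1a through 1e can be replayed against a small model of the insertion rule. The sketch below is an assumption-laden simplification: it uses a flat map with a linear scan instead of a trie, treats identity `0` as the wildcard, and defines "ancestor" as "matches a superset of packets" via a hypothetical `covers` method. It is meant to show the rule, not the real MapState implementation.

```go
package main

import "fmt"

// Wildcard is a hypothetical marker: identity 0 stands for "*".
const Wildcard = 0

// Key is a simplified MapState key with an inclusive port range.
type Key struct {
	Proto  string
	Lo, Hi uint16
	ID     uint32
}

// covers reports whether every packet matching b also matches a, that is,
// whether a is an ancestor of (or equal to) b.
func (a Key) covers(b Key) bool {
	return a.Proto == b.Proto && a.Lo <= b.Lo && b.Hi <= a.Hi &&
		(a.ID == Wildcard || a.ID == b.ID)
}

// MapState maps a key to its priority; higher numbers win.
type MapState map[Key]uint32

// Insert applies the conflict rule and reports whether k was inserted.
func (m MapState) Insert(k Key, prio uint32) bool {
	// A higher-priority ancestor blocks the new, longer-match entry.
	for ek, ep := range m {
		if ek.covers(k) && ep > prio {
			return false
		}
	}
	// Lower-priority descendants would shadow the new entry: delete them.
	for ek, ep := range m {
		if k.covers(ek) && ep < prio {
			delete(m, ek)
		}
	}
	m[k] = prio
	return true
}

// existing is the entry from the examples: (prio 2, tcp, 0-1023, id *).
var existing = Key{Proto: "tcp", Lo: 0, Hi: 1023, ID: Wildcard}

func newState() MapState { return MapState{existing: 2} }

func main() {
	fmt.Println("1a:", newState().Insert(Key{"tcp", 0, 1023, 5}, 1))
	fmt.Println("1b:", newState().Insert(Key{"tcp", 80, 80, Wildcard}, 1))
	fmt.Println("1c:", newState().Insert(Key{"tcp", 0, 65535, 5}, 1))
	fmt.Println("1d:", newState().Insert(Key{"udp", 53, 53, 5}, 1))
	fmt.Println("1e:", newState().Insert(Key{"tcp", 0, 65535, Wildcard}, 3))
}
```

In this model the key in Example 1c neither covers nor is covered by the existing entry (its port range is wider but its identity is narrower), so both entries coexist.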
#### Wildcard and non-wildcard interaction
Identity wildcards need special care, both when the wildcard entry has higher precedence and when it has lower precedence.
If a lower-priority non-wildcard PerSelectorPolicy would override an existing wildcard, higher-priority, lower-precedence selector, adding it is blocked.
If a lower-priority wildcard PerSelectorPolicy would override an existing non-wildcard, higher-priority, lower-precedence selector, then it must be inserted, but this will have to be separately resolved via the datapath. Specifically, the datapath must look up both wildcarded and non-wildcarded entries, and select the one with the highest priority, using precedence length as a tiebreaker.
#### Example 2a: Ancestor wildcard
Three rules:
1. `(prio 2, tcp, port *, id {*})`
2. `(prio 1, tcp, port 80, id {*})`
3. `(prio 1, tcp, port 80, id {5})`
Rules 2 and 3 would cover higher-priority rule 1, so they may not be inserted.
```mermaid
graph TB
rule1["prio 2, tcp, 0-65535, id {*}"]
rule2["prio 1, tcp, 80, id {*}"]
rule3["prio 1, tcp, 80, id {5}"]
root[[root]]
root --> rule1
rule1 -. invalid .-> rule2 & rule3
```
#### Example 2b: Descendant wildcard
Two rules:
1. `(priority 2, id 5, tcp, port *)`
2. `(priority 1, id *, tcp, port 80)`
Rule 2 has a longer match. For ID 5, rule 2 overrides rule 1, which is an error. However, rule 2's wildcard selector selects more identities, so it cannot be elided. Rather, at policy resolution time, the numeric priorities will be compared.
```mermaid
graph TB
rule1["prio 2, id 5, tcp, 0-65535"]
rule2["prio 1, id *, tcp, 80"]
root[[root]]
root --> rule1
rule1 --> rule2
```
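The datapath-side resolution for this case could be sketched as follows. The real datapath is BPF C; Go is used here only to show the comparison. `Entry`, `PrefixLen`, and `pick` are hypothetical names, and preferring the identity-specific entry on a complete tie is an assumption the text does not settle.

```go
package main

import "fmt"

// Entry is a simplified lookup result.
type Entry struct {
	Priority  uint32
	PrefixLen uint8 // hypothetical: number of port bits matched by the key
}

// pick chooses between the identity-specific and the wildcard-identity
// lookup results: highest priority first, then the longer match.
func pick(exact, wildcard *Entry) *Entry {
	switch {
	case exact == nil:
		return wildcard
	case wildcard == nil:
		return exact
	case exact.Priority > wildcard.Priority:
		return exact
	case wildcard.Priority > exact.Priority:
		return wildcard
	case wildcard.PrefixLen > exact.PrefixLen:
		return wildcard
	default:
		return exact // assumption: full ties favor the specific identity
	}
}

func main() {
	rule1 := &Entry{Priority: 2, PrefixLen: 0}  // id 5, tcp, port *
	rule2 := &Entry{Priority: 1, PrefixLen: 16} // id *, tcp, port 80

	// Identity 5 on tcp/80 hits both lookups; priority 2 wins.
	fmt.Println(pick(rule1, rule2) == rule1)
	// Any other identity only hits the wildcard lookup.
	fmt.Println(pick(nil, rule2) == rule2)
}
```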
#### Merging mutual authentication
Whether or not a connection requires mutual authentication is *also* managed by the network policy engine. However, authentication is not a "traffic verdict" (traffic verdicts are `allow`, `deny`, and `proxy`). Rather, it is applied to all traffic, regardless of selectors.
When inserting a mapstate entry, the authentication option needs to be resolved:
- if no authentication is specified, inherit it from the closest ancestor of equal or higher priority
- if authentication is specified, "push" it to all descendants of equal or lower priority
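The two rules above can be sketched with a toy MapState keyed only by port range. Everything here is a hypothetical simplification: the `AuthUnset` and `AuthRequired` constants, the slice-based `MapState`, and the choice of "smallest covering port range" as the closest ancestor. The text also does not say whether a push should overwrite a descendant's own explicit setting; this sketch follows the literal reading and overwrites it.

```go
package main

import "fmt"

type AuthType uint8

const (
	AuthUnset    AuthType = iota // no authentication specified
	AuthRequired                 // hypothetical "mutual auth required" mode
)

// Entry is a simplified MapState entry keyed by an inclusive port range.
type Entry struct {
	Lo, Hi   uint16
	Priority uint32
	Auth     AuthType
}

func (a *Entry) covers(b *Entry) bool { return a.Lo <= b.Lo && b.Hi <= a.Hi }
func (a *Entry) span() int            { return int(a.Hi) - int(a.Lo) }

type MapState []*Entry

// Insert resolves the authentication mode and then stores the entry.
func (s *MapState) Insert(e Entry) *Entry {
	n := &e
	if n.Auth == AuthUnset {
		// Inherit from the closest ancestor of equal or higher priority.
		var closest *Entry
		for _, a := range *s {
			if a.covers(n) && a.Priority >= n.Priority &&
				(closest == nil || a.span() < closest.span()) {
				closest = a
			}
		}
		if closest != nil {
			n.Auth = closest.Auth
		}
	} else {
		// Push to all descendants of equal or lower priority.
		for _, d := range *s {
			if n.covers(d) && d.Priority <= n.Priority {
				d.Auth = n.Auth
			}
		}
	}
	*s = append(*s, n)
	return n
}

func main() {
	var s MapState
	s.Insert(Entry{Lo: 0, Hi: 65535, Priority: 2, Auth: AuthRequired})
	child := s.Insert(Entry{Lo: 80, Hi: 80, Priority: 1})
	fmt.Println(child.Auth == AuthRequired)
}
```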
### Processing incremental updates
An incremental update occurs when a complete policy newly selects, or no longer selects, one or more identities. These incremental updates must then be propagated down to the MapState and the BPF map updated accordingly.
Incremental updates are aggregated by selectors and then inserted into the MapState. The MapState is responsible for deconflicting overlapping selectors. This remains the same in the ordered version.
It is safe to insert incremental updates in this manner because identities and selectors are immutable. As a direct consequence, although two selectors S1 and S2 might overlap, adding a new identity I1 can never change an existing identity from non-overlapping to overlapping.
We will need to ensure that SelectorCache updates are processed in a transactional manner -- that is to say, an incremental update must be delivered to all affected selectors *before* distributing this to the MapState. Otherwise, we may distribute an intermediate state where an entry has not yet been removed by a higher-priority entry.
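One way to picture the transactional requirement is a cache that evaluates a new identity against every selector first and only then hands the complete batch to the MapState. The sketch below is illustrative only: `Cache`, `Change`, `Flush`, and the label-map `Selector` are hypothetical and do not reflect the real SelectorCache API.

```go
package main

import (
	"fmt"
	"sort"
)

// Selector is a toy selector: every listed label must match.
type Selector map[string]string

func (s Selector) matches(labels map[string]string) bool {
	for k, v := range s {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// Change records that a selector started (or stopped) selecting an identity.
type Change struct {
	Selector string
	ID       uint32
	Added    bool
}

// Cache is a hypothetical stand-in for the SelectorCache.
type Cache struct {
	Selectors map[string]Selector
	Flush     func(batch []Change) // delivers one complete batch to the MapState
}

// AddIdentity collects the changes for ALL affected selectors before
// flushing, so the MapState never observes a partial update.
func (c *Cache) AddIdentity(id uint32, labels map[string]string) {
	var batch []Change
	for name, sel := range c.Selectors {
		if sel.matches(labels) {
			batch = append(batch, Change{Selector: name, ID: id, Added: true})
		}
	}
	// Sort for a deterministic batch order (map iteration is random).
	sort.Slice(batch, func(i, j int) bool { return batch[i].Selector < batch[j].Selector })
	if len(batch) > 0 {
		c.Flush(batch)
	}
}

func main() {
	c := Cache{
		Selectors: map[string]Selector{
			"qa":      {"env": "qa"},
			"grafana": {"role": "grafana"},
			"prod":    {"env": "prod"},
		},
		Flush: func(batch []Change) { fmt.Printf("flush: %+v\n", batch) },
	}
	c.AddIdentity(5, map[string]string{"env": "qa", "role": "grafana"})
}
```

Flushing per selector instead would let the MapState briefly hold a lower-priority entry whose higher-priority counterpart has not arrived yet, which is the intermediate state described above.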
## Impacts / Key Questions
### Impact: effect on L7 policies
TODO