# RateLimitPolicy Policy Attachment Design Notes
[DEPRECATED]
* Follow up next iteration on https://hackmd.io/_1k6eLCNR2eb9RoSzOZetg
## Introduction
While designing Kuadrant rate limiting and considering Istio/Envoy's rate limiting offering, we hit two limitations ([described here](https://docs.google.com/document/d/1ve_8ZBq8TK_wnAZHg69M6-f_q1w-mX4vuP1BC1EuEO8/edit#bookmark=id.5wyq2fj56u94)). Therefore, without giving up entirely on the existing [Envoy RateLimit Filter](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/filters/network/ratelimit/v3/rate_limit.proto#extension-envoy-filters-network-ratelimit), we decided to move on and leverage Envoy's [Wasm Network Filter](https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/network_filters/wasm_filter) and implement a rate limiting [wasm-shim](https://github.com/Kuadrant/wasm-shim) module compliant with Envoy's [Rate Limit Service (RLS)](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ratelimit/v3/rls.proto) protocol. This wasm-shim module accepts a [PluginConfig](https://github.com/Kuadrant/kuadrant-controller/blob/fa2b52967409b7c4ea2c2e3412ecf80a8ad2b802/pkg/istio/wasm.go#L24) struct as its input configuration.
The [current RateLimitPolicy CRD](https://github.com/Kuadrant/kuadrant-controller/blob/fa2b52967409b7c4ea2c2e3412ecf80a8ad2b802/apis/apim/v1alpha1/ratelimitpolicy_types.go#L132) already implements a `targetRef` with a reference to [Gateway API's HTTPRoute](https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io/v1alpha2.HTTPRoute). We are considering allowing the `targetRef` to reference a [Gateway API's Gateway](https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io/v1alpha2.Gateway). With this HTTPRoute - Gateway hierarchy in place, we are also considering applying [Policy Attachment's](https://gateway-api.sigs.k8s.io/v1alpha2/references/policy-attachment/) defaults/overrides approach to the RateLimitPolicy CRD.
![](https://i.imgur.com/UkivAqA.png)
## Use Cases (targeting a gateway and setting overrides and defaults)
A key use case is being able to provide governance over what service providers can and cannot do when exposing a service via a shared ingress gateway, as well as providing certainty that no service is exposed without the cluster administrator being able to protect the infrastructure from unplanned load from badly behaving clients.
As a cluster administrator providing a shared ingress gateway, I want to be able to define a sane default rate limit policy for any service exposed via this gateway, so that I can protect the infrastructure behind this gateway from malicious or accidental misuse and ensure that no single service consumes all of the resources. As a cluster administrator I also understand that services may need to override these defaults to meet their own specific needs, and I want to enable service providers to do this while being safe in the knowledge that, if they do not define their own rate limiting, my infrastructure is still protected.
As a cluster administrator providing a shared ingress gateway, I want to be able to define a rate limit policy that cannot be overridden. I want to define that any service exposed via this gateway cannot accept more than x requests per second. I want the service providers to be able to define their own rate limit policies but ensure that my policy is always in place so that I know they cannot go beyond these limits.
## Goals
The goal of this document is to define:
* The schema of this `PluginConfig` struct.
* The schema (CRD) of the RateLimitPolicy
* The kuadrant-controller behavior filling the `PluginConfig` struct having as input the RateLimitPolicy k8s objects
* The behavior of the wasm-shim having the `PluginConfig` struct as input.
## Envoy's Rate Limit Service Protocol
Kuadrant's rate limiting relies on the [Rate Limit Service (RLS)](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ratelimit/v3/rls.proto) protocol. The gateway therefore generates, based on a set of [actions](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-ratelimit-action), a set of [descriptors](https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/common/ratelimit/v3/ratelimit.proto#envoy-v3-api-msg-extensions-common-ratelimit-v3-ratelimitdescriptor) (one descriptor is a set of descriptor entries). Those descriptors are sent to the external rate limit service provider. When multiple descriptors are provided, the external service provider will limit on ALL of them and return an OVER_LIMIT response if any of them is over limit.
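The "over limit if any descriptor is over limit" rule can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Descriptor` type and the `isOver` callback are hypothetical stand-ins for the RLS messages and for the provider's counter lookup, not the actual Envoy or Limitador API.

```go
package main

import "fmt"

// Code mirrors the two overall response codes discussed above.
type Code string

const (
	OK        Code = "OK"
	OverLimit Code = "OVER_LIMIT"
)

// Entry is one key/value pair; a Descriptor is a list of entries.
type Entry struct{ Key, Value string }
type Descriptor []Entry

// Overall checks every descriptor and reports OVER_LIMIT if any is over limit.
// isOver is a hypothetical callback standing in for the provider's counters.
func Overall(descriptors []Descriptor, isOver func(Descriptor) bool) Code {
	code := OK
	for _, d := range descriptors {
		// All descriptors are evaluated, not only the first failing one.
		if isOver(d) {
			code = OverLimit
		}
	}
	return code
}

func main() {
	descriptors := []Descriptor{
		{{Key: "admin", Value: "yes"}},
		{{Key: "remote_address", Value: "10.0.0.1"}},
	}
	adminOver := func(d Descriptor) bool { return d[0].Key == "admin" }
	fmt.Println(Overall(descriptors, adminOver)) // OVER_LIMIT
}
```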
## Schema (CRD) of the RateLimitPolicy
High level overview of the proposed rate limiting policy
```go=
// RateLimitPolicy provides a way to apply rate limiting policy configuration
type RateLimitPolicy struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
// Spec defines the desired state of RateLimitPolicy.
Spec RateLimitPolicySpec `json:"spec"`
// Status defines the current state of RateLimitPolicy.
Status RateLimitPolicyStatus `json:"status,omitempty"`
}
// RateLimitPolicySpec defines the desired state of RateLimitPolicy.
type RateLimitPolicySpec struct {
// TargetRef identifies an API object to apply policy to.
TargetRef gatewayv1a2.PolicyTargetReference `json:"targetRef"`
// Override defines policy configuration that should override policy
// configuration attached below the targeted resource in the hierarchy.
// +optional
Override *RateLimitPolicyConfig `json:"override,omitempty"`
// Default defines default policy configuration for the targeted resource.
// +optional
Default *RateLimitPolicyConfig `json:"default,omitempty"`
}
// RateLimitPolicyConfig contains rate limiting policy configuration.
type RateLimitPolicyConfig struct {
// GatewayConfigurations is a list of rate limit configuration options to be applied at the envoy gateway
// +optional
GatewayConfigurations []*GatewayConfigurationSpec `json:"gatewayActions,omitempty"`
// RateLimits is a list of configuration objects oriented to
// the external rate limiting service
// +optional
RateLimits []*RateLimitSpec `json:"rateLimits,omitempty"`
}
// RateLimitSpec contains rate limiting limits
type RateLimitSpec struct {
// Conditions is the list of conditions to match to apply the specified limit
Conditions []string `json:"conditions"`
// Variables is the list of independent counters defined for the specified limit
Variables []string `json:"variables"`
// MaxValue defines the maximum number of requests for the defined period of time
MaxValue int `json:"max_value"`
// Seconds defines the period of time for the rate limit
Seconds int `json:"seconds"`
}
// GatewayConfigurationSpec contains rate limiting configuration at the envoy gateway
type GatewayConfigurationSpec struct {
// Rules is a list of rules to match the request. A match occurs
// when at least one rule matches the request.
// If not set, it is equivalent to matching all the requests.
// +optional
Rules []*RateLimitRule `json:"rules,omitempty"`
// Configurations is a list of Envoy rate limit configurations
// +optional
Configurations []*RateLimitConfiguration `json:"configurations,omitempty"`
}
// RateLimitConfiguration defines Envoy rate limit configuration
type RateLimitConfiguration struct {
// +optional
Actions []*RateLimitAction `json:"actions,omitempty"`
}
// RateLimitRule defines rules to match requests
type RateLimitRule struct {
// +optional
Paths []string `json:"paths,omitempty"`
// +optional
Methods []string `json:"methods,omitempty"`
// +optional
Hosts []string `json:"hosts,omitempty"`
}
// RateLimitAction one Envoy rate limit action
// https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-ratelimit-action
// Precisely one of the available action types must be defined
type RateLimitAction struct {
// +optional
GenericKey *GenericKeySpec `json:"generic_key,omitempty"`
// +optional
Metadata *MetadataSpec `json:"metadata,omitempty"`
// +optional
RemoteAddress *RemoteAddressSpec `json:"remote_address,omitempty"`
// +optional
RequestHeaders *RequestHeadersSpec `json:"request_headers,omitempty"`
}
// RateLimitPolicyStatus defines the observed state of RateLimitPolicy.
type RateLimitPolicyStatus struct {
// Conditions describe the current conditions of the RateLimitPolicy.
//
// +optional
// +listType=map
// +listMapKey=type
// +kubebuilder:validation:MaxItems=8
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```
RateLimitPolicy instance targeting an *HTTPRoute* resource, adding rate limiting on a few paths
```yaml=
---
apiVersion: apim.kuadrant.io/v1alpha1
kind: RateLimitPolicy
metadata:
  name: myroute
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: myroute
  default:
    gatewayActions:
      - rules:
          - paths: ["/admin"]
            methods: ["GET"]
          - paths: ["/newuser"]
            methods: ["POST"]
        configurations:
          - actions:
              - generic_key:
                  descriptor_key: my-key
                  descriptor_value: my-value
    rateLimits:
      - conditions: ["my-key == my-value"]
        max_value: 100
        seconds: 60
        variables: []
```
RateLimitPolicy instance targeting a *Gateway* resource
```yaml=
---
apiVersion: apim.kuadrant.io/v1alpha1
kind: RateLimitPolicy
metadata:
  name: mygateway
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway
    name: mygateway
  default:
    gatewayActions:
      - configurations:
          - actions:
              - generic_key:
                  descriptor_key: my-gw-key
                  descriptor_value: my-gw-value
    rateLimits:
      - conditions: ["my-gw-key == my-gw-value"]
        max_value: 100
        seconds: 60
        variables: []
```
Note: no `PREAUTH` or `POSTAUTH` stage is defined. The rate limiting filter should be placed after the authorization filter to enable authenticated rate limiting.
## Kuadrant-controller's behavior
### RLP targeting a Gateway
Initially, only one RLP referencing a given gateway is considered.
The rate limit configuration precedence is defined as follows:
```
Gateway override > HTTPRoute override > HTTPRoute default > Gateway default
```
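As a minimal sketch, the precedence order above can be encoded as an ordered enumeration. The `Level` type and its constant names are hypothetical and chosen only for illustration; they are not taken from the kuadrant-controller code base.

```go
package main

import "fmt"

// Level identifies where a rate limit configuration comes from.
// Lower values have higher precedence.
type Level int

const (
	GatewayOverride Level = iota // highest precedence
	HTTPRouteOverride
	HTTPRouteDefault
	GatewayDefault // lowest precedence
)

// String returns the label used in the precedence list above.
func (l Level) String() string {
	return [...]string{
		"Gateway override",
		"HTTPRoute override",
		"HTTPRoute default",
		"Gateway default",
	}[l]
}

// HasPrecedence reports whether a configuration at level a wins over one at level b.
func HasPrecedence(a, b Level) bool { return a < b }

func main() {
	fmt.Println(HasPrecedence(HTTPRouteDefault, GatewayDefault))   // true
	fmt.Println(HasPrecedence(HTTPRouteOverride, GatewayOverride)) // false
}
```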
When rate limit configuration A has precedence over rate limit configuration B, the two are combined as follows:
* gatewayActions
Regarding rate limit actions evaluated at the gateway, aggregation rather than overriding seems to be the right approach, mainly because adding more entries does not affect the existing rate limit configuration. Adding more descriptor entries never shadows or disables rate limit counters. However, more rate limit counters can be activated, which effectively makes rate limiting more restrictive (never more open). One example to illustrate:
Let's say we have 2 rate limit configurations (one counter per config):
admin counter
```yaml=
conditions: ["admin-request == yes"]
variables: []
max_value: 1
seconds: 60
```
devel counter
```yaml=
conditions: ["developer-request == yes"]
variables: []
max_value: 2
seconds: 60
```
The external provider receives one descriptor with one entry:
```yaml=
descriptors:
  - entries:
      - developer-request: "yes"
```
Then, only devel counter would be activated and increased. If gateway level actions are added, one descriptor entry would be added:
```yaml=
descriptors:
  - entries:
      - developer-request: "yes"
      - admin-request: "yes"
```
Then, both devel and admin counters are activated and increased. If the limit of any activated counter is exceeded, the request is rate limited with `429 Too Many Requests`.
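The admin/devel example can be sketched in Go. This is a simplified model, not the external provider's implementation: conditions are assumed to be plain `key == value` equality checks, represented here as a hypothetical `map[string]string`, and a counter is considered activated when all of its conditions hold for the descriptor entries.

```go
package main

import "fmt"

// Limit is a simplified counter definition. It applies when every
// condition (key -> expected value) is found in the descriptor entries.
type Limit struct {
	Name       string
	Conditions map[string]string
	MaxValue   int
}

// Activated returns the names of the limits whose conditions all hold
// for the given descriptor entries, in the order the limits are defined.
func Activated(limits []Limit, entries map[string]string) []string {
	var names []string
	for _, l := range limits {
		match := true
		for k, want := range l.Conditions {
			if got, ok := entries[k]; !ok || got != want {
				match = false
				break
			}
		}
		if match {
			names = append(names, l.Name)
		}
	}
	return names
}

func main() {
	limits := []Limit{
		{Name: "admin", Conditions: map[string]string{"admin-request": "yes"}, MaxValue: 1},
		{Name: "devel", Conditions: map[string]string{"developer-request": "yes"}, MaxValue: 2},
	}
	// Route level actions only.
	fmt.Println(Activated(limits, map[string]string{"developer-request": "yes"}))
	// Route level plus gateway level actions.
	fmt.Println(Activated(limits, map[string]string{"developer-request": "yes", "admin-request": "yes"}))
}
```

Adding the gateway level entry only grows the set of activated counters, which is why aggregation can make rate limiting more restrictive but never more open.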
* rateLimits
Regarding the list of rate limit objects, the controller will apply a patching strategy that uses the following fields as the merge key:
* `variables`
* `conditions`
* `seconds`
In short, when the `variables`, `conditions` and `seconds` fields are all identical, the limit with higher precedence applies. One example to illustrate: let's say rate limit configuration A has precedence over configuration B.
A defines
```yaml=
rateLimits:
  - conditions: ["keyA == valueA"]
    max_value: 1
    seconds: 60
    variables: []
```
B defines
```yaml=
rateLimits:
  - conditions: ["keyA == valueA"]
    max_value: 100
    seconds: 60
    variables: []
  - conditions: ["otherkey == othervalue"]
    max_value: 100
    seconds: 60
    variables: []
```
The resulting merge will be
```yaml=
rateLimits:
  # taken from A
  - conditions: ["keyA == valueA"]
    max_value: 1
    seconds: 60
    variables: []
  # taken from B
  - conditions: ["otherkey == othervalue"]
    max_value: 100
    seconds: 60
    variables: []
```
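The merge of A and B can be sketched in Go as shown here. The `RateLimit` type, the `key` helper and the `Merge` function are hypothetical names used only for this illustration. The sketch also assumes that the order of items inside `conditions` and `variables` is significant when comparing keys, which this document does not specify.

```go
package main

import "fmt"

// RateLimit is a trimmed-down version of RateLimitSpec.
type RateLimit struct {
	Conditions []string
	Variables  []string
	MaxValue   int
	Seconds    int
}

// key builds the merge key from conditions, variables and seconds.
// Assumption: list order matters when comparing two keys.
func key(l RateLimit) string {
	return fmt.Sprintf("%q|%q|%d", l.Conditions, l.Variables, l.Seconds)
}

// Merge returns the limits of a (higher precedence) followed by the
// limits of b whose key is not already defined in a.
func Merge(a, b []RateLimit) []RateLimit {
	seen := make(map[string]bool, len(a))
	merged := make([]RateLimit, 0, len(a)+len(b))
	for _, l := range a {
		seen[key(l)] = true
		merged = append(merged, l)
	}
	for _, l := range b {
		if !seen[key(l)] {
			merged = append(merged, l)
		}
	}
	return merged
}

func main() {
	a := []RateLimit{{Conditions: []string{"keyA == valueA"}, MaxValue: 1, Seconds: 60}}
	b := []RateLimit{
		{Conditions: []string{"keyA == valueA"}, MaxValue: 100, Seconds: 60},
		{Conditions: []string{"otherkey == othervalue"}, MaxValue: 100, Seconds: 60},
	}
	for _, l := range Merge(a, b) {
		fmt.Println(l.Conditions, l.MaxValue)
	}
}
```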
### "VirtualHosting" RateLimitPolicies
When a request hits the gateway, only one HTTPRoute is used to route the request. Then, only the rate limiting policy targeting that HTTPRoute will be activated. Optionally, if it exists, the rate limiting policy targeting that gateway will also be activated.
When an RLP targets an *accepted* HTTPRoute, the intersection of the HTTPRoute [hostnames](https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io/v1alpha2.HTTPRouteSpec) and the Gateway Listener's [Hostname](https://gateway-api.sigs.k8s.io/v1alpha2/references/spec/#gateway.networking.k8s.io%2fv1alpha2.Listener) provides the domain(s) for which the rate limiting policy should be applied. Note that the domain could be `*`, in which case the policy applies to all hostnames hitting the gateway.
The controller will build an index of policies keyed by the virtualhost domain. The policies will be computed by merging HTTPRoute and Gateway level policies according to the precedence rules.
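A rough Go sketch of the hostname intersection is given here. It is a simplification under stated assumptions: hostnames are either `*`, an exact name, or a `*.suffix` wildcard; an unset listener hostname or an empty HTTPRoute hostname list is treated as `*`; and the function names (`matches`, `intersect`, `Domains`) are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matches reports whether hostname h is covered by pattern p, where p is
// "*", an exact name, or a "*.suffix" wildcard.
func matches(p, h string) bool {
	if p == "*" || p == h {
		return true
	}
	if strings.HasPrefix(p, "*.") {
		return strings.HasSuffix(h, p[1:])
	}
	return false
}

// intersect returns the more specific hostname when one covers the other.
func intersect(listener, route string) (string, bool) {
	switch {
	case matches(listener, route):
		return route, true
	case matches(route, listener):
		return listener, true
	}
	return "", false
}

// Domains computes the domains a policy applies to for one listener.
func Domains(listener string, routeHostnames []string) []string {
	if listener == "" {
		listener = "*" // assumption: unset listener hostname matches everything
	}
	if len(routeHostnames) == 0 {
		routeHostnames = []string{"*"} // assumption: no hostnames means all
	}
	var out []string
	for _, r := range routeHostnames {
		if d, ok := intersect(listener, r); ok {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	fmt.Println(Domains("*.toystore.com", []string{"api.toystore.com", "other.example.com"}))
	fmt.Println(Domains("", nil))
}
```

The resulting domains would then serve as keys of the policy index described above.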
## Schema of the `PluginConfig` struct
Currently the PluginConfig looks like this:
```yaml=
# The filter’s behaviour in case the rate limiting service does not respond back. When it is set to true, Envoy will not allow traffic in case of communication failure between rate limiting service and the proxy.
failure_mode_deny: true
ratelimitpolicies:
  default/toystore: # rate limit policy {NAMESPACE/NAME}
    hosts: # HTTPRoute hostnames
      - '*.toystore.com'
    rules: # route level actions
      - operations:
          - paths:
              - /admin/toy
            methods:
              - POST
              - DELETE
        actions:
          - generic_key:
              descriptor_value: "yes"
              descriptor_key: admin
    global_actions: # virtualHost level actions
      - generic_key:
          descriptor_value: "yes"
          descriptor_key: vhaction
    upstream_cluster: rate-limit-cluster # Limitador address reference
    domain: toystore-app # RLS protocol domain value
```
Proposed new design:
```go=
// PluginConfig defines the object to be passed as input into the WASM plugin
type PluginConfig struct {
// +optional
RateLimitPolicies []*RateLimitPolicySpec `json:"rateLimitPolicies,omitempty"`
// FailureModeDeny defines the filter’s behaviour in case the rate limiting service does not respond back. When it is set to true, Envoy will not allow traffic in case of communication failure between rate limiting service and the proxy. By default it will be `false`
// +optional
FailureModeDeny *bool `json:"failureModeDeny,omitempty"`
}
type RateLimitPolicySpec struct {
// RateLimitDomain is the Envoy's RLS protocol domain value
RateLimitDomain string `json:"rateLimitDomain,omitempty"`
// Hostnames list of domain names that will be matched to this rate limit policy
Hostnames []string `json:"hostnames"`
// UpstreamCluster is the Envoy's cluster name to be used as external rate limit service
UpstreamCluster string `json:"upstreamCluster"`
// +optional
Name *string `json:"name,omitempty"`
// +optional
GatewayConfigurations []*GatewayConfigurationSpec `json:"gatewayActions,omitempty"`
}
// GatewayConfigurationSpec contains rate limiting configuration at the envoy gateway
type GatewayConfigurationSpec struct {
// Rules is a list of rules to match the request. A match occurs
// when at least one rule matches the request.
// If not set, it is equivalent to matching all the requests.
// +optional
Rules []*RateLimitRule `json:"rules,omitempty"`
// Configurations is a list of Envoy rate limit configurations
// +optional
Configurations []*RateLimitConfiguration `json:"configurations,omitempty"`
}
// RateLimitConfiguration defines Envoy rate limit configuration
type RateLimitConfiguration struct {
// +optional
Actions []*RateLimitAction `json:"actions,omitempty"`
}
// RateLimitRule defines rules to match requests
type RateLimitRule struct {
// +optional
Paths []string `json:"paths,omitempty"`
// +optional
Methods []string `json:"methods,omitempty"`
// +optional
Hosts []string `json:"hosts,omitempty"`
}
// RateLimitAction one Envoy rate limit action
// https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/route/v3/route_components.proto#envoy-v3-api-msg-config-route-v3-ratelimit-action
// Precisely one of the available action types must be defined
type RateLimitAction struct {
// +optional
GenericKey *GenericKeySpec `json:"generic_key,omitempty"`
// +optional
Metadata *MetadataSpec `json:"metadata,omitempty"`
// +optional
RemoteAddress *RemoteAddressSpec `json:"remote_address,omitempty"`
// +optional
RequestHeaders *RequestHeadersSpec `json:"request_headers,omitempty"`
}
```
An example
```yaml=
# The filter’s behaviour in case the rate limiting service does not respond back. When it is set to true, Envoy will not allow traffic in case of communication failure between rate limiting service and the proxy.
failureModeDeny: true
rateLimitPolicies:
  - name: some_name # rate limit policy {NAMESPACE/NAME}
    rateLimitDomain: some_value # RLS protocol domain value
    upstreamCluster: rate-limit-cluster # Limitador address reference
    hostnames: # hostnames
      - '*.toystore.com'
    gatewayActions: # rate limit configuration list
      - rules:
          - paths: ["/admin/toy"]
            methods: ["POST", "DELETE"]
        configurations:
          - actions:
              - generic_key:
                  descriptor_value: "yes"
                  descriptor_key: admin
```
## WASM-SHIM
Each rate limit policy has a logical name as well as a set of hostnames. A policy is selected by matching those hostnames against the incoming request’s host header.
The WASM-SHIM builds a tree-based data structure holding the rate limit policies. The longest hostname match is used to select the policy to be applied. Only one policy is applied per invocation.
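The selection rule can be sketched in Go. Note that this sketch swaps the tree-based structure for a plain linear scan to stay short, and it is written in Go only to match the other examples in this document. The `Policy` type and the `covers` and `Select` functions are hypothetical, and hostname patterns are assumed to be `*`, exact names, or `*.suffix` wildcards.

```go
package main

import (
	"fmt"
	"strings"
)

// Policy is a trimmed-down rate limit policy: a name plus hostname patterns.
type Policy struct {
	Name      string
	Hostnames []string
}

// covers reports whether pattern p covers the request host.
func covers(p, host string) bool {
	if p == "*" || p == host {
		return true
	}
	return strings.HasPrefix(p, "*.") && strings.HasSuffix(host, p[1:])
}

// Select returns the single policy with the longest matching hostname
// pattern. An exact hostname match always wins.
func Select(policies []Policy, host string) (Policy, bool) {
	var best Policy
	bestLen := -1
	for _, p := range policies {
		for _, h := range p.Hostnames {
			if h == host {
				return p, true
			}
			if covers(h, host) && len(h) > bestLen {
				best, bestLen = p, len(h)
			}
		}
	}
	return best, bestLen >= 0
}

func main() {
	policies := []Policy{
		{Name: "catch-all", Hostnames: []string{"*"}},
		{Name: "toystore", Hostnames: []string{"*.toystore.com"}},
		{Name: "toystore-admin", Hostnames: []string{"*.admin.toystore.com"}},
	}
	p, _ := Select(policies, "eu.admin.toystore.com")
	fmt.Println(p.Name) // toystore-admin
}
```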
The policy contains rules and configuration to build a list of Envoy's RLS descriptors.
A given configuration will be applied, and therefore its actions activated, when a rule match occurs. A rule match occurs when at least one rule matches the request. If no rules are set, it is equivalent to matching all requests.
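A minimal Go sketch of the rule match follows. Two points are assumptions not stated in this document: within a single rule, `paths`, `methods` and `hosts` must all match (an empty list matches anything), and values are compared by exact equality rather than by prefix or pattern. The `Rule`, `Request` and `Matches` names are hypothetical.

```go
package main

import "fmt"

// Rule mirrors RateLimitRule; Request holds the attributes being matched.
type Rule struct {
	Paths, Methods, Hosts []string
}

type Request struct{ Path, Method, Host string }

// contains treats an empty list as "match anything".
func contains(list []string, v string) bool {
	if len(list) == 0 {
		return true
	}
	for _, x := range list {
		if x == v {
			return true
		}
	}
	return false
}

// Matches reports whether at least one rule matches the request.
// An empty rule list matches every request.
func Matches(rules []Rule, r Request) bool {
	if len(rules) == 0 {
		return true
	}
	for _, rule := range rules {
		if contains(rule.Paths, r.Path) && contains(rule.Methods, r.Method) && contains(rule.Hosts, r.Host) {
			return true
		}
	}
	return false
}

func main() {
	rules := []Rule{{Paths: []string{"/admin/toy"}, Methods: []string{"POST", "DELETE"}}}
	fmt.Println(Matches(rules, Request{Path: "/admin/toy", Method: "DELETE", Host: "api.toystore.com"})) // true
	fmt.Println(Matches(rules, Request{Path: "/admin/toy", Method: "GET", Host: "api.toystore.com"}))    // false
	fmt.Println(Matches(nil, Request{Path: "/anything", Method: "GET", Host: "api.toystore.com"}))       // true
}
```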
If an action cannot append a descriptor entry, no descriptor is generated for the configuration.
The external rate limit service will be called when there is at least one descriptor being generated by the list of configurations.
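The last two statements can be sketched in Go as follows. The `Action` function type and the `GenericKey` and `RequestHeader` constructors are hypothetical simplifications of the action types listed in the schema. The actual wasm-shim is a separate module and may be structured differently.

```go
package main

import "fmt"

// Entry is one descriptor entry.
type Entry struct{ Key, Value string }

// Action tries to produce one descriptor entry from the request headers.
// The returned bool is false when the action cannot append an entry.
type Action func(headers map[string]string) (Entry, bool)

// GenericKey always succeeds with a fixed entry.
func GenericKey(key, value string) Action {
	return func(map[string]string) (Entry, bool) { return Entry{key, value}, true }
}

// RequestHeader succeeds only when the header is present in the request.
func RequestHeader(header, key string) Action {
	return func(h map[string]string) (Entry, bool) {
		v, ok := h[header]
		return Entry{key, v}, ok
	}
}

// BuildDescriptors returns one descriptor per configuration whose actions
// all succeed. A configuration with a failing action yields no descriptor.
func BuildDescriptors(configurations [][]Action, headers map[string]string) [][]Entry {
	var descriptors [][]Entry
	for _, actions := range configurations {
		entries := make([]Entry, 0, len(actions))
		ok := true
		for _, a := range actions {
			e, appended := a(headers)
			if !appended {
				ok = false
				break
			}
			entries = append(entries, e)
		}
		if ok && len(entries) > 0 {
			descriptors = append(descriptors, entries)
		}
	}
	return descriptors
}

func main() {
	configs := [][]Action{
		{GenericKey("admin", "yes")},
		{GenericKey("per-user", "yes"), RequestHeader("x-user-id", "user")},
	}
	// No x-user-id header: the second configuration yields no descriptor.
	descriptors := BuildDescriptors(configs, map[string]string{})
	// The external service is called only when at least one descriptor exists.
	fmt.Println(len(descriptors) > 0, descriptors)
}
```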