# KEP-XXXX: Kubelet Sizing Providers
<!-- toc -->
- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
- [Sizing Provider Configuration](#sizing-provider-configuration)
- [Sizing Provider Request API](#sizing-provider-request-api)
- [Sizing Provider Response API](#sizing-provider-response-api)
- [Test Plan](#test-plan)
- [Graduation Criteria](#graduation-criteria)
- [Alpha](#alpha)
- [Alpha -> Beta Graduation](#alpha---beta-graduation)
- [Beta -> GA Graduation](#beta---ga-graduation)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
- [Monitoring Requirements](#monitoring-requirements)
- [Dependencies](#dependencies)
- [Scalability](#scalability)
- [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
<!-- /toc -->
## Release Signoff Checklist
Items marked with (R) are required *prior to targeting to a milestone / release*.
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [ ] (R) Graduation criteria is in place
- [ ] (R) Production readiness review completed
- [ ] Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
<!--
**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
-->
[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website
## Summary
Kubelet should have a sizing provider mechanism, which could give kubelet an ability to dynamically fetch sizing values for memory and cpu reservations.
Today the sizing values are passed manually to kubelet using `--kube-reserved` and `--system-reserved` flags. Many cloud providers provide reference values for their customers to help them select optimal values based on the node sizes. e.g. [GKE](https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#memory_cpu), [AKS](https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads#resource-reservations)
This KEP proposes an extensible plugin mechanism so that kubelet can dynamically fetch sizing values for any node size from user provided guidance irrespective of the cloud provider.
## Motivation
Kubelet’s `system reserved` and `kube reserved` play a crucial role in the OOMKilling the resource intensive pods. Without an adequate enough `system reserved` and `kube reserved` we risk freezing the node making it completely unavailable for other pods.
We have observed that varying the value of `system reserved` and `kube reserved` with respect to the installed capacity of the node helps to deduce optimal values.
Currently, the only way to customize the `system reserved` and `kube reserved` limits is to pre-calculate the values prior to Kubelet start. If the Kubelet is deployed to various instance types, then the limits need to be tuned for every instance type manually.
### Goals
* Enable Kubelet to determine the value of the `system reserved` and `kube reserved` automatically during start up.
* Add a plugin mechanism so that kubelet can dynamically fetch sizing values for a given node.
### Non-Goals
* For now the plugin mechanism is proposed here is only for fetching values of `system reserved` and `kube reserved`. Similar approach can be taken to dynamically fetch the values of other parameters of the kubelet (e.g. `evictionHard`) but they are out of scope of this KEP.
## Proposal
The extension mechanism introduced in the kubelet will be done by exec-ing a plugin binary. The kubelet and the plugin communicate through stdio (stdin, stdout, and stderr) by exchanging json-serialized api-versioned types. The kubelet and the plugin must always talk the same api version to ensure compatibility as the API evolves.
### Risks and Mitigations
* exec-ing plugins for sizing values can add a slight delay in the kubelet startup.
## Design Details
The sizing provider plugin is enabled by passing two flags to the kubelet `--sizing-provider-config` and `sizing-provider-bin-dir`. The former is the path to a file containing the `SizingProviderConfig` API (more on this below) and the latter is a directory the kubelet will check for plugin binaries.
### Sizing Provider Configuration
The v1alpha1 configuration API read by the kubelet to enable exec plugins is as follows:
```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// SizingProviderConfig is the configuration containing information about
// each exec sizing provider. Kubelet reads this configuration from disk and enables
// each provider as specified by the SizingProvider type.
type SizingProviderConfig struct {
metav1.TypeMeta `json:",inline"`
// Providers is a list of sizing provider plugins that will be enabled by the kubelet.
// Multiple providers may provide different sizing parameters (e.g. cpu for system-reserved,
// memory for system-reserved etc.), in which case sizing values from all providers will
// be returned to the kubelet. If multiple providers return overlapping values for a single
// kubelet parameter (e.g. cpu for system-reserved), then the value from the provider
// earlier in this list is used.
Providers []SizingProvider `json:"providers"`
}
// SizingProvider represents an exec plugin to be invoked by the kubelet. The plugin is only
// invoked when `--sizing-provider-config` parameter is passed during kubelet startup.
type SizingProvider struct {
// name is the required name of the sizing provider. It must match the name of the
// provider executable as seen by the kubelet. The executable must be in the kubelet's
// bin directory (set by the --image-sizing-provider-bin-dir flag).
Name string `json:"name"`
// Required input version of the exec SizingProviderRequest. The returned SizingProviderResponse
// MUST use the same encoding version as the input. Supported values are:
// - sizingprovider.kubelet.k8s.io/v1alpha1
APIVersion string `json:"apiVersion"`
// Arguments to pass to the command when executing it.
// +optional
Args []string `json:"args,omitempty"`
// Env defines additional environment variables to expose to the process. These
// are unioned with the host's environment, as well as variables client-go uses
// to pass argument to the plugin.
// +optional
Env []ExecEnvVar `json:"env,omitempty"`
}
// ExecEnvVar is used for setting environment variables when executing an exec-based
// sizing plugin.
type ExecEnvVar struct {
Name string `json:"name"`
Value string `json:"value"`
}
```
### Sizing Provider Request API
If an exec plugin is enabled the kubelet will exec the plugin during the startup, passing the `SizingProviderRequest` API via stdin.
The kubelet will encode the request based on the apiVersion provided in SizingProviderConfig. It wil also exec the plugin based on the `args` and `env` fields in `SizingProviderConfig`.
```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// SizingProviderRequest will be passed to the plugin via stdin. In general,
// plugins should prefer responding with the same apiVersion they were sent.
type SizingProviderRequest struct {
metav1.TypeMeta
}
```
### Sizing Provider Response API
An exec plugin is expected to return an encoded response of the `SizingProviderResponse` API to the kubelet via stdout.
```go
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// SizingProviderResponse holds sizing values that the kubelet should use for the specified
// image provided in the original request. Kubelet will read the response from the plugin via stdout.
type SizingProviderResponse struct {
metav1.TypeMeta
// A set of ResourceName=ResourceQuantity (e.g. cpu=200m,memory=150G,pid=100) pairs
// that describe resources reserved for non-kubernetes components.
// Currently only cpu and memory are supported.
// See http://kubernetes.io/docs/user-guide/compute-resources for more detail.
SystemReserved map[string]string `json:"systemReserve"`
// A set of ResourceName=ResourceQuantity (e.g. cpu=200m,memory=150G,pid=100) pairs
// that describe resources reserved for kubernetes system components.
// Currently cpu, memory and local ephemeral storage for root file system are supported.
// See http://kubernetes.io/docs/user-guide/compute-resources for more detail.
KubeReserved map[string]string `json:"kubeReserve"`
}
```
### Test Plan
Alpha:
* unit tests for the exec plugin provider
* unit tests for API validation
### Graduation Criteria
### Alpha
* adequate unit testing for the plugin provider
* a working reference implementation, proving that the existing functionality of the built-in providers
can be achieved using the exec plugin.
#### Alpha -> Beta Graduation
* integration or e2e tests.
* at least one working plugin implementation.
#### Beta -> GA Graduation
TBD
### Upgrade / Downgrade Strategy
This feature is feature gated so explicit opt-in is required on upgrade and explicit opt-out is required on downgrade.
### Version Skew Strategy
Not applicable because this feature is contained to only the kubelet and does not require communication to other components.
## Production Readiness Review Questionnaire
### Feature Enablement and Rollback
* **How can this feature be enabled / disabled in a live cluster?**
- [ ] Feature gate (also fill in values in `kep.yaml`)
- Feature gate name: KubeletSizingProvider
- Components depending on the feature gate: kubelet
* **Does enabling the feature change any default behavior?**
No, use of this feature still requires extra flags enabled on the kubelet.
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
the enablement)?**
Yes, as long as kubelet does not specify the flags `--sizing-provider-config` and `--sizing-provider-bin-dir`.
* **What happens if we reenable the feature if it was previously rolled back?**
Kubelet will continue to invoke exec plugins. No state is stored for this feature to function.
* **Are there any tests for feature enablement/disablement?**
Yes.
### Rollout, Upgrade and Rollback Planning
_This section must be completed when targeting beta graduation to a release._
* **How can a rollout fail? Can it impact already running workloads?**
TBD for beta.
* **What specific metrics should inform a rollback?**
TBD for beta.
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
TBD for beta.
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
fields of API types, flags, etc.?**
TBD for beta.
### Scalability
_For alpha, this section is encouraged: reviewers should consider these questions
and attempt to answer them._
_For beta, this section is required: reviewers must answer these questions._
_For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field._
* **Will enabling / using this feature result in any new API calls?**
No
* **Will enabling / using this feature result in introducing new API types?**
It will add a new kubelet-level API. This API only contains a TypeMeta though and is
not an object.
* **Will enabling / using this feature result in any new calls to the cloud
provider?**
No, but a plugin implementation may choose to make API calls to a cloud provider.
* **Will enabling / using this feature result in increasing size or count of
the existing API objects?**
No.
* **Will enabling / using this feature result in increasing time taken by any
operations covered by [existing SLIs/SLOs]?**
Use of the exec plugin may increase the startup time if the exec plugin invoked
by kubelet takes a long time.
* **Will enabling / using this feature result in non-negligible increase of
resource usage (CPU, RAM, disk, IO, ...) in any components?**
Possibly, it depends on how long it takes for the exec plugin to return during kubelet startup.
## Implementation History
## Drawbacks
* exec plugins may be expensive to invoke by kubelet during the startup.
* a poorly implemented exec plugin may halt the startup for the kubelet.
## Alternatives
1. add a built-in sizing provider in-tree.
## Infrastructure Needed (Optional)
N/A