# KEP-NNNN: Kubelet limit of Parallel Image Pulls

- [Release Signoff Checklist](#release-signoff-checklist)
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories (Optional)](#user-stories-optional)
    - [Story 1](#story-1)
    - [Story 2](#story-2)
  - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
    - [Prerequisite testing updates](#prerequisite-testing-updates)
    - [Unit tests](#unit-tests)
    - [Integration tests](#integration-tests)
    - [e2e tests](#e2e-tests)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
  - [Version Skew Strategy](#version-skew-strategy)
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
  - [Monitoring Requirements](#monitoring-requirements)
  - [Dependencies](#dependencies)
  - [Scalability](#scalability)
  - [Troubleshooting](#troubleshooting)
- [Implementation History](#implementation-history)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)

## Release Signoff Checklist

Items marked with (R) are required *prior to targeting to a milestone / release*.

- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
- [ ] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [ ] (R) Graduation criteria is in place
  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Production readiness review completed
- [ ] (R) Production readiness review approved
- [ ] "Implementation History" section is up-to-date for milestone
- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

[kubernetes.io]: https://kubernetes.io/
[kubernetes/enhancements]: https://git.k8s.io/enhancements
[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
[kubernetes/website]: https://git.k8s.io/website

## Summary

This KEP proposes adding a node-level limit on the number of parallel image pulls to kubelet.

## Motivation

### QPS/burst limits on kubelet are confusing

Currently kubelet limits image pulls with QPS and burst settings. These are confusing because they only limit the rate of requests sent to the container runtime and do not consider the number of in-flight image pulls.
In other words, even when a small QPS is set, many image pulls can still be in progress in parallel if each pull takes a long time. See [issue #112044](https://github.com/kubernetes/kubernetes/issues/112044) for an example.

### No way to limit the number of inflight image pulls

Currently neither kubelet nor containerd limits the number of in-flight image pulls. On kubelet, as mentioned above, the QPS/burst limit does not take in-progress pulls into account. On containerd, there is only [a per-image limit](https://github.com/containerd/containerd/blob/8e787543deede10f372b0e16ad5c07790fc680b6/pkg/cri/config/config.go#L299-L300), which caps the number of parallel layer downloads for each image but still allows an unlimited number of downloads overall.

### Goals

Add a node-level limit on parallel image pulls to kubelet. The limit caps the number of images being pulled in parallel. Any image pull request beyond the limit blocks until an in-progress image pull finishes.

### Non-Goals

* Prioritizing image pulls in any way.
* Using the number of in-flight image pulls as a signal to direct pod scheduling.

## Proposal

* Add `maxParallelImagePulling` to `KubeletConfiguration`.
  * If `serialize-image-pulls` is set to false, `maxParallelImagePulling` defaults to 0, meaning no limit.
* First, kubelet passes the `maxParallelImagePulling` setting to the CRI as an image pull option.
  * If the image already exists, succeed. (If lazy pulling is enabled, the layers are treated as AlreadyExists.)
  * Else
    * If `maxParallelImagePulling=0`, behave as before.
    * If `maxParallelImagePulling=n`, check the current number of image pulls in progress.
      * If current>=n, retry up to 3 times at 10s intervals, 30s in total. (Open question: this may not be needed because kubelet retries anyway. However, the container runtime could handle it.)
      * If still current>=n, fail with a new error such as `TooManyPullingInParallelErr`.
      * If current<n, behave as before.
* Fallback, if the container runtime does not support the `maxParallelImagePulling` option:
  * Kubelet keeps a cached list of the image pulls it has triggered.
  * When a new pull starts, add it to the cached list.
  * When an image pull fails or completes, remove it from the cached list.
  * Keep the list size less than or equal to `maxParallelImagePulling`.
  * If the list is full, block the new image pull until another image pull completes.
  * If the image is already in the list, return.

### User Stories (Optional)

#### Story 1

#### Story 2

### Notes/Constraints/Caveats (Optional)

### Risks and Mitigations

## Design Details

### Test Plan

[ ] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

##### Prerequisite testing updates

##### Unit tests

- `<package>`: `<date>` - `<test coverage>`

##### Integration tests

- <test>: <link to test coverage>

##### e2e tests

- <test>: <link to test coverage>

### Graduation Criteria

### Upgrade / Downgrade Strategy

### Version Skew Strategy

## Production Readiness Review Questionnaire

### Feature Enablement and Rollback

###### How can this feature be enabled / disabled in a live cluster?

- [ ] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name:
  - Components depending on the feature gate:
- [ ] Other
  - Describe the mechanism:
  - Will enabling / disabling the feature require downtime of the control plane?
  - Will enabling / disabling the feature require downtime or reprovisioning of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).

###### Does enabling the feature change any default behavior?

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

###### What happens if we reenable the feature if it was previously rolled back?

###### Are there any tests for feature enablement/disablement?

### Rollout, Upgrade and Rollback Planning

###### How can a rollout or rollback fail? Can it impact already running workloads?

###### What specific metrics should inform a rollback?

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

### Monitoring Requirements

###### How can an operator determine if the feature is in use by workloads?

###### How can someone using this feature know that it is working for their instance?

- [ ] Events
  - Event Reason:
- [ ] API .status
  - Condition name:
  - Other field:
- [ ] Other (treat as last resort)
  - Details:

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

- [ ] Metrics
  - Metric name:
  - [Optional] Aggregation method:
  - Components exposing the metric:
- [ ] Other (treat as last resort)
  - Details:

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

### Dependencies

###### Does this feature depend on any specific services running in the cluster?

### Scalability

###### Will enabling / using this feature result in any new API calls?

###### Will enabling / using this feature result in introducing new API types?

###### Will enabling / using this feature result in any new calls to the cloud provider?

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

### Troubleshooting

###### How does this feature react if the API server and/or etcd is unavailable?

###### What are other known failure modes?

###### What steps should be taken if SLOs are not being met to determine the problem?

## Implementation History

## Drawbacks

## Alternatives

## Infrastructure Needed (Optional)
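
As an illustration of the kubelet-side fallback described under Proposal, the following is a minimal Go sketch, not the kubelet implementation. The names `pullLimiter`, `newPullLimiter`, `Pull`, and `peakParallelPulls` are hypothetical, and the image pull itself is simulated with a sleep. The sketch assumes that a buffered channel serves as the bounded "list" of in-progress pulls, that a limit of 0 means unlimited, and that a request for an image already tracked returns immediately. One deliberate simplification: the duplicate check also covers pulls that are still waiting for a slot.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pullLimiter is a hypothetical helper, not an existing kubelet type.
// It bounds the number of image pulls that run at the same time.
type pullLimiter struct {
	mu      sync.Mutex
	tracked map[string]struct{} // images that are being pulled or waiting for a slot
	slots   chan struct{}       // nil means no limit (maxParallelImagePulling == 0)
}

func newPullLimiter(maxParallelImagePulling int) *pullLimiter {
	l := &pullLimiter{tracked: make(map[string]struct{})}
	if maxParallelImagePulling > 0 {
		l.slots = make(chan struct{}, maxParallelImagePulling)
	}
	return l
}

// Pull runs pull(image) once a slot is free. It reports false without
// calling pull if the same image is already tracked by the limiter.
func (l *pullLimiter) Pull(image string, pull func(string) error) (bool, error) {
	l.mu.Lock()
	if _, dup := l.tracked[image]; dup {
		l.mu.Unlock()
		return false, nil
	}
	l.tracked[image] = struct{}{}
	l.mu.Unlock()

	// Remove the image from the tracked set whether the pull fails or completes.
	defer func() {
		l.mu.Lock()
		delete(l.tracked, image)
		l.mu.Unlock()
	}()

	if l.slots != nil {
		l.slots <- struct{}{} // blocks while the limit is reached
		defer func() { <-l.slots }()
	}
	return true, pull(image)
}

// peakParallelPulls starts n simulated pulls of distinct images and
// returns the highest number observed running at the same time.
func peakParallelPulls(limit, n int) int {
	limiter := newPullLimiter(limit)
	var mu sync.Mutex
	current, peak := 0, 0
	fakePull := func(string) error {
		mu.Lock()
		current++
		if current > peak {
			peak = current
		}
		mu.Unlock()
		time.Sleep(10 * time.Millisecond) // stand-in for a slow image pull
		mu.Lock()
		current--
		mu.Unlock()
		return nil
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			limiter.Pull(fmt.Sprintf("example.invalid/image-%d", i), fakePull)
		}(i)
	}
	wg.Wait()
	return peak
}

func main() {
	// With a limit of 2, the observed peak never exceeds 2.
	fmt.Println("peak parallel pulls:", peakParallelPulls(2, 5))
}
```

Unlike the CRI path in the Proposal, which fails with an error after retries, this sketch blocks the caller until a slot frees up, matching the behavior stated under Goals.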
