# TEP-XXXX: Run concurrency *keys*, mutexes, …
###### tags: `tektoncd` `tep` `pipeline`
*Feel free to add comments or questions, even inline (in the file)*
Collaborators:
- Vincent Demeester (Red Hat)
- William Fish (dave-inc)
- *<if you add a comment or text, please add yourself here!>*
*This is still at the braindump stage and needs to be written up properly*
## Motivation
Enable users to limit the number of tasks that can run simultaneously in a pipeline, which could help with:
- Resource consumption: limit the number of `PipelineRun`s/`TaskRun`s, and thus the `Pod`s resulting from them
- Making sure two executions (of a `Pipeline` or `Task`) are not conflicting with each other
This has been discussed several times (see [Thoughts](#thoughts)), and is *implemented* by tools integrating with Tekton Pipelines.
> We've heard a few use cases for limiting execution concurrency, but so far it's been hard to generalize the various needs into one single unified "concurrency" concept that we can apply across all of Tekton Pipelines. Some users might only want to have "deployment" pipeline running at a time, across the whole cluster. Others might want one "deployment" pipeline per namespace, or per deployment target (only one pipeline can deploy to Prod at a time, but you can deploy to Prod and Staging at the same time), or per input source (only deploy my Git repo to one place at a time), or per authorizing user (Alice can only deploy to one place at a time).
>
> Users might also want to limit TaskRun concurrency, either when run as part of a PipelineRun or when executed directly.
> From [Jason](https://github.com/tektoncd/experimental/issues/699#issue-794324070)
Limiting execution concurrency is relatively opinionated (as highlighted above) and, by definition, Tekton Pipelines cannot be very opinionated. This proposal aims to explore how we could provide primitives that would allow users to define their own *concurrency limit* rules.
## Goals
- A cluster-admin can define rules on concurrency limit (cluster-wide)
- A namespace admin can define rules on concurrency limit (per project / namespace)
- A user can define rules on concurrency limit (per pipeline)
- Define what happens when new runs are created for a given limit (Queue, Cancel, Deny)
### Non-goals
- Define opinionated, advanced concurrency limit flows, like "only one run per pull-request", …
- Define a default concurrency limit behavior
## Use cases
- Limit multiple runs on PRs
- Do not deploy the same service at the same time
## Design details
This proposal is a *reboot* and mix of several previous proposals / issues, but mainly based on [tektoncd/pipeline#2828](https://github.com/tektoncd/pipeline/issues/2828) and [tektoncd/experimental#699](https://github.com/tektoncd/experimental/issues/699).
Because a `PipelineRun` (or `TaskRun`) may not refer to any `Pipeline` (or `Task`) at all — it can use an embedded definition — we cannot design a solution based on annotating (in any way possible) a `Pipeline` or a `Task`. In addition, as the primitive(s) need to be set on the `PipelineRun` (or `TaskRun`), there is a possibility that they won't be set at all, making a "default" configurable behavior worth having (per-namespace or cluster-wide).
The general idea behind this proposal is the following:
- Concurrency limits are enforced through "concurrency keys": the controller keeps a *bucket* of (dynamic) concurrency keys and looks it up to decide whether an execution can be scheduled.
- When the limit is reached, the controller acts differently depending on how the given "concurrency key" is configured. Possible *strategies* are:
  - Queue: start the `PipelineRun` (or `TaskRun`) as `Pending` and wait for a slot in the queue to free up before starting it
  - Cancel: cancel the *oldest* `PipelineRun` (or `TaskRun`) in the queue, and start the new one immediately
  - ~Deny: deny the request, do not create the `PipelineRun` (or `TaskRun`) at all~ (*I don't see a use case for this yet*)
- The controller will use annotations on the `PipelineRun` (or `TaskRun`), together with `ConfigMap`s, to track the "concurrency key" state
- `ConfigMap`s store the "definition" of each concurrency key rule, i.e. the *strategy* as well as the limit

This behavior is *optional*: by default, an instance of Tekton Pipelines with the default configuration wouldn't enforce any rules.
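To make the bucket idea concrete, here is a minimal in-memory sketch. All names (`Bucket`, `Rule`, `Admit`, `Release`) are hypothetical and do not correspond to any existing Tekton API; the actual controller would persist state via annotations rather than keep it purely in memory.

```go
package main

import "fmt"

// Strategy controls what happens when a concurrency key is at its limit.
type Strategy string

const (
	Queue  Strategy = "queue"  // hold the new run as Pending until a slot frees up
	Cancel Strategy = "cancel" // cancel an older run so the new one can start
)

// Rule is the per-key configuration: the limit and the strategy.
type Rule struct {
	Limit    int
	Strategy Strategy
}

// Bucket tracks how many runs are active per concurrency key.
type Bucket struct {
	rules  map[string]Rule
	counts map[string]int
}

func NewBucket(rules map[string]Rule) *Bucket {
	return &Bucket{rules: rules, counts: map[string]int{}}
}

// Admit decides what to do with a new run carrying the given key:
// "run", "pending" (Queue strategy) or "cancel-oldest" (Cancel strategy).
func (b *Bucket) Admit(key string) string {
	rule, ok := b.rules[key]
	if !ok {
		return "run" // no rule matches this key: no concurrency limit
	}
	if b.counts[key] < rule.Limit {
		b.counts[key]++
		return "run"
	}
	if rule.Strategy == Cancel {
		return "cancel-oldest"
	}
	return "pending"
}

// Release is called when a run completes, freeing a slot for that key.
func (b *Bucket) Release(key string) {
	if b.counts[key] > 0 {
		b.counts[key]--
	}
}

func main() {
	b := NewBucket(map[string]Rule{"ns-foo": {Limit: 1, Strategy: Queue}})
	fmt.Println(b.Admit("ns-foo")) // run
	fmt.Println(b.Admit("ns-foo")) // pending: the key is at its limit
	b.Release("ns-foo")
	fmt.Println(b.Admit("ns-foo")) // run
}
```

Note that `Admit` only reports a decision for the "cancel-oldest" case; actually cancelling the older run and admitting the new one would be the controller's job.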
### Configuration
It should be possible to define multiple "concurrency keys" depending on metadata.
```yaml
kind: ConfigMap
metadata:
  name: concurrency-controller
data:
  rules:
    # Limit to 3 PipelineRuns per namespace
    # This will apply the key "namespace" to any PipelineRun…
    # … that is not caught by another rule
    - name: default-pipelineruns
      type: PipelineRun
      selection: "*"
      key: $(metadata.namespace)
      limit: 3
      strategy: queue # default strategy is queue
    # Limit to 4 PipelineRuns per namespace, per pipeline
    # This will apply the key "namespace-pipelinename" to any PipelineRun that has a spec.pipelineRef field…
    # … and that is not caught by another rule
    - name: default-pipelineruns-with-refs
      type: PipelineRun
      selection: has(spec.pipelineRef) # Select all PipelineRuns that reference a Pipeline
      key: $(metadata.namespace)-$(spec.pipelineRef.name)
      limit: 4
    # Limit to 10 TaskRuns per namespace
    # This will apply the key "namespace" to any TaskRun…
    # … that is not caught by another rule
    - name: default-taskruns
      type: TaskRun
      selection: "*"
      key: $(metadata.namespace)
      limit: 10
    # Limit to 1 TaskRun per namespace, per "deployment" label *if* it is present
    - name: deployment-taskruns
      type: TaskRun
      selection: hasKey(metadata.labels, "deployment")
      key: $(metadata.namespace)-$(metadata.labels["deployment"])
      limit: 1
    # Limit to 1 PipelineRun per namespace, per "pull-request" annotation *if* it is present…
    # … and cancel the previous execution if any
    - name: pipelinerun-pull-requests
      type: PipelineRun
      selection: hasKey(metadata.annotations, "pull-request")
      key: $(metadata.namespace)-$(metadata.annotations["pull-request"]) # foo-2901 for a namespace foo and a pull-request annotation with value 2901
      limit: 1
      strategy: cancel
```
The above configuration effectively does the following:
- No more than 10 `TaskRun`s can execute in a namespace
- No more than 3 `PipelineRun`s without a `pipelineRef` can execute in a namespace
- No more than 4 `PipelineRun`s with a `pipelineRef` can execute in a namespace
- Only one `TaskRun` with a "deployment" label can run, per namespace and per label value (two different deployments can happen at the same time, for example)
- Only one `PipelineRun` with a "pull-request" annotation can run, per namespace and per annotation value. A new `PipelineRun` will cancel the currently running one, and start once the cancellation is complete.
If a `PipelineRun` (or a `TaskRun`) is not caught by any rule, it doesn't have any concurrency limit. This allows, for example, ensuring that only one run per pull-request is running *without* limiting any other `PipelineRun`.
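Rule matching could then be a first-match lookup over the configured rules. The sketch below stands in for the `selection` expression and the `$(...)` key template with plain Go functions; the `Run`, `ConcurrencyRule` and `ResolveKey` names are hypothetical, and the real `selection` syntax (e.g. `has(...)`, `hasKey(...)`) is still to be defined.

```go
package main

import "fmt"

// Run is a minimal stand-in for PipelineRun/TaskRun metadata.
type Run struct {
	Kind        string
	Namespace   string
	Labels      map[string]string
	Annotations map[string]string
	PipelineRef string // empty when the definition is embedded
}

// ConcurrencyRule mirrors one entry of the sketched configmap: Select
// stands in for the `selection` expression, and Key builds the
// concurrency key from the run (the $(...) template).
type ConcurrencyRule struct {
	Name   string
	Type   string
	Select func(Run) bool
	Key    func(Run) string
	Limit  int
}

// ResolveKey returns the concurrency key of the first matching rule,
// or ok=false when no rule catches the run (no limit applies).
func ResolveKey(rules []ConcurrencyRule, r Run) (key string, ok bool) {
	for _, rule := range rules {
		if rule.Type == r.Kind && rule.Select(r) {
			return rule.Key(r), true
		}
	}
	return "", false
}

func main() {
	rules := []ConcurrencyRule{
		{
			Name: "pipelinerun-pull-requests", Type: "PipelineRun",
			// selection: hasKey(metadata.annotations, "pull-request")
			Select: func(r Run) bool { _, ok := r.Annotations["pull-request"]; return ok },
			// key: $(metadata.namespace)-$(metadata.annotations["pull-request"])
			Key:   func(r Run) string { return r.Namespace + "-" + r.Annotations["pull-request"] },
			Limit: 1,
		},
		{
			Name: "default-pipelineruns", Type: "PipelineRun",
			Select: func(r Run) bool { return true }, // selection: "*"
			Key:    func(r Run) string { return r.Namespace },
			Limit:  3,
		},
	}
	pr := Run{Kind: "PipelineRun", Namespace: "foo",
		Annotations: map[string]string{"pull-request": "2901"}}
	key, _ := ResolveKey(rules, pr)
	fmt.Println(key) // foo-2901
}
```

First-match semantics means the more specific rules need to be listed before the catch-all `"*"` ones; whether ordering or specificity should decide precedence is an open design question.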
It should be possible to define configuration cluster-wide, via a `ConfigMap` in `tekton-pipelines` (or the namespace the controller is deployed to), as well as per-namespace. The per-namespace configuration is the same, except it will automatically append the namespace to the concurrency key. If there is a `concurrency-controller` configmap in a namespace, it overrides the cluster-wide configuration (if it exists).
If a `PipelineRun` or a referenced `Pipeline` has rules set on it, it overrides the cluster-wide and namespace-wide configuration.
```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: foo
  labels:
    pipeline.tekton.dev/concurrency-key: foo-concurrency-key
    pipeline.tekton.dev/concurrency-limit: "3"
    pipeline.tekton.dev/concurrency-strategy: queue
# […]
---
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: foo
  labels:
    pipeline.tekton.dev/concurrency-key: $(namespace)-$(name)
    pipeline.tekton.dev/concurrency-limit: "1"
    pipeline.tekton.dev/concurrency-strategy: cancel
# […]
---
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: foo
  labels:
    pipeline.tekton.dev/concurrency-key: $(namespace)-$(generateName)
    pipeline.tekton.dev/concurrency-limit: "5"
    pipeline.tekton.dev/concurrency-strategy: queue
# […]
```
*Note: We should allow cluster-admins to configure what is allowed or not, i.e. control whether per-pipeline rules may be defined, …*.
### How the controller works
At startup, or on events on the `concurrency-controller` configmaps, the controller computes the list of rules so that it can apply them to `PipelineRun`s (or `TaskRun`s).
- On `PipelineRun` creation (or on any event on it), the controller tries to compute a "concurrency key" based on the rules and the run itself (metadata, spec, …), if there is not a key already
- On startup (or resync), the controller updates its internal representation of concurrency keys (and their values) by reading the annotations on `PipelineRun`s (and `TaskRun`s)
- If a "concurrency key" is returned, the controller annotates the `PipelineRun` (or `TaskRun`) with that key…
  - … and increments the in-memory value of that key by `1`
  - … and checks whether the count for that key is above the limit or not:
    - if it is not, it runs the `PipelineRun` (or `TaskRun`) as usual
    - if it is, it creates the `PipelineRun` (or `TaskRun`) with a `Pending` status
- When a `PipelineRun` (or `TaskRun`) completes (success or failure, it doesn't matter), the controller decrements the in-memory value of that key by `1`…
  - … and gets the oldest `PipelineRun` (or `TaskRun`) that has the same key, and forces an event on it (to make it schedulable)
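The create/complete flow above can be sketched as follows, with the pending queue kept per key and drained oldest-first. Again, the `Controller`, `OnCreate` and `OnComplete` names are hypothetical; the real reconciler would read and write annotations instead of holding all state in memory.

```go
package main

import (
	"fmt"
	"sort"
)

// PendingRun records a run held back because its key was at the limit.
type PendingRun struct {
	Name    string
	Created int64 // creation timestamp; the oldest run is released first
}

// Controller keeps the in-memory counts plus the pending queue per key.
type Controller struct {
	limits  map[string]int
	counts  map[string]int
	pending map[string][]PendingRun
}

func NewController(limits map[string]int) *Controller {
	return &Controller{limits: limits, counts: map[string]int{}, pending: map[string][]PendingRun{}}
}

// OnCreate either starts the run or holds it back as Pending.
func (c *Controller) OnCreate(key string, r PendingRun) string {
	if c.counts[key] < c.limits[key] {
		c.counts[key]++
		return "started"
	}
	c.pending[key] = append(c.pending[key], r)
	return "pending"
}

// OnComplete decrements the count and wakes the oldest pending run for the key.
func (c *Controller) OnComplete(key string) (released string, ok bool) {
	c.counts[key]--
	q := c.pending[key]
	if len(q) == 0 {
		return "", false
	}
	sort.Slice(q, func(i, j int) bool { return q[i].Created < q[j].Created })
	oldest := q[0]
	c.pending[key] = q[1:]
	c.counts[key]++
	return oldest.Name, true
}

func main() {
	c := NewController(map[string]int{"foo": 1})
	fmt.Println(c.OnCreate("foo", PendingRun{Name: "run-a", Created: 1})) // started
	fmt.Println(c.OnCreate("foo", PendingRun{Name: "run-b", Created: 2})) // pending
	name, _ := c.OnComplete("foo")
	fmt.Println(name) // run-b is released when run-a completes
}
```

Because the counts are in memory, the resync step (rebuilding them from annotations at startup) is what makes the scheme survive controller restarts; this sketch omits it.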
## Alternatives
### Use a field instead of an annotation
Using a field in the `status` instead of an annotation might be a better approach.
### Do nothing
Rely on a completely external system to handle this.
## Thoughts
Referenced issues and pull requests:
- https://github.com/tektoncd/community/pull/228
- https://github.com/tektoncd/pipeline/issues/1305
- https://github.com/tektoncd/experimental/issues/699
- https://github.com/imjasonh/tektoncd-concurrency
- https://github.com/tektoncd/pipeline/issues/2591
- https://github.com/tektoncd/pipeline/issues/2828
- https://github.com/jenkins-x/jx/issues/5471
- https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency
## Braindump
> Some thoughts on how this could possibly work, feel free to propose alternatives:
>
> introduce a "concurrency bucket" CRD with a cap on task runs and/or resources, and/or something else
>
> have PipelineRuns and TaskRuns state what bucket they're counting against; possibly using an annotation?
>
> triggers could populate with some key based on repo+branch (or just repo, or org, etc.)
>
> PipelineRun controller holds runs in a concurrency-limited state until there's room in the bucket's cap
>
> Open Questions:
>
> should items be unblocked FIFO? At random? Scheduled based on requests?
> how should this interact with existing K8s features for limiting resource usage in a namespace? Can operators use these features effectively today as a stopgap?
> - From [Jason](https://github.com/tektoncd/pipeline/issues/1305#issuecomment-679288026)