---
tags: deppy, design, wip
---
# :link: Deppy
## Introduction
In this document we introduce Deppy: a Kubernetes API for resolving constraints over [RukPak](https://github.com/operator-framework/rukpak) bundle catalog(s). Deppy is part of the next iteration of OLM and was first introduced [here](https://hackmd.io/upiNuoeJTwqNKQJCVMZADw). The central goal of the project is to remove the dependency manager from the [Operator Lifecycle Manager](https://github.com/operator-framework/operator-lifecycle-manager) (OLM) and make it its own generic component.
Dependency resolution is an integral part of OLM. Given an intent to install a bundle, the current cluster state, and one or more catalogs of bundles, the dependency resolution layer will ensure that all bundle dependencies can be, or are already, met. The dependency resolver ensures that all bundle, admin, and runtime constraints can be met, and that as new content is made available by the catalogs OLM will know about it, enabling continuous and automatic updates. Its central aim is to promote cluster stability by providing sets of bundles that ought to behave well together on cluster. For an in-depth look at how the resolution process currently works, please see [Appendix I](#Appendix-I-OLM-Package-Resolution).
Bundle constraints are typically either dependencies (on other packages or `gvk`s) or runtime constraints (e.g. minimum and/or maximum supported kube version, resource constraints, node arch support, etc.). Admins may want to further narrow the set of possible packages, e.g. by filtering out certain providers, channels, or packages with high resource requirements. For more information on bundle constraints see [Appendix II](#Appendix-II-Bundle-Properties-and-Constraints). Cluster runtime constraints are still in their [infancy](https://github.com/operator-framework/enhancements/pull/96) and our decisions here will undoubtedly affect their final shape.
Here, we try to take a fresh look at the resolver/resolution process and attempt to nail down what it will look like in OLM's next iteration.
## Resolver Requirements
* The resolver *must* resolve dependencies for a given set of packages (bundles belong to packages)
* The resolver *must* understand [bundle constraints](#Appendix-II-Bundle-Properties-and-Constraints)
* The resolver *must* understand global constraints defined by admins
* The resolver *must* understand cluster runtime constraints
* The resolver *must* react to changes in catalog content (e.g. new bundle updates being available)
* Continuous updates of specific packages *must* be optional (i.e. bundle versions can be pinned)
* The resolver *should* surface resolution failure reasons but it *must* surface failures (even if the reason is elusive)
* The resolver *could* account for the current cluster state (i.e. installed bundles) when providing a resolution
* The resolver *could* be used in different form factors, e.g. CLI for debugging and exploration
### Who uses OLM and what for?
**Cluster Admin**: administrator of one or more clusters, wants to easily manage (add, remove, and update) packages installed on the cluster in a declarative way. Also cares about security, and who gets to see and use which operator(s).
**Namespace Admin**: administrator of one or more namespaces, wants to consume cluster services and possibly install new operators (though may not have the rights to approve the installation). May not be well versed in Kubernetes.
**Operator Author**: develops and distributes an operator. Wants to be able to easily declare and manage operator metadata such as properties and dependencies.
## Design Questions/Options
### High-level or Low-level API? Who will use it?
This question goes to the core of how Deppy should be positioned. Should it be a high-level/user-facing API, which most personas will use? Or will it be an API that operates "under the covers", helping other APIs achieve their ends? Given our explicit goal to create a separate API for resolution of constraints over catalogs of content, the latter would be a better fit. A constraint resolver on its own doesn't add much without other components that can interpret the results and mutate the cluster (e.g. RukPak). Therefore:
* **Cluster Admin** (create/read/update/delete): Yes. In order to add additional constraints and for debugging purposes.
* **Namespace Admin**: No. These APIs would be too low level. Depending on the operator scope, a namespace admin may not have access to the watched namespace(s). Messing around with `Input`s can also detrimentally affect the cluster state for more than the administrator of the originating namespace.
* **Operator Author**: Yes, _but_ as a cluster admin of the dev/test cluster. Additional tools will also be provided, e.g. a CLI for off-cluster resolution, that might obviate the need for accessing Deppy on a cluster.
### Resolution Philosophy?
Here we attempt to answer what the user's mental model of the resolution process and its output should look like. The resolver should always output the minimum set of bundles that can meet the given constraints. The open question is whether it should treat replaceable dependencies as irreplaceable. Let's illustrate this with an example: we input a constraint that says package `A` must be in the cluster. `A` has a `gvk` requirement that can be met by either package `B` or package `C`. The resolver breaks the tie in favor of `B`, so the resolution output is ``['A', 'B']``. Then, another constraint is added stating that package `D` must be installed on the cluster. _But_, `D` has a direct dependency on `C`. Now, the resolver _could_ error out and say there's no resolution, _or_ it could return ``['A', 'C', 'D']``. It seems to me that resolution with replacement better fits the declarative model, as it tells the user the state the cluster should move to in order to fulfill the stated constraints.
The second consideration is whether the resolver should have knowledge of content that is on the cluster that isn't represented in the constraints, that is, if a user "side loads" a package. This means the resolver would need to be aware of side-loaded [RukPak `Instance`s](https://hackmd.io/upiNuoeJTwqNKQJCVMZADw?view#Instance-API) and potentially independently installed, e.g. manually or Helm'ed, CRDs. The upside is that the resolver has more information to work with and can possibly provide better resolution sets. The downside is that we have increased security concerns (more things to watch means more RBAC), we increase the coupling between Deppy and RukPak by requiring that it understand `Instance`s, and we therefore incur a loss of generality.
My suggestion would be to go with a partial knowledge resolution, where the resolver only takes into account the constraints that are given to it and the content in the catalog sources. This would mean that the resolver wouldn't be responsible for the cluster/package reconciliation process, but rather just provide an input to it, i.e. what the idealized state of the cluster should be. This doesn't mean that the resolver needs to be completely oblivious to cluster state. Additional information, e.g. the availability of CRDs and (side-loaded) bundles, could be modeled as constraints. But that would be a choice for the reconciliation component.
Therefore, the mental model for the resolver should be something like "it gives the admin the idealized set of catalog bundles that would meet its stated inputs". This also means that, at least for the resolution, all of the inputs are known and easy to navigate. You just look at instances of Deppy's constraint `Input` CRs. While this would just move the complexity to a different component, it does open the door for different implementations of this component (e.g. the `Operator` API).
With this approach, the high-level architecture for the next iteration of OLM might look like:

### Note (19.01.2022)
Deppy is lower priority for now, so I won't be able to finish the design.
We still need to:
- Agree on what Deppy is (its scope) and its interfaces
- We don't need to design all of OLM here, but as our major "customer" we need it to drive the requirements. Therefore, it might be worthwhile agreeing on a high-level architecture (e.g. the image above)
- While package deletion is not a requirement for Deppy, we should understand what it would look like.
Once these things are clear and agreed on, we can then figure out the rest of the design.
---
(Read until here)
### Namespace vs Cluster Scoped
#### Option 1. Cluster Scoped
**Pros**:
* No namespaces to think about
* Clear list of `Input`s that are being resolved
**Cons**:
* Any user with access will see all created API objects (this may leak information)
#### Option 2. Namespace Scoped
**Pros**:
* More flexibility around where the `Input`s get created
* Users would only see `Input`s in the namespaces they have access to (no information leak)
**Cons**:
* Could complicate the controller in the case of many individual namespaces
### Singleton vs Multi-Object
## Appendix I: OLM Package Resolution
### Introduction
This is intended to be a basic overview of how package resolution works in OLM. It won't cover the whole topic in detail, but it hopes to give an uninitiated reader enough of a mental model to help them reason about the process.
At the core of OLM's dependency resolution is a *boolean satisfiability problem*, or *SAT*, [solver](https://github.com/go-air/gini) that determines whether, given a system of boolean clauses, there exists a model, i.e. a set of values (true or false) for the variables, for which the system will evaluate to *true* (a variable or its negation is called a *literal*). The SAT solver works on problems defined in conjunctive normal form (CNF), that is, a conjunction (AND) of disjunctions (OR). For instance, here is a SAT problem of four variables (A-D) in CNF:
```
(A OR !B) AND (B OR !C) AND (C OR !D) AND (D OR !A)
```
*Note: the '!' represents a logical NOT operator*
There are two possible solutions to this problem:
* A=true, B=true, C=true, D=true
* A=false, B=false, C=false, D=false
For information on SAT problems, see: [[1](https://github.com/go-air/gini/blob/master/docs/satprob.md)], [[2](https://en.wikipedia.org/wiki/Boolean_satisfiability_problem)].
By reducing package dependencies and any other constraints to a CNF boolean expression, the solver can tell us whether there exists a set of packages that can meet the constraints, and if so, which.
### Solver
In OLM, the [solver](https://github.com/operator-framework/operator-lifecycle-manager/tree/master/pkg/controller/registry/resolver/solver) is the lowest level of abstraction over the underlying SAT solver. Its most basic unit is the `Installable`:
```go=
// Installable values are the basic unit of problems and solutions
// understood by this package.
type Installable interface {
	// Identifier returns the Identifier that uniquely identifies
	// this Installable among all other Installables in a given
	// problem.
	Identifier() Identifier
	// Constraints returns the set of constraints that apply to
	// this Installable.
	Constraints() []Constraint
}
```
#### Constraints
The `Installable` has a unique ID and a set of constraints. The solver package provides constraints that translate higher-level intents (e.g. package A depends on package B) into the boolean expressions of the underlying SAT problem. The following constraints are available:
##### Mandatory
The `Mandatory` constraint will only permit solutions that contain the installable.
This constraint will map to the literal (boolean variable) that represents the installable.
##### Prohibited
The `Prohibited` constraint will block any solution that contains the installable.
This constraint will map to a negated literal that represents the installable.
##### Dependency
The `Dependency` constraint will only permit solutions containing the installable if at least one of a list of given installables appears in the solution. Higher precedence is given to installables at the start of the list.
This constraint will map to a clause in the form:
```
!installable OR dep1 OR dep2 OR dep3 OR ...
```
This says that the installable has a dependency, and the solver needs to pick at least one of a given set of options to resolve that dependency.
##### Conflict
The `Conflict` constraint will block solutions that contain both the installable and a given conflicting installable. That is, a solution can have the installable, the conflicting installable, or neither, but never both.
This constraint gets translated to the clause:
```
!installable OR !conflicting_installable
```
##### AtMost
The `AtMost` constraint will block solutions that contain more than *n* of a given set of installables. This is useful, for example, when many packages can fulfill a dependency but only one is needed, or when at most one bundle from a channel of updates should be picked for installation.
Note: It's not yet clear to me how this constraint gets translated to clauses. It seems to use some lower-level APIs in the underlying SAT solver library that I don't fully grok yet.
#### Solver
The solver is, in broad terms, a function of the type:
```
solve([]Installable) ([]Installable, error)
```
It takes as input all of the installables (unique IDs and constituent constraints) and returns the list of installables that are part of the solution, or returns an error if no solution can be found.
### Operator Resolution
Operator resolution is executed as part of the OLM operator reconciliation loop, as depicted in the sequence diagram below. Resolution is done at a namespace level, although it will also take into account the global catalog namespace. Triggered by changes to `Subscription`s and `CatalogSource`s visible to the namespace, the operator will use the resolver to see if any `InstallPlan`s need to be created.
The resolution process is composed of three high-level components: `StepResolver`, `Resolver`, and [`Solver`](#Solver). The `StepResolver` is responsible for collecting the resolution context (`ClusterServiceVersion`s and `Subscription`s) and feeding that context to the `Resolver`. The `Resolver` is responsible for transforming the resolution context and the content available to users in `CatalogSource`s into `Installable`s and `Constraint`s that are ultimately fed to the `Solver`. The process of building the constraints relies heavily on predicates used to filter out content from catalogs that could be used to meet constraints defined by the operator (e.g. dependencies, property constraints, etc.). A few examples of predicates include: *CSVName*, *Channel*, *Package*, *VersionInRange*, *Label*, *Catalog*, *ProvidingAPI*, *SkipRangeIncludes*, *Replaces*, *And*, *Or*, *Counting*, etc. The full list of predicates can be found in the [cache](https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/registry/resolver/cache/predicates.go) package.
The `Solver` output is transformed by the `Resolver` into an `operator set`, the set of new operators that need to be installed on the cluster. The operator set is in turn transformed into `Step`s that are ultimately executed by an `InstallPlan`.
```mermaid
sequenceDiagram
autonumber
participant Operator
participant StepResolver
participant Resolver
participant Solver
participant Kube
loop Subscription or CatalogSource changes
Operator->>StepResolver: ResolveSteps(namespace)
StepResolver->>Kube: GetCSVs(namespace)
Kube->>StepResolver: CSVs
StepResolver->>Kube: GetSubscriptions(namespace)
Kube->>StepResolver: Subscriptions
StepResolver->>Resolver: SolveOperators(namespaces, csvs, subs)
Resolver->>Resolver: buildSnapshotForNamespace(namespace, csvs, subs)
Resolver->>Resolver: convertBundlesToBundleInstallablesWithConstraints(snapshot)
Resolver->>Resolver: buildConstraintsForEachSubscription(subscription)
Resolver->>Resolver: addInvariants(bundleInstallables)
Resolver->>Solver: Solve(bundleInstallables)
Solver->>Resolver: solvedInstallables
Resolver->>Resolver: filterOutExistingBundles(solvedInstallables)
Resolver->>StepResolver: operatorSet
StepResolver->>StepResolver: calculateChangesToPersistToCluster()
StepResolver->>Operator: steps
Operator->>Kube: CreateInstallPlans
end
```
#### Update Path
* If you know this, please add it!
#### Error Surfacing
* If you know this, please add it!
## Appendix II: Bundle Properties and Constraints
RukPak bundles expose `properties`, which drive the bundle constraints. Properties can be arbitrarily defined on a bundle. Most will be ignored by the resolver, but some are understood and applied as constraints. For instance, a bundle with an `olm.deprecated` property will never be picked by the resolver. More information on properties can be found [here](https://github.com/operator-framework/enhancements/blob/master/enhancements/properties.md). A special class of properties are constraints. Below is a list of the current and planned constraint properties:
#### Package Dependency Constraint
The `olm.package.required` property defines a dependency relationship between a bundle and a certain version range of another package. That is, for this bundle to be installed, the dependency must be installed at a version in the specified range.
##### Example
```yaml=
properties:
- type: olm.package.required
  value:
    packageName: baz
    versionRange: '>=1.0.0'
```
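To make "a version in the specified range" concrete, here is a deliberately simplified sketch of checking a version against a range such as `'>=1.0.0'`. It assumes plain `major.minor.patch` versions and a single comparison operator; real semver ranges are richer (pre-release tags, combined ranges, etc.) and this is not the library OLM uses. The `parseVersion`, `compare`, and `inRange` names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// version holds major, minor and patch numbers.
type version [3]int

// parseVersion reads a plain "major.minor.patch" string.
func parseVersion(s string) (version, error) {
	var v version
	if _, err := fmt.Sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]); err != nil {
		return v, fmt.Errorf("invalid version %q: %w", s, err)
	}
	return v, nil
}

// compare returns -1, 0 or 1 if a is lower than, equal to or higher than b.
func compare(a, b version) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// inRange reports whether v satisfies a single-operator range like ">=1.0.0".
func inRange(v, rng string) (bool, error) {
	// Two-character operators are listed first so ">=" is not read as ">".
	for _, op := range []string{">=", "<=", ">", "<", "="} {
		if !strings.HasPrefix(rng, op) {
			continue
		}
		have, err := parseVersion(v)
		if err != nil {
			return false, err
		}
		want, err := parseVersion(strings.TrimPrefix(rng, op))
		if err != nil {
			return false, err
		}
		c := compare(have, want)
		switch op {
		case ">=":
			return c >= 0, nil
		case "<=":
			return c <= 0, nil
		case ">":
			return c > 0, nil
		case "<":
			return c < 0, nil
		default:
			return c == 0, nil
		}
	}
	return false, fmt.Errorf("unsupported version range %q", rng)
}

func main() {
	for _, v := range []string{"0.9.0", "1.0.0", "1.2.3"} {
		ok, err := inRange(v, ">=1.0.0")
		fmt.Println(v, ok, err)
	}
}
```

With the property above, a `baz` bundle at `1.2.3` would satisfy the dependency while one at `0.9.0` would not.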
#### Group-Version-Kind (GVK) Dependency Constraint
The `olm.gvk.required` property defines a dependency relationship between the bundle and the given *gvk*. That is, for this bundle to be installed the specified *gvk* must be present on the cluster.
##### Example
```yaml=
properties:
- type: olm.gvk.required
  value:
    group: etcd.database.coreos.com
    kind: EtcdBackup
    version: v1beta2
```
#### Compound Constraints
[Compound bundle constraints](https://github.com/operator-framework/enhancements/pull/97) allow for more complex dependency relationships to be defined by operator authors by supporting nested *and*, *or*, and *not* constraints.
##### Example
```yaml=
properties:
- type: olm.any.required # OR
  value:
    description: Required for Baz because...
    constraints:
    - type: olm.all.required # AND
      value:
        description: All are required for Baz because...
        constraints:
        - type: olm.package.required
          value:
            packageName: foo
            versionRange: '>=1.0.0'
        - type: olm.gvk.required
          value:
            group: foos.example.com
            version: v1
            kind: Foo
        - type: olm.none.required # NOT
          value:
            constraints:
            - type: olm.gvk.required
              value:
                group: bars.example.com
                version: v1alpha1
                kind: Bar
    - type: olm.all.required # AND
      value:
        description: All are required for Baz because...
        constraints:
        - type: olm.package.required
          value:
            packageName: foo
            versionRange: '<1.0.0'
        - type: olm.gvk.required
          value:
            group: foos.example.com
            version: v1beta1
            kind: Foo
```
#### Generic Constraints
[Generic constraints](https://github.com/operator-framework/enhancements/pull/91) allow users to define arbitrary constraints using the [Common Expression Language](https://github.com/google/cel-go).
##### Example
```yaml=
properties:
- type: olm.constraint
  value:
    rule: properties.exists(p, p.type == "certified") && properties.exists(p, p.type == "stable")
    message: require to have "certified" and "stable" properties
    action:
      id: require
    evaluator:
      id: cel
```