# Deppy: standalone bundle resolver
The idea for deppy comes from the [new API strawman](https://hackmd.io/upiNuoeJTwqNKQJCVMZADw?view) written recently. This doc dives into the details of requirements and implementation.
Since other docs/presentations cover other components of the new API set, this doc will not dive into those features except in the overview section and when interfacing between them is discussed. Ultimately, deppy is a standalone project and its implementation should be discussed in kind.
## Overview of current resolution mechanism
OLM has an in-tree, semi-internal resolver library containing a resolver construct that
1. Reads the (operator) bundle universe from a cluster
2. Runs a [SAT solver](https://en.wikipedia.org/wiki/Boolean_satisfiability_problem) over an expression compiled from all bundle constraints
3. Returns the set of installable, non-conflicting bundles to OLM to manage, or an error.
Each time a new Subscription is created, OLM will eventually run the resolver in-cluster as part of its catalog operator's reconciliation loop. If there is an error, that error is persisted to the Subscription's status. Otherwise the catalog operator will update all Subscriptions with the latest bundle for the specified channel; this includes new Subscriptions.
## Problem statement
The above scheme works well enough currently. Now it is time to evolve the resolver into a more user-friendly system. This can be done by observing issues with the resolver surfaced by user feedback and knowledge gained by the OLM team via operation:
- Despite error reporting in a Subscription's status, there is no way to run the resolver without installing OLM onto a cluster then setting up a CatalogSource and Subscription.
- Visibility of errors to users on-cluster is poor.
- With OLM being monolithic, it is difficult to comprehend how constraints on bundles in a catalog are translated into a language the resolver can understand.
- There is no compartmentalized interface for users to read and therefore help with debugging resolution failures.
- The resolver is not easily extended to handle new constraint types.
- Constraints cannot be specified after building a catalog image, as they are all content-addressed within the image.
- Declarative configs do not solve this issue, just hoist constraints to a single location.
These long-winded problems with the current resolver can be reduced down to a few concise points:
- Little result insight, across all user types.
- No well-defined interfaces.
- No runtime configuration options.
- No extensibility.
Deppy and its APIs intend to solve these issues, either now or later via inherent extensibility.
### Notes:
- Who has access to the resolved dependency, ordering of dependency vs depender installation
- Prior art (in part): https://issues.redhat.com/browse/OLM-2136
- OLM resolves in a single ns. It should resolve across entire cluster
- Exposing "new" graphs created by each resolution run to user; what graph is connected, what operators are upgrading (disconnected)
- Deletion of a dependency is not captured currently
- Independent Subscriptions are currently created per dependency
- Operator API or rukpak may need to "understand" this state and communicate that to the user
## New APIs
![API diagram](https://i.imgur.com/as7prt8.png)
- A `ProvisionerClass` associates a controller and configuration with a name referenced by `Instance`s; this named controller is called a _provisioner_.
- RukPak defines the `Bundle` and `Instance` APIs:
- An `Instance` object specifies a `Bundle` spec, provisioner, and runtime configuration for operator installs.
- A `Bundle` object references remote or local content like a bundle image.
- Creation of an `Instance` object triggers the provisioner to store referenced content in a `Bundle`, ex. from an object of deppy's `Input`.
- Deppy defines the `InputClass` and `Input` APIs, and implements the `olm.resolveset` provisioner's controller.
- An `InputClass` defines a schema for an `Input` by name.
- Creation of an `Input` object triggers deppy to continuously resolve the object's spec, ex. a catalog (image).
## Deppy design pt. 1: APIs, provisioner, and resolver controller
Deppy provides an API for resolving constraints over catalog content (see: [declarative index format](https://github.com/operator-framework/enhancements/blob/master/enhancements/declarative-index-config.md)).
The on-cluster installation of Deppy also includes:
- Continual (optional) resolution against catalog content, to keep installed packages up-to-date.
- An API for defining constraints at runtime (`Input`)
It may also include:
- A RukPak `provisioner` and default `ProvisionerClass` for `deppy.resolveset` bundles.
- A RukPak Approval plugin that provides Android-style update policy
### Use-cases
* Keep package, and its dependencies, up-to-date (OLM subscription)
* Keep package, but not its dependencies, up-to-date
* Dry-run
* Install pinned version of a package/operator
* Install, and keep a package updated, up to a version
* Install package within a range of versions
* Force the installation of a package
* Surface resolution errors
### Input API
The `Input` is the entry point into Deppy and feeds constraints to it. It describes a user's intent to have a package installed on the cluster. The package could come from a channel, or fit other constraints, e.g. version or property criteria. An `Input` will also have an `inputClass` that defines behavior (e.g. dry-run, subscription, etc.) and, optionally, additional constraints.
```yaml=
apiVersion: deppy.io/v1
kind: Input
metadata:
  name: plumbus
spec:
  inputClass: subscription
  constraints:
    # most common - equivalent to a Subscription
    - type: olm.catalog
      value:
        name: community
        namespace: my-ns
        package: plumbus
        channel: stable
    # other examples of constraints - less common, but useful
    # lock to a specific version by package/version
    - type: olm.packageVersion
      value:
        package: "plumbus"
        version: v2.0.0
    # lock to a specific version by name
    - type: olm.name
      value: "plumbus.v2.0.0"
    # restrict to a range of versions
    - type: olm.packageVersion
      value:
        package: "plumbus"
        version: ">2.0.0 <3.0.0"
    # arbitrary rego queries
    - type: olm.rego
      value: "semver.Compare(minOCPVersion, 4.8.0) == 1"
status:
  # instances with properties that match the constraints
  instances:
    - name: plumbus.v2.0.0-alpha
      # other instances satisfied by this instance
      satisfies:
        - kind: Instance
          name: other-operator
          apiVersion: rukpak.io/v1
          meets:
            - {"olm.constraint": {"olm.name": "etcd-operator.v0.9.3"}}
      # dependencies introduced by this instance
      dependencies:
        - kind: Instance
          name: prometheus-operator-abc
          apiVersion: rukpak.io/v1
          meets:
            - {"olm.constraint": {"olm.gvk": {"group": "manufacturing.how.theydoit.com", "version": "v1alpha1", "kind": "Grumbo"}}}
            - {"olm.constraint": {"olm.label": "LTS"}}
      # instance-specific conditions
      conditions:
        - type: DependenciesMissing
          status: True
          reason: Only 2/3 dependency constraints are met.
          message: "etcd-operator has the following constraints: X,Y,Z. X and Z are satisfied by prometheus-operator-abc, but no instance satisfies Y"
          lastTransitionTime: "2019-09-16T22:26:29Z"
  # conditions about the input itself
  conditions:
    - type: ResolutionFailed
      message: "unable to find a solution that matched constraints: X requires Y, but Z requires !Y, X and Z are mandatory"
      status: True
```
Alternatively, the `Input` could be named after the package it seeks to install, with the constraints used to further refine which bundle version(s) to consider during resolution.
```yaml=
apiVersion: deppy.io/v1
kind: Input
metadata:
  name: plumbus # <- package name
spec:
  inputClass: subscription
  # packageName: plumbus # <- if we aren't happy with naming the Input after the package
  constraints:
    # most common - equivalent to a Subscription
    - type: olm.catalog
      value:
        name: community
        namespace: my-ns
        channel: stable
    # other examples of constraints - less common, but useful
    # lock to a specific version by package/version
    - type: olm.packageVersion
      value: "v2.0.0"
    # lock to a specific version by name
    - type: olm.name
      value: "plumbus.v2.0.0"
    # restrict to a range of versions
    - type: olm.packageVersion
      value: ">2.0.0 <3.0.0"
    # arbitrary rego queries
    - type: olm.rego
      value: "semver.Compare(minOCPVersion, 4.8.0) == 1"
```
Then a minimal working `Input` might be:
```yaml=
apiVersion: deppy.io/v1
kind: Input
metadata:
  name: plumbus
```
This would use the default input class and resolve to the latest version of the package. Or, even more succinct, make the name of the `Input` the name of the package.
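To make the `type`/`value` shape of the constraint entries concrete, here is a minimal Go sketch of how an `Input` spec could be decoded. The struct names (`InputSpec`, `Constraint`, `CatalogValue`) and the JSON encoding of the spec are assumptions for illustration, not part of any existing Deppy API. The value is kept as raw JSON because its shape depends on the constraint type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Constraint mirrors one entry of spec.constraints. Value stays raw because
// its shape depends on Type (a string for olm.name, an object for olm.catalog).
type Constraint struct {
	Type  string          `json:"type"`
	Value json.RawMessage `json:"value"`
}

// InputSpec is a hypothetical Go shape for the Input spec shown above.
type InputSpec struct {
	InputClass  string       `json:"inputClass,omitempty"`
	Constraints []Constraint `json:"constraints,omitempty"`
}

// CatalogValue is the assumed payload of an olm.catalog constraint.
type CatalogValue struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	Package   string `json:"package"`
	Channel   string `json:"channel"`
}

// ParseSpec decodes a JSON-encoded spec.
func ParseSpec(data []byte) (InputSpec, error) {
	var spec InputSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}

const example = `{
  "inputClass": "subscription",
  "constraints": [
    {"type": "olm.catalog", "value": {"name": "community", "namespace": "my-ns", "package": "plumbus", "channel": "stable"}},
    {"type": "olm.name", "value": "plumbus.v2.0.0"}
  ]
}`

func main() {
	spec, err := ParseSpec([]byte(example))
	if err != nil {
		panic(err)
	}
	for _, c := range spec.Constraints {
		switch c.Type {
		case "olm.catalog":
			var v CatalogValue
			if err := json.Unmarshal(c.Value, &v); err != nil {
				panic(err)
			}
			fmt.Printf("catalog %s/%s package=%s channel=%s\n", v.Namespace, v.Name, v.Package, v.Channel)
		case "olm.name":
			var name string
			if err := json.Unmarshal(c.Value, &name); err != nil {
				panic(err)
			}
			fmt.Println("name", name)
		}
	}
}
```

Deferring the decode of `value` until the `type` is known keeps the outer schema stable while new constraint types are added.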
### InputClass API
Input class defines a configuration for an `Input`, for instance, installation behavior or additional constraints.
```yaml=
kind: InputClass
apiVersion: deppy.io/v1
metadata:
  name: <some unique name>
parameters:
  ...
```
Example:
```yaml=
kind: InputClass
apiVersion: deppy.io/v1
metadata:
  name: force
parameters:
  force: true
  generateInputsForDependencies: false
  dryRun: false
  trace: false
  constraints:
    # quoted: a plain scalar starting with ">" would be read as a folded block scalar
    - type: minKubeVersion
      value: ">= 1.20"
    - type: minOCPVersion
      value: ">= 4.7"
```
### Deppy Controllers
Depending on the direction of the dependency (no pun intended) between RukPak and Deppy, we have identified a couple of options:
#### Option 1. Deppy depends on RukPak
This is more or less what the original design described. The deppy operator ("deppy-operator") will have two controllers:
- The "provisioner" controller lifecycles a resolved set of operators.
- This is basically a slimmer version of OLM's "olm-operator".
- The "resolver" controller consumes the [deppy resolver library](#Deppy-design-pt-2-resolver-library-and-CLI) to continuously resolve a CatalogSource's referenced contents.
- This is basically a slimmer version of OLM's "catalog-operator".
#### Resolver controller
The resolver controller is in charge of reconciling `Input` objects. The resolution output will be written to the `status` and, if the resolution is successful, an `olm.resolveset` Bundle will be created containing the resolved bundles from the catalogs [[example](#Appendix-I-ResolveSet-Bundles)].
| Watches | Creates | Reconciles |
| ------------- | ------------------------- | ----------- |
| Input | Bundle (`olm.resolveset`) | Input |
| InputClass | | |
| CatalogSource | | |
#### Input Reconciliation path
1. Build bundle cache
   1. Collect index content (Bundles)
   2. Collect cluster state (Instances)
2. Build constraints: collect Input + InputClass definitions
3. Call resolver
4. Unfurl results
5. Create `olm.resolveset` Bundle with resolved dependencies
6. Update Input status
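The reconciliation path can be sketched in Go roughly as follows. Every type and function here (`Bundle`, `Input`, `InputClass`, `Resolver`, `Reconcile`, `byName`) is a hypothetical stand-in rather than a RukPak or Deppy API, and the toy resolver only matches `olm.name` constraints instead of running a SAT solver.

```go
package main

import (
	"errors"
	"fmt"
)

// All types below are hypothetical stand-ins, not RukPak or Deppy APIs.
type Bundle struct{ Name string }
type Constraint struct{ Type, Value string }

type Input struct {
	Class       string
	Constraints []Constraint
	Status      string
}

type InputClass struct{ Constraints []Constraint }

// Resolver picks bundles from the cache that satisfy the constraints.
type Resolver func(cache []Bundle, cs []Constraint) ([]Bundle, error)

// Reconcile walks the numbered steps: build the cache, merge constraints,
// resolve, then record the outcome on the Input.
func Reconcile(in *Input, classes map[string]InputClass, index, installed []Bundle, resolve Resolver) ([]Bundle, error) {
	// 1. Build the bundle cache from index content plus cluster state.
	cache := append(append([]Bundle{}, index...), installed...)

	// 2. Build constraints from the Input and its InputClass.
	class, ok := classes[in.Class]
	if !ok {
		in.Status = "ResolutionFailed: unknown input class " + in.Class
		return nil, errors.New(in.Status)
	}
	cs := append(append([]Constraint{}, in.Constraints...), class.Constraints...)

	// 3-4. Call the resolver and unfurl its result.
	resolved, err := resolve(cache, cs)
	if err != nil {
		in.Status = "ResolutionFailed: " + err.Error()
		return nil, err
	}

	// 5-6. A real controller would wrap `resolved` in a resolveset Bundle;
	// this sketch only updates the status.
	in.Status = fmt.Sprintf("Resolved: %d bundle(s)", len(resolved))
	return resolved, nil
}

// byName is a toy resolver: it returns the bundles named by olm.name
// constraints and fails when a named bundle is absent from the cache.
func byName(cache []Bundle, cs []Constraint) ([]Bundle, error) {
	var out []Bundle
	for _, c := range cs {
		if c.Type != "olm.name" {
			continue
		}
		found := false
		for _, b := range cache {
			if b.Name == c.Value {
				out = append(out, b)
				found = true
				break
			}
		}
		if !found {
			return nil, fmt.Errorf("no bundle named %q", c.Value)
		}
	}
	return out, nil
}

func main() {
	classes := map[string]InputClass{"subscription": {}}
	in := &Input{Class: "subscription", Constraints: []Constraint{{Type: "olm.name", Value: "plumbus.v2.0.0"}}}
	index := []Bundle{{Name: "plumbus.v2.0.0"}, {Name: "plumbus.v1.0.0"}}
	if _, err := Reconcile(in, classes, index, nil, byName); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(in.Status)
}
```

Passing the resolver in as a function keeps the controller logic testable without a cluster or a real solver.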
#### Provisioner controller
The provisioner controller will create `Instance`s referencing the `olm.resolveset` bundle created by the resolver, installing the content on the cluster.
| Watches | Creates | Reconciles |
| -------------------------- | ---------| ------------------------- |
| Bundle (`olm.resolveset`) | Instance | Bundle (`olm.resolveset`) |
| Instance | Bundle | Instance (?) |
#### Bundle Reconciliation path
1. Take new bundle
2. Create `Instance`s for each bundle
3. Update Bundle and Input status
#### Option 2. RukPak depends on Deppy
#### Resolver controller
| Watches | Creates | Reconciles |
| ------------- | ------- | ----------- |
| Input | | Input |
| InputClass | | |
| CatalogSource | | |
#### Reconciliation path
1. Build bundle cache
   1. Collect index content (Bundles)
   2. Collect cluster state (Instances)
2. Build constraints: collect Input + InputClass definitions
3. Call resolver
4. Unfurl results
5. Update Input status
A RukPak controller would then be in charge of monitoring the `Input` status and installing the resolved bundles.
## Discussion Questions
- Is reconciliation "one-shot", i.e. take into account all available inputs and their constraints, or step-wise/staggered: only look at one input at a time.
- What does deletion look like? Will deleting an Input (or Operator) lead to the underlying operators being deleted? Will the deletion cascade?
- What if I installed an Operator with subscription mechanics and don't want updates anymore?
- You don't delete the Operator.
- You update the Operator's spec with your new intent.
- What happens if I delete an Input?
- Nothing. Updates just stop? Also for deps?
- Everything gets removed, including unused deps?
- What happens if I delete an/all InputClass?
- Nothing except relevant Inputs move to an unreconcilable state?
- What happens if I delete a/all CatalogSource?
- Nothing except relevant Inputs move to an unreconcilable state?
- What happens if I delete an Instance?
- The provisioner recreates it based on the resolveset bundles?
- What happens if I delete a (resolveset) Bundle?
- The resolver re-creates it?
- Does the provisioner understand all of the bundle types in the resolveset? Or does it just create instances and hope there's a provisioner there to field the Instance?
- Can catalogs contain bundles of different types?
## Deppy design pt. 2: resolver library and CLI
The current [`resolver` package](https://pkg.go.dev./github.com/operator-framework/operator-lifecycle-manager@v0.19.1/pkg/controller/registry/resolver) in OLM defines a resolver type with a method to resolve a set of namespaces, CSVs, and subscriptions.
```go=
type SatResolver struct {
	cache cache.OperatorCacheProvider
	log   logrus.FieldLogger
}

func (r *SatResolver) SolveOperators(
	namespaces []string,
	csvs []*v1alpha1.ClusterServiceVersion,
	subs []*v1alpha1.Subscription) (cache.OperatorSet, error)
```
`SolveOperators` does a ton of internal stuff using its k8s client:
1. Snapshots the catalog namespace (assumes it is the first element in `namespaces`).
2. Gets all existing "installables" across a set of namespaces.
- Installables are just bundles with their constraints exposed as a first-class field.
3. Sets constraints on installables such that their GVKs/package versions are invariants.
4. Runs the [SAT solver](https://pkg.go.dev./github.com/operator-framework/operator-lifecycle-manager@v0.19.1/pkg/controller/registry/resolver/solver) over the set of installables.
5. Returns the set of not-installed installables as [operator `Entry`s](https://pkg.go.dev./github.com/operator-framework/operator-lifecycle-manager@v0.19.1/pkg/controller/registry/resolver/cache#Entry).
This procedure can be distilled into:
1. Gather the universe of bundle properties from a set of namespaces, both installed and not installed, via Subscriptions.
2. Associate bundles with constraints from their properties to create installables.
3. Solve this set of installables for a set of new bundles to install.
### New library
Here I propose splitting the on-cluster and off-cluster steps into two functions:
```go=
func ListClusterInstallables(
	client kubernetes.Interface,
	namespaces []string) ([]cache.Installable, error)

func ResolveInstallables(
	installables []cache.Installable) (cache.OperatorSet, error)
```
Separating out these two steps allows for off-cluster evaluation of a set of constraints on some input.
I also propose an object-to-installable transformer that takes an object that has constraints and extracts them into an installable:
```go=
type Constrainer interface {
	GetConstraint(v json.RawMessage) (cache.Constraint, error)
}

// Initialized eagerly: writing to a nil map would panic in RegisterConverter.
var converters = map[string]Constrainer{}

func RegisterConverter(typ string, c Constrainer) {
	converters[typ] = c
}

func ToInstallable(id string, obj interface{}) (cache.Installable, error)
```
`ToInstallable` would accept bundles or a list of constraint properties and convert them using the registered `Constrainer`s for those property type(s). This pattern is similar to Kubernetes' scheme-based object conversion functions.
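A runnable sketch of the registry pattern, under some simplifying assumptions: `Constraint`, `Installable`, and `Property` are local stand-ins for the `cache` package types, `ToInstallable` takes a property list instead of `interface{}`, and the `packageVersion` converter is hypothetical. Properties without a registered converter are treated as opaque and skipped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Constraint and Installable stand in for the cache package types; their
// fields are assumptions made for this sketch.
type Constraint struct {
	Kind string
	Expr string
}

type Installable struct {
	ID          string
	Constraints []Constraint
}

// Property is a typed, opaque value as found in declarative config.
type Property struct {
	Type  string          `json:"type"`
	Value json.RawMessage `json:"value"`
}

type Constrainer interface {
	GetConstraint(v json.RawMessage) (Constraint, error)
}

var converters = map[string]Constrainer{}

func RegisterConverter(typ string, c Constrainer) { converters[typ] = c }

// packageVersion is a hypothetical converter for olm.packageVersion values.
type packageVersion struct{}

func (packageVersion) GetConstraint(v json.RawMessage) (Constraint, error) {
	var pv struct {
		Package string `json:"package"`
		Version string `json:"version"`
	}
	if err := json.Unmarshal(v, &pv); err != nil {
		return Constraint{}, err
	}
	return Constraint{Kind: "olm.packageVersion", Expr: pv.Package + " " + pv.Version}, nil
}

// ToInstallable converts every property that has a registered converter and
// skips the rest as opaque, non-constraint properties.
func ToInstallable(id string, props []Property) (Installable, error) {
	inst := Installable{ID: id}
	for _, p := range props {
		c, ok := converters[p.Type]
		if !ok {
			continue
		}
		con, err := c.GetConstraint(p.Value)
		if err != nil {
			return Installable{}, fmt.Errorf("property %q: %w", p.Type, err)
		}
		inst.Constraints = append(inst.Constraints, con)
	}
	return inst, nil
}

func main() {
	RegisterConverter("olm.packageVersion", packageVersion{})
	inst, err := ToInstallable("plumbus.v2.0.0", []Property{
		{Type: "olm.packageVersion", Value: []byte(`{"package":"plumbus","version":">2.0.0 <3.0.0"}`)},
		{Type: "olm.label", Value: []byte(`"LTS"`)},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %+v\n", inst.ID, inst.Constraints)
}
```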
### CLI
The `deppy` CLI tool will use the off-cluster APIs to resolve catalog data plus optionally a set of constraints into a set of bundles to install. The output would be in declarative config format:
```shell=
$ deppy solve --catalog registry.com/my/catalog:latest [--constraints extra-constraints.yaml] -o json
{
"schema": "olm.bundle",
"name": "foo.v1.0.0",
...
}
```
The catalog data can either be in image or file format.
You can optionally provide a set of installed bundles as a declarative config to add invariants:
```shell=
$ deppy solve --catalog registry.com/my/catalog:latest --installed ./installed-dc-dir -o json
{
"schema": "olm.bundle",
"name": "foo.v1.0.0",
...
}
```
#### Discussion points:
- Right now it is hard/impossible to surface what resolution failed, or what steps were taken to get to a particular result. Is this problem surmountable by some `deppy debug` mode that only resolves one "layer" of dependencies at a time? Ex. if a dependency has a dependency, run the resolver twice: one on the first set of constraints, then the next on those constraints + the second level dependency's constraints.
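One way to picture the layered idea is a breadth-first expansion that reports what each pass added and where it first got stuck. The sketch below is a toy under strong assumptions: `Catalog`, `Layer`, and `ResolveLayers` are hypothetical names, dependencies are plain bundle names, and a "failure" is only a missing bundle rather than a real SAT conflict.

```go
package main

import "fmt"

// Catalog maps a bundle name to the names it depends on (toy model).
type Catalog map[string][]string

// Layer records what one pass added and what it could not find.
type Layer struct {
	Depth   int
	Added   []string
	Missing []string
}

// ResolveLayers expands dependencies one level per pass and stops at the
// first layer that references a bundle absent from the catalog.
func ResolveLayers(cat Catalog, roots []string) []Layer {
	seen := map[string]bool{}
	frontier := roots
	var layers []Layer
	for depth := 0; len(frontier) > 0; depth++ {
		layer := Layer{Depth: depth}
		var next []string
		for _, name := range frontier {
			if seen[name] {
				continue // already handled in an earlier layer
			}
			seen[name] = true
			deps, ok := cat[name]
			if !ok {
				layer.Missing = append(layer.Missing, name)
				continue
			}
			layer.Added = append(layer.Added, name)
			next = append(next, deps...)
		}
		if len(layer.Added) > 0 || len(layer.Missing) > 0 {
			layers = append(layers, layer)
		}
		if len(layer.Missing) > 0 {
			break
		}
		frontier = next
	}
	return layers
}

func main() {
	cat := Catalog{"a": {"b"}, "b": {"c"}}
	for _, l := range ResolveLayers(cat, []string{"a"}) {
		fmt.Printf("layer %d: added=%v missing=%v\n", l.Depth, l.Added, l.Missing)
	}
}
```

The depth of the last layer indicates how far down the dependency chain the problem sits, which is the kind of insight a debug mode would need to surface.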
### General/Random/Crazy Questions
- What would an Operator API monolith look like?
- Would it make sense to express this architecture as part of a single controller and expose only a single API? (have fewer repos, and apis to manage, iterate faster, etc.). Then, once we have more confidence split it up into its components and APIs.
- Why do we mix installation and automation?
- As a user I'd like to just specify a list of packages to install on my cluster
- I'd like to control settings such as catalog filters (restrict the universe of bundles in some dimension(s)) and the automation in, perhaps, a hierarchical way: catch all defaults + package level overrides iff necessary.
- Migration strategies?
1. Remove current in-tree resolver library, run deppy-operator, OLM relies on creating/reading `Input`s.
2. Use deppy library instead of current in-tree resolver library.
- \[jlanford] I posit that Deppy will need to know how to translate constraint properties to SAT expressions. Therefore, versions of Deppy will have specific knowledge that certain property `type`s are actually constraints, whereas other property `type`s are opaque non-constraint properties. How should this version skew be handled when new constraint types are introduced?
- Naively, unknown constraints will be treated as opaque properties and will therefore be irrelevant to resolution. Is that a good UX?
- I would propose that we teach deppy what the shape of a constraint property is, so that if it sees a constraint it doesn't recognize, it has a chance to do something other than ignore it altogether.
- \[jlanford] The SAT solver can produce multiple results (e.g. install a single package `A` that depends on `B` in range `>=2.0.0 <3.0.0`; if `B` has versions `2.0.0` and `2.1.0`, there are two solutions).
- Today OLM uses heuristics to choose if there is more than one result. Are we bringing those same heuristics into Deppy?
- We have to solve this somehow in Deppy to actually produce a single resolveset bundle.
- I heard: Use more constraints to filter until there's a single solution.
- We don't have a concept of a "sorting" constraint do we? How would a user even say "sort the solutions such that the chosen result maximizes semver" or "sort the solutions such that the chosen result minimizes channel depth" or "ignore solutions that include pre-release versions" or some combination of those.
- What if a cluster admin needs a complex heuristic and none of the constraint types are capable of executing it? (e.g. sort the solutions to maximize the semver of every selected bundle, but for B choose the second lowest semver bundle that also has a property `foo==bar` and whose CSV sha256 hash % 8 == 1.)
- Proposal: Deppy defines an API, something like `SolutionChoice` that is used to select from a set of solutions when multiple are found. Deppy ships with a default implementation that uses the current OLM heuristics.
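As a rough illustration of the `SolutionChoice` proposal, the sketch below defines such an interface and one possible implementation that prefers the highest version of a given package. The `Solution` type, the `MaxVersion` strategy, and the simplified `major.minor.patch` parsing are all assumptions for illustration. In particular, this is not the heuristic OLM uses today.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Solution maps a package name to the chosen version ("major.minor.patch").
type Solution map[string]string

// SolutionChoice picks one solution when the solver finds several.
type SolutionChoice interface {
	Choose(solutions []Solution) (Solution, error)
}

// parse splits "2.1.0" or "v2.1.0" into numeric parts. Pre-release and
// build metadata are deliberately not handled in this sketch.
func parse(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("invalid version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether version a sorts before version b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// MaxVersion prefers the solution with the highest version of Package.
type MaxVersion struct{ Package string }

func (m MaxVersion) Choose(solutions []Solution) (Solution, error) {
	var best Solution
	var bestV [3]int
	for _, s := range solutions {
		raw, ok := s[m.Package]
		if !ok {
			continue // this solution does not include the package
		}
		v, err := parse(raw)
		if err != nil {
			return nil, err
		}
		if best == nil || less(bestV, v) {
			best, bestV = s, v
		}
	}
	if best == nil {
		return nil, fmt.Errorf("no solution contains package %q", m.Package)
	}
	return best, nil
}

func main() {
	var choice SolutionChoice = MaxVersion{Package: "B"}
	picked, err := choice.Choose([]Solution{
		{"A": "1.0.0", "B": "2.0.0"},
		{"A": "1.0.0", "B": "2.1.0"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("chosen B:", picked["B"])
}
```

Keeping the choice behind an interface would let a cluster admin swap in a custom heuristic without touching the resolver itself.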
### Appendix I. ResolveSet Bundles
Taken from the [e2e strawman](https://hackmd.io/upiNuoeJTwqNKQJCVMZADw)
```yaml=
kind: Bundle
metadata:
  name: etcd-operator.v0.9.3
spec:
  class: deppy.resolveset
  refs:
    - file://content
  volumeMounts:
    - mountPath: /content
      configMap:
        name: resolved-654adh-content
        namespace: olm
status:
  unpacked: NotStarted | InProgress | Done
  objects:
    - /objects/resolved.bundles.json
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: resolved-654adh-content
  namespace: default
data:
  resolved.bundles.json: |-
    {
      "schema": "bundle.v1",
      "packageName": "quay.io/operatorhubio/etcd",
      "path": "quay.io/operatorhubio/etcd:v0.6.1",
      "version": "0.6.1",
      "properties": [
        {
          "name": "pivotFrom",
          "value": "etcd-v0.6.0"
        },
        {
          "name": "olm.gvk",
          "value": {
            "group": "etcd.database.coreos.com",
            "version": "v1beta2",
            "kind": "EtcdCluster"
          }
        }
      ],
      "channels": [
        "alpha"
      ]
    }
    {
      "schema": "bundle.v1",
      "packageName": "quay.io/operatorhubio/prometheus",
      "version": "1.0.0",
      "properties": [
        {
          "name": "olm.label",
          "value": "LTS"
        }
      ]
    }
```
```yaml=
kind: ProvisionerClass
apiVersion: rukpak.io/v1
metadata:
  name: olm.resolveset
provisioner: operators.coreos.com/olm
parameters:
  approval: AllowAll | DenyAll | Android
  # for a given bundle entry, determine what ProvisionerClass to use when stamping out a Bundle
  matchBundle:
    - selector: ".schema == olm.bundle"
      class: olm.bundle
    - selector: ".schema == helm.chart"
      class: helm.bundle
  # equivalent to above when picking ProvisionerClass based on schema
  matchSchema:
    - schema: olm.bundle
      class: olm.bundle
```