---
tags: app platform
---
# App platform review
start date: 2023.03.15
author: piontec
## Intro
This document summarizes our ideas about the problems with, and the future of, the
app platform at Giant Swarm. In this part, we focus on the controllers of
the app platform (app-operator, chart-operator, app-exporter, app-admission-controller).
Currently, we're aiming at two new strategic goals:
- extending our managed apps offering into a developer platform
- making our managed components run on EKS/AKS clusters
Given these goals, we have to make a conscious decision about the future of the app platform. In particular,
we have to compare it with FluxCD, which we use more and more and already rely on.
The most important question we try to answer here is whether it is worth keeping the
`App` CR as our user-facing interface, or whether we should switch entirely to the `HelmRelease` CR
handled by `helm-controller` from the FluxCD project.
## Follow-ups and references
- considerations about [implementing missing functionality of app platform with flux](https://hackmd.io/gqF6PT53RcG_i4OhrmtWIw)
## app-platform operators vs. helm-controller comparison
To make the decision easier, let's compare features of `app-operator` and `helm-controller`.
### Features comparison
| Feature | app-platform | helm-controller |
|---------|:------------:|:---------------:|
| delivery of apps to remote clusters | yes | yes |
| support for apps as helm charts | yes | yes |
| dependencies between apps | yes | yes |
| automatic app upgrades | no | yes, configurable for Helm Charts and image versions |
| report when app upgrade is available | possible with `AppCatalogEntry` CRs | yes/no (flux can do discoveries for automated upgrades; might be doable using notification events, but there's no such thing as a report) |
| extended metadata | yes | no |
| OCI registries support for helm charts | yes (but still requires at least 1 HTTP one) | yes |
| signature (cosign) verification of helm charts | no | yes |
| private catalogs support | no | yes |
| automated upgrade of controllers | yes | yes |
| merge multiple layers of configuration | yes | yes |
| monitoring | yes, with app-exporter | yes, exposes HelmRelease status metrics |
| app configuration admission controller | yes, with `app-admission-controller` | no |
| pausing release for disaster recovery | yes | yes |
| helm catalog discovery | yes (ACE CRD) | no |
| works behind proxy and in air-gapped environments | yes | yes |
| delivered with software supply chain security | no | [yes](https://fluxcd.io/flux/components/helm/api/#helm.toolkit.fluxcd.io/v2beta1.HelmChartTemplateVerification) |
| can upgrade CRDs on installation | no, needs tricks like Job or 2 phased install | [yes](https://fluxcd.io/flux/components/helm/helmreleases/#crds) |
| helm repo fail-over | yes | no |
## App platform known issues
Using app platform controllers has these extra issues that we're aware of:
- obvious: we have to maintain it, as we're the only developer
- complex architecture: multiple app-operators running on each MC, with different configuration modes and different RBAC permissions; this is hard to manage and debug
- dedicated `chart-operator` running on WCs, that is not really needed
- security: as architecture is complex and we use different security RBAC permissions for different modes, security is hard to manage
- scalability/performance - we use a very old operator framework, which is mostly based on reconciliation, not watchers, and thus tends to be slow
- app-bundles are a disaster and we would like to get rid of them
  - it's impossible to tell which cluster a bundle is targeting: for a bundle to work, the bundle App CR has to target the unique `app-operator` on the MC; the internal App CRs are then extracted by it from the Chart, and they have to be configured to target the destination cluster and be handled by another `app-operator` that is dedicated to that specific cluster
  - configuring an app-bundle forces the bundle developer to create an additional `values.yaml` file that has to pass all the necessary values to the internal App CRs
  - controller ownership is complex, as a single app-bundle is handled and partially owned by multiple app-operators
  - debugging is a nightmare, as it requires following the path [app-bundle App CR] -> [unique app-operator logs] -> [app-bundle Chart CR] -> [chart-operator] -> [collection of internal App CRs] -> [WC app operator] - and that's only on an MC
## App platform pros
Even though Flux can do most of the things app platform can, there is some functionality missing. In particular, it includes:
- multi-repo fail-over: our `Catalog` CRs can point to copies of helm registries and `app-operator` can cycle through them in case any of them fails; to catch up with this on Flux side, we would have to implement it
- admission control: we have `app-admission-controller`, which can evaluate newly created apps and block/mutate requests; if we drop it, we need to either port the admission controller to work with `HelmCharts` instead, rewrite its rules in `kyverno`, or do a mix of both
- Helm repository discovery: with app platform, a user has access to information about which apps, in which versions, are available in each `Catalog`. `helm-controller` also does a discovery of versions in repositories, but uses it only internally, to upgrade an image or helm chart version to a specific new version; to catch up on the discovery side, we would have to add a controller to Flux that can work with its `HelmRepo` controller and present the contents of the repositories.
- our extended metadata: with each Helm Chart release, we include extra info, like project's readme, values schema and extra metadata (like compatible platforms or installation restrictions); Flux doesn't have any feature like this; this is a more complex problem, with a possible solution described in the [implementation considerations doc](https://hackmd.io/@piontec/H1Wgt2zWh)
- app bundles - we use App CR to create Helm Charts that have sets of `App` CRs inside; in case we drop `App` CRs, this will need a replacement. Possible solution is also described in the [implementation considerations doc](https://hackmd.io/@piontec/H1Wgt2zWh).
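To make the multi-repo fail-over item above more concrete, here is a minimal sketch of the idea in Python: try each copy of a registry in order and use the first one that responds. This is not how `app-operator` implements it; the function name `fetch_with_failover`, the injected `fetch` callable and the example URLs are all hypothetical and only illustrate the behaviour we would have to reproduce on the Flux side.

```python
from __future__ import annotations

from typing import Callable, Iterable


def fetch_with_failover(
    mirrors: Iterable[str],
    fetch: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """Try each mirror URL in order and return the first successful result.

    `fetch` is injected so the sketch stays independent of any HTTP or OCI
    client; it should raise an exception when a mirror is unavailable.
    """
    errors: list[str] = []
    for url in mirrors:
        try:
            return url, fetch(url)
        except Exception as exc:  # a real controller would catch narrower errors
            errors.append(f"{url}: {exc}")
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))


def fake_fetch(url: str) -> bytes:
    """Hypothetical fetcher: the primary registry is down, the mirror works."""
    if "primary" in url:
        raise ConnectionError("registry unavailable")
    return b"index contents"


used_url, payload = fetch_with_failover(
    ["https://primary.example/charts", "https://mirror.example/charts"],
    fake_fetch,
)
```

Injecting `fetch` keeps the cycling logic separate from the transport, so the same loop could sit in front of HTTP and OCI repositories alike.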
## Possible solutions
It is pretty clear to us that integrating our app-platform with Flux, or replacing it, will be beneficial for us: it will free us from maintaining at least part of our code base and will bring our solution closer to a popular upstream solution, lowering the support and development effort of the team.
Additionally, Flux's main developer, Stefan Prodan, is now working on the CUE-based [timoni](https://timoni.sh/) project, which will allow building k8s manifests from CUE templates, with features like full CR validation using CRDs, native support for bundles and more. We can expect this to be fully integrated in Flux soon and that would improve our experience a lot as well.
Below, I'm presenting possible solutions to the overall problem of using Flux and app platform.
### Do nothing
Pros:
- we avoid migration/switch effort
Cons:
- we're still on our code base that we don't have time to maintain
- we're missing out on upstream development
This is not really an acceptable solution.
### Replace chart-operator with helm-controller, while keeping App CRs
Pros:
- nothing changes for users, our interfaces `App`, `ACE` and `Catalog` stay the same
- we (most probably) don't have to figure out a solution for the [missing flux features](#app-platform-pros)
- we can easily extend `App` CRs with new features offered on top of what `HelmRelease` can do; this can include stuff like:
  - advanced dependency management: checking dynamic dependencies coming from apps and APIs really deployed on a cluster
- we can extend the idea of `Catalogs` and `AppCatalogEntries` with new features
  - attested test results (compatibility between platform release and app release)
- we can still use app-bundles, app-exporter and app-admission-controller
- if the world changes a lot, and drops Helm Charts (which suck), we can still keep `App` CR as the user facing contract and "just" translate it into something else
Cons:
- we are left with the problem of supporting both `App` and `HelmRelease` CRs
- we still have to develop `app-operator`
- `app-operator` will have to play the catch-up game with `helm-controller`
- debugging is still hard, as we have to check both `app-operator` and `helm-controller`
### Drop app platform controllers and migrate to helm-controller
Obviously, we can't just drop `app-operator` one day and replace everything with `HelmReleases` at the same moment. This will require a migration path that gives Chart authors enough time to adjust to the change. This can be done by modifying `app-operator` to translate `App` CRs into `HelmRelease` CRs and
then asking all the users to migrate to direct usage of `HelmRelease`. Once that's done, we can
remove `app-operator` completely.
Pros:
- we switch to upstream project and don't have to maintain our own set of controllers
- `App` and `HelmRelease` CRs are very similar, and it should be easy to migrate
- Flux offers incomparable security, with code security reviews and delivery pipeline security already done
- forces us to replace some unfortunate designs, like app-bundles or configs associated with the `Catalog` object
Cons:
- we bet 100% on Helm Charts
- we will have to develop some extra components to cover missing functionality
  - ability to automatically include cluster/installation-wide config in each `HelmRelease`; this is to replace Catalog config and set stuff like image registry URL
  - admission and validation: we can try to rewrite `app-admission-controller` in Kyverno, but probably not everything will be possible to convert (compatibility extensions of `ACEs`)
  - Flux has no discovery for Helm repo contents (our `ACEs`); we will probably have to extend and add it (use cases: user experience, notifications about available upgrades)
- Flux is very cautious about extending its code base. It might be a problem when trying to push new features to upstream.
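As an illustration of the "cluster/installation-wide config in each `HelmRelease`" item above, the sketch below shows the layering behaviour such a component would need: a lower-precedence installation-wide layer merged with a higher-precedence user layer. It is only an assumption about the desired semantics, not the algorithm used by `app-operator` or `helm-controller`; the helper names and the example values (including the registry URL) are hypothetical.

```python
from __future__ import annotations

from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge value layers left to right; later layers take precedence."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical layers: installation-wide defaults first, user overrides last.
installation_wide = {"image": {"registry": "registry.example.io"}, "replicas": 1}
user_values = {"image": {"tag": "1.2.3"}, "replicas": 3}

values = merge_layers(installation_wide, user_values)
```

With this ordering, the user keeps control over any key they set explicitly, while values such as the image registry are inherited from the installation-wide layer.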
## Recommendation
In my opinion, it's a bit hard to choose between replacing the app platform entirely and replacing only `chart-operator` while keeping `App` CRs as the interface. Still, the important part is that both solutions start with exactly the same step: implementing an `App` CR to `HelmRelease` translation layer, either as a final solution or as a temporary one for the time of migration. As such, we should start planning and working on it.
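To give a feeling for what such a translation layer could look like, here is a rough Python sketch that maps an `App` CR (as a plain dict) to a `HelmRelease` manifest. It rests on several assumptions that would need to be verified against the real CRDs: the `App` field names used here (`catalog`, `config`, `userConfig`, `kubeConfig`), the `values` key inside ConfigMaps and Secrets, the existence of a `HelmRepository` named after the `Catalog`, and the referenced objects living in the same namespace as the `HelmRelease`. The real implementation would live in `app-operator` and would have to handle many more cases.

```python
from __future__ import annotations

from typing import Any

# Assumption: both ConfigMaps and Secrets keep Helm values under this key.
VALUES_KEY = "values"


def _values_refs(layer: dict[str, Any]) -> list[dict[str, str]]:
    """Turn one App config layer into HelmRelease `valuesFrom` entries."""
    refs = []
    for field, kind in (("configMap", "ConfigMap"), ("secret", "Secret")):
        name = layer.get(field, {}).get("name")
        if name:
            refs.append({"kind": kind, "name": name, "valuesKey": VALUES_KEY})
    return refs


def app_to_helm_release(app: dict[str, Any], interval: str = "10m") -> dict[str, Any]:
    """Map an App CR (as a dict) to a HelmRelease manifest (as a dict)."""
    meta, spec = app["metadata"], app["spec"]

    # Later entries override earlier ones, so user config goes last.
    values_from = _values_refs(spec.get("config", {})) + _values_refs(
        spec.get("userConfig", {})
    )

    release_spec: dict[str, Any] = {
        "interval": interval,
        "releaseName": spec["name"],
        "targetNamespace": spec["namespace"],
        "chart": {
            "spec": {
                "chart": spec["name"],
                "version": spec["version"],
                # Assumption: a HelmRepository named after the Catalog exists.
                "sourceRef": {"kind": "HelmRepository", "name": spec["catalog"]},
            }
        },
    }
    if values_from:
        release_spec["valuesFrom"] = values_from

    # Remote clusters: reuse the kubeconfig Secret referenced by the App CR.
    kube_config = spec.get("kubeConfig", {})
    if not kube_config.get("inCluster", False) and "secret" in kube_config:
        release_spec["kubeConfig"] = {
            "secretRef": {"name": kube_config["secret"]["name"]}
        }

    return {
        "apiVersion": "helm.toolkit.fluxcd.io/v2beta1",
        "kind": "HelmRelease",
        "metadata": {"name": meta["name"], "namespace": meta["namespace"]},
        "spec": release_spec,
    }


# Hypothetical App CR used only to exercise the sketch.
app = {
    "metadata": {"name": "demo-app", "namespace": "org-example"},
    "spec": {
        "catalog": "example-catalog",
        "name": "demo-chart",
        "namespace": "demo",
        "version": "1.0.0",
        "kubeConfig": {"inCluster": False, "secret": {"name": "demo-kubeconfig"}},
        "config": {"configMap": {"name": "cluster-values"}},
        "userConfig": {"secret": {"name": "user-secrets"}},
    },
}
helm_release = app_to_helm_release(app)
```

The sketch deliberately drops anything that has no obvious `HelmRelease` counterpart, which is exactly the part of the mapping that needs to be designed first.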