---
tags: upstream-meetings
title: OLM V1 Channels Brainstorming Session
status: provisional
---

# Design Discussion: OLM V1 Channels Support

## Purpose for the meeting

- The purpose of this meeting is to discuss the concept of channels from OLM V0.
- Channels today allow users to basically describe any possible upgrade channel.
- Are channels as a concept something we want to support in OLM V1?

## Background

Channels in v0 were 'wide open' provided that participants fulfilled the schema, and had to be explicitly stated. With FBC, even fewer restrictions exist. The default channel provided the source for requirements/dependencies for resolution, with no inherent promises of maturity. Without conventions on a version format, it is left to each operator author to undertake channel definition to describe their upgrade graph.

## Thesis

Is there a better way? Can we embrace certain fundamental properties which make channels more intuitive / less prone to confusing failure modes?

## Meeting Discussion

- [Joe L] Operators usually define a "production" and an "early access" channel, but these are very difficult for users to consume. Is it worth reinventing how OLM V1 defines upgrades in an effort to make it easier to consume operator upgrades?
  - Maturity based channels vs semver based channels
- [Chris J] From IBM's perspective, they've bought into major/minor semver channels.
  - Backing away from replaces except where required for unskippable versions (very rare!)
  - Moving towards SkipRange
  - Believes that the existing workflow is working well.
  - Is working on a merge tool to combine multiple FBCs.
  - Overall is comfortable with channels the way they are
- [Piotr Godowski] IBM supports third party operators in their catalog.
  - IBM cannot force these third parties to adhere to some standard.
  - There are cases where customers have valid reasons to break from these norms; for instance, a channel may contain operator versions that are related but not tied to the channel name.
- [Joe L] Unless we formalize some set of norms, the cluster admin using OLM is going to have a disjointed user experience where:
  - Upgrades within a channel may or may not require manual steps (in general, only automatic upgrades should exist in a channel).
- [Todd Short] Do you still find it difficult?
  - [Chris S] IBM was involved early, and documentation has gotten significantly better, but the docs are very technical and still yield questions today.
- [Daniel M] Skips and SkipRange offer very fine-tuned control but are conceptually very challenging to consume. This would be significantly easier to consume if there was some UI that allowed users to see that they are building a "line" of version upgrades.
  - Ultimately, our move to FBC is going to provide an opportunity to provide a better user experience.
  - Channels are basically "enterprise promises" about what you're agreeing to install onto your cluster.
  - Even Joe's comment on "upgrades within a channel should be automatic" isn't necessarily true. There may be a need to notify users about stuff in an upgrade.
- [Chris and Piotr] In general, using channels as a mode for grouping versions of operators based on stability is supported.
  - There is a gap in visibility in OLM V0, where installable content is unviewable by operator authors.
- [Daniel] The goal of channels is to basically communicate how much "surprise" a user may experience after subscribing to a channel:
  - Fast: New releases, could have bugs.
  - Stable: Slower releases, less likely to have bugs.
- [Piotr] Maturity is not the only grouping model, e.g.
when we have stable-v`X` and stable-v`X+1` channels we don't want automatic upgrades to cross the major version threshold, but catalog-as-maturity could have both major versions in it, and how would we prevent admins from accidentally crossing the threshold?

[Joe L] There is a concern that introducing an "Ack system" that allows operator authors to communicate the need for a manual step may promote "worst practices".

There was a large discussion around two types of users:
- We acknowledged that there is occasionally a need for cluster admins to perfo
- Involved cluster admins that would appreciate a mecha

[Joe L] discussed concerns about having manual steps (ACKs from here on out) to perform an upgrade.
- [Daniel M] There are two groups:
  1. Users that would want to view the ack, perform manual steps, and kick off the upgrade.
  2. Users that manage many clusters and now need to perform a manual step on all clusters, where an ack would be disruptive.

[Joe L] If we don't have some standard, there is a chance that it could be abused. We basi

[Alex] Should we pursue a model where one can restrict the acceptable channel names within a catalog?
- We could have fast/candidate/stable channels as default.
- We could create documentation that explains why we believe that these channel names are "best practice".
- Allow overrides at the catalog level.

[Anik] We centered the entire discussion around the assumption that there are only three channels (fast, stable, preview), but the reality is that operator authors have gone wild wild west when left to define their own channel names: stable.1.x, stable.2.x, channel.4.11, channel.4.12, etc. are examples of some of the names. When a support request is received, significant time has to be spent understanding the channels and how the upgrade graphs are laid out in those channels.
So it sounds like what we're talking about is that channels are important concepts, but we want to define a set of channels ourselves.

-- [Daniel] We've tried to do this before by envisioning something and asking the world to educate themselves about those concepts, only to realize that that's not how the world wants to work, so we had to evolve to meet the world where they already are, and that's the reason for some of the complexities around channels today, due to the evolution process

-- [Anik] The thing that jumps in my head from that is how we're defining extension mechanisms for other components of OLM v1, i.e. we define a set of channels, but also enable new sets of channels to be introduced via extension mechanisms when operator authors want them, so that we're not trying to encapsulate ALL of the edge cases in the world while defining our set of channels.

## Takeaways

- Channels are an important concept that should be included in OLMv1.

## Open Questions/Talking Points

- Ability for operator authors to deprecate a channel.
- Removal of default channel concept?
- Which channels should we be consuming updates of transitive dependencies from?
- Are channels defined as properties on bundles or as a separate channel-specific API?
- Should intra-channel upgrades always be fully automatic? Or should we allow operator authors to ask for acks to consume upgrades within a channel? Larger question: where is the right place to allow operator authors to require acks?
- Will we have an opinion on channel naming and channel schemes?
  - semver-based channels "1.1", "1.2", "1"
  - maturity-based channels "candidate", "fast", "stable"
  - some combination of semver and maturity
  - something else?
- Will our opinions be enshrined in FBC templates or directly in the FBC API consumed on cluster?
  - if the former (FBC templates), what _is_ the FBC API consumed on cluster for channels?
- How are upgrade edges defined within a channel? Are they implicit based on semver?
  Or do catalog maintainers have the ability to explicitly define edges?
- Do channels have to have a single "head"?
- Do channels have to have an explicit sortable preference ordering (a replaces chain)? Or is it implicit (semver)?
- Do we need to steal the rpm concept of rpm releases? i.e. there's the source code version, and the release of the bundle itself? (this would support bundle rebuilds of the underlying software for things like security patches of base images)
- Should we have a stated recommendation about when to put content into a different channel vs when to put it in a different catalog?
  - [joelanford] I think this is important in the context of consuming dependency versions. Somewhere somehow, cluster admins likely want to be able to have a policy that applies to resolution along the lines of "only resolve stuff that is supported"? So is "supported" implicit in particular catalogs or explicit in channels and/or bundles?

## Chat Messages

06:16:01 From CNCF Operator Framework Lifecycle Manager : What did you say was your main point @piotr?
06:17:02 From Piotr Godowski : My main point was that operator versions are related, but shall not be tights to the channel names
06:17:09 From Piotr Godowski : *tied
06:17:10 From CNCF Operator Framework Lifecycle Manager : Ack
06:18:19 From Chris Johnson : I think composing custom Catalogs even for 3rd party operators helps here. In the EDB case, we could pre-curate a catalog with the channels we want.
06:19:23 From Piotr Godowski : Replying to "I think composing cu..." Could be, perhaps 🙂
06:19:24 From Jordan : If “I don’t know what any of your graph vertices mean so give me an explicit ordering/relationship” is the problem, then does the solution include opinionated version identification (possibly with an escape hatch for legacy behavior)?
06:23:56 From Piotr Godowski : Fully agree that the ‘channel’ concept is not something that enterprises really acknowledge.
Enterprises things “versions” not “channels"
06:24:08 From Chris Johnson : Reacted to "Fully agree that the..." with 👍
06:26:54 From huizenga : We have found that we need to add orchestration around what we call channel hopping and how we use subscriptions
06:27:12 From Chris Johnson : We can simulate channels by building multiple catalogs. However, I think that's a bit coarse. I would like to keep channels.
06:27:38 From Piotr Godowski : I’d like to keep channels, BUT expose operator versions and the upgrade graphs for admins to understand and action on
06:27:56 From Chris Johnson : Reacted to "I’d like to keep cha..." with 👍
06:28:06 From Jordan : If ‘maturity’ is an attribute of the catalogSource rather than the package+channel, then an enterprise accesses an aggregate promise for all included operators in that catalogSource. Then the dominant version from the ‘stable’ catalogSource will always be the best choice for an install/upgrade. Intuitive for cluster admins.
06:31:40 From Jordan : catalogs { stable, updates, rawhide }
06:32:24 From Piotr Godowski : But channels also might not only model “maturity” but different major versions, both in support. So think about 2 stable channels
06:34:11 From Jordan : Yeah, so in terms of avoiding freely promoting an operator across a major version bump, this is not something we’d typically want to do, and multiple channels solve that problem because you stay in your major-version stream. Hmm....
06:35:06 From Piotr Godowski : 100% agree with Daniel
06:41:12 From Jordan : Do we want to tie the ack mechanism to the channel definition, rather than the bundle (version)?
06:41:42 From Joe Lanford : That’s what I’m getting at.
06:41:44 From Joe Lanford : @jordan
06:41:49 From Jordan : It feels more intuitive on the bundle, not the channel.
06:43:21 From Piotr Godowski : Regardless of the UX of the ‘approval’, there will be always a need for admins being able to strictly control what upgrades are ok which are not
06:43:32 From Joe Lanford : Yeah agreed.
06:43:47 From Piotr Godowski : Plus, some vendors might need that ack, to indicate e.g. time-intense or compute intense of IO intense operator at upgrade (like db schema optimisation)
06:44:08 From Joe Lanford : In my mind this is an incentive for operator authors to maybe do extra work in their operator so that they can push that new version to an existing channel and the operator does the migration
06:45:03 From Piotr Godowski : From my perspective, as an operator vendor, we wouldn’t be so much incented, honesly
06:45:31 From Piotr Godowski : We focus on making the upgrades as smooth as possible, yet sometimes we need to have some controls, on exception basis
06:46:50 From Daniel Messer : Anik had his hand up for a while
06:47:22 From Piotr Godowski : My apology for not following the hand raised protocol. I’m sorry.
06:47:36 From Anik : No that’s okay this is good discussion
06:47:54 From Jordan : Engineering to encompass every exception is what’s led us to this (really complex) construction today.
06:48:31 From Jordan : +1 Joe!
06:51:16 From Piotr Godowski : I think we need to have a definition of channel as a starting point.
06:51:36 From huizenga : +1 on opinionated channel names, semver has starting working well for us
06:52:29 From Joe Lanford : Semver aligns somewhat closely to maturity, but its not quite 1 to 1.
06:53:25 From Jordan : Agree with Piotr.. .”what does a channel provide?” is something we have to start with, to see if we have opportunity to handle differently, or just be more opinionated.
06:54:33 From Joe Lanford : I think definition I heard is: “channel provides operator authors a way to deliver content to cluster admins at various cadences”
06:55:26 From Piotr Godowski : To me, the channel is a logical group of operator versions. And operators can declared the allowed / accepted upgrades which are easy to understand to cluster admins. Upgrades between the channels shall be also easy.
06:55:43 From Jordan : Also what we identified is that it also provides a mechanism to constrain those updates for e.g. within a major version range.
06:55:51 From Anik : That’s well said Daniel
06:55:58 From Piotr Godowski : Reacted to "That’s well said Dan..." with 👍
06:57:35 From CNCF Operator Framework Lifecycle Manager : Isn’t that solved at the FBC level?
06:58:28 From Anik : Yea I think that was just an example of “can’t think of all the edge cases”
06:59:16 From huizenga : Not to complicate things more but should we factor in the concept of operand versioning?
06:59:38 From Anik : That’s feels like another hour of discussion 😅
06:59:46 From Piotr Godowski : Reacted to "That’s feels like an..." with 😂
07:00:02 From Daniel Messer : FWIW I don’t see how operand versioning relates
07:00:13 From Piotr Godowski : Shall we call out the open questions identified here?
07:00:16 From Anik : Yea that’d be my first question in that hour
07:00:57 From Todd Short : I keep hearing about “operator writers messing up”, and that “OLM has to fix it”. Is there a way to get out of that business?
07:01:15 From Joe Lanford : +1 daniel
07:01:29 From Piotr Godowski : +1 too to Daniel 🙂
07:01:56 From Joe Lanford : There’s also some important convos to have related to channels and dependencies
07:02:01 From Daniel Messer : or: “this channel is marked deprecated, please switch to the ‘awesome’ channel to continue to receive updates"
07:02:22 From Joe Lanford : +1 about deprecations
07:02:36 From Piotr Godowski : Replying to "or: “this channel is..."
Or - your current “Stable” channel is EOS’ed soon - why don’t you jump to the “new stable” ?
07:03:31 From Daniel Messer : we should also be able to remove the defaultChannel concept
07:03:41 From Piotr Godowski : Have to drop. I must admit I enjoyed this, thank you guys. Apologies Anik again for not following the raised hand rule.
07:03:42 From Jordan : I think there’s a big convo to have about how we traverse even a semver chain.
07:03:46 From Piotr Godowski : Take care guys
07:03:52 From Jordan : Thanks Piotr!
07:03:54 From Daniel Messer : Reacted to "Take care guys" with 👋
07:04:13 From Jordan : IMHO a mistake to put channels as a bundle property.
07:04:24 From Daniel Messer : yeah, would lead to a lot of channels
07:04:42 From Jordan : Centralization forces (some level of) conformity/uniformity.;
07:04:44 From Daniel Messer : stable vs. Stable vs. StAbLe
07:05:00 From huizenga : Need to drop, thanks
07:07:19 From Daniel Messer : so, what do we leave with here?
07:07:36 From Daniel Messer : let’s start an issue or discussion or doc upstream to list the various entities we want to see
07:07:44 From Anik : Reacted to "let’s start an issue..." with 👍
07:08:23 From Jordan : Like forcing everyone to use semver provides an implicit channel-like mechanism, but we still need to take care to traverse.
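
One way to picture the semver-based channel scheme raised in the open questions ("1.1", "1.2", "1") and in the closing chat messages is the minimal Python sketch below. It is an illustration only, not OLM or FBC behavior: `Version`, `derive_channels`, and `next_upgrade` are hypothetical names, only plain MAJOR.MINOR.PATCH versions are handled, and jumping straight to the newest version in a channel is just one assumed traversal policy.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Version(NamedTuple):
    """Hypothetical plain MAJOR.MINOR.PATCH version (no pre-release tags)."""
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def derive_channels(versions: List[str]) -> Dict[str, List[Version]]:
    """Group versions into implicit "MAJOR" and "MAJOR.MINOR" channels.

    Each channel's list is sorted ascending, so its last entry is the head.
    """
    channels: Dict[str, List[Version]] = defaultdict(list)
    for version in sorted(Version.parse(v) for v in versions):
        channels[f"{version.major}"].append(version)
        channels[f"{version.major}.{version.minor}"].append(version)
    return dict(channels)


def next_upgrade(
    channels: Dict[str, List[Version]], channel: str, installed: str
) -> Optional[Version]:
    """Return the newest version in the channel above the installed one.

    Assumption: every newer version in the same channel is a valid target.
    Returns None when the installed version is already the channel head.
    """
    current = Version.parse(installed)
    candidates = [v for v in channels.get(channel, []) if v > current]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    channels = derive_channels(["1.0.0", "1.0.1", "1.1.0", "2.0.0"])
    print(sorted(channels))                     # ['1', '1.0', '1.1', '2', '2.0']
    print(next_upgrade(channels, "1", "1.0.0"))  # 1.1.0
    print(next_upgrade(channels, "1", "1.1.0"))  # None
```

Because each derived channel is bounded by its major (or major.minor) version, an admin following channel "1" in this sketch is never moved to 2.x automatically. How intermediate versions should be traversed, and whether edges should be explicit instead, remains one of the open questions above.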