---
tags: Notes
---
# Airship Beyond 2.0
## Document Management
Phases and the document model still leave gaps when it comes to facilitating document management.
Do we need to take on a further role in document generation?
E.g. we already have HostGenerators, etc.
Goal: make life easier for the new Airship user
* Discovery of infrastructure elements, and automated generation of matching declarative intent, e.g. via Redfish
* What would the user provide, and what would be discovered?
* E.g. Redfish addresses and credentials are provided, then BMH definitions are generated
* Or, generation of a catalogue
* A couple of approaches:
* Airship-centric: generate catalogues, e.g. airshipctl querying hosts over Redfish
* Metal3-centric: generate BMH objects directly, e.g. Airship -> Metal3 -> Ironic Inspector
* Document analysis automation
* As an operator with many sites, how can I query across my declarative intent?
* Certificate expiration across sites
* Already in progress at the single-site level
* BMH characteristics (hardware capabilities, versions)
* Could leverage the per-site inventory
* TODO: look at BMH and think about what would be valuable
* Metrics around tenant clusters
* What are the divergences/diffs among different sites?
* Is a site secure enough (e.g. are documents encrypted)?
* Am I over capacity w.r.t. workloads on hosts?
* RFC: What would be useful for your organization?
* Cross-site intent analysis could show / compare workloads across inventory
### Initial site setup (take away manual kustomize pain)
* `airshipctl document init`
* Make me a site that looks like other sites
* Site templating of some kind
* Make me a site/type that has certain characteristics
* Can we distill an entire site into a catalogue?
* Type would be the leaf
* Leverage generic phase definitions at type level
* A catalogue that generates a BAU base site definition that can be extended?
* There are several different approaches to the above:
* Ask the user what's special about their site
* Use automation to fill in the rest
* Infra discovery (discussed above)
* Bare Metal Validate
* Fail Fast/Early
* Hardware doesn't match what the intent expects
* Issue [#77](https://github.com/airshipit/airshipctl/issues/77) - Stale, needs updating
* Possible approach: add a phase just after ephemeral bootstrap that applies HCC (Hardware Classification Controller) across the full cluster, and verify that all expected labels are present before provisioning the target cluster
* Need to investigate whether an enhancement to BMC is required for this
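A rough sketch of the fail-fast check, assuming the expected labels per host come from declarative intent and the observed labels come from a classification step such as HCC: any host missing an expected label (or carrying a different value) blocks provisioning. The label keys, host names and `missingLabels` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// missingLabels compares the labels each host is expected to carry (from
// declarative intent) with the labels actually observed on the host. It
// returns, per host, the sorted expected label keys that are absent or
// have a different value. Hosts that fully match are omitted.
func missingLabels(expected, observed map[string]map[string]string) map[string][]string {
	problems := map[string][]string{}
	for host, want := range expected {
		have := observed[host] // nil map if the host was never observed
		for key, val := range want {
			if got, ok := have[key]; !ok || got != val {
				problems[host] = append(problems[host], key)
			}
		}
		if keys, ok := problems[host]; ok {
			sort.Strings(keys)
		}
	}
	return problems
}

func main() {
	expected := map[string]map[string]string{
		"node01": {"hardwareclass/profile": "compute", "hardwareclass/nic": "25g"},
		"node02": {"hardwareclass/profile": "compute"},
	}
	observed := map[string]map[string]string{
		"node01": {"hardwareclass/profile": "compute"},
		"node02": {"hardwareclass/profile": "compute"},
	}
	// Fail fast: stop before provisioning the target cluster on any mismatch.
	if p := missingLabels(expected, observed); len(p) > 0 {
		fmt.Println("hardware does not match intent:", p)
		return
	}
	fmt.Println("all hosts match intent; safe to provision target cluster")
}
```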
### Declarative Discovery
We have been talking about the ability to interact with a set of servers and generate what we need for the declarative model. Perhaps this is the time to introduce something like that.
This would extend the value of the Redfish interaction: build/generate the appropriate inventory for a greenfield site, or extend an existing inventory with an arbitrary set of servers.
### Workload vs Infrastructure capabilities
Does Airship need or want to extend its role in the management of the software that runs on clusters deployed by airship?
* Airship UI could list out e.g. HelmReleases
* Cross-site intent analysis could show / compare workloads across inventory
* Distribution of workloads across sites
* Take cross-site intent and translate dynamically into applied site intent
* "Smart Phase"
* Activities/Responsibilities:
* Growing capacity (bursting)
* Scaling workloads on existing clusters
* Building CaaS clusters on the fly
* Movement of NFs from site to site (e.g. draining)
* Using airshipctl mainly as a tool for lifecycling software rather than lifecycling clusters
***"I am a user who only cares about the application; the clusters are already there for me"***
* PhasePlan could drive software delivery, dependency ordering, etc.
* Decoupling phases from types will help use this for software-only delivery
* Perhaps a Treasuremap example/template for software PhasePlans
* Perhaps a `document init` that drives that Treasuremap template
* A `document phase add` kind of command that consumes/incorporates a workload
## Multi Region/Site capabilities
### What do we want from multi region
#### Discovery of State/Release
What is happening with a site and the clusters within it.
Not from the perspective of telemetry, but from an operational view of interaction.
* A site being self-aware of its own lifecycle state
* How do we know "what's happening", e.g. CRs or admission controllers
* Goal: query the site for current state
* Applies to changes beyond phase executions -- e.g. admin/tenant activities
* States:
* Normal
* In the middle of an update (and how many phases in)
* In an error state
* In maintenance window (external entity-driven state)
* When in a maintenance window, admission controller would say "don't do this"
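To make the state list above concrete, here is a sketch of the status a self-aware site might report and the decision an admission controller could make on it. Only the maintenance-window rejection comes from the notes above; `SiteState`, `SiteStatus` and `admitChange` are hypothetical names, and a real implementation would read the state from a CR rather than a Go value.

```go
package main

import (
	"errors"
	"fmt"
)

// SiteState is a hypothetical enumeration of the lifecycle states above.
type SiteState int

const (
	StateNormal SiteState = iota
	StateUpdating
	StateError
	StateMaintenance // set by an external entity
)

// SiteStatus is what a site might report when queried for its state.
type SiteStatus struct {
	State       SiteState
	PhasesDone  int // meaningful only while updating
	PhasesTotal int
}

// String renders the status for operators, including update progress.
func (s SiteStatus) String() string {
	switch s.State {
	case StateUpdating:
		return fmt.Sprintf("updating (%d of %d phases done)", s.PhasesDone, s.PhasesTotal)
	case StateError:
		return "error"
	case StateMaintenance:
		return "maintenance window"
	default:
		return "normal"
	}
}

var errMaintenance = errors.New("site is in a maintenance window; don't do this")

// admitChange mimics an admission decision for an admin/tenant change:
// changes are rejected only while the site is in a maintenance window.
func admitChange(s SiteStatus) error {
	if s.State == StateMaintenance {
		return errMaintenance
	}
	return nil
}

func main() {
	for _, s := range []SiteStatus{
		{State: StateNormal},
		{State: StateUpdating, PhasesDone: 3, PhasesTotal: 7},
		{State: StateMaintenance},
	} {
		if err := admitChange(s); err != nil {
			fmt.Printf("%v: rejected: %v\n", s, err)
		} else {
			fmt.Printf("%v: admitted\n", s)
		}
	}
}
```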
#### Assessment or Auditing Abilities
What releases are in a site?
What software is running on the clusters?
Are there differences between sites, and if so, how do we address/synchronize them?
* Flux doesn't appear to have a CLI that lists HelmReleases nicely
* The Helm CLI lists deployed charts
* TODO: see what summarization tools/libraries exist for generic Kubernetes resources / CRs / application health
### Authorization/Kubeconfig Management
Do we need directory service functionality built around documents or cluster interactions?
* CLI ability to pull the declarative kubeconfig, extract it & add your own auth
* Append site kubeconfigs into your local kubeconfig
* Use the document set to build a service
* Talk to a cluster to get the list of endpoints available
* TODO: discuss in the design call soon to take into account in WIP
1. Kubernetes directory service, using declarative intent as source of truth
2. Adding other kinds of endpoints to the directory service
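For appending site kubeconfigs into a local kubeconfig, kubectl already merges the files listed in `KUBECONFIG` (e.g. `KUBECONFIG=local:site kubectl config view --flatten`), with the first file to set a given key winning. The sketch below applies the same rule to heavily simplified, hypothetical `Kubeconfig`/`Cluster` types and reports colliding keys instead of silently ignoring them; it is not the client-go `clientcmd` API.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster and Kubeconfig are hypothetical, heavily simplified stand-ins
// for the clusters/contexts sections of a real kubeconfig file.
type Cluster struct {
	Server string
}

type Kubeconfig struct {
	Clusters map[string]Cluster
	Contexts map[string]string // context name -> cluster name
}

// mergeInto adds every cluster and context from src into dst. As with
// kubectl's KUBECONFIG merge, the first one to set a key wins: existing
// entries in dst are never overwritten. Keys that collide with a different
// value are returned (sorted) so the operator can resolve them.
func mergeInto(dst *Kubeconfig, src Kubeconfig) []string {
	if dst.Clusters == nil {
		dst.Clusters = map[string]Cluster{}
	}
	if dst.Contexts == nil {
		dst.Contexts = map[string]string{}
	}
	var conflicts []string
	for name, c := range src.Clusters {
		if existing, ok := dst.Clusters[name]; ok {
			if existing != c {
				conflicts = append(conflicts, "cluster/"+name)
			}
			continue
		}
		dst.Clusters[name] = c
	}
	for name, cl := range src.Contexts {
		if existing, ok := dst.Contexts[name]; ok {
			if existing != cl {
				conflicts = append(conflicts, "context/"+name)
			}
			continue
		}
		dst.Contexts[name] = cl
	}
	sort.Strings(conflicts)
	return conflicts
}

func main() {
	local := Kubeconfig{Clusters: map[string]Cluster{"dev": {Server: "https://dev.example:6443"}}}
	site := Kubeconfig{
		Clusters: map[string]Cluster{"site-a-target": {Server: "https://site-a.example:6443"}},
		Contexts: map[string]string{"site-a-admin": "site-a-target"},
	}
	if c := mergeInto(&local, site); len(c) > 0 {
		fmt.Println("conflicts left untouched:", c)
	}
	fmt.Println(len(local.Clusters), "clusters,", len(local.Contexts), "contexts")
}
```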
## Centralized management
* Pull vs push model
* Need to solve for other things first before clarity here
* Defer
## Centralized or distributed CAPI
Nothing to do here; CAPI handles both just fine.
## Phase Plan as an airshipctl command
* airshipctl phase plan executor
* Needs to be idempotent
* Need one of:
* Ephemeral phase plan & target phase plan <- do this one
* As above but with multiple phase groups instead of plans
### Defer the below:
## UI Capabilities
## Day 2 Enhancements?
## Extend into Switch Fabric
- Interact with switch fabric operators
- Declarative Fabric / Integration with Infrastructure context.
## Treasuremap
* Extends types of sites