---
title: enabling-cluster-api-based-installations-via-openshift-install
authors:
  - "@patrickdillon"
  - "@vincepri"
  - "@JoelSpeed"
reviewers:
  - "@sdodson"
  - "@zaneb"
approvers:
  - "@sdodson"
  - "@zaneb"
api-approvers:
  - "None"
creation-date: 2023-10-16
last-updated: 2023-10-16
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
  - TBD
see-also:
  - "/enhancements/this-other-neat-thing.md"
replaces:
  - "/enhancements/that-less-than-great-idea.md"
superseded-by:
  - "/enhancements/our-past-effort.md"
---

# Enabling Cluster-API-based Installations via openshift-install

## Summary

This enhancement discusses how `openshift-install` can use cluster-api (CAPI) infrastructure providers to provision infrastructure for clusters, without requiring access to an external management cluster or a local container runtime. By running a Kubernetes control plane and CAPI-provider controllers as subprocesses on the installer host, `openshift-install` can use CAPI and its providers in a similar manner to how Terraform and its providers are currently being used.

## Motivation

There are two primary motivations:

1. OpenShift Alignment with CAPI: CAPI offers numerous potential benefits, such as day-2 infrastructure management, an API for users to edit cluster infrastructure, and upstream collaboration. Installer support for CAPI would be foundational for adopting these benefits.
2. Terraform BSL License Change: due to the restrictive license change of Terraform, `openshift-install` needs a framework to replace the primary tool it used to provision cluster infrastructure. In addition to the benefits listed above, CAPI provides solutions for the biggest gaps left by Terraform: a common API for specifying infrastructure across cloud providers and robust infrastructure error handling.

### User Stories

- As an existing user/client of the installer, I want backwards compatibility so that I can continue to use the installer (e.g. `create cluster`) in the same manner and with existing automation.
- As a security analyst, I want the installer image to be free of Terraform and related dependencies to decrease the surface area for vulnerabilities.
- As an advanced user or cluster administrator, I want to be able to edit the CAPI infrastructure manifests so that I can customize control-plane infrastructure.

### Goals

- To provide a common user and developer experience when installing and developing across cloud platforms.
- To be backwards compatible and fully satisfy the requirements of install-config type APIs.
- To keep the user experience for day-zero operations unchanged or improved.
- To not require any new runtime dependencies.
- To provide an extensible framework to plug in new infrastructure cloud providers.

### Non-Goals / Future work

- To retain full and strict backward compatibility with the infrastructure previously created with Terraform
- To optimize build processes or binary size
- To use an existing management cluster to install OpenShift
- To pivot the CAPI manifests to the newly-installed cluster to enable day-2 infrastructure management within the cluster

## Proposal

The Installer will create CAPI infrastructure manifests based on user input from the install config; then, in order to provision cluster infrastructure, apply the manifests to CAPI controllers running on a local Kubernetes control plane set up by [envtest](https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/envtest).
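To make this concrete, the following Go sketch shows the general shape of starting such a local control plane with `envtest`. The asset paths are placeholders and error handling is simplified; this is illustrative rather than the Installer's actual code:

```go
package main

import (
	"log"

	"sigs.k8s.io/controller-runtime/pkg/envtest"
)

func main() {
	// Point envtest at the kube-apiserver/etcd binaries extracted from the
	// installer binary, and at the provider CRDs. These paths are placeholders.
	env := &envtest.Environment{
		BinaryAssetsDirectory: "install-dir/cluster-api/bin",
		CRDDirectoryPaths:     []string{"install-dir/cluster-api/crds"},
	}

	// Start kube-apiserver and etcd as subprocesses on the installer host and
	// obtain a rest.Config pointing at the resulting control plane.
	cfg, err := env.Start()
	if err != nil {
		log.Fatalf("failed to start local control plane: %v", err)
	}
	defer env.Stop()

	log.Printf("local control plane listening at %s", cfg.Host)

	// The CAPI core and infrastructure-provider controllers would then be
	// exec'd as subprocesses with kubeconfigs derived from cfg, and the
	// generated manifests applied against this endpoint.
}
```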
### Workflow Description

**cluster creator** is a human user responsible for deploying a cluster. Note that the workflow does not change for this user.

**openshift-install** is the Installer binary.

1. The cluster creator provides an install-config and credentials
2. (optional) The cluster creator runs `openshift-install create manifests`
3. (optional) The cluster creator edits the newly created CAPI manifests.
4. The cluster creator runs `openshift-install create cluster`
5. `openshift-install` extracts the kube-apiserver, etcd, CAPI infrastructure provider & the cloud CAPI provider to the install dir
6. `openshift-install`, using `envtest`, initializes a control plane locally on the Installer host
7. `openshift-install` execs the CAPI infrastructure and cloud provider as subprocesses, pointing them to the local control plane
8. `openshift-install` applies the CAPI manifests to the control plane
9. The CAPI controllers provision cluster infrastructure based on the manifests
10. `openshift-install` monitors the status of the local manifests as they are applied
11. If the statuses are as expected, infrastructure has been provisioned and installation continues with the normal flow.

In the case of an error in the final step, the Installer will bubble up resources with non-expected statuses.

#### Variation and form factor considerations [optional]

How does this proposal intersect with Standalone OCP, Microshift and Hypershift?

If the cluster creator uses a standing desk, in step 1 above they can stand instead of sitting down.

See https://github.com/openshift/enhancements/blob/master/enhancements/workload-partitioning/management-workload-partitioning.md#high-level-end-to-end-workflow and https://github.com/openshift/enhancements/blob/master/enhancements/agent-installer/automated-workflow-for-agent-based-installer.md for more detailed examples.

### API Extensions

API Extensions are CRDs, admission and conversion webhooks, aggregated API servers, and finalizers, i.e. those mechanisms that change the OCP API surface and behaviour.

- Name the API extensions this enhancement adds or modifies.
- Does this enhancement modify the behaviour of existing resources, especially those owned by other parties than the authoring team (including upstream resources), and, if yes, how? Please add those other parties as reviewers to the enhancement.

  Examples:
  - Adds a finalizer to namespaces. Namespace cannot be deleted without our controller running.
  - Restricts the label format for objects to X.
  - Defaults field Y on object kind Z.

Fill in the operational impact of these API Extensions in the "Operational Aspects of API Extensions" section.

### Implementation Details/Notes/Constraints [optional]

#### Overview

In a typical CAPI installation, manifests indicating the desired cluster configuration are applied to a management cluster. In order to keep `openshift-install` free of any new external runtime dependencies, the dependencies will be [embedded][embed] into the `openshift-install` binary, extracted at runtime, and cleaned up afterward.

This approach is similar to what we have been using for Terraform. With Terraform, the Installer has been embedding the Terraform and cloud-specific provider binaries within the Installer binary and extracting them at runtime. The Installer produces the Terraform configuration files and invokes Terraform using the `tf-exec` library.

![terraform diagram(2)](https://hackmd.io/_uploads/HkFrqGCS6.jpg)
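For illustration, the embed-and-extract mechanism looks roughly like the following Go sketch; the embedded directory layout, names, and cleanup strategy here are assumptions rather than the Installer's actual implementation:

```go
package main

import (
	"embed"
	"log"
	"os"
	"path"
	"path/filepath"
)

// The embedded directory below is illustrative; the real layout of the
// embedded binaries inside openshift-install may differ.
//
//go:embed cluster-api/bin
var embeddedBinaries embed.FS

// extractBinaries writes the embedded binaries into destDir so they can be
// exec'd as subprocesses.
func extractBinaries(destDir string) error {
	entries, err := embeddedBinaries.ReadDir("cluster-api/bin")
	if err != nil {
		return err
	}
	for _, entry := range entries {
		data, err := embeddedBinaries.ReadFile(path.Join("cluster-api/bin", entry.Name()))
		if err != nil {
			return err
		}
		// Extracted binaries must be executable.
		if err := os.WriteFile(filepath.Join(destDir, entry.Name()), data, 0o755); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	dir, err := os.MkdirTemp("", "installer-bin")
	if err != nil {
		log.Fatal(err)
	}
	// Clean up the extracted binaries when the installer exits.
	defer os.RemoveAll(dir)

	if err := extractBinaries(dir); err != nil {
		log.Fatal(err)
	}
	log.Printf("extracted embedded binaries to %s", dir)
}
```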
We can follow a similar pattern to run CAPI controllers locally on the Installer host. In addition to the CAPI controller binaries, `kube-apiserver` and `etcd` are embedded in order to run a local control plane, orchestrated with `envtest`.

![capi diagram(3)](https://hackmd.io/_uploads/r1YU9zRSa.jpg)

#### Local control plane

The local control plane is set up using the previously available work done in Controller Runtime through [envtest][envtest]. Envtest was born out of the necessity to run integration tests for controllers against a real API server, register webhooks (conversion, admission, validation), and manage the lifecycle of Custom Resource Definitions. Over time, `envtest` has matured to the point that it can now be used to run controllers in a local environment, reducing or eliminating the need for a full Kubernetes cluster to run controllers.

At a high level, the local control plane is responsible for:

- Setting up certificates for the apiserver and etcd.
- Running (and cleaning up, on shutdown) the local control plane components.
- Installing any required components, like Custom Resource Definitions (CRDs).
  - For Cluster API core, the CRDs are stored in `data/data/cluster-api/core-components.yaml`.
  - Infrastructure providers are expected to store their components in `data/data/cluster-api/<name>-infrastructure-components.yaml`.
- Upon install, the local control plane takes care of modifying any webhook (conversion, admission, validation) to point to the `host:port` combination assigned.
  - Each controller manager will have its own `host:port` combination assigned.
  - Certificates are generated and injected into the server, and the client certs into the api-server webhook configuration.
- For each process that the local control plane manages, a health check (ping to `/healthz`) is required to pass, similarly to how a health probe is configured when running in a Deployment.

#### Manifests

The Installer will produce the CAPI manifests as part of the `manifests` target, writing them to a new `cluster-api` directory alongside the existing `manifests` and `openshift` directories:

```shell=
$ ./openshift-install create manifests --dir install-dir
INFO Credentials loaded from the "default" profile in file "~/.aws/credentials"
INFO Consuming Install Config from target directory
INFO Manifests created in: install-dir/cluster-api, install-dir/manifests and install-dir/openshift
$ tree install-dir/cluster-api/
install-dir/cluster-api/
├── 00_capi-namespace.yaml
├── 01_aws-cluster-controller-identity-default.yaml
├── 01_capi-cluster.yaml
├── 02_infra-cluster.yaml
├── 10_inframachine_mycluster-6lxqp-master-0.yaml
├── 10_inframachine_mycluster-6lxqp-master-1.yaml
├── 10_inframachine_mycluster-6lxqp-master-2.yaml
├── 10_inframachine_mycluster-6lxqp-master-bootstrap.yaml
├── 10_machine_mycluster-6lxqp-master-0.yaml
├── 10_machine_mycluster-6lxqp-master-1.yaml
├── 10_machine_mycluster-6lxqp-master-2.yaml
└── 10_machine_mycluster-6lxqp-master-bootstrap.yaml

1 directory, 12 files
```

The manifests within this `cluster-api` directory will not be written to the cluster or included in bootstrap ignition. In future work, we expect these manifests to be pivoted to the cluster to enable the target cluster to take over managing its own infrastructure.
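As a rough sketch of how these generated manifests could be fed to the local control plane, assuming a controller-runtime client built from the `envtest` config (the ordering convention and error handling shown here are simplifications, not the Installer's actual code):

```go
package clusterapi

import (
	"context"
	"os"
	"path/filepath"
	"sort"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/yaml"
)

// applyManifests creates every object found in dir against the local control
// plane. The numeric prefixes on the generated file names (00_, 01_, 10_, ...)
// make lexical order a reasonable apply order for this sketch.
func applyManifests(ctx context.Context, cl client.Client, dir string) error {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return err
	}
	names := make([]string, 0, len(entries))
	for _, entry := range entries {
		names = append(names, entry.Name())
	}
	sort.Strings(names)

	for _, name := range names {
		data, err := os.ReadFile(filepath.Join(dir, name))
		if err != nil {
			return err
		}
		// Each generated file holds a single Kubernetes object; decode it into
		// an unstructured object and create it on the local control plane.
		obj := &unstructured.Unstructured{}
		if err := yaml.Unmarshal(data, obj); err != nil {
			return err
		}
		if err := cl.Create(ctx, obj); err != nil {
			return err
		}
	}
	return nil
}
```

In practice the Installer may apply the manifests in a more structured order and wait for namespaces and CRDs to be established first; the sketch only conveys the overall shape.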
### Risks and Mitigations

While we do not expect these changes to introduce a significant security risk, we are working with product security teams to ensure they are aware of the changes and are able to review.

### Drawbacks

By depending on CAPI providers whose codebases live in repositories external to the Installer, the process for developing features and delivering fixes is complex. While we had the same situation for Terraform, the CAPI providers will be more actively developed than their Terraform counterparts. Furthermore, it will be necessary to ensure that the CAPI providers used by the Installer match the version of those in the payload.

While this external dependency is a significant drawback, it is not unique to this design and is common throughout OpenShift (e.g. any time the API or library-go must be updated before being vendored into a component). To minimize the devex friction, we will focus on documenting a workflow for developing providers while working with the Installer. If the problem becomes significant, we could consider automation to bump Installer providers when merges happen upstream or in our forks.

## Design Details

### Open Questions [optional]

1. UX design during the install process as well as during failure (log collection). The Installer will dump (potentially prettified) controller logs. Once we reach a certain level of stability it may be worthwhile to implement a UI.

### Test Plan

As this is replacing existing functionality in the Installer, we can rely on existing testing infrastructure.

### Graduation Criteria

#### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability

#### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- User facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/)

#### Removing a deprecated feature

- Announce deprecation and support policy of the existing feature
- Deprecate the feature

### Upgrade / Downgrade Strategy

As this enhancement only concerns the installation process and affects only the underlying cluster infrastructure, this change should not affect existing cluster upgrades.

### Version Skew Strategy

N/A

### Operational Aspects of API Extensions

N/A

#### Failure Modes

During a failed install, the controller logs (displayed in stdout and collected in `.openshift_install.log`) will contain useful information. The status of the CAPI manifests may also contain useful information, in which case it would be important to display that to users and collect it for bugs and support cases. There is an open question about the best way to handle this UX, and we expect the answer to become clearer during development.

As the infrastructure will be reconciled by a controller, it will be possible to resolve issues during an ongoing installation, although this would not necessarily be a feature we would call attention to for documented use cases.

Finally, the Installer will need to be able to identify when infrastructure provisioning has failed during an installation. Initially this will be achieved through a timeout.
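As a rough illustration of that timeout-based check (the polling interval and the reliance on the CAPI `Cluster` status field are assumptions for this sketch, not the Installer's final behaviour):

```go
package clusterapi

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForInfrastructure polls the CAPI Cluster object on the local control
// plane until the infrastructure is reported ready, or gives up after the
// supplied timeout.
func waitForInfrastructure(ctx context.Context, cl client.Client, key client.ObjectKey, timeout time.Duration) error {
	return wait.PollUntilContextTimeout(ctx, 15*time.Second, timeout, true,
		func(ctx context.Context) (bool, error) {
			cluster := &clusterv1.Cluster{}
			if err := cl.Get(ctx, key, cluster); err != nil {
				return false, err
			}
			// InfrastructureReady is set by the core CAPI controller once the
			// infrastructure cluster object is marked ready by its provider.
			return cluster.Status.InfrastructureReady, nil
		})
}
```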
#### Support Procedures

Describe how to

- detect the failure modes in a support situation, describe possible symptoms (events, metrics, alerts, which log output in which component)

  Examples:
  - If the webhook is not running, kube-apiserver logs will show errors like "failed to call admission webhook xyz".
  - Operator X will degrade with message "Failed to launch webhook server" and reason "WebhookServerFailed".
  - The metric `webhook_admission_duration_seconds("openpolicyagent-admission", "mutating", "put", "false")` will show >1s latency and alert `WebhookAdmissionLatencyHigh` will fire.

- disable the API extension (e.g. remove MutatingWebhookConfiguration `xyz`, remove APIService `foo`)

  - What consequences does it have on the cluster health?

    Examples:
    - Garbage collection in kube-controller-manager will stop working.
    - Quota will be wrongly computed.
    - Disabling/removing the CRD is not possible without removing the CR instances. Customer will lose data. Disabling the conversion webhook will break garbage collection.

  - What consequences does it have on existing, running workloads?

    Examples:
    - New namespaces won't get the finalizer "xyz" and hence might leak resource X when deleted.
    - SDN pod-to-pod routing will stop updating, potentially breaking pod-to-pod communication after some minutes.

  - What consequences does it have for newly created workloads?

    Examples:
    - New pods in namespace with Istio support will not get sidecars injected, breaking their networking.

- Does functionality fail gracefully and will work resume when re-enabled without risking consistency?

  Examples:
  - The mutating admission webhook "xyz" has FailPolicy=Ignore and hence will not block the creation or updates on objects when it fails. When the webhook comes back online, there is a controller reconciling all objects, applying labels that were not applied during admission webhook downtime.
  - Namespaces deletion will not delete all objects in etcd, leading to zombie objects when another namespace with the same name is created.

## Implementation History

Major milestones in the life cycle of a proposal should be tracked in `Implementation History`.

## Alternatives

Other infrastructure-as-code alternatives such as Pulumi, Ansible, and OpenTofu all have their own individual drawbacks. We prefer the CAPI solution over these alternatives because it:

* streamlines Installer development (we do not need to re-implement features for the control plane)
* lays the foundation for OpenShift to implement future CAPI features
* requires less development effort, as CAPI providers are already set up to provision infrastructure for a cluster

It would also be possible to implement the installation using direct SDK calls for each cloud provider. In addition to the reasons stated above, using individual SDK implementations would not provide a common framework across the various cloud platforms.

## Infrastructure Needed [optional]

Use this section if you need things from the project. Examples include a new subproject, repos requested, github details, and/or testing infrastructure.

Listing these here allows the community to get the process for these resources started right away.

[embed]: https://pkg.go.dev/embed
[envtest]: https://github.com/kubernetes-sigs/controller-runtime/tree/main/tools/setup-envtest
