---
title: enabling-cluster-api-based-installations-via-openshift-install
authors:
  - "@patrickdillon"
  - "@vincepri"
  - "@JoelSpeed"
reviewers:
  - "@sdodson"
  - "@zaneb"
approvers:
  - "@sdodson"
  - "@zaneb"
api-approvers:
  - "None"
creation-date: 2023-10-16
last-updated: 2023-10-16
tracking-link: # link to the tracking ticket (for example: Jira Feature or Epic ticket) that corresponds to this enhancement
  - TBD
see-also:
  - "/enhancements/this-other-neat-thing.md"
replaces:
  - "/enhancements/that-less-than-great-idea.md"
superseded-by:
  - "/enhancements/our-past-effort.md"
---

# Enabling Cluster-API-based Installations via openshift-install

## Summary

This enhancement discusses how `openshift-install` can use cluster-api (CAPI) infrastructure providers to provision infrastructure for clusters, without requiring access to an external management cluster or a local container runtime. By running a Kubernetes control plane and CAPI-provider controllers as subprocesses on the installer host, `openshift-install` can use CAPI and its providers in a similar manner to how Terraform and its providers are currently being used.

## Motivation

There are two primary motivations:

1. OpenShift alignment with CAPI: CAPI offers numerous potential benefits, such as day-2 infrastructure management, an API for users to edit cluster infrastructure, and upstream collaboration. Installer support for CAPI would be foundational for adopting these benefits.
2. Terraform BSL license change: due to the restrictive license change of Terraform, `openshift-install` needs a framework to replace the primary tool it used to provision cluster infrastructure. In addition to the benefits listed above, CAPI provides solutions for the biggest gaps left by Terraform: a common API for specifying infrastructure across cloud providers and robust infrastructure error handling.

### User Stories

- As an existing user/client of the installer, I want backwards compatibility so that I can continue to use the installer (e.g. `create cluster`) in the same manner and with existing automation.
- As a security analyst, I want the installer image to be free of Terraform and related dependencies to decrease the surface area for vulnerabilities.
- As an advanced user or cluster administrator, I want to be able to edit the CAPI infrastructure manifests so that I can customize control-plane infrastructure.

### Goals

- To provide a common user and developer experience when installing and developing across cloud platforms.
- To be backwards compatible and fully satisfy the requirements of install-config type APIs.
- To keep the user experience for day-zero operations unchanged or improved.
- To not require any new runtime dependencies.
- To provide an extensible framework to plug in new infrastructure cloud providers.

### Non-Goals / Future Work

- To retain full and strict backward compatibility with the infrastructure previously created with Terraform.
- To optimize build processes or binary size.
- To use an existing management cluster to install OpenShift.
- To pivot the CAPI manifests to the newly installed cluster to enable day-2 infrastructure management within the cluster.

## Proposal

The Installer will create CAPI infrastructure manifests based on user input from the install config; then, in order to provision cluster infrastructure, apply the manifests to CAPI controllers running on a local Kubernetes control plane set up by [envtest](https://pkg.go.dev/sigs.k8s.io/controller-runtime/pkg/envtest).
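For a concrete flavor of this flow, the following is a minimal sketch of starting a local control plane with `envtest` and applying a manifest to it. The paths, the placeholder manifest, and the abbreviated error handling are illustrative assumptions, not the Installer's actual code; the detailed design is covered under Implementation Details below.

```go
package main

import (
	"context"
	"log"
	"path/filepath"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"
	"sigs.k8s.io/yaml"
)

func main() {
	// Local control plane: envtest launches the extracted kube-apiserver
	// and etcd binaries and installs the CAPI CRDs.
	env := &envtest.Environment{
		BinaryAssetsDirectory: filepath.Join("install-dir", "bin"), // illustrative path
		CRDDirectoryPaths: []string{
			filepath.Join("data", "data", "cluster-api"), // illustrative path
		},
	}
	cfg, err := env.Start()
	if err != nil {
		log.Fatal(err)
	}
	defer env.Stop()

	// Apply a generated CAPI manifest against the local control plane.
	c, err := client.New(cfg, client.Options{})
	if err != nil {
		log.Fatal(err)
	}
	// Placeholder manifest; in practice the Installer would apply the
	// manifests it rendered from the install config.
	manifest := []byte("apiVersion: v1\nkind: Namespace\nmetadata:\n  name: example\n")
	obj := &unstructured.Unstructured{}
	if err := yaml.Unmarshal(manifest, obj); err != nil {
		log.Fatal(err)
	}
	if err := c.Create(context.Background(), obj); err != nil {
		log.Fatal(err)
	}
}
```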
### Workflow Description

**cluster creator** is a human user responsible for deploying a cluster. Note that the workflow does not change for this user.

**openshift-install** is the Installer binary.

1. The cluster creator provides an install-config and credentials.
2. (optional) The cluster creator runs `openshift-install create manifests`.
3. (optional) The cluster creator edits the newly created CAPI manifests.
4. The cluster creator runs `openshift-install create cluster`.
5. `openshift-install` extracts kube-apiserver, etcd, the CAPI infrastructure provider, and the cloud CAPI provider to the install dir.
6. `openshift-install`, using `envtest`, initializes a control plane locally on the Installer host.
7. `openshift-install` execs the CAPI infrastructure and cloud providers as subprocesses, pointing them to the local control plane.
8. `openshift-install` applies the CAPI manifests to the control plane.
9. The CAPI controllers provision cluster infrastructure based on the manifests.
10. `openshift-install` monitors the status of the local manifests as they are applied.
11. If the statuses are as expected, infrastructure has been provisioned and installation continues with the normal flow.

In the case of an error in the final step, the Installer will bubble up resources with unexpected statuses.

#### Variation and form factor considerations [optional]

How does this proposal intersect with Standalone OCP, MicroShift and HyperShift?

If the cluster creator uses a standing desk, in step 1 above they can stand instead of sitting down.

See https://github.com/openshift/enhancements/blob/master/enhancements/workload-partitioning/management-workload-partitioning.md#high-level-end-to-end-workflow and https://github.com/openshift/enhancements/blob/master/enhancements/agent-installer/automated-workflow-for-agent-based-installer.md for more detailed examples.

### API Extensions

API Extensions are CRDs, admission and conversion webhooks, aggregated API servers, and finalizers, i.e. those mechanisms that change the OCP API surface and behaviour.

- Name the API extensions this enhancement adds or modifies.
- Does this enhancement modify the behaviour of existing resources, especially those owned by other parties than the authoring team (including upstream resources), and, if yes, how? Please add those other parties as reviewers to the enhancement.

  Examples:
  - Adds a finalizer to namespaces. Namespace cannot be deleted without our controller running.
  - Restricts the label format for objects to X.
  - Defaults field Y on object kind Z.

Fill in the operational impact of these API Extensions in the "Operational Aspects of API Extensions" section.

### Implementation Details/Notes/Constraints [optional]

#### Overview

In a typical CAPI installation, manifests indicating the desired cluster configuration are applied to a management cluster. In order to keep `openshift-install` free of any new external runtime dependencies, the dependencies will instead be [embedded][embed] into the `openshift-install` binary, extracted at runtime, and cleaned up afterward.

This approach is similar to what we have been using for Terraform. With Terraform, the Installer has been embedding the Terraform and cloud-specific provider binaries within the Installer binary and extracting them at runtime. The Installer produces the Terraform configuration files and invokes Terraform using the `tf-exec` library.

![terraform diagram(2)](https://hackmd.io/_uploads/HkFrqGCS6.jpg)
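As a sketch of how the same embed-and-extract pattern could carry over to the CAPI binaries (the embedded path and helper name are hypothetical assumptions for illustration):

```go
package main

import (
	"embed"
	"io/fs"
	"os"
	"path/filepath"
)

// Hypothetical embedded directory holding the CAPI provider, kube-apiserver,
// and etcd binaries; the real layout in openshift-install may differ.
//
//go:embed cluster-api/bin
var embedded embed.FS

// extractBinaries writes each embedded binary into the install dir so the
// Installer can exec it as a subprocess; the caller cleans them up afterward.
func extractBinaries(installDir string) error {
	return fs.WalkDir(embedded, "cluster-api/bin", func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		data, readErr := embedded.ReadFile(path)
		if readErr != nil {
			return readErr
		}
		return os.WriteFile(filepath.Join(installDir, filepath.Base(path)), data, 0o755)
	})
}
```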
We can follow a similar pattern to run CAPI controllers locally on the Installer host. In addition to the CAPI controller binaries, `kube-apiserver` and `etcd` are embedded in order to run a local control plane, orchestrated with `envtest`.

![capi diagram(3)](https://hackmd.io/_uploads/r1YU9zRSa.jpg)

#### Local control plane

The local control plane is set up using the previously available work done in Controller Runtime through [envtest][envtest]. Envtest was born out of the necessity to run integration tests for controllers against a real API server, register webhooks (conversion, admission, validation), and manage the lifecycle of Custom Resource Definitions. Over time, `envtest` has matured to the point that it can now be used to run controllers in a local environment, reducing or eliminating the need for a full Kubernetes cluster.

At a high level, the local control plane is responsible for:

- Setting up certificates for the apiserver and etcd.
- Running (and cleaning up, on shutdown) the local control plane components.
- Installing any required components, like Custom Resource Definitions (CRDs).
  - For Cluster API core, the CRDs are stored in `data/data/cluster-api/core-components.yaml`.
  - Infrastructure providers are expected to store their components in `data/data/cluster-api/<name>-infrastructure-components.yaml`.
  - Upon install, the local control plane takes care of modifying any webhook (conversion, admission, validation) to point to the `host:port` combination assigned.
- Each controller manager will have its own `host:port` combination assigned.
  - Certificates are generated and injected in the server, and the client certs in the api-server webhook configuration.
- For each process that the local control plane manages, a health check (ping to `/healthz`) is required to pass, similar to how a health probe is configured when running in a Deployment.

#### Manifests

The Installer will produce the CAPI manifests as part of the `manifests` target, writing them to a new `cluster-api` directory alongside the existing `manifests` and `openshift` directories:

```shell=
$ ./openshift-install create manifests --dir install-dir
INFO Credentials loaded from the "default" profile in file "~/.aws/credentials"
INFO Consuming Install Config from target directory
INFO Manifests created in: install-dir/cluster-api, install-dir/manifests and install-dir/openshift
$ tree install-dir/cluster-api/
install-dir/cluster-api/
├── 00_capi-namespace.yaml
├── 01_aws-cluster-controller-identity-default.yaml
├── 01_capi-cluster.yaml
├── 02_infra-cluster.yaml
├── 10_inframachine_mycluster-6lxqp-master-0.yaml
├── 10_inframachine_mycluster-6lxqp-master-1.yaml
├── 10_inframachine_mycluster-6lxqp-master-2.yaml
├── 10_inframachine_mycluster-6lxqp-master-bootstrap.yaml
├── 10_machine_mycluster-6lxqp-master-0.yaml
├── 10_machine_mycluster-6lxqp-master-1.yaml
├── 10_machine_mycluster-6lxqp-master-2.yaml
└── 10_machine_mycluster-6lxqp-master-bootstrap.yaml

1 directory, 12 files
```

The manifests within this `cluster-api` directory will not be written to the cluster or included in bootstrap ignition. In future work, we expect these manifests to be pivoted to the cluster to enable the target cluster to take over managing its own infrastructure.

### Risks and Mitigations

While we do not expect these changes to introduce a significant security risk, we are working with product security teams to ensure they are aware of the changes and are able to review them.
### Drawbacks

By depending on CAPI providers whose codebases live in repositories external to the Installer, the process for developing features and delivering fixes is complex. While we had the same situation with Terraform, the CAPI providers will be more actively developed than their Terraform counterparts. Furthermore, it will be necessary to ensure that the CAPI providers used by the Installer match the version of those in the payload.

While this external dependency is a significant drawback, it is not unique to this design and is common throughout OpenShift (e.g. any time the API or library-go must be updated before being vendored into a component). To minimize the devex friction, we will focus on documenting a workflow for developing providers while working with the Installer. If the problem becomes significant, we could consider automation to bump Installer providers when merges happen upstream or in our forks.

## Design Details

### Open Questions [optional]

1. UX design during the install process as well as during failure (log collection). The Installer will dump (potentially prettified) controller logs. Once we reach a certain level of stability, it may be worthwhile to implement a UI.

### Test Plan

As this is replacing existing functionality in the Installer, we can rely on existing testing infrastructure.

### Graduation Criteria

#### Dev Preview -> Tech Preview

- Ability to utilize the enhancement end to end
- End user documentation, relative API stability

#### Tech Preview -> GA

- More testing (upgrade, downgrade, scale)
- Sufficient time for feedback
- Available by default
- User facing documentation created in [openshift-docs](https://github.com/openshift/openshift-docs/)

#### Removing a deprecated feature

- Announce deprecation and support policy of the existing feature
- Deprecate the feature

### Upgrade / Downgrade Strategy

As this enhancement only concerns the installation process and affects only the underlying cluster infrastructure, this change should not affect existing cluster upgrades.

### Version Skew Strategy

N/A

### Operational Aspects of API Extensions

N/A

#### Failure Modes

During a failed install, the controller logs (displayed in stdout and collected in `.openshift_install.log`) will contain useful information. The status of the CAPI manifests may also contain useful information, in which case it would be important to display that to users and collect it for bugs and support cases. There is an open question about the best way to handle this UX, and we expect the answer to become clearer during development.

As the infrastructure will be reconciled by a controller, it will be possible to resolve issues during an ongoing installation, although this would not necessarily be a feature we would call attention to for documented use cases.

Finally, the Installer will need to be able to identify when infrastructure provisioning has failed during an installation. Initially this will be achieved through a timeout, as in the sketch below.
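As a sketch, the timeout could be a simple poll against the CAPI `Cluster` object's status on the local control plane; the function name, interval, and deadline here are illustrative assumptions:

```go
package main

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForInfrastructureReady polls the Cluster object until its infrastructure
// is reported ready, or the timeout elapses and installation is failed.
// Assumes the client's scheme has the CAPI types registered.
func waitForInfrastructureReady(ctx context.Context, c client.Client, key client.ObjectKey) error {
	return wait.PollUntilContextTimeout(ctx, 15*time.Second, 30*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			cluster := &clusterv1.Cluster{}
			if err := c.Get(ctx, key, cluster); err != nil {
				return false, nil // treat read errors as transient and retry
			}
			return cluster.Status.InfrastructureReady, nil
		})
}
```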
#### Support Procedures

Describe how to

- detect the failure modes in a support situation, describe possible symptoms (events, metrics, alerts, which log output in which component)

  Examples:
  - If the webhook is not running, kube-apiserver logs will show errors like "failed to call admission webhook xyz".
  - Operator X will degrade with message "Failed to launch webhook server" and reason "WebhookServerFailed".
  - The metric `webhook_admission_duration_seconds("openpolicyagent-admission", "mutating", "put", "false")` will show >1s latency and alert `WebhookAdmissionLatencyHigh` will fire.

- disable the API extension (e.g. remove MutatingWebhookConfiguration `xyz`, remove APIService `foo`)

  - What consequences does it have on the cluster health?

    Examples:
    - Garbage collection in kube-controller-manager will stop working.
    - Quota will be wrongly computed.
    - Disabling/removing the CRD is not possible without removing the CR instances. Customers will lose data. Disabling the conversion webhook will break garbage collection.

  - What consequences does it have on existing, running workloads?

    Examples:
    - New namespaces won't get the finalizer "xyz" and hence might leak resource X when deleted.
    - SDN pod-to-pod routing will stop updating, potentially breaking pod-to-pod communication after some minutes.

  - What consequences does it have for newly created workloads?

    Examples:
    - New pods in namespaces with Istio support will not get sidecars injected, breaking their networking.

- Does functionality fail gracefully and will work resume when re-enabled without risking consistency?

  Examples:
  - The mutating admission webhook "xyz" has FailPolicy=Ignore and hence will not block the creation or updates of objects when it fails. When the webhook comes back online, there is a controller reconciling all objects, applying labels that were not applied during admission webhook downtime.
  - Namespace deletion will not delete all objects in etcd, leading to zombie objects when another namespace with the same name is created.

## Implementation History

Major milestones in the life cycle of a proposal should be tracked in `Implementation History`.

## Alternatives

Other infrastructure-as-code alternatives such as Pulumi, Ansible, or OpenTofu all have their own individual drawbacks. We prefer the CAPI solution over these alternatives because it:

* streamlines Installer development (we do not need to re-implement features for the control plane)
* lays the foundation for OpenShift to implement future CAPI features
* requires less development effort, as CAPI providers are already set up to provision infrastructure for a cluster

It would also be possible to implement the installation using direct SDK calls for each cloud provider. In addition to the reasons stated above, using individual SDK implementations would not provide a common framework across the various cloud platforms.

## Infrastructure Needed [optional]

Use this section if you need things from the project. Examples include a new subproject, repos requested, github details, and/or testing infrastructure. Listing these here allows the community to get the process for these resources started right away.

[embed]: https://pkg.go.dev/embed
[envtest]: https://github.com/kubernetes-sigs/controller-runtime/tree/main/tools/setup-envtest
