# 0001. Management & Child Cluster Communication Pattern

* Project/RFC: Rancher & Cluster API Strategy
* Product/Customer Impact: Yes
* Architecturally Relevant: Yes
* Status: tbd [proposed/rejected/accepted/deprecated/superseded]
* Date: 2022-11-18
* Authors: Richard Case
* Deciders: Matt, Richard, Sergey, Will, Chris

## Context

Rancher Manager currently uses "agent initiated" communication between the management and child clusters. In this model the **Agent** initiates a connection (1) to the management cluster, and this tunnel is then used by the **App** to perform operations (2) against the apiserver in the **Child Cluster**.

```
+------------------------+              +------------------------+
|   Management Cluster   |              |     Child Cluster      |
|                        |              |                        |
|                        |              |                        |
|      +-----------+     |              |     +-------------+    |
|      |           |<----+--1 (tunnel)--+-----|             |    |
|      |    App    |     |              |     |    Agent    |    |
|      |           |<----+------2-------+---->|             |    |
|      +-----------+     |              |     +-------------+    |
|                        |              |                        |
|                        |              |                        |
+------------------------+              +------------------------+
```

(a simplified view, for full details see the [docs](https://docs.ranchermanager.rancher.io/reference-guides/rancher-manager-architecture/communicating-with-downstream-user-clusters))

In contrast, Cluster API (represented by **App** below) currently assumes a communication model where it can initiate a direct connection (2) to the apiserver in a child cluster (that it manages) to perform operations (3).

```
+--------------------------+
|    Management Cluster    |
|                          |
|                          |               +-------------------+
|        +-----------+     |               |                   |
|        |    App    +-----+------2------->|                   |
|        |           |     |               |   Child Cluster   |
|        +-----------+<----+------3------->|                   |
|              |           |               |                   |
|              1           |               +-------------------+
|              |           |
|              v           |
|       +--------------+   |
|       |  Kubeconfig  |   |
|       |              |   |
|       +--------------+   |
+--------------------------+
```

This is a fundamental difference between the two. If we want to use CAPI for all provisioning in the future, then a decision needs to be made between the following:

1. Rancher adopts the CAPI direct connection model
2. Rancher keeps the current agent initiated model & we find a way to support it in CAPI

## Decision

To be discussed and decided in a meeting between the **Deciders**.

## Consequences

> After decision, delete the section that doesn't apply

*Option 1*

This is a fundamental change to the architecture of Rancher Manager.

This change would directly impact users of Rancher, as it will probably require changes to their firewall and general network configuration. This communication model may even violate security policies, which could have an impact on usage of Rancher (especially for enterprise/regulated customers).

However, on the plus side, an agent would not be needed in child clusters: with a direct connection, Rancher can perform operations directly (like creating RBAC, querying workloads). This means we would no longer have to maintain and deploy the agent.

**TO FINISH**

*Option 2*

Cluster API will need to be changed so that it can work without direct connectivity to the apiserver of the child clusters. This is a fundamental change with a potentially large impact. It's not only CAPI itself but also CAPI providers that connect directly to child clusters to perform operations; for example, the AWS provider (CAPA) connects to child clusters to configure aws-iam-authenticator.

In CAPI, communication with a child cluster is usually done via the `sigs.k8s.io/cluster-api/controllers/remote` package, so it's possible that changes could be localised there (but this would come out in any proposal).

From recent discussions, there is interest in the CAPI community in supporting this scenario, and there is an existing [issue](https://github.com/kubernetes-sigs/cluster-api/issues/6520). These ongoing discussions will likely result in the formation of a "feature/working group" to investigate and produce a proposal. This will require us to be involved in the proposal and subsequent implementation.

We would need to consider & investigate the scalability of moving to this model.
There have been concerns raised about maintaining full TLS tunnels from all child clusters to a single management cluster. One suggestion is that the CAPI proposal also explores alternatives to a full tunnel; messaging is one such alternative.
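
To make the messaging alternative more concrete, here is a minimal in-process sketch of how it could work, assuming the agent periodically pulls queued operations over short-lived outbound requests instead of holding a tunnel open. Every name here (`Operation`, `Mailbox`, `Agent`, `fake_apiserver`) is hypothetical and invented for illustration; none of this is part of Rancher, CAPI, or any existing proposal.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Operation:
    """A request the management cluster wants run against a child apiserver."""
    op_id: int
    verb: str       # e.g. "get" or "create"
    resource: str   # e.g. "pods" or "clusterrolebindings"


@dataclass
class Mailbox:
    """Hypothetical per-cluster message queue hosted in the management cluster."""
    pending: dict = field(default_factory=dict)   # cluster name -> deque of Operation
    results: dict = field(default_factory=dict)   # op_id -> result string

    def enqueue(self, cluster: str, op: Operation) -> None:
        """Called by the management side; no connection to the child is needed."""
        self.pending.setdefault(cluster, deque()).append(op)

    def fetch(self, cluster: str, limit: int = 10) -> list:
        """Called by the agent; returns at most `limit` queued operations."""
        queue = self.pending.get(cluster, deque())
        batch = []
        while queue and len(batch) < limit:
            batch.append(queue.popleft())
        return batch

    def report(self, op_id: int, result: str) -> None:
        """Called by the agent to hand a result back to the management side."""
        self.results[op_id] = result


class Agent:
    """Runs in the child cluster; every exchange is initiated from this side."""

    def __init__(self, cluster: str, mailbox: Mailbox,
                 apiserver: Callable[[Operation], str]) -> None:
        self.cluster = cluster
        self.mailbox = mailbox
        self.apiserver = apiserver  # stand-in for the local child apiserver

    def poll_once(self) -> int:
        """Fetch queued operations, run them locally, report the results."""
        ops = self.mailbox.fetch(self.cluster)
        for op in ops:
            self.mailbox.report(op.op_id, self.apiserver(op))
        return len(ops)


def fake_apiserver(op: Operation) -> str:
    """Placeholder for a real apiserver call (assumption for this sketch)."""
    return f"{op.verb} {op.resource}: ok"
```

In this sketch the management cluster holds no long-lived connection per child cluster, which is the scalability concern raised above. The trade-off is latency: an operation waits until the agent's next poll, so the model would suit reconciliation-style work better than interactive requests.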