Introduction:
If for some reason you need to completely replace all of the control plane nodes in an HA setup built with kubeadm, there is a specific order of operations that keeps your production cluster healthy. The goal is to make these changes with zero downtime. The order of operations matters because of how etcd works. Read this link to understand why we are doing things in this specific order.
Infrastructure Assumptions:
Host OS: Ubuntu 20.04
Kubernetes Version: 1.26.0
Internal Etcd Cluster
3 control plane nodes (old)
3 control plane nodes (new)
3 worker nodes
Dev, control plane, and worker hosts are on the same network and can communicate with each other.
A single host where you can issue kubectl commands against the cluster in question. We will call this the "dev" host.
Setup:
We need to back up the etcd data store. If we crash the cluster with this procedure, we then have a way to recover. Please review the tutorial here on how to back up your etcd cluster.
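As a minimal sketch only (the linked tutorial is the authoritative reference), and assuming etcdctl is installed on one of the existing control plane nodes and kubeadm's default certificate paths are in use, a snapshot could be taken like this:
# Run on an existing control plane node. The snapshot path is an example; copy the file somewhere safe off the node.
sudo etcdctl snapshot save /var/backups/etcd-snapshot.db --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --endpoints=127.0.0.1:2379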
Ideally you would create 3 brand new nodes configured and ready to accept the kubeadm join command and have them standing by.
{{< note >}}
If you use DHCP to assign IPs, make sure the hosts have static mappings. The IP of any host in the cluster must never change after bootstrapping.
{{< /note >}}
Next we need to install the etcdctl tool on the dev host. This tool allows you to interact directly with the etcd servers on the control plane nodes. Normally we would put this tool on one of the control plane nodes, but since we will be removing them all, the tool needs to live on a host that persists through the removal process. There are three things we need to do. First, install the client:
sudo apt install etcd-client -y
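The other two steps are to copy the etcd client certificates from one of the existing control plane nodes to the dev host, and to set an ENDPOINTS variable listing the current etcd members. A minimal sketch, assuming kubeadm's default certificate paths and that the old control plane nodes are reachable as host_1, host_2, and host_3 (substitute your own hostnames or IPs):
# host_1..host_3 are placeholders for the old control plane nodes; the certificate paths are kubeadm defaults.
scp root@host_1:/etc/kubernetes/pki/etcd/ca.crt .
scp root@host_1:/etc/kubernetes/pki/etcd/peer.crt .
scp root@host_1:/etc/kubernetes/pki/etcd/peer.key .
export ENDPOINTS=host_1:2379,host_2:2379,host_3:2379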
The ENDPOINTS variable is needed because we will be swapping hosts in and out, and the endpoints need to change accordingly.
Gather Facts:
Let's query etcd and gather our first required piece of information: who the leader is.
etcdctl endpoint status --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=${ENDPOINTS} --write-out=table
Output:
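The values below are illustrative only; your endpoints, member IDs, versions, and sizes will differ (columns abridged). The IS LEADER column identifies the current leader:
+-------------+------------------+---------+---------+-----------+-----------+
|  ENDPOINT   |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM |
+-------------+------------------+---------+---------+-----------+-----------+
| host_1:2379 | 8211f1d0f64f3269 | 3.5.6   | 25 MB   | false     | 5         |
| host_2:2379 | 91bc3c398fb3c146 | 3.5.6   | 25 MB   | false     | 5         |
| host_3:2379 | fd422379fda50e48 | 3.5.6   | 25 MB   | true      | 5         |
+-------------+------------------+---------+---------+-----------+-----------+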
We see that host_3 is the current leader. This is important because we will replace that host last. It is not strictly critical, but I think it is important for ensuring the health of the cluster. So the order of operations is: replace host_1 first, then host_2, and finally host_3 (the current leader).
Replacement Process:
These steps will be the same for each old node we are replacing. The only exception will be the last node, the leader, where we will force a leader election to a new node before we remove the remaining old node.
Before we get started we need the kubeadm join string. On one of the control plane nodes, upload the control plane certificates, then create a token and print the full join command:
sudo kubeadm init phase upload-certs --upload-certs
sudo kubeadm token create --print-join-command --certificate-key <certificate key printed by the previous command>
Keep this string available as you will need to use it on each new node. You should see something similar to this:
kubeadm join <IP:PORT of load balanced API endpoint> --token <redacted> --discovery-token-ca-cert-hash sha256:<redacted> --control-plane --certificate-key <redacted>
{{< note >}}
Depending on the way your cluster was initially set up, your values will be different, but each element in the join command has to be present.
{{< /note >}}
First we need to remove the old node from the etcd cluster. Get the ID of the old node we are removing from the information we gathered previously. From the dev host use the command:
etcdctl member remove <ID of etcd member being removed> --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=<IP of a remaining node>:2379,<IP of another remaining node>:2379
{{< note >}}
Do not include the IP:port of the node being removed in the endpoints for this command. We can't use the ENDPOINTS environment variable here.
{{< /note >}}
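For example, using the illustrative values from the status table above, removing the member running on host_1 would look like this (your member ID and endpoints will differ):
etcdctl member remove 8211f1d0f64f3269 --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=host_2:2379,host_3:2379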
Now cordon, drain and remove the node from the cluster. To do this we use the following commands on the dev host:
kubectl cordon <name of node being removed>
kubectl drain <name of node being removed> --ignore-daemonsets
(wait until the drain completes)
kubectl delete node <name of node being removed>
Shut down the host. On the host that is being removed, run:
shutdown -h now
This ensures the old node can't somehow creep back into the node pool.
Log in to the new host, paste the kubeadm join command, and wait for it to complete. To confirm that the new node is part of the cluster and healthy, issue kubectl get nodes on the dev host. You should see the list of nodes in the Ready state. The names will be unique to your cluster. If your new node is not Ready, wait until it is before proceeding. Next, confirm from the dev host that the new etcd member has joined and the etcd cluster is healthy:
etcdctl member list --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=${ENDPOINTS} --write-out=table
{{< note >}}
Look carefully at the IP addresses in the output. These will change as you replace nodes.
{{< /note >}}
{{< caution >}}
If your etcd cluster is not healthy, or one of the members is not started/ready, you have to stop. If you remove an additional node, you will crash your Kubernetes cluster hard. Recovery is a difficult process with no guarantee of success. You have to take steps to remedy the new node before you proceed. You MUST have 3 healthy etcd members.
{{< /caution >}}
Once healthy we can proceed to the next node. For each node to be replaced, follow the procedure above. Pay particular attention to the IPs and hostnames so they match the nodes you are working on. When you get to the final node (the original leader), you will need to force a leader election.
Before step 1, from the dev host issue this command:
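This is a sketch using etcdctl's move-leader subcommand; it assumes you pass the member ID of one of the new nodes (taken from the member list gathered above) and point --endpoints at the current leader:
# <ID of a new member> and the leader's address are placeholders; take both from the etcdctl output above.
etcdctl move-leader <ID of a new member> --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=<IP of current leader>:2379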
Let's check to make sure the election happened and was successful.
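Running the same status query as before should now show IS LEADER set to true on one of the new nodes:
etcdctl endpoint status --cacert=ca.crt --cert=peer.crt --key=peer.key --endpoints=${ENDPOINTS} --write-out=table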
We are now safe to replace the final control plane node. Proceed to step 1.
If you are careful and methodical in this process, you should end up with a fully functional Kubernetes cluster with completely new control plane nodes and zero downtime.