# kubeadm k8s Backup and Restore

## Create a test pod

```
$ kubectl run test --image=nginx
$ kubectl get po
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          27m
```

## Back up k8s

* Install the etcdutl, etcd, and etcdctl commands

```
$ ETCD_RELEASE=$(curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest|grep tag_name | cut -d '"' -f 4)
$ wget https://github.com/etcd-io/etcd/releases/download/${ETCD_RELEASE}/etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
$ tar zxvf etcd-${ETCD_RELEASE}-linux-amd64.tar.gz
$ sudo cp -rp etcd-${ETCD_RELEASE}-linux-amd64/etc* /usr/local/bin
$ rm -f etcd-${ETCD_RELEASE}-linux-amd64.tar.gz

$ etcd --version
etcd Version: 3.6.1
Git SHA: a4708be
Go Version: go1.23.10
Go OS/Arch: linux/amd64

$ etcdctl version
etcdctl version: 3.6.1
API version: 3.6

$ etcdutl version
etcdutl version: 3.6.1
API version: 3.6
```

* Configure and test etcdctl; fill in your own etcd IP

```
$ alias etcdctl="ETCDCTL_API=3 sudo /usr/local/bin/etcdctl \
  --endpoints=127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key"

$ etcdctl member list
768bf4f42edfb9b3, started, m1, https://172.20.7.80:2380, https://172.20.7.80:2379, false
e044dfaaca4a3cf4, started, m2, https://172.20.7.81:2380, https://172.20.7.81:2379, false
ff78f71ad246a1cd, started, m3, https://172.20.7.82:2380, https://172.20.7.82:2379, false

$ etcdctl endpoint status -w table
+----------------+------------------+---------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+--------------------------+-------------------+
|    ENDPOINT    |        ID        | VERSION | STORAGE VERSION | DB SIZE | IN USE | PERCENTAGE NOT IN USE | QUOTA | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | DOWNGRADE TARGET VERSION | DOWNGRADE ENABLED |
+----------------+------------------+---------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+--------------------------+-------------------+
| 127.0.0.1:2379 | 5718db9e2058ee89 |   3.6.4 |           3.6.0 |  8.1 MB | 3.5 MB |                   58% |   0 B |     false |      false |        17 |    8896038 |            8896038 |        |                          |             false |
+----------------+------------------+---------+-----------------+---------+--------+-----------------------+-------+-----------+------------+-----------+------------+--------------------+--------+--------------------------+-------------------+
```

### Take an etcd snapshot

* Take a snapshot with `etcdctl`

```
$ mkdir ~/etcd
$ etcdctl snapshot save "$HOME"/etcd/etcd-snapshot.db
```

* Check the backup file

```
$ ls -l "$HOME"/etcd
total 25556
-rw------- 1 root root 26165280 Jun 26 15:24 etcd-snapshot.db
```

* Check the snapshot status with `etcdutl`

```
$ sudo etcdutl --write-out=table snapshot status "$HOME"/etcd/etcd-snapshot.db
+----------+----------+------------+------------+---------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE | VERSION |
+----------+----------+------------+------------+---------+
| c7ee3a10 |  2761931 |        687 |      26 MB |         |
+----------+----------+------------+------------+---------+
```

## Restore etcd

* Delete the pod

```
$ kubectl delete pod test
pod "test" deleted
```

* Create the restore directory

```
$ sudo mkdir /var/lib/etcd-restore
```

* Use `etcdutl` to restore the snapshot taken earlier into the `/var/lib/etcd-restore` directory

```
$ sudo etcdutl --data-dir="/var/lib/etcd-restore" snapshot restore "$HOME"/etcd/etcd-snapshot.db
```

```
$ sudo ls -l /var/lib/etcd-restore
total 4
drwx------ 4 root root 4096 Jun 26 15:47 member
```

* Point the etcd volume at the directory that was just restored

```
$ sudo nano /etc/kubernetes/manifests/etcd.yaml
......
  - hostPath:
      path: /var/lib/etcd-restore # change this line
      type: DirectoryOrCreate
    name: etcd-data
```

* Restart kubelet

```
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
```

* After the change, the etcd container is recreated; check its status with `crictl`

```
$ sudo crictl ps -a | grep etcd
2732de9b81398   2e96e5913fc06   31 seconds ago   Running   etcd   0   f7d4519040e0d   etcd-m1
```

* After waiting a while for k8s to recover, confirm that the pod deleted earlier has been restored as well

```
$ kubectl get no
NAME   STATUS   ROLES           AGE   VERSION
m1     Ready    control-plane   14d   v1.30.13
w1     Ready    worker          14d   v1.30.13
w2     Ready    worker          14d   v1.30.13

$ kubectl get po
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          68m
```

### If you restore with the method above, you still need to restore once more into the `/var/lib/etcd` directory; otherwise the checks run during an upgrade will have problems

#### The following method restores directly into the `/var/lib/etcd` directory

* Stop kubelet first

```
$ sudo systemctl stop kubelet
```

* Remove the apiserver, controller-manager, scheduler, and etcd containers

```
$ sudo crictl ps -a --name 'kube-apiserver|kube-controller-manager|kube-scheduler|etcd' -q | xargs -r -n1 sudo crictl rm -f
```

* Move the existing etcd data directory out of the way

```
$ sudo mv /var/lib/etcd "$HOME"/etcd/
```

* Use `etcdutl` to restore the snapshot taken earlier into the `/var/lib/etcd` directory

```
$ sudo etcdutl --data-dir="/var/lib/etcd" snapshot restore "$HOME"/etcd/etcd-snapshot.db
```

* Restart kubelet

```
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
```

* After the change, the etcd container is recreated; check its status with `crictl`

```
$ sudo crictl ps -a|grep etcd
0b5caf8d752ed   5f1f5298c888daa46c4409ff4cefe5ca9d16e479419f94cdb5f5d5563dac0115   About a minute ago   Running   etcd   0   6abd9326a3f19   etcd-m1   kube-syste
```

* After waiting a while for k8s to recover, confirm that the pod deleted earlier has been restored as well

```
$ kubectl get no
NAME   STATUS   ROLES           AGE   VERSION
m1     Ready    control-plane   14d   v1.30.13
w1     Ready    worker          14d   v1.30.13
w2     Ready    worker          14d   v1.30.13

$ kubectl get po
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          68m
```
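
The backup steps earlier in this note (snapshot save followed by a status check) could be wrapped in a small script so that each run produces a separate file. This is only a sketch under stated assumptions: the `BACKUP_DIR` variable, the timestamped file name, and the helper names `snapshot_path` and `backup_etcd` are hypothetical and not part of the original procedure. The endpoint and certificate paths are the ones used in the alias above, so adjust them to your own cluster.

```shell
#!/bin/sh
# Sketch: timestamped etcd snapshot plus verification.
# BACKUP_DIR, snapshot_path and backup_etcd are hypothetical names for illustration.
set -eu

BACKUP_DIR="${BACKUP_DIR:-$HOME/etcd}"

# Build a snapshot file path from a directory and a timestamp string.
# A trailing slash on the directory is dropped so the path has no "//".
snapshot_path() {
  sp_dir="$1"
  sp_stamp="$2"
  printf '%s/etcd-snapshot-%s.db\n' "${sp_dir%/}" "$sp_stamp"
}

# Save a snapshot with etcdctl, then print its status with etcdutl.
# Endpoint and certificate paths are taken from the alias in this note.
backup_etcd() {
  be_target="$(snapshot_path "$BACKUP_DIR" "$(date +%Y%m%d-%H%M%S)")"
  mkdir -p "$BACKUP_DIR"
  sudo /usr/local/bin/etcdctl \
    --endpoints=127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
    --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
    snapshot save "$be_target"
  sudo /usr/local/bin/etcdutl --write-out=table snapshot status "$be_target"
}

# Only take a backup when the script is called with the argument "run".
if [ "${1:-}" = "run" ]; then
  backup_etcd
fi
```

Saved under a name of your choice (for example `etcd-backup.sh`, also hypothetical), it would be invoked as `sh etcd-backup.sh run`. Without the `run` argument it only defines the functions, which keeps it safe to source from another script.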