# RKE2 etcd Tuning

## Storage quota

* etcd enforces a storage quota, `quota-backend-bytes`, which limits the size of the etcd database (backend). The default is 2 GB, and 8 GB is the recommended maximum. When writes use up the quota, etcd raises a cluster-wide `NOSPACE` alarm and the cluster switches to maintenance mode, in which it only accepts key reads and deletes; writes are rejected.

## Manually releasing quota space

* If the etcd logs contain an error like the following, the quota has most likely been exhausted and the space has to be cleaned up and released manually.

```
Error: rpc error: code = 8 desc = etcdserver: mvcc: database space exceeded
```

1. Run the following commands to see how much space each member uses (the `DB SIZE` column):

```
# Build ETCDCTL_ENDPOINTS from the INTERNAL-IP (6th column) of every etcd/control-plane node
$ eval $(kubectl get nodes -owide | grep -E "etcd|control-plane" | awk '{printf "https://"$6":2379,"}' | awk '{gsub(",$","");print "export ETCDCTL_ENDPOINTS=\""$1"\""}')
# Client certificate, key and CA of the RKE2-managed etcd
$ export ETCDCTL_CACERT=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
$ export ETCDCTL_CERT=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt
$ export ETCDCTL_KEY=/var/lib/rancher/rke2/server/tls/etcd/server-client.key
$ etcdctl --write-out=table endpoint status
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.11.113:2379 | 1110de9cd9afd504 |  3.5.13 |   28 MB |     false |      false |        10 |    5955192 |            5955188 |        |
| https://192.168.11.118:2379 | baf08dcf5b53670d |  3.5.13 |   28 MB |     false |      false |        10 |    5955193 |            5955193 |        |
| https://192.168.11.108:2379 | 305950ab0a98bf30 |  3.5.13 |   28 MB |      true |      false |        10 |    5955193 |            5955193 |        |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```

2. Release the quota space manually:

```
# Get the current revision from the local member
$ rev=$(ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' etcdctl endpoint status --write-out="json" | grep -Eo '"revision":[0-9]*' | grep -Eo '[0-9].*')
# Compact all revisions older than $rev (compaction is replicated through Raft, so running it once is enough)
$ ETCDCTL_API=3 etcdctl compact $rev
# Defragment to actually free the space; defrag is per member, and etcdctl defragments every endpoint in ETCDCTL_ENDPOINTS
$ ETCDCTL_API=3 etcdctl defrag
# Clear the NOSPACE alarm (disarms all alarms in the cluster)
$ ETCDCTL_API=3 etcdctl alarm disarm
```

## Cleaning up old revisions

* etcd keeps multiple versions (revisions) of each key, so history keeps growing as keys are written, and etcd itself does not compact old revisions by default. Once the database reaches the quota set by `--quota-backend-bytes`, writes fail until the history is compacted and the space is reclaimed.
* With `--auto-compaction-mode=periodic --auto-compaction-retention=72h`, etcd keeps a 72-hour window of history. The value is the retention window, not the compaction interval. Since etcd 3.3.3, for retention periods of one hour or more the periodic compactor runs every hour and discards revisions older than the window.
* `--auto-compaction-mode` selects the compaction mode: `periodic` (time-based) or `revision` (keeps a fixed number of revisions). The default is `periodic`.

```
--auto-compaction-mode       # compaction mode: periodic (default) or revision
--auto-compaction-retention  # how much history to keep: a duration such as 72h in periodic mode, a revision count in revision mode
```

## RKE2 etcd settings

* For an RKE2 cluster created as a Rancher custom cluster, open Cluster Management, choose **Edit YAML** for the cluster, and add `etcd-arg` under `spec.rkeConfig.machineGlobalConfig`. The example below sets a 6 GiB quota (6 × 1024³ = 6442450944 bytes) and compacts old revisions periodically, keeping 60 minutes of history.

```
etcd-arg:
  - quota-backend-bytes=6442450944
  - auto-compaction-mode=periodic
  - auto-compaction-retention=60m
```

![image](https://hackmd.io/_uploads/ryDkQf08Jl.png)

* Verification: on a server node, the arguments appear in the RKE2 config file written by Rancher and in the etcd config generated by RKE2 (where `60m` is shown as `1h0m0s`).

![image](https://hackmd.io/_uploads/BJhjWM0Lkx.png)

```
$ cat /etc/rancher/rke2/config.yaml.d/50-rancher.yaml
{
  "agent-token": "d7qb9frh5xm2mv7c472gjsl94gk2gvtw8w9cjj2nt75mmqfnvfw78m",
  "cni": "calico",
  "disable-kube-proxy": false,
  "etcd-arg": [
    "data-dir=/etcd/data",
    "wal-dir=/etcd/wal",
    "quota-backend-bytes=6442450944",
    "auto-compaction-mode=periodic",
    "auto-compaction-retention=60m"
  ],
......
```

```
$ cat /var/lib/rancher/rke2/server/db/etcd/config
advertise-client-urls: https://192.168.11.70:2379
auto-compaction-mode: periodic
auto-compaction-retention: 1h0m0s
client-transport-security:
  cert-file: /var/lib/rancher/rke2/server/tls/etcd/server-client.crt
  client-cert-auth: true
  key-file: /var/lib/rancher/rke2/server/tls/etcd/server-client.key
  trusted-ca-file: /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
data-dir: /etcd/data
election-timeout: 5000
experimental-initial-corrupt-check: true
experimental-watch-progress-notify-interval: 5000000000
heartbeat-interval: 500
initial-advertise-peer-urls: https://192.168.11.70:2380
initial-cluster: rke2-72c87363=https://192.168.11.70:2380
initial-cluster-state: new
listen-client-http-urls: https://127.0.0.1:2382
listen-client-urls: https://127.0.0.1:2379,https://192.168.11.70:2379
listen-metrics-urls: http://127.0.0.1:2381
listen-peer-urls: https://127.0.0.1:2380,https://192.168.11.70:2380
log-outputs:
- stderr
logger: zap
name: rke2-72c87363
peer-transport-security:
  cert-file: /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt
  client-cert-auth: true
  key-file: /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key
  trusted-ca-file: /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt
quota-backend-bytes: 6442450944
snapshot-count: 10000
wal-dir: /etcd/wal
```
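The quota is checked against the physically allocated backend size. This is the `dbSize` field, shown as the `DB SIZE` column of `etcdctl endpoint status`, which is why compaction alone is not enough and defragmentation is needed. It can therefore help to watch how close each member is to `quota-backend-bytes` before the `database space exceeded` error appears.

Below is a minimal POSIX shell sketch, not something shipped with RKE2 or etcd. The function name `quota_usage` is hypothetical. The sketch assumes the compact JSON layout that `etcdctl endpoint status --write-out=json` prints in etcd 3.5: one object per member with `Endpoint` and `Status.dbSize`.

```shell
#!/bin/sh
# Hypothetical helper: print each etcd member's DB size as a percentage of the quota.
# Assumption: stdin is the compact JSON printed by
#   etcdctl endpoint status --write-out=json
# (etcd 3.5), i.e. one object per member with "Endpoint" and "Status"."dbSize".

# Usage: <etcdctl json> | quota_usage [quota-bytes]
# The default is 6442450944 bytes (6 GiB), the quota-backend-bytes value used in this note.
quota_usage() {
  quota=${1:-6442450944}
  # Break the JSON at every "{" so the endpoint and its dbSize land on separate lines.
  tr '{' '\n' | awk -v quota="$quota" '
    # Remember the endpoint URL of the member currently being read.
    match($0, /"Endpoint":"[^"]*"/) {
      ep = substr($0, RSTART + 12, RLENGTH - 13)
    }
    # "dbSize" (not "dbSizeInUse") is the allocated size that counts against the quota.
    match($0, /"dbSize":[0-9]+/) {
      size = substr($0, RSTART + 9, RLENGTH - 9)
      printf "%s %s %.1f%%\n", ep, size, 100 * size / quota
    }'
}
```

With the `ETCDCTL_*` variables from step 1 exported, a possible invocation is `etcdctl endpoint status --write-out=json | quota_usage 6442450944`. If a member has already hit the quota, `etcdctl alarm list` shows the `NOSPACE` alarm.

The percentage is computed in awk because POSIX shell arithmetic is integer-only. Plain `tr`/`awk` is used instead of `jq` so the sketch runs on nodes without extra tools.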