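# RKE2 etcd Tuning
## Storage quota
* etcd enforces a storage quota, `quota-backend-bytes`, which caps the size of the etcd database. The default is 2 GB, and the suggested maximum is 8 GB. When writes exhaust the quota, etcd raises a cluster-wide alarm that puts the cluster into maintenance mode, which accepts only key reads and deletes and rejects writes.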
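## Releasing quota space manually
* If the etcd log shows an error like the following, the quota is probably exhausted, and the space has to be cleaned up and released manually.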
```
Error: rpc error: code = 8 desc = etcdserver: mvcc: database space exceeded
```
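Besides the log line, the alarm state can be checked directly. The sketch below assumes that `etcdctl alarm list` prints one line per active alarm in the form `memberID:<id> alarm:NOSPACE`; the helper name `has_nospace_alarm` is hypothetical and not part of etcdctl.

```shell
# Hypothetical helper: exits with status 0 when the text on stdin
# contains a NOSPACE alarm entry, and non-zero otherwise.
has_nospace_alarm() {
  grep -q 'alarm:NOSPACE'
}

# Intended use on an etcd node, with the ETCDCTL_* variables exported:
#   etcdctl alarm list | has_nospace_alarm && echo "quota exceeded"
```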
1. Run the following commands to see how much of the quota is in use
```
$ eval $(kubectl get nodes -owide|grep -E "etcd|control-plane" |awk '{printf "https://"$6":2379,"}'|awk '{gsub(",$","");print "export ETCDCTL_ENDPOINTS=\""$1"\""}') && export ETCDCTL_CACERT=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt && export ETCDCTL_CERT=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt && export ETCDCTL_KEY=/var/lib/rancher/rke2/server/tls/etcd/server-client.key
$ etcdctl --write-out=table endpoint status
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.11.113:2379 | 1110de9cd9afd504 | 3.5.13 | 28 MB | false | false | 10 | 5955192 | 5955188 | |
| https://192.168.11.118:2379 | baf08dcf5b53670d | 3.5.13 | 28 MB | false | false | 10 | 5955193 | 5955193 | |
| https://192.168.11.108:2379 | 305950ab0a98bf30 | 3.5.13 | 28 MB | true | false | 10 | 5955193 | 5955193 | |
+-----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
2. Release the quota space manually
```
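# Get the current revision from the local member
$ rev=$(ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' etcdctl endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9].*')
# Compact away all revisions older than $rev (the compaction is replicated to every member)
$ ETCDCTL_API=3 etcdctl compact $rev
# Defragment to release the freed space (defrag is per member and runs against each endpoint in ETCDCTL_ENDPOINTS)
$ ETCDCTL_API=3 etcdctl defrag
# Clear the alarm so that writes are accepted again
$ ETCDCTL_API=3 etcdctl alarm disarm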
```
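The values used in the steps above can also be pulled out of the JSON status with small helpers. This is a minimal sketch, assuming the JSON output of `etcdctl endpoint status` contains a numeric `dbSize` field next to `revision`, with no whitespace after the colon; `json_number` and `usage_percent` are hypothetical names, and a real JSON parser would be more robust than `grep`.

```shell
# Hypothetical helper: print the first numeric value of JSON field $1 read from stdin.
json_number() {
  grep -Eo "\"$1\":[0-9]+" | head -n 1 | grep -Eo '[0-9]+$'
}

# Hypothetical helper: print database size ($1, bytes) as an integer
# percentage of the quota ($2, bytes), rounded down.
usage_percent() {
  echo $(( $1 * 100 / $2 ))
}

# Intended use on an etcd node, with the ETCDCTL_* variables exported:
#   status=$(ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' etcdctl endpoint status --write-out=json)
#   rev=$(echo "$status" | json_number revision)
#   usage_percent "$(echo "$status" | json_number dbSize)" 6442450944
```

Taking only the first match keeps `rev` a single number even when the status output covers several endpoints.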
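## Cleaning up historical revisions
* etcd keeps multiple revisions of every key. As keys are written and updated, the history keeps growing, and etcd does not compact it automatically by default. Once the database reaches the quota set by `--quota-backend-bytes`, writes fail until the history is compacted and the space is released.
* With `--auto-compaction-retention=72h` in `periodic` mode, etcd retains the last 72 hours of history and compacts away anything older. On etcd v3.3.3 and later, the compactor runs hourly when the retention window is longer than one hour, rather than once every 72 hours.
* `--auto-compaction-mode` selects the compaction mode, either `revision` or `periodic`; the default is `periodic`.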
```
--auto-compaction-mode # compaction mode: revision or periodic
--auto-compaction-retention # how much history to keep: a duration for periodic, a revision count for revision
```
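To make the pairing of the two flags concrete, here is a small sketch that prints a flag pair for each mode. The helper name `compaction_flags` is hypothetical, and the values `72h` and `1000` are only illustrations: a duration for `periodic`, a number of revisions to keep for `revision`.

```shell
# Hypothetical helper: print the two auto-compaction flags for mode $1
# and retention $2, rejecting any mode other than periodic or revision.
compaction_flags() {
  case "$1" in
    periodic|revision) ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
  echo "--auto-compaction-mode=$1 --auto-compaction-retention=$2"
}

# Examples:
#   compaction_flags periodic 72h    # keep a 72-hour window of history
#   compaction_flags revision 1000   # keep the latest 1000 revisions
```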
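## RKE2 etcd configuration
* For an RKE2 cluster created through Rancher as a custom cluster, open Cluster Management, choose Edit YAML, and add an `etcd-arg` entry under `spec.rkeConfig.machineGlobalConfig`. The example below sets a 6 GiB quota and periodic cleanup of historical revisions.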
```
etcd-arg:
- quota-backend-bytes=6442450944
- auto-compaction-mode=periodic
- auto-compaction-retention=60m
```
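The quota is given in bytes, so `6442450944` is 6 × 1024³. A minimal sketch of the conversion, using a hypothetical helper name `gib_to_bytes`:

```shell
# Hypothetical helper: convert a size in GiB ($1) to bytes for quota-backend-bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Example: print the etcd-arg line for a 6 GiB quota.
#   echo "- quota-backend-bytes=$(gib_to_bytes 6)"
```

The same helper gives `8589934592` for the suggested 8 GiB maximum.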

* Verify

```
$ cat /etc/rancher/rke2/config.yaml.d/50-rancher.yaml
{
"agent-token": "d7qb9frh5xm2mv7c472gjsl94gk2gvtw8w9cjj2nt75mmqfnvfw78m",
"cni": "calico",
"disable-kube-proxy": false,
"etcd-arg": [
"data-dir=/etcd/data",
"wal-dir=/etcd/wal",
"quota-backend-bytes=6442450944",
"auto-compaction-mode=periodic",
"auto-compaction-retention=60m"
],
......
```
```
$ cat /var/lib/rancher/rke2/server/db/etcd/config
advertise-client-urls: https://192.168.11.70:2379
auto-compaction-mode: periodic
auto-compaction-retention: 1h0m0s
client-transport-security:
cert-file: /var/lib/rancher/rke2/server/tls/etcd/server-client.crt
client-cert-auth: true
key-file: /var/lib/rancher/rke2/server/tls/etcd/server-client.key
trusted-ca-file: /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt
data-dir: /etcd/data
election-timeout: 5000
experimental-initial-corrupt-check: true
experimental-watch-progress-notify-interval: 5000000000
heartbeat-interval: 500
initial-advertise-peer-urls: https://192.168.11.70:2380
initial-cluster: rke2-72c87363=https://192.168.11.70:2380
initial-cluster-state: new
listen-client-http-urls: https://127.0.0.1:2382
listen-client-urls: https://127.0.0.1:2379,https://192.168.11.70:2379
listen-metrics-urls: http://127.0.0.1:2381
listen-peer-urls: https://127.0.0.1:2380,https://192.168.11.70:2380
log-outputs:
- stderr
logger: zap
name: rke2-72c87363
peer-transport-security:
cert-file: /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt
client-cert-auth: true
key-file: /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key
trusted-ca-file: /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt
quota-backend-bytes: 6442450944
snapshot-count: 10000
wal-dir: /etcd/wal
```
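Rather than reading the whole file, the three tuned keys can be filtered out. This is a sketch with a hypothetical helper name, `show_etcd_tuning`, assuming the keys appear at the start of a line as in the listing above.

```shell
# Hypothetical helper: print only the quota and auto-compaction keys
# from the etcd config file given as $1.
show_etcd_tuning() {
  grep -E '^(quota-backend-bytes|auto-compaction-mode|auto-compaction-retention):' "$1"
}

# Intended use on a server node:
#   show_etcd_tuning /var/lib/rancher/rke2/server/db/etcd/config
```

Note that etcd normalizes the retention value, so the `60m` from `etcd-arg` shows up as `1h0m0s`.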
## References
* https://ranchermanager.docs.rancher.com/v2.5/getting-started/installation-and-upgrade/advanced-options/advanced-use-cases/tune-etcd-for-large-installs
* https://www.xtplayer.cn/etcd/etcd-optimize/
* https://etcd.io/docs/v3.4/op-guide/maintenance/