# PodDisruptionBudget

* A PodDisruptionBudget (PDB) is a Kubernetes API resource that specifies, for a set of Pods (typically managed by a Deployment, StatefulSet, or ReplicaSet), how many Pods must remain available during a *voluntary disruption*, so the service is never taken down completely.
* When Kubernetes is about to perform a voluntary disruption, it checks whether the affected application has a PDB defined. If evicting a Pod would drop the number of running Pods below the PDB's minimum, Kubernetes refuses the disruption, keeping the application available.

## Explanation

Without a PDB:

* The cluster has 3 nodes, and your microservice's Pods "app-1" and "app-2" run on "node-1" and "node-2" respectively:

![image](https://hackmd.io/_uploads/H1wOwo6O1x.png)

* The cluster administrator now drains node-1 to upgrade the cluster:

![image](https://hackmd.io/_uploads/BkrcDopu1e.png)

* "app-1" has been terminated, and its replacement Pod "app-3" is still Pending. If the administrator drains node-2 at this point, every Pod of the service becomes unavailable.

![image](https://hackmd.io/_uploads/rJohPjaOkx.png)

With a PDB:

* When the cluster administrator tries to drain node-2, the eviction is blocked, because only 1 Pod of the service is currently available.

![image](https://hackmd.io/_uploads/SyxE_ip_1g.png)

## Hands-on

* The current environment is 3m1w (three control-plane nodes and one worker):

```
kubectl get no
NAME        STATUS   ROLES                              AGE   VERSION
cilium-m1   Ready    control-plane,etcd,master,worker   18d   v1.31.3+rke2r1
cilium-m2   Ready    control-plane,etcd,master,worker   18d   v1.31.3+rke2r1
cilium-m3   Ready    control-plane,etcd,master,worker   18d   v1.31.3+rke2r1
cilium-w1   Ready    worker                             18d   v1.31.3+rke2r1
```

* Deploy an nginx Deployment. The postStart hook sleeps for 60 seconds, so a freshly scheduled Pod stays un-Ready for about a minute, which is long enough to observe the PDB blocking a drain:

```
$ echo 'apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-test
  name: my-test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-test
  strategy: {}
  template:
    metadata:
      labels:
        app: my-test
    spec:
      containers:
      - image: nginx
        name: nginx
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                sleep 60' | kubectl apply -f -
```

```
$ kubectl get pod -owide
NAME                       READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
my-test-686f7695ff-8ttr8   1/1     Running   0          25m   10.42.2.156   cilium-m3   <none>           <none>
my-test-686f7695ff-gpnh6   1/1     Running   0          25m   10.42.3.169   cilium-w1   <none>           <none>
```

* Create a PDB resource that targets the Pods carrying the `app: my-test` label:

```
$ echo 'apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-test
spec:
  minAvailable: 1   # at least one pod must remain available
  selector:
    matchLabels:
      app: my-test' | kubectl apply -f -

$ kubectl get pdb
NAME      MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
my-test   1               N/A               0                     35m
```

* When the second node is drained, the PDB keeps the last Pod from being evicted: `kubectl drain` retries until the Pod rescheduled from the first drain becomes Ready, and only then is the last Pod deleted.

```
$ kubectl drain cilium-w1 --delete-emptydir-data --ignore-daemonsets
$ kubectl drain cilium-m3 --delete-emptydir-data --ignore-daemonsets
......
error when evicting pods/"my-test-7dbcd79794-d7t8g" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/my-test-7dbcd79794-d7t8g
error when evicting pods/"my-test-7dbcd79794-d7t8g" -n "default" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod default/my-test-7dbcd79794-d7t8g
pod/my-test-7dbcd79794-d7t8g evicted
node/cilium-m3 drained
```

```
$ kubectl get pod -owide
NAME                       READY   STATUS    RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
my-test-7dbcd79794-77sfw   1/1     Running   0          3m5s    10.42.1.2     cilium-m2   <none>           <none>
my-test-7dbcd79794-s6m7q   1/1     Running   0          4m16s   10.42.0.243   cilium-m1   <none>           <none>
```

## References

* https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#voluntary-and-involuntary-disruptions
* https://github.com/mercari/production-readiness-checklist/blob/master/docs/concepts/pod-disruption-budget.md
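## Notes

* A side note not in the walkthrough above: besides `minAvailable`, a `policy/v1` PodDisruptionBudget can express the budget as `maxUnavailable`. The two fields are mutually exclusive, and either accepts an absolute number or a percentage. A minimal sketch of the equivalent intent for this 2-replica Deployment:

```
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-test
spec:
  maxUnavailable: 1   # same effect here as minAvailable: 1; a percentage like "50%" also works
  selector:
    matchLabels:
      app: my-test
```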
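* To see why an eviction is refused, you can inspect the PDB status that the eviction API consults; while only one Pod is healthy, `disruptionsAllowed` stays at 0:

```
# Number of evictions the budget currently permits (0 = evictions are refused)
kubectl get pdb my-test -o jsonpath='{.status.disruptionsAllowed}'

# Full status, including currentHealthy / desiredHealthy / expectedPods
kubectl describe pdb my-test
```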
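* Both drained nodes stay cordoned (`SchedulingDisabled`) after the experiment; to make them schedulable again:

```
kubectl uncordon cilium-w1
kubectl uncordon cilium-m3
```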