# RKE2: Raising the Per-Node Pod Limit

## Background

When RKE2 is installed on physical machines, a single node usually has a large amount of resources, so it is desirable to run more pods per node. By default, Kubernetes allows only 110 pods on a single node.

## Implementation

> RKE2 cluster with 1 master / 2 workers, deployed via Rancher
> rke2: v1.28.10+rke2r1
> CNI: Calico

1. In Cluster Management -> Edit Cluster -> Advanced, set the kubelet argument to raise the pod limit to 500.
![image](https://hackmd.io/_uploads/BJ5-7SCyye.png)
2. Confirm the kubelet limit is now 500.
![image](https://hackmd.io/_uploads/rJEMVHA11g.png)
3. Check the pod CIDR assigned to each node. Note that when the CNI is Calico, this value is not actually used: Calico defines pod IP ranges with its own IPPool.
```
$ kubectl get node -A -o custom-columns=NODE_NAME:.metadata.name,POD_CIDR:.spec.podCIDR
NODE_NAME   POD_CIDR
demo-1      10.42.0.0/24
demo-2      10.42.1.0/24
demo-3      10.42.2.0/24
```
* The IPPool's `blockSize` controls the size of the IP address block allocated to each node.
```
$ kubectl describe ippools.crd.projectcalico.org default-ipv4-ippool
Name:         default-ipv4-ippool
Namespace:
Labels:       <none>
Annotations:  projectcalico.org/metadata:
                {"uid":"95928154-b497-41ae-bfb3-1e7e027783d0","creationTimestamp":"2024-09-26T06:06:33Z"}
API Version:  crd.projectcalico.org/v1
Kind:         IPPool
Metadata:
  Creation Timestamp:  2024-09-26T06:06:33Z
  Generation:          1
  Resource Version:    1242
  UID:                 ee994569-31cd-4438-bd61-d8945387fd40
Spec:
  Allowed Uses:
    Workload
    Tunnel
  Block Size:     26
  Cidr:           10.42.0.0/16
  Ipip Mode:      Never
  Nat Outgoing:   true
  Node Selector:  all()
  Vxlan Mode:     Always
Events:           <none>
```
* Calico uses BlockAffinity objects to record the pod IP range assigned to each node.
```
$ kubectl get blockaffinities
NAME                     AGE
demo-1-10-42-94-192-26   21d
demo-2-10-42-86-64-26    21d
demo-3-10-42-29-0-26     21d

$ kubectl describe blockaffinities demo-1-10-42-94-192-26
Name:         demo-1-10-42-94-192-26
Namespace:
Labels:       <none>
Annotations:  projectcalico.org/metadata: {"creationTimestamp":null}
API Version:  crd.projectcalico.org/v1
Kind:         BlockAffinity
Metadata:
  Creation Timestamp:  2024-09-26T06:06:33Z
  Generation:          2
  Resource Version:    1248
  UID:                 7514cdc4-a3bd-4505-900e-844a54af94f6
Spec:
  Cidr:     10.42.94.192/26
  Deleted:  false
  Node:     demo-1
  State:    confirmed
Events:     <none>
```
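The relationship between `blockSize` and per-node pod capacity can be sketched with a bit of arithmetic (a minimal illustration, not part of the Calico tooling):

```python
# Sketch: how Calico's IPPool blockSize bounds per-node pod capacity.
# A block of prefix length N holds 2^(32 - N) addresses, so the default
# blockSize of 26 gives each node only 64 pod IPs, well below the 500-pod
# kubelet limit we want. (Calico can let a node claim extra blocks, but
# sizing one block per node keeps the layout predictable.)

def ips_per_block(block_size: int) -> int:
    """Number of pod IPs in one Calico block of the given prefix length."""
    return 2 ** (32 - block_size)

print(ips_per_block(26))  # default blockSize -> 64 IPs per node
print(ips_per_block(23))  # blockSize 23 -> 512 IPs, enough for 500 pods
```

This is why the new IPPool below uses `blockSize: 23`.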
4. Install calicoctl.
```
$ curl -L https://github.com/projectcalico/calico/releases/download/v3.28.2/calicoctl-linux-amd64 -o calicoctl
$ chmod +x ./calicoctl
$ mv calicoctl /usr/local/bin/
```
```
$ calicoctl get ippool -o wide --allow-version-mismatch
NAME                  CIDR           NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.42.0.0/16   true   Never      Always      false      false              all()
```
5. Create a new IPPool.
```
$ calicoctl get ippool -o yaml --allow-version-mismatch > calico-ippool.yaml
$ nano calico-ippool.yaml
apiVersion: projectcalico.org/v3
items:
- apiVersion: projectcalico.org/v3
  kind: IPPool
  metadata:
    name: my-ipv4-ippool   # change
  spec:
    allowedUses:
    - Workload
    - Tunnel
    blockSize: 23   # each node must get at least 500 IPs, so use 23
    cidr: 10.44.0.0/16   # use a new CIDR so it does not conflict with the existing pool
    ipipMode: Never
    natOutgoing: true
    nodeSelector: all()
    vxlanMode: Always
kind: IPPoolList
metadata:
  resourceVersion: "7895950"

$ calicoctl apply -f calico-ippool.yaml --allow-version-mismatch
```
6. Disable the original IPPool.
```
$ calicoctl get ippool -o wide --allow-version-mismatch
NAME                  CIDR           NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.42.0.0/16   true   Never      Always      false      false              all()
my-ipv4-ippool        10.44.0.0/16   true   Never      Always      false      false              all()

$ kubectl edit ippools.crd.projectcalico.org default-ipv4-ippool
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  creationTimestamp: "2024-09-26T06:06:33Z"
  name: default-ipv4-ippool
  resourceVersion: "7895243"
  uid: 7799a296-8cb3-441c-bd86-1b29cae1bf02
spec:
  allowedUses:
  - Workload
  - Tunnel
  blockSize: 26
  cidr: 10.42.0.0/16
  disabled: true   # add
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always

$ calicoctl get ippool -o wide --allow-version-mismatch
NAME                  CIDR           NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-ipv4-ippool   10.42.0.0/16   true   Never      Always      true       false              all()
my-ipv4-ippool        10.44.0.0/16   true   Never      Always      false      false              all()
```
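The "use a new CIDR" requirement for the new pool can be checked mechanically before applying the manifest. A minimal sketch using Python's standard `ipaddress` module (the CIDR values are the ones from this cluster):

```python
import ipaddress

# Sketch: verify the new IPPool CIDR does not overlap the existing default
# pool before applying it; overlapping pools would hand out conflicting
# pod IPs.
old_pool = ipaddress.ip_network("10.42.0.0/16")  # default-ipv4-ippool
new_pool = ipaddress.ip_network("10.44.0.0/16")  # my-ipv4-ippool

if old_pool.overlaps(new_pool):
    raise SystemExit("new pool overlaps the existing pool - pick another CIDR")
print("pools do not overlap")
```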
7. Reboot all nodes so that the restarted pods pick up IPs from the new pool.
```
# Confirm the pod IPs have changed; if a pod keeps its old IP, delete it so it is recreated.
$ calicoctl get wep --all-namespaces --allow-version-mismatch
NAMESPACE             WORKLOAD                                                NODE     NETWORKS          INTERFACE
calico-system         calico-kube-controllers-b69764646-k6c5s                 demo-1   10.44.210.7/32    calie4976550ec1
cattle-fleet-system   fleet-agent-0                                           demo-1   10.44.210.11/32   cali14f95abeeaf
cattle-system         cattle-cluster-agent-67dcb8d794-87lvd                   demo-3   10.44.232.4/32    cali00b45a1e6e7
cattle-system         cattle-cluster-agent-67dcb8d794-s4kfp                   demo-1   10.44.210.9/32    calib5963cc47a2
cattle-system         rancher-webhook-6cdb786fbb-lw4pg                        demo-1   10.44.210.4/32    cali0dd73075b27
cattle-system         system-upgrade-controller-7c88897cc8-77mdh              demo-1   10.44.210.12/32   cali913306fa622
default               dns-test                                                demo-3   10.44.232.3/32    cali0f31e7aaff4
default               iperf3-ds-8szzl                                         demo-3   10.44.232.2/32    califcd241938d7
default               iperf3-ds-cnpwt                                         demo-2   10.44.238.6/32    calice32ea8e555
default               iperf3-ds-fwr6s                                         demo-1   10.44.210.10/32   cali03ab97aab2b
kube-system           helm-install-rke2-ingress-nginx-z6n7d                   demo-1                     calif6df3e2e3f4
kube-system           helm-install-rke2-metrics-server-fqmq2                  demo-1                     cali92b0c0d73f7
kube-system           helm-install-rke2-snapshot-controller-crd-djgrg         demo-1                     cali3b20df51335
kube-system           helm-install-rke2-snapshot-controller-kx6cc             demo-1                     cali8a65f26207b
kube-system           helm-install-rke2-snapshot-validation-webhook-kd84d     demo-1                     cali26fa53d60c2
kube-system           rke2-coredns-rke2-coredns-54d6c599fb-6znj9              demo-1   10.44.210.6/32    calie2d0028aa2f
kube-system           rke2-coredns-rke2-coredns-54d6c599fb-x4q6h              demo-2   10.44.238.4/32    calib4e3597286f
kube-system           rke2-coredns-rke2-coredns-autoscaler-86bfc57df7-82zzr   demo-1   10.44.210.1/32    cali00ce8191c54
kube-system           rke2-ingress-nginx-controller-4qn94                     demo-2   10.44.238.5/32    cali63d5b01985f
kube-system           rke2-ingress-nginx-controller-4v4j2                     demo-3   10.44.232.1/32    cali9e9f97529fe
kube-system           rke2-ingress-nginx-controller-t2zf4                     demo-1   10.44.210.5/32    cali98bd752c76d
kube-system           rke2-metrics-server-7f84955c7c-8j47w                    demo-1   10.44.210.8/32    cali06894f45a1c
kube-system           rke2-snapshot-controller-df4977c76-6487p                demo-1   10.44.210.3/32    cali1f377d3ba4a
kube-system           rke2-snapshot-validation-webhook-54856dd469-566sw       demo-1   10.44.210.2/32    cali4bb1bda362c
```
8. Add an argument to kube-controller-manager to disable its automatic node subnet allocation. If it is left enabled, spurious CIDRNotAvailable events may be reported, so stop kube-controller-manager from allocating node subnets.
> allocate-node-cidrs=false
![image](https://hackmd.io/_uploads/Bk1Pzv0kyg.png)

## Verification

* The newly created BlockAffinity objects use the CIDR we planned.
```
$ kubectl get blockaffinities
NAME                     AGE
demo-1-10-42-94-192-26   21d
demo-1-10-44-210-0-23    14m
demo-2-10-42-86-64-26    21d
demo-2-10-44-238-0-23    14m
demo-3-10-42-29-0-26     21d
demo-3-10-44-232-0-23    14m

$ kubectl describe blockaffinities demo-2-10-44-238-0-23
Name:         demo-2-10-44-238-0-23
Namespace:
Labels:       <none>
Annotations:  projectcalico.org/metadata: {"creationTimestamp":null}
API Version:  crd.projectcalico.org/v1
Kind:         BlockAffinity
Metadata:
  Creation Timestamp:  2024-10-17T08:54:42Z
  Generation:          2
  Resource Version:    7900800
  UID:                 4ec48adb-5654-4046-8a5b-d4b617633d3d
Spec:
  Cidr:     10.44.238.0/23
  Deleted:  false
  Node:     demo-2
  State:    confirmed
Events:     <none>
```
* Create a Deployment that puts 400 pods on the same node.
```
$ echo 'apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: test
  name: test
spec:
  replicas: 400
  selector:
    matchLabels:
      app: test
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: test
    spec:
      containers:
      - image: quay.io/cooloo9871/nginx
        name: nginx
      nodeName: demo-2' | kubectl apply -f -
```
* Verify that all 400 pods are Running.
```
$ kubectl get po | grep Running | wc -l
400
```
![image](https://hackmd.io/_uploads/HyQerU0kkx.png)
* The demo-2 node is now running 406 pods in total.
![image](https://hackmd.io/_uploads/HyUdDUCJye.png)

## References

https://www.suse.com/support/kb/doc/?id=000021093
https://www.suse.com/support/kb/doc/?id=000020167
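As a closing sanity check on the sizing used above, the pool CIDR also bounds how many nodes the cluster can grow to, since each node claims at least one block. A small sketch of that arithmetic (illustration only, using the /16 pool and /23 blocks from this setup):

```python
# Sketch: number of node-affine blocks a pool can hand out.
# A /16 pool carved into /23 blocks yields 2^(23 - 16) = 128 blocks,
# so with one block per node this pool supports up to 128 nodes,
# each with 512 pod IPs.

def blocks_in_pool(pool_prefix: int, block_size: int) -> int:
    """How many blocks of the given size fit in the pool."""
    return 2 ** (block_size - pool_prefix)

print(blocks_in_pool(16, 23))  # -> 128 node blocks for 10.44.0.0/16
```

If the cluster is expected to exceed that node count, use a larger pool CIDR or a smaller per-node block.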