[Setup] Installing K8s (v2)
===

###### tags: `K8s / Setup`
###### tags: `Kubernetes`, `k8s`, `setup`, `install`

[TOC]

- kubeadm reset
    https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-reset/
    `kubeadm reset` does not clean up the CNI configuration, so remove it manually:
    ```
    sudo rm -rf /etc/cni/net.d
    ```

## Installation constraints
- OS: Ubuntu 20.04.6 LTS (focal)
- Kubernetes version: v1.31
- CNI: install Cilium (v1.17.2) with Helm

## Rebuilding K8s

### ✅ 0. Check for leftovers from a previous install
```
$ kubectl
$ kubeadm
```
If either command still runs, the previous installation has not been fully removed (see the appendix for a full cleanup).

### ✅ 0. Recommended system initialization
```bash=
# Update the package index and upgrade installed packages
sudo apt update && sudo apt upgrade -y

# Install required tools
# - https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl (step 1)
# - Is gnupg needed? (the official step only lists apt-transport-https, ca-certificates, curl, gpg)
sudo apt-get install -y apt-transport-https ca-certificates curl gpg gnupg lsb-release software-properties-common

# Disable swap permanently (required by K8s)
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/' /etc/fstab

# Enable bridge-nf-call-iptables
# (load the required kernel modules and set the kernel parameters, including IPv4 forwarding)
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.ipv4.ip_forward                 = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF

sudo modprobe overlay
sudo modprobe br_netfilter
sudo sysctl --system
```

- **Verify the settings**
    ```bash=
    $ lsmod | grep overlay
    overlay               118784  51

    $ lsmod | grep br_netfilter
    br_netfilter           28672  0
    bridge                176128  1 br_netfilter

    $ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    net.ipv4.ip_forward = 1
    ```

- ### ✅ `lsmod` module check
    | Module | Status | Description |
    |--------|--------|-------------|
    | `overlay` | ✅ Loaded | Filesystem support required by containerd |
    | `br_netfilter` | ✅ Loaded | Lets Kubernetes control Pod/Service traffic with iptables |
    | `bridge` | ✅ Loaded | A dependency of `br_netfilter` |

    **✅ All modules are loaded, which meets the Kubernetes/Cilium installation requirements.**

- ### ✅ `sysctl` kernel parameter check
    ```
    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    net.ipv4.ip_forward = 1
    ```
    These three settings are prerequisites for installing Kubernetes:

    | Parameter | Status | Description |
    |-----------|--------|-------------|
    | `net.bridge.bridge-nf-call-iptables` | ✅ OK | Lets iptables process bridged packets (Pod traffic) |
    | `net.bridge.bridge-nf-call-ip6tables` | ✅ OK | Same as above, for IPv6 packets |
    | `net.ipv4.ip_forward` | ✅ OK | Allows packets to be forwarded from one interface to another (e.g., Pod to Service) |

- ### ✅ Conclusion: the current system environment
    - ✅ overlay filesystem support is in place
    - ✅ bridge module and iptables handling are in place
    - ✅ forwarding and sysctl kernel values are correct
    - ✅ ready to continue with the containerd and kubeadm installation

---

### ✅ 1. Install containerd and use the systemd cgroup driver
Official docs: [Containerd with systemd](https://kubernetes.io/docs/setup/production-environment/container-runtimes/#containerd-systemd)

```bash=
# Install containerd
sudo apt install -y containerd

# [Configure containerd to use systemd as the cgroup driver (recommended by K8s)]
# Generate the default config file
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml

# Switch the cgroup driver to systemd
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml

# Restart containerd and start it at boot
sudo systemctl restart containerd
sudo systemctl enable containerd
```

- **Verify the settings**
    - Check the SystemdCgroup value
        ```bash
        $ cat /etc/containerd/config.toml | grep SystemdCgroup
                    SystemdCgroup = true
        ```
    - Test a hello container with `ctr`
        - Pull the test image
            ```bash
            $ sudo ctr image pull docker.io/library/hello-world:latest
            docker.io/library/hello-world:latest:                                             resolved       |++++++++++++++++++++++++++++++++++++++|
            index-sha256:0b6a027b5cf322f09f6706c754e086a232ec1ddba835c8a15c6cb74ef0d43c29:    done           |++++++++++++++++++++++++++++++++++++++|
            manifest-sha256:03b62250a3cb1abd125271d393fc08bf0cc713391eda6b57c02d1ef85efcc25c: done           |++++++++++++++++++++++++++++++++++++++|
            config-sha256:74cc54e27dc41bb10dc4b2226072d469509f2f22f1a3ce74f4a59661a1d44602:   done           |++++++++++++++++++++++++++++++++++++++|
            layer-sha256:e6590344b1a5dc518829d6ea1524fc12f8bcd14ee9a02aa6ad8360cce3a9a9e9:    done           |++++++++++++++++++++++++++++++++++++++|
            elapsed: 5.6 s                                                                    total:  15.4 K (2.8 KiB/s)
            unpacking linux/amd64 sha256:0b6a027b5cf322f09f6706c754e086a232ec1ddba835c8a15c6cb74ef0d43c29...
            done: 47.933065ms
            ```
        - Run the container
            ```bash
            $ sudo ctr run --rm --tty docker.io/library/hello-world:latest hello-test

            Hello from Docker!
            This message shows that your installation appears to be working correctly.

            To generate this message, Docker took the following steps:
             1. The Docker client contacted the Docker daemon.
             2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
                (amd64)
             3. The Docker daemon created a new container from that image which runs the
                executable that produces the output you are currently reading.
             4. The Docker daemon streamed that output to the Docker client, which sent it
                to your terminal.

            To try something more ambitious, you can run an Ubuntu container with:
             $ docker run -it ubuntu bash

            Share images, automate workflows, and more with a free Docker ID:
             https://hub.docker.com/

            For more examples and ideas, visit:
             https://docs.docker.com/get-started/
            ```
        📌 `--rm` removes the container automatically after it exits; `--tty` attaches a terminal so you can see the output.

---

### ✅ 2. Install kubeadm, kubelet, and kubectl (v1.31)
> Installation steps: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/#installing-kubeadm-kubelet-and-kubectl

```bash=
# Add the Kubernetes package signing key and repository
# (/etc/apt/keyrings does not exist by default on releases older than Ubuntu 22.04)
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.31/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.31/deb/ /" | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install the packages and pin their versions
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# (Optional) Enable the kubelet service before running kubeadm:
# sudo systemctl enable --now kubelet
```

- **Verify the settings**
    - Packages on hold:
        ```
        $ apt-mark showhold
        kubeadm
        kubectl
        kubelet
        ```
    - Check the version
        ```
        $ kubectl version
        Client Version: v1.31.9   <--
        Kustomize Version: v5.4.2
        The connection to the server localhost:8080 was refused - did you specify the right host or port?
        ```
        The connection error is expected at this point, because no cluster exists yet.

---
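The manual checks from steps 0 and 1 can be gathered into one re-runnable check before moving on to `kubeadm init`. This is a minimal sketch, assuming Bash 4+. `check_prereqs` is a hypothetical helper, not part of any tool. It reads `sysctl` output on stdin and only tests the values listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify the sysctl values from step 0 and the
# SystemdCgroup setting from step 1. Reads "key = value" lines from stdin.
# Usage: sysctl <keys...> | check_prereqs /etc/containerd/config.toml
check_prereqs() {
  local config="$1" failed=0 key value
  local required="net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward"
  declare -A seen=()   # declare inside a function creates a local associative array

  # Parse the "key = value" lines printed by sysctl
  while IFS='=' read -r key value; do
    key="${key//[[:space:]]/}"
    value="${value//[[:space:]]/}"
    [ -n "$key" ] && seen["$key"]="$value"
  done

  # Every required parameter must be present and equal to 1
  for key in $required; do
    if [ "${seen[$key]:-}" != "1" ]; then
      echo "FAIL: $key = ${seen[$key]:-<missing>}"
      failed=1
    fi
  done

  # containerd must use the systemd cgroup driver
  if ! grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$config" 2>/dev/null; then
    echo "FAIL: SystemdCgroup = true not found in $config"
    failed=1
  fi

  [ "$failed" -eq 0 ] && echo "OK: prerequisites look correct"
  return "$failed"
}
```

On a node, run it as `sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward | check_prereqs /etc/containerd/config.toml`. A non-zero exit status means at least one `FAIL:` line was printed.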
### ✅ 3. Initialize the control-plane node (master)
#### 3.1 Choose the Pod CIDR based on your actual network and NICs
The Pod CIDR must not overlap the network the nodes themselves are on.
```bash=
sudo kubeadm init \
  --pod-network-cidr=192.168.0.0/16 \
  --kubernetes-version=v1.31.9
```

- ### Troubleshooting: [ERROR Port-6443]: Port 6443 is in use
    ```
    $ sudo kubeadm init \
    >   --pod-network-cidr=192.168.0.0/16 \
    >   --kubernetes-version=v1.31.9
    [init] Using Kubernetes version: v1.31.9
    [preflight] Running pre-flight checks
            [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
    error execution phase preflight: [preflight] Some fatal errors occurred:
            [ERROR Port-6443]: Port 6443 is in use
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    To see the stack trace of this error execute with --v=5 or higher
    ```
    - #### Step 1: find out which process is using port 6443
        ```
        $ sudo lsof -i :6443
        ```
        ![](https://hackmd.io/_uploads/ryZF57gmxe.png)
        ```
        $ sudo netstat -tulnp | grep 6443
        tcp6       0      0 :::6443                 :::*                    LISTEN      2199538/kube-apiser
        ```
        Here the port is held by a `kube-apiserver` left over from an earlier `kubeadm init`.
    - #### Step 2: clean up the leftovers (if kubeadm was initialized before)
        - **Reset the environment with**:
            ```
            sudo kubeadm reset -f
            ```
        - **Then delete the Kubernetes configuration and etcd data**:
            ```bash
            sudo rm -rf /etc/kubernetes
            sudo rm -rf /var/lib/etcd
            sudo systemctl stop kubelet
            sudo systemctl restart containerd
            ```
    - #### 🔁 Step 3: run the initialization again
        ```bash
        # sudo kubeadm init ...
        ```

---

#### 3.2 After initialization, configure `kubectl` (kubeconfig) as prompted:
```bash=
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```
- `-i`, `--interactive`: prompt for confirmation before overwriting if the target file already exists.
- **Check the result**
    ![](https://hackmd.io/_uploads/BkAGRmeQxe.png)

---

### ✅ 4.A Install Flannel
Alternative to 4.B: install only one CNI (the constraints above call for Cilium).
```bash=
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
nano kube-flannel.yml
# Change the network so it matches --pod-network-cidr from step 3.1
# Before:
#   "Network": "10.244.0.0/16"
# After:
#   "Network": "192.168.0.0/16"

# Install Flannel
kubectl apply -f kube-flannel.yml

# Check Flannel's status
kubectl -n kube-flannel get all
```

---

### ✅ 4.B Install Cilium (CNI) via Helm (v1.17.2)
Reference: [Cilium installation docs](https://docs.cilium.io/en/v1.17/gettingstarted/k8s-install-default/)

- ### 4.1 Install the Cilium CLI
    > https://docs.cilium.io/en/v1.17/gettingstarted/k8s-install-default/#install-the-cilium-cli
    ```bash
    # Official script (for Linux)
    CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
    CLI_ARCH=amd64
    if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
    curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
    sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
    sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
    rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
    ```
- ### 4.2 Install the Cilium chart
    ```bash=
    # Install Helm
    curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

    # Add the Cilium Helm repository
    helm repo add cilium https://helm.cilium.io/
    helm repo update

    # Install Cilium; kubeProxyReplacement=true is recommended so Cilium replaces kube-proxy
    helm install cilium cilium/cilium \
      --version 1.17.2 \
      --namespace kube-system \
      --set operator.replicas=1 \
      --set kubeProxyReplacement=true \
      --set k8sServiceHost=10.78.26.241 \
      --set k8sServicePort=6443 \
      --set ipam.mode=kubernetes \
      --set securityContext.privileged=true

    # Remove Cilium (if you need to reinstall)
    # $ helm uninstall cilium --namespace kube-system
    ```
    - Replace `10.78.26.241` with `<your-k8s-api-host>`
        - `<your-k8s-api-host>` can be found with `hostname -i` (on Ubuntu this may return `127.0.1.1` from `/etc/hosts`; `hostname -I` lists the real interface addresses)
        - or check the node's internal IP with `kubectl get nodes -o wide`
    - **Parameter notes**

:::danger
- ### `--set kubeProxyReplacement=true`
    - [`kube-proxy-replacement`](https://docs.cilium.io/en/v1.14/operations/upgrade/#deprecated-options)
    ![](https://hackmd.io/_uploads/rkbwp4x7el.png)
    ![](https://hackmd.io/_uploads/SJZlJHxXll.png)
    - **v1.15**
        - **Removed**: `strict`, `partial`, `disabled`
        - **Use instead**: `true`, `false`
    - The `kubeadm init` in step 3.1 still deploys kube-proxy. Cilium's kube-proxy-free guide initializes the cluster with `kubeadm init --skip-phases=addon/kube-proxy` instead.
- ### `--set ipam.mode=kubernetes`
    - [IP Address Management (IPAM)](https://docs.cilium.io/en/stable/network/concepts/ipam/index.html)
:::

- ### 4.3 Verify the installation
    - `$ cilium status`
        ![](https://hackmd.io/_uploads/BydHwSxXxe.png)
    - `$ kubectl get all -n kube-system`
        ![](https://hackmd.io/_uploads/rJxqNVxQgx.png)
        Check whether the coredns pods go from `0/1` to `1/1`:
        ```
        NAME                                   READY   STATUS    RESTARTS   AGE
        pod/cilium-envoy-pbs2z                 1/1     Running   0          83m
        pod/cilium-operator-6966d94848-4gjqf   1/1     Running   0          83m
        pod/cilium-x7c58                       1/1     Running   0          83m
        pod/coredns-7c65d6cfc9-hlsfk           1/1     Running   0          126m
        pod/coredns-7c65d6cfc9-j54cz           1/1     Running   0          126m

        NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
        deployment.apps/cilium-operator   1/1     1            1           83m
        deployment.apps/coredns           2/2     2            2           126m
        ```

---

### ✅ 5. Remove the NoSchedule taint from the master node (so it can also act as a worker)
```bash=
kubectl taint nodes --all node-role.kubernetes.io/control-plane- || true
kubectl taint nodes --all node-role.kubernetes.io/master- || true
```
Since Kubernetes v1.25, kubeadm no longer applies the `master` taint, so the second command normally just reports that the taint was not found; `|| true` ignores that error.

- **Verify the result**
    - Command
        ```bash
        kubectl get nodes/stage-kube01 -o yaml
        ```
    - Node spec: **Before**
        ```yaml
        spec:
          podCIDR: 192.168.0.0/24
          podCIDRs:
          - 192.168.0.0/24
          taints:
          - effect: NoSchedule
            key: node-role.kubernetes.io/control-plane   <--
        status:
        ```
    - Node spec: **After**
        ```yaml
        spec:
          podCIDR: 192.168.0.0/24
          podCIDRs:
          - 192.168.0.0/24
        status:
        ```

---
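The `--set` flags from step 4.2 can also be kept in a Helm values file. A file is easier to review and reuse when the cluster is rebuilt. This is a minimal sketch, assuming Bash. The function name `write_cilium_values`, the output path and the `K8S_API_HOST` variable are hypothetical. The nested keys are simply the dotted `--set` names expanded into YAML. Helm's `--set` parses `true` as a boolean, just like the YAML below.

```bash
#!/usr/bin/env bash
# Write the Cilium --set flags from step 4.2 as a Helm values file.
# K8S_API_HOST and the default output path are illustrative assumptions.
write_cilium_values() {
  local out="${1:-cilium-values.yaml}"
  local api_host="${K8S_API_HOST:-10.78.26.241}"

  # Each dotted --set key (e.g. operator.replicas) becomes nested YAML
  cat > "$out" <<EOF
operator:
  replicas: 1
kubeProxyReplacement: true
k8sServiceHost: ${api_host}
k8sServicePort: 6443
ipam:
  mode: kubernetes
securityContext:
  privileged: true
EOF
  echo "wrote $out"
}
```

Usage: `K8S_API_HOST=10.78.26.241 write_cilium_values cilium-values.yaml`, then `helm install cilium cilium/cilium --version 1.17.2 --namespace kube-system -f cilium-values.yaml`.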
### ✅ 6. Verify the installation status
```bash=
kubectl get nodes -o wide
kubectl get pods -A
cilium status
```
- ### CIDR (Classless Inter-Domain Routing)
    - Command
        ```bash
        kubectl get nodes/stage-kube01 -o yaml | grep podCIDR -A1
        ```
    - Suppose the cluster was initialized with:
        ```bash
        --pod-network-cidr=192.168.0.0/16
        ```
    - Your node1:
        ```yaml
        spec:
          podCIDR: 192.168.0.0/24
          podCIDRs:
          - 192.168.0.0/24
        ```
    - node2 might then get:
        ```yaml
        spec:
          podCIDR: 192.168.1.0/24
          podCIDRs:
          - 192.168.1.0/24
        ```
    - And so on.
    - The `/16` you specify is the cluster-wide Pod range.
    - Each node takes only one `/24` from it as its own Pod subnet (this is the `podCIDR: 192.168.0.0/24` shown above). A `/16` therefore has room for 2^(24-16) = 256 node subnets.
    - When you add a new node, it is automatically assigned the next unused `/24`.

:::info
:bulb: To add a worker node, repeat **steps 0 to 2** on the worker, then join the cluster with the `kubeadm join` command printed at the end of `kubeadm init`.
:::

---

### ✅ 7. Install the NVIDIA GPU Operator (choose parameters based on the system state)
The NVIDIA GPU Operator automatically deploys the following GPU components on Kubernetes nodes:

| Component | Description |
|-----------|-------------|
| **Driver** | Installs the NVIDIA driver (optional) |
| **Container Toolkit** | Installs `nvidia-container-toolkit` and configures the runtime |
| **Device Plugin** | Exposes GPUs to the kubelet for scheduling and mounting |
| **DCGM Exporter** | Exports GPU health and utilization metrics (Prometheus) |
| **GPU Feature Discovery** | Detects the GPU features and models on each node |

- ### [Overview] Install gpu-operator with Helm; choose the parameters based on the **system state**
    ```bash=
    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
    helm repo update
    kubectl create namespace gpu-operator
    helm install --wait --generate-name \
      nvidia/gpu-operator \
      --namespace gpu-operator \
      --set driver.enabled=false \
      --set toolkit.enabled=true
    ```

    | Condition | Setting | Meaning |
    |-----------|---------|---------|
    | ❌ GPU driver not installed (`nvidia-smi`) | `--set driver.enabled=true` | gpu-operator installs it |
    | ✅ GPU driver installed (`nvidia-smi`) | `--set driver.enabled=false` | Not needed |
    | ❌ containerd not yet configured for nvidia | `--set toolkit.enabled=true` | gpu-operator configures it |
    | ✅ containerd already configured for nvidia | `--set toolkit.enabled=false` | Not needed |

- ### 📌 7.1 Check the current system state
    > Scenarios: **(A)** K8s rebuild, **(B)** fresh install
    >
    > Check each item on the host using the table below:

    | Item | Example command | How to read it |
    |------|-----------------|----------------|
    | ✅ Is the GPU driver installed? | `nvidia-smi` | Any output means it is installed |
    | ✅ Is containerd installed? | `containerd --version` | A version string means it is installed |
    | ✅ Is containerd configured for nvidia? | `cat /etc/containerd/config.toml \| grep nvidia` | An nvidia entry means it is configured |

---

- ### 📌 7.2 Choose the Helm parameters based on the state
    Pick the installation method that matches your situation:

    - #### 🔧 **Scenario A: GPU driver installed manually, container runtime not configured yet**
        > Applies to: K8s rebuild

        | Condition | Matches? |
        |-----------|----------|
        | ✅ GPU driver installed (`nvidia-smi`) | Yes |
        | ❌ containerd not yet configured for nvidia | Yes |

        👉 Recommended settings:
        ```
        --set driver.enabled=false \
        --set toolkit.enabled=true
        ```
        - 🟢 You manage the GPU driver yourself; the Operator does not touch it
        - 🟢 The Operator configures the GPU runtime in containerd (adds the `nvidia` runtime to config.toml automatically)

    ---

    - #### 🔧 **Scenario B: nothing installed yet; let the GPU Operator handle everything**
        > Applies to: fresh install

        | Condition | Matches? |
        |-----------|----------|
        | ❌ GPU driver not installed | Yes |
        | ❌ containerd not yet configured | Yes |

        👉 Recommended settings:
        ```
        --set driver.enabled=true \
        --set toolkit.enabled=true
        ```
        - 🟢 The GPU Operator handles both the driver and the runtime installation
        - 📌 The OS must be supported (e.g., RHEL/CentOS/Ubuntu), otherwise the driver installation may fail

    ---

    - #### 🔧 **Scenario C: everything is already set up manually**
        > Applies to: advanced administrators with custom requirements

        | Condition | Matches? |
        |-----------|----------|
        | ✅ GPU driver (`nvidia-smi`) installed | Yes |
        | ✅ containerd configured for nvidia | Yes |

        👉 Recommended settings:
        ```
        --set driver.enabled=false \
        --set toolkit.enabled=false
        ```
        - 🟢 The GPU Operator does not modify the system at all
        - Suitable for enterprise or air-gapped environments where all runtimes are managed manually
        - Register the nvidia runtime manually
            - Register the nvidia runtime with containerd, then restart containerd so it takes effect
                ```bash
                # nvidia-ctk: tool to configure the NVIDIA Container Toolkit
                sudo nvidia-ctk runtime configure --runtime=containerd
                sudo systemctl restart containerd
                ```
            - Verify:
                ```
                cat /etc/containerd/config.toml | grep nvidia
                ```
                ![](https://hackmd.io/_uploads/r1Knnammge.png)

---

- ### 📦 7.3 Install the GPU Operator (Scenario A as an example) (K8s rebuild)
    ```bash=
    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
    helm repo update
    kubectl create namespace gpu-operator
    helm install --wait --generate-name \
      nvidia/gpu-operator \
      --namespace gpu-operator \
      --set driver.enabled=false \
      --set toolkit.enabled=true
    ```

---
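The decision tables in 7.1 and 7.2 can be turned into a small helper that prints the matching `--set` flags. This is a minimal sketch, assuming Bash. `gpu_operator_flags` and `detect_gpu_operator_flags` are hypothetical names. The detection reuses the checks from table 7.1: `nvidia-smi` runs successfully, and `config.toml` contains an `nvidia` entry.

```bash
#!/usr/bin/env bash
# Map the system state to gpu-operator --set flags (tables 7.1/7.2).
# $1: GPU driver installed? (yes/no)
# $2: containerd already configured for nvidia? (yes/no)
gpu_operator_flags() {
  local driver_enabled=true toolkit_enabled=true
  [ "$1" = "yes" ] && driver_enabled=false   # driver present: Operator leaves it alone
  [ "$2" = "yes" ] && toolkit_enabled=false  # runtime configured: skip the toolkit
  echo "--set driver.enabled=${driver_enabled} --set toolkit.enabled=${toolkit_enabled}"
}

# Detect the state on this host using the same checks as table 7.1.
# $1: optional path to the containerd config (default /etc/containerd/config.toml)
detect_gpu_operator_flags() {
  local config="${1:-/etc/containerd/config.toml}" driver=no runtime=no
  nvidia-smi >/dev/null 2>&1 && driver=yes
  grep -q nvidia "$config" 2>/dev/null && runtime=yes
  gpu_operator_flags "$driver" "$runtime"
}
```

Example: `helm install --wait --generate-name nvidia/gpu-operator --namespace gpu-operator $(detect_gpu_operator_flags)`. The `$(...)` is left unquoted on purpose so the output splits into separate flags. A host that matches Scenario A prints `--set driver.enabled=false --set toolkit.enabled=true`.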
- ### ✅ 7.4 Verify that the GPU Operator is running
    ```
    kubectl get pods -n gpu-operator
    ```
    All of the following components should be `Running`:

    | Pod name | Description |
    |----------|-------------|
    | `nvidia-device-plugin-*` | Makes GPUs available to Pods |
    | `nvidia-container-toolkit-*` | Configures the containerd runtime |
    | `gpu-feature-discovery-*` | Detects GPU features |
    | `nvidia-dcgm-exporter-*` | Exports metrics (Prometheus) |

    ![image](https://hackmd.io/_uploads/B1A86TXQel.png)

---

- ### 🔬 7.5 Test whether the GPUs are usable
    - #### Test whether the requested number of GPUs can be allocated
        ```yaml
        # gpu-test.yaml
        apiVersion: v1
        kind: Pod
        metadata:
          name: gpu-test
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cuda-container
            image: nvidia/cuda:12.2.0-base-ubuntu22.04
            resources:
              limits:
                nvidia.com/gpu: 1
            command: ["nvidia-smi"]
        ```
        ```
        kubectl apply -f gpu-test.yaml
        kubectl logs gpu-test
        ```
    - #### Test whether a CUDA program runs (vector addition)
        ```yaml=
        # cuda-sample-vectoradd.yaml
        apiVersion: v1
        kind: Pod
        metadata:
          name: cuda-sample-vectoradd
        spec:
          restartPolicy: Never
          containers:
          - name: cuda-container
            image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.7.1-ubuntu20.04
            resources:
              limits:
                nvidia.com/gpu: 1 # requesting 1 GPU
        ```
        ```
        kubectl apply -f cuda-sample-vectoradd.yaml
        kubectl logs cuda-sample-vectoradd
        ```

---

- ### ✅ 7.6 Summary
    | Item | Status |
    |------|--------|
    | GPU driver (`nvidia-smi`) | ✅ Installed |
    | Containerd + systemd | ✅ Installed |
    | containerd nvidia runtime | ✅ Configured automatically by the Operator |
    | GPU Operator | ✅ Deployed successfully |
    | Test Pod using a GPU | ✅ Ran successfully |

---
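To try a different GPU count without editing `gpu-test.yaml` by hand, the manifest from 7.5 can be generated with the count as a parameter. This is a minimal sketch, assuming Bash. `render_gpu_test_pod` is a hypothetical helper, and its image and Pod fields are copied from `gpu-test.yaml`. If the node has fewer free GPUs than requested, the Pod is expected to stay `Pending`.

```bash
#!/usr/bin/env bash
# Print the gpu-test Pod manifest from step 7.5 with a configurable GPU count.
# Usage: render_gpu_test_pod <gpu-count> [pod-name]
render_gpu_test_pod() {
  local count="$1" name="${2:-gpu-test}"
  # nvidia.com/gpu must be a whole, positive number of devices
  if ! [[ "$count" =~ ^[1-9][0-9]*$ ]]; then
    echo "render_gpu_test_pod: GPU count must be a positive integer" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-container
    image: nvidia/cuda:12.2.0-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: ${count}
    command: ["nvidia-smi"]
EOF
}
```

Example: `render_gpu_test_pod 2 gpu-test-2 | kubectl apply -f -`, followed by `kubectl logs gpu-test-2`. If the allocation worked, the `nvidia-smi` output should list two GPUs.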
### ✅ 8. Install local-path storage (rancher.io)
> Parameters: [Configuration](https://github.com/rancher/local-path-provisioner/tree/master/deploy/chart/local-path-provisioner#configuration)

```bash
git clone https://github.com/rancher/local-path-provisioner.git
cd local-path-provisioner

# Install
helm install local-path-storage \
  --create-namespace \
  --namespace local-path-storage \
  ./deploy/chart/local-path-provisioner/ \
  --set storageClass.provisionerName=rancher.io/local-path \
  --set storageClass.defaultClass=true \
  --set storageClass.name=default-local \
  --set-json nodePathMap='[{"node":"DEFAULT_PATH_FOR_NON_LISTED_NODES","paths":["/mnt/k8s-local-path"]}]'

# The k8s-local-path directory is created automatically by the rancher provisioner
```

- **Result**
    ![](https://hackmd.io/_uploads/rkeVzMdmlg.png)
    ![](https://hackmd.io/_uploads/BJbK4m_7xx.png)
- **Check the configuration**
    ```
    $ kubectl -n local-path-storage get all,sc,cm,daemonset,csidriver,csinode
    NAME                                                            READY   STATUS    RESTARTS   AGE
    pod/local-path-storage-local-path-provisioner-f5846d78c-vvlsl   1/1     Running   0          26m

    NAME                                                        READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/local-path-storage-local-path-provisioner   1/1     1            1           26m

    NAME                                                                  DESIRED   CURRENT   READY   AGE
    replicaset.apps/local-path-storage-local-path-provisioner-f5846d78c   1         1         1       26m

    NAME                                                  PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
    storageclass.storage.k8s.io/default-local (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   true                   26m

    NAME                          DATA   AGE
    configmap/kube-root-ca.crt    1      75m
    configmap/local-path-config   4      26m

    NAME                                  DRIVERS   AGE
    csinode.storage.k8s.io/stage-kube01   0         6d1h   <-- 0 is expected (local-path is not a CSI driver)
    ```
    With `WaitForFirstConsumer`, a PVC stays `Pending` until a Pod that uses it is scheduled.

---

# Appendix

## Completely remove the existing Kubernetes installation
- ### by gpt-4o (2025/06/06)
    The goal is to **completely remove the existing Kubernetes environment (the kubeadm cluster, the containerd configuration, the CNI, etc.) and then reinstall a clean environment.** The **full cleanup procedure** below covers:
    - removing kubeadm/kubelet/kubectl
    - removing the Kubernetes cluster configuration and components
    - resetting containerd (not necessarily a full removal)
    - cleaning up the CNI (including Cilium)

---
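Before and after the cleanup below, it helps to see which of the usual leftovers still exist. This is a minimal sketch, assuming Bash. `find_k8s_leftovers` is a hypothetical helper, and its path list comes from the commands in this note. The optional root argument exists only so the check can be tried on a scratch directory. The containerd paths matter only if containerd is purged as well.

```bash
#!/usr/bin/env bash
# List Kubernetes/CNI leftovers that the cleanup steps in this appendix remove.
# $1: optional root prefix (default "/"), useful for dry runs on a test tree.
find_k8s_leftovers() {
  local root="${1:-/}" path bin found=0
  local paths=(
    etc/kubernetes
    var/lib/etcd
    etc/cni/net.d
    opt/cni
    etc/containerd
    var/lib/containerd
  )
  for path in "${paths[@]}"; do
    if [ -e "${root%/}/${path}" ]; then
      echo "leftover: /${path}"
      found=1
    fi
  done
  # Binaries that the package purge step removes
  for bin in kubeadm kubelet kubectl; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "leftover binary: $(command -v "$bin")"
      found=1
    fi
  done
  [ "$found" -eq 0 ] && echo "no leftovers found"
  return 0
}
```

Run `find_k8s_leftovers` with no argument on the node itself. An empty result (`no leftovers found`) after step 4 means the removal covered these paths.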
- ### ✅ 1. Remove the CNI (Cilium)
    - #### Resources installed by Cilium remain in etcd and the cluster; the CLI removes them completely:
        ```
        cilium uninstall
        ```
        - If you don't have the CLI or cannot find the cilium binary, you can delete them manually. This only applies if Cilium was installed from this manifest:
            ```
            kubectl delete -f https://raw.githubusercontent.com/cilium/cilium/v1.15/install/kubernetes/quick-install.yaml
            ```
        - Or, if Cilium was installed with Helm (as in step 4.2):
            ```
            helm uninstall cilium -n kube-system
            ```
    - #### Delete the CRDs (⚠️ this wipes all Cilium data):
        ```bash=
        # Quick check: are any Cilium CRDs left?
        kubectl get crds | grep cilium

        # Remove the Cilium CRDs
        kubectl delete crds -l app.kubernetes.io/part-of=cilium
        # If the label selector matches nothing, delete by API group instead:
        # kubectl get crds -o name | grep 'cilium.io' | xargs -r kubectl delete
        ```
    - #### Delete the CNI network configuration:
        ```bash=
        # sudo rm -f /etc/cni/net.d/05-cilium.conflist
        sudo rm -rf /etc/cni/net.d
        sudo rm -rf /opt/cni
        ```
    - #### Delete the CNI network interfaces:
        ```bash=
        # Check whether any cilium interfaces remain
        ip link show | grep cilium

        # If so, force-delete them:
        sudo ip link delete cilium_vxlan
        sudo ip link delete cilium_host
        sudo ip link delete cilium_net
        ```
- ### ✅ 2. Stop the Kubernetes-related services
    ```
    sudo systemctl stop kubelet
    sudo systemctl stop containerd
    ```
    - Check for related processes
        ```
        ps -aux | grep kubelet
        ps -aux | grep flanneld
        sudo kill <pid>
        ```
- ### ✅ 3. Reset the kubeadm setup (this destroys the cluster)
    - #### This removes all Kubernetes state but does not uninstall the packages.
        ```
        sudo kubeadm reset -f
        ```
    - #### Also delete the kubeconfig and the configuration directory manually:
        ```
        sudo rm -rf ~/.kube
        sudo rm -rf /etc/kubernetes/
        ```
- ### ✅ 4. Remove the packages and related files
    - #### Remove kubeadm/kubelet/kubectl
        ```
        sudo apt-get purge -y kubeadm kubectl kubelet kubernetes-cni cri-tools --allow-change-held-packages
        sudo apt-get autoremove -y
        ```
        - Why `--allow-change-held-packages` is needed: the packages were put on hold during installation, and without the flag apt fails with:
            ```
            E: Held packages were changed and -y was used without --allow-change-held-packages.
            ```
    - #### Remove containerd (optional: only if you want to reconfigure containerd from scratch)
        ```
        sudo apt-get purge -y containerd
        sudo apt-get autoremove -y
        sudo rm -rf /etc/containerd
        sudo rm -rf /var/lib/containerd
        ```
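Step 1 deletes the three interfaces by name, and `ip link delete` reports an error for any that are already gone. The following is a more forgiving variant, as a minimal sketch assuming Bash and awk. `list_cilium_ifaces` is a hypothetical helper. It reads `ip -o link show` output (one interface per line) and prints only the `cilium_*` names, with the `@peer` suffix of veth pairs stripped.

```bash
#!/usr/bin/env bash
# Print the cilium_* interface names from `ip -o link show` output on stdin.
# Example input line: "5: cilium_host@cilium_net: <BROADCAST,...> mtu 1500 ..."
list_cilium_ifaces() {
  awk -F': ' '{
    name = $2                 # second field is the name, possibly "name@peer"
    sub(/@.*/, "", name)      # drop the @peer suffix of veth pairs
    if (name ~ /^cilium_/) print name
  }' | sort -u
}
```

Example: `ip -o link show | list_cilium_ifaces | xargs -r -n1 sudo ip link delete`. Deleting one end of the `cilium_host`/`cilium_net` veth pair also removes the other end, so an error for the second one can be ignored.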
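- ### ✅ 5. Clean up network settings and iptables (important)
    - ### ~~These are the parts that kube-proxy and the CNI modify:~~ (:warning: this deletes all iptables rules)
        ```
        sudo iptables -F            # Flush all chains (remove every rule)
        sudo iptables -X            # Delete all user-defined chains
        sudo iptables -t nat -F
        sudo iptables -t nat -X
        sudo iptables -t mangle -F
        sudo iptables -t mangle -X
        sudo ipvsadm --clear 2>/dev/null
        ```
        - Brute-force recovery: reset the default policies to ACCEPT. If you are connected over SSH and a policy is DROP, run these before flushing.
            ```
            sudo iptables -P INPUT ACCEPT
            sudo iptables -P FORWARD ACCEPT
            sudo iptables -P OUTPUT ACCEPT
            ```
    - ### How to report iptables information to an AI assistant
        ```
        sudo iptables -L -n -v
        sudo iptables -t nat -L -n -v
        sudo ipvsadm -L -n
        ```
    - ### [Revised] Clean up iptables, limited to K8s/Cilium rules (safer)
        This removes only the Kubernetes/Cilium rules and keeps the Docker and other zone settings:
        ```bash=
        # 1. Back up the current rules
        sudo iptables-save > ~/iptables-backup-$(date +%F).txt
        sudo ipvsadm-save > ~/ipvs-backup-$(date +%F).txt

        for table in filter nat mangle raw; do
          # 2. Remove the jumps from built-in chains into CILIUM_*/KUBE-* chains.
          #    Cilium and kube-proxy add them with "-m comment", so a plain
          #    "iptables -D INPUT -j CILIUM_INPUT" may not match; reuse the exact rule text instead.
          sudo iptables -t "$table" -S \
            | grep -E '^-A (INPUT|FORWARD|OUTPUT|PREROUTING|POSTROUTING) .*-j (CILIUM_|KUBE-)' \
            | sed 's/^-A /-D /' \
            | while read -r rule; do eval "sudo iptables -t $table $rule"; done

          # 3./4. Collect the CILIUM_* and KUBE-* user-defined chains
          chains=$(sudo iptables -t "$table" -S | grep -oP '^-N\s+\K((CILIUM_|KUBE-)[^\s]+)' | sort -u)

          # Flush all of them first, then delete them
          # (a chain cannot be deleted while another chain still jumps to it)
          for chain in $chains; do sudo iptables -t "$table" -F "$chain" 2>/dev/null; done
          for chain in $chains; do sudo iptables -t "$table" -X "$chain" 2>/dev/null; done
        done

        # 5. Clear IPVS (optional)
        sudo ipvsadm --clear
        ```
        - ### Refining the original code
            - **Before**
                ```
                cut -d' ' -f2
                ```
            - **After**
                ```
                grep -oP '^-N\s+\K(CILIUM_[^\s]+)'
                ```
            - **Problem analysis**: the original line accidentally picks up text that is not a chain name, for example:
                ```bash
                -A INPUT -j CILIUM_INPUT   # <-- field 2 is INPUT, not the CILIUM_INPUT you want
                -N CILIUM_INPUT            # <-- the user-defined chain you actually want
                ```
                As a result it may also pick up `INPUT`, `FORWARD`, `OUTPUT`, which leads to:
                - `iptables -F INPUT` being executed (not necessarily wrong, but not what you intended)
                - `iptables -X INPUT` failing, because INPUT is a built-in chain
            - **⚠ Summary:**
                ![](https://hackmd.io/_uploads/rybLr8g7lg.png)
        - **Restore the saved rules (if you removed the wrong ones)**
            ```
            sudo iptables-restore < ~/iptables-backup-2025-06-06.txt
            sudo ipvsadm-restore < ~/ipvs-backup-2025-06-06.txt
            ```
- ### ✅ 6. Reboot the system (recommended)
    ```
    sudo reboot
    ```
- ### ✅ 7. Reinstall Kubernetes (following the main procedure)
    After rebooting, follow the main procedure in this note:
    - install containerd
    - install kubeadm/kubelet/kubectl
    - create a new cluster with `kubeadm init`
    - install the Cilium CNI

---

## [CNI] Cilium
### Common parameters when installing Cilium with Helm
- ### by gpt-4o (2025/06/06)
    The common `--set` parameters when **installing Cilium v1.17.x with Helm**, with their purpose, whether they are needed, and the recommended scenario:

| Parameter | Example value | Needed? | Description |
|-----------|---------------|---------|-------------|
| `kubeProxyReplacement` | `true` | ✅ Required to replace kube-proxy | Enables the kube-proxy replacement (eBPF-based service routing). Since v1.15 only `true`/`false` are accepted; the former `strict` mode corresponds to `true`. |
| `k8sServiceHost` | Node IP (e.g. `10.78.26.241`) | ✅ Required | Address of the Kubernetes API server (`kube-apiserver`). With `kubeProxyReplacement=true`, the agent cannot rely on kube-proxy to reach the API server through its Service IP, so the address must be given directly. |
| `k8sServicePort` | `6443` | ✅ Required | Port of the Kubernetes API server, usually 6443. |
| `ipam.mode` | `kubernetes` | ✅ Strongly recommended | Uses the Kubernetes PodCIDR allocation. Suits clusters initialized with kubeadm; the simplest and most stable option. |
| `operator.replicas` | `1` | ✅ Recommended for single-node/small clusters | Number of Cilium operator replicas. The chart default is 2; for a single node or a small cluster, 1 is enough. |
| `securityContext.privileged` | `true` | ⚙️ Optional | Runs the Cilium agent fully privileged. The chart default is `false`, in which case the agent runs with a specific set of Linux capabilities for its BPF and network operations. |
| `hubble.relay.enabled` | `true` | ⚙️ Optional | Enables Hubble Relay so the Hubble UI can collect data across nodes. |
| `hubble.ui.enabled` | `true` | ⚙️ Optional | Enables the Hubble web UI (its Service is `ClusterIP` by default; `cilium hubble ui` port-forwards to it). Useful during development or for observing network traffic. |
| `cluster.name` | `default` | ⚙️ Recommended | Cluster name; helps with multi-cluster management and Hubble observability. |
| `cluster.id` | `0` | ⚙️ Recommended | Cluster ID used for multi-cluster setups and Cilium ClusterMesh. Can be 0 for a single cluster. |

---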
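As noted under step 4.2, Cilium v1.15 removed the old `kubeProxyReplacement` values. When reusing older install commands, a small check can translate them to the accepted `true`/`false`. This is a minimal sketch, assuming Bash. `normalize_kpr` is a hypothetical helper. It maps `strict` to `true` and `disabled` to `false`. `partial` has no one-to-one replacement, so the sketch rejects it instead of guessing.

```bash
#!/usr/bin/env bash
# Map a kubeProxyReplacement value to one accepted by Cilium >= 1.15 (true/false).
normalize_kpr() {
  case "$1" in
    true|strict)
      echo true ;;     # strict = full kube-proxy replacement
    false|disabled)
      echo false ;;
    partial)
      echo "normalize_kpr: 'partial' has no direct equivalent; review the Cilium upgrade notes" >&2
      return 1 ;;
    *)
      echo "normalize_kpr: unknown value '$1'" >&2
      return 1 ;;
  esac
}
```

Example: `--set kubeProxyReplacement="$(normalize_kpr strict)"` expands to `--set kubeProxyReplacement=true`.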
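### `--set ipam.mode=kubernetes`: what it does (Cilium IPAM mode)
> [IP Address Management (IPAM)](https://docs.cilium.io/en/stable/network/concepts/ipam/index.html)
> ![image](https://hackmd.io/_uploads/SkpSSreXel.png)

- ### by gpt-4o (2025/06/06)
    `--set ipam.mode=kubernetes` selects Cilium's IP address management (IPAM) mode when installing with Helm. Setting it to `kubernetes` means:
    > **Cilium allocates Pod IPs from the PodCIDR that Kubernetes assigns to each node.**

---

## 🔍 Details
Cilium supports several IP allocation modes (`ipam.mode`):

| Mode | Description |
|------|-------------|
| `kubernetes` | Uses the PodCIDR that Kubernetes (kube-controller-manager) assigns to each Node (`spec.podCIDR`). This is the simplest option and the most compatible with kubeadm. |
| `cluster-pool` | Cilium manages its own cluster-wide IP pool and ignores the Node PodCIDR allocation. Suits large or dynamic clusters. This is the chart default. |
| `eni`, `azure`, `alibabacloud`, ... | IPAM modes integrated with cloud platforms |

---

## ✅ When is `kubernetes` the right choice?
- You built the cluster with **kubeadm** and set `--pod-network-cidr=...`
- Each Node has its own Pod CIDR, managed automatically by Kubernetes
- You want the simplest, built-in way to handle IP allocation

---

## ✅ What happens if you omit it?
If you omit `--set ipam.mode=...`, the chart default `cluster-pool` is used. As a result:
- Pod IPs come from Cilium's own pool (default `10.0.0.0/8`), not from the `--pod-network-cidr` given to `kubeadm init`
- That default pool can overlap with existing networks (the node IP `10.78.26.241` in this note lies inside `10.0.0.0/8`) unless you set `ipam.operator.clusterPoolIPv4PodCIDRList`

---

## ✅ Summary
| Set `--set ipam.mode=kubernetes`? | Recommendation | Reason |
|-----------------------------------|----------------|--------|
| ✅ Yes | ✅ Recommended | Simplest and safest; fits small and medium clusters built with `kubeadm` |
| ❌ No (chart default `cluster-pool`) | ⚠️ Riskier | Pod IPs come from a separate pool; configure it explicitly or it may conflict with other networks |

---

### `--set securityContext.privileged=true`: what is it for?
- ### gpt-4o (2025/06/06)
    `--set securityContext.privileged=true` runs the **Cilium agent (Pod) in Linux privileged mode** when installing Cilium with Helm.

---

### 🔍 What is privileged mode?
In Kubernetes, Pods run in a restricted environment by default. Setting `privileged: true` lets the container:
- have nearly the same permissions as root on the host
- use features that need elevated permissions, such as `BPF`, `iptables`, `mount`, and `net_admin`
- load kernel modules or configure network devices (e.g., `tc`, `xfrm`, `cgroup`)

---

### ✅ Why does Cilium need elevated permissions?
As an eBPF-based CNI plugin, Cilium has to:
- load BPF programs into the Linux kernel
- manage network devices and routing tables
- use low-level features such as `cgroup`, `iptables`, and `tc`

None of these are allowed under the default Pod security context. By default the chart grants the agent a specific set of Linux capabilities for this (`securityContext.privileged` defaults to `false`); setting it to `true` grants full privileges instead.

---

### 📌 Effect of running `cilium-agent` as `privileged`:
```
securityContext:
  privileged: true
```
This gives the Cilium Pod:
- ✅ all Linux capabilities
- ✅ the ability to manage BPF maps and programs at the host network level
- ✅ what it needs to start up and set up overlay routing

---

### ⚠️ Security considerations
This is a **high-privilege setting**. Use it only for components like Cilium that need deep access to the Linux kernel, and **never for ordinary workload Pods**.

---

### ✅ Conclusion
| Setting | Effect and necessity |
|---------|----------------------|
| `--set securityContext.privileged=true` | 🟢 Lets Cilium operate on BPF, networking, and kernel settings with full privileges; with the chart's default capability set it is optional rather than strictly required |

{%hackmd vaaMgNRPS4KGJDSFG0ZE0w %}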