# Install Rook Ceph on K8s

## Prerequisites

0. A K8s cluster with at least 3 nodes; here the control plane is also allowed to run workloads as a worker.

```
$ kubectl get no -owide
NAME   STATUS   ROLES           AGE     VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
m1     Ready    control-plane   5d23h   v1.34.1   172.20.1.50   <none>        Ubuntu 24.04.3 LTS   6.8.0-88-generic   cri-o://1.34.1
w1     Ready    <none>          5d23h   v1.34.1   172.20.1.51   <none>        Ubuntu 24.04.3 LTS   6.8.0-88-generic   cri-o://1.34.1
w2     Ready    <none>          4m31s   v1.34.1   172.20.1.52   <none>        Ubuntu 24.04.3 LTS   6.8.0-88-generic   cri-o://1.34.1
```

1. Each node needs one spare raw device.

```
# Make sure every node in the cluster has an extra raw device
$ lsblk
NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda                         8:0    0   30G  0 disk
├─sda1                      8:1    0    1G  0 part /boot/efi
├─sda2                      8:2    0    2G  0 part /boot
└─sda3                      8:3    0 26.9G  0 part
  └─ubuntu--vg-ubuntu--lv 252:0    0 13.5G  0 lvm  /var/lib/containers/storage/overlay
                                                   /
sdb                         8:16   0  200G  0 disk
```

2. Enable the `rbd` kernel module on every node (a quick check covering items 1 and 2 on all nodes is sketched right after this list).

```
$ echo "rbd" | sudo tee /etc/modules-load.d/rbd.conf
$ sudo modprobe rbd
$ lsmod | grep rbd
rbd                   126976  0
libceph               544768  1 rbd
```

3. Make sure CPU and memory resources are sufficient; even for a small cluster, at least 64 GB of memory is recommended ([reference](https://docs.ceph.com/en/latest/start/hardware-recommendations/)).
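Before deploying, it can save time to confirm the raw device and the `rbd` module on every node in one pass. This is only a minimal sketch, assuming passwordless SSH to the node names used above (`m1`, `w1`, `w2`) and that the spare disk is `/dev/sdb`:

```
# Hypothetical pre-flight check; adjust the node names and device to your environment
$ for n in m1 w1 w2; do
    echo "=== $n ==="
    ssh "$n" "lsblk -f /dev/sdb"                              # FSTYPE column should be empty (no partitions/filesystem)
    ssh "$n" "lsmod | grep -w rbd || echo rbd module missing" # rbd module should already be loaded
  done
```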
## Deploy Rook Ceph

```
$ git clone --single-branch --branch master https://github.com/rook/rook.git; cd rook/deploy/examples
```

```
$ kubectl create -f crds.yaml -f common.yaml -f csi-operator.yaml -f operator.yaml
```

* Dedicate each node's `/dev/sdb` disk to Rook Ceph:

```
$ nano cluster.yaml
......
  network:
    connections:
......
    provider: host          # change to host networking
......
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^sdb"    # match the disk name

$ kubectl create -f cluster.yaml
```

* Confirm that the deployment has completed:

```
$ kubectl -n rook-ceph get cephcluster
NAME        DATADIRHOSTPATH   MONCOUNT   AGE     PHASE   MESSAGE                        HEALTH      EXTERNAL   FSID
rook-ceph   /var/lib/rook     3          6m13s   Ready   Cluster created successfully   HEALTH_OK              b295be2e-5166-43dc-8b2f-2adcc9cd457b

$ kubectl -n rook-ceph get po -owide
NAME                                                        READY   STATUS      RESTARTS   AGE     IP              NODE   NOMINATED NODE   READINESS GATES
ceph-csi-controller-manager-6fdd9498b7-8b9xc                1/1     Running     0          3m      10.244.190.65   w1     <none>           <none>
rook-ceph-crashcollector-m1-c9bb6f65b-zjzdl                 1/1     Running     0          38s     172.20.1.50     m1     <none>           <none>
rook-ceph-crashcollector-w1-5f6d64f774-kxkvf                1/1     Running     0          36s     172.20.1.51     w1     <none>           <none>
rook-ceph-crashcollector-w2-84b7f6c957-hzmkd                1/1     Running     0          70s     172.20.1.52     w2     <none>           <none>
rook-ceph-exporter-m1-7894f94574-8k4ps                      1/1     Running     0          35s     172.20.1.50     m1     <none>           <none>
rook-ceph-exporter-w1-8d779d8d4-4fxmr                       1/1     Running     0          33s     172.20.1.51     w1     <none>           <none>
rook-ceph-exporter-w2-848fd9b9ff-qlhms                      1/1     Running     0          69s     172.20.1.52     w2     <none>           <none>
rook-ceph-mgr-a-6b84947bb4-b74lc                            3/3     Running     0          88s     172.20.1.51     w1     <none>           <none>
rook-ceph-mgr-b-d767d4cf9-49mt9                             3/3     Running     0          86s     172.20.1.50     m1     <none>           <none>
rook-ceph-mon-a-86f6d8579b-bb2zz                            2/2     Running     0          2m18s   172.20.1.51     w1     <none>           <none>
rook-ceph-mon-b-5bbbddffb-x8b9q                             2/2     Running     0          2m10s   172.20.1.52     w2     <none>           <none>
rook-ceph-mon-c-7d7ddc6fd8-pqqsm                            2/2     Running     0          119s    172.20.1.50     m1     <none>           <none>
rook-ceph-operator-5f77d65c8c-gpps7                         1/1     Running     0          2m59s   10.244.190.66   w1     <none>           <none>
rook-ceph-osd-0-974cb9b5d-rsc76                             2/2     Running     0          38s     172.20.1.50     m1     <none>           <none>
rook-ceph-osd-1-78f65c5847-b4vql                            2/2     Running     0          39s     172.20.1.52     w2     <none>           <none>
rook-ceph-osd-2-bddf54c8-8lt2c                              2/2     Running     0          36s     172.20.1.51     w1     <none>           <none>
rook-ceph-osd-prepare-m1-z5kdw                              0/1     Completed   0          64s     10.244.202.1    m1     <none>           <none>
rook-ceph-osd-prepare-w1-t7fpt                              0/1     Completed   0          63s     10.244.190.70   w1     <none>           <none>
rook-ceph-osd-prepare-w2-xgp5r                              0/1     Completed   0          62s     10.244.80.198   w2     <none>           <none>
rook-ceph.cephfs.csi.ceph.com-ctrlplugin-584f9c8447-7ccct   5/5     Running     0          2m23s   10.244.190.69   w1     <none>           <none>
rook-ceph.cephfs.csi.ceph.com-ctrlplugin-584f9c8447-z8ljb   5/5     Running     0          2m23s   10.244.80.197   w2     <none>           <none>
rook-ceph.cephfs.csi.ceph.com-nodeplugin-48vl2              2/2     Running     0          2m23s   172.20.1.50     m1     <none>           <none>
rook-ceph.cephfs.csi.ceph.com-nodeplugin-kvs2l              2/2     Running     0          2m23s   172.20.1.52     w2     <none>           <none>
rook-ceph.cephfs.csi.ceph.com-nodeplugin-rqhqf              2/2     Running     0          2m23s   172.20.1.51     w1     <none>           <none>
rook-ceph.rbd.csi.ceph.com-ctrlplugin-77df7bd68c-mjhzx      5/5     Running     0          2m23s   10.244.190.68   w1     <none>           <none>
rook-ceph.rbd.csi.ceph.com-ctrlplugin-77df7bd68c-nqw7h      5/5     Running     0          2m23s   10.244.80.196   w2     <none>           <none>
rook-ceph.rbd.csi.ceph.com-nodeplugin-2cf79                 2/2     Running     0          2m23s   172.20.1.52     w2     <none>           <none>
rook-ceph.rbd.csi.ceph.com-nodeplugin-nq8dg                 2/2     Running     0          2m23s   172.20.1.51     w1     <none>           <none>
rook-ceph.rbd.csi.ceph.com-nodeplugin-xnxsq                 2/2     Running     0          2m23s   172.20.1.50     m1     <none>           <none>
```

* Deploy the Ceph Toolbox so that Ceph can be operated from the CLI:

```
$ kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/toolbox.yaml
```

* Check the Ceph status:

```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
  cluster:
    id:     b295be2e-5166-43dc-8b2f-2adcc9cd457b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: a(active, starting, since 21s), standbys: b
    osd: 3 osds: 3 up (since 87s), 3 in (since 118s)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   81 MiB used, 600 GiB / 600 GiB avail
    pgs:     1 active+clean
```

## Testing

### Testing Block Storage (RBD)

* First, create a Ceph pool:

```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool create kubernetes

# Use the rbd tool to initialize the pool
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd pool init kubernetes

# Limit the pool to a maximum of 10 GB
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool set-quota kubernetes max_bytes $((10 * 1024 * 1024 * 1024))
```

```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool get-quota kubernetes
quotas for pool 'kubernetes':
  max objects: N/A
  max bytes  : 10 GiB  (current num bytes: 7225379 bytes)
```

* Check which pools are now available:

```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool ls
.mgr
kubernetes
```

* Create Ceph client authentication
  * Create a dedicated user account (`kubernetes`) in Ceph and grant it access to the specific pool.

```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes'
[client.kubernetes]
        key = AQAGvDdpTElCExAAxOBdAWq1xMLfTvVVGMqlbQ==   # this is the userKey you will need later
```

```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get-key client.kubernetes
AQAGvDdpTElCExAAxOBdAWq1xMLfTvVVGMqlbQ==
```

![image](https://hackmd.io/_uploads/S1aiJ7Sup.png)

* Get the Ceph monitor and fsid information:

```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph mon dump
epoch 3
fsid b295be2e-5166-43dc-8b2f-2adcc9cd457b        # this is the clusterID
last_changed 2025-12-09T06:00:34.717245+0000
created 2025-12-09T06:00:13.104271+0000
min_mon_release 19 (squid)
election_strategy: 1
0: [v2:172.20.1.51:3300/0,v1:172.20.1.51:6789/0] mon.a
1: [v2:172.20.1.52:3300/0,v1:172.20.1.52:6789/0] mon.b
2: [v2:172.20.1.50:3300/0,v1:172.20.1.50:6789/0] mon.c
```
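The next part switches to a second cluster, so it helps to collect everything that cluster will need (the fsid used as `clusterID`, the monitor addresses, and the cephx key) in one place first. A minimal sketch run against the Rook cluster, assuming the toolbox deployment and the `client.kubernetes` user created above (variable names are only for illustration):

```
$ CLUSTER_ID=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph fsid)
$ USER_KEY=$(kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph auth get-key client.kubernetes)
$ echo "clusterID: $CLUSTER_ID"
$ echo "userKey:   $USER_KEY"
# Monitor addresses (port 6789) for the CSI ConfigMap
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph mon dump 2>/dev/null | grep 6789
```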
* On the other cluster that will mount Rook Ceph RBD volumes, configure and deploy the Ceph-CSI RBD plugins:

```
$ kubectl get no -owide
NAME   STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION     CONTAINER-RUNTIME
m1     Ready    control-plane   3h8m   v1.34.1   172.20.1.60   <none>        Ubuntu 24.04.3 LTS   6.8.0-88-generic   cri-o://1.34.1
w1     Ready    <none>          16m    v1.34.1   172.20.1.61   <none>        Ubuntu 24.04.3 LTS   6.8.0-88-generic   cri-o://1.34.1
w2     Ready    <none>          15m    v1.34.1   172.20.1.62   <none>        Ubuntu 24.04.3 LTS   6.8.0-88-generic   cri-o://1.34.1

$ cd ~ && git clone https://github.com/ceph/ceph-csi.git
```

* Create and switch to the `csi-ceph` namespace:

```
$ kubectl create ns csi-ceph
$ kubectl config set-context --current --namespace=csi-ceph
```

* Configure the Ceph-CSI ConfigMap:

```
$ cd ~/ceph-csi/deploy/rbd/kubernetes
$ cat <<EOF > csi-config-map.yaml
---
apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    [
      {
        "clusterID": "b295be2e-5166-43dc-8b2f-2adcc9cd457b",
        "monitors": [
          "172.20.1.51:6789",
          "172.20.1.52:6789",
          "172.20.1.50:6789"
        ]
      }
    ]
metadata:
  name: ceph-csi-config
EOF
```

> Set the `clusterID` and the monitor IP addresses to match your cluster (see the `ceph mon dump` output above).

* Configure the CSI driver pods not to mount the host's `/etc/selinux` into the pods:

```
$ sed -i 's|seLinuxMount: true|seLinuxMount: false|g' csidriver.yaml
$ sed -i '/- mountPath: \/etc\/selinux/,+2d' csi-rbdplugin.yaml
$ sed -i '/- name: etc-selinux/,+2d' csi-rbdplugin.yaml
```

* Move every object defined in the YAML files into the `csi-ceph` namespace:

```
$ sed -i 's|namespace: default|namespace: csi-ceph|g' *.yaml
```

* Allow the csi-rbdplugin-provisioner and csi-rbdplugin pods to run on the control-plane node:

```
$ sed -i '36i\      tolerations:\n        - operator: Exists' csi-rbdplugin-provisioner.yaml
$ sed -i '24i\      tolerations:\n        - operator: Exists' csi-rbdplugin.yaml
```

* Generate the Ceph-CSI cephx Secret:

```
$ cat <<EOF > ~/ceph-csi/examples/rbd/secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
  namespace: csi-ceph                                 # change
stringData:
  userID: kubernetes                                  # change
  userKey: AQAGvDdpTElCExAAxOBdAWq1xMLfTvVVGMqlbQ==   # change
  # Encryption passphrase
  encryptionPassphrase: test_passphrase
EOF
```

> Set the `namespace`, `userID`, and `userKey` values for your environment.

* Create the Ceph-CSI cephx Secret:

```
$ kubectl apply -f ~/ceph-csi/examples/rbd/secret.yaml
```

* Grant the `csi-ceph` namespace the privileged Pod Security level:

```
$ kubectl label ns csi-ceph pod-security.kubernetes.io/enforce=privileged
```

> The csi-rbdplugin and csi-rbdplugin-provisioner pods need privileged permissions.

* Recent versions of ceph-csi also need another ConfigMap object that defines Ceph settings to be added to the `ceph.conf` file inside the CSI containers:

```
$ kubectl apply -f ~/ceph-csi/deploy/ceph-conf.yaml
```

* Deploy ceph-csi:

```
$ cd ~/ceph-csi/examples/rbd
$ ./plugin-deploy.sh ~/ceph-csi/deploy/rbd/kubernetes

## remove vault
$ kubectl delete -f ../kms/vault/vault.yaml
```

* Check the ceph-csi deployment status:

```
$ kubectl get all
NAME                                             READY   STATUS    RESTARTS      AGE
pod/csi-rbdplugin-lt295                          3/3     Running   0             28m
pod/csi-rbdplugin-provisioner-57ff7fc887-j8gcj   7/7     Running   0             29m
pod/csi-rbdplugin-provisioner-57ff7fc887-s7rd4   7/7     Running   0             29m
pod/csi-rbdplugin-provisioner-57ff7fc887-vvdc4   7/7     Running   0             29m
pod/csi-rbdplugin-vrgpr                          3/3     Running   0             16m
pod/csi-rbdplugin-x7kjh                          3/3     Running   1 (15m ago)   16m

NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/csi-metrics-rbdplugin       ClusterIP   10.96.233.216   <none>        8080/TCP   38m
service/csi-rbdplugin-provisioner   ClusterIP   10.96.72.154    <none>        8080/TCP   38m

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-rbdplugin   3         3         3       3            3           <none>          38m

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/csi-rbdplugin-provisioner   3/3     3            3           38m

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/csi-rbdplugin-provisioner-57ff7fc887   3         3         3       38m
```
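Because this cluster (172.20.1.60–62) lives outside the Rook cluster, the CSI pods must reach the Ceph monitors (172.20.1.50–52) directly, which is why host networking was enabled on the Rook cluster earlier. Before creating the StorageClass, a quick reachability sketch may help, assuming `nc` (netcat) is available on the nodes:

```
# Run from any node of this cluster; both the v1 (6789) and v2 (3300) monitor ports should answer
$ for mon in 172.20.1.50 172.20.1.51 172.20.1.52; do
    nc -vz -w 3 "$mon" 6789
    nc -vz -w 3 "$mon" 3300
  done
```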
* Write the StorageClass YAML:

```
$ cat <<EOF > csi-rbd-sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: b295be2e-5166-43dc-8b2f-2adcc9cd457b   # change
  pool: kubernetes                                  # change
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: csi-ceph
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: csi-ceph
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: csi-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard
EOF
```

> Set the `clusterID` and `pool` values for your environment.

* Create the StorageClass:

```
$ kubectl apply -f csi-rbd-sc.yaml
```

* Check the StorageClass:

```
$ kubectl get sc
NAME         PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
csi-rbd-sc   rbd.csi.ceph.com   Delete          Immediate           true                   8s
```

* Switch back to the `default` namespace:

```
$ kubectl config set-context --current --namespace=default
```

#### Verification

* Write the PVC YAML:

```
$ cat <<EOF > raw-block-pvc-rwo.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
EOF
```

> Note: Using ceph-csi, specifying Filesystem for volumeMode can support both ReadWriteOnce and ReadOnlyMany accessMode claims, and specifying Block for volumeMode can support ReadWriteOnce, ReadWriteMany, and ReadOnlyMany accessMode claims.

* Apply the PVC:

```
$ kubectl apply -f raw-block-pvc-rwo.yaml
```

```
$ kubectl get pvc
NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
raw-block-pvc   Bound    pvc-41c6f577-fee0-4d00-b1b3-55cafbb2943b   1Gi        RWX            csi-rbd-sc     <unset>                 43m
```

* Write the Deployment YAML:

```
$ echo 'apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-with-raw-block-volume
  labels:
    os: alpine
spec:
  replicas: 1
  selector:
    matchLabels:
      os: alpine
  template:
    metadata:
      labels:
        os: alpine
    spec:
      containers:
        - name: alpine
          image: taiwanese/alpine:stable
          imagePullPolicy: IfNotPresent
          command: ["/bin/sleep", "infinity"]
          volumeDevices:
            - name: data
              devicePath: /dev/xvda
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    set -e
                    mkdir /ceph
                    checkformat=$(blkid | grep -w /dev/xvda | cut -d ":" -f1)
                    [[ "$checkformat" != /dev/xvda ]] && (mkfs.xfs /dev/xvda && mount /dev/xvda /ceph) || mount /dev/xvda /ceph
            preStop:
              exec:
                command:
                  - /bin/bash
                  - -c
                  - |
                    umount -f /ceph
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: raw-block-pvc' > raw-block-deployment.yaml
```

```
$ kubectl apply -f raw-block-deployment.yaml
```

```
$ kubectl get pods -l os=alpine -o wide
NAME                                         READY   STATUS    RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
pod-with-raw-block-volume-54bc4c8cbd-975jx   1/1     Running   0          7s    10.244.190.68   w1     <none>           <none>
```

* Inspect the mounted Ceph block device:

```
$ kubectl exec -it pod-with-raw-block-volume-54bc4c8cbd-975jx -- lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0    1G  0 loop
sda      8:0    0   30G  0 disk
├─sda1   8:1    0    1G  0 part
├─sda2   8:2    0    2G  0 part
└─sda3   8:3    0 26.9G  0 part
nbd0    43:0    0    0B  0 disk
nbd1    43:32   0    0B  0 disk
nbd2    43:64   0    0B  0 disk
nbd3    43:96   0    0B  0 disk
nbd4    43:128  0    0B  0 disk
nbd5    43:160  0    0B  0 disk
nbd6    43:192  0    0B  0 disk
nbd7    43:224  0    0B  0 disk
rbd0   251:0    0    1G  0 disk /ceph
```

* Test whether the data persists after the pod is deleted:

```
$ kubectl exec pod-with-raw-block-volume-54bc4c8cbd-975jx -- sh -c "echo 123 > /ceph/test"
$ kubectl delete po pod-with-raw-block-volume-54bc4c8cbd-975jx
```

```
$ kubectl exec pod-with-raw-block-volume-54bc4c8cbd-7b92s -- cat /ceph/test
123
```
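As an optional cross-check back on the Rook cluster, the toolbox can confirm that the PVC is actually backed by an RBD image in the `kubernetes` pool and show how much of the 10 GiB quota it consumes (ceph-csi typically names the images `csi-vol-<uuid>`):

```
# List the RBD images backing the PVCs in the pool
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd ls kubernetes

# Show provisioned vs. used space per image
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd du -p kubernetes
```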
## References

https://rook.io/docs/rook/latest/Getting-Started/quickstart/#tldr