# Install Rook Ceph on K8s
## Prerequisites
0. A Kubernetes cluster with at least three nodes is required; here the control plane is also allowed to run workloads as a worker (see the sketch after this list)
```
$ kubectl get no -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
m1 Ready control-plane 5d23h v1.34.1 172.20.1.50 <none> Ubuntu 24.04.3 LTS 6.8.0-88-generic cri-o://1.34.1
w1 Ready <none> 5d23h v1.34.1 172.20.1.51 <none> Ubuntu 24.04.3 LTS 6.8.0-88-generic cri-o://1.34.1
w2 Ready <none> 4m31s v1.34.1 172.20.1.52 <none> Ubuntu 24.04.3 LTS 6.8.0-88-generic cri-o://1.34.1
```
1. Each node needs an extra raw device (see the check after this list)
```
# Make sure every node in the cluster has the extra raw device
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 1G 0 part /boot/efi
├─sda2 8:2 0 2G 0 part /boot
└─sda3 8:3 0 26.9G 0 part
└─ubuntu--vg-ubuntu--lv 252:0 0 13.5G 0 lvm /var/lib/containers/storage/overlay
/
sdb 8:16 0 200G 0 disk
```
2. Enable the `rbd` kernel module on every node
```
$ echo "rbd" | sudo tee /etc/modules-load.d/rbd.conf
$ sudo modprobe rbd
$ lsmod |grep rbd
rbd 126976 0
libceph 544768 1 rbd
```
3. Make sure CPU and memory resources are sufficient; even for a small cluster, at least 64 GB of memory is recommended ([reference](https://docs.ceph.com/en/latest/start/hardware-recommendations/)); see the quick check after this list
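The sketches below expand on the prerequisites above. They are illustrative only and reuse this environment's node names (m1, w1, w2) and device name (/dev/sdb); adjust them to your own cluster.

Letting the control plane also act as a worker (item 0) usually just means removing the default taint, assuming a kubeadm-provisioned cluster:
```
# Remove the NoSchedule taint so m1 can also run workloads
$ kubectl taint nodes m1 node-role.kubernetes.io/control-plane:NoSchedule-
```
To confirm the extra disk (item 1) is really raw, check that it carries no filesystem or partition signatures; wipefs is destructive, so only run it on the disk you intend to hand to Ceph:
```
# An empty FSTYPE column means no filesystem signature is present
$ lsblk -f /dev/sdb
# If the disk was used before, clear old signatures (destructive!)
$ sudo wipefs --all /dev/sdb
```
A quick way to eyeball allocatable CPU and memory (item 3) without metrics-server:
```
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
```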
## Deploy Rook Ceph
```
$ git clone --single-branch --branch master https://github.com/rook/rook.git;cd rook/deploy/examples
```
```
$ kubectl create -f crds.yaml -f common.yaml -f csi-operator.yaml -f operator.yaml
```
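Before creating the cluster it can help to confirm the operator is up; a minimal check, assuming the default `app=rook-ceph-operator` label used by the example manifests:
```
# Wait until the operator pod reports Ready (adjust the timeout to taste)
$ kubectl -n rook-ceph wait --for=condition=Ready pod -l app=rook-ceph-operator --timeout=300s
```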
* Dedicate each node's `/dev/sdb` disk to Rook Ceph
```
$ nano cluster.yaml
......
  network:
    connections:
      ......
    provider: host # change this to use host networking
  ......
  storage: # cluster level storage configuration and selection
    useAllNodes: true
    useAllDevices: false
    deviceFilter: "^sdb" # specify the device name
$ kubectl create -f cluster.yaml
```
* Confirm the deployment is complete
```
$ kubectl -n rook-ceph get cephcluster
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL FSID
rook-ceph /var/lib/rook 3 6m13s Ready Cluster created successfully HEALTH_OK b295be2e-5166-43dc-8b2f-2adcc9cd457b
$ kubectl -n rook-ceph get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ceph-csi-controller-manager-6fdd9498b7-8b9xc 1/1 Running 0 3m 10.244.190.65 w1 <none> <none>
rook-ceph-crashcollector-m1-c9bb6f65b-zjzdl 1/1 Running 0 38s 172.20.1.50 m1 <none> <none>
rook-ceph-crashcollector-w1-5f6d64f774-kxkvf 1/1 Running 0 36s 172.20.1.51 w1 <none> <none>
rook-ceph-crashcollector-w2-84b7f6c957-hzmkd 1/1 Running 0 70s 172.20.1.52 w2 <none> <none>
rook-ceph-exporter-m1-7894f94574-8k4ps 1/1 Running 0 35s 172.20.1.50 m1 <none> <none>
rook-ceph-exporter-w1-8d779d8d4-4fxmr 1/1 Running 0 33s 172.20.1.51 w1 <none> <none>
rook-ceph-exporter-w2-848fd9b9ff-qlhms 1/1 Running 0 69s 172.20.1.52 w2 <none> <none>
rook-ceph-mgr-a-6b84947bb4-b74lc 3/3 Running 0 88s 172.20.1.51 w1 <none> <none>
rook-ceph-mgr-b-d767d4cf9-49mt9 3/3 Running 0 86s 172.20.1.50 m1 <none> <none>
rook-ceph-mon-a-86f6d8579b-bb2zz 2/2 Running 0 2m18s 172.20.1.51 w1 <none> <none>
rook-ceph-mon-b-5bbbddffb-x8b9q 2/2 Running 0 2m10s 172.20.1.52 w2 <none> <none>
rook-ceph-mon-c-7d7ddc6fd8-pqqsm 2/2 Running 0 119s 172.20.1.50 m1 <none> <none>
rook-ceph-operator-5f77d65c8c-gpps7 1/1 Running 0 2m59s 10.244.190.66 w1 <none> <none>
rook-ceph-osd-0-974cb9b5d-rsc76 2/2 Running 0 38s 172.20.1.50 m1 <none> <none>
rook-ceph-osd-1-78f65c5847-b4vql 2/2 Running 0 39s 172.20.1.52 w2 <none> <none>
rook-ceph-osd-2-bddf54c8-8lt2c 2/2 Running 0 36s 172.20.1.51 w1 <none> <none>
rook-ceph-osd-prepare-m1-z5kdw 0/1 Completed 0 64s 10.244.202.1 m1 <none> <none>
rook-ceph-osd-prepare-w1-t7fpt 0/1 Completed 0 63s 10.244.190.70 w1 <none> <none>
rook-ceph-osd-prepare-w2-xgp5r 0/1 Completed 0 62s 10.244.80.198 w2 <none> <none>
rook-ceph.cephfs.csi.ceph.com-ctrlplugin-584f9c8447-7ccct 5/5 Running 0 2m23s 10.244.190.69 w1 <none> <none>
rook-ceph.cephfs.csi.ceph.com-ctrlplugin-584f9c8447-z8ljb 5/5 Running 0 2m23s 10.244.80.197 w2 <none> <none>
rook-ceph.cephfs.csi.ceph.com-nodeplugin-48vl2 2/2 Running 0 2m23s 172.20.1.50 m1 <none> <none>
rook-ceph.cephfs.csi.ceph.com-nodeplugin-kvs2l 2/2 Running 0 2m23s 172.20.1.52 w2 <none> <none>
rook-ceph.cephfs.csi.ceph.com-nodeplugin-rqhqf 2/2 Running 0 2m23s 172.20.1.51 w1 <none> <none>
rook-ceph.rbd.csi.ceph.com-ctrlplugin-77df7bd68c-mjhzx 5/5 Running 0 2m23s 10.244.190.68 w1 <none> <none>
rook-ceph.rbd.csi.ceph.com-ctrlplugin-77df7bd68c-nqw7h 5/5 Running 0 2m23s 10.244.80.196 w2 <none> <none>
rook-ceph.rbd.csi.ceph.com-nodeplugin-2cf79 2/2 Running 0 2m23s 172.20.1.52 w2 <none> <none>
rook-ceph.rbd.csi.ceph.com-nodeplugin-nq8dg 2/2 Running 0 2m23s 172.20.1.51 w1 <none> <none>
rook-ceph.rbd.csi.ceph.com-nodeplugin-xnxsq 2/2 Running 0 2m23s 172.20.1.50 m1 <none> <none>
```
* Deploy the Ceph Toolbox so Ceph can be managed from the CLI
```
$ kubectl apply -f https://raw.githubusercontent.com/rook/rook/master/deploy/examples/toolbox.yaml
```
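The toolbox may take a moment to pull its image; a simple way to wait for it before running Ceph commands:
```
$ kubectl -n rook-ceph rollout status deploy/rook-ceph-tools
```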
* Check the Ceph cluster status with the toolbox
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
  cluster:
    id:     b295be2e-5166-43dc-8b2f-2adcc9cd457b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 2m)
    mgr: a(active, starting, since 21s), standbys: b
    osd: 3 osds: 3 up (since 87s), 3 in (since 118s)

  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   81 MiB used, 600 GiB / 600 GiB avail
    pgs:     1 active+clean
```
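Beyond `ceph status`, a couple of extra toolbox commands give a quick view of the OSD topology and capacity; these are standard Ceph CLI calls:
```
# Show which OSD lives on which host
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
# Show raw and per-pool capacity usage
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df
```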
## Testing
### Test Block Storage (RBD)
* First, create a Ceph pool
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool create kubernetes
# Use the rbd tool to initialize the pool
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd pool init kubernetes
# Set the pool quota to a maximum of 10 GB
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool set-quota kubernetes max_bytes $((10 * 1024 * 1024 * 1024))
```
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool get-quota kubernetes
quotas for pool 'kubernetes':
max objects: N/A
max bytes : 10 GiB (current num bytes: 7225379 bytes)
```
* List the pools that are currently available
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool ls
.mgr
kubernetes
```
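If you want to confirm how the new pool replicates data, its replica count and the PG autoscaler state can be read back with standard Ceph commands (not specific to Rook):
```
# Replica count of the kubernetes pool (typically 3 by default)
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool get kubernetes size
# PG autoscaler overview for all pools
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd pool autoscale-status
```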
* Create Ceph client authentication
  * Create a dedicated user (kubernetes) in Ceph and grant it access to the specific pool
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes'
[client.kubernetes]
key = AQAGvDdpTElCExAAxOBdAWq1xMLfTvVVGMqlbQ== # this is the userKey you will need later
```
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get-key client.kubernetes
AQAGvDdpTElCExAAxOBdAWq1xMLfTvVVGMqlbQ==
```
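To double-check what the new client is actually allowed to do, the full entity including its caps can be printed; this is a read-only query:
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph auth get client.kubernetes
```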

* Get the Ceph monitor and fsid information
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph mon dump
epoch 3
fsid b295be2e-5166-43dc-8b2f-2adcc9cd457b # this line is the clusterID
last_changed 2025-12-09T06:00:34.717245+0000
created 2025-12-09T06:00:13.104271+0000
min_mon_release 19 (squid)
election_strategy: 1
0: [v2:172.20.1.51:3300/0,v1:172.20.1.51:6789/0] mon.a
1: [v2:172.20.1.52:3300/0,v1:172.20.1.52:6789/0] mon.b
2: [v2:172.20.1.50:3300/0,v1:172.20.1.50:6789/0] mon.c
```
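If you only need the fsid (the clusterID), it can also be read directly instead of parsing the mon dump:
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph fsid
```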
* On the other cluster that will mount Rook Ceph RBD volumes, configure and deploy the Ceph-CSI RBD plugins
```
$ kubectl get no -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
m1 Ready control-plane 3h8m v1.34.1 172.20.1.60 <none> Ubuntu 24.04.3 LTS 6.8.0-88-generic cri-o://1.34.1
w1 Ready <none> 16m v1.34.1 172.20.1.61 <none> Ubuntu 24.04.3 LTS 6.8.0-88-generic cri-o://1.34.1
w2 Ready <none> 15m v1.34.1 172.20.1.62 <none> Ubuntu 24.04.3 LTS 6.8.0-88-generic cri-o://1.34.1
$ cd ~ && git clone https://github.com/ceph/ceph-csi.git
```
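The clone above tracks the repository's default branch; for a more reproducible setup you may prefer to pin a release tag (the tag name is left as a placeholder, since the right version depends on your Kubernetes and Ceph versions):
```
# List the most recent release tags, then check one out
$ git -C ~/ceph-csi tag --sort=-v:refname | head -n 5
$ git -C ~/ceph-csi checkout <release-tag>
```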
* Create and switch to the `csi-ceph` namespace
```
$ kubectl create ns csi-ceph
$ kubectl config set-context --current --namespace=csi-ceph
```
* Configure the Ceph-CSI ConfigMap
```
$ cd ~/ceph-csi/deploy/rbd/kubernetes
$ cat <<EOF > csi-config-map.yaml
---
apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    [
      {
        "clusterID": "b295be2e-5166-43dc-8b2f-2adcc9cd457b",
        "monitors": [
          "172.20.1.51:6789",
          "172.20.1.52:6789",
          "172.20.1.50:6789"
        ]
      }
    ]
metadata:
  name: ceph-csi-config
EOF
```
> Set clusterID and the monitor IP addresses to match your cluster (see the mon dump output above)
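Before moving on, a client-side dry run is a cheap way to confirm the heredoc produced valid YAML; it does not create anything in the cluster:
```
$ kubectl apply --dry-run=client -f csi-config-map.yaml
```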
* Configure the CSI driver pods not to mount the host's `/etc/selinux` into the pods
```
$ sed -i 's|seLinuxMount: true|seLinuxMount: false|g' csidriver.yaml
$ sed -i '/- mountPath: \/etc\/selinux/,+2d' csi-rbdplugin.yaml
$ sed -i '/- name: etc-selinux/,+2d' csi-rbdplugin.yaml
```
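A quick grep confirms the sed edits took effect; ideally the first command shows `seLinuxMount: false` and the second prints nothing:
```
$ grep -n seLinuxMount csidriver.yaml
$ grep -in selinux csi-rbdplugin.yaml
```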
* Change all objects defined in the YAML files to the csi-ceph namespace
```
$ sed -i 's|namespace: default|namespace: csi-ceph|g' *.yaml
```
* Allow the csi-rbdplugin-provisioner and csi-rbdplugin pods to run on control-plane nodes
```
$ sed -i '36i\ tolerations:\n - operator: Exists' csi-rbdplugin-provisioner.yaml
$ sed -i '24i\ tolerations:\n - operator: Exists' csi-rbdplugin.yaml
```
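Because these sed commands insert at fixed line numbers, the offsets may shift between ceph-csi versions; a client-side dry run is a quick sanity check that the edited manifests still parse:
```
$ kubectl apply --dry-run=client -f csi-rbdplugin.yaml -f csi-rbdplugin-provisioner.yaml
```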
* Generate the Ceph-CSI cephx Secret
```
$ cat <<EOF > ~/ceph-csi/examples/rbd/secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
  namespace: csi-ceph # change
stringData:
  userID: kubernetes # change
  userKey: AQAGvDdpTElCExAAxOBdAWq1xMLfTvVVGMqlbQ== # change
  # Encryption passphrase
  encryptionPassphrase: test_passphrase
EOF
```
> Set the namespace, userID, and userKey values
* Create the Ceph-CSI cephx Secret
```
$ kubectl apply -f ~/ceph-csi/examples/rbd/secret.yaml
```
* Grant the csi-ceph namespace privileged Pod Security admission
```
$ kubectl label ns csi-ceph pod-security.kubernetes.io/enforce=privileged
```
> The csi-rbdplugin and csi-rbdplugin-provisioner pods require privileged permissions.
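If the cluster also enforces audit/warn Pod Security modes, you may want to set all three labels at once; a possible variant of the command above:
```
$ kubectl label ns csi-ceph \
    pod-security.kubernetes.io/enforce=privileged \
    pod-security.kubernetes.io/audit=privileged \
    pod-security.kubernetes.io/warn=privileged --overwrite
```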
* Recent versions of ceph-csi also need an additional ConfigMap object that defines Ceph configuration to be added to the ceph.conf file inside the CSI containers
```
$ kubectl apply -f ~/ceph-csi/deploy/ceph-conf.yaml
```
* Deploy ceph-csi
```
$ cd ~/ceph-csi/examples/rbd
$ ./plugin-deploy.sh ~/ceph-csi/deploy/rbd/kubernetes
## remove vault
$ kubectl delete -f ../kms/vault/vault.yaml
```
* Check the ceph-csi deployment status
```
$ kubectl get all
NAME READY STATUS RESTARTS AGE
pod/csi-rbdplugin-lt295 3/3 Running 0 28m
pod/csi-rbdplugin-provisioner-57ff7fc887-j8gcj 7/7 Running 0 29m
pod/csi-rbdplugin-provisioner-57ff7fc887-s7rd4 7/7 Running 0 29m
pod/csi-rbdplugin-provisioner-57ff7fc887-vvdc4 7/7 Running 0 29m
pod/csi-rbdplugin-vrgpr 3/3 Running 0 16m
pod/csi-rbdplugin-x7kjh 3/3 Running 1 (15m ago) 16m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/csi-metrics-rbdplugin ClusterIP 10.96.233.216 <none> 8080/TCP 38m
service/csi-rbdplugin-provisioner ClusterIP 10.96.72.154 <none> 8080/TCP 38m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/csi-rbdplugin 3 3 3 3 3 <none> 38m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/csi-rbdplugin-provisioner 3/3 3 3 38m
NAME DESIRED CURRENT READY AGE
replicaset.apps/csi-rbdplugin-provisioner-57ff7fc887 3 3 3 38m
```
* Create the StorageClass manifest
```
$ cat <<EOF > csi-rbd-sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: b295be2e-5166-43dc-8b2f-2adcc9cd457b # change
  pool: kubernetes # change
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: csi-ceph
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: csi-ceph
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: csi-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard
EOF
```
> Set the clusterID and pool values
* Create the StorageClass
```
$ kubectl apply -f csi-rbd-sc.yaml
```
* Check the StorageClass
```
$ kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
csi-rbd-sc rbd.csi.ceph.com Delete Immediate true 8s
```
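Optionally, the new StorageClass can be marked as the cluster default so PVCs without a storageClassName use it; skip this if another default already exists:
```
$ kubectl patch storageclass csi-rbd-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```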
* Switch back to the `default` namespace
```
$ kubectl config set-context --current --namespace=default
```
#### Verification
* Create the PVC manifest
```
$ cat <<EOF > raw-block-pvc-rwo.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
EOF
```
> Note: Using ceph-csi, specifying Filesystem for volumeMode can support both ReadWriteOnce and ReadOnlyMany accessMode claims, and specifying Block for volumeMode can support ReadWriteOnce, ReadWriteMany, and ReadOnlyMany accessMode claims.
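For comparison with the Block-mode claim above, a Filesystem-mode PVC against the same StorageClass would look roughly like this; it is illustrative only and not used in the rest of this walkthrough:
```
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-fs-pvc   # hypothetical name, not created below
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
```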
* Apply the PVC
```
$ kubectl apply -f raw-block-pvc-rwo.yaml
```
```
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
raw-block-pvc Bound pvc-41c6f577-fee0-4d00-b1b3-55cafbb2943b 1Gi RWX csi-rbd-sc <unset> 43m
```
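Back on the Rook Ceph cluster (where the toolbox runs), you can confirm that the bound PVC resulted in an RBD image in the kubernetes pool; image names are generated by the CSI driver, so yours will differ:
```
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd ls kubernetes
$ kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- rbd du -p kubernetes
```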
* Create the Deployment manifest
```
$ echo 'apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-with-raw-block-volume
  labels:
    os: alpine
spec:
  replicas: 1
  selector:
    matchLabels:
      os: alpine
  template:
    metadata:
      labels:
        os: alpine
    spec:
      containers:
        - name: alpine
          image: taiwanese/alpine:stable
          imagePullPolicy: IfNotPresent
          command: ["/bin/sleep", "infinity"]
          volumeDevices:
            - name: data
              devicePath: /dev/xvda
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    set -e
                    mkdir /ceph
                    checkformat=$(blkid | grep -w /dev/xvda | cut -d ':' -f1)
                    [[ "$checkformat" != /dev/xvda ]] && (mkfs.xfs /dev/xvda && mount /dev/xvda /ceph) || mount /dev/xvda /ceph
            preStop:
              exec:
                command:
                  - /bin/bash
                  - -c
                  - |
                    umount -f /ceph
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: raw-block-pvc' > raw-block-deployment.yaml
```
```
$ kubectl apply -f raw-block-deployment.yaml
```
```
$ kubectl get pods -l os=alpine -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod-with-raw-block-volume-54bc4c8cbd-975jx 1/1 Running 0 7s 10.244.190.68 w1 <none> <none>
```
* Inspect the mounted Ceph block device
```
$ kubectl exec -it pod-with-raw-block-volume-54bc4c8cbd-975jx -- lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 1G 0 loop
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 1G 0 part
├─sda2 8:2 0 2G 0 part
└─sda3 8:3 0 26.9G 0 part
nbd0 43:0 0 0B 0 disk
nbd1 43:32 0 0B 0 disk
nbd2 43:64 0 0B 0 disk
nbd3 43:96 0 0B 0 disk
nbd4 43:128 0 0B 0 disk
nbd5 43:160 0 0B 0 disk
nbd6 43:192 0 0B 0 disk
nbd7 43:224 0 0B 0 disk
rbd0 251:0 0 1G 0 disk /ceph
```
* Verify that the data persists after the pod is deleted
```
$ kubectl exec pod-with-raw-block-volume-54bc4c8cbd-975jx -- sh -c "echo 123 > /ceph/test"
$ kubectl delete po pod-with-raw-block-volume-54bc4c8cbd-975jx
```
```
$ kubectl exec pod-with-raw-block-volume-54bc4c8cbd-7b92s -- cat /ceph/test
123
```
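Once the test is finished, the objects can be cleaned up in reverse order; because the StorageClass uses reclaimPolicy: Delete, removing the PVC also removes the backing RBD image:
```
$ kubectl delete -f raw-block-deployment.yaml
$ kubectl delete -f raw-block-pvc-rwo.yaml
```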
## References
https://rook.io/docs/rook/latest/Getting-Started/quickstart/#tldr