# Test Ceph RBD with Talos Kubernetes

## Goal

* Verify whether Deployment objects can share the same Ceph RBD image

## Test environment

* A Talos Kubernetes cluster with 1 control-plane node and 2 workers is already prepared

```
$ kubectl get nodes -o wide
NAME      STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE         KERNEL-VERSION   CONTAINER-RUNTIME
andy-m1   Ready    control-plane   3h2m   v1.29.0   172.20.0.51   <none>        Talos (v1.6.1)   6.1.69-talos     containerd://1.7.11
andy-w1   Ready    <none>          175m   v1.29.0   172.20.0.52   <none>        Talos (v1.6.1)   6.1.69-talos     containerd://1.7.11
andy-w2   Ready    <none>          175m   v1.29.0   172.20.0.53   <none>        Talos (v1.6.1)   6.1.69-talos     containerd://1.7.11
```

```
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
NAME      TAINTS
andy-m1   [map[effect:NoSchedule key:node-role.kubernetes.io/control-plane]]
andy-w1   <none>
andy-w2   <none>
```

> Talos OS version: v1.6.1
> K8S version: v1.29.0
> Ceph version: 18.2.0 Reef (stable)
> A Pod scheduled via `nodeName` bypasses node taints entirely; alternatively, a toleration lets a Pod tolerate a node's taints.

## Ceph setup

* Create the Ceph pool

```
$ ceph osd pool create kubernetes

# Use the rbd tool to initialize the pool
$ rbd pool init kubernetes

# Cap the pool at 10 GiB
$ ceph osd pool set-quota kubernetes max_bytes $((10 * 1024 * 1024 * 1024))
```

* Check which pools are available

```
$ ceph osd pool ls
.mgr
VMs
cephfs_data
cephfs_metadata
k8sfs_data
k8sfs_metadata
OSCDC
kubernetes
```

* Create the Ceph client credentials

```
$ ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kubernetes' mgr 'profile rbd pool=kubernetes'
[client.kubernetes]
        key = AQBHwpdlkLnxGRAAIRXPJ6ytaHZi1fLPjmOxkQ==
```

![image](https://hackmd.io/_uploads/S1aiJ7Sup.png)

## Configure and deploy the Ceph-CSI RBD plugin

* On the management host outside the Talos cluster, download ceph-csi

```
$ cd ~ && git clone https://github.com/ceph/ceph-csi.git
```

* Create the csi-ceph namespace and switch to it

```
$ kubectl create ns csi-ceph
namespace/csi-ceph created

$ kubectl config set-context --current --namespace=csi-ceph
Context "admin@bobo" modified.
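# Optional sanity check (an aside, not in the original notes): confirm the
# active namespace really is csi-ceph before deploying anything into it
$ kubectl config view --minify -o jsonpath='{..namespace}'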
```

* Configure the ceph-csi settings
  - Get the Ceph monitor addresses and fsid
  - Run the following on a Proxmox Ceph node (monitor):

```
$ ceph mon dump
epoch 9
fsid af1d2e23-01ab-4d9c-a395-3bc77ec3fd72    # this line is the clusterID
last_changed 2023-12-21T01:14:58.392961+0800
created 2023-11-23T01:44:42.944087+0800
min_mon_release 17 (quincy)
election_strategy: 1
0: [v2:192.168.200.201:3300/0,v1:192.168.200.201:6789/0] mon.node1
1: [v2:192.168.200.202:3300/0,v1:192.168.200.202:6789/0] mon.node2
2: [v2:192.168.200.203:3300/0,v1:192.168.200.203:6789/0] mon.node3
dumped monmap epoch 9
```

* Create the ceph-csi ConfigMap
  - Run the following on the external management host:

```
$ cd ~/ceph-csi/deploy/rbd/kubernetes
$ cat <<EOF > csi-config-map.yaml
---
apiVersion: v1
kind: ConfigMap
data:
  config.json: |-
    [
      {
        "clusterID": "af1d2e23-01ab-4d9c-a395-3bc77ec3fd72",
        "monitors": [
          "192.168.200.201:6789",
          "192.168.200.202:6789",
          "192.168.200.203:6789"
        ]
      }
    ]
metadata:
  name: ceph-csi-config
EOF
```

> Set clusterID and the monitor IP addresses to match your cluster

* Stop the csidriver Pods from mounting the host's /etc/selinux into the containers

```
$ sed -i 's|seLinuxMount: true|seLinuxMount: false|g' csidriver.yaml
$ sed -i '/- mountPath: \/etc\/selinux/,+2d' csi-rbdplugin.yaml
$ sed -i '/- name: etc-selinux/,+2d' csi-rbdplugin.yaml
```

* Move every object defined in the YAML files into the csi-ceph namespace

```
$ sed -i 's|namespace: default|namespace: csi-ceph|g' *.yaml
```

* Allow the csi-rbdplugin-provisioner and csi-rbdplugin Pods to run on control-plane nodes

```
$ sed -i '36i\      tolerations:\n        - operator: Exists' csi-rbdplugin-provisioner.yaml
$ sed -i '24i\      tolerations:\n        - operator: Exists' csi-rbdplugin.yaml
```

* Generate the ceph-csi cephx Secret

```
$ cat <<EOF > ~/ceph-csi/examples/rbd/secret.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
  namespace: csi-ceph # change
stringData:
  userID: kubernetes # change
  userKey: AQBtn5dlJ6pVMRAAyGV/AYYmQzOSl9gHQ7rg3Q== # change
  # Encryption passphrase
  encryptionPassphrase: test_passphrase
EOF
```

> Set namespace, userID, and userKey; userKey must be the key returned by `ceph auth get-or-create` above

* Create the cephx Secret

```
$ kubectl apply -f \
  ~/ceph-csi/examples/rbd/secret.yaml
```

* Deploy ceph-csi
  - Allow Pods in the csi-ceph namespace to run with privileged permissions
  - If the Talos cluster already has `admissionControl` configured accordingly, this command can be skipped

```
$ kubectl label ns csi-ceph pod-security.kubernetes.io/enforce=privileged
```

> The csi-rbdplugin and csi-rbdplugin-provisioner Pods need privileged permissions.
> By default, Talos K8S Pod Security Admission forbids privileged Pods; applying the label above to the namespace overrides that default.

* Recent versions of ceph-csi also need a second ConfigMap that defines Ceph settings to be appended to the ceph.conf file inside the CSI containers

```
$ kubectl apply -f ~/ceph-csi/deploy/ceph-conf.yaml
```

* Deploy ceph-csi

```
$ cd ~/ceph-csi/examples/rbd
$ ./plugin-deploy.sh ~/ceph-csi/deploy/rbd/kubernetes

## Remove vault
$ kubectl delete -f ../kms/vault/vault.yaml
```

* Check the deployment status

```
$ kubectl get all
NAME                                             READY   STATUS    RESTARTS      AGE
pod/csi-rbdplugin-ksnpd                          3/3     Running   1 (45s ago)   79s
pod/csi-rbdplugin-lbdzj                          3/3     Running   1 (44s ago)   79s
pod/csi-rbdplugin-provisioner-86dfdb7b57-9hf7x   7/7     Running   0             79s
pod/csi-rbdplugin-provisioner-86dfdb7b57-dz4jk   7/7     Running   1 (43s ago)   79s
pod/csi-rbdplugin-provisioner-86dfdb7b57-g4nhh   7/7     Running   1 (43s ago)   79s
pod/csi-rbdplugin-qn257                          3/3     Running   1 (44s ago)   79s

NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/csi-metrics-rbdplugin       ClusterIP   10.97.225.41     <none>        8080/TCP   79s
service/csi-rbdplugin-provisioner   ClusterIP   10.101.179.140   <none>        8080/TCP   79s

NAME                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/csi-rbdplugin   3         3         3       3            3           <none>          79s

NAME                                        READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/csi-rbdplugin-provisioner   3/3     3            3           79s

NAME                                                   DESIRED   CURRENT   READY   AGE
replicaset.apps/csi-rbdplugin-provisioner-86dfdb7b57   3         3         3       79s
```

* Write the StorageClass YAML

```
$ cat <<EOF > csi-rbd-sc.yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: af1d2e23-01ab-4d9c-a395-3bc77ec3fd72 # change
  pool: kubernetes
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
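  # note (added for clarity): each CSI stage -- provision, controller-expand,
  # node-stage -- references its cephx Secret separately; all three reuse the
  # same csi-rbd-secret here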
  csi.storage.k8s.io/provisioner-secret-namespace: csi-ceph
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: csi-ceph
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: csi-ceph
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard
EOF
```

> Set clusterID and pool to match your cluster

* Create the StorageClass

```
$ kubectl apply -f csi-rbd-sc.yaml
```

* Inspect the StorageClass

```
$ kubectl get sc
NAME         PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
csi-rbd-sc   rbd.csi.ceph.com   Delete          Immediate           true                   12s
```

## Verification

* Write the PVC YAML

```
$ cat <<EOF > raw-block-pvc-rwo.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
EOF
```

> Note: Using ceph-csi, specifying Filesystem for volumeMode can support both ReadWriteOnce and ReadOnlyMany accessMode claims, and specifying Block for volumeMode can support ReadWriteOnce, ReadWriteMany, and ReadOnlyMany accessMode claims.
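The note above can be made concrete: a Filesystem-mode claim against the same StorageClass would look like the sketch below (a hypothetical counterpart to `raw-block-pvc-rwo.yaml`; it was not part of this test).

```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem   # the CSI driver formats and mounts the RBD image for you
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-rbd-sc
```

With `volumeMode: Filesystem` the container consumes the volume through `volumeMounts` instead of `volumeDevices`, so no manual `mkfs`/`mount` in a postStart hook is needed.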
> ReadWriteMany still has problems in this test

* Create the PVC

```
$ kubectl apply -f raw-block-pvc-rwo.yaml
```

```
$ kubectl get pvc
NAME            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
raw-block-pvc   Bound    pvc-41c6f577-fee0-4d00-b1b3-55cafbb2943b   1Gi        RWX            csi-rbd-sc     <unset>                 43m
```

* Write the Deployment YAML

```
$ cat <<'EOF' > raw-block-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-with-raw-block-volume
  labels:
    os: alpine
spec:
  replicas: 1
  selector:
    matchLabels:
      os: alpine
  template:
    metadata:
      labels:
        os: alpine
    spec:
      containers:
        - name: alpine
          image: taiwanese/alpine:stable
          imagePullPolicy: IfNotPresent
          command: ["/bin/sleep", "infinity"]
          volumeDevices:
            - name: data
              devicePath: /dev/xvda
          securityContext:
            capabilities:
              add: ["SYS_ADMIN"]
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    set -e
                    mkdir -p /ceph
                    # format the raw device only if blkid shows no filesystem on it yet
                    checkformat=$(blkid | grep -w /dev/xvda | cut -d ':' -f1)
                    if [ "$checkformat" != /dev/xvda ]; then
                      mkfs.xfs /dev/xvda
                    fi
                    mount /dev/xvda /ceph
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: raw-block-pvc
EOF
```

```
$ kubectl apply -f raw-block-deployment.yaml
```

```
$ kubectl get pods -l os=alpine -o wide
NAME                                         READY   STATUS    RESTARTS   AGE     IP            NODE      NOMINATED NODE   READINESS GATES
pod-with-raw-block-volume-6577bb5c48-2w5vk   1/1     Running   0          5m51s   10.244.2.66   andy-w2   <none>           <none>
```

* Inspect the mounted Ceph block device

```
$ kubectl exec -it pod-with-raw-block-volume-65c5dfb6c8-2w5vk -- lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0  564K  1 loop
loop1    7:1    0 53.5M  1 loop
loop2    7:2    0    1G  0 loop
sda      8:0    0 21.2G  0 disk
├─sda1   8:1    0  100M  0 part
├─sda2   8:2    0    1M  0 part
├─sda3   8:3    0 1000M  0 part
├─sda4   8:4    0    1M  0 part
├─sda5   8:5    0  100M  0 part
└─sda6   8:6    0   20G  0 part /etc/resolv.conf
                                /etc/hostname
                                /dev/termination-log
                                /etc/hosts
rbd0   252:0    0    1G  0 disk /ceph
```

* Test whether data survives Pod deletion

```
$ kubectl exec pod-with-raw-block-volume-65c5dfb6c8-2w5vk -- sh -c "echo 123 > /ceph/test"
$ kubectl delete po \
  pod-with-raw-block-volume-65c5dfb6c8-2w5vk --force
```

```
$ kubectl exec pod-with-raw-block-volume-65c5dfb6c8-cmgw9 -- cat /ceph/test
123
```

### List the RBD images in the kubernetes pool

- Run the following on a Ceph node

```
$ rbd ls kubernetes
csi-vol-aeb47446-7e34-48d9-8a5b-4d2bcf5fcb9d
```

* Show the details of the RBD image

```
$ rbd info kubernetes/csi-vol-aeb47446-7e34-48d9-8a5b-4d2bcf5fcb9d
rbd image 'csi-vol-aeb47446-7e34-48d9-8a5b-4d2bcf5fcb9d':
        size 1 GiB in 256 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 7a58fc8d452398
        block_name_prefix: rbd_data.7a58fc8d452398
        format: 2
        features: layering
        op_features:
        flags:
        create_timestamp: Mon Jan  8 15:16:14 2024
        access_timestamp: Mon Jan  8 15:16:14 2024
        modify_timestamp: Mon Jan  8 15:16:14 2024
```

* Check the objects

```
$ rados -p kubernetes ls | grep rbd_data.7a58fc8d452398
rbd_data.7a58fc8d452398.00000000000000e0
rbd_data.7a58fc8d452398.0000000000000020
rbd_data.7a58fc8d452398.0000000000000040
rbd_data.7a58fc8d452398.0000000000000060
rbd_data.7a58fc8d452398.0000000000000080
rbd_data.7a58fc8d452398.00000000000000ff
rbd_data.7a58fc8d452398.00000000000000a0
rbd_data.7a58fc8d452398.0000000000000000
rbd_data.7a58fc8d452398.00000000000000c0
```

## Tear down the test environment

```
## Run the following on the external management host
$ kubectl delete -f raw-block-deployment.yaml,raw-block-pvc-rwo.yaml
$ kubectl delete -f csi-rbd-sc.yaml,secret.yaml
$ kubectl delete -f ~/ceph-csi/deploy/ceph-conf.yaml
$ ./plugin-teardown.sh ~/ceph-csi/deploy/rbd/kubernetes/
$ kubectl label ns csi-ceph pod-security.kubernetes.io/enforce-
$ kubectl get all,configmap,secret
NAME                         DATA   AGE
configmap/kube-root-ca.crt   1      20h
$ kubectl config set-context --current --namespace=default
$ kubectl delete ns csi-ceph
$ cd ~ && rm -r ceph-csi/

## Run the following on a Ceph node
$ rbd -p kubernetes ls
$ ceph auth rm client.kubernetes
$ ceph osd pool rm kubernetes kubernetes --yes-i-really-really-mean-it
```

## Links

https://hackmd.io/@QI-AN/Test-Ceph-RBD-with-Talos-Kubernetes#Ceph-%E8%A8%AD%E5%AE%9A
https://stackoverflow.com/questions/44140593/how-to-run-command-after-initialization
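### Appendix: the postStart format guard, isolated

The postStart hook in the Deployment above decides whether to run `mkfs.xfs` by grepping `blkid` output for the device. The same guard can be exercised on its own as portable `sh`; here the `blkid` output is passed in as a string and the function name and harness are illustrative, not part of the original test.

```shell
# Decide whether a raw block device still needs mkfs, given captured blkid
# output. blkid lists a device only when it already carries a filesystem
# signature, so a missing entry means the device has never been formatted.
needs_format() {
  dev="$1"
  blkid_out="$2"
  if printf '%s\n' "$blkid_out" | grep -qw "^$dev"; then
    return 1  # already formatted: skip mkfs, just mount
  else
    return 0  # brand-new device: mkfs before mounting
  fi
}

# A listing that already contains /dev/xvda -> no format needed
needs_format /dev/xvda '/dev/xvda: UUID="1234" TYPE="xfs"' && echo format || echo skip   # prints "skip"

# A listing without /dev/xvda -> format first
needs_format /dev/xvda '/dev/sda1: TYPE="ext4"' && echo format || echo skip              # prints "format"
```

Passing the listing in as an argument keeps the decision logic testable without a real block device; inside the Pod the hook feeds it live `blkid` output instead.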