# Harvester CSI Driver

* The RWX feature requires Harvester v1.4.0 or later.

## Implementation

* Create an RWX storage class on the Harvester cluster:

```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-rwx
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "1"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  nfsOptions: "rw,vers=4.2,noresvport,softerr,timeo=600,retrans=5"
```
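A quick way to apply the manifest and confirm the NFS mount options actually landed in the storage class (the filename below is arbitrary, just a local save of the YAML above):

```
# On the Harvester cluster: apply the manifest saved as longhorn-rwx-sc.yaml
$ kubectl apply -f longhorn-rwx-sc.yaml
storageclass.storage.k8s.io/longhorn-rwx created

# Confirm the NFS mount options made it into the storage class parameters
$ kubectl get sc longhorn-rwx -o jsonpath='{.parameters.nfsOptions}{"\n"}'
rw,vers=4.2,noresvport,softerr,timeo=600,retrans=5
```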
* Run the following command on the Harvester cluster.

> The arguments are `bash -s <serviceaccount name> <namespace>` — pay attention to which service account your guest cluster will be created with, and which namespace it sits in.

```
$ curl -sfL https://raw.githubusercontent.com/harvester/harvester-csi-driver/master/deploy/generate_addon_csi.sh | bash -s default default RKE2
......
########## cloud-config ############
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJlakNDQVIrZ0F3SUJBZ0lCQURBS0JnZ3Foa2pPUFFRREFqQWtNU0l3SUFZRFZRUUREQmx5YTJVeUxYTmwKY25abGNpMWpZVUF4TnpNek1qQTRNak14TUI0WERUSTBNVEl3TXpBMk5ETTFNVm9YRFRNME1USXdNVEEyTkRNMQpNVm93SkRFaU1DQUdBMVVFQXd3WmNtdGxNaTF6WlhKMlpYSXRZMkZBTVRjek16SXdPREl6TVRCWk1CTUdCeXFHClNNNDlBZ0VHQ0NxR1NNNDlBd0VIQTBJQUJKakZHdnl1a1hBdjJWUmh0di9ZWHlMVklQQ3VJcFFwQkVBNXlXb08KRDl0aEN2R1dwM2V3bmkwbXRIcmlwNjJpb3h5UXNGdEFXK3NPTEphOFhDRkRLRytqUWpCQU1BNEdBMVVkRHdFQgovd1FFQXdJQ3BEQVBCZ05WSFJNQkFmOEVCVEFEQVFIL01CMEdBMVVkRGdRV0JCUTJJVVpZQmw3b1I1MnRSYkozCndWckRKdlQzWkRBS0JnZ3Foa2pPUFFRREFnTkpBREJHQWlFQXVUY0tpOWdqS1FHUEo0T1JmejM2ZmxWMURRK0YKUXExdzQ1bmh0MW5mYzVVQ0lRREJsa204enNPMVdiYlo0RGk2Nk1MQmVwMFpQNE0zTUhRRjRSazY0Zk85RFE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
    server: https://172.20.0.51:6443
  name: default
contexts:
- context:
    cluster: default
    namespace: default
    user: default-default-default
  name: default-default-default
current-context: default-default-default
kind: Config
preferences: {}
users:
- name: default-default-default
  user:
    token: eyJhbGciOiJSUzI1NiIsImtpZCI6IjBhMDBzeVlXaUp2ZUdDeHRBamV2b0twTmIxbjNmY0RMLWhtOEZZT19nREUifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImRlZmF1bHQtdG9rZW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGVmYXVsdCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjRkNWM4YWM0LTc1NTMtNDY4My1hZmQ4LTIxMzgxMjU4MzVmOSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpkZWZhdWx0OmRlZmF1bHQifQ.MRlbkY6GspQPNfgFPqazt4OeLQB3DV5bVvFrj2AaUKSm0KryDebn0qsHDxZ0i0sp_skRLVTXqQifjOAnMcSCcmgNNz71_maSwn1mA3Fm7IuCCYyjUWFG-6Vxt1IUnd2MJ6LjLC85Xy0U1XBBIcgK2J0bhm7mqMQ5QOpQbDllwjGguIhZ5OdbtPxx_Bl0--ii8TE7btpeIrWc6Tku5_Uy01WzWrn3zEJX1Tb-lHnHKOzZiNO6jHdsjQSqlzZHq30U_kpcobpVlkfiFziUT6LqmhlqOBHvPhrHEQerz59Obwx55ZdcluJ-1zekvTgx7KIbK54LOc_wB-iSgsSmC8fIyA
########## cloud-init user data ############
write_files:
- encoding: b64
  content: YXBpVmVyc2lvbjogdjEKY2x1c3RlcnM6Ci0gY2x1c3RlcjoKICAgIGNlcnRpZmljYXRlLWF1dGhvcml0eS1kYXRhOiBMUzB0TFMxQ1JVZEpUaUJEUlZKVVNVWkpRMEZVUlMwdExTMHRDazFKU1VKbGFrTkRRVklyWjBGM1NVSkJaMGxDUVVSQlMwSm5aM0ZvYTJwUFVGRlJSRUZxUVd0TlUwbDNTVUZaUkZaUlVVUkVRbXg1WVRKVmVVeFlUbXdLWTI1YWJHTnBNV3BaVlVGNFRucE5lazFxUVRSTmFrMTRUVUkwV0VSVVNUQk5WRWwzVFhwQk1rNUVUVEZOVm05WVJGUk5NRTFVU1hkTlZFRXlUa1JOTVFwTlZtOTNTa1JGYVUxRFFVZEJNVlZGUVhkM1dtTnRkR3hOYVRGNldsaEtNbHBZU1hSWk1rWkJUVlJqZWsxNlNYZFBSRWw2VFZSQ1drMUNUVWRDZVhGSENsTk5ORGxCWjBWSFEwTnhSMU5OTkRsQmQwVklRVEJKUVVKS2FrWkhkbmwxYTFoQmRqSldVbWgwZGk5WldIbE1Wa2xRUTNWSmNGRndRa1ZCTlhsWGIwOEtSRGwwYUVOMlIxZHdNMlYzYm1rd2JYUkljbWx3TmpKcGIzaDVVWE5HZEVGWEszTlBURXBoT0ZoRFJrUkxSeXRxVVdwQ1FVMUJORWRCTVZWa1JIZEZRZ292ZDFGRlFYZEpRM0JFUVZCQ1owNVdTRkpOUWtGbU9FVkNWRUZFUVZGSUwwMUNNRWRCTVZWa1JHZFJWMEpDVVRKSlZWcFpRbXczYjFJMU1uUlNZa296Q25kV2NrUktkbFF6V2tSQlMwSm5aM0ZvYTJwUFVGRlJSRUZuVGtwQlJFSkhRV2xGUVhWVVkwdHBPV2RxUzFGSFVFbzBUMUptZWpNMlpteFdNVVJSSzBZS1VYRXhkelExYm1oME1XNW1ZelZWUTBsUlJFSnNhMjA0ZW5OUE1WZGlZbG8wUkdrMk5rMU1RbVZ3TUZwUU5FMHpUVWhSUmpSU2F6WTBaazg1UkZFOVBRb3RMUzB0TFVWT1JDQkRSVkpVU1VaSlEwRlVSUzB0TFMwdENnPT0KICAgIHNlcnZlcjogaHR0cHM6Ly8xNzIuMjAuMC41MTo2NDQzCiAgbmFtZTogZGVmYXVsdApjb250ZXh0czoKLSBjb250ZXh0OgogICAgY2x1c3RlcjogZGVmYXVsdAogICAgbmFtZXNwYWNlOiBkZWZhdWx0CiAgICB1c2VyOiBkZWZhdWx0LWRlZmF1bHQtZGVmYXVsdAogIG5hbWU6IGRlZmF1bHQtZGVmYXVsdC1kZWZhdWx0CmN1cnJlbnQtY29udGV4dDogZGVmYXVsdC1kZWZhdWx0LWRlZmF1bHQKa2luZDogQ29uZmlnCnByZWZlcmVuY2VzOiB7fQp1c2VyczoKLSBuYW1lOiBkZWZhdWx0LWRlZmF1bHQtZGVmYXVsdAogIHVzZXI6CiAgICB0b2tlbjogZXlKaGJHY2lPaUpTVXpJMU5pSXNJbXRwWkNJNklqQmhNREJ6ZVZsWGFVcDJaVWREZUhSQmFtVjJiMHR3VG1JeGJqTm1ZMFJNTFdodE9FWlpUMTluUkVVaWZRLmV5SnBjM01pT2lKcmRXSmxjbTVsZEdWekwzTmxjblpwWTJWaFkyTnZkVzUwSWl3aWEzVmlaWEp1WlhSbGN5NXBieTl6WlhKMmFXTmxZV05qYjNWdWRDOXVZVzFsYzNCaFkyVWlPaUprWldaaGRXeDBJaXdpYTNWaVpYSnVaWFJsY3k1cGJ5OXpaWEoyYVdObFlXTmpiM1Z1ZEM5elpXTnlaWFF1Ym1GdFpTSTZJbVJsWm1GMWJIUXRkRzlyWlc0aUxDSnJkV0psY201bGRHVnpMbWx2TDNObGNuWnBZMlZoWTJOdmRXNTBMM05sY25acFkyVXRZV05qYjNWdWRDNXVZVzFsSWpvaVpHVm1ZWFZzZENJc0ltdDFZbVZ5Ym1WMFpYTXVhVzh2YzJWeWRtbGpaV0ZqWTI5MWJuUXZjMlZ5ZG1salpTMWhZMk52ZFc1MExuVnBaQ0k2SWpSa05XTTRZV00wTFRjMU5UTXRORFk0TXkxaFptUTRMVEl4TXpneE1qVTRNelZtT1NJc0luTjFZaUk2SW5ONWMzUmxiVHB6WlhKMmFXTmxZV05qYjNWdWREcGtaV1poZFd4ME9tUmxabUYxYkhRaWZRLk1SbGJrWTZHc3BRUE5mZ0ZQcWF6dDRPZUxRQjNEVjViVnZGcmoyQWFVS1NtMEtyeURlYm4wcXNIRHhaMGkwc3Bfc2tSTFZUWHFRaWZqT0FuTWNTQ2NtZ05OejcxX21hU3duMW1BM0ZtN0l1Q0NZeWpVV0ZHLTZWeHQxSVVuZDJNSjZMakxDODVYeTBVMVhCQkljZ0sySjBiaG03bXFNUTVRT3BRYkRsbHdqR2d1SWhaNU9kYnRQeHhfQmwwLS1paThURTdidHBlSXJXYzZUa3U1X1V5MDFXeldybjN6RUpYMVRiLWxIbkhLT3paaU5PNmpIZHNqUVNxbHpaSHEzMFVfa3Bjb2JwVmxrZmlGemlVVDZMcW1obHFPQkh2UGhySEVRZXJ6NTlPYnd4NTVaZGNsdUotMXpla3ZUZ3g3S0liSzU0TE9jX3dCLWlTZ3NTbUM4Zkl5QQo=
  owner: root:root
  path: /var/lib/rancher/rke2/etc/config-files/cloud-provider-config
  permissions: '0644'
```

* Create an RKE2 guest cluster: in the "User Data:" field, append the cloud-init user data output from the previous step under `#cloud-config`, then fill in the remaining settings and create the cluster.

![image](https://hackmd.io/_uploads/S1REp9DHkx.png)

* Select Harvester as the Cloud Provider.

![image](https://hackmd.io/_uploads/rkfAKCwBke.png)

## Inside the Guest cluster

* Once the cluster is deployed, the driver pods appear in the kube-system namespace:

```
$ kubectl -n kube-system get po | grep harvester
harvester-csi-driver-controllers-7dc9d4668d-b4b99   3/3     Running   0          4m50s
harvester-csi-driver-controllers-7dc9d4668d-b9vjp   3/3     Running   0          4m50s
harvester-csi-driver-controllers-7dc9d4668d-pd74r   3/3     Running   0          4m50s
harvester-csi-driver-jrcdp                          2/2     Running   0          4m50s
```
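If these pods are not Running, the first thing to rule out is a bad kubeconfig from the cloud-init step. A minimal sanity check from any guest node, assuming kubectl is available there (`kubectl auth whoami` needs kubectl v1.27+; the path is the one written by cloud-init above, and the expected identity follows the `<serviceaccount name> <namespace>` pair passed to the script):

```
# On a guest cluster node: verify that the service-account kubeconfig
# written by cloud-init can authenticate against the Harvester cluster
$ sudo kubectl --kubeconfig /var/lib/rancher/rke2/etc/config-files/cloud-provider-config auth whoami
ATTRIBUTE   VALUE
Username    system:serviceaccount:default:default
```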
```
$ kubectl get sc
NAME                  PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
harvester (default)   driver.harvesterhci.io   Delete          Immediate           true                   6m10s
```

* Install the NFSv4 client on every node:

```
$ kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/prerequisite/longhorn-nfs-installation.yaml

# Enable the NFS client service on each node
$ systemctl enable --now nfs-client.target
```

## Verification on the Guest cluster

### Testing an RWO PVC

* Create a 2Gi PVC:

```
$ echo 'apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-rwo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: harvester
  resources:
    requests:
      storage: 2Gi' | kubectl apply -f -
```

```
$ kubectl get pv,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
persistentvolume/pvc-81308977-1f69-4b57-a624-29fec03f3b88   2Gi        RWO            Delete           Bound    default/longhorn-rwo-pvc   harvester      <unset>                          92s

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/longhorn-rwo-pvc   Bound    pvc-81308977-1f69-4b57-a624-29fec03f3b88   2Gi        RWO            harvester      <unset>                 92s
```

* Create a Pod that uses the Longhorn volume:

```
$ echo 'apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwo-pvc' | kubectl apply -f -
```

* Verify the pod mounts the volume:

```
$ kubectl get po
NAME              READY   STATUS    RESTARTS   AGE
volume-rwo-test   1/1     Running   0          80s
```
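As a quick functional check beyond the Running status (mirroring the RWX test later in this note), write a file to the mounted volume and read it back:

```
# Write to the Longhorn-backed mount inside the pod, then read it back
$ kubectl exec volume-rwo-test -- sh -c "echo hello > /data/test"
$ kubectl exec volume-rwo-test -- cat /data/test
hello
```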
* On the Harvester UI you can see the volume is backed by the local new-sc storage class.

![image](https://hackmd.io/_uploads/HknzIjDHkg.png)

### Testing an RWX PVC

* To let Harvester's Longhorn mount volumes into the guest cluster, every guest cluster node needs an extra NIC on the Management Network.

![image](https://hackmd.io/_uploads/BJxsMgOBke.png)

* Configure the second NIC (eth1) to use DHCP.

![image](https://hackmd.io/_uploads/Sk_9vS_B1l.png)

* The eth1 NIC exists so the node can communicate with the Harvester Kubernetes cluster:

```
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether f6:e3:fa:29:fc:0c brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 172.20.1.43/16 brd 172.20.255.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP group default qlen 1000
    link/ether e6:70:89:2d:59:92 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    inet 10.0.2.2/24 brd 10.0.2.255 scope global eth1
       valid_lft forever preferred_lft forever
```

* Create the storage class used for RWX volumes:

```
$ echo 'allowVolumeExpansion: false
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rwx-sc
parameters:
  hostStorageClass: longhorn-rwx
provisioner: driver.harvesterhci.io
reclaimPolicy: Delete
volumeBindingMode: Immediate' | kubectl apply -f -
```

```
$ kubectl get sc
NAME                  PROVISIONER              RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
harvester (default)   driver.harvesterhci.io   Delete          Immediate           true                   19m
rwx-sc                driver.harvesterhci.io   Delete          Immediate           false                  6s
```

* Create an RWX PVC:

```
$ echo 'apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-rwx-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rwx-sc
  resources:
    requests:
      storage: 2Gi' | kubectl apply -f -
```

```
$ kubectl get pvc,pv
NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
persistentvolumeclaim/longhorn-rwx-pvc   Bound    pvc-6dd711c7-fce5-42c6-a9f8-56224bab0c33   2Gi        RWX            rwx-sc         <unset>                 3m7s

NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
persistentvolume/pvc-6dd711c7-fce5-42c6-a9f8-56224bab0c33   2Gi        RWX            Delete           Bound    default/longhorn-rwx-pvc   rwx-sc         <unset>                          3m2s
```

* Create two pods that are forced onto different nodes via pod anti-affinity and share the same RWX PVC:

```
$ echo 'apiVersion: v1
kind: Pod
metadata:
  name: volume-rwx-test1
  namespace: default
  labels:
    app: test
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwx-pvc
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - test
        topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwx-test2
  namespace: default
  labels:
    app: test
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwx-pvc
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - test
        topologyKey: kubernetes.io/hostname' | kubectl apply -f -
```

* Verify both pods can mount the RWX PVC:

```
$ kubectl get po -owide
NAME               READY   STATUS    RESTARTS   AGE     IP             NODE                         NOMINATED NODE   READINESS GATES
volume-rwx-test1   1/1     Running   0          2m41s   10.42.71.202   hvx-rke2-pool2-lrhlt-b7kfj   <none>           <none>
volume-rwx-test2   1/1     Running   0          2m41s   10.42.102.40   hvx-rke2-pool1-fsw49-p7q7m   <none>           <none>

$ kubectl exec volume-rwx-test1 -- sh -c "echo 123 > /data/test"
$ kubectl exec volume-rwx-test2 -- cat /data/test
123
```

* On the Harvester UI you can see the volume uses the local longhorn-rwx storage class.

![image](https://hackmd.io/_uploads/Hk4TIL_HJg.png)

* You can see that Harvester exports the NFS share to the guest cluster through a Service, so the guest cluster must be able to reach the Harvester Kubernetes cluster directly:

```
$ mount | grep nfs
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
10.53.155.127:/pvc-af59efa8-7b67-43a9-b7fc-c7894d0305d4 on /var/lib/kubelet/plugins/kubernetes.io/csi/driver.harvesterhci.io/23e8d19c0115f9217b6da50b2d42eaab3f539673636791b43c517cd08aea5762/globalmount type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,softerr,softreval,noresvport,proto=tcp,timeo=600,retrans=5,sec=sys,clientaddr=172.20.1.42,local_lock=none,addr=10.53.155.127)
10.53.155.127:/pvc-af59efa8-7b67-43a9-b7fc-c7894d0305d4 on /var/lib/kubelet/pods/2e31ef84-3321-4d30-b125-c4c0ab4262b2/volumes/kubernetes.io~csi/pvc-6d8fd6e9-12e2-4539-a89d-f65151c704eb/mount type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,softerr,softreval,noresvport,proto=tcp,timeo=600,retrans=5,sec=sys,clientaddr=172.20.1.42,local_lock=none,addr=10.53.155.127)
```
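Since the NFS server address in the mount output is a ClusterIP inside the Harvester cluster, a simple reachability check from a guest node is a TCP probe against the standard NFSv4 port (2049). The IP below is the one from the mount output above; whether `nc` is present depends on the node image:

```
# From a guest cluster node: confirm the NFS endpoint is reachable over TCP
$ nc -zv -w 3 10.53.155.127 2049
```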
{\"node_id\":\"k3s-pool1-x8msd-dsptv\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"storage.kubernetes.io/csiProvisionerIdentity\":\"1748240288072-8081-driver.harvesterhci.io\"},\"volume_id\":\"pvc-5de61560-f6bd-40b5-80ae-991306f6b290\"}" time="2025-05-26T06:25:11Z" level=info msg="ControllerServer ControllerPublishVolume req: volume_id:\"pvc-5de61560-f6bd-40b5-80ae-991306f6b290\" node_id:\"k3s-pool1-x8msd-dsptv\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1748240288072-8081-driver.harvesterhci.io\" > " time="2025-05-26T06:25:11Z" level=warning msg="waitForVolumeSettled: error while waiting for volume pvc-2f009559-0085-4838-a230-5beaa53f4389 to be settled. Err: volumes.longhorn.io \"pvc-2f009559-0085-4838-a230-5beaa53f4389\" is forbidden: User \"system:serviceaccount:default:default\" cannot get resource \"volumes\" in API group \"longhorn.io\" in the namespace \"longhorn-system\"" time="2025-05-26T06:25:11Z" level=error msg="GRPC error: rpc error: code = DeadlineExceeded desc = Failed to wait the volume pvc-5de61560-f6bd-40b5-80ae-991306f6b290 status to settled" ``` * 在 Harvester cluster 建立以下 RBAC 規則,根據報錯替換不同的 sa。 ``` apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: longhorn-volume-access namespace: longhorn-system rules: - apiGroups: ["longhorn.io"] resources: ["volumes"] verbs: ["get", "list", "watch"] --- apiVersion: rbac.authorization.k8s.io/v1 kind: RoleBinding metadata: name: allow-default-sa-access-longhorn namespace: longhorn-system subjects: - kind: ServiceAccount name: default namespace: default roleRef: kind: Role name: longhorn-volume-access apiGroup: rbac.authorization.k8s.io ``` ## 參考 https://docs.harvesterhci.io/v1.4/rancher/csi-driver https://github.com/harvester/harvester/issues/1992