# Longhorn
* Longhorn is an open-source project from Rancher Labs that provides a reliable, lightweight distributed block storage system for Kubernetes.
* Volumes must use the ext4 or XFS filesystem.

## Architecture
![image](https://hackmd.io/_uploads/SJPrikNyC.png)
* The Longhorn Manager pod runs on every node as a DaemonSet. It accepts commands from the UI or from Kubernetes and is responsible for creating and managing volumes in the cluster.
* When the Longhorn Manager is asked to create a volume, it creates a Longhorn Engine on the node where the volume is attached, together with the replicas. Both run as Linux processes, and replicas are created on nodes according to the declared replica count.
* Multiple replicas keep the data highly available: even if a replica or the Longhorn Engine fails, pods can still access the data.
* The Longhorn Engine synchronously replicates data to the replicas stored across multiple nodes, each replica keeping its copy on its own node.
* When Longhorn writes data, it writes to all replicas at the same time.

## Caution
* Do not use multipath on nodes running Longhorn; it conflicts with Longhorn volumes. See https://longhorn.io/kb/troubleshooting-volume-with-multipath/

## Longhorn pre-installation preparation
* Install open-iscsi on every node:
```
$ kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/prerequisite/longhorn-iscsi-installation.yaml
```
* Check that the installation succeeded:
```
$ kubectl logs -f -l app=longhorn-iscsi-installation -c iscsi-installation
iscsi install successfully
iscsi install successfully
```
* Install the NFSv4 client on every node:
```
$ kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/prerequisite/longhorn-nfs-installation.yaml
```
* Check that the installation succeeded:
```
$ kubectl logs -l app=longhorn-nfs-installation -c nfs-installation
nfs install successfully
nfs install successfully
```
* Delete the installer DaemonSets:
```
$ kubectl delete ds longhorn-iscsi-installation longhorn-nfs-installation
daemonset.apps "longhorn-iscsi-installation" deleted
daemonset.apps "longhorn-nfs-installation" deleted
```
* Run the environment check:
```
$ curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/scripts/environment_check.sh | bash
[INFO]  Required dependencies 'kubectl jq mktemp' are installed.
[INFO]  All nodes have unique hostnames.
[INFO]  Waiting for longhorn-environment-check pods to become ready (0/2)...
[INFO]  All longhorn-environment-check pods are ready (2/2).
[INFO]  MountPropagation is enabled
[INFO]  Checking iscsid...
[INFO]  Checking multipathd...
[INFO]  Checking packages...
[INFO]  Cleaning up longhorn-environment-check pods...
[INFO]  Cleanup completed.
```
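* If running the installer DaemonSets is not desirable, the same prerequisites can be set up by hand on each node. A minimal sketch, assuming SLES-based nodes like the ones used later in this note; the package names (`open-iscsi`, `nfs-client`) will differ on other distributions:
```
# install the iSCSI initiator and NFS client tools, then start iscsid
$ sudo zypper -n install open-iscsi nfs-client
$ sudo systemctl enable --now iscsid
$ systemctl is-active iscsid
active
```
* The NFSv4 client only needs the userspace tools; no extra daemon has to run on the node for Longhorn RWX volumes.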
## Install Longhorn
* Install version v1.6.0.
* Change the default replica count to 2.
```
$ curl -s https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/longhorn.yaml | sed 's/numberOfReplicas: "3"/numberOfReplicas: "2"/' | kubectl apply -f -
```
```
$ kubectl -n longhorn-system get all
NAME                                                    READY   STATUS    RESTARTS       AGE
pod/csi-attacher-5bc78f548f-mqtv4                       1/1     Running   10 (51m ago)   5d20h
pod/csi-attacher-5bc78f548f-whn59                       1/1     Running   7 (24h ago)    5d20h
pod/csi-attacher-5bc78f548f-z92gh                       1/1     Running   9 (24h ago)    5d20h
pod/csi-provisioner-756d58684c-jtmpn                    1/1     Running   7 (23h ago)    5d20h
pod/csi-provisioner-756d58684c-lz5lh                    1/1     Running   13 (51m ago)   5d20h
pod/csi-provisioner-756d58684c-vhn7s                    1/1     Running   7 (24h ago)    5d20h
pod/csi-resizer-5d5969d6b-fkksm                         1/1     Running   10 (24h ago)   5d20h
pod/csi-resizer-5d5969d6b-jlpcv                         1/1     Running   8 (24h ago)    5d20h
pod/csi-resizer-5d5969d6b-qdrl8                         1/1     Running   4 (24h ago)    5d20h
pod/csi-snapshotter-8c8cf87cd-48pgk                     1/1     Running   10 (24h ago)   5d20h
pod/csi-snapshotter-8c8cf87cd-qb86t                     1/1     Running   10 (24h ago)   5d20h
pod/csi-snapshotter-8c8cf87cd-tsftq                     1/1     Running   4 (24h ago)    5d20h
pod/engine-image-ei-ce3e2479-54794                      1/1     Running   2 (24h ago)    5d20h
pod/engine-image-ei-ce3e2479-67r4v                      1/1     Running   1 (24h ago)    5d20h
pod/engine-image-ei-ce3e2479-drfsb                      1/1     Running   1 (24h ago)    5d20h
pod/engine-image-ei-ce3e2479-mg8xc                      1/1     Running   2 (25h ago)    5d20h
pod/instance-manager-2518c17315db00692be0b82bad3706ad   1/1     Running   0              24h
pod/instance-manager-54866156070ae7574e58eefc61a02c5e   1/1     Running   0              24h
pod/instance-manager-7987fbb3021c10acb318bc134060e9ac   1/1     Running   0              24h
pod/instance-manager-d97fcfbc8d7410f90f23310ebbcfb0fd   1/1     Running   0              24h
pod/longhorn-csi-plugin-5fkdt                           3/3     Running   4 (24h ago)    5d20h
pod/longhorn-csi-plugin-jpf4t                           3/3     Running   8 (24h ago)    5d20h
pod/longhorn-csi-plugin-kbl7n                           3/3     Running   13 (24h ago)   5d20h
pod/longhorn-csi-plugin-pp6lc                           3/3     Running   16 (24h ago)   5d20h
pod/longhorn-driver-deployer-679879d8cc-z28hs           1/1     Running   2 (24h ago)    5d20h
pod/longhorn-manager-5rhg2                              1/1     Running   1 (24h ago)    5d20h
pod/longhorn-manager-gtj9h                              1/1     Running   2 (25h ago)    5d20h
pod/longhorn-manager-rms4x                              1/1     Running   1 (24h ago)    5d20h
pod/longhorn-manager-zjzpc                              1/1     Running   2 (24h ago)    5d20h
pod/longhorn-ui-854cb599d5-n6rgw                        1/1     Running   3 (24h ago)    5d20h
pod/longhorn-ui-854cb599d5-wpzlr                        1/1     Running   1 (24h ago)    5d20h

NAME                                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/longhorn-admission-webhook    ClusterIP   10.43.79.32    <none>        9502/TCP   21d
service/longhorn-backend              ClusterIP   10.43.79.88    <none>        9500/TCP   21d
service/longhorn-conversion-webhook   ClusterIP   10.43.21.70    <none>        9501/TCP   21d
service/longhorn-engine-manager       ClusterIP   None           <none>        <none>     21d
service/longhorn-frontend             ClusterIP   10.43.112.41   <none>        80/TCP     21d
service/longhorn-recovery-backend     ClusterIP   10.43.6.133    <none>        9503/TCP   21d
service/longhorn-replica-manager      ClusterIP   None           <none>        <none>     21d

NAME                                      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/engine-image-ei-ce3e2479   4         4         4       4            4           <none>          5d20h
daemonset.apps/longhorn-csi-plugin        4         4         4       4            4           <none>          5d20h
daemonset.apps/longhorn-manager           4         4         4       4            4           <none>          21d

NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/csi-attacher               3/3     3            3           5d20h
deployment.apps/csi-provisioner            3/3     3            3           5d20h
deployment.apps/csi-resizer                3/3     3            3           5d20h
deployment.apps/csi-snapshotter            3/3     3            3           5d20h
deployment.apps/longhorn-driver-deployer   1/1     1            1           21d
deployment.apps/longhorn-ui                2/2     2            2           21d

NAME                                                  DESIRED   CURRENT   READY   AGE
replicaset.apps/csi-attacher-5bc78f548f               3         3         3       5d20h
replicaset.apps/csi-provisioner-756d58684c            3         3         3       5d20h
replicaset.apps/csi-resizer-5d5969d6b                 3         3         3       5d20h
replicaset.apps/csi-snapshotter-8c8cf87cd             3         3         3       5d20h
replicaset.apps/longhorn-driver-deployer-5cfcddfd6c   0         0         0       21d
replicaset.apps/longhorn-driver-deployer-679879d8cc   1         1         1       5d20h
replicaset.apps/longhorn-driver-deployer-7bd5f75df8   0         0         0       21d
replicaset.apps/longhorn-ui-665d8ffb55                0         0         0       21d
replicaset.apps/longhorn-ui-854cb599d5                2         2         2       5d20h
```
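* The `sed` above patches the `numberOfReplicas` default inside the `longhorn` StorageClass shipped with the manifest. An alternative that survives re-applying the upstream YAML is to leave the default StorageClass alone and add a second one. A minimal sketch (the class name `longhorn-2r` is made up for this example; the parameters follow the Longhorn CSI driver `driver.longhorn.io`):
```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2r              # hypothetical name for this example
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "2"          # replicas per volume created from this class
  staleReplicaTimeout: "30"      # minutes before a failed replica is cleaned up
```
* PVCs that set `storageClassName: longhorn-2r` would then get two replicas regardless of the global default.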
## Create a Longhorn Volume
* Create a 2Gi PVC:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-rwo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
```
```
$ kubectl get pv,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                      STORAGECLASS   REASON   AGE
persistentvolume/pvc-538bb6d7-fac8-4a3b-a36d-5e53bc5b2eb4   2Gi        RWO            Delete           Bound    default/longhorn-rwo-pvc   longhorn                23s

NAME                                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/longhorn-rwo-pvc   Bound    pvc-538bb6d7-fac8-4a3b-a36d-5e53bc5b2eb4   2Gi        RWO            longhorn       26s
```
* Create a pod that uses the Longhorn volume:
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test
  namespace: default
spec:
  containers:
    - name: volume-test
      image: nginx:stable-alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: volv
          mountPath: /data
      ports:
        - containerPort: 80
  volumes:
    - name: volv
      persistentVolumeClaim:
        claimName: longhorn-rwo-pvc
```
* Confirm that the Longhorn volume has two replicas, stored on cilium-w3 and cilium-m1.

![image](https://hackmd.io/_uploads/HJbcYj1J0.png)

## Verify automatic data repair
```
# Run on cilium-w3
$ ls -l /var/lib/longhorn/replicas/
total 0
drwx------ 1 root root 140 Mar 26 09:47 pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-834d42f5

# Run on cilium-m1
$ ls -l /var/lib/longhorn/replicas/
total 0
drwx------ 1 root root 140 Mar 26 09:47 pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
```
```
# Run on cilium-m1
$ kubectl exec volume-rwo-test -- sh -c "echo haha > /data/test"
$ kubectl exec volume-rwo-test -- cat /data/test
haha
$ kubectl get po volume-rwo-test -o wide
NAME          READY   STATUS    RESTARTS   AGE     IP           NODE        NOMINATED NODE   READINESS GATES
volume-test   1/1     Running   0          5m31s   10.42.1.84   cilium-w1   <none>           <none>

# Run on cilium-w3
$ sudo rm -r /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-834d42f5/
```
```
# Run on cilium-m1: check whether the data is damaged now that one replica is missing
$ kubectl exec volume-rwo-test -- cat /data/test
haha

# Run on cilium-w3: confirm that the replica has been rebuilt
$ ls -l /var/lib/longhorn/replicas/
total 0
drwx------ 1 root root 358 Mar 26 09:55 pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-834d42f5
```
* [Note] Whenever Longhorn detects that a replica has failed, it automatically takes a snapshot and then starts rebuilding the replica on a node.
* [Note] If a replica is deleted through the UI, it is rebuilt on a different node; if the directory under `/var/lib/longhorn/replicas/` is deleted directly on the host, it is rebuilt on the same node.

![image](https://hackmd.io/_uploads/ryPb3sky0.png)

## Test whether the volume size is enforced
* Try to write a 3Gi file into the PVC, which only allows 2Gi:
```
# Confirm that the PVC enforces the size limit
$ kubectl exec -it volume-rwo-test -- sh -c "dd count=3k bs=1M if=/dev/zero of=/data/test3g.img"
dd: error writing '/data/test3g.img': No space left on device
1929+0 records in
1928+0 records out
command terminated with exit code 1
```
* Check that the space actually used is only 2.0G:
```
# On the cilium-m1 host, check how much space is actually used
$ sudo du -sh /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
2.0G	/var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
```
* Deleting content from the volume does not shrink the Longhorn volume's footprint on disk. For example, if you create a 2 GB volume, use 2 GB, and then delete that 2 GB of content, the actual size on disk is still 2 GB, not 0 GB. This happens because Longhorn operates at the block level, not the filesystem level.
```
$ kubectl exec volume-rwo-test -- sh -c "rm /data/test3g.img"

# Even after the data is deleted inside the pod, the host still shows 2.0G in use
$ sudo du -sh /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
2.0G	/var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
```
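* Recent Longhorn releases (v1.4 and later) can hand that space back when the filesystem issues TRIM/UNMAP requests. A minimal sketch, assuming `fstrim` is available in the container image and that the trim-related settings of your Longhorn version allow it (verify against the release notes before relying on this):
```
# ask the filesystem to discard the blocks freed by the rm above
$ kubectl exec volume-rwo-test -- fstrim /data

# the replica directory on the host should then shrink accordingly
$ sudo du -sh /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
```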
## Test whether ReadWriteOnce is actually enforced
* Create another pod with a different name that uses the same PVC, and schedule it on a different node:
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test2
  namespace: default
spec:
  containers:
    - name: volume-test
      image: nginx:stable-alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: volv
          mountPath: /data
      ports:
        - containerPort: 80
  volumes:
    - name: volv
      persistentVolumeClaim:
        claimName: longhorn-rwo-pvc
  nodeName: cilium-m1
```
* volume-rwo-test2 stays in the ContainerCreating phase, and `describe` shows a Multi-Attach error, which means the access mode is enforced.
```
$ kubectl get po -owide
NAME               READY   STATUS              RESTARTS   AGE   IP           NODE        NOMINATED NODE   READINESS GATES
volume-rwo-test    1/1     Running             0          19m   10.42.1.84   cilium-w1   <none>           <none>
volume-rwo-test2   0/1     ContainerCreating   0          8s    <none>       cilium-m1   <none>           <none>

$ kubectl describe po volume-rwo-test2
......
Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Warning  FailedAttachVolume  44s   attachdetach-controller  Multi-Attach error for volume "pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5" Volume is already used by pod(s) volume-rwo-test

$ kubectl delete po volume-rwo-test2
```
* Schedule volume-rwo-test2 on the same node as volume-rwo-test:
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test2
  namespace: default
spec:
  containers:
    - name: volume-test
      image: nginx:stable-alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: volv
          mountPath: /data
      ports:
        - containerPort: 80
  volumes:
    - name: volv
      persistentVolumeClaim:
        claimName: longhorn-rwo-pvc
  nodeName: cilium-w1
```
* Confirm that with ReadWriteOnce, pods can access the same PVC as long as they run on the same node:
```
$ kubectl get po -owide
NAME               READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
volume-rwo-test    1/1     Running   0          23m   10.42.1.84    cilium-w1   <none>           <none>
volume-rwo-test2   1/1     Running   0          6s    10.42.1.191   cilium-w1   <none>           <none>

$ kubectl exec volume-rwo-test2 -- cat /data/test
haha
```

## Manually mounting the block device
* When the kubelet creates a pod and needs to mount a Longhorn volume, the longhorn-engine creates a block device at `/dev/longhorn/<volume-name>` on the node where the pod runs, and the kubelet then mounts that device.

![image](https://hackmd.io/_uploads/SJkCm8EkR.png)
```
# On cilium-m1, check the /dev/longhorn/ directory
# ls -l /dev/longhorn/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8
brw-rw---- 1 root root 8, 32 Mar 29 09:40 /dev/longhorn/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8

# mount /dev/longhorn/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8 /mnt
# ls -l /mnt
total 20
drwx------ 2 root root 16384 Mar 29 09:32 lost+found
-rw-r--r-- 1 root root     5 Mar 29 09:32 test
# cat /mnt/test
haha
# umount /mnt
```
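* Where the engine is attached and where the replicas live can also be read from the Longhorn custom resources instead of the UI. A minimal sketch; these CRDs exist in Longhorn, but the exact columns printed vary by version:
```
# volumes, engines and replicas are all namespaced CRs in longhorn-system
$ kubectl -n longhorn-system get volumes.longhorn.io
$ kubectl -n longhorn-system get engines.longhorn.io
$ kubectl -n longhorn-system get replicas.longhorn.io -o wide
```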
## How longhorn-engine creates the block device
* On the node where the pod runs, inspect the longhorn-engine Linux process.
* The longhorn-engine points at the two replicas to find where the data is actually stored: `--replica tcp://10.42.0.205:10001` and `--replica tcp://10.42.3.217:10010`.
* The longhorn-engine creates the block device on the node where the pod runs.
```
$ ps aux | grep -v grep | grep pvc-5d571f3e-303e-473b-b7e8-336096ad43b8
root 29754 0.5 0.1 1914824 30184 ? Sl 21:44 0:21 /engine-binaries/registry.rancher.com-rancher-mirrored-longhornio-longhorn-engine-v1.5.4/longhorn --engine-instance-name pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-e-0 controller pvc-5d571f3e-303e-473b-b7e8-336096ad43b8 --frontend tgt-blockdev --size 2147483648 --current-size 0 --engine-replica-timeout 8 --file-sync-http-client-timeout 30 --replica tcp://10.42.0.205:10001 --replica tcp://10.42.3.217:10010 --listen 0.0.0.0:10000
```
```
$ kubectl get po -A -owide | grep -E '10.42.0.205|10.42.3.217'
longhorn-system   instance-manager-2518c17315db00692be0b82bad3706ad   1/1   Running   0   4d13h   10.42.0.205   cilium-m1   <none>   <none>
longhorn-system   instance-manager-d97fcfbc8d7410f90f23310ebbcfb0fd   1/1   Running   0   4d13h   10.42.3.217   cilium-w3   <none>   <none>
```

## Inspect the replica Linux process on cilium-m1
* If the replica process dies, longhorn-manager is responsible for recreating it.
```
$ ps aux | grep -v grep | grep pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-6b0c4980
root 22805 0.1 0.2 2135764 24824 ? Sl 21:44 0:07 /host/var/lib/longhorn/engine-binaries/registry.rancher.com-rancher-mirrored-longhornio-longhorn-engine-v1.5.4/longhorn --volume-name pvc-5d571f3e-303e-473b-b7e8-336096ad43b8 replica /host/var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-6b0c4980 --size 2147483648 --replica-instance-name pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-r-03dc6292 --sync-agent-port-count 7 --listen 0.0.0.0:10001
```
* Clean up:
```
$ kubectl delete po volume-rwo-test volume-rwo-test2
$ kubectl delete pvc longhorn-rwo-pvc
```

## ReadWriteMany (RWX) Volume
### Prerequisites
* Every node must have the NFS client packages installed.
* Every node's hostname must be unique within the Kubernetes cluster.

### Architecture
* When pods on different nodes mount an RWX PVC, the CSI plugin calls the Longhorn Manager to create the volume and a share-manager.
* Pods that use the RWX PVC actually mount it over NFS from the share-manager pod, and that pod in turn accesses the underlying Longhorn volume.
* The share-manager pod is managed by the Longhorn Manager; if it dies, the Longhorn Manager recreates it.

![image](https://hackmd.io/_uploads/S1m3CC1yA.png)

* Create an RWX PVC:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-rwx-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
```
* Create two pods on different nodes that use the same PVC:
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwx-test
  namespace: default
  labels:
    app: test
spec:
  containers:
    - name: volume-test
      image: nginx:stable-alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: volv
          mountPath: /data
      ports:
        - containerPort: 80
  volumes:
    - name: volv
      persistentVolumeClaim:
        claimName: longhorn-rwx-pvc
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - test
          topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwx-test2
  namespace: default
  labels:
    app: test
spec:
  containers:
    - name: volume-test
      image: nginx:stable-alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: volv
          mountPath: /data
      ports:
        - containerPort: 80
  volumes:
    - name: volv
      persistentVolumeClaim:
        claimName: longhorn-rwx-pvc
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - test
          topologyKey: kubernetes.io/hostname
```
* Confirm that data is shared between the pods:
```
$ kubectl get po -owide
NAME               READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
volume-rwx-test    1/1     Running   0          42s   10.42.1.60    cilium-w1   <none>           <none>
volume-rwx-test2   1/1     Running   0          42s   10.42.2.188   cilium-w2   <none>           <none>

$ kubectl exec volume-rwx-test -- sh -c "echo haha > /data/test"
$ kubectl exec volume-rwx-test2 -- sh -c "cat /data/test"
haha
```
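* To see that the RWX mount really goes through NFS rather than a local block device, check the mount table inside either pod. A minimal sketch; the exact server address and mount options depend on the cluster:
```
# /data should show up as an nfs4 mount pointing at the share-manager service
$ kubectl exec volume-rwx-test -- sh -c "mount | grep /data"
```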
* An additional share-manager pod is created in the longhorn-system namespace:
```
$ kubectl get pvc
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
longhorn-rwx-pvc   Bound    pvc-0357af88-4d68-412e-a77d-355af0c4f608   2Gi        RWX            longhorn       22m

$ kubectl -n longhorn-system get po
NAME                                                      READY   STATUS    RESTARTS       AGE
csi-attacher-5bc78f548f-mqtv4                             1/1     Running   11 (20m ago)   6d
csi-attacher-5bc78f548f-whn59                             1/1     Running   7 (28h ago)    6d
csi-attacher-5bc78f548f-z92gh                             1/1     Running   9 (28h ago)    6d
csi-provisioner-756d58684c-jtmpn                          1/1     Running   7 (27h ago)    6d
csi-provisioner-756d58684c-lz5lh                          1/1     Running   14 (23m ago)   6d
csi-provisioner-756d58684c-vhn7s                          1/1     Running   7 (28h ago)    6d
csi-resizer-5d5969d6b-fkksm                               1/1     Running   10 (28h ago)   6d
csi-resizer-5d5969d6b-jlpcv                               1/1     Running   8 (28h ago)    6d
csi-resizer-5d5969d6b-qdrl8                               1/1     Running   4 (28h ago)    6d
csi-snapshotter-8c8cf87cd-48pgk                           1/1     Running   10 (28h ago)   6d
csi-snapshotter-8c8cf87cd-qb86t                           1/1     Running   10 (28h ago)   6d
csi-snapshotter-8c8cf87cd-tsftq                           1/1     Running   4 (28h ago)    6d
engine-image-ei-ce3e2479-54794                            1/1     Running   2 (28h ago)    6d
engine-image-ei-ce3e2479-67r4v                            1/1     Running   1 (28h ago)    6d
engine-image-ei-ce3e2479-drfsb                            1/1     Running   1 (28h ago)    6d
engine-image-ei-ce3e2479-mg8xc                            1/1     Running   2 (28h ago)    6d
instance-manager-2518c17315db00692be0b82bad3706ad         1/1     Running   0              28h
instance-manager-54866156070ae7574e58eefc61a02c5e         1/1     Running   0              28h
instance-manager-7987fbb3021c10acb318bc134060e9ac         1/1     Running   0              28h
instance-manager-d97fcfbc8d7410f90f23310ebbcfb0fd         1/1     Running   0              28h
longhorn-csi-plugin-5fkdt                                 3/3     Running   4 (28h ago)    6d
longhorn-csi-plugin-jpf4t                                 3/3     Running   8 (28h ago)    6d
longhorn-csi-plugin-kbl7n                                 3/3     Running   13 (28h ago)   6d
longhorn-csi-plugin-pp6lc                                 3/3     Running   16 (28h ago)   6d
longhorn-driver-deployer-679879d8cc-z28hs                 1/1     Running   2 (28h ago)    6d
longhorn-manager-5rhg2                                    1/1     Running   1 (28h ago)    6d
longhorn-manager-gtj9h                                    1/1     Running   2 (28h ago)    6d
longhorn-manager-rms4x                                    1/1     Running   1 (28h ago)    6d
longhorn-manager-zjzpc                                    1/1     Running   2 (28h ago)    6d
longhorn-ui-854cb599d5-n6rgw                              1/1     Running   3 (28h ago)    6d
longhorn-ui-854cb599d5-wpzlr                              1/1     Running   1 (28h ago)    6d
share-manager-pvc-0357af88-4d68-412e-a77d-355af0c4f608    1/1     Running   0              16m
```

### Test whether the volume size is enforced
* Confirm that the PVC enforces the size limit:
```
$ kubectl exec -it volume-rwx-test -- sh -c "dd count=3k bs=1M if=/dev/zero of=/data/test3g.img"
dd: error writing '/data/test3g.img': No space left on device
2549+0 records in
2548+0 records out
command terminated with exit code 1
```
* Clean up:
```
$ kubectl delete po volume-rwx-test volume-rwx-test2
$ kubectl delete pvc longhorn-rwx-pvc
```

## How Longhorn reads and writes data
* When a pod writes data, the data goes into the live data. After a snapshot has been taken, a write that targets a block range already covered by a snapshot is written to the live data, and the index is updated to point at the live data's copy of that block range.
* When a pod reads data, Longhorn first looks in the live data; if the block is not there, it looks in the previous snapshot, and so on up the chain until the data is found.
* To speed up reads, Longhorn maintains an index that records, for every block, where the most recent copy of that block lives; each block is 4 KiB.

![image](https://hackmd.io/_uploads/r1-sDgNk0.png)

* Create the PVC and pod:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-rwo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test
  namespace: default
spec:
  containers:
    - name: volume-test
      image: nginx:stable-alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: volv
          mountPath: /data
      ports:
        - containerPort: 80
  volumes:
    - name: volv
      persistentVolumeClaim:
        claimName: longhorn-rwo-pvc
```
* Check where the replicas are stored:

![image](https://hackmd.io/_uploads/rym7UrN1C.png)

* On cilium-w3, check the block extents of the replica:
```
$ sudo ls -l /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1
total 99496
-rw------- 1 root root       4096 Mar 29 21:46 revision.counter
-rw-r--r-- 1 root root 2147483648 Mar 29 21:46 volume-head-000.img
-rw-r--r-- 1 root root        126 Mar 29 21:44 volume-head-000.img.meta
-rw-r--r-- 1 root root        142 Mar 29 21:44 volume.meta
```
* Write one record:
```
$ kubectl exec volume-rwo-test -- sh -c "echo haha >> /data/test"
```
```
$ sudo filefrag -v /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-000.img
Filesystem type is: 9123683e
File size of /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-000.img is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..     257:    7022200..   7022457:    258:
   1:      258..     258:    7022459..   7022459:      1:    7022458:
   2:      265..     265:    5555783..   5555783:      1:    7022466:
   3:      272..     273:    5555784..   5555785:      2:
   4:      289..     289:    6224085..   6224085:      1:    5555801:
   5:      290..     545:    7022507..   7022762:    256:    6224086:
   6:      546..     800:    7022842..   7023096:    255:    7022763:
   7:      801..    1056:    7023259..   7023514:    256:    7023097:
......
```
* Click "Take Snapshot" in the Longhorn UI:

![image](https://hackmd.io/_uploads/ByR0IrEJ0.png)

* After the snapshot succeeds, the latest snapshot information is shown:

![image](https://hackmd.io/_uploads/ry7ZPBEyA.png)

* Check that a new block file has been created:
```
$ sudo ls -l /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1
total 99504
-rw------- 1 root root       4096 Mar 29 21:48 revision.counter
-rw-r--r-- 1 root root 2147483648 Mar 29 21:49 volume-head-001.img    # Live Data
-rw-r--r-- 1 root root        178 Mar 29 21:49 volume-head-001.img.meta
-rw-r--r-- 1 root root        194 Mar 29 21:49 volume.meta
-rw-r--r-- 1 root root 2147483648 Mar 29 21:48 volume-snap-74b0f1ba-510d-4459-aa6e-0a1a5062450d.img    # Newest Snapshot
-rw-r--r-- 1 root root        125 Mar 29 21:49 volume-snap-74b0f1ba-510d-4459-aa6e-0a1a5062450d.img.meta
```
* Check that the data is still there inside the pod:
```
$ kubectl exec volume-rwo-test -- cat /data/test
haha
```
* Check the block extents of the live data.
* Nothing has been written to the live data yet, so its extent list is empty; everything currently visible comes from the snapshot one level up.
```
$ sudo filefrag -v /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img
Filesystem type is: 9123683e
File size of /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img is 2147483648 (524288 blocks of 4096 bytes)
/var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img: 0 extents found
```
* Write another record:
```
$ kubectl exec volume-rwo-test -- sh -c "echo test >> /data/test"
```
* Check that both records are intact inside the pod:
```
$ kubectl exec volume-rwo-test -- cat /data/test
haha
test
```
* Check the live data's block extents again. New extents now exist, which shows that Longhorn stores only the differences, similar to copy-on-write:
```
$ sudo filefrag -v /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img
Filesystem type is: 9123683e
File size of /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img is 2147483648 (524288 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:      289..     289:    6225454..   6225454:      1:        289:
   1:    33025..   33025:    6225731..   6225731:      1:    6258190:
   2:   262186..  262188:    6225451..   6225453:      3:    6454892:
   3:   262189..  262191:    6225732..   6225734:      3:    6225454: last
/var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img: 4 extents found
```
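* The snapshot above was taken through the UI. The same thing can be done declaratively through the Kubernetes CSI snapshot API, which Longhorn's csi-snapshotter supports. A minimal sketch, assuming the VolumeSnapshot CRDs and snapshot controller are present in the cluster (some distributions need them installed separately); the class and snapshot names are made up for this example:
```
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snap-class        # hypothetical name
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: snap                       # in-cluster Longhorn snapshot, not a backup
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: rwo-pvc-snap               # hypothetical name
spec:
  volumeSnapshotClassName: longhorn-snap-class
  source:
    persistentVolumeClaimName: longhorn-rwo-pvc
```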
## Backup and Restore with NFS storage
* Install an NFS server on SLES 15:
```
$ sudo zypper -n install nfs-kernel-server
(1/1) Installing: nfs-kernel-server-2.1.1-150500.20.2.x86_64 .....................................[done]

$ sudo mkdir /opt/backup
$ echo '/opt/backup 192.168.11.90/24(rw,sync,no_subtree_check,no_root_squash)' | sudo tee /etc/exports
$ sudo systemctl enable --now nfs-server
```
* Create a test PVC and pod:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  containers:
    - name: volume-test
      image: nginx:stable-alpine
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - name: volv
          mountPath: /data
      ports:
        - containerPort: 80
  volumes:
    - name: volv
      persistentVolumeClaim:
        claimName: longhorn-volv-pvc
```
* Configure the backup target:

![](https://hackmd.io/_uploads/SJPJd-Ws2.png)
![](https://hackmd.io/_uploads/rkoe_-bi2.png)

* Back up the PV:

![](https://hackmd.io/_uploads/rklGubWin.png)
![](https://hackmd.io/_uploads/Hk6mdWbjn.png)

* Check that the backup succeeded:

![](https://hackmd.io/_uploads/rk-H_Z-s2.png)
```
# Run on the NFS server
$ sudo ls -l /opt/backup/backupstore/volumes/a2/c0/pvc-f6d0eeee-f711-4376-aa48-43096863dc24
total 4
drwx------ 1 root root  68 Jul 22 21:13 backups
drwx------ 1 root root  36 Jul 22 21:13 blocks
-rw-r--r-- 1 root root 686 Jul 22 21:13 volume.cfg
```
* Delete the pod and PVC:
```
$ kubectl delete pod volume-test
pod "volume-test" deleted
$ kubectl delete pvc longhorn-volv-pvc
```
* Restore the PV:

![](https://hackmd.io/_uploads/SkG2O-Wsn.png)

* Change Number of Replicas to 2:

![](https://hackmd.io/_uploads/Sk8aOWbih.png)

* Activate it:

![](https://hackmd.io/_uploads/SJOrtbbj3.png)
![](https://hackmd.io/_uploads/B1grFWZjh.png)
![](https://hackmd.io/_uploads/ryNVtZWo2.png)
![](https://hackmd.io/_uploads/ryOLYbWon.png)

* Check that the PV/PVC have been restored:
```
$ kubectl get pv,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS      REASON   AGE
persistentvolume/pvc-ce8a7348-0281-4d0e-a808-440b527b1db5   2Gi        RWO            Retain           Bound    default/longhorn-volv-pvc   longhorn-static            82s

NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
persistentvolumeclaim/longhorn-volv-pvc   Bound    pvc-ce8a7348-0281-4d0e-a808-440b527b1db5   2Gi        RWO            longhorn-static   82s
```
* Create the pod again and check that the data is still there:
```
$ kubectl exec volume-test -- cat /data/test
haha
```
* Clean up:
```
$ kubectl delete po volume-test
$ kubectl delete pvc longhorn-rwo-pvc
```
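* Backups can also be scheduled instead of triggered by hand in the UI. A minimal sketch using Longhorn's RecurringJob CRD (field names as documented for v1.6; the job name and schedule are made up for this example, and it assumes the backup target above is already configured):
```
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: nightly-backup            # hypothetical name
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"               # every day at 02:00
  task: "backup"                  # could also be "snapshot"
  groups:
  - default                       # applies to volumes in the default group
  retain: 7                       # keep the last 7 backups
  concurrency: 2
```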
## Test a Block Volume
* Create a block-mode PVC:
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-block-vol
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  storageClassName: longhorn
  resources:
    requests:
      storage: 3Gi
```
* Create a pod that automatically formats the device as ext4:
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-with-raw-block-volume
  labels:
    os: alpine
spec:
  replicas: 1
  selector:
    matchLabels:
      os: alpine
  template:
    metadata:
      labels:
        os: alpine
    spec:
      containers:
        - name: alpine
          image: taiwanese/alpine:stable
          imagePullPolicy: IfNotPresent
          command: ["/bin/sleep", "infinity"]
          volumeDevices:
            - name: data
              devicePath: /dev/sdb
          securityContext:
            privileged: true
          lifecycle:
            postStart:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    set -e
                    mkdir /longhorn
                    checkformat=$(blkid | grep -w /dev/sdb | cut -d ':' -f1)
                    [[ "$checkformat" != /dev/sdb ]] && (mkfs.ext4 /dev/sdb && mount /dev/sdb /longhorn) || mount /dev/sdb /longhorn
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - |
                    umount /dev/sdb
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: longhorn-block-vol
```
```
$ kubectl get po
NAME                                        READY   STATUS    RESTARTS   AGE
pod-with-raw-block-volume-bfd6df465-55ccw   1/1     Running   0          42s
```
* Exec into the pod and check that an additional sdb disk has appeared:
```
$ kubectl exec -it pod-with-raw-block-volume-bfd6df465-55ccw -- bash
bash-5.1# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0    3G  0 loop
sda      8:0    0  100G  0 disk
├─sda1   8:1    0    8M  0 part
└─sda2   8:2    0  100G  0 part
sdb      8:16   0    3G  0 disk /longhorn
sr0     11:0    1 13.5G  0 rom
bash-5.1# echo 123 > /longhorn/test
bash-5.1# cat /longhorn/test
123
```
* Test whether the data survives when the pod moves to another node:
```
$ kubectl scale deploy pod-with-raw-block-volume --replicas=0
$ kubectl cordon cilium-w1
$ kubectl scale deploy pod-with-raw-block-volume --replicas=1

$ kubectl get no
NAME        STATUS                     ROLES                              AGE   VERSION
cilium-m1   Ready                      control-plane,etcd,master,worker   12d   v1.27.10+rke2r1
cilium-w1   Ready,SchedulingDisabled   worker                             54m   v1.27.10+rke2r1

$ kubectl uncordon cilium-w1
$ kubectl exec -it pod-with-raw-block-volume-bfd6df465-n2tt8 -- cat /longhorn/test
123
```

## MySQL volume mount test
```
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
    - port: 3306
  selector:
    app: mysql
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  selector:
    matchLabels:
      app: mysql  # has to match .spec.template.metadata.labels
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      restartPolicy: Always
      containers:
        - image: docker.io/taiwanese/mydb
          name: mysql
          livenessProbe:
            exec:
              command:
                - ls
                - /var/lib/mysql/lost+found
            initialDelaySeconds: 5
            periodSeconds: 5
          ports:
            - containerPort: 3306
              name: mysql
          volumeMounts:
            - name: mysql-volume
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-volume
          persistentVolumeClaim:
            claimName: mysql-pvc
```
* The MySQL username and password are both bigred:
```
$ kubectl get svc
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.43.0.1      <none>        443/TCP    12d
mysql        ClusterIP   10.43.214.58   <none>        3306/TCP   4s

$ mysql -u bigred -p -h 10.43.214.58
Enter password: bigred
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.32 MySQL Community Server - GPL

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MySQL [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
4 rows in set (0.013 sec)
```

## Replica Auto Balance
* Replica Auto Balance is disabled by default in Longhorn. Enabling it lets Longhorn automatically rebalance replicas across all nodes (see the sketch at the end of this note).
  - least-effort: rebalances replicas just enough to achieve minimal redundancy.
  - best-effort: rebalances replicas for even redundancy. If Longhorn has enough nodes to spread the replicas evenly, it forces a re-schedule of the replicas.

### References
https://www.server-world.info/en/note?os=SUSE_Linux_Enterprise_15&p=iscsi&f=2
https://blog.csdn.net/RancherLabs/article/details/127100126

###### tags: `工具`
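* Referring back to the Replica Auto Balance section above: a minimal sketch of turning the setting on from the command line instead of the UI. The Longhorn Setting CR exposes a single `value` field; the documented way to change it is the UI under Setting > General, so verify the exact behaviour against your Longhorn version:
```
# set the global default for all volumes
$ kubectl -n longhorn-system patch settings.longhorn.io replica-auto-balance \
    --type merge -p '{"value":"best-effort"}'

# check the result
$ kubectl -n longhorn-system get settings.longhorn.io replica-auto-balance
```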