# Longhorn
* Longhorn is an open-source project from Rancher Labs that provides a reliable, lightweight distributed block storage system for Kubernetes.
* Volumes must use the ext4 or XFS filesystem.
## Architecture

* The Longhorn Manager Pod runs as a DaemonSet on every node. It accepts commands from the UI or from Kubernetes and is responsible for creating and managing volumes in the cluster.
* When the Longhorn Manager is asked to create a volume, it creates a Longhorn Engine on the node where the volume is attached, along with replicas; both run as Linux processes, and replicas are created on nodes according to the declared replica count.
* Multiple replicas keep the data highly available: even if a replica or a Longhorn Engine is damaged, the pod's access to its data is not affected.
* The Longhorn Engine synchronously replicates the data to the replicas stored across multiple nodes.
* When Longhorn writes data, it writes to all replica copies at once (the replica count is declared per StorageClass; see the sketch below).
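* For reference, a minimal sketch of how the replica count can be declared in a custom StorageClass. The name `longhorn-2-replica` is only an example; the default install already ships a `longhorn` StorageClass with `numberOfReplicas: "3"`:
```
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn-2-replica   # example name, not created by the default install
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"      # how many replicas each volume from this class gets
  staleReplicaTimeout: "2880"
  fsType: "ext4"
```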
## Notes
* Do not use multipath in a Longhorn environment; it conflicts with Longhorn (a workaround from the linked KB is sketched below).
https://longhorn.io/kb/troubleshooting-volume-with-multipath/
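* If multipathd cannot simply be disabled, the linked KB describes blacklisting the device nodes Longhorn uses. A sketch of `/etc/multipath.conf`, assuming the default device naming; verify the pattern against the KB for your environment, then restart `multipathd`:
```
# /etc/multipath.conf (sketch) -- keep multipathd away from Longhorn-managed devices
blacklist {
    devnode "^sd[a-z0-9]+"
}
```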
## Longhorn installation prerequisites
* Install open-iscsi on every node
```
$ kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/prerequisite/longhorn-iscsi-installation.yaml
```
* Check whether the installation succeeded
```
$ kubectl logs -f -l app=longhorn-iscsi-installation -c iscsi-installation
iscsi install successfully
iscsi install successfully
```
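* Alternatively (an assumption for this SLES environment, not part of the Longhorn manifests), open-iscsi can be installed directly on each node:
```
$ sudo zypper -n install open-iscsi
$ sudo systemctl enable --now iscsid
```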
* Install the NFSv4 client on every node
```
$ kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/prerequisite/longhorn-nfs-installation.yaml
```
* Check whether the installation succeeded
```
$ kubectl logs -l app=longhorn-nfs-installation -c nfs-installation
nfs install successfully
nfs install successfully
```
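* Alternatively (again an assumption for SLES hosts), the NFS client can be installed directly on each node:
```
$ sudo zypper -n install nfs-client
```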
* Delete the installer DaemonSets
```
$ kubectl delete ds longhorn-iscsi-installation longhorn-nfs-installation
daemonset.apps "longhorn-iscsi-installation" deleted
daemonset.apps "longhorn-nfs-installation" deleted
```
* Environment check
```
$ curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/scripts/environment_check.sh | bash
[INFO] Required dependencies 'kubectl jq mktemp' are installed.
[INFO] All nodes have unique hostnames.
[INFO] Waiting for longhorn-environment-check pods to become ready (0/2)...
[INFO] All longhorn-environment-check pods are ready (2/2).
[INFO] MountPropagation is enabled
[INFO] Checking iscsid...
[INFO] Checking multipathd...
[INFO] Checking packages...
[INFO] Cleaning up longhorn-environment-check pods...
[INFO] Cleanup completed.
```
## Install Longhorn
* Install version v1.6.0
* Adjust the replica count of the default StorageClass to 2 (a Helm-based alternative is sketched after the command below)
```
$ curl -s https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/longhorn.yaml | sed 's/numberOfReplicas: "3"/numberOfReplicas: "2"/' | kubectl apply -f -
```
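* A rough equivalent using the Helm chart; the value name is an assumption and should be checked with `helm show values longhorn/longhorn`:
```
$ helm repo add longhorn https://charts.longhorn.io && helm repo update
$ helm install longhorn longhorn/longhorn \
    --namespace longhorn-system --create-namespace \
    --version 1.6.0 \
    --set persistence.defaultClassReplicaCount=2
```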
```
$ kubectl -n longhorn-system get all
NAME READY STATUS RESTARTS AGE
pod/csi-attacher-5bc78f548f-mqtv4 1/1 Running 10 (51m ago) 5d20h
pod/csi-attacher-5bc78f548f-whn59 1/1 Running 7 (24h ago) 5d20h
pod/csi-attacher-5bc78f548f-z92gh 1/1 Running 9 (24h ago) 5d20h
pod/csi-provisioner-756d58684c-jtmpn 1/1 Running 7 (23h ago) 5d20h
pod/csi-provisioner-756d58684c-lz5lh 1/1 Running 13 (51m ago) 5d20h
pod/csi-provisioner-756d58684c-vhn7s 1/1 Running 7 (24h ago) 5d20h
pod/csi-resizer-5d5969d6b-fkksm 1/1 Running 10 (24h ago) 5d20h
pod/csi-resizer-5d5969d6b-jlpcv 1/1 Running 8 (24h ago) 5d20h
pod/csi-resizer-5d5969d6b-qdrl8 1/1 Running 4 (24h ago) 5d20h
pod/csi-snapshotter-8c8cf87cd-48pgk 1/1 Running 10 (24h ago) 5d20h
pod/csi-snapshotter-8c8cf87cd-qb86t 1/1 Running 10 (24h ago) 5d20h
pod/csi-snapshotter-8c8cf87cd-tsftq 1/1 Running 4 (24h ago) 5d20h
pod/engine-image-ei-ce3e2479-54794 1/1 Running 2 (24h ago) 5d20h
pod/engine-image-ei-ce3e2479-67r4v 1/1 Running 1 (24h ago) 5d20h
pod/engine-image-ei-ce3e2479-drfsb 1/1 Running 1 (24h ago) 5d20h
pod/engine-image-ei-ce3e2479-mg8xc 1/1 Running 2 (25h ago) 5d20h
pod/instance-manager-2518c17315db00692be0b82bad3706ad 1/1 Running 0 24h
pod/instance-manager-54866156070ae7574e58eefc61a02c5e 1/1 Running 0 24h
pod/instance-manager-7987fbb3021c10acb318bc134060e9ac 1/1 Running 0 24h
pod/instance-manager-d97fcfbc8d7410f90f23310ebbcfb0fd 1/1 Running 0 24h
pod/longhorn-csi-plugin-5fkdt 3/3 Running 4 (24h ago) 5d20h
pod/longhorn-csi-plugin-jpf4t 3/3 Running 8 (24h ago) 5d20h
pod/longhorn-csi-plugin-kbl7n 3/3 Running 13 (24h ago) 5d20h
pod/longhorn-csi-plugin-pp6lc 3/3 Running 16 (24h ago) 5d20h
pod/longhorn-driver-deployer-679879d8cc-z28hs 1/1 Running 2 (24h ago) 5d20h
pod/longhorn-manager-5rhg2 1/1 Running 1 (24h ago) 5d20h
pod/longhorn-manager-gtj9h 1/1 Running 2 (25h ago) 5d20h
pod/longhorn-manager-rms4x 1/1 Running 1 (24h ago) 5d20h
pod/longhorn-manager-zjzpc 1/1 Running 2 (24h ago) 5d20h
pod/longhorn-ui-854cb599d5-n6rgw 1/1 Running 3 (24h ago) 5d20h
pod/longhorn-ui-854cb599d5-wpzlr 1/1 Running 1 (24h ago) 5d20h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/longhorn-admission-webhook ClusterIP 10.43.79.32 <none> 9502/TCP 21d
service/longhorn-backend ClusterIP 10.43.79.88 <none> 9500/TCP 21d
service/longhorn-conversion-webhook ClusterIP 10.43.21.70 <none> 9501/TCP 21d
service/longhorn-engine-manager ClusterIP None <none> <none> 21d
service/longhorn-frontend ClusterIP 10.43.112.41 <none> 80/TCP 21d
service/longhorn-recovery-backend ClusterIP 10.43.6.133 <none> 9503/TCP 21d
service/longhorn-replica-manager ClusterIP None <none> <none> 21d
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/engine-image-ei-ce3e2479 4 4 4 4 4 <none> 5d20h
daemonset.apps/longhorn-csi-plugin 4 4 4 4 4 <none> 5d20h
daemonset.apps/longhorn-manager 4 4 4 4 4 <none> 21d
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/csi-attacher 3/3 3 3 5d20h
deployment.apps/csi-provisioner 3/3 3 3 5d20h
deployment.apps/csi-resizer 3/3 3 3 5d20h
deployment.apps/csi-snapshotter 3/3 3 3 5d20h
deployment.apps/longhorn-driver-deployer 1/1 1 1 21d
deployment.apps/longhorn-ui 2/2 2 2 21d
NAME DESIRED CURRENT READY AGE
replicaset.apps/csi-attacher-5bc78f548f 3 3 3 5d20h
replicaset.apps/csi-provisioner-756d58684c 3 3 3 5d20h
replicaset.apps/csi-resizer-5d5969d6b 3 3 3 5d20h
replicaset.apps/csi-snapshotter-8c8cf87cd 3 3 3 5d20h
replicaset.apps/longhorn-driver-deployer-5cfcddfd6c 0 0 0 21d
replicaset.apps/longhorn-driver-deployer-679879d8cc 1 1 1 5d20h
replicaset.apps/longhorn-driver-deployer-7bd5f75df8 0 0 0 21d
replicaset.apps/longhorn-ui-665d8ffb55 0 0 0 21d
replicaset.apps/longhorn-ui-854cb599d5 2 2 2 5d20h
```
## Create a Longhorn Volume
* Create a 2Gi PVC
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-rwo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
```
```
$ kubectl get pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-538bb6d7-fac8-4a3b-a36d-5e53bc5b2eb4 2Gi RWO Delete Bound default/longhorn-rwo-pvc longhorn 23s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/longhorn-rwo-pvc Bound pvc-538bb6d7-fac8-4a3b-a36d-5e53bc5b2eb4 2Gi RWO longhorn 26s
```
* Create a Pod that uses the Longhorn volume
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwo-pvc
```
* Confirm Longhorn created two replicas, stored on cilium-w3 and cilium-m1 respectively
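* Besides the Longhorn UI, the same check can be done from the CLI through the replica CRD (the column layout varies by Longhorn version):
```
$ kubectl -n longhorn-system get replicas.longhorn.io -o wide
```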

## Verify automatic data repair
```
# Run the following on cilium-w3
$ ls -l /var/lib/longhorn/replicas/
total 0
drwx------ 1 root root 140 Mar 26 09:47 pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-834d42f5
# Run the following on cilium-m1
$ ls -l /var/lib/longhorn/replicas/
total 0
drwx------ 1 root root 140 Mar 26 09:47 pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
```
```
# Run the following on cilium-m1
$ kubectl exec volume-rwo-test -- sh -c "echo haha > /data/test"
$ kubectl exec volume-rwo-test -- cat /data/test
haha
$ kubectl get po volume-rwo-test -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
volume-test 1/1 Running 0 5m31s 10.42.1.84 cilium-w1 <none> <none>
# Run the following on cilium-w3
$ sudo rm -r /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-834d42f5/
```
```
# Run the following on cilium-m1 to check whether the data is damaged by the missing replica
$ kubectl exec volume-rwo-test -- cat /data/test
haha
# Run the following on cilium-w3 to confirm the replica has been rebuilt
$ ls -l /var/lib/longhorn/replicas/
total 0
drwx------ 1 root root 358 Mar 26 09:55 pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-834d42f5
```
* [Note] Whenever Longhorn detects that a replica has failed, it automatically takes a snapshot and then starts rebuilding the replica on a node.
* [Note] If a replica is deleted through the UI, it is rebuilt on a different node; if the directory under `/var/lib/longhorn/replicas/` is deleted directly on the host, it is rebuilt on the same node.

## Test whether the volume size limit is enforced
* Try writing a 3Gi file to the PVC that only allows 2Gi
```
# Confirm the PVC enforces the size limit
$ kubectl exec -it volume-rwo-test -- sh -c "dd count=3k bs=1M if=/dev/zero of=/data/test3g.img"
dd: error writing '/data/test3g.img': No space left on device
1929+0 records in
1928+0 records out
command terminated with exit code 1
```
* Check that the space actually used is only 2.0G
```
# Check how much space is actually used on the cilium-m1 host
$ sudo du -sh /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
2.0G /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
```
* If content is deleted from the volume, the size of the Longhorn volume itself does not shrink. For example, if you create a 2 GB volume, use 2 GB, and then delete the 2 GB of content, the actual size on disk is still 2 GB, not 0 GB. This happens because Longhorn operates at the block level, not the filesystem level.
```
$ kubectl exec volume-rwo-test -- sh -c "rm /data/test3g.img"
# Even after the data is deleted inside the pod, the host still shows 2.0G in use
$ sudo du -sh /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
2.0G /var/lib/longhorn/replicas/pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5-a6f25e62
```
## Test whether ReadWriteOnce is really enforced
* Create another pod with a different name that uses the same PVC, pinned to a different node
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test2
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwo-pvc
  nodeName: cilium-m1
```
* volume-rwo-test2 stays stuck in ContainerCreating, and `describe` shows it is blocked by a Multi-Attach error, which shows that the accessModes restriction is enforced.
```
$ kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
volume-rwo-test 1/1 Running 0 19m 10.42.1.84 cilium-w1 <none> <none>
volume-rwo-test2 0/1 ContainerCreating 0 8s <none> cilium-m1 <none> <none>
$ kubectl describe po volume-rwo-test2
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedAttachVolume 44s attachdetach-controller Multi-Attach error for volume "pvc-5c51fe8e-7129-4da1-b9b3-8c90c68a71d5" Volume is already used by pod(s) volume-rwo-test
$ kubectl delete po volume-rwo-test2
```
* Pin volume-rwo-test2 to the same node as volume-rwo-test
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test2
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwo-pvc
  nodeName: cilium-w1
```
* Confirm that with ReadWriteOnce, pods on the same node can all access the same PVC
```
$ kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
volume-rwo-test 1/1 Running 0 23m 10.42.1.84 cilium-w1 <none> <none>
volume-rwo-test2 1/1 Running 0 6s 10.42.1.191 cilium-w1 <none> <none>
$ kubectl exec volume-rwo-test2 -- cat /data/test
haha
```
## Manually mount the block device
* When the kubelet actually creates the pod and mounts the volume created by Longhorn, the longhorn-engine creates `/dev/longhorn/volume-name-xxxxx` on the node where the pod runs. This is the block device, and the kubelet then mounts it.

```
# Check the /dev/longhorn/ directory on cilium-m1
# ls -l /dev/longhorn/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8
brw-rw---- 1 root root 8, 32 Mar 29 09:40 /dev/longhorn/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8
# mount /dev/longhorn/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8 /mnt
# ls -l /mnt
total 20
drwx------ 2 root root 16384 Mar 29 09:32 lost+found
-rw-r--r-- 1 root root 5 Mar 29 09:32 test
# cat /mnt/test
haha
# umount /mnt
```
## Verify how longhorn-engine creates the block device
* On the node where the pod runs, inspect the longhorn-engine Linux process.
* longhorn-engine points at the two replicas to locate where the data is actually stored: `--replica tcp://10.42.0.205:10001` `--replica tcp://10.42.3.217:10010`.
* longhorn-engine creates the block device on the node where the pod runs.
```
$ ps aux | grep -v grep | grep pvc-5d571f3e-303e-473b-b7e8-336096ad43b8
root 29754 0.5 0.1 1914824 30184 ? Sl 21:44 0:21 /engine-binaries/registry.rancher.com-rancher-mirrored-longhornio-longhorn-engine-v1.5.4/longhorn --engine-instance-name pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-e-0 controller pvc-5d571f3e-303e-473b-b7e8-336096ad43b8 --frontend tgt-blockdev --size 2147483648 --current-size 0 --engine-replica-timeout 8 --file-sync-http-client-timeout 30 --replica tcp://10.42.0.205:10001 --replica tcp://10.42.3.217:10010 --listen 0.0.0.0:10000
```
```
$ kubectl get po -A -owide | grep -E '10.42.0.205|10.42.3.217'
longhorn-system instance-manager-2518c17315db00692be0b82bad3706ad 1/1 Running 0 4d13h 10.42.0.205 cilium-m1 <none> <none>
longhorn-system instance-manager-d97fcfbc8d7410f90f23310ebbcfb0fd 1/1 Running 0 4d13h 10.42.3.217 cilium-w3 <none> <none>
```
## Inspect the replica Linux process on cilium-m1
* If this replica process dies, longhorn-manager is responsible for recreating it
```
$ ps aux | grep -v grep | grep pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-6b0c4980
root 22805 0.1 0.2 2135764 24824 ? Sl 21:44 0:07 /host/var/lib/longhorn/engine-binaries/registry.rancher.com-rancher-mirrored-longhornio-longhorn-engine-v1.5.4/longhorn --volume-name pvc-5d571f3e-303e-473b-b7e8-336096ad43b8 replica /host/var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-6b0c4980 --size 2147483648 --replica-instance-name pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-r-03dc6292 --sync-agent-port-count 7 --listen 0.0.0.0:10001
```
* Environment cleanup
```
$ kubectl delete po volume-rwo-test volume-rwo-test2
$ kubectl delete pvc longhorn-rwo-pvc
```
## ReadWriteMany (RWX) Volume
### Environment preparation
* Every node must have the NFS client packages installed.
* Every node's hostname must be unique within the Kubernetes cluster.
### Architecture
* When pods on different nodes mount an RWX PVC, the csi-plugin calls the Longhorn Manager to create the volume and a share-manager.
* The pods that use the RWX PVC actually mount the share-manager pod over NFS, and the share-manager in turn accesses the volume stored by Longhorn.
* The share-manager pod is managed by the Longhorn Manager; if it dies, the Longhorn Manager recreates it.

* Create an RWX PVC
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-rwx-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
```
* Create two pods on different nodes that use the same PVC
```
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwx-test
  namespace: default
  labels:
    app: test
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwx-pvc
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - test
        topologyKey: kubernetes.io/hostname
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwx-test2
  namespace: default
  labels:
    app: test
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwx-pvc
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - test
        topologyKey: kubernetes.io/hostname
```
* Confirm the pods share data
```
$ kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
volume-rwx-test 1/1 Running 0 42s 10.42.1.60 cilium-w1 <none> <none>
volume-rwx-test2 1/1 Running 0 42s 10.42.2.188 cilium-w2 <none> <none>
$ kubectl exec volume-rwx-test -- sh -c "echo haha > /data/test"
$ kubectl exec volume-rwx-test2 -- sh -c "cat /data/test"
haha
```
* An additional share-manager pod is created in longhorn-system (the NFS data path can be checked from the node, as sketched after the listing below)
```
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
longhorn-rwx-pvc Bound pvc-0357af88-4d68-412e-a77d-355af0c4f608 2Gi RWX longhorn 22m
$ kubectl -n longhorn-system get po
NAME READY STATUS RESTARTS AGE
csi-attacher-5bc78f548f-mqtv4 1/1 Running 11 (20m ago) 6d
csi-attacher-5bc78f548f-whn59 1/1 Running 7 (28h ago) 6d
csi-attacher-5bc78f548f-z92gh 1/1 Running 9 (28h ago) 6d
csi-provisioner-756d58684c-jtmpn 1/1 Running 7 (27h ago) 6d
csi-provisioner-756d58684c-lz5lh 1/1 Running 14 (23m ago) 6d
csi-provisioner-756d58684c-vhn7s 1/1 Running 7 (28h ago) 6d
csi-resizer-5d5969d6b-fkksm 1/1 Running 10 (28h ago) 6d
csi-resizer-5d5969d6b-jlpcv 1/1 Running 8 (28h ago) 6d
csi-resizer-5d5969d6b-qdrl8 1/1 Running 4 (28h ago) 6d
csi-snapshotter-8c8cf87cd-48pgk 1/1 Running 10 (28h ago) 6d
csi-snapshotter-8c8cf87cd-qb86t 1/1 Running 10 (28h ago) 6d
csi-snapshotter-8c8cf87cd-tsftq 1/1 Running 4 (28h ago) 6d
engine-image-ei-ce3e2479-54794 1/1 Running 2 (28h ago) 6d
engine-image-ei-ce3e2479-67r4v 1/1 Running 1 (28h ago) 6d
engine-image-ei-ce3e2479-drfsb 1/1 Running 1 (28h ago) 6d
engine-image-ei-ce3e2479-mg8xc 1/1 Running 2 (28h ago) 6d
instance-manager-2518c17315db00692be0b82bad3706ad 1/1 Running 0 28h
instance-manager-54866156070ae7574e58eefc61a02c5e 1/1 Running 0 28h
instance-manager-7987fbb3021c10acb318bc134060e9ac 1/1 Running 0 28h
instance-manager-d97fcfbc8d7410f90f23310ebbcfb0fd 1/1 Running 0 28h
longhorn-csi-plugin-5fkdt 3/3 Running 4 (28h ago) 6d
longhorn-csi-plugin-jpf4t 3/3 Running 8 (28h ago) 6d
longhorn-csi-plugin-kbl7n 3/3 Running 13 (28h ago) 6d
longhorn-csi-plugin-pp6lc 3/3 Running 16 (28h ago) 6d
longhorn-driver-deployer-679879d8cc-z28hs 1/1 Running 2 (28h ago) 6d
longhorn-manager-5rhg2 1/1 Running 1 (28h ago) 6d
longhorn-manager-gtj9h 1/1 Running 2 (28h ago) 6d
longhorn-manager-rms4x 1/1 Running 1 (28h ago) 6d
longhorn-manager-zjzpc 1/1 Running 2 (28h ago) 6d
longhorn-ui-854cb599d5-n6rgw 1/1 Running 3 (28h ago) 6d
longhorn-ui-854cb599d5-wpzlr 1/1 Running 1 (28h ago) 6d
share-manager-pvc-0357af88-4d68-412e-a77d-355af0c4f608 1/1 Running 0 16m
```
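* To see the NFS data path in practice, check the mounts on the node that runs one of the RWX pods; the mount source should be the share-manager service for this PVC (a sketch; the exact output format differs by version):
```
# Run on the node hosting volume-rwx-test (e.g. cilium-w1)
$ mount | grep pvc-0357af88-4d68-412e-a77d-355af0c4f608
```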
### Test whether the volume size limit is enforced
* Confirm the PVC enforces the size limit
```
$ kubectl exec -it volume-rwx-test -- sh -c "dd count=3k bs=1M if=/dev/zero of=/data/test3g.img"
dd: error writing '/data/test3g.img': No space left on device
2549+0 records in
2548+0 records out
command terminated with exit code 1
```
* Environment cleanup
```
$ kubectl delete po volume-rwx-test volume-rwx-test2
$ kubectl delete pvc longhorn-rwx-pvc
```
## How Longhorn reads and writes data
* When a pod writes data, it is written to the live data. After a snapshot, if a write targets a block range that already exists, the data goes into the live data and the index is updated to point at that block range in the live data.
* When a pod reads data, Longhorn looks in the live data first; if the data is not there, it looks in the previous snapshot, and so on up the chain until the data is found.
* To improve read performance, Longhorn maintains an index that records which block ranges hold the differing data and where to read them from; each block is 4 KiB.

* Create the PVC & Pod
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-rwo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-rwo-test
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-rwo-pvc
```
* Check where the replicas are stored

* Inspect the block ranges on cilium-w3
```
$ sudo ls -l /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1
total 99496
-rw------- 1 root root 4096 Mar 29 21:46 revision.counter
-rw-r--r-- 1 root root 2147483648 Mar 29 21:46 volume-head-000.img
-rw-r--r-- 1 root root 126 Mar 29 21:44 volume-head-000.img.meta
-rw-r--r-- 1 root root 142 Mar 29 21:44 volume.meta
```
* Write a piece of data
```
$ kubectl exec volume-rwo-test -- sh -c "echo haha >> /data/test"
```
```
$ sudo filefrag -v /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-000.img
Filesystem type is: 9123683e
File size of /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-000.img is 2147483648 (524288 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 0.. 257: 7022200.. 7022457: 258:
1: 258.. 258: 7022459.. 7022459: 1: 7022458:
2: 265.. 265: 5555783.. 5555783: 1: 7022466:
3: 272.. 273: 5555784.. 5555785: 2:
4: 289.. 289: 6224085.. 6224085: 1: 5555801:
5: 290.. 545: 7022507.. 7022762: 256: 6224086:
6: 546.. 800: 7022842.. 7023096: 255: 7022763:
7: 801.. 1056: 7023259.. 7023514: 256: 7023097:
......
```
* Take a snapshot through the Longhorn UI

* After the snapshot succeeds, the newest snapshot information is shown

* Confirm that a new snapshot file has been created
```
$ sudo ls -l /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1
total 99504
-rw------- 1 root root 4096 Mar 29 21:48 revision.counter
-rw-r--r-- 1 root root 2147483648 Mar 29 21:49 volume-head-001.img # Live Data
-rw-r--r-- 1 root root 178 Mar 29 21:49 volume-head-001.img.meta
-rw-r--r-- 1 root root 194 Mar 29 21:49 volume.meta
-rw-r--r-- 1 root root 2147483648 Mar 29 21:48 volume-snap-74b0f1ba-510d-4459-aa6e-0a1a5062450d.img # Newest Snapshot
-rw-r--r-- 1 root root 125 Mar 29 21:49 volume-snap-74b0f1ba-510d-4459-aa6e-0a1a5062450d.img.meta
```
* Confirm the data inside the pod still exists
```
$ kubectl exec volume-rwo-test -- cat /data/test
haha
```
* Inspect the block ranges of the live data
* Because nothing has been written to the live data yet, its block ranges are empty, which means all the data currently visible comes from the previous snapshot
```
$ sudo filefrag -v /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img
Filesystem type is: 9123683e
File size of /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img is 2147483648 (524288 blocks of 4096 bytes)
/var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img: 0 extents found
```
* Write another piece of data
```
$ kubectl exec volume-rwo-test -- sh -c "echo test >> /data/test"
```
* Confirm the pod sees both records intact
```
$ kubectl exec volume-rwo-test -- cat /data/test
haha
test
```
* Inspect the block ranges of the live data again; new block ranges now exist, which shows that Longhorn stores only the differences, similar to copy-on-write
```
$ sudo filefrag -v /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img
Filesystem type is: 9123683e
File size of /var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img is 2147483648 (524288 blocks of 4096 bytes)
ext: logical_offset: physical_offset: length: expected: flags:
0: 289.. 289: 6225454.. 6225454: 1: 289:
1: 33025.. 33025: 6225731.. 6225731: 1: 6258190:
2: 262186.. 262188: 6225451.. 6225453: 3: 6454892:
3: 262189.. 262191: 6225732.. 6225734: 3: 6225454: last
/var/lib/longhorn/replicas/pvc-5d571f3e-303e-473b-b7e8-336096ad43b8-94ec09f1/volume-head-001.img: 4 extents found
```
## Backup and Restore with NFS storage
* Install an NFS server on sles15
```
$ sudo zypper -n install nfs-kernel-server
(1/1) Installing: nfs-kernel-server-2.1.1-150500.20.2.x86_64 .....................................[done]
$ sudo mkdir /opt/backup
$ echo '/opt/backup 192.168.11.90/24(rw,sync,no_subtree_check,no_root_squash)' | sudo tee /etc/exports
$ sudo systemctl enable --now nfs-server
```
* Create a test PVC and Pod
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-volv-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: volume-test
  namespace: default
spec:
  containers:
  - name: volume-test
    image: nginx:stable-alpine
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: volv
      mountPath: /data
    ports:
    - containerPort: 80
  volumes:
  - name: volv
    persistentVolumeClaim:
      claimName: longhorn-volv-pvc
```
* Set the Backup Target
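* For an NFS backup target the URL takes the form `nfs://<nfs-server-ip>:/opt/backup`. Besides the UI, the setting can also be changed through the Longhorn Setting CR; a sketch, assuming the setting name `backup-target` (verify it under Settings > General first):
```
# Sketch: point the backup target at the NFS export created above
$ kubectl -n longhorn-system patch settings.longhorn.io backup-target \
    --type merge -p '{"value":"nfs://<nfs-server-ip>:/opt/backup"}'
```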


* Back up the PV


* Check whether the backup succeeded

```
# Run the following on the NFS server
$ sudo ls -l /opt/backup/backupstore/volumes/a2/c0/pvc-f6d0eeee-f711-4376-aa48-43096863dc24
total 4
drwx------ 1 root root 68 Jul 22 21:13 backups
drwx------ 1 root root 36 Jul 22 21:13 blocks
-rw-r--r-- 1 root root 686 Jul 22 21:13 volume.cfg
```
* Delete the Pod & PVC
```
$ kubectl delete pod volume-test
pod "volume-test" deleted
$ kubectl delete pvc longhorn-volv-pvc
```
* Restore PV

* Change Number of Replicas to 2

* Start the restored volume




* Check whether the PV/PVC has been restored
```
$ kubectl get pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-ce8a7348-0281-4d0e-a808-440b527b1db5 2Gi RWO Retain Bound default/longhorn-volv-pvc longhorn-static 82s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/longhorn-volv-pvc Bound pvc-ce8a7348-0281-4d0e-a808-440b527b1db5 2Gi RWO longhorn-static 82s
```
* Create the Pod again and check whether the data is still there
```
$ kubectl exec volume-test -- cat /data/test
haha
```
* Environment cleanup
```
$ kubectl delete po volume-test
$ kubectl delete pvc longhorn-volv-pvc
```
## Test a Block Volume
* Create a Block Volume PVC
```
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-block-vol
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  storageClassName: longhorn
  resources:
    requests:
      storage: 3Gi
```
* Create a pod that automatically formats the device as ext4
```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-with-raw-block-volume
  labels:
    os: alpine
spec:
  replicas: 1
  selector:
    matchLabels:
      os: alpine
  template:
    metadata:
      labels:
        os: alpine
    spec:
      containers:
      - name: alpine
        image: taiwanese/alpine:stable
        imagePullPolicy: IfNotPresent
        command: ["/bin/sleep", "infinity"]
        volumeDevices:
        - name: data
          devicePath: /dev/sdb
        securityContext:
          privileged: true
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                set -e
                mkdir /longhorn
                checkformat=$(blkid | grep -w /dev/sdb | cut -d ':' -f1)
                [[ "$checkformat" != /dev/sdb ]] && (mkfs.ext4 /dev/sdb && mount /dev/sdb /longhorn) || mount /dev/sdb /longhorn
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                umount /dev/sdb
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: longhorn-block-vol
```
```
$ kubectl get po
NAME READY STATUS RESTARTS AGE
pod-with-raw-block-volume-bfd6df465-55ccw 1/1 Running 0 42s
```
* Exec into the pod and see the extra sdb disk
```
$ kubectl exec -it pod-with-raw-block-volume-bfd6df465-55ccw -- bash
bash-5.1# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
loop0 7:0 0 3G 0 loop
sda 8:0 0 100G 0 disk
├─sda1 8:1 0 8M 0 part
└─sda2 8:2 0 100G 0 part
sdb 8:16 0 3G 0 disk /longhorn
sr0 11:0 1 13.5G 0 rom
bash-5.1# echo 123 > /longhorn/test
bash-5.1# cat /longhorn/test
123
```
* Test whether the data persists when the pod moves to another node
```
$ kubectl scale deploy pod-with-raw-block-volume --replicas=0
$ kubectl cordon cilium-w1
$ kubectl scale deploy pod-with-raw-block-volume --replicas=1
$ kubectl get no
NAME STATUS ROLES AGE VERSION
cilium-m1 Ready control-plane,etcd,master,worker 12d v1.27.10+rke2r1
cilium-w1 Ready,SchedulingDisabled worker 54m v1.27.10+rke2r1
$ kubectl uncordon cilium-w1
$ kubectl exec -it pod-with-raw-block-volume-bfd6df465-n2tt8 -- cat /longhorn/test
123
```
## MySQL volume mount test
```
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
  selector:
    app: mysql
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  selector:
    matchLabels:
      app: mysql # has to match .spec.template.metadata.labels
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mysql
    spec:
      restartPolicy: Always
      containers:
      - image: docker.io/taiwanese/mydb
        name: mysql
        livenessProbe:
          exec:
            command:
            - ls
            - /var/lib/mysql/lost+found
          initialDelaySeconds: 5
          periodSeconds: 5
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-volume
          mountPath: /var/lib/mysql
      volumes:
      - name: mysql-volume
        persistentVolumeClaim:
          claimName: mysql-pvc
```
* The MySQL username and password are both bigred
```
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 12d
mysql ClusterIP 10.43.214.58 <none> 3306/TCP 4s
$ mysql -u bigred -p -h 10.43.214.58
Enter password: bigred
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 8.0.32 MySQL Community Server - GPL
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
4 rows in set (0.013 sec)
```
## Replica Auto Balance
* Longhorn's default configuration has Replica Auto Balance disabled; enabling it lets replicas balance themselves across all nodes (a CLI sketch for enabling it follows this list):
  - least-effort: rebalances replicas just enough to achieve minimal redundancy.
  - best-effort: rebalances replicas to achieve even redundancy. If Longhorn has enough nodes to spread the replicas evenly, it will force-reschedule them.
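* The setting can be turned on globally in the UI under Settings > General, or through the Setting CR; a sketch assuming the setting name `replica-auto-balance`:
```
# Sketch: enable best-effort auto balancing cluster-wide
$ kubectl -n longhorn-system patch settings.longhorn.io replica-auto-balance \
    --type merge -p '{"value":"best-effort"}'
```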
### References
https://www.server-world.info/en/note?os=SUSE_Linux_Enterprise_15&p=iscsi&f=2
https://blog.csdn.net/RancherLabs/article/details/127100126
###### tags: `工具`