# Using ext4/xfs project quota to limit PVC usage with the OpenEBS hostpath LocalPV provisioner

## Introduction

### About project quota

The quota subsystem limits disk usage. Classified by the subject being limited, quota comes in three flavors: user quota, group quota, and project quota. As the names suggest, user quota and group quota limit usage per user and per user group respectively, while project quota limits usage per project id. When all subdirectories and files under a directory share the same project id, the total disk usage of that directory can be capped.

The quota subsystem is actually quite an old feature: user quota and group quota have been supported since Linux v2.6, while project quota arrived later. Project quota originated in XFS, and ext4 only gained official project quota support in Linux v4.5.

### Quota limits

Classified by the object being limited, quota covers block quota and inode quota. Each limit comes in two types, soft limit and hard limit. The hard limit can never be exceeded: if an inode/block allocation would push current usage above the hard limit, the allocation fails.

The soft limit may be exceeded temporarily: if current inode/block usage is above the soft limit but still below the hard limit, only a warning is printed and the allocation still succeeds.

Usage cannot stay above the soft limit indefinitely, however. The maximum time usage may remain above the soft limit is called the grace time; the clock starts the first time the soft limit is exceeded. Within the grace time, allocations still succeed even though usage is above the soft limit; if the grace time expires and usage has not dropped back below the soft limit, further allocations fail.

The grace time is a filesystem-wide setting, i.e. the whole filesystem (disk partition) shares a single grace time value.

For more details on xfs/ext4 quota, see:

- xfs quota: https://man7.org/linux/man-pages/man8/xfs_quota.8.html
- ext4 quota: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/assembly_limiting-storage-space-usage-on-ext4-with-quotas_managing-file-systems

## Test environment

A single-node k8s cluster:

- os: Ubuntu 20.04.4 LTS (Focal Fossa)
- kernel: Linux 5.4.0-135-generic #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
- k8s: v1.23.7
- helm: v3.6.3

## Installing the OpenEBS LocalPV Helm chart

```bash
root@stonetest1:~# helm repo add openebs https://openebs.github.io/charts
root@stonetest1:~# helm repo update
root@stonetest1:~# helm search repo openebs
NAME            CHART VERSION   APP VERSION     DESCRIPTION
openebs/openebs 3.3.1           3.3.0           Containerized Attached Storage for Kubernetes
root@stonetest1:~# cat openebs-values.yaml
apiserver:
  enabled: false
varDirectoryPath:
  baseDir: "/openebs"
provisioner:
  enabled: false
localprovisioner:
  enabled: true
  image: "stoneshiyunify/openebs-localpv"
  imageTag: "v0.2"
  basePath: "/openebs/local"
  deviceClass:
    enabled: false
  hostpathClass:
    # Name of the default hostpath StorageClass
    name: openebs-hostpath
    # If true, enables creation of the openebs-hostpath StorageClass
    enabled: true
    # Available reclaim policies: Delete/Retain, defaults: Delete.
    reclaimPolicy: Delete
    # If true, sets the openebs-hostpath StorageClass as the default StorageClass
    isDefaultClass: false
    # Path on the host where local volumes of this storage class are mounted under.
    # NOTE: If not specified, this defaults to the value of localprovisioner.basePath.
    basePath: "/openebs/local"
    # Custom node affinity label(s) for example "openebs.io/node-affinity-value"
    # that will be used instead of hostnames
    # This helps in cases where the hostname changes when the node is removed and
    # added back with the disks still intact.
    # Example:
    #   nodeAffinityLabels:
    #     - "openebs.io/node-affinity-key-1"
    #     - "openebs.io/node-affinity-key-2"
    nodeAffinityLabels: []
    # Prerequisite: XFS Quota requires an XFS filesystem mounted with
    # the 'pquota' or 'prjquota' mount option.
    xfsQuota:
      # If true, enables XFS project quota
      enabled: true
      # Detailed configuration options for XFS project quota.
      # If XFS Quota is enabled with the default values, the usage limit
      # is set at the storage capacity specified in the PVC.
      softLimitGrace: "80%"
      hardLimitGrace: "100%"
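    # Explanatory note (added for this walkthrough, not part of the chart's values file):
    # with the custom provisioner image used here, the project quota limits are derived
    # from the PVC capacity multiplied by softLimitGrace / hardLimitGrace, e.g. a 10G PVC
    # gets a soft limit of 8000000 KB (80%) and a hard limit of 10000000 KB (100%),
    # as the quota reports further below confirm. The same applies to ext4Quota.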
    # Prerequisite: EXT4 Quota requires an EXT4 filesystem mounted with
    # the 'prjquota' mount option.
    ext4Quota:
      # If true, enables EXT4 project quota
      enabled: true
      # Detailed configuration options for EXT4 project quota.
      # If EXT4 Quota is enabled with the default values, the usage limit
      # is set at the storage capacity specified in the PVC.
      softLimitGrace: "80%"
      hardLimitGrace: "100%"
snapshotOperator:
  enabled: false
ndm:
  enabled: false
ndmOperator:
  enabled: false
ndmExporter:
  enabled: false
webhook:
  enabled: false
crd:
  enableInstall: false
policies:
  monitoring:
    enabled: false
analytics:
  enabled: false
jiva:
  enabled: false
  openebsLocalpv:
    enabled: false
  localpv-provisioner:
    openebsNDM:
      enabled: false
cstor:
  enabled: false
  openebsNDM:
    enabled: false
openebs-ndm:
  enabled: false
localpv-provisioner:
  enabled: false
  openebsNDM:
    enabled: false
zfs-localpv:
  enabled: false
lvm-localpv:
  enabled: false
nfs-provisioner:
  enabled: false
root@stonetest1:~# kubectl create ns openebs
root@stonetest1:~# helm install openebs openebs/openebs -n openebs -f openebs-values.yaml
root@stonetest1:~# kubectl -n openebs get pod
NAME                                           READY   STATUS    RESTARTS      AGE
openebs-localpv-provisioner-6d9bffd9db-g2hpw   1/1     Running   1 (56m ago)   134m
root@stonetest1:~# kubectl get sc openebs-hostpath
NAME               PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
openebs-hostpath   openebs.io/local   Delete          WaitForFirstConsumer   false                  133m
root@stonetest1:~# kubectl get sc openebs-hostpath -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    cas.openebs.io/config: |
      - name: StorageType
        value: "hostpath"
      - name: BasePath
        value: "/openebs/local"
      - name: XFSQuota
        enabled: "true"
        data:
          softLimitGrace: "80%"
          hardLimitGrace: "100%"
      - name: EXT4Quota
        enabled: "true"
        data:
          softLimitGrace: "80%"
          hardLimitGrace: "100%"
    meta.helm.sh/release-name: openebs
    meta.helm.sh/release-namespace: openebs
    openebs.io/cas-type: local
  creationTimestamp: "2022-12-09T03:59:00Z"
  labels:
    app.kubernetes.io/managed-by: Helm
  name: openebs-hostpath
  resourceVersion: "34215091"
  uid: 94b5b547-efde-478f-92b8-f3f676064dcf
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```

**Notes**

- When installing the chart, I switched the localpv-provisioner image to one I (stone) built myself (stoneshiyunify/openebs-localpv:v0.2), because I believe the quota limits computed by the official image are wrong; this has not yet been confirmed by the OpenEBS maintainers. See the GitHub issue for details: https://github.com/openebs/dynamic-localpv-provisioner/issues/150
- It is strongly recommended to give every node a dedicated filesystem (on its own disk/partition) exclusively for OpenEBS, because project quota has to be supported and enabled at the filesystem level. In the chart values above, the OpenEBS basePath is set to /openebs, which is the mount point of a dedicated disk/partition reserved for OpenEBS. The sections below show how to mount a new disk at /openebs and enable quota on it. The chart's default /var/openebs directory is not recommended, because it usually shares a filesystem with the root directory /, and the root filesystem may not be able to enable quota.

## Preparing the test manifests

Prepare a Deployment for testing the quota feature. The manifest below creates a PVC and mounts it into a busybox container.

```bash
root@stonetest1:~# cat busy-deployment.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: busybox-test
spec:
  storageClassName: openebs-hostpath
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10G
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox-test
  template:
    metadata:
      labels:
        app: busybox-test
    spec:
      containers:
      - name: busybox
        image: busybox:1.29
        imagePullPolicy: IfNotPresent
        command: [ "/bin/sh", "-c", "tail -f /dev/null" ]
        volumeMounts:
        - name: volume1
          mountPath: "/mnt/volume1"
      volumes:
      - name: volume1
        persistentVolumeClaim:
          claimName: busybox-test
```

## Using ext4 project quota

### Mounting a dedicated filesystem for OpenEBS and enabling project quota

Log in to every node in the cluster and do the following:

- Install the quota utilities and the quota kernel modules:

```bash
root@stonetest1:~# sudo apt update && sudo apt install quota linux-image-extra-virtual
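# Optional extra check (added here, not part of the original steps): confirm the
# quota kernel modules are available for the running kernel.
find /lib/modules/$(uname -r) -type f -name '*quota*.ko*'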
# check whether the quota tools are installed
root@stonetest1:~# quota --version
```

- Attach a disk to the cluster node. In this article the disk appears as /dev/vdc.
- Format the disk and create the filesystem:

```bash
root@stonetest1:~# mkfs.ext4 -O project,quota /dev/vdc
```

- Mount the filesystem with project quota enabled:

```bash
root@stonetest1:~# mkdir /openebs
root@stonetest1:~# mount -o prjquota /dev/vdc /openebs
root@stonetest1:~# mount | grep openebs
/dev/vdc on /openebs type ext4 (rw,relatime,prjquota)
```

Tip: add this disk to /etc/fstab so the mount comes back automatically after a node reboot.

- Check the quota state:

```bash
root@stonetest1:~# quotaon -Ppv /openebs
project quota on /openebs (/dev/vdc) is on (enforced)
```

The quota state should be `enforced`, which means project quota is enabled.

### Testing

```bash
# Create the PVC and start the workload
root@stonetest1:~# kubectl create ns ext4
root@stonetest1:~# kubectl -n ext4 apply -f busy-deployment.yaml
root@stonetest1:~# kubectl -n ext4 get pod
NAME                            READY   STATUS    RESTARTS   AGE
busybox-test-64856dd56f-wkp92   1/1     Running   0          92m
root@stonetest1:~# kubectl -n ext4 get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
busybox-test   Bound    pvc-ec2182f3-0e78-48e8-a427-eb5224c8c9c7   10G        RWO            openebs-hostpath   92m

# The project id of /openebs is 0
root@stonetest1:~# lsattr -p -d /openebs
    0 --------------e----- /openebs

# The project id of /openebs/local/pvc-ec2182f3-0e78-48e8-a427-eb5224c8c9c7 is 1
root@stonetest1:/openebs/local# lsattr -p
    1 --------------e---P- ./pvc-ec2182f3-0e78-48e8-a427-eb5224c8c9c7

# Check quota usage
root@stonetest1:~# repquota -P /openebs
*** Report for project quotas on device /dev/vdc
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
Project         used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
#0        --      24       0       0              3     0     0
#1        --       4 8000000 10000000             1     0     0

# Notes:
# project id 0 is the quota entry for /openebs itself
# project id 1 is the quota entry for pvc-ec2182f3-0e78-48e8-a427-eb5224c8c9c7
# 8000000 KB is 80% of the PVC capacity (10G) -> soft limit
# 10000000 KB is 100% of the PVC capacity (10G) -> hard limit
# 4 KB is the PVC's current usage

# Exec into the busybox container and create files to exercise the quota
root@stonetest1:~# kubectl -n ext4 exec -it busybox-test-64856dd56f-wkp92 -- sh
/ # df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                  96.7G     80.0G     16.8G  83% /
tmpfs                    64.0M         0     64.0M   0% /dev
tmpfs                     7.8G         0      7.8G   0% /sys/fs/cgroup
/dev/vda1                96.7G     80.0G     16.8G  83% /dev/termination-log
/dev/vdc                  7.6G      4.0K      7.6G   0% /mnt/volume1
/dev/vda1                96.7G     80.0G     16.8G  83% /etc/resolv.conf
/dev/vda1                96.7G     80.0G     16.8G  83% /etc/hostname
/dev/vda1                96.7G     80.0G     16.8G  83% /etc/hosts
shm                      64.0M         0     64.0M   0% /dev/shm
tmpfs                    14.4G     12.0K     14.4G   0% /var/run/secrets/kubernetes.io/serviceaccount
tmpfs                     7.8G         0      7.8G   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                    64.0M         0     64.0M   0% /proc/sched_debug
tmpfs                     7.8G         0      7.8G   0% /proc/scsi
tmpfs                     7.8G         0      7.8G   0% /sys/firmware
/ # cd /mnt/volume1
/mnt/volume1 # fallocate -l 6G aaa
/mnt/volume1 # ls -l && df -B 1K /mnt/volume1
total 6291460
-rw-r--r--    1 root     root     6442450944 Dec  9 05:42 aaa
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vdc               8000000   6291464   1708536  79% /mnt/volume1
/mnt/volume1 # fallocate -l 3G bbb
/mnt/volume1 # ls -l && df -B 1K /mnt/volume1
total 9437192
-rw-r--r--    1 root     root     6442450944 Dec  9 05:42 aaa
-rw-r--r--    1 root     root     3221225472 Dec  9 05:43 bbb
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vdc               8000000   8000000         0 100% /mnt/volume1
/mnt/volume1 # fallocate -l 2G ccc
fallocate: fallocate 'ccc': Disk quota exceeded
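# Explanatory note: aaa and bbb already occupy about 9.4 GB, above the 8000000 KB
# soft limit (the 7-day grace timer is now running). Allocating another 2G would
# cross the 10000000 KB hard limit, so fallocate fails with "Disk quota exceeded";
# note that ext4 still allocated part of ccc before failing, as the next listing
# shows. The 50M allocation below still fits under the hard limit and succeeds,
# even though the soft limit is already exceeded.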
/mnt/volume1 # fallocate -l 50M ddd
/mnt/volume1 # ls -l && df -B 1K /mnt/volume1
total 9922568
-rw-r--r--    1 root     root     6442450944 Dec  9 05:42 aaa
-rw-r--r--    1 root     root     3221225472 Dec  9 05:43 bbb
-rw-r--r--    1 root     root      444596224 Dec  9 05:43 ccc
-rw-r--r--    1 root     root       52428800 Dec  9 05:43 ddd
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vdc               8000000   8000000         0 100% /mnt/volume1

# Check the quota report again
root@stonetest1:~# repquota -P /openebs
*** Report for project quotas on device /dev/vdc
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
Project         used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
#0        --      24       0       0              3     0     0
#1        +- 9922572 8000000 10000000  6days      5     0     0
```

### Enabling/disabling/manually editing project quota

- Enable: `quotaon -P /openebs`
- Disable: `quotaoff -P /openebs`
- Edit: `edquota`

## Using xfs project quota

### Mounting a dedicated filesystem for OpenEBS and enabling project quota

Log in to every node in the cluster and do the following:

- xfs_quota is a long-established tool and usually ships with common Linux distributions:

```bash
root@stonetest:~# xfs_quota -V
xfs_quota version 5.13.0
```

- Attach a disk to the cluster node. In this article the disk appears as /dev/vdc.
- Format the disk and create the filesystem:

```bash
root@stonetest:~# mkfs -t xfs /dev/vdc
```

- Mount the filesystem with project quota enabled:

```bash
root@stonetest:~# mkdir /openebs
root@stonetest:~# mount -o prjquota /dev/vdc /openebs
root@stonetest:~# mount | grep vdc
/dev/vdc on /openebs type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
```

Tip: add this disk to /etc/fstab so the mount comes back automatically after a node reboot.

- Check the quota state:

```bash
root@stonetest:~# xfs_quota -x
xfs_quota> state -p
Project quota state on /openebs (/dev/vdc)
  Accounting: ON
  Enforcement: ON
  Inode: #131 (2 blocks, 2 extents)
Blocks grace time: [7 days]
Blocks max warnings: 5
Inodes grace time: [7 days]
Inodes max warnings: 5
Realtime Blocks grace time: [7 days]
```

### Testing

```bash
root@stonetest:~# kubectl -n xfs apply -f busy-deployment.yaml
persistentvolumeclaim/busybox-test created
deployment.apps/busybox-test created
root@stonetest:~# kubectl -n xfs get pod
NAME                            READY   STATUS    RESTARTS   AGE
busybox-test-64856dd56f-9282t   1/1     Running   0          35s
root@stonetest:~# kubectl -n xfs get pvc
NAME           STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS       AGE
busybox-test   Bound    pvc-29537c78-cbc6-4d60-a2d4-84db36483a94   10G        RWO            openebs-hostpath   38s
root@stonetest:~# xfs_quota -x
xfs_quota> report
Project quota on /openebs (/dev/vdc)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
#0                  0          0          0     00 [0 days]
#1                  0    8000000   10000000     00 [--------]

# Explanation:
# project id 0 is the project quota for /openebs itself
# project id 1 is the project quota for the PVC above

root@stonetest:~# kubectl -n xfs exec -it busybox-test-64856dd56f-9282t -- sh
/ # df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                  96.7G     26.3G     70.4G  27% /
tmpfs                    64.0M         0     64.0M   0% /dev
/dev/vdc                  7.6G         0      7.6G   0% /mnt/volume1
/dev/vda1                96.7G     26.3G     70.4G  27% /dev/termination-log
/dev/vda1                96.7G     26.3G     70.4G  27% /etc/resolv.conf
/dev/vda1                96.7G     26.3G     70.4G  27% /etc/hostname
/dev/vda1                96.7G     26.3G     70.4G  27% /etc/hosts
shm                      64.0M         0     64.0M   0% /dev/shm
tmpfs                    14.4G     12.0K     14.4G   0% /var/run/secrets/kubernetes.io/serviceaccount
tmpfs                     7.8G         0      7.8G   0% /proc/acpi
tmpfs                    64.0M         0     64.0M   0% /proc/kcore
tmpfs                    64.0M         0     64.0M   0% /proc/keys
tmpfs                    64.0M         0     64.0M   0% /proc/timer_list
tmpfs                     7.8G         0      7.8G   0% /proc/scsi
tmpfs                     7.8G         0      7.8G   0% /sys/firmware
/ # cd /mnt/volume1
/mnt/volume1 # ls -l && df -B 1K /mnt/volume1
total 0
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vdc               8000000         0   8000000   0% /mnt/volume1
/mnt/volume1 # fallocate -l 6G aaa
/mnt/volume1 # ls -l && df -B 1K /mnt/volume1
total 6291456
-rw-r--r--    1 root     root     6442450944 Dec  9 09:40 aaa
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vdc               8000000   6291456   1708544  79% /mnt/volume1
/mnt/volume1 # fallocate -l 3G bbb
/mnt/volume1 # ls -l && df -B 1K /mnt/volume1
total 9437184
-rw-r--r--    1 root     root     6442450944 Dec  9 09:40 aaa
-rw-r--r--    1 root     root     3221225472 Dec  9 09:41 bbb
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vdc               8000000   8000000         0 100% /mnt/volume1
/mnt/volume1 # fallocate -l 2G ccc
fallocate: fallocate 'ccc': No space left on device
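# Explanatory note: the same hard-limit overrun as in the ext4 test, but xfs
# reports it as "No space left on device" (ENOSPC) rather than "Disk quota
# exceeded", and ccc is left at 0 bytes (see the listing below). The 50M
# allocation that follows still succeeds because total usage stays below the
# 10000000 KB hard limit, even though the soft limit is already exceeded.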
/mnt/volume1 # fallocate -l 50M ddd
/mnt/volume1 # ls -l && df -B 1K /mnt/volume1
total 9488384
-rw-r--r--    1 root     root     6442450944 Dec  9 09:40 aaa
-rw-r--r--    1 root     root     3221225472 Dec  9 09:41 bbb
-rw-r--r--    1 root     root              0 Dec  9 09:41 ccc
-rw-r--r--    1 root     root       52428800 Dec  9 09:41 ddd
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/vdc               8000000   8000000         0 100% /mnt/volume1
/mnt/volume1 # du -s .
9488384 .
root@stonetest:~# xfs_quota -x
xfs_quota> report -a
Project quota on /openebs (/dev/vdc)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
#0                  0          0          0     00 [0 days]
#1            9488384    8000000   10000000     00 [6 days]

# project id 1 has used 9488384 KB: above the soft limit, still below the hard limit
```

### Enabling/disabling/manually editing project quota

Run `xfs_quota -x` and use the appropriate command from its `help` output:

```
root@stonetest:~# xfs_quota -x
xfs_quota> help
df [-bir] [-hN] [-f file] -- show free and used counts for blocks and inodes
disable [-gpu] [-v] -- disable quota enforcement
dump [-g|-p|-u] [-f file] -- dump quota information for backup utilities
enable [-gpu] [-v] -- enable quota enforcement
help [command] -- help for one or all commands
limit [-g|-p|-u] bsoft|bhard|isoft|ihard|rtbsoft|rtbhard=N -d|id|name -- modify quota limits
off [-gpu] [-v] -- permanently switch quota off for a path
path [N] -- set current path, or show the list of paths
print -- list known mount points and projects
project [-c|-s|-C|-d <depth>|-p <path>] project ... -- check, setup or clear project quota trees
quit -- exit the program
quot [-bir] [-g|-p|-u] [-acv] [-f file] -- summarize filesystem ownership
quota [-bir] [-g|-p|-u] [-hnNv] [-f file] [id|name]... -- show usage and limits
remove [-gpu] [-v] -- remove quota extents from a filesystem
report [-bir] [-gpu] [-ahnt] [-f file] -- report filesystem quota information
restore [-g|-p|-u] [-f file] -- restore quota limits from a backup file
state [-gpu] [-a] [-v] [-f file] -- get overall quota state information
timer [-bir] [-g|-p|-u] value [-d|id|name] -- set quota enforcement timeouts
warn [-bir] [-g|-p|-u] value -d|id|name -- get/set enforcement warning counter

Use 'help commandname' for extended help.
```

## Open issues

- With xfs project quota, after a PVC is deleted, `xfs_quota -x -c 'report -h' /openebs` still shows that PVC's project quota entry; OpenEBS does not remove it. It can be cleared manually with `xfs_quota -x -c 'limit -p bsoft=0 bhard=0 {project-id}' /openebs`. ext4 project quota does not have this problem.
- For both xfs and ext4, once the PV is mounted into a container, the total capacity shown at the PV's mount point inside the container is the soft limit (8 GB in this example), while the amount that can actually be written is the hard limit (10 GB in this example). This is how the system reports it and cannot be changed.
- With xfs project quota, there is no way to look up which path a project id corresponds to, because OpenEBS does not write the project id / path mappings into /etc/projects and /etc/projid. Once there are many PVCs this becomes hard to maintain, since it is unclear which project id belongs to which PVC directory (a manual workaround is sketched at the end of this article). ext4 does not use /etc/projects or /etc/projid, so it does not have this problem.

## Summary

- Both ext4 and xfs can use the project quota feature to limit the space available to a PVC.
- Given the open issues above, ext4 project quota is the recommended option.
- The official openebs localpv provisioner image has bugs (the limit calculation issue, quota entries not cleaned up after PVC deletion, project id / path mappings not written to /etc/projects and /etc/projid), and it only supports block limits, not inode limits; these need to be fixed and improved.
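As a manual workaround for the xfs project id bookkeeping issue noted above, the mapping can be recorded by hand so that `xfs_quota` can resolve project names. This is only a sketch: the project id and PVC path below are taken from the xfs test in this article and are purely illustrative; OpenEBS does not maintain these files for you.

```bash
# Illustrative only: register the project id <-> path mapping for the PVC from
# the xfs test above, then query its usage by name.
echo "1:/openebs/local/pvc-29537c78-cbc6-4d60-a2d4-84db36483a94" >> /etc/projects
echo "pvc-29537c78-cbc6-4d60-a2d4-84db36483a94:1" >> /etc/projid
xfs_quota -x -c 'quota -p pvc-29537c78-cbc6-4d60-a2d4-84db36483a94' /openebs
```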