Currently, RWO volumes are provided in OCS (the GCS upstream has yet to get the feature) using the gluster-block project, which depends on the tcmu-runner project to provide an iSCSI block interface to the files.
In containerized environments, a PV is mainly needed to provide storage for as long as the given pod/container runs. Once the container finishes its task, the PV is no longer required. With that use case in mind, we experimented with loopback block files instead of iSCSI-based block files for containers.
The workflow is very basic:
NOTES:
Create Block Volume:
truncate -s $size /mnt/glusterfs/$file-on-gluster-mount
The file is then set up using losetup and gets a /dev/loopNN device file.
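As a rough illustration of the Create Block Volume step, the sketch below creates the sparse backing file. The helper name create_block_file is made up for this example, and the directory passed in stands in for the BHV's GlusterFS mount. The losetup call needs root, so it is shown as a comment rather than executed.

```shell
#!/bin/sh
# Sketch only: create a sparse backing file for one block volume.
# create_block_file is a hypothetical helper; in the workflow described
# here, the directory would be the BHV mount (e.g. /mnt/glusterfs).
create_block_file() {
    dir=$1
    name=$2
    size=$3
    # truncate extends the file to $size without writing any data, so the
    # file starts out sparse and consumes space only as it is written to.
    truncate -s "$size" "$dir/$name" || return 1
    # Print the path so the caller can hand it to losetup or mount.
    printf '%s\n' "$dir/$name"
}

# Attaching the file to a loop device requires root, so it is not run here:
#   dev=$(losetup --find --show "$file")   # prints the device, e.g. /dev/loop0
```

For example, `create_block_file /mnt/glusterfs pvc-1234 1G` would leave a 1 GiB sparse file on the BHV mount (the volume name here is only a placeholder).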
I think we can avoid direct calls to losetup (so we don't have to track the loop devices)
"jstrunk"
Ack, a file should be sufficient. I added losetup, mainly because it can serve the Block Volume part of CSI too.
"amarts"
Could consider enlarging existing bhv also via distribute
"jstrunk"
Create Volume:
mkfs.xfs /dev/loopNN
(where loopNN is the device file created).
Mount Volume:
mount /dev/loopNN /mount/point
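Mounting a per-volume file presupposes that the BHV itself is already mounted on the node, exactly once, and that it stays mounted while any volume on it is in use. One possible way to track that is a small reference count kept under a lock, sketched below. Everything named here is an assumption for illustration: the state directory, the server:/bhv volume name, and the bhv_acquire / bhv_release helpers. The mount and unmount commands are overridable variables so the logic can be tried without root.

```shell
#!/bin/sh
# Sketch only: per-node reference counting for the BHV mount.
# All paths and names below are hypothetical defaults.
: "${BHV_STATE:=/var/run/bhv-refs}"
: "${MOUNT_BHV:=mount -t glusterfs server:/bhv /mnt/glusterfs}"
: "${UMOUNT_BHV:=umount /mnt/glusterfs}"

# bhv_acquire VOLUME_ID: record a reference; mount the BHV on the first one.
bhv_acquire() {
    mkdir -p "$BHV_STATE/refs" || return 1
    (
        flock 9    # serialize against other acquire/release calls on this node
        if [ -z "$(ls -A "$BHV_STATE/refs")" ]; then
            $MOUNT_BHV || exit 1       # no references yet: mount the BHV
        fi
        : > "$BHV_STATE/refs/$1"       # one empty file per volume in use
    ) 9>"$BHV_STATE/lock"
}

# bhv_release VOLUME_ID: drop a reference; unmount the BHV after the last one.
bhv_release() {
    mkdir -p "$BHV_STATE/refs" || return 1
    (
        flock 9
        [ -e "$BHV_STATE/refs/$1" ] || exit 0   # unknown volume: nothing to do
        rm -f "$BHV_STATE/refs/$1"
        if [ -z "$(ls -A "$BHV_STATE/refs")" ]; then
            $UMOUNT_BHV || exit 1      # last reference gone: unmount the BHV
        fi
    ) 9>"$BHV_STATE/lock"
}
```

Keeping the reference files under /var/run is a deliberate choice in this sketch: on most systems that directory is cleared on reboot, just as the mounts are, but a crash of the node pod alone would leave stale references that still need a cleanup pass.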
This needs to account for mounting the bhv exactly once on a node (preferably in the CSI node pod, not exposed to the host), then mounting the individual files as needed (these mounts would be exposed to the host).
"jstrunk"
Unmount?
Unmount bhv after last reference is gone.
Also need to consider cleanup on crash/shutdown.
"jstrunk"
Delete Volume:
Snapshot:
Clone:
We need to think about (not implement initially) how we handle space consumption due to snapshot & clone. I think this will require ability to grow BHV.
jstrunk
The crucial critical section of the workflow here is who creates the BHV, and when.
Considering that there can be no coordination among the CSI controller pods, we recommend putting the BHV creation code in glusterd2, since it already has enough locking logic to make sure only one BHV creation request is handled even when hundreds of PVC requests reach it at once.
This would be handled by the CSI provisioner pod (there is only 1). I think it can internally serialize bhv creation, allowing it all to go into CSI.
jstrunk
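Whichever component ends up owning BHV creation, the pattern is the same: check whether the BHV exists and create it if not, with both steps under one lock. The shell sketch below only illustrates that pattern on a single host. It is not how glusterd2 or the CSI provisioner does it, and the ensure_bhv helper, the marker file, and the creation command passed in are all hypothetical.

```shell
#!/bin/sh
# Sketch only: run a BHV creation command at most once, however many
# callers race for it. ensure_bhv and the file names are hypothetical.
#
# Usage: ensure_bhv STATE_DIR CREATE_COMMAND [ARGS...]
ensure_bhv() {
    state=$1
    shift
    mkdir -p "$state" || return 1
    (
        flock 9    # only one caller at a time gets past this point
        if [ ! -e "$state/bhv.created" ]; then
            "$@" || exit 1               # caller-supplied creation command
            : > "$state/bhv.created"     # marker: later callers skip creation
        fi
    ) 9>"$state/bhv.lock"
}
```

Callers that lose the race block on the lock until creation finishes, then see the marker and return without creating a second BHV.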
Technically, it looks very similar to what we should be doing in glusterd2 for the current gluster-block project.
To prove that this works, we ran a CSI container built from a personal repo, and it seemed to work fine.
Now, to get it merged, we need logic to handle creating the BHV and to handle the case where the BHV is full. With that in place, we can call the implementation good to go for MVP-0.
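For the BHV-full case, one simple starting point is an admission check before creating a new file: compare the requested size with what the BHV mount reports as available. The sketch below assumes a hypothetical bhv_has_room helper and uses only POSIX df output. Because the backing files are sparse, this check is advisory at best; the BHV can still fill up later as volumes are written to.

```shell
#!/bin/sh
# Sketch only: does the filesystem holding DIR report at least SIZE_BYTES
# available? bhv_has_room is a hypothetical helper name.
#
# Usage: bhv_has_room DIR SIZE_BYTES
bhv_has_room() {
    # With -P -k, df prints a header line followed by one data line whose
    # fourth column is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    # Round the request up to whole KiB before comparing.
    need_kb=$(( ($2 + 1023) / 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

A caller could then fall back to creating (or growing) a BHV when `bhv_has_room /mnt/glusterfs "$size_bytes"` fails, instead of placing the file on a BHV that is already full.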