# RWO using loopback on gluster

-----

Currently, RWO volumes are provided in OCS (gcs upstream is yet to get the feature) using the [gluster-block]() project, which depends on the tcmu-runner project to provide an iSCSI block interface to the files.

In containerized environments, a PV is mainly needed to provide storage for as long as the given pod/container runs. Once the container finishes its task, the PV is no longer required. With that particular use case in mind, we experimented with loopback block files instead of iSCSI-based block files for containers.

The workflow is very basic:

NOTES:
- 'Volume' here means an RWO block volume.
- BHV stands for Block Hosting Volume: a glusterfs volume which can host one or more block volumes (i.e., glusterfs files).

#### APIs, and what they should do:

* Create Block Volume:
  - If a BHV (Block Hosting Volume) exists, a file is created (`truncate -s $size /mnt/glusterfs/$file-on-gluster-mount`) and set up using `losetup`, which yields a `/dev/loopNN` device.
    > I think we can avoid direct calls to losetup (so we don't have to track the loop devices)
    > ```
    > truncate /path/of/file
    > mkfs.xfs /path/of/file
    > mount -o loop /path/of/file /mnt
    > ```
    > [name="jstrunk"]
    >
    > Ack, a file should be sufficient. I added losetup mainly because it can serve the Block Volume part of CSI too.
    > [name="amarts"]
  - If a BHV is not present, create one with the given parameters.
  - If a BHV exists but does not have enough space, create another BHV on the existing nodes.
    > Could also consider enlarging the existing bhv via distribute.
    > [name="jstrunk"]
  - If there is no storage left for creating a BHV, raise an event so the operator can pick up the signal and add more nodes, or more storage.

* Create Volume:
  - Perform the Create Block Volume steps above.
  - Run `mkfs.xfs /dev/loopNN` (where loopNN is the device file created).

* Mount Volume:
  - `mount /dev/loopNN /mount/point` (a node-side sketch is included at the end of this document)
    > This needs to account for mounting the bhv exactly once on a node (preferably in the CSI node pod, not exposed to the host), then mounting the individual files as needed (these mounts would be exposed to the host).
    > [name="jstrunk"]

* Unmount?
  - Unmount the volume.
    > Unmount the bhv after the last reference is gone.
    > Also need to consider cleanup on crash/shutdown.
    > [name="jstrunk"]

* Delete Volume:
  - Delete the file on the gluster mount.
  - Check whether this is the last file on the BHV; if yes, consider deleting the BHV.

* Snapshot:
  - Send a reflink/copy-file-range request for the file on glusterfs.

* Clone:
  - Same as Snapshot, but the new file would be usable here.
    > We need to think about (not implement initially) how we handle space consumption due to snapshot & clone. I *think* this will require the ability to grow a BHV.
    > [name=jstrunk]

#### Implementation details

The crucial critical section in this workflow is who creates the BHV, and when. Considering that there can be no coordination between CSI controller pods, it is recommended to keep the BHV creation code in glusterd2, as it already contains enough locking logic to serialize BHV creation even if hundreds of PVC requests reach it at once.

> This would be handled by the CSI provisioner pod (there is only 1). I think it can internally serialize bhv creation, allowing it all to go into CSI.
> [name=jstrunk]

Technically, it looks very similar to what we should be doing in glusterd2 for the current gluster-block project.

#### mvp-0

To prove that this works, we ran a CSI container built from a personal repo, and it seemed to work fine. Now, to get it merged, we need the logic to handle creating a BHV and handling the case where a BHV is full (a rough sketch follows below).
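A minimal sketch of that logic, in shell form, is shown below. It only illustrates the flow described above; the volume names, brick paths, replica count, and free-space check are assumptions for illustration, not the final glusterd2/CSI implementation.

```bash
#!/bin/bash
# Minimal sketch: pick an existing BHV with enough free space, or create a
# new one, then carve out a block file for the requested volume.
# All names, sizes, brick paths, and the replica count are placeholders.
set -e

REQ_BYTES=$((10 * 1024 * 1024 * 1024))   # requested volume size (10 GiB here)
BHV_MNT_ROOT=/mnt/bhv                    # where BHVs are mounted on the provisioner

pick_bhv() {
    # Return the first mounted BHV with enough free space, if any.
    # Block files are sparse, so this is only an approximate check.
    for mnt in "$BHV_MNT_ROOT"/*; do
        mountpoint -q "$mnt" || continue
        avail=$(df --output=avail -B1 "$mnt" | tail -n 1 | tr -d ' ')
        if [ "$avail" -gt "$REQ_BYTES" ]; then
            echo "$mnt"
            return 0
        fi
    done
    return 1
}

create_bhv() {
    # Create, start, and mount a new BHV (hosts and bricks are placeholders).
    local name="bhv-$(date +%s)"
    gluster volume create "$name" replica 3 \
        node1:/bricks/"$name" node2:/bricks/"$name" node3:/bricks/"$name" >&2
    gluster volume start "$name" >&2
    mkdir -p "$BHV_MNT_ROOT/$name"
    mount -t glusterfs localhost:/"$name" "$BHV_MNT_ROOT/$name"
    echo "$BHV_MNT_ROOT/$name"
}

bhv=$(pick_bhv) || bhv=$(create_bhv)

# Carve out the block file and format it, as in "Create Volume" above.
truncate -s "$REQ_BYTES" "$bhv/pvc-example.img"
mkfs.xfs -f "$bhv/pvc-example.img"
```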
With that in place, we can call the implementation good to go for MVP-0.
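For reference, the node-side mount path discussed in the Mount/Unmount items above (mount the BHV at most once per node, then loop-mount the individual file) could look something like the sketch below; the paths, argument names, and single-mount check are illustrative assumptions, not an agreed design.

```bash
#!/bin/bash
# Sketch of the node-side publish step: mount the BHV at most once on this
# node, then loop-mount the block file into the pod's target path.
# Paths and argument names are illustrative only.
set -e

BHV_NAME=$1      # BHV backing this volume
BLOCK_FILE=$2    # block file on the BHV backing this PV
TARGET=$3        # target path handed to the CSI node plugin

BHV_MNT=/var/lib/gluster-loop/$BHV_NAME

# Mount the BHV only if it is not already mounted on this node.
if ! mountpoint -q "$BHV_MNT"; then
    mkdir -p "$BHV_MNT"
    mount -t glusterfs localhost:/"$BHV_NAME" "$BHV_MNT"
fi

# Loop-mount the block file into the pod's target path.
mkdir -p "$TARGET"
mount -o loop "$BHV_MNT/$BLOCK_FILE" "$TARGET"

# Unpublish is the reverse: umount "$TARGET", and once the last block file
# on this BHV is unmounted on the node, umount "$BHV_MNT" as well.
```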