# Design for VolumeGroupReplication
## Problem statement
Provide an option to do RBD group replication where users can dynamically choose RBD PVCs for group-level replication, which provides point-in-time replication for all the volumes in the group.
This is based on the approach where the admin/user adds labels to PVCs to add/remove them from the group.
## Approach 1
* We need to have 2 CRDs like the ones below:
### VolumeGroupReplication CRD
```yaml
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeGroupReplication
metadata:
  name: volumereplication-sample
  namespace: default
spec:
  volumeGroupReplicationClass: volumereplicationclass-sample
  replicationState: primary
  sync: auto
  source:
    selector:
      matchLabels:
        group: testing
```
### VolumeGroupReplicationClass CRD
```yaml
apiVersion: groupreplication.storage.openshift.io/v1alpha1
kind: VolumeGroupReplicationClass
metadata:
  name: groupvolumereplicationclass-sample
spec:
  provisioner: example.provisioner.io
  parameters:
    replication.storage.openshift.io/group-replication-secret-name: secret-name
    replication.storage.openshift.io/group-replication-secret-namespace: secret-namespace
    # schedulingInterval is a vendor specific parameter. It is used to set the
    # replication scheduling interval for storage volumes that are replication
    # enabled using the related VolumeReplication resource
    schedulingInterval: 1m
```
We are going to have a single CRD for replication control, but when it comes to the implementation we need two different functionalities under the hood. We need to make more API calls to bring the actual state to the desired state:
* GetVolumeGroup
* CreateVolumeGroup
* ModifyVolumeGroup
* EnableVolumeGroupReplication
* PromoteVolumeGroupReplication
The above are a few of the API calls that need to be made when the CR is created, when someone adds a label to a PVC, and for the state change, i.e. for changing the `replicationState` from `primary` to `secondary`.
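A rough sketch of how the VolumeGroupReplication reconcile loop could chain these calls, assuming a hypothetical `groupClient` interface; the actual RPC names and signatures come from the csi-addons spec and may differ:

```go
package groupreplication

import "context"

// groupClient is a hypothetical interface standing in for the csi-addons
// group/replication RPCs listed above; signatures are illustrative only.
type groupClient interface {
	GetVolumeGroup(ctx context.Context, name string) (groupID string, err error)
	CreateVolumeGroup(ctx context.Context, name string, volumeIDs []string) (groupID string, err error)
	ModifyVolumeGroup(ctx context.Context, groupID string, volumeIDs []string) error
	EnableVolumeGroupReplication(ctx context.Context, groupID string) error
	PromoteVolumeGroupReplication(ctx context.Context, groupID string) error
}

// reconcileGroupReplication ensures the group exists, matches the labelled
// PVCs, and is replication-enabled/promoted according to replicationState.
func reconcileGroupReplication(ctx context.Context, c groupClient, name string,
	volumeIDs []string, replicationState string) error {
	groupID, err := c.GetVolumeGroup(ctx, name)
	if err != nil {
		// Treat any lookup failure as "group missing" for this sketch.
		if groupID, err = c.CreateVolumeGroup(ctx, name, volumeIDs); err != nil {
			return err
		}
	}
	// Bring the group membership in line with the PVCs selected by the label.
	if err := c.ModifyVolumeGroup(ctx, groupID, volumeIDs); err != nil {
		return err
	}
	if err := c.EnableVolumeGroupReplication(ctx, groupID); err != nil {
		return err
	}
	if replicationState == "primary" {
		return c.PromoteVolumeGroupReplication(ctx, groupID)
	}
	return nil
}
```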
## Approach 2
* Have CRs like the ones below for volume grouping:
```yaml
apiVersion: group.storage.openshift.io/v1alpha1
kind: VolumeGroupClass
metadata:
  name: myVolumeGroup
  namespace: default
spec:
  provisioner: example.provisioner.io
  parameters:
    clusterID: xyz
    group.storage.openshift.io/group-secret-name: secret-name
    group.storage.openshift.io/group-secret-namespace: secret-namespace
```
```yaml
apiVersion: group.storage.openshift.io/v1alpha1
kind: VolumeGroup
metadata:
  name: myVolumeGroup
  namespace: default
spec:
  source:
    selector:
      matchLabels:
        group: testing
  groupName: group-uuid
```
```yaml
apiVersion: group.storage.openshift.io/v1alpha1
kind: VolumeGroupContent
metadata:
  name: group-uuid
spec:
  volumeHandles:
    - name: pvc-xxx
    - name: pvc-xyz
  groupHandle: csi-group-identifier
  groupAttributes:
    xyz: "abc"
```
We are going to have a VolumeGroup CRD for group operations, which in turn makes only the required and necessary RPC calls related to grouping, nothing else:
* GetVolumeGroup
* CreateVolumeGroup
* ModifyVolumeGroup
* We can reuse the replication CRD by just changing the dataSource to support VolumeGroup:
```yaml
apiVersion: replication.storage.openshift.io/v1alpha1
kind: VolumeReplication
metadata:
  name: volumereplication-sample
  namespace: default
spec:
  volumeReplicationClass: volumereplicationclass-sample
  replicationState: primary
  replicationHandle: replicationHandle # optional
  dataSource:
    kind: VolumeGroup
    name: myVolumeGroup # should be in same namespace as VolumeReplication
```
The above are a few of the API calls that need to be made when the CR is created, when someone adds a label to a PVC, and for the state change, i.e. for changing the `replicationState` from `primary` to `secondary`.
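A minimal sketch of how the existing VolumeReplication controller could resolve the new dataSource, assuming illustrative field names and stubbed lookup helpers:

```go
package replication

import (
	"context"
	"fmt"
)

// dataSource mirrors the proposed spec.dataSource field; names are illustrative.
type dataSource struct {
	Kind string // "PersistentVolumeClaim" or "VolumeGroup"
	Name string // object in the same namespace as the VolumeReplication
}

// The two lookups stand in for the existing PVC -> PV -> volumeHandle path and
// the new VolumeGroup -> VolumeGroupContent -> groupHandle path; stubbed here.
func lookupVolumeHandle(ctx context.Context, ns, name string) (string, error) { return "csi-vol-id", nil }
func lookupGroupHandle(ctx context.Context, ns, name string) (string, error)  { return "csi-group-id", nil }

// resolveReplicationSource returns the CSI identifier the replication RPCs act
// on; the rest of the replication flow stays unchanged.
func resolveReplicationSource(ctx context.Context, ds dataSource, ns string) (string, error) {
	switch ds.Kind {
	case "PersistentVolumeClaim":
		return lookupVolumeHandle(ctx, ns, ds.Name)
	case "VolumeGroup":
		return lookupGroupHandle(ctx, ns, ds.Name)
	default:
		return "", fmt.Errorf("unsupported dataSource kind %q", ds.Kind)
	}
}
```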
## Pros/Cons of both approaches
### Approach 1
#### Pros
- 2 new dedicated CRDs for group replication
#### Cons
- 2 extra new CRDs and a new implementation
- Complex internal logic and implementation to handle groups, as grouping and replication become tightly coupled
- A higher number of RPC calls, as we need to keep the desired/actual state flow idempotent
- Might run into complex problems later on, as we are closely tying group replication to PVC grouping
### Approach 2
#### Pros
- Only 1 new CRD, just for grouping
- The replication flow remains the same, as we are only changing the dataSource
- Code for grouping already exists; we just need the CSI-side implementation or some minor changes to the controller
- The current flow can be reused when Kubernetes implements a native volume group concept
- Anyone who needs only grouping can reuse the group concept and build new things on top of it
- Only the required RPC calls are made when there is a change at the group level, as we only need to keep the group idempotent
#### Cons
- New CRD for grouping
## CSI-addons spec change
### Volumegroup
https://github.com/csi-addons/spec/tree/main/volumegroup contains the specification for grouping.
### VolumeReplication for Group
ReplicationSource already exists in the spec; we need to check its feasibility.
## CRD in csi-addons operator
### Volumegroup
* A new CRD and controller need to be added for it
* It needs to watch PVC objects and perform group operations (a sketch of this step follows below)
* Create/Modify/Delete group RPC calls
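A minimal sketch of the PVC watch/resolve step the new controller would need, assuming a label selector such as `group=testing`; `volumeHandlesForGroup` is a hypothetical helper whose output would feed the Create/Modify group RPC calls:

```go
package volumegroup

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// volumeHandlesForGroup lists the PVCs matching the VolumeGroup's label
// selector and resolves each bound PVC to its CSI volume handle.
func volumeHandlesForGroup(ctx context.Context, kc kubernetes.Interface,
	ns, selector string) ([]string, error) {
	pvcs, err := kc.CoreV1().PersistentVolumeClaims(ns).List(ctx,
		metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return nil, err
	}
	var handles []string
	for _, pvc := range pvcs.Items {
		if pvc.Spec.VolumeName == "" {
			continue // not bound yet; pick it up on a later reconcile
		}
		pv, err := kc.CoreV1().PersistentVolumes().Get(ctx, pvc.Spec.VolumeName, metav1.GetOptions{})
		if err != nil {
			return nil, err
		}
		if pv.Spec.CSI != nil {
			handles = append(handles, pv.Spec.CSI.VolumeHandle)
		}
	}
	return handles, nil
}
```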
### VolumeReplication for Group
* Extend the VolumeReplication controller to accept the VolumeGroup object and do group-level operations based on the group ID
* Internal spec changes between the csi-addons controller and the csi-addons sidecar
### VolumeGroupClass
* A new CRD needs to be added for it.
## CephCSI implementation
### VolumeGroup operation
* CreateVolumeGroup
* DeleteVolumeGroup
* ModifyVolumeGroup
The above are the 3 mandatory RPC calls that need to be implemented in CephCSI.
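The authoritative request/response types live in the csi-addons volumegroup proto; the Go interface below is only an illustration of what CephCSI has to serve, with simplified shapes:

```go
package rbd

import "context"

// volumeGroupServer illustrates the three mandatory group RPCs with
// simplified request/response shapes; the real proto may differ.
type volumeGroupServer interface {
	CreateVolumeGroup(ctx context.Context, name string, volumeIDs []string,
		secrets, parameters map[string]string) (groupID string, err error)
	ModifyVolumeGroup(ctx context.Context, groupID string, volumeIDs []string,
		secrets, parameters map[string]string) error
	DeleteVolumeGroup(ctx context.Context, groupID string,
		secrets, parameters map[string]string) error
}
```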
### OMAP regeneration
As grouping already exists in RBD, we need to generate the grouping OMAP.
We might need to add a VolumeGroupContent object, similar to VolumeSnapshotContent or VolumeGroupSnapshotContent, which holds the metadata that helps CephCSI regenerate the OMAP metadata for grouping.
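A sketch of what the group OMAP metadata could look like, loosely following the existing CephCSI journal convention of a directory object plus a per-item object; all object and key names here are assumptions, not a final on-disk format:

```go
package rbd

import "fmt"

const (
	// Hypothetical directory object mapping group request names to group UUIDs.
	groupsDirectory = "csi.groups.default"
	// Hypothetical per-group object name, keyed by the group UUID.
	groupObjectFmt = "csi.group.%s"
)

// groupOmapKeys returns the key/value pairs a per-group object could carry so
// that CephCSI can regenerate the grouping OMAP from a VolumeGroupContent.
func groupOmapKeys(groupName, rbdGroupName string, imageIDs []string) map[string]string {
	keys := map[string]string{
		"csi.groupname":    groupName,    // request name from the CO
		"csi.rbdgroupname": rbdGroupName, // backing RBD group name
	}
	for i, id := range imageIDs {
		keys[fmt.Sprintf("csi.image.%d", i)] = id
	}
	return keys
}
```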
### VolumeReplication of group operation
The existing replication RPC calls need to support groups as well.
## RBD CLI Output
## Open items
## Tasks
## Integration
## Questions
* Do we need to consider cloned PVCs and do all the implementation we are doing for the 4.17 EPIC which Rakshith is working on? Dependency (risk): [flattening a restored/cloned PVC when it is part of a group]
* Does RBD support adding images to a group when mirroring is enabled?
* Does RBD support removing images from a group when mirroring is enabled?
* RBD grouping constraints
  * Can RBD images be part of two groups?
* go-ceph API dependency (at risk)
* Will an individual RBD image mirrored as part of a group still reflect the same status as when mirrored at the individual level? (used as part of PVC deletion, watcher checks, etc.)
* Can we delete the group after taking a snapshot? If this is disallowed, what will happen?
> Can I delete a group when I have snapshots created for that group?

Yes

> Can I delete a group when I have mirroring enabled for that group?

Yes, delete will disable the mirroring internally

> Does RBD support adding images to a group when mirroring is enabled?

Yes

> Does RBD support removing images from a group when mirroring is enabled?

Not straight away, because there is a check at the RBD CLI level which validates whether the image is part of any group and restricts the operation. So one needs to remove the image from the group first, and only then is one allowed to delete the image.

> Does deleting the group also delete the user snapshots created for that group? I.e., if I delete the group, I cannot restore a snapshot which was created as part of that group?

Right

> Will an individual RBD image mirrored as part of a group still reflect the same status as when mirrored at the individual image level?

Currently, once it is added to a group, only the group status shows the status, and the image status shows `status not found`

> Will an individual RBD image mirrored as part of a group still reflect the same status as when mirrored at the individual level? (used as part of PVC deletion, watcher checks, etc.)

Yes

> If the image already has mirroring enabled, and we add that image to a group and try to enable mirroring on it, it will fail. Is that correct?

Yes

> Can we delete the group after taking a snapshot?

Yes, it will delete all the snapshots when we delete the group with an image; if we remove the images and later delete the group, we still keep the images.
## Low-level design for approach 2 (VolumeGroup and VolumeGroup replication)
## CephCSI implementation
### VolumeGroup operation
Prerequisites/constraints from RBD:
* An image can be part of only one group at a time.

We need grouping for both the VolumeGroupSnapshot and VolumeGroup functionality. As the two are not dependent on each other, we need a single grouping implementation in CephCSI that is shared between these two functionalities.
* CreateVolumeGroup
  * Req:
    * RequestID (?)
    * VolumeIDs
    * Secrets
    * Parameters
  * Response:
    * GroupID
  * Implementation (a sketch follows after this list)
    * Validate the VolumeIDs, secrets and parameters
    * Check that all the VolumeIDs are valid and exist in the cluster
    * Check that the VolumeIDs are not already part of any other group
      * If they are part of a group
        * Check that all the images are present in the group
          * If present
            * Add/update the ref count
            * Return success
          * If not present
            * Return an error
      * If not
        * Create the group
          * Create a group in the RADOS OMAP
          * Update all the images to contain the group ID
          * Create the group in RBD
        * Add the ref count
    * Return the response
* DeleteVolumeGroup
  * Req:
    * GroupID
    * Secrets
    * Parameters
  * Response:
    * {}
  * Implementation
    * Validate the GroupID, secrets and parameters
    * Remove/add the extra images from/to the OMAP
    * Remove/add the extra images from/to the group
    * Update the ref count
    * Return a success response
  * How do we ensure that no one is using it before deleting it? Add a ref-counter concept to this as well, like we do for ROX.
    * Can adding the ref count in Modify cause problems?
    * We should have something like operation (key): value (int) so that we don't update the ref count
    * Add the Request(ID) as the key for the ref count?
* ModifyVolumeGroup
  * Req:
    * VolumeIDs
    * GroupID
    * Secrets
    * Parameters
  * Response:
    * {}
  * Implementation
    * Validate the VolumeIDs, secrets and parameters
    * Check that all the VolumeIDs are valid and exist in the cluster
    * Check whether the VolumeIDs are part of this group
    * Get the list of images from the group
    * Remove/add the extra images from/to the OMAP
    * Remove/add the extra images from/to the group
    * Return a success response
* Question
  * Locking needs to be shared for group operations (both VolumeGroupSnapshot and VolumeGroup should share the locking); synchronization between the external-snapshotter and csi-addons containers is required.
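The sketch referenced from the CreateVolumeGroup implementation bullet above, using hypothetical journal/RBD helpers and a lock shared with the VolumeGroupSnapshot path; this is illustrative only, not the final CephCSI code:

```go
package rbd

import (
	"context"
	"errors"
	"sync"
)

// groupJournal and rbdGroups are hypothetical helpers mirroring the steps
// listed above: OMAP bookkeeping on one side, RBD group operations on the other.
type groupJournal interface {
	GroupForVolume(ctx context.Context, volID string) (string, error) // "" if ungrouped
	ReserveGroup(ctx context.Context, name string, volIDs []string) (groupID string, err error)
	IncRefCount(ctx context.Context, groupID, requestID string) error
}

type rbdGroups interface {
	Create(ctx context.Context, groupID string) error
	AddImages(ctx context.Context, groupID string, volIDs []string) error
}

// groupLock is shared by the VolumeGroup and VolumeGroupSnapshot paths so that
// concurrent group operations serialize (see the locking question above).
var groupLock sync.Mutex

func createVolumeGroup(ctx context.Context, j groupJournal, r rbdGroups,
	requestID, name string, volIDs []string) (string, error) {
	groupLock.Lock()
	defer groupLock.Unlock()

	// Collect the current group membership of every requested volume.
	groups := map[string]bool{}
	for _, v := range volIDs {
		g, err := j.GroupForVolume(ctx, v)
		if err != nil {
			return "", err
		}
		groups[g] = true
	}

	switch {
	case len(groups) == 1 && !groups[""]:
		// All volumes already sit in the same group: bump the ref count and
		// return the existing group (idempotent success).
		for g := range groups {
			return g, j.IncRefCount(ctx, g, requestID)
		}
	case len(groups) == 1 && groups[""]:
		// None of the volumes are grouped yet: create the group below.
	default:
		// Mixed or foreign membership: cannot satisfy the request.
		return "", errors.New("volumes are already part of another group")
	}

	// Reserve the group in the OMAP journal, create it in RBD, attach the
	// images and record the ref count for this request.
	groupID, err := j.ReserveGroup(ctx, name, volIDs)
	if err != nil {
		return "", err
	}
	if err := r.Create(ctx, groupID); err != nil {
		return "", err
	}
	if err := r.AddImages(ctx, groupID, volIDs); err != nil {
		return "", err
	}
	return groupID, j.IncRefCount(ctx, groupID, requestID)
}
```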
### Group Replication
* EnableVolumeReplication
  * All the existing operations will be done at the group level instead of the volume level.
  * A switch statement needs to be added to handle either volume or group (see the sketch after this list).
* DisableVolumeReplication
  * All the existing operations will be done at the group level instead of the volume level.
  * A switch statement needs to be added to handle either volume or group.
* PromoteVolume
  * All the existing operations will be done at the group level instead of the volume level.
  * A switch statement needs to be added to handle either volume or group.
* DemoteVolume
  * All the existing operations will be done at the group level instead of the volume level.
  * A switch statement needs to be added to handle either volume or group.
* ResyncVolume
  * All the existing operations will be done at the group level instead of the volume level.
  * A switch statement needs to be added to handle either volume or group.
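The sketch referenced above: one dispatch helper per replication RPC, switching between the existing per-image path and the new per-group path. The `replicationSource` struct and the mirroring method names are illustrative, not the actual csi-addons or go-ceph API:

```go
package rbd

import (
	"context"
	"fmt"
)

// replicationSource is a simplified stand-in for the ReplicationSource in the
// csi-addons spec: exactly one of VolumeID or GroupID is expected to be set.
type replicationSource struct {
	VolumeID string
	GroupID  string
}

// mirrorOps abstracts the existing per-image mirroring helpers and the new
// per-group ones; the same shape applies to Disable/Promote/Demote/Resync.
type mirrorOps interface {
	EnableImageMirroring(ctx context.Context, volID string) error
	EnableGroupMirroring(ctx context.Context, groupID string) error
}

// enableReplication dispatches the Enable RPC at the volume or group level.
func enableReplication(ctx context.Context, m mirrorOps, src replicationSource) error {
	switch {
	case src.GroupID != "":
		return m.EnableGroupMirroring(ctx, src.GroupID)
	case src.VolumeID != "":
		return m.EnableImageMirroring(ctx, src.VolumeID)
	default:
		return fmt.Errorf("replication source must specify a volume or a group")
	}
}
```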
Extra work:
* Deprecate `volumeID` and use `volumeSource` as the identifier