# Overview

```
Folder Structure
Usage (backup volume)
  ── Prerequisites
  ── Installing volume-backup Service
  ── Checking volume-backup Service
  ── Changing backup time
  ── Uninstalling volume-backup service
Usage (restore volume)
Principle of backup-volume service
Principle of restore.sh script
```

## Folder Structure
---
```
volume-backup
├── build.sh
├── Dockerfile
├── res
│   └── tools
│       └── kubectl
├── conf
│   └── config.ini
├── deploy
│   └── helm
│       └── harbor
├── deploy.sh
├── README.md
├── backuptime_setting.sh
├── restore.sh
├── check_service.sh
└── uninstall.sh
```

## Usage (backup volume)
---
### Prerequisites

* Kubernetes cluster 1.20+
* Helm v3.2.0+ installed on your workstation
* Longhorn block storage system, 1.4+
* An NFS server with a shared directory that Longhorn can access

Before installing the volume-backup service, use the following command to delete the default "backup-target" setting in Longhorn:

```
kubectl delete setting -n longhorn-system backup-target
```

### Installing volume-backup Service

To deploy the volume-backup service with Ansible, configure "nfs_address" in the inventory file.

To deploy the volume-backup service with the deploy.sh script, navigate into the volume-backup directory and use the following command to build an image and push it to the Harbor repository:

```
$ bash build.sh
```

Then use the following command to deploy the service:

```
$ bash deploy.sh
```

### Checking volume-backup Service

To check the volume-backup service with the check_service.sh script, navigate into the volume-backup directory and use the following command:

```
$ bash check_service.sh
```

### Changing backup time

By default, volumes are backed up every four hours after the volume-backup service is deployed. The backup time can be changed per volume, so every volume's schedule is independent. To change the backup time, navigate into the volume-backup directory and use the following command:

```
$ bash backuptime_setting.sh
```

### Uninstalling volume-backup service

To uninstall the volume-backup service with the uninstall.sh script, navigate into the volume-backup directory and use the following command:

```
$ bash uninstall.sh
```

## Usage (restore volume)
---
:warning: Only volumes attached to pods managed by a StatefulSet or Deployment can be restored.

To restore a volume with the restore.sh script, first install jq, then use the following command to restore the volume automatically:

```
$ bash restore.sh
```

## Principle of backup-volume service
---
In this backup-volume service, we first create six kinds of recurring jobs in Longhorn, which continuously back up volumes at different intervals.

![](https://hackmd.io/_uploads/SynzmY3hh.png)

Next, we deploy a pod which keeps executing a Python file. This Python file does the following:

1. It sets node-down-pod-deletion-policy to delete-both-statefulset-and-deployment-pod. This way, the pod can be moved to another available node when the current node goes down.
2. It runs two threads. The first thread labels volumes every minute; the other deletes redundant backups every hour.

**label volumes**

Check each volume's labels. Patch recurring jobs onto new volumes that don't carry any "recur*" label, then set "recur4hour" to enabled. See the picture below, followed by a sketch of the logic.

![](https://hackmd.io/_uploads/SJnwUYn3h.png)
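The service implements this loop in Python; as a rough illustration, the same pass can be expressed with kubectl. This is a minimal sketch, assuming Longhorn's `recurring-job.longhorn.io/<job-name>` volume-label convention; the job name `recur4hour` comes from this service, and everything else is illustrative:

```bash
# Minimal sketch of the "label volumes" pass (the service does this in Python).
# Assumes Longhorn's recurring-job.longhorn.io/<job> label convention.
for vol in $(kubectl -n longhorn-system get volumes.longhorn.io -o name); do
  # Skip volumes that already carry a recurring-job label
  if ! kubectl -n longhorn-system get "$vol" -o jsonpath='{.metadata.labels}' \
      | grep -q 'recurring-job'; then
    # New volume: enable the default 4-hour backup job for it
    kubectl -n longhorn-system label "$vol" \
      recurring-job.longhorn.io/recur4hour=enabled
  fi
done
```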
**delete redundant backups**

Every recurring job has a "retain" field, which is set to two by default, meaning two backups are kept per recurring job. However, changing the backup time can leave redundant backups behind. For example, suppose volume A has two backups from its 4-hour recurring job; after the backup time is changed to 30 minutes, the volume ends up with two 4-hour backups plus two 30-minute backups. To prevent this, the thread checks whether each volume has more than two backups in total; if so, it deletes the oldest redundant backups.

## Principle of restore.sh script
---
#### Step 1:
The script lists all the volumes in the longhorn-system namespace and asks you to select the one you wish to delete and restore. Since deleting a volume may remove the corresponding PV and PVC, the script first removes the finalizers from the volume. This ensures that the PV and PVC are preserved.
#### Step 2:
The script stops the pod associated with the deleted volume. (If the pod is not stopped, it will not attach to the volume that is restored later.) Because the pod is owned by a StatefulSet or Deployment, stopping it means scaling the owner's replicas down to 0 (see the sketch after these steps). That is, if replicas > 1, the other pod(s) stop as well.
#### Step 3:
Wait until all pod terminations are complete.
#### Step 4:
Restore the volume. The latest backup is restored.
#### Step 5:
Restart the pod(s) by scaling the replicas back up.
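As a rough illustration of steps 2 through 5, here is a minimal kubectl sketch of what restore.sh automates. The workload name, volume name, and backup URL are placeholders, and the Volume manifest assumes Longhorn's `longhorn.io/v1beta2` API with its `fromBackup` field:

```bash
#!/usr/bin/env bash
# Minimal sketch of steps 2-5; restore.sh automates this and discovers
# the real names with kubectl and jq. All names below are placeholders.
set -euo pipefail

NS=default
OWNER=statefulset/my-app      # workload whose pod mounts the volume
REPLICAS=$(kubectl -n "$NS" get "$OWNER" -o jsonpath='{.spec.replicas}')

# Step 2: scale the owner to 0 so no pod keeps the volume attached
kubectl -n "$NS" scale "$OWNER" --replicas=0

# Step 3: wait until the pod(s) have terminated
kubectl -n "$NS" wait --for=delete pod -l app=my-app --timeout=300s

# Step 4: recreate the Longhorn volume from its latest backup
kubectl apply -f - <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: Volume
metadata:
  name: pvc-xxxx
  namespace: longhorn-system
spec:
  fromBackup: "nfs://nfs-server:/backupstore?backup=backup-xxxx&volume=pvc-xxxx"
  numberOfReplicas: 3
EOF

# Step 5: bring the workload back to its original size
kubectl -n "$NS" scale "$OWNER" --replicas="$REPLICAS"
```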