---
tags: User, WIP
---
<a href="https://hackmd.io/@teoroo-cluster" alt="Teoroo Cluster"> <img src="https://img.shields.io/badge/Teoroo--CMC-All%20Public%20Notes-black?style=flat-square" /></a>

# Teoroo cluster: data management

Here you can find information about how data is backed up in the TEOROO cluster, including:

- how to set up a data backup for yourself
- how files are backed up system-wide

## Incremental snapshots

- Recover from accidents (deleting a file, etc.)
- Snapshot (hourly, daily, weekly, etc.)
- Avoid wasting space (compared with plain copies)[^1]

![](https://i.imgur.com/LAVm2VO.png)

[^1]: Image source: https://www.nakivo.com/blog/what-is-incremental-backup/

Available in:

- [Tetralith](https://www.nsc.liu.se/support/storage/snic-centrestorage/recover-deleted-files/)
- [Rackham](https://www.uppmax.uu.se/support/user-guides/disk-storage-guide/)

## Setting up a snapshot

We suggest two ways to make the backup:

- [rsync-time-backup(rtb)](https://github.com/laurent22/rsync-time-backup.git)
    - **Pros**
        - Easier to set up
        - Transparent file structure
    - **Cons**
        - Naive deduplication (does not handle, e.g., moved files)
- [borg](https://www.borgbackup.org/)
    - **Pros**
        - Supports encryption/compression/deduplication
    - **Cons**
        - Harder to set up

### Local snapshots

Here we demonstrate how to create a snapshot locally on Teoroo2. We suggest keeping your local backups in `$HOME/BACKUP`, which we will **not** back up to Uppmax regularly.
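
If you go with rtb, the destination folder has to exist before the first run. The lines below are a minimal sketch, assuming `/sw/rtb.sh` is an unmodified copy of the upstream `rsync_tmbackup.sh` (which refuses to write to a folder that lacks a `backup.marker` file) and using `myproj` as a placeholder project name:

```bash
# Sketch: prepare a destination folder for rtb.
# "myproj" is a placeholder; set DEST beforehand to use another location.
DEST="${DEST:-$HOME/BACKUP/myproj}"
mkdir -p -- "$DEST"

# Assumption: /sw/rtb.sh behaves like upstream rsync_tmbackup.sh, which
# only backs up into folders that contain this marker file.
touch "$DEST/backup.marker"
```

A borg repo is different: `borg init` expects a new or empty folder, so do not point it at a folder that already holds a marker file or rtb backups.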

Assume we would like to back up a folder with the following structure (the commands below assume it lives at `$HOME/myproj`):

```
>> ll demoproj/
total 8.0K
-rw-r--r--+ 1 yunqi teoroo   18 Mar 31 10:41 exclude.txt
-rw-r--r--  1 yunqi teoroo    0 Mar 31 10:45 hello.txt
drwxr-xr-x  2 yunqi teoroo 4.0K Mar 31 10:42 HUGEFILES/
-rw-r--r--  1 yunqi teoroo    0 Mar 31 10:51 READEME.md
```

In `exclude.txt` we list the files we do not wish to back up, following the rsync exclude file format ([tutorial](https://sites.google.com/site/rsync2u/home/rsync-tutorial/the-exclude-from-option)):

```exclude
- **HUGEFILES/*
- *~
```

**rtb**

```bash
/sw/rtb.sh $HOME/myproj $HOME/BACKUP/myproj $HOME/myproj/exclude.txt
```

`rtb` automatically cleans up old backups to save space; this can be adjusted with the option `--strategy`.

> Default: "1:1 30:7 365:30" means after one
> day, keep one backup per day. After 30 days, keep one backup every 7 days.
> After 365 days keep one backup every 30 days.

**borg**

In borg, the backup folder is called a "repo". Here we create one without encryption.

```bash
/sw/borg init -e none $HOME/BACKUP/myproj # first time only
/sw/borg create --patterns-from $HOME/myproj/exclude.txt $HOME/BACKUP/myproj::initial $HOME/myproj
```

To check the files in a borg repo:

```bash
/sw/borg list $HOME/BACKUP/myproj # lists the backups in a repo
/sw/borg list $HOME/BACKUP/myproj::initial # lists the files in a backup
```

To restore a file, please check [here](https://borgbackup.readthedocs.io/en/stable/quickstart.html#restoring-a-backup).
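
As a rough illustration of a restore, the helper below wraps `borg extract`. It is only a sketch: the function name `restore_from_borg` and the scratch folder `$HOME/restore` are made up for this example, and `/sw/borg` is assumed to be the same binary used above. borg stores paths without the leading `/` and extracts into the current directory, which is why the helper changes directory first.

```bash
# Sketch only: BORG and the default restore folder are assumptions.
BORG="${BORG:-/sw/borg}"

# restore_from_borg REPO::ARCHIVE FILE [DEST]
# Extract FILE (given as the absolute path of the original) into DEST.
restore_from_borg() {
    local archive="$1"
    local file="$2"
    local dest="${3:-$HOME/restore}"
    mkdir -p -- "$dest" || return 1
    # Run in a subshell so the caller's working directory is unchanged.
    # "${file#/}" strips the leading "/", matching how borg stores paths.
    ( cd "$dest" && "$BORG" extract "$archive" "${file#/}" )
}
```

For example, `restore_from_borg "$HOME/BACKUP/myproj::initial" "$HOME/myproj/hello.txt"` would place the file under `$HOME/restore`, with the original path appended. With rtb no tool is needed: the backups are plain folders, and in the upstream script `latest` in the destination points to the most recent one, so a file can simply be copied back.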

### Dry run

rtb and borg handle exclusion/inclusion patterns somewhat differently. When you are not sure, use a dry run to check:

```bash
# rsync (rtb)
rsync -aP --dry-run --exclude-from $HOME/myproj/exclude.txt $HOME/myproj/ test
# borg
/sw/borg create --patterns-from $HOME/myproj/exclude.txt --list --dry-run $HOME/BACKUP/borgproj/::test $HOME/myproj/
```

### Automate the backup

Run `crontab -e` and add the line for the tool you use. Note that `%` has a special meaning in a crontab and must be escaped as `\%`.

```crontab
# m h  dom mon dow   command
0 3 * * * /sw/rtb.sh $HOME/myproj $HOME/BACKUP/myproj $HOME/myproj/exclude.txt
0 3 * * * /sw/borg create --patterns-from $HOME/myproj/exclude.txt $HOME/BACKUP/borgproj/::`date +'\%y-\%m-\%d'` $HOME/myproj/
```

Borg offers more flexibility; please check the quickstart [here](https://borgbackup.readthedocs.io/en/stable/quickstart.html#restoring-a-backup).

### Backup to a remote server via ssh

Both rtb and borg support backing up to remote servers via `ssh`.

```bash
# note: $HOME and $USER are expanded locally; adjust them if they differ on the remote
# for rtb
/sw/rtb.sh $HOME/myproj $USER@rackham.uppmax.uu.se:$HOME/myproj $HOME/myproj/exclude.txt
# for borg
/sw/borg init -e none $USER@rackham.uppmax.uu.se:$HOME/myproj
```

Since rtb logs in with `ssh` multiple times during the backup, it is tedious to enter the password each time. One way to avoid this is to use a password-protected ssh key together with the ssh-agent. First generate an ssh key (stored in `.ssh` by default):

```bash
ssh-keygen # and follow the instructions
# copy the key to the remote server, e.g. rackham
ssh-copy-id rackham.uppmax.uu.se
```

Then add the following lines to your `.bashrc`:

```bash
eval `ssh-agent -s`
ssh-add # or run this when you need ssh
```

Now you only need to enter your password once (see also the ssh-agent documentation on [Github](https://docs.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent)).

It is possible, but not recommended, to set up automatic backups to a remote server with a passwordless ssh key.
To back up to a remote server safely, one can use a dedicated ssh key for the backup and restrict it following the instructions in the [borg documentation](https://borgbackup.readthedocs.io/en/stable/quickstart.html#remote-repositories).

## System-wide offsite snapshot

The home folder in the TEOROO cluster is regularly backed up to the Rackham cluster at Uppmax. The backup is set up using [borg](https://www.borgbackup.org/) and runs during the weekend. The files on the remote server are encrypted, and the password is held by the system admin.

**Note:**

- The `$HOME/BACKUP` folder in every user's home folder is reserved for your personal backups or snapshots; it is **NOT** backed up by the system-wide snapshot.