---
tags: User, WIP
---
<a href="https://hackmd.io/@teoroo-cluster" alt="Teoroo Cluster">
<img src="https://img.shields.io/badge/Teoroo--CMC-All%20Public%20Notes-black?style=flat-square" /></a>
# Teoroo cluster: data management
This note describes how data is backed up in the TEOROO cluster, including:
- how to set up a data backup for yourself
- how files are backed up system-wide
## Incremental snapshots
- Recover from accidents (e.g. a deleted file)
- Take snapshots at regular intervals (hourly, daily, weekly, etc.)
- Avoid wasting space on plain file copies[^1]

[^1]: Image source: https://www.nakivo.com/blog/what-is-incremental-backup/

Available in:
- [Tetralith](https://www.nsc.liu.se/support/storage/snic-centrestorage/recover-deleted-files/)
- [Rackham](https://www.uppmax.uu.se/support/user-guides/disk-storage-guide/)
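
The space saving behind incremental snapshots can be illustrated with hard links, the mechanism that rsync-based tools such as rtb rely on (through rsync's `--link-dest`). The sketch below uses `cp -al` from GNU coreutils as a stand-in for the real tools; the file names and the temporary directory are made up for the illustration.

```bash
#!/usr/bin/env bash
set -e

work=$(mktemp -d)                      # scratch area (hypothetical layout)
mkdir -p "$work/src" "$work/snap"
echo "unchanged" > "$work/src/keep.txt"
echo "v1"        > "$work/src/edit.txt"

# Snapshot 1: a full copy of the source.
cp -a "$work/src" "$work/snap/day1"

# The user edits one file between snapshots.
echo "v2" > "$work/src/edit.txt"

# Snapshot 2: start from hard links to snapshot 1 (no extra data stored) ...
cp -al "$work/snap/day1" "$work/snap/day2"
# ... then replace only the files whose content differs from the source.
for f in "$work"/src/*; do
    name=$(basename "$f")
    if ! cmp -s "$f" "$work/snap/day2/$name"; then
        rm -f "$work/snap/day2/$name"          # drop the link, day1 stays intact
        cp -a "$f" "$work/snap/day2/$name"     # store the new version
    fi
done

# keep.txt is one inode shared by both snapshots; edit.txt exists in two versions.
[ "$work/snap/day1/keep.txt" -ef "$work/snap/day2/keep.txt" ] && echo "keep.txt is shared"
cat "$work/snap/day1/edit.txt" "$work/snap/day2/edit.txt"
```

Both snapshot folders look like complete copies, but the unchanged file occupies disk space only once.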
## Setting up a snapshot
We suggest two ways to make the backup:
- [rsync-time-backup (rtb)](https://github.com/laurent22/rsync-time-backup.git)
  - **Pros**
    - Easier to set up
    - Transparent file structure
  - **Cons**
    - Naive deduplication (does not handle, e.g., moved files)
- [borg](https://www.borgbackup.org/)
  - **Pros**
    - Supports encryption, compression and deduplication
  - **Cons**
    - Harder to set up
### Local snapshots
Here we demonstrate how to create a snapshot locally on Teoroo2.
We suggest keeping your snapshots in `$HOME/BACKUP`, which is
**not** included in the regular backup to Uppmax.
Assume we would like to back up a folder with the following
structure:
```
>> ll demoproj/
total 8.0K
-rw-r--r--+ 1 yunqi teoroo 18 Mar 31 10:41 exclude.txt
-rw-r--r-- 1 yunqi teoroo 0 Mar 31 10:45 hello.txt
drwxr-xr-x 2 yunqi teoroo 4.0K Mar 31 10:42 HUGEFILES/
-rw-r--r-- 1 yunqi teoroo 0 Mar 31 10:51 READEME.md
```
In `exclude.txt` we list the files we do not wish to back up,
following the rsync exclude file format ([tutorial](https://sites.google.com/site/rsync2u/home/rsync-tutorial/the-exclude-from-option)):
```exclude
- **HUGEFILES/*
- *~
```
**rtb**
```bash
/sw/rtb.sh $HOME/myproj $HOME/BACKUP/myproj $HOME/myproj/exclude.txt
```
`rtb` automatically cleans up old backups to save space. This behaviour can be
adjusted with the option `--strategy`:
> Default: "1:1 30:7 365:30" means after one
> day, keep one backup per day. After 30 days, keep one backup every 7 days.
> After 365 days keep one backup every 30 days.
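
The strategy string can be read as a list of `age:interval` pairs, both in days. The sketch below is a simplified reading of that rule and not rtb's actual implementation; the function name `keep_interval` is hypothetical.

```bash
#!/usr/bin/env bash
# Simplified reading of an rtb-style strategy string ("age:interval" pairs,
# both in days). Prints the spacing in days between the backups that are kept
# for a backup of the given age; 0 means "keep every backup".
keep_interval() {
    local age_days=$1
    local strategy=${2:-"1:1 30:7 365:30"}
    local interval=0
    local pair
    for pair in $strategy; do
        # Pairs are ordered by increasing age; the last threshold reached wins.
        if [ "$age_days" -ge "${pair%%:*}" ]; then
            interval=${pair##*:}
        fi
    done
    echo "$interval"
}

keep_interval 0      # younger than one day: prints 0
keep_interval 10     # prints 1
keep_interval 100    # prints 7
keep_interval 400    # prints 30
```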

**borg**
In borg, the backup folder is called a "repo". Here we create one without encryption:
```bash
/sw/borg init -e none $HOME/BACKUP/myproj # First time
/sw/borg create --patterns-from $HOME/myproj/exclude.txt $HOME/BACKUP/myproj::initial $HOME/myproj
```
To check the files in a borg repo:
```bash
/sw/borg list $HOME/BACKUP/myproj # lists the backups in a repo
/sw/borg list $HOME/BACKUP/myproj::initial # list the files in a backup
```
To restore a file please check [here](https://borgbackup.readthedocs.io/en/stable/quickstart.html#restoring-a-backup).
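
As a rough sketch of what a restore looks like: `borg extract` writes into the current directory, and paths inside an archive are stored without the leading `/`. The helper name `borg_restore`, the `BORG` variable and the scratch destination are assumptions made for this example; with `BORG=echo` the arguments are only printed and borg is not called.

```bash
#!/usr/bin/env bash
# Hypothetical helper: restore one file from a borg archive into a directory.
# BORG defaults to the cluster install; set BORG=echo to preview the arguments.
BORG=${BORG:-/sw/borg}

borg_restore() {
    local repo=$1
    local archive=$2
    local file=$3
    local dest=$4
    mkdir -p "$dest"
    # borg extract writes relative to the current directory, and archived
    # paths have no leading slash, hence the subshell cd and ${file#/}.
    ( cd "$dest" && "$BORG" extract "$repo::$archive" "${file#/}" )
}

# Preview only: prints the arguments borg would receive.
BORG=echo borg_restore "$HOME/BACKUP/myproj" initial "$HOME/myproj/hello.txt" "$(mktemp -d)"
```

With the real borg binary, the restored file ends up below the destination directory under its full original path, so it can be inspected before it is copied back.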
### Dry run
rtb and borg handle exclusion/inclusion patterns somewhat differently.
When you are not sure, use a dry run to check:
```bash
# rsync (rtb)
rsync -aP --dry-run demoproj/ test --exclude-from demoproj/exclude.txt
# borg
/sw/borg create --patterns-from $HOME/myproj/exclude.txt --list --dry-run $HOME/BACKUP/borgproj/::test $HOME/myproj/
```
### Automate the backup
Run `crontab -e` and add one of the following lines, depending on the tool you use:
```crontab
# m h dom mon dow command
0 3 * * * /sw/rtb.sh $HOME/myproj $HOME/BACKUP/myproj $HOME/myproj/exclude.txt
# in a crontab, % must be escaped as \%
0 3 * * * /sw/borg create --patterns-from $HOME/myproj/exclude.txt $HOME/BACKUP/borgproj/::`date +'\%y-\%m-\%d'` $HOME/myproj/
```
Borg offers more flexibility; please check [here](https://borgbackup.readthedocs.io/en/stable/quickstart.html#restoring-a-backup).
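
Because cron treats an unescaped `%` in a command as a newline, it can be simpler to move the borg call into a small script and let cron run that script. The following is a minimal sketch; the script name `borg-backup.sh`, the `BORG` variable and the helpers `archive_name` and `run_backup` are assumptions, while the paths follow the borg crontab line in this section.

```bash
#!/usr/bin/env bash
# borg-backup.sh (hypothetical name): daily borg snapshot of $HOME/myproj.
set -e

BORG=${BORG:-/sw/borg}          # set BORG=echo to preview the command
REPO="$HOME/BACKUP/borgproj"
SRC="$HOME/myproj"

# Archive name built from the current date (yy-mm-dd).
# Inside a script the % signs need no cron escaping.
archive_name() { date +'%y-%m-%d'; }

run_backup() {
    "$BORG" create --patterns-from "$SRC/exclude.txt" \
        "$REPO::$(archive_name)" "$SRC"
}

# Preview only: prints the borg arguments. In the real script, call
# run_backup without the BORG=echo prefix.
BORG=echo run_backup
```

The crontab entry then only needs the time fields followed by the path to wherever you store the script.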
### Backup to a remote server via ssh
Both rtb and borg support backing up to remote servers via `ssh`.
```bash
# for rtb
/sw/rtb.sh $HOME/myproj $USER@rackham.uppmax.uu.se:$HOME/myproj $HOME/myproj/exclude.txt
# for borg
/sw/borg init -e none $USER@rackham.uppmax.uu.se:$HOME/myproj
```
Since rtb logs in via `ssh` multiple times during the backup,
it is tedious to enter the password each time.
One way to avoid this is to use a password-protected SSH key together
with the ssh-agent. First generate an SSH key (stored in `.ssh` by default):
```bash
ssh-keygen # and follow the instructions
# copy the key to the remote server e.g. rackham
ssh-copy-id rackham.uppmax.uu.se
```
Then add the following lines to your `.bashrc`:
```bash
eval `ssh-agent -s`
ssh-add # or run this when you need ssh
```
Now you only need to enter your password once (see also the
[GitHub documentation](https://docs.github.com/en/github/authenticating-to-github/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent) on ssh-agent).
It is possible, but not recommended, to set up automatic backup to a remote
server with a passwordless SSH key. To back up to a remote server safely,
one can use a dedicated SSH key for the backup and restrict it
following the instructions in the [borg documentation](https://borgbackup.readthedocs.io/en/stable/quickstart.html#remote-repositories).
## System-wide offsite snapshot
The home folder in the TEOROO cluster is regularly backed up
to the Rackham cluster at Uppmax. The backup process is set up using
[borg](https://www.borgbackup.org/). The backup runs
during the weekend. The files on the remote server are encrypted,
and the password is held by the system admin.
**Note:**
- The `$HOME/BACKUP` folder in every user's home folder is reserved for
your personal backups or snapshots; it is **NOT** backed up by the
system-wide snapshot.