Backup service
===
###### tags: development service
## Introduction
By default, a backup of all user and system data is created every school day at 17:00.
## Setting up the server
The backup service needs a machine with a sufficiently large ZFS partition or ZFS pool.
The pool has to be created by hand first; the comment in `main.yaml` below shows the command. Then apply the Ansible rules. Done.
## Setting up the backup
The file
```
/etc/opinsys/backup/debian_hosts_to_backup.txt
```
contains the list of PuavoBoxes that should be backed up,
e.g.:
```
pbx-halix.einszueins.opinsys.fi
pbx-gts-fried.einszueins.opinsys.fi
```
Note that the "official" canonical names must be used.
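The exact name matters because the backup script (listed further down) derives the storage path from it. The following sketch mirrors the two parameter expansions used in `get_target_dir`; the helper name `target_dir_for` is hypothetical and only serves as an illustration.

```bash
# Sketch only: target_dir_for is a hypothetical helper, not part of the backup script.
backupbasedir=/backup

target_dir_for() {
  entry=$1              # one line from the host list
  subdir=$2             # remote directory, e.g. /home
  host="${entry%%.*}"   # text before the first dot
  org="${entry#*.}"     # text after the first dot
  printf '%s\n' "${backupbasedir}/${org}/${host}${subdir}"
}

target_dir_for pbx-halix.einszueins.opinsys.fi /home
# prints /backup/einszueins.opinsys.fi/pbx-halix/home
```

Everything before the first dot becomes the host directory, everything after it the organisation directory, so a misspelled entry silently ends up in a different directory tree.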
:::info
There is also the file `/etc/opinsys/backup/ubuntu_hosts_to_backup.txt`, which was needed for older PuavoBoxes. The backup script still reads it, so it is enough to create an empty file with `touch /etc/opinsys/backup/ubuntu_hosts_to_backup.txt`.
:::
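Since the backup script reads both list files with `cat` under `set -e`, a missing file aborts the run. A minimal sketch of a pre-flight step could look like this; `ensure_host_lists` is a hypothetical helper, and the directory is passed as a parameter so the function can be tried out in a scratch directory first.

```bash
# Sketch only: ensure_host_lists is a hypothetical helper, not part of the backup script.
ensure_host_lists() {
  confdir=${1:-/etc/opinsys/backup}
  mkdir -p "$confdir" || return 1
  for list in debian_hosts_to_backup.txt ubuntu_hosts_to_backup.txt; do
    # touch creates a missing file empty and leaves existing content untouched
    touch "${confdir}/${list}" || return 1
  done
}

# Usage (as root on the backup host):
#   ensure_host_lists /etc/opinsys/backup
```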
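The backup host then starts the backup Monday through Friday at 17:00. Note that the `/etc/cron.d/backup` file shown below schedules the job at 18:00 in the backup host's system time.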
## Configuring the PuavoBox
On the PuavoBox's page (Organisation --> Servers --> Bootservers), the IP address and the public key of the backup host still have to be entered so that the backup service can log in to the PuavoBox via SSH:
```json
{
  "puavo.admin.backup.authorized_hosts": "127.0.0.1",
  "puavo.admin.backup.authorized_ssh_pubkey": "ssh-rsa ...."
}
```
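To avoid copy-and-paste mistakes with the long key, the two entries can be generated from the public key file. This is only a sketch: `puavo_backup_conf` is a hypothetical helper, the IP address in the usage line is a placeholder, and it assumes the key line contains no characters that need JSON escaping.

```bash
# Sketch only: puavo_backup_conf is a hypothetical helper, not part of Puavo.
puavo_backup_conf() {
  ip=$1            # IP address of the backup host
  pubkey_file=$2   # e.g. /root/.ssh/backup_id_rsa.pub
  pubkey=''
  # Read the first line only; accept a file without a trailing newline.
  IFS= read -r pubkey < "$pubkey_file" || [ -n "$pubkey" ] || return 1
  printf '{\n'
  printf '  "puavo.admin.backup.authorized_hosts": "%s",\n' "$ip"
  printf '  "puavo.admin.backup.authorized_ssh_pubkey": "%s"\n' "$pubkey"
  printf '}\n'
}

# Usage (placeholder address):
#   puavo_backup_conf 192.0.2.10 /root/.ssh/backup_id_rsa.pub
```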
## Files for the backup service
### File: /etc/cron.d/backup
```bash
#
# backup script
#
#PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# do backups
00 18 * * 1-5 root /usr/local/sbin/backup --auto
# end backups
30 7 * * 1-5 root /usr/bin/pkill -fx '/bin/sh /usr/local/sbin/backup --auto'
```
### File: /root/.profile
```bash
# ~/.profile: executed by Bourne-compatible login shells.
if [ "$BASH" ]; then
  if [ -f ~/.bashrc ]; then
    . ~/.bashrc
  fi
fi
keychain ~/.ssh/backup_id_rsa
mesg n
```
### File: /usr/local/sbin/backup
```bash
#!/bin/sh
set -eu
backupbasedir=/backup
ssh_cmd='ssh -i /root/.ssh/backup_id_rsa -o StrictHostKeyChecking=no -o NumberOfPasswordPrompts=0'
# hardcode topdomain because we might have conflicts with backup paths otherwise
topdomain='opinsys.fi'
error_report=''
rsync_stats_report=''
success_report=''
exitstatus=0
datetime="$(date -Iseconds | awk -F+ '{ print $1 }')"
if [ -z "$datetime" ]; then
  echo 'Could not determine the current date' >&2
  exit 1
fi
debian_hosts_to_backup="$(cat /etc/opinsys/backup/debian_hosts_to_backup.txt)"
ubuntu_hosts_to_backup="$(cat /etc/opinsys/backup/ubuntu_hosts_to_backup.txt)"
if [ "${1:-}" = '--auto' ]; then
  hosts_to_backup="$debian_hosts_to_backup $ubuntu_hosts_to_backup"
elif [ $# -gt 0 ]; then
  # "$*" joins all arguments into one space-separated string
  hosts_to_backup="$*"
else
  echo "Usage: $(basename "$0") [--auto | list of hosts to backup]" >&2
  exit 1
fi
hostdirs_to_backup=''
for host in $hosts_to_backup; do
  # Ubuntu hosts: back up /etc and /home; all other hosts: /home and /state
  if echo "$ubuntu_hosts_to_backup" | grep -Fqx "$host"; then
    hostdirs_to_backup="${hostdirs_to_backup} ${host}:/etc ${host}:/home"
  else
    hostdirs_to_backup="${hostdirs_to_backup} ${host}:/home ${host}:/state"
  fi
done
rsync_pids=''
get_target_dir() {
  local hostorg subdir host org
  hostorg=$1
  subdir=$2
  host="${hostorg%%.*}"   # part before the first dot
  org="${hostorg#*.}"     # part after the first dot
  if [ -z "$host" ]; then
    echo "Could not determine host from $hostorg" >&2
    return 1
  fi
  if [ -z "$org" ]; then
    echo "Could not determine organisation from $hostorg" >&2
    return 1
  fi
  if [ -z "$subdir" ]; then
    echo "subdir was not set for $hostorg" >&2
    return 1
  fi
  echo "${backupbasedir}/${org}/${host}${subdir}"
}
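kill_backup_processes() {
  echo 'Interrupted, killing remaining backup_processes' >&2
  pkill -s 0 -x rsync || true
  i=0
  # re-send TERM up to 10 times, waiting a little longer after each attempt
  while [ "$i" -lt 10 ] && pkill -s 0 -x rsync; do
    sleep "$i"
    i=$((i + 1))
  done
  # rsync processes survived all attempts: force-kill them
  if [ "$i" -eq 10 ]; then pkill -9 -s 0 -x rsync || true; fi
}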
last_successful_backups_report() {
  find "${backupbasedir}/.zfs/snapshot" -mindepth 5 -maxdepth 5 \
       -name BACKUP_OK -type f \
    | sort \
    | awk -F/ -v hostdirs_to_backup="$hostdirs_to_backup" '
        # Find the last successful backup time for each host,
        # or NONE if no successful backups were done.
        BEGIN {
          split(hostdirs_to_backup, hostdirlist, " ")
          for (i in hostdirlist) {
            hostdir_str = hostdirlist[i]
            split(hostdir_str, hostdir, ":")
            # split() fills arrays starting at index 1
            host = hostdir[1]
            subdir = hostdir[2]
            # snapshot paths hold the subdir without a leading slash
            sub(/^\//, "", subdir)
            hostcomp_count = split(host, host_components, ".")
            if (hostcomp_count < 2) { continue }
            host = host_components[1]
            org = host_components[2]
            last_successful_backup[ org, host, subdir ] = ""
          }
        }
        {
          snapshot_time = $5
          org = $6
          host = $7
          subdir = $8
          last_successful_backup[ org, host, subdir ] = snapshot_time
        }
        END {
          for (backupinfo in last_successful_backup) {
            split(backupinfo, a, SUBSEP)
            org = a[1]
            host = a[2]
            subdir = a[3]
            snapshot_time = last_successful_backup[backupinfo]
            if (!snapshot_time) { snapshot_time = "NONE" }
            print snapshot_time, (host "." org), org, host, subdir
          }
        }
      ' \
    | sort -s -k 5,5 \
    | sort -s -k 4,4 -V \
    | sort -s -k 3,3 \
    | sort -s -k 1,1 \
    | awk -v topdomain="$topdomain" '
        # Now lines are sorted by organisation, hostname, subdir and snapshot
        # time. Collect hosts with the same previous_time to same line.
        $1 != previous_time {
          if (previous_time) { print previous_time, hosts }
          previous_time = $1
          hosts = $2 "." topdomain ":/" $5
          next
        }
        { hosts = hosts " " $2 "." topdomain ":/" $5 }
        END { print previous_time, hosts }
      ' \
    | awk '
        # Move NONE lines to beginning, those hosts with no successful backups
        # are the most critical.
        $1 != "NONE" { oklist = oklist $0 "\n"; next }
        { print }
        END { printf "%s", oklist }
      ' \
    | awk '
        $1 == "NONE" { print "These hosts/subdirs have no successful backups:" }
        $1 != "NONE" {
          timestamp = $1
          cmd = sprintf("date -R -d %s", timestamp)
          cmd | getline timestamp_verbose
          printf "These hosts/subdirs have their last successful backup on %s (%s):\n",
                 timestamp_verbose, timestamp
          close(cmd)
        }
        { for (i = 2; i <= NF; i++) { print " " $i } }
      '
}
trap kill_backup_processes INT TERM
. "/root/.keychain/$(hostname)-sh"
echo 'Starting backups...'
for hostdirorg in $hostdirs_to_backup; do
  hostorg=$(echo "$hostdirorg" | awk -F: '{ print $1 }')
  subdir=$(echo "$hostdirorg" | awk -F: '{ print $2 }')
  if ! target_dir="$(get_target_dir "$hostorg" "$subdir")"; then
    exitstatus=1
    continue
  fi
  mkdir -p "$target_dir"
  errlogfile="${target_dir}/latest-backup-rsync.errlog"
  logfile="${target_dir}/latest-backup-rsync.log"
  statsfile="${target_dir}/latest-backup-rsync.stats"
  # clear logs and status markers of the previous run
  rm -f "$errlogfile" \
        "$logfile" \
        "$statsfile" \
        "${target_dir}/BACKUP_FAILED" \
        "${target_dir}/BACKUP_OK"
  # start one rsync per host/subdir in the background
  rsync -aHS --delete --stats \
        --exclude /latest-backup-rsync.errlog \
        --exclude /latest-backup-rsync.log \
        --exclude /latest-backup-rsync.stats \
        --log-file="${logfile}" -e "$ssh_cmd" \
        "${hostorg}.${topdomain}:${subdir}/" "${target_dir}/" \
        > "$statsfile" 2>"$errlogfile" &
  rsync_pids="$rsync_pids ${hostorg}|${subdir}|$!"
done
echo 'Waiting for all backups to complete...'
for hostorg_subdir_pid in $rsync_pids; do
  hostorg="$(echo "$hostorg_subdir_pid" | awk -F\| '{ print $1 }')"
  subdir="$(echo "$hostorg_subdir_pid" | awk -F\| '{ print $2 }')"
  pid="$(echo "$hostorg_subdir_pid" | awk -F\| '{ print $3 }')"
  if ! target_dir="$(get_target_dir "$hostorg" "$subdir")"; then
    exitstatus=1
    continue
  fi
  # record the outcome of each rsync with a marker file
  if wait "$pid"; then
    success_report="${success_report}Backing up ${hostorg}.${topdomain}:${subdir} OK!\n"
    touch "${target_dir}/BACKUP_OK"
    oklabel='OK'
  else
    error_report="${error_report}Backing up ${hostorg}.${topdomain}:${subdir} FAILED, see ${target_dir}/latest-backup-rsync.log for details.\n"
    touch "${target_dir}/BACKUP_FAILED"
    oklabel='FAILED'
    exitstatus=1
  fi
  rsync_stats_report="${rsync_stats_report}${hostorg}.${topdomain}:${subdir} transfer statistics: ($oklabel)\n"
  rsync_stats_report="${rsync_stats_report}$(sed 's/^/ ERROR: /' "${target_dir}/latest-backup-rsync.errlog" 2>/dev/null || true)"
  rsync_stats_report="${rsync_stats_report}$(sed 's/^/ STATS: /' "${target_dir}/latest-backup-rsync.stats" 2>/dev/null || true)\n\n"
done
if [ "$exitstatus" -eq 0 ]; then
  echo
  echo 'All backups OK!'
else
  echo
  echo 'Errors occurred during backup operation.' >&2
fi
if [ -n "${error_report}" ]; then
  echo "\nERRORS:\n\n${error_report}\n" >&2
fi
if [ -n "${success_report}" ]; then
  echo "\nOK:\n\n${success_report}\n"
fi
if ! zfs snapshot "backup@${datetime}"; then
  echo "Making a zfs snapshot for backup@${datetime} FAILED" >&2
  exitstatus=1
else
  echo "Created a new zfs snapshot for backup@${datetime}, access through ${backupbasedir}/.zfs/snapshot/${datetime}"
fi
echo "\nZFS state:\n"
zfs list || exitstatus=1
echo
zfs list -t snapshot || exitstatus=1
echo "\nrsync transfer statistics:\n\n${rsync_stats_report}\n"
last_successful_backups_report
exit $exitstatus
```
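The script marks each successful run with a `BACKUP_OK` file, which then becomes part of the ZFS snapshot taken at the end. When looking for data to restore, the newest snapshot that contains this marker is usually the one of interest. The following is a minimal sketch of such a lookup; `latest_ok_snapshot` is a hypothetical helper, and the organisation and host names in the usage line are placeholders.

```bash
# Sketch only: latest_ok_snapshot is a hypothetical helper, not part of the backup script.
latest_ok_snapshot() {
  snapdir="$1/.zfs/snapshot"   # $1: backup base directory, e.g. /backup
  org=$2
  host=$3
  subdir=$4                    # without leading slash, e.g. home
  latest=''
  # Globs expand in sorted order, so the last marker found belongs to the
  # newest snapshot (snapshot names are ISO timestamps).
  for marker in "$snapdir"/*/"$org"/"$host"/"$subdir"/BACKUP_OK; do
    [ -f "$marker" ] || continue
    name="${marker#"$snapdir"/}"   # drop the leading snapshot directory
    latest="${name%%/*}"           # keep only the snapshot name
  done
  if [ -z "$latest" ]; then
    return 1
  fi
  printf '%s\n' "$latest"
}

# Usage (placeholder names):
#   latest_ok_snapshot /backup einszueins pbx-halix home
```

The function returns a non-zero status when no snapshot with a `BACKUP_OK` marker exists for the given host and subdirectory.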
### File: main.yaml
```yaml
---
- name: Setup /etc/apt/sources.list
  copy: src=etc/apt/sources.list
        dest=/etc/apt/sources.list
- name: Install zfs packages
  apt: pkg={{item}} state=latest
  with_items:
    - zfs-dkms
    - zfsutils-linux
- name: Install keychain, needed by backup
  apt: pkg=keychain state=latest
- name: Setup /etc/opinsys
  file: path=/etc/opinsys state=directory
- name: Setup /etc/opinsys/backup
  file: path=/etc/opinsys/backup state=directory
- name: Setup /etc/opinsys/backup/hosts_to_backup.txt
  copy: src=etc/opinsys/backup/hosts_to_backup.txt
        dest=/etc/opinsys/backup/hosts_to_backup.txt
        owner=root
        group=root
- name: Install /usr/local/sbin/backup
  copy: src=usr/local/sbin/backup
        dest=/usr/local/sbin/backup
        owner=root
        group=root
        mode=755
- name: Create /root/.ssh
  file: path=/root/.ssh state=directory
- name: Copy /root/.ssh/backup_id_rsa key (private)
  copy: src=root/.ssh/backup_id_rsa
        dest=/root/.ssh/backup_id_rsa
        owner=root
        group=root
        mode=600
- name: Copy /root/.ssh/backup_id_rsa.pub key (public)
  copy: src=root/.ssh/backup_id_rsa.pub
        dest=/root/.ssh/backup_id_rsa.pub
        owner=root
        group=root
        mode=644
# these fail before zpool "backup" has been created... it should be created
# with the following command (by hand):
# zpool create backup raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
- name: Make sure /backup permissions are restrictive
  file: path=/backup owner=root group=root mode=700
- name: Install special .profile for root to hookup keychain
  copy: src=root/dot_profile
        dest=/root/.profile
- name: Setup PATH in crontab
  cron: cron_file=backup
        env=yes
        name=PATH
        user="root"
        value="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
- name: Setup crontab to start backups
  cron: cron_file=backup
        hour=18
        job="/usr/local/sbin/backup --auto"
        minute=00
        name="do backups"
        state=present
        user="root"
        weekday="1-5"
- name: Setup crontab to end backups
  cron: cron_file=backup
        hour=7
        job="/usr/bin/pkill -fx '/bin/sh /usr/local/sbin/backup --auto'"
        minute=30
        name="end backups"
        state=present
        user="root"
        weekday="1-5"
```