---
tags: Admin
---
<a href="https://hackmd.io/@teoroo-cluster" alt="Teoroo Cluster">
<img src="https://img.shields.io/badge/Teoroo--CMC-All%20Public%20Notes-black?style=flat-square" /></a>
# Teoroo cluster: admin guide
Here you can find information about the cluster setup and how to maintain
the cluster. This information should be open to all group members, but only
admin(s) should need to read it. For administration of the wordpress page, please refer to [this note](https://hackmd.io/@teoroo-cluster/wordpress).
## Routine task cheatsheet
These are the tasks that admins have to perform routinely.
### Manual backup
```bash
# On clooney: switch to root and run inside a tmux session
cd borg_backup
source borg.env
# Replace 20xx-xx-xx below with today's date (yyyy-mm-dd)
borg create --list --exclude-from borg.exclude ::20xx-xx-xx /homefs/ --progress
```
```bash
# On rackham; you need the archive password.
# List all versions
borg list /proj/uppoff2019008/Backups/clooney
# List files for an archive
borg list /proj/uppoff2019008/Backups/clooney::2021-05-03
```
There is a way to limit the bandwidth used by borg: `borg.env` configures the borg command according to [this link](https://borgbackup.readthedocs.io/en/stable/faq.html?highlight=pv-wrapper#is-there-a-way-to-limit-bandwidth-with-borg). Check the link to see how to set the limit.
To restore, use [`borg mount`](https://borgbackup.readthedocs.io/en/stable/usage/mount.html).
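For example, to browse and copy files back from a specific archive on rackham (a minimal sketch; the mount point and restored path are only examples):
```bash
# Mount one archive read-only under a temporary mount point
mkdir -p /mnt/borg
borg mount /proj/uppoff2019008/Backups/clooney::2021-05-03 /mnt/borg
# Copy back whatever is needed (the path below is only an example), then unmount
cp -a /mnt/borg/homefs/{username}/lost_file /tmp/
borg umount /mnt/borg
```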
#### Reducing the amount of backups
```bash
borg prune --list --keep-weekly 12 --keep-monthly -1
```
This command keeps one backup per week for the last 12 weeks and, beyond that, one backup per month indefinitely. Note that `--keep-monthly` needs to be `-1`, which means a monthly archive is kept for every month without limit.
#### Setup of borg backup
Borg backup runs against rackham over ssh. This means that the user who hosts the backup on rackham needs to have clooney's ssh key in their `authorized_keys` file; this only needs to be changed when ownership of the backups is transferred. Uppmax will otherwise always ask for the user's password first, so set up an ssh key pair to avoid this. Once this is done, backups can always be made on clooney as long as the correct password is used. Note that this password is specific to the backup and is not the same as the root password.
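A minimal sketch of the key setup (run as root on clooney; `{rackham-user}` is the Uppmax account that owns the backup):
```bash
# Generate a key pair for root on clooney, if one does not already exist
ssh-keygen -t ed25519
# Install the public key in the rackham user's authorized_keys
ssh-copy-id {rackham-user}@rackham.uppmax.uu.se
# Test: this should no longer ask for the Uppmax password
ssh {rackham-user}@rackham.uppmax.uu.se hostname
```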
When transferring the backup to a new user, make sure that they have been added to the correct uppoff project on Uppmax. Afterwards, one must give them permission to modify the backup folder. To give permission to everyone in the group:
```bash
DIR=/proj/uppoff2019008/Backups/
GROUP=uppoff2019008
# Directories: setgid + group rwx, so new files inherit the group
find $DIR -type d ! -perm -2070 -print0 | xargs -r -n 10 -0 chmod -v +2070
# Files: group read/write
find $DIR -type f ! -perm -g+rw -print0 | xargs -r -n 10 -0 chmod -v g+rw
# Everything: owned by the project group
find $DIR ! -group $GROUP -print0 | xargs -r -n 10 -0 chgrp -v $GROUP
```
### Adding user
```bash
# ssh to router@router and switch to root
adduser --no-create-home --ingroup teoroo {username}
vim /etc/auto.home # here, add the NFS location of the new user to the file
cd /var/yp/
make # this adds the new user to the NIS server
# ssh to clooney as root
mkdir /homefs/{username}
chown -R {username}:teoroo /homefs/{username}
vim /etc/exports #add the user's home folder to NFS exports
service nfs-server reload
su - {username} # test the new user; it should land in its home folder now
# ssh to brosnan as the new user and copy the default dotfiles into the home folder
cp /etc/skel/.* .
```
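A quick sanity check that the new account propagated through NIS (run on any node with standard NIS client tools):
```bash
ypcat passwd | grep {username}   # the account should appear in the NIS map
getent passwd {username}         # and resolve on the node
id {username}                    # the group should be teoroo
```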
Boilerplate to send to the new user:
```
Your password is: xxxxxxxxxx
Login with `ssh {username}@teoroo2.kemi.uu.se`, you will need a VPN or the university network to access teoroo2.
Once logged in, you can and should change password with `yppasswd`
Some information about software, storage and hardware can be found here:
https://hackmd.io/@teoroo-cluster/user-guide
```
### Booting/shutdown sequence
**Shutdown**
1. shutdown jackie and w nodes
2. shutdown clooney and aberlour
3. shutdown brosnan
4. shutdown router
**Boot teoroo2**
1. boot router
2. boot clooney, and aberlour
3. boot brosnan, jackie, and w machines
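A hedged sketch of the shutdown sequence, assuming it is run from the router (which goes down last), root ssh access to all nodes, and hypothetical w-node names:
```bash
# 1. Compute nodes first: jackie and the w nodes (names are examples)
for n in jackie w1 w2 w3; do ssh root@$n 'shutdown -h now'; done
# 2. File servers
for n in clooney aberlour; do ssh root@$n 'shutdown -h now'; done
# 3. Login node
ssh root@brosnan 'shutdown -h now'
# 4. Finally, the router itself
shutdown -h now
```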
### Disk failure on Clooney
::: spoiler Outdated documentation
- On the new file server (Clooney) I do not know the procedure. The controller is `LSI 3108` (look in the document I sent you), and from reading on the net a long time ago I found that one can use the `MegaCli64` tool I already use to monitor the health. Also, it seems that the failed disk is not marked by a blinking LED, so others suggested using
```bash
./MegaCli64 -PDList -aALL
# Cause the front LED of the drive to blink to help locate a particular drive:
./MegaCli64 -PdLocate -start -physdrv\[E:S\] -aALL
# Stop the blinking:
./MegaCli64 -PdLocate -stop -physdrv\[E:S\] -aALL
```
:::
#### Status report
As the admin, you will receive an email every day from logwatch updating you on the status of Clooney's disks. This email is entitled "Logwatch for RAID" and contains the following important information:
```bash
Device Present
================
Virtual Drives : 1
Degraded : 0
Offline : 0
Physical Devices : 14
Disks : 12
Critical Disks : 0
Failed Disks : 0
```
As of now, this is the only part of the email that contains useful information. **In the case of a failed disk, this will only be visible here.** It is also important to check that `Degraded: 0`; any other value requires further investigation.
Clooney uses a RAID 6 array consisting of 12 physical hard drives. This means that, in theory, two disks can fail before data loss occurs. However, if a disk fails it is important to replace it as soon as possible to reduce stress on the system.
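To check the array state directly instead of waiting for the daily mail, the virtual drive summary can be queried with `storcli` (a sketch; `Optl` is the healthy state, `Dgrd` means degraded):
```bash
# Virtual drive (RAID array) summary on controller 0
./storcli /c0 /vall show
```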
#### Disk management tool
The updated version of the `MegaCli64` disk management tool is called `storcli` and is also available on Clooney. Both tools can be found in the `/bin` folder.
Storcli is recommended for admin tasks, as it is more recent and documentation is easier to find. The reference manual can be found [here](https://docs.broadcom.com/doc/12352476), and a useful list of commands [here](https://www.thomas-krenn.com/en/wiki/StorCLI_commands).
An alternative approach to replacing the failed disk using `MegaCli64` can be found [here](https://globalroot.wordpress.com/2013/06/18/megacli-raid-levels/), but this has never been tried.
#### Replacing a failed disk
In the case of a failed disk, the first thing to do is to check the disk statuses using
```bash
./storcli /c0 /eall /sall show
```
This command returns the status of all disks in the RAID array; ideally the output will look something like this:
```bash
Drive Information :
=================
-------------------------------------------------------------------------------
EID:Slt DID State DG Size Intf Med SED PI SeSz Model Sp Type
-------------------------------------------------------------------------------
0:0 1 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:1 2 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:2 3 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:3 4 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:4 5 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:5 6 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:6 7 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:7 8 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:8 9 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:9 10 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:10 11 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
0:11 12 Onln 0 7.276 TB SATA HDD N N 512B ST8000NM0055-1RM112 U -
-------------------------------------------------------------------------------
```
The state of the drive is the important part here: if anything other than `Onln` (online) is shown, further investigation is needed.
To get more in-depth information on all drives, use
```bash
./storcli /c0 /eall /sall show all
```
If the disk has failed, it should be replaced with a new one. These, as is the case for aberlour's disks, can be found in Chao's office, in a box underneath his bookcase. The box that contains a disk for Clooney is marked 'Cloney', so it should be easy to find. The exact kind of disk is
```bash
Seagate Enterprise Exos 7E8 8TB
SATA3 3.5" 7k2 256MB 512e
HDD-T8000-ST8000NM0055
```
Note that this model number is the same as the one reported in the `Drive Information` block above.
When you have found the replacement disk, go down to Clooney and locate the red LED which marks the failed disk. Make sure to double-check that this is actually the failed disk by turning on the blinking-LED functionality. This can be done with
```bash
./storcli /c0 /e0 /sx start locate # Here x is the slot number
```
`/sx` is the slot number of the failed disk and can be found in the `EID:Slt` column in the `Drive Information` block above.
To stop the LED blinking
```bash
./storcli /c0 /e0 /sx stop locate # Here x is the slot number
```
Now that you have located the failed disk, the procedure ([ref.](https://slowkow.com/notes/raid-fix/#locate-the-failed-drive) and [ref.](https://knowledgebase.45drives.com/kb/kb450183-replacing-drives-in-an-array-using-storcli/)) is as follows:
```bash
# Set the failed drive as Offline
./storcli /c0 /e0 /sx set offline
# x = Controller defined slot number
# Set the failed drive as Missing
./storcli /c0 /e0 /sx set missing
#Spindown the failed drive
./storcli /c0 /eall /sx spindown # NOTE Use /eall here
```
Then remove the failed drive and replace it with a new one. Documentation for this can be found [here](https://www.supermicro.com/manuals/chassis/1U/SC812.pdf) (see section 2-2). You will have to unscrew the old disk from its drive tray and screw in the new one, as the new disks do not come with a tray.
The rebuild then starts automatically and can be monitored using
```bash
./storcli /c0 /eall /sall show rebuild
```
It is possible to change how fast the rebuild occurs using
`storcli /c0 set rebuildrate=<value>`. This value should be between `0` and `100`, and is `30` by default on Clooney. Raising this value can slow down I/O significantly and might make it difficult to log in to Brosnan, so do this with caution. With a rate of `30`, a rebuild of one drive should take less than 24 hours.
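For example, to check the current rate and raise it temporarily (a sketch; `60` is only an example value, remember to restore the default afterwards):
```bash
./storcli /c0 show rebuildrate       # current rate (30 by default on Clooney)
./storcli /c0 set rebuildrate=60     # rebuild faster, at the cost of I/O speed
./storcli /c0 set rebuildrate=30     # restore the default when the rebuild is done
```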
Finally, make sure to order a new drive so a spare is on hand for the next failure, and that's it.
### Disk failure - HP Server
For the HP servers aberlour (and mackmyra), the daily mail will contain something like this in the report:
```bash
> /usr/sbin/hpacucli controller slot=1 logicaldrive all show status
logicaldrive 1 (40.0 GB, 6): Interim Recovery Mode
logicaldrive 2 (16.3 TB, 6): Interim Recovery Mode
> /usr/sbin/hpacucli controller slot=1 physicaldrive all show status
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 3 TB): OK
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 3 TB): OK
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 3 TB): OK
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 3 TB): OK
physicaldrive 1I:1:5 (port 1I:box 1:bay 5, 3 TB): OK
physicaldrive 1I:1:6 (port 1I:box 1:bay 6, 3 TB): Failed
physicaldrive 1I:1:7 (port 1I:box 1:bay 7, 3 TB): OK
physicaldrive 1I:1:8 (port 1I:box 1:bay 8, 3 TB): OK
```
- Find a replacement disk from the inventory.
- On the old HP machines (mackmyra and aberlour) the failed disk will flash. Just remove the failed disk and insert the replacement. Check with the command below; it should say rebuilding.
```bash
$ /usr/sbin/hpacucli controller slot=1 logicaldrive all show status
logicaldrive 1 (40.0 GB, 6): OK
logicaldrive 2 (16.3 TB, 6): Recovering, 36% complete
$ /usr/sbin/hpacucli controller slot=1 physicaldrive all show status
physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 3 TB): OK
physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 3 TB): OK
physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 3 TB): OK
physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 3 TB): OK
physicaldrive 1I:1:5 (port 1I:box 1:bay 5, 3 TB): OK
physicaldrive 1I:1:6 (port 1I:box 1:bay 6, 3 TB): Rebuilding
physicaldrive 1I:1:7 (port 1I:box 1:bay 7, 3 TB): OK
physicaldrive 1I:1:8 (port 1I:box 1:bay 8, 3 TB): OK
```
- Job done. To increase the rebuild priority you can run
`hpacucli controller slot=1 modify rebuildpriority=[high|medium|low]`
### Adding node to TEOROO2
1. Install the OS on the machine and hook it up to the cluster network
- Default username: does it matter?
- Disk partitioning (use LVM)?
- Automatic updates? (perhaps not)
- The following clients/services have to run on the node:
`openssh-server`, `autofs`, `nis`, `nfs-common`
2. Configure DHCP and DNS on TEOROO2
- Visit https://router:10000, log in as root with the password
- Create the host in the DHCP server
- Choose the hostname and IP address
3. Configure DHCP on the node
You should only need to configure the network so that it looks for
a DHCP service. [ref](https://www.tecmint.com/install-dhcp-server-client-on-centos-ubuntu/).
Check with `ifconfig` that the ip address is correct.
Check that you can access the node from the login node.
4. Configure NIS
If it is the first install on Ubuntu there will be a prompt asking for the NIS domain. You will still need to edit `/etc/yp.conf` and `/etc/nsswitch.conf` [ref](https://www.server-world.info/en/note?os=Ubuntu_16.04&p=nis&f=2).
Add `nis` to the relevant lines (including `automount`?),
then restart the services: `systemctl restart rpcbind nis autofs`.
There is a known issue for Ubuntu up to 16.04[^ubuntu16_bug].
[^ubuntu16_bug]: See https://askubuntu.com/questions/771319/in-ubuntu-16-04-not-start-rpcbind-on-boot
5. Configure NIS on the node
- For Ubuntu machines
Edit `/etc/yp.conf` and `/etc/defaultdomain` to match the domain and IP
- For CentOS/Scientific Linux machines
Run `system-config-authentication` (GUI tool),
or edit `/etc/yp.conf` and `/etc/sysconfig/network`
6. Configure NTP
Change the following line in `/etc/ntp.conf` from
```
server mackmyra.cluster.mkem.uu.se
```
to
```
server 10.1.10.1
```
then restart the ntp service
```
/etc/init.d/ntpd restart
```
then test
```
ntpq -p
```
7. Configure the software mount
```bash
#add this to /etc/fstab
clooney:/homefs/sw /sw nfs defaults 0 0
# mount the directory
mount -a
```
8. Configure the mail client
The mail client config differs between Ubuntu (Postfix) and SL (sendmail)
- On SL machine
Edit `/etc/mail/genericdomain`, set to `{node}.cluster`.
Edit `/etc/mail/sendmail.mc`, set `MASQUERADE_AS` to `kemi.uu.se`.
Test with `root` that `sendmail` works.
9. Result
After the configuration the node will function as a normal compute node in teoroo2. Users can log in with their teoroo2 account and password, and they will have access to their teoroo2 files on this node. Old, local files on this node will still be in `/homefs`. A few quick checks are sketched below.
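A minimal verification sketch, using the services configured above (`{username}` is any existing teoroo2 account):
```bash
ypwhich                     # should report the NIS server (router)
getent passwd {username}    # NIS accounts should resolve on the node
ls /home/{username}         # autofs should mount the home folder from clooney
df -h /sw                   # the software mount from clooney should be present
ntpq -p                     # the node should be syncing against 10.1.10.1
```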
## Setup details
### Service information
Both teoroo and teoroo2 are built up with the following services:
- Router: exposes the cluster to the outside
- DNS: assigns domain names to nodes in the subnet
- DHCP: assigns IP addresses to nodes in the subnet
- NFS: serves files
- NIS: serves account and home folder information
- SSH (login node): the default place where users land
| Service | teoroo2 | teoroo (not accessible) |
| ------- | ------------------ | ------------------------ |
| Router | router | router |
| DNS | router | mackmyra |
| DHCP | router | mackmyra |
| NFS | clooney & aberlour | mackmyra |
| NIS | router | mackmyra |
| SSH | brosnan | teoroologin |
These services are configured in a similar way so that migrating compute
nodes between the two is easy. The difference is that in teoroo2 most
services run on the router rather than on a dedicated machine in the subnet. To
migrate machines from one server to the other, one only needs to modify the
configurations (for the service and for the compute node).
### Fail2ban
fail2ban bans suspicious IP addresses after a certain number of failed attempts
(e.g. with a wrong password). The config files live in `/etc/fail2ban` on `brosnan`. To
check the currently banned IPs manually, run `fail2ban-client status sshd` on `brosnan`.
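If a legitimate user gets banned, the ban can be lifted manually (the IP below is hypothetical):
```bash
fail2ban-client set sshd unbanip 192.0.2.42
```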
## Monitoring
A new monitoring system has been set up using Zabbix. This includes a Zabbix **server** running on `jackie` and Zabbix **agents** running on all currently running nodes of teoroo2. In a nutshell, the agents run (customizable) scripts and report to the server so that everything can be monitored in one place.
A dashboard runs on <jackie:19999>, to which the admin has access. This dashboard should cover most of the information an admin is interested in, and upon problems Zabbix should notify the sysadmin through mail notifications (configured in Zabbix->Users->Users->teoroosys).
As admin you can add the port forward to your `.ssh/config` to have the dashboard available once you log into brosnan, at <localhost:19999>.
```
Host BROSNAN
HostName teoroo2.kemi.uu.se
ForwardAgent yes
LocalForward 19999 jackie:19999
```
The old system based on cron jobs is still running; its setup is described below in the archived notes. The admin might want to switch off those cron jobs once the new system is deemed stable.
### Zabbix Setup details
The system is set up according to the [zabbix installation doc](https://www.zabbix.com/download). Specifically:
- Official or borrowed templates:
  - All nodes use the `Linux by Zabbix agent` template;
  - All nodes also use the [`Zabbix Smartctl` template](https://github.com/v-zhuravlev/zbx-smartctl) for disk self-checks;
  - Jackie and brosnan use the [`Nvidia GPUs Performance` template](https://github.com/plambe/zabbix-nvidia-smi-multi-gpu/);
- Extra scripts (in `/etc/zabbix/zabbix_agent*.d`):
  - Aberlour and clooney run their respective RAID checks defined in `raid_info.conf`;
  - Clooney reports the latest borg backup state defined in `borginfo.conf`;
  - Aberlour reports the ambient temperature defined in `ambient_temp.conf`.
## Troubleshooting
This is an incomplete list of bugs that past admins have solved before:
### `autofs` and `/home` directories
On all nodes the `/home/user` directories are managed by the `autofs`
service. Sometimes the service hangs, preventing users from accessing
their home directories. To fix this, try:
```bash
service autofs restart
```
### Restarting services
After large fixes, such as repairing the filesystem, several important services might hang. These can be restarted using:
```bash
for s in {nis,rpcbind,autofs}; do service $s restart; done
```
### DB lock on NFS fix
```bash
sudo systemctl enable rpc-statd # Enable statd on boot
sudo systemctl start rpc-statd # Start statd for the current session
```
### Failed disk on Clooney
Drive s1 failed. I marked the drive as Offline and Missing, but the rebuild started automatically without the failed disk needing to be replaced. The rebuild finished successfully, and it seems operational for now.
```bash
./storcli /c0 /e0 /s1 set offline
./storcli /c0 /e0 /s1 set missing
```
### File system corruption
On two different occasions Clooney's file system became corrupted and entered a read-only state. This can be fixed by restarting Clooney and then running `e2fsck`. Make sure that the disk is unmounted first.
```bash
umount /dev/sda
e2fsck -fp /dev/sda
```
## Past admins
Lisanne Knijff (lisanne.knijf@kemi.uu.se)
Yunqi Shao (yunqi.shao@chalmers.se)
Pavlin Mitev (pavlin.mitev@uppmax.uu.se)
---
# Archived notes
::: spoiler folded
## Copying password across clusters
Copy the corresponding lines in `/etc/shadow` to the NIS server of the other cluster.
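A minimal sketch, assuming the account already exists on the destination cluster and that the NIS maps are rebuilt afterwards as in the "Adding user" section:
```bash
# On the source NIS server: print the user's password hash line
grep "^{username}:" /etc/shadow
# On the destination NIS server: replace the matching line in /etc/shadow
# with the one copied above, then rebuild the NIS maps
cd /var/yp && make
```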
## Granting access from outside
By default the computers can only be accessed from within Sweden. If someone wants
to access the cluster without a VPN, this can be enabled by editing the
`/etc/hosts.allow` file on the login node. (You can get the IP address of that
person using https://ipv6-test.com)
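For example, to allow ssh from a single external address (a sketch; the IP is hypothetical), append a rule to `/etc/hosts.allow` on the login node:
```bash
echo "sshd: 192.0.2.42" >> /etc/hosts.allow
```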
## Setup of mail alert
Most alerts and monitoring information are sent to the admin(s) by email. This
requires a working mail-sending service; below is the setup for each node.
**teoroo**
```crontab
0 1 * * * /root/ProLiant_Status.awk
```
**aberlour**
```crontab
0 1 * * * /root/ProLiant_Status.awk
*/15 * * * * /root/Ambient_Temp.awk
```
**clooney**
```
7 7 * * * /root/bin/RAID-status.sh | /usr/sbin/sendmail teoroosys@kemi.uu.se
```
**W nodes**
Most nodes use the Ubuntu or Scientific Linux package `logwatch` - just install it,
and the rest is done by the package (set the mail address to teoroosys@kemi.uu.se
during installation). Some nodes have `smartmontools` running to monitor the HDD
health, which sends a mail in case of pre-failure, high temperature, etc.
### Disk failure report
The disk status on the file servers is monitored by cron jobs that send mails every
night. On some nodes the `smartmontools` daemon is running and sends mails in
case of disk errors, failures, or high temperatures.
On disk failure, see [here](#Disk-failure); you can probably find a spare disk
in the [inventory](/xF7Rvb_sRsKSD02lRycoBQ).
## QNAP
The QNAP NAS can be accessed at https://qnap:443, with the username `admin` and
the admin password.
:::