---
tags: Admin
---

<a href="https://hackmd.io/@teoroo-cluster" alt="Teoroo Cluster">
<img src="https://img.shields.io/badge/Teoroo--CMC-All%20Public%20Notes-black?style=flat-square" /></a>

# Teoroo cluster: admin guide

Here you can find information about the cluster setup and how one maintains the cluster. This information is open to all group members, but only admin(s) should need to read it.

For administration of the wordpress page, please refer to [this note](https://hackmd.io/@teoroo-cluster/wordpress).

## Routine task cheatsheet

These are the tasks that admins have to do routinely.

### Manual backup

```bash
cd borg_backup # on clooney, switch to root and run within a tmux session
source borg.env
# yyyy-mm-dd <- replace 20xx-xx-xx below with today's date
borg create --list --exclude-from borg.exclude ::20xx-xx-xx /homefs/ --progress
```

```bash
# on rackham, you need the archive password.
# List all versions
borg list /proj/uppoff2019008/Backups/clooney
# List files for an archive
borg list /proj/uppoff2019008/Backups/clooney::2021-05-03
```

There is a way to limit the bandwidth used by borg: `borg.env` configures the borg command according to [this link](https://borgbackup.readthedocs.io/en/stable/faq.html?highlight=pv-wrapper#is-there-a-way-to-limit-bandwidth-with-borg). Check the link to see how to set the limit.

To restore, use [`borg mount`](https://borgbackup.readthedocs.io/en/stable/usage/mount.html).

#### Reducing the amount of backups

```bash
borg prune --list --keep-weekly 12 --keep-monthly -1
```

This command keeps one weekly backup for the last 12 weeks, and one monthly backup for every month before that. Note that `--keep-monthly` needs to be `-1`, which means keeping one archive for every month without limit.

#### Setup of borg backup

Borg backup runs against rackham over ssh using ssh keys. This means that the user who hosts the backup on rackham needs to have the ssh key in their `authorized_keys` file. Note that this only needs to be changed when ownership of the backups is transferred.

Uppmax will always ask for the user password first; set up an ssh key pair to avoid this. After this is done, backups can always be made on clooney as long as the correct password is used. Note that this password is specific to the backup and is not the same as the root password.

When transferring the backup to a new user, make sure that they have been added to the correct uppoff project on Uppmax. Afterwards, one must give them permission to modify the backup folder. To give permission to everyone in the group:

```bash
DIR=/proj/uppoff2019008/Backups/
GROUP=uppoff2019008
find $DIR -type d ! -perm -2070 -print0 | xargs -r -n 10 -0 chmod -v +2070
find $DIR -type f ! -perm -g+rw -print0 | xargs -r -n 10 -0 chmod -v g+rw
find $DIR ! -group $GROUP -print0 | xargs -r -n 10 -0 chgrp -v $GROUP
```

### Adding user

```bash
# ssh to router@router and switch to root
adduser --no-create-home --ingroup teoroo {username}
vim /etc/auto.home # here, add the NFS location of the new user to the file
cd /var/yp/
make # this adds the new user to the NIS server

# ssh to clooney as root
mkdir /homefs/{username}
chown -R {username}:teoroo /homefs/{username}
vim /etc/exports # add the user's home folder to the NFS exports
service nfs-server reload
su - {username} # test the new user, it should land in the home folder now

# ssh to brosnan
cp /etc/skel/.* .
```
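Before sending out the credentials, it can be worth checking that the new account has actually propagated. A minimal sketch of such a check (`{username}` is the same placeholder as above):

```bash
# on any node: the account should be resolvable through NIS
ypmatch {username} passwd

# on the login node (brosnan): the home folder should be exported and auto-mounted
showmount -e clooney | grep {username}
ls -la /home/{username}
```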
Boilerplate to send to the user:

```
Your password is: xxxxxxxxxx

Login with `ssh {username}@teoroo2.kemi.uu.se`, you will need a VPN or the university network to access teoroo2.
Once logged in, you can and should change your password with `yppasswd`.

Some information about software, storage and hardware can be found here: https://hackmd.io/@teoroo-cluster/user-guide
```

### Booting/shutdown sequence

Shutdown:

1. shutdown jackie and the w nodes
2. shutdown clooney and aberlour
3. shutdown brosnan
4. shutdown router

Boot teoroo2:

1. boot router
2. boot clooney and aberlour
3. boot brosnan, jackie, and the w machines

### Disk failure on Clooney

::: spoiler Outdated documentation
- On the new file server (Clooney) - I do not know the procedure. The controller is `LSI 3108` (look in the document I sent you) and, reading on the net some 100 years ago, I found that one can use the `MegaCli64` tool I use to monitor the health. Also, it seems that the failed disk is not marked by a blinking LED, so others suggested using

```bash
./MegaCli64 -PDList -aALL
# Cause the front LED of the drive to blink to help locate a particular drive:
./MegaCli64 -PdLocate -start -physdrv\[E:S\] -aALL
# Stop the blinking:
./MegaCli64 -PdLocate -stop -physdrv\[E:S\] -aALL
```
:::

#### Status report

As the admin, you will receive an email every day from logwatch updating you on the status of Clooney's disks. This email is entitled "Logwatch for RAID", and it contains the following important information:

```bash
Device Present
================
Virtual Drives    : 1
  Degraded        : 0
  Offline         : 0
Physical Devices  : 14
  Disks           : 12
  Critical Disks  : 0
  Failed Disks    : 0
```

As of now, this is the only part of the email that contains information. **In the case of a failed disk, this will only be visible here.** It is also important to check that `Degraded: 0`; any other status requires further investigation.

Clooney uses a RAID6 data storage approach that consists of 12 physical hard drives. This means that, in theory, two disks can fail before data loss occurs. However, in the case of a disk failing, it is important to replace it as soon as possible to reduce stress on the system.

#### Disk management tool

The updated version of the `MegaCli64` disk management tool is called `storcli` and is also available on Clooney. Both tools can be found in the `/bin` folder. Storcli is recommended for admin tasks, as it is more recent and documentation is easier to find. The reference manual can be found [here](https://docs.broadcom.com/doc/12352476), and a useful list of commands [here](https://www.thomas-krenn.com/en/wiki/StorCLI_commands). An alternative approach to replacing the failed disk using `MegaCli64` can be found [here](https://globalroot.wordpress.com/2013/06/18/megacli-raid-levels/), but this has never been tried.
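For a quick first look before digging into individual drives, a controller-level summary usually suffices. A minimal sketch, assuming the controller is `/c0` as in the commands below:

```bash
# overall controller summary: virtual drives, physical drives, and their states
./storcli /c0 show

# virtual drive state only; an intact RAID6 array should report Optl (optimal)
./storcli /c0 /vall show
```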
#### Replacing a failed disk

In the case of a failed disk, the first thing to do is to check the disk statuses using

```bash
./storcli /c0 /eall /sall show
```

This command returns the status of all disks in the RAID array, and ideally the output will look something like this:

```bash
Drive Information :
=================

-------------------------------------------------------------------------------
EID:Slt DID State DG Size     Intf Med SED PI SeSz Model               Sp Type
-------------------------------------------------------------------------------
0:0       1 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:1       2 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:2       3 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:3       4 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:4       5 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:5       6 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:6       7 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:7       8 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:8       9 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:9      10 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:10     11 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
0:11     12 Onln   0 7.276 TB SATA HDD N   N  512B ST8000NM0055-1RM112 U  -
-------------------------------------------------------------------------------
```

The state of the drive is the important part here; if anything other than Onln (online) is shown, further investigation is needed. To get more in-depth information on all drives, use

```bash
./storcli /c0 /eall /sall show all
```

If a disk has failed, it should be replaced with a new one. These, as is the case for aberlour's disks, can be found in Chao's office, in a box underneath his bookcase. The box that contains a disk for Clooney is marked with 'Clooney', so it should be easy to find. The exact kind of disk is

```bash
Seagate Enterprise Exos 7E8 8TB SATA3 3.5" 7k2 256MB 512e HDD-T8000-ST8000NM0055
```

Note that this model number is the same as reported in the `Drive Information` block above.

When you have found the replacement disk, go down to Clooney and locate the red LED light which marks the failed disk. Make sure to double-check that this is actually the failed disk by turning on the blinking LED:

```bash
./storcli /c0 /e0 /sx start locate # Here x is the slot number
```

`/sx` is the slot number of the failed disk and can be found in the `EID:Slt` column of the `Drive Information` block above. To stop the LED blinking:

```bash
./storcli /c0 /e0 /sx stop locate # Here x is the slot number
```

Now that you have located the failed disk, the procedure ([ref.](https://slowkow.com/notes/raid-fix/#locate-the-failed-drive) and [ref.](https://knowledgebase.45drives.com/kb/kb450183-replacing-drives-in-an-array-using-storcli/)) is as follows

```bash
# Set the failed drive as Offline
./storcli /c0 /e0 /sx set offline # x = Controller defined slot number

# Set the failed drive as Missing
./storcli /c0 /e0 /sx set missing

# Spin down the failed drive
./storcli /c0 /eall /sx spindown # NOTE Use /eall here
```

Then remove the failed drive and replace it with a new drive. Documentation for this can be found [here](https://www.supermicro.com/manuals/chassis/1U/SC812.pdf) (see section 2-2). You will have to move the drive handle (tray) from the old disk to the new one by unscrewing it, as the new disks do not come with one.

The rebuild then starts automatically and can be monitored using

```bash
./storcli /c0 /eall /sall show rebuild
```

It is possible to change how fast the rebuild occurs using `storcli /c0 set rebuildrate=<value>`. This value should be between `0` and `100`, and is `30` by default on Clooney. Changing this value can slow down I/O speeds significantly and might make it difficult to log in to Brosnan, so do this with caution. With a rate of `30`, a rebuild of one drive should take less than 24 hours.
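If you do decide to adjust the rate, it can be worth confirming the current setting first. A minimal sketch, assuming the same controller `/c0` as above:

```bash
# show the currently configured rebuild rate (percentage of controller resources)
./storcli /c0 show rebuildrate

# set it back to the Clooney default
./storcli /c0 set rebuildrate=30
```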
Make sure to order a new drive for when this occurs again in the future, and that's it.

### Disk failure - HP Server

For the HP servers aberlour (and mackmyra): the mail will have something like this in the report

```bash
> /usr/sbin/hpacucli controller slot=1 logicaldrive all show status

   logicaldrive 1 (40.0 GB, 6): Interim Recovery Mode
   logicaldrive 2 (16.3 TB, 6): Interim Recovery Mode

> /usr/sbin/hpacucli controller slot=1 physicaldrive all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 3 TB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 3 TB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 3 TB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 3 TB): OK
   physicaldrive 1I:1:5 (port 1I:box 1:bay 5, 3 TB): OK
   physicaldrive 1I:1:6 (port 1I:box 1:bay 6, 3 TB): Failed
   physicaldrive 1I:1:7 (port 1I:box 1:bay 7, 3 TB): OK
   physicaldrive 1I:1:8 (port 1I:box 1:bay 8, 3 TB): OK
```

- Find a replacement disk from the inventory.
- On the old HP machines (mackmyra and aberlour) the failed disk will flash. Just remove the failed disk and insert the replacement. Check with the command below; it should say rebuilding.

```bash
$ /usr/sbin/hpacucli controller slot=1 logicaldrive all show status

   logicaldrive 1 (40.0 GB, 6): OK
   logicaldrive 2 (16.3 TB, 6): Recovering, 36% complete

$ /usr/sbin/hpacucli controller slot=1 physicaldrive all show status

   physicaldrive 1I:1:1 (port 1I:box 1:bay 1, 3 TB): OK
   physicaldrive 1I:1:2 (port 1I:box 1:bay 2, 3 TB): OK
   physicaldrive 1I:1:3 (port 1I:box 1:bay 3, 3 TB): OK
   physicaldrive 1I:1:4 (port 1I:box 1:bay 4, 3 TB): OK
   physicaldrive 1I:1:5 (port 1I:box 1:bay 5, 3 TB): OK
   physicaldrive 1I:1:6 (port 1I:box 1:bay 6, 3 TB): Rebuilding
   physicaldrive 1I:1:7 (port 1I:box 1:bay 7, 3 TB): OK
   physicaldrive 1I:1:8 (port 1I:box 1:bay 8, 3 TB): OK
```

- Job done. To increase the priority of the rebuild you can run `hpacucli controller slot=1 modify rebuildpriority=[high|medium|low]`.

### Adding node to TEOROO2

1. Install the OS on the system and hook it up to the network
    - Default username, does it matter?
    - Disk partition (use LVM)?
    - Automatic update? (perhaps no)
    - The following clients/services have to run on the node: `openssh-server`, `autofs`, `nis`, `nfs-common`
2. Configure DHCP and DNS on TEOROO2
    - Visit https://router:10000, login with root and password
    - Create the host in the DHCP server
    - Choose the hostname and IP address
3. Configure DHCP on the node
    You should only need to configure the network so that it looks for a DHCP service ([ref](https://www.tecmint.com/install-dhcp-server-client-on-centos-ubuntu/)). Check with `ifconfig` that the ip address is correct, and check that you can access the node from the login node.
4. Configure NIS
    If it is the first install on Ubuntu there will be a prompt for you to input the domain. You'll still need to edit `/etc/yp.conf` and `/etc/nsswitch.conf` ([ref](https://www.server-world.info/en/note?os=Ubuntu_16.04&p=nis&f=2)): add `nis` to the relevant lines (including `automount`), then restart the services with `systemctl restart rpcbind nis autofs` (see the sketch after this list). There is a known issue for Ubuntu up to 16.04.[^ubuntu16_bug]
5. Configure NIS on the node
    - For Ubuntu machines, edit `/etc/yp.conf` and `/etc/defaultdomain` to match the domain and ip.
    - For CentOS/Scientific Linux machines, run `system-config-authentication` (GUI tool), or edit `/etc/yp.conf` and `/etc/sysconfig/network`.
6. Configure NTP
    Change the following line in `/etc/ntp.conf` from
    ```
    server mackmyra.cluster.mkem.uu.se
    ```
    to
    ```
    server 10.1.10.1
    ```
    then restart the ntp service
    ```
    /etc/init.d/ntpd restart
    ```
    then test
    ```
    ntpq -p
    ```
7. Configure the software mount
    ```bash
    # add this to /etc/fstab
    clooney:/homefs/sw /sw nfs defaults 0 0
    # mount the directory
    mount -a
    ```
8. Configure the mail client
    The mail client config is different for Ubuntu (Postfix) and SL (sendmail).
    - On SL machines, edit `/etc/mail/genericdomain` and set it to `{node}.cluster`. Edit `/etc/mail/sendmail.mc` and set `MASQUERADE_AS` to `kemi.uu.se`. Test as `root` that `sendmail` works.
9. Result
    After the configuration the node will function as a normal compute node in teoroo2. Users can log in with their teoroo2 account and password, and they will have access to their teoroo2 files on this node. Old, local files on this node will still be in `/homefs`.

[^ubuntu16_bug]: See https://askubuntu.com/questions/771319/in-ubuntu-16-04-not-start-rpcbind-on-boot
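As a reference for steps 4-5, the NIS client setup on an Ubuntu node usually boils down to a few lines. This is only a sketch: the actual NIS domain name is not listed in this guide, so copy it (and check the server name) from an existing compute node.

```bash
# /etc/defaultdomain -- the NIS domain name (copy from an existing node)
#   <nis-domain>

# /etc/yp.conf -- point the client at the NIS server (router, per the service table below)
#   domain <nis-domain> server router

# /etc/nsswitch.conf -- append nis to the relevant databases, e.g.
#   passwd:    files nis
#   group:     files nis
#   shadow:    files nis
#   automount: files nis

# apply and verify
systemctl restart rpcbind nis autofs
ypcat passwd | head   # should list the teoroo2 accounts
```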
## Setup details

### Service information

Both teoroo and teoroo2 are built up with the following services:

- Router: exposes the cluster to the outside
- DNS: assigns domain names for nodes in the subnet
- DHCP: assigns ip addresses for nodes in the subnet
- NFS: serves files
- NIS: serves account and home folder information
- SSH (login node): the default place people land

| Service | teoroo2            | teoroo (not accessible) |
| ------- | ------------------ | ----------------------- |
| Router  | router             | router                  |
| DNS     | router             | mackmyra                |
| DHCP    | router             | mackmyra                |
| NFS     | clooney & aberlour | mackmyra                |
| NIS     | router             | mackmyra                |
| SSH     | brosnan            | teoroologin             |

These services are configured in a similar way, so that migrating compute nodes between the two is easy. The difference is that in teoroo2 most services run on router rather than on a dedicated machine in the subnet. To migrate machines from one cluster to the other, one only needs to modify the configurations (for the service and for the compute node).

### Fail2ban

fail2ban bans suspicious IP addresses after a certain number of failed attempts (e.g. with a wrong password). The config files are located under `/etc/fail2ban` on `brosnan`. To check the banned IPs manually, run `fail2ban-client status sshd` on `brosnan`.

## Monitoring

A new monitoring system has been set up using Zabbix. This includes a Zabbix **server** running on `jackie` and Zabbix **agents** running on all currently running nodes of teoroo2. In a nutshell, the agents run (customizable) scripts and report to the server, so that all nodes can be monitored in one place.

A dashboard should be running on <jackie:19999>, to which the admin has access. This dashboard should cover most of the information an admin is interested in, and upon problems Zabbix should notify the sysadmin through mail notifications (configured in Zabbix->Users->Users->teoroosys).

As admin you can add the port forward to your `.ssh/config` to have the dashboard available at <localhost:19999> once you log into brosnan:

```
Host BROSNAN
  HostName teoroo2.kemi.uu.se
  ForwardAgent yes
  LocalForward 19999 jackie:19999
```

The old system based on cron jobs is still running. Its setup is described below in the archived notes. The admin might want to switch off those cron jobs once the new system is deemed stable.
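To check that the agent on a particular node is alive and reporting, a quick test can be run from either side. A minimal sketch (assuming `zabbix_get` is installed on `jackie`; replace `<node>` with the node's hostname):

```bash
# from jackie (the Zabbix server): returns 1 if the agent on <node> answers
zabbix_get -s <node> -k agent.ping

# on the node itself: query an item key locally through the agent
zabbix_agentd -t system.uptime
```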
### Zabbix Setup details

The system is set up according to the [zabbix installation doc](https://www.zabbix.com/download). Specifically:

- Official or borrowed templates:
    - All nodes use the `Linux by Zabbix agent` template;
    - All nodes also use the [`Zabbix Smartctl` template](https://github.com/v-zhuravlev/zbx-smartctl) for disk self-checks;
    - Jackie and brosnan use the [`Nvidia GPUs Performance` template](https://github.com/plambe/zabbix-nvidia-smi-multi-gpu/);
- Extra scripts (in `/etc/zabbix/zabbix_agent*.d`):
    - Aberlour and clooney run their respective RAID checks, defined in `raid_info.conf`.
    - Clooney reports the latest borg backup state, defined in `borginfo.conf`.
    - Aberlour reports the ambient temperature, defined in `ambient_temp.conf`.

## Troubleshooting

This is an incomplete list of bugs that past admins have solved before:

### `autofs` and `/home` directories

On all nodes the `/home/user` directories are controlled by the `autofs` service. Sometimes the service hangs, preventing users from accessing their home directories. To fix that, try:

```bash
service autofs restart
```

### Restarting services

After large fixes, such as repairing the filesystem, several important services might hang. These can be restarted using:

```bash
for s in {nis,rpcbind,autofs}; do service $s restart; done
```

### DB lock on NFS fix

```bash
sudo systemctl enable rpc-statd # Enable statd on boot
sudo systemctl start rpc-statd  # Start statd for the current session
```

### Failed disk on Clooney

Drive s1 failed. I marked the drive as Offline and Missing, but the rebuild started automatically without the failed disk needing to be replaced. The rebuild finished successfully, and it seems operational for now.

```bash
./storcli /c0 /e0 /s1 set offline
./storcli /c0 /e0 /s1 set missing
```

### File system corruption

On two different occasions Clooney's file system got corrupted, entering a read-only state. This can be fixed by restarting Clooney and then running `e2fsck`. Make sure that the disk is unmounted first.

```bash
umount /dev/sda
e2fsck -fp /dev/sda
```

## Past admins

- Lisanne Knijff (lisanne.knijf@kemi.uu.se)
- Yunqi Shao (yunqi.shao@chalmers.se)
- Pavlin Mitev (pavlin.mitev@uppmax.uu.se)

---

# Archived notes

::: spoiler folded

## Copying password across clusters

Copy the corresponding lines in `/etc/shadow` to the NIS server of the other cluster.

## Granting access from outside

By default the computers can only be accessed from within Sweden. If one wants to access the cluster (without a VPN), one can do this by editing the `/etc/hosts.allow` file on the login node (you can get the IP address of that person using https://ipv6-test.com).

## Setup of mail alert

Most alerts and monitoring information are sent to the admin(s) by email. This requires a working mail-sending service; below is the setup for each node.

**teoroo**

```crontab
0 1 * * * /root/ProLiant_Status.awk
```

**aberlour**

```crontab
0 1 * * * /root/ProLiant_Status.awk
*/15 * * * * /root/Ambient_Temp.awk
```

**clooney**

```
7 7 * * * /root/bin/RAID-status.sh | /usr/sbin/sendmail teoroosys@kemi.uu.se
```

**W nodes**

Most nodes use the Ubuntu or Scientific Linux package `logwatch`; just install it and the rest is done by the package (set the mail address to teoroosys@kemi.uu.se during installation). Some nodes have `smartmontools` running to monitor the HDD health; it sends a mail in case of pre-failure, high temperature, etc.

### Disk failure report

The disk status on the file servers is monitored by cron jobs that send mails every night.
On some nodes, the `smartmontools` daemon is running and sends mails in case of errors, failures or high disk temperatures. On disk failure, see [here](#Disk-failure); you can probably find a spare disk in the [inventory](/xF7Rvb_sRsKSD02lRycoBQ).

## QNAP

The QNAP NAS can be accessed at https://qnap:443, with the username `admin` and the admin password.

:::