# Slurm installation guide

This file explains how Slurm was installed (or how installation was attempted) on Egil. It is mainly meant to keep an overview of everything that has been done, but it can also be used as a tutorial for installations on other clusters. All scripts are found on Egil under `/root/slurm_tools`.

## EGIL

```
cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
```

## Operations on compute nodes

#### Run commands on nodes

``` bash
rocks run host compute-0-1 "command"
rocks run host compute "cp /share/apps/.bashrc /root/" collate=on
```

#### Sharing files

See chapter 5.3 in the [Rocks guide][guide]. The files in `/share/apps` are shared with all nodes at the same location:

``` bash
cd /share/apps
```

### Create global user accounts

The *munge* and *slurm* users were created by the script `initialize_users.sh`:

``` bash
# ./initialize_users.sh
```

Thereafter, the users were synchronized across all compute nodes using

``` bash
# rocks sync users
```

## MUNGE authentication service

The next step is to install MUNGE. MUNGE can be found in the EPEL repository, which can be activated by

``` bash
# yum install -y epel-release
```

Then install the MUNGE RPM packages:

``` bash
# yum install -y munge munge-libs munge-devel
```

PS: Sometimes this does not work because EPEL is not enabled. If you execute

``` bash
# yum repolist
```

and EPEL does not appear as one of the repositories, you will likely need to enable it manually. As long as it is installed, it should still appear in `# yum repolist all`. To resolve this, execute

``` bash
# yum-config-manager --enable epel
```

When MUNGE is successfully installed, create a secret key and give the compute nodes access to it:

``` bash
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
```

All of this is covered in `distribute_munge.sh`.

### `ImportError: No module named yummain`

This error occurs when a node does not have the required Python scripts in `/usr/share/yum-cli`. They can be copied to the compute node by

``` bash
rsync -avz /usr/share/yum-cli/* compute-0-XX:/usr/share/yum-cli/
rsync -avz /etc/yum.repos.d compute-0-XX:/etc/
scp /etc/yum.conf compute-0-XX:/etc/yum.conf
```

To fix this on compute node XX, run `./copy_yum.sh XX`. NOTE: This might not resolve the problem.

## Install htop, vim, sensors

``` bash
rocks run host compute "yum install -y htop vim lm_sensors"
```

## Install Slurm-Roll

Now that MUNGE is up and running, we are ready to install the Slurm roll. First, `slurm*.iso` has to be downloaded from [https://sourceforge.net/projects/slurm-roll/](https://sourceforge.net/projects/slurm-roll/). Thereafter, the roll is installed as described in the slurm-roll manual:

``` bash
export LANG=C
rocks add roll slurm*.iso
rocks enable roll slurm
cd /export/rocks/install
rocks create distro
yum clean all
yum update
rocks run roll slurm | sh
reboot
```

By doing this, Slurm is installed on the frontend node. You can verify this by running `sinfo` or `squeue`. However, to be able to submit jobs to the compute nodes, Slurm also needs to be installed on the compute nodes.
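Before moving on, a quick sanity check that the roll is enabled and the controller daemon is running on the frontend may save time later. This is only a sketch; the exact output depends on the cluster:

``` bash
# Confirm that the slurm roll is listed and enabled
rocks list roll | grep slurm

# The controller daemon should be active on the frontend
systemctl status slurmctld

# sinfo should list the partitions, even while the compute nodes are still down
sinfo
```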
20201126, to be confirmed: also follow the instructions in the Update section of the [Slurm Roll for Rocks Cluster][slurmroll] manual, page 4. This seems to install the slurm daemon `slurmd` and to get past errors when executing `rocks sync slurm`:

``` bash
export LANG=C
rocks disable roll slurm
rocks remove roll slurm
rocks add roll slurm*.iso
rocks enable roll slurm
cd /export/rocks/install
rocks create distro
yum clean all
yum update
systemctl restart slurmdbd.service
systemctl restart slurmctld.service
systemctl restart slurmd.service
```

### Install Slurm on compute nodes

To install Slurm on the compute nodes, we first need to rebuild `411`. `411` shares all files listed in `/var/411/Files.mk` between the nodes and is described in [Appendix C](http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/service-411.html) of the Rocks 7.0 user manual:

``` bash
make -C /var/411 force
rocks run host compute "411get --all"
```

The `libltdl` library is needed to complete the next steps (Werner Saar [dixit](https://sourceforge.net/p/slurm-roll/discussion/general/thread/5f80736e/)), so create a link in `/usr/lib64`:

``` bash
rocks run host compute "ln -s /opt/condor/lib/condor/libltdl.so.7 /usr/lib64/libltdl.so.7" collate=on
```

Enable `slurmd` on all nodes:

``` bash
rocks run host compute "systemctl enable slurmd" collate=on
rocks run host compute "systemctl restart slurmd" collate=on
```

Then, execute `rocks sync slurm`. Make sure that the clocks are in sync:

``` bash
rocks run host compute "timedatectl | grep 'Local time:'" collate=on
```

Test Slurm with this script:

``` bash
sbatch -vv /root/slurm_tools/job.sh
```

The log file locations are defined in `slurm.conf`:

```
grep SlurmctldLog /etc/slurm/slurm.conf
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log

grep `date +"%Y-%m-%dT%H:"` /var/log/slurm/slurmctld.log
grep `date +"%Y-%m-%dT%H:"` /var/log/slurm/slurmd.log
...
```

# Install packages on the Rocks 7.0 (Manzanita) cluster

One can either install packages as RPMs and load them with `module load <package>` (recommended), or install packages using a brute-force method. The latter is usually done for packages that are not distributed as RPMs.

## Adding RPM packages to compute nodes

#### yum, packages, rpms

Check installed or available packages:

``` bash
yum list installed | grep <package>
yum --enablerepo epel list *<package>*
```

Download the RPMs needed to install a package:

``` bash
yumdownloader --resolve --destdir=. --enablerepo=epel <package>
```

This can be used to create a new distribution (see below).

#### Create and install a new Rocks distribution

See chapter 5.1 in the [Rocks guide][guide]. In short, place the packages here:

``` bash
# Package location. Use yumdownloader as shown above
cd /export/rocks/install/contrib/7.0/x86_64/RPMS

# Extend the XML configuration file
cd /export/rocks/install/site-profiles/7.0/nodes
cp skeleton.xml extend-compute.xml
vi extend-compute.xml

# Create a distribution
cd /export/rocks/install
rocks create distro
```

Then reinstall the nodes.

Dependencies: according to [this thread](https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2013-February/061458.html), yumdownloader takes care of dependencies, and anaconda "will install all the dependencies that it can find in the local repo (the one created when you run "rocks create distro" in the /export/rocks/install directory)."

2020-11-18: Unfortunately this does not work. I do not see my packages after running `rocks create distro`, nor on the compute nodes.
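A quick way to check from the frontend whether a given package actually made it onto a node after a reinstall is sketched below; `<package>` is a placeholder, and the node name is just an example:

``` bash
# List matching RPMs on one node; use "compute" instead of compute-0-0 to query all nodes
rocks run host compute-0-0 "rpm -qa | grep <package>" collate=on
```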
#### Reinstall nodes

``` bash
# Check the install action on the nodes
rocks list host

# Force the install action on one node (use % for all nodes)
rocks set host boot compute-0-0 action=install

# Reboot a single node
rocks run host compute-0-0 "reboot"
```

# Install packages by the brute-force method

To install a package using the brute-force method, the package has to be compiled manually and the executable moved to `/share/apps`, which distributes the apps to all compute nodes. The first time this is done, the directory has to be added to the search path. To add `/share/apps` to `$PATH` for all users, append the following to `/etc/profile`:

``` bash
echo 'export PATH=/share/apps:$PATH' >> /etc/profile
```

To distribute the modified file to all compute nodes, add the file to `/var/411/Files.mk`:

``` make
...
# These files do not take a comment header.
FILES_NOCOMMENT = /etc/passwd \
        /etc/group \
        /etc/shadow \
        /etc/profile \
        /usr/local/lib64/* \
        /usr/lib64

# FILES += /my/file
FILES += /etc/slurm/slurm.conf
FILES += /etc/slurm/head.conf
FILES += /etc/slurm/node.conf
FILES += /etc/slurm/parts.conf
FILES += /etc/slurm/topo.conf
FILES += /etc/slurm/cgroup.conf
FILES += /etc/slurm/gres.conf.1
FILES += /etc/slurm/gres.conf.2
FILES += /etc/slurm/gres.conf.3
FILES += /etc/slurm/gres.conf.4

FILES_NOCOMMENT += /etc/munge/munge.key
...
```

and rebuild 411:

``` bash
# ?
make -C /var/411

cd /var/411
make clean
make Files.mk
```

The executable should now be available for all users on all nodes.

## Link executable

Often, one wants to rename an executable but still keep a duplicate of the original. WRONG? An example is `python3.6`, which should be linked to `python3` while the `python3.6` executable remains. In Linux, this is easily done by

``` bash
ln -s /opt/python/lib
ln -s python3.6 /usr/bin/python3
```

An example is this library needed by `slurmd`:

``` bash
ln -s /opt/condor/lib/condor/libltdl.so.7 /usr/lib64/libltdl.so.7
```

## Files

<details>
<summary>/root/slurm_tools/initialize_users.sh</summary>

``` bash
export MUNGEUSER=1005
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge

export SlurmUSER=1004
groupadd -g $SlurmUSER slurm
useradd -m -c "Slurm workload manager" -d /var/lib/slurm -u $SlurmUSER -g slurm -s /bin/bash slurm
```
</details>
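A mismatch in these UIDs/GIDs between the frontend and the compute nodes leads to the MUNGE ownership errors described in the troubleshooting section below. A quick consistency check from the frontend could look like this (a sketch using commands already used in this guide):

``` bash
# UID/GID of munge and slurm on the frontend
id munge; id slurm

# ... and on all compute nodes
rocks run host compute "id munge; id slurm" collate=on
```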
<details>
<summary>/root/slurm_tools/distribute_munge.sh</summary>

``` bash
# This script was made to distribute the munge key across all compute nodes.
# Then, the correct ownership has to be set.
#
# Author: Even Marius Nordhagen, evenmn@fys.uio.no

NNODES=34

# create directories
rocks run host "mkdir /etc/munge/"
rocks run host "mkdir /var/log/munge/"

# install and enable EPEL
rocks run host "yum install -y epel-release"
rocks run host "yum install -y yum-utils"
rocks run host "yum-config-manager --enable epel"

# install MUNGE
rocks run host "yum install -y munge munge-libs munge-devel"

# generate MUNGE key and distribute it
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key

NODE=0
while [ $NODE -lt $NNODES ]; do
    scp -p /etc/munge/munge.key compute-0-$NODE:/etc/munge/munge.key
    let NODE++
done

# give the MUNGE directories the correct ownership
rocks run host "chown -R munge: /etc/munge/ /var/log/munge/"
rocks run host "chmod 0700 /etc/munge/ /var/log/munge/"

# enable and start MUNGE
rocks run host "systemctl enable munge"
rocks run host "systemctl start munge"
```
</details>

<details>
<summary>/root/slurm_tools/copy_yum.sh</summary>

``` bash
# This script copies all the needed yum files from the frontend node
# to a compute node.
#
# Author: Even Marius Nordhagen, evenmn@fys.uio.no

RACK=0
NODE=$1

ssh compute-$RACK-$NODE yum install -y rsync
rsync -avz /usr/share/yum-cli/* compute-$RACK-$NODE:/usr/share/yum-cli/
rsync -avz /etc/yum.repos.d compute-$RACK-$NODE:/etc/
scp /etc/yum.conf compute-$RACK-$NODE:/etc/yum.conf
```
</details>

<br>

<details>
<summary>/var/411/Files.mk</summary>

``` make
AUTOMOUNT = $(shell find /etc -type f -name 'auto.*' | grep -v RCS)

# These files all take a "#" comment character.
# If you alter this list, you must do a 'make clean; make'.
FILES = $(AUTOMOUNT)
FILES += /etc/ssh/shosts.equiv
FILES += /etc/ssh/ssh_known_hosts

# These files do not take a comment header.
FILES_NOCOMMENT = /etc/passwd \
        /etc/group \
        /etc/shadow

# FILES += /my/file
FILES += /etc/slurm/slurm.conf
FILES += /etc/slurm/head.conf
FILES += /etc/slurm/node.conf
FILES += /etc/slurm/parts.conf
FILES += /etc/slurm/topo.conf
FILES += /etc/slurm/cgroup.conf
FILES += /etc/slurm/gres.conf.1
FILES += /etc/slurm/gres.conf.2
FILES += /etc/slurm/gres.conf.3
FILES += /etc/slurm/gres.conf.4

FILES_NOCOMMENT += /etc/munge/munge.key
```
</details>

<details>
<summary>/root/test_slurm.sh</summary>

``` bash
#!/bin/bash
# This file checks that the slurm user has access to the files and directories listed in /etc/slurm/slurm.conf:
#   cp test_slurm.sh /var/lib/slurm
#   cd /var/lib/slurm
#   su - slurm /var/lib/slurm/test_slurm.sh

# The following directories and files are listed in /etc/slurm/slurm.conf
paths=(
    "/var/spool"
    "/var/spool/slurmd"
    "/var/spool/slurm.checkpoint"
    "/var/spool/slurm.state"
    "/var/run/slurmctld.pid"
    "/var/run/slurmd.pid"
    "/var/log/slurm"
    "/usr/lib64/slurm"
    "/var/log/slurm/slurmctld.log"
    "/var/log/slurm/slurmd.log"
    "/etc/slurm/suspendhost.sh"
    "/etc/slurm/resumehost.sh"
)

for file in "${paths[@]}"; do
    echo "$file"
    test -a $file || [ -d $file ] || echo "  file does not exist"
    if [ -a $file ] || [ -d $file ]; then
        test -r $file || echo "  no r permissions on $file"
        test -w $file || echo "  no w permissions on $file"
        test -x $file || echo "  no x permissions on $file"
    fi
done
```
</details>
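Before digging into the troubleshooting notes below, it is worth verifying that MUNGE authentication works end to end. The standard round-trip test is sketched here; the node name is only an example, and `unmunge` ships with the munge package:

``` bash
# Encode and decode a credential locally on the frontend
munge -n | unmunge

# Encode on the frontend, decode on a compute node
munge -n | ssh compute-0-0 unmunge
```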
---

## Troubleshooting

#### /share/apps not found on nodes

Check these files:

``` bash
cat /etc/auto.share
apps egil.local:/export/&

cat /etc/auto.master
/share /etc/auto.share --timeout=1200
/home /etc/auto.home --timeout=1200
```

Try an explicit mount of the apps directory, e.g.

``` bash
mount egil.local:/export/apps /mnt
```

If that works, unmount it and then try to restart autofs: `service autofs restart`.

Ale: maybe it was autofs, or another reboot, or this on the node (but it would be strange): `411get --all`

Logs:

``` bash
tail /var/log/messages
```

#### MUNGE: Failed to access munge.socket.2

```
# munge -n
munge: Error: Failed to access "/var/run/munge/munge.socket.2": No such file or directory
```

This is likely due to a failed start of munge. Find the socket that munge creates on start and deletes on shutdown, and verify that it exists:

```
# /usr/sbin/munged -h | grep socket
  -S, --socket=PATH    Specify local socket [/var/run/munge/munge.socket.2]

# ls /var/run/munge/munge.socket.2
No such file or directory
```

Try to restart munge with `systemctl restart munge` and resolve any errors shown in `journalctl -xe`.

#### Failed to start MUNGE authentication service

Starting munge fails:

``` bash
# rocks run host "systemctl start munge" collate=on
```

See the logs with `journalctl -xe`. It could be a problem with users not being synced:

``` bash
# journalctl -xe
..
compute-0-0.local munged[3438]: munged: Error: Keyfile is insecure: "/etc/munge/munge.key" should be owned by UID 888
compute-0-0.local systemd[1]: munge.service: control process exited, code=exited status=1
compute-0-0.local systemd[1]: Failed to start MUNGE authentication service.
..
```

Check the user:

``` bash
[root@compute-0-0 ~]# grep 888 /etc/passwd
munge:x:888:888:MUNGE authentication service:/etc/munge:/sbin/nologin
```

Then fix the ownership of these:

```
chown root: /var/log/munge/munged.log
chown munge: /var/log/munge
```

#### MUNGE: permission denied

Munge did not start, and the journal shows permission denied:

``` bash
# journalctl -xe
..
compute-0-0.local munged[13097]: munged: Error: Failed to check logfile "/var/log/munge/munged.log": Permission denied
or
compute-0-0.local munged[13257]: munged: Error: Pidfile is insecure: invalid ownership of "/run/munge"
..

# ls -al /var/log/munge/
drwx------ 2 888  888  4096 Nov 30 03:44 .
-rw-r----- 1 root root    0 Nov 30 03:44 munged.log

# ls -al /run/munge
drwxr-xr-x 2 888 888 40 Nov 27 16:33 .

# ls /var/lib/munge
drwx--x--x 2 888 888 4096 Nov 27 13:53 .
```

There are directories and files owned by a numeric ID instead of a user (munge), meaning that the system did not recognize the user or group. Probably the users were not synced at startup. Rebooting helps only with some of these directories (e.g. /run/munge).

``` bash
# Find directories owned by group 888
# find / -group 888
..
# chown munge: /var/lib/munge
```

#### MUNGE: Failed to check pidfile dir /var/run/munge

Munge cannot start and `journalctl -xe` shows

```
munged: Error: Failed to check pidfile dir "/var/run/munge": cannot canonicalize "/var/run/munge": No such file or directory
Failed to start MUNGE authentication service.
```

Not sure what the cause was, but creating that directory fixed it, also after a reboot.

#### SLURM: non-responsive nodes

Slurm troubleshooting guide: https://slurm.schedmd.com/troubleshoot.html#nodes

`squeue` is still empty, even after distributing munge. This shows only nodes with status down: `sinfo -a`

The log file for slurmd is defined in `slurm.conf`; on our nodes it is `/var/log/slurm/slurmd.log`.
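A few standard Slurm commands that often help when nodes are reported as down are sketched here; the node name is only an example:

``` bash
# Show the reason each node is down/drained
sinfo -R

# Inspect one node in detail (State, Reason, last slurmd registration)
scontrol show node compute-0-0

# Once the underlying problem is fixed, return the node to service
scontrol update NodeName=compute-0-0 State=RESUME
```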
#### SLURM: permission denied

Launching a job results in permission denied:

``` bash
sbatch -vv /root/slurm_tools/job.sh
sbatch: error: Batch job submission failed: Access/permission denied

srun -vvv hostname
srun: error: Unable to allocate resources: Access/permission denied

salloc -vvv --ntasks=8 --time=10 bash
salloc: error: Job submit/allocate failed: Access/permission denied
```

```
grep SlurmctldLog /etc/slurm/slurm.conf
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log

less +G /var/log/slurm/slurmctld.log
grep `date +"%Y-%m-%dT%H:"` /var/log/slurm/slurmctld.log
_slurm_rpc_submit_batch_job: Access/permission denied      #sbatch
_slurm_rpc_allocate_resources: Access/permission denied    #srun
_slurm_rpc_allocate_resources: Access/permission denied    #salloc

grep `date +"%Y-%m-%dT%H:"` /var/log/slurm/slurmd.log
#not much to see here
```

NB, from the quick start guide: **The parent directories for Slurm's log files, process ID files, state save directories, etc. are not created by Slurm. They must be created and made writable by SlurmUser as needed prior to starting the Slurm daemons.**

```
grep StateSaveLocation /etc/slurm/slurm.conf
/var/spool/        #bigfacet: root
/var/spool/slurmd  #bigfacet: folder is slurm, files mixed
/var/run           #bigfacet: root
```

I tried this:

```
chown slurm: /var/spool/slurmd
chown slurm: /var/spool/slurm.state
chown slurm: /var/run/slurmctld.pid /var/run/slurmd.pid
systemctl restart slurmdbd
systemctl restart slurmd
systemctl restart slurmctld
```

This directory was not accessible to slurm:

```
su - slurm
[slurm] less /var/log/slurm/slurmctld.log
/var/log/slurm/slurmctld.log: Permission denied
exit

chown -R slurm: /var/log/slurm
chown slurm: /usr/lib64/slurm
chown slurm:slurm /var/spool
..
```

https://github.com/Azure/azure-quickstart-templates/issues/1796

Notice that slurmctld is started by the root user; it should probably be the slurm user:

```
ps -aux | grep slurm
```

slurmd should run as root and slurmctld as slurm (see the [Slurm Quick Start Administrator Guide][slurmquickstart]); in `/etc/slurm/slurm.conf` there should be `SlurmdUser=root` and `SlurmUser=<any user>`.

A similar problem could be due to the settings below. The following says that only egil can submit jobs, which excludes e.g. [jobs submitted from within other jobs](https://sourceforge.net/p/slurm-roll/discussion/general/thread/087ecb6e4c/?limit=25):

```
grep AllocNodes /etc/slurm/slurm.conf
PartitionName=DEFAULT AllocNodes=egil,egil State=UP
```

---

#### Some slurm commands

``` bash
slurmd -C
NodeName=egil CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64236 UpTime=2-23:08:49

slurmctld -Dcvvv --partition debug
```

Allocation and launch commands: `salloc --ntasks` (typically to get a shell), `srun`, `sattach`.

```
# Create allocation, then launch. Two tasks, then on whole nodes
srun --ntasks=2 --label hostname   # permission denied
srun --nnodes=2 hostname           # permission denied

# Create allocation for tasks
salloc --ntasks=8 --time=10 bash   # permission denied
> hostname
> env | grep SLURM
> exit
```

Also useful: `sinfo -N` and `scontrol show partition`.

---

\[1]: [Rocks Guide][guide]
\[2]: [Slurm Roll for Rocks Cluster][slurmroll]
\[3]: [Slurm Quick Start Administrator Guide][slurmquickstart]
\[4]: [Munge installation guide][munge]
\[5]: [Slurm Installation][slurminstallation]

[guide]: http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/ "Rocks 7 Basic User Guide"
[slurmroll]: http://129.59.141.57/roll-documentation/slurm/7.0/slurm-roll.pdf "Slurm Roll for Rocks Cluster"
[slurmquickstart]: https://slurm.schedmd.com/quickstart_admin.html "Slurm Quick Start Administrator Guide"
[munge]: https://github.com/dun/munge/wiki/Installation-Guide "Munge installation guide"
[slurminstallation]: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation "Slurm Installation"