# Slurm installation guide
In this file, we explain how Slurm was installed (or attempted) on Egil. It is mainly meant to keep an overview of everything that was done, but it can also serve as a tutorial for installations on other clusters. All scripts can be found on Egil under `/root/slurm_tools`.
## EGIL
``` bash
# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
```
## Operations on compute nodes
#### Run commands on nodes
``` bash
rocks run host compute-0-1 "command"
rocks run host compute "cp /share/apps/.bashrc /root/" collate=on
```
#### Sharing files
See chapter 5.3 in the [Rocks guide][guide]. Files placed in `/share/apps` are shared with all nodes at the same path.
`cd /share/apps`
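For example (hypothetical file name), a script copied into `/share/apps` on the frontend becomes visible at the same path on every node:
``` bash
cp /root/myscript.sh /share/apps/
rocks run host compute-0-1 "ls -l /share/apps/myscript.sh" collate=on
```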
### Create global user accounts
The *murge* and *slurm* users were created by the script `initialize_users.sh`:
``` bash
# ./initialize_users.sh
```
Thereafter, the users were synchronized across all compute nodes using
``` bash
# rocks sync users
```
## MUNGE authentication service
The next step is to install MUNGE. MUNGE can be found in the EPEL repository, which can be activated by
``` bash
# yum install -y epel-release
```
Then install MUNGE RPM packages
``` bash
# yum install -y munge munge-libs munge-devel
```
PS: Sometimes this does not work because EPEL is not enabled. If you execute
``` bash
# yum repolist
```
and EPEL does not appear as one of the repositories, you will likely need to enable it manually. As long as it is installed, however, it should appear in `# yum repolist all`. To resolve this, execute
``` bash
# yum-config-manager --enable epel
```
When MUNGE is successfully installed, create a secret key and let compute nodes access it:
``` bash
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
```
All this is covered in `distribute_munge.sh`.
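Once the key is distributed and munge is running on the nodes, the setup can be verified by encoding a credential locally and decoding it on a compute node (standard MUNGE checks; `compute-0-0` is just an example host):
``` bash
munge -n | unmunge                   # encode and decode a credential locally
munge -n | ssh compute-0-0 unmunge   # decode on a compute node
```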
### `ImportError: No module named yummain`
This error message occurs when a node does not have the required Python modules stored in `/usr/share/yum-cli`. These can be copied from the frontend to the compute node with
``` bash
rsync -avz /usr/share/yum-cli/* compute-0-XX:/usr/share/yum-cli/
rsync -avz /etc/yum.repos.d compute-0-XX:/etc/
scp /etc/yum.conf compute-0-XX:/etc/yum.conf
```
To fix this on compute node XX, run `./copy_yum.sh XX`. NOTE: This might not resolve the problem.
## Install htop, vim, sensors
``` bash
rocks run host compute "yum install -y htop vim lm_sensors"
```
## Install Slurm-Roll
Now that MUNGE is up and running, we are ready to install the Slurm roll. First, `slurm*.iso` has to be downloaded from [https://sourceforge.net/projects/slurm-roll/](https://sourceforge.net/projects/slurm-roll/). Thereafter, the Slurm roll is installed as described in the slurm-roll manual:
``` bash
export LANG=C
rocks add roll slurm*.iso
rocks enable roll slurm
cd /export/rocks/install
rocks create distro
yum clean all
yum update
rocks run roll slurm|sh
reboot
```
By doing this, Slurm is installed on the frontend node. You can verify this by running `sinfo` or `squeue`. However, to be able to submit jobs to the compute nodes, Slurm also needs to be installed on the compute nodes.
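For example (commands only; the output depends on the cluster state):
``` bash
sinfo                                    # partitions and node states
squeue                                   # jobs currently in the queue
scontrol show config | grep -i version   # check the installed Slurm version
```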
2020-11-26 (to be confirmed): Also follow the instructions in the Update section of the [Slurm Roll for Rocks Cluster][slurmroll] manual, page 4. This seems to install the Slurm daemon `slurmd` and get past errors when executing `rocks sync slurm`:
``` bash
export LANG=C
rocks disable roll slurm
rocks remove roll slurm
rocks add roll slurm*.iso
rocks enable roll slurm
cd /export/rocks/install
rocks create distro
yum clean all
yum update
systemctl restart slurmdbd.service
systemctl restart slurmctld.service
systemctl restart slurmd.service
```
### Install Slurm on compute nodes
To install Slurm on the compute nodes, we first need to rebuild `411`. `411` shares between nodes all files listed in `/var/411/Files.mk` and is described in [Appendix C](http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/service-411.html) in the Rocks 7.0 user manual:
``` bash
make -C /var/411 force
rocks run host compute "411get --all"
```
The `libltdl` library is needed to complete the next steps (Werner Saar [dixit](https://sourceforge.net/p/slurm-roll/discussion/general/thread/5f80736e/)), so create a link in /usr/lib64:
``` bash
rocks run host compute "ln -s /opt/condor/lib/condor/libltdl.so.7 /usr/lib64/libltdl.so.7" collate=on
```
Enable slurmd on all nodes:
``` bash
rocks run host compute "systemctl enable slurmd" collate=on
rocks run host compute "systemctl restart slurmd" collate=on
```
Then, we execute `rocks sync slurm`.
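After the sync, a quick check that the daemon is actually running on the nodes (a sketch using the same `rocks run host` pattern as above):
``` bash
rocks run host compute "systemctl is-active slurmd" collate=on
```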
Make sure that the clocks are synced:
``` bash
rocks run host compute "timedatectl | grep 'Local time:'" collate=on
```
Test slurm with this script:
``` bash
sbatch -vv /root/slurm_tools/job.sh
```
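The contents of `job.sh` are not reproduced in these notes; a minimal test script along the following lines should do (hypothetical, adjust limits to your setup):
``` bash
#!/bin/bash
#SBATCH --job-name=slurm_test
#SBATCH --ntasks=1
#SBATCH --time=00:01:00

# print which node picked up the job
hostname
srun hostname
```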
The log file locations are defined in `slurm.conf`:
```
grep LogFile /etc/slurm/slurm.conf
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
grep `date +"%Y-%m-%dT%H:"` /var/log/slurm/slurmctld.log
grep `date +"%Y-%m-%dT%H:"` /var/log/slurm/slurmd.log
...
```
# Install packages on the Rocks 7.0 (Manzanita) cluster
One can either install packages as RPMs and load the package with `module load <package>` (recommended) or install packages using a brute-force method. The latter is usually done for packages that are not distributed as RPMs.
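Typical usage of the module approach, once a package and its module file are installed (generic Environment Modules commands; `<package>` is a placeholder):
``` bash
module avail            # list modules available on the cluster
module load <package>   # add the package to the current environment
module list             # show the modules that are currently loaded
```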
## Adding RPM Packages to Compute Nodes
#### yum, packages, rpms
Check installed or available packages:
``` bash
yum list installed | grep <package>
yum --enablerepo epel list *<package>*
```
Download rpms to install a package:
``` bash
yumdownloader --resolve --destdir=. --enablerepo=epel <package>
```
This can be used to create a new distribution (see below).
#### Create and install a new Rocks distribution
See chapter 5.1 in the [Rocks guide][guide]. In short, place the packages here:
``` bash
# Package location. Use yumdownloader as shown above
cd /export/rocks/install/contrib/7.0/x86_64/RPMS
# Extend the XML configuration file
cd /export/rocks/install/site-profiles/7.0/nodes
cp skeleton.xml extend-compute.xml
vi extend-compute.xml
# Create a distribution
cd /export/rocks/install
rocks create distro
```
Then reinstall the nodes.
Dependencies: according to [this thread](https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2013-February/061458.html), yumdownloader takes care of dependencies, and anaconda "will install all the dependencies that it can find in the local repo (the one created when you run "rocks create distro" in the /export/rocks/install directory)."
2020-11-18: Unfortunately this does not work. I do not see my packages after `rocks create distro`, nor on the compute nodes.
#### Reinstall nodes
``` bash
# Check install action on nodes
rocks list host
# Force install action to one node (use % for all nodes)
rocks set host boot compute-0-0 action=install
# Reboot a single node
rocks run host compute-0-0 "reboot"
```
# Install packages by brute-force method
To install a package using the brute-force method, the package is compiled manually and the executable is moved to `/share/apps`, which distributes it to all compute nodes. The first time this is done, the directory needs to be added to the search path. To add `/share/apps` to `$PATH` for all users, run the following command:
``` bash
echo 'export PATH=/share/apps:$PATH' >> /etc/profile   # append, do not overwrite
```
To distribute the modified file to all compute nodes, add the file to `/var/411/Files.mk`:
``` bash
...
# These files do not take a comment header.
FILES_NOCOMMENT = /etc/passwd \
/etc/group \
/etc/shadow \
/etc/profile \
/usr/local/lib64/* \
/usr/lib64
# FILES += /my/file
FILES += /etc/slurm/slurm.conf
FILES += /etc/slurm/head.conf
FILES += /etc/slurm/node.conf
FILES += /etc/slurm/parts.conf
FILES += /etc/slurm/topo.conf
FILES += /etc/slurm/cgroup.conf
FILES += /etc/slurm/gres.conf.1
FILES += /etc/slurm/gres.conf.2
FILES += /etc/slurm/gres.conf.3
FILES += /etc/slurm/gres.conf.4
FILES_NOCOMMENT += /etc/munge/munge.key
...
```
and build 411:
``` bash
# equivalently: make -C /var/411
cd /var/411
make clean
make
```
The executable should now be available for all users on all nodes.
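A quick check on one node (assuming `compute-0-0` is up and 411 has pushed the updated `/etc/profile`; the `-l` flag forces a login shell, which is what reads `/etc/profile`):
``` bash
rocks run host compute-0-0 "bash -lc 'echo \$PATH'" collate=on
rocks run host compute-0-0 "ls /share/apps" collate=on
```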
## Link executable
Often, one wants to rename an executable but still keep a duplicate of the original.
WRONG? An example is `python3.6`, which should be linked to `python3` while the original `python3.6` executable remains. In Linux, this is easily done with
``` bash
ln -s /opt/python/lib
# create /usr/bin/python3 pointing to python3.6 (the link target is resolved relative to /usr/bin)
ln -s python3.6 /usr/bin/python3
```
Another example is this library needed by slurmd:
``` bash
ln -s /opt/condor/lib/condor/libltdl.so.7 /usr/lib64/libltdl.so.7
```
## Files
<details>
<summary>/root/slurm_tools/initialize_users.sh</summary>
export MUNGEUSER=1005
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
export SlurmUSER=1004
groupadd -g $SlurmUSER slurm
useradd -m -c "Slurm workload manager" -d /var/lib/slurm -u $SlurmUSER -g slurm -s /bin/bash slurm
</details>
<details>
<summary>/root/slurm_tools/distribute_munge.sh</summary>
# This script was made to distribute the munge key across all compute nodes. Then, correct ownership has to be set
# Author: Even Marius Nordhagen, evenmn@fys.uio.no
NNODES=34
# create directories
rocks run host "mkdir /etc/munge/"
rocks run host "mkdir /var/log/munge/"
# install and enable EPEL
rocks run host "yum install -y epel-release"
rocks run host "yum install -y yum-utils"
rocks run host "yum-config-manager --enable epel"
# install MUNGE
rocks run host "yum install -y munge munge-libs munge-devel"
# generate MUNGE key and distribute it
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
NODE=0
while [ $NODE -lt $NNODES ]; do \
scp -p /etc/munge/munge.key compute-0-$NODE:/etc/munge/munge.key
let NODE++; \
done
# give MUNGE repositories correct ownerships
rocks run host "chown -R munge: /etc/munge/ /var/log/munge/"
rocks run host "chmod 0700 /etc/munge/ /var/log/munge/"
# enable and start MUNGE
rocks run host "systemctl enable munge"
rocks run host "systemctl start munge"
</details>
<details>
<summary>/root/slurm_tools/copy_yum.sh</summary>
# This script copies all the needed yum files from the frontend node
# to a compute node.
#
# Author: Even Marius Nordhagen, evenmn@fys.uio.no
RACK=0
NODE=$1
ssh compute-$RACK-$NODE yum install -y rsync
rsync -avz /usr/share/yum-cli/* compute-$RACK-$NODE:/usr/share/yum-cli/
rsync -avz /etc/yum.repos.d compute-$RACK-$NODE:/etc/
scp /etc/yum.conf compute-$RACK-$NODE:/etc/yum.conf
</details>
<br>
<details>
<summary>/var/411/Files.mk</summary>
AUTOMOUNT = $(shell find /etc -type f -name 'auto.*' | grep -v RCS)
# These files all take a "#" comment character.
# If you alter this list, you must do a 'make clean; make'.
FILES = $(AUTOMOUNT)
FILES += /etc/ssh/shosts.equiv
FILES += /etc/ssh/ssh_known_hosts
# These files do not take a comment header.
FILES_NOCOMMENT = /etc/passwd \
/etc/group \
/etc/shadow
# FILES += /my/file
FILES += /etc/slurm/slurm.conf
FILES += /etc/slurm/head.conf
FILES += /etc/slurm/node.conf
FILES += /etc/slurm/parts.conf
FILES += /etc/slurm/topo.conf
FILES += /etc/slurm/cgroup.conf
FILES += /etc/slurm/gres.conf.1
FILES += /etc/slurm/gres.conf.2
FILES += /etc/slurm/gres.conf.3
FILES += /etc/slurm/gres.conf.4
FILES_NOCOMMENT += /etc/munge/munge.key
</details>
<details>
<summary>/root/test_slurm.sh</summary>
#!/bin/bash
# This file checks that the slurm user has access to files and directories listed in /etc/slurm/slurm.conf:
# cp test_slurm.sh /var/lib/slurm
# cd /var/lib/slurm
# su - slurm /var/lib/slurm/test_slurm.sh
# The following directories and files are listed in /etc/slurm/slurm.conf
paths=(
"/var/spool"
"/var/spool/slurmd"
"/var/spool/slurm.checkpoint"
"/var/spool/slurm.state"
"/var/run/slurmctld.pid"
"/var/run/slurmd.pid"
"/var/log/slurm"
"/usr/lib64/slurm"
"/var/log/slurm/slurmctld.log"
"/var/log/slurm/slurmd.log"
"/etc/slurm/suspendhost.sh"
"/etc/slurm/resumehost.sh"
)
for file in "${paths[@]}"; do
echo "$file"
test -a $file || [ -d $file ] || echo " file does not exist"
if [ -a $file ] || [ -d $file ];
then
test -r $file || echo " no r permissions on $file"
test -w $file || echo " no w permissions on $file"
test -x $file || echo " no x permissions on $file"
fi
done
</details>
---
## Troubleshooting
#### /share/apps not found on compute nodes
Check these files:
``` bash
cat /etc/auto.share
apps egil.local:/export/&
cat /etc/auto.master
/share /etc/auto.share --timeout=1200
/home /etc/auto.home --timeout=1200
```
Try an explicit mount of the apps directory, e.g. `mount egil.local:/export/apps /mnt`. If that works, unmount it and then try to restart autofs:
`service autofs restart`
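A minimal sequence for the checks above (assuming the export name shown in `/etc/auto.share`; unmount the test mount before restarting autofs):
``` bash
mount egil.local:/export/apps /mnt   # explicit NFS mount test
ls /mnt                              # should list the shared apps
umount /mnt
service autofs restart
```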
Ale: maybe it was autofs, or another reboot, or running this on the node (but that would be strange):
`411get --all`
Logs:
``` bash
tail /var/log/messages
```
#### MUNGE: Failed to access munge.socket.2
```
# munge -n
munge: Error: Failed to access "/var/run/munge/munge.socket.2": No such file or directory
```
This is likely due to a failed start of munge. Find the socket created and deleted by munge on start/shutdown, and verify that it exists:
``` bash
# /usr/sbin/munged -h | grep socket
  -S, --socket=PATH       Specify local socket [/var/run/munge/munge.socket.2]
# ls /var/run/munge/munge.socket.2
No such file or directory
```
Try to restart munge with `systemctl restart munge`, solve any errors in `journalctl -xe`.
#### Failed to start MUNGE authentication service
Starting munge fails:
`# rocks run host "systemctl start munge" collate=on`
See the logs with `journalctl -xe`. It could be a problem with users not being synced:
``` bash
# journalctl -xe
..
compute-0-0.local munged[3438]: munged: Error: Keyfile is insecure: "/etc/munge/munge.key" should be owned by UID 888
compute-0-0.local systemd[1]: munge.service: control process exited, code=exited status=1
compute-0-0.local systemd[1]: Failed to start MUNGE authentication service.
..
```
Check user:
``` bash
[root@compute-0-0 ~]# grep 888 /etc/passwd
munge:x:888:888:MUNGE authentication service:/etc/munge:/sbin/nologin
```
Then fix the ownership of these:
``` bash
chown root: /var/log/munge/munged.log
chown munge: /var/log/munge
```
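If the underlying cause is that the munge user was never synced to the node, re-syncing and restarting the service should also work (a sketch using the same commands as earlier in this guide; `compute-0-0` is an example host):
``` bash
rocks sync users
rocks run host compute-0-0 "chown munge: /etc/munge/munge.key" collate=on
rocks run host compute-0-0 "systemctl restart munge" collate=on
```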
#### MUNGE: permission denied
Munge did not start, journal shows permission denied
``` bash
# journalctl -xe
..
compute-0-0.local munged[13097]: munged: Error: Failed to check logfile "/var/log/munge/munged.log": Permission denied
or
compute-0-0.local munged[13257]: munged: Error: Pidfile is insecure: invalid ownership of "/run/munge"
..
# ls -al /var/log/munge/
drwx------ 2 888 888 4096 Nov 30 03:44 .
-rw-r----- 1 root root 0 Nov 30 03:44 munged.log
# ls -al /run/munge
drwxr-xr-x 2 888 888 40 Nov 27 16:33 .
# ls /var/lib/munge
drwx--x--x 2 888 888 4096 Nov 27 13:53 .
```
Here, directories and files are owned by a numeric ID instead of the user (munge), meaning that the system did not recognize the user or group. Probably the users were not synced at startup. Rebooting helps only with some of these directories (e.g. /run/munge).
``` bash
# #Find directories owned by 888
# find / -group 888
..
# chown munge: /var/lib/munge
```
#### MUNGE: Failed to check pidfile dir /var/run/munge
Munge cannot start and `journalctl -xe` shows
```
munged: Error: Failed to check pidfile dir "/var/run/munge": cannot canonicalize
"/var/run/munge": No such file or directory
Failed to start MUNGE authentication service.
```
Not sure what the cause was, but creating that directory fixed it, also after a reboot.
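A sketch of the manual fix (ownership and mode are assumptions based on the other munge directories above):
``` bash
mkdir -p /var/run/munge
chown munge: /var/run/munge
chmod 0755 /var/run/munge
systemctl restart munge
```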
#### SLURM: non-responsive nodes
Slurm troubleshooting guide: https://slurm.schedmd.com/troubleshoot.html#nodes
`squeue` is still empty, even after distributing munge. In our case, this showed only nodes with status down:
`sinfo -a`
The slurmd log file location is set in `slurm.conf`; on our nodes it is `/var/log/slurm/slurmd.log`.
..
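Standard Slurm commands for inspecting down nodes and returning them to service once the cause is fixed (`compute-0-0` is just an example):
``` bash
sinfo -R                         # list down/drained nodes with the recorded reason
scontrol show node compute-0-0   # details for a single node
scontrol update NodeName=compute-0-0 State=RESUME
```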
#### SLURM: permission denied
Launching a job results in permission denied:
``` bash
sbatch -vv /root/slurm_tools/job.sh
sbatch: error: Batch job submission failed: Access/permission denied
srun -vvv hostname
srun: error: Unable to allocate resources: Access/permission denied
salloc -vvv --ntasks=8 --time=10 bash
salloc: error: Job submit/allocate failed: Access/permission denied
```
```
grep LogFile /etc/slurm/slurm.conf
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
less +G /var/log/slurm/slurmctld.log
grep `date +"%Y-%m-%dT%H:"` /var/log/slurm/slurmctld.log
_slurm_rpc_submit_batch_job: Access/permission denied #sbatch
_slurm_rpc_allocate_resources: Access/permission denied #srun
_slurm_rpc_allocate_resources: Access/permission denied #salloc
grep `date +"%Y-%m-%dT%H:"` /var/log/slurm/slurmd.log
#not much to see here
```
NB from the [Slurm Quick Start Administrator Guide][slurmquickstart]: **The parent directories for Slurm's log files, process ID files, state save directories, etc. are not created by Slurm. They must be created and made writable by SlurmUser as needed prior to starting Slurm daemons.**
```
grep StateSaveLocation /etc/slurm/slurm.conf
/var/spool/ #bigfacet: root
/var/spool/slurmd #bigfacet: folder is slurm, files mixed
/var/run #bigfacet: root
```
I tried this:
```
chown slurm: /var/spool/slurmd
chown slurm: /var/spool/slurm.state
chown slurm: /var/run/slurmctld.pid /var/run/slurmd.pid
systemctl restart slurmdbd
systemctl restart slurmd
systemctl restart slurmctld
```
This directory was not accessible to the slurm user:
```
su - slurm
[slurm] less /var/log/slurm/slurmctld.log
/var/log/slurm/slurmctld.log: Permission denied
exit
chown -R slurm: /var/log/slurm
chown slurm: /usr/lib64/slurm
chown slurm:slurm /var/spool
..
```
https://github.com/Azure/azure-quickstart-templates/issues/1796
Notice that slurmctld is started by the root user; it should probably run as the slurm user:
```
ps -aux | grep slurm
```
slurmd should run as root and slurmctld as slurm (see the [Slurm Quick Start Administrator Guide][slurmquickstart]).
In `/etc/slurm/slurm.conf` there should be `SlurmdUser=root` and `SlurmUser=<any user>`.
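A quick way to check both the configuration and which user each daemon actually runs as (standard commands, nothing cluster-specific):
``` bash
grep -Ei "^slurm(d)?user" /etc/slurm/slurm.conf
ps -eo user,comm | grep -E "slurmctld|slurmd"
```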
A similar problem could be due to the setting below, which says that only egil can submit jobs, so e.g. [jobs launched from within other jobs](https://sourceforge.net/p/slurm-roll/discussion/general/thread/087ecb6e4c/?limit=25) are rejected:
```
grep AllocNodes /etc/slurm/slurm.conf
PartitionName=DEFAULT AllocNodes=egil,egil State=UP
```
---
#### Some slurm commands
``` bash
slurmd -C
NodeName=egil CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64236
UpTime=2-23:08:49
slurmctld -Dcvvv
--partition debug
```
``` bash
salloc --ntasks   # typically for an interactive shell
srun
sattach
```
``` bash
# Create allocation then launch. 2 tasks, then on whole node
srun --ntasks=2 --label hostname #permission denied
srun --nnodes=2 hostname #permission denied
# Create allocation for tasks
salloc --ntasks=8 --time=10 bash #permission denied
> hostname
> env | grep SLURM
> exit
```
``` bash
sinfo -N
scontrol show partition
```
---
\[1]: [Rocks Guide][guide]
\[2]: [Slurm Roll for Rocks Cluster][slurmroll]
\[3]: [Slurm Quick Start Administrator Guide][slurmquickstart]
\[4]: [Munge installation guide][munge]
\[5]: [Slurm Installation][slurminstallation]
[guide]: http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/ "Rocks 7 Basic User Guide"
[slurmroll]: http://129.59.141.57/roll-documentation/slurm/7.0/slurm-roll.pdf "Slurm Roll for Rocks Cluster"
[slurmquickstart]: https://slurm.schedmd.com/quickstart_admin.html "Slurm Quick Start Administrator Guide"
[munge]: https://github.com/dun/munge/wiki/Installation-Guide "Munge installation guide"
[slurminstallation]: https://wiki.fysik.dtu.dk/niflheim/Slurm_installation "Slurm Installation"