# EGIL
cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)
---
## Operations on compute nodes
#### Run commands on nodes
rocks run host compute-0-1 "command"
rocks run host compute "cp /share/apps/.bashrc /root/" collate=on
#### Sharing files
See chapter 5.3 in the [Rocks guide][guide]. Files in `/share/apps` are available at the same path on all nodes.
cd /share/apps
### Create global user accounts
The *munge* and *slurm* users were created by the script `initialize_users.sh`:
``` bash
# ./initialize_users.sh
```
Thereafter, the users were synchronized across all compute nodes using
``` bash
# rocks sync users
```
<details>
<summary>initialize_users.sh</summary>
export MUNGEUSER=1005
groupadd -g $MUNGEUSER munge
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
export SlurmUSER=1004
groupadd -g $SlurmUSER slurm
useradd -m -c "Slurm workload manager" -d /var/lib/slurm -u $SlurmUSER -g slurm -s /bin/bash slurm
</details>
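A minimal sketch for checking that an account ended up with the intended UID, using the values from `initialize_users.sh` (1005 for munge, 1004 for slurm). The function name `check_uid` is hypothetical and not part of Rocks; it only wraps the standard `id -u` lookup. The commented `rocks run host` call reuses the syntax shown earlier and is an assumption about how one might run the same lookup on the nodes.

```bash
# check_uid USER EXPECTED_UID
# Prints a one-line report and returns non-zero on a mismatch or a missing user.
check_uid() {
    user=$1
    expected=$2
    if ! actual=$(id -u "$user" 2>/dev/null); then
        echo "$user: no such user"
        return 1
    fi
    if [ "$actual" != "$expected" ]; then
        echo "$user: uid $actual, expected $expected"
        return 1
    fi
    echo "$user: ok (uid $actual)"
}

# Example on the frontend (UIDs taken from initialize_users.sh):
#   check_uid munge 1005
#   check_uid slurm 1004
# The same lookup could be run on the nodes with:
#   rocks run host compute "id -u munge; id -u slurm" collate=on
```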
---
## Adding Packages to Compute Nodes
#### yum, packages, rpms
Check installed or available packages:
yum list installed | grep <package>
yum --enablerepo epel list *<package>*
Download rpms to install a package:
yumdownloader --resolve --destdir=. --enablerepo=epel <package>
This can be used to create a new distribution (see below)
#### Create and install a new Rocks distribution
See chapter 5.1 in the [Rocks guide][guide]. In short, place the packages in the contrib directory, extend the XML configuration file, and create the distribution:
# Package location. Use yumdownloader as shown above
cd /export/rocks/install/contrib/7.0/x86_64/RPMS
# Extend the XML configuration file
cd /export/rocks/install/site-profiles/7.0/nodes
cp skeleton.xml extend-compute.xml
vi extend-compute.xml
# Create a distribution
cd /export/rocks/install
rocks create distro
Then reinstall the nodes.
Dependencies: according to [this thread](https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2013-February/061458.html), yumdownloader takes care of dependencies, and anaconda "will install all the dependencies that it can find in the local repo (the one created when you run "rocks create distro" in the /export/rocks/install directory)."
2020-11-18: Unfortunately, this does not work. The packages show up neither when running `rocks create distro` nor on the compute nodes.
#### Reinstall nodes
#Check install action on nodes
rocks list host
#Force install action to one node (use % for all nodes)
rocks set host boot compute-0-0 action=install
#Reboot a single node
rocks run host compute-0-0 "reboot"
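When several nodes need reinstalling, the two commands above can be generated per node and reviewed before anything is executed. This is only a sketch: `reinstall_commands` is a hypothetical helper that prints the `rocks` commands from this section and runs nothing itself.

```bash
# reinstall_commands NODE...
# Prints, for every node given, the command that sets the install action
# followed by the command that reboots the node. Nothing is executed.
reinstall_commands() {
    for node in "$@"; do
        printf 'rocks set host boot %s action=install\n' "$node"
        printf 'rocks run host %s "reboot"\n' "$node"
    done
}

# Example: inspect the commands for two nodes
#   reinstall_commands compute-0-0 compute-0-1
```

After checking the output, it can be piped to `sh` on the frontend to run the commands for real.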
## Python 3.6
Install the opt-Python roll and confirm with `module avail`. According to [this post][postpython], CentOS 7 depends on Python 2.7 for its system tools, not Python 3.x. Rocks users can add RPMs ([see above](#adding-packages-to-compute-nodes)), set up environment variables, add configuration to e.g. `/etc/.profile`, etc.
**TODO: confirm this is fine**
The executable for Python 3.6 is in `/opt/python/bin/python3.6` but it won't work as is (`error while loading shared libraries: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory`).
We created two aliases in a user's .bashrc to run the `python3` and `python3.6` commands:
alias python3='PATH=/opt/python/bin:$PATH ; LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib /opt/python/bin/python3.6'
alias python3.6='PATH=/opt/python/bin:$PATH ; LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib /opt/python/bin/python3.6'
Note that the following two methods also load the libraries for Python 3.6, but they break other tools such as yum:
module load opt-python
or
export PATH="/opt/python/bin:$PATH"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib
The following hack works around the broken tools:
#Fix broken packages in .bashrc
alias yum='LD_LIBRARY_PATH=/usr/lib64 yum'
alias yumdownloader='LD_LIBRARY_PATH=/usr/lib64 yumdownloader'
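A possible alternative to the aliases, sketched under the assumption that scoping the variables to a single command is enough to leave yum untouched (not verified on this cluster). The function name `with_opt_python` is hypothetical; it sets `PATH` and `LD_LIBRARY_PATH` only for the command it runs, so the shell's own environment stays unchanged.

```bash
# with_opt_python COMMAND [ARGS...]
# Runs COMMAND with the opt-Python directories added to PATH and
# LD_LIBRARY_PATH for that one command only.
with_opt_python() {
    PATH="/opt/python/bin:$PATH" \
    LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/python/lib" \
    "$@"
}

# Example:
#   with_opt_python python3.6 --version
```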
---
## MUNGE authentication service
The next step is to install MUNGE. MUNGE can be found in the EPEL repository, which can be activated by
``` bash
# yum install epel-release
```
Then install the MUNGE RPM packages:
``` bash
# yum install munge munge-libs munge-devel
```
PS: Sometimes this does not work because EPEL is not enabled. If executing the command
``` bash
# yum repolist
```
and EPEL does not appear as one of the repositories, you will likely need to enable it manually. As long as it is installed, however, it should appear in the output of `yum repolist all`. To enable it, execute
``` bash
# yum-config-manager --enable epel
```
When MUNGE is successfully installed, create a secret key and let compute nodes access it:
``` bash
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
```
All this is covered in `distribute_munge.sh`.
<details>
<summary>distribute_munge.sh</summary>
# This script installs MUNGE on all hosts, distributes the munge key to all compute nodes, and then sets the correct ownership
# Author: Even Marius Nordhagen, evenmn@fys.uio.no
NNODES=34
# create directories
rocks run host "mkdir -p /etc/munge/"
rocks run host "mkdir -p /var/log/munge/"
# install and enable EPEL
rocks run host "yum install -y epel-release"
rocks run host "yum install -y yum-utils"
rocks run host "yum-config-manager --enable epel"
# install MUNGE
rocks run host "yum install -y munge munge-libs munge-devel"
# generate MUNGE key and distribute it
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
NODE=0
while [ "$NODE" -lt "$NNODES" ]; do
scp -p /etc/munge/munge.key "compute-0-$NODE:/etc/munge/munge.key"
NODE=$((NODE + 1))
done
# give the MUNGE directories the correct ownership and permissions
rocks run host "chown -R munge: /etc/munge/ /var/log/munge/"
rocks run host "chmod 0700 /etc/munge/ /var/log/munge/"
# enable and start MUNGE
rocks run host "systemctl enable munge"
rocks run host "systemctl start munge"
</details>
<br>
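The copy loop in `distribute_munge.sh` relies on a hard-coded node count and the `compute-0-N` naming scheme. A small sketch of the same idea with the name generation factored out, so the list can be inspected before copying anything. `compute_node_names` is a hypothetical helper; the count of 34 and the naming scheme are taken from the script above.

```bash
# compute_node_names COUNT
# Prints compute-0-0 ... compute-0-(COUNT-1), one name per line.
compute_node_names() {
    count=$1
    node=0
    while [ "$node" -lt "$count" ]; do
        echo "compute-0-$node"
        node=$((node + 1))
    done
}

# Example: copy the key and report the nodes where the copy failed
#   for host in $(compute_node_names 34); do
#       scp -p /etc/munge/munge.key "$host:/etc/munge/munge.key" \
#           || echo "copy failed: $host" >&2
#   done
```

Reporting failures per host makes it easier to spot nodes that were down during distribution.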
Commands (see the [installation guide][munge]):
<pre>
munge -n #test installation
munged #aborts on each warning
munged --force #starts but skips most warnings. use to find problems
/usr/sbin/munged --help #See default locations
systemctl enable munge
systemctl status --full munge
journalctl -xe | grep munged #log
</pre>
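If `munged` refuses to start, one thing worth ruling out is the key file itself. The sketch below checks that the key exists and has the mode 400 set earlier in this section. `check_key_file` is a hypothetical helper, and treating a wrong mode as the cause of a `munged` warning is an assumption rather than something confirmed here. It uses GNU `stat`, as shipped with CentOS.

```bash
# check_key_file PATH
# Reports whether PATH exists and has mode 400; returns non-zero otherwise.
check_key_file() {
    key=$1
    if [ ! -f "$key" ]; then
        echo "$key: missing"
        return 1
    fi
    mode=$(stat -c '%a' "$key")            # GNU stat: permissions in octal
    size=$(wc -c < "$key" | tr -d ' ')     # key size in bytes
    if [ "$mode" != "400" ]; then
        echo "$key: mode $mode, expected 400"
        return 1
    fi
    echo "$key: ok (mode $mode, $size bytes)"
}

# Example:
#   check_key_file /etc/munge/munge.key
```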
---
## Summary: Installation procedure
1. Install rocks as described in chapter 3 of the [guide][guide]. Add all rolls, in particular the opt-Python one.
2. Share `.bash_profile` and `.bashrc` so that python3 can be run from the shell.
<details>
<summary>.bash_profile</summary>
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
</details>
<details>
<summary>.bashrc</summary>
# .bashrc
# aliases to enable python 3
alias python3='PATH=/opt/python/bin:$PATH ; LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib /opt/python/bin/python3.6'
alias python3.6='PATH=/opt/python/bin:$PATH ; LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib /opt/python/bin/python3.6'
</details>
3. Go through the [Slurm Quick Start Administrator Guide][slurmquickstart]
Alessandro 20201123: I tried to install Slurm on compute-0-0 alone, 1) from the roll and 2) from the tar.bz2 archive, but both attempts failed: 1) with `OperationalError UPDATE command denied`, and 2) with python3 not found.
3.1. Check that the date and time are the same on all nodes. This was not needed in our second installation. See [this guide](https://knowm.org/how-to-synchronize-time-across-a-linux-cluster/).
3.2. Install munge (TODO). Propagate the same `/etc/munge/munge.key` to all nodes.
3.3. ....
4. ....
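For step 3.1, the clock check can be reduced to comparing epoch timestamps collected from the nodes. The sketch below computes the largest difference from lines of the form `<host> <epoch-seconds>`. `max_clock_skew` is a hypothetical helper, and the assumption that `rocks run host ... collate=on` prefixes each output line with the host name should be checked against the actual output.

```bash
# max_clock_skew
# Reads lines "<host> <epoch-seconds>" on stdin and prints the difference
# between the largest and the smallest timestamp, in seconds.
max_clock_skew() {
    awk '
        NR == 1 { min = $2; max = $2 }
        { if ($2 < min) min = $2; if ($2 > max) max = $2 }
        END { if (NR > 0) print max - min; else print 0 }
    '
}

# Example (assumes output lines look like "compute-0-0: 1606123456"):
#   rocks run host compute "date +%s" collate=on | max_clock_skew
```

The commands run one after another on the nodes, so a skew of a second or two may only reflect the collection time rather than a real clock difference.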
Even Marius' Installation guide: https://hackmd.io/@e1KzQ7BfSgeJpuZiliRGvQ/rkjApUz5v
Slurm installation:
build 411, see appendix C somewhere
or maybe the [quick installation guide][slurmquickstart]
but can we use the Slurm roll? -> rocks list roll
see slurm tools
Even Marius' guide in egil ~/slurm_tools
yum install -y rsync
copy files
yum install epel-release
\[1]: [Rocks Guide][guide]
\[2]: [Post "Python 3.x"][postpython]
\[3]: [Post "Default packages"][postpackages]
\[4]: [Slurm Quick Start Administrator Guide][slurmquickstart]
\[5]: [Munge installation guide][munge]
\[6]: [Slurm Roll for Rocks Cluster][slurmroll]
[guide]: http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/ "Rocks 7 Basic User Guide"
[postpython]: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2018-July/071999.html "Rocks-discuss - Python 2.7 and 3.x"
[postpackages]: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2012-December/060671.html "Rocks-discuss - Default Packages"
[slurmquickstart]: https://slurm.schedmd.com/quickstart_admin.html "Slurm Quick Start Administrator Guide"
[munge]: https://github.com/dun/munge/wiki/Installation-Guide "Munge installation guide"
[slurmroll]: http://129.59.141.57/roll-documentation/slurm/7.0/slurm-roll.pdf "Slurm Roll for Rocks Cluster"