# EGIL

    cat /etc/centos-release
    CentOS Linux release 7.4.1708 (Core)

---

## Operations on compute nodes

#### Run commands on nodes

    rocks run host compute-0-1 "command"
    rocks run host compute "cp /share/apps/.bashrc /root/" collate=on

#### Sharing files

See chapter 5.3 in the [Rocks guide][guide]. The files in `/share/apps` are shared among all nodes in the same location.

    cd /share/apps

### Create global user accounts

The *munge* and *slurm* users were created by the script `initialize_users.sh`:

``` bash
# ./initialize_users.sh
```

Thereafter, the users were synchronized across all compute nodes using

``` bash
# rocks sync users
```

<details>
<summary>initialize_users.sh</summary>

    export MUNGEUSER=1005
    groupadd -g $MUNGEUSER munge
    useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge
    export SlurmUSER=1004
    groupadd -g $SlurmUSER slurm
    useradd -m -c "Slurm workload manager" -d /var/lib/slurm -u $SlurmUSER -g slurm -s /bin/bash slurm

</details>

---

## Adding Packages to Compute Nodes

#### yum, packages, rpms

Check installed or available packages:

    yum list installed | grep <package>
    yum --enablerepo epel list *<package>*

Download rpms to install a package:

    yumdownloader --resolve --destdir=. --enablerepo=epel <package>

This can be used to create a new distribution (see below).

#### Create and install a new Rocks distribution

See chapter 5.1 in the [Rocks guide][guide]. In short, place packages here:

    # Package location. Use yumdownloader as shown above
    cd /export/rocks/install/contrib/7.0/x86_64/RPMS

    # Extend the XML configuration file
    cd /export/rocks/install/site-profiles/7.0/nodes
    cp skeleton.xml extend-compute.xml
    vi extend-compute.xml

    # Create a distribution
    cd /export/rocks/install
    rocks create distro

Then reinstall the nodes.

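
Before running `rocks create distro`, it can help to check that every package named in `extend-compute.xml` has an RPM file in the contrib directory. The following is a minimal sketch, not part of Rocks: the function names `list_xml_packages` and `missing_rpms` are hypothetical, and it assumes each package is declared on its own line as `<package>name</package>`.

```bash
#!/usr/bin/env bash
# Sketch only: these helper names are hypothetical and not part of Rocks.

# Print the package names declared in a node XML file.
# Assumes one <package>name</package> declaration per line; lines whose
# <package> element holds only a comment are skipped.
list_xml_packages() {
    sed -n 's:.*<package>[[:space:]]*\([^<[:space:]][^<[:space:]]*\)[[:space:]]*</package>.*:\1:p' "$1"
}

# Print every declared package that has no "<name>-*.rpm" file in the
# given directory. The glob is only an approximation: "munge-*.rpm"
# also matches "munge-libs-...rpm".
missing_rpms() {
    for pkg in $(list_xml_packages "$1"); do
        ls "$2/$pkg"-*.rpm > /dev/null 2>&1 || echo "$pkg"
    done
}
```

With the paths used above, a call could look like this; empty output means every declared package has at least one matching RPM file:

    missing_rpms /export/rocks/install/site-profiles/7.0/nodes/extend-compute.xml /export/rocks/install/contrib/7.0/x86_64/RPMS
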
Dependencies: according to [this thread](https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2013-February/061458.html), yumdownloader takes care of dependencies, and anaconda "will install all the dependencies that it can find in the local repo (the one created when you run "rocks create distro" in the /export/rocks/install directory)."

2020-11-18: Unfortunately this does not work. I do not see my packages after running `rocks create distro`, nor on the compute nodes.

#### Reinstall nodes

    # Check install action on nodes
    rocks list host

    # Force install action to one node (use % for all nodes)
    rocks set host boot compute-0-0 action=install

    # Reboot a single node
    rocks run host compute-0-0 "reboot"

## Python 3.6

Install the opt-Python roll, confirm using `module avail`. According to [this post][postpython], CentOS is based on Python 2.7, not Python 3.x. Rocks users can add rpms ([see above](#adding-packages-to-compute-nodes)), set up environment variables, add configuration to e.g. `/etc/.profile`, etc. **TODO: confirm this is fine**

The executable for Python 3.6 is `/opt/python/bin/python3.6`, but it won't work as is (`error while loading shared libraries: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory`).

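
The missing library lives in `/opt/python/lib`, so the loader has to be pointed there. One way to do that without changing the environment of the calling shell is a small wrapper function. This is a sketch under the assumption that `/opt/python/bin` and `/opt/python/lib` are the only directories needed; `run_with_opt_python` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: run_with_opt_python is a hypothetical helper name.
# Runs a command with the opt-Python directories added to PATH and
# LD_LIBRARY_PATH for that one process; the calling shell is unchanged.
run_with_opt_python() {
    env PATH="/opt/python/bin:$PATH" \
        LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/python/lib" \
        "$@"
}
```

For example, `run_with_opt_python python3.6 --version` would start the interpreter with the extra library path, while later commands in the same shell keep the original environment.
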
We created two aliases in a user's `.bashrc` to run the `python3` and `python3.6` commands:

    alias python3='PATH=/opt/python/bin:$PATH ; LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib /opt/python/bin/python3.6'
    alias python3.6='PATH=/opt/python/bin:$PATH ; LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib /opt/python/bin/python3.6'

Note that the following two methods load the libraries for python3.6 but break other packages such as yum:

    module load opt-python

or

    export PATH="/opt/python/bin:$PATH"
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib

This hack fixes the broken packages:

    # Fix broken packages in .bashrc
    alias yum='LD_LIBRARY_PATH=/usr/lib64 yum'
    alias yumdownloader='LD_LIBRARY_PATH=/usr/lib64 yumdownloader'

---

## MUNGE authentication service

The next step is to install MUNGE. MUNGE can be found in the EPEL repository, which can be activated by

``` bash
# yum install epel-release
```

Then install the MUNGE RPM packages

``` bash
# yum install munge munge-libs munge-devel
```

PS: Sometimes this does not work because EPEL is not enabled. If you execute

``` bash
# yum repolist
```

and EPEL does not appear as one of the repositories, it is likely that you will need to enable it manually. However, as long as it is installed, it should appear in `# yum repolist all`. To enable it, execute

``` bash
# yum-config-manager --enable epel
```

When MUNGE is successfully installed, create a secret key and let compute nodes access it:

``` bash
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
chown munge: /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
```

All this is covered in `distribute_munge.sh`.

<details>
<summary>distribute_munge.sh</summary>

    # This script was made to distribute the munge key across all compute nodes.
    # Then, correct ownership has to be set
    # Author: Even Marius Nordhagen, evenmn@fys.uio.no

    NNODES=34

    # create directories
    rocks run host "mkdir /etc/munge/"
    rocks run host "mkdir /var/log/munge/"

    # install and enable EPEL
    rocks run host "yum install -y epel-release"
    rocks run host "yum install -y yum-utils"
    rocks run host "yum-config-manager --enable epel"

    # install MUNGE
    rocks run host "yum install -y munge munge-libs munge-devel"

    # generate MUNGE key and distribute it
    dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key
    chown munge: /etc/munge/munge.key
    chmod 400 /etc/munge/munge.key

    NODE=0
    while [ $NODE -lt $NNODES ]; do
        scp -p /etc/munge/munge.key compute-0-$NODE:/etc/munge/munge.key
        let NODE++
    done

    # give MUNGE directories correct ownerships
    rocks run host "chown -R munge: /etc/munge/ /var/log/munge/"
    rocks run host "chmod 0700 /etc/munge/ /var/log/munge/"

    # enable and start MUNGE
    rocks run host "systemctl enable munge"
    rocks run host "systemctl start munge"

</details>
<br>

Commands ([installation guide][munge]):

<pre>
munge -n                       #test installation
munged                         #aborts on each warning
munged --force                 #starts but skips most warnings. use to find problems
/usr/sbin/munged --help        #See default locations
systemctl enable munge
systemctl status --full munge
journalctl -xe | grep munged   #log
</pre>

---

## Summary: Installation procedure

1. Install rocks as described in chapter 3 of the [guide][guide]. Add all rolls, in particular the opt-Python one.
2. Share `.bash_profile` and `.bashrc` to be able to run python3 from the shell

<details>
<summary>.bash_profile</summary>

    # .bash_profile

    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
        . ~/.bashrc
    fi

    # User specific environment and startup programs
    PATH=$PATH:$HOME/bin
    export PATH

</details>

<details>
<summary>.bashrc</summary>

    # .bashrc

    # aliases to enable python 3
    alias python3='PATH=/opt/python/bin:$PATH ; LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib /opt/python/bin/python3.6'
    alias python3.6='PATH=/opt/python/bin:$PATH ; LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/python/lib /opt/python/bin/python3.6'

</details>

3. Go through the [Slurm Quick Start Administrator Guide][slurmquickstart]

   Alessandro 20201123: I tried to install slurm individually on compute-0-0, (1) from the roll and (2) from the tar.bz2, but both attempts fail: (1) with `OperationalError UPDATE command denied`; (2) with python3 not found.

   3.1. Check that the date and time are the same on all nodes. This was not needed in our second installation. See [this](https://knowm.org/how-to-synchronize-time-across-a-linux-cluster/)

   3.2. Install munge (TODO). Propagate the same `/etc/munge/munge.key` on all clusters

   3.3. ....

4. ....

Even Marius' installation guide: https://hackmd.io/@e1KzQ7BfSgeJpuZiliRGvQ/rkjApUz5v

Slurm installation: build 411, see appendix C somewhere, or maybe the [quick installation guide][slurmquickstart]. But can we use the slurm roll?
-> `rocks list roll`

See slurm tools: Even Marius' guide in egil `~/slurm_tools`

    yum install -y rsync
    # copy files
    yum install epel-release

\[1]: [Rocks Guide][guide]

\[2]: [Post "Python 3.x"][postpython]

\[3]: [Post "Default packages"][postpackages]

\[4]: [Slurm Quick Start Administrator Guide][slurmquickstart]

\[5]: [Munge installation guide][munge]

\[6]: [Slurm Roll for Rocks Cluster][slurmroll]

[guide]: http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/ "Rocks 7 Basic User Guide"
[postpython]: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2018-July/071999.html "Rocks-discuss - Python 2.7 and 3.x"
[postpackages]: https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2012-December/060671.html "Rocks-discuss - Default Packages"
[slurmquickstart]: https://slurm.schedmd.com/quickstart_admin.html "Slurm Quick Start Administrator Guide"
[munge]: https://github.com/dun/munge/wiki/Installation-Guide "Munge installation guide"
[slurmroll]: http://129.59.141.57/roll-documentation/slurm/7.0/slurm-roll.pdf "Slurm Roll for Rocks Cluster"