###### tags: `env`
{%hackmd Eg_dogKwTYGxkdzZ3O5Gkg %}
# Slurm Installation
## slurm 23.02.2
### step1: download & install
This step is extremely easy: go to [NI SP](https://www.ni-sp.com/slurm-build-script-and-container-commercial-support/), download the [installation script](http://www.ni-sp.com/wp-content/uploads/2019/10/SLURM_Ubuntu_installation.sh), and just run it.
- For the 23.02.2 version, you should either
    - `export VER=23.02.2` before running the script, or
    - add `VER=23.02.2` at the top of the script (see the sketch below)
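
A minimal sketch of the whole step, assuming the script is saved under the name used in the download link above (`SLURM_Ubuntu_installation.sh`):
```bash
# Fetch the build/installation script from NI SP
wget http://www.ni-sp.com/wp-content/uploads/2019/10/SLURM_Ubuntu_installation.sh

# Pin the Slurm version the script should build
export VER=23.02.2

# Run it as root; -E keeps VER in the environment
# (alternatively, edit VER=23.02.2 at the top of the script instead of exporting it)
sudo -E bash SLURM_Ubuntu_installation.sh
```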
---
### step2: modify settings
After it finishes, modify the config file `/etc/slurm/slurm.conf`. The following is an example:
```
SlurmctldHost=srv109(192.168.0.109)
MpiDefault=none
AuthType=auth/munge
ProctrackType=proctrack/cgroup
ReturnToService=2
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurm/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm/
SwitchType=switch/none
TaskPlugin=task/affinity
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/none
ClusterName=compute
JobAcctGatherType=jobacct_gather/none
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdLogFile=/var/log/slurmd.log
NodeName=srv109 NodeAddr=192.168.0.109 State=UNKNOWN CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=24 ThreadsPerCore=1 RealMemory=64081
NodeName=srv102 NodeAddr=192.168.0.102 State=UNKNOWN CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=12 ThreadsPerCore=1 RealMemory=31385
PartitionName=compute Nodes=srv109,srv102 Default=YES MaxTime=INFINITE State=UP
```
- Note that if you do not specify the hardware layout on the `NodeName` line (CPUs, Boards, SocketsPerBoard, CoresPerSocket, ThreadsPerCore, etc.), the default setting will only utilize one CPU per node.
- You can use the command `slurmd -C` to view the hardware specification automatically detected by slurmd. Taking srv109 as an example:

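The output below is illustrative, assuming the srv109 hardware described in the config above; your values will differ:
```bash
slurmd -C
# Expected shape of the output (values here mirror the NodeName line for srv109):
# NodeName=srv109 CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=24 ThreadsPerCore=1 RealMemory=64081
# UpTime=...
```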
- If a node is designed with heterogeneous cores, taking srv109 as an example, it has an i7-13700 CPU with 8 P-cores and 8 E-cores. Each P-core corresponds to 2 threads, while each E-core corresponds to 1 thread. In this case, the equation socket * core * thread does not equal the total number of CPUs. Currently, there are two approaches:
    - Not using E-cores: only the P-cores are exposed to Slurm as processing units. In this case, the number of CPUs equals the number of P-cores, which is 8.
    - Treating all threads as individual cores: all threads, from both P-cores and E-cores, are treated as separate cores. The total number of CPUs then equals the number of threads: P-cores (8) × threads per P-core (2) + E-cores (8) × threads per E-core (1) = 24 CPUs. This is the approach used in the `slurm.conf` example above.
Please note that the term "CPU" can be used to refer to either physical cores or logical threads depending on the context.
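
To make the two approaches concrete, here are illustrative `NodeName` lines for srv109; only the second one matches the `slurm.conf` example above, the first is an assumption about how a P-core-only layout could be declared:
```
# Approach 1: only the 8 P-cores are exposed to Slurm (illustrative)
NodeName=srv109 NodeAddr=192.168.0.109 CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=1 State=UNKNOWN

# Approach 2: every hardware thread is treated as a core (matches the config above)
NodeName=srv109 NodeAddr=192.168.0.109 CPUs=24 Boards=1 SocketsPerBoard=1 CoresPerSocket=24 ThreadsPerCore=1 State=UNKNOWN
```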
---
### step3: auth/munge
You need to have a key that can be used by all nodes, both slurmctld and slurmd.
- If you followed the previous installation steps, pick the existing `/etc/munge/munge.key` from one of the nodes,
- or generate a new one:
```bash
sudo rm /etc/munge/munge.key
sudo /usr/sbin/mungekey
```
It will generate a key `/etc/munge/munge.key`.
- Copy this key to `/etc/munge/` on all nodes (a copy sketch follows the block below), and set its ownership and permissions:
```bash
sudo chmod 400 /etc/munge/munge.key
sudo chown munge: /etc/munge/munge.key
```
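
One possible way to distribute the key from srv109 to srv102, assuming SSH access between the nodes and passwordless sudo on srv102 (adapt to however you normally move files between them):
```bash
# Read the key with root privileges and stream it to srv102 over SSH,
# then fix ownership and permissions on the receiving side
sudo cat /etc/munge/munge.key | ssh srv102 "sudo tee /etc/munge/munge.key > /dev/null"
ssh srv102 "sudo chown munge: /etc/munge/munge.key && sudo chmod 400 /etc/munge/munge.key"
```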
### step4: restart munge & slurm services
You should restart these two services on every node; munge must be restarted before the Slurm daemons (slurmctld on the controller, slurmd on each compute node).
```bash
sudo systemctl restart munge       # on every node, first
sudo systemctl restart slurmctld   # on the controller (srv109)
sudo systemctl restart slurmd      # on every node running slurmd
```
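
Afterwards, a quick way to confirm the cluster is healthy (both are standard Slurm commands):
```bash
# Nodes should show up as "idle" in the compute partition
sinfo

# Per-node details, including the detected CPU/memory layout
scontrol show node srv109
```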
## Prescription - no guarantee of effectiveness
- If a node shows `STATE=drain`
- wake it up with:
```bash
sudo scontrol update nodename=YOUR_NODE_NAME state=resume
```
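
Before resuming, it can also help to check why the node was drained; the recorded reason shows up in the `Reason=` field of the node info:
```bash
# Show the drain reason recorded by slurmctld (replace with your node name)
scontrol show node YOUR_NODE_NAME | grep -i reason
```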