Slurm / Commands / scontrol, etc.
===
###### tags: `Slurm`, `HPC`, `Linux`, `Slinky`, `SlinkyProject`, `scontrol`, `slurm.conf`
<br>
[TOC]
<br>
## `scontrol show node`
### Usage
```bash
# Show detailed information for all nodes
scontrol show node
# Show detailed information for a specific node
scontrol show node <nodename>
```
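To pull out just a few fields, the one-line output format (`-o` / `--oneliner`) is convenient to combine with `grep`; a minimal sketch using the `bear-0` node from the example below:
```bash
# Print each node on one line, then extract selected CPU/memory fields
scontrol -o show node bear-0 | grep -oE '(CPUTot|CPUAlloc|RealMemory|AllocMem)=[^ ]+'
```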
<br>
### Interpreting the node information (CPU & RAM)
```
scontrol show node bear-0
NodeName=bear-0 Arch=x86_64 CoresPerSocket=22
CPUAlloc=0 CPUEfctv=86 CPUTot=88 CPULoad=0.84
AvailableFeatures=bear
ActiveFeatures=bear
Gres=(null)
NodeAddr=192.168.0.156 NodeHostName=bear-0 Version=25.05.3
OS=Linux 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025
RealMemory=257805 AllocMem=0 FreeMem=2000 Sockets=2 Boards=1
CoreSpecCount=1 CPUSpecList=86-87 MemSpecLimit=256731
State=IDLE+DYNAMIC_NORM ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=all,book
BootTime=2025-08-14T07:57:32 SlurmdStartTime=2025-11-17T03:21:01
LastBusyTime=2025-11-14T11:00:58 ResumeAfterTime=None
CfgTRES=cpu=86,mem=257805M,billing=86
AllocTRES=
CurrentWatts=0 AveWatts=0
Comment={"namespace":"slurm","podName":"slurm-worker-bear-0"}
```
- ### Claude Sonnet-4.5 (2025/11/21)
### CPU and memory information for Slurm node `bear-0`
#### **CPU fields**
| Field | Value | Description |
|------|------|------|
| **CPUTot** | 88 | Total CPUs on the node (2 sockets × 22 cores × 2 threads = 88) |
| **CPUEfctv** | 86 | **Effective CPUs available for scheduling**<br>= CPUTot - (CoreSpecCount × ThreadsPerCore)<br>= 88 - 2 = 86 |
| **CPUAlloc** | 0 | CPUs currently allocated to jobs |
| **CPULoad** | 0.84 | System load average reported by the node |
| **CoreSpecCount** | 1 | Cores reserved for system use (1 core = 2 threads)<br>(not used for Slurm jobs) |
| **CPUSpecList** | 86-87 | Specific CPU IDs reserved for system use (2 threads)<br>(not used for Slurm jobs) |
| **CoresPerSocket** | 22 | Cores per physical socket |
| **Sockets** | 2 | Number of physical CPU sockets |
| **ThreadsPerCore** | 2 | Hardware threads per core (Hyper-Threading enabled) |
#### **Memory fields**
| Field | Value | Description |
|------|------|------|
| **RealMemory** | 257805 MB | Total physical memory on the node (~252 GB) |
| **AllocMem** | 0 MB | Memory currently allocated to jobs |
| **FreeMem** | 2000 MB | Free memory currently reported by the OS (~2 GB)<br>⚠️ **Not the same concept as AllocMem** |
| **MemSpecLimit** | 256731 MB | **Memory reserved for system use**<br>This amount is withheld from job allocations |
#### **Trackable resources (TRES)**
```
CfgTRES=cpu=86,mem=257805M,billing=86
AllocTRES=
```
| Field | Description |
|------|------|
| **CfgTRES** | Configured trackable resources<br>• cpu=86: schedulable CPUs<br>• mem=257805M: schedulable memory<br>• billing=86: billing units (usually equal to cpu) |
| **AllocTRES** | Currently allocated resources (empty here; the node is IDLE) |
---
### **How the fields relate**
#### CPU arithmetic
```
CPUTot (88) = Sockets (2) × CoresPerSocket (22) × ThreadsPerCore (2)
CPUEfctv (86) = CPUTot (88) - number of CPUs in CPUSpecList (2)
              ↑ available for Slurm scheduling
CPUSpecList (86-87) = CoreSpecCount (1) × ThreadsPerCore (2) CPUs
              ↑ reserved for system use (slurmd, OS)
```
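As a cross-check, the same topology can be read from the OS side with `lscpu` (run on the node itself); a minimal sketch:
```bash
# OS view of the topology
lscpu | grep -E '^(CPU\(s\)|Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core):'
# Slurm's view of the same node
scontrol show node bear-0 | grep -E 'CPUTot|CoresPerSocket|ThreadsPerCore|Sockets'
```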
#### Memory arithmetic
```
RealMemory (257805 MB) = total physical memory on the node
Schedulable memory ≈ RealMemory - MemSpecLimit
(here MemSpecLimit is close to RealMemory, which looks like a misconfiguration)
FreeMem (2000 MB) = free memory currently reported by the OS
                  ≠ memory that can be allocated to jobs
```
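Since FreeMem comes from the OS rather than from Slurm's bookkeeping, it can be compared against `free` on the node; a rough sketch (whether a given Slurm version reports "free" or "available" memory may differ, so treat this as an approximate comparison):
```bash
# OS-reported free memory in MB (second line of `free -m`, "free" column)
free -m | awk 'NR==2 {print "OS free (MB):", $4}'
# Slurm's view of the same node
scontrol show node bear-0 | grep -oE '(RealMemory|AllocMem|FreeMem)=[0-9]+'
```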
---
### **Key observations**
1. **CPU configuration looks reasonable**
    - 2 threads (1 core) are reserved for the system, leaving 86 for jobs
2. **Memory configuration looks wrong** ⚠️
    - `MemSpecLimit=256731 MB` is close to `RealMemory=257805 MB`
    - Only about **1 GB** is left for jobs (257805 - 256731 = 1074 MB)
    - Check the `MemSpecLimit` setting in `slurm.conf`; see the sketch after this list
3. **Node state is healthy**
    - `State=IDLE+DYNAMIC_NORM`: idle and dynamically managed
    - `CPULoad=0.84`: very low load
    - `CPUAlloc=0`: no jobs running
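A minimal sketch of what a saner node definition in `slurm.conf` might look like, assuming roughly 2 GB should be reserved for the OS and slurmd (the `MemSpecLimit=2048` value is an assumption, not taken from this cluster; in a Slinky deployment the file is typically generated, so the change may need to be made in whatever produces it):
```bash
# Sketch only: values other than MemSpecLimit copied from the scontrol output above
NodeName=bear-0 Sockets=2 CoresPerSocket=22 ThreadsPerCore=2 RealMemory=257805 CoreSpecCount=1 MemSpecLimit=2048
# Reload the configuration (node-level changes may also require restarting slurmd)
scontrol reconfigure
```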
<br>
---
<br>
## `scontrol show config`
```
$ scontrol show config
Configuration data as of 2025-09-01T02:29:25
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations
AccountingStorageHost = slurm-accounting
AccountingStorageExternalHost = (null)
AccountingStorageParameters = (null)
AccountingStoragePort = 6819
AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu,gres/gpumem,gres/gpuutil
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreFlags = (null)
AcctGatherEnergyType = (null)
AcctGatherFilesystemType = (null)
AcctGatherInterconnectType = (null)
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = (null)
AllowSpecResourcesUsage = no
AuthAltTypes = auth/jwt
AuthAltParameters = jwt_key=/etc/slurm/jwt_hs256.key
AuthInfo = use_client_ids
AuthType = auth/slurm
BatchStartTimeout = 10 sec
BcastExclude = /lib,/usr/lib,/lib64,/usr/lib64
BcastParameters = (null)
BOOT_TIME = 2025-08-28T06:21:24
BurstBufferType = (null)
CertgenParameters = (null)
CertgenType = (null)
CertmgrParameters = (null)
CertmgrType = (null)
CliFilterPlugins = (null)
ClusterName = slurm
CommunicationParameters = block_null_hash
CompleteWait = 0 sec
CpuFreqDef = Unknown
CpuFreqGovernors = OnDemand,Performance,UserSpace
CredType = cred/slurm
DataParserParameters = (null)
DebugFlags = (null)
DefMemPerNode = UNLIMITED
DependencyParameters = (null)
DisableRootJobs = no
EioTimeout = 60
EnforcePartLimits = ALL
Epilog[0] = epilog-01-welcome.sh
EpilogMsgTime = 2000 usec
FairShareDampeningFactor = 1
FederationParameters = (null)
FirstJobId = 1
GresTypes = gpu
GpuFreqDef = (null)
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Different Ours=0x7bdbb0bb Slurmctld=0x6e3d799d
HashPlugin = hash/k12
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 0 sec
InteractiveStepOptions = --interactive --preserve-env --pty $SHELL
JobAcctGatherFrequency = task=30
JobAcctGatherType = jobacct_gather/cgroup
JobAcctGatherParams = (null)
JobCompHost = localhost
JobCompLoc = (null)
JobCompParams = (null)
JobCompPort = 0
JobCompType = (null)
JobCompUser = root
JobContainerType = (null)
JobDefaults = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = (null)
KillOnBadExit = 0
KillWait = 30 sec
LaunchParameters = enable_nss_slurm,use_interactive_step,ulimit_pam_adopt
Licenses = (null)
LogTimeFormat = iso8601_ms
MailDomain = (null)
MailProg = /bin/mail
MaxArraySize = 1001
MaxBatchRequeue = 5
MaxDBDMsgs = 24096
MaxJobCount = 10000
MaxJobId = 67043328
MaxMemPerNode = UNLIMITED
MaxNodeCount = 1024
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = (null)
MCSParameters = (null)
MessageTimeout = 10 sec
MinJobAge = 300 sec
MpiDefault = (null)
MpiParams = (null)
NEXT_JOB_ID = 999
NodeFeaturesPlugins = (null)
OverTimeLimit = 0 min
PluginDir = /usr/lib/x86_64-linux-gnu/slurm
PlugStackConfig = (null)
PreemptMode = OFF
PreemptParameters = (null)
PreemptType = (null)
PreemptExemptTime = 00:00:00
PrEpParameters = (null)
PrEpPlugins = prep/script
PriorityParameters = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife = 7-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = no
PriorityFlags =
PriorityMaxAge = 7-00:00:00
PriorityType = priority/multifactor
PriorityUsageResetPeriod = NONE
PriorityWeightAge = 0
PriorityWeightAssoc = 0
PriorityWeightFairShare = 0
PriorityWeightJobSize = 0
PriorityWeightPartition = 0
PriorityWeightQOS = 0
PriorityWeightTRES = (null)
PrivateData = none
ProctrackType = proctrack/cgroup
PrologEpilogTimeout = 65534
PrologFlags = Alloc,Contain
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = (null)
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeFailProgram = (null)
ResumeProgram = (null)
ResumeRate = 300 nodes/min
ResumeTimeout = 60 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 2
SchedulerParameters = (null)
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
ScronParameters = (null)
SelectType = select/cons_tres
SelectTypeParameters = CR_CORE_MEMORY
SlurmUser = slurm(401)
SlurmctldAddr = (null)
SlurmctldDebug = debug2
SlurmctldHost[0] = slurm-controller-0
SlurmctldLogFile = /var/log/slurm/slurmctld.log
SlurmctldPort = 6817
SlurmctldSyslogDebug = (null)
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg = (null)
SlurmctldTimeout = 120 sec
SlurmctldParameters = enable_configless,enable_stepmgr
SlurmdDebug = debug2
SlurmdLogFile = /var/log/slurm/slurmd.log
SlurmdParameters = (null)
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdSyslogDebug = (null)
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogFile = /var/log/slurm/slurmctld.log
SlurmSchedLogLevel = 1
SlurmctldPidFile = /var/run/slurmctld.pid
SLURM_CONF = /etc/slurm/slurm.conf
SLURM_VERSION = 25.05.2
SrunEpilog = (null)
SrunPortRange = 0-0
SrunProlog = (null)
StateSaveLocation = /var/spool/slurmctld
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendExcStates = (null)
SuspendProgram = (null)
SuspendRate = 60 nodes/min
SuspendTime = INFINITE
SuspendTimeout = 30 sec
SwitchParameters = (null)
SwitchType = (null)
TaskEpilog = (null)
TaskPlugin = task/cgroup,task/affinity
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TLSParameters = (null)
TLSType = tls/none
TmpFS = /tmp
TopologyParam = (null)
TopologyPlugin = topology/flat
TrackWCKey = no
TreeWidth = 16
UsePam = no
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 0 percent
WaitTime = 0 sec
X11Parameters = (null)
Cgroup Support Configuration:
AllowedRAMSpace = 100.0%
AllowedSwapSpace = 0.0%
CgroupMountpoint = /sys/fs/cgroup
CgroupPlugin = autodetect
ConstrainCores = yes
ConstrainDevices = yes
ConstrainRAMSpace = yes
ConstrainSwapSpace = yes
EnableControllers = yes
EnableExtraControllers = (null)
IgnoreSystemd = yes
IgnoreSystemdOnFailure = yes
MaxRAMPercent = 100.0%
MaxSwapPercent = 100.0%
MemorySwappiness = (null)
MinRAMSpace = 30MB
SystemdTimeout = 1000 ms
Slurmctld(primary) at slurm-controller-0 is UP
```
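To check a single parameter instead of scanning the full dump, the output can simply be filtered, for example for the scheduling and memory limits shown above:
```bash
# Pull individual settings out of the running configuration
scontrol show config | grep -iE 'SelectTypeParameters|DefMemPerNode|MaxMemPerNode'
```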
<br>
---
<br>
## `scontrol update nodename=<name> ...`
```bash
# Remove the CoreSpec restriction (requires admin privileges)
scontrol update nodename=c2m4-0 corespeccount=0
# Then restart slurmd or resume the node
scontrol update nodename=c2m4-0 state=resume
```
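A quick way to confirm the change took effect (note that, as discussed in the next section, `scontrol update` changes do not survive a `slurmctld` restart unless `slurm.conf` is also updated):
```bash
# The CoreSpec-related fields should change (or disappear) once no cores are reserved
scontrol show node c2m4-0 | grep -iE 'CoreSpecCount|CPUSpecList|CPUEfctv'
```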
<br>
---
<br>
## Discussion
### Persistence of scontrol vs. sacctmgr changes
- ### Q: Why do user/account changes made with sacctmgr persist permanently, while partition changes made with scontrol are only temporary?
- ### us.anthropic.claude-sonnet-4-5-20250929-v1:0 (2026/01/04)
This is a good observation! The two commands behave differently because they manage **different data layers** in Slurm.
### Why the behavior differs
- ### `sacctmgr` - persisted at the database layer ✅
**`sacctmgr`** manages data in the **Slurm accounting database (SlurmDB)**:
- **User, Account, and QoS** information is stored in a **MySQL/MariaDB database**
- Managed through `slurmdbd` (the Slurm Database Daemon)
- **Changes are written to the database immediately and persist permanently**
- The data survives service restarts (a quick check is shown after the code block below)
```bash
# These changes are written to the database and persist
sacctmgr add user test001 account=test
sacctmgr add account test cluster=cluster
```
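A minimal check that the association really landed in the accounting database (the user/account names follow the example above):
```bash
# Queries go through slurmdbd to the database, so the entries survive slurmctld restarts
sacctmgr show user test001 withassoc format=user,account,cluster
```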
- ### `scontrol` - temporary, in-memory changes ⚠️
**`scontrol`** manages the **runtime state of `slurmctld` (the control daemon)**:
- **Partition and Node** settings live in `slurmctld`'s **memory**
- `scontrol` only modifies the **running state**
- **After `slurmctld` restarts, the configuration is reloaded from the config file and in-memory changes are lost**
- To make changes permanent, the config file must be edited
```bash
# This is only a temporary change; it disappears after a restart
scontrol create partitionname=test nodes=ALL default=yes
```
---
### How to make partition settings permanent
To persist a partition definition, you need to **edit the Slurm configuration file**:
#### Method 1: edit `slurm.conf` (recommended)
Add the partition definition to `/etc/slurm/slurm.conf`:
```bash
# Edit the config file
vim /etc/slurm/slurm.conf
# Add the partition definition
PartitionName=test Nodes=ALL Default=YES State=UP
```
Then **reload the configuration** (no service restart needed):
```bash
scontrol reconfigure
```
#### Method 2: use a `slurm.conf.d/` directory (more modular approach)
If your Slurm setup supports it, you can create a separate config file under `/etc/slurm/slurm.conf.d/`:
```bash
# Create the partition config file
cat > /etc/slurm/slurm.conf.d/partitions.conf <<EOF
PartitionName=test Nodes=ALL Default=YES State=UP
EOF
# Reload the configuration
scontrol reconfigure
```
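Note that a `slurm.conf.d/` directory is not necessarily read automatically; depending on the packaging, the extra file may need to be referenced explicitly from `slurm.conf` with an `Include` directive, for example:
```bash
# Add to /etc/slurm/slurm.conf so the extra file is loaded
Include /etc/slurm/slurm.conf.d/partitions.conf
```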
---
### Summary comparison
| Tool | Manages | Where the data lives | How to persist | Survives restarts? |
|------|---------|-------------|-----------|--------------|
| **sacctmgr** | User, Account, QoS | SlurmDB (MySQL/MariaDB) | Written to the database automatically | ✅ Yes |
| **scontrol** | Partition, Node | slurmctld memory | **Requires editing the config file** | ❌ No (unless written to the config file) |
---
### Practical advice
1. **For user/account management**: use `sacctmgr` directly; persistence is handled for you
2. **For partition/node management**:
    - Use `scontrol` for quick changes while testing
    - **In production, always edit `slurm.conf`**
    - After editing, run `scontrol reconfigure` to apply the changes
3. **Check whether the running state matches the config file**:
```bash
# Show the running partition settings
scontrol show partition
# Compare with the config file
grep "PartitionName" /etc/slurm/slurm.conf
```
<br>
{%hackmd vaaMgNRPS4KGJDSFG0ZE0w %}