Slurm / Commands / scontrol, etc.
===
###### tags: `Slurm`, `HPC`, `Linux`, `Slinky`, `SlinkyProject`, `scontrol`, `slurm.conf`
<br>
[TOC]
<br>
## `scontrol show node`
### Usage
```bash
# Show detailed information for all nodes
scontrol show node
# Show detailed information for a specific node
scontrol show node <nodename>
```
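To pull out just a few fields, the one-line output format (`-o` / `--oneliner`) is convenient to combine with `grep`; a minimal sketch using the `bear-0` node from the example below:
```bash
# Print each node on one line, then extract selected CPU/memory fields
scontrol -o show node bear-0 | grep -oE '(CPUTot|CPUAlloc|RealMemory|AllocMem)=[^ ]+'
```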
<br>
### Interpreting the node information (CPU & RAM)
```
scontrol show node bear-0
NodeName=bear-0 Arch=x86_64 CoresPerSocket=22
CPUAlloc=0 CPUEfctv=86 CPUTot=88 CPULoad=0.84
AvailableFeatures=bear
ActiveFeatures=bear
Gres=(null)
NodeAddr=192.168.0.156 NodeHostName=bear-0 Version=25.05.3
OS=Linux 5.15.0-144-generic #157-Ubuntu SMP Mon Jun 16 07:33:10 UTC 2025
RealMemory=257805 AllocMem=0 FreeMem=2000 Sockets=2 Boards=1
CoreSpecCount=1 CPUSpecList=86-87 MemSpecLimit=256731
State=IDLE+DYNAMIC_NORM ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=all,book
BootTime=2025-08-14T07:57:32 SlurmdStartTime=2025-11-17T03:21:01
LastBusyTime=2025-11-14T11:00:58 ResumeAfterTime=None
CfgTRES=cpu=86,mem=257805M,billing=86
AllocTRES=
CurrentWatts=0 AveWatts=0
Comment={"namespace":"slurm","podName":"slurm-worker-bear-0"}
```
- ### Claude Sonnet-4.5 (2025/11/21)
### CPU and memory information for Slurm node `bear-0`
#### **CPU fields**
| Field | Value | Description |
|------|------|------|
| **CPUTot** | 88 | Total CPUs on the node (2 sockets × 22 cores × 2 threads = 88) |
| **CPUEfctv** | 86 | **Effective CPUs available for scheduling**<br>= CPUTot - (CoreSpecCount × ThreadsPerCore)<br>= 88 - 2 = 86 |
| **CPUAlloc** | 0 | CPUs currently allocated to jobs |
| **CPULoad** | 0.84 | System load average reported by the node |
| **CoreSpecCount** | 1 | Cores reserved for system use (1 core = 2 threads)<br>(not used for Slurm jobs) |
| **CPUSpecList** | 86-87 | Specific CPU IDs reserved for system use (2 threads)<br>(not used for Slurm jobs) |
| **CoresPerSocket** | 22 | Cores per physical socket |
| **Sockets** | 2 | Number of physical CPU sockets |
| **ThreadsPerCore** | 2 | Hardware threads per core (Hyper-Threading enabled) |
#### **Memory fields**
| Field | Value | Description |
|------|------|------|
| **RealMemory** | 257805 MB | Total physical memory on the node (~252 GB) |
| **AllocMem** | 0 MB | Memory currently allocated to jobs |
| **FreeMem** | 2000 MB | Free memory currently reported by the OS (~2 GB)<br>⚠️ **Not the same concept as AllocMem** |
| **MemSpecLimit** | 256731 MB | **Memory reserved for system use**<br>This amount is withheld from job allocations |
#### **Trackable resources (TRES)**
```
CfgTRES=cpu=86,mem=257805M,billing=86
AllocTRES=
```
| Field | Description |
|------|------|
| **CfgTRES** | Configured trackable resources<br>• cpu=86: schedulable CPUs<br>• mem=257805M: schedulable memory<br>• billing=86: billing units (usually equal to cpu) |
| **AllocTRES** | Currently allocated resources (empty here; the node is IDLE) |
---
### **How the fields relate**
#### CPU arithmetic
```
CPUTot (88) = Sockets (2) × CoresPerSocket (22) × ThreadsPerCore (2)
CPUEfctv (86) = CPUTot (88) - number of CPUs in CPUSpecList (2)
              ↑ available for Slurm scheduling
CPUSpecList (86-87) = CoreSpecCount (1) × ThreadsPerCore (2) CPUs
              ↑ reserved for system use (slurmd, OS)
```
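As a cross-check, the same topology can be read from the OS side with `lscpu` (run on the node itself); a minimal sketch:
```bash
# OS view of the topology
lscpu | grep -E '^(CPU\(s\)|Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core):'
# Slurm's view of the same node
scontrol show node bear-0 | grep -E 'CPUTot|CoresPerSocket|ThreadsPerCore|Sockets'
```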
#### Memory arithmetic
```
RealMemory (257805 MB) = total physical memory on the node
Schedulable memory ≈ RealMemory - MemSpecLimit
(here MemSpecLimit is close to RealMemory, which looks like a misconfiguration)
FreeMem (2000 MB) = free memory currently reported by the OS
                  ≠ memory that can be allocated to jobs
```
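Since FreeMem comes from the OS rather than from Slurm's bookkeeping, it can be compared against `free` on the node; a rough sketch (whether a given Slurm version reports "free" or "available" memory may differ, so treat this as an approximate comparison):
```bash
# OS-reported free memory in MB (second line of `free -m`, "free" column)
free -m | awk 'NR==2 {print "OS free (MB):", $4}'
# Slurm's view of the same node
scontrol show node bear-0 | grep -oE '(RealMemory|AllocMem|FreeMem)=[0-9]+'
```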
---
### **Key observations**
1. **CPU configuration looks reasonable**
    - 2 threads (1 core) are reserved for the system, leaving 86 for jobs
2. **Memory configuration looks wrong** ⚠️
    - `MemSpecLimit=256731 MB` is close to `RealMemory=257805 MB`
    - Only about **1 GB** is left for jobs (257805 - 256731 = 1074 MB)
    - Check the `MemSpecLimit` setting in `slurm.conf`; see the sketch after this list
3. **Node state is healthy**
    - `State=IDLE+DYNAMIC_NORM`: idle and dynamically managed
    - `CPULoad=0.84`: very low load
    - `CPUAlloc=0`: no jobs running
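A minimal sketch of what a saner node definition in `slurm.conf` might look like, assuming roughly 2 GB should be reserved for the OS and slurmd (the `MemSpecLimit=2048` value is an assumption, not taken from this cluster; in a Slinky deployment the file is typically generated, so the change may need to be made in whatever produces it):
```bash
# Sketch only: values other than MemSpecLimit copied from the scontrol output above
NodeName=bear-0 Sockets=2 CoresPerSocket=22 ThreadsPerCore=2 RealMemory=257805 CoreSpecCount=1 MemSpecLimit=2048
# Reload the configuration (node-level changes may also require restarting slurmd)
scontrol reconfigure
```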
<br>
---
<br>
## `scontrol show config`
```
$ scontrol show config
Configuration data as of 2025-09-01T02:29:25
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = associations
AccountingStorageHost = slurm-accounting
AccountingStorageExternalHost = (null)
AccountingStorageParameters = (null)
AccountingStoragePort = 6819
AccountingStorageTRES = cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu,gres/gpumem,gres/gpuutil
AccountingStorageType = accounting_storage/slurmdbd
AccountingStorageUser = N/A
AccountingStoreFlags = (null)
AcctGatherEnergyType = (null)
AcctGatherFilesystemType = (null)
AcctGatherInterconnectType = (null)
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = (null)
AllowSpecResourcesUsage = no
AuthAltTypes = auth/jwt
AuthAltParameters = jwt_key=/etc/slurm/jwt_hs256.key
AuthInfo = use_client_ids
AuthType = auth/slurm
BatchStartTimeout = 10 sec
BcastExclude = /lib,/usr/lib,/lib64,/usr/lib64
BcastParameters = (null)
BOOT_TIME = 2025-08-28T06:21:24
BurstBufferType = (null)
CertgenParameters = (null)
CertgenType = (null)
CertmgrParameters = (null)
CertmgrType = (null)
CliFilterPlugins = (null)
ClusterName = slurm
CommunicationParameters = block_null_hash
CompleteWait = 0 sec
CpuFreqDef = Unknown
CpuFreqGovernors = OnDemand,Performance,UserSpace
CredType = cred/slurm
DataParserParameters = (null)
DebugFlags = (null)
DefMemPerNode = UNLIMITED
DependencyParameters = (null)
DisableRootJobs = no
EioTimeout = 60
EnforcePartLimits = ALL
Epilog[0] = epilog-01-welcome.sh
EpilogMsgTime = 2000 usec
FairShareDampeningFactor = 1
FederationParameters = (null)
FirstJobId = 1
GresTypes = gpu
GpuFreqDef = (null)
GroupUpdateForce = 1
GroupUpdateTime = 600 sec
HASH_VAL = Different Ours=0x7bdbb0bb Slurmctld=0x6e3d799d
HashPlugin = hash/k12
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 0 sec
InteractiveStepOptions = --interactive --preserve-env --pty $SHELL
JobAcctGatherFrequency = task=30
JobAcctGatherType = jobacct_gather/cgroup
JobAcctGatherParams = (null)
JobCompHost = localhost
JobCompLoc = (null)
JobCompParams = (null)
JobCompPort = 0
JobCompType = (null)
JobCompUser = root
JobContainerType = (null)
JobDefaults = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = (null)
KillOnBadExit = 0
KillWait = 30 sec
LaunchParameters = enable_nss_slurm,use_interactive_step,ulimit_pam_adopt
Licenses = (null)
LogTimeFormat = iso8601_ms
MailDomain = (null)
MailProg = /bin/mail
MaxArraySize = 1001
MaxBatchRequeue = 5
MaxDBDMsgs = 24096
MaxJobCount = 10000
MaxJobId = 67043328
MaxMemPerNode = UNLIMITED
MaxNodeCount = 1024
MaxStepCount = 40000
MaxTasksPerNode = 512
MCSPlugin = (null)
MCSParameters = (null)
MessageTimeout = 10 sec
MinJobAge = 300 sec
MpiDefault = (null)
MpiParams = (null)
NEXT_JOB_ID = 999
NodeFeaturesPlugins = (null)
OverTimeLimit = 0 min
PluginDir = /usr/lib/x86_64-linux-gnu/slurm
PlugStackConfig = (null)
PreemptMode = OFF
PreemptParameters = (null)
PreemptType = (null)
PreemptExemptTime = 00:00:00
PrEpParameters = (null)
PrEpPlugins = prep/script
PriorityParameters = (null)
PrioritySiteFactorParameters = (null)
PrioritySiteFactorPlugin = (null)
PriorityDecayHalfLife = 7-00:00:00
PriorityCalcPeriod = 00:05:00
PriorityFavorSmall = no
PriorityFlags =
PriorityMaxAge = 7-00:00:00
PriorityType = priority/multifactor
PriorityUsageResetPeriod = NONE
PriorityWeightAge = 0
PriorityWeightAssoc = 0
PriorityWeightFairShare = 0
PriorityWeightJobSize = 0
PriorityWeightPartition = 0
PriorityWeightQOS = 0
PriorityWeightTRES = (null)
PrivateData = none
ProctrackType = proctrack/cgroup
PrologEpilogTimeout = 65534
PrologFlags = Alloc,Contain
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = (null)
ReconfigFlags = (null)
RequeueExit = (null)
RequeueExitHold = (null)
ResumeFailProgram = (null)
ResumeProgram = (null)
ResumeRate = 300 nodes/min
ResumeTimeout = 60 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 2
SchedulerParameters = (null)
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
ScronParameters = (null)
SelectType = select/cons_tres
SelectTypeParameters = CR_CORE_MEMORY
SlurmUser = slurm(401)
SlurmctldAddr = (null)
SlurmctldDebug = debug2
SlurmctldHost[0] = slurm-controller-0
SlurmctldLogFile = /var/log/slurm/slurmctld.log
SlurmctldPort = 6817
SlurmctldSyslogDebug = (null)
SlurmctldPrimaryOffProg = (null)
SlurmctldPrimaryOnProg = (null)
SlurmctldTimeout = 120 sec
SlurmctldParameters = enable_configless,enable_stepmgr
SlurmdDebug = debug2
SlurmdLogFile = /var/log/slurm/slurmd.log
SlurmdParameters = (null)
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdSyslogDebug = (null)
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogFile = /var/log/slurm/slurmctld.log
SlurmSchedLogLevel = 1
SlurmctldPidFile = /var/run/slurmctld.pid
SLURM_CONF = /etc/slurm/slurm.conf
SLURM_VERSION = 25.05.2
SrunEpilog = (null)
SrunPortRange = 0-0
SrunProlog = (null)
StateSaveLocation = /var/spool/slurmctld
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendExcStates = (null)
SuspendProgram = (null)
SuspendRate = 60 nodes/min
SuspendTime = INFINITE
SuspendTimeout = 30 sec
SwitchParameters = (null)
SwitchType = (null)
TaskEpilog = (null)
TaskPlugin = task/cgroup,task/affinity
TaskPluginParam = (null type)
TaskProlog = (null)
TCPTimeout = 2 sec
TLSParameters = (null)
TLSType = tls/none
TmpFS = /tmp
TopologyParam = (null)
TopologyPlugin = topology/flat
TrackWCKey = no
TreeWidth = 16
UsePam = no
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 0 percent
WaitTime = 0 sec
X11Parameters = (null)
Cgroup Support Configuration:
AllowedRAMSpace = 100.0%
AllowedSwapSpace = 0.0%
CgroupMountpoint = /sys/fs/cgroup
CgroupPlugin = autodetect
ConstrainCores = yes
ConstrainDevices = yes
ConstrainRAMSpace = yes
ConstrainSwapSpace = yes
EnableControllers = yes
EnableExtraControllers = (null)
IgnoreSystemd = yes
IgnoreSystemdOnFailure = yes
MaxRAMPercent = 100.0%
MaxSwapPercent = 100.0%
MemorySwappiness = (null)
MinRAMSpace = 30MB
SystemdTimeout = 1000 ms
Slurmctld(primary) at slurm-controller-0 is UP
```
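To check a single parameter instead of scanning the full dump, the output can simply be filtered, for example for the scheduling and memory limits shown above:
```bash
# Pull individual settings out of the running configuration
scontrol show config | grep -iE 'SelectTypeParameters|DefMemPerNode|MaxMemPerNode'
```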
<br>
---
<br>
## `scontrol update nodename=<name> ...`
```bash
# Remove the CoreSpec restriction (requires admin privileges)
scontrol update nodename=c2m4-0 corespeccount=0
# Then restart slurmd or resume the node
scontrol update nodename=c2m4-0 state=resume
```
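A quick way to confirm the change took effect (note that, as discussed in the next section, `scontrol update` changes do not survive a `slurmctld` restart unless `slurm.conf` is also updated):
```bash
# The CoreSpec-related fields should change (or disappear) once no cores are reserved
scontrol show node c2m4-0 | grep -iE 'CoreSpecCount|CPUSpecList|CPUEfctv'
```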
<br>
---
<br>
## Discussion
### Persistence of scontrol vs. sacctmgr changes
- ### Q: Why do user/account changes made with sacctmgr persist permanently, while partition changes made with scontrol are only temporary?
- ### us.anthropic.claude-sonnet-4-5-20250929-v1:0 (2026/01/04)
This is a good observation! The two commands behave differently because they manage **different data layers** in Slurm.
### Why the behavior differs
- ### `sacctmgr` - persisted at the database layer ✅
**`sacctmgr`** manages data in the **Slurm accounting database (SlurmDB)**:
- **User, Account, and QoS** information is stored in a **MySQL/MariaDB database**
- Managed through `slurmdbd` (the Slurm Database Daemon)
- **Changes are written to the database immediately and persist permanently**
- The data survives service restarts (a quick check is shown after the code block below)
```bash
# These changes are written to the database and persist
sacctmgr add user test001 account=test
sacctmgr add account test cluster=cluster
```
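A minimal check that the association really landed in the accounting database (the user/account names follow the example above):
```bash
# Queries go through slurmdbd to the database, so the entries survive slurmctld restarts
sacctmgr show user test001 withassoc format=user,account,cluster
```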
- ### `scontrol` - temporary, in-memory changes ⚠️
**`scontrol`** manages the **runtime state of `slurmctld` (the control daemon)**:
- **Partition and Node** settings live in `slurmctld`'s **memory**
- `scontrol` only modifies the **running state**
- **After `slurmctld` restarts, the configuration is reloaded from the config file and in-memory changes are lost**
- To make changes permanent, the config file must be edited
```bash
# This is only a temporary change; it disappears after a restart
scontrol create partitionname=test nodes=ALL default=yes
```
---
### How to make partition settings permanent
To persist a partition definition, you need to **edit the Slurm configuration file**:
#### Method 1: edit `slurm.conf` (recommended)
Add the partition definition to `/etc/slurm/slurm.conf`:
```bash
# Edit the config file
vim /etc/slurm/slurm.conf
# Add the partition definition
PartitionName=test Nodes=ALL Default=YES State=UP
```
Then **reload the configuration** (no service restart needed):
```bash
scontrol reconfigure
```
#### Method 2: use a `slurm.conf.d/` directory (more modular approach)
If your Slurm setup supports it, you can create a separate config file under `/etc/slurm/slurm.conf.d/`:
```bash
# Create the partition config file
cat > /etc/slurm/slurm.conf.d/partitions.conf <<EOF
PartitionName=test Nodes=ALL Default=YES State=UP
EOF
# Reload the configuration
scontrol reconfigure
```
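Note that a `slurm.conf.d/` directory is not necessarily read automatically; depending on the packaging, the extra file may need to be referenced explicitly from `slurm.conf` with an `Include` directive, for example:
```bash
# Add to /etc/slurm/slurm.conf so the extra file is loaded
Include /etc/slurm/slurm.conf.d/partitions.conf
```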
---
### Summary comparison
| Tool | Manages | Where the data lives | How to persist | Survives restarts? |
|------|---------|-------------|-----------|--------------|
| **sacctmgr** | User, Account, QoS | SlurmDB (MySQL/MariaDB) | Written to the database automatically | ✅ Yes |
| **scontrol** | Partition, Node | slurmctld memory | **Requires editing the config file** | ❌ No (unless written to the config file) |
---
### Practical advice
1. **For user/account management**: use `sacctmgr` directly; persistence is handled for you
2. **For partition/node management**:
    - Use `scontrol` for quick changes while testing
    - **In production, always edit `slurm.conf`**
    - After editing, run `scontrol reconfigure` to apply the changes
3. **Check whether the running state matches the config file**:
```bash
# Show the running partition settings
scontrol show partition
# Compare with the config file
grep "PartitionName" /etc/slurm/slurm.conf
```
<br>
{%hackmd vaaMgNRPS4KGJDSFG0ZE0w %}