Installing Slurm on Ubuntu (using apt)
Install the Slurm package with apt (all nodes)
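A minimal sketch of this step, assuming the package meant here is the Ubuntu metapackage `slurm-wlm`. The function name `install_slurm` and the `SUDO` convention are illustrative only.

```shell
# Hypothetical helper: install Slurm from the Ubuntu archive.
# Assumption: the package is called slurm-wlm on your release.
# SUDO defaults to "sudo"; set SUDO= when already running as root.
install_slurm() {
    ${SUDO-sudo} apt update
    ${SUDO-sudo} apt install -y slurm-wlm
}

# Run on every node:
#   install_slurm
```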
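Create the configuration files (all nodes)
All of the configuration files below go under /etc/slurm
- The path may differ slightly between Ubuntu versions
- Some versions use /etc/slurm, others use /etc/slurm-wlm
- Confirm the exact path on your system before continuing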
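One way to check which of the two directories exists is sketched below. The helper name `first_existing_dir` is made up for this example.

```shell
# Print the first directory from the argument list that exists.
first_existing_dir() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# The two candidate locations mentioned above.
first_existing_dir /etc/slurm /etc/slurm-wlm || echo "Slurm config directory not found" >&2
```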
Every configuration file below must be present on every node, and the copies must be exactly identical
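Because the copies have to match exactly, comparing checksums is a quick way to confirm it. This is a sketch assuming `sha256sum` from GNU coreutils is available; the function name `conf_checksums` is illustrative.

```shell
# Print a SHA-256 checksum for every .conf file in the given directory.
# Run it on each node and compare the output: a differing line means
# that file is not identical across nodes.
conf_checksums() {
    ( cd "$1" && sha256sum -- *.conf )
}

# Example (adjust the path if your release uses /etc/slurm-wlm):
#   conf_checksums /etc/slurm
```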
cgroup.conf
See the official example in the Slurm documentation
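For orientation, a file modelled on the example in the Slurm `cgroup.conf` documentation might look like the sketch below. The accepted options depend on the Slurm version, so treat this exact set as an assumption and check it against the documentation for the version you installed. The draft is written to the current directory so it can be reviewed before it is copied into the configuration directory.

```shell
# Write a draft cgroup.conf in the current directory.
cat > cgroup.conf <<'EOF'
# Slurm cgroup support configuration file (draft).
ConstrainCores=yes
ConstrainDevices=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
EOF
```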
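slurm.conf
Different Slurm versions support different fields, so it is best to build this file with the configuration generator that ships with Slurm
Locate the generator
Find where the generator is installed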
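One way to search for it, assuming the page is named `configurator*.html` and sits somewhere under /usr/share/doc (which matches the path used in this guide). The function name is illustrative.

```shell
# Search a documentation tree for the configurator page(s).
# The default root is /usr/share/doc; pass another directory to search elsewhere.
find_configurator() {
    find "${1:-/usr/share/doc}" -name 'configurator*.html' 2>/dev/null
}

find_configurator || echo "find reported an error" >&2
```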
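In this example the generator is under /usr/share/doc/slurm-wlm/html. Because the generator is an HTML page, it has to be used through a browser; the simplest approach is to start an HTTP server directly in that directory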
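A sketch of serving that directory with Python's built-in `http.server` module, assuming `python3` version 3.7 or newer (needed for `--directory`) is installed. The function name `serve_configurator` is made up for this example.

```shell
# Serve the generator directory over HTTP on port 8000 (Ctrl-C to stop).
# The default directory is the one used in this guide; pass another if yours differs.
serve_configurator() {
    python3 -m http.server --directory "${1:-/usr/share/doc/slurm-wlm/html}" 8000
}

# Usage:
#   serve_configurator
# then browse to port 8000 on this machine.
```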
- Python's built-in HTTP server can be used for this
- Open a browser and go to port 8000 on that machine
- On the page that opens, find configurator.html
- Fill in the required fields
- Control Machines: SlurmctldHost
  Enter the hostname of the control node
- Compute Machines: NodeName
  Enter the hostnames of the compute nodes; a range [A-B] stands for A, A+1, A+2, …, B
- Compute Machines: CPUs
  Enter the number of CPU cores on each compute node
- State Preservation
  - Change StateSaveLocation to /var/spool/slurm
  - Change SlurmdSpoolDir to /var/spool/slurm/slurmd
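- The two State Preservation paths can be set to other locations or left at their defaults
  - If you do, make sure the directory whose permissions you adjust later is the one you actually chose
- When the settings are complete, click Submit at the bottom of the page; the page is replaced by the generated Slurm configuration file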
- Select all and copy the entire contents
- Paste the copied contents into /etc/slurm/slurm.conf
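For reference, the fields filled in above end up as lines like the following in the generated file. This is an illustrative excerpt, not the generator's full output: the hostnames `ctl` and `node[1-3]`, the CPU count, and the partition line are placeholders.

```shell
# Write an example excerpt to the current directory for comparison.
cat > slurm.conf.excerpt <<'EOF'
# Placeholder values: replace the hostnames and CPU count with your own.
SlurmctldHost=ctl
StateSaveLocation=/var/spool/slurm
SlurmdSpoolDir=/var/spool/slurm/slurmd
# node[1-3] is shorthand for node1, node2, node3
NodeName=node[1-3] CPUs=4 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
EOF
```

Once Slurm is installed, `scontrol show hostnames 'node[1-3]'` should print the expanded host list, which is a handy way to double-check a range expression.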
Set permissions on the state directory
Create a slurm directory under /var/spool
Change its ownership to slurm:slurm
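The two steps can be sketched as below, assuming the `slurm` user and group already exist (they are normally created by the package). The function name and the `SUDO` convention are illustrative.

```shell
# Create a directory and hand it to the given owner.
# SUDO defaults to "sudo"; set SUDO= when already running as root.
prepare_spool_dir() {
    dir=$1
    owner=$2
    ${SUDO-sudo} mkdir -p "$dir"
    ${SUDO-sudo} chown "$owner" "$dir"
}

# With the paths used in this guide:
#   prepare_spool_dir /var/spool/slurm slurm:slurm
```

If you chose a different StateSaveLocation in the generator, pass that path instead.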
Set up the munge key (multi-node)
Create the munge key on the control node
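One common way to generate a key is the `dd`-based method from the MUNGE installation guide, sketched below. This is not necessarily the command this guide had in mind: recent MUNGE releases also ship a `mungekey` tool, and the Ubuntu package may already have created a key during installation, so check whether /etc/munge/munge.key exists before overwriting it. The function name and the `SUDO` convention are illustrative.

```shell
# Write 1024 random bytes to the given key file and make it owner-read-only.
# SUDO defaults to "sudo"; set SUDO= when already running as root.
create_munge_key() {
    key=$1
    ${SUDO-sudo} dd if=/dev/urandom of="$key" bs=1 count=1024 2>/dev/null
    ${SUDO-sudo} chmod 400 "$key"
}

# On the control node (the key normally also has to be owned by the munge user):
#   create_munge_key /etc/munge/munge.key
#   sudo chown munge:munge /etc/munge/munge.key
```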
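The munge key is stored at /etc/munge/munge.key. Copy this file to every node and place it in /etc/munge on each one
- First copy the key to ~
- Temporarily change its permissions to 777 so that it can be handled on the other nodes
- Transfer it to the other node with scp (or any other method); here it goes into ~ on that node
- Log in to the other node and move the key into /etc/munge
- Change the permissions back so that they meet the security requirements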
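The steps above can be sketched as two small helpers, one for each side of the transfer. The function names, the `SUDO` convention, the hostname `compute1`, and the final ownership and mode (`munge:munge`, 400) are assumptions for illustration.

```shell
# SUDO defaults to "sudo"; set SUDO= when already running as root.

# Control node: copy the key to a staging path and loosen its permissions
# so that it can be transferred (temporary, as in the steps above).
stage_munge_key() {
    src=$1
    staged=$2
    ${SUDO-sudo} cp "$src" "$staged"
    ${SUDO-sudo} chmod 777 "$staged"
}

# Receiving node: move the key into place and restrict it again.
install_munge_key() {
    staged=$1
    dest=$2
    owner=${3:-munge:munge}
    ${SUDO-sudo} mv "$staged" "$dest"
    ${SUDO-sudo} chown "$owner" "$dest"
    ${SUDO-sudo} chmod 400 "$dest"
}

# Typical sequence ("compute1" is a placeholder hostname):
#   stage_munge_key /etc/munge/munge.key ~/munge.key      # control node
#   scp ~/munge.key compute1:~/                           # control node
#   install_munge_key ~/munge.key /etc/munge/munge.key    # on compute1
```

The MUNGE daemon reads its key at startup, so it will likely need a restart on each node after the key is replaced.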
Start Slurm
- On the control node, start the controller daemon (slurmctld)
- On each compute node, start the compute daemon (slurmd)
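A sketch of starting the daemons with systemd, assuming the units are named `slurmctld` and `slurmd` as they usually are on Ubuntu. The function name and the `SUDO` convention are illustrative.

```shell
# Start a unit now and enable it at boot.
# SUDO defaults to "sudo"; set SUDO= when already running as root.
start_service() {
    ${SUDO-sudo} systemctl enable --now "$1"
}

# Control node:
#   start_service slurmctld
# Compute node:
#   start_service slurmd
```

A machine that acts as both control and compute node would run both units.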
Confirm that startup succeeded
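`sinfo` is the usual first check, since it lists the partitions and the state of each node. As a sketch, the helper below filters node-oriented `sinfo` output for states that usually point to a problem. The function name and the list of states treated as unhealthy are assumptions for illustration.

```shell
# Read "<node> <state>" lines on stdin and print those whose state starts
# with down, drain, fail, or unk (an illustrative, not exhaustive, list).
unhealthy_nodes() {
    awk '$2 ~ /^(down|drain|fail|unk)/ { print }'
}

# On a running cluster:
#   sinfo -N -h -o '%N %T' | unhealthy_nodes
# No output means every node reported a state outside that list.
```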