# PBS
## [A 100% correct PBS tutorial, guaranteed to click at first read, step by step](https://memes.tw/user-gif/a2cc82a36c9dd92e62a356de9d17901d.gif)
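## Introduction
PBS is a job scheduling system for clusters. It currently has three branches:
OpenPBS, PBS Pro, and Torque
Basics: PBS only schedules jobs. Everything else in the environment must already be set up before it can dispatch work to each server's resources, including passwordless SSH between the workers.
PBS architecture, from [PBS 排程及COMET Linux 系統架構](http://web.che.ntu.edu.tw/stlin/freshman/pbsskill)

Another diagram, from [openpbs centos 7 配置安装手把手教程](http://thisis.yorven.site/blog/index.php/2020/12/06/openpbs-install-instructions/)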
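
Since PBS assumes passwordless SSH is already working, it can help to verify that before installing anything. A minimal sketch, assuming the worker hostnames A01, A03, and A04 used later in these notes; the function name `check_passwordless_ssh` is made up for illustration:

```shell
#!/bin/bash
# Report whether each host accepts key-based SSH without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    local host failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: passwordless SSH not working"
            failed=1
        fi
    done
    return "$failed"
}
```

It would be called as `check_passwordless_ssh A01 A03 A04`, and it returns non-zero if any host fails.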

## openpbs
### Download from GitHub
https://github.com/openpbs/openpbs
`git clone https://github.com/openpbs/openpbs`
For CentOS 7:
```shell=
## Install the prerequisite packages for building PBS.
yum install -y gcc make rpm-build libtool hwloc-devel \
libX11-devel libXt-devel libedit-devel libical-devel \
ncurses-devel perl postgresql-devel postgresql-contrib python3-devel tcl-devel \
tk-devel swig expat-devel openssl-devel libXext libXft \
autoconf automake gcc-c++
## Install the prerequisite packages for running PBS. In addition to the commands below, you should also install a text editor of your choosing (vim, emacs, gedit, etc.).
yum install -y expat libedit postgresql-server postgresql-contrib python3 \
sendmail sudo tcl tk libical
## Enter the source directory
cd openpbs
## Generate the configure script and Makefiles.
./autogen.sh
## Configure the build for your environment. To list the available parameters, run: ./configure --help
./configure --prefix=/opt/pbs
## Build PBS by running "make".
make
## Install PBS. Use sudo to run the command as root.
sudo make install
## Configure PBS by executing the post-install script.
sudo /opt/pbs/libexec/pbs_postinstall
## Edit /etc/pbs.conf to configure the PBS services that should be started. If you are installing PBS on only one system, you should change the value of PBS_START_MOM from zero to one. If you use vi as your editor, you would run:
sudo vi /etc/pbs.conf
## Some file permissions must be modified to add SUID privilege.
sudo chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp
## Start the PBS services.
sudo /etc/init.d/pbs start
## All configured PBS services should now be running. Update your PATH and MANPATH variables by sourcing the appropriate PBS profile or logging out and back in.
## For Bourne shell (or similar) run the following:
. /etc/profile.d/pbs.sh
```
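
For a single-machine install, the edit to `/etc/pbs.conf` described in the block above can also be done non-interactively. A minimal sketch, assuming the file contains a line `PBS_START_MOM=0`; it works on a scratch copy with sample contents so that nothing under `/etc` is touched, and the sample values are illustrative only:

```shell
#!/bin/bash
# Sample pbs.conf written to a scratch file (contents are illustrative).
conf=$(mktemp)
cat > "$conf" <<'EOF'
PBS_SERVER=pbsmaster
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0
EOF

# Flip PBS_START_MOM from 0 to 1 so the MoM also runs on this machine.
sed -i 's/^PBS_START_MOM=0$/PBS_START_MOM=1/' "$conf"

grep '^PBS_START_' "$conf"
```

To apply it for real, the same `sed` command would be pointed at `/etc/pbs.conf` and run with `sudo`.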
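:::warning
:fire: So, what next?
:::
* The steps above build from the GitHub source. I do not know how to continue from here and could not find instructions online right away, so this is left for later.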
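### Installing from RPM
The other approach is to install from RPM packages, which are only available for some releases.
* Release list: https://github.com/openpbs/openpbs/tags
The following uses version 19.1.3.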
```shell=
## prerequisites
yum install -y gcc make rpm-build libtool hwloc-devel \
libX11-devel libXt-devel libedit-devel libical-devel \
ncurses-devel perl postgresql-devel postgresql-contrib python3-devel tcl-devel \
tk-devel swig expat-devel openssl-devel libXext libXft \
autoconf automake gcc-c++
yum install -y expat libedit postgresql-server postgresql-contrib python3 \
sendmail sudo tcl tk libical
yum install -y gzip perl-Env perl-Switch
## install
cd /
wget https://github.com/openpbs/openpbs/releases/download/v19.1.3/pbspro_19.1.3.centos_7.zip
unzip pbspro_19.1.3.centos_7
cd pbspro_19.1.3.centos_7
```
Because of [a problem encountered on 4/7](https://hackmd.io/magj1td5RIiqhGLoK2avmw#48), the PostgreSQL version has to be changed.
* On the server node
```shell=
rpm -ivh pbspro-server-19.1.3-0.x86_64.rpm
## The client is also installed on the server node here (avoid doing this if possible)
rpm -ivh --nodeps pbspro-client-19.1.3-0.x86_64.rpm
vim /etc/pbs.conf
## Usually nothing needs changing: every PBS_START_* except PBS_START_MOM should be 1
## Because of the problem linked above, the PostgreSQL version has to be changed
## Find the installed version
rpm -qa | grep postgresql
## Remove version 9.2.24
## Download the PostgreSQL repository package
wget https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
## Install the PostgreSQL repository package and EPEL
sudo yum install pgdg-redhat-repo-latest.noarch.rpm epel-release
## Install the chosen PostgreSQL version
sudo yum install postgresql11-server postgresql11-contrib
## Initialize the PostgreSQL database
sudo /usr/pgsql-11/bin/postgresql-11-setup initdb
## Start the PostgreSQL server
sudo systemctl start postgresql-11
sudo systemctl enable postgresql-11
## Add this to bashrc or /etc/profile
export PATH=$PATH:/usr/pgsql-11/bin
## Initialize PBS
/opt/pbs/libexec/pbs_habitat
## Start PBS
/etc/init.d/pbs start
```
* On each worker node
```shell=
rpm -ivh pbspro-execution-19.1.3-0.x86_64.rpm
vim /etc/pbs.conf
## Replace "CHANGE_THIS_TO_PBS_PRO_SERVER_HOSTNAME" with the hostname "pbsmaster".
## Also note that PBS_START_MOM must be 1 here and every other PBS_START_* should be 0.
vim /var/spool/pbs/mom_priv/config
## Replace "CHANGE_THIS_TO_PBS_PRO_SERVER_HOSTNAME" with the hostname "pbsmaster"
## Start PBS
service pbs start
```
* After completing the steps above, go back to the server node
```shell=
## Add the worker nodes
qmgr -c "create node A01"
qmgr -c "create node A03"
qmgr -c "create node A04"
## Verify the nodes
pbsnodes -a
```
At this point the service is ready to use.
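
The `PBS_START_*` rules noted in the comments above (everything except `PBS_START_MOM` set to 1 on the server, only `PBS_START_MOM` set to 1 on a worker) can be checked before the service is started. A rough sketch; `check_pbs_conf` is a made-up helper, not part of PBS:

```shell
#!/bin/bash
# check_pbs_conf ROLE FILE
# ROLE is "server" or "worker". Prints every PBS_START_* line that breaks the
# rule for that role and returns non-zero if any was found.
check_pbs_conf() {
    awk -F= -v role="$1" '
        /^PBS_START_/ {
            if (role == "server") {
                # On the server, PBS_START_MOM may be 0 or 1; skip it.
                if ($1 == "PBS_START_MOM") next
                want = 1
            } else {
                # On a worker, only the MoM should start.
                want = ($1 == "PBS_START_MOM") ? 1 : 0
            }
            if ($2 != want) {
                print $1 "=" $2 " (expected " want ")"
                bad = 1
            }
        }
        END { exit bad }
    ' "$2"
}
```

For example, `check_pbs_conf worker /etc/pbs.conf` on a worker prints nothing and returns 0 when the file follows the rule.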
## PBS commands
[Cluster Leopard 2.0](http://leopard.mcl.math.ncu.edu.tw/usage.xhtml)
[PBS 排程及COMET Linux 系統架構](http://web.che.ntu.edu.tw/stlin/freshman/pbsskill)
[PBS 命令与使用 - Unix/Linux - 一夜孤城](http://www.360doc.com/content/11/0225/09/4182758_95923140.shtml)
[PBS Professional Quick Start Guide](https://www.cerit-sc.cz/media/2107982/tahak-pbs-pro-small.pdf)
[Commonly Used PBS Commands](https://www.nas.nasa.gov/hecc/support/kb/commonly-used-pbs-commands_174.html)
The ones mainly in use are:
* qsub: submit a batch job to the specified queue using a script
* `qsub -I -q queue_name -l resource_list`: start an interactive job, useful for seeing how many resources are available
* qstat: display queue information and job status
* qdel: delete (cancel) a submitted job
* pbsnodes: show the state of the nodes
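
These commands chain together naturally, because `qsub` prints the identifier of the new job (in the form `sequence_number.server_name`) and both `qstat` and `qdel` accept it. A small sketch; the script name `job.pbs` and the helper `job_seq` are placeholders for illustration:

```shell
#!/bin/bash
# Strip the server name from a job identifier such as "123.pbsmaster".
job_seq() {
    echo "${1%%.*}"
}

# Typical round trip (requires a running PBS installation):
#   jobid=$(qsub job.pbs)      # submit and remember the identifier
#   qstat "$jobid"             # check its state
#   qdel "$jobid"              # cancel it if needed
job_seq "123.pbsmaster"
```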
## PBS script
Taken from the Chinese user manual for Taiwania 1 at the National Center for High-performance Computing (NCHC).
```
# Resource type and amount
#PBS -l <resource name>=<value>
# Job name (optional)
#PBS -N <job name>
# Queue name
#PBS -q <destination queue>
# Project name
#PBS -P <project name>
# Merge stderr and stdout (optional)
#PBS -j eo
```
Examples:
```
# Serial job (1 core)
#PBS -l select=1:ncpus=1
# MPI job (2 nodes, 8 processors per node)
#PBS -l select=2:ncpus=8:mpiprocs=8
# Hybrid MPI + OpenMP job (2 MPI ranks, 16 threads in total)
#PBS -l select=2:ncpus=8:mpiprocs=1:ompthreads=8
# MPI + CUDA job (4 GPUs per node)
#PBS -l select=2:ncpus=40:ngpus=4:mpiprocs=4
# Walltime of 1 hour
#PBS -l walltime=1:00:00
```
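
The rank and thread counts in the comments above follow from multiplying the fields of the `select` statement. A small sketch of that arithmetic; `summarize_select` is a hypothetical helper, and treating a missing `mpiprocs` as 1 is an assumption of this sketch:

```shell
#!/bin/bash
# Summarize one "select" chunk specification.
# Assumption for this sketch: a missing mpiprocs field counts as 1.
summarize_select() {
    local chunks=1 mpiprocs=1 ompthreads="" field ranks
    local IFS=':'
    for field in $1; do
        case "$field" in
            select=*)     chunks=${field#select=} ;;
            mpiprocs=*)   mpiprocs=${field#mpiprocs=} ;;
            ompthreads=*) ompthreads=${field#ompthreads=} ;;
        esac
    done
    # MPI ranks = number of chunks x ranks per chunk.
    ranks=$((chunks * mpiprocs))
    echo "chunks=$chunks mpi_ranks=$ranks"
    # Total threads are only reported when ompthreads is given explicitly.
    if [ -n "$ompthreads" ]; then
        echo "total_omp_threads=$((ranks * ompthreads))"
    fi
}

summarize_select "select=2:ncpus=8:mpiprocs=8"
summarize_select "select=2:ncpus=8:mpiprocs=1:ompthreads=8"
```

The first call reports 16 MPI ranks (2 chunks of 8), and the second reports 2 ranks with 16 threads in total, matching the MPI and hybrid examples.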
`PBS_O_WORKDIR`
The absolute path of the current working directory of the qsub utility process, that is, the directory the job was submitted from.
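
Putting the directives and `PBS_O_WORKDIR` together, a complete job script might look like the sketch below. It only writes the script to a file; the job name `hello`, the file name `hello.pbs`, and the resource values are arbitrary, and a real site may also require the `-q` and `-P` directives shown earlier.

```shell
#!/bin/bash
# Write a minimal job script. The quoted 'EOF' keeps $PBS_O_WORKDIR literal,
# so it is expanded when the job runs, not when this file is written.
cat > hello.pbs <<'EOF'
#!/bin/bash
#PBS -N hello
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:05:00
#PBS -j oe

# Return to the directory where qsub was invoked before doing any work.
cd "$PBS_O_WORKDIR"
echo "running on $(hostname)"
EOF
```

The resulting file would then be submitted with `qsub hello.pbs`.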
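## When something goes wrong
`tracejob [job ID]`
This prints the job's log entries, as shown below.

Alternatively, use
`qstat -as job_id`
* The MoM logs in the PBS directory on a worker record errors that occurred during the computation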
```
cd /var/spool/pbs/mom_logs
```
* Output files that could not be delivered are kept in
```
/var/spool/pbs/undelivered
```
When PBS delivers files with scp, the user needs passwordless SSH, and the target directory must be readable and writable by the client.
```
chmod 777 /pwd
```
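
PBS names its daily log files by date (`YYYYMMDD`), so the MoM log for a given day can be searched for one job directly. A minimal sketch; `mom_log_for_job` is a hypothetical helper, and the default log directory is the one listed above:

```shell
#!/bin/bash
# mom_log_for_job JOBID [YYYYMMDD] [LOGDIR]
# Print the MoM log lines that mention JOBID on the given day (default: today).
mom_log_for_job() {
    local jobid="$1"
    local day="${2:-$(date +%Y%m%d)}"
    local logdir="${3:-/var/spool/pbs/mom_logs}"
    grep -F -- "$jobid" "$logdir/$day"
}
```

On a worker this would be run as `mom_log_for_job 123.pbsmaster`, where the job ID is only an example; reading the logs may require root.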
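## Successful files
## Summary of running our own installation
On our supercomputer nodes A01 to A04, files are shared over NFS and jobs are scheduled with PBS. Without containers, A02 tells A01, A03, and A04 to run the computation (A02 is the server, the rest are workers). A02 hands the workers an executable, and each worker runs it with its own mpirun. The workers must be able to resolve the executable's path, either by putting it in the NFS shared directory or by exporting PATH.
My guess at how containerized parallel computing works with PBS: each worker runs `module load` for singularity and mpi, opens the SIF file supplied by the client, and then computes on the client's input files (outside the container) with the resources the client requested. The program executable sits inside each worker's own container. ~~In theory, so does MPI.~~
* In practice MPI communicates between the inside and the outside of the container, so the two versions must match.
## [PBS summary cheat sheet](https://truth.bahamut.com.tw/s01/202009/0d896d7b3b965d48d674e65eeccc27cb.JPG)
## nemo.pbs from the previous team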
```shell=
#!/bin/bash
#PBS -N nemo-s12
#PBS -q normal
#PBS -l select=32:ncpus=24:mpiprocs=23:ompthreads=1:mem=96gb
#PBS -l walltime=10:00
#PBS -P 50000042
#PBS -j oe
#PBS -o pbsout.txt
cd $PBS_O_WORKDIR
module purge
module load singularity/3.4.0 boost/1.71.0/intel19/parallel
nn_GYRE=25
ln_bench=true
jpkglo=31
# We set the domain decomposition to 32*23 because
# NEMO does not allow the 32*24 decomposition
jpni=32
jpnj=23
ln_timing=true
node=32
ncpu=23
hostfile=hostfile
SINRUN=`which singularity`
SINPAR="exec --no-home -w --pwd /opt/NEMO/cfgs/gyre_pisces_test/EXP00/"
SANDBOX=nemo
NEMO_BIN_PATH=$SANDBOX/opt/NEMO/cfgs/gyre_pisces_test/EXP00
MPIRUN=`which mpirun`
MPIPAR="-genv FI_TCP_IFACE=mlx5_0:1 -ppn 23"
RESDIR=latest-test
echo "======================================================================================"
echo "nn_GYRE = $nn_GYRE"
echo "ln_bench = $ln_bench"
echo "jpkglo = $jpkglo"
echo "--------------------------------------------"
echo "jpni = $jpni"
echo "jpnj = $jpnj"
echo "ln_timing = $ln_timing"
echo "--------------------------------------------"
echo "node = $node"
echo "ncpu = $ncpu"
echo "hostfile = $hostfile"
echo "--------------------------------------------"
echo "SINRUN = $SINRUN"
echo "SINPAR = $SINPAR"
echo "SANDBOX = $SANDBOX"
echo "--------------------------------------------"
echo "NEMO_BIN_PATH = $NEMO_BIN_PATH"
echo "MPIRUN = $MPIRUN"
echo "MPIPAR = $MPIPAR"
echo "--------------------------------------------"
echo "RESDIR = $RESDIR"
echo "======================================================================================"
# generate the hostfile for mpi
rm -f $hostfile
for (( i=1; i<=$node; i=i+1 ))
do
S[${i}]=`sed -n "$(($i*$ncpu))p" $PBS_NODEFILE`
echo "${S[$i]}" >> $hostfile
done
#modify the configuration to namelist
sed -i "38c \ nn_GYRE = $nn_GYRE ! GYRE resolution [1/degrees]" $NEMO_BIN_PATH/namelist_cfg
sed -i "39c \ ln_bench = .$ln_bench. ! ! =T benchmark with gyre: the gridsize is kept constant" $NEMO_BIN_PATH/namelist_cfg
sed -i "40c \ jpkglo = $jpkglo ! number of model levels" $NEMO_BIN_PATH/namelist_cfg
sed -i "1301c \ jpni = $jpni ! jpni number of processors following i (set automatically if < 1)" $NEMO_BIN_PATH/namelist_ref
sed -i "1302c \ jpnj = $jpnj ! jpnj number of processors following j (set automatically if < 1)" $NEMO_BIN_PATH/namelist_ref
sed -i "1326c \ ln_timing = .$ln_timing. ! timing by routine write out in timing.output file" $NEMO_BIN_PATH/namelist_ref
#run NEMO
time -p $MPIRUN $MPIPAR $SINRUN $SINPAR $SANDBOX ./nemo
#move the output
cd $NEMO_BIN_PATH
mkdir -p $PBS_O_WORKDIR/result/$RESDIR
mv ocean.output communication_report.txt layout.dat output.namelist.* timing.output *.nc $PBS_O_WORKDIR/result/$RESDIR
cp namelist_* $PBS_O_WORKDIR/result/$RESDIR
```
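
The hostfile loop in the script above takes one line out of every `$ncpu` lines of `$PBS_NODEFILE`, which relies on each node appearing exactly `$ncpu` times in a row. Under the weaker assumption that each node's entries are merely contiguous, `uniq` gives the same one-line-per-node result. A sketch with a fabricated node file (the hostnames are made up):

```shell
#!/bin/bash
# Fake node file: 3 nodes, 2 entries each (a real one comes from PBS).
nodefile=$(mktemp)
printf '%s\n' n1 n1 n2 n2 n3 n3 > "$nodefile"

# Collapse consecutive duplicates: one line per node, order preserved.
uniq "$nodefile" > hostfile
cat hostfile
```

Inside a real job, `$PBS_NODEFILE` would replace the scratch file.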
## Interactive job
```
qsub -X -I -l select=1:ncpus=5:ngpus=1 -l walltime=01:00:00 -q dgx -A install -P 50000033
```