---
# System prepended metadata

title: PBS

---

# PBS
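## [A guaranteed-correct PBS tutorial: one read and you will get it, step by step](https://memes.tw/user-gif/a2cc82a36c9dd92e62a356de9d17901d.gif)
## Introduction
PBS (Portable Batch System) is a job scheduling system for clusters. It currently has three branches:
OpenPBS, PBS Pro, and Torque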

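Basic note: PBS only schedules jobs. The whole environment must already be set up before PBS can dispatch work to the resources of each server, and passwordless SSH between the workers must already be in place.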

PBS architecture, from [PBS 排程及COMET Linux 系統架構](http://web.che.ntu.edu.tw/stlin/freshman/pbsskill)
![](https://i.imgur.com/MzXJ6ah.png)
Another diagram, from [openpbs centos 7 配置安装手把手教程](http://thisis.yorven.site/blog/index.php/2020/12/06/openpbs-install-instructions/)
![](https://i.imgur.com/vKC8p8Z.png)

## openpbs
### download from github

https://github.com/openpbs/openpbs

`git clone https://github.com/openpbs/openpbs`

For CentOS 7
```shell=
## Install the prerequisite packages for building PBS.
yum install -y gcc make rpm-build libtool hwloc-devel \
      libX11-devel libXt-devel libedit-devel libical-devel \
      ncurses-devel perl postgresql-devel postgresql-contrib python3-devel tcl-devel \
      tk-devel swig expat-devel openssl-devel libXext libXft \
      autoconf automake gcc-c++
## Install the prerequisite packages for running PBS. In addition to the commands below, you should also install a text editor of your choosing (vim, emacs, gedit, etc.).

yum install -y expat libedit postgresql-server postgresql-contrib python3 \
      sendmail sudo tcl tk libical
## Enter the source directory
cd openpbs
## Generate the configure script and Makefiles.
./autogen.sh
## Configure the build for your environment. Run ./configure --help to list the available parameters.
./configure --prefix=/opt/pbs
## Build PBS by running "make".
make
## Install PBS. Use sudo to run the command as root.
sudo make install
## Configure PBS by executing the post-install script.
sudo /opt/pbs/libexec/pbs_postinstall
##  Edit /etc/pbs.conf to configure the PBS services that should be started. If you are installing PBS on only one system, you should change the value of PBS_START_MOM from zero to one. If you use vi as your editor, you would run:
sudo vi /etc/pbs.conf
## Some file permissions must be modified to add SUID privilege.
sudo chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp
## Start the PBS services.
sudo /etc/init.d/pbs start
## All configured PBS services should now be running. Update your PATH and MANPATH variables by sourcing the appropriate PBS profile or logging out and back in.
## For Bourne shell (or similar) run the following:
. /etc/profile.d/pbs.sh
```
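For the single-host case described in the comments above, the `PBS_START_MOM` edit can be scripted instead of done by hand in vi. A minimal sketch, assuming the file already contains a `PBS_START_MOM=0` line; `enable_mom` is a made-up helper name, and it is tried on a scratch file here rather than on the real `/etc/pbs.conf`:

```shell
# enable_mom FILE: flip PBS_START_MOM from 0 to 1 in a pbs.conf-style file.
# (hypothetical helper, not part of PBS; uses GNU sed -i as on CentOS)
enable_mom() {
    sed -i 's/^PBS_START_MOM=0$/PBS_START_MOM=1/' "$1"
}

# Try it on a scratch file first; the contents are illustrative only.
conf=$(mktemp)
printf 'PBS_SERVER=myhost\nPBS_START_SERVER=1\nPBS_START_MOM=0\n' > "$conf"
enable_mom "$conf"
grep '^PBS_START_MOM=' "$conf"
```

On the real system the same `sed` command would be run with `sudo` against `/etc/pbs.conf` before starting the service.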
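:::warning
:fire: And then what?
:::
* The steps above build PBS from the GitHub source. I do not know what comes next and could not find it online right away, so this part is left for later.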

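### Installing from RPM
Another option is to install from RPM packages, which are only published for some releases.
* Release list: https://github.com/openpbs/openpbs/tags

The steps below use version 19.1.3.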
```shell=
## prerequisites
yum install -y gcc make rpm-build libtool hwloc-devel \
      libX11-devel libXt-devel libedit-devel libical-devel \
      ncurses-devel perl postgresql-devel postgresql-contrib python3-devel tcl-devel \
      tk-devel swig expat-devel openssl-devel libXext libXft \
      autoconf automake gcc-c++
yum install -y expat libedit postgresql-server postgresql-contrib python3 \
      sendmail sudo tcl tk libical
yum install -y gzip perl-Env perl-Switch
## install
cd /
wget https://github.com/openpbs/openpbs/releases/download/v19.1.3/pbspro_19.1.3.centos_7.zip
unzip pbspro_19.1.3.centos_7
cd pbspro_19.1.3.centos_7
```
Because of [the problem encountered on 4/7](https://hackmd.io/magj1td5RIiqhGLoK2avmw#48), the PostgreSQL version has to be changed.
* Server node
```shell=
rpm -ivh pbspro-server-19.1.3-0.x86_64.rpm
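## The client is also being installed on the server here (avoid this if possible)
rpm -ivh --nodeps pbspro-client-19.1.3-0.x86_64.rpm
vim /etc/pbs.conf
## Usually little needs changing: every PBS_START_* except PBS_START_MOM should be 1

## Because of some problems, the PostgreSQL version has to be changed (see the link above)
## Find the currently installed version
rpm -qa | grep postgresql
## Remove the 9.2.24 packages
## Download the PostgreSQL repository package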
wget https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

## Install the PostgreSQL repository package and EPEL
sudo yum install pgdg-redhat-repo-latest.noarch.rpm epel-release
## Install the chosen PostgreSQL version
sudo yum install postgresql11-server postgresql11-contrib
## Initialize the PostgreSQL database
sudo /usr/pgsql-11/bin/postgresql-11-setup initdb
## Start the PostgreSQL server
sudo systemctl start postgresql-11
sudo systemctl enable postgresql-11
## Add to ~/.bashrc or /etc/profile
export PATH=$PATH:/usr/pgsql-11/bin
## Initialize PBS
/opt/pbs/libexec/pbs_habitat
## Start PBS
/etc/init.d/pbs start
```
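Before starting the service it can help to print the `PBS_START_*` switches and compare them with the role of the host. A small sketch; `pbs_start_flags` is a hypothetical helper name, and the sample file below only illustrates a server-side configuration, it is not copied from a real install:

```shell
# pbs_start_flags FILE: list the PBS_START_* switches in a pbs.conf-style file.
# (hypothetical helper, not part of PBS)
pbs_start_flags() {
    grep '^PBS_START_' "$1" | sort
}

# Illustration on a scratch file shaped like the server described above:
# everything except the MOM is switched on.
conf=$(mktemp)
cat > "$conf" <<'EOF'
PBS_SERVER=pbsmaster
PBS_START_SERVER=1
PBS_START_SCHED=1
PBS_START_COMM=1
PBS_START_MOM=0
PBS_EXEC=/opt/pbs
PBS_HOME=/var/spool/pbs
EOF
pbs_start_flags "$conf"
```

On the server itself the call would be `pbs_start_flags /etc/pbs.conf`.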

* Worker nodes
```shell=
rpm -ivh pbspro-execution-19.1.3-0.x86_64.rpm
vim /etc/pbs.conf
## Replace "CHANGE_THIS_TO_PBS_PRO_SERVER_HOSTNAME" with the server hostname, "pbsmaster".
## Also make sure PBS_START_MOM is 1 and every other PBS_START_* is 0.
vim /var/spool/pbs/mom_priv/config
## Replace "CHANGE_THIS_TO_PBS_PRO_SERVER_HOSTNAME" with the hostname "pbsmaster"
## Start PBS
service pbs start
```
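The two placeholder edits on a worker can also be done non-interactively. A sketch, assuming the server hostname is `pbsmaster` as above; `set_pbs_server` is a hypothetical helper, and the scratch file only imitates the assumed shape of a line in `mom_priv/config`:

```shell
# set_pbs_server HOSTNAME FILE...: replace the install-time placeholder
# with the real server hostname in each given file. (hypothetical helper)
set_pbs_server() {
    server=$1
    shift
    sed -i "s/CHANGE_THIS_TO_PBS_PRO_SERVER_HOSTNAME/$server/g" "$@"
}

# Demonstration on a scratch file.
cfg=$(mktemp)
echo '$clienthost CHANGE_THIS_TO_PBS_PRO_SERVER_HOSTNAME' > "$cfg"
set_pbs_server pbsmaster "$cfg"
cat "$cfg"
```

On a real worker the call would be `set_pbs_server pbsmaster /etc/pbs.conf /var/spool/pbs/mom_priv/config`, run as root.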
* Once the steps above are done, go back to the server node
```shell=
## Add the worker nodes
qmgr -c "create node A01"
qmgr -c "create node A03"
qmgr -c "create node A04"
## Check the nodes
pbsnodes -a
```
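With more than a few workers, the `create node` commands can be generated from a list. The sketch below only prints the commands so they can be reviewed first; `node_cmds` is a made-up helper name and the node names are the ones used above:

```shell
# node_cmds NAME...: print one "create node" command per worker.
# Nothing is executed here; the output is plain text.
node_cmds() {
    for n in "$@"; do
        printf 'qmgr -c "create node %s"\n' "$n"
    done
}

node_cmds A01 A03 A04
```

After checking the output, piping it to `sh` on the server (`node_cmds A01 A03 A04 | sh`) runs the commands, and `pbsnodes -a` confirms the result.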
At this point the service is ready to use.


## PBS commands
[Cluster Leopard 2.0](http://leopard.mcl.math.ncu.edu.tw/usage.xhtml)
[PBS 排程及COMET Linux 系統架構](http://web.che.ntu.edu.tw/stlin/freshman/pbsskill)
[PBS 命令与使用 - Unix/Linux - 一夜孤城](http://www.360doc.com/content/11/0225/09/4182758_95923140.shtml)
[PBS Professional Quick Start Guide](https://www.cerit-sc.cz/media/2107982/tahak-pbs-pro-small.pdf)
[Commonly Used PBS Commands](https://www.nas.nasa.gov/hecc/support/kb/commonly-used-pbs-commands_174.html)
The ones used most often:
* qsub: submit a batch job to the specified queue using a script
    * `qsub -I -q queue_name -l resource_list`: request an interactive job with the given resources
* qstat: display queue information (check job status)
* qdel: delete (cancel) a submitted job
* pbsnodes: show the state of the nodes
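
A typical round trip with these commands is submit, check, and cancel if needed. `qsub` prints the job identifier in the form `sequence_number.server_name`, which can be captured and reused. A sketch; the helper names and the sample identifier `1234.pbsmaster` are placeholders:

```shell
# job_seq ID: strip the server suffix from a job identifier such as
# "1234.pbsmaster", leaving only the sequence number.
job_seq() {
    printf '%s\n' "${1%%.*}"
}

# submit_and_show SCRIPT: submit a job script, then show its status.
# Only defined here; calling it needs a working PBS installation.
submit_and_show() {
    jobid=$(qsub "$1") || return 1
    echo "submitted $jobid (sequence $(job_seq "$jobid"))"
    qstat "$jobid"
    # To cancel the job later: qdel "$jobid"
}

job_seq 1234.pbsmaster
```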

## PBS script
Taken from the NCHC Taiwania 1 user manual (Chinese edition)
```
# Resource type and amount
#PBS -l <resource name>=<value>
# Job name (optional)
#PBS -N <job name>
# Queue name
#PBS -q <destination queue>
# Project name
#PBS -P <project name>
# Merge stderr and stdout (optional); "eo" merges into stderr, "oe" into stdout
#PBS -j eo
```
Examples:
```
# Serial job (1 core)
#PBS -l select=1:ncpus=1
# MPI job (2 nodes, 8 processes per node)
#PBS -l select=2:ncpus=8:mpiprocs=8
# Hybrid MPI + OpenMP job (2 MPI ranks, 16 threads in total)
#PBS -l select=2:ncpus=8:mpiprocs=1:ompthreads=8
# MPI + CUDA job (4 GPUs per node)
#PBS -l select=2:ncpus=40:ngpus=4:mpiprocs=4
# Walltime limit of 1 hour
#PBS -l walltime=1:00:00
```
PBS_O_WORKDIR
    The absolute path of the current working directory of the qsub utility process.
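
Putting the directives together, a minimal serial job script might look like the one written out below. This is a sketch under assumptions: the queue name `normal` is borrowed from the `nemo.pbs` example in this note, `-P` is left out because project codes are site-specific, and the file name `hello.pbs` is arbitrary. The `${PBS_O_WORKDIR:-$PWD}` fallback is only there so the script can also be run by hand outside PBS.

```shell
# Write a minimal job script to hello.pbs.
cat > hello.pbs <<'EOF'
#!/bin/bash
#PBS -N hello-pbs
#PBS -q normal
#PBS -l select=1:ncpus=1
#PBS -l walltime=00:05:00
#PBS -j oe

# Go back to the directory qsub was run from.
cd "${PBS_O_WORKDIR:-$PWD}"

echo "job ${PBS_JOBID:-none} running on $(uname -n)"
EOF

# Submit with: qsub hello.pbs
# Running it directly is a quick check that needs no PBS installation:
bash hello.pbs
```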

## When something goes wrong
`tracejob [job ID]`
prints the log for that job, as shown below
![](https://i.imgur.com/Q5OYKyQ.png)
Alternatively, use
`qstat -as job_id`

* On the worker, the MOM logs under the PBS directory record errors that occurred during the computation
```
cd /var/spool/pbs/mom_logs
```
* Output files that could not be delivered can be found in
```
/var/spool/pbs/undelivered
```
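When PBS transfers files with scp, the user needs passwordless SSH, and the target directory must be readable and writable from the client side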
```
chmod 777 /pwd
```
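To avoid scrolling through a whole MOM log, the newest log file can be filtered by job ID. PBS names its daily log files by date (`YYYYMMDD`), so the last name in sorted order is the most recent. A sketch; `mom_log_grep` is a hypothetical helper, and the log lines in the demonstration are fabricated:

```shell
# mom_log_grep JOBID [LOGDIR]: print lines mentioning JOBID from the
# newest MOM log file in LOGDIR (default: /var/spool/pbs/mom_logs).
mom_log_grep() {
    dir=${2:-/var/spool/pbs/mom_logs}
    latest=$(ls "$dir" | sort | tail -n 1)
    [ -n "$latest" ] || return 1
    grep -F -- "$1" "$dir/$latest"
}

# Demonstration on a scratch directory with made-up log content.
logs=$(mktemp -d)
echo 'old entry for 41.pbsmaster' > "$logs/20210406"
echo 'sample entry for 42.pbsmaster' > "$logs/20210407"
mom_log_grep 42.pbsmaster "$logs"
```

On a worker the call would simply be `mom_log_grep <job ID>`, typically as root since the log directory may not be world-readable.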
## Successful files
   


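## Summary of a self-installed setup
On our supercomputer nodes A01 to A04, files are shared through NFS and jobs are scheduled by PBS. Without containers, A02 tells A01, A03, and A04 to do the computation (A02 is the server, the others are workers). A02 provides the executable, and each worker runs it with its own mpirun. The workers must be able to reach the executable's path, either by placing it in the NFS shared folder or by using export PATH.

My guess at how containerized parallel computation works with PBS: each worker runs module load for singularity and mpi, opens the SIF file provided by the client, and then runs the computation on the client's input files (outside the container) with the resources the client requested. The executable lives inside each worker's own container ~~in theory MPI does too~~
* In practice MPI communicates between the inside and the outside of the container, so the versions must match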

## [PBS summary cheat sheet](https://truth.bahamut.com.tw/s01/202009/0d896d7b3b965d48d674e65eeccc27cb.JPG)

## nemo.pbs from the previous team
```shell=
#!/bin/bash
#PBS -N nemo-s12
#PBS -q normal
#PBS -l select=32:ncpus=24:mpiprocs=23:ompthreads=1:mem=96gb
#PBS -l walltime=10:00
#PBS -P 50000042
#PBS -j oe
#PBS -o pbsout.txt

cd $PBS_O_WORKDIR

module purge
module load singularity/3.4.0 boost/1.71.0/intel19/parallel

nn_GYRE=25
ln_bench=true
jpkglo=31

# We set the domain decomposition to 32*23 because
# NEMO does not allow the 32*24 decomposition
jpni=32
jpnj=23
ln_timing=true

node=32
ncpu=23
hostfile=hostfile

SINRUN=`which singularity`
SINPAR="exec --no-home -w --pwd /opt/NEMO/cfgs/gyre_pisces_test/EXP00/"
SANDBOX=nemo

NEMO_BIN_PATH=$SANDBOX/opt/NEMO/cfgs/gyre_pisces_test/EXP00
MPIRUN=`which mpirun`
MPIPAR="-genv FI_TCP_IFACE=mlx5_0:1 -ppn 23"

RESDIR=latest-test

echo "======================================================================================"
echo "nn_GYRE = $nn_GYRE"
echo "ln_bench = $ln_bench"
echo "jpkglo = $jpkglo"
echo "--------------------------------------------"
echo "jpni = $jpni"
echo "jpnj = $jpnj"
echo "ln_timing = $ln_timing"
echo "--------------------------------------------"
echo "node = $node"
echo "ncpu = $ncpu"
echo "hostfile = $hostfile"
echo "--------------------------------------------"
echo "SINRUN = $SINRUN"
echo "SINPAR = $SINPAR"
echo "SANDBOX = $SANDBOX"
echo "--------------------------------------------"
echo "NEMO_BIN_PATH = $NEMO_BIN_PATH"
echo "MPIRUN = $MPIRUN"
echo "MPIPAR = $MPIPAR"
echo "--------------------------------------------"
echo "RESDIR = $RESDIR"
echo "======================================================================================"

# generate the hostfile for mpi
rm -f $hostfile
for (( i=1; i<=$node; i=i+1 ))
do
        S[${i}]=`sed -n "$(($i*$ncpu))p" $PBS_NODEFILE`
        echo "${S[$i]}" >> $hostfile
done

#modify the configuration to namelist
sed -i "38c \  nn_GYRE     =    $nn_GYRE     !  GYRE resolution [1/degrees]" $NEMO_BIN_PATH/namelist_cfg
sed -i "39c \  ln_bench    = .$ln_bench.   !  ! =T benchmark with gyre: the gridsize is kept constant" $NEMO_BIN_PATH/namelist_cfg
sed -i "40c \  jpkglo      =    $jpkglo     !  number of model levels" $NEMO_BIN_PATH/namelist_cfg
sed -i "1301c \   jpni        =   $jpni       !  jpni   number of processors following i (set automatically if < 1)" $NEMO_BIN_PATH/namelist_ref
sed -i "1302c \   jpnj        =   $jpnj       !  jpnj   number of processors following j (set automatically if < 1)" $NEMO_BIN_PATH/namelist_ref
sed -i "1326c \   ln_timing   = .$ln_timing.   !  timing by routine write out in timing.output file" $NEMO_BIN_PATH/namelist_ref

#run NEMO
time -p $MPIRUN $MPIPAR $SINRUN $SINPAR $SANDBOX ./nemo

#move the output
cd $NEMO_BIN_PATH
mkdir -p $PBS_O_WORKDIR/result/$RESDIR
mv ocean.output communication_report.txt layout.dat output.namelist.* timing.output *.nc $PBS_O_WORKDIR/result/$RESDIR
cp namelist_* $PBS_O_WORKDIR/result/$RESDIR
```
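
The hostfile loop in the script relies on `$PBS_NODEFILE` listing each host once per MPI process (23 lines per node in this job), so taking every 23rd line yields one line per node. The sketch below reproduces the idea on a fabricated node file with 2 nodes and 3 processes each; the host names `n01` and `n02` are invented, and the loop is rewritten in plain POSIX shell:

```shell
# Fabricated stand-in for $PBS_NODEFILE: 2 nodes x 3 MPI processes.
nodefile=$(mktemp)
printf '%s\n' n01 n01 n01 n02 n02 n02 > "$nodefile"

node=2
ncpu=3
hostfile=$(mktemp)

# Same idea as the loop in nemo.pbs: take line ncpu, 2*ncpu, ...
i=1
while [ "$i" -le "$node" ]; do
    sed -n "$((i * ncpu))p" "$nodefile" >> "$hostfile"
    i=$((i + 1))
done

cat "$hostfile"
```

When identical hosts are listed on adjacent lines, as in this sample, `uniq "$nodefile"` gives the same list without needing `node` and `ncpu`.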

## Interactive job
```
qsub -X -I -l select=1:ncpus=5:ngpus=1 -l walltime=01:00:00 -q dgx -A install -P 50000033 
```