# PBS

## [A guaranteed-correct, hand-holding PBS tutorial you'll get at first glance](https://memes.tw/user-gif/a2cc82a36c9dd92e62a356de9d17901d.gif)

## Introduction

PBS is a cluster job scheduling system. It currently has three variants: OpenPBS, PBS Pro, and Torque.

Basics: PBS only schedules jobs. The whole environment must already be set up before PBS can dispatch work to each server's resources. For example, passwordless SSH between the workers must already be in place.

PBS architecture, diagram from [PBS Scheduling and the COMET Linux System Architecture](http://web.che.ntu.edu.tw/stlin/freshman/pbsskill)

Another diagram from [OpenPBS on CentOS 7: Step-by-Step Configuration and Installation Tutorial](http://thisis.yorven.site/blog/index.php/2020/12/06/openpbs-install-instructions/)

## openpbs

### download from github

https://github.com/openpbs/openpbs

`git clone https://github.com/openpbs/openpbs`

For CentOS 7:

```shell=
## Install the prerequisite packages for building PBS.
yum install -y gcc make rpm-build libtool hwloc-devel \
  libX11-devel libXt-devel libedit-devel libical-devel \
  ncurses-devel perl postgresql-devel postgresql-contrib python3-devel tcl-devel \
  tk-devel swig expat-devel openssl-devel libXext libXft \
  autoconf automake gcc-c++
## Install the prerequisite packages for running PBS. In addition to the commands below, you should also install a text editor of your choosing (vim, emacs, gedit, etc.).
yum install -y expat libedit postgresql-server postgresql-contrib python3 \
  sendmail sudo tcl tk libical
## Enter the source directory
cd openpbs
## Generate the configure script and Makefiles.
./autogen.sh
## Configure the build for your environment. You may use the parameters displayed by ./configure --help.
./configure --prefix=/opt/pbs
## Build PBS by running "make".
make
## Install PBS. Use sudo to run the command as root.
sudo make install
## Configure PBS by executing the post-install script.
sudo /opt/pbs/libexec/pbs_postinstall
## Edit /etc/pbs.conf to configure the PBS services that should be started. If you are installing PBS on only one system, change the value of PBS_START_MOM from zero to one. If you use vi as your editor, you would run:
sudo vi /etc/pbs.conf
## Some file permissions must be modified to add SUID privilege.
sudo chmod 4755 /opt/pbs/sbin/pbs_iff /opt/pbs/sbin/pbs_rcp
## Start the PBS services.
sudo /etc/init.d/pbs start
## All configured PBS services should now be running. Update your PATH and MANPATH variables by sourcing the appropriate PBS profile or by logging out and back in.
## For Bourne shell (or similar) run the following:
. /etc/profile.d/pbs.sh
```

:::warning
:fire: And then what?
:::

* The steps above build PBS from the GitHub source. I'm not sure how to continue from here and couldn't find instructions online right away, so this is left for later.

### Install from RPM

Another approach is to install from prebuilt RPMs, which are only provided for some versions.

* Version list: https://github.com/openpbs/openpbs/tags

The following uses version 19.1.3.

```shell=
## Prerequisites
yum install -y gcc make rpm-build libtool hwloc-devel \
  libX11-devel libXt-devel libedit-devel libical-devel \
  ncurses-devel perl postgresql-devel postgresql-contrib python3-devel tcl-devel \
  tk-devel swig expat-devel openssl-devel libXext libXft \
  autoconf automake gcc-c++
yum install -y expat libedit postgresql-server postgresql-contrib python3 \
  sendmail sudo tcl tk libical
yum install -y gzip perl-Env perl-Switch
## Install
cd /
wget https://github.com/openpbs/openpbs/releases/download/v19.1.3/pbspro_19.1.3.centos_7.zip
unzip pbspro_19.1.3.centos_7.zip
cd pbspro_19.1.3.centos_7
```

Because of [the problem we hit on 4/7](https://hackmd.io/magj1td5RIiqhGLoK2avmw#48), the PostgreSQL version has to be changed.

* On the server

```shell=
rpm -ivh pbspro-server-19.1.3-0.x86_64.rpm
## The client is also installed on the server here (avoid doing this if you can)
rpm -ivh --nodeps pbspro-client-19.1.3-0.x86_64.rpm
vim /etc/pbs.conf
## Usually nothing needs changing; just check that every PBS_START_* except PBS_START_MOM is 1
## Because of the problem above, change the PostgreSQL version (see the link above)
## Find the installed version
rpm -qa | grep postgresql
## Remove the 9.2.24 packages
## Download the PostgreSQL repository package
wget https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
## Install the PostgreSQL repository package and EPEL
sudo yum install pgdg-redhat-repo-latest.noarch.rpm epel-release
## Install the specific PostgreSQL version
sudo yum install postgresql11-server postgresql11-contrib
## Initialize the PostgreSQL database
sudo /usr/pgsql-11/bin/postgresql-11-setup initdb
## Start the PostgreSQL server and enable it at boot
sudo systemctl start postgresql-11
sudo systemctl enable postgresql-11
## Add this line to ~/.bashrc or /etc/profile
export PATH=$PATH:/usr/pgsql-11/bin
## Initialize PBS
/opt/pbs/libexec/pbs_habitat
## Start PBS
/etc/init.d/pbs start
```

* On each worker

```shell=
rpm -ivh pbspro-execution-19.1.3-0.x86_64.rpm
vim /etc/pbs.conf
## Replace "CHANGE_THIS_TO_PBS_PRO_SERVER_HOSTNAME" with the server's hostname, here "pbsmaster".
## Also make sure PBS_START_MOM is 1 and every other PBS_START_* is 0.
vim /var/spool/pbs/mom_priv/config
## Replace "CHANGE_THIS_TO_PBS_PRO_SERVER_HOSTNAME" with the hostname "pbsmaster" here as well.
## Start PBS
service pbs start
```

* After finishing the steps above, go back to the server

```shell=
## Add the worker nodes
qmgr -c "create node A01"
qmgr -c "create node A03"
qmgr -c "create node A04"
## Check the nodes
pbsnodes -a
```

At this point the service is ready to use.

## PBS commands

[Cluster Leopard 2.0](http://leopard.mcl.math.ncu.edu.tw/usage.xhtml)
[PBS Scheduling and the COMET Linux System Architecture](http://web.che.ntu.edu.tw/stlin/freshman/pbsskill)
[PBS Commands and Usage - Unix/Linux - 一夜孤城](http://www.360doc.com/content/11/0225/09/4182758_95923140.shtml)
[PBS Professional Quick Start Guide](https://www.cerit-sc.cz/media/2107982/tahak-pbs-pro-small.pdf)
[Commonly Used PBS Commands](https://www.nas.nasa.gov/hecc/support/kb/commonly-used-pbs-commands_174.html)

The ones we mainly use:

* qsub: submit a batch job script to the specified queue
    * `qsub -I -q queue_name -l resource_list`: start an interactive job with the requested resources
* qstat: display queue and job status
* qdel: delete (cancel) a submitted job
* pbsnodes: show node status

## PBS script

From the NCHC Taiwania 1 user manual (Chinese edition):

```
# Resource type and amount
#PBS -l <resource name>=<value>
# Job name (optional)
#PBS -N <job name>
# Queue name
#PBS -q <destination queue>
# Project name
#PBS -P <project name>
# Merge stdout into stderr (optional; -j oe merges stderr into stdout instead)
#PBS -j eo
```

EX:

```
# Serial job (1 core)
#PBS -l select=1:ncpus=1
# MPI job (2 nodes, 8 cores and 8 MPI processes per node)
#PBS -l select=2:ncpus=8:mpiprocs=8
# MPI + OpenMP job (2 MPI processes, 16 threads in total)
#PBS -l select=2:ncpus=8:mpiprocs=1:ompthreads=8
# MPI + CUDA job (4 GPUs per node)
#PBS -l select=2:ncpus=40:ngpus=4:mpiprocs=4
# Walltime of 1 hour
#PBS -l walltime=1:00:00
```

`PBS_O_WORKDIR`: the absolute path of the current working directory of the `qsub` process, i.e. the directory the job was submitted from.
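The `select` arithmetic in the examples above is easy to get wrong: every chunk receives `ncpus` cores and `mpiprocs` MPI processes, so the job's totals are the chunk count times those values. The helper below is a minimal sketch, not a PBS command; `select_summary` is a hypothetical name, it takes the text after `select=` for a single chunk type (no `+`), and it treats a missing `ncpus` or `mpiprocs` as 1, which is an assumption you should check against your PBS version's documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PBS): summarize a single-type "select" spec.
# Assumption: spec looks like "N:key=value:..." with no "+" chunk mixing,
# and a missing ncpus or mpiprocs counts as 1 in this sketch.
select_summary() {
  local spec="$1"
  local chunks="${spec%%:*}"            # leading number = chunk count
  local ncpus=1 mpiprocs=1 ompthreads="unset"
  local -a parts=()
  local kv
  # Split the key=value fields that follow the chunk count.
  IFS=':' read -ra parts <<< "${spec#*:}"
  for kv in "${parts[@]}"; do
    case "$kv" in
      ncpus=*)      ncpus="${kv#ncpus=}" ;;
      mpiprocs=*)   mpiprocs="${kv#mpiprocs=}" ;;
      ompthreads=*) ompthreads="${kv#ompthreads=}" ;;
    esac                                # other keys (ngpus, mem, ...) are ignored
  done
  # Each chunk gets ncpus cores and mpiprocs MPI processes.
  echo "chunks=$chunks cpus=$((chunks * ncpus)) mpi_ranks=$((chunks * mpiprocs)) ompthreads=$ompthreads"
}

# The MPI + OpenMP example from above:
select_summary "2:ncpus=8:mpiprocs=1:ompthreads=8"
# prints: chunks=2 cpus=16 mpi_ranks=2 ompthreads=8
```

For the MPI + OpenMP case this makes the "2 MPI processes, 16 threads in total" note explicit: 2 ranks, each running 8 OpenMP threads on its chunk's 8 cores.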
## Troubleshooting

`tracejob [job ID]` shows the job's log.

Alternatively, use `qstat -as job_id`.

* Errors that occur during computation can be found in the MOM logs on the worker:

```
cd /var/spool/pbs/mom_logs
```

* Output files that could not be delivered are kept in:

```
/var/spool/pbs/undelivered
```

When PBS copies files back with scp, the user needs passwordless SSH, and the target directory must be readable and writable by the client:

```
chmod 777 /pwd
```

## Successful output files

## Summary: running it ourselves

On our supercomputer cluster (A01 to A04), file sharing uses NFS and scheduling uses PBS.

Without containers, A02 dispatches the computation to A01, A03, and A04 (A02 is the server; the others are workers).

A02 provides the executable, and each worker runs it with its own mpirun. The other workers must be able to reach the executable's path, either by placing it in the NFS-shared directory or via `export PATH`.

Our guess at how containerized parallel computing works with PBS: each worker runs `module load` for singularity and MPI, opens the SIF file provided by the client, and then runs the computation on the client's input files (outside the container) with the resources the client requested. The executable lives inside each worker's own container. ~~In theory, so does MPI.~~

* In fact, the MPI inside and outside the container communicate with each other, so their versions must match.

## [PBS summary cheat sheet](https://truth.bahamut.com.tw/s01/202009/0d896d7b3b965d48d674e65eeccc27cb.JPG)

## nemo.pbs from last year's team

```shell=
#!/bin/bash
#PBS -N nemo-s12
#PBS -q normal
#PBS -l select=32:ncpus=24:mpiprocs=23:ompthreads=1:mem=96gb
#PBS -l walltime=10:00
#PBS -P 50000042
#PBS -j oe
#PBS -o pbsout.txt

cd $PBS_O_WORKDIR
module purge
module load singularity/3.4.0 boost/1.71.0/intel19/parallel

nn_GYRE=25
ln_bench=true
jpkglo=31
# We set the domain decomposition to 32*23 because
# NEMO does not allow the 32*24 decomposition
jpni=32
jpnj=23
ln_timing=true
node=32
ncpu=23
hostfile=hostfile

SINRUN=`which singularity`
SINPAR="exec --no-home -w --pwd /opt/NEMO/cfgs/gyre_pisces_test/EXP00/"
SANDBOX=nemo
NEMO_BIN_PATH=$SANDBOX/opt/NEMO/cfgs/gyre_pisces_test/EXP00
MPIRUN=`which mpirun`
MPIPAR="-genv FI_TCP_IFACE=mlx5_0:1 -ppn 23"
RESDIR=latest-test

echo "======================================================================================"
echo "nn_GYRE = $nn_GYRE"
echo "ln_bench = $ln_bench"
echo "jpkglo = $jpkglo"
echo "--------------------------------------------"
echo "jpni = $jpni"
echo "jpnj = $jpnj"
echo "ln_timing = $ln_timing"
echo "--------------------------------------------"
echo "node = $node"
echo "ncpu = $ncpu"
echo "hostfile = $hostfile"
echo "--------------------------------------------"
echo "SINRUN = $SINRUN"
echo "SINPAR = $SINPAR"
echo "SANDBOX = $SANDBOX"
echo "--------------------------------------------"
echo "NEMO_BIN_PATH = $NEMO_BIN_PATH"
echo "MPIRUN = $MPIRUN"
echo "MPIPAR = $MPIPAR"
echo "--------------------------------------------"
echo "RESDIR = $RESDIR"
echo "======================================================================================"

# generate the hostfile for mpi
rm -f $hostfile
for (( i=1; i<=$node; i=i+1 ))
do
    S[${i}]=`sed -n "$(($i*$ncpu))p" $PBS_NODEFILE`
    echo "${S[$i]}" >> $hostfile
done

# modify the configuration in the namelists
sed -i "38c \ nn_GYRE = $nn_GYRE ! GYRE resolution [1/degrees]" $NEMO_BIN_PATH/namelist_cfg
sed -i "39c \ ln_bench = .$ln_bench. ! ! =T benchmark with gyre: the gridsize is kept constant" $NEMO_BIN_PATH/namelist_cfg
sed -i "40c \ jpkglo = $jpkglo ! number of model levels" $NEMO_BIN_PATH/namelist_cfg
sed -i "1301c \ jpni = $jpni ! jpni number of processors following i (set automatically if < 1)" $NEMO_BIN_PATH/namelist_ref
sed -i "1302c \ jpnj = $jpnj ! jpnj number of processors following j (set automatically if < 1)" $NEMO_BIN_PATH/namelist_ref
sed -i "1326c \ ln_timing = .$ln_timing. ! timing by routine write out in timing.output file" $NEMO_BIN_PATH/namelist_ref

# run NEMO
time -p $MPIRUN $MPIPAR $SINRUN $SINPAR $SANDBOX ./nemo

# move the output
cd $NEMO_BIN_PATH
mkdir -p $PBS_O_WORKDIR/result/$RESDIR
mv ocean.output communication_report.txt layout.dat output.namelist.* timing.output *.nc $PBS_O_WORKDIR/result/$RESDIR
cp namelist_* $PBS_O_WORKDIR/result/$RESDIR
```

## Interactive job

```
qsub -X -I -l select=1:ncpus=5:ngpus=1 -l walltime=01:00:00 -q dgx -A install -P 50000033
```
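The hostfile loop in nemo.pbs takes line `i*ncpu` of `$PBS_NODEFILE` for node `i`, which only works because each node is listed exactly `ncpu` (= `mpiprocs` = 23) times in a row. A minimal alternative sketch, assuming the MPI launcher only needs each host once and in order, keeps the first occurrence of every line with `awk`, so it does not depend on the repeat count. The helper name `make_hostfile` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one line per distinct host, in first-seen order,
# from a nodefile in which each host may appear several times.
make_hostfile() {
  local nodefile="$1" out="$2"
  # awk prints a line only the first time that exact line is seen.
  awk '!seen[$0]++' "$nodefile" > "$out"
}

# Demo with a fake nodefile (host names as on our A01..A04 cluster).
demo_nodes=$(mktemp)
demo_hosts=$(mktemp)
printf '%s\n' A01 A01 A03 A03 A04 A04 > "$demo_nodes"
make_hostfile "$demo_nodes" "$demo_hosts"
cat "$demo_hosts"    # prints A01, A03 and A04, one per line
rm -f "$demo_nodes" "$demo_hosts"
```

Inside a job script it would replace the `for` loop as `make_hostfile "$PBS_NODEFILE" hostfile`.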