
HPL - Fermi_v15

  • CPU: 4 cores
  • RAM: 4 GB
  • GPU: RTX 3060

Prerequisite packages:

sudo apt install build-essential hwloc libhwloc-dev libevent-dev gfortran

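1. Install the NVIDIA driver

1.1 Download the runfile (.run)

Find the driver version that matches your GPU on the NVIDIA website,
then copy the download link and fetch it with wget.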


1.2 Disable nouveau

sudo vim /etc/modprobe.d/blacklist-nouveau.conf

Add the following to the file:

blacklist nouveau
options nouveau modeset=0
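
As a non-interactive alternative to editing the file in vim, the two lines can be written by a small function. This is only a sketch: the function name `write_nouveau_blacklist` is made up for illustration, and it is pointed at a scratch file here so the result can be inspected before touching /etc.

```shell
#!/bin/bash
# Write the two nouveau blacklist lines to the given file.
# The target is a parameter so the function can be tried on a scratch file first.
write_nouveau_blacklist() {
  local target="$1"
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$target"
}

# Dry run against a temporary file; inspect the result before touching /etc.
scratch="$(mktemp)"
write_nouveau_blacklist "$scratch"
cat "$scratch"
```

For the real file, root is required; piping the same two lines through `sudo tee /etc/modprobe.d/blacklist-nouveau.conf` is one way to do it.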

Regenerate the initramfs and reboot:

sudo update-initramfs -u
sudo reboot

Check that nouveau is disabled (no output means the module is not loaded):

lsmod | grep nouveau
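
The same check can be wrapped so that it prints an explicit verdict instead of relying on empty output. A minimal sketch, assuming `lsmod` is available; the helper name `nouveau_loaded` is hypothetical.

```shell
#!/bin/bash
# Succeeds (exit 0) if the module list on stdin contains a nouveau entry.
nouveau_loaded() {
  grep -q '^nouveau[[:space:]]'
}

# Assumes lsmod exists; if it does not, the pipeline is empty and the
# "not loaded" branch is taken.
if lsmod 2>/dev/null | nouveau_loaded; then
  echo "nouveau is still loaded: recheck the blacklist file and the initramfs"
else
  echo "nouveau is not loaded"
fi
```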

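1.3 Run the installer

# Install the headers that match the running kernel
sudo apt-get install linux-headers-$(uname -r)

sudo sh "<downloaded-file>.run" --no-cc-version-check

--no-cc-version-check: skips the compiler version check. Without it, the installer refuses to continue when it detects a GCC version mismatch.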

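The installer offers two kernel module types:

  1. NVIDIA Proprietary

    • Closed source (proprietary)
    • Kernel modules written and optimized by NVIDIA
    • Best performance and stability, widest feature support (e.g. CUDA, Optimus, VDPAU)

    Covered by NVIDIA's proprietary license; the source is not available

  2. MIT/GPL

    • The open-source version (Open Kernel Module, OKM)
    • Licensed under MIT/GPL; NVIDIA has been open-sourcing its kernel modules step by step in recent years
    • Easier to integrate into open-source Linux distributions (e.g. Fedora, Arch)

    Not yet fully equivalent to the proprietary version; on some GPUs features are incomplete or performance is slightly lower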


=> Choose NVIDIA Proprietary


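The installer detects a driver that was installed through apt, which may conflict with the runfile installation. If you are sure the apt driver will not be used, choose the left-hand option to continue installing.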


The installer warns that 32-bit (i386) is not supported.

This has no effect when only 64-bit programs (HPL, CUDA) are used.


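EGL is not installed. When only CUDA/HPL are used, this warning can be ignored.
If EGL is needed later, install the required packages and rerun the .run file.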


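The installer asks whether to run nvidia-xconfig, which automatically edits the X server config file /etc/X11/xorg.conf:

  • After boot, the NVIDIA driver is used as the default display adapter for graphics

No graphical interface is needed here because the machine only runs HPL, so choose No.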


This driver supports CUDA up to 12.8.

2. Install the CUDA toolkit

https://developer.nvidia.com/cuda-toolkit-archive

Both CUDA 12.8.0 and 12.5.0 are tried here.

2.1 Install CUDA


  • Deselect the driver (it was already installed above)


  • 12.8.0 installed successfully


  • Set the environment variables

    export CUDA_HOME=/usr/local/cuda-12.8
    export PATH=$CUDA_HOME/bin:$PATH
    export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
    


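    If the CUDA tools are reported as missing at this point, the toolkit is installed; only the environment variables have not been set yet.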


  • Install the other CUDA toolkit the same way (12.5 here)

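2.2 Use a symlink to switch between versions

A symlink is a symbolic link: it works like a pointer to the real directory.

sudo ln -s /usr/local/cuda-12.5 /usr/local/cuda
ls -l /usr/local/cuda
lrwxrwxrwx 1 root root 20 May 23 20:22 /usr/local/cuda -> /usr/local/cuda-12.5

/usr/local/cuda now stands for /usr/local/cuda-12.5.
The symlink can be repointed with the following commands: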

# Remove the symlink
sudo rm /usr/local/cuda
# Create the new symlink
sudo ln -s /usr/local/cuda-12.8 /usr/local/cuda
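
The remove-then-create pair can also be done in one command with GNU `ln -sfn`. The sketch below shows this in a scratch directory whose subdirectories merely stand in for /usr/local/cuda-12.5 and /usr/local/cuda-12.8, so it can be run without sudo.

```shell
#!/bin/bash
# Switch a "cuda" symlink between two version directories inside a scratch
# directory (stand-ins for /usr/local/cuda-12.5 and /usr/local/cuda-12.8).
root="$(mktemp -d)"
mkdir "$root/cuda-12.5" "$root/cuda-12.8"

ln -s "$root/cuda-12.5" "$root/cuda"      # initial link
# -f replaces the existing link; -n stops ln from following it into the
# old target directory and creating the new link there instead.
ln -sfn "$root/cuda-12.8" "$root/cuda"

readlink "$root/cuda"                     # shows where the link points now
```

On the real system the equivalent would be `sudo ln -sfn /usr/local/cuda-12.8 /usr/local/cuda`.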

Change the earlier ~/.bashrc entries to go through the symlink:

export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

From now on, switching CUDA versions only requires repointing the symlink; the environment variables stay the same.
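
Because ~/.bashrc tends to be sourced repeatedly while switching versions, each `source` prepends the same directories again. One way to keep PATH and LD_LIBRARY_PATH free of duplicates is a small guard; `prepend_once` is a hypothetical helper, not part of any toolkit.

```shell
#!/bin/bash
# Prepend a directory to a colon-separated list only if it is not already there.
# Usage: prepend_once <dir> <current-list>   (prints the new list)
prepend_once() {
  local dir="$1" list="$2"
  case ":$list:" in
    *":$dir:"*) printf '%s\n' "$list" ;;                  # already present
    *)          printf '%s\n' "${dir}${list:+:$list}" ;;  # add in front
  esac
}

export CUDA_HOME=/usr/local/cuda
PATH="$(prepend_once "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

A side effect of this form is that an empty LD_LIBRARY_PATH does not end up with a trailing colon.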

2.3 A script for quick switching

  1. Create a file named cuda-switch in the home directory:

    nano ~/cuda-switch

  2. Put the following script in it:

    #!/bin/bash

    if [ -z "$1" ]; then
      echo "Usage: cuda-switch <version> (e.g., 12.5)"
      exit 1
    fi

    CUDA_VERSION=$1
    CUDA_PATH="/usr/local/cuda-$CUDA_VERSION"

    if [ ! -d "$CUDA_PATH" ]; then
      echo "Error: CUDA version $CUDA_VERSION not found at $CUDA_PATH"
      exit 2
    fi

    # Remove old symlink
    sudo rm -f /usr/local/cuda

    # Create new symlink
    sudo ln -s "$CUDA_PATH" /usr/local/cuda

    echo "Switched /usr/local/cuda to $CUDA_PATH"

    # Optional: suggest user to reload shell
    echo "Tip: Run 'source ~/.bashrc' if CUDA_HOME is set in bashrc."

    Make the script executable:

    chmod +x ~/cuda-switch

    The CUDA version can now be changed by running ~/cuda-switch <version>.

  3. Put the script on PATH

    mkdir -p ~/.local/bin
    mv ~/cuda-switch ~/.local/bin/

    Add the directory to PATH (in ~/.bashrc):

    export PATH=$HOME/.local/bin:$PATH

    The version can now be switched from any directory.
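
A companion helper that lists which versions are available to switch to can be handy when cuda-switch reports that a version was not found. This is a sketch under the assumption that every toolkit lives in a directory named cuda-<version>; the function name `list_cuda_versions` is made up.

```shell
#!/bin/bash
# Print the version suffix of every cuda-<version> directory under a root.
# The root defaults to /usr/local; it is a parameter only so the function
# can be tried against a scratch directory.
list_cuda_versions() {
  local root="${1:-/usr/local}" dir
  for dir in "$root"/cuda-*/; do
    [ -d "$dir" ] || continue        # no match: the glob stays literal, skip it
    dir="${dir%/}"
    printf '%s\n' "${dir##*/cuda-}"
  done
}

list_cuda_versions
```

The plain /usr/local/cuda symlink is not listed because it has no version suffix.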

3. SLURM

Installing Slurm on Ubuntu (with apt)

3.1 Install the Slurm packages on all nodes

3.2 Create the config files on all nodes

  • cgroup.conf (resource isolation settings)
  • slurm.conf (generated with configurator.html)
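
For orientation, a very small cgroup.conf might look like the one written below. This is only an assumed starting point, not the configuration used in this note: `ConstrainCores` and `ConstrainRAMSpace` are standard cgroup.conf options, but which ones are appropriate depends on the cluster. The helper name and the scratch target are illustrative.

```shell
#!/bin/bash
# Write a minimal cgroup.conf to the given path.
write_cgroup_conf() {
  cat > "$1" <<'EOF'
# Confine each job to the CPU cores and memory it was allocated
ConstrainCores=yes
ConstrainRAMSpace=yes
EOF
}

# Written to a scratch file here; the real file belongs in the Slurm
# configuration directory on every node.
scratch="$(mktemp)"
write_cgroup_conf "$scratch"
cat "$scratch"
```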

4. Environment Modules

This also has to be set up across multiple nodes.

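5. Build HPL Fermi v15

Much of what gets installed below goes under /opt, so on a cluster /opt can be shared over NFS.

5.0.1 Share /opt over NFS

In /etc/exports, the entry
/opt mpi-n1(rw,sync,no_subtree_check)
needs to be changed to
/opt mpi-n1(rw,sync,no_subtree_check,no_root_squash)

NFS enables root squash by default: root on the client is mapped to nobody/nogroup, so even root on the client cannot write to the share.

Writing to /opt requires root by default.

To make /opt writable by everyone:

sudo chmod 777 /opt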
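
The exports change described above can be scripted with sed. The sketch below works as a filter on stdin so it can be tested on a sample line before being applied to /etc/exports; `add_no_root_squash` is a hypothetical name, and only the first client entry on each /opt line is modified.

```shell
#!/bin/bash
# Read exports lines on stdin; append no_root_squash to the first option list
# of every /opt export that does not already have it.
add_no_root_squash() {
  sed -E '/^\/opt[[:space:]]/{/no_root_squash/!s/\)/,no_root_squash)/;}'
}

printf '%s\n' '/opt mpi-n1(rw,sync,no_subtree_check)' | add_no_root_squash
# prints: /opt mpi-n1(rw,sync,no_subtree_check,no_root_squash)
```

After /etc/exports has been edited, `sudo exportfs -ra` makes the NFS server re-read it.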

5.1 Install Intel MKL

https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-download.html?operatingsystem=linux&linux-install=offline


wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/dc93af13-2b3f-40c3-a41b-2bc05a707a80/intel-onemkl-2025.1.0.803_offline.sh

sudo sh ./intel-onemkl-2025.1.0.803_offline.sh --cli

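It installs to /opt/intel.

Set the environment variables:

echo "source /opt/intel/oneapi/setvars.sh" >> ~/.bashrc
source ~/.bashrc

(On a cluster, it seems the shared libraries also need to be linked.)

Installation complete. (setvars.sh is the environment setup script that ships with Intel oneAPI.)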

5.2 Install Open MPI (5.0.7)

Download and extract the tarball:

wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.7.tar.gz
tar zxf openmpi-5.0.7.tar.gz

Build it with the install prefix set to /opt/openmpi:

cd openmpi-5.0.7
./configure --prefix=/opt/openmpi
make
sudo make install

The build could be tuned further (not done here).
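
One simple speed-up, not used in this note, is a parallel build with one make job per CPU core. The function below is a sketch of that idea; `build_openmpi` is a made-up wrapper and the example call is left commented out because it takes a while and needs sudo.

```shell
#!/bin/bash
# Configure, build with one make job per CPU core, and install Open MPI.
# Arguments: <source dir> <install prefix>
build_openmpi() {
  local src="$1" prefix="$2"
  (
    cd "$src" || exit 1
    ./configure --prefix="$prefix" &&
      make -j"$(nproc)" &&
      sudo make install
  )
}

# Example call (assumes the tarball was extracted in the home directory):
# build_openmpi ~/openmpi-5.0.7 /opt/openmpi
echo "make would run with $(nproc) parallel jobs"
```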

Set the environment variables:

echo 'export PATH=/opt/openmpi/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
mpirun --version


5.3 Download HPL

After logging in on the NVIDIA developer site:
https://developer.nvidia.com/rdp/assets/cuda-accelerated-linpack-linux64

tar zxf hpl-2.0_FERMI_v15.tgz
cd hpl-2.0_FERMI_v15
vim Make.CUDA

Edit Make.CUDA:

TOPdir = /home/anson/hpl-2.0_FERMI_v15
LAdir = /opt/intel/oneapi/mkl/latest/lib/intel64
LAinc = -I/opt/intel/oneapi/mkl/latest/include

LAlib        = -L $(TOPdir)/src/cuda  -ldgemm -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -L/opt/intel/oneapi/mkl/latest/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -L/opt/intel/oneapi/compiler/2025.1/lib

CC = /opt/openmpi/bin/mpicc
LINKFLAGS = $(CCFLAGS) -Wl,-rpath $(TOPdir)/src/cuda:$(LAdir):/opt/openmpi/lib:/opt/intel/oneapi/compiler/2025.1/lib

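HPL 2.0 calls MPI-1 functions that were removed from the MPI-3.0 standard and that recent Open MPI releases no longer provide, so a few source lines need to be changed.

vim src/comm/HPL_packL.c

Lines 172, 186, 200: MPI_Address → MPI_Get_address

Line 211: MPI_Type_struct → MPI_Type_create_struct

Note the lowercase "a" in MPI_Get_address; MPI_Get_Address is wrong.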
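
The two renames can also be applied with sed instead of editing by hand. This is a sketch assuming GNU sed (for `-i` and `\b`); the helper name `patch_mpi1_calls` and the sample line written to the scratch file are invented for the demonstration and are not taken from the HPL source.

```shell
#!/bin/bash
# Rename the two removed MPI-1 calls to their replacements in one file.
# \b restricts the match to whole identifiers (GNU sed).
patch_mpi1_calls() {
  sed -i -E \
    -e 's/\bMPI_Address\b/MPI_Get_address/g' \
    -e 's/\bMPI_Type_struct\b/MPI_Type_create_struct/g' \
    "$1"
}

# Try it on a scratch file first (made-up sample line)
scratch="$(mktemp)"
printf '%s\n' 'ierr = MPI_Address( buf, &disp );' > "$scratch"
patch_mpi1_calls "$scratch"
cat "$scratch"
```

On the real tree the call would be `patch_mpi1_calls src/comm/HPL_packL.c`, ideally after making a backup copy of the file.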


Add a symbolic link:

sudo ln -s /opt/intel/oneapi/compiler/latest/linux/compiler/lib/intel64_lin/libiomp5.so /opt/intel/oneapi/mkl/latest/lib/intel64/libiomp5.so

Build:

make arch=CUDA

Set up a working HPL.dat:

18000  Ns  # line 6
512    NB  # line 8
1      Ps  # line 11
1      Qs  # line 12
0      U   # line 29

Edit bin/CUDA/run_linpack:

vim bin/CUDA/run_linpack
HPL_DIR=/home/anson/hpl-2.0_FERMI_v15  # line 4
/opt/openmpi/bin/mpirun -np 1 $HPL_DIR/bin/CUDA/xhpl  # line 27

Add the HPL src/cuda directory to LD_LIBRARY_PATH:

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/anson/hpl-2.0_FERMI_v15/src/cuda' >> ~/.bashrc
source ~/.bashrc

Test run:

cd bin/CUDA
./run_linpack

5.4 Build against CUDA 12.5

First make a copy of the original tree:

cd ~
cp -r hpl-2.0_FERMI_v15 hpl-2.0_FERMI_v15_cuda12.5

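Make.CUDA locates CUDA through CUDA_HOME, which was already changed by cuda-switch earlier,
so running

make arch=CUDA

directly should build against the switched version.

It did not.
TOPdir had not been changed, so this only rebuilt the original tree.

Make.CUDA still has to be edited:
TOPdir = /home/anson/hpl-2.0_FERMI_v15
→ TOPdir = /home/anson/hpl-2.0_FERMI_v15_cuda12.5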

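Edit run_linpack so that it switches the version before running:

# Add at the top
cuda-switch 12.5
source ~/.bashrc

Check the CUDA version:

ldd bin/CUDA/xhpl | grep dgemm

Expected output:

libdgemm.so.1 => /home/anson/hpl-2.0_FERMI_v15_cuda12.5/lib/CUDA/libdgemm.so.1 (0x...)

This does not really confirm the CUDA version, although the numbers in the output do differ.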
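
A check that speaks more directly to the CUDA version is to look at which CUDA libraries the binary resolves to, and to follow any symlink in that path. The filter below is a sketch that assumes xhpl links libcudart or libcublas dynamically; the name `cuda_lib_paths` is hypothetical.

```shell
#!/bin/bash
# From ldd output on stdin, print the path of each CUDA runtime or cuBLAS
# library, resolving symlinks such as /usr/local/cuda when the path exists.
cuda_lib_paths() {
  awk '/libcudart|libcublas/ && $2 == "=>" && $3 ~ /^\// { print $3 }' |
    while read -r lib; do
      # Fall back to the unresolved path if readlink cannot resolve it
      readlink -f "$lib" 2>/dev/null || printf '%s\n' "$lib"
    done
}

# Real usage, run from the HPL directory:
#   ldd bin/CUDA/xhpl | cuda_lib_paths
```

If the printed paths contain cuda-12.5 or cuda-12.8 rather than the bare symlink, the version in use at run time is unambiguous.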


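Conclusion: with the symlink it is hard to track which version was used, because everything that gets recorded says /usr/local/cuda. The build should use the real versioned path instead.
(Or is there another way?)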
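
One possible answer, offered only as a sketch: keep the symlink for convenience, but resolve it to the real directory at build time and record that. The scratch directory below stands in for /usr/local, and `resolve_cuda_home` is a made-up helper.

```shell
#!/bin/bash
# Resolve a toolkit symlink to the real versioned directory so that build
# records name the actual version rather than the generic link.
resolve_cuda_home() {
  readlink -f "${1:-/usr/local/cuda}"
}

# Stand-in layout in a scratch directory
root="$(mktemp -d)"
mkdir "$root/cuda-12.5"
ln -s "$root/cuda-12.5" "$root/cuda"

CUDA_HOME="$(resolve_cuda_home "$root/cuda")"
echo "building against $CUDA_HOME"
```

If Make.CUDA does take its CUDA location from CUDA_HOME, as noted in 5.4, exporting the resolved value before running make arch=CUDA would put the versioned path into the build.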

6. Tune HPL.dat

Ns: pick N so that the matrix takes about 80% of memory, i.e. N^2 * 8 bytes = 0.8 * Mem

At the moment Ns = 19000 does not run.
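
The rule of thumb above can be turned into a quick calculation. The sketch below solves N^2 * 8 = fraction * Mem for N and rounds down to a multiple of NB; the function name `hpl_problem_size` is made up, and treating "4 GB" as 4 GiB is an assumption.

```shell
#!/bin/bash
# Estimate the HPL problem size: the largest N with N^2 * 8 bytes <= fraction * memory,
# rounded down to a multiple of the block size NB.
# Arguments: <memory in bytes> <NB> [fraction, default 0.8]
hpl_problem_size() {
  awk -v mem="$1" -v nb="$2" -v frac="${3:-0.8}" 'BEGIN {
    n = int(sqrt(frac * mem / 8))
    print int(n / nb) * nb
  }'
}

# 4 GiB of memory, NB = 512
hpl_problem_size $((4 * 1024 * 1024 * 1024)) 512
```

For 4 GiB and NB = 512 this prints 20480, which is an upper estimate only: on this machine runs already failed at Ns = 19000, so a smaller fraction may be closer to what is usable in practice.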

CUDA 12.8

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10L2L2       18000   512     1     1              52.36              7.426e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0053855 ...... PASSED
================================================================================

2. (This parameter set is the baseline for the comparison with CUDA 12.5)

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10L2L2       17000   512     1     1              45.80              7.153e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0054069 ...... PASSED
================================================================================

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10L2L2       18000   256     1     1              61.76              6.296e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0047616 ...... PASSED
================================================================================

18000  Ns  # line 6
1024    NB  # line 8
1      Ps  # line 11
1      Qs  # line 12
0      U   # line 29
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10L2L2       17000  1024     1     1              45.08              7.266e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0046761 ...... PASSED
================================================================================

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10L2L2       18000   128     1     1              85.30              4.559e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0049418 ...... PASSED
================================================================================

CUDA 12.5

1. CUDA 12.8 looks slightly faster: 71.53 vs. 67.62 Gflops at N = 17000, NB = 512

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10L2L2       17000   512     1     1              48.45              6.762e+01
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0051953 ...... PASSED
================================================================================
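
To compare runs without reading the full result blocks, the key numbers can be pulled out of the HPL output with awk. A minimal sketch, assuming the result lines keep the layout shown above (they start with WR, with N, NB and Gflops in columns 2, 3 and 7); `hpl_summary` is a hypothetical name, and the two sample lines are the N = 17000, NB = 512 results from the CUDA 12.8 and 12.5 runs above.

```shell
#!/bin/bash
# Print "N NB Gflops" for every result line (those starting with WR) in HPL output.
hpl_summary() {
  awk '/^WR/ { printf "%d %d %.2f\n", $2, $3, $7 }'
}

# Real usage: ./run_linpack | hpl_summary
printf '%s\n' \
  'WR10L2L2       17000   512     1     1              45.80              7.153e+01' \
  'WR10L2L2       17000   512     1     1              48.45              6.762e+01' |
  hpl_summary
```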

Profiling HPL