HPL Notes

Environment: RTX 3060, Ubuntu 24.04, 4 GB memory, 2 cores

Disable the Nouveau driver
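
The note does not give commands for this step. One common approach on Ubuntu is to blacklist the kernel module and rebuild the initramfs. The sketch below assumes that approach; the helper name `nouveau_blacklist_conf` and the file name `blacklist-nouveau.conf` are illustrative choices, not something taken from the original note.

```shell
# Hypothetical helper: print the modprobe configuration that blacklists Nouveau.
nouveau_blacklist_conf() {
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Preview the configuration (safe, no root needed).
nouveau_blacklist_conf

# To apply it (requires root), uncomment the lines below, then reboot:
# nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
# sudo update-initramfs -u
# lsmod | grep nouveau   # after the reboot: no output means Nouveau is not loaded
```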

Install the NVIDIA driver

First install gcc and build-essential
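
A minimal sketch of this step, assuming the packages come from the standard Ubuntu repositories. The helper `missing_tools` is hypothetical and only reports which build tools are not yet on PATH.

```shell
# Hypothetical helper: print each named command that is NOT found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Install step (requires root), left commented so the sketch is safe to paste:
# sudo apt update
# sudo apt install -y gcc build-essential

# List which build tools are still missing; empty output means all were found.
missing_tools gcc g++ make
```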

Look up the recommended driver

ubuntu-drivers devices

Then install the recommended driver. This note uses nvidia-driver-535

sudo apt install nvidia-driver-535

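Download CUDA

Install the CUDA Toolkit from the runfile. The runfile can install both CUDA and the NVIDIA driver, and it lets you choose which components to install. Usually only CUDA is selected here and the driver is installed separately.

This note uses CUDA Toolkit 12.8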

wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda_12.8.1_570.124.06_linux.run
sudo sh cuda_12.8.1_570.124.06_linux.run
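
The runfile can also be run without the interactive menu. The sketch below assumes the `--silent` and `--toolkit` options described in NVIDIA's CUDA installation guide for Linux, which install only the toolkit and leave the driver alone. The variable names are illustrative.

```shell
# Runfile name taken from the download step above.
RUNFILE=cuda_12.8.1_570.124.06_linux.run

# --silent skips the interactive menu; --toolkit selects only the CUDA Toolkit,
# so the separately installed driver is not replaced.
INSTALL_CMD="sudo sh $RUNFILE --silent --toolkit"

# Print the command for review; run it with: eval "$INSTALL_CMD"
echo "$INSTALL_CMD"
```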

Set the environment variables

echo 'export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc

Check that the installation succeeded

nvcc --version
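
If `nvcc` is not found, the usual cause is that the PATH change has not taken effect in the current shell. A small check, assuming the install prefix `/usr/local/cuda-12.8` used above; the helper `path_has` is hypothetical.

```shell
# Hypothetical helper: succeed if directory $1 is an entry of PATH.
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_has /usr/local/cuda-12.8/bin; then
  echo "CUDA bin directory is on PATH"
else
  echo "CUDA bin directory is missing from PATH; re-check ~/.bashrc"
fi

# nvidia-smi   # separately confirms that the driver is loaded and sees the GPU
```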

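Download Linpack

Link : https://developer.nvidia.com/rdp/assets/cuda-accelerated-linpack-linux64
From the link above, sign in as a registered CUDA developer and download the Linpack for Linux64 package. The version downloaded here is hpl-2.0_FERMI_v15.tgz.

Download Intel® oneAPI HPC Toolkit

link : https://software.intel.com/en-us/qualify-for-free-software
Use the search box in the upper-right corner to search for HPC Toolkit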


Select the first result

Choose the Linux Offline Installer, then click Continue as a Guest in the lower-right corner

After the download completes, run the installer

sh intel-oneapi-hpc-toolkit-2025.1.0.666_offline.sh 

Next, set the environment variables

cd /home/hpc/intel/oneapi
source setvars.sh
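
To confirm that the oneAPI environment was actually loaded, a guarded version of this step can be used. This is a sketch that assumes the install location shown above and that `setvars.sh` exports `ONEAPI_ROOT`; `ONEAPI_DIR` and `ONEAPI_STATUS` are illustrative names.

```shell
# Install location used in this note; adjust it if yours differs.
ONEAPI_DIR=/home/hpc/intel/oneapi

if [ -f "$ONEAPI_DIR/setvars.sh" ]; then
  # setvars.sh must be sourced (not executed) so its exports stay in this shell.
  . "$ONEAPI_DIR/setvars.sh"
  ONEAPI_STATUS="loaded (ONEAPI_ROOT=${ONEAPI_ROOT:-unset})"
else
  ONEAPI_STATUS="setvars.sh not found under $ONEAPI_DIR"
fi
echo "$ONEAPI_STATUS"
```

The variables only last for the current shell session, so the script has to be sourced again in every new terminal unless it is added to ~/.bashrc.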

Install MPICH

wget http://www.mpich.org/static/downloads/3.2.1/mpich-3.2.1.tar.gz

tar zxvf mpich-3.2.1.tar.gz
cd mpich-3.2.1

./configure --prefix=/home/<username>/mpich --disable-fortran

make
make install
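
The build can be sped up a little by running one make job per core. A minimal sketch, assuming GNU coreutils `nproc` is available (it is on Ubuntu); `JOBS` is an illustrative variable name.

```shell
# One parallel build job per available core (2 on the machine in this note).
JOBS="$(nproc)"
echo "building with $JOBS jobs"

# Run inside the mpich-3.2.1 source directory after ./configure:
# make -j"$JOBS" && make install

# After installation, the MPICH build can be inspected with:
# /home/<username>/mpich/bin/mpichversion
```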

Open the environment variables file

sudo vim /etc/environment

Add the following

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/cuda-12.8/bin:/home/<username>/mpich/bin"

Install Open MPI

The latest version at the time of writing is used here

wget https://download.open-mpi.org/release/open-mpi/v5.0/openmpi-5.0.7.tar.gz

Extract it and configure the build

tar zxvf openmpi-5.0.7.tar.gz
cd openmpi-5.0.7
./configure --prefix=/opt/openmpi

Build and install

make
sudo make install

Configure the environment variables

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/cuda-12.8/bin:/home/<username>/mpich/bin"
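
The PATH line above does not include the Open MPI install prefix. If Open MPI should be the MPI used in the current shell, one option is to prepend its directories in the same style as the CUDA exports earlier. This is a sketch assuming the `/opt/openmpi` prefix from the configure step; `OMPI_PREFIX` is an illustrative variable name.

```shell
# Prefix passed to ./configure --prefix for Open MPI.
OMPI_PREFIX=/opt/openmpi

# Prepend the Open MPI binaries and libraries for the current shell session.
export PATH="$OMPI_PREFIX/bin${PATH:+:${PATH}}"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Show which mpicc would be picked up, if any is installed.
command -v mpicc || echo "mpicc not found on PATH"
```

MPICH and Open MPI both provide `mpicc` and `mpirun`, so whichever directory comes first in PATH decides which implementation is used.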

Update the ldconfig configuration

cd /etc/ld.so.conf.d/
sudo vim hpl.conf

Enter the following content

/usr/local/cuda-12.8/lib64
/lib
/opt/intel/mkl/lib/intel64
/opt/intel/lib/intel64
/home/ubuntu/hpl/src/cuda
  • In the last line, replace ubuntu with your own user name.

Then apply the change and check the result

sudo ldconfig
sudo ldconfig -v | grep cuda
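
The user-name substitution in the last line of the list can be done with `sed` instead of by hand. A minimal sketch; the helper `fix_hpl_user` is hypothetical, and `hpc` is used as the example user name because it appears elsewhere in this note.

```shell
# Hypothetical helper: rewrite /home/ubuntu/... entries for another user name.
# Reads hpl.conf-style lines on stdin and writes the adjusted lines to stdout.
fix_hpl_user() {
  sed "s|^/home/ubuntu/|/home/$1/|"
}

# Example with the last line of the list above:
echo '/home/ubuntu/hpl/src/cuda' | fix_hpl_user hpc
```

Lines that do not start with /home/ubuntu/ pass through unchanged.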

Build the Linpack benchmark for CUDA

Create the hpl directory and extract the archive into it

mkdir -p ~/hpl
tar -xvf hpl-2.0_FERMI_v15.tgz -C ~/hpl

cd ~/hpl/hpl-2.0_FERMI_v15
ls

Edit Make.CUDA
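
This note does not list the exact edits to Make.CUDA. Standard HPL `Make.<arch>` files define the top-level, MPI and linear-algebra locations in the variables `TOPdir`, `MPdir`, `MPinc`, `MPlib`, `LAdir`, `LAinc` and `LAlib`; assuming Make.CUDA follows the same layout, the sketch below lists those lines so they can be checked against the paths used on this machine. The helper `show_hpl_paths` is hypothetical.

```shell
# Hypothetical helper: show the path-related variables of an HPL Make.<arch> file,
# prefixed with their line numbers.
show_hpl_paths() {
  grep -nE '^(TOPdir|MPdir|MPinc|MPlib|LAdir|LAinc|LAlib)[[:space:]]*=' "$1"
}

# Run from the directory that contains Make.CUDA.
if [ -f Make.CUDA ]; then
  show_hpl_paths Make.CUDA
else
  echo "Make.CUDA not found in $(pwd)"
fi
```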

make arch=CUDA

In ~/hpl/bin/CUDA/run_linpack, change HPL_DIR to the path of your hpl directory

HPL_DIR=/home/hpc/hpl/hpl-2.0_FERMI_v15
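
The HPL_DIR edit can also be scripted. This is a sketch assuming the script contains a line that starts with `HPL_DIR=`, as shown above; the helper `set_hpl_dir` is hypothetical. It keeps a `.bak` copy and writes back into the original file, so the script's executable permission is preserved.

```shell
# Hypothetical helper: set HPL_DIR in a run_linpack-style script.
# $1 = path to the script, $2 = HPL directory to write.
set_hpl_dir() {
  cp "$1" "$1.bak" && sed "s|^HPL_DIR=.*|HPL_DIR=$2|" "$1.bak" > "$1"
}

# Demonstrate on a temporary file so nothing real is modified.
demo="$(mktemp)"
printf '%s\n' '#!/bin/bash' 'HPL_DIR=/path/to/hpl' > "$demo"
set_hpl_dir "$demo" "$HOME/hpl/hpl-2.0_FERMI_v15"
grep '^HPL_DIR=' "$demo"
rm -f "$demo" "$demo.bak"
```

For the real file, the call would be `set_hpl_dir ~/hpl/bin/CUDA/run_linpack "$HOME/hpl/hpl-2.0_FERMI_v15"`.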