HPL-2.0_FERMI_v15 build process, by 兔子跟著月亮走

# Environment

- CPU: 13th Gen Intel(R) Core(TM) i9-13900KF
- GPU: RTX3090 x 2 (without NVLink)
- RAM: 128GB
- OS: Ubuntu 22.04
- CUDA: 11.8
- Nvidia driver: 535.183.01

Only a single node has been tested so far.

# 1. Install OpenMPI 4.1.2

```bash!
sudo apt install build-essential gfortran wget -y
```

```bash!
mkdir hpl_test
cd hpl_test/
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.2.tar.gz
tar xvf openmpi-4.1.2.tar.gz
cd openmpi-4.1.2/
./configure --prefix=/home/ntcucsk201/hpl_test/openmpi --with-cuda=/usr/local/cuda-11.8 CC=gcc CXX=g++ FC=gfortran
make -j8
make check
make install
vim ~/.bashrc
# Add the following two lines to ~/.bashrc (without the leading "# "):
# export PATH=$PATH:/home/ntcucsk201/hpl_test/openmpi/bin
# export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ntcucsk201/hpl_test/openmpi/lib
source ~/.bashrc
mpirun --version
```

# 2. Install Intel MKL 2025.1.0.803

```
wget https://registrationcenter-download.intel.com/akdlm/IRC_NAS/dc93af13-2b3f-40c3-a41b-2bc05a707a80/intel-onemkl-2025.1.0.803_offline.sh
mkdir intelMKL
mv intel-onemkl-2025.1.0.803_offline.sh intelMKL/
cd intelMKL/
sudo sh ./intel-onemkl-2025.1.0.803_offline.sh
```

# 3. Install HPL

Download `hpl-2.0_FERMI_v15.tgz` from https://developer.nvidia.com/computeworks-developer-exclusive-downloads, then extract it:

```
tar zxf hpl-2.0_FERMI_v15.tgz
cd hpl-2.0_FERMI_v15
```

# 4. HPL setup

Edit the following variables in `Make.CUDA`:

```bash
# Make.CUDA
TOPdir = /home/ntcucsk201/hpl_test/hpl-2.0_FERMI_v15
LAdir = /opt/intel/oneapi/mkl/latest/lib/intel64
LAinc = -I/opt/intel/oneapi/mkl/latest/include
LAlib = -L $(TOPdir)/src/cuda -ldgemm -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -L$(LAdir) -L/opt/intel/oneapi/mkl/latest/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -L/opt/intel/oneapi/compiler/2025.1/lib
CC = /home/ntcucsk201/hpl_test/openmpi/bin/mpicc
LINKFLAGS = $(CCFLAGS) -Wl,-rpath $(TOPdir)/src/cuda:/opt/intel/oneapi/mkl/latest/lib/intel64:/home/ntcucsk201/hpl_test/openmpi/lib:/opt/intel/oneapi/compiler/2025.1/lib
```

```bash
# src/comm/HPL_packL.c
# When using OpenMPI 3.0 or later, this file must be changed
# (the MPI-3.0 standard removed both of the old functions):
#   replace MPI_Type_struct with MPI_Type_create_struct
#   replace MPI_Address with MPI_Get_address
```

```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ntcucsk201/hpl_test/hpl-2.0_FERMI_v15/src/cuda
source ~/.bashrc
```

```bash
make arch=CUDA
cd bin/CUDA/
./run_linpack
```

# Result

```
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR10L2L2       80000  1024     1     2             333.06              1.025e+03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0040085 ...... PASSED
================================================================================
```

Reference:
1. https://hackmd.io/@TWTom/HPL_cluster#%E8%A3%9DHPL-GPU
2. https://docs.google.com/presentation/d/1CJLDX9UxH3Yw8AlrNPwtbgNetnwVC6CdtLPcH7HP1E0/edit?slide=id.g274b5a25b78_0_53#slide=id.g274b5a25b78_0_53
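
The `src/comm/HPL_packL.c` change from step 4 can also be applied without opening an editor. The following is a minimal sketch, not part of the HPL package: `patch_mpi1_calls` is a hypothetical helper name, and it assumes the two old function names appear in the file exactly as written in step 4. It keeps a `.bak` copy of the original file so the change can be reverted.

```bash
#!/usr/bin/env bash
# patch_mpi1_calls is a hypothetical helper name, not part of HPL or Open MPI.
# It renames the two MPI-1 calls in place and leaves the original as <file>.bak.
patch_mpi1_calls() {
    local file="$1"
    sed -i.bak \
        -e 's/MPI_Type_struct/MPI_Type_create_struct/g' \
        -e 's/MPI_Address/MPI_Get_address/g' \
        "$file"
}

# Usage (run from the hpl-2.0_FERMI_v15 directory):
#   patch_mpi1_calls src/comm/HPL_packL.c
#   grep -n 'MPI_Type_create_struct\|MPI_Get_address' src/comm/HPL_packL.c
```

Neither replacement name contains the old name as a substring, so running the helper a second time leaves the file unchanged.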
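
As a sanity check on the Result table, the Gflops figure can be recomputed from `N` and `Time`. This is a sketch under one assumption: that the operation count is the usual LU estimate of 2/3·N³ + 3/2·N² floating-point operations. `hpl_gflops` is a hypothetical helper name, not an HPL tool.

```bash
#!/usr/bin/env bash
# hpl_gflops is a hypothetical helper: it recomputes Gflops from N and the wall time,
# assuming an operation count of 2/3*N^3 + 3/2*N^2.
hpl_gflops() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        flops = (2.0 / 3.0) * n * n * n + 1.5 * n * n
        printf "%.3e\n", flops / t / 1e9
    }'
}

# N and Time taken from the Result table
hpl_gflops 80000 333.06
```

With N = 80000 and Time = 333.06 s this prints `1.025e+03`, which agrees with the reported value.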