# Spack at HPCFS

Spack provides prebuilt modules for all users on HPCFS. Specific module builds are listed below. We restrict these builds to the RHEL 8.4 provided compiler GCC@8.5.0 in order to provide `haswell` and `rome` CPU compatibility (arch=zen) for the system-provided layer (SLURM, UCX, knem). Newer GCC compilers are used on `rome` with zen2 (AVX2) compatibility across the `rome` and `haswell` partitions, while the AOCC compilers are intended only for the AMD `rome` partition.

## User Spack development

It is also possible to build and deploy packages and modules locally for your own use by setting up the following configuration:

~~~bash
rm -rf ~/.spack
mkdir ~/.spack
sed -e '/root:/s,$spack,/work/$USER,' /opt/spack/etc/spack/defaults/config.yaml > ~/.spack/config.yaml
sed -i -e '/source_cache:/s,:.*,: ~/.spack/cache,' ~/.spack/config.yaml
sed -i -e 's/# build_jobs: 16/build_jobs: 32/' ~/.spack/config.yaml
sed -e 's/# roots:/roots:/' -e 's,# lmod:.*, lmod: /work/$USER/opt/lmod,' -e 's,# tcl:.*, tcl: /work/$USER/opt/modules,' -e '/- tcl/a\ - lmod' /opt/spack/etc/spack/defaults/modules.yaml > ~/.spack/modules.yaml
~~~

Source the Spack setup script in your active shell, or add the source line to your `.bashrc` profile:

    source /opt/spack/share/spack/setup-env.sh

## hwloc

Hardware locality (hwloc) is built with the latest CUDA version, since CUDA is used here only for device detection rather than computing; CUDA also provides the OpenCL devices. NVML is provided by the NVIDIA drivers and cannot be enabled in hwloc, because only a few GPU nodes have the drivers and the corresponding `libnvidia-ml.so.1` library. The remaining locality information is provided through libudev and PCI.

~~~spack
[spack@gpu02 ~]$ spack spec --install-status --long hwloc
Input spec
--------------------------------
 -   hwloc

Concretized
--------------------------------
 -   gg2gaiz  hwloc@2.6.0%gcc@8.5.0~cairo+cuda~gl+libudev+libxml2~netloc~nvml+opencl+pci~rocm+shared arch=linux-almalinux8-zen
[+]  5lbelaa      ^cuda@11.5.1%gcc@8.5.0~dev arch=linux-almalinux8-zen
...
~~~

## PMIX

The Process Management Interface (PMIx) is used by SLURM and OpenMPI. Compilation is chained by propagating the hash of an already installed build in the input spec (e.g. `/gg2gaiz`):

~~~spack
[spack@gpu02 ~]$ spack spec -lI pmix@3.2.1 ^hwloc/gg2gaiz
Input spec
--------------------------------
 -   pmix@3.2.1
[+]      ^hwloc@2.6.0%gcc@8.5.0~cairo+cuda~gl+libudev+libxml2~netloc~nvml+opencl+pci~rocm+shared arch=linux-almalinux8-zen
[+]      ^cuda@11.5.1%gcc@8.5.0~dev arch=linux-almalinux8-zen
[+]      ^libxml2@2.9.12%gcc@8.5.0~python arch=linux-almalinux8-zen
[+]      ^libiconv@1.16%gcc@8.5.0 libs=shared,static arch=linux-almalinux8-zen
[+]      ^xz@5.2.4%gcc@8.5.0~pic libs=shared,static arch=linux-almalinux8-zen
[+]      ^zlib@1.2.11%gcc@8.5.0+optimize+pic+shared arch=linux-almalinux8-zen
[+]      ^libpciaccess@0.16%gcc@8.5.0 arch=linux-almalinux8-zen
[+]      ^ncurses@6.1.20180224%gcc@8.5.0~symlinks+termlib abi=6 arch=linux-almalinux8-zen

Concretized
--------------------------------
 -   atirvkd  pmix@3.2.1%gcc@8.5.0~docs+pmi_backwards_compatibility~restful arch=linux-almalinux8-zen
[+]  gg2gaiz      ^hwloc@2.6.0%gcc@8.5.0~cairo+cuda~gl+libudev+libxml2~netloc~nvml+opencl+pci~rocm+shared arch=linux-almalinux8-zen
[+]  5lbelaa      ^cuda@11.5.1%gcc@8.5.0~dev arch=linux-almalinux8-zen
...
~~~

## SLURM

Slurm is built with the PMIx plugin and hwloc; PMI2 is provided internally.
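As a quick run-time check on the deployed Slurm (not part of the Spack recipe itself), the available PMI plugin types can be listed; `pmix` should show up when the `+pmix` variant is in effect:

~~~bash
# List the MPI/PMI plugin types known to the running Slurm installation
srun --mpi=list
~~~

The Slurm spec itself chains on the PMIx hash from the previous section: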
~~~spack
[spack@gpu02 ~]$ spack spec -Il slurm@21-08-1-1%gcc@8.5.0+hwloc+pmix ^pmix/atirvkd
Input spec
--------------------------------
 -   slurm@21-08-1-1%gcc@8.5.0+hwloc+pmix
[+]      ^pmix@3.2.1%gcc@8.5.0~docs+pmi_backwards_compatibility~restful arch=linux-almalinux8-zen
[+]      ^hwloc@2.6.0%gcc@8.5.0~cairo+cuda~gl+libudev+libxml2~netloc~nvml+opencl+pci~rocm+shared arch=linux-almalinux8-zen
[+]      ^cuda@11.5.1%gcc@8.5.0~dev arch=linux-almalinux8-zen
[+]      ^libxml2@2.9.12%gcc@8.5.0~python arch=linux-almalinux8-zen
[+]      ^libiconv@1.16%gcc@8.5.0 libs=shared,static arch=linux-almalinux8-zen
[+]      ^xz@5.2.4%gcc@8.5.0~pic libs=shared,static arch=linux-almalinux8-zen
[+]      ^zlib@1.2.11%gcc@8.5.0+optimize+pic+shared arch=linux-almalinux8-zen
[+]      ^libpciaccess@0.16%gcc@8.5.0 arch=linux-almalinux8-zen
[+]      ^ncurses@6.1.20180224%gcc@8.5.0~symlinks+termlib abi=6 arch=linux-almalinux8-zen
[+]      ^libevent@2.1.8%gcc@8.5.0+openssl arch=linux-almalinux8-zen
[+]      ^openssl@1.1.1l%gcc@8.5.0~docs certs=system arch=linux-almalinux8-zen

Concretized
--------------------------------
 -   bpmt4tx  slurm@21-08-1-1%gcc@8.5.0~gtk~hdf5+hwloc~mariadb+pmix+readline~restd sysconfdir=/etc/slurm arch=linux-almalinux8-zen
...
~~~

## OpenMPI

OpenMPI is built on top of SLURM and PMIx. The UCX fabric with knem is provided externally as part of the Mellanox OFED HPC build. This is summarized in `~/.spack/packages.yaml`:

~~~yaml
packages:
  lustre:
    externals:
    - spec: lustre@2.14.55
      prefix: /usr
    buildable: false
  #...
  #...
  hwloc:
    compiler: [gcc@8.5.0]
    variants: +cuda+opencl+libudev^cuda@11.5.1
  munge:
    compiler: [gcc@8.5.0]
    variants: localstatedir=/var
  slurm:
    compiler: [gcc@8.5.0]
    variants: +hwloc+pmix sysconfdir=/etc/slurm
  knem:
    externals:
    - spec: knem@1.1.4
      prefix: /opt/knem-1.1.4.90mlnx1
  ucx:
    externals:
    - spec: ucx@1.11.1+thread_multiple +cma +rc +ud +dc +mlx5-dv +ib-hw-tm +dm +cm +knem
      prefix: /usr
  openmpi:
    variants: +pmi+pmix+lustre fabrics=ucx,knem schedulers=slurm
  mpich:
    variants: +slurm pmi=pmix
~~~

For the various compilers, the Slurm concretization is applied with the above defaults:

~~~spack
[spack@gpu02 apps]$ spack spec -lI openmpi%gcc@8.5.0 ^slurm/bpmt4tx
Input spec
--------------------------------
 -   openmpi%gcc@8.5.0
[+]      ^slurm@21-08-1-1%gcc@8.5.0~gtk~hdf5+hwloc~mariadb+pmix+readline~restd sysconfdir=/etc/slurm arch=linux-almalinux8-zen
...

Concretized
--------------------------------
 -   nezrdtx  openmpi@4.1.2%gcc@8.5.0~atomics~cuda~cxx~cxx_exceptions+gpfs~internal-hwloc~java~legacylaunchers+lustre~memchecker+pmi+pmix~singularity~sqlite3+static~thread_multiple+vt+wrapper-rpath fabrics=knem,ucx schedulers=slurm arch=linux-almalinux8-zen
[+]  gg2gaiz      ^hwloc@2.6.0%gcc@8.5.0~cairo+cuda~gl+libudev+libxml2~netloc~nvml+opencl+pci~rocm+shared arch=linux-almalinux8-zen
[+]  5lbelaa      ^cuda@11.5.1%gcc@8.5.0~dev arch=linux-almalinux8-zen
...
~~~
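The installation step for this GCC build is not shown explicitly above; a minimal sketch, mirroring the `%aocc` and `%intel` installs below, would be:

~~~bash
# Install the gcc@8.5.0 OpenMPI concretized above, chained on the same Slurm hash
spack install openmpi%gcc@8.5.0 ^slurm/bpmt4tx

# Optional sanity check: UCX and Slurm/PMIx support should show up among the MCA components
$(spack location -i openmpi%gcc@8.5.0)/bin/ompi_info | grep -Ei 'ucx|slurm|pmix'
~~~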
After installation, MCA defaults are appended to each build's `openmpi-mca-params.conf`, disabling the legacy `openib` BTL and enabling the UCX memory hooks:

~~~spack
[spack@gpu02 apps]$ cat >> $(spack location -i openmpi%gcc@8.5.0)/etc/openmpi-mca-params.conf << EOF
btl=^openib
opal_common_ucx_opal_mem_hooks=1
EOF

[spack@gpu02 ~]$ spack install openmpi%aocc@3.1.0 ^slurm/bpmt4tx ^knem%gcc@8.5.0 ^lustre%gcc@8.5.0
[spack@gpu02 ~]$ cat >> $(spack location -i openmpi%aocc@3.1.0)/etc/openmpi-mca-params.conf << EOF
btl=^openib
opal_common_ucx_opal_mem_hooks=1
EOF

spack spec -lI openmpi%intel@2021.4.0 ^slurm/bpmt4tx
spack install openmpi%intel@2021.4.0 ^slurm/bpmt4tx
==> Warning: Intel compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors
cat >> $(spack location -i openmpi%intel@2021.4.0)/etc/openmpi-mca-params.conf << EOF
btl=^openib
opal_common_ucx_opal_mem_hooks=1
EOF

[spack@gpu02 ~]$ spack install openmpi%gcc@11.2.0 ^slurm/bpmt4tx ^ucx/i4eomv4
cat >> $(spack location -i openmpi%gcc@11.2.0)/etc/openmpi-mca-params.conf << EOF
btl=^openib
opal_common_ucx_opal_mem_hooks=1
EOF
~~~

Compiling and running an OpenMPI program with `srun` requires PMIx:

~~~sh
cat > hello.f90 <<EOF
program hello
  use mpi
  integer rank, size, ierror, strlen, status(MPI_STATUS_SIZE)
  character(len=MPI_MAX_PROCESSOR_NAME) :: hostname
  call MPI_INIT(ierror)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierror)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
  call MPI_GET_PROCESSOR_NAME( hostname, strlen, ierror )
  print*, trim(hostname), rank, size
  call MPI_FINALIZE(ierror)
end
EOF

ml openmpi-4.1.2-gcc-8.5.0-nezrdtx
mpif90 hello.f90
srun --mpi=pmix -p rome -N5 -n40 --mem=0 a.out

ml purge
ml openmpi-4.1.2-aocc-3.1.0-43vdsd3
ml aocc-3.1.0-gcc-11.2.0-plf5zph
mpif90 hello.f90
srun --mpi=pmix -p rome -N5 -n40 --mem=0 a.out
~~~

The `mpirun` wrapper is no longer built by default with OpenMPI (the `legacylaunchers` variant is disabled); use `srun --mpi=pmix` instead. See the examples at http://hpc.fs.uni-lj.si/slurm.

## Linpack

~~~spack
[spack@gpu02 ~]$ spack spec -lI hpl+openmp%aocc@3.1.0 ^amdblis%aocc@3.1.0 threads=openmp ^openmpi/43vdsd3
Input spec
--------------------------------
 -   hpl%aocc@3.1.0+openmp
 -       ^amdblis%aocc@3.1.0 threads=openmp
...
~~~

## OpenFOAM

~~~bash
spack spec -Il openfoam%gcc@11.2.0+metis+zoltan+mgridgen+paraview ^openmpi/ip5xqcx ^paraview+qt ^mesa@21.3.8
module load openfoam-2112-gcc-11.2.0-qi6xbnd
~~~

There are also AOCC-compiled OpenFOAM modules, but it seems that not all utilities are built, due to incompatibilities or compiler-detection issues.
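To check whether a particular solver or utility made it into a given build, load the module and look it up on the PATH (a quick manual check; the names below are simply those used in the batch example that follows):

~~~bash
# Load the GCC build and verify that the needed tools resolve inside its prefix
ml openfoam-2112-gcc-11.2.0-qi6xbnd
which decomposePar compressibleInterFoam
~~~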
A sample OpenFOAM batch script can be:

~~~bash
#!/bin/bash
#SBATCH -p rome
#SBATCH -n 96
#SBATCH --mem=0
#SBATCH --ntasks-per-node=48

module purge
module load openfoam-2112-gcc-11.2.0-qi6xbnd

# Decompose solution (serial)
decomposePar -force > log.decomposeParDict 2>&1

# Run the solution (parallel)
srun --mpi=pmix compressibleInterFoam -parallel > log.CIF 2>&1
~~~

## TAU profiler and analysis utility

Built for the Intel and GCC compilers:

~~~bash
$ spack install tau%intel+mpi+ompt+openmp ^openmpi/hovu6pi
$ spack install tau%gcc@11.2.0+mpi+ompt+openmp ^openmpi/ip5xqcx
~~~

## AMDScaLAPACK, MUMPS, PETSc

    spack spec -I amdscalapack%aocc ^openmpi/43vdsd3
    spack spec -I mumps%aocc ^amdscalapack/g4eckyd
    spack spec -I petsc%intel ^amdscalapack/g4eckyd
    spack spec -I petsc%intel +mkl-pardiso+scalapack+valgrind ^openmpi/hovu6pi

## stream benchmark

According to the [AMD developer instructions](https://developer.amd.com/spack/stream-benchmark/), the benchmark should be built as:

    spack spec -I stream%aocc+openmp cflags="-mcmodel=large -DSTREAM_TYPE=double -mavx2 -DSTREAM_ARRAY_SIZE=260000000 -DNTIMES=10 -ffp-contract=fast -fnt-store"

## ParaView/tajjatz

    spack install paraview+mpi+qt ^openmpi/nezrdtx ^mesa@21.3.8

ParaView/ha7jdza:

    spack install paraview+mpi+qt+python3 ^openmpi/nezrdtx ^mesa@21.3.8

## Fenics/vtzd4lz

    spack spec fenics-dolfinx ^openmpi/ip5xqcx

## gmsh@4.8.4/p2z7wue

    spack install gmsh%gcc ^openmpi/nezrdtx

###### tags: `HPCFS` `spack`