
Omniperf - ENCCS Hackathon

We have built Omniperf without GUI support for use in the exercises.

  • Load Omniperf:
module load rocm/5.3.3
module load cray-python
module use /cfs/klemming/home/g/gmarkoma/Public/omniperf/modulefiles/
module load omniperf
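After loading the modules, you can quickly check that the commands used in this exercise are on your PATH, so a missing module does not surface as a failed job later. This is a minimal sketch: require_tools is a hypothetical helper, not part of Omniperf.

```shell
# require_tools (hypothetical helper): check that each named command is on PATH.
# Prints every missing command to stderr; returns 0 only if all were found.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: require_tools omniperf cmake git srun || echo "check the module load commands"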
  • Reserve a GPU, compile the exercise, and run Omniperf. Observe how many times the code is executed: Omniperf replays the application several times to collect all the hardware counters.
salloc -N 1 -p gpu-tst -A edu23.enccsgpu -t 00:30:00
git clone https://github.com/AMD/HPCTrainingExamples.git
cd HPCTrainingExamples/HIP/dgemm/
mkdir build
cd build
cmake ..
make
cd bin
srun -n 1 --gpus 1 omniperf profile -n dgemm -- ./dgemm -m 8192 -n 8192 -k 8192 -i 1 -r 10 -d 0 -o dgemm.csv
  • Run srun -n 1 --gpus 1 omniperf profile -h to see all the options
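The clone and build steps above fail if you repeat them: git clone refuses an existing directory and mkdir refuses an existing build directory. Below is a minimal sketch that makes them safe to re-run. build_dgemm is a hypothetical helper, and it assumes the same repository layout as the commands above (the binary ends up in build/bin).

```shell
# build_dgemm (hypothetical helper): clone HPCTrainingExamples once, then
# (re)build the HIP dgemm example. Optional $1 is the clone directory.
build_dgemm() {
    repo_dir=${1:-HPCTrainingExamples}
    # Clone only if the directory does not exist yet.
    if [ ! -d "$repo_dir" ]; then
        git clone https://github.com/AMD/HPCTrainingExamples.git "$repo_dir" || return 1
    fi
    # mkdir -p succeeds even when build/ already exists.
    mkdir -p "$repo_dir/HIP/dgemm/build" || return 1
    # Build in a subshell so the caller's working directory is unchanged.
    (
        cd "$repo_dir/HIP/dgemm/build" || exit 1
        cmake .. && make
    )
}
```

For example: build_dgemm && cd HPCTrainingExamples/HIP/dgemm/build/bin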

  • A workload is now created in the directory workloads with the name dgemm (the argument of -n), so we can analyze it:

 srun -n 1 --gpus 1 omniperf analyze -p workloads/dgemm/mi200/ &> dgemm_analyze.txt
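The architecture directory (mi200 here) depends on the GPU the profile ran on. Below is a minimal sketch that analyzes whichever architecture directories exist for a workload. analyze_workload is a hypothetical helper; it assumes the workloads/<name>/<arch>/ layout seen above and writes one report per architecture, for example dgemm_mi200_analyze.txt instead of dgemm_analyze.txt.

```shell
# analyze_workload (hypothetical helper): run "omniperf analyze" on every
# workloads/<name>/<arch>/ directory and save each report to a text file.
analyze_workload() {
    name=$1
    status=1   # stays 1 if no architecture directory is found
    for dir in "workloads/$name"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        arch=$(basename "$dir")            # e.g. mi200
        if srun -n 1 --gpus 1 omniperf analyze -p "$dir" > "${name}_${arch}_analyze.txt" 2>&1; then
            status=0
        else
            echo "analysis failed for $dir" >&2
            return 1
        fi
    done
    [ "$status" -eq 0 ] || echo "no results found under workloads/$name/" >&2
    return "$status"
}
```

For example, analyze_workload dgemm run from the bin directory produces dgemm_mi200_analyze.txt on an MI200-series GPU.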
  • If you only want the roofline analysis, execute: srun -n 1 --gpus 1 omniperf profile -n dgemm --roof-only -- ./dgemm -m 8192 -n 8192 -k 8192 -i 1 -r 10 -d 0 -o dgemm.csv

  • If you want to know the kernel names, run the following; it creates a second PDF with the markers and their corresponding kernel names: srun -n 1 --gpus 1 omniperf profile -n dgemm --kernel-names --roof-only -- ./dgemm -m 8192 -n 8192 -k 8192 -i 1 -r 10 -d 0 -o dgemm.csv
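The profile commands above repeat the same dgemm arguments and differ mainly in the Omniperf options placed before --. Below is a minimal sketch of a wrapper that keeps the dgemm arguments in one place. profile_dgemm and the DGEMM_SIZE variable are hypothetical names introduced here, and the wrapper must be run from the build/bin directory.

```shell
# profile_dgemm (hypothetical wrapper): run Omniperf on dgemm with the
# exercise's problem size. Any arguments are passed to "omniperf profile"
# before the "--" separator. DGEMM_SIZE (hypothetical) overrides m = n = k.
profile_dgemm() {
    size=${DGEMM_SIZE:-8192}
    srun -n 1 --gpus 1 omniperf profile -n dgemm "$@" -- \
        ./dgemm -m "$size" -n "$size" -k "$size" -i 1 -r 10 -d 0 -o dgemm.csv
}
```

For example, profile_dgemm runs the full profile, profile_dgemm --roof-only the roofline-only run, and profile_dgemm --kernel-names --roof-only the roofline run with kernel names.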

The analysis does not need srun, but we use it to avoid everybody working on the login node. Explore the file dgemm_analyze.txt.

  • We can select specific IP blocks, for example:
srun -n 1 --gpus 1 omniperf analyze -p workloads/dgemm/mi200/ -b 7.1.2

But you need to know the ID of the IP block, which follows the section numbering in dgemm_analyze.txt.
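Besides reading the report, Omniperf's analyze mode has a --list-metrics option that prints the metric IDs for a GPU architecture (gfx90a for MI200-series GPUs such as the MI250X). Below is a minimal sketch that searches that list for a keyword. find_metric_id is a hypothetical helper, and the listing format may differ between Omniperf versions, so check omniperf analyze -h on the installed version first.

```shell
# find_metric_id (hypothetical helper): print the lines of Omniperf's metric
# list that contain a keyword (case-insensitive). Optional $2 is the GPU
# architecture; gfx90a (MI200 series) is assumed by default.
find_metric_id() {
    keyword=$1
    arch=${2:-gfx90a}
    omniperf analyze --list-metrics "$arch" | grep -i -- "$keyword"
}
```

For example, find_metric_id wavefront shows the wavefront-related entries and their IDs, which you can then pass to -b.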

  • If you have installed Omniperf on your laptop (ROCm is not required for analysis), you can download the workload data and execute:
omniperf analyze -p workloads/dgemm/mi200/ --gui
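To get the data onto your laptop, you can copy the workload directory with rsync. Below is a minimal sketch. fetch_workload is a hypothetical helper, and the login and remote directory are placeholders you must replace with your own account and the bin directory where you ran the profile.

```shell
# fetch_workload (hypothetical helper): copy one Omniperf workload from the
# cluster into ./workloads/ so it can be analyzed locally.
#   $1  remote login, e.g. user@login-node (placeholder)
#   $2  directory on the cluster that contains workloads/ (e.g. build/bin)
#   $3  workload name given to -n, e.g. dgemm
fetch_workload() {
    remote=$1
    remote_dir=$2
    name=$3
    mkdir -p workloads || return 1
    # No trailing slash on the source: rsync copies the directory itself,
    # giving ./workloads/<name>/<arch>/ locally.
    rsync -av "$remote:$remote_dir/workloads/$name" workloads/
}
```

For example, fetch_workload user@login-node /path/to/HIP/dgemm/build/bin dgemm, followed by the --gui command above.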