Reserve a GPU
Load Omnitrace
Allocate resources with salloc
Check the available options and their values; the second command prints a brief listing with a description of each option:
srun -n 1 --gpus 1 omnitrace-avail --categories omnitrace
srun -n 1 --gpus 1 omnitrace-avail --categories omnitrace --brief --description
Create a configuration file containing all options: srun -n 1 omnitrace-avail -G omnitrace_all.cfg --all
Tell Omnitrace to use this configuration file: export OMNITRACE_CONFIG_FILE=/path/omnitrace_all.cfg
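A minimal sketch of the export step, assuming omnitrace_all.cfg was generated in the current working directory (replace the path if it lives elsewhere). The existence check is only a convenience so that a mistyped path is noticed before launching a job; the variable name cfg_path is illustrative.

```shell
# Assumption: omnitrace_all.cfg sits in the current directory.
cfg_path="$PWD/omnitrace_all.cfg"
export OMNITRACE_CONFIG_FILE="$cfg_path"

# Report whether the file is actually there before any srun is issued.
if [ -f "$OMNITRACE_CONFIG_FILE" ]; then
  echo "Using configuration file: $OMNITRACE_CONFIG_FILE"
else
  echo "Warning: $OMNITRACE_CONFIG_FILE not found" >&2
fi
```

Using an absolute path means the setting still resolves if a later step is run from a different directory.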
Get the file https://github.com/ROCm-Developer-Tools/HIP/tree/develop/samples/2_Cookbook/0_MatrixTranspose/MatrixTranspose.cpp
or cp /project/project_465000388/exercises/AMD/MatrixTranspose.cpp .
Compile: hipcc --offload-arch=gfx90a -o MatrixTranspose MatrixTranspose.cpp
Execute the binary: time srun -n 1 --gpus 1 ./MatrixTranspose
and check the duration
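Execute with dynamic instrumentation: time srun -n 1 --gpus 1 omnitrace -- ./MatrixTranspose
and check the duration
Check which functions the binary defines or calls: nm --demangle MatrixTranspose | egrep -i ' (t|u) '
List the functions available for instrumentation: srun -n 1 --gpus 1 omnitrace -v 1 --simulate --print-available functions -- ./MatrixTranspose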
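To illustrate what the nm filter keeps, here is a small sketch run on made-up sample lines in the layout that nm --demangle prints. The symbol names and addresses are hypothetical, not taken from the MatrixTranspose binary. The filter keeps symbols in the text section (T or t, functions defined in the binary) and undefined symbols (U, functions the binary calls from libraries).

```shell
# Hypothetical sample in the "address type name" layout of `nm --demangle`.
nm_sample='0000000000201f40 T main
                 U hipMalloc
0000000000202a10 t helper()
0000000000404040 D some_data'

# Same filter as in the exercise: keep text-section (T/t) and undefined (U) symbols.
filtered=$(printf '%s\n' "$nm_sample" | grep -E -i ' (t|u) ')
printf '%s\n' "$filtered"
```

The data symbol (type D) is dropped, which is the point of the filter: only functions are candidates for instrumentation.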
Binary rewriting: srun -n 1 --gpus 1 omnitrace -v -1 --print-available functions -o matrix.inst -- ./MatrixTranspose
Executing the new instrumented binary: time srun -n 1 --gpus 1 ./matrix.inst
and check the duration
See the list of the instrumented GPU calls: cat omnitrace-matrix.inst-output/TIMESTAMP/roctracer.txt
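Since TIMESTAMP changes with every run, a small helper can pick the most recent output directory. This is a sketch under the assumption that the timestamp directory names sort chronologically when sorted as text; the function name latest_run_dir is illustrative.

```shell
# Print the last subdirectory (by name) of the given output directory.
# Assumption: timestamp directory names sort chronologically as text.
latest_run_dir() {
  ls -1d "$1"/*/ 2>/dev/null | sort | tail -n 1
}

run_dir=$(latest_run_dir "omnitrace-matrix.inst-output")
if [ -n "$run_dir" ] && [ -f "${run_dir}roctracer.txt" ]; then
  cat "${run_dir}roctracer.txt"
else
  echo "No roctracer.txt found yet; run ./matrix.inst first" >&2
fi
```

The returned path ends with a slash, so the file name is appended directly.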
Copy the file perfetto-trace.proto to your laptop, open the web page https://ui.perfetto.dev/, click to open the trace, and select the file
List all available options and hardware counters: srun -n 1 --gpus 1 omnitrace-avail --all
Declare in your configuration file: OMNITRACE_ROCM_EVENTS = GPUBusy,Wavefronts,VALUBusy,L2CacheHit,MemUnitBusy
Execute the binary: srun -n 1 --gpus 1 ./matrix.inst
then copy the perfetto file and visualize it
Activate sampling in your configuration file with OMNITRACE_USE_SAMPLING = true and OMNITRACE_SAMPLING_FREQ = 100, then execute and visualize
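Editing the configuration file by hand works fine; as an alternative, the sketch below sets an option from the shell. It assumes the file uses one KEY = value entry per line and that GNU sed is available (for sed -i). The helper name set_cfg_option is hypothetical and not part of Omnitrace.

```shell
# Set KEY = VALUE in a config file: replace the line if the key exists, else append it.
# Assumptions: one "KEY = value" entry per line; GNU sed; value contains no '|' or '&'.
set_cfg_option() {
  local cfg="$1" key="$2" value="$3"
  if grep -q -E "^[[:space:]]*${key}[[:space:]]*=" "$cfg"; then
    sed -i -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$cfg"
  else
    printf '%s = %s\n' "$key" "$value" >> "$cfg"
  fi
}

# Apply the sampling settings only if the configuration file is set and present.
if [ -n "${OMNITRACE_CONFIG_FILE:-}" ] && [ -f "$OMNITRACE_CONFIG_FILE" ]; then
  set_cfg_option "$OMNITRACE_CONFIG_FILE" OMNITRACE_USE_SAMPLING true
  set_cfg_option "$OMNITRACE_CONFIG_FILE" OMNITRACE_SAMPLING_FREQ 100
fi
```

The same helper can be reused for the other options in this exercise, since each is a single KEY = value line.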
Check the file omnitrace-binary-output/timestamp/wall_clock.txt (replace binary and timestamp with your information)
Set in your configuration file OMNITRACE_USE_TIMEMORY = true and OMNITRACE_FLAT_PROFILE = true, execute the code, and open the file omnitrace-binary-output/timestamp/wall_clock.txt again