cp -r /Shared/HPCTrainingExamples/ .
Create saxpy.hip, or use the file from HPCTrainingExamples/HIP/saxpy/
hipcc --offload-arch=gfx90a -o saxpy saxpy.hip
export HIP_VISIBLE_DEVICES=X
./saxpy
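The GPU selection step can be wrapped in a small helper so the variable applies to one command only. This is a minimal sketch: the function name run_on_gpu is hypothetical, and the device index you pass replaces the X placeholder used above.

```shell
# Hypothetical helper: run a command with HIP_VISIBLE_DEVICES set to the
# given device index, without exporting it into the rest of the session.
run_on_gpu() {
    gpu_index=$1
    shift
    HIP_VISIBLE_DEVICES="$gpu_index" "$@"
}

# Example usage (assumes ./saxpy was built as shown above):
# run_on_gpu 0 ./saxpy
```

Setting the variable per command avoids leaving a stale GPU selection in the shell when you switch between exercises.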
Run saxpy with rocgdb:
rocgdb saxpy
hipcc -ggdb --offload-arch=gfx90a -o saxpy saxpy.hip
rocgdb saxpy
hipcc -ggdb -O0 --offload-arch=gfx90a -o saxpy saxpy.hip
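Once the binary carries debug symbols, a debugging session can also be scripted. The sketch below assumes that rocgdb accepts the standard GDB options -batch and -x and the standard GDB commands listed; the command file name saxpy.gdb is hypothetical. It stops at main, where only host threads exist, so it illustrates the workflow rather than kernel-level debugging.

```shell
# Write a small command file for the debugger (file name is an assumption).
cat > saxpy.gdb <<'EOF'
break main
run
backtrace
info threads
continue
EOF

# Run the session non-interactively, but only if rocgdb and the binary exist.
if command -v rocgdb >/dev/null 2>&1 && [ -x ./saxpy ]; then
    rocgdb -batch -x saxpy.gdb ./saxpy
fi
```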
Before you execute any Omnitrace call, select a specific GPU:
export HIP_VISIBLE_DEVICES=X
List the available Omnitrace options with a brief description of each, then generate a configuration file that contains all of them:
omnitrace-avail --categories omnitrace --brief --description
omnitrace-avail -G omnitrace.cfg --all
export OMNITRACE_CONFIG_FILE=/path/omnitrace.cfg
Open omnitrace.cfg, find the OMNITRACE_USE_TIMEMORY option, and set it to true:
OMNITRACE_USE_TIMEMORY = true
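Editing the configuration file by hand works, but the same change can be scripted. This is a minimal sketch that assumes each option sits on its own line as KEY = value, as shown above; the helper name set_cfg is hypothetical.

```shell
# Hypothetical helper: set KEY = VALUE in a config file.
# Replaces an existing assignment of KEY, or appends one if KEY is absent.
set_cfg() {
    key=$1
    value=$2
    file=$3
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        # Portable in-place edit through a temporary file.
        sed "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$file.tmp" \
            && mv "$file.tmp" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage:
# set_cfg OMNITRACE_USE_TIMEMORY true omnitrace.cfg
```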
git clone https://github.com/amd/HPCTrainingExamples.git
cd HPCTrainingExamples/HIP/jacobi
make clean;make
The build produces the binary Jacobi_hip.
Now execute the binary:
time mpirun -np 1 Jacobi_hip -g 1 1
Check the duration reported by time.
Binary rewriting: omnitrace-instrument -o jacobi.inst -- ./Jacobi_hip
This creates a new instrumented binary called jacobi.inst
Execute the new instrumented binary: time mpirun -n 1 omnitrace-run -- ./jacobi.inst -g 1 1
and check the duration again.
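To compare the two durations, the instrumentation overhead can be expressed as a percentage of the uninstrumented run time. This is a minimal sketch: the helper name overhead_pct is hypothetical, and the two arguments are the wall-clock seconds you read from the time output of each run.

```shell
# Hypothetical helper: percentage overhead of the instrumented run.
# $1 = baseline seconds, $2 = instrumented seconds.
overhead_pct() {
    awk -v base="$1" -v inst="$2" \
        'BEGIN { printf "%.1f\n", (inst - base) / base * 100 }'
}

# Example usage with made-up timings (2.0 s baseline, 3.0 s instrumented):
# overhead_pct 2.0 3.0
```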
See the list of the instrumented GPU calls: cat omnitrace-jacobi.inst-output/TIMESTAMP/roctracer-0.txt
See the list of the instrumented CPU calls: cat omnitrace-jacobi.inst-output/TIMESTAMP/wallclock-0.txt
or wallclock-1.txt
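Each run writes to a new TIMESTAMP subdirectory, so it helps to locate the most recent one automatically. This is a minimal sketch that picks the subdirectory with the newest modification time; the helper name latest_output_dir is hypothetical, and it assumes directory names without newlines.

```shell
# Hypothetical helper: print the most recently modified subdirectory
# of the given output directory (with a trailing slash).
latest_output_dir() {
    ls -td "$1"/*/ 2>/dev/null | head -n 1
}

# Example usage (file name taken from the step above):
# cat "$(latest_output_dir omnitrace-jacobi.inst-output)wallclock-0.txt"
```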
Check the MPI calls
Copy perfetto-trace.proto to your laptop, open the web page https://ui.perfetto.dev/, click to open the trace, and select the file perfetto-trace-0.proto or perfetto-trace-1.proto.
Edit your omnitrace.cfg:
Execute the instrumented binary again; the call stack is now visible when you visualize the trace with Perfetto.
omnitrace-avail --all
OMNITRACE_ROCM_EVENTS = GPUBusy,Wavefronts,VALUBusy,L2CacheHit,MemUnitBusy
omnitrace-run -- ./jacobi.inst -g 1 1
and copy the Perfetto file and visualize it.
Open the file omnitrace-binary-output/timestamp/wall_clock.txt (replace binary and timestamp with your information).
In your omnitrace.cfg, set OMNITRACE_USE_TIMEMORY = true and OMNITRACE_FLAT_PROFILE = true, execute the code, and open the file omnitrace-binary-output/timestamp/wall_clock.txt again.
Add this to your path:
export PATH=/opt/conda/bin/:$PATH
Load Omniperf
Enter each directory and read the instructions, which are also available on the web: https://github.com/amd/HPCTrainingExamples/tree/main/OmniperfExamples
Before you execute any Omniperf call, select a specific GPU:
export HIP_VISIBLE_DEVICES=X