# Omnitrace - ENCCS Hackathon
* Reserve a GPU
* Load Omnitrace
```
ml rocm/5.3.3
source /cfs/klemming/home/g/gmarkoma/Public/omnitrace/1.7.4/share/omnitrace/setup-env.sh
```
* Allocate resources with `salloc`
* List the available options and their values; the second command adds a brief description of each option
`srun -n 1 --gpus 1 omnitrace-avail --categories omnitrace`
`srun -n 1 --gpus 1 omnitrace-avail --categories omnitrace --brief --description`
* Create an Omnitrace configuration file that includes a description of each option
`srun -n 1 omnitrace-avail -G omnitrace_all.cfg --all`
* Tell Omnitrace to use this configuration file: `export OMNITRACE_CONFIG_FILE=/path/omnitrace_all.cfg`
* Get the file https://github.com/ROCm-Developer-Tools/HIP/tree/develop/samples/2_Cookbook/0_MatrixTranspose/MatrixTranspose.cpp
or `cp /project/project_465000388/exercises/AMD/MatrixTranspose.cpp .`
* Compile: `hipcc --offload-arch=gfx90a -o MatrixTranspose MatrixTranspose.cpp`
* Execute the binary and note the duration: `time srun -n 1 --gpus 1 ./MatrixTranspose`
#### Dynamic instrumentation
* Execute with dynamic instrumentation and compare the duration: `time srun -n 1 --gpus 1 omnitrace -- ./MatrixTranspose`
* Check which functions the binary defines (`T`) or calls from libraries (`U`), that is, what gets instrumented: `nm --demangle MatrixTranspose | egrep -i ' (t|u) '`
* List the functions available for instrumentation: `srun -n 1 --gpus 1 omnitrace -v 1 --simulate --print-available functions -- ./MatrixTranspose`
* The `--simulate` option means that the binary is not executed
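
To compare the duration of the plain run with the instrumented one, the two wall-clock times reported by `time` can be turned into a relative overhead. This is a minimal sketch: the function name `overhead_pct` and the sample numbers are hypothetical and not part of Omnitrace.

```shell
# Hypothetical helper: relative overhead (in percent) of an instrumented run
# compared with the uninstrumented baseline. Both arguments are wall-clock
# seconds, as read from the "real" line printed by `time`.
overhead_pct() {
    awk -v base="$1" -v inst="$2" 'BEGIN {
        if (base <= 0) { print "baseline must be > 0" > "/dev/stderr"; exit 1 }
        printf "%.1f\n", (inst - base) / base * 100
    }'
}

# Example with made-up numbers: baseline 2.0 s, instrumented 3.0 s
overhead_pct 2.0 3.0   # prints 50.0
```

The same helper can be reused later to compare the baseline with the rewritten binary.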
#### Binary rewriting (recommended for MPI codes; also decreases overhead)
* Rewrite the binary: `srun -n 1 --gpus 1 omnitrace -v -1 --print-available functions -o matrix.inst -- ./MatrixTranspose`
* This creates a new instrumented binary called `matrix.inst`
* Execute the new instrumented binary and check the duration: `time srun -n 1 --gpus 1 ./matrix.inst`
* See the list of the instrumented GPU calls: `cat omnitrace-matrix.inst-output/TIMESTAMP/roctracer.txt`
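
Each run writes into a new `TIMESTAMP` directory, so it can help to look up the most recent one instead of typing it by hand. The following is a sketch only; `latest_run_dir` is a hypothetical helper name, and it assumes the newest run is the most recently modified subdirectory.

```shell
# Hypothetical helper: print the most recently modified run directory inside
# an Omnitrace output directory (the TIMESTAMP placeholder above).
latest_run_dir() {
    # ls -t sorts by modification time, newest first; -d lists the
    # directories themselves rather than their contents. sed strips the
    # trailing slash left by the */ glob.
    ls -td "$1"/*/ 2>/dev/null | head -n 1 | sed 's:/$::'
}

# Usage, assuming the output directory name from the step above:
if [ -d omnitrace-matrix.inst-output ]; then
    run_dir=$(latest_run_dir omnitrace-matrix.inst-output)
    cat "$run_dir/roctracer.txt"
fi
```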
#### Visualization
* Copy the `perfetto-trace.proto` file to your laptop, open https://ui.perfetto.dev/ in a browser, choose to open a trace file, and select the copied file
#### Hardware counters
* See a list of all the counters: `srun -n 1 --gpus 1 omnitrace-avail --all`
* Set the counters to collect in your configuration file: `OMNITRACE_ROCM_EVENTS = GPUBusy,Wavefronts,VALUBusy,L2CacheHit,MemUnitBusy`
* Execute `srun -n 1 --gpus 1 ./matrix.inst`, then copy the Perfetto file to your laptop and visualize it
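
Because the generated configuration file may already contain a line for an option, editing it by hand can leave duplicates. A small helper can set a `KEY = VALUE` line in place. This is a sketch under assumptions: `set_cfg` is a hypothetical name, and it assumes the `KEY = VALUE` line format shown above and values without `|` or `&` characters.

```shell
# Hypothetical helper: set KEY = VALUE in an Omnitrace configuration file.
# If a line for KEY already exists it is replaced, otherwise one is appended.
# Assumes VALUE contains no '|' or '&' (they are special to sed here).
set_cfg() {
    key=$1 value=$2 file=$3
    if grep -q -E "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        # Write to a temporary file first so the original is never
        # left half-written.
        sed -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Usage, assuming OMNITRACE_CONFIG_FILE is exported as described earlier:
if [ -f "${OMNITRACE_CONFIG_FILE:-}" ]; then
    set_cfg OMNITRACE_ROCM_EVENTS "GPUBusy,Wavefronts,VALUBusy,L2CacheHit,MemUnitBusy" "$OMNITRACE_CONFIG_FILE"
fi
```

The same call works for any other `KEY = VALUE` option in the file.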
#### Sampling
* Activate sampling in your configuration file with `OMNITRACE_USE_SAMPLING = true` and `OMNITRACE_SAMPLING_FREQ = 100`, then execute and visualize
#### Kernel timings
* Open the file `omnitrace-binary-output/timestamp/wall_clock.txt` (replace `binary` and `timestamp` with your own values)
* To see the kernels gathered together, make sure that your configuration file sets `OMNITRACE_USE_TIMEMORY = true` and `OMNITRACE_FLAT_PROFILE = true`, execute the code again, and reopen `omnitrace-binary-output/timestamp/wall_clock.txt`