ITAC (Intel Trace Analyzer and Collector, included in oneAPI) is an MPI tracer and analyzer for Intel MPI. If your program can be compiled with Intel MPI, you can use ITAC to collect detailed MPI trace data.
Some useful options:
More options are described in the Intel® Trace Collector User and Reference Guide.
Lower MEM-MAXBLOCKS (the maximum number of trace buffer blocks kept in memory) and trace with fewer processes.
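
A minimal sketch of these two suggestions, assuming the Trace Collector reads its configuration from the file named by the VT_CONFIG environment variable and that the trace is collected with Intel MPI's `mpirun -trace`. The file name `itac.conf`, the application `./my_app`, the value 1024 and the rank count 4 are placeholders, not recommendations; check the default for MEM-MAXBLOCKS in the guide before picking a value.

```shell
#!/bin/sh
# Hypothetical names: itac.conf and ./my_app are placeholders for illustration.
CONF="$PWD/itac.conf"
APP=./my_app
NP=4   # trace with fewer ranks than a production run (illustrative value)

# Write a Trace Collector config file with a lowered block limit.
# 1024 is an illustrative value, not a tuned recommendation.
cat > "$CONF" <<'EOF'
MEM-MAXBLOCKS 1024
EOF

# Point the Trace Collector at the config file.
export VT_CONFIG="$CONF"

# Launch under the tracer only if Intel MPI and the application are available;
# otherwise just print the command that would be run.
if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    mpirun -trace -n "$NP" "$APP"
else
    echo "would run: mpirun -trace -n $NP $APP"
fi
```

Keeping the setting in a config file rather than on the command line makes it easy to reuse the same limits across runs while varying only the number of ranks.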