MPI Ghost Exchange Optimization Examples
Changes Between Example Versions
This code contains several implementations of the same ghost exchange algorithm at varying stages
of optimization:
Orig: Shows a CPU-only implementation that uses MPI and serves as the starting point for further optimizations. Start here.
Ver1: Shows an OpenMP target offload implementation that ports the code to GPUs under the managed memory model, using host-allocated memory for MPI communication.
Ver2: Shows how roctx ranges make the profiling output from Omnitrace easier to read.
Ver3: Under construction; not expected to work at the moment.
Ver4: Explores allocating the communication buffers once on the host heap.
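To make the Ver4 idea concrete, here is a minimal sketch of allocating ghost exchange buffers once and reusing them on every iteration. It is not the code from this repository: the `HaloBuffers` type, the `halo_*` function names, and the row-major grid layout with `nhalo` ghost rows above and below the interior are all assumptions made for illustration. The MPI calls themselves are left out and only indicated in a comment.

```c
#include <stdlib.h>

/* Hypothetical container for the send/receive buffers of a top/bottom
 * ghost exchange. Allocated once, reused every iteration. */
typedef struct {
    double *send_top, *send_bot;
    double *recv_top, *recv_bot;
    int count; /* number of doubles in each buffer */
} HaloBuffers;

/* Release all buffers and reset the pointers. */
void halo_buffers_free(HaloBuffers *b) {
    free(b->send_top); free(b->send_bot);
    free(b->recv_top); free(b->recv_bot);
    b->send_top = b->send_bot = b->recv_top = b->recv_bot = NULL;
    b->count = 0;
}

/* Allocate all four buffers once, before the iteration loop.
 * Returns 0 on success, -1 if any allocation fails. */
int halo_buffers_init(HaloBuffers *b, int nx, int nhalo) {
    size_t bytes = (size_t)nx * (size_t)nhalo * sizeof(double);
    b->count = nx * nhalo;
    b->send_top = malloc(bytes);
    b->send_bot = malloc(bytes);
    b->recv_top = malloc(bytes);
    b->recv_bot = malloc(bytes);
    if (!b->send_top || !b->send_bot || !b->recv_top || !b->recv_bot) {
        halo_buffers_free(b);
        return -1;
    }
    return 0;
}

/* Assumed layout: grid has (ny + 2*nhalo) rows of nx doubles, row-major.
 * Rows [0, nhalo) are the top ghosts, rows [nhalo, nhalo+ny) the interior,
 * and rows [nhalo+ny, ny+2*nhalo) the bottom ghosts. */

/* Copy the first and last nhalo interior rows into the send buffers. */
void halo_pack(const HaloBuffers *b, const double *grid,
               int nx, int ny, int nhalo) {
    for (int j = 0; j < nhalo; j++) {
        for (int i = 0; i < nx; i++) {
            b->send_top[j * nx + i] = grid[(nhalo + j) * nx + i];
            b->send_bot[j * nx + i] = grid[(ny + j) * nx + i];
        }
    }
}

/* In a real exchange, MPI_Irecv/MPI_Isend on the recv_* and send_* buffers,
 * followed by MPI_Waitall, would go between halo_pack and halo_unpack. */

/* Copy the received rows into the top and bottom ghost rows. */
void halo_unpack(const HaloBuffers *b, double *grid,
                 int nx, int ny, int nhalo) {
    for (int j = 0; j < nhalo; j++) {
        for (int i = 0; i < nx; i++) {
            grid[j * nx + i] = b->recv_top[j * nx + i];
            grid[(nhalo + ny + j) * nx + i] = b->recv_bot[j * nx + i];
        }
    }
}
```

In this sketch, `halo_buffers_init` would be called once during setup and `halo_buffers_free` once at shutdown, so the iteration loop only calls `halo_pack`, the communication step, and `halo_unpack`, with no allocation inside the loop.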