# Overlapping of Computation and Communication
CPU: Hi NIC, I have a message to send to my good friend over at XYZ. Could you deliver it for me?
NIC: Of course, my friend. Give me a moment; I'll let you know once it's sent.
CPU: OK. While you're transmitting, I'll keep working on other tasks that need computation and check back with you later.
---time passes---
CPU: I've finished some tasks. Are you done yet?
NIC: Not yet... give me a little longer.
---time passes---
CPU: I've finished even more tasks. Have you sent it yet?!
NIC: Oh, it's done. Here's your receipt.
CPU: ...
Notice how the CPU goes off and does other things while the NIC handles all the details of sending.
In other words: "the CPU has offloaded the task of sending the message to the NIC."
- The figure below shows the difference between running with and without communication-computation overlap.

---
## How to measure?
#### The measurement proceeds in four general steps.
>Source: Intel® MPI Benchmarks User Guide
- Measure the time needed for a pure communication call.
- Start a nonblocking collective operation.
- Start computation using the IMB_cpu_exploit function, as described in the IMB-IO Nonblocking Benchmarks chapter.
    - To ensure correct measurement conditions, the computation time used by the benchmark is close to the pure communication time measured in step 1.
- Wait for the communication to finish using the MPI_Wait function.
```c
int MPI_Wait(MPI_Request *request, MPI_Status *status)
```
---
## Ways to Improve Overlapping Efficiency
Use nonblocking operations:
replace ```MPI_Recv()``` with ```MPI_Irecv()``` + ```MPI_Wait()```.
When sending data, wait until a certain amount has accumulated before transmitting.
This widens the idle gap -> more time for the CPU to do computing,
which in turn increases the amount of overlap.
example:
```c
MPI_Request request;
MPI_Status status;
if (world_rank != 0) {
    /* Send this rank's element to rank 0. */
    MPI_Send(&c[world_rank], 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    /* Post a nonblocking receive for the result from rank 0. */
    MPI_Irecv(c, world_size - 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
    /* Independent computation can run here, overlapping with the receive. */
    MPI_Wait(&request, &status);
}
```
---
References:
- http://ipv6.ncnu.org/Course/Preparing/RDMA/Ref/overlap.pdf
- https://blogs.cisco.com/performance/overlap-of-communication-and-computation-part-1
- https://www.hpcadvisorycouncil.com/subgroups_hpc_scale.php
- https://software.intel.com/content/www/us/en/develop/documentation/imb-user-guide/top/mpi-3-benchmarks/imb-nbc-benchmarks/measuring-communication-and-computation-overlap.html
- https://www.open-mpi.org/doc/v4.0/