---
tags: SCC_Camp
---
# SCC Homework 1 report
---
## Algorithm
----
### odd-even sort part
----
```
// pseudo code
read the array from file
sort the array                        // locally, on each node
for i = 1 to (# of nodes) / 2         // which is (# of sections) / 2
    if at even node
        if i is even                  // even phase
            send the last element in array to the right node,
            and get the first element of the right node's array; call it preload
            if preload < the last element in array
                merge the array with the right node's array
                and keep the smaller half
        else                          // odd phase
            send the first element in array to the left node,
            and get the last element of the left node's array; call it preload
            if preload > the first element in array
                merge the array with the left node's array
                and keep the larger half
    else if at odd node
        if i is even                  // even phase
            send the first element in array to the left node,
            and get the last element of the left node's array; call it preload
            if preload > the first element in array
                merge the array with the left node's array
                and keep the larger half
        else                          // odd phase
            send the last element in array to the right node,
            and get the first element of the right node's array; call it preload
            if preload < the last element in array
                merge the array with the right node's array
                and keep the smaller half
```
----
**timeline**
- classic version
- classic version, but bounded to world_size/2 rounds
- preload version (skip the merge when the boundary elements are already in order)
----
### the merge part
Traverse both arrays, tracking the current element of each; at every step take the smaller of the two and append it to the output array.

> trick: swap the array/buffer pointers instead of copying the merged result back
----
**Advance**
While searching for merge algorithms, I found that there is actually a parallel merge algorithm. Below is the pseudocode.
[wiki merge algorithm reference](https://en.wikipedia.org/wiki/Merge_algorithm)
----
```
algorithm merge(A[i...j], B[k...ℓ], C[p...q]) is
    inputs A, B, C : array
           i, j, k, ℓ, p, q : indices
    let m = j - i,
        n = ℓ - k
    if m < n then
        swap A and B  // ensure that A is the larger array: i, j still belong to A; k, ℓ to B
        swap m and n
    if m ≤ 0 then
        return  // base case, nothing to merge
    let r = ⌊(i + j)/2⌋
    let s = binary-search(A[r], B[k...ℓ])
    let t = p + (r - i) + (s - k)
    C[t] = A[r]
    in parallel do
        merge(A[i...r], B[k...s], C[p...t])
        merge(A[r+1...j], B[s...ℓ], C[t+1...q])
```
----
### the send and receive part
I also tried other versions of send & recv.
----
1. Call `MPI_Send` and `MPI_Recv` separately
- this adds some latency, because the second call cannot start until the first one returns
```
MPI_Send(...)
MPI_Recv(...)
```
----
2. Call `MPI_Isend` and `MPI_Irecv` separately, then wait on the requests
- this performed better than the above, because the second call does not have to wait for the first to fully complete before starting
- but the code after the wait still has to wait until both operations are done
```
MPI_Isend(...)
MPI_Irecv(...)
...
MPI_Wait(...)
```
----
3. Use `MPI_Sendrecv`
- this outperformed the above two, thanks to MPI's internal implementation of `Sendrecv`
```
MPI_Sendrecv(...)
```
----
> A send-receive operation is useful for executing a shift operation across a chain of processes. If blocking sends and receives are used for such a shift, then one needs to order the sends and receives correctly (for example, even processes send, then receive; odd processes receive first, then send) in order to prevent cyclic dependencies that may lead to deadlock. **When a send-receive operation is used, the communication subsystem takes care of these issues.**
- https://www.open-mpi.org/doc/v4.0/man3/MPI_Sendrecv.3.php
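The ordering the man page describes can be sketched like this (pseudocode, not the actual homework code):

```
// shift across a chain with blocking calls: order by parity to avoid deadlock
if my rank is even
    MPI_Send(..., right, ...)
    MPI_Recv(..., right, ...)
else
    MPI_Recv(..., left, ...)
    MPI_Send(..., left, ...)

// with MPI_Sendrecv, the library schedules both directions safely in one call
MPI_Sendrecv(...)
```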
---
## Makefile
----
### Compiler Optimization
----
- Use `g++`
I tested `-O1`, `-O2` and `-O3`; the `-O3` version is the fastest, and it improves performance a lot.
I also tested `-Ofast`, but it enables some non-standard-compliant optimizations and produces wrong answers.
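A sketch of how this looks in the Makefile (the target and file names `hw1` / `hw1.cc` are hypothetical stand-ins):

```makefile
CXX      = mpicxx   # MPI compiler wrapper around g++
CXXFLAGS = -O3      # -Ofast gave wrong answers, so stay with -O3

hw1: hw1.cc
	$(CXX) $(CXXFLAGS) -o $@ $<
```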
---
## Others
----
### spreadsort
Boost provides `spreadsort`, which heuristically mixes bucket and comparison sorting.
[spreadsort wiki](https://en.wikipedia.org/wiki/Spreadsort)
https://stackoverflow.com/questions/21251061/does-c-algorithm-boost-lib-have-radix-sort
Switching to it increases the performance a lot.
----
### PMPI
[PMPI stackoverflow](https://stackoverflow.com/questions/43002936/what-are-the-differences-between-the-mpi-functions-and-pmpi-functions?rq=1)
Calling the `PMPI_` functions directly doesn't improve performance much, since the `MPI_` names are just weak symbols pointing at their `PMPI_` counterparts (MPI -> PMPI).
---
## Thanks
----