# 多處理機平行程式設計 - HW1
`AN4086056` `黃祥瑋`
## Problem 1
To make the program execute in parallel way , I cut the range needed to be check into several slice . Each slice has a additional process to calculate .When the calculation is done , we merge the result in a tree structure .
### FLow Chart
```graphviz
Digraph G{
"cutting the range" -> "Process 1"->"Merge"
"cutting the range" -> "Process 2"->"Merge"
"cutting the range" -> "Process 3"->"Merge"
"cutting the range" -> "Process 4"->"Merge"
"Merge" -> "Result"
}
```
### Tree Structure Merge Method
Merge in a tree structure way has a time cost $O(log(n))$ , it is faster than merging to root process one by one , which the time cost is $O(n)$.
We implement this by making the process with even id to receive the data and odd id to send data . We merge data from bigger process id t smaller process id .
After each round , the id iterator will be divide by 2 , which make all id has both even and odd number .
When sending and receieving data , we get the correct id from the calculate of id iterator and rounds .
* **Example**
For 8 process doing
```clike
id : id iterator :
round 1 : 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7
round 2 : 0 2 4 6 0 1 2 3
round 3 : 0 4 0 1
round 4 : 0 0
```
```
To get correct src id or dest id :
id = round * (idt + 1) or id = round * (idt - 1)
```
* **Special case**
When number of workers is odd , we let the last one merge with root first .
When the recv id is out of range (bigger than (number of worker - 1)) , we let the process skip that round .
### Analyze

From the execution , we get this result .
```
worker : sec
1 : 0.00110
2 : 0.00156
3 : 0.00107
4 : 0.00043
5 : 0.00082
6 : 0.00100
7 : 0.00144
8 : 0.00047
9 : 0.00069
10 : 0.00093
```
It is obvious that executing with 4 is the most efficiency way . I think the reason that more processes don't be more efficient is that the merging cost too much time .
So I take a test .

```
worker : merge sec / total sec
1 : 0
4 : 11%
8 : 54%
16 : 57%
24 : 89%
```
We can find that merging time has a increament relative to the total execute time .It indicates that more processses don't gets a speed up , because we might consider the time cost of merging , too .
### Difficulty
* **the way to merge**
It took me a long time to implement the tree structure , because the order and the rule of merging has to be evaluated perfectly .
## Problem 2
To estimate $\pi$ we have to toss n point on the square .The implementation is similar to Problem 1 ,the only difference is that we have a brocast process . To make each process do $\dfrac{n}{worker}$ toss , we brocast the correct toss number to processes , than we calculate and finally merge the result into root process .
We use $n = 1000000$ in the program , because the larger toss is more accurate .
### FLow Chart
```graphviz
Digraph G{
"cutting the range" -> "brocast"
"brocast" -> "Process 1"->"Merge"
"brocast" -> "Process 2"->"Merge"
"brocast" -> "Process 3"->"Merge"
"brocast" -> "Process 4"->"Merge"
"Merge" -> "Result"
}
```
### Brocast Method
I brocast the number to each process by find the ceiling $n^2$ of the worker number . And the worker that choosed to be helper will help the root distribute the data to n process . And if the target is out of worker's number , the helper stops .
* **Example 1**
If we have 4 process , so $n = 2$ , then `1` is helper .
```
period 1 : 0 -> 1
period 2 : 0 -> 3 , 1 -> 2
```
* **Example 2**
If we have 7 process , so $n = 3$ , then `1` , `5` is helper .
```
period 1 : 0 -> 1
period 2 : 0 -> 5 ,1 -> 2
period 3 : 1 -> 3 , 5 -> 6
period 4 : 1 -> 4
```
### Analyze

We can find that this brocasting way has an affinity with $n^2$ process ,because if the worker number is $n^2$ , the brocast time cost can reduced with $\dfrac{1}{n}$ .But if the number isn't $n^2$ , the other process has a extra time cost for waiting helper to distribute the data .
And it is similar to merge , more process also takes a longer time to brocast .The more process will cost more in some cases .
### Difficulty
I spend a lot of time in thinking the way to brocast . At first i try to contruct a tree structure brocast , it needs to use a recursive brocasting .But I stuck in the sequence of get and send . I use the way above instead of tree structure .If we use $n^2$ process to execute it , it is quite efficient !
## Feed Back
The compiler has some limit on using for loop which makes the classic iterator not usable , the error is showed below .

And i find that i can reach the home page in the system easily .
