多處理機平行程式設計 - HW1

# 多處理機平行程式設計 - HW1 `AN4086056` `黃祥瑋` ## Problem 1 To make the program execute in parallel way , I cut the range needed to be check into several slice . Each slice has a additional process to calculate .When the calculation is done , we merge the result in a tree structure . ### FLow Chart ```graphviz Digraph G{ "cutting the range" -> "Process 1"->"Merge" "cutting the range" -> "Process 2"->"Merge" "cutting the range" -> "Process 3"->"Merge" "cutting the range" -> "Process 4"->"Merge" "Merge" -> "Result" } ``` ### Tree Structure Merge Method Merge in a tree structure way has a time cost $O(log(n))$ , it is faster than merging to root process one by one , which the time cost is $O(n)$. We implement this by making the process with even id to receive the data and odd id to send data . We merge data from bigger process id t smaller process id . After each round , the id iterator will be divide by 2 , which make all id has both even and odd number . When sending and receieving data , we get the correct id from the calculate of id iterator and rounds . * **Example** For 8 process doing ```clike id : id iterator : round 1 : 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 round 2 : 0 2 4 6 0 1 2 3 round 3 : 0 4 0 1 round 4 : 0 0 ``` ``` To get correct src id or dest id : id = round * (idt + 1) or id = round * (idt - 1) ``` * **Special case** When number of workers is odd , we let the last one merge with root first . When the recv id is out of range (bigger than (number of worker - 1)) , we let the process skip that round . ### Analyze ![](https://i.imgur.com/5wOcqHD.png) From the execution , we get this result . ``` worker : sec 1 : 0.00110 2 : 0.00156 3 : 0.00107 4 : 0.00043 5 : 0.00082 6 : 0.00100 7 : 0.00144 8 : 0.00047 9 : 0.00069 10 : 0.00093 ``` It is obvious that executing with 4 is the most efficiency way . I think the reason that more processes don't be more efficient is that the merging cost too much time . So I take a test . ![](https://i.imgur.com/zMiyJlN.png) ``` worker : merge sec / total sec 1 : 0 4 : 11% 8 : 54% 16 : 57% 24 : 89% ``` We can find that merging time has a increament relative to the total execute time .It indicates that more processses don't gets a speed up , because we might consider the time cost of merging , too . ### Difficulty * **the way to merge** It took me a long time to implement the tree structure , because the order and the rule of merging has to be evaluated perfectly . ## Problem 2 To estimate $\pi$ we have to toss n point on the square .The implementation is similar to Problem 1 ,the only difference is that we have a brocast process . To make each process do $\dfrac{n}{worker}$ toss , we brocast the correct toss number to processes , than we calculate and finally merge the result into root process . We use $n = 1000000$ in the program , because the larger toss is more accurate . ### FLow Chart ```graphviz Digraph G{ "cutting the range" -> "brocast" "brocast" -> "Process 1"->"Merge" "brocast" -> "Process 2"->"Merge" "brocast" -> "Process 3"->"Merge" "brocast" -> "Process 4"->"Merge" "Merge" -> "Result" } ``` ### Brocast Method I brocast the number to each process by find the ceiling $n^2$ of the worker number . And the worker that choosed to be helper will help the root distribute the data to n process . And if the target is out of worker's number , the helper stops . * **Example 1** If we have 4 process , so $n = 2$ , then `1` is helper . ``` period 1 : 0 -> 1 period 2 : 0 -> 3 , 1 -> 2 ``` * **Example 2** If we have 7 process , so $n = 3$ , then `1` , `5` is helper . ``` period 1 : 0 -> 1 period 2 : 0 -> 5 ,1 -> 2 period 3 : 1 -> 3 , 5 -> 6 period 4 : 1 -> 4 ``` ### Analyze ![](https://i.imgur.com/hbL03zy.png) We can find that this brocasting way has an affinity with $n^2$ process ,because if the worker number is $n^2$ , the brocast time cost can reduced with $\dfrac{1}{n}$ .But if the number isn't $n^2$ , the other process has a extra time cost for waiting helper to distribute the data . And it is similar to merge , more process also takes a longer time to brocast .The more process will cost more in some cases . ### Difficulty I spend a lot of time in thinking the way to brocast . At first i try to contruct a tree structure brocast , it needs to use a recursive brocasting .But I stuck in the sequence of get and send . I use the way above instead of tree structure .If we use $n^2$ process to execute it , it is quite efficient ! ## Feed Back The compiler has some limit on using for loop which makes the classic iterator not usable , the error is showed below . ![](https://i.imgur.com/VTrYqrj.png) And i find that i can reach the home page in the system easily . ![](https://i.imgur.com/rFlMtHL.png)