# Parallel Programming HW2

###### tags: `Parallel Programming`

## Part 1

* Performance

![](https://i.imgur.com/ZnxzYVW.png =500x)

## Part 2

### Q1

* In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1.

![](https://i.imgur.com/J3CCgHQ.png =500x)

* Is speedup linear in the number of threads used?

**Ans:** No.

![](https://i.imgur.com/PLKqSr0.png =200x)

* In your write-up, hypothesize why this is (or is not) the case.

**Ans:** The white regions of the image are the ones that require significant computation, and with a blocked spatial partition each thread receives a different amount of that expensive white area. In the 3-thread case, for example, the thread assigned the middle band gets most of the white area, so it takes much longer than the other two. Even though the image is divided into equal-sized regions, the work per region is unequal, so the speedup cannot scale linearly with the number of threads.

  * original (no partition)

    ![](https://i.imgur.com/6rA4cb9.png =300x)

  * partition into 2-thread spatial regions

    ![](https://i.imgur.com/23FTTcI.png =300x)

  * partition into 3-thread spatial regions

    ![](https://i.imgur.com/likOTiS.png =300x)

* (You may also wish to produce a graph for VIEW 2 to help you come up with a good answer. Hint: take a careful look at the three-thread data-point.)

![](https://i.imgur.com/gxsIQDC.png =500x) ![](https://i.imgur.com/WWihlxN.png =200x)

### Q2

* How do your measurements explain the speedup graph you previously created?

**Ans:** Once more than 2 threads are used, some threads take noticeably longer to execute than others. The 3-thread case makes this clearest: the top and bottom bands involve little computation, while the heavy middle band is assigned to a single thread, which becomes the bottleneck. Splitting the image into more bands still leaves a few threads busy for most of the runtime; in the 4-thread version, for example, threads 1 and 2 take the most time.
![](https://i.imgur.com/vn1GzlJ.png =300x) ![](https://i.imgur.com/FnuR61m.png =300x) ![](https://i.imgur.com/k7mppfR.png =300x)

### Q3

* In your write-up, describe your approach to parallelization and report the final 4-thread speedup obtained.

**Ans:** Each call to *mandelbrotSerial()* computes only a single column, and *workerThreadStart()* calls it repeatedly. Thread $i$ starts at column $i$ and then advances with $i \mathrel{+}= numThreads$, so no two threads ever compute the same column. This interleaved assignment distributes the expensive and cheap columns evenly across the threads, so no thread ends up with a disproportionately large share of the work and all threads finish at nearly the same time.

![](https://i.imgur.com/rxK6i0C.png =500x) ![](https://i.imgur.com/s9pHBoE.png =300x)

### Q4

* Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads? Why or why not? (Notice that the workstation server provides 4 cores / 4 threads.)

**Ans:** No. The workstation provides 4 cores with 4 hardware threads, so 4 software threads is the best choice: each one runs alone on its own hardware thread with no context switching. Using more than four threads forces the OS to multiplex them on the same cores, adding context-switch overhead, so the total runtime grows and the speedup falls below the 4-thread result.

![](https://i.imgur.com/vB8q4i8.png =300x)