PP assignment 2

# PP assignment 2
###### tags: `PP Assignment`

## Q1

| Number of Threads | Speedup (View 1) | Speedup (View 2) |
| ----------------- | ---------------- | ---------------- |
| 2                 | 1.94x            | 1.67x            |
| 3                 | 1.63x            | 2.20x            |
| 4                 | 2.38x            | 2.62x            |

The table shows the speedup for each number of threads. In View 1, the speedup with 3 threads (1.63x) is lower than with 2 threads (1.94x) and well below that of 4 threads (2.38x). All threads are given the same number of rows to compute, but each row may carry a different workload. My hypothesis is that the workload is not evenly distributed across the threads, so some threads take longer than others to complete their computation.

:::info
You should provide a graph that shows your measurements.
>[name=TA]
:::

## Q2

![](https://i.imgur.com/ST8Uzv5.png)

From the result shown above, thread 1 took longer to complete than threads 2 and 3. This means that the first third of the image needed more computation. This confirms that the hypothesis I proposed is correct.

:::info
There's no thread 3 in the picture.
>[name=TA]
:::

## Q3

### Implementation

To distribute the workload evenly across all threads, I split the image into smaller sections, where the number of sections equals the number of threads squared. E.g., for 4 threads, the image is split into 16 sections. Each section is processed by the thread whose ID equals `s mod t`, where `s` is the section number and `t` is the number of threads. As a result, the sections are computed by the threads in an interleaved manner.

Computing `height mod num_sec`, where `height` is the total number of rows and `num_sec` is the total number of sections, gives the number of rows that do not belong to any section. These rows are computed serially by the thread with the largest thread ID.

**Eg.**
Let's say we have an image consisting of 17 rows and the number of threads is 2.
Number of sections = $2 * 2 = 4$

Number of rows that do not belong to any section = $17 \mod 4 = 1$

* For section 0 (rows 0~3), because 0 mod 2 = 0, section 0 will be computed by thread 0
* For section 1 (rows 4~7), because 1 mod 2 = 1, section 1 will be computed by thread 1
* For section 2 (rows 8~11), because 2 mod 2 = 0, section 2 will be computed by thread 0
* For section 3 (rows 12~15), because 3 mod 2 = 1, section 3 will be computed by thread 1

Row 16, which does not belong to any section, will be computed by the thread with the largest thread ID, which is thread 1.

### Result

![](https://i.imgur.com/RalBZuT.png, =500x)

From the picture shown above, we can see that this implementation achieves a speedup of 3.6x with 4 threads. The duration used by each thread shows that the workload is distributed evenly among all threads.

## Q4

**4 threads**
![](https://i.imgur.com/enECsf7.png)

**8 threads**
![](https://i.imgur.com/fdqAHn7.png)

The difference in speedup between 4 threads and 8 threads is minor. The system provides only 4 hardware threads, so adding more threads does not speed up the process, because the 8 threads cannot all run concurrently.
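
The interleaved section assignment described in Q3 can be sketched roughly as below. This is only an illustration of the row-to-thread mapping, not the code actually submitted: the helper name `assign_rows` and the `owner` vector are hypothetical, thread IDs are assumed to start at 0, and `num_threads` is assumed to be at least 1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the assignment code): returns, for every row,
// the ID of the thread that would compute it under the Q3 scheme.
// Assumes num_threads >= 1 and thread IDs in [0, num_threads - 1].
std::vector<int> assign_rows(int height, int num_threads) {
    const int num_sec = num_threads * num_threads;  // sections = threads squared
    const int rows_per_sec = height / num_sec;      // integer division

    // Rows left over (height mod num_sec of them) stay with the last thread.
    std::vector<int> owner(height, num_threads - 1);

    for (int s = 0; s < num_sec; ++s) {
        const int tid = s % num_threads;            // section s goes to thread s mod t
        for (int r = s * rows_per_sec; r < (s + 1) * rows_per_sec; ++r) {
            owner[r] = tid;
        }
    }
    return owner;
}
```

For the 17-row, 2-thread example, `assign_rows(17, 2)` maps rows 0~3 and 8~11 to thread 0, rows 4~7 and 12~15 to thread 1, and the leftover row 16 to thread 1.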