# Parallel Programming Assignment 2
# 0816176 張辰宇
## Q1: In your write-up, produce a graph of speedup compared to the reference sequential implementation as a function of the number of threads used FOR VIEW 1. Is speedup linear in the number of threads used? In your writeup hypothesize why this is (or is not) the case? (You may also wish to produce a graph for VIEW 2 to help you come up with a good answer. Hint: take a careful look at the three-thread data-point.)
The results for different numbers of threads are listed below.
### View 1
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 2
[mandelbrot serial]: [491.160] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [249.895] ms
Wrote image file mandelbrot-thread.ppm
(1.97x speedup from 2 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 3
[mandelbrot serial]: [478.259] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [291.454] ms
Wrote image file mandelbrot-thread.ppm
(1.64x speedup from 3 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4
[mandelbrot serial]: [482.081] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [238.224] ms
Wrote image file mandelbrot-thread.ppm
(2.02x speedup from 4 threads)
```

### View 2
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 2 --view 2
[mandelbrot serial]: [299.227] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [180.306] ms
Wrote image file mandelbrot-thread.ppm
(1.66x speedup from 2 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 3 --view 2
[mandelbrot serial]: [305.834] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [148.633] ms
Wrote image file mandelbrot-thread.ppm
(2.06x speedup from 3 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4 --view 2
[mandelbrot serial]: [300.232] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [110.677] ms
Wrote image file mandelbrot-thread.ppm
(2.72x speedup from 4 threads)
```

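In view 1, the speedup is not linear in the number of threads: it drops from 1.97x with 2 threads to 1.64x with 3 threads, and only recovers to 2.02x with 4 threads.
In view 2, the speedup grows steadily with the number of threads (1.66x, 2.06x, 2.72x), although it stays below the ideal.

According to the picture above, I think the middle thread is assigned many more points to calculate in view 1, which results in a longer elapsed time when using 3 threads. In view 2, the distribution is more even, so the growth is closer to linear.
My hypothesis is that the white points represent the amount of work a thread has to do. This hypothesis explains the speedup trends of the different views.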
## Q2: How do your measurements explain the speedup graph you previously created?
- Thread num = 2
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 2
[mandelbrot serial]: [461.094] ms
Wrote image file mandelbrot-serial.ppm
thread: 0 times: 237.278
thread: 1 times: 237.506
thread: 0 times: 236.003
thread: 1 times: 238.448
thread: 0 times: 235.794
thread: 1 times: 237.157
thread: 0 times: 237.164
thread: 1 times: 238.455
thread: 0 times: 236.994
thread: 1 times: 237.125
[mandelbrot thread]: [237.323] ms
Wrote image file mandelbrot-thread.ppm
(1.94x speedup from 2 threads)
```
We can observe that the two threads have almost the same amount of work to do in this case.
- Thread num = 3
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 3
[mandelbrot serial]: [460.616] ms
Wrote image file mandelbrot-serial.ppm
thread: 0 times: 93.146
thread: 2 times: 93.949
thread: 1 times: 284.180
thread: 0 times: 93.496
thread: 2 times: 93.618
thread: 1 times: 282.634
thread: 0 times: 93.820
thread: 2 times: 93.930
thread: 1 times: 282.910
thread: 0 times: 93.329
thread: 2 times: 93.762
thread: 1 times: 283.378
thread: 0 times: 93.131
thread: 2 times: 94.722
thread: 1 times: 287.022
[mandelbrot thread]: [282.897] ms
Wrote image file mandelbrot-thread.ppm
(1.63x speedup from 3 threads)
```
We can observe that, due to the spatial distribution of the white points, thread 1 has the most work to do and becomes the bottleneck of the whole program. This is why the 3-thread case is slower than the 2-thread case: the work is distributed too unevenly.
- Thread num = 4
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4
[mandelbrot serial]: [461.560] ms
Wrote image file mandelbrot-serial.ppm
thread: 0 times: 45.426
thread: 3 times: 52.458
thread: 2 times: 196.468
thread: 1 times: 208.487
thread: 0 times: 45.738
thread: 3 times: 46.693
thread: 1 times: 194.270
thread: 2 times: 194.359
thread: 0 times: 45.266
thread: 3 times: 46.120
thread: 1 times: 193.091
thread: 2 times: 194.167
thread: 0 times: 45.288
thread: 3 times: 45.701
thread: 1 times: 192.587
thread: 2 times: 193.466
thread: 0 times: 47.374
thread: 3 times: 49.587
thread: 1 times: 192.929
thread: 2 times: 194.471
[mandelbrot thread]: [193.652] ms
Wrote image file mandelbrot-thread.ppm
(2.38x speedup from 4 threads)
```
We can observe that, compared to the three-thread case, the bottleneck is less severe, so the speedup starts to grow again.
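Per-thread times like the ones above can be collected by timing the body of each worker. This is a minimal sketch that assumes `std::chrono` is used for timing; the timer actually used for these measurements is not shown in this write-up, and `elapsedMs` is a hypothetical helper name.

```cpp
#include <chrono>

// Runs `work` once and returns its wall-clock duration in milliseconds.
template <typename Work>
double elapsedMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

Inside a worker function, the call that computes the thread's rows would be wrapped in a lambda passed to `elapsedMs`, and the result printed together with the thread ID.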
## Q3: In your write-up, describe your approach to parallelization and report the final 4-thread speedup obtained.
``` cpp
void workerThreadStart(WorkerArgs *const args)
{
    // Interleaved assignment: thread i computes rows
    // i, i + numThreads, i + 2 * numThreads, ...
    const int numThreads = args->numThreads;
    const int height = args->height;
    for (int row = args->threadId; row < height; row += numThreads)
    {
        // Compute a single row: start at `row` and process 1 row.
        mandelbrotSerial(args->x0, args->y0, args->x1, args->y1,
                         args->width, args->height,
                         row, 1, args->maxIterations, args->output);
    }
}
```
Each thread computes one row at a time and then advances by numThreads rows. This guarantees that all threads process (almost) the same number of rows, and because each thread's rows are spread across the whole image, the expensive and cheap regions are shared evenly. For example, with 6 rows and 3 threads, thread 0 computes rows 0 and 3, thread 1 computes rows 1 and 4, and thread 2 computes rows 2 and 5.
The final result is as follows.
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4
[mandelbrot serial]: [461.797] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [121.088] ms
Wrote image file mandelbrot-thread.ppm
(3.81x speedup from 4 threads)
```
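The interleaved row assignment used in `workerThreadStart` can be checked in isolation. The sketch below only illustrates the indexing scheme; `rowsForThread` is a hypothetical helper that does not exist in the assignment code.

```cpp
#include <vector>

// Rows handled by `threadId` under interleaved (round-robin) assignment:
// threadId, threadId + numThreads, threadId + 2 * numThreads, ...
std::vector<int> rowsForThread(int threadId, int numThreads, int height) {
    std::vector<int> rows;
    for (int row = threadId; row < height; row += numThreads) {
        rows.push_back(row);
    }
    return rows;
}
```

With 6 rows and 3 threads, `rowsForThread(0, 3, 6)` yields rows 0 and 3, and `rowsForThread(2, 3, 6)` yields rows 2 and 5. When the row count is not a multiple of the thread count, the per-thread row counts differ by at most one.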
## Q4: Now run your improved code with eight threads. Is performance noticeably greater than when running with four threads? Why or why not? (Notice that the workstation server provides 4 cores 4 threads.)
```
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 4
[mandelbrot serial]: [461.797] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [121.088] ms
Wrote image file mandelbrot-thread.ppm
(3.81x speedup from 4 threads)
816176@pp037-ubuntu:~/HW2/part2$ ./mandelbrot -t 8
[mandelbrot serial]: [461.287] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread]: [127.628] ms
Wrote image file mandelbrot-thread.ppm
(3.61x speedup from 8 threads)
```
We can observe that running with 8 threads is slightly slower than running with 4 threads. Since we only have 4 cores, using more threads than that only leads to more context switches and more overhead.
:::danger
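You should base your judgment on the fact that the server's CPU has 4 cores and 4 threads; the key point is the 4 threads. On a 4-core, 8-thread CPU that supports Hyper-Threading, running 8 threads would not necessarily be worse.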
by TA
:::
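Following the TA's point, the relevant limit is the number of hardware threads rather than the number of cores. The sketch below shows one way to query it with `std::thread::hardware_concurrency()` from the C++ standard library. The clamping policy and the helper names `usefulThreadCount` and `usefulThreadCountOnThisMachine` are assumptions for illustration, suited to compute-bound work like this one.

```cpp
#include <algorithm>
#include <thread>

// Clamp the requested thread count to the number of hardware threads.
// A hardwareThreads value of 0 means "unknown", so the request is kept as-is.
unsigned int usefulThreadCount(unsigned int requested, unsigned int hardwareThreads) {
    if (hardwareThreads == 0) {
        return requested;
    }
    return std::min(requested, hardwareThreads);
}

// Same as above, using the value reported by the standard library.
// hardware_concurrency() is only a hint and may return 0.
unsigned int usefulThreadCountOnThisMachine(unsigned int requested) {
    return usefulThreadCount(requested, std::thread::hardware_concurrency());
}
```

On a 4-core, 4-thread CPU, a request for 8 threads would be clamped to 4, while on a 4-core, 8-thread CPU with Hyper-Threading the request for 8 would be kept.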