# PP-f22 Assignment 1
0816034 蔡家倫

---
## Q1-1: Does the vector utilization increase, decrease or stay the same as VECTOR_WIDTH changes? Why?
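The vector utilization decreases as VECTOR_WIDTH increases.
I think this is because a wider vector covers more elements per operation, and the ```while (count > 0)``` loop in the vectorized code must keep iterating until the lane with the largest value reaches 0. Lanes that finish earlier are masked off and do no useful work during those extra iterations, so the wasted share grows with the width.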
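For example, take the vector [1 2 3 4], where each value is the number of loop iterations that element needs (10 useful iterations in total):
VECTOR_WIDTH 1: 1+2+3+4 = 10 iterations, 10/(10 * 1) = 1
VECTOR_WIDTH 2: 2+4 = 6 iterations, 10/(6 * 2) = 0.83..
VECTOR_WIDTH 4: 4 iterations, 10/(4 * 4) = 0.625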
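
The arithmetic above can be reproduced with a small toy model. This is only a sketch and not the assignment's own utilization logger: the function name `modelUtilization` is hypothetical, and it assumes each element needs `work[i]` iterations and that a chunk of `width` lanes keeps running until its slowest lane is done.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (an assumption, not the assignment's logger): element i needs
// work[i] loop iterations, and every chunk of `width` lanes iterates until
// its slowest lane has finished.
double modelUtilization(const std::vector<int>& work, std::size_t width) {
    assert(width > 0 && !work.empty());
    long long useful = 0;  // lane-iterations that do real work
    long long issued = 0;  // lane-iterations spent in total
    for (std::size_t i = 0; i < work.size(); i += width) {
        const std::size_t end = std::min(i + width, work.size());
        int longest = 0;
        for (std::size_t j = i; j < end; ++j) {
            useful += work[j];
            longest = std::max(longest, work[j]);
        }
        // All `width` lanes are occupied for `longest` iterations.
        issued += static_cast<long long>(longest) * static_cast<long long>(width);
    }
    return static_cast<double>(useful) / static_cast<double>(issued);
}
```

Calling `modelUtilization({1, 2, 3, 4}, w)` for `w` = 1, 2 and 4 gives the same three ratios as the hand calculation. The model ignores everything except the masked loop, so it shows the trend rather than predicting the measured percentages.
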
Test command:
```bash
./myexp -s 10000
```
| VECTOR_WIDTH | vector utilization |
| - | - |
| 2 | 76.9% |
| 4 | 69.7% |
| 8 | 65.9% |
| 16 | 64.2% |
## Q2-2: What speedup does the vectorized code achieve over the unvectorized code? What can you infer about the bit width of the default vector registers on the PP machines?
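About 3x (8.2 sec -> 2.6 sec).
The default vector registers (xmm) are 128 bits wide. Assuming the loop works on 32-bit floats, one register holds 4 values, so the ideal speedup would be 4x; the measured ~3x is close to that and far below what wider registers would allow.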
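
As a quick check of this inference, here is a minimal sketch of the arithmetic. The helper names are hypothetical, and the 32-bit element size is an assumption about the test loop's data type.

```cpp
#include <cassert>

// Number of elements that fit in one vector register.
// elementBits = 32 is an assumption (single-precision float data).
constexpr int lanesPerRegister(int registerBits, int elementBits) {
    return registerBits / elementBits;
}

// Ratio of unvectorized to vectorized run time.
double speedup(double scalarSeconds, double vectorSeconds) {
    assert(vectorSeconds > 0.0);
    return scalarSeconds / vectorSeconds;
}
```

With the timings above, `speedup(8.2, 2.6)` is a little over 3, which sits just under the 4 lanes given by `lanesPerRegister(128, 32)`. A 256-bit register would give 8 lanes under the same assumption, which the measured speedup does not support.
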
## Q2-3: Provide a theory for why the compiler is generating dramatically different assembly.
The if-else version can easily be vectorized using masked assignments: every iteration stores exactly one value, chosen by the condition. The original version has only an "if", but its logic is closer to a "switch" statement, with different control flow for different data (some elements change to ```b[j]``` while the others keep ```a[j]```), so the compiler does not vectorize it.
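
To make the comparison concrete, here is a sketch of the two loop shapes being discussed. The function names and signatures are hypothetical, and the loop bodies are an assumption reconstructed from the description above (an output array `c` that receives either `a[j]` or `b[j]`), not a copy of the assignment's source.

```cpp
#include <cassert>
#include <cstddef>

// "if" only: copy a[j] first, then conditionally overwrite it with b[j].
// The second store happens only for some elements.
void maxCopyThenOverwrite(const float* a, const float* b, float* c, std::size_t n) {
    assert(a && b && c);
    for (std::size_t j = 0; j < n; ++j) {
        c[j] = a[j];
        if (b[j] > a[j]) c[j] = b[j];
    }
}

// if-else: each iteration performs exactly one store, and the condition
// only selects which value is written. This maps naturally onto a
// compare plus a masked (blend/select) assignment.
void maxIfElse(const float* a, const float* b, float* c, std::size_t n) {
    assert(a && b && c);
    for (std::size_t j = 0; j < n; ++j) {
        if (b[j] > a[j]) c[j] = b[j];
        else             c[j] = a[j];
    }
}
```

Both functions produce the same output for the same input; only the shape of the control flow differs. Whether a given compiler vectorizes either form depends on its version and flags, so the generated assembly should be inspected rather than assumed.
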