# Home Work 1
**My Name: DuBu**
**My ID: 0616108**
## Part1
#### <span style="color:#800000">Q1-1. Does the vector utilization increase, decrease or stay the same as VECTOR_WIDTH changes? Why? </span>
From logger.cpp we know the vector-utilization equation is ```(stats.utilized_lane / stats.total_lane) * 100```; it measures what fraction of vector lanes do useful work.
Vector utilization **decreases** when VECTOR_WIDTH **increases**. The loop below only exits once the mask has no set bits, so the whole vector keeps issuing instructions until its slowest lane finishes. During the loop, some of the mask's bits are already zero, so those lanes are idle while they wait for the remaining set bits to clear. As the vector width increases, the probability that some **zero mask bits** are stuck waiting for the last **one mask bits** to clear becomes higher, and that is why vector utilization decreases.
```cpp=
// Pseudocode for illustration, not the real code
while (_pp_cntbits(mask) > 0) {  // keep looping while any lane's mask bit is set
    ...
    if (exponents <= 0 || result >= 9.999999f)
        mask[i] = 0;  // lane i is finished, but the other lanes keep the loop alive
    ...
}
```
## Part2
### <span style="color:#674ea7">Q2-1.</span>
AVX2 uses 256-bit (32-byte) YMM registers, so it requires **32-byte alignment**; the `__builtin_assume_aligned` hints below promise the compiler that the pointers satisfy it.
```cpp
// filename: test1.cpp
void test1(float* __restrict a, float* __restrict b, float* __restrict c, int N) {
  __builtin_assume(N == 1024);
  a = (float *)__builtin_assume_aligned(a, 32);
  b = (float *)__builtin_assume_aligned(b, 32);
  c = (float *)__builtin_assume_aligned(c, 32);
  fasttime_t time1 = gettime();
  for (int i = 0; i < I; i++) {
    for (int j = 0; j < N; j++) {
      c[j] = a[j] + b[j];
    }
  }
  fasttime_t time2 = gettime();
  double elapsedf = tdiff(time1, time2);
  std::cout << "Elapsed execution time of the loop in test1():\n"
            << elapsedf << "sec (N: " << N << ", I: " << I << ")\n";
}
```
### <span style="color:#674ea7">Q2-2.</span>
Average of ten experiments.
> Unvectorized: 8.24488sec
> Vectorized: 2.61487sec
> AVX2 Vector Registers: 1.39983sec
>
> Remark:
> <span style="color:#624001">Vectorized is almost 3x faster than Unvectorized
> AVX2 Vector Registers is almost 6x faster than Unvectorized</span>

Using AVX2 vector registers gives almost a 2x speedup over the vectorized program, while the vectorized code is itself nearly 3x faster than the unvectorized program.
#### <span style="color:#800000">What can you infer about the bit width of the default vector registers on the PP machines? What about the bit width of the AVX2 vector registers?</span>
As the figure below shows, the PP machines' default vector registers hold 4 x 32-bit floats, i.e., 16 bytes per register. AVX2, on the other hand, holds 8 x 32-bit floats, i.e., 32 bytes per register. **Important: XMM registers are 128 bits long, whereas YMM registers are 256 bits.**

Reference from: https://www.codingame.com/playgrounds/283/sse-avx-vectorization/what-is-sse-and-avx
### <span style="color:#674ea7"> Q2-3.</span>
#### <span style="color:#800000">Provide a theory for why the compiler is generating dramatically different assembly? </span>
We might assume these two pieces of code do exactly the same thing because they produce the same result; they do have the same result, but their procedures are totally different. They are similar, but not the same.
In **test1.cpp** the compiler has to move the value of ```a[j]``` into ```c[j]``` before the `if` comparison. In **test2.cpp**, by contrast, the comparison ```b[j] > a[j]``` directly selects the maximum value and stores it into ```c[j]```, which SSE supports with the **MAXPS** instruction.
>MAXPS: Maximum of Packed Single-Precision Floating-Point Values

The reason **test1.cpp** cannot use the **maxps** instruction is the unconditional ```c[j] = a[j];```: the compiler has to emit a MOV for it first, and the rest of the code is then no longer recognizable as a single maximum selection. So the way you write the code matters, even when two versions have the same meaning!!
```diff
--- test1.cpp
+++ test2.cpp
@@ -14,9 +14,8 @@
for (int j = 0; j < N; j++)
{
/* max() */
- c[j] = a[j];
- if (b[j] > a[j])
- c[j] = b[j];
+ if (b[j] > a[j]) c[j] = b[j];
+ else c[j] = a[j];
}
}
```
:::info
Nice work!
>[name=TA]
:::