# PP hw1

## Question 1.

```
void test1(float* __restrict a, float* __restrict b, float* __restrict c, int N) {
  // Tell the compiler the trip count and that each array is 32-byte aligned.
  __builtin_assume(N == 1024);
  a = (float *)__builtin_assume_aligned(a, 32);
  b = (float *)__builtin_assume_aligned(b, 32);
  c = (float *)__builtin_assume_aligned(c, 32);

  fasttime_t time1 = gettime();
  // I is the outer repetition count; it is defined outside this function.
  for (int i=0; i<I; i++) {
    for (int j=0; j<N; j++) {
      c[j] = a[j] + b[j];
    }
  }
  fasttime_t time2 = gettime();

  double elapsedf = tdiff(time1, time2);
  std::cout << "Elapsed execution time of the loop in test1():\n"
    << elapsedf << "sec (N: " << N << ", I: " << I << ")\n";
}
```

## Question 2.

Before optimization:
```
7.99886sec
9.14774sec
8.72205sec
9.1878sec
9.14816sec
```

After optimization:
```
3.70272sec
3.38494sec
3.11554sec
3.47381sec
3.37357sec
```

Speedup (median of the five runs, before vs. after): $9.14774/3.38494\approx2.70$

After enabling AVX2:
```
1.9841sec
1.89361sec
1.95184sec
1.85509sec
1.8603sec
```

Speedup (median of the five runs, optimized vs. AVX2): $3.38494/1.89361\approx1.79$

## Question 3.

After applying the patch to test2.cpp, the control flow of the loop body becomes more direct, which makes it easier for the compiler to vectorize the loop.