# 2018q3 Week 12 Quiz

### Quiz `1`

Consider two functions `foo` and `bar`:

```C
void foo(int *px, int *py, int *pz) {
    int t = *px;
    *px = *py;
    *py = *pz;
    *pz = t;
}

void bar(int *px, int *py, int *pz) {
    int t = *py;
    *py = *pz;
    *pz = *px;
    *px = t;
}
```

Compilers are not allowed to generate the same code for `foo` and `bar`. Why is this?

==Answer area==

Select the one correct statement.

X1 = ?
* `(a)` the two functions should behave differently if `*px == *py` and `px == pz`
* `(b)` the two functions should generate different cache misses
* `(c)` the two functions should behave differently if `px == py`

---

### Quiz `2`

Consider the following two versions of a loop in assembly:

Version X:
```
    movq $1023, %rax
    movq $1, %rbx
loop:
    imulq (%rcx,%rax,8), %rbx
    sub $1, %rax
    jne loop
```

Version Y:
```
    movq $1022, %rax
    movq $1, %rbx
loop:
    imulq 8(%rcx,%rax,8), %rbx
    imulq (%rcx,%rax,8), %rbx
    sub $2, %rax
    jne loop
```

Version X performs at least one hundred more ______ than version Y.

* OP1: subtractions
* OP2: multiplications
* OP3: conditional jumps

==Answer area==

Select all that apply.

X2 = ?
* `(a)` OP1
* `(b)` OP2
* `(c)` OP1 + OP2
* `(d)` OP3
* `(e)` OP1 + OP3
* `(f)` OP2 + OP3

---

### Quiz `3`

Suppose a loop unrolled 8 times executes at the same speed as a loop unrolled 4 times on a processor. Which of the following changes to the processor is likely to make the loop unrolled 8 times execute faster than the loop unrolled 4 times?

* S1: increasing the performance of the data cache
* S2: increasing the number of execution units by providing duplicate copies of execution units used by the loop
* S3: increasing the throughput of execution units used by the loop by converting unpipelined execution units into pipelined execution units
* S4: decreasing the latency of execution units used by the loop

==Answer area==

Select all that apply.

X3 = ?
* `(a)` S1
* `(b)` S1 + S2
* `(c)` S2
* `(d)` S3
* `(e)` S2 + S3
* `(f)` S1 + S4
* `(g)` S3 + S4
* `(h)` S2 + S4

---

### Quiz `4`

Consider a 5-stage pipelined processor with the following pipeline stages:

* fetch
* decode
* execute
* memory
* writeback

Assume this processor reads registers during the middle of the decode stage of an instruction and writes registers at the end of the writeback stage. If the processor resolves hazards with stalling only, how many cycles of stalling will it require to execute the following assembly correctly?

```
addq %rcx, %rdx
subq %rcx, %rax
xorq %rax, %rdx
```

==Answer area==

X4 = ?
* `(a)` 0
* `(b)` 1
* `(c)` 2
* `(d)` 3
* `(e)` 4
* `(f)` 5
* `(g)` 6
* `(h)` 7
* `(i)` 8
* `(j)` none of the above

---
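For the first question (`foo` and `bar`), it can help to experiment with what happens when some of the pointer arguments refer to the same object. The following is a minimal sketch of such a harness. The helpers `check_distinct_case`, `run_px_eq_py`, and `print_px_eq_py_case` and the sample values are illustrative assumptions, not part of the original quiz; only `foo` and `bar` are taken from the question.

```c
#include <assert.h>
#include <stdio.h>

/* foo and bar exactly as given in the question. */
void foo(int *px, int *py, int *pz) {
    int t = *px;
    *px = *py;
    *py = *pz;
    *pz = t;
}

void bar(int *px, int *py, int *pz) {
    int t = *py;
    *py = *pz;
    *pz = *px;
    *px = t;
}

typedef void (*rotate_fn)(int *, int *, int *);

/* Hypothetical sanity check: with three distinct objects,
 * both functions leave the same values behind. */
void check_distinct_case(void) {
    int a[3] = {1, 2, 3};
    int b[3] = {1, 2, 3};
    foo(&a[0], &a[1], &a[2]);
    bar(&b[0], &b[1], &b[2]);
    assert(a[0] == b[0] && a[1] == b[1] && a[2] == b[2]);
}

/* Hypothetical helper: call f with px and py pointing at the SAME int.
 * out[0] receives the final value of the shared int,
 * out[1] the final value of the int behind pz. */
void run_px_eq_py(rotate_fn f, int shared, int third, int out[2]) {
    int a = shared;
    int c = third;
    f(&a, &a, &c);
    out[0] = a;
    out[1] = c;
}

/* Hypothetical helper: print what each function leaves behind
 * in the aliased case, so the two results can be compared. */
void print_px_eq_py_case(int shared, int third) {
    int r_foo[2], r_bar[2];
    run_px_eq_py(foo, shared, third, r_foo);
    run_px_eq_py(bar, shared, third, r_bar);
    printf("foo: shared=%d third=%d\n", r_foo[0], r_foo[1]);
    printf("bar: shared=%d third=%d\n", r_bar[0], r_bar[1]);
}
```

Calling `check_distinct_case()` and then `print_px_eq_py_case(1, 2)` from a `main` function lets you compare the non-aliased and aliased cases side by side. A similar helper that passes the same address for `px` and `pz` can be written the same way to test the other option.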