# 2018q3 Week 12 Quiz
### Quiz `1`
Consider two functions, `foo` and `bar`:
```C
void foo(int *px, int *py, int *pz) {
    int t = *px;
    *px = *py;
    *py = *pz;
    *pz = t;
}

void bar(int *px, int *py, int *pz) {
    int t = *py;
    *py = *pz;
    *pz = *px;
    *px = t;
}
```
A compiler is not allowed to generate the same code for `foo` and `bar`. Why?
==Answer area==
Select the one correct statement.
X1 = ?
* `(a)` the two functions would behave differently if `*px == *py` and `px == pz`
* `(b)` the two functions would incur different cache misses
* `(c)` the two functions would behave differently if `px == py`
---
### Quiz `2`
Consider the following two versions of a loop in x86-64 assembly (AT&T syntax):
Version X:
```
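    movq $1023, %rax              # rax = 1023
    movq $1, %rbx                 # rbx = 1 (running product)
loop:
    imulq (%rcx,%rax,8), %rbx     # rbx *= M[rcx + 8*rax]
    sub $1, %rax                  # rax -= 1; sets ZF when rax reaches 0
    jne loop                      # repeat while rax != 0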
```
Version Y:
```
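    movq $1022, %rax              # rax = 1022
    movq $1, %rbx                 # rbx = 1 (running product)
loop:
    imulq 8(%rcx,%rax,8), %rbx    # rbx *= M[rcx + 8*rax + 8]
    imulq (%rcx,%rax,8), %rbx     # rbx *= M[rcx + 8*rax]
    sub $2, %rax                  # rax -= 2; sets ZF when rax reaches 0
    jne loop                      # repeat while rax != 0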
jne loop
```
Version X executes at least one hundred more ______ than Version Y.
* OP1: subtractions
* OP2: multiplications
* OP3: conditional jumps
==Answer area==
Select all that apply.
X2 = ?
* `(a)` OP1
* `(b)` OP2
* `(c)` OP1 + OP2
* `(d)` OP3
* `(e)` OP1 + OP3
* `(f)` OP2 + OP3
---
### Quiz `3`
On a certain processor, a loop unrolled 8 times runs at the same speed as the same loop unrolled 4 times. Which of the following changes to the processor would likely make the 8-times-unrolled loop run faster than the 4-times-unrolled loop?
* S1: increasing the performance of the data cache
* S2: increasing the number of execution units by providing duplicate copies of the execution units used by the loop
* S3: increasing the throughput of execution units used by the loop by converting unpipelined execution units into pipelined execution units
* S4: decreasing the latency of execution units used by the loop
==Answer area==
Select all that apply.
X3 = ?
* `(a)` S1
* `(b)` S1 + S2
* `(c)` S2
* `(d)` S3
* `(e)` S2 + S3
* `(f)` S1 + S4
* `(g)` S3 + S4
* `(h)` S2 + S4
---
### Quiz `4`
Consider a 5-stage pipelined processor with the following pipeline stages:
* fetch
* decode
* execute
* memory
* writeback

Assume that this processor reads registers in the middle of an instruction's decode stage and writes registers at the end of the writeback stage. If the processor resolves hazards only by stalling, how many cycles of stalling does it need to execute the following assembly correctly?
```
addq %rcx, %rdx
subq %rcx, %rax
xorq %rax, %rdx
```
==Answer area==
X4 = ?
* `(a)` 0
* `(b)` 1
* `(c)` 2
* `(d)` 3
* `(e)` 4
* `(f)` 5
* `(g)` 6
* `(h)` 7
* `(i)` 8
* `(j)` none of the above
---