2018q3 Week 12 Quiz

Quiz 1

Consider two functions foo and bar:

void foo(int *px, int *py, int *pz) {
    int t = *px;
    *px = *py;
    *py = *pz;
    *pz = t;
}
void bar(int *px, int *py, int *pz) {
    int t = *py;
    *py = *pz;
    *pz = *px;
    *px = t;
}

Compilers are not allowed to generate the same code for foo and bar. Why is this?
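
To try out the pointer configurations named in the options below, a small harness along these lines may help. It is only a sketch: `foo` and `bar` are copied from the question, while `differs`, `check_distinct_pointers_agree`, and the index-based way of creating aliases are hypothetical helpers introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* foo and bar exactly as given in the question. */
void foo(int *px, int *py, int *pz) {
    int t = *px;
    *px = *py;
    *py = *pz;
    *pz = t;
}

void bar(int *px, int *py, int *pz) {
    int t = *py;
    *py = *pz;
    *pz = *px;
    *px = t;
}

/* Hypothetical helper: run foo and bar on two identical copies of a
 * three-element array and report whether the final contents differ.
 * ix, iy, iz are indices (0..2) into the array; passing the same index
 * twice makes the corresponding pointers alias. */
bool differs(const int init[3], int ix, int iy, int iz) {
    int a[3], b[3];
    for (int i = 0; i < 3; i++) {
        a[i] = init[i];
        b[i] = init[i];
    }
    foo(&a[ix], &a[iy], &a[iz]);
    bar(&b[ix], &b[iy], &b[iz]);
    for (int i = 0; i < 3; i++)
        if (a[i] != b[i])
            return true;
    return false;
}

/* Sanity check: with three distinct objects both functions rotate the
 * values the same way, so no difference is observed. */
void check_distinct_pointers_agree(void) {
    const int init[3] = {1, 2, 3};
    assert(!differs(init, 0, 1, 2));
}
```

Calling `differs` with repeated indices (for example the same index for `ix` and `iz`) reproduces an aliasing case, so each option can be checked by experiment before answering.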

Answer area

Select one correct statement.

X1 = ?

(a) the two functions should behave differently if *px == *py and px == pz
(b) the two functions should generate different cache misses
(c) the two functions should behave differently if px == py

Quiz 2

Consider the following two versions of a loop in assembly:

Version X:

    movq $1023, %rax
    movq $1, %rbx
loop:
    imulq (%rcx,%rax,8), %rbx
    sub $1, %rax
    jne loop

Version Y:

    movq $1022, %rax
    movq $1, %rbx
loop:
    imulq 8(%rcx,%rax,8), %rbx
    imulq (%rcx,%rax,8), %rbx
    sub $2, %rax
    jne loop

Version X performs at least one hundred more ______ than version Y.

OP1: subtractions
OP2: multiplications
OP3: conditional jumps
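
One way to check a hand count is to model each loop in C and tally the operations. The sketch below is an assumption-laden model, not a disassembly: `struct op_counts` and `count_ops` are hypothetical names, each `imulq` is counted as one multiplication, and `jne` is counted once per iteration, including the final iteration where it is not taken.

```c
#include <assert.h>

/* Hypothetical record of how often each instruction kind executes. */
struct op_counts {
    long multiplications;
    long subtractions;
    long conditional_jumps;
};

/* Model of a loop that starts with %rax = start, executes
 * muls_per_iter imulq instructions per iteration, subtracts step
 * from %rax, and repeats while the result is nonzero (jne). */
struct op_counts count_ops(long start, long step, long muls_per_iter) {
    /* Otherwise the counter would step past zero without hitting it. */
    assert(start > 0 && step > 0 && start % step == 0);

    struct op_counts c = {0, 0, 0};
    long rax = start;
    do {
        c.multiplications += muls_per_iter;
        rax -= step;
        c.subtractions++;
        c.conditional_jumps++;  /* executed every iteration, taken or not */
    } while (rax != 0);
    return c;
}
```

Under these assumptions, Version X corresponds to `count_ops(1023, 1, 1)` and Version Y to `count_ops(1022, 2, 2)`; comparing the two results field by field gives the differences the question asks about.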

Answer area

Select all that apply.

X2 = ?

(a) OP1
(b) OP2
(c) OP1 + OP2
(d) OP3
(e) OP1 + OP3
(f) OP2 + OP3

Quiz 3

Suppose that, on some processor, a loop unrolled 8 times executes at the same speed as the same loop unrolled 4 times. Which of the following changes to the processor is likely to make the 8-times-unrolled loop execute faster than the 4-times-unrolled one?

S1: increasing the performance of the data cache
S2: increasing the number of execution units by providing duplicate copies of execution units used by the loop
S3: increasing the throughput of execution units used by the loop by converting unpipelined execution units into pipelined execution units
S4: decreasing the latency of execution units used by the loop
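
For readers who want a concrete picture of "unrolled 4 times", here is a minimal sketch in C. The question does not say which loop is being unrolled, so the array sum, the function names, and the use of one accumulator per unrolled statement are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one element per iteration. */
long sum_rolled(const long *a, size_t n) {
    assert(a != NULL || n == 0);
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled 4 times: four elements per iteration, each added into its
 * own accumulator so the four additions do not depend on one another. */
long sum_unrolled4(const long *a, size_t n) {
    assert(a != NULL || n == 0);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

An 8-times-unrolled version would follow the same pattern with eight accumulators and a step of 8. Whether the extra independent operations run any faster depends on how many of them the processor can have in flight at once.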

Answer area

Select all that apply.

X3 = ?

(a) S1
(b) S1 + S2
(c) S2
(d) S3
(e) S2 + S3
(f) S1 + S4
(g) S3 + S4
(h) S2 + S4

Quiz 4

Consider a 5-stage pipelined processor with the following pipeline stages:

fetch
decode
execute
memory
writeback

Assume this processor reads registers in the middle of an instruction's decode stage and writes registers at the end of its writeback stage. If the processor resolves hazards by stalling only (no forwarding), how many stall cycles does it need to execute the following assembly correctly?

addq %rcx, %rdx
subq %rcx, %rax
xorq %rax, %rdx
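
The timing rules above can be turned into a small model for checking a hand-drawn pipeline diagram. This is a sketch under stated assumptions: one instruction is fetched per cycle, only register data hazards are considered, and a value written at the end of writeback can first be read by a decode in the following cycle. `struct insn`, `count_stall_cycles`, and the register enum are hypothetical names, not part of the question.

```c
#include <assert.h>

#define NUM_REGS 16  /* x86-64 general-purpose registers */

/* Only the registers used in the question are named here. */
enum { RAX = 0, RCX = 1, RDX = 2 };

/* Hypothetical encoding of a two-operand instruction in AT&T order:
 * "op src, dst" reads src and dst, then writes dst. */
struct insn {
    int src;
    int dst;
};

/* Returns the total number of stall cycles for an in-order 5-stage
 * pipeline (fetch, decode, execute, memory, writeback) that has no
 * forwarding and resolves hazards only by stalling in decode. */
int count_stall_cycles(const struct insn *prog, int n) {
    int ready[NUM_REGS] = {0};  /* first cycle in which a decode may read the register */
    int prev_decode = 1;        /* so the first instruction decodes in cycle 2 */
    int stalls = 0;

    for (int i = 0; i < n; i++) {
        assert(prog[i].src >= 0 && prog[i].src < NUM_REGS);
        assert(prog[i].dst >= 0 && prog[i].dst < NUM_REGS);

        int earliest = prev_decode + 1;  /* decode slot without any stall */
        int decode = earliest;
        if (ready[prog[i].src] > decode)
            decode = ready[prog[i].src];
        if (ready[prog[i].dst] > decode)
            decode = ready[prog[i].dst];

        stalls += decode - earliest;
        prev_decode = decode;

        /* execute = decode + 1, memory = decode + 2, writeback = decode + 3;
         * the result is written at the end of writeback, so it is
         * readable starting in cycle decode + 4. */
        ready[prog[i].dst] = decode + 4;
    }
    return stalls;
}
```

The three instructions in the question would be encoded as `{RCX, RDX}`, `{RCX, RAX}`, and `{RAX, RDX}`. Changing the `decode + 4` constant shows how sensitive the stall count is to when the register file is written and read.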

Answer area

X4 = ?

(a) 0
(b) 1
(c) 2
(d) 3
(e) 4
(f) 5
(g) 6
(h) 7
(i) 8
(j) none of the above
