---
tags: 計算機組織, 資工系必修
---
# Exam 2
109暑 計算機組織 蔡文錦
[密碼: COExam2](https://docs.google.com/forms/d/e/1FAIpQLSeKhBjNur0Ipc6g2uUE5DoQwXhCJjqhRzAgkeTsN4fzLxqzvg/viewform)
#### Q1. Given the single cycle CPU and register information below. Which of following statements are correct when the instruction "beq" below is executed? Assume the "beq" is located at address 0x60000008, and the "L" is at 0x60000000. "beq $t1, $t2, L"

a. The value on the path with label (a) is 0x09.
b. The value on the path with label (e) is 0xFFFFFFF4.
c. The value on the path with label (f) is 0x00200020.
d. The value on the path with label (i) is 0x60000000.
e. The value on the path with label (j) is 0x60000000.
**Ans: a. c. d.**
b. (e) should be 0xFFFFFFFD. Since 0x60000008(PC) + 4 - ($x$ $\times$ 4 (since shift 2 bit)) = 0x60000000.
e. (j) should be 0x6000000C (PC + 4). Since \$t1 != \$t2.
#### Q2. Given the control value definitions for the forwarding MUXs as below, where the ForwardA controls the MUX of first source operand of ALU and ForwardB controls the second one. For the code sequence below, which of following statements are correct?

a. For the 2nd instruction lw, forwardA = 10 and forwardB = 00
b. For the 3rd instruction, forwardA = 00 and forwardB = 01
c. For the 4th instruction, forwardA = 10 and forwardB = 00
d. For the 5th instruction, forwardA = 01 and forwardB = 10
e. For the 6th instruction, forwardA = 00 and forward = 01
**Ans: d. e.**
| clock | forwarding & hardware | forwardA | forwardB |
|:-----:|:---------------------:|:--------:|:--------:|
| 1 | add \$t1, \$t2, \$t3 | 00 | 00 |
| 2 | lw \$t4, 4(\$t1) | 01 | 00 |
| 3 | stall | 00 | 00 |
| 4 | add \$t2, \$t1, \$t4 | 00 | 10 |
| 5 | add \$t1, \$t4, \$t3 | 00 | 00 |
| 6 | add \$t1, \$t1, \$t2 | 01 | 10 |
| 7 | add \$t2, \$t1, \$t1 | 00 | 01 |
#### Q3. Consider the five-stage piplelined CPU introduced in the textbook. Please select the signals that are required to insert a stall cycle when data hazard is detected at ID stage.
a. Set PCWrite = 0 (disable the write to Program Counter (PC) )
b. Set IF/Dwrite = 0 (disable the write to IF/ID)
c. IF.Flush (clear IF/ID)
e. ID.Flush (set 9 control signals (used in EX, MEM, WB stages) to 0
d. EX.Flush (set 5 control signals (used in MEM, WB stages) to 0
**Ans: a. b. e.**
Flush ID (this instruction) and keep PC to next clock.
#### Q4. Consider the five-stage piplelined CPU introduced in the textbook. Please select the signals that are required when exception is detected in EX stage and need to flush the instructions currently in IF, ID and EX stages.
a. Set PCWrite = 0 (disable the write to Program Counter (PC) )
b. Set IF/Dwrite = 0 (disable the write to IF/ID)
c. IF.Flush (clear IF/ID)
e. ID.Flush (set 9 control signals (used in EX, MEM, WB stages) to 0
d. EX.Flush (set 5 control signals (used in MEM, WB stages) to 0
**Ans: c. d. e.**
Need to jump to handler and flush next all instructions.
#### Q5. Consider the following sequence of actual outcomes for a branch. T means the branch is taken. N means not taken. Assume both predictors are initialized to predict "taken". Branch: T-N-T-N-N-T-N.
(Please write in capital letters T and N, and there is no space and symbols in between. For example, TNTTTTT.)
**Ans:**
List the preditions for this branch using 1-bit predictor: TTNTNNT.
List the preditions for this branch using 2-bit predictor: TTTTTNT.
a. If the same patterns are repeated thousands of times, the accuracy rate is about 1/7 when 1-bit predictor is used.
b. If the same patterns are repeated thousands of times, the accuracy rate is about 2/7 when 2-bit predictor is used.
**Ans: a.**
| Branch | TNTNNTN | TNTNNTN | TNTNNTN | repeated thousands |
| ------ |:-------:|:-------:|:-------:|:------------------:|
| 1-bit | TTNTNNT | NTNTNNT | NTNTNNT | 1/7 |
| 2-bit | TTTTTNT | NTNTNNN | NNNNNNN | 4/7 |
#### Q6. Assume the latencies for logic blocks are listed as table below. Please ignore the latency for control decoder, control signal delay, or other unspecified blocks. Which of following statements are correct?

a. The speedup of the pipelined processor versus the single-cycle processor for executing 100 lw instructions is about 3.2 (Assume there is no any data dependency in the code).
b. For a classic MIPS CPU with a 5-stage pipeline, the minimum clock cycle time it can use is 120ps.
c. The latency of instruction lw in the pipelined processor is 400.
d. For a single-cycle MIPS CPU, the minimum clock cycle time it can use is 380ps.
**Ans: a. b.**
a. (400 $\times$ 100) / (120 $\times$ 100 + 120 $\times$ 4) = ~3.2.
b. max(IF, ID, EX, MEM, WB) = 120ps (D-Mem).
c. 120 $\times$ 5 = 600.
d. 100(I-Mem) + 50(Regs) + 80(ALU) + 50(Regs) + 120(D-Mem) = 400.
#### Q7. Consider the optimized 4-bit multiplication block diagram below. Which of following statements are correct?

a. Consider a 16-bit multiplication with multiplier 0x64A5. It requires 7 additions of multiplicand if traditional multiplication algorithm is used.
b. When performing 3×(-5) using Booth’s algorithm, the Product register contains 1110 0101(binary) (not including previous bit) after step1.
c. Consider a 16-bit multiplication with multiplier 0x64A5. It requires 6 additions and 6 subtractions of multiplicand if Booth’s algorithm is used.
d. When performing 3×(-5) using traditional multiplication algorithm, the Product register contains 0001 0101(binary) after step1.
e. Consider a 16-bit multiplication with multiplier 0xE4A5. It requires 5 additions and 6 subtractions of multiplicand if Booth’s algorithm is used
**Ans: a. c. e.**
a. 0x64A5 = 0110 0100 1010 0101 (binary) which with seven 1's
b. the Product register should contain 1110 1101(binary).
c. 0x64A5 = 0110 0100 1010 0101 (binary) which with six 1's list.
d. the Product register should contain 0001 1101(binary).
e. 0xE4A5 = 1110 0100 1010 0101 (binary) which with siz 1's list and end with 1's.
#### Q8. For the static 2-issue pipelined processor given in the textbook and the code sequence below.

a. If we unroll 2 more copies of the code (i.e., 3 copies in total), the best IPC for the scheduled code is 11/6 (if we ignore the remaining 4 stages for the last instruction.)
b. If we unroll 2 more copies of the code (i.e., 3 copies in total), it will take 6 cycles to schedule the code on the 2-issue CPU (using minimum number cycles).
c. Assume the loop sequence will be executed for 9 iterations. Compared to the scheduled code without unrolling, the speedup is 2 by running the scheduled code of unrolling 3 copies.
d. Without code unrolling, it will take 4 cycles to schedule the code sequence on the 2-issue CPU (using minimum number cycles).
**Ans: a. b. c. d.**
c. (4 $\times$ 9) / (6 $\times$ 3) = 2
| clock | ALU or branch | lw & sw |
|:-----:|:----------------------:|:----------------:|
| 1 | addi \$s1, \$s1, -4 | lw \$t0, 0(\$s1) |
| 2 | nop | nop |
| 3 | addu \$t0, \$t0, \$s2 | nop |
| 4 | bne \$s1, \$zero, Loop | sw \$t0, 4(\$s1) |
| clock | ALU or branch | lw & sw |
|:-----:|:----------------------:|:-----------------:|
| 1 | addi \$s1, \$s1, -8 | lw \$t0, 0(\$s1) |
| 2 | nop | lw \$t1, 4(\$s1) |
| 3 | addu \$t0, \$t0, \$s2 | lw \$t2, 8(\$s1) |
| 4 | addu \$t1, \$t1, \$s2 | sw \$t0, 12(\$s1) |
| 5 | addu \$t2, \$t2, \$s2 | sw \$t1, 8(\$s1) |
| 6 | bne \$s1, \$zero, loop | sw \$t2, 4(\$s1) |
#### Q9. Given **XXX** CPU below (assume that stall and forwarding mechanism have been implemented) and consider the following code sequence. When the code runs to the **Y**th cycle, what are the values of control signals RegWrite, ALUsrc, RegDst and MemWrite, respectively. (Please write down your answer in order. For example, if your answer is “RegWrite=0, ALUsrc=1, RegDst=1 and MemWrite=0”, then write your answer as 0110).

If given the single-cycle CPU and the code runs to the 5th cycle.
**Ans: 1100**
If given the pipelined CPU and the code runs to the 5th cycle.
**Ans: 1011**
If given the pipelined CPU and the code runs to the 9th cycle.
**Ans: 1010**
| clock | nonpipelined | 5th cycle | pipelined & forwarding | 5th cycle | 9th cycle |
|:-----:|:--------------------:|:------------------------------------------:|:----------------------:|:----------------------:|:----------------------:|
| 1 | lw \$t1, 0(\$s1) | | lw \$t1, 0(\$s1) | WB: RegWrite=1 | |
| 2 | sw \$s1, 0(\$s2) | | sw \$s1, 0(\$s2) | MEM: MemWrite=1 | |
| 3 | add \$t2, \$s2, \$s3 | | add \$t2, \$s2, \$s3 | EX: ALUsrc=0, RegDst=1 | |
| 4 | add \$t3, \$t2, \$s2 | | add \$t3, \$t2, \$s2 | ID | |
| 5 | lw \$t1, 0(\$t2) | RegWrite=1, ALUsrc=1, RegDst=0, MemWrite=0 | lw \$t1, 0(\$t2) | IF | WB: RegWrite=1 |
| 6 | add \$t1, \$t1, \$s1 | | stall | | MEM: MemWrite=0 |
| 7 | add \$t1, \$t1, \$s2 | | add \$t1, \$t1, \$s1 | | EX: ALUsrc=0, RegDst=1 |
| 8 | add \$t2, \$s2, \$t1 | | add \$t1, \$t1, \$s2 | | ID |
| 9 | | | add \$t2, \$s2, \$t1 | | IF |
#### Q10. Consider the code sequence on the right side. Assume the two branch instructions are predicted taken, but actually both of them are NOT taken. (Note: register reads/writes can happen in the same cycle). Which of following statements are correct?

a. Assume the CPU support data forwarding and the branch outcome is determined in MEM stage. At the 10th cycle, no instruction is at WB stage.
b. Assume the CPU support data forwarding and the branch outcome is determined in MEM stage. It takes 19 cycles to complete the execution of the code.
c. Assume the CPU support data forwarding and the branch outcome is determined in ID stage. It takes 15 cycles to complete the execution of the code.
d. Assume the CPU does "not" support data forwarding and the branch outcome is determined in MEM stage. It takes 22 cycles to complete the execution of the code.
e. Assume the CPU does "not" support data forwarding and the branch outcome is determined in MEM stage. At the 10th cycle, the instruction at WB stage is bne.
f. Assume the CPU support data forwarding and the branch outcome is determined in ID stage. At the 10th cycle, no instruction is at WB stage.
**Ans: a. b. d.**
c. 17 cycles.
e. stall cycle.
f. lw instruction.
| clock | nonpipelined | without forwarding | forwarding & MEM stage | forwarding & ID stage | 10th cycle |
|:-----:|:--------------------:|:--------------------:|:----------------------:|:---------------------:|:----------:|
| 1 | add \$s2, \$s0, \$s1 | add \$s2, \$s0, \$s1 | add \$s2, \$s0, \$s1 | add \$s2, \$s0, \$s1 | |
| 2 | add \$s3, \$s1, \$s0 | add \$s3, \$s1, \$s0 | add \$s3, \$s1, \$s0 | add \$s3, \$s1, \$s0 | |
| 3 | bne \$s2, \$s3, 10 | stall | bne \$s2, \$s3, 10 | stall | |
| 4 | lw \$s0, 0(\$s2) | stall | stall | bne \$s2, \$s3, 10 | |
| 5 | lw \$s1, 0(\$s2) | bne \$s2, \$s3, 10 | stall | stall | |
| 6 | bne \$s0, \$s1, 10 | stall | stall | lw \$s0, 0(\$s2) | WB |
| 7 | add \$s2, \$s0, \$t2 | stall | lw \$s0, 0(\$s2) | lw \$s1, 0(\$s2) | MEM |
| 8 | add \$s1, \$s1, \$t4 | stall | lw \$s1, 0(\$s2) | stall | EX |
| 9 | | lw \$s0, 0(\$s2) | stall | stall | ID |
| 10 | | lw \$s1, 0(\$s2) | bne \$s0, \$s1, 10 | bne \$s0, \$s1, 10 | IF |
| 11 | | stall | stall | stall | |
| 12 | | stall | stall | add \$s2, \$s0, \$t2 | |
| 13 | | bne \$s0, \$s1, 10 | stall | add \$s1, \$s1, \$t4 | |
| 14 | | stall | add \$s2, \$s0, \$t2 | | |
| 15 | | stall | add \$s1, \$s1, \$t4 | | |
| 16 | | stall | | | |
| 17 | | add \$s2, \$s0, \$t2 | | | |
| 18 | | add \$s1, \$s1, \$t4 | | | |