# CO Final ## (15pt) 1. Assume there is no forwarding in a 5-stage pipline processor 1 (1) (5pt) Please indicate all the potential hazards in the following instruction sequence ? ``` xori x1, x1, 2 addi x6, x5, -4 lw x10, 8(x6) add x5, x6, x6 sub, x2, x2, x1 ``` ### Ans: * addi && lw (2.5pt) * lw && add (2.5pt) (2) (5pt)What is the total execution cycles of the above instruction sequence running on processor 1 ? ### Ans: 11cycles (3) (5pt)Assume there is a 5-stage pipline processor 2 with two forwarding: * forwarding path from the output of the EX stage to the input of the next EX stage. * forwarding path from the output of the MEM stage to the input of the next EX stage. What is the total execution cycles of the above instruction sequence running on processor 2 ? ### Ans: 9 cycles ## (30pt) 2. There is a direct-mapped cache with a block size of 8 words (i.e. 8 words per line). The cache can store a total of 64 words. Assume that addresses and data words are 32 bits wide. (1) (3pt) What is the bit width for byte offset? ### Ans: 8 * 4 = 32 = 2^5, 5 (2) (3pt) What is the bit width of index? ### Ans: 64 / 8 = 8 = 2^3, 3 (3) (3pt) What is the bit width for tag? ### Ans: 32 - 5 - 3 = 24 (4) (3pt) What is the cache entry size (bits)? ### Ans: 1 (valid bit) + 24 (tag) + 32*8 (data) = 281 (5) (18pt) Starting from power on, the following byte-addressed cache references are recorded. 0, 128, 4, 32, 2180, 140, 180, 232, 2020, 132 5.1 (6pt) How many blocks are replaced? | address | index | Hit | Replace| | -------- | -------- | -------- |-------- | | 0 | 000 | Miss|| | 128 | 100 | Miss|| | 4 | 000 | Hit || | 32 | 001 | Miss| | | 2180 | 100 | Miss|R| | 140 | 100 | Miss|R| | 180 | 101 | Miss|| | 232 | 111 | Miss|| | 2020 | 111 | Miss|R| | 132 | 100 | Hit|| ### Ans: 4 5.2 (6pt) What is the cache miss rate? ### Ans: 9/10 = 0.9 5.3 (6pt) Assume there is a CPU with 1ns clock cycle, cache hit time = 1 cycle, cache miss penalty = 20 cycles, and its cache miss rate is equal to the answer of question 5.2. What is the Average Memory Access Time(AMAT) of this CPU? ### Ans: 1 + 0.9 * 20 = 19 ## (10pt) 3. What are the values(0, 1 or X(don't care)) of control signals generated by the control for instruction lw and beq? ![](https://i.imgur.com/nOIylbS.png) (1) lw ![](https://i.imgur.com/ak0O8qo.png) (2) beq ![](https://i.imgur.com/ak0O8qo.png) ### Ans: 各五分 ![](https://i.imgur.com/wIFIdQ9.png) ## (15pt) 4. Given the pipelined CPU in the figure below [原圖](https://i.imgur.com/r6UqtuN.png) [題目要出的圖](https://i.imgur.com/rJEBzb9.png) (1) (5pt) Which type of forwarding line is missing? (a) EX/MEM to EX (b) MEM/WB to EX ### Ans: B (2) (10pt) Which code sequence(s) will still run correctly? (a) add $s1, $t0, $t1 add $s3, $s2, $s1 add $s2, $s3, $s1 (b) add $s3, $t0, $t1 add $s1, $s2, $s1 add $s2, $s3, $s1 (c) add $s1, $t0, $t1 add $s1, $s2, $s1 add $s2, $s3, $s1 (d) add $s1, $t0, $t1 add $s2, $s2, $s1 add $s2, $s3, $s1 ### Ans: BC ## (10pt)5. Consider the following sequence of actual outcomes for a branch. T means the branch is taken. N means the branch is not taken. Assume both predictors are initialized to predict taken Branch: T-T-N-T-N-N-T-N (1) If the one-bit predictor is used, what is the predictions will be? (2) If the two-bit predictor is used, what is the predictions will be? ### Ans * T-T-T-N-T-N-N-T * T-T-T-T-T-T-N-T ## (15pt) 6. Given that memory space is 32-bit byte addressable, main memory size is $2^x$ bytes, page frame size is $2^y$ bytes: (1) (5pt) The memory space available to a program is __ bytes. ### Ans: $2^{32}$ (2) (5pt) The number of virtual pages is __. ### Ans: $2^{32-y}$ (3) (5pt) The page size is __. ### Ans: $2^y$ ## (16pt) 7. The function strdouble receives a null-terminated string as input, and writes a string where every letter is repeated two times. For example, if we called strdouble on the input string "Hello World!", we would get the output string "HHeelllloo WWoorrlldd!!!". More specifically: strdouble has the following inputs: a0 stores a null-terminated string. It is guaranteed that the string is well-formatted. Let the string have strlen of n. a1 contains a pointer to a buffer of length at least 2 * n + 1 characters, potentially containing garbage data. You may assume that the buffer does not overlap with the string stored in a0. strdouble outputs the following: a0 should return the pointer to the buffer we initially received in a1. The buffer should contain the string originally provided in a0, with every character doubled. Please complete the unfinished assembly to make the above work as expected. For full credit, you must follow proper calling convention. ``` strdouble: mv t2, a1 loop: lbu t0 C01`0`(C02`a0`) beq t0, x0, end slli C03`t1` C04`t0` C05`8` add t1, t1, t0 sh t1, 0(C06`a1`) addi a0, a0, 1 addi a1 C07`a1` C08`2` j loop end: sb x0, 0(a1) addi a0, t2, 0 jr ra ``` * 計分方式: 8格, 1格 2分 ## (10pt) 8. Which of the following RISC-V instructions is/are **not** R-type? (a) sll (b) jr (c) addi (d) bne (e) sub (f) lw (g) beq ### Ans: cdfg * 計分方式: 錯一個扣2分,扣到0分為止 # 小考 3 ## 題組 1 Assume the following operation times for each major functional unit of a RISC-V processor that supports Load word (lw), Store word (sw), R-format (add,sub,and,or), and Branch (beq): - 300 ps for memory access for instructions or data - 250 ps for ALU operation - 150 ps for register file read or write Assume that the multiplexors, control unit, PC accesses, and sign extension unit have no delay (a) In the single-cycle model, how much time does each instruction take? briefly explain why. (b) In the 5-stage pipeline model (i.e. IF/ID/EXE/MEM/WB), what is the execution time of the following code sequence? | Code sequence | | -------- | | lw x1, 100(x4)| | lw x2, 200(x4)| | lw x3, 400(x4)| ## Ans: (a) 1150 ps, because the slowest instruction is Load word (lw), and Load word (lw) needs 300 + 150 + 250 +300 150 = 1150 ps | Instruction class | IF | ID | EXE | MEM | WB | Total time | | -------- | -------- | -------- | -------- | -------- | -------- | -------- | | Lw | 300 ps | 150 ps | 250 ps | 300 ps | 150 ps | 1150 ps | | Sw | 300 ps | 150 ps | 250 ps | 300 ps || 1000 ps | | R-format | 300 ps | 150 ps | 250 ps || 150 ps | 850 ps | | Branch | 300 ps | 150 ps | 250 ps | | | 700 ps | (b) 300 ps * 7 = 2100 ps ## 題目 2 Select the corresponding statement for the three hazards: (a) Arising from the need to make a decision based on the results of one instruction while others are executing (b) The pipeline must be stalled because one step must wait for another to complete (c) The hardware cannot support the combination of instructions that we want to execute in the same clock cycle Structural Hazard: ____ Data Hazard: ______ Control Hazard: ______ ## Ans: Structural Hazard: C Data Hazard: B Control Hazard: A