Lab1: RV32I Simulator === ###### tags: `RISC-V` `Computer Architecture 2022` ## Problem [**LeetCode 35 Search Insert Position**](https://leetcode.com/problems/search-insert-position/ "LeetCode 35") Given a sorted array of distinct integers and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order. >##### **Example 1:** ``` Input: nums = [1,3,5,6], target = 5 Output: 2 ``` >##### **Example 2:** ``` Input: nums = [1,3,5,6], target = 2 Output: 1 ``` >##### **Example 3:** ``` Input: nums = [1,3,5,6], target = 7 Output: 4 ``` ## Implementation You can find the source code [here](https://github.com/qwer951212/ComputerArchitecture_HW1) ### C code ``` c= #include <stdio.h> int searchInsert(int *nums, int numsSize, int target) { int left = 0; int right = numsSize - 1; while (left <= right) { int mid = (left + right) / 2; if (target == nums[mid]) return mid; else if (target < nums[mid]) right = mid - 1; else left = mid + 1; } return left; } int main() { int data[] = {1, 3, 5, 6}; int size = 4; int tar1 = 5, tar2 = 2, tar3 = 7; int index1 = searchInsert(data, size, tar1); int index2 = searchInsert(data, size, tar2); int index3 = searchInsert(data, size, tar3); printf("The target1 insert position is %d\n", index1); printf("The target2 insert position is %d\n", index2); printf("The target3 insert position is %d\n", index3); return 0; } ``` ### Assembly code ```asm= .data data: .word 1, 3, 5, 6 # data[] = {1, 3, 5, 6} size: .word 4 # array size = 4 tar1: .word 5 # target1 = 5 tar2: .word 2 # target2 = 2 tar3: .word 7 # target3 = 7 str: .string "The insert position is " n1: .string "\n" .text main: la s1, data # s1 = nums[] lw s2, size # s2 = size addi s5, x0, 0 # s5 = 0 //left = 0 addi s6, s2, -1 # s6 = size - 1 //right = size - 1 lw s3, tar1 # s3 = 5 //tar1 jal ra, searchInsert jal ra, print lw s3, tar2 # s3 = 2 //tar2 jal ra, searchInsert jal ra, print lw s3, tar3 # s3 = 7 //tar3 jal ra, searchInsert jal ra, print li a7, 10 # System call: Exit ecall searchInsert: addi t0, s5, 0 addi t1, s6, 0 loop: slt t6, t1, t0 # if left > right, t6 = 1 bne t6, zero, return add t2, t0, t1 # mid = left + right srli t2, t2, 1 # mid = mid / 2 slli t3, t2, 2 # offset = mid * 4 add t4, s1, t3 # address = base address + offset lw t5, 0(t4) # t5 = data[mid] beq s3, t5, equal # if target == data[mid], go to equal blt s3, t5, less # if target < data[mid], go to less addi t0, t2, 1 # if target > data[mid], left = mid + 1 j loop # jump to loop equal: addi s4, t2, 0 # s4 = mid ret # return to main less: addi t1, t2, -1 # if target < data[mid], right = mid - 1 j loop # jump to loop return: addi s4, t0, 0 # s4 = left ret # return to main print: la a0, str # load string li a7, 4 # System call: PrintString ecall addi a0, s4, 0 # load result li a7, 1 # System call: PrintInt ecall la a0, n1 li a7, 4 ecall ret # return to main ``` :::warning ![](https://i.imgur.com/WkXg15O.png =45%x) ![](https://i.imgur.com/yZzpISw.png =45%x) We can click help to check the system call supported by ripes. ::: ## Pipeline feature in [Ripes](https://github.com/mortbopet/Ripes) ### Disassembled By Ripes ``` 00000000 <main>: 0: 10000497 auipc x9 0x10000 4: 00048493 addi x9 x9 0 8: 10000917 auipc x18 0x10000 c: 00892903 lw x18 8 x18 10: 10000997 auipc x19 0x10000 14: 0049a983 lw x19 4 x19 18: 00000293 addi x5 x0 0 1c: fff90313 addi x6 x18 -1 20: 040000ef jal x1 64 <loop> 24: 080000ef jal x1 128 <print> 28: 00000293 addi x5 x0 0 2c: fff90313 addi x6 x18 -1 30: 10000997 auipc x19 0x10000 34: fe89a983 lw x19 -24 x19 38: 028000ef jal x1 40 <loop> 3c: 068000ef jal x1 104 <print> 40: 00000293 addi x5 x0 0 44: fff90313 addi x6 x18 -1 48: 10000997 auipc x19 0x10000 4c: fd49a983 lw x19 -44 x19 50: 010000ef jal x1 16 <loop> 54: 050000ef jal x1 80 <print> 58: 00a00893 addi x17 x0 10 5c: 00000073 ecall 00000060 <loop>: 60: 00532fb3 slt x31 x6 x5 64: 020f9c63 bne x31 x0 56 <return> 68: 006283b3 add x7 x5 x6 6c: 0013d393 srli x7 x7 1 70: 00239e13 slli x28 x7 2 74: 01c48eb3 add x29 x9 x28 78: 000eaf03 lw x30 0 x29 7c: 01e98863 beq x19 x30 16 <equal> 80: 01e9ca63 blt x19 x30 20 <less> 84: 00138293 addi x5 x7 1 88: fd9ff06f jal x0 -40 <loop> 0000008c <equal>: 8c: 00038a13 addi x20 x7 0 90: 00008067 jalr x0 x1 0 00000094 <less>: 94: fff38313 addi x6 x7 -1 98: fc9ff06f jal x0 -56 <loop> 0000009c <return>: 9c: 00028a13 addi x20 x5 0 a0: 00008067 jalr x0 x1 0 000000a4 <print>: a4: 10000517 auipc x10 0x10000 a8: f7c50513 addi x10 x10 -132 ac: 00400893 addi x17 x0 4 b0: 00000073 ecall b4: 000a0513 addi x10 x20 0 b8: 00100893 addi x17 x0 1 bc: 00000073 ecall c0: 10000517 auipc x10 0x10000 c4: f7850513 addi x10 x10 -136 c8: 00400893 addi x17 x0 4 cc: 00000073 ecall d0: 00008067 jalr x0 x1 0 ``` #### Results in Ripes ![](https://i.imgur.com/TUlsXcL.png =30%x) ### 5-stage pipelined processor We can select different processors on ripes to simulate. In this work, I choose 5-stage pipelined processor with hazard detection and forwarding. The block diagram is as follows. ![](https://i.imgur.com/xCxU3nq.jpg) The five parts include: IF, ID, EX, MEM and WB. Take the **first instruction `auipc x9 0x10000`** for example: #### IF(Instruction Fetch) During the Instruction Fetch stage, a instruction is fetched from the instruction memory. At the same time, a calculation is done to determine the next PC. ![](https://i.imgur.com/auKTnEA.jpg =35%x) - Started at `0x00000000` and with no control hazard, the next instruction should be `0x00000004`. - The instruction fetched from the instruction memory is `0x10000497`. #### ID(Instruction decode & Register read) During the decode stage, the instruction's opcode and operands are decoded to determine what function to perform. Different types of intructions will need different bits field. The below is provided for reference. * ![](https://i.imgur.com/0deKcr2.png =70%x) * ![](https://i.imgur.com/8FceabF.png =70%x) ![](https://i.imgur.com/6IiicIE.jpg =35%x) - The instruction `0x10000497` is decoded to `auipc x9 0x10000`. - The opcode is decoded to `0x0010111`. And the instruction type is belong to U-type. - The source operands from register are both `0x00000000`. - The immediate will be extended to `0x10000000` by immediate generator. #### EX(Execute) During the execute stage, CPU will perform the operation specified by the instruction. ![](https://i.imgur.com/ST67HX3.jpg =35%x) - Multiplexer forwards the selected input to be ALU input. In this case, ALU input1 source is pc value `0x00000000`and input2 source is immediate `0x10000000`. - This is not a branch instruction, so branch is not taken. - The result of ALU is `0x10000000`, means the address data need to write back to Rd register. #### MEM(Memory Access) If there is any data that needs to be accessed, it is done in this stage. ![](https://i.imgur.com/V717ozn.jpg =30%x) - In this case, we don't need to accessed data memory, so the control signals `Wr_en` is set to false. - The result of ALU is pass through to the next stage. #### WB(Write Back) If there is any data that needs to store in the destination register, it is done in this stage. ![](https://i.imgur.com/ccXoPz7.jpg =20%x) - In this case, we need to write back the ALU result to Rd `x9`, so the multiplexer will choose the ALU output as write back data. - At the same time, register is set to write-enable, so it is able write back the data to Rd. ### **Hazards** #### **Data Hazards** Data hazards occurs when an instruction attempt to use data before the data is available in the register file. We use two solutions to avoid the hazards. 1. **Forwarding** Consider the following instructions: ``` 68: 006283b3 add x7 x5 x6 6c: 0013d393 srli x7 x7 1 ``` ![](https://i.imgur.com/kphzypL.png =50%x) ![](https://i.imgur.com/SJb3oCD.png) - When `add x7 x5 x6` in EX stage, ALU calculates the new value for `x7`. In the same cycle, `srli x7 x7 1` in ID stage get the data from `x7`. However, the `add` instruction haven't write back the new valu to `x7`. So, for `srli` instruction, the value get from register file is incorrect. - Within Forwarding, we pass the correct data calculated by `add` instruction back to EX stage. Therefore, the result of `srli x7 x7 1` will be the correct value. 2. **Stall the pipeline**: Insert NOP intruction Consider the following instructions: ``` 78: 000eaf03 lw x30 0 x29 7c: 01e98863 beq x19 x30 16 <equal> ``` ![](https://i.imgur.com/MfBGcpF.png) ![](https://i.imgur.com/sDYKfQH.png) ![](https://i.imgur.com/klhViyu.png) - `lw` instruction will get the value of `x30` in MEM stage and write back at next cycle. By this time, `beq` instruction is already through ALU with wrong value of `x30`. - To solve this data hazard, we need to stall the `beq` instruction by one cycle. The load-use data hazard is detected in ID stage, and if it is happened, the IF and ID stages are stalled. Then after one cycle, we can use forwarding solution to pass the correct value of `x30` in WB stage to `beq` instruction in EX stage. - ![](https://i.imgur.com/Suwez40.png) #### **Control Hazards** Control hazards happened when there is whether conditional or unconditional braching. Consider the following instructions: ``` 7c: 01e98863 beq x19 x30 16 <equal> 80: 01e9ca63 blt x19 x30 20 <less> . . . 0000008c <equal>: 8c: 00038a13 addi x20 x7 0 90: 00008067 jalr x0 x1 0 ``` ![](https://i.imgur.com/TKqwMba.png) - `beq` instruction will decide whether to branch or not in EX stage. If `x19` and `x30` are equal, it will go to `<equal>` label. In this case, the value in `x19` and `x30` are equal, so the branch is taken. Because we have to wait for the execution cycle to complete before knowing the target, we must throw away two cycles of work on instructions in the path not taken. And the PC will take the result of a branch / jump calculation as the next PC. ## Reference [Classic RISC pipeline](https://en.wikipedia.org/wiki/Classic_RISC_pipeline)