---
tags: computer-arch
---

# Lab3: `srv32` - RISCV `RV32IM` Soft CPU

## [srv32](https://github.com/sysprog21/srv32)^MIT^: Simple 3-stage pipeline RISC-V processor

A simple RISC-V 3-stage pipeline processor featuring:
* Three-stage pipeline processor
* RV32IM instruction sets
* Pass RV32IM compliance test
* Trap exception
* Interrupt handler
* [FreeRTOS](https://www.freertos.org/) support
* ISS simulator

## Run RTL sim
- The simulator generated by Verilator is called `sim`. It is an RTL simulator capable of simulating the execution of a RISC-V binary at the RTL level.
- RTL (Register-Transfer Level) simulation is done on either [Verilator](https://www.veripool.org/wiki/verilator) (default) or Icarus Verilog.
- The RTL simulator is located in the `sim/` directory.
- **Result** The `make all` command builds the core and runs the RTL simulation; all simulations passed.

## Run ISS sim
- This repo also comes with a software RISC-V simulator that is capable of simulating the execution of a RISC-V binary at the software level.
- The source code of the ISS simulator is located in the `tools` directory.
- **Result** Various benchmarks, a hello world program, and the riscv-compliance tests were run on the ISS simulator.
    - Benchmark
        - CoreMark score obtained: `2.681152 CoreMark/MHz`
        - Dhrystone score obtained:
          ```c
          Number_Of_Runs: 100
          User_Time: 31249 cycles, 26443 insn
          Cycles_Per_Instruction: 1.181
          Dhrystones_Per_Second_Per_MHz: 3200
          DMIPS_Per_MHz: 1.821
          ```
    - The results of the RISC-V compliance tests are covered in the following section.

## Run RISC-V compliance test (v1.0)
There are two ways of running a RISC-V binary: the RTL simulator called `sim`, generated by Verilator, and the software RISC-V simulator called `rvsim`, located in the `tools` directory.

### Run compliance tests on RTL simulator
This repo tests the compliance of the hardware implementation by comparing the outputs produced by both simulators.
To be precise, when one types `make tests` in the `./tests` directory, the compliance tests are run on the RTL simulator (`sim`), and the output is compared with the reference output specified by `riscv-compliance` AND the output of the software simulator (`rvsim`).

The memory dump of the RTL simulator, `dump.txt`, is renamed to `*.signature.output` and automatically compared to the reference output provided by the `riscv-compliance` repo. The output of the RTL simulator (`sim`) is stored in `trace.log`, while the output of the software simulator (`rvsim`) is stored in `trace_sw.log`. These two files contain detailed information about each instruction, such as the value written to a register or the value written to a memory address. The two files are compared with a `diff --brief` command.

In summary, the memory dump from the RTL simulator is first compared with the reference output. Then the outputs of the RTL simulator and the software simulator are compared. Notice that if the first comparison fails, the error is signaled by `riscv-compliance`; a failure of the second comparison, however, results in a failed `make` command (this failure is raised by `diff --brief`).

### Run compliance tests on SW simulator
When one types `make tests-sw` in the `./tests` directory, the compliance tests are run on the software simulator. The output is compared with itself AND the reference output provided by the `riscv-compliance` repo.
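
The comparisons described above boil down to checking whether two files have identical contents. The following is a minimal Python sketch of that idea, not the repo's actual Makefile logic: the helper names `files_identical` and `write_file` are hypothetical, the file contents are placeholders rather than real trace output, and `filecmp.cmp(..., shallow=False)` merely stands in for `diff --brief`, which likewise only reports whether two files differ.

```python
import filecmp
import os
import tempfile


def files_identical(path_a, path_b):
    """Return True when both files have byte-identical contents."""
    # shallow=False forces a content comparison instead of trusting os.stat()
    return filecmp.cmp(path_a, path_b, shallow=False)


def write_file(directory, name, text):
    """Hypothetical helper: create a small text file and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(text)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # Placeholder contents; real trace files hold per-instruction details.
    rtl_trace = write_file(tmp, "trace.log", "placeholder trace line\n")
    sw_trace = write_file(tmp, "trace_sw.log", "placeholder trace line\n")
    bad_trace = write_file(tmp, "trace_sw_mismatch.log", "a different line\n")

    match_result = files_identical(rtl_trace, sw_trace)
    mismatch_result = files_identical(rtl_trace, bad_trace)
```

A mismatch here corresponds to the case where `diff --brief` reports a difference and the `make` command fails.
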
- **Result**
    - `make tests-sw`
      ```c
      OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
      OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
      OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
      ```
    - `make tests`
      ```c
      OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
      OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
      OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
      ```

---

## Analyze `srv32` RV32 core

### Memory modeling
At the time of writing, the memory of `srv32` is divided into instruction memory (I-MEM) and data memory (D-MEM). Both I-MEM and D-MEM are modelled using the `mem2ports` Verilog module as follows:

```c=1
module mem2ports #(
    parameter SIZE = 4096,
    parameter FILE = "memory.bin"
) (
    input               clk,
    input               resetb,

    input               rready,
    input               wready,
    output reg          rresp,
    output reg  [31: 0] rdata,
    input       [31: 2] raddr,
    input       [31: 2] waddr,
    input       [31: 0] wdata,
    input       [ 3: 0] wstrb
);
```

Notice that the signals `raddr` and `waddr` are both 30 bits long. The omission of the lower 2 bits shows that reads and writes to memory are **word-aligned** (**4-byte aligned**).

### Pipeline architecture
`srv32` is a 3-stage pipeline architecture with IF/ID, EX, and WB stages. The following diagram marks some important signals for later discussion.

![](https://i.imgur.com/9lbFKBM.jpg)

### Forwarding
#### Data hazard
`srv32` supports full forwarding, which means RAW data hazards can be resolved WITHOUT stalling the processor. Notice that only RAW data hazards are possible; the other hazards (WAW, WAR) cannot occur on a single-issue, in-order processor. The implementation of register forwarding is as follows:

```c=1
// register reading @ execution stage and register forwarding
// When the execution result accesses the same register,
// the execution result is directly forwarded from the previous
// instruction (at write back stage)
assign reg_rdata1[31: 0] = (ex_src1_sel == 5'h0) ? 32'h0 :
                           (!wb_flush && wb_alu2reg &&
                            (wb_dst_sel == ex_src1_sel)) ? // register forwarding
                                (wb_mem2reg ? wb_rdata : wb_result) :
                                regs[ex_src1_sel];
assign reg_rdata2[31: 0] = (ex_src2_sel == 5'h0) ? 32'h0 :
                           (!wb_flush && wb_alu2reg &&
                            (wb_dst_sel == ex_src2_sel)) ? // register forwarding
                                (wb_mem2reg ? wb_rdata : wb_result) :
                                regs[ex_src2_sel];
```

Consider the following instruction sequence:

| IF/ID            | EX               | WB                |
| ---------------- | ---------------- | ----------------- |
| `add x4, x5, x6` | `and x3, x2, x4` | `addi x2, x2, -3` |

Instruction `and x3, x2, x4` at the EX stage and instruction `addi x2, x2, -3` at the WB stage have a RAW data hazard on register `x2`. The latest value of `x2` (from `addi x2, x2, -3`) is held in signal `wb_result` at the WB stage. Since `(wb_dst_sel == ex_src1_sel)` is true and `wb_mem2reg` is false, `wb_result` is forwarded as the `x2` operand of the instruction in the EX stage (`and x3, x2, x4`). The value of `x2` seen in the EX stage is `reg_rdata1`.

The timing diagram of the above instruction sequence is as follows:

| Instruction       |cycle 1| c2  | c3  | c4  | c5  |
| ----------------- | ----- | --- | --- | --- | --- |
| `addi x2, x2, -3` | IF/ID | EX  |**WB**⬂|   |     |
| `and x3, x2, x4`  |       |IF/ID|**EX**⬃| WB |    |
| `add x4, x5, x6`  |       |     |IF/ID| EX  | WB  |

#### Load-use hazard
Load-use hazard is NOT an issue in the `srv32` core because D-MEM is read at the WB stage, and the register file is also read at the WB stage. A single MUX is used to switch between the 2 operand sources (operand from the register file and operand from D-MEM). The load-use hazard can therefore be resolved WITHOUT stalling the processor.

![](https://i.imgur.com/naYkEhn.png)

Consider the following instruction sequence:

| IF/ID            | EX               | WB                |
| ---------------- | ---------------- | ----------------- |
| `add x4, x5, x6` | `and x3, x2, x4` | `lw x2 0(x5)`     |

Instruction `and x3, x2, x4` at the EX stage and instruction `lw x2 0(x5)` at the WB stage have a load-use data hazard on register `x2`. The value of `x2` is read from D-MEM in the WB stage and held in signal `wb_rdata`. Since `(wb_dst_sel == ex_src1_sel)` and `wb_mem2reg` are both true,
`wb_rdata` is forwarded as the `x2` operand in the EX stage. The value of `x2` seen in the EX stage is `reg_rdata1`. The Verilog code is shown again for reference:

```c=1
assign reg_rdata1[31: 0] = (ex_src1_sel == 5'h0) ? 32'h0 :
                           (!wb_flush && wb_alu2reg &&
                            (wb_dst_sel == ex_src1_sel)) ? // register forwarding
                                (wb_mem2reg ? wb_rdata : wb_result) :
                                regs[ex_src1_sel];
assign reg_rdata2[31: 0] = (ex_src2_sel == 5'h0) ? 32'h0 :
                           (!wb_flush && wb_alu2reg &&
                            (wb_dst_sel == ex_src2_sel)) ? // register forwarding
                                (wb_mem2reg ? wb_rdata : wb_result) :
                                regs[ex_src2_sel];
```

The timing diagram of the above instruction sequence is as follows:

| Instruction       |cycle 1| c2  | c3  | c4  | c5  |
| ----------------- | ----- | --- | --- | --- | --- |
| `lw x2 0(x5)`     | IF/ID | EX  | WB  |     |     |
| `and x3, x2, x4`  |       |IF/ID| EX  | WB  |     |
| `add x4, x5, x6`  |       |     |IF/ID| EX  | WB  |

### Branch penalty
Branch penalty is the number of instructions killed after a branch instruction if the branch is TAKEN. The branch result is resolved by the ALU at the end of the EX stage, so the instruction fetched in IF/ID might need to be killed if a branch is taken. In this processor, however, the address of the next instruction (next PC) has to be fed into I-MEM a cycle ahead. Thus, the branch penalty for `srv32` is 2.

To clarify, by the time the next PC is resolved, one instruction has been fetched into the pipeline and another PC has been calculated, because the address has to be computed one cycle ahead. The number of instructions that must be killed (a.k.a. set to NOP) is therefore 2 after a branch instruction if the branch is actually taken.

Consider the following instruction sequence:

|         |                    | IF/ID              | EX                 | WB    |
| ------- | ------------------ | ------------------ | ------------------ | ----- |
| next_pc |fetch_pc (imem_addr)|`if_pc`             | `ex_pc`            |`wb_pc`|
| xxx     |`add x4, x5, x6`    |`and x3, x2, x4`    |`beq x5, x6 (taken)`|       |

(Notice that an additional row is inserted above the instructions.
These are the PC variables in the pipeline.)

The branch instruction `beq x5, x6 (taken)` is resolved by the END of the EX stage. By the time the branch instruction is resolved, two consecutive instructions, namely `add x4, x5, x6` and `and x3, x2, x4`, have been fetched from I-MEM. These two instructions must be killed if the branch is taken.

The timing diagram of the above instruction sequence is as follows:

| Instruction           | c1  | c2  | c3  | c4  | c5  | c6  |
| --------------------- | --- | --- | --- | --- | --- | --- |
|`beq x5, x6 (taken)`   |IF/ID| EX  | WB  |     |     |     |
| `and x3, x2, x4`      |     | NOP | NOP | NOP |     |     |
| `add x4, x5, x6`      |     |     | NOP | NOP | NOP |     |
| `exec if branch taken`|     |     |     |IF/ID| EX  | WB  |

---

## RV32C compressed extension

#### Introduction to RV32C
- Motivation: Typically, 50%–60% of the RISC-V instructions in a program can be replaced with RV32C instructions, resulting in a **25%–30% code-size reduction** (from RISC-V spec v2.2)
    - More specifically, according to [this slide](https://cdn2.hubspot.net/hubfs/3020607/An%20Introduction%20to%20the%20RISC-V%20Architecture.pdf) from SiFive, RV32I code is about 40% larger than RV32C code in SPECINT 2006 (RV32C = 100%, RV32I = 140% in the table that follows).
        - ![](https://i.imgur.com/BSZ1Nvr.png)
        - | 32-bit arch | Code size | 64-bit arch | Code size |
          | ----------- | --------- | ----------- | --------- |
          | RV32C       | 100%      | RV64C       | 100%      |
          | RV32I       | 140%      | RV64I       | 141%      |
          | x86         | 126%      | x86-64      | 131%      |
          | armv7-A     | 136%      | armv8       | 129%      |
          | Thumb-2     | 101%      | MIPS64      | 169%      |
          | MIPS32      | 173%      |             |           |
          | MIPS16e     | 126%      |             |           |
- RV32C uses a simple compression scheme that offers **shorter 16-bit versions** of common 32-bit RISC-V instructions when:
    - the **immediate or address offset is small**, or
    - one of the registers is the **zero register (x0)**, the **ABI link register (x1)**, or the **ABI stack pointer (x2)**, or
    - the **destination register and the first source register are identical**, or
    - the registers used are the **8 most popular ones.**
- Misalignment
    - The C extension allows 16-bit instructions to be freely intermixed with 32-bit instructions, with the latter now able to **start on any 16-bit boundary**. **With the addition of the C extension, JAL and JALR instructions will no longer raise an instruction misaligned exception.** (from RISC-V spec v2.2)
- The encoding format for RV32C (FYI)
    - ![](https://i.imgur.com/9XkfDSJ.png)
    - ![](https://i.imgur.com/22cvzM9.png)

#### RV32C decoder
We found an RV32C decoder module in another RISC-V processor, [ibex](https://github.com/lowRISC/ibex), and embedded it into `srv32`. The module takes a 32-bit word and checks whether it holds a compressed instruction. If so, the module expands it into the equivalent 32-bit instruction; otherwise, the 32-bit instruction passes through unchanged.

RV32C is designed such that checking whether an instruction is compressed is really simple: given a 32-bit word read from I-MEM, if `inst[1:0]` is NOT `2'b11`, then the instruction is an RV32C instruction.

We add the RV32C decoder in this [commit](https://github.com/WeiCheng14159/srv32/commit/7932230ff006e7ea8eebdd298fdd96d7bc68cdcb).
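
The `inst[1:0]` check can be tried out in software. The following is a small Python sketch of that rule only, not the `compressed_decoder` RTL; the function names `is_compressed` and `instruction_length` are made up for illustration, and the sample words are taken from the listings that appear later in this document.

```python
def is_compressed(word):
    """Return True when the two lowest bits are not 0b11 (a 16-bit RV32C encoding)."""
    return (word & 0b11) != 0b11


def instruction_length(word):
    """Length in bytes of the instruction that starts in the low half of `word`."""
    return 2 if is_compressed(word) else 4


# 0x41814201: low half 0x4201 is `c.li tp,0`   -> compressed
# 0x3AB58593: `addi a1,a1,939`                 -> regular 32-bit instruction
# 0x00000013: NOP (`addi x0, x0, 0`)           -> regular 32-bit instruction
samples = [0x41814201, 0x3AB58593, 0x00000013]
lengths = [instruction_length(word) for word in samples]
```

Only the low two bits of the fetched word are examined, which is why the check is cheap enough to sit in the IF/ID stage.
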
The RV32C decoder is incorporated into `srv32` in this [commit](https://github.com/WeiCheng14159/srv32/commit/fc6be37ee6671772504f7374cab7861d1a9825ac).

#### Modification made
After an RV32C instruction is decoded, the next PC should be PC + 2 instead of PC + 4. The current memory configuration doesn't support half-word-aligned memory access, so some of our work involves changing the memory model. To further complicate the issue, the CPU doesn't know whether an instruction is RV32C before decoding it. This means an additional instruction must be fetched into the pipeline and then killed if the previous instruction turns out to be an RV32C instruction.

- **Change of I-MEM model**
    - To support half-word-aligned memory access, 31 of the 32 address bits must be connected to the memory.
    - However, the memory is organized in a word-aligned manner, so a 4-byte read crosses a word boundary if it is only half-word-aligned.
    - ```c=1
      case(raddr[1])
          1'b1: begin // half-word-aligned
              rdata[8*0+7:8*0] <= ram[radr  ][8*2+7:8*2];
              rdata[8*1+7:8*1] <= ram[radr  ][8*3+7:8*3];
              rdata[8*2+7:8*2] <= ram[radr+1][8*0+7:8*0];
              rdata[8*3+7:8*3] <= ram[radr+1][8*1+7:8*1];
          end
          default: rdata <= ram[radr]; // word-aligned
      endcase;
      ```
    - Changes to the memory model can be found in this [commit](https://github.com/WeiCheng14159/srv32/commit/611a09303d61c6d5430d42af3d40a242c4a626a8).
    - Notice that a write to a half-word-aligned address is NOT allowed. Only memory reads have been changed.
- **Modify load instructions to adapt to changes in the memory model**
    - Change the logic of the load instructions (LB, LH, LW, LBU, LHU) such that half-word-aligned access is supported.
    - Changes in this [commit](https://github.com/WeiCheng14159/srv32/commit/53134aa8f8f2826449909ef2ff2751526b40d4fa).
- **Change testbench**
    - Bit 1 of the I-MEM address should be connected in the testbench module.
    - Changes in this [commit](https://github.com/WeiCheng14159/srv32/commit/9dde6bd791df608ddf730ff65ce3d31bf947728d).
- **RV32C instructions cannot be executed back-to-back**
    - As mentioned earlier, the CPU has to fetch and decode a word to figure out whether it holds an RV32I or an RV32C instruction.
    - Once an RV32C instruction is fetched, it is decoded into an equivalent RV32I instruction by the `compressed_decoder` unit in the IF/ID stage.
    - This means the upper 16 bits of the fetched word (possibly another RV32C instruction, or the lower half of an RV32I instruction) are overwritten.
    - Take the following program from the actual compliance tests as an example:
    - | Address  | Value    | Instruction   |
      | -------- | -------- | ------------- |
      | 0x35     | ...      | ...           |
      | 0x36     | 0x01     | `c.li tp,0`   |
      | 0x37     | 0x42     |               |
      | 0x38     | 0x81     | `c.li gp,0`   |
      | 0x39     | 0x41     |               |
      | 0x3A     | 0x92     | `c.add gp,tp` |
      | 0x3B     | 0x91     |               |
      | 0x3C     | 0x0E     |`c.swsp gp,0(sp)`|
      | 0x3D     | 0xC0     |               |
      | 0x3E     | 0x81     | `c.li s1,0`   |
      | 0x3F     | 0x44     |               |
      | 0x40     | ...      | ...           |
    - `srv32` fetches 1 word from address `0x36` and gets `0x41814201` (be aware of the little-endian byte ordering).
    - When it is decoded in the IF/ID stage, the RV32C instruction `c.li tp,0` (`0x4201`) is expanded to the RV32I instruction `addi tp, x0, 0` (`0x00000213`).
    - Notice that the upper half of the fetched word, `0x4181`, is overwritten.
        - These 2 bytes might be the lower half of an RV32I instruction, or an RV32C instruction.
        - In our case, `0x4181` is the RV32C instruction `c.li gp,0`.
    - Only by the end of the IF/ID stage do we find out that the instruction is RV32C.
        - The next instruction should start at `0x38`.
        - However, it is too late to update the next PC, and the next word fetched is `0xC00E9192` at `0x3A`.
        - This is wrong, and the fetched word must be killed (set to NOP).
    - Thus, two RV32C instructions cannot be executed consecutively (back-to-back). Two RV32C instructions must be separated by 1 NOP.
    - This idea is similar to the idea of branch penalty; however, ==**the IPC is HALF given ALL instructions are RV32C**==.
    - This is definitely NOT optimal, but it is an engineering trade-off.
    - One possible solution to this problem is adding a pre-decode stage to predecode instructions (?), but that would significantly change the overall structure.
    - Changes in this [commit](https://github.com/WeiCheng14159/srv32/commit/2d2b420a632c5c99f4b395aff4f74a00533cef8c)
- **Waveform of RV32C instructions executed on `srv32`**
    - Continuing from the previous section, a real example is given to demonstrate how RV32C instructions are executed on `srv32` AFTER the modification.
    - This is the code sequence mentioned in the previous example:
    - | Address  | Value    | Instruction |
      | -------- | -------- | ----------- |
      | 0x36     | 0x4201   | c.li tp,0   |
      | 0x38     | 0x4181   | c.li gp,0   |
      | 0x3A     | 0x9192   | c.add gp,tp |
      | 0x3C     | 0xC00E   | c.swsp gp,0(sp)|
      | 0x3E     | 0x4481   | c.li s1,0   |
      | 0x40     | 0x4405   | c.li s0, 1  |
      | 0x42     | 0x9426   | c.add s0, s1|
      | 0x44     | 0xC222   | c.swsp s0,4(sp)|
      | 0x46     | 0x4601   | c.li a2, 0  |
    - It results in the following waveform:
    - ![](https://i.imgur.com/7r840KU.png)
        - `if_pc` is the address of `inst` in the IF/ID stage. `ex_pc` is the address of `ex_insn` in the EX stage.
        - Notice that `inst` is the EXPANDED instruction, so its value differs from what is in the ELF file (e.g. `c.addi` -> `addi`).
        - The value `0x00000013` represents a NOP instruction (`addi x0, x0, 0`).
    - The following marks the instructions in the EX stage; notice the NOP instructions inserted between RV32C instructions.
    - ![](https://i.imgur.com/7pjGKy2.png)
- **Modify instruction misaligned exception**
    - Originally, a misaligned exception is triggered if an instruction jumps/branches to a half-word-aligned address.
    - This is allowed in RV32C, as mentioned in the previous section; however, a jump/branch to an address that is only byte-aligned still triggers an exception.
    - Changes in this [commit](https://github.com/WeiCheng14159/srv32/commit/830a3ca0f0741326ae56c298716c9ce159dc01b8).
- **Trap target could be half-word aligned**
    - Similar to the misaligned exception, the trap target address is allowed to be half-word-aligned.
    - Changes in this [commit](https://github.com/WeiCheng14159/srv32/commit/6326559d93bfcdb9537535c13d04b5d3a0c223bd).
- **RV32C instruction should have higher priority than RV32I branch**
    - `fetch_pc` is the address of the next instruction. This diagram is used for discussion: ![](https://i.imgur.com/Hz9FNKZ.jpg)
    - `fetch_pc` is set by a branch in either the WB or WB_next stage, OR by an RV32C instruction in the IF/ID stage, OR by something else. Originally, branches in the WB or WB_next stage have the higher priority, and the logic for `fetch_pc` is as follows:
    - ```c=1
      fetch_pc <= (ex_flush) ? (fetch_pc + 4) :
                  (ex_trap)  ? (ex_trap_pc)   :
                  (c_valid)  ? (if_pc + 2)    :
                  {next_pc[31:1], 1'b0};
      ```
    - However, consider the following program (from the C-J compliance test):
    - | Address  | Value      | Instruction   |
      | -------- | ---------- | ------------- |
      | 0x36     | 0x4581     | c.li a1, 0    |
      | 0x38     | 0xA021     | c.j 40 <..>   |
      | 0x3A     | 0x65C9     | c.lui a1,0x12 |
      | 0x3C     | 0x3AB58593 | addi a1,a1,939|
      | 0x40     | 0xC02E     | c.swsp a1,0(sp)|
      | 0x42     | 0x00020117 | auipc sp,0x20 |
      | 0x46     | 0x0C210113 | addi sp,sp,194|
    - Because branches in the WB or WB_next stage originally have the higher priority when deciding the value of `fetch_pc`, the code is executed as follows:
    - |           |            | IF/ID           | EX      | WB     | WB_next |
      | --------- | ---------- | --------------- | ------- | ------ | ------- |
      | next_pc   |fetch_pc    |if_pc            | ex_pc   | wb_pc  | wb_next |
      | ==`0x48`==|==`0x44`==  |`0x40`           |`0x3A`   | `0x3C` | `0x38`  |
      | xxx       | xxx        |`c.swsp a1,0(sp)`|`c.lui a1,0x12` |`NOP` | `c.j 40 <xx>` |
    - Waveform ![](https://i.imgur.com/PK1Ef3J.png)
    - Notice that `c.lui a1,0x12` shouldn't exist in the pipeline; it should have been set to NOP. This instruction slips through the jump instruction. We will deal with this problem later and ignore it for now.
    - The `ex_flush` signal indicates that there is a branch/jump instruction in the WB/WB_next stage.
    - When the `c.j 40 <xx>` instruction is at the WB_next stage, `ex_flush` is set to 1, so `fetch_pc` is set to `0x44`.
      However, this is wrong since `fetch_pc` should be `0x42`.
    - This problem is crucial because `fetch_pc` is connected to I-MEM. We cannot stop a wrong instruction from entering the pipeline if `fetch_pc` is set incorrectly.
    - `fetch_pc` is set to `0x44` (wrong) because of the `ex_flush` signal. It should be set to `0x42` because the previous instruction, `c.swsp a1,0(sp)`, is an RV32C instruction.
    - That is the reason why an RV32C instruction should have higher priority than a branch/jump in the WB/WB_next stage.
    - After this commit, the logic for setting `fetch_pc` becomes:
    - ```c=1
      fetch_pc <= (c_valid)  ? (if_pc + 2)    :
                  (ex_flush) ? (fetch_pc + 4) :
                  (ex_trap)  ? (ex_trap_pc)   :
                  {next_pc[31:1], 1'b0};
      ```
    - The instructions are then executed in the following order:
    - |           |               | IF/ID           | EX      | WB     | WB_next |
      | --------- | ------------- | --------------- | ------- | ------ | ------- |
      | next_pc   |fetch_pc       |if_pc            | ex_pc   | wb_pc  | wb_next |
      | ==`0x40`==|==`0x3C`==     |`0x40`           |`0x3A`   | `0x3C` | `0x38`  |
      | xxx       |`auipc sp,0x20`|`c.swsp a1,0(sp)`|`c.lui a1,0x12`|`NOP`|`c.j 40 <xx>`|
    - Waveform ![](https://i.imgur.com/m7jv6ni.png)
    - Notice that the result isn't correct yet; some additional work has to be done to set `fetch_pc` to the correct address.
    - Changes in this [commit](https://github.com/WeiCheng14159/srv32/commit/5596dd6f93840873ce056cf05e21bd7b1a78e28f)
- **Add c_branch_kill signal**
    - The `c_branch_kill` signal kills an RV32C instruction in the IF/ID stage if there is a branch/jump in the WB stage.
    - By doing so, the branch penalty for an RV32C branch is 2 cycles (2 NOPs inserted after the branch).
    - Consider the following code sequence:
    - | Address  | Value      | Instruction   |
      | -------- | ---------- | ------------- |
      | 0x36     | 0x4581     | c.li a1, 0    |
      | 0x38     | 0xA021     | c.j 40 <..>   |
      | 0x3A     | 0x65C9     | c.lui a1,0x12 |
      | 0x3C     | 0x3AB58593 | addi a1,a1,939|
      | 0x40     | 0xC02E     | c.swsp a1,0(sp)|
      | 0x42     | 0x00020117 | auipc sp,0x20 |
      | 0x46     | 0x0C210113 | addi sp,sp,194|
    - Before this commit (**slip through**)
        - |         |              | IF/ID              | EX     | WB               |
          | ------- | ------------ | ------------------ | ------ | ---------------- |
          | next_pc | fetch_pc     | if_pc              | ex_pc  | wb_pc            |
          | `0x44`  | `0x40`       | `0x3A`             | `0x3C` | `0x38`           |
          | xxx     | `<target>`   |==`c.lui a1,0x12`== | `NOP`  | `j 40 <base>`    |
        - Waveform: ![](https://i.imgur.com/m7jv6ni.png)
        - Notice that the `c.j` instruction is expanded to a `j` instruction after the IF/ID stage by the `compressed_decoder`.
        - The instruction right after `c.j` is set to NOP because two RV32C instructions cannot be executed back-to-back.
        - By the time the jump instruction reaches the WB stage, the compressed instruction in the IF/ID stage should be killed.
        - As mentioned in the previous section, we give RV32C the higher priority when calculating the next PC.
            - An RV32C instruction (in our case `c.lui a1,0x12`) in the IF/ID stage sets `fetch_pc` to PC + 2.
            - A branch instruction in the WB or WB_next stage sets `next_pc` to the branch target address.
        - However, this allows 1 RV32C instruction to **"slip through"**. In our case, the RV32C instruction `c.lui a1, 0x12` slips through if the branch is TAKEN.
        - This means `c.lui a1, 0x12` is still executed if the branch is taken.
          This unexpected behavior fails the `C-J` (RV32C jump) riscv-compliance test.
    - After this commit (kill correctly)
        - |         |              | IF/ID              | EX     | WB               |
          | ------- | ------------ | ------------------ | ------ | ---------------- |
          | next_pc | fetch_pc     | if_pc              | ex_pc  | wb_pc            |
          | `0x44`  | `0x40`       | `0x3A`             | `0x3C` | `0x38`           |
          | xxx     | `<target>`   |==`NOP`==           | `NOP`  | `j 40 <base>`    |
        - Waveform: ![](https://i.imgur.com/SGh72Ch.png)
        - Notice that `inst[31:0]` is set to `0x00000013` when `if_pc` is `0x3A`. This means that by the time `if_pc` is `0x3A`, the instruction in the IF/ID stage has been set to `NOP`.
    - Changes in this [commit](https://github.com/WeiCheng14159/srv32/commit/c3e474b0538b36dac143754d27c7ada6ddd408fd)
- **Killed instr shouldn't be seen as a RV32C instr**
    - This is related to the previous commit: a branch instruction in the WB stage should kill an RV32C instruction in the IF/ID stage.
    - The `c_valid` signal is set to 1 if the instruction in the IF/ID stage is decoded as a valid RV32C instruction.
    - Once an RV32C instruction is killed, it is replaced by a NOP.
      The `c_valid` bit should then be set to zero because a NOP is NOT an RV32C instruction.
    - This is important because an RV32C instruction in the EX stage kills the following instruction in the IF/ID stage.
    - Killed instructions must therefore have their `c_valid` bit cleared; otherwise a killed instruction in the EX stage would still affect the processor behavior in the IF/ID stage.
    - Continuing with the previous example:
    - Before this commit (the jump target in the IF/ID stage is killed by the NOP, previously `c.lui`, in the EX stage because `c_valid` is wrongly set)
        - |         |              | IF/ID              | EX     | WB               |
          | ------- | ------------ | ------------------ | ------ | ---------------- |
          | next_pc | fetch_pc     | if_pc              | ex_pc  | wb_pc            |
          | `0x44`  | `0x40`       | `0x3A`             | `0x3C` | `0x38`           |
          | xxx     | `<target>`   |`NOP`               | `NOP`==(c_valid=1)==| `j 40 <base>`|
        - Waveform: ![](https://i.imgur.com/TgRmIqC.png)
    - After this commit (branch target not killed)
        - |         |              | IF/ID              | EX     | WB               |
          | ------- | ------------ | ------------------ | ------ | ---------------- |
          | next_pc | fetch_pc     | if_pc              | ex_pc  | wb_pc            |
          | `0x44`  | `0x40`       | `0x3A`             | `0x3C` | `0x38`           |
          | xxx     | `<target>`   |`NOP`               | `NOP`==(c_valid=0)==| `j 40 <base>`|
        - Waveform: ![](https://i.imgur.com/p1L3ec3.png)
        - Notice that the value of `c_valid` is different.
    - Changes in this [commit](https://github.com/WeiCheng14159/srv32/commit/313f7547f1b36b1640263e5427d4e3ca728f283a)
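
Two pieces of the modified fetch path described in this section, the half-word-aligned I-MEM read and the reordered `fetch_pc` selection, can be checked with a small software model. The following Python sketch is only a behavioural approximation under stated assumptions, not the RTL: the function names `read_imem` and `next_fetch_pc` are hypothetical, the memory is modelled as a dictionary of 32-bit words indexed by word address, and the low half of the word at `0x34` is a placeholder because those bytes are elided in the listing above.

```python
MASK32 = 0xFFFFFFFF


def read_imem(ram, addr):
    """Read 32 bits at a word- or half-word-aligned byte address.

    `ram` maps word index (addr >> 2) to a 32-bit little-endian word.
    """
    if addr & 0b1:
        raise ValueError("byte-aligned fetch is not modelled")
    index = addr >> 2
    if addr & 0b10:
        # Half-word-aligned: upper half of this word, lower half of the next.
        low_half = (ram[index] >> 16) & 0xFFFF
        high_half = ram[index + 1] & 0xFFFF
        return (high_half << 16) | low_half
    return ram[index]  # word-aligned


def next_fetch_pc(c_valid, ex_flush, ex_trap, if_pc, fetch_pc, ex_trap_pc, next_pc):
    """Mirror of the reordered priority: an RV32C instruction in IF/ID wins."""
    if c_valid:
        return (if_pc + 2) & MASK32
    if ex_flush:
        return (fetch_pc + 4) & MASK32
    if ex_trap:
        return ex_trap_pc
    return next_pc & ~1 & MASK32  # same as {next_pc[31:1], 1'b0}


# Words rebuilt from the byte listing at 0x36..0x3F (little-endian).
ram = {
    0x34 >> 2: 0x42010000,  # low half is a placeholder (bytes elided in the listing)
    0x38 >> 2: 0x91924181,
    0x3C >> 2: 0x4481C00E,
}
word_at_36 = read_imem(ram, 0x36)
word_at_3a = read_imem(ram, 0x3A)

# `c.swsp a1,0(sp)` at if_pc = 0x40 while a jump asserts ex_flush.
pc_with_c_priority = next_fetch_pc(True, True, False, 0x40, 0x40, 0, 0x48)
```

With these inputs the model reproduces the two fetched words quoted earlier (`0x41814201` at `0x36` and `0xC00E9192` at `0x3A`), and the reordered selection yields `0x42` for the `c.swsp` case, whereas taking the `ex_flush` branch first would give `0x44`.
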