> 蕭力文
contributed by <[`liball`](https://github.com/liballouo/ca2023-lab3)>
## Task Description
Modify the project https://github.com/sysprog21/ca2023-lab3 and extend it to support the full RV32I instruction set along with **CSR** instructions (specifically the **Zicsr** extension) using Chisel.
The implementation must be compatible with the test programs provided in https://github.com/sysprog21/rv32emu/tree/master/tests/perfcounter. Additionally, select at least three RISC-V programs from the course exercises, rewrite them, and ensure they run successfully on the enhanced processor.
## Environment
Ubuntu Linux 24.04
## Complete single-cycle RISC-V CPU
### Instruction Fetch

After validating the instruction, if a jump is required, the PC is updated to the target jump address. Otherwise, it is incremented to `PC + 4`.
```scala
...
when(io.jump_flag_id) {
  pc := io.jump_address_id
}.otherwise {
  pc := pc + 4.U
}
...
```
#### Instruction Fetch Test
Execute the command: `sbt "testOnly riscv.singlecycle.InstructionFetchTest"` for testing.
The figure below shows that we pass the test successfully.

In the figure below, the initial instruction is set to `0x00000013`, which represents the `NOP` instruction. The `pc` is initialized to `0x1000`.

At the next positive clock edge, `io_instruction_valid` is set to `HIGH`, while `io_jump_flag_id` remains `LOW`. As a result, the pc increments to `0x1004` (`pc + 4`).

Next, we set both `io_instruction_valid` and `io_jump_flag_id` to `HIGH`. As shown, the `pc` returns to `0x1000`, indicating that the jump was successfully executed.

### Instruction Decode
The following code snippet demonstrates how the control signals for memory read and write operations are generated based on the instruction's opcode during the decode stage:
```scala
...
io.memory_read_enable := (opcode === InstructionTypes.L)
io.memory_write_enable := (opcode === InstructionTypes.S)
...
```
- `io.memory_read_enable`: This signal is activated (`true`) when the instruction belongs to the **load (L)** type, which indicates that the operation will read data from memory.
- `io.memory_write_enable`: This signal is activated (`true`) when the instruction belongs to the **store (S)** type, which means the operation will write data to memory.
By comparing the `opcode` with predefined constants (e.g., `InstructionTypes.L` and `InstructionTypes.S`), the system ensures that the appropriate control signals are generated for memory operations. This enables the processor to distinguish between read and write operations and handle them accordingly in subsequent pipeline stages.
#### Instruction Decode Test
Execute the command: `sbt "testOnly riscv.singlecycle.InstructionDecoderTest"` for testing.
The figure below shows that we pass the test successfully.


The first `io_instruction` is `0x00A02223` (`0000 0000 1010 0000 0010 0010 0010 0011` in binary). Then we can see that:
- `opcode` = `0x23` = `0b0100011`, which is an `S-type` instruction.
- `funct3` = `0b010` indicates that it is a `sw` instruction.
- `imm` = `0x04` = `4`
- `rs1` = `0x00` = `x0`; `rs2` = `0x0A` = `x10`
- `io_memory_read_enable`, `io_reg_write` are set to `LOW`; `io_memory_write_enable` is set to `HIGH`.
- `io_ex_aluop1_source` = `0` and `io_ex_aluop2_source` = `1`, which select `Register` for the first operand and `Immediate` for the second.
```scala
io.ex_aluop1_source := Mux(
  opcode === Instructions.auipc || opcode === InstructionTypes.B || opcode === Instructions.jal,
  ALUOp1Source.InstructionAddress,
  ALUOp1Source.Register
)
io.ex_aluop2_source := Mux(
  opcode === InstructionTypes.RM,
  ALUOp2Source.Register,
  ALUOp2Source.Immediate
)
```
This instruction is `sw x10, 4(x0)`.
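The field breakdown above can be double-checked in software. The following is a minimal plain-Scala sketch (no Chisel), assuming the standard RV32I S-type layout; the object `STypeFields` and its members are hypothetical names used only for illustration and are not part of the lab code.

```scala
// Plain-Scala sketch: split a 32-bit S-type instruction into its fields.
// All names here are illustrative assumptions, not part of ca2023-lab3.
object STypeFields {
  final case class SType(opcode: Int, funct3: Int, rs1: Int, rs2: Int, imm: Int)

  def decode(inst: Long): SType = {
    val w      = inst & 0xFFFFFFFFL            // keep the low 32 bits
    val opcode = (w & 0x7F).toInt              // bits [6:0]
    val immLo  = ((w >> 7) & 0x1F).toInt       // bits [11:7]  = imm[4:0]
    val funct3 = ((w >> 12) & 0x7).toInt       // bits [14:12]
    val rs1    = ((w >> 15) & 0x1F).toInt      // bits [19:15]
    val rs2    = ((w >> 20) & 0x1F).toInt      // bits [24:20]
    val immHi  = ((w >> 25) & 0x7F).toInt      // bits [31:25] = imm[11:5]
    val imm12  = (immHi << 5) | immLo
    val imm    = (imm12 << 20) >> 20           // sign-extend the 12-bit immediate
    SType(opcode, funct3, rs1, rs2, imm)
  }
}
```

Decoding `0x00A02223L` with this sketch gives opcode `0x23`, `funct3` `2`, `rs1` `0`, `rs2` `10`, and `imm` `4`, matching `sw x10, 4(x0)`.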
### Execute
The following code snippet demonstrates how the ALU inputs are set in the Execute stage:
```scala
...
alu.io.func := alu_ctrl.io.alu_funct
alu.io.op1 := Mux(io.aluop1_source === 1.U, io.instruction_address, io.reg1_data)
alu.io.op2 := Mux(io.aluop2_source === 1.U, io.immediate, io.reg2_data)
...
```
- `alu.io.func`: This sets the ALU's function code, which determines what operation the ALU will perform. The function code is provided by the ALU control unit, which is based on the decoded instruction’s opcode and function fields.
- `alu.io.op1`: This selects the first operand for the ALU. The value is determined by the `aluop1_source` signal:
  - If `aluop1_source` is set to `1`, it uses the **instruction address** (`io.instruction_address`) as the first operand.
  - Otherwise, it uses the value from **register 1** (`io.reg1_data`).
- `alu.io.op2`: This selects the second operand for the ALU. Similarly to `op1`, the value is determined by the `aluop2_source` signal:
  - If `aluop2_source` is set to `1`, it uses the **immediate value** (`io.immediate`) as the second operand.
  - Otherwise, it uses the value from **register 2** (`io.reg2_data`).
#### Execute Test
Execute the command: `sbt "testOnly riscv.singlecycle.ExecuteTest"` for testing.
The figure below shows that we pass the test successfully.

This test is checking two main functionalities:
##### `ADD` Instruction Testing:

`io_instruction` is `0x001101B3` which represents `x3 = x2 + x1`.
The `ADD` test runs 100 iterations. In each iteration, it:
1. Creates two random numbers as operands.
2. Inputs these random numbers into the execute stage as register values (`reg1_data` and `reg2_data`).
3. Advances the clock by one cycle
4. Checks if the ALU output matches the expected sum of the two random numbers
5. Confirms that no jump signal was generated (`if_jump_flag` remains `0`)
##### `BEQ` Instruction Testing:

`io_instruction` is `0x00208163` which represents `beq x1, x2, 2`.
The test sets up the following conditions:
- Sets `instruction_address` to `2`
- Sets `immediate` to `2`
- Configures the ALU operand sources (`aluop1_source` and `aluop2_source` set to `1`)

It then tests two scenarios:
1. Equal case:
   - Sets `reg1_data` and `reg2_data` to `9`
   - Expects `if_jump_flag` to be `1` (branch taken)
   - Expects `if_jump_address` to be `4` (`instruction_address + immediate` = `2 + 2`)
2. Not-equal case:
   - Sets `reg1_data` and `reg2_data` to `9` and `19` respectively.
   - Expects `if_jump_flag` to be `0` (branch not taken)
   - Still expects `if_jump_address` to be `4`
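The expected values in the `BEQ` test can be reproduced with a small software model. This is a hedged plain-Scala sketch, assuming the standard RV32I B-type immediate layout; `BranchModel`, `bImmediate`, and `beq` are hypothetical names, not functions from the lab code.

```scala
// Plain-Scala sketch of the BEQ behaviour checked by the test.
// Names are illustrative assumptions, not part of ca2023-lab3.
object BranchModel {
  // B-type immediate: imm[12|10:5] sits in bits [31:25], imm[4:1|11] in bits [11:7].
  def bImmediate(inst: Long): Int = {
    val w       = inst & 0xFFFFFFFFL
    val imm12   = ((w >> 31) & 0x1).toInt
    val imm10_5 = ((w >> 25) & 0x3F).toInt
    val imm4_1  = ((w >> 8) & 0xF).toInt
    val imm11   = ((w >> 7) & 0x1).toInt
    val raw     = (imm12 << 12) | (imm11 << 11) | (imm10_5 << 5) | (imm4_1 << 1)
    (raw << 19) >> 19                          // sign-extend the 13-bit offset
  }

  // Returns (jump flag, jump address). The address is computed whether or not
  // the branch is taken, which is why the test expects 4 in both cases.
  def beq(instructionAddress: Long, imm: Int, reg1: Long, reg2: Long): (Boolean, Long) =
    (reg1 == reg2, (instructionAddress + imm) & 0xFFFFFFFFL)
}
```

With `instructionAddress = 2` and `imm = 2`, the model yields the address `4` for both the equal and the not-equal case; only the flag differs.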
### CPU
Connect the inputs of the Execute module to the outputs of the other modules, following the CPU architecture diagram below.

```scala
...
ex.io.instruction := inst_fetch.io.instruction
ex.io.instruction_address := inst_fetch.io.instruction_address
ex.io.reg1_data := regs.io.read_data1
ex.io.reg2_data := regs.io.read_data2
ex.io.immediate := id.io.ex_immediate
ex.io.aluop1_source := id.io.ex_aluop1_source
ex.io.aluop2_source := id.io.ex_aluop2_source
...
```
### Single-cycle CPU Test
Execute the command: `sbt test` for testing the whole single-cycle CPU.
The figure below shows that we pass the test successfully.

## Control and Status Register (CSR)
The **Control and Status Register (CSR)** is a key feature of the RISC-V architecture. It provides a mechanism for managing system-level configuration, monitoring, and exception handling. CSRs are special-purpose registers used for tasks such as storing control bits, enabling interrupts, tracking performance counters, or handling trap and exception states. Unlike general-purpose registers, CSRs are accessed via special CSR instructions, which allow reading, writing, and modifying these registers atomically. A CSR is addressed with a 12-bit field, so up to 4096 ($2^{12}$) CSRs can be addressed.
### Purpose of CSR
The purpose of CSRs in a RISC-V processor is to:
1. **Enable System Control**: Manage configuration settings, such as enabling/disabling interrupts or switching privilege levels (e.g., user, supervisor, or machine mode).
2. **Provide Status Information**: Store and retrieve system state information, such as exception codes or trap causes.
3. **Facilitate Performance Monitoring**: Support features like performance counters, which track the number of executed instructions, clock cycles, or cache hits/misses.
4. **Support Exception and Interrupt Handling**: CSRs store and manage critical data related to traps (e.g., program counter at the time of the exception, trap vector addresses).
5. **Allow Fine-Grained Privilege Control**: Enable software to control hardware features securely, such as restricting or granting access to specific features based on privilege levels.
### Key CSR Registers
- `mstatus` (**Machine Status Register**):
Records the status of machine mode, such as whether interrupts are enabled.
- `mtvec` (**Machine Trap Vector Register**):
Holds the base address for the trap vector table, determining where the processor jumps when handling exceptions or interrupts.
- `mepc` (**Machine Exception Program Counter**):
Stores the return address where the CPU resumes execution after handling an interrupt or exception. Given its critical role in program flow control, careful consideration must be given to the content stored in `mepc` during both interrupt and exception handling.
- `mcause` (**Machine Cause Register**):
Records the reason for an exception or interrupt, including whether it was caused by software, hardware, or timer-related events.
- **Performance Counters** (`mcycle` and `minstret`):
Measure the number of clock cycles (`mcycle`) and instructions retired (`minstret`), aiding in profiling and debugging.
### CSR Operations

CSR instructions provide flexible control over these registers:
- `CSRRW`:
Reads the old value of the CSR, **zero-extends** the value to `XLEN` bits, then writes it to integer register `rd`. The initial value in `rs1` is written to the CSR. If `rd = x0`, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read.
- `CSRRS`:
Reads the value of the CSR, **zero-extends** the value to `XLEN` bits, and writes it to integer register `rd`. The initial value in integer register `rs1` is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in `rs1` will cause the corresponding bit to be **set** in the CSR, if that CSR bit is writable.
- `CSRRC`:
Reads the value of the CSR, **zero-extends** the value to `XLEN` bits, and writes it to integer register `rd`. The initial value in integer register `rs1` is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in `rs1` will cause the corresponding bit to be **cleared** in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected.
- `CSRRWI`:
Similar to `CSRRW`, except it updates the CSR using an `XLEN`-bit value obtained by zero-extending a 5-bit unsigned immediate (`uimm[4:0]`) field encoded in the `rs1` field instead of a value from an integer register.
- `CSRRSI`:
Similar to `CSRRS`, except it updates the CSR using an `XLEN`-bit value obtained by zero-extending a 5-bit unsigned immediate (`uimm[4:0]`) field encoded in the `rs1` field instead of a value from an integer register.
- `CSRRCI`:
Similar to `CSRRC`, except it updates the CSR using an `XLEN`-bit value obtained by zero-extending a 5-bit unsigned immediate (`uimm[4:0]`) field encoded in the `rs1` field instead of a value from an integer register.
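The six operations described above can be summarised as a small reference model. The sketch below is plain Scala (no Chisel) and assumes a 32-bit CSR in which every bit is writable; `CsrOpModel` and `execute` are hypothetical names, while the `funct3` values follow the Zicsr encoding.

```scala
// Plain-Scala reference model of the Zicsr read-modify-write semantics.
// Assumes XLEN = 32 and a fully writable CSR; names are illustrative only.
object CsrOpModel {
  private val Mask32 = 0xFFFFFFFFL

  // funct3 encodings of the Zicsr instructions
  val Csrrw  = 1
  val Csrrs  = 2
  val Csrrc  = 3
  val Csrrwi = 5
  val Csrrsi = 6
  val Csrrci = 7

  // `src` is the value of rs1 for the register forms, or the zero-extended
  // 5-bit uimm for the immediate forms.
  // Returns (value written to rd, new CSR value).
  def execute(funct3: Int, csrOld: Long, src: Long): (Long, Long) = {
    val old = csrOld & Mask32
    val s   = src & Mask32
    val next = funct3 match {
      case Csrrw | Csrrwi => s                    // plain write
      case Csrrs | Csrrsi => old | s              // set the masked bits
      case Csrrc | Csrrci => old & (~s & Mask32)  // clear the masked bits
      case _              => old                  // not a CSR operation
    }
    (old, next)
  }
}
```

For example, `CsrOpModel.execute(CsrOpModel.Csrrc, 0xFFL, 0x0FL)` returns the old value `0xFF` for `rd` and the new CSR value `0xF0`. The model does not cover the `rd == x0` and `rs1 == x0` side-effect rules.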
## Core Local Interrupt (CLINT)
The **Core Local Interrupt (CLINT)** handles interrupts, such as timer and software-generated interrupts. It plays a critical role in managing timer-based events and enabling communication between cores in multi-core systems.
### Purpose of CLINT
1. **Interrupt Management**:
CLINT coordinates local interrupts specific to each core and processes requests from software or timers.
2. **Timer Facilities**:
CLINT includes programmable timers to support time-sensitive tasks like scheduling and context switching.
3. **Software-Generated Interrupts**:
Enables inter-core communication and task signaling by allowing software to trigger interrupts.
### Key Components of CLINT
1. **Timer Registers**:
   - `mtime`: A global timer that holds the current machine time.
   - `mtimecmp`: Stores a compare value. A timer interrupt is generated when `mtime` is greater than or equal to `mtimecmp`.
2. **Interrupt Registers**:
   - `msip` (Machine Software Interrupt Pending): Indicates pending software interrupts for the core.
3. **Memory-Mapped Registers**: CLINT registers are exposed via memory-mapped I/O, allowing software and peripherals to configure them.
### CLINT Operations
1. **Timer Interrupt Generation**:
When the `mtime` value reaches or exceeds `mtimecmp`, the CLINT signals the core by raising an interrupt.
2. **Interrupt Handling Process**:
Upon an interrupt:
   - The processor saves the current context.
   - It jumps to the Interrupt Service Routine (ISR), as specified by the `mtvec` register.
   - The ISR performs the necessary action, such as servicing the interrupt or scheduling tasks.
3. **Software Interrupts**:
Software can trigger interrupts by writing to `msip`, enabling communication and synchronization between cores.
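The timer comparison can be sketched in plain Scala. This sketch assumes the condition from the RISC-V privileged specification, where `mtime` and `mtimecmp` are 64-bit unsigned values and the timer interrupt is pending whenever `mtime >= mtimecmp`. The CLINT module used in this lab may be organised differently, and `ClintTimerModel` is a hypothetical name.

```scala
// Plain-Scala sketch of the machine timer comparison (not the lab's CLINT).
// Assumption: mtime and mtimecmp are 64-bit unsigned counters.
object ClintTimerModel {
  // Scala's Long is signed, so the comparison must be done as unsigned.
  def timerInterruptPending(mtime: Long, mtimecmp: Long): Boolean =
    java.lang.Long.compareUnsigned(mtime, mtimecmp) >= 0
}
```

The unsigned comparison matters once the counter's top bit is set: the bit pattern `-1L` represents the largest 64-bit value and is therefore treated as later than any small `mtimecmp`.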
## Revise Single-cycle CPU to support CSR instructions

The figure above shows the target CPU architecture. [Reference](https://yatcpu.sysu.tech/labs/lab2-interrupt/)
### Instruction Fetch
To handle interrupts, two additional signals are introduced: `interrupt_assert` and `interrupt_handler_address`.
- `interrupt_assert`:
This signal acts as a trigger, indicating whether an interrupt needs to be handled. When `interrupt_assert` is set to `1`, the CPU must handle an interrupt.
- `interrupt_handler_address`:
When `interrupt_assert` is set to `1`, the `pc` jumps to `interrupt_handler_address`, redirecting execution to the interrupt handler routine.
Interrupt handling is given the highest priority. If both a jump and an interrupt occur simultaneously, the interrupt takes precedence, as it is checked before the jump condition.
The implementation is shown below.
```scala
...
when(io.interrupt_assert) {
  pc := io.interrupt_handler_address
}.elsewhen(io.jump_flag_id) {
  pc := io.jump_address_id
}.otherwise {
  pc := pc + 4.U
}
...
```
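The priority order can also be written as a plain-Scala model, which is handy for reasoning about test expectations. This is only a sketch: `NextPcModel` and `nextPc` are hypothetical names, and the model leaves out the instruction-valid gating of the real module.

```scala
// Plain-Scala sketch of the next-PC priority: interrupt > jump > PC + 4.
// Names are illustrative; instruction-valid handling is intentionally omitted.
object NextPcModel {
  def nextPc(
      pc: Long,
      interruptAssert: Boolean,
      interruptHandlerAddress: Long,
      jumpFlag: Boolean,
      jumpAddress: Long
  ): Long =
    if (interruptAssert) interruptHandlerAddress   // highest priority
    else if (jumpFlag) jumpAddress                 // taken branch or jump
    else (pc + 4) & 0xFFFFFFFFL                    // sequential fetch, 32-bit wrap
}
```

When both `interruptAssert` and `jumpFlag` are true, the model returns the handler address, mirroring the `when`/`elsewhen` order above.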
#### Instruction Fetch Test
The test has been modified to include the ability to verify interrupt handling functionality.
```scala
...
case 2 => // interrupt
c.io.interrupt_assert.poke(true.B)
c.io.interrupt_handler_address.poke(interruptHandlerAddress)
c.clock.step()
c.io.instruction_address.expect(interruptHandlerAddress)
pre = interruptHandlerAddress
c.io.interrupt_assert.poke(false.B) // clear interrupt after handling
...
```

### Instruction Decode

`csr[31:20] | rs1/uimm[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]`

When a CSR instruction is decoded, `csr_address` specifies the address of the target CSR register, and `csr_write_enable` indicates whether the instruction needs to write to that CSR. The execution stage uses these signals to determine whether to read, modify, or write the CSR register according to the instruction's requirements.
- `csr_address`:
  - Holds the address of the CSR being accessed by the instruction.
  - As the layout above shows, `csr_address` is bits [31:20] of `io.instruction`.
- `csr_write_enable`:
  - Determines whether a CSR write operation should be performed during the execution stage.
  - Set to `1` when `opcode === Instructions.csr` and `funct3` is one of `csrrw`, `csrrwi`, `csrrs`, `csrrsi`, `csrrc`, `csrrci`.
```scala
...
val csr_write_enable = Output(Bool())
val csr_address = Output(UInt(Parameters.CSRRegisterAddrWidth))
...
```
```scala
...
io.csr_address := io.instruction(31, 20)
io.csr_write_enable := (opcode === Instructions.csr) && (
  funct3 === InstructionsTypeCSR.csrrw || funct3 === InstructionsTypeCSR.csrrwi ||
  funct3 === InstructionsTypeCSR.csrrs || funct3 === InstructionsTypeCSR.csrrsi ||
  funct3 === InstructionsTypeCSR.csrrc || funct3 === InstructionsTypeCSR.csrrci
)
...
```
[The RISC-V Instruction Set Manual Volume I](https://drive.google.com/file/d/1uviu1nH-tScFfgrovvFCrj7Omv8tFtkp/view), pp. 46-48, states the following:
- `CSRRW` and `CSRRWI`:
If `rd == x0`, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read.
- `CSRRS` and `CSRRC`:
If `rs1 == x0`, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs.
- `CSRRSI` and `CSRRCI`:
- If the `uimm[4:0]` field is `zero`, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write.
- Will always read the CSR and cause any read side effects regardless of `rd` and `rs1` fields.
So we also need to handle the `rd == x0` and `rs1 == x0` (or `uimm == 0`) cases.
:::info
TODO: Handle the `rd == x0` and `rs1 == x0` situations.
:::
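One possible way to approach the TODO is to make the write enable depend on the `rs1`/`uimm` field as well. The plain-Scala sketch below models that rule in software, assuming the SYSTEM opcode `0x73` and the Zicsr `funct3` encoding. `CsrWriteEnableModel` and its members are hypothetical names, and this is not the decoder implemented in this note.

```scala
// Plain-Scala sketch of a write-enable rule that honours rs1 == x0 / uimm == 0.
// Illustrative only; the Chisel decoder in this note does not do this yet.
object CsrWriteEnableModel {
  final case class CsrFields(csr: Int, rs1OrUimm: Int, funct3: Int, rd: Int, opcode: Int)

  def fields(inst: Long): CsrFields = {
    val w = inst & 0xFFFFFFFFL
    CsrFields(
      csr       = ((w >> 20) & 0xFFF).toInt,   // bits [31:20]
      rs1OrUimm = ((w >> 15) & 0x1F).toInt,    // bits [19:15]
      funct3    = ((w >> 12) & 0x7).toInt,     // bits [14:12]
      rd        = ((w >> 7) & 0x1F).toInt,     // bits [11:7]
      opcode    = (w & 0x7F).toInt             // bits [6:0]
    )
  }

  def writeEnable(inst: Long): Boolean = {
    val f = fields(inst)
    f.opcode == 0x73 && (f.funct3 match {
      case 1 | 5         => true               // csrrw / csrrwi always write
      case 2 | 3 | 6 | 7 => f.rs1OrUimm != 0   // set/clear forms write only with a non-zero field
      case _             => false              // other SYSTEM instructions
    })
  }

  // csrrw / csrrwi skip the CSR read when rd is x0; the other forms always read.
  def readEnable(inst: Long): Boolean = {
    val f = fields(inst)
    f.opcode == 0x73 && (f.funct3 match {
      case 1 | 5         => f.rd != 0
      case 2 | 3 | 6 | 7 => true
      case _             => false
    })
  }
}
```

Under this rule, `csrrs x1, mtvec, x0` (`0x305020F3`) reads the CSR but does not write it, while `csrrw x0, mtvec, x0` (`0x30501073`) writes without reading.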
#### Instruction Decode Test
Create a test for the instruction `csrrw x0, mtvec, x0`.
```scala
...
c.io.instruction.poke(0x30501073L.U) // csrrw x0, mtvec, x0
c.io.wb_reg_write_source.expect(RegWriteSource.CSR)
c.io.regs_reg1_read_address.expect(0.U)
c.io.csr_address.expect(0x305.U) // CSR address mtvec
c.io.csr_write_enable.expect(true.B) // CSR write enable should be enabled
c.clock.step()
...
```

### Execute

Implement each CSR instruction as follows.
- `csrrw rd, csr, rs1`:
  1. Read the value of `csr` into `rd`.
  2. Write the value of `rs1` to `csr`.
  3. If `rd == x0`, the `csr` is not read (no read side effects), but it is still written.
- `csrrs rd, csr, rs1`:
  1. Read the value of `csr` into `rd`.
  2. Bitwise `OR` the value of `rs1` with the current value of `csr`, and write the result back to `csr`.
  3. If `rs1 == x0`, the `csr` remains unchanged.
- `csrrc rd, csr, rs1`:
  1. Read the value of `csr` into `rd`.
  2. Bitwise `AND` the complement of the value of `rs1` with the current value of `csr`, and write the result back to `csr`.
  3. If `rs1 == x0`, the `csr` remains unchanged.
- `csrrwi rd, csr, uimm`:
  1. Read the value of `csr` into `rd`.
  2. Write `uimm` to `csr`. (`uimm` is only 5 bits wide, so it is zero-extended to 32 bits first.)
  3. If `rd == x0`, the `csr` is not read (no read side effects), but it is still written.
- `csrrsi rd, csr, uimm`:
  1. Read the value of `csr` into `rd`.
  2. Bitwise `OR` the value of `uimm` with the current value of `csr`, and write the result back to `csr`. (`uimm` is only 5 bits wide, so it is zero-extended to 32 bits first.)
  3. If `uimm == 0`, the `csr` remains unchanged.
- `csrrci rd, csr, uimm`:
  1. Read the value of `csr` into `rd`.
  2. Bitwise `AND` the complement of the value of `uimm` with the current value of `csr`, and write the result back to `csr`. (`uimm` is only 5 bits wide, so it is zero-extended to 32 bits first.)
  3. If `uimm == 0`, the `csr` remains unchanged.
```scala
...
val csr_reg_read_data = Input(UInt(Parameters.DataWidth))
val csr_reg_write_data = Output(UInt(Parameters.DataWidth))
...
```
```scala
...
io.csr_reg_write_data := MuxLookup(
  funct3,
  0.U,
  IndexedSeq(
    InstructionsTypeCSR.csrrw  -> io.reg1_data,
    InstructionsTypeCSR.csrrs  -> (io.csr_reg_read_data | io.reg1_data),
    InstructionsTypeCSR.csrrc  -> (io.csr_reg_read_data & ~(io.reg1_data)),
    InstructionsTypeCSR.csrrwi -> io.immediate,
    InstructionsTypeCSR.csrrsi -> (io.csr_reg_read_data | io.immediate),
    InstructionsTypeCSR.csrrci -> (io.csr_reg_read_data & ~(io.immediate))
  )
)
...
```
#### Execute Test
Create a test for the instruction `csrrsi x1, mtvec, 0x10`.
```scala
...
c.io.csr_reg_read_data.poke(15.U)
c.io.immediate.poke(16.U)
c.io.instruction.poke(0x305860f3L.U)
c.clock.step()
c.io.csr_reg_write_data.expect(31.U)
...
```
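The instruction words poked in the tests can be derived from the field layout `csr[31:20] | rs1/uimm[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]`. The following plain-Scala sketch assembles a CSR instruction from its fields; `CsrEncoder` and `encode` are hypothetical helper names and not part of the test suite.

```scala
// Plain-Scala sketch: assemble a Zicsr instruction word from its fields.
// The helper name is an illustrative assumption.
object CsrEncoder {
  private val SystemOpcode = 0x73L             // opcode shared by all CSR instructions

  def encode(csr: Int, rs1OrUimm: Int, funct3: Int, rd: Int): Long =
    ((csr & 0xFFF).toLong << 20) |             // CSR address, bits [31:20]
      ((rs1OrUimm & 0x1F).toLong << 15) |      // rs1 or uimm, bits [19:15]
      ((funct3 & 0x7).toLong << 12) |          // operation, bits [14:12]
      ((rd & 0x1F).toLong << 7) |              // destination, bits [11:7]
      SystemOpcode
}
```

With `mtvec` at address `0x305`, `CsrEncoder.encode(0x305, 0x10, 6, 1)` gives `0x305860F3` (`csrrsi x1, mtvec, 0x10`), and `CsrEncoder.encode(0x305, 0, 1, 0)` gives `0x30501073` (`csrrw x0, mtvec, x0`). The expected write data in the test follows from `15 | 16 = 31`.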

### Write Back

The write-back multiplexer is controlled by the signal `regs_write_source`, where `0` selects `alu_result`, `1` selects `memory_read_data`, `2` selects `csr_reg_read_data`, and `3` selects `instruction_address + 4`.
```scala
...
val csr_read_data = Input(UInt(Parameters.DataWidth))
...
```
```scala
...
io.regs_write_data := MuxLookup(
  io.regs_write_source,
  io.alu_result,
  IndexedSeq(
    RegWriteSource.Memory                 -> io.memory_read_data,
    RegWriteSource.CSR                    -> io.csr_read_data,
    RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U)
  )
)
...
```
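The selection can be mirrored by a plain-Scala function for quick sanity checks. This sketch assumes the numeric encoding described above (`0` ALU, `1` memory, `2` CSR, `3` next instruction address); `WriteBackModel` and `writeData` are hypothetical names.

```scala
// Plain-Scala sketch of the write-back selection, with the ALU result as default.
// The numeric source codes follow the description in this section.
object WriteBackModel {
  def writeData(
      source: Int,
      aluResult: Long,
      memoryReadData: Long,
      csrReadData: Long,
      instructionAddress: Long
  ): Long = source match {
    case 1 => memoryReadData                          // load result
    case 2 => csrReadData                             // old CSR value for CSR instructions
    case 3 => (instructionAddress + 4) & 0xFFFFFFFFL  // link address for jal / jalr
    case _ => aluResult                               // default, as in the MuxLookup
  }
}
```

For a CSR instruction the register file therefore receives the value read from the CSR, not the value being written to it.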
### CPU
Connect the components together according to the [single-cycle CPU architecture diagram](https://hackmd.io/uzxZOTiZQbyeLgIPdZ3Lqw?view#Revise-Single-cycle-CPU-to-support-CSR-instructions).
```scala
...
val csr = Module(new CSR)
val clint = Module(new CLINT)
...
```
```scala
...
inst_fetch.io.jump_address_id := Mux(clint.io.interrupt_assert === 1.U, clint.io.interrupt_handler_address, ex.io.if_jump_address)
inst_fetch.io.jump_flag_id := (ex.io.if_jump_flag | clint.io.interrupt_assert)
inst_fetch.io.interrupt_assert := clint.io.interrupt_assert
inst_fetch.io.interrupt_handler_address := clint.io.interrupt_handler_address
...
csr.io.reg_read_address_id := id.io.csr_address
csr.io.reg_write_enable_id := id.io.csr_write_enable
csr.io.reg_write_address_id := id.io.csr_address
csr.io.reg_write_data_ex := ex.io.csr_reg_write_data
clint.io.Interrupt_Flag := io.Interrupt_Flag
clint.io.Instruction := inst_fetch.io.instruction
clint.io.IF_Instruction_Address := inst_fetch.io.instruction_address
clint.io.jump_flag := ex.io.if_jump_flag
clint.io.jump_address := ex.io.if_jump_address
...
ex.io.csr_reg_read_data := csr.io.reg_read_data
...
wb.io.csr_read_data := csr.io.reg_read_data
...
```
### Test
Execute the command: `sbt test`.

Failed tests:
- ByteAccessTest
- FibonacciTest
- QuicksortTest
## References
[The RISC-V Instruction Set Manual Volume I: Unprivileged ISA](https://drive.google.com/file/d/1uviu1nH-tScFfgrovvFCrj7Omv8tFtkp/view)
[The RISC-V Instruction Set Manual Volume II: Privileged Architecture](https://cs107e.github.io/readings/riscv-privileged-20190608-1.pdf)
[RISC-V Architecture Instruction Encoding](https://www.youtube.com/watch?v=NmxhoMCeH8I)
[YatCPU](https://yatcpu.sysu.tech/labs/)