
Single-cycle RISC-V

蕭力文

contributed by <liball>

Task Description

The goal is to modify the project https://github.com/sysprog21/ca2023-lab3 and extend it, using Chisel, to support the full RV32I instruction set along with the CSR instructions (specifically the Zicsr extension).
The implementation must be compatible with the test programs provided in https://github.com/sysprog21/rv32emu/tree/master/tests/perfcounter. Additionally, select at least three RISC-V programs from the course exercises, rewrite them, and ensure they run successfully on the enhanced processor.

Environment

Ubuntu Linux 24.04

Complete the single-cycle RISC-V CPU

Fix the permissions of the uploaded pictures.

Instruction Fetch

[Figure not available]

After validating the instruction, if a jump is required, the PC is updated to the target jump address. Otherwise, it is incremented to PC + 4.

...
    when(io.jump_flag_id) {
      pc := io.jump_address_id
    }.otherwise {
      pc := pc + 4.U
    }
...

Instruction Fetch Test

Execute the command: sbt "testOnly riscv.singlecycle.InstructionFetchTest" for testing.
The figure below shows that we pass the test successfully.

[Figure not available]

In the figure below, the initial instruction is set to 0x00000013, which represents the NOP instruction. The pc is initialized to 0x1000.

[Figure not available]

At the next positive clock edge, io_instruction_valid is set to HIGH, while io_jump_flag_id remains LOW. As a result, the pc increments to 0x1004 (pc + 4).

[Figure not available]

Next, we set both io_instruction_valid and io_jump_flag_id to HIGH. As shown, the pc returns to 0x1000, indicating that the jump was successfully executed.

[Figure not available]

Instruction Decode

The following code snippet demonstrates how the control signals for memory read and write operations are generated based on the instruction's opcode during the decode stage:

...
  io.memory_read_enable  := (opcode === InstructionTypes.L)
  io.memory_write_enable := (opcode === InstructionTypes.S)
...
  • io.memory_read_enable: This signal is activated (true) when the instruction belongs to the load (L) type, which indicates that the operation will read data from memory.
  • io.memory_write_enable: This signal is activated (true) when the instruction belongs to the store (S) type, which means the operation will write data to memory.

By comparing the opcode with predefined constants (e.g., InstructionTypes.L and InstructionTypes.S), the decoder generates the appropriate control signals for memory operations. This lets the processor distinguish between read and write operations and handle them accordingly in the subsequent stages.

Instruction Decode Test

Execute the command: sbt "testOnly riscv.singlecycle.InstructionDecoderTest" for testing.
The figure below shows that we pass the test successfully.

[Figures not available]

The first io_instruction is 0x00A02223 (0000 0000 1010 0000 0010 0010 0010 0011 in binary). From this we can see that:

  • opcode = 0x23 = 0b0100011, so it is an S-type instruction.
  • funct3 = 0b010 indicates that it is a sw instruction.
  • imm = 0x04 = 4
  • rs1 = 0x00 = x0; rs2 = 0x0A = x10
  • io_memory_read_enable and io_reg_write are set to LOW; io_memory_write_enable is set to HIGH.
  • io_ex_aluop1_source = 0 (Register) and io_ex_aluop2_source = 1 (Immediate), so the ALU adds rs1 and the immediate to form the store address. The sources are selected as follows:
    io.ex_aluop1_source := Mux(
      opcode === Instructions.auipc || opcode === InstructionTypes.B || opcode === Instructions.jal,
      ALUOp1Source.InstructionAddress,
      ALUOp1Source.Register
    )

    io.ex_aluop2_source := Mux(
      opcode === InstructionTypes.RM,
      ALUOp2Source.Register,
      ALUOp2Source.Immediate
    )

This instruction is sw x10, 4(x0).
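The field extraction above can be cross-checked in software. The following is a minimal sketch in plain Scala (no Chisel); the object name STypeDecode and the case class SType are hypothetical helpers introduced only for illustration and are not part of the lab code.

```scala
// Software model only: split a 32-bit S-type instruction word into its fields.
object STypeDecode {
  final case class SType(opcode: Int, funct3: Int, rs1: Int, rs2: Int, imm: Int)

  def decode(inst: Int): SType = {
    val opcode = inst & 0x7f           // bits [6:0]
    val immLo  = (inst >>> 7) & 0x1f   // bits [11:7]  hold imm[4:0]
    val funct3 = (inst >>> 12) & 0x7   // bits [14:12]
    val rs1    = (inst >>> 15) & 0x1f  // bits [19:15]
    val rs2    = (inst >>> 20) & 0x1f  // bits [24:20]
    val immHi  = inst >> 25            // bits [31:25] hold imm[11:5]; arithmetic shift keeps the sign
    SType(opcode, funct3, rs1, rs2, (immHi << 5) | immLo)
  }
}
```

Calling STypeDecode.decode(0x00A02223) yields the same fields as listed above: opcode 0x23, funct3 2, rs1 0, rs2 10, and imm 4.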

Execute

The following code snippet demonstrates how the ALU inputs are set in the Execute stage:

...
  alu.io.func   := alu_ctrl.io.alu_funct
  alu.io.op1    := Mux(io.aluop1_source === 1.U, io.instruction_address, io.reg1_data)
  alu.io.op2    := Mux(io.aluop2_source === 1.U, io.immediate, io.reg2_data)
...
  • alu.io.func: This sets the ALU's function code, which determines what operation the ALU will perform. The function code is provided by the ALU control unit, which is based on the decoded instruction’s opcode and function fields.
  • alu.io.op1: This selects the first operand for the ALU. The value is determined by the aluop1_source signal:
    • If aluop1_source is set to 1, it uses the instruction address (io.instruction_address) as the first operand.
    • Otherwise, it uses the value from register 1 (io.reg1_data).
  • alu.io.op2: This selects the second operand for the ALU. Similarly to op1, the value is determined by the aluop2_source signal:
    • If aluop2_source is set to 1, it uses the immediate value (io.immediate) as the second operand.
    • Otherwise, it uses the value from register 2 (io.reg2_data).

Execute Test

Execute the command: sbt "testOnly riscv.singlecycle.ExecuteTest" for testing.
The figure below shows that we pass the test successfully.

[Figure not available]

This test is checking two main functionalities:

ADD Instruction Testing:

[Figure not available]

io_instruction is 0x001101B3, which encodes add x3, x2, x1 (x3 = x2 + x1).
The ADD test runs 100 iterations. In each iteration, it:

  1. Creates two random numbers as operands.
  2. Inputs these random numbers into the execute stage as register values (reg1_data and reg2_data).
  3. Advances the clock by one cycle
  4. Checks if the ALU output matches the expected sum of the two random numbers
  5. Confirms that no jump signal was generated (if_jump_flag remains 0)
BEQ Instruction Testing:

[Figure not available]

io_instruction is 0x00208163 which represents beq x1, x2, 2.

Sets up the test conditions:

  • Sets instruction_address to 2
  • Sets immediate to 2
  • Configures ALU operand sources (aluop1_source and aluop2_source set to 1)

Tests two scenarios:

  1. Equal case:

    • Sets reg1_data and reg2_data to 9
    • Expects if_jump_flag to be 1 (branch taken)
    • Expects if_jump_address to be 4 (instruction_address + immediate = 2 + 2)
  2. Not equal case:

    • Sets reg1_data and reg2_data to 9 and 19 respectively.
    • Expects if_jump_flag to be 0 (branch not taken)
    • Still expects if_jump_address to be 4, since the target address is computed regardless of the branch outcome
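
The two BEQ scenarios can be mirrored by a small software model. This is a sketch in plain Scala under the assumption that the jump address is simply instruction_address + immediate; BranchModel, bImm, and beq are hypothetical names used only here.

```scala
// Software model only: B-type immediate reassembly and the beq decision.
object BranchModel {
  // imm[12|10:5] sits in bits [31:25], imm[4:1|11] in bits [11:7]; imm[0] is always 0.
  def bImm(inst: Int): Int = {
    val imm12   = inst >> 31            // sign bit replicated by the arithmetic shift (0 or -1)
    val imm11   = (inst >>> 7) & 0x1
    val imm10_5 = (inst >>> 25) & 0x3f
    val imm4_1  = (inst >>> 8) & 0xf
    (imm12 << 12) | (imm11 << 11) | (imm10_5 << 5) | (imm4_1 << 1)
  }

  // Returns (jump flag, jump address). The address does not depend on the comparison.
  def beq(instructionAddress: Int, immediate: Int, reg1: Int, reg2: Int): (Boolean, Int) =
    (reg1 == reg2, instructionAddress + immediate)
}
```

With the values from the test, BranchModel.bImm(0x00208163) gives 2, the equal case gives (true, 4), and the not-equal case gives (false, 4).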

CPU

Connect the inputs of the Execute module to the outputs of the other modules, following the CPU architecture diagram below.

[Figure not available]

...
  ex.io.instruction         := inst_fetch.io.instruction
  ex.io.instruction_address := inst_fetch.io.instruction_address
  ex.io.reg1_data           := regs.io.read_data1
  ex.io.reg2_data           := regs.io.read_data2
  ex.io.immediate           := id.io.ex_immediate
  ex.io.aluop1_source       := id.io.ex_aluop1_source
  ex.io.aluop2_source       := id.io.ex_aluop2_source
...

Single-cycle CPU Test

Execute the command: sbt test for testing the whole single-cycle CPU.
The figure below shows that we pass the test successfully.

[Figure not available]

Control and Status Register (CSR)

The Control and Status Register (CSR) is a key feature of the RISC-V architecture. It provides a mechanism for managing system-level configuration, monitoring, and exception handling. CSRs are special-purpose registers used for tasks such as storing control bits, enabling interrupts, tracking performance counters, or handling trap and exception states. Unlike general-purpose registers, CSRs are accessed via special CSR instructions, which allow reading, writing, and modifying these registers atomically. A CSR is addressed with 12 bits, so there can be up to 4096 ($2^{12}$) CSRs.

Purpose of CSR

The purpose of CSRs in a RISC-V processor is to:

  1. Enable System Control: Manage configuration settings, such as enabling/disabling interrupts or switching privilege levels (e.g., user, supervisor, or machine mode).
  2. Provide Status Information: Store and retrieve system state information, such as exception codes or trap causes.
  3. Facilitate Performance Monitoring: Support features like performance counters, which track the number of executed instructions, clock cycles, or cache hits/misses.
  4. Support Exception and Interrupt Handling: CSRs store and manage critical data related to traps (e.g., program counter at the time of the exception, trap vector addresses).
  5. Allow Fine-Grained Privilege Control: Enable software to control hardware features securely, such as restricting or granting access to specific features based on privilege levels.

Key CSR Registers

  • mstatus (Machine Status Register):
    Records the status of machine mode, such as whether interrupts are enabled.
  • mtvec (Machine Trap Vector Register):
    Holds the base address for the trap vector table, determining where the processor jumps when handling exceptions or interrupts.
  • mepc (Machine Exception Program Counter):
    Stores the return address where the CPU resumes execution after handling an interrupt or exception. Given its critical role in program flow control, careful consideration must be given to the content stored in mepc during both interrupt and exception handling.
  • mcause (Machine Cause Register):
    Records the reason for an exception or interrupt, including whether it was caused by software, hardware, or timer-related events.
  • Performance Counters (mcycle and minstret):
    Measure the number of clock cycles (mcycle) and instructions retired (minstret), aiding in profiling and debugging.

CSR Operations

image
CSR instructions provide flexible control over these registers:

  • CSRRW:
    Reads the old value of the CSR, zero-extends the value to XLEN bits, then writes it to integer register rd. The initial value in rs1 is written to the CSR. If rd = x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read.

  • CSRRS:
    Reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be set in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be set in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected.

  • CSRRC:
    Reads the value of the CSR, zero-extends the value to XLEN bits, and writes it to integer register rd. The initial value in integer register rs1 is treated as a bit mask that specifies bit positions to be cleared in the CSR. Any bit that is high in rs1 will cause the corresponding bit to be cleared in the CSR, if that CSR bit is writable. Other bits in the CSR are unaffected.

  • CSRRWI:
    Similar to CSRRW, except that it updates the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register.

  • CSRRSI:
    Similar to CSRRS, except that it updates the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register.

  • CSRRCI:
    Similar to CSRRC, except that it updates the CSR using an XLEN-bit value obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0]) field encoded in the rs1 field instead of a value from an integer register.

Core Local Interruptor (CLINT)

The Core Local Interruptor (CLINT) handles local interrupts, such as timer and software-generated interrupts. It plays a critical role in managing timer-based events and enabling communication between cores in multi-core systems.

Purpose of CLINT

  1. Interrupt Management:
    CLINT coordinates local interrupts specific to each core and processes requests from software or timers.

  2. Timer Facilities:
    CLINT includes programmable timers to support time-sensitive tasks like scheduling and context switching.

  3. Software-Generated Interrupts:
    Enables inter-core communication and task signaling by allowing software to trigger interrupts.

Key Components of CLINT

  1. Timer Registers:

    • mtime: A global timer that holds the current machine time.
    • mtimecmp: Stores a compare value. When mtime is greater than or equal to mtimecmp, a timer interrupt is generated.
  2. Interrupt Registers:

    • msip (Machine Software Interrupt Pending): Indicates pending software interrupts for the core.
  3. Memory-Mapped Registers: CLINT registers are exposed via memory-mapped I/O, allowing software and peripherals to configure them.

CLINT Operations

  1. Timer Interrupt Generation:
    When the mtime value reaches or exceeds mtimecmp, the CLINT signals the core by raising an interrupt.

  2. Interrupt Handling Process:
    Upon an interrupt:

    • The processor saves the current context.
    • It jumps to the Interrupt Service Routine (ISR), as specified by the mtvec register.
    • The ISR performs the necessary action, such as servicing the interrupt or scheduling tasks.
  3. Software Interrupts:
    Software can trigger interrupts by writing to msip, enabling communication and synchronization between cores.
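
The pending conditions described above can be sketched as a small software model. This is plain Scala, not the CLINT module of this project; it assumes the comparison defined in the privileged specification (a timer interrupt is pending when mtime is greater than or equal to mtimecmp, both treated as unsigned 64-bit values) and assumes that bit 0 of msip is the software-interrupt bit. ClintModel and its methods are hypothetical names.

```scala
// Software model only: when would the CLINT report a pending interrupt?
object ClintModel {
  // mtime and mtimecmp are 64-bit counters, so compare them as unsigned values.
  def timerPending(mtime: Long, mtimecmp: Long): Boolean =
    java.lang.Long.compareUnsigned(mtime, mtimecmp) >= 0

  // Assumption: only bit 0 of msip is meaningful.
  def softwarePending(msip: Int): Boolean = (msip & 0x1) != 0
}
```

For example, ClintModel.timerPending(9L, 10L) is false, while ClintModel.timerPending(10L, 10L) is true.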

Revise Single-cycle CPU to support CSR instructions

image
The figure above shows the target CPU architecture. Reference

Instruction Fetch

To handle interrupts, two additional signals are introduced: interrupt_assert and interrupt_handler_address.

  • interrupt_assert:
    This signal acts as a trigger, indicating whether an interrupt needs to be handled. When interrupt_assert is set to 1, the CPU must handle an interrupt.
  • interrupt_handler_address:
    When interrupt_assert is set to 1, the pc jumps to interrupt_handler_address, redirecting execution to the interrupt handler routine.

Interrupt handling is given the highest priority. If both a jump and an interrupt occur simultaneously, the interrupt takes precedence, as it is checked before the jump condition.
The implementation is shown below.

...
when(io.interrupt_assert){
  pc := io.interrupt_handler_address
}.elsewhen(io.jump_flag_id) {
  pc := io.jump_address_id
}.otherwise {
  pc := pc + 4.U
}
...
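
The priority order in the when/elsewhen chain can be checked with a plain-Scala model of the next-PC selection. This is only a sketch: NextPc is a hypothetical helper, and the 32-bit wrap-around of pc + 4 is an assumption about the register width.

```scala
// Software model only: next-PC selection with interrupt > jump > sequential priority.
object NextPc {
  def apply(pc: Long, jumpFlag: Boolean, jumpAddress: Long,
            interruptAssert: Boolean, handlerAddress: Long): Long =
    if (interruptAssert) handlerAddress   // checked first, so it wins over a jump
    else if (jumpFlag) jumpAddress
    else (pc + 4) & 0xffffffffL           // sequential fetch, wrapped to 32 bits
}
```

When both flags are set, for instance NextPc(0x1004L, true, 0x1000L, true, 0x2000L), the result is the handler address 0x2000.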

Instruction Fetch Test

The test has been modified to include the ability to verify interrupt handling functionality.

...
          case 2 => // interrupt
            c.io.interrupt_assert.poke(true.B)
            c.io.interrupt_handler_address.poke(interruptHandlerAddress)
            c.clock.step()
            c.io.instruction_address.expect(interruptHandlerAddress)
            pre = interruptHandlerAddress
            c.io.interrupt_assert.poke(false.B) // clear interrupt after handling
...

image

Instruction Decode

image
csr[31:20] | rs1/uimm[19:15] | funct3[14:12] | rd[11:7] | opcode[6:0]

image
When a CSR instruction is decoded, csr_address specifies the address of the target CSR register, and csr_write_enable indicates whether the instruction needs to write to that CSR. The execution stage uses these signals to determine whether to read, modify, or write the CSR register according to the instruction's requirements.

  • csr_address:
    • Holds the address of the CSR being accessed by the instruction.
    • As shown in the figure above, csr_address is bits [31:20] of io.instruction.
  • csr_write_enable:
    • Determines whether a CSR write operation should be performed during the execution stage.
    • It is set to 1 when opcode === Instructions.csr and funct3 is one of csrrw, csrrwi, csrrs, csrrsi, csrrc, or csrrci.
...
    val csr_write_enable    = Output(Bool())
    val csr_address         = Output(UInt(Parameters.CSRRegisterAddrWidth))
...
...
  io.csr_address := io.instruction(31, 20)
  io.csr_write_enable := (opcode === Instructions.csr) && (
    funct3 === InstructionsTypeCSR.csrrw || funct3 === InstructionsTypeCSR.csrrwi ||
    funct3 === InstructionsTypeCSR.csrrs || funct3 === InstructionsTypeCSR.csrrsi ||
    funct3 === InstructionsTypeCSR.csrrc || funct3 === InstructionsTypeCSR.csrrci
  )
...

According to The RISC-V Instruction Set Manual Volume I (p.46~p.48):

  • CSRRW and CSRRWI:
    If rd == x0, then the instruction shall not read the CSR and shall not cause any of the side effects that might occur on a CSR read.
  • CSRRS and CSRRC:
    If rs1 == x0, then the instruction will not write to the CSR at all, and so shall not cause any of the side effects that might otherwise occur on a CSR write, such as raising illegal instruction exceptions on accesses to read-only CSRs.
  • CSRRSI and CSRRCI:
    • If the uimm[4:0] field is zero, then these instructions will not write to the CSR, and shall not cause any of the side effects that might otherwise occur on a CSR write.
    • These instructions always read the CSR and cause any read side effects, regardless of the rd and rs1 fields.

So the rd == x0 and rs1 == x0 (or uimm == 0) cases need special handling.

TODO: Handle the rd == x0 and rs1 == x0 situations.
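
One possible way to address this TODO is sketched below as a plain-Scala software model rather than Chisel. It derives separate read and write enables from the rules quoted above. The names CsrAccess, Access, and decode are hypothetical, and excluding funct3 values 0 and 4 under the SYSTEM opcode is an assumption about which encodings count as CSR instructions.

```scala
// Software model only: which CSR accesses does a given instruction word perform?
object CsrAccess {
  val OpcodeSystem = 0x73
  // funct3 encodings of the Zicsr write-type instructions
  val CSRRW  = 1
  val CSRRWI = 5

  final case class Access(address: Int, read: Boolean, write: Boolean)

  def decode(inst: Int): Access = {
    val opcode = inst & 0x7f
    val rd     = (inst >>> 7) & 0x1f
    val funct3 = (inst >>> 12) & 0x7
    val rs1    = (inst >>> 15) & 0x1f  // register index, or uimm for the immediate forms
    val csr    = (inst >>> 20) & 0xfff
    val isCsr  = opcode == OpcodeSystem && funct3 != 0 && funct3 != 4
    val isRw   = funct3 == CSRRW || funct3 == CSRRWI
    Access(
      address = csr,
      read    = isCsr && (!isRw || rd != 0),  // csrrw/csrrwi skip the read when rd is x0
      write   = isCsr && (isRw || rs1 != 0)   // set/clear forms skip the write when rs1/uimm is 0
    )
  }
}
```

For csrrw x0, mtvec, x0 (0x30501073) this gives a write without a read, while csrrs x1, mtvec, x0 (0x305020f3) gives a read without a write.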

Instruction Decode Test

Create a test of the instruction csrrw x0, mtvec, x0.

...
      c.io.instruction.poke(0x30501073L.U) // csrrw x0, mtvec, x0
      c.io.wb_reg_write_source.expect(RegWriteSource.CSR)   
      c.io.regs_reg1_read_address.expect(0.U)              
      c.io.csr_address.expect(0x305.U)                     // CSR address mtvec
      c.io.csr_write_enable.expect(true.B)                 // CSR write enable should be enabled
      c.clock.step()
...

image

Execute

image
Each CSR instruction is implemented as follows.

  • csrrw rd, csr, rs1:
    1. Read the old value of the csr into rd.
    2. Write the value in rs1 to the csr.
    3. If rd == x0, the csr is not read, but it is still written.
  • csrrs rd, csr, rs1:
    1. Read the value of the csr into rd.
    2. Bitwise OR the value in rs1 with the current value of the csr, and write the result back to the csr.
    3. If rs1 == x0, the csr remains unchanged.
  • csrrc rd, csr, rs1:
    1. Read the value of the csr into rd.
    2. Bitwise AND the complement of the value in rs1 with the current value of the csr, and write the result back to the csr.
    3. If rs1 == x0, the csr remains unchanged.
  • csrrwi rd, csr, uimm:
    1. Read the old value of the csr into rd.
    2. Write uimm to the csr. (uimm is only 5 bits wide, so it is zero-extended to 32 bits.)
    3. If rd == x0, the csr is not read, but it is still written.
  • csrrsi rd, csr, uimm:
    1. Read the value of the csr into rd.
    2. Bitwise OR uimm with the current value of the csr, and write the result back to the csr. (uimm is only 5 bits wide, so it is zero-extended to 32 bits.)
    3. If uimm == 0, the csr remains unchanged.
  • csrrci rd, csr, uimm:
    1. Read the value of the csr into rd.
    2. Bitwise AND the complement of uimm with the current value of the csr, and write the result back to the csr. (uimm is only 5 bits wide, so it is zero-extended to 32 bits.)
    3. If uimm == 0, the csr remains unchanged.
...
    val csr_reg_read_data   = Input(UInt(Parameters.DataWidth))
    val csr_reg_write_data  = Output(UInt(Parameters.DataWidth))
...
...
  io.csr_reg_write_data := MuxLookup(
    funct3,
    0.U,
    IndexedSeq(
      InstructionsTypeCSR.csrrw  -> io.reg1_data,
      InstructionsTypeCSR.csrrs  -> (io.csr_reg_read_data | io.reg1_data),
      InstructionsTypeCSR.csrrc  -> (io.csr_reg_read_data & ~(io.reg1_data)),
      InstructionsTypeCSR.csrrwi -> io.immediate,
      InstructionsTypeCSR.csrrsi -> (io.csr_reg_read_data | io.immediate),
      InstructionsTypeCSR.csrrci -> (io.csr_reg_read_data & ~(io.immediate))
    )
  )
...
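
The same selection can be expressed as a plain-Scala reference model, which is handy for computing expected values in tests. This is a sketch, not the project's code: CsrWriteData is a hypothetical name, and the funct3 numbers (1, 2, 3, 5, 6, 7) are the standard Zicsr encodings.

```scala
// Software model only: value written to the CSR for each Zicsr instruction.
object CsrWriteData {
  def apply(funct3: Int, csrOld: Int, rs1Data: Int, uimm: Int): Int = {
    val imm = uimm & 0x1f            // 5-bit immediate, zero-extended
    funct3 match {
      case 1 => rs1Data              // csrrw
      case 2 => csrOld | rs1Data     // csrrs
      case 3 => csrOld & ~rs1Data    // csrrc
      case 5 => imm                  // csrrwi
      case 6 => csrOld | imm         // csrrsi
      case 7 => csrOld & ~imm        // csrrci
      case _ => 0                    // same default as the MuxLookup above
    }
  }
}
```

For example, CsrWriteData(6, 15, 0, 16) evaluates to 31, because 15 | 16 = 31.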

Execute Test

Create a test of the instruction csrrsi x1, mtvec, 0x10.

...
      c.io.csr_reg_read_data.poke(15.U)
      c.io.immediate.poke(16.U)
      c.io.instruction.poke(0x305860f3L.U)
      c.clock.step()
      c.io.csr_reg_write_data.expect(31.U)
...

image

Write Back

image
In the figure, input 0 is alu_result, 1 is memory_read_data, 2 is the CSR read data (csr_read_data), and 3 is instruction_address + 4; the control signal is regs_write_source.

...   
    val csr_read_data       = Input(UInt(Parameters.DataWidth))
...
...
  io.regs_write_data := MuxLookup(
    io.regs_write_source,
    io.alu_result,
    IndexedSeq(
      RegWriteSource.Memory                 -> io.memory_read_data,
      RegWriteSource.CSR                    -> io.csr_read_data,
      RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U)
    )
  )
...
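
A plain-Scala model of this four-way selection is sketched below. It assumes the source encoding 0 = ALU result, 1 = memory, 2 = CSR, 3 = next instruction address, which may differ from the actual RegWriteSource constants; WriteBackModel is a hypothetical name.

```scala
// Software model only: write-back source selection.
object WriteBackModel {
  def regsWriteData(source: Int, aluResult: Int, memoryReadData: Int,
                    csrReadData: Int, instructionAddress: Int): Int =
    source match {
      case 1 => memoryReadData
      case 2 => csrReadData
      case 3 => instructionAddress + 4
      case _ => aluResult            // default, like the MuxLookup default
    }
}
```

With source 2, the register file receives the CSR read data, which is what a csrr-style instruction needs.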

CPU

Connect the components together according to the single-cycle CPU architecture diagram.

...
  val csr        = Module(new CSR)
  val clint      = Module(new CLINT)
...
...
  inst_fetch.io.jump_address_id           := Mux(clint.io.interrupt_assert === 1.U, clint.io.interrupt_handler_address, ex.io.if_jump_address)
  inst_fetch.io.jump_flag_id              := (ex.io.if_jump_flag | clint.io.interrupt_assert)
  inst_fetch.io.interrupt_assert          := clint.io.interrupt_assert
  inst_fetch.io.interrupt_handler_address := clint.io.interrupt_handler_address
...
  csr.io.reg_read_address_id  := id.io.csr_address
  csr.io.reg_write_enable_id  := id.io.csr_write_enable
  csr.io.reg_write_address_id := id.io.csr_address
  csr.io.reg_write_data_ex    := ex.io.csr_reg_write_data

  clint.io.Interrupt_Flag         := io.Interrupt_Flag
  clint.io.Instruction            := inst_fetch.io.instruction
  clint.io.IF_Instruction_Address := inst_fetch.io.instruction_address
  clint.io.jump_flag              := ex.io.if_jump_flag
  clint.io.jump_address           := ex.io.if_jump_address
...
  ex.io.csr_reg_read_data   := csr.io.reg_read_data
...
  wb.io.csr_read_data       := csr.io.reg_read_data
...

Test

Execute the command: sbt test.
image
Failed tests:

  • ByteAccessTest
  • FibonacciTest
  • QuicksortTest

References

The RISC-V Instruction Set Manual Volume I: Unprivileged ISA
The RISC-V Instruction Set Manual Volume II: Privileged Architecture
RISC-V Architecture Instruction Encoding
YatCPU