
Assignment3: Single-cycle RISC-V CPU

contributed by < ChengChiTing >

tags: RISC-V, jserv

Implementation in Chisel.

Operating System

Ubuntu 22.04.3

Environment Setup

Follow the instructions in Lab3: Construct a single-cycle RISC-V CPU with Chisel to install the essential packages.

Install verilator and gtkwave

$ sudo apt install build-essential verilator gtkwave

verilator version : 4.038
GTKWave Analyzer version : 3.3.104

Install sbt (the Scala build tool)

$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
$ sdk install java 11.0.21-tem 
$ sdk install sbt

java version : 11.0.21
sbt version : 1.9.7

Note: the Java version is crucial.
sbt runs under Java 11 and 17, but I encountered errors with Java 21 during my first environment setup.

Chisel Tutorial

Before starting Lab 3, we need to learn the fundamental concepts of Chisel. Chisel is a domain-specific language (DSL) implemented using Scala's macro features. We can get the tutorial repository with git:

$ git clone https://github.com/ucb-bar/chisel-tutorial

After that, we can use the following commands to check whether sbt was installed successfully and runs correctly on our system.

$ cd chisel-tutorial
$ sbt run

The first run downloads the necessary components. If sbt runs successfully, we get the following output:

[info] Loading project definition from /home/riscv/chisel-tutorial/project
[info] Loading settings for project chisel-tutorial from build.sbt ...
[info] Set current project to chisel-tutorial (in build file:/home/riscv/chisel-tutorial/)
[info] running hello.Hello
[info] [0.001] Elaborating design...
[info] [0.049] Done elaborating.
Computed transform order in: 106.1 ms
Total FIRRTL Compile Time: 223.9 ms
End of dependency graph
Circuit state created
[info] [0.001] SEED 1701115466014
test Hello Success: 1 tests passed in 6 cycles taking 0.008942 seconds
[info] [0.002] RAN 1 CYCLES PASSED
[success] Total time: 2 s, completed Nov 28, 2023, 4:04:27 AM

Using Docker

Follow the instructions in Lab3: Construct a single-cycle RISC-V CPU with Chisel to install Docker on Ubuntu, then run the following command:

$ docker run -it --rm -p 8888:8888 sysprog21/chisel-bootcamp

Copy the URL from the output message, which starts with http://127.0.0.1:8888/, and paste it into your web browser to access the Jupyter Notebook.

Learning Chisel Online

Learn how to write basic Scala code and the concepts of Chisel from the Chisel tutorial.
We should go through the following chapters and complete the exercises:

  • 1_intro_to_scala
  • 2.1_first_module
  • 2.2_comb_logic
  • 2.3_control_flow
  • 2.4_sequential_logic
  • 2.5_exercise
  • 2.6_chiseltest
  • 3.1_parameters
  • 3.2_collections
  • 3.2_interlude
  • 3.3_higher-order_functions
  • 3.4_functional_programming
  • 3.5_object_oriented_programming
  • 3.6_types

After completing all the exercises above, we can begin working on the single-cycle RISC-V CPU.

Single-cycle RISC-V CPU

Fork the GitHub repository ca2023-lab3

$ git clone https://github.com/sysprog21/ca2023-lab3
$ cd ca2023-lab3

We can use the following command to check whether the single-cycle RISC-V CPU is implemented successfully.

$ sbt test

However, the Scala code in this repository is not complete. If we run the tests directly without filling in the missing code, we get the error messages shown below:

[info] Run completed in 9 seconds, 985 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 3, failed 6, canceled 0, ignored 0, pending 0
[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.InstructionDecoderTest
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.InstructionFetchTest
[error] riscv.singlecycle.ExecuteTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 15 s, completed Nov 28, 2023, 10:50:50 AM

Therefore, we have to fill in the Scala code and pass all of the core tests. The code related to the core is located in the src/main/scala/riscv directory. To run a single test, such as InstructionFetchTest only, execute the following command:

$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"

We can check the single-cycle CPU architecture diagram to help us complete the CPU core.

[Figure: single-cycle CPU architecture diagram (image unavailable)]

Instruction Fetch

Code can be found in src/main/scala/riscv/core/InstructionFetch.scala

Instruction Fetch scala code
class InstructionFetch extends Module {
  val io = IO(new Bundle {
    val jump_flag_id          = Input(Bool())
    val jump_address_id       = Input(UInt(Parameters.AddrWidth))
    val instruction_read_data = Input(UInt(Parameters.DataWidth))
    val instruction_valid     = Input(Bool())

    val instruction_address = Output(UInt(Parameters.AddrWidth))
    val instruction         = Output(UInt(Parameters.InstructionWidth))
  })
  val pc = RegInit(ProgramCounter.EntryAddress)

  when(io.instruction_valid) {
    io.instruction := io.instruction_read_data
    // lab3(InstructionFetch) begin

    // lab3(InstructionFetch) end

  }.otherwise {
    pc             := pc
    io.instruction := 0x00000013.U
  }
  io.instruction_address := pc
}

We can compare the code with the instruction fetch stage diagram shown below:

[Figure: instruction fetch stage diagram (image unavailable)]

In the instruction fetch stage, we have four inputs (jump_flag_id, jump_address_id, instruction_read_data, instruction_valid) and two outputs (instruction_address, instruction). We first check whether the instruction is valid with instruction_valid, then check jump_flag_id. If jump_flag_id is True, the PC is set to jump_address_id; otherwise, it is incremented to PC + 4.
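The PC selection rule described above can be sketched as a plain-Scala software model. This is not the Chisel hardware that fills the lab3 blank; `PcModel` and `nextPc` are hypothetical names used only for illustration, and the hold-on-invalid behaviour mirrors the `otherwise` branch of the template.

```scala
// Plain-Scala model of the PC update rule (illustrative, not Chisel hardware).
object PcModel {
  // Returns the PC value for the next cycle.
  def nextPc(pc: Long, instructionValid: Boolean, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (!instructionValid) pc        // no valid instruction: hold the PC
    else if (jumpFlag) jumpAddress   // jump or taken branch: redirect the PC
    else (pc + 4) & 0xFFFFFFFFL      // sequential fetch, wrapped to 32 bits
}
```

In the actual module the same priority (valid first, then the jump flag) would typically be expressed with a `Mux` or a nested `when` inside the `when(io.instruction_valid)` block.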

Instruction Decode

If we run InstructionDecoderTest, we get the following error:

[info]   firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode]  Reference io is not fully initialized.
[info]    : io.memory_write_enable <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode]  Reference io is not fully initialized.
[info]    : io.memory_read_enable <= VOID

Therefore, we need to find the correct signals to drive io.memory_write_enable and io.memory_read_enable. io.memory_write_enable is asserted for store operations, and io.memory_read_enable is asserted for load operations.

Among the instruction types defined in InstructionDecode.scala, load instructions are defined in InstructionsTypeL and store instructions are defined in InstructionsTypeS.

Code can be found in src/main/scala/riscv/core/InstructionDecode.scala

Instruction Decode scala code
object InstructionsTypeL {
  val lb  = "b000".U
  val lh  = "b001".U
  val lw  = "b010".U
  val lbu = "b100".U
  val lhu = "b101".U
}

object InstructionsTypeI {
  val addi  = 0.U
  val slli  = 1.U
  val slti  = 2.U
  val sltiu = 3.U
  val xori  = 4.U
  val sri   = 5.U
  val ori   = 6.U
  val andi  = 7.U
}

object InstructionsTypeS {
  val sb = "b000".U
  val sh = "b001".U
  val sw = "b010".U
}

We can use the opcode to check whether the instruction is a load or a store, and then set each enable signal to True or False accordingly.
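As a rough illustration of that opcode check, the following plain-Scala model derives both enables from the low seven bits of the instruction. The opcode values are the RV32I load (0000011) and store (0100011) groups; `MemEnableModel` and its members are hypothetical names, not identifiers from the repository.

```scala
// Plain-Scala model of the memory enable signals (illustrative, not Chisel hardware).
object MemEnableModel {
  val LoadOpcode  = 0x03 // 0000011: lb, lh, lw, lbu, lhu
  val StoreOpcode = 0x23 // 0100011: sb, sh, sw

  // The opcode is the low 7 bits of every RV32I instruction.
  def opcode(instruction: Long): Int = (instruction & 0x7F).toInt

  def memoryReadEnable(instruction: Long): Boolean  = opcode(instruction) == LoadOpcode
  def memoryWriteEnable(instruction: Long): Boolean = opcode(instruction) == StoreOpcode
}
```

Any instruction that is neither a load nor a store leaves both enables False, which is what keeps the two outputs fully initialized.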

Execution

If we run ExecuteTest, we get the following error:

[info]   firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute]  Reference alu is not fully initialized.
[info]    : alu.io.op1 <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute]  Reference alu is not fully initialized.
[info]    : alu.io.op2 <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute]  Reference alu is not fully initialized.
[info]    : alu.io.func <= VOID

According to the execute stage diagram shown below, the two inputs aluop1_source and aluop2_source determine where the ALU operands (alu.io.op1 and alu.io.op2) come from.

image

InstructionDecode.scala shows how each aluop source value maps to an operand, so we can complete the code with conditionals.

Code can be found in src/main/scala/riscv/core/InstructionDecode.scala

Instruction Decode scala code
object ALUOp1Source {
  val Register           = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}

object ALUOp2Source {
  val Register  = 0.U(1.W)
  val Immediate = 1.U(1.W)
}
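The operand selection implied by these two objects can be sketched in plain Scala. This is a software model under the assumption that source value 0 selects the register and 1 selects the alternative, as in the encodings above; `OperandSelectModel` and its method names are hypothetical.

```scala
// Plain-Scala model of ALU operand selection (illustrative, not Chisel hardware).
object OperandSelectModel {
  // Encodings copied from ALUOp1Source and ALUOp2Source.
  val Op1Register           = 0
  val Op1InstructionAddress = 1
  val Op2Register           = 0
  val Op2Immediate          = 1

  // Operand 1 is either the rs1 value or the address of the current instruction.
  def selectOp1(source: Int, reg1Data: Long, instructionAddress: Long): Long =
    if (source == Op1InstructionAddress) instructionAddress else reg1Data

  // Operand 2 is either the rs2 value or the decoded immediate.
  def selectOp2(source: Int, reg2Data: Long, immediate: Long): Long =
    if (source == Op2Immediate) immediate else reg2Data
}
```

In Chisel each selection is a two-input multiplexer, so a `Mux` on the source signal is one natural way to write it.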

The other missing part is alu.io.func. The function signal comes from the ALU control unit, defined in ALUControl.scala. We use alu_ctrl.io.alu_funct as the source and drive alu.io.func with it.

Code can be found in src/main/scala/riscv/core/ALUControl.scala

ALUControl scala code
class ALUControl extends Module {
  val io = IO(new Bundle {
    val opcode = Input(UInt(7.W))
    val funct3 = Input(UInt(3.W))
    val funct7 = Input(UInt(7.W))

    val alu_funct = Output(ALUFunctions())
  })
  // ... (decoding logic omitted)
}

CPU

After filling in the missing Scala code above, we can run sbt test again. However, some errors still appear, and the output messages are shown below:

[info]   firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU]  Reference ex is not fully initialized.
[info]    : ex.io.aluop2_source <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU]  Reference ex is not fully initialized.
[info]    : ex.io.aluop1_source <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU]  Reference ex is not fully initialized.
[info]    : ex.io.reg2_data <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU]  Reference ex is not fully initialized.
[info]    : ex.io.immediate <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU]  Reference ex is not fully initialized.
[info]    : ex.io.instruction_address <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU]  Reference ex is not fully initialized.
[info]    : ex.io.reg1_data <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException:  @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU]  Reference ex is not fully initialized.
[info]    : ex.io.instruction <= VOID

According to the error messages, we have to drive the missing input signals of the Execute stage. Comparing with the single-cycle CPU architecture diagram above, we can see that the aluop source signals, immediate, and instruction come from the InstructionDecode stage, the reg_data signals come from the RegisterFile, and instruction_address comes from the InstructionFetch stage. We can then check the corresponding Scala code and connect the right input signals.

sbt test

Once all the missing parts of the Scala code are filled in, we can run sbt test again. If all of the Scala code is correct, we get the following output.

[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] InstructionDecoderTest:
[info] ExecuteTest:
[info] InstructionDecoder of Single Cycle CPU
[info] Execution of Single Cycle CPU
[info] - should produce correct control signal
[info] - should execute correctly
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] Run completed in 13 seconds, 26 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 17 s, completed Nov 29, 2023, 12:23:55 AM

GTKWave Analysis

We can use waveform simulation to check the signals generated by our CPU. Run the following command.

$ WRITE_VCD=1 sbt test

Afterward, the generated .vcd files can be found in the test_run_dir folder. We can open them with GTKWave and analyze the relationships between different signals.

InstructionFetch test

image

At time 33ps, io_instruction_address is 100C. Because io_instruction_valid is high and io_jump_flag_id is False (low signal), the pc is incremented by 4, so at time 35ps both pc and io_instruction_address are 1010.

At time 37ps, pc and io_instruction_address would normally advance by 4 to 1014. However, io_jump_flag_id is True (high signal) and io_jump_address_id is 1000, so pc and io_instruction_address become 1000 instead.

InstructionDecode test

image

At time 2ps, the input instruction is 00A02223. Translating it from machine code to RISC-V assembly gives sw x10, 4(x0). We can check the output signals against this. The offset 4 corresponds to io_ex_immediate, and io_memory_write_enable is True because we are executing a store instruction. A store has no destination register: bits 11:7 hold the low immediate bits (here 4), and the target memory address is the base register x0 plus the offset 4. The source data comes from x10, which corresponds to io_regs_reg2_read_address.
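To double-check the decoding of 00A02223 by hand, the S-type fields can be extracted with a few shifts and masks. The following is a plain-Scala sketch following the RV32I S-type layout; `StoreDecodeModel` and `SType` are hypothetical names introduced for illustration.

```scala
// Plain-Scala S-type field extraction (illustrative, not Chisel hardware).
object StoreDecodeModel {
  final case class SType(funct3: Int, rs1: Int, rs2: Int, imm: Int)

  def decode(instruction: Long): SType = {
    val w       = instruction.toInt
    val immLow  = (w >>> 7) & 0x1F   // imm[4:0], in the bits an R-type uses for rd
    val immHigh = (w >>> 25) & 0x7F  // imm[11:5]
    val raw     = (immHigh << 5) | immLow
    val imm     = (raw << 20) >> 20  // sign-extend the 12-bit immediate
    SType((w >>> 12) & 0x7, (w >>> 15) & 0x1F, (w >>> 20) & 0x1F, imm)
  }
}
```

For this instruction, `StoreDecodeModel.decode(0x00A02223L)` gives `SType(2, 0, 10, 4)`: funct3 is 2 (sw), the base register is x0, the data register is x10, and the offset is 4.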

Execute test

image

At time 2ps, io_instruction is 001101B3, which corresponds to add x3, x2, x1. io_func is equal to 1, which is defined as the add function in ALUControl.scala. Therefore, the ALU adds io_op1 and io_op2, and the result should be 19CAEB99 (155EEA9E + 046C00FB).
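Both the decoding of 001101B3 and the expected sum can be verified with a small plain-Scala check. This is only a software cross-check of the waveform values; `AddCheckModel`, `RType`, and `add32` are hypothetical names.

```scala
// Plain-Scala R-type decode and 32-bit add (illustrative, not Chisel hardware).
object AddCheckModel {
  final case class RType(funct7: Int, rs2: Int, rs1: Int, funct3: Int, rd: Int)

  def decode(instruction: Long): RType = {
    val w = instruction.toInt
    RType((w >>> 25) & 0x7F, (w >>> 20) & 0x1F, (w >>> 15) & 0x1F, (w >>> 12) & 0x7, (w >>> 7) & 0x1F)
  }

  // 32-bit wrapping addition, matching the width of the ALU result.
  def add32(a: Long, b: Long): Long = (a + b) & 0xFFFFFFFFL
}
```

`AddCheckModel.decode(0x001101B3L)` gives `RType(0, 1, 2, 0, 3)`, that is funct7 = 0 and funct3 = 0 (add) with rs2 = x1, rs1 = x2, rd = x3, and `AddCheckModel.add32(0x155EEA9EL, 0x046C00FBL)` equals `0x19CAEB99L`.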

Run HW2 on MyCPU

We can use the GNU toolchain to help us run HW2 on MyCPU.
First, we have to set up the environment on our system. According to Assignment2: RISC-V Toolchain, we execute the following commands from the home directory.

$ cd $HOME
$ source riscv-none-elf-gcc/setenv

Second, place our RISC-V assembly code in the csrc directory and modify the Makefile. After that, to regenerate the RISC-V programs used for unit tests, change to the csrc directory and run the make update command.

Makefile
CROSS_COMPILE ?= riscv-none-elf-

ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy

%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o

%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

BINS = \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin

# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=

all: $(BINS)

update: $(BINS)
	cp -f $(BINS) ../src/main/resources

clean:
	$(RM) *.o *.elf *.asmbin

$ cd csrc
$ make update

If the make command runs successfully, we get new .asmbin files in the csrc folder, and make update copies them to src/main/resources. In addition, we can use sbt test to test our file; it runs the test cases that validate the CPU implementation using ChiselTest. We have to modify CPUTest.scala first.

File: src/test/scala/riscv/singlecycle/CPUTest.scala

The code I added to the file is shown below. We can run the sbt test command to confirm whether our .asmbin file passes the test.

CPUTest.scala
class ipcheckTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "report success" in {
    test(new TestTopModule("ipcheck.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 2000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }

      c.io.regs_debug_read_address.poke(1.U)
      c.io.regs_debug_read_data.expect(0x1300.U)

      c.io.regs_debug_read_address.poke(2.U)
      c.io.regs_debug_read_data.expect(0xf80.U)

      c.io.regs_debug_read_address.poke(8.U)
      c.io.regs_debug_read_data.expect(0xfb0.U)

      c.io.regs_debug_read_address.poke(9.U)
      c.io.regs_debug_read_data.expect(0xfc4.U)

      c.io.regs_debug_read_address.poke(12.U)
      c.io.regs_debug_read_data.expect(0x10.U)

      c.io.regs_debug_read_address.poke(15.U)
      c.io.regs_debug_read_data.expect(0xfcc.U)

      c.io.regs_debug_read_address.poke(16.U)
      c.io.regs_debug_read_data.expect(0x8.U)
    }
    println("success")
  }
}

Third, we need to execute the following command in the project’s root directory to generate Verilog files :

$ make verilator

To load the ipcheck.asmbin file, simulate for 1000 cycles, and save the simulation waveform to the dump.vcd file, we can run :

$ ./run-verilator.sh -instruction src/main/resources/ipcheck.asmbin -time 2000 -vcd dump.vcd

Then, open dump.vcd with GTKWave to check its waveform.

InstructionFetch

IF1
At time 2ps, io_instruction_valid is True and io_jump_flag_id is False, so the pc points to instruction address 0x00001000. The fetched instruction is 0x00001137, which is sent to the InstructionDecode stage.

InstructionDecode

ID1
The instruction is decoded in this stage. According to the RISC-V instruction set manual, it corresponds to lui x2, 1. We can confirm the signals in GTKWave. Since lui is a U-type instruction that writes a register, io_reg_write_enable is True. rd and io_reg_write_address are 02 because the target register is x2. lui places the U-immediate value in the top 20 bits of the destination register rd and fills the lowest 12 bits with zeros. Therefore, 0x00001000 will be placed in register x2.
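The value written by lui can be cross-checked in plain Scala by keeping the upper 20 bits of the instruction word and clearing the lower 12. This is a minimal software sketch; `LuiModel` and its members are hypothetical names.

```scala
// Plain-Scala check of the lui destination and result (illustrative, not Chisel hardware).
object LuiModel {
  // Destination register index, bits 11:7 of the instruction.
  def rd(instruction: Long): Int = ((instruction >>> 7) & 0x1F).toInt

  // U-immediate: instruction bits 31:12 kept in place, low 12 bits zeroed.
  def luiResult(instruction: Long): Long = instruction & 0xFFFFF000L
}
```

For 0x00001137, `LuiModel.rd` returns 2 and `LuiModel.luiResult` returns 0x00001000, matching the waveform.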

Execute

EXE1
io_aluop1_source is 0 and io_aluop2_source is 1. The ALU performs an add because func is 1, so the result is 0x00001000.

Register

REG1

io_write_address is 2 and io_write_data is 0x00001000. The value of register x2 changes to 0x00001000 at time 5ps, when the next cycle begins.

Memory

MEM1
The lui instruction does not read or write memory, so most of the signals are 0 in this stage.

Writeback

WRB1
The write-back stage sends 0x00001000 back to the register file, and x2 is updated when the next cycle begins.

Reference