# Assignment3: Single-cycle RISC-V CPU
contributed by < [`ChengChiTing`](https://github.com/ChengChiTing) >
###### tags: `RISC-V`, `jserv`
## Implementation in Chisel.
### Operating Systems
Ubuntu 22.04.3
### Environment Setup
Follow the instructions in [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p) and install the essential packages.
#### Install verilator and gtkwave
```
$ sudo apt install build-essential verilator gtkwave
```
verilator version : 4.038
GTKWave Analyzer version : 3.3.104
#### Install sbt (the Scala build tool)
```shell
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
$ sdk install java 11.0.21-tem
$ sdk install sbt
```
java version : 11.0.21
sbt version : 1.9.7
>Note: the Java version is crucial!
>Java 11 and 17 can run the sbt commands, but I encountered errors when I used Java 21 in my first environment setup.
## Chisel Tutorial
Before we start Lab 3, we have to learn the fundamental concepts of Chisel. Chisel is a domain-specific language (DSL) implemented using Scala’s macro features. We can get the repository with the following git command.
```
$ git clone https://github.com/ucb-bar/chisel-tutorial
```
After that, we can use the following commands to check whether sbt is installed successfully and runs correctly on our system.
```shell
$ cd chisel-tutorial
$ sbt run
```
The first run needs to download the necessary components. If sbt runs successfully, we get the following output:
```
[info] Loading project definition from /home/riscv/chisel-tutorial/project
[info] Loading settings for project chisel-tutorial from build.sbt ...
[info] Set current project to chisel-tutorial (in build file:/home/riscv/chisel-tutorial/)
[info] running hello.Hello
[info] [0.001] Elaborating design...
[info] [0.049] Done elaborating.
Computed transform order in: 106.1 ms
Total FIRRTL Compile Time: 223.9 ms
End of dependency graph
Circuit state created
[info] [0.001] SEED 1701115466014
test Hello Success: 1 tests passed in 6 cycles taking 0.008942 seconds
[info] [0.002] RAN 1 CYCLES PASSED
[success] Total time: 2 s, completed Nov 28, 2023, 4:04:27 AM
```
### Using Docker
Follow the instructions in [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p) and install Docker on Ubuntu. Then run the following command:
```
$ docker run -it --rm -p 8888:8888 sysprog21/chisel-bootcamp
```
Copy the URL from the output message, which starts with http://127.0.0.1:8888/, and paste it into your web browser to access the [Jupyter Notebook](https://jupyter.org/).
### Learning Chisel Online
Learn how to write basic Scala code and the concepts of Chisel from the [Chisel tutorial](https://hub.ovh2.mybinder.org/user/freechipsproject-chisel-bootcamp-kinh6utn/lab/tree/1_intro_to_scala.ipynb).
We should go through the following chapters and complete the exercises:
* 1_intro_to_scala
* 2.1_first_module
* 2.2_comb_logic
* 2.3_control_flow
* 2.4_sequential_logic
* 2.5_exercise
* 2.6_chiseltest
* 3.1_parameters
* 3.2_collections
* 3.2_interlude
* 3.3_higher-order_functions
* 3.4_functional_programming
* 3.5_object_oriented_programming
* 3.6_types
After completing all the exercises above, we can begin our work on the single-cycle RISC-V CPU.
## Single-cycle RISC-V CPU
Fork the GitHub repository ca2023-lab3
```shell
$ git clone https://github.com/sysprog21/ca2023-lab3
$ cd ca2023-lab3
```
We can use the following command to check whether the single-cycle RISC-V CPU is implemented successfully.
```shell
$ sbt test
```
However, the Scala code in this repository is not complete. If we run the tests directly without filling in the missing code, we get the error message shown below:
```
[info] Run completed in 9 seconds, 985 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 3, failed 6, canceled 0, ignored 0, pending 0
[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.InstructionDecoderTest
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.InstructionFetchTest
[error] riscv.singlecycle.ExecuteTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 15 s, completed Nov 28, 2023, 10:50:50 AM
```
Therefore, we have to fill in the Scala code and pass all of the core tests. The code related to the core is located in the `src/main/scala/riscv` directory. If we want to run a single test, such as only `InstructionFetchTest`, we execute the following command:
```shell
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
```
We can check the single-cycle CPU architecture diagram to help us complete the CPU core.

### Instruction Fetch
>Code can be found in src/main/scala/riscv/core/InstructionFetch.scala
:::spoiler Instruction Fetch scala code
```scala
class InstructionFetch extends Module {
  val io = IO(new Bundle {
    val jump_flag_id          = Input(Bool())
    val jump_address_id       = Input(UInt(Parameters.AddrWidth))
    val instruction_read_data = Input(UInt(Parameters.DataWidth))
    val instruction_valid     = Input(Bool())
    val instruction_address   = Output(UInt(Parameters.AddrWidth))
    val instruction           = Output(UInt(Parameters.InstructionWidth))
  })
  val pc = RegInit(ProgramCounter.EntryAddress)
  when(io.instruction_valid) {
    io.instruction := io.instruction_read_data
    // lab3(InstructionFetch) begin
    // lab3(InstructionFetch) end
  }.otherwise {
    pc             := pc
    io.instruction := 0x00000013.U
  }
  io.instruction_address := pc
}
```
:::
We can compare the code with the instruction fetch stage diagram shown below:

In the instruction fetch stage, we have four inputs (`jump_flag_id`, `jump_address_id`, `instruction_read_data`, `instruction_valid`) and two outputs (`instruction_address`, `instruction`). We first check whether the instruction is valid with `instruction_valid`, then check `jump_flag_id`. If `jump_flag_id` is `True`, the PC is directed to `jump_address_id`; otherwise, it is incremented to PC + 4.
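The decision described above can be sketched as a small software model. This is plain Scala rather than Chisel hardware, and the names `NextPcModel` and `nextPc` are made up for illustration; they do not exist in the lab repository. The stall behaviour (keeping the PC when the instruction is not valid) follows the `otherwise` branch of the skeleton shown earlier.

```scala
// Software model of the next-PC decision (illustrative, not the Chisel code).
object NextPcModel {
  private val Mask32 = 0xFFFFFFFFL

  // Returns the PC value that should be visible in the next cycle.
  def nextPc(pc: Long, instructionValid: Boolean, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (!instructionValid) pc                // stall: keep the current PC
    else if (jumpFlag) jumpAddress & Mask32  // jump or taken branch
    else (pc + 4) & Mask32                   // sequential fetch

  def main(args: Array[String]): Unit = {
    // Sequential fetch, then a jump back to 0x1000.
    val afterStep = nextPc(0x100CL, instructionValid = true, jumpFlag = false, jumpAddress = 0L)
    val afterJump = nextPc(afterStep, instructionValid = true, jumpFlag = true, jumpAddress = 0x1000L)
    println(f"0x$afterStep%X 0x$afterJump%X")
  }
}
```

In the Chisel module the same priority would be expressed by assigning to the `pc` register inside the `when(io.instruction_valid)` block, for example with a `Mux` on `io.jump_flag_id`.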
### Instruction Decode
If we run `InstructionDecoderTest`, we get the following error:
```
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode] Reference io is not fully initialized.
[info] : io.memory_write_enable <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/InstructionDecode.scala 127:14] : [module InstructionDecode] Reference io is not fully initialized.
[info] : io.memory_read_enable <= VOID
```
Therefore, we need to drive `io.memory_write_enable` and `io.memory_read_enable` with the correct signals. `io.memory_write_enable` is used to implement `store` operations, and `io.memory_read_enable` is used to implement `load` operations.

When we check the instruction types defined in `InstructionDecode.scala`, `load` instructions are defined as `InstructionsTypeL` and `store` instructions are defined as `InstructionsTypeS`.
>Code can be found in src/main/scala/riscv/core/InstructionDecode.scala
:::spoiler Instruction Decode scala code
```scala
object InstructionsTypeL {
  val lb  = "b000".U
  val lh  = "b001".U
  val lw  = "b010".U
  val lbu = "b100".U
  val lhu = "b101".U
}
object InstructionsTypeI {
  val addi  = 0.U
  val slli  = 1.U
  val slti  = 2.U
  val sltiu = 3.U
  val xori  = 4.U
  val sri   = 5.U
  val ori   = 6.U
  val andi  = 7.U
}
object InstructionsTypeS {
  val sb = "b000".U
  val sh = "b001".U
  val sw = "b010".U
}
```
:::
We can use the opcode to check whether the instruction is a `load` or a `store`, and then set each enable signal to `True` or `False` accordingly.
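As a rough illustration of this opcode check, the following plain Scala model derives both enable signals from the low seven bits of an instruction word. The opcode values are the standard RV32I LOAD (`0000011`) and STORE (`0100011`) encodings; the object and method names are hypothetical and only serve this sketch.

```scala
// Software model of the load/store enable decision (illustrative names).
object MemoryEnableModel {
  val LoadOpcode  = 0x03 // 0b0000011, RV32I LOAD
  val StoreOpcode = 0x23 // 0b0100011, RV32I STORE

  // The opcode occupies bits [6:0] of every RV32I instruction.
  def opcode(instruction: Long): Int = (instruction & 0x7F).toInt

  def memoryReadEnable(instruction: Long): Boolean  = opcode(instruction) == LoadOpcode
  def memoryWriteEnable(instruction: Long): Boolean = opcode(instruction) == StoreOpcode

  def main(args: Array[String]): Unit = {
    val sw = 0x00A02223L // sw x10, 4(x0)
    println(s"read=${memoryReadEnable(sw)} write=${memoryWriteEnable(sw)}")
  }
}
```

In Chisel, the equivalent would presumably be a comparison of the decoded `opcode` against the load and store constants, assigned to the two `io` outputs.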
### Execution
If we run ExecuteTest, we will get the following error :
```
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.op1 <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.op2 <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/Execute.scala 32:24] : [module Execute] Reference alu is not fully initialized.
[info] : alu.io.func <= VOID
```
According to the execute stage diagram shown below, the two inputs `aluop1_source` and `aluop2_source` determine where the ALU operands come from.

If we check `InstructionDecode.scala`, we can see how each `aluop_source` value maps to an ALU operand. Therefore, we can use conditionals to complete our code.
>Code can be found in src/main/scala/riscv/core/InstructionDecode.scala
:::spoiler Instruction Decode scala code
```scala
object ALUOp1Source {
  val Register           = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}
object ALUOp2Source {
  val Register  = 0.U(1.W)
  val Immediate = 1.U(1.W)
}
```
:::
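The operand selection implied by these two objects can be modelled in plain Scala as two multiplexers. The encodings (0 and 1) are taken from `ALUOp1Source` and `ALUOp2Source` above; everything else, including the names `AluOperandModel`, `selectOp1` and `selectOp2`, is an assumption made for this sketch.

```scala
// Software model of the ALU operand multiplexers (illustrative names).
object AluOperandModel {
  // Encodings mirror ALUOp1Source / ALUOp2Source.
  val Op1Register           = 0
  val Op1InstructionAddress = 1
  val Op2Register           = 0
  val Op2Immediate          = 1

  // Operand 1 is either the rs1 value or the address of the current instruction.
  def selectOp1(source: Int, reg1Data: Long, instructionAddress: Long): Long =
    if (source == Op1InstructionAddress) instructionAddress else reg1Data

  // Operand 2 is either the rs2 value or the decoded immediate.
  def selectOp2(source: Int, reg2Data: Long, immediate: Long): Long =
    if (source == Op2Immediate) immediate else reg2Data

  def main(args: Array[String]): Unit = {
    // An I-type style selection: register value plus immediate.
    val op1 = selectOp1(Op1Register, reg1Data = 0x20L, instructionAddress = 0x1000L)
    val op2 = selectOp2(Op2Immediate, reg2Data = 0x7L, immediate = 0x4L)
    println(s"op1=$op1 op2=$op2")
  }
}
```

In the Chisel `Execute` module, each selection would typically be a `Mux` whose output drives `alu.io.op1` or `alu.io.op2`.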
Another missing part is `alu.io.func`. The ALU function signal comes from the `ALU Control` module, which we can check in `ALUControl.scala`. Its output `alu_ctrl.io.alu_funct` is the signal that drives the ALU input `alu.io.func`.
>Code can be found in src/main/scala/riscv/core/ALUControl.scala
:::spoiler ALUControl scala code
```
class ALUControl extends Module {
  val io = IO(new Bundle {
    val opcode    = Input(UInt(7.W))
    val funct3    = Input(UInt(3.W))
    val funct7    = Input(UInt(7.W))
    val alu_funct = Output(ALUFunctions())
  })
  // ... decoding logic omitted ...
}
```
:::
### CPU
After filling in the missing Scala code above, we can run `sbt test` again. However, some errors still appear, and the output message is shown below:
```
[info] firrtl.passes.PassExceptions: firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.aluop2_source <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.aluop1_source <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.reg2_data <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.immediate <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.instruction_address <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.reg1_data <= VOID
[info] firrtl.passes.CheckInitialization$RefNotInitializedException: @[src/main/scala/riscv/core/CPU.scala 17:26] : [module CPU] Reference ex is not fully initialized.
[info] : ex.io.instruction <= VOID
```
The error message tells us that the input signals of the Execute stage are not driven. Comparing with the single-cycle CPU architecture diagram above, we can figure out that the `aluop_source`, `immediate` and `instruction` signals come from the InstructionDecode stage, the `reg_data` signals come from the RegisterFile, and the `instruction_address` signal comes from the InstructionFetch stage. We can then check the corresponding Scala code and connect the right input signals.
### sbt test
Once all the missing parts of the Scala code are filled in, we can run `sbt test` again. If all of the Scala code is correct, we get the following output.
```
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] InstructionDecoderTest:
[info] ExecuteTest:
[info] InstructionDecoder of Single Cycle CPU
[info] Execution of Single Cycle CPU
[info] - should produce correct control signal
[info] - should execute correctly
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] Run completed in 13 seconds, 26 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 17 s, completed Nov 29, 2023, 12:23:55 AM
```
## GTKWave Analysis
We can use waveform simulation to check the signals generated by our CPU. Run the following command.
```shell
$ WRITE_VCD=1 sbt test
```
Afterward, we can find the generated `.vcd` files in the `test_run_dir` folder. We can open them with GTKWave and analyze the relationships between different signals.
### InstructionFetch test

At time 33ps, `io_instruction_address` is 100C. Since `io_instruction_valid` is `True` and `io_jump_flag_id` is `False` (low signal), `pc` is incremented by 4, so both `pc` and `io_instruction_address` are 1010 at time 35ps.
At time 37ps, `pc` and `io_instruction_address` would normally advance by 4 to 1014. However, `io_jump_flag_id` is `True` (high signal) and `io_jump_address_id` is 1000, so `pc` and `io_instruction_address` become 1000 instead.
### InstructionDecode test

At time 2ps, the input instruction is `00A02223`. Translating this machine code into RISC-V assembly gives `sw x10, 4(x0)`, and we can check the output signals against it. The offset `4` corresponds to `io_ex_immediate`, and `io_memory_write_enable` is `True` because we are executing a store instruction. A store has no destination register `rd`: the instruction bits that normally hold `rd` carry the low bits of the offset, and the target memory address is the base register `x0` plus the offset `4`. The data to store comes from `x10`, which corresponds to `io_regs_reg2_read_address`.
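To double-check the decoding by hand, the S-type field layout from the RISC-V specification can be applied to the instruction word in a few lines of plain Scala. This is only a checking aid with hypothetical names (`STypeDecodeModel`, `SType`, `decode`); it is not part of the lab code.

```scala
// Software model of S-type field extraction (illustrative names).
object STypeDecodeModel {
  final case class SType(opcode: Int, funct3: Int, rs1: Int, rs2: Int, imm: Int)

  def decode(instruction: Long): SType = {
    val word    = instruction.toInt    // keep the low 32 bits
    val opcode  = word & 0x7F          // bits [6:0]
    val immLow  = (word >>> 7) & 0x1F  // bits [11:7] hold imm[4:0] in S-type
    val funct3  = (word >>> 12) & 0x7  // bits [14:12]
    val rs1     = (word >>> 15) & 0x1F // bits [19:15], base register
    val rs2     = (word >>> 20) & 0x1F // bits [24:20], data register
    val immHigh = word >> 25           // arithmetic shift sign-extends imm[11:5]
    SType(opcode, funct3, rs1, rs2, (immHigh << 5) | immLow)
  }

  def main(args: Array[String]): Unit =
    println(decode(0x00A02223L)) // sw x10, 4(x0)
}
```

For `00A02223` the model reports opcode `0x23`, `funct3 = 2` (`sw`), `rs1 = 0`, `rs2 = 10` and an immediate of `4`, which matches the signals read from the waveform.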
### Execute test

At time 2ps, `io_instruction` is `001101B3`, which corresponds to `add x3, x2, x1`. `io_func` is equal to `1`, which matches the `add` function selected in `ALUControl.scala`. Therefore, the ALU adds `io_op1` and `io_op2`, and the result should be `19CAEB99` (`155EEA9E` + `046C00FB`).
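The same kind of check works for this R-type instruction and for the expected sum. The sketch below is a plain Scala model with made-up names (`RTypeAddModel`, `RType`, `decode`, `add32`); it only reproduces the arithmetic and field extraction and makes no claim about how the lab's ALU is written.

```scala
// Software model of R-type decoding and 32-bit addition (illustrative names).
object RTypeAddModel {
  final case class RType(opcode: Int, rd: Int, funct3: Int, rs1: Int, rs2: Int, funct7: Int)

  def decode(instruction: Long): RType = {
    val word = instruction.toInt // keep the low 32 bits
    RType(
      opcode = word & 0x7F,
      rd     = (word >>> 7) & 0x1F,
      funct3 = (word >>> 12) & 0x7,
      rs1    = (word >>> 15) & 0x1F,
      rs2    = (word >>> 20) & 0x1F,
      funct7 = (word >>> 25) & 0x7F
    )
  }

  // 32-bit addition with wrap-around, as a 32-bit adder would produce.
  def add32(a: Long, b: Long): Long = (a + b) & 0xFFFFFFFFL

  def main(args: Array[String]): Unit = {
    println(decode(0x001101B3L)) // add x3, x2, x1
    println(f"${add32(0x155EEA9EL, 0x046C00FBL)}%08X")
  }
}
```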
## Run HW2 on MyCPU
We can use the GNU toolchain to help us run HW2 on MyCPU.
First, we have to set up the environment on our system. According to [Assignment2: RISC-V Toolchain](https://hackmd.io/5TIUG_u-RPmyWI7YBmSx0w), we execute the following commands from the home directory.
```shell
$ cd $HOME
$ source riscv-none-elf-gcc/setenv
```
Second, keep our RISC-V assembly code in the `csrc` directory and modify the `Makefile` accordingly. After that, to regenerate the RISC-V programs used by the unit tests, change to the `csrc` directory and run the `make update` command.
:::spoiler Makefile
```
CROSS_COMPILE ?= riscv-none-elf-
ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv
AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy
%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o
%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@
BINS = \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin
# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=
all: $(BINS)
update: $(BINS)
	cp -f $(BINS) ../src/main/resources
clean:
	$(RM) *.o *.elf *.asmbin
```
:::
```shell
$ cd csrc
$ make update
```
If the `make` command runs successfully, we get new `.asmbin` files in the `csrc` folder. In addition, we can use `sbt test` to test our program. `sbt test` runs the test cases that validate the CPU implementation, which rely on ChiselTest, so we have to modify `CPUTest.scala` first.
>File: src/test/scala/riscv/singlecycle/CPUTest.scala

The code I added to the file is shown below. We can run the `sbt test` command and confirm whether our `.asmbin` file passes the test.
:::spoiler CPUTest.scala
```scala
class ipcheckTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "report success" in {
    test(new TestTopModule("ipcheck.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 2000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(1.U)
      c.io.regs_debug_read_data.expect(0x1300.U)
      c.io.regs_debug_read_address.poke(2.U)
      c.io.regs_debug_read_data.expect(0xf80.U)
      c.io.regs_debug_read_address.poke(8.U)
      c.io.regs_debug_read_data.expect(0xfb0.U)
      c.io.regs_debug_read_address.poke(9.U)
      c.io.regs_debug_read_data.expect(0xfc4.U)
      c.io.regs_debug_read_address.poke(12.U)
      c.io.regs_debug_read_data.expect(0x10.U)
      c.io.regs_debug_read_address.poke(15.U)
      c.io.regs_debug_read_data.expect(0xfcc.U)
      c.io.regs_debug_read_address.poke(16.U)
      c.io.regs_debug_read_data.expect(0x8.U)
    }
    println("success")
  }
}
```
:::
Third, we need to execute the following command in the project’s root directory to generate Verilog files :
```shell
$ make verilator
```
To load the ipcheck.asmbin file, simulate for 1000 cycles, and save the simulation waveform to the dump.vcd file, we can run :
```shell
$ ./run-verilator.sh -instruction src/main/resources/ipcheck.asmbin -time 2000 -vcd dump.vcd
```
Then, open `dump.vcd` with `GTKWave` to check its waveform.
### InstructionFetch

At time 2ps, `io_instruction_valid` is `True` and `io_jump_flag_id` is `False`, so the PC points to instruction address `0x00001000`. The instruction at that address is `0x00001137`, which is sent to the InstructionDecode stage.
### InstructionDecode

The instruction is decoded in this stage. If we check the RISC-V instruction set manual, we find that the corresponding assembly is `lui x2, 1`. We can confirm the signals in GTKWave. Since `lui` is a U-type instruction that writes a register, `io_reg_write_enable` is `True`. `rd` and `io_reg_write_address` are 02 because the target register is `x2`. `lui` places the U-immediate value in the top 20 bits of the destination register `rd` and fills the lowest 12 bits with zeros. Therefore, `0x00001000` will be placed in register `x2`.
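The `lui` decoding can be reproduced with a short plain Scala model that extracts the destination register and the upper immediate from the instruction word. The names `LuiModel`, `isLui`, `rd` and `upperImmediate` are hypothetical; the bit positions and the `0110111` opcode follow the U-type format in the RISC-V specification.

```scala
// Software model of U-type (lui) decoding (illustrative names).
object LuiModel {
  val LuiOpcode = 0x37 // 0b0110111

  def isLui(instruction: Long): Boolean = (instruction & 0x7F) == LuiOpcode

  // Destination register index from bits [11:7].
  def rd(instruction: Long): Int = ((instruction >>> 7) & 0x1F).toInt

  // Bits [31:12] stay in place and the low 12 bits are cleared.
  def upperImmediate(instruction: Long): Long = instruction & 0xFFFFF000L

  def main(args: Array[String]): Unit = {
    val inst = 0x00001137L // lui x2, 1
    println(f"lui=${isLui(inst)} rd=x${rd(inst)} value=0x${upperImmediate(inst)}%08X")
  }
}
```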
### Execute

`io_aluop1_source` is 0 (register) and `io_aluop2_source` is 1 (immediate). The ALU performs the `add` function because `func` is 1, so the result is `0x00001000`.
### Register

`io_write_address` is 2 and `io_write_data` is `0x00001000`. The value of register `x2` changes to `0x00001000` at time 5ps, when the next cycle begins.
### Memory

The `lui` instruction does not read or write memory, so most of the signals are `0` in this stage.
### Writeback

The write-back stage sends `0x00001000` back to the register file, and `x2` is updated when the next cycle begins.
# Reference
* [The RISC-V Instruction Set Manual](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf)
* [chisel-tutorial](https://github.com/ucb-bar/chisel-tutorial)
* [chisel ca2023-lab3](https://github.com/sysprog21/ca2023-lab3)
* [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p#Single-cycle-RISC-V-CPU)