# Assignment3: Single-cycle RISC-V CPU
contributed by < [`yutingshih`](https://github.com/yutingshih) >
## Environment Setup
- OS: macOS 14.1
- Architecture: AArch64
### Install Verilator and GTKWave
```shell
brew install verilator gtkwave
```
### Install JDK and SBT
```shell
## Install sdkman
curl -s "https://get.sdkman.io" | bash
source "$HOME/.sdkman/bin/sdkman-init.sh"
## Install Eclipse Temurin JDK 11
sdk install java 11.0.21-tem
sdk install sbt
```
## Chisel HDL
### Hello World in Chisel
This `Hello` module blinks the LED with a period of 50000000 clock cycles and a duty cycle of 50%. That is, the LED is turned on for 25000000 cycles, and then turned off for 25000000 cycles.
```scala=
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  // Toggle the LED after counting half a period (0 .. CNT_MAX).
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
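The blinking behaviour can be sanity-checked without any hardware by a small software model of the counter and toggle register. This is only a sketch in plain Scala: `BlinkModel` and `simulate` are hypothetical names that do not exist in the lab code, and the model assumes an even period.

```scala
object BlinkModel {
  /** Returns the LED level (0 or 1) observed in each of the first `cycles` clock cycles. */
  def simulate(period: Int, cycles: Int): List[Int] = {
    require(period >= 2 && period % 2 == 0, "sketch assumes an even period of at least 2")
    val cntMax = period / 2 - 1
    // State is (counter, led); one step of iterate corresponds to one clock edge.
    val states = Iterator.iterate((0, 0)) { case (cnt, led) =>
      if (cnt == cntMax) (0, 1 - led) else (cnt + 1, led)
    }
    states.map(_._2).take(cycles).toList
  }
}
```

With a shortened period of 10 cycles, `BlinkModel.simulate(10, 20)` yields five 0s, five 1s, five 0s, and five 1s, which is the same 50% pattern the module produces at full scale.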
Since `CNT_MAX` is 24999999, which is far below 4294967295 ($=2^{32} - 1$), we don't need 32 bits for `cntReg`. Holding every value from 0 to `CNT_MAX` takes only $\lceil \log_2(24999999 + 1) \rceil = 25$ bits. Thus, the enhanced version of the `Hello` module is as follows, which saves 7 register bits.
```scala=
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  // 25 bits are enough to count from 0 to 24999999.
  val cntReg = RegInit(0.U(25.W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
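For a more flexible design, we can further enhance the `Hello` module by parameterizing the blinking period with Chisel's hardware generator syntax. Note that we use the `log2Up` function to calculate the number of bits the counter `cntReg` needs. Its argument is the number of distinct counter values (`CNT_MAX + 1`), kept as a Scala `Int`, because `log2Up` operates on elaboration-time integers rather than on hardware `UInt` values.
```scala=
import chisel3._
import chisel3.util.log2Up

class Hello(val period: Int = 50000000) extends Module {
  require(period >= 2, "period must be at least 2 cycles")
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  // Keep the limit as a Scala Int so it can size the register at elaboration time.
  val cntMax = period / 2 - 1
  val CNT_MAX = cntMax.U
  // The counter holds 0 .. cntMax, i.e. cntMax + 1 distinct values.
  val cntReg = RegInit(0.U(log2Up(cntMax + 1).W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}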
```
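The counter-width reasoning above can be checked in plain Scala. The sketch below is not part of the lab code; `CounterWidth` and `bitsFor` are hypothetical names, and the helper only relies on `BigInt.bitLength` from the Scala standard library.

```scala
object CounterWidth {
  /** Number of bits needed to hold every value in 0 .. maxValue (at least 1). */
  def bitsFor(maxValue: Int): Int = {
    require(maxValue >= 0, "maxValue must be non-negative")
    // bitLength is the position of the highest set bit plus one; 0 still needs one bit.
    math.max(1, BigInt(maxValue).bitLength)
  }
}
```

`CounterWidth.bitsFor(24999999)` gives 25, matching the hand calculation. The power-of-two case shows why the width must be derived from the number of values rather than from the maximum itself: a maximum of 4 needs 3 bits, although $\lceil \log_2 4 \rceil$ is only 2.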
## Single Cycle CPU
[yutingshih/ca2023-lab3](https://github.com/yutingshih/ca2023-lab3)
Get the repository:
```shell
$ git clone https://github.com/sysprog21/ca2023-lab3
$ cd ca2023-lab3
```
To simulate and run tests for this project, execute the following commands under the `ca2023-lab3` directory.
```shell
$ sbt test
```
Alternatively, run `make` or `make test`.
Since the blanks in the given implementation have not been filled in yet, the tests fail as shown below:
```
[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.InstructionDecoderTest
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.InstructionFetchTest
[error] riscv.singlecycle.ExecuteTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
```
### Waveform Analysis
[GTKWave](https://gtkwave.sourceforge.net/) has trouble launching on macOS 14 (see [issue #250](https://github.com/gtkwave/gtkwave/issues/250)). Fortunately, [WaveTrace](https://www.wavetrace.io/), a waveform viewer extension for VSCode, can be used as an alternative to GTKWave.
### Instruction Fetch Stage
The `InstructionFetchTest` randomly drives the control signal `jump_flag_id` (branch taken or not taken) and checks that the value of `instruction_address` (program counter, PC) output by the `InstructionFetch` module is as expected.
- For the case of **branch not taken**, the value of next PC should be the current value of PC + 4.
- For the case of **branch taken**, the value of next PC would be `jump_address_id`.
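The two cases above amount to a two-input multiplexer in front of the PC register. A minimal software sketch of that selection follows; `NextPc` is a hypothetical helper for illustration, not the Chisel module itself, and it models the 32-bit PC with a `Long` masked to 32 bits.

```scala
object NextPc {
  /** Next program counter: the jump target when a branch is taken, otherwise pc + 4. */
  def apply(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long = {
    val next = if (jumpFlag) jumpAddress else pc + 4
    next & 0xFFFFFFFFL // wrap around like a 32-bit register
  }
}
```

For example, `NextPc(0x1000L, false, 0x2000L)` evaluates to `0x1004`, while `NextPc(0x1004L, true, 0x1000L)` evaluates to `0x1000`.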
#### Testbench
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.InstructionFetchTest"
```
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 4 seconds, 625 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
#### Case 1: Branch Not Taken
**21 ps**: `io_jump_flag_id` is low, so the next value of `pc` is set to `pc + 4` (`0x1004`).

#### Case 2: Branch Taken
**23 ps**: `io_jump_flag_id` is high, so the next value of `pc` is set to `io_jump_address_id` (`0x1000`).

### Instruction Decode Stage
The `InstructionDecoderTest` sequentially feeds sw (`sw x10, 4(x0)`), lui (`lui x5, 2`), and add (`add x3, x1, x2`) instructions to test whether the `InstructionDecoder` module correctly decodes the instructions into the corresponding fields.
The blank lines left in `InstructionDecoder.scala` should be filled in to generate the `memory_read_enable` and `memory_write_enable` control signals for data memory (DM) reads and writes.
#### Testbench
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
```
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 5 seconds, 118 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
**3 ps**: when the `io_instruction` is `0x00A02223` (`sw a0, 4(x0)`), the `io_memory_write_enable` should be `0b1`, `io_ex_immediate` should be `0x4`, `io_reg1_read_address` should be `0x0`, and `io_reg2_read_address` should be `0xA` (`a0`).
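The expected values above can be reproduced by slicing the instruction word by hand. The following is a rough software reference in plain Scala, not the Chisel decoder: `DecodeSketch`, `Decoded`, and the field names are hypothetical, and the memory enables are assumed to depend only on the RV32I LOAD (`0000011`) and STORE (`0100011`) opcodes.

```scala
object DecodeSketch {
  final case class Decoded(opcode: Int, funct3: Int, rs1: Int, rs2: Int,
                           immS: Int, memRead: Boolean, memWrite: Boolean)

  private val OpLoad  = 0x03 // RV32I LOAD opcode
  private val OpStore = 0x23 // RV32I STORE opcode

  def decode(inst: Int): Decoded = {
    val opcode = inst & 0x7F
    val funct3 = (inst >>> 12) & 0x7
    val rs1    = (inst >>> 15) & 0x1F
    val rs2    = (inst >>> 20) & 0x1F
    // S-type immediate: inst[31:25] holds imm[11:5], inst[11:7] holds imm[4:0].
    // The arithmetic shift sign-extends from bit 31; only meaningful for stores.
    val immS   = ((inst >> 25) << 5) | ((inst >>> 7) & 0x1F)
    Decoded(opcode, funct3, rs1, rs2, immS, opcode == OpLoad, opcode == OpStore)
  }
}
```

Calling `DecodeSketch.decode(0x00A02223)` gives `rs1 = 0`, `rs2 = 10`, `immS = 4`, and `memWrite = true`, in line with the waveform for `sw a0, 4(x0)`.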

### Execute Stage
The `Execute` module is responsible for the following computations:
- arithmetic and logic computation
- address of data memory access
- control signal and address of branch jump
Therefore, the `ExecuteTest` executes 100 additions (`add x3, x2, x1`) with randomly generated operands assigned to `x1` and `x2`, and then tests the branch (`beq x1, x2, 2`) in both the taken and not-taken cases.
#### Testbench
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.ExecuteTest"
```
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 4 seconds, 769 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
#### Case 1: Arithmetic Operations
`io_instruction` is `0x001101B3` (`add x3, x2, x1`)
**3 ps**: `io_reg1_data` is `0x0517DC88` and `io_reg2_data` is `0x0848030E`, so the `io_mem_alu_result` should be `0x0D5FDF96` (`= 0x0517DC88 + 0x0848030E`)
**5 ps**: `io_reg1_data` is `0x0CC22046` and `io_reg2_data` is `0x049DBDC0`, so the `io_mem_alu_result` should be `0x115FDE06` (`= 0x0CC22046 + 0x049DBDC0`)

#### Case 2: Jump Branch Operations
`io_instruction` is `0x00208163` (`beq x1, x2, 2`)
**205 ps**: `io_reg1_data` is not equal to `io_reg2_data`, so `io_if_jump_flag` should be low.
**207 ps**: `io_reg1_data` is equal to `io_reg2_data`, so `io_if_jump_flag` should be high, and the next `pc` should be set to the value of `io_if_jump_address`.
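Both cases can be cross-checked against a tiny software model. This is a sketch under assumptions: `ExecuteSketch` is a hypothetical helper rather than the `Execute` module, register values are modelled as `Long` masked to 32 bits, and the branch target is taken to be `pc + immediate` as defined by RV32I.

```scala
object ExecuteSketch {
  /** 32-bit wrap-around addition, as computed for `add`. */
  def add(a: Long, b: Long): Long = (a + b) & 0xFFFFFFFFL

  /** For `beq`: returns (jump flag, jump address), with the address being pc + imm. */
  def beq(pc: Long, imm: Int, rs1: Long, rs2: Long): (Boolean, Long) =
    (rs1 == rs2, (pc + imm) & 0xFFFFFFFFL)
}
```

`ExecuteSketch.add(0x0517DC88L, 0x0848030EL)` returns `0x0D5FDF96` and `ExecuteSketch.add(0x0CC22046L, 0x049DBDC0L)` returns `0x115FDE06`, the same sums read from the waveform. For the branch, the flag is true only when both operands are equal.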

### CPU Integration - Fibonacci
The `FibonacciTest` recursively calculates the 10th term of the Fibonacci sequence by loading the `src/main/resources/fibonacci.asmbin` executable file.
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.FibonacciTest"
```
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] Run completed in 6 seconds, 186 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### CPU Integration - Quick Sort
The `QuicksortTest` performs a quicksort on 10 integers by loading the `src/main/resources/quicksort.asmbin` executable file.
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.QuicksortTest"
```
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] Run completed in 6 seconds, 840 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### CPU Integration - Byte Access
The `ByteAccessTest` stores and loads a single byte by loading the `src/main/resources/sb.asmbin` executable file.
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.ByteAccessTest"
```
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] Run completed in 5 seconds, 959 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### Combine All Together
```shell
$ sbt test
```
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/)
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] InstructionDecoderTest:
[info] ExecuteTest:
[info] FibonacciTest:
[info] Execution of Single Cycle CPU
[info] InstructionDecoder of Single Cycle CPU
[info] - should execute correctly
[info] Single Cycle CPU
[info] - should produce correct control signal
[info] - should recursively calculate Fibonacci(10)
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 25 seconds, 9 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
## Software Porting
Port the program in [Homework 2](https://hackmd.io/@yutingshih/arch2023-homework2) to MyCPU.
### Modification
Instead of showing the result on the screen, I modified `csrc/ln_bf16.c` to write the result (the error code returned by the unit test function) to the address `0x4`, in the same way as `csrc/fibonacci.c`.
```c
int main() {
    float average_error = 0, maximal_error = 0;
    int error_code = test_ln_bf16(40, &average_error, &maximal_error);
    /* Store the error code at address 0x4 so the testbench can read it back. */
    *((volatile int *) (4)) = error_code;
    return error_code;
}
```
I reuse the Makefile used in Homework 2 but add the additional targets `update` and `%.asmbin` for Homework 3.
```makefile
# Usage:
# make [all] compile all the targets
# make TARGET compile a specific target
# make TARGET [TARGET [...]] compile specific targets
# make test run tests for all the targets
# make test_TARGET run test for a specific target
# make test_TARGET [test_TARGET [...]] run tests for specific targets
# make clean delete all the executables
#
# Example:
# make (compile all the targets)
# make clean test (delete all the executables, compile and run all the tests)
# make all test_mul_bf16 (compile all the targets but only run test for mul_bf16)
NAME ?= ln_bf16
ELF = $(NAME).elf
BIN = $(NAME).asmbin
OBJ = $(addsuffix .o, $(BIN))
CROSS ?= riscv-none-elf-
CC := $(CROSS)gcc
OBJCOPY := $(CROSS)objcopy
OPT := 0
CFLAGS := -Wall -O$(OPT)
CPPFLAGS =
LDLIBS = -lm
ifdef CROSS
CFLAGS += -march=rv32i_zicsr_zifencei -mabi=ilp32
RUNTIME ?= rv32emu
endif
all: $(BIN)

%.elf: %.c
	-$(CC) $(CPPFLAGS) $(CFLAGS) -o $@ $< $(LDLIBS)

%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

test: $(addprefix test_, $(NAME))

test_%: %.elf
	-@$(RUNTIME) $<

bench: CPPFLAGS += -I../perfcounter
bench: LDLIBS += perfcounter/perfcount.a
bench: test

update: $(BIN)
	cp -f $(BIN) ../src/main/resources

clean:
	-@$(RM) -v $(BIN)
```
### Testbench
Extend the testbench in `src/test/scala/riscv/singlecycle/CPUTest.scala` with `LnBf16Test`, which runs the BFloat16 natural logarithm approximation from `src/main/resources/ln_bf16.asmbin` to verify the design of MyCPU. After the program has run, the test reads the address `0x4` to get the error code of the unit test function and checks that it is 0.
```scala
class LnBf16Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "calculate approximation of ln_bf16" in {
    test(new TestTopModule("ln_bf16.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
    }
  }
}
```
### Correctness
Compile the C source code and copy the resulting binary to the `src/main/resources` directory.
```shell
$ make -f ln_bf16.mk test_ln_bf16 update
```
Run the testbench.
```shell
$ WRITE_VCD=1 sbt "testOnly riscv.singlecycle.LnBf16Test"
```
```
[info] compiling 1 Scala source to /Users/yuting/projects/school/ca2023/hw3/ca2023-lab3/target/scala-2.13/test-classes ...
[info] LnBf16Test:
[info] Single Cycle CPU
[info] - should calculate approximation of ln_bf16
[info] Run completed in 10 seconds, 276 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
At 102003 ps, the testbench reads the memory address `0x4`, and the value stored there is `0x0`, as expected (error code 0 means no error).

### Compatibility
To ensure compatibility between the program `ln_bf16.asmbin` and MyCPU, I temporarily removed the `RDCYCLE`/`RDCYCLEH` instructions by commenting out the benchmark-related C code in `ln_bf16.c`.