# Assignment3: single-cycle RISC-V CPU
contributed by < [kkkkk1109](https://github.com/kkkkk1109) >
## Introduction
In this assignment, we are asked to learn [Chisel](https://www.chisel-lang.org/) and implement a single-cycle RISC-V CPU. By following the steps in [Lab3](https://hackmd.io/@sysprog/r1mlr3I7p#Lab3-Construct-a-single-cycle-RISC-V-CPU-with-Chisel), I completed `mycpu`, the single-cycle CPU described above, and tested my assembly code from [Assignment 2](https://hackmd.io/Jt_bFcnUQMWSnUI55kU-GQ#Assignment2-RISC-V-Toolchain) on it. This report also explains the signals at each stage of `mycpu` and includes evidence of successful tests.
## Hello World in Chisel
```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
* `led` is the output of this circuit
* `CNT_MAX` is the unsigned integer 24999999
* `cntReg` is a 32-bit register initialized to 0
* `blkReg` is a 1-bit register initialized to 0

`cntReg` increases by one every cycle. When `cntReg` equals `CNT_MAX`, `cntReg` resets to 0 and the bit in `blkReg` flips. The output `led` is the value in `blkReg`.
```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))

  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}
```
This second version implements the same **Hello World in Chisel** logic using `Mux` in place of the `when` block.
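To check the behavior of the two registers without running a hardware simulation, the update rule can be mirrored in plain Scala. This is only a software sketch, not Chisel: `BlinkState` and `BlinkModel` are hypothetical names introduced for illustration, and `cntMax` is passed in as a parameter so that a small value can be used for testing.

```scala
// Plain-Scala software model of the blinker (not Chisel hardware).
// BlinkState and BlinkModel are hypothetical names used only in this sketch.
final case class BlinkState(cnt: Long, blk: Int)

object BlinkModel {
  // One clock cycle, mirroring the two Mux expressions:
  // when cnt == cntMax, wrap the counter and flip the LED bit;
  // otherwise keep counting and hold the LED bit.
  def step(s: BlinkState, cntMax: Long): BlinkState =
    if (s.cnt == cntMax) BlinkState(0L, s.blk ^ 1)
    else BlinkState(s.cnt + 1, s.blk)

  // Run `cycles` steps starting from the reset state (both registers 0).
  def run(cycles: Int, cntMax: Long): BlinkState =
    (0 until cycles).foldLeft(BlinkState(0L, 0))((s, _) => step(s, cntMax))
}
```

In this model the LED bit flips once every `cntMax + 1` cycles, so with `CNT_MAX` = 24999999 the output would toggle every 25,000,000 cycles.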
## Single-cycle RISC-V CPU
To complete the Single-cycle RISC-V CPU, we need to add code to Scala files in `src/main/scala/riscv/core`. The following strategies outline how to complete each module and what the code should look like when tests pass.
### **Instruction Fetch**
Check `jump_flag_id`. If it is asserted, set the program counter (`pc`) to the jump address; otherwise, advance `pc` by `4.U`.
```
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
```
```
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 4 seconds, 609 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 6 s, completed Nov 29, 2023, 10:13:46 PM
```
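The fetch rule described above can be summarized as a small software model. This is a plain-Scala sketch rather than the Chisel code from the lab: `FetchModel` and `nextPc` are hypothetical names, and the 32-bit wrap-around assumes a 32-bit program counter.

```scala
// Plain-Scala model of the next-PC selection (not the lab's Chisel code).
// FetchModel and nextPc are hypothetical names for this sketch.
object FetchModel {
  // Next value of the program counter after one cycle.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long = {
    val next = if (jumpFlag) jumpAddress else pc + 4
    next & 0xFFFFFFFFL // assumption: pc is a 32-bit register, so the value wraps
  }
}
```

For example, `FetchModel.nextPc(0x1000L, jumpFlag = false, jumpAddress = 0x2000L)` yields `0x1004`, while the same call with `jumpFlag = true` yields `0x2000`.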
### **Instruction Decode**
The part we need to fill in drives the memory read and write enables. To determine them, decode the instruction: if it is a load (`L-type`), set `memory_read_enable` to true; if it is a store (`S-type`), set `memory_write_enable` to true.
```
$ sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
```
```
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 4 seconds, 820 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed Nov 29, 2023, 10:28:23 PM
```
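The decode rule for the two memory enables can be sketched in plain Scala. This is a software model, not the lab's Chisel decoder: `DecodeModel` and `memoryEnables` are hypothetical names, while the opcode values are the RV32I load (`0000011`) and store (`0100011`) major opcodes.

```scala
// Plain-Scala model of the memory-enable decode (not the lab's Chisel code).
// DecodeModel and memoryEnables are hypothetical names for this sketch.
object DecodeModel {
  val LoadOpcode  = 0x03 // 0b0000011: lb, lh, lw, lbu, lhu
  val StoreOpcode = 0x23 // 0b0100011: sb, sh, sw

  // Returns (memory_read_enable, memory_write_enable) for a 32-bit instruction word.
  def memoryEnables(instruction: Int): (Boolean, Boolean) = {
    val opcode = instruction & 0x7F // the opcode is in bits [6:0]
    (opcode == LoadOpcode, opcode == StoreOpcode)
  }
}
```

With this model, `lw x5, 0(x6)` (`0x00032283`) enables only the read, `sw x5, 0(x6)` (`0x00532023`) enables only the write, and `addi x1, x0, 3` (`0x00300093`) enables neither.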
### **Execute**
In the Execute stage, we need to define the inputs `op1` and `op2` of the ALU. At first, I wrote the code like this:
```
alu.io.op1 := io.reg1_data
alu.io.op2 := io.reg2_data
```
Even with this code, the Execute test still passed:
```
$ sbt "testOnly riscv.singlecycle.ExecuteTest"
```
```
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] Run completed in 4 seconds, 685 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 29, 2023, 10:35:42 PM
```
However, this mistake caused the CPU-level tests to fail:
```
$ sbt test
```
```
[error] Failed tests:
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 31 s, completed Nov 29, 2023, 10:37:59 PM
```
It took me hours to debug, because I had passed the Execute test and assumed the problem was in the CPU code. I eventually found that I had missed a section in [Lab3](https://hackmd.io/@sysprog/r1mlr3I7p#Single-cycle-RISC-V-CPU):
> Taking ex_aluop1_source control signal as an example, this control signal determines the input for the first operand of the ALU. It assigns a value to ex_aluop1_source based on the opcode. When the instruction type is either auipc, jal, or B, ex_aluop1_source is set to 0, controlling the ALU’s first operand input to be the instruction address. In other cases, ex_aluop1_source is set to 1, controlling the ALU’s first operand input to be a register.
I had not considered whether `op1` and `op2` should each be an instruction address, a register value, or an immediate. After taking this into account, all tests passed.
```
[info] Run completed in 27 seconds, 916 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
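The operand selection that fixed the failing tests can be illustrated with a plain-Scala model. This is a sketch and not the Chisel code in `Execute.scala`: all names below are hypothetical, and the sources are modeled symbolically so that no particular 0/1 encoding of `aluop1_source` or `aluop2_source` is assumed.

```scala
// Plain-Scala model of ALU operand selection (not the lab's Chisel code).
// All type and object names here are hypothetical, chosen for this sketch.
sealed trait Op1Source
case object FromRegister1 extends Op1Source
case object FromInstructionAddress extends Op1Source

sealed trait Op2Source
case object FromRegister2 extends Op2Source
case object FromImmediate extends Op2Source

object ExecuteModel {
  // Returns (op1, op2) as they would be presented to the ALU.
  def selectOperands(
      op1Source: Op1Source,
      op2Source: Op2Source,
      instructionAddress: Long,
      reg1Data: Long,
      reg2Data: Long,
      immediate: Long
  ): (Long, Long) = {
    val op1 = op1Source match {
      case FromInstructionAddress => instructionAddress // auipc, jal, branches
      case FromRegister1          => reg1Data
    }
    val op2 = op2Source match {
      case FromImmediate => immediate
      case FromRegister2 => reg2Data
    }
    (op1, op2)
  }
}
```

Wiring `op1` and `op2` directly to the register data corresponds to always taking the `FromRegister1` and `FromRegister2` branches, which gives wrong operands for any instruction that uses an immediate or the instruction address.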
## Waveform
I wrote a short assembly program, `Instruction.s`, in `csrc` to observe the waveform.
```
.text
_start:
addi x1, x0, 3
exit:
j exit
```
* **Instruction Fetch**
We can see that when `instruction_valid` is asserted, `instruction` takes the value of the `instruction_read_data` signal. Also, when `jump_flag_id` is asserted, `instruction_address` becomes the jump address, which here is the label `exit`.

* **Instruction Decode**
Since the instruction here is `addi`, `aluop1_source` is 0, which means the first operand comes from a register, and `aluop2_source` is 1, which means the second operand is the immediate.

When the instruction is `j`, `aluop1_source` becomes 1, which means the first operand is now the instruction address.

This stage also decodes the instruction into the opcode, the source and destination register addresses (`rs`, `rd`), and so on.
* **Execute**
In this stage, `op1` and `op2` are the value 0 from `x0` and the immediate 3, and `result` shows their sum, 3. When the instruction changes to `j`, `jump_flag` goes to 1.

* **Memory**
There is no memory write or read in `Instruction.s`, so the `memory_read_enable` and `memory_write_enable` are both 0.

* **Write Back**
In the write-back stage, the computed data or the data read from memory is written into the register file.

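To relate the decoded fields seen in the waveform to the instruction word, the I-type field extraction can be sketched in plain Scala. `FieldModel`, `IFields`, and `decodeIType` are hypothetical names for illustration; the word `0x00300093` is the encoding of `addi x1, x0, 3` worked out by hand from the RV32I I-type format, not a value read from the waveform.

```scala
// Plain-Scala model of I-type field extraction (not the lab's Chisel code).
// FieldModel, IFields and decodeIType are hypothetical names for this sketch.
object FieldModel {
  final case class IFields(opcode: Int, rd: Int, funct3: Int, rs1: Int, imm: Int)

  def decodeIType(inst: Int): IFields = IFields(
    opcode = inst & 0x7F,          // bits [6:0]
    rd     = (inst >>> 7) & 0x1F,  // bits [11:7]
    funct3 = (inst >>> 12) & 0x7,  // bits [14:12]
    rs1    = (inst >>> 15) & 0x1F, // bits [19:15]
    imm    = inst >> 20            // bits [31:20]; arithmetic shift sign-extends
  )
}
```

Decoding `0x00300093` with this model gives opcode `0x13`, `rd` = 1, `rs1` = 0, and immediate 3, which matches the operands 0 and 3 described in the Execute stage.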
## Modify handwritten RISC-V code in Assignment 2
I modified the assembly code from Assignment 2 by removing the `ecall` and `RDCYCLE`/`RDCYCLEH` instructions. The result is now stored in register `s3`.
### Makefile
Add `hw3.asmbin` and `instruction.asmbin` to `BINS` so that the assembly files `hw3.s` and `instruction.s` are built into `.asmbin` files.
```diff
CROSS_COMPILE ?= riscv-none-elf-

ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy

%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o
%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@

BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin\
+	hw3.asmbin\
+	instruction.asmbin\
# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=

all: $(BINS)

update: $(BINS)
	cp -f $(BINS) ../src/main/resources

clean:
	$(RM) *.o *.elf *.asmbin
```
```
$ make update
```
In `src/test/scala/riscv/singlecycle/CPUTest.scala`, add a test class for `hw3.asmbin`:
```scala
class hw3 extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "do bfloat16 multiplication" in {
    test(new TestTopModule("hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      // Run the program for 500 cycles
      for (i <- 1 to 500) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // s3 (x19)
      c.io.regs_debug_read_data.expect(0x43000000.U)
    }
  }
}
```
In my assembly code, the result of the bfloat16 multiplication is stored in `s3`, which is `x19`, and the result of `1.29999 * 99.09999` should be `0x43000000`.
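The expected value `0x43000000` can be cross-checked with a small plain-Scala model. This is a sketch under one assumption: the conversion to bfloat16 is modeled as truncation to the upper 16 bits of the float32 bit pattern, which may differ from how the assembly code rounds. For these particular inputs the discarded lower 16 bits are below the halfway point at every step, so round-to-nearest gives the same bits. `Bf16Model` and its methods are hypothetical names.

```scala
// Plain-Scala cross-check of the expected bfloat16 product.
// Bf16Model is a hypothetical name; truncation is an assumption of this sketch.
object Bf16Model {
  // Keep the upper 16 bits of the float32 pattern
  // (sign, 8-bit exponent, 7-bit mantissa) and clear the rest.
  def toBf16Bits(x: Float): Int =
    java.lang.Float.floatToIntBits(x) & 0xFFFF0000

  // Convert both inputs to bfloat16, multiply, and convert the product again.
  def mulBits(a: Float, b: Float): Int = {
    val a16 = java.lang.Float.intBitsToFloat(toBf16Bits(a))
    val b16 = java.lang.Float.intBitsToFloat(toBf16Bits(b))
    toBf16Bits(a16 * b16)
  }
}
```

Under this model, `1.29999` becomes `0x3FA60000` (1.296875) and `99.09999` becomes `0x42C60000` (99.0). Their product, 128.390625, truncates to `0x43000000`, which is 128.0 as a float.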
Then, run the test file.
```
$ sbt test
```
```
[info] - should do bfloat16 multiplication *** FAILED ***
[info] io_regs_debug_read_data=133 (0x85) did not equal expected=1124073472 (0x43000000) (lines in CPUTest.scala: 126, 120) (CPUTest.scala:126)
[info] *** 1 TEST FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.hw3
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
```
The test failed: the register value (`0x85`) does not match the expected value `0x43000000`. I used GTKWave to inspect the signals.
```
$ ./run-verilator.sh -instruction src/main/resources/hw3.asmbin -time 2000 -vcd dump.vcd
$ gtkwave dump.vcd
```

At cycle 500, the program is only partway through execution. The waveform shows that register `s3` holds the expected value by cycle 681.

To give the program enough time to finish, increase the number of simulated cycles from 500 to 1000.
```scala
class hw3 extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "do bfloat16 multiplication" in {
    test(new TestTopModule("hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      // Run for 1000 cycles so the program has finished before s3 is read
      for (i <- 1 to 1000) {
        c.clock.step()
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.regs_debug_read_address.poke(19.U) // s3 (x19)
      c.io.regs_debug_read_data.expect(0x43000000.U)
    }
  }
}
```
All tests passed!
```
[info] hw3:
[info] Single Cycle CPU
[info] - should do bfloat16 multiplication
[info] Run completed in 29 seconds, 278 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```