## Assignment3: Single-cycle RISC-V CPU
contributed by < [yuchen0620](https://github.com/yuchen0620) >
[Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p)
## Prerequisites
Install the dependent packages
- For Ubuntu Linux
```shell
$ sudo apt install build-essential verilator gtkwave
```
Install [sbt](https://www.scala-sbt.org/)
```shell
# Install sdkman
$ curl -s "https://get.sdkman.io" | bash
$ source "$HOME/.sdkman/bin/sdkman-init.sh"
# Install Eclipse Temurin JDK 11
$ sdk install java 11.0.21-tem
$ sdk install sbt
```
### Hello World in Chisel
```scala
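import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  // Last counter value before the LED toggles: 50,000,000 / 2 - 1 = 24,999,999
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W)) // free-running cycle counter
  val blkReg = RegInit(0.U(1.W))  // current LED state
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}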
```
The role of each `val`:
- `CNT_MAX`: a constant (not a register) holding the maximum counter value, 24999999, as an unsigned integer.
- `cntReg`: a 32-bit register that acts as a counter.
- `blkReg`: a 1-bit register that acts as a flag and determines the output value.
- `led`: a 1-bit output whose value, 0 or 1, follows `blkReg`.

The Hello World example in Chisel describes a circuit. `cntReg` increases by 1 every clock cycle. After 25000000 clock cycles, `cntReg` reaches `CNT_MAX`, so `cntReg` is reset to 0 and `blkReg` is inverted. As a result, the output `led`, which follows `blkReg`, switches between 0 and 1 every 25000000 cycles.
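
The timing described above can be checked with a small software model. This is only a sketch in C, not part of the lab code: the names `hello_state`, `hello_step`, and `cycles_until_first_toggle` are hypothetical, and the model assumes both registers reset to 0 as in the Chisel source.

```c
#include <stdint.h>

/* Software model of the two registers in the Hello module (hypothetical names). */
typedef struct {
    uint32_t cnt; /* models cntReg */
    uint8_t  blk; /* models blkReg, only bit 0 is used */
} hello_state;

/* Advance the model by one clock cycle.
 * Returns the led value that was visible during that cycle. */
uint8_t hello_step(hello_state *s, uint32_t cnt_max)
{
    uint8_t led = s->blk;        /* io.led := blkReg */
    if (s->cnt == cnt_max) {     /* when(cntReg === CNT_MAX) */
        s->cnt = 0;
        s->blk ^= 1u;            /* blkReg := ~blkReg */
    } else {
        s->cnt += 1;             /* cntReg := cntReg + 1.U */
    }
    return led;
}

/* Number of cycles during which led stays 0 after reset. */
uint32_t cycles_until_first_toggle(uint32_t cnt_max)
{
    hello_state s = {0, 0};
    uint32_t cycles = 0;
    while (hello_step(&s, cnt_max) == 0)
        cycles++;
    return cycles;
}
```

With `cnt_max = 24999999`, the model keeps `led` at 0 for 25000000 cycles, which matches a counter that wraps after `CNT_MAX + 1` cycles.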
## Single-cycle RISC-V CPU
We have to construct a single-cycle RISC-V CPU named `Mycpu` by completing `InstructionFetch.scala`, `InstructionDecode.scala`, `Execute.scala`, and `CPU.scala`.
### InstructionFetch

At the instruction fetch stage, we update the PC register to determine the address of the next instruction.
Depending on `jump_flag_id`, the next PC is either the jump address or `PC + 4`.
**jump_flag_id = 0**

When `jump_flag_id = 0`, the PC simply advances by 4.
**jump_flag_id = 1**

When `jump_flag_id = 1`, the PC jumps back from `0x0000100C` to `0x00001000`.
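
The next-PC choice can be summarized as a single selection. The following C sketch is only a software model of that rule. The function name `next_pc` and the parameter `jump_address_id` are assumptions made for illustration, and any other condition the real module may check is left out.

```c
#include <stdint.h>

/* Model of the next-PC selection: take the jump address when the
 * jump flag is set, otherwise move on to the next 4-byte instruction. */
uint32_t next_pc(uint32_t pc, int jump_flag_id, uint32_t jump_address_id)
{
    return jump_flag_id ? jump_address_id : pc + 4u;
}
```

For example, `next_pc(0x0000100C, 1, 0x00001000)` yields `0x00001000`, the jump observed in the second case.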
### InstructionDecode
At the instruction decode stage, we derive `memory_read_enable` and `memory_write_enable` from the opcode.
If the instruction type is L (load), we set `memory_read_enable` to 1; otherwise we set it to 0.
If the instruction type is S (store), we set `memory_write_enable` to 1; otherwise we set it to 0.

`Instruction = 0x00A02223` is an S-type instruction, so `io_memory_write_enable` is set to 1 and `io_memory_read_enable` is set to 0.
`Instruction = 0x000022B7` is a `lui` instruction, so both `io_memory_write_enable` and `io_memory_read_enable` are set to 0.
`Instruction = 0x002081B3` is an `add` instruction, so both `io_memory_write_enable` and `io_memory_read_enable` are set to 0.
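
To make the three examples concrete, here is a minimal C sketch that extracts the 7-bit opcode and derives the two enables. It uses the standard RV32I opcodes for loads (`0x03`) and stores (`0x23`). The function names are hypothetical and only mirror the signal names used above.

```c
#include <stdint.h>

/* Standard RV32I major opcodes for loads and stores. */
enum { OPCODE_LOAD = 0x03, OPCODE_STORE = 0x23 };

/* The opcode is the low 7 bits of the instruction word. */
uint32_t opcode_of(uint32_t instruction)
{
    return instruction & 0x7Fu;
}

/* 1 only for load instructions. */
int memory_read_enable(uint32_t instruction)
{
    return opcode_of(instruction) == OPCODE_LOAD;
}

/* 1 only for store instructions. */
int memory_write_enable(uint32_t instruction)
{
    return opcode_of(instruction) == OPCODE_STORE;
}
```

`0x00A02223` has opcode `0x23`, so only the write enable is set, while `0x000022B7` (opcode `0x37`) and `0x002081B3` (opcode `0x33`) set neither.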
### Execute
At the execute stage, we have to assign values to the input ports of the ALU.
```scala
class ALU extends Module {
  val io = IO(new Bundle {
    val func = Input(ALUFunctions())
    val op1 = Input(UInt(Parameters.DataWidth))
    val op2 = Input(UInt(Parameters.DataWidth))
    val result = Output(UInt(Parameters.DataWidth))
  })
  // ALU body omitted in this excerpt
}
```
Using `aluop1_source` and `aluop2_source`, we determine whether `op1` is the register data or the instruction address, and whether `op2` is the register data or the immediate.
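
The operand selection amounts to two independent 2-to-1 multiplexers. The C sketch below models it, assuming (as in the test and waveforms of this section) that a source value of 0 selects the register data and 1 selects the instruction address or the immediate. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* The two values that reach the ALU inputs. */
typedef struct {
    uint32_t op1;
    uint32_t op2;
} alu_operands;

/* Model of the two operand multiplexers in the execute stage.
 * Source value 0 selects register data; 1 selects the alternative input. */
alu_operands select_operands(int aluop1_source, int aluop2_source,
                             uint32_t reg1_data, uint32_t reg2_data,
                             uint32_t instruction_address, uint32_t immediate)
{
    alu_operands ops;
    ops.op1 = aluop1_source ? instruction_address : reg1_data;
    ops.op2 = aluop2_source ? immediate : reg2_data;
    return ops;
}
```

With both sources at 0, the ALU sees the two register values. With both at 1, it sees the instruction address and the immediate.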
**Testing approach**
The test first runs the `add` instruction repeatedly with random `op1` and `op2` (the loop range `0 to 100` is inclusive, so it runs 101 iterations). Next comes the `beq` test, which has two cases: one where the operands are equal and one where they are not. It checks whether `if_jump_flag` and `if_jump_address` match our expectations in both cases.
```scala
class ExecuteTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Execution of Single Cycle CPU")
  it should "execute correctly" in {
    test(new Execute).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x001101b3L.U) // x3 = x2 + x1
      var x = 0
      for (x <- 0 to 100) {
        val op1 = scala.util.Random.nextInt(429496729)
        val op2 = scala.util.Random.nextInt(429496729)
        val result = op1 + op2
        val addr = scala.util.Random.nextInt(32)
        c.io.reg1_data.poke(op1.U)
        c.io.reg2_data.poke(op2.U)
        c.clock.step()
        c.io.mem_alu_result.expect(result.U)
        c.io.if_jump_flag.expect(0.U)
      }
      // beq test
      c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
      c.io.instruction_address.poke(2.U)
      c.io.immediate.poke(2.U)
      c.io.aluop1_source.poke(1.U)
      c.io.aluop2_source.poke(1.U)
      c.clock.step()
      // equal
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(9.U)
      c.clock.step()
      c.io.if_jump_flag.expect(1.U)
      c.io.if_jump_address.expect(4.U)
      // not equal
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(19.U)
      c.clock.step()
      c.io.if_jump_flag.expect(0.U)
      c.io.if_jump_address.expect(4.U)
    }
  }
}
```
**add test**
We can observe that `io_aluop1_source = 0` and `io_aluop2_source = 0`, hence both ALU operands come from the register data. In addition, `io_if_jump_address = 0`, `io_if_jump_flag = 0`, and `io_mem_alu_result (1AAA7DB1) = io_reg1_data (1180D150) + io_reg2_data (0929AC61)`.

**beq test**
`equal`
When `io_reg1_data = io_reg2_data`, `io_if_jump_address = 4` and `io_if_jump_flag = 1`.

`not equal`
When `io_reg1_data != io_reg2_data`, `io_if_jump_address = 4` (unchanged from the previous case, since the test keeps `instruction_address = 2` and `immediate = 2`) and `io_if_jump_flag = 0`.

We can observe that `io_aluop1_source = 1` and `io_aluop2_source = 1` in both cases.
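
The two `beq` cases can be reproduced with a short software model. This C sketch assumes, consistently with the test values above, that the jump address is `instruction_address + immediate` and that the jump flag for `beq` is set only when the two register values are equal. The names `branch_result` and `eval_beq` are hypothetical.

```c
#include <stdint.h>

/* Outputs of the execute stage that matter for a branch. */
typedef struct {
    int      jump_flag;
    uint32_t jump_address;
} branch_result;

/* Model of beq: the target is always computed, the flag depends on equality. */
branch_result eval_beq(uint32_t reg1_data, uint32_t reg2_data,
                       uint32_t instruction_address, uint32_t immediate)
{
    branch_result r;
    r.jump_flag = (reg1_data == reg2_data);
    r.jump_address = instruction_address + immediate;
    return r;
}
```

`eval_beq(9, 9, 2, 2)` gives flag 1 and address 4, while `eval_beq(9, 19, 2, 2)` gives flag 0 with the same address 4.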
### Combining into a CPU
We have to connect all the components together according to the single-cycle CPU architecture diagram.

In this section, we connect the inputs of the Execute module to the outputs of the other modules.
The Execute module has 7 inputs in total, and each one has to be connected to the correct wire.
### sbt test
Run the following command to check our implementation.
```shell
$ sbt test
```
The success message I get:
```
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 8 seconds, 500 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
## Modify Handwritten RISC-V assembly code
I replace `ecall`, which was used to print the result on the console, with stores of the result to memory addresses `0x4`, `0x8`, `0xC`, and `0x10` (decimal 4, 8, 12, and 16), one for each of the four test inputs.
```
main:
    la s1, test
    lw a0, 0(s1)
    lw a1, 4(s1)
    jal ra, palindrome_detected
    addi s2, s2, 4
    sw a0, 0(s2)
    lw a0, 8(s1)
    lw a1, 12(s1)
    jal ra, palindrome_detected
    sw a0, 4(s2)
    lw a0, 16(s1)
    lw a1, 20(s1)
    jal ra, palindrome_detected
    sw a0, 8(s2)
    lw a0, 24(s1)
    lw a1, 28(s1)
    jal ra, palindrome_detected
    sw a0, 12(s2)
```
I also modify the `main` function of my C code to store the results in memory.
``` c
int main() {
    uint64_t testA = 0x0000000000000000; // testA (0) is a palindrome
    uint64_t testB = 0x0000000000000001; // testB is not a palindrome
    uint64_t testC = 0x00000C0000000003; // testC is a palindrome
    uint64_t testD = 0x0F000000000000F0; // testD is not a palindrome
    *((volatile int *) 4) = palindrome_detected(testA);
    *((volatile int *) 8) = palindrome_detected(testB);
    *((volatile int *) 12) = palindrome_detected(testC);
    *((volatile int *) 16) = palindrome_detected(testD);
    return 0;
}
```
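
Writing through fixed addresses such as 4 or 16 only works on the simulated CPU. To try the address layout on a host machine, one option is to model memory as an array of words. The sketch below is a hypothetical helper, not part of the submitted program: `RESULT_BASE`, `result_address`, and `store_result` are names introduced here for illustration.

```c
#include <stdint.h>

#define RESULT_BASE 4u /* byte address of the first result */

/* Byte address of the i-th result word (i = 0..3 gives 4, 8, 12, 16). */
uint32_t result_address(unsigned i)
{
    return RESULT_BASE + 4u * i;
}

/* Store a result into a word-indexed memory model.
 * mem_words must have room for at least 5 words when i is 0..3. */
void store_result(uint32_t *mem_words, unsigned i, uint32_t value)
{
    mem_words[result_address(i) / 4u] = value;
}
```

The four byte addresses produced for `i = 0..3` are the same ones that the unit test later reads back through `mem_debug_read_address`.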
Next, put the C file and the assembly file into the `ca2023-lab3/csrc` directory.
To regenerate the RISC-V programs for the unit tests, change to the `csrc` directory and run `make update` to generate the asmbin file.
```
$ cd ~/ca2023-lab3/csrc
$ make update
```
For `make update` to build the new program, edit the Makefile in the `csrc` directory and add the asmbin file to the `BINS` list:
```
BINS = \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin \
palindrome_opt_hw3.asmbin
```
Add `PalindromeTest` to the `CPUTest.scala` file.
```scala
class PalindromeTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "check a 64-bit uint is palindrome or not" in {
    test(new TestTopModule("palindrome_opt_hw3.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
      c.io.mem_debug_read_address.poke(16.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
    }
  }
}
```
Go back to the `ca2023-lab3` directory and run `PalindromeTest`.
```
$ cd ~/ca2023-lab3
$ sbt "testOnly riscv.singlecycle.PalindromeTest"
```
The success message we get:
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/ubuntu/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/ubuntu/ca2023-lab3/)
[info] PalindromeTest:
[info] Single Cycle CPU
[info] - should check a 64-bit uint is palindrome or not
[info] Run completed in 4 seconds, 582 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
### Verilator analysis
To load the `palindrome_opt_hw3.asmbin` file, simulate for 1000 cycles, and save the simulation waveform to the `dump.vcd` file, we can run the following commands.
```
$ make verilator
$ ./run-verilator.sh -instruction csrc/palindrome_opt_hw3.asmbin -time 2000 -vcd dump.vcd
```