# Assignment3: Single-cycle RISC-V CPU
contributed by < [`Yao1201`](https://github.com/Yao1201) >
###### tags: `RISC-V`, `jserv`
## Prerequisites
### Environment settings
There is a very detailed explanation in [Lab3: Construct a single-cycle CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p).
### Chisel Bootcamp
I used [Chisel Bootcamp](https://github.com/freechipsproject/chisel-bootcamp) to learn Chisel by completing the exercises provided.
## Hello World in Chisel
```scala!
// Hello World in Chisel
class Hello extends Module {
val io = IO(new Bundle {
val led = Output(UInt(1.W))
})
val CNT_MAX = (50000000 / 2 - 1).U;
val cntReg = RegInit(0.U(32.W))
val blkReg = RegInit(0.U(1.W))
cntReg := cntReg + 1.U
when(cntReg === CNT_MAX) {
cntReg := 0.U
blkReg := ~blkReg
}
io.led := blkReg
}
```
This code implements a blinking LED: `cntReg` increases by one every cycle, and when `cntReg` equals `CNT_MAX`, `cntReg` resets to 0 and `blkReg` toggles. The output always reflects the current state of `blkReg`.
* `io.led` is a 1-bit output driven by `blkReg`
* `CNT_MAX` is an unsigned integer equal to 24999999
* `cntReg` is a 32-bit unsigned register initialized to 0
* `blkReg` is a 1-bit unsigned register initialized to 0, which is assigned to `io.led`
The code below is my rewrite of the original code, expressing the same logic as explicit multiplexers:
```scala!
// After enhancing
class Hello extends Module {
val io = IO(new Bundle {
val led = Output(UInt(1.W))
})
val CNT_MAX = (50000000 / 2 - 1).U;
val cntReg = RegInit(0.U(32.W))
val blkReg = RegInit(0.U(1.W))
cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
io.led := blkReg
}
```
I replaced the `when` block with `Mux` expressions, so each register's next value is selected explicitly in a single assignment.
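To make the register behavior concrete, here is a minimal plain-Scala sketch (no Chisel dependency) of the next-state logic. The `BlinkModel` object, its `State` class, and the small `cntMax` used in `main` are illustrative assumptions, not part of the lab code.

```scala
// Software model of the two registers; each `if` mirrors one Mux.
object BlinkModel {
  // Register state: counter value and the 1-bit LED register.
  final case class State(cnt: Long, led: Int)

  // One clock edge: wrap the counter and toggle the LED at cntMax.
  def step(s: State, cntMax: Long): State = {
    val wrap = s.cnt == cntMax
    State(
      cnt = if (wrap) 0L else s.cnt + 1L, // Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
      led = if (wrap) s.led ^ 1 else s.led // Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
    )
  }

  // Apply `cycles` clock edges starting from the reset state (0, 0).
  def run(cycles: Int, cntMax: Long): State =
    (0 until cycles).foldLeft(State(0L, 0))((s, _) => step(s, cntMax))

  def main(args: Array[String]): Unit = {
    // A small cntMax (hypothetical) makes the toggle visible after 4 edges.
    println(run(4, cntMax = 3))
  }
}
```

In this model the LED toggles once every `cntMax + 1` cycles, so with `CNT_MAX = 24999999` it toggles every 25,000,000 cycles. That would be twice per second if the clock runs at 50 MHz, which the constant suggests but the source does not state.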
## Single-cycle RISC-V CPU
First, we need to complete the following files:
- [ ] InstructionFetch.scala
- [ ] InstructionDecode.scala
- [ ] Execute.scala
- [ ] CPU.scala
>src/main/scala/riscv/core/*.scala
The completed code can be found in my [repository](https://github.com/Yao1201/ca2023-lab3) on GitHub (forked from [sysprog21/ca2023-lab3](https://github.com/sysprog21/ca2023-lab3)).
### Single-cycle CPU architecture diagram

### Implementation
After completing the files above, we can use the following command to run a single test:
```clike!
$ sbt "testOnly riscv.singlecycle.InstructionFetchTest"
```
### Waveform
If we set the environment variable `WRITE_VCD` to 1 while running the tests, waveform files are generated.
```clike!
$ WRITE_VCD=1 sbt test
```
Afterward, we can find .vcd files in various subdirectories under the `test_run_dir` directory.
#### InstructionFetch
**previous**

**current**

In these two diagrams, `io_instruction_address = 0x1000` at the beginning, and we can observe that when `io_jump_flag_id = 0`, `pc = pc + 4`.
**previous**

**current**

On the other hand, when `io_jump_flag_id = 1`, `pc` takes the value of `io_jump_address_id`, which equals `0x1000` (i.e. `pc = io_jump_address_id`).
We can compare this with `InstructionFetchTest.scala`:
```scala!
class InstructionFetchTest extends AnyFlatSpec with ChiselScalatestTester {
behavior.of("InstructionFetch of Single Cycle CPU")
it should "fetch instruction" in {
test(new InstructionFetch).withAnnotations(TestAnnotations.annos) { c =>
val entry = 0x1000
var pre = entry
var cur = pre
c.io.instruction_valid.poke(true.B)
var x = 0
for (x <- 0 to 100) {
Random.nextInt(2) match {
case 0 => // no jump
cur = pre + 4
c.io.jump_flag_id.poke(false.B)
c.clock.step()
c.io.instruction_address.expect(cur)
pre = pre + 4
case 1 => // jump
c.io.jump_flag_id.poke(true.B)
c.io.jump_address_id.poke(entry)
c.clock.step()
c.io.instruction_address.expect(entry)
pre = entry
}
}
}
}
}
```
> src/test/scala/riscv/singlecycle/InstructionFetchTest.scala
* `entry = 0x1000`, which is why `io_instruction_address = 0x1000` at the beginning.
* When `io.jump_flag_id = false`, `pre = pre + 4` (i.e. `pc = pc + 4`).
* When `io.jump_flag_id = true`, `pre = entry` (i.e. `pc = 0x1000`).

This is consistent with our previous analysis.
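The PC update rule observed in the waveform and checked by the test can be sketched in plain Scala. This is a simplified model under stated assumptions: the `PcModel` object and its function names are hypothetical, and the `instruction_valid` input is assumed to be always true, as it is in the test.

```scala
// Software model of the program-counter update (not the Chisel module itself).
object PcModel {
  val Entry = 0x1000 // same entry address the test uses

  // Next PC: the jump address when the flag is set, otherwise PC + 4.
  def nextPc(pc: Int, jumpFlag: Boolean, jumpAddress: Int): Int =
    if (jumpFlag) jumpAddress else pc + 4

  // Replay a sequence of jump flags, always jumping back to Entry.
  def replay(flags: Seq[Boolean]): Int =
    flags.foldLeft(Entry)((pc, flag) => nextPc(pc, flag, Entry))

  def main(args: Array[String]): Unit = {
    // Two sequential fetches, one jump, then one more sequential fetch.
    println(f"0x${replay(Seq(false, false, true, false))}%x")
  }
}
```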
#### InstructionDecoder

In this diagram, `io_instruction = 0x00A02223`. Decoding it according to the RISC-V ISA and the format diagram below gives `opcode = 0x23`, `funct3 = 0x2`, `rs1 = 0`, `rs2 = 0xA`, and `imm = 4`. Hence the instruction is `sw a0, 4(x0)` (register `x10` is `a0`). Because it is an S-type instruction, memory write should be enabled and memory read disabled.

Back in the waveform, the signals match these inferences: `opcode = 0x23`, `io_memory_read_enable = 0`, `io_memory_write_enable = 1`, and so on.
We can also compare with `InstructionDecoderTest.scala`:
```clike!
class InstructionDecoderTest extends AnyFlatSpec with ChiselScalatestTester {
behavior.of("InstructionDecoder of Single Cycle CPU")
it should "produce correct control signal" in {
test(new InstructionDecode).withAnnotations(TestAnnotations.annos) { c =>
c.io.instruction.poke(0x00a02223L.U) // S-type
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
c.io.regs_reg1_read_address.expect(0.U)
c.io.regs_reg2_read_address.expect(10.U)
c.clock.step()
}
}
}
```
>src/test/scala/riscv/singlecycle/InstructionDecoderTest.scala
All of them align with our inferences.
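The manual decoding of `0x00A02223` can be reproduced with a short plain-Scala sketch. The `STypeDecode` object and its field names are illustrative assumptions. Only the bit positions come from the RISC-V S-type format.

```scala
// Extract the S-type fields of a 32-bit RISC-V instruction word.
object STypeDecode {
  final case class SType(opcode: Int, funct3: Int, rs1: Int, rs2: Int, imm: Int)

  def decode(inst: Int): SType = {
    val immHi = inst >> 25           // imm[11:5]; arithmetic shift keeps the sign
    val immLo = (inst >>> 7) & 0x1f  // imm[4:0]
    SType(
      opcode = inst & 0x7f,
      funct3 = (inst >>> 12) & 0x7,
      rs1 = (inst >>> 15) & 0x1f,
      rs2 = (inst >>> 20) & 0x1f,
      imm = (immHi << 5) | immLo
    )
  }

  def main(args: Array[String]): Unit = {
    // The instruction poked by InstructionDecoderTest.
    println(decode(0x00A02223))
  }
}
```

The register indices it returns (`rs1 = 0`, `rs2 = 10`) are the same values the test expects on `regs_reg1_read_address` and `regs_reg2_read_address`.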
#### Execute
##### R-type

In this diagram, `opcode = 0x33`, `funct3 = 0x0`, and `funct7 = 0x0`, which means the instruction is `add`. We can observe that:
* `io_op1` equals both `alu_io_op1` and `io_reg1_data`.
* `io_op2` equals both `alu_io_op2` and `io_reg2_data`.
* `io_result = alu_io_op1 + alu_io_op2`.
* `io_if_jump_flag = 0`.
##### B-type

In this diagram, `opcode = 0x63` and `funct3 = 0x0`, which means the instruction is `beq`. At this moment, we can observe that:
* `io_aluop1_source` and `io_aluop2_source` are both 1, so the ALU operands are the instruction address and the immediate.
* Because `io_reg1_data != io_reg2_data`, `io_if_jump_flag = 0`.
* `alu_io_op1 = io_instruction_address` and `alu_io_op2 = io_immediate`.
* `io_alu_funct = 1`, which corresponds to `ALUFunction.add`.
* `io_if_jump_address = 4`, which equals `io_result = alu_io_op1 + alu_io_op2`.

At the next moment, `io_reg1_data = io_reg2_data`. Therefore `io_if_jump_flag` changes to 1, and `pc` will jump to `io_if_jump_address`.
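The `beq` behavior described above (the ALU computes the target address while a separate comparison sets the jump flag) can be sketched as a small plain-Scala model. The `BeqModel` object is hypothetical, and the operand values in `main` are made up for illustration. They are chosen only so that the target address comes out as 4, as in the waveform.

```scala
// Software model of the Execute stage for a beq instruction.
object BeqModel {
  final case class Out(result: Int, jumpFlag: Boolean, jumpAddress: Int)

  def execute(pc: Int, imm: Int, reg1: Int, reg2: Int): Out = {
    val result = pc + imm // ALU adds the instruction address and the immediate
    Out(result = result, jumpFlag = reg1 == reg2, jumpAddress = result)
  }

  def main(args: Array[String]): Unit = {
    println(execute(pc = 0, imm = 4, reg1 = 1, reg2 = 2)) // registers differ: no jump
    println(execute(pc = 0, imm = 4, reg1 = 7, reg2 = 7)) // registers equal: jump
  }
}
```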
## Assembly Code Adaptation from Homework2
To run and test the assembly code on MyCPU, we need the following modifications:
- [ ] Because the single-cycle CPU has no system call for printing, we cannot print the result directly. We need to modify the assembly code to store the result in memory instead.
- [ ] To keep the Homework2 program compatible with MyCPU, remove the `RDCYCLE/RDCYCLEH` instructions.
- [ ] Edit the `Makefile` to generate `.asmbin`, then put `Hammingdist.S` into the correct folder.
- [ ] Add a test class, `HammingdistTest`, to `CPUTest.scala`.
- [ ] Run the test.

The code below is only a partial snippet. You can find the complete code in [ca2023-lab3](https://github.com/Yao1201/ca2023-lab3) on GitHub (forked from [sysprog21/ca2023-lab3](https://github.com/sysprog21/ca2023-lab3)).
### 1. Modify assembly code
In Homework2, I used a syscall to print the integer result:
```cl!
# print the result #
li a7, SYSINT #"printint" syscall
add a1, a0, x0 # a1 = integer to print (result of hd_cal)
li a0, 1 #1 = standard output (stdout)
ecall # print result of hd_cal
```
However, in Homework3, I store the result in memory instead :
```clike!
# store the result in memory (0x4, 0x8, 0xC)
addi s11, s11, 4 # advance the store address by 4 (s11 starts at 0)
sw a0, 0(s11)
```
I also removed the `RDCYCLE/RDCYCLEH` instructions.
### 2. Edit the `Makefile` and regenerate RISC-V program
After the modifications mentioned above, we need to edit the `csrc/Makefile` to generate `.asmbin`
```diff!
BINS = \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin \
+ Hammingdist.asmbin
```
> csrc/Makefile
>
Next, place the modified assembly code `Hammingdist.S` into `csrc/`.
To regenerate the RISC-V programs utilized for unit tests, change to the `csrc` directory and run the `make update` command. Ensure that the `$PATH` environment variable is correctly configured to include the GNU toolchain for RISC-V.
```clike!
$ cd $HOME/riscv-none-elf-gcc
$ echo "export PATH=`pwd`/bin:$PATH" > setenv
$ cd $HOME
$ source riscv-none-elf-gcc/setenv
$ cd $HOME/ca2023-lab3/csrc
$ make update
```
### 3. Add functional test in `CPUTest.scala`
```cl!
class HammingdistTest extends AnyFlatSpec with ChiselScalatestTester {
behavior.of("Single Cycle CPU")
it should "Calculate hammingdistance of two sequences" in {
test(new TestTopModule("Hammingdist.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
for (i <- 1 to 50) {
c.clock.step(1000)
c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
}
c.io.mem_debug_read_address.poke(4.U) //read memory(0x4)
c.clock.step()
c.io.mem_debug_read_data.expect(21.U) //expect result = 21
c.io.mem_debug_read_address.poke(8.U) // read memory(0x8)
c.clock.step()
c.io.mem_debug_read_data.expect(63.U) //expect result = 63
c.io.mem_debug_read_address.poke(12.U) // read memory(0xC)
c.clock.step()
c.io.mem_debug_read_data.expect(0.U) //expect result = 0
}
}
}
```
> src/test/scala/riscv/singlecycle/CPUTest.scala
### 4. Run test
Finally, we can test our assembly code on MyCPU with the following commands:
```cl!
$ cd $HOME/ca2023-lab3
$ sbt "testOnly riscv.singlecycle.HammingdistTest"
```
Here is the output when you pass the test :
```
lab02@ubuntu:~/ca2023-lab3$ sbt "testOnly riscv.singlecycle.HammingdistTest"
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/lab02/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/lab02/ca2023-lab3/)
[info] HammingdistTest:
[info] Single Cycle CPU
[info] - should Calculate hammingdistance of two sequences
[info] Run completed in 8 seconds, 665 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 11 s, completed Nov 29, 2023, 2:44:51 AM
```
## Make Verilator
In order to quickly test our written program, we can use Verilator for simulation. After the first run and every time you modify the Chisel code, you need to execute the following command in the project’s root directory to generate Verilog files:
```
$ make verilator
```
After compilation, we can load the `Hammingdist.asmbin` file, simulate for 1000 cycles, and save the simulation waveform to `dump1.vcd` by running:
```
$ ./run-verilator.sh -instruction src/main/resources/Hammingdist.asmbin \
    -time 2000 -vcd dump1.vcd
```
Output:
```
-time 2000
-memory 1048576
-instruction src/main/resources/Hammingdist.asmbin
[-------------------->] 100%
```
Then, run `gtkwave dump1.vcd` to analyze its waveform.

We can observe that the signal `io_instruction` begins with `00000000` and `00100000`. To check this, let’s look at the hexadecimal representation of `Hammingdist.asmbin`:
```l!
$ hexdump src/main/resources/Hammingdist.asmbin | head -1
```
Its output:
```
0000000 0000 0010 0000 0000 ffff 000f 0000 0000
```
This matches the waveform.
## Waveform analysis
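By default `hexdump` prints 16-bit little-endian words, so each pair of columns has to be swapped to recover the 32-bit words the CPU fetches. The sketch below does this in plain Scala. The `HexdumpWords` object is a hypothetical helper, and it assumes the default `hexdump` line layout shown above (an offset followed by 16-bit hex words).

```scala
// Rebuild 32-bit little-endian words from one default-format hexdump line.
object HexdumpWords {
  def toWords32(line: String): List[String] = {
    // Drop the leading offset column, then parse each 16-bit hex word.
    val halves = line.trim.split("\\s+").drop(1).map(h => Integer.parseInt(h, 16))
    // The first halfword of each pair is the low half, the second the high half.
    halves.grouped(2).collect {
      case Array(lo, hi) => f"${(hi.toLong << 16) | lo}%08x"
    }.toList
  }

  def main(args: Array[String]): Unit = {
    toWords32("0000000 0000 0010 0000 0000 ffff 000f 0000 0000").foreach(println)
  }
}
```

For the line above this yields `00100000`, `00000000`, `000fffff`, `00000000`, which are the low and high halves of the two `.dword` values of `test_data_1` in the assembly source.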

The `io_instruction_address` starts from `0x1000`, which corresponds to `link.lds`:
```c1!
OUTPUT_ARCH("riscv")
ENTRY(_start)
SECTIONS
{
. = 0x00001000;
.text : { *(.text.init) *(.text.startup) *(.text) }
.data ALIGN(0x1000) : { *(.data*) *(.rodata*) *(.sdata*) }
.bss : { *(.bss) }
_end = .;
}
```
>csrc/link.lds
Next, I analyze the assembly code by comparing it with the waveform.
```clike!
section .text.Hammingdist
.global _start
.data
test_data_1: .dword 0x0000000000100000, 0x00000000000FFFFF
test_data_2: .dword 0x0000000000000001, 0x7FFFFFFFFFFFFFFE
test_data_3: .dword 0x000000028370228F, 0x000000028370228F
_start: #0x1000
addi sp, sp, -12
# push pointers of test data onto the stack
la t0, test_data_1
sw t0, 0(sp)
la t0, test_data_2
...
```
**First instruction:** `addi sp, sp, -12`

* `io_instruction_address = 0x1030`
* `io_instruction = 0xFF410113`
* `io_jump_flag_id = 0`
* `io_immediate = -12`
* `io_memory_read_enable = 0`, `io_memory_write_enable = 0`
Because it is an I-type arithmetic instruction, it neither jumps nor accesses memory, so `io_jump_flag_id = 0`, `io_memory_read_enable = 0`, and `io_memory_write_enable = 0`.
:::danger
In theory, the `io_instruction_address` for `addi` should be `0x1004`, but the waveform shows `0x1030`. My speculation is that the test data is placed ahead of `_start` and fetched first: the three pairs of `.dword` values occupy 3 × 16 = 48 = `0x30` bytes, and the hexdump above shows this data at the very start of the binary, which would push `addi` to `0x1030`.
:::
**Next instruction:** `la t0, test_data_1`

Because `la` is a pseudo-instruction, it is implemented by two instructions, `0x00000297` (`auipc t0, 0`) and `0xFCC28293` (`addi t0, t0, -52`).
* `io_instruction_address = 0x1034, 0x1038`
* `io_instruction = 0x00000297, 0xFCC28293`
* `io_jump_flag_id = 0`
* `io_immediate = 0`
* `io_memory_read_enable = 0`, `io_memory_write_enable = 0`
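As a cross-check, the address that this `auipc`/`addi` pair loads into `t0` can be computed from the two encodings. The plain-Scala sketch below is illustrative only: the `LaExpansion` object and its helper names are assumptions, while the immediate positions follow the RISC-V U-type and I-type formats.

```scala
// Compute the address produced by an auipc + addi pair (the expansion of `la`).
object LaExpansion {
  // U-type immediate: the upper 20 bits, already in place.
  def uImm(inst: Int): Int = inst & 0xfffff000

  // I-type immediate: bits 31:20, sign-extended by the arithmetic shift.
  def iImm(inst: Int): Int = inst >> 20

  // auipc adds its immediate to its own PC; addi then adds a signed offset.
  def laTarget(auipcPc: Int, auipc: Int, addi: Int): Int =
    auipcPc + uImm(auipc) + iImm(addi)

  def main(args: Array[String]): Unit = {
    println(f"0x${laTarget(0x1034, 0x00000297, 0xFCC28293)}%x")
  }
}
```

With the values from the waveform this evaluates to `0x1034 - 52 = 0x1000`, i.e. `test_data_1` would sit at the start of the image, which agrees with the hexdump of `Hammingdist.asmbin`.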
**Next instruction:** `sw t0, 0(sp)`

* `io_instruction_address = 0x0000103C`
* `io_instruction = 0x00512023`
* `io_jump_flag_id = 0`
* `io_immediate = 0`
* `io_memory_read_enable = 0`, `io_memory_write_enable = 1`
Because it's S-type, `io_memory_write_enable = 1`.
**Branch instruction:** `beq a1, zero, clz_lower_set_one`

* `io_instruction = 0x02058A63`
* `io_jump_flag_id = 1`
* `pc = 0x1190`
* `io_immediate = 0x34`
* `io_jump_address_id = 0x000011C4`, which equals to `pc + io_immediate`
Because it is a B-type instruction and the branch condition holds (`a1` equals zero), `io_jump_flag_id = 1`.
Since `io_jump_flag_id = 1`, the next `pc` jumps to `io_jump_address_id = 0x000011C4`. The waveform is shown below:
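The B-type immediate is scattered across the instruction word, so it is worth checking `io_immediate = 0x34` by hand. A minimal plain-Scala sketch follows. The `BTypeImm` object is a hypothetical helper, with bit positions taken from the RISC-V B-type format.

```scala
// Reassemble the sign-extended B-type immediate of a RISC-V branch.
object BTypeImm {
  def decode(inst: Int): Int = {
    val bit12     = (inst >> 31) << 12           // inst[31]; arithmetic shift sign-extends
    val bit11     = ((inst >>> 7) & 0x1) << 11   // inst[7]
    val bits10to5 = ((inst >>> 25) & 0x3f) << 5  // inst[30:25]
    val bits4to1  = ((inst >>> 8) & 0xf) << 1    // inst[11:8]
    bit12 | bit11 | bits10to5 | bits4to1         // bit 0 is always zero
  }

  def main(args: Array[String]): Unit = {
    val imm = decode(0x02058A63)
    println(f"imm = 0x$imm%x, target = 0x${0x1190 + imm}%x")
  }
}
```

For `0x02058A63` the immediate is `0x34`, and adding it to `pc = 0x1190` gives `0x11C4`, the value seen on `io_jump_address_id`.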

## Reference
* [Lab3: Construct a single-cycle CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p)