# Assignment3: Single-cycle RISC-V CPU
contributed by < [`jimmylu890303`](https://github.com/jimmylu890303) >
## Hello World in Chisel
```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W)) // 32-bit cycle counter
  val blkReg  = RegInit(0.U(1.W))  // current LED state

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
- The `io` bundle declares no inputs and a single output, `led`, which is a 1-bit unsigned integer (`UInt(1.W)`).
- `CNT_MAX` is a constant equal to 50000000 / 2 - 1 = 24999999.
- `cntReg` is a 32-bit unsigned register initialized to 0.
- `blkReg` is a 1-bit unsigned register initialized to 0.
- On each clock cycle, `cntReg` is incremented by one.
- When `cntReg` reaches `CNT_MAX`, `cntReg` is reset to zero and `blkReg` is inverted, which toggles the LED.
- The output `led` is driven by the value stored in `blkReg`.
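The behaviour described in the list can be checked with a small software model in plain Scala (not Chisel). This is only a sketch: `BlinkModel`, `step`, and `simulate` are illustrative names that do not appear in the lab code, and the `max` parameter exists so the model can be run with a small limit.

```scala
object BlinkModel {
  // Same integer arithmetic as the Chisel source: 50000000 / 2 - 1.
  val cntMax: Int = 50000000 / 2 - 1

  // Advance (counter, led) by one clock cycle, mirroring the when-block:
  // wrap the counter and invert the LED bit when the limit is reached.
  def step(cnt: Int, led: Int, max: Int): (Int, Int) =
    if (cnt == max) (0, led ^ 1) else (cnt + 1, led)

  // LED value observed on each of `cycles` consecutive cycles after reset.
  def simulate(cycles: Int, max: Int): Seq[Int] =
    Iterator.iterate((0, 0)) { case (c, l) => step(c, l, max) }
      .map(_._2)
      .take(cycles)
      .toSeq
}
```

With `max = 1` the LED toggles every 2 cycles; with the real `CNT_MAX` it toggles every 25,000,000 cycles.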
The version below expresses the same logic with explicit multiplexers: a `Mux` selects the next value of each register depending on whether `cntReg` equals `CNT_MAX`.
```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))

  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  blkReg := Mux(cntReg === CNT_MAX, ~blkReg, blkReg)
  io.led := blkReg
}
```
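As a sanity check that the `when` version and the `Mux` version describe the same next-state function, the two can be compared in a plain-Scala model. This is a sketch under the assumption that both registers are modelled as `Int` values; `MuxEquivalence` and its helpers are hypothetical names, not part of the lab.

```scala
object MuxEquivalence {
  // Software stand-in for Chisel's Mux(cond, a, b): a when cond is true, else b.
  def mux[T](cond: Boolean, a: T, b: T): T = if (cond) a else b

  // when-style: default increment, overridden when the counter hits max.
  def stepWhen(cnt: Int, blk: Int, max: Int): (Int, Int) = {
    var nextCnt = cnt + 1
    var nextBlk = blk
    if (cnt == max) { nextCnt = 0; nextBlk = blk ^ 1 }
    (nextCnt, nextBlk)
  }

  // Mux-style: each register receives exactly one assignment.
  def stepMux(cnt: Int, blk: Int, max: Int): (Int, Int) =
    (mux(cnt == max, 0, cnt + 1), mux(cnt == max, blk ^ 1, blk))

  // True if both formulations agree for every state with cnt in 0..max.
  def agree(max: Int): Boolean =
    (0 to max).forall { c =>
      Seq(0, 1).forall(b => stepWhen(c, b, max) == stepMux(c, b, max))
    }
}
```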
## Lab 3 : Single Cycle RISC-V CPU
Four Scala files need additional code to complete their modules:
- InstructionFetch.scala
- InstructionDecode.scala
- Execute.scala
- CPU.scala
### InstructionFetch.scala:
In the InstructionFetch.scala file, the IF module selects the next instruction address to store in the program counter based on the `jump_flag_id` signal: the jump target `jump_address_id` when the flag is set, and PC + 4 otherwise.
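The selection rule can be written as a one-line software model. This is a plain-Scala sketch rather than the Chisel implementation; `NextPcModel` and `nextPc` are illustrative names, and the 32-bit wrap-around is an assumption about the PC width.

```scala
object NextPcModel {
  // Next program counter: the jump target when the jump flag is set,
  // otherwise the current PC plus 4 (wrapped to 32 bits).
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (jumpFlag) jumpAddress else (pc + 4) & 0xFFFFFFFFL
}
```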

#### Test InstructionFetch
The test exercises the IF module 101 times (the range `0 to 100` is inclusive). Each iteration draws a random number, 0 or 1.
- If the number is 0 (no jump), the output signal `instruction_address` is expected to be `pre + 4`.
- If the number is 1 (jump), the jump target is `entry`, so the output signal `instruction_address` is expected to be `entry`.
```scala
for (x <- 0 to 100) {
  Random.nextInt(2) match {
    case 0 => // no jump
      cur = pre + 4
      c.io.jump_flag_id.poke(false.B)
      c.clock.step()
      c.io.instruction_address.expect(cur)
      pre = pre + 4
    case 1 => // jump
      c.io.jump_flag_id.poke(true.B)
      c.io.jump_address_id.poke(entry)
      c.clock.step()
      c.io.instruction_address.expect(entry)
      pre = entry
  }
}
```
> src/test/scala/riscv/singlecycle/InstructionFetchTest.scala
#### Analysis with GTKWave
- `jump_flag_id` is set to 1

  When `jump_flag_id` is 1, the program counter (PC) is set to 0x1000 (`entry`) in the next cycle.
- `jump_flag_id` is set to 0

  When `jump_flag_id` is 0, the program counter is set to PC + 4 (0x1004 + 4 = 0x1008) in the next cycle.
### InstructionDecode.scala:
In the InstructionDecode.scala file, the ID module decodes the input signal `instruction` and generates the control signals for the rest of the circuit.

The completed module derives the following 10 output signals from the 32-bit instruction:
- regs_reg1_read_address
- regs_reg2_read_address
- ex_immediate
- ex_aluop1_source
- ex_aluop2_source
- memory_read_enable
- memory_write_enable
- wb_reg_write_source
- reg_write_enable
- reg_write_address
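Most of these signals start from the fixed bit fields of the RV32I instruction formats. The sketch below extracts those fields in plain Scala; `DecodeFields`, `Fields`, and `bits` are hypothetical helper names and the code is not taken from the lab source.

```scala
object DecodeFields {
  final case class Fields(opcode: Int, rd: Int, funct3: Int, rs1: Int, rs2: Int, funct7: Int)

  // Extract bits [hi:lo] of a 32-bit instruction held in a Long.
  def bits(inst: Long, hi: Int, lo: Int): Int =
    ((inst >>> lo) & ((1L << (hi - lo + 1)) - 1)).toInt

  // Field positions follow the RV32I base instruction formats.
  def decode(inst: Long): Fields = Fields(
    opcode = bits(inst, 6, 0),
    rd     = bits(inst, 11, 7),
    funct3 = bits(inst, 14, 12),
    rs1    = bits(inst, 19, 15),
    rs2    = bits(inst, 24, 20),
    funct7 = bits(inst, 31, 25)
  )
}
```

For example, `DecodeFields.decode(0x002081b3L)` yields opcode 0x33, `rd` = 3, `rs1` = 1, and `rs2` = 2, which corresponds to `add x3, x1, x2`.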
#### Test InstructionDecode
```scala
c.io.instruction.poke(0x00a02223L.U) // S-type
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
c.io.regs_reg1_read_address.expect(0.U)
c.io.regs_reg2_read_address.expect(10.U)
c.clock.step()
c.io.instruction.poke(0x000022b7L.U) // lui
c.io.regs_reg1_read_address.expect(0.U)
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
c.clock.step()
c.io.instruction.poke(0x002081b3L.U) // add
c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
c.io.ex_aluop2_source.expect(ALUOp2Source.Register)
c.clock.step()
```
> src/test/scala/riscv/singlecycle/InstructionDecoderTest.scala
This test checks three instructions: an S-type store (`0x00a02223`, `sw x10, 4(x0)`), `lui` (`0x000022b7`), and `add` (`0x002081b3`).
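The operand-source expectations in the test are consistent with a simple rule keyed on the opcode. The model below is a sketch of one such rule in plain Scala. It is an assumption, not the lab's actual decoder: in particular, treating `auipc`, branches, and `jal` as the only PC-relative opcodes is not something the test above confirms, and all names are illustrative.

```scala
object OperandSources {
  sealed trait Op1Source
  case object Op1Register extends Op1Source
  case object Op1InstructionAddress extends Op1Source

  sealed trait Op2Source
  case object Op2Register extends Op2Source
  case object Op2Immediate extends Op2Source

  // RV32I opcodes (instruction bits [6:0]).
  final val OpReg  = 0x33 // R-type arithmetic
  final val Branch = 0x63
  final val Jal    = 0x6f
  final val Auipc  = 0x17

  // Assumed rule: PC-relative instructions take the instruction address as operand 1.
  def op1(opcode: Int): Op1Source = opcode match {
    case Auipc | Branch | Jal => Op1InstructionAddress
    case _                    => Op1Register
  }

  // Assumed rule: only R-type instructions take a register as operand 2.
  def op2(opcode: Int): Op2Source = opcode match {
    case OpReg => Op2Register
    case _     => Op2Immediate
  }
}
```

Under these assumptions the store (opcode 0x23) and `lui` (opcode 0x37) select register/immediate, and `add` (opcode 0x33) selects register/register, matching the three `expect` groups in the test.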
#### Analysis with GTKWave
- Instruction `0x00a02223`(S-type)

- Instruction `0x000022b7`(lui)

- Instruction `0x002081b3`(add)

### Execute.scala
The Execute.scala file contains two main modules.
The first is the `ALU control`, which generates the ALU function code from the opcode, funct3, and funct7 fields of the input instruction.
The second is the `ALU`, which performs the operation selected by that function code.
The completed Execute module outputs the ALU result together with the signals `if_jump_flag` and `if_jump_address`.
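The split between the two modules can be illustrated with a reduced plain-Scala model. It is a sketch only: it covers a handful of R-type functions, falls back to addition for every other opcode (a simplification that ignores I-type arithmetic), and uses names such as `AluModel` and `AluFunc` that are not taken from the lab code.

```scala
object AluModel {
  sealed trait AluFunc
  case object Add extends AluFunc
  case object Sub extends AluFunc
  case object And extends AluFunc
  case object Or  extends AluFunc
  case object Xor extends AluFunc

  // ALU control: map opcode/funct3/funct7 to a function code.
  // Simplification: anything that is not R-type (opcode 0x33) uses Add.
  def control(opcode: Int, funct3: Int, funct7: Int): AluFunc =
    if (opcode != 0x33) Add
    else (funct3, funct7) match {
      case (0x0, 0x20) => Sub
      case (0x0, _)    => Add
      case (0x4, _)    => Xor
      case (0x6, _)    => Or
      case (0x7, _)    => And
      case _           => Add // remaining functions omitted from this sketch
    }

  // ALU: apply the selected function; Int arithmetic wraps at 32 bits.
  def alu(f: AluFunc, op1: Int, op2: Int): Int = f match {
    case Add => op1 + op2
    case Sub => op1 - op2
    case And => op1 & op2
    case Or  => op1 | op2
    case Xor => op1 ^ op2
  }
}
```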

#### Test Execute
```scala
c.io.instruction.poke(0x001101b3L.U) // x3 = x2 + x1
var x = 0
for (x <- 0 to 100) {
  val op1 = scala.util.Random.nextInt(429496729)
  val op2 = scala.util.Random.nextInt(429496729)
  val result = op1 + op2
  val addr = scala.util.Random.nextInt(32)
  c.io.reg1_data.poke(op1.U)
  c.io.reg2_data.poke(op2.U)
  c.clock.step()
  c.io.mem_alu_result.expect(result.U)
  c.io.if_jump_flag.expect(0.U)
}
// beq test
c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
c.io.instruction_address.poke(2.U)
c.io.immediate.poke(2.U)
c.io.aluop1_source.poke(1.U)
c.io.aluop2_source.poke(1.U)
c.clock.step()
// equ
c.io.reg1_data.poke(9.U)
c.io.reg2_data.poke(9.U)
c.clock.step()
c.io.if_jump_flag.expect(1.U)
c.io.if_jump_address.expect(4.U)
// not equ
c.io.reg1_data.poke(9.U)
c.io.reg2_data.poke(19.U)
c.clock.step()
c.io.if_jump_flag.expect(0.U)
c.io.if_jump_address.expect(4.U)
```
> src/test/scala/riscv/singlecycle/ExecuteTest.scala
This test covers two instructions: `add` (`x3 = x2 + x1`) and `beq`.
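The `beq` expectations can be reproduced with a short plain-Scala model. It is a sketch with illustrative names (`BranchModel`, `jumpFlag`, `jumpAddress`); the point it shows is that the target address is computed whether or not the branch is taken, which is why the test expects `if_jump_address` to be 4 in both cases.

```scala
object BranchModel {
  // beq: the jump flag is set when the two register values are equal.
  def jumpFlag(reg1: Long, reg2: Long): Boolean = reg1 == reg2

  // Branch target: instruction address plus immediate, wrapped to 32 bits.
  // It does not depend on the comparison result.
  def jumpAddress(instructionAddress: Long, immediate: Long): Long =
    (instructionAddress + immediate) & 0xFFFFFFFFL
}
```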
#### Analysis with GTKWave
- x3 = x2 + x1

- beq (operands equal)

- beq (operands not equal)

## Modify the handwritten RISC-V assembly code in Homework2
### Modify the original Homework 2 code
Because the single-cycle CPU has no system call for printing, the output cannot be printed directly while the assembly code runs. Instead of using the print system call, I adapted the code to store the output result in memory.
In Homework 2, the result was displayed through a system call in rv32emu, which required converting the output from a number to ASCII first:
```asm
    jal ra, pimo
    addi a0, a0, 48
    la t0, buffer
    sb zero, 1(t0)
    sb a0, 0(t0)
    li a0, 1
    la a1, buffer
    li a2, 2
    li a7, SYSWRITE
    ecall               # print result of pimo (which is in a0)
```
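The `addi a0, a0, 48` step in the Homework 2 code is the number-to-ASCII conversion: 48 is the character code of `'0'`. A plain-Scala sketch of the same conversion follows; `AsciiDigit` and `toAscii` are illustrative names, and the single-digit restriction is an assumption based on the result being 0 or 1.

```scala
object AsciiDigit {
  // Convert a single decimal digit (0..9) to its ASCII character by adding 48,
  // mirroring `addi a0, a0, 48`.
  def toAscii(digit: Int): Char = {
    require(digit >= 0 && digit <= 9, "only single digits are supported")
    (digit + 48).toChar
  }
}
```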
In Homework 3, the result is stored to memory instead:
```asm
    # store the result in memory
    sw a0, 0(s3)
```
The following steps adapt the code to run on the single-cycle CPU:
- Put `main.S` into the `/csrc` directory.
- Store the output results at memory addresses `0x4`, `0x8`, `0xC`, and `0x10`.
- Modify the `Makefile` to generate `main.asmbin`.
- After generating `main.asmbin`, move the file to the `src/main/resources` directory.
- Add a corresponding test named `Hw2Test` to the `CPUTest.scala` file.
### Test my RISC-V assembly
To test the RISC-V assembly code, I added a test named `Hw2Test` to `CPUTest.scala`. It verifies the results at memory addresses `0x4`, `0x8`, `0xC`, and `0x10`.
```scala
class Hw2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "Implementation of multiplication overflow prediction for unsigned integers using CLZ" in {
    test(new TestTopModule("main.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 10) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      // result should be 0 0 1 1
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
      c.io.mem_debug_read_address.poke(16.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
    }
  }
}
```
Run test:
```shell
sbt "testOnly riscv.singlecycle.Hw2Test"
```
Output:
```
[info] welcome to sbt 1.9.7 (OpenLogic Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ..
[info] loading project definition from /home/jimmy/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/jimmy/ca2023-lab3/)
[info] Hw2Test:
[info] Single Cycle CPU
[info] - should Implementation of multiplication overflow prediction for unsigned integers using CLZ
[info] Run completed in 29 seconds, 876 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 35 s, completed Nov 25, 2023, 2:34:59 PM
```
### Using Verilator to Run the Assembly
```shell
./run-verilator.sh -instruction src/main/resources/main.asmbin -time 2000 -vcd dump01.vcd
```
Output:
```
-time 2000
-memory 1048576
-instruction src/main/resources/main.asmbin
[-------------------->] 100%
```
Inspect the resulting waveform with GTKWave.
Case 1:

> prev cycle

> next cycle
- Instruction `0x024000EF` is `jal ra, pimo` (`jal x1, 36`).
- The PC is `0x00001050` and `regs_write_source` is `0b11`, so the value written back to `ra` is PC + 4 = `0x00001054`.
- The jump target is `0x1074` (`0x1050 + 36`, computed by the ALU), and `if_jump_flag` is 1 (computed by the jump-decision logic).
- In the next cycle the PC is therefore set to `0x1074`.
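The offset and target in Case 1 can be recomputed from the instruction word. The plain-Scala sketch below decodes the J-type immediate as laid out in the RV32I specification; `JalDecode` and its method names are illustrative and not part of the lab code.

```scala
object JalDecode {
  // Reassemble the J-type immediate:
  // imm[20] = bit 31, imm[10:1] = bits 30:21, imm[11] = bit 20, imm[19:12] = bits 19:12.
  def jImmediate(inst: Long): Long = {
    val imm20    = (inst >>> 31) & 0x1
    val imm10_1  = (inst >>> 21) & 0x3FF
    val imm11    = (inst >>> 20) & 0x1
    val imm19_12 = (inst >>> 12) & 0xFF
    val raw = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1)
    // Sign-extend from bit 20.
    if (imm20 == 1) raw - (1L << 21) else raw
  }

  // Jump target: PC plus the decoded offset, wrapped to 32 bits.
  def target(pc: Long, inst: Long): Long = (pc + jImmediate(inst)) & 0xFFFFFFFFL

  // Link value written to rd: address of the following instruction.
  def link(pc: Long): Long = (pc + 4) & 0xFFFFFFFFL
}
```

For `0x024000EF` at PC `0x1050`, the decoded offset is 36, the target is `0x1074`, and the link value is `0x1054`.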
Case 2:

- Instruction `0x00512023` is `sw t0, 0(sp)` (`sw x5, 0(x2)`).
- `io_memory_write_enable` is 1 because this is a store-word instruction.
- ALU.op1 is the value of `sp` (the base address), and ALU.op2 is the offset (the immediate).
- `im_mem_alu_result` is the memory address being written.
- `regs_io_read_datas` is the value to be stored, read from the source register `t0`.
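The store address in Case 2 is the base register plus the S-type immediate, which is split across two fields of the instruction. A plain-Scala sketch of that computation follows; `StoreDecode` and its methods are hypothetical names, not lab code.

```scala
object StoreDecode {
  // Reassemble the S-type immediate: imm[11:5] = bits 31:25, imm[4:0] = bits 11:7.
  def sImmediate(inst: Long): Long = {
    val hi  = (inst >>> 25) & 0x7F
    val lo  = (inst >>> 7) & 0x1F
    val raw = (hi << 5) | lo
    // Sign-extend from bit 11.
    if ((raw & 0x800) != 0) raw - 0x1000 else raw
  }

  // Effective address: base register value plus offset, wrapped to 32 bits.
  def storeAddress(base: Long, inst: Long): Long =
    (base + sImmediate(inst)) & 0xFFFFFFFFL
}
```

For `0x00512023` the offset is 0, so the address written equals the value of `sp`.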
Case 3:

- Instruction `0x00400993` is `li s3, 4` (`addi x19, x0, 4`).
- ALU.op1 is the value of x0, and ALU.op2 is the immediate value.
- `ALU.mem_alu_result` is 0x4 and `wb_reg_write_source` is 0b00, so `regs_write_data` takes its value from `ALU.mem_alu_result`.
- The target register is 0x13 (x19, `s3`) and `regs_io_write_enable` is 1, so `s3` is set to 4.
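Cases 1 and 3 both depend on the write-back source selector. The plain-Scala sketch below models that multiplexer using the two encodings quoted in this section (`0b00` for the ALU result, `0b11` for PC + 4). The `0b01` encoding for loaded memory data is an assumption, and `WriteBackModel` is an illustrative name.

```scala
object WriteBackModel {
  // Select the value written to the register file.
  //   0b11 -> PC + 4 (link address, as in Case 1)
  //   0b01 -> data loaded from memory (assumed encoding)
  //   else -> ALU result (0b00, as in Case 3)
  def writeData(source: Int, aluResult: Long, memData: Long, pc: Long): Long =
    source match {
      case 0x3 => (pc + 4) & 0xFFFFFFFFL
      case 0x1 => memData
      case _   => aluResult
    }
}
```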