# Assignment3: Single-cycle RISC-V CPU
contributed by [chihenliu](https://hackmd.io/@chihenliu)
## 1. Environment Setup
My OS is `Ubuntu 22.04.3 LTS`.
### 1.1. Install the dependent packages
```shell
$sudo apt install build-essential verilator gtkwave
```
### 1.2. Install sbt/JDK/SDKMAN
#### 1.2.1 Install SDKMAN
Follow the instructions to install SDKMAN:
```shell
$curl -s "https://get.sdkman.io" | bash
$source "$HOME/.sdkman/bin/sdkman-init.sh"
$sdk version
```
#### 1.2.2 Install sbt
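Follow the instructions to install sbt. The first command installs a Java 8 (Temurin) JDK through SDKMAN, and the second installs sbt itself: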
```shell
$sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1)
$sdk install sbt
```
The installation of sbt is complete.
#### 1.2.3 Install JDK
Follow the instructions in [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p#Lab3-Construct-a-single-cycle-RISC-V-CPU-with-Chisel) to install JDK 11:
```shell
$sdk install java 11.0.21-tem
```
The installation of JDK is complete.
#### 1.2.4 Install GTKWave
The general procedure for building GTKWave from source is `./configure`, `make`, then `sudo make install`.
However, this failed with errors on my Ubuntu system, so I first installed the packages listed in the README.md:
```shell!
$sudo apt-get install libjudy-dev
$sudo apt-get install libbz2-dev
$sudo apt-get install liblzma-dev
$sudo apt-get install libgconf2-dev
$sudo apt-get install libgtk2.0-dev
$sudo apt-get install tcl-dev
$sudo apt-get install tk-dev
$sudo apt-get install gperf
$sudo apt-get install gtk2-engines-pixbuf
```
After installing the packages above, I installed GTKWave with `./configure`, `make`, and `sudo make install`.
## 2. Explanation of Hello World in Chisel
### 2.1 Chisel tutorials
Follow the instructions:
```shell
$ git clone https://github.com/ucb-bar/chisel-tutorial
$ cd chisel-tutorial
$ git checkout release
$ sbt run
```
Output:
```
test Hello Success: 1 tests passed in 6 cycles taking 0.004980 seconds
[info] [0.002] RAN 1 CYCLES PASSED
[success] Total time: 2 s, completed
```
You can also run all the examples:
```shell
$./run-examples.sh all
```
### 2.2 Chisel Bootcamp
- [x] 1.Introduction to Scala
- [x] 2.1.Your First Chisel Module
- [x] 2.2.Combinational Logic
- [x] 2.3.Control Flow
- [x] 2.4.Sequential Logic
- [x] 2.5.Putting it all Together: An FIR Filter
- [x] 2.6.More on ChiselTest
- [x] 3.1.Generators: Parameters
- [x] 3.2.Generators: Collections
- [x] 3.3.Chisel Standard Library
- [x] 3.4.Higher-Order Functions
- [x] 3.5.Functional Programming
- [x] 3.6.Object Oriented Programming
- [x] 3.7.Generators: Types

I will go through all the steps before `Dec. 1`.
### 2.3 Explanation of Hello World
```scala
import chisel3._

// Blinks an LED by toggling blkReg every CNT_MAX + 1 clock cycles.
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W)) // free-running cycle counter
  val blkReg  = RegInit(0.U(1.W))  // current LED state
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
* The I/O bundle has a single output signal, `led`, and no input signals; `led` is an unsigned integer with a bit width of 1.
* `cntReg` is a 32-bit unsigned integer register, initialized to 0.
* `blkReg` is a 1-bit unsigned integer register, initialized to 0, that holds the state of the LED.
* `CNT_MAX` is a constant with a value of `24999999`. This value is typically chosen based on the system's clock frequency and controls how fast the LED blinks.
* In each clock cycle, the value of `cntReg` increases by 1.
* When `cntReg` reaches `CNT_MAX`, `cntReg` is reset to 0 and `blkReg` is inverted.

We can achieve a different LED behavior by eliminating `blkReg`:
```scala
// Hello in Chisel, after eliminating blkReg
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val cntMax = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W))
  cntReg := Mux(cntReg === cntMax, 0.U, cntReg + 1.U)
  io.led := cntReg === cntMax
}
```
* When `cntReg` reaches `cntMax`, the LED is driven ON (1); at all other times it is OFF (0).
* The LED therefore flashes for a single clock cycle each time `cntReg` reaches `cntMax`, rather than staying lit until the next counting period completes.
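
The difference between the two versions is easier to see in a small software model. The following is a minimal Python sketch, not Chisel: the function name `simulate` and the tiny `cnt_max=3` are assumptions chosen only to keep the traces readable. It models one register update per loop iteration for both variants.

```python
def simulate(cnt_max, cycles):
    """Cycle-by-cycle model of both Hello variants; returns two LED traces."""
    cnt_a, blk = 0, 0  # toggle version: counter register and blkReg
    cnt_b = 0          # pulse version: counter register only
    toggle_trace, pulse_trace = [], []
    for _ in range(cycles):
        # Outputs are read from the current register values.
        toggle_trace.append(blk)
        pulse_trace.append(1 if cnt_b == cnt_max else 0)
        # Register updates at the clock edge.
        if cnt_a == cnt_max:
            cnt_a, blk = 0, blk ^ 1
        else:
            cnt_a += 1
        cnt_b = 0 if cnt_b == cnt_max else cnt_b + 1
    return toggle_trace, pulse_trace


toggle, pulse = simulate(cnt_max=3, cycles=12)
print(toggle)  # [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
print(pulse)   # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1]
```

In the first trace the LED holds each state for `cnt_max + 1` cycles, while in the second it is high for only one cycle per period.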
## 3. Complete Lab 3 `MyCPU` code
### 3.1. Single-cycle CPU
#### Single-cycle CPU diagram

#### Instruction Fetch stage
#### Instruction Decode stage
#### Execute stage
#### Memory Access stage
#### Write-Back stage
### 3.2. Finish MyCPU code
We need to add code to four Scala files in `src/main/scala/riscv/core` to complete the modules:
* InstructionFetch.scala
* InstructionDecode.scala
* Execute.scala
* CPU.scala
After completing the `Instruction Fetch`, `Instruction Decode`, and `Execute` stages, I used these components to complete the `CPU` module.
Here is my [repository](https://github.com/chihen0709/ca2023-lab3) for Lab 3, which was forked from [ca2023-lab3](https://github.com/sysprog21/ca2023-lab3).
### 3.3. MyCPU test and Waveform
Test command:
```shell
$sbt test
```
Before the CPU code is completed, the tests fail with the following output:
```shell
[info] *** 6 TESTS FAILED ***
[error] Failed tests:
[error] riscv.singlecycle.InstructionDecoderTest
[error] riscv.singlecycle.ByteAccessTest
[error] riscv.singlecycle.InstructionFetchTest
[error] riscv.singlecycle.ExecuteTest
[error] riscv.singlecycle.FibonacciTest
[error] riscv.singlecycle.QuicksortTest
[error] (Test / test) sbt.TestsFailedException: Tests unsuccessful
```
After completing the missing code for the `Instruction Fetch`, `Instruction Decode`, and `Execute` stages as well as the CPU, I proceeded to test according to the command provided in Lab 3.
```shell
$sbt test
```
We get the following output:
```shell
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 9 seconds, 325 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 28, 2023, 5:41:06 PM
```
To run a single test case, use the following command:
```shell
$sbt "testOnly riscv.singlecycle.XXXTest"
```
#### 3.3.1. InstructionFetch test
The `PC` is initialized to `ProgramCounter.EntryAddress`. The control signal `jump_flag_id` determines whether a jump is executed. If it is true, the `PC` is updated to the address provided by `jump_address_id`. If it is false, the `PC` is incremented by 4 to execute the next instruction.
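The update rule can be written as a one-line function. This is a hedged Python sketch rather than the Chisel implementation: the name `next_pc` is hypothetical, `0x1000` is taken from the `entry` value used in the test, and holding the `PC` while `instruction_valid` is low is an assumption about the module's behavior.

```python
ENTRY_ADDRESS = 0x1000  # assumed entry point, taken from `entry` in the test


def next_pc(pc, jump_flag, jump_address, instruction_valid=True):
    """Return the PC value after the next clock edge."""
    if not instruction_valid:
        return pc  # assumption: the PC holds its value while no instruction is valid
    if jump_flag:
        return jump_address
    return (pc + 4) & 0xFFFFFFFF  # 32-bit wrap-around


print(hex(next_pc(ENTRY_ADDRESS, False, 0)))      # 0x1004
print(hex(next_pc(0x1008, True, ENTRY_ADDRESS)))  # 0x1000
print(hex(next_pc(0x1008, False, ENTRY_ADDRESS))) # 0x100c
```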
##### InstructionFetchTest.scala
```scala
class InstructionFetchTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionFetch of Single Cycle CPU")
  it should "fetch instruction" in {
    test(new InstructionFetch).withAnnotations(TestAnnotations.annos) { c =>
      val entry = 0x1000
      var pre   = entry
      var cur   = pre
      c.io.instruction_valid.poke(true.B)
      var x = 0
      for (x <- 0 to 100) {
        Random.nextInt(2) match {
          case 0 => // no jump
            cur = pre + 4
            c.io.jump_flag_id.poke(false.B)
            c.clock.step()
            c.io.instruction_address.expect(cur)
            pre = pre + 4
          case 1 => // jump
            c.io.jump_flag_id.poke(true.B)
            c.io.jump_address_id.poke(entry)
            c.clock.step()
            c.io.instruction_address.expect(entry)
            pre = entry
        }
      }
    }
  }
}
```
In this test, a random number is generated on each iteration. If it is 0, no jump is taken and the Program Counter (`PC`) simply increments by 4 (to `pre + 4`). If it is 1, a jump to the entry address is executed.
##### Waveform
* jump_flag_id set to 1


When `jump_flag_id` is set to `1`, the PC does not advance by `4` from `0x1008` to `0x100C`; instead it jumps directly to `0x1000`.
* jump_flag_id set to 0


When `jump_flag_id` is set to `0`, the PC advances from `0x1000` to `0x1004` on the next clock cycle, following `PC + 4`.
#### 3.3.2. Instruction Decode test
In the `ID` stage, the `ID` unit decodes the input signal `instruction` and generates the control signals for the rest of the circuit. After completing the `ID` module, you will obtain a total of `10` outputs.
##### InstructionDecoderTest.scala
```scala
class InstructionDecoderTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionDecoder of Single Cycle CPU")
  it should "produce correct control signal" in {
    test(new InstructionDecode).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x00a02223L.U) // S-type
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.regs_reg2_read_address.expect(10.U)
      c.clock.step()
      c.io.instruction.poke(0x000022b7L.U) // lui
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.clock.step()
      c.io.instruction.poke(0x002081b3L.U) // add
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Register)
      c.clock.step()
    }
  }
}
```
The code above verifies three instructions: an `S-type` store, `lui`, and `add`. I added two signals, `memory_read_enable` and `memory_write_enable`, in `InstructionDecode.scala`, but the test case above does not check `memory_write_enable`. Additional test cases for `memory_write_enable` could be added as part of completing Assignment 3.
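To see what the decoder is expected to extract from the three test instructions, the fields can be pulled apart in software. This is a minimal Python sketch, not the Chisel decoder: the helper names `decode_fields` and `control_signals` are hypothetical, and the enable rules (loads use opcode `0x03`, stores use opcode `0x23`) follow the standard RV32I encoding rather than the code in `InstructionDecode.scala`.

```python
def decode_fields(inst):
    """Split a 32-bit RV32I instruction word into its fixed-position fields."""
    return {
        "opcode": inst & 0x7F,
        "rd": (inst >> 7) & 0x1F,
        "funct3": (inst >> 12) & 0x7,
        "rs1": (inst >> 15) & 0x1F,
        "rs2": (inst >> 20) & 0x1F,
        "funct7": (inst >> 25) & 0x7F,
    }


def control_signals(inst):
    """Model of the two memory enables: loads read memory, stores write it."""
    opcode = inst & 0x7F
    return {
        "memory_read_enable": opcode == 0x03,   # LOAD opcode
        "memory_write_enable": opcode == 0x23,  # STORE opcode
    }


for inst in (0x00A02223, 0x000022B7, 0x002081B3):  # S-type, lui, add
    print(f"{inst:#010x}", decode_fields(inst), control_signals(inst))
```

For the S-type word `0x00a02223`, this gives `rs1 = 0`, `rs2 = 10`, and `memory_write_enable = True`, which is the kind of check an extra test case could make.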
##### Waveform

**S-type** Waveform

**lui** Waveform

**add** Waveform



#### 3.3.3. Execute test
Based on `Execute.scala`, this stage is primarily composed of two modules: `ALU` and `ALU Control`. `ALU Control` takes the `opcode`, `funct3`, and `funct7` fields of the instruction and generates the ALU function code. The `ALU` then performs the selected operation on its two operands. The stage also produces the output signals `if_jump_flag` and `if_jump_address`.
##### ExecuteTest.scala
```scala
class ExecuteTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Execution of Single Cycle CPU")
  it should "execute correctly" in {
    test(new Execute).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x001101b3L.U) // x3 = x2 + x1
      var x = 0
      for (x <- 0 to 100) {
        val op1    = scala.util.Random.nextInt(429496729)
        val op2    = scala.util.Random.nextInt(429496729)
        val result = op1 + op2
        val addr   = scala.util.Random.nextInt(32)
        c.io.reg1_data.poke(op1.U)
        c.io.reg2_data.poke(op2.U)
        c.clock.step()
        c.io.mem_alu_result.expect(result.U)
        c.io.if_jump_flag.expect(0.U)
      }
      // beq test
      c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
      c.io.instruction_address.poke(2.U)
      c.io.immediate.poke(2.U)
      c.io.aluop1_source.poke(1.U)
      c.io.aluop2_source.poke(1.U)
      c.clock.step()
      // equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(9.U)
      c.clock.step()
      c.io.if_jump_flag.expect(1.U)
      c.io.if_jump_address.expect(4.U)
      // not equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(19.U)
      c.clock.step()
      c.io.if_jump_flag.expect(0.U)
      c.io.if_jump_address.expect(4.U)
    }
  }
}
```
I added the previously missing signal assignments for `alu.io.func`, `alu.io.op1`, and `alu.io.op2` in `Execute`. The test verifies three cases: an addition (`x3 = x2 + x1`), a `beq` with equal operands (branch taken), and a `beq` with unequal operands (branch not taken).
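The values the test expects can be reproduced with a small reference model. This is a Python sketch under stated assumptions, not the Chisel `Execute` module: `execute_add` and `execute_beq` are hypothetical helper names, and the jump address is modeled as `instruction_address + immediate`, which matches the `4` expected by the test for both the taken and not-taken cases.

```python
MASK32 = 0xFFFFFFFF


def execute_add(reg1_data, reg2_data):
    """ALU result for `add`: 32-bit wrapping sum of the two register values."""
    return (reg1_data + reg2_data) & MASK32


def execute_beq(instruction_address, immediate, reg1_data, reg2_data):
    """Return (if_jump_flag, if_jump_address) for `beq`."""
    jump_flag = 1 if reg1_data == reg2_data else 0
    jump_address = (instruction_address + immediate) & MASK32
    return jump_flag, jump_address


print(execute_add(7, 5))           # 12
print(execute_beq(2, 2, 9, 9))     # (1, 4)  branch taken
print(execute_beq(2, 2, 9, 19))    # (0, 4)  branch not taken
```

Note that the test draws operands below `429496729`, so the sum never exceeds 32 bits and the wrap-around in `execute_add` is not exercised there.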
##### Waveform
**X3=X1+X2**

**beq**

**not beq**

## 4. Adapting Homework 2 Assembly Code to MyCPU
### 4.1. Modify the original Homework 2 code
Because the single-cycle CPU lacks system calls, I removed the `ecall`, `rdcycle`, and `rdcycleh` instructions and added the `_start` and `loop` labels. The final `loop` spins in place, so the registers keep their values while the test inspects them.
```Assembly
.global itof_clz
.global _start
_start:
la t0, num
lw a0, 12(t0)
lw a1, 8(t0)
jal itof_clz
li t0,1
li t1,2
li t2,3
loop:
j loop
```
### 4.2. Test my RISC-V assembly
I wrote my test in `CPUTest.scala`; here is the test program:
```scala
class itof_clzTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "convert integer to floating point" in {
    test(new TestTopModule("itof_clz.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step(1000) // Avoid timeout
        c.io.mem_debug_read_address.poke((i * 4).U) // Assume the converted result is stored in memory sequentially
      }
      c.io.regs_debug_read_address.poke(10.U)
      println(s"${c.io.regs_debug_read_data.peek()}")
      c.io.regs_debug_read_data.expect(1088462400.U)
      c.io.regs_debug_read_address.poke(11.U)
      println(s"${c.io.regs_debug_read_data.peek()}")
      c.io.regs_debug_read_data.expect(0.U)
    }
  }
}
```
The main goal is to test whether an integer is correctly converted to its IEEE 754 floating-point representation.
Run the single test with:
```shell
$sbt "testOnly riscv.singlecycle.itof_clzTest"
```
Running the test produces the following success message:
```shell
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/chihen/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/chihen/ca2023-lab3/)
UInt<32>(1088462400)
UInt<32>(0)
[info] itof_clzTest:
[info] Single Cycle CPU
[info] - should convert integer to floating point
[info] Run completed in 18 seconds, 664 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 19 s, completed Nov 29, 2023, 8:36:22 PM
```
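| Input | Output |
| -------- | -------- |
| 0x84f2 | UInt<32>(1088462400) |

I validated the output with an IEEE 754 converter, and it is correct: `1088462400` is `0x40E09E40`, the upper 32 bits of the double-precision encoding of `0x84f2` (34034).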
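The same check can be done offline instead of with an online converter. The following Python sketch assumes that register `x10` holds the upper 32 bits and `x11` the lower 32 bits of a double-precision result; that layout is inferred from the expected values in the test, not stated by the assembly source. The helper name `double_words` is hypothetical.

```python
import struct


def double_words(value):
    """Return the (upper, lower) 32-bit words of value encoded as an IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", float(value)))[0]
    return bits >> 32, bits & 0xFFFFFFFF


high, low = double_words(0x84F2)
print(high, low)  # 1088462400 0
print(hex(high))  # 0x40e09e40
```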
### 4.3. Using Verilator to Run the Assembly and Visualize Waveforms
Use the following commands to build the Verilator simulation executable of the CPU and run the program:
```shell
$make verilator
$./run-verilator.sh -instruction src/main/resources/itof_clz.asmbin -time 4000 -vcd itofclz01.vcd
```
Output:
```shell
-time 4000
-memory 1048576
-instruction src/main/resources/itof_clz.asmbin
[-------------------->] 100%
```
#### Waveform
An online [RISC-V instruction encoder/decoder](https://luplab.gitlab.io/rvcodecjs/#q=sw+a0,+0(sp)&abi=false&isa=AUTO) makes it easy to see which registers and immediates an instruction encodes, which helps when locating the corresponding signals in the waveform.
* R-type
**sub a3,a3,a2**
Assembly =**sub x13, x13, x12**
Binary =`0100 0000 1100 0110 1000 0110 1011 0011`
Hexadecimal =`0x40c686b3`
ID stage

`io_reg_write_enable` indicates whether the instruction writes its result back to the register file (the `io_` prefix comes from the module's `io` bundle). It is asserted for `R-type` instructions.
EX stage

The `alu_op` signals hold the values read from the source registers, so the ALU is ready to operate on them.
Reg

The register file is written on the next clock edge, so the new value appears in the register after the clock advances.
* I-type
**lw a0, 12(t0)**
Assembly =**lw x10, 12(x5)**
Binary =`0000 0000 1100 0010 1010 0101 0000 0011`
Hexadecimal =`0x00c2a503`
ID stage

`mem_read_enable` and `reg_write_enable` are both set: the instruction reads data from memory and writes it into a register.
EX stage

Here we can see the memory address `00001308`, which is computed from the base register and the immediate offset.
Reg stage

As before, the register file is written on the next clock edge, so the loaded value appears in the register after the clock advances.
* S-type
**sw a0, 0(sp)**
Assembly =**sw x10, 0(x2)**
Binary =`0000 0000 1010 0001 0010 0000 0010 0011`
Hexadecimal =`0x00a12023`
ID stage

For `S-type` instructions, `io_reg_write_enable` should stay low, because a store writes to memory rather than to the register file.
EX stage

We can observe that the ALU operands are selected by the `alu_op_source` signals: for `sw`, the first operand comes from the base register and the second from the immediate, so the ALU result is the store address.
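
The three encodings listed in this section can also be reproduced without the online tool. This is a minimal Python sketch of the standard RV32I R-, I-, and S-type layouts; the function names `encode_r`, `encode_i`, and `encode_s` are hypothetical helpers, not part of the lab code.

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode=0x33):
    """R-type: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_i(imm, rs1, funct3, rd, opcode):
    """I-type: imm[11:0] | rs1 | funct3 | rd | opcode."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_s(imm, rs2, rs1, funct3, opcode=0x23):
    """S-type: imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode."""
    imm &= 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | ((imm & 0x1F) << 7) | opcode


print(f"{encode_r(0x20, 12, 13, 0, 13):#010x}")  # sub x13, x13, x12 -> 0x40c686b3
print(f"{encode_i(12, 5, 2, 10, 0x03):#010x}")   # lw  x10, 12(x5)   -> 0x00c2a503
print(f"{encode_s(0, 10, 2, 2):#010x}")          # sw  x10, 0(x2)    -> 0x00a12023
```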
## 5. Conclusion
Through this practical assignment, I have come to realize my own shortcomings and have learned a new programming language. Going through Lab 3 step by step to understand the architecture of a `single-cycle CPU` has given me a deeper understanding of the essence of computer architecture and its design. Perhaps in the future, there may be assignments related to `GPU` design that will allow us to delve even further into the implications and principles behind computer components. I also look forward to continuously learning through the guidance of our teacher and pushing myself to bridge the significant gap between myself and those who excel in the field.
## 6. References
[Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/CvrOEhLKSxOJKdblTjhEqQ?view)
[Single-Cycle Processor](https://hackmd.io/@joanne8826/S1jWF0it8)
[Building a RISC-V Processor](https://docs.google.com/presentation/d/1SbeyDTycsb97201QvzxGa4CmE9bmRDyd/edit#slide=id.p2)
[Datapath Control](https://docs.google.com/presentation/d/1UvXegiqDEGa5IOWMnnybxK4jftMY7MOF/edit#slide=id.p3)
[Chisel Breakdown 3](https://docs.google.com/presentation/d/1gMtABxBEDFbCFXN_-dPyvycNAyFROZKwk-HMcnxfTnU/edit#slide=id.p)