# Assignment 3: Single-cycle RISC-V CPU
contributed by < [`paulpeng-popo`](https://github.com/paulpeng-popo/ca2023-lab3) >
## Prerequisites
To avoid affecting the host environment, I set up a container as the experimental environment for [Assignment 3](https://hackmd.io/@sysprog/2023-arch-homework3).
I use Docker to build the container.
```Dockerfile
FROM arm64v8/ubuntu:22.04
# set the working directory
WORKDIR /root
# set the environment variable
ENV DEBIAN_FRONTEND=noninteractive
# update the repository sources list
RUN apt update
# install sudo
RUN apt install sudo -y
# create a new user as popo
RUN useradd -ms /bin/bash popo
# add the user to sudo group
RUN usermod -aG sudo popo
# set user popo as sudoer without password
RUN echo "popo ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers
# change the user to popo
USER popo
# change the working directory to home
WORKDIR /home/popo
# set the environment variable
ENV HOME=/home/popo
ENV USER=popo
ENV PATH=$PATH:/home/popo/.local/bin
# install packages
RUN sudo apt install git wget curl xauth dbus-x11 -y
ENTRYPOINT ["/bin/bash"]
```
Then follow the instructions in [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p#Prerequisites) to install the necessary dependencies and tools.
```sh
$ sudo apt install build-essential verilator gtkwave
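$ curl -s "https://get.sdkman.io" | bash
# load SDKMAN! into the current shell so that the `sdk` command is available
$ source "$HOME/.sdkman/bin/sdkman-init.sh"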
$ sdk install java 11.0.21-tem
$ sdk install sbt
```
```sh
# install scala on aarch64 linux
$ curl -fL https://github.com/VirtusLab/coursier-m1/releases/latest/download/cs-aarch64-pc-linux.gz | gzip -d > cs && chmod +x cs && ./cs setup
# change to scala 2
$ cs install scala:2.13.12 scalac:2.13.12
```
## Hello World in Chisel
```scala
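// Hello.scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  // The LED toggles every CNT_MAX + 1 clock cycles
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg = RegInit(0.U(32.W)) // cycle counter
  val blkReg = RegInit(0.U(1.W))  // current LED state

  cntReg := cntReg + 1.U
  // On reaching CNT_MAX, reset the counter and toggle the LED
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}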
```
The module has two registers, `cntReg` and `blkReg`, both initialized to zero. `cntReg` is a 32-bit counter that increments by 1 every clock cycle. When `cntReg` reaches `CNT_MAX`, it resets to zero and `blkReg` toggles.
I tested [`Hello.scala`](https://hackmd.io/@sysprog/r1mlr3I7p#Hello-World-in-Chisel) on [chisel-template](https://github.com/freechipsproject/chisel-template).
For testing convenience, I reduced the constant from 50,000,000 to 10, which makes the changes in the output easier to observe.
`CNT_MAX` therefore becomes 10 / 2 - 1 = 4.
Then I created `HelloSpec.scala` in `src/test/scala/example`:
```scala
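// HelloSpec.scala
import chisel3._
import chiseltest._
import org.scalatest.freespec.AnyFreeSpec

class HelloSpec extends AnyFreeSpec with ChiselScalatestTester {
  "Hello" in {
    test(new Hello) { hello =>
      for (clk <- 0 until 10) {
        hello.clock.step(1)
        // peek() only reads the output; nothing is asserted here
        val led = hello.io.led.peek()
        println(s"clk: $clk, led: $led")
      }
    }
  }
}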
```
The test steps the simulation forward for 10 clock cycles and prints the cycle index `clk` and the value of `led` after each step. It only prints the values and makes no assertions, so correctness is judged by inspecting the output.
```sh
$ sbt "testOnly example.HelloSpec"
```
The output would look like:
```
clk: 0, led: UInt<1>(0)
clk: 1, led: UInt<1>(0)
clk: 2, led: UInt<1>(0)
clk: 3, led: UInt<1>(0)
clk: 4, led: UInt<1>(1)
clk: 5, led: UInt<1>(1)
clk: 6, led: UInt<1>(1)
clk: 7, led: UInt<1>(1)
clk: 8, led: UInt<1>(1)
clk: 9, led: UInt<1>(0)
```
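As a cross-check on the trace above, here is a minimal plain-Scala model of the same counter and toggle logic. It is only a sketch: `BlinkModel` and `trace` are illustrative names that do not exist in the lab code, and the model uses ordinary integers instead of Chisel registers.

```scala
// Plain-Scala sketch of the Hello module's behaviour (no Chisel required).
// BlinkModel and trace are hypothetical names used only for illustration.
object BlinkModel {
  /** Returns the led value observed after each of `cycles` clock steps. */
  def trace(cntMax: Int, cycles: Int): Seq[Int] = {
    var cnt = 0 // models cntReg
    var blk = 0 // models blkReg
    (0 until cycles).map { _ =>
      // Both registers update on the same clock edge, based on the old cnt.
      if (cnt == cntMax) { cnt = 0; blk = 1 - blk } else cnt += 1
      blk
    }
  }

  def main(args: Array[String]): Unit =
    trace(cntMax = 10 / 2 - 1, cycles = 10).zipWithIndex.foreach {
      case (led, clk) => println(s"clk: $clk, led: $led")
    }
}
```

With `cntMax = 4`, the model toggles `led` at `clk` 4 and again at `clk` 9, which agrees with the simulation output.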
### Enhancement
> Using `when` blocks in Hardware Description Language (HDL) designs is not necessarily something that should be avoided in all cases. However, in some situations, particularly when dealing with simple state machines or conditional assignments, it might be more readable and synthesizable to use multiplexers (muxes) instead of `when` blocks.
>
> The primary reason for preferring `muxes` in some cases is that they directly map to hardware multiplexing structures, which synthesis tools can often recognize and implement more efficiently. This is especially true for simple conditions or state machines where a `mux` can directly represent the selection of one value from several inputs.
>
> --- From ChatGPT ---
```diff=
- cntReg := cntReg + 1.U
- when(cntReg === CNT_MAX) {
- cntReg := 0.U
- blkReg := ~blkReg
- }
+ cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
+ blkReg := blkReg ^ (cntReg === CNT_MAX)
```
Here, a multiplexer selects whether the counter increments or resets to zero. At the same time, a bitwise `XOR` with the comparison result toggles `blkReg`: the comparison is 1 only in the cycle where the counter resets, so `blkReg` flips exactly then and keeps its value otherwise. This provides the same functionality without a `when` block.
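To check that the two formulations compute the same next state, the following plain-Scala sketch compares them for every reachable register value. The names `stepWhen`, `stepMux`, and `agree` are made up for this illustration, and 32-bit wrap-around is ignored because the counter never exceeds `CNT_MAX`.

```scala
// Sketch comparing the `when` style and the Mux/XOR style next-state logic.
// All names here are hypothetical; the model uses Ints instead of Chisel types.
object MuxEquivalence {
  // `when` style: default increment, overridden when cnt equals cntMax.
  def stepWhen(cnt: Int, blk: Int, cntMax: Int): (Int, Int) = {
    var nextCnt = cnt + 1
    var nextBlk = blk
    if (cnt == cntMax) { nextCnt = 0; nextBlk = blk ^ 1 } // ~blk on one bit
    (nextCnt, nextBlk)
  }

  // Mux/XOR style: one expression per register.
  def stepMux(cnt: Int, blk: Int, cntMax: Int): (Int, Int) = {
    val hit = if (cnt == cntMax) 1 else 0 // models (cntReg === CNT_MAX)
    (if (hit == 1) 0 else cnt + 1, blk ^ hit)
  }

  // True when both styles agree for every cnt in 0..cntMax and blk in {0, 1}.
  def agree(cntMax: Int): Boolean =
    (for (cnt <- 0 to cntMax; blk <- 0 to 1)
      yield stepWhen(cnt, blk, cntMax) == stepMux(cnt, blk, cntMax)).forall(identity)
}
```

Calling `MuxEquivalence.agree(4)` checks every state reachable with the reduced `CNT_MAX`.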
## Single Cycle RISC-V CPU (MyCPU)
### Overview of Implementation


1. Instruction Fetch: Fetching the instruction data from memory.
2. Decode: Understanding the meaning of the instruction and reading register data.
3. Execute: Calculating the result using the ALU.
4. Memory Access (load/store instructions): Reading from and writing to memory.
5. Write-back (for instructions that produce a register result, i.e. all except stores and branches): Writing the result back to registers.
### Check waveforms with [GTKWave](https://gtkwave.sourceforge.net/)
```sh
$ WRITE_VCD=1 sbt test
$ gtkwave test_run_dir/<xxx>/<xxx>.vcd
```
### Instruction Fetch

The instruction fetch stage does the following:
- Fetch the instruction from memory based on the current address in the PC register.
- Modify the value of the PC register to point to the next instruction.
The PC register is initialized to the entry address of the program. When the instruction is valid, the CPU fetches the instruction at the address held in the PC and then checks `jump_flag_id`. If the flag is set, the PC is updated to the address given by `jump_address_id`. Otherwise, the PC is incremented by 4 to point to the next sequential instruction.

The PC starts at address `0x1000`. In the first test case no jump occurs, so the PC advances to PC + 4 to fetch the next instruction. In the second test case a jump to address `0x1000` is taken, so the PC is set to `0x1000` on the next clock cycle.
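The PC update rule described above can be sketched in plain Scala as follows. `FetchModel` and `nextPc` are hypothetical names, and holding the PC while the instruction is not valid is an assumption of this sketch rather than something stated in the text.

```scala
// Sketch of the next-PC selection in the fetch stage (plain Scala, no Chisel).
// FetchModel and nextPc are illustrative names, not part of the lab code.
object FetchModel {
  val EntryAddress: Long = 0x1000L // initial PC, as stated in the text

  def nextPc(pc: Long, instructionValid: Boolean,
             jumpFlag: Boolean, jumpAddress: Long): Long =
    if (!instructionValid) pc          // assumption: PC holds until valid
    else if (jumpFlag) jumpAddress     // jump_flag_id set: take jump_address_id
    else (pc + 4) & 0xFFFFFFFFL        // sequential fetch, kept to 32 bits

  def main(args: Array[String]): Unit = {
    val afterFirst = nextPc(EntryAddress, instructionValid = true,
                            jumpFlag = false, jumpAddress = 0L)
    val afterJump = nextPc(afterFirst, instructionValid = true,
                           jumpFlag = true, jumpAddress = EntryAddress)
    println(f"0x$afterFirst%x, then 0x$afterJump%x")
  }
}
```

The two calls in `main` mirror the two test cases: one sequential step from `0x1000`, followed by a jump back to `0x1000`.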
### Instruction Decode

The decode stage does the following:
- Read the opcode to determine instruction type and field lengths
- Read in data from all necessary registers
    - for `add`, read two registers
    - for `addi`, read one register
    - for `jal`, no reads are necessary
- Output control signals
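The field extraction behind these steps can be sketched in plain Scala using the standard RV32I bit positions. `FieldModel`, `Fields`, `bits`, and `registerReads` are hypothetical names for illustration only; `registerReads` covers just the three instructions listed above and returns `None` for anything else.

```scala
// Sketch of RV32I field extraction (plain Scala, no Chisel).
// All names are illustrative; bit positions follow the RV32I base encoding.
object FieldModel {
  final case class Fields(opcode: Int, rd: Int, funct3: Int,
                          rs1: Int, rs2: Int, funct7: Int)

  // Extracts bits hi..lo (inclusive) of a 32-bit instruction word.
  def bits(word: Long, hi: Int, lo: Int): Int =
    ((word >>> lo) & ((1L << (hi - lo + 1)) - 1)).toInt

  def decode(instruction: Long): Fields = Fields(
    opcode = bits(instruction, 6, 0),
    rd     = bits(instruction, 11, 7),
    funct3 = bits(instruction, 14, 12),
    rs1    = bits(instruction, 19, 15),
    rs2    = bits(instruction, 24, 20),
    funct7 = bits(instruction, 31, 25)
  )

  // Register reads needed for the three examples named in the list.
  def registerReads(opcode: Int): Option[Int] = opcode match {
    case 0x33 => Some(2) // R-type such as add
    case 0x13 => Some(1) // I-type arithmetic such as addi
    case 0x6f => Some(0) // jal
    case _    => None    // not covered by this sketch
  }
}
```

For example, `FieldModel.decode(0x002081b3L)` yields opcode `0x33` with `rd = 3`, `rs1 = 1`, and `rs2 = 2`, i.e. `add x3, x1, x2`.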

At this stage, 8 signals need to be generated. The remaining two outputs, `memory_read_enable` and `memory_write_enable`, have not been implemented yet.
These two signals are associated with load and store instructions.
To complete the implementation, set `memory_read_enable` to `true.B` for L-type (load) instructions and `memory_write_enable` to `true.B` for S-type (store) instructions; in all other cases both keep the default value `false.B`.
:::warning
A warning occurs during compilation:
> method apply in object **MuxLookup is deprecated** (since Chisel 3.6): Use **MuxLookup(key, default)(mapping)** instead

To remove this deprecation warning, move the mapping sequence out of the first argument list and into a second, curried argument list:
```scala
val immediate = MuxLookup(
  opcode,
  Cat(..., ...)
) {
  IndexedSeq(
    ...,
    ...
  )
}
```
:::

Three test cases:
- 0x00a02223 (S-type)
- 0x000022b7 (lui)
- 0x002081b3 (add)
```scala
object InstructionTypes {
  val L  = "b0000011".U // 0x3, loads
  val I  = "b0010011".U // 0x13
  val S  = "b0100011".U // 0x23, stores
  val RM = "b0110011".U // 0x33
  val B  = "b1100011".U // 0x63
}
```
According to our design specification, when the opcode is `0x3`, the signal `memory_read_enable` should be set to `true.B`, and when the opcode is `0x23`, the signal `memory_write_enable` should be set to `true.B`. The waveform above confirms this behavior.
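As a quick sanity check of this rule against the three test instructions, here is a plain-Scala sketch. `MemEnableModel` and its members are hypothetical names, and the model works on integers rather than Chisel signals.

```scala
// Sketch of the memory enable decoding (plain Scala, no Chisel).
// MemEnableModel and its members are illustrative names only.
object MemEnableModel {
  val LoadOpcode  = 0x03 // InstructionTypes.L
  val StoreOpcode = 0x23 // InstructionTypes.S

  // The opcode is the low 7 bits of the instruction word.
  def opcode(instruction: Long): Int = (instruction & 0x7f).toInt

  def memoryReadEnable(instruction: Long): Boolean =
    opcode(instruction) == LoadOpcode

  def memoryWriteEnable(instruction: Long): Boolean =
    opcode(instruction) == StoreOpcode

  def main(args: Array[String]): Unit =
    for (inst <- Seq(0x00a02223L, 0x000022b7L, 0x002081b3L))
      println(f"0x$inst%08x read=${memoryReadEnable(inst)} write=${memoryWriteEnable(inst)}")
}
```

Of the three test instructions, only `0x00a02223` (S-type) asserts the write enable, and none of them is a load, so the read enable stays low throughout.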
### Execution

The execution stage does the following:
- Perform ALU computation.
- Determine if there is a branch.
The ALU's function select, `alu.io.func`, is driven by the output of the ALU control module, `alu_ctrl.io.alu_funct`. Each of the ALU's two operands is selected by a Mux, controlled by `aluop1_source` and `aluop2_source` respectively.

The first test cases use the **ADD** instruction to check that the ALU works correctly. The final two tests use the **BEQ** instruction and cover both the taken and the not-taken case. When the branch is taken, the jump target is PC + 2, which evaluates to `0x4` (this implies the test sets the PC to 2).
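The operand selection and branch decision can be sketched in plain Scala as below. This is an illustration under stated assumptions: `ExecuteModel` and its functions are hypothetical names, the operand choices (register or PC for operand 1, register or immediate for operand 2) are assumed rather than taken from the text, and the PC and immediate values of 2 are chosen only to reproduce the `0x4` target.

```scala
// Sketch of the execute stage for ADD and BEQ (plain Scala, no Chisel).
// Names and operand-source options are assumptions made for illustration.
object ExecuteModel {
  private def u32(x: Long): Long = x & 0xFFFFFFFFL // keep results to 32 bits

  // Two muxes pick the ALU operands (assumed choices: PC/reg1 and imm/reg2).
  def operands(op1FromPc: Boolean, op2FromImm: Boolean,
               pc: Long, imm: Long, reg1: Long, reg2: Long): (Long, Long) =
    (if (op1FromPc) pc else reg1, if (op2FromImm) imm else reg2)

  // ADD: the ALU sums its two operands.
  def add(op1: Long, op2: Long): Long = u32(op1 + op2)

  // BEQ: taken when the registers are equal; the target is PC + immediate.
  def beq(pc: Long, imm: Long, reg1: Long, reg2: Long): (Boolean, Long) = {
    val (op1, op2) = operands(op1FromPc = true, op2FromImm = true, pc, imm, reg1, reg2)
    (reg1 == reg2, add(op1, op2))
  }
}
```

With equal register values, `ExecuteModel.beq(2L, 2L, 9L, 9L)` reports a taken branch to address 4, and with unequal values the jump flag is false.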
### Combining into a CPU
With the modules for each stage complete, the remaining step is to connect their inputs and outputs. Once they are wired together, the single-cycle RISC-V CPU is complete.
```
[info] Run completed in 10 seconds, 938 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 12 s, completed Dec 3, 2023, 12:05:01 AM
```
## Make handwritten RISC-V assembly code function correctly on "MyCPU"