# Assignment3: single-cycle RISC-V CPU
contributed by < [`st10740`](https://github.com/st10740) >
## Environment
- VMware Ubuntu 22.04
## Chisel Bootcamp
### Use Docker to Run Chisel Bootcamp
After installing Docker Engine on Ubuntu, I ran `docker run -it --rm -p 8888:8888 sysprog21/chisel-bootcamp` to download the `chisel-bootcamp` Docker image and run it, but I got the following error message:
```shell
docker: permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/containers/create": dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'.
```
According to ChatGPT, this error means that the system denied access to the Docker daemon socket (`/var/run/docker.sock`) because of insufficient permissions. This typically happens when the user has not been added to the `docker` group, which has the permissions needed to access the Docker daemon. I took the following steps to resolve the problem.
- Add the User to the Docker Group:
```shell
$ sudo usermod -aG docker $USER
```
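- Make sure the Docker daemon is running (the group change itself only takes effect in a new login session, e.g. after logging out and back in, or after running `newgrp docker`):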
```shell
$ sudo systemctl start docker
```
This solved the problem at first, but I occasionally hit the same error again when running the command. As a workaround, I prefixed the original command with `sudo`:
```shell
$ sudo docker run -it --rm -p 8888:8888 sysprog21/chisel-bootcamp
```
However, after I successfully ran the Docker image, I couldn't use the IP, port and token it printed to open the Jupyter Notebook page. I also tried pulling `ucbbar/chisel-bootcamp` instead, but that did not work either. I suspected a problem with the Jupyter Notebook IP setting in Docker, but that is already handled in the `Dockerfile` by this line:
```dockerfile
CMD jupyter notebook --no-browser --ip 0.0.0.0 --port 8888
```
Surprisingly, the problem went away after I restarted the VM and ran `ucbbar/chisel-bootcamp` the next day.
### My Note
:::spoiler 2.1_first_module
#### Problem
I encountered the same problem as the issue [Failed to resolve ivy dependencies:/coursier_cache/.structure.lock (Permission denied)](https://github.com/freechipsproject/chisel-bootcamp/issues/140) and followed the recommendation below to resolve it.
- Check running container id
```shell
$ docker ps -q
```
Output : c299bdcceac4
- Change the owner of `/coursier_cache` to `bootcamp:bootcamp` (the container must be running while this command is executed)
```shell
$ docker exec --user root -it c299bdcceac4 chown -R bootcamp:bootcamp /coursier_cache
```
:::
:::spoiler 2.2_comb_logic
- Debugging with `print` statements
  - Print during simulation
    - `printf("Print during simulation: Input is %d\n", io.in)`
    - `printf(p"Print during simulation: IO is $io\n")`
  - Print during generation
    - `println(s"Print during generation: Input is ${io.in}")`
  - Print during testing
    - `println(s"Print during testing: Input is ${c.io.in.peek()}")`
```scala
class PrintingModule extends Module {
  val io = IO(new Bundle {
    val in = Input(UInt(4.W))
    val out = Output(UInt(4.W))
  })
  io.out := io.in

  printf("Print during simulation: Input is %d\n", io.in)
  // chisel printf has its own string interpolator too
  printf(p"Print during simulation: IO is $io\n")

  println(s"Print during generation: Input is ${io.in}")
}

test(new PrintingModule) { c =>
  c.io.in.poke(3.U)
  c.clock.step(5) // circuit will print
  println(s"Print during testing: Input is ${c.io.in.peek()}")
}
```
:::
:::spoiler 2.3_control_flow
#### `when`, `elsewhen`, `otherwise`
```scala
when(someBooleanCondition) {
...
}.elsewhen(someOtherBooleanCondition) {
...
}.otherwise {
...
}
```
#### `Wire`
- `Wire` defines a circuit component that can appear on the right hand side or left hand side of a connect `:=` operator.
- Can be used to hold an intermediate value within the combinational logic (unlike a register, it does not keep state across clock cycles).
#### `Enum`
```scala
val idle :: coding :: writing :: grad :: Nil = Enum(4)
```
idle -> 0, coding -> 1, writing -> 2, grad -> 3
:::
:::spoiler 2.4_sequential_logic
#### Registers
- `Reg`
- `val myReg = Reg(UInt(12.W))`
- e.g.,
```scala
val register = Reg(UInt(12.W))
register := io.in + 1.U
io.out := register
```
- `RegNext`
- e.g.,
```scala
io.out := RegNext(io.in + 1.U)
```
- `RegInit`
- e.g.,
```scala
// Two equivalent ways to declare a 12-bit register that resets to 0 (use one or the other):
val myReg = RegInit(UInt(12.W), 0.U)
val myReg = RegInit(0.U(12.W))
```
- The register is still initialized to random junk before reset is called.
#### Registers Test
`step(1)` tells the test harness to tick the clock once, which will cause the register to pass its input to its output.
```scala
test(new RegisterModule) { c =>
  for (i <- 0 until 100) {
    c.io.in.poke(i.U)
    c.clock.step(1)
    c.io.out.expect((i + 1).U)
  }
}
```
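To make the one-cycle delay concrete, here is a minimal software sketch in C (not Chisel) of the behavior this test checks. It assumes `RegisterModule` is the 12-bit `register := io.in + 1.U` example shown above; `reg_model_t` and `reg_step` are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* Hypothetical software model of a 12-bit register whose next value is in + 1. */
typedef struct {
    uint16_t q; /* current register output (only the low 12 bits are used) */
} reg_model_t;

/* Model one rising clock edge: the register captures (in + 1), truncated to 12 bits. */
void reg_step(reg_model_t *r, uint16_t in)
{
    r->q = (uint16_t)((in + 1u) & 0x0FFFu);
}
```

Calling `reg_step(&r, i)` plays the role of `poke(i.U)` followed by `step(1)`; afterwards `r.q` holds the value that `expect((i + 1).U)` checks.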
:::
:::spoiler 2.5_exercise
None.
:::
::: spoiler 2.6_chiseltest
#### `Decoupled` interfaces
`Decoupled` takes a chisel data type and provides it with `ready` and `valid` signals.
The producer drives `valid` and the consumer drives `ready`; a transfer takes place in a cycle where both are high.
#### ChiselTest
- `enqueueNow` : Adds (enqueues) one element to a Decoupled input interface.
- `expectDequeueNow` : Removes (dequeues) one element from a Decoupled output interface.
- `enqueueSeq` : Adds (enqueues) the elements of the Seq to a Decoupled input interface, one at a time, until the sequence is exhausted.
- `expectDequeueSeq` : Removes (dequeues) elements from a Decoupled output interface, one at a time, and compares each one to the next element of the Seq.
#### `Bundle` literal notation
```scala
class GcdInputBundle(val w: Int) extends Bundle {
  val value1 = UInt(w.W)
  val value2 = UInt(w.W)
}
```
Initialize a bundle `GcdInputBundle` and assign values to its fields `value1` and `value2`.
```scala
new GcdInputBundle(16).Lit(_.value1 -> x.U, _.value2 -> y.U)
```
:::
::: spoiler 3.1_parameters
:::
::: spoiler 3.2_collections
:::
::: spoiler 3.2_interlude
:::
::: spoiler 3.3_higher-order_functions
:::
::: spoiler 3.4_functional_programming
:::
::: spoiler 3.5_object_oriented_programming
:::
::: spoiler 3.6_types
:::
### Hello World in Chisel
The code describes a module that toggles its output `led` at a frequency determined by a counter. The `CNT_MAX` value sets the toggle period: every time the counter reaches `CNT_MAX`, it is reset to zero and the output state is inverted.
```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U

  val cntReg = RegInit(0.U(32.W))
  val blkReg = RegInit(0.U(1.W))

  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
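The timing can be checked with a small software model. The following C sketch mirrors the two registers of `Hello`; `hello_model_t` and `hello_step` are hypothetical names used only here. It shows that the output toggles once every `CNT_MAX + 1` = 25,000,000 clock cycles. If the clock were 50 MHz (an assumption, the notes above do not state the clock frequency), that would be one toggle every 0.5 s.

```c
#include <stdint.h>

#define CNT_MAX (50000000u / 2u - 1u) /* same constant as in the Chisel module */

/* Hypothetical software model of the two registers in Hello. */
typedef struct {
    uint32_t cnt; /* models cntReg */
    uint8_t  led; /* models blkReg (1 bit) */
} hello_model_t;

/* Model one rising clock edge. */
void hello_step(hello_model_t *h)
{
    if (h->cnt == CNT_MAX) {
        h->cnt = 0;    /* wrap the counter */
        h->led ^= 1u;  /* toggle the 1-bit output */
    } else {
        h->cnt += 1;
    }
}
```

Starting from the reset state `{0, 0}`, the model keeps `led` at 0 for the first 24,999,999 steps and flips it on step 25,000,000.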
## Single-cycle RISC-V CPU

A few lines of code need to be added to [ca2023-lab3](https://github.com/sysprog21/ca2023-lab3) to complete MyCPU. Below are the files I added code to:
- `src/main/scala/riscv/core/InstructionFetch.scala`
- `src/main/scala/riscv/core/InstructionDecode.scala`
- `src/main/scala/riscv/core/Execute.scala`
- `src/main/scala/riscv/core/CPU.scala`
After adding the missing code, I ran all the tests to ensure that it performs correctly.
```shell
$ sbt test
```
And the last few lines of output are:
```shell
[info] Run completed in 1 minute, 13 seconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 77 s (01:17), completed Dec 2, 2023, 2:29:05 PM
```
The full changes can be seen in my [fork repository](https://github.com/st10740/ca2023-lab3).
### Instruction Fetch

The test case in `src/test/scala/riscv/singlecycle/InstructionFetchTest.scala` performs the following steps:
- Run the test 100 times; each iteration randomly picks either `0` or `1`.
  - If it's `0` : set the input value of `io.jump_flag_id` to `false.U`, indicating that no jump to another instruction should occur. So `pc` should become `pc+4`.
  - If it's `1` : set the input values of `io.jump_flag_id` to `true.U` and `io.jump_address_id` to `0x1000`, indicating a jump to address `0x1000`. So `pc` should become `0x1000`.

The purpose of this unit test is to verify that the PC is updated correctly both for jump instructions and for all other instructions.
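The rule being tested can be written as a one-line software model. This is only a sketch of the expected behavior, not the code in `InstructionFetch.scala`; the function name `next_pc` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the PC update:
   take the jump target if the flag is set, otherwise move to the next instruction. */
uint32_t next_pc(uint32_t pc, bool jump_flag, uint32_t jump_address)
{
    return jump_flag ? jump_address : pc + 4u;
}
```

For example, `next_pc(0x1004, true, 0x1000)` yields `0x1000`, while `next_pc(0x1000, false, 0x1000)` yields `0x1004`.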
#### Waveform

The `io_instruction_address` currently stores the address of the current instruction as `00001004`. With `io_jump_flag_id` set to `1` and `io_jump_address_id` set to `00001000`, it indicates that upon the next rising edge of the clock, `io_instruction_address` will be updated to `00001000`.
### Instruction Decode

The test case in `src/test/scala/riscv/singlecycle/InstructionDecoderTest.scala` performs the following steps:
- Set the input value of `io.instruction` to `0x00a02223L.U`, which encodes the instruction `sw x10, 4(x0)`.
  - Check that the values of `io.ex_aluop1_source`, `io.ex_aluop2_source`, `io.regs_reg1_read_address` and `io.regs_reg2_read_address` are correct.
- Set the input value of `io.instruction` to `0x000022b7L.U`, which encodes the instruction `lui x5, 0x00002`.
  - Check that the values of `io.regs_reg1_read_address`, `io.ex_aluop1_source` and `io.ex_aluop2_source` are correct.
- Set the input value of `io.instruction` to `0x002081b3L.U`, which encodes the instruction `add x3, x1, x2`.
  - Check that the values of `io.ex_aluop1_source` and `io.ex_aluop2_source` are correct.

The purpose of this unit test is to confirm that the control signals for ALUOp1 and ALUOp2 and the register IDs of the source registers are decoded correctly. However, it does not verify other signals and register information, such as the register ID of the destination register or the immediate value.
#### Waveform

The `io_instruction` stores the current instruction as `00A02223` (`sw x10, 4(x0)`: `rs1` is `x0`, `rs2` is `x10`, and the offset is 4).
```
0000000 | 01010 | 00000 | 010 | 00100 | 0100011
imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode
```
The decoder outputs are set to the signal values that correspond to `io_instruction`.
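The field split shown in the table can be reproduced with a few shifts and masks. The C sketch below extracts the S-type fields according to the standard RISC-V encoding; `stype_t` and `decode_stype` are hypothetical names, and the code is an illustration rather than the logic in `InstructionDecode.scala`.

```c
#include <stdint.h>

/* Fields of an S-type (store) instruction. */
typedef struct {
    uint32_t opcode; /* bits 6:0   */
    uint32_t funct3; /* bits 14:12 */
    uint32_t rs1;    /* bits 19:15, base address register */
    uint32_t rs2;    /* bits 24:20, register holding the data to store */
    int32_t  imm;    /* sign-extended 12-bit offset */
} stype_t;

/* Hypothetical helper: split a 32-bit S-type instruction into its fields. */
stype_t decode_stype(uint32_t inst)
{
    stype_t d;
    d.opcode = inst & 0x7Fu;
    d.funct3 = (inst >> 12) & 0x7u;
    d.rs1    = (inst >> 15) & 0x1Fu;
    d.rs2    = (inst >> 20) & 0x1Fu;
    /* imm[11:5] comes from bits 31:25, imm[4:0] from bits 11:7 */
    uint32_t imm12 = ((inst >> 25) << 5) | ((inst >> 7) & 0x1Fu);
    /* sign-extend from bit 11 */
    d.imm = (int32_t)(imm12 ^ 0x800u) - 0x800;
    return d;
}
```

Applied to `0x00A02223`, this gives opcode `0x23`, `funct3` = 2, `rs1` = 0, `rs2` = 10 and an offset of 4, matching the table.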
### Execution

The test case in `src/test/scala/riscv/singlecycle/ExecuteTest.scala` performs the following steps:
- Set the input value of `io.instruction` to `0x001101b3L.U`, which encodes the instruction `add x3, x2, x1`.
  - Run the test 100 times; each iteration draws two random integers between 0 and 429496729 and uses them as the inputs `io.reg1_data` and `io.reg2_data`. Finally, verify that the output matches the expected result, which is the sum of the two numbers.
- Set the input value of `io.instruction` to `0x00208163L.U`, together with the other input signals, to represent the instruction `beq x1, x2, pc+2`.
  - Equal case : set `io.reg1_data` to `9.U` and `io.reg2_data` to `9.U` --> check that the PC jumps to the target address
  - Not equal case : set `io.reg1_data` to `9.U` and `io.reg2_data` to `19.U` --> check that the branch is not taken, so the PC advances to PC + 4

The purpose of these unit tests is to verify that the ALU computes the correct result from its inputs for R-type instructions, and that B-type instructions update the PC to the correct address. However, they do not cover other combinations of ALUOp1 and ALUOp2. My first version of the missing code in `Execute.scala` was wrong, yet it still passed `ExecuteTest.scala`, so I assumed it was correct. Only when I ran the entire test suite with `sbt test` did three tests fail, which made me realize that the code in `Execute.scala` was incorrect. I would therefore like to add more unit tests that cover more cases in the future.
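The branch behavior checked here can be summarized in a short software model. This C sketch is an assumption about the intended behavior of `beq`, not the code in `Execute.scala`; `branch_result_t` and `execute_beq` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of evaluating a branch: whether to jump, and where to. */
typedef struct {
    bool     jump_flag;
    uint32_t jump_address;
} branch_result_t;

/* Hypothetical model of beq: the target pc + imm is always computed,
   and the flag tells the fetch stage whether to use it. */
branch_result_t execute_beq(uint32_t pc, int32_t imm, uint32_t reg1, uint32_t reg2)
{
    branch_result_t r;
    r.jump_flag = (reg1 == reg2);
    r.jump_address = pc + (uint32_t)imm;
    return r;
}
```

With illustrative inputs `pc = 2` and `imm = 2` (values chosen here, not taken from the notes above), the equal case `(9, 9)` sets the flag and targets address 4, while the unequal case `(9, 19)` leaves the flag cleared so the fetch stage falls through to PC + 4.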
#### Waveform

At this moment, the `io_instruction` stores the current instruction as `001101B3` which is an `add` instruction. The `io_reg1_data` stores a random number `13A44FBC` and `io_reg2_data` stores `0D187E2C`. The `io_mem_alu_result` stores the result of summing these two numbers, `20BCCDE8`.
### Register File

The test case in `src/test/scala/riscv/singlecycle/RegisterFileTest.scala` performs the following steps:
- Write `0xdeadbeefL.U` into register `x1` and then read from the same register to ensure that the write and read operations function correctly.
- Write `0xdeadbeefL.U` into register `x0` and then read from the same register to ensure that `x0` is always zero.
#### Waveform
##### Read the Written Content

The `io_write_enable` is set to `1`, the `io_write_address` is set to `02`, the `io_write_data` is set to `DEADBEEF`, and the `io_read_address1` is set to `02` during the 2ps interval, which corresponds to the falling edge of the clock. At the next rising edge of the clock, the `io_read_data1` is updated to `DEADBEEF`, the value previously written to register `02`.
##### x0 Is Always Zero

Although `io_write_enable` is set to `1`, `io_write_address` to `00`, and `io_write_data` to `DEADBEEF` during the 2ps interval (an attempt to write `DEADBEEF` to register `00`), the value read from register `00` at the next rising edge of the clock remains `00000000`. This is because register `x0` always reads as zero.
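Both waveforms follow from two simple rules: writes to register 0 are ignored, and reads of register 0 return zero. The C sketch below models them; `regfile_t`, `rf_write` and `rf_read` are hypothetical names, and the code illustrates the expected behavior rather than the actual `RegisterFile.scala`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of a 32-entry register file. */
typedef struct {
    uint32_t regs[32];
} regfile_t;

/* Write data to a register; writes to x0 (and out-of-range addresses) are ignored. */
void rf_write(regfile_t *rf, bool write_enable, uint32_t addr, uint32_t data)
{
    if (write_enable && addr != 0 && addr < 32) {
        rf->regs[addr] = data;
    }
}

/* Read a register; x0 always reads as zero. */
uint32_t rf_read(const regfile_t *rf, uint32_t addr)
{
    return (addr == 0 || addr >= 32) ? 0u : rf->regs[addr];
}
```

Writing `0xDEADBEEF` to register 2 and reading it back returns `0xDEADBEEF`, whereas the same write to register 0 still reads back as 0.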
### CPU
## Handwritten RISC-V Assembly Code in Homework2 on the Single-Cycle RISC-V CPU
### Modify Homework2 C Code
I modified [Homework2 C code](https://github.com/st10740/Computer-Architecture-HW/blob/main/HW2/c/palindrome_detection_using_CLZ_modified.c) `palindrome_detection_using_CLZ_modified.c` to ensure its proper functionality on MyCPU by removing `RDCYCLE/RDCYCLEH` instructions and `printf`.
### Prepare GNU Toolchain for RISC-V
I followed the steps introduced in [Lab2](https://hackmd.io/@sysprog/SJAR5XMmi) to prepare the GNU Toolchain for compiling `palindrome_detection_using_CLZ_modified.c` into an executable file that MyCPU can execute.
### Generate the RISC-V Programs Utilized for Unit Tests
- Copy `palindrome_detection_using_CLZ_modified.c` into `csrc` and rename it to `hw2.c`
- Change to the `csrc` directory
- Modify `Makefile` in `csrc` to compile `hw2.c` into `hw2.asmbin` for unit test usage
```diff
BINS =
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin \
+ hw2.asmbin
```
- Run the `make` command
```shell
$ make
```
- Run the `make update` command to copy the newly generated `hw2.asmbin` file into `src/main/resources`
```shell
$ make update
```
The summary of what `Makefile` does can be seen below:
`.c` $\to$ `.o` $\to$ `.S` $\to$ `.elf` $\to$ `.asmbin`
Because only the code and data segments of the ELF file are required, the final step uses the `objcopy` tool to copy those segments into a separate file, which then contains only binary code and data.
You can find a detailed explanation of what the `Makefile` does and why in [Lab3 - Prepare-Programs-to-Run-on-MyCPU](https://hackmd.io/@sysprog/r1mlr3I7p#Prepare-Programs-to-Run-on-MyCPU).
### Unit Test
- `hw2.c`
```clike!
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
// Count the leading zeros of a 64-bit input
uint16_t count_leading_zeros(uint64_t x)
{
    /* smear the highest set bit into every lower position */
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);

    return (64 - (x & 0x7f));
}

int palindrome_detected(uint64_t x, int clz)
{
    uint64_t nob = (64 - clz);      /* number of significant bits */
    uint64_t checkEven = nob % 2;   /* 1 when nob is odd: the middle bit is skipped */
    uint64_t left = x >> (nob / 2 + checkEven);
    uint64_t revRight = 0x0;

    /* Reverse the right half of the input x */
    for (int i = 0; i < nob / 2; i++) {
        uint64_t rightestBit = x & 0x1;
        revRight = revRight << 1;
        revRight |= rightestBit;
        x = x >> 1;
    }

    return (left == revRight) ? 1 : 0;
}

int main()
{
    uint64_t test = 0x00000C0000000003; // test is a palindrome
    /* store the result at memory address 4, where the unit test reads it */
    *((volatile int *) (4)) = palindrome_detected(test, count_leading_zeros(test));
}
```
- `CPUTest.scala`
```scala!
class Hw2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "calculate palindrome_detected(0xC0000000003, count_leading_zeros(0xC0000000003))" in {
    test(new TestTopModule("hw2.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }

      // Read back the word at address 4, where hw2.c stores its result
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(1.U)
    }
  }
}
```
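The expected value `1` in the unit test can be cross-checked on the host with an independent, more direct palindrome test that compares bit `i` with bit `n - 1 - i`. This is a separate sketch for verification only; `bit_length` and `is_bit_palindrome` are hypothetical helpers and are not part of `hw2.c`.

```c
#include <stdint.h>

/* Number of significant bits in x (0 for x == 0). */
int bit_length(uint64_t x)
{
    int n = 0;
    while (x != 0) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Return 1 if the significant bits of x read the same in both directions. */
int is_bit_palindrome(uint64_t x)
{
    int n = bit_length(x);
    for (int i = 0; i < n / 2; i++) {
        uint64_t lo = (x >> i) & 1u;            /* i-th bit from the right */
        uint64_t hi = (x >> (n - 1 - i)) & 1u;  /* i-th bit from the left  */
        if (lo != hi) {
            return 0;
        }
    }
    return 1;
}
```

The test input `0x00000C0000000003` has 44 significant bits (so 20 leading zeros), with bits 43, 42, 1 and 0 set, which is why it is a palindrome.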
#### Procedure
- Go back to the root directory of the project
- Run the test for the newly added program
```shell
$ sbt "testOnly riscv.singlecycle.Hw2Test"
```
:::warning
Don't forget to regenerate the RISC-V program every time you modify the C code before running the test; otherwise the test will use the old version of the program and fail.
:::
- Results
```shell!
[info] Hw2Test:
[info] Single Cycle CPU
[info] - should calculate palindrome_detected(0xC0000000003, count_leading_zeros(0xC0000000003))
[info] Run completed in 2 minutes, 54 seconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 206 s (03:26), completed Dec 28, 2023, 1:42:28 AM
```
### Execute My Program Using Verilator
#### Procedure
- Generate Verilog files
```shell
$ make verilator
```
- Load the `hw2.asmbin` file, simulate for 1000 cycles, and save the simulation waveform to the `dump.vcd` file
```shell
$ ./run-verilator.sh -instruction src/main/resources/hw2.asmbin -time 2000 -vcd dump.vcd
```
- Open `dump.vcd` in GTKWave to view the waveform
## Reference
- [Assignment3: single-cycle RISC-V CPU](https://hackmd.io/@sysprog/2023-arch-homework3)
- [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p)