# Assignment3: single-cycle RISC-V CPU
contributed by < [`GliAmanti`](https://github.com/GliAmanti) >
## Installation
My OS: **``Ubuntu 22.04 LTS``**
### Prepare GTKWave for Evaluation
```
sudo apt install build-essential verilator gtkwave
```
### Prepare sbt for Running Scala Code
1. Install [SDKMAN](https://sdkman.io/).
```
curl -s "https://get.sdkman.io" | bash
```
2. Open a new terminal window and run the following command.
```
source "$HOME/.sdkman/bin/sdkman-init.sh"
```
3. Install JDK and sbt by using [SDKMAN](https://sdkman.io/).
```
sdk install java 11.0.21-tem
sdk install sbt
```
:::success
Remember to repeat step **2** every time you open a new terminal to run sbt.
:::
## Description of [Hello World in Chisel](https://hackmd.io/@sysprog/r1mlr3I7p#Hello-World-in-Chisel)
```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
* ``led`` is the output port of the ``Hello`` module.
* ``CNT_MAX`` is the value at which the counter wraps around.
* ``cntReg`` is a 32-bit counter register.
* ``blkReg`` holds the current state of the LED.

``cntReg`` increments by 1 every clock cycle. When ``cntReg`` equals ``CNT_MAX``, namely ``24999999``, it is reset to ``0`` and ``blkReg`` toggles. The value of ``blkReg`` drives the output ``led``.
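
The counter behavior can be checked with a small software model. This is only a sketch in plain Scala (not Chisel); ``BlinkModel``, ``step`` and ``run`` are illustrative names that do not exist in the lab code, and a tiny ``cntMax`` stands in for ``24999999`` so the toggling is visible in a few cycles.

```scala
// Plain-Scala model of the two registers in Hello (illustrative, not Chisel).
object BlinkModel {
  // One clock cycle: returns the next (cntReg, blkReg) pair.
  def step(cnt: Long, blk: Int, cntMax: Long): (Long, Int) =
    if (cnt == cntMax) (0L, blk ^ 1) // wrap the counter and toggle the LED
    else (cnt + 1, blk)              // otherwise just count up

  // LED value observed in each of the first `cycles` cycles after reset.
  def run(cycles: Int, cntMax: Long): List[Int] =
    Iterator
      .iterate((0L, 0))(s => step(s._1, s._2, cntMax))
      .map(_._2)
      .take(cycles)
      .toList
}
```

With ``cntMax = 1`` the LED toggles every 2 cycles, so ``BlinkModel.run(8, 1)`` gives ``List(0, 0, 1, 1, 0, 0, 1, 1)``. By the same reasoning, ``CNT_MAX = 24999999`` makes the LED toggle every 25,000,000 cycles.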
## Complete Version of [MyCPU (Lab3)](https://hackmd.io/@sysprog/r1mlr3I7p#Implementation)
We only have to fill the blanks in [``InstructionFetch.scala``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionFetch.scala), [``InstructionDecode.scala``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionDecode.scala), [``Execute.scala``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/Execute.scala) and [``CPU.scala``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/CPU.scala).
Here is my [implementation](https://github.com/GliAmanti/ComputerArchitecture_HW3/tree/main/src/main/scala/riscv/core) of lab3, which is forked from [ca2023-lab3](https://github.com/sysprog21/ca2023-lab3/tree/main).
The following figure shows the RV32I datapath.

### How to Run
1. Get the repository.
```
git clone https://github.com/GliAmanti/ComputerArchitecture_HW3.git
cd ComputerArchitecture_HW3
```
2. To simulate the design and run all tests, execute the following command in the ``ComputerArchitecture_HW3`` directory.
```
sbt test
```
::: info
The output message will be:
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project computerarchitecture_hw3-build from plugins.sbt ...
[info] loading project definition from /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/)
[info] compiling 1 Scala source to /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/target/scala-2.13/test-classes ...
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 6 seconds, 952 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 10 s, completed Nov 30, 2023, 1:24:28 PM
```
:::
3. If you want to run a single test, such as running only ``InstructionDecoderTest``, execute the following command:
```
sbt "testOnly riscv.singlecycle.InstructionDecoderTest"
```
::: info
The output message will be:
```
[info] welcome to sbt 1.9.7 (Eclipse Adoptium Java 11.0.21)
[info] loading settings for project computerarchitecture_hw3-build from plugins.sbt ...
[info] loading project definition from /home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/cgvsl/p76111351/computer_architecture/ComputerArchitecture_HW3/)
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] Run completed in 2 seconds, 509 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 3 s, completed Nov 30, 2023, 1:29:16 PM
```
:::
### Description of Unit tests
:::success
If you see the icon :chart_with_upwards_trend:, please check the waveform for the details.
:::
#### InstructionFetchTest
This test verifies whether the [``InstructionFetch``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionFetch.scala) module brings ``PC`` to the right address.
:::danger
:warning:
**Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::
There are 2 cases:
#### 1. No jump
* ``jump_flag_id`` is set to ``false``, so ``PC := PC + 4``.
* :chart_with_upwards_trend: ``jump_flag_id`` is 0, so ``instruction_address`` changes from ``1000`` to ``1004``.
#### 2. Jump
* ``jump_flag_id`` is set to ``true``, so ``PC := jump_address_id``.
* In this test, ``PC`` jumps to ``entry``, namely ``0x1000``.
* :chart_with_upwards_trend: ``jump_flag_id`` is 1, so ``instruction_address`` changes from ``1008`` to ``1000``.
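
The two cases above boil down to a single multiplexer on the next ``PC``. The following is a minimal plain-Scala sketch of that rule, not the Chisel module itself; ``PcModel`` and ``nextPc`` are illustrative names.

```scala
// Software model of the next-PC selection (illustrative names).
object PcModel {
  // Jump target when the flag is set, otherwise PC + 4, kept to 32 bits.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpAddress: Long): Long =
    (if (jumpFlag) jumpAddress else pc + 4) & 0xFFFFFFFFL
}
```

For the values seen in the waveform, ``PcModel.nextPc(0x1000, false, 0x1000)`` is ``0x1004`` and ``PcModel.nextPc(0x1008, true, 0x1000)`` is ``0x1000``.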
##### Waveform

#### InstructionDecoderTest
This test verifies whether the [``InstructionDecoder``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionDecode.scala) module correctly distinguishes the **opcode** and passes on the right data for each instruction.
The decoder handles many instruction types, but this test verifies only 3 of them:
#### 1. S-type
* This kind of instruction adds the contents of ``rs1`` and ``simm12`` and uses the result as the memory address, so:
``aluop1_source := ALUOp1Source.Register``. That is, choosing gate ``0``.
``aluop2_source := ALUOp2Source.Immediate``. That is, choosing gate ``1``.
:::success
These values follow the definitions in [``InstructionDecoder``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionDecode.scala):
```scala
object ALUOp1Source {
  val Register           = 0.U(1.W)
  val InstructionAddress = 1.U(1.W)
}
object ALUOp2Source {
  val Register  = 0.U(1.W)
  val Immediate = 1.U(1.W)
}
```
:::
:::success
[SW/SH/SB](https://ithelp.ithome.com.tw/articles/10268196)
```
sw/sh/sb rs2, rs1, simm12
```
:::
* :chart_with_upwards_trend: The instruction ``00A02223`` can be identified as S-type from its lowest 7 bits (the opcode), which are ``0100011`` in binary, so ``memory_write_enable`` is set to ``1``. ``reg1_read_address = 0`` and ``immediate = 4`` are passed on to the next stage.
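
The field values read from the waveform can be reproduced by slicing the instruction word according to the RV32I S-type layout. This is a plain-Scala sketch for checking the numbers, not the decoder from the lab; ``STypeDecode`` and its methods are illustrative names.

```scala
// Bit-field extraction for an RV32I S-type instruction (illustrative names).
object STypeDecode {
  // Bits [hi:lo] of a 32-bit instruction word (for fields narrower than 32 bits).
  def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  def opcode(inst: Int): Int = bits(inst, 6, 0)
  def rs1(inst: Int): Int    = bits(inst, 19, 15)
  def rs2(inst: Int): Int    = bits(inst, 24, 20)

  // S-type immediate: inst[31:25] is imm[11:5] and inst[11:7] is imm[4:0].
  def immS(inst: Int): Int = {
    val raw = (bits(inst, 31, 25) << 5) | bits(inst, 11, 7)
    (raw << 20) >> 20 // the arithmetic shift sign-extends the 12-bit value
  }
}
```

For ``0x00A02223`` this gives opcode ``0x23`` (``0100011``), ``rs1 = 0``, ``rs2 = 10`` and an immediate of ``4``, which matches the signals described above.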
#### 2. lui
* This instruction loads ``uimm20`` into the upper 20 bits of ``rd`` and sets the remaining lower 12 bits to 0, so:
``aluop1_source := ALUOp1Source.Register``. That is, choosing gate ``0``.
``aluop2_source := ALUOp2Source.Immediate``. That is, choosing gate ``1``.
:::success
[LUI (Load upper immediate)](https://ithelp.ithome.com.tw/articles/10268196)
```
lui rd, uimm20
```
:::
#### 3. add
* This instruction adds the contents of ``rs1`` and ``rs2``, so:
``aluop1_source := ALUOp1Source.Register``. That is, choosing gate ``0``.
``aluop2_source := ALUOp2Source.Register``. That is, choosing gate ``0``.
:::success
[ADD](https://ithelp.ithome.com.tw/articles/10268196)
```
add rd, rs1, rs2
```
:::
##### Waveform

#### ExecuteTest
This test verifies whether the [``Execute``](https://github.com/GliAmanti/ComputerArchitecture_HW3/tree/main/src/main/scala/riscv/core) module makes the right decision for branch instructions.
The module handles many instructions, but this test verifies only ``add`` and ``beq``.
#### 1. add
* ``ALU`` adds ``op1`` and ``op2``, according to the ``funct``.
* In this test, ``op1`` is set to ``reg1_data``, ``op2`` is set to ``reg2_data``. Both of them are random integers.
* ``jump_flag_id`` will be set to ``false``.
#### 2. beq
* ``Branch Comp.`` compares the contents of ``rs1`` and ``rs2``.
``ALU`` computes the jump address, so:
``aluop1_source := ALUOp1Source.InstructionAddress``. That is, choosing gate ``1``.
``aluop2_source := ALUOp2Source.Immediate``. That is, choosing gate ``1``.
:::success
[BEQ](https://ithelp.ithome.com.tw/articles/10268196)
```
beq rs1, rs2, simm13
```
:::
* ##### Equal
* If ``reg1_data == reg2_data``, ``jump_flag_id`` will be set to ``true``.
Then ``jump_address := immediate + instruction_address``.
* In this test, ``jump_address := 2 + 2``
* :chart_with_upwards_trend: When ``reg1_data = 9`` and ``reg2_data = 9``, the equality condition is satisfied, so ``jump_flag`` is asserted.
* ##### Not equal
* If ``reg1_data != reg2_data``, ``jump_flag_id`` will be set to ``false``.
Then ``jump_address := immediate + instruction_address``.
* In this test, ``jump_address := 2 + 2``
* :chart_with_upwards_trend: When ``reg1_data = 17FE18F6`` and ``reg2_data = 15D9D5DD``, the equality condition is not satisfied, so ``jump_flag`` is not asserted.
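
Both ``beq`` cases can be summarized as a comparison plus an address addition. The sketch below models that in plain Scala; it is not the ``Execute`` module, and ``BranchModel`` and ``beq`` are illustrative names.

```scala
// Software model of the beq decision and target computation (illustrative names).
object BranchModel {
  // Returns (jump flag, jump address). The address is computed in both cases;
  // only the flag depends on the comparison.
  def beq(reg1: Long, reg2: Long, instructionAddress: Long, immediate: Long): (Boolean, Long) =
    (reg1 == reg2, (instructionAddress + immediate) & 0xFFFFFFFFL)
}
```

With the values from the test, ``BranchModel.beq(9, 9, 2, 2)`` is ``(true, 4)`` and ``BranchModel.beq(0x17FE18F6L, 0x15D9D5DDL, 2, 2)`` is ``(false, 4)``.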
##### Waveform

#### CPUTest
This test verifies whether all components in **MyCPU** function properly.
There are 3 test cases in CPUTest:
#### 1. FibonacciTest
* This class reads the file ``fibonacci.asmbin``, which calculates Fibonacci(10).
* In this test, ``mem_debug_read_address`` is set to ``4``, and ``mem_debug_read_data`` is expected to be ``55``.
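
The expected value ``55`` can be confirmed with a reference implementation. This is a plain-Scala sketch; ``FibModel`` is an illustrative name, and the base cases ``fib(0) = 0`` and ``fib(1) = 1`` are an assumption, since the test program's source is not shown in this note.

```scala
// Recursive reference Fibonacci (assumed base cases: fib(0) = 0, fib(1) = 1).
object FibModel {
  def fib(n: Int): Int =
    if (n <= 1) n else fib(n - 1) + fib(n - 2)
}
```

``FibModel.fib(10)`` evaluates to ``55``, the value the test reads back from address ``4``.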
#### 2. QuicksortTest
* This class reads the file ``quicksort.asmbin``, which performs a Quick Sort on 10 numbers.
* In this test, ``mem_debug_read_address`` is set to ``4 * i``, and ``mem_debug_read_data`` is expected to be ``i - 1``.
#### 3. ByteAccessTest
* This class reads the file ``sb.asmbin``, which stores and loads a single byte.
* In test case 1, ``regs_debug_read_address`` is set to ``5``, and ``regs_debug_read_data`` is expected to be ``0xdeadbeef``.
* In test case 2, ``regs_debug_read_address`` is set to ``6``, and ``regs_debug_read_data`` is expected to be ``0xef``.
* In test case 3, ``regs_debug_read_address`` is set to ``1``, and ``regs_debug_read_data`` is expected to be ``0x15ef``.
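
The values ``0xef`` and ``0x15ef`` are what one would expect from byte stores into a little-endian word. The sketch below illustrates this under an assumption about the program: that it stores the low byte of ``0xdeadbeef`` at byte offset 0 of an initially zero word and then stores ``0x15`` at byte offset 1. The program source is not shown in this note, so treat this sequence as a guess. ``ByteStoreModel`` and ``storeByte`` are illustrative names.

```scala
// Software model of a byte store into a 32-bit little-endian word (illustrative names).
object ByteStoreModel {
  // Replace byte `offset` (0 to 3) of `word` with the low byte of `value`.
  def storeByte(word: Long, offset: Int, value: Long): Long = {
    val shift = offset * 8
    val mask  = 0xFFL << shift
    (word & ~mask & 0xFFFFFFFFL) | ((value & 0xFFL) << shift)
  }
}
```

Under that assumption, ``ByteStoreModel.storeByte(0, 0, 0xdeadbeefL)`` is ``0xef``, and storing ``0x15`` at offset 1 of that word gives ``0x15ef``.
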
:::success
:question: My guess :question:
Signals defined inside the ``withClock()`` block are implicitly reset, so they do not appear in the GTKWave waveform. Examples are ``reg_debug_address`` and ``reg_debug_data``.
:::
## Adaptation of Homework2
Here is my [adaptation](https://github.com/GliAmanti/ComputerArchitecture_HW3/tree/main/csrc) of hw2.
### How to Run
To run hw2 on **MyCPU**, I made the following changes.
1. Remove the code related to ``rdcycle`` and ``rdcycleh`` in ``myHammingDist.S``.
2. Store the results at the memory addresses that are checked in ``CPUTest.scala``.
```
# base memory address to store the result
li s9, 0x4
......
# store the result for hw3
slli t1, s1, 2
add t0, s9, t1
sw a0, 0(t0)
```
3. Add the corresponding test to ``CPUTest.scala``.
::: spoiler HammingTest
```scala
class HammingTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "calculate the hamming distance between two 64-bit integers" in {
    test(new TestTopModule("myHammingDist.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      // Each result word is read back one cycle after its address is applied.
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(21.U)
      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(63.U)
      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0.U)
    }
  }
}
```
:::
::: danger
If you check the result through a **memory address**, make sure ``c.clock.step(1000)`` is inside your loop. Otherwise, the test will fail with an error message.
:::
4. Add ``myHammingDist.S`` to the ``csrc`` directory.
5. Modify the ``Makefile`` in the ``csrc`` directory.
::: spoiler Makefile
```makefile
CROSS_COMPILE ?= riscv-none-elf-
ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
CFLAGS = -O0 -Wall -march=rv32i_zicsr -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv
AS := $(CROSS_COMPILE)as
CC := $(CROSS_COMPILE)gcc
LD := $(CROSS_COMPILE)ld
OBJCOPY := $(CROSS_COMPILE)objcopy
%.o: %.S
	$(AS) -R $(ASFLAGS) -o $@ $<
%.elf: %.S
	$(AS) -R $(ASFLAGS) -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o)
%.elf: %.c init.o
	$(CC) $(CFLAGS) -c -o $(@:.elf=.o) $<
	$(CROSS_COMPILE)ld -o $@ -T link.lds $(LDFLAGS) $(@:.elf=.o) init.o
%.asmbin: %.elf
	$(OBJCOPY) -O binary -j .text -j .data $< $@
# myHammingDist.asmbin is the only entry added for this adaptation.
BINS = \
	fibonacci.asmbin \
	hello.asmbin \
	mmio.asmbin \
	quicksort.asmbin \
	sb.asmbin \
	myHammingDist.asmbin
# Clear the .DEFAULT_GOAL special variable, so that the following turns
# to the first target after .DEFAULT_GOAL is not set.
.DEFAULT_GOAL :=
all: $(BINS)
update: $(BINS)
	cp -f $(BINS) ../src/main/resources
clean:
	$(RM) *.o *.elf *.asmbin
```
:::
6. Generate ``myHammingDist.asmbin``.
```
make update
```
or
```
make clean
make
```
7. Run the test.
```
sbt "testOnly riscv.singlecycle.HammingTest"
```
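
To cross-check what ``HammingTest`` expects, the two ideas involved can be modeled in plain Scala: the Hamming distance of two 64-bit integers is the number of set bits in their XOR, and each result is stored at ``s9 + (s1 << 2)`` with ``s9 = 0x4``. This is only a sketch. ``HammingModel`` and its methods are illustrative names, treating ``s1`` as a zero-based result index is an assumption, and the inputs in the usage note are made-up values rather than the actual hw2 test data.

```scala
// Reference model for the hw2 adaptation (illustrative names).
object HammingModel {
  // Hamming distance: count the bits in which a and b differ.
  def hammingDistance(a: Long, b: Long): Int =
    java.lang.Long.bitCount(a ^ b)

  // Mirrors `slli t1, s1, 2` followed by `add t0, s9, t1` with s9 = 0x4.
  // Assumption: `index` plays the role of s1 and starts at 0.
  def resultAddress(index: Int): Int =
    0x4 + (index << 2)
}
```

For example, ``HammingModel.hammingDistance(0L, 0x7FFFFFFFFFFFFFFFL)`` is ``63`` and ``HammingModel.hammingDistance(5L, 5L)`` is ``0``. The addresses for indices 0 to 2 are ``4``, ``8`` and ``12``, which are the addresses poked in the test.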
### [How to Analyze](https://hackmd.io/@sysprog/r1mlr3I7p#Waveform)
1. After the first run and every time you modify the Chisel code, you need to execute the following command in the project’s root directory to generate Verilog files.
```
make verilator
```
2. Load ``myHammingDist.asmbin``, run the simulation with ``-time 1000``, and save the waveform to ``myHammingDist.vcd``.
```
./run-verilator.sh -instruction src/main/resources/myHammingDist.asmbin -time 1000 -vcd myHammingDist.vcd
```
::: info
The output message will be:
```
-time 1000
-memory 1048576
-instruction src/main/resources/myHammingDist.asmbin
[-------------------->] 100%
```
:::
3. Use GTKWave to view the output waveform file ``myHammingDist.vcd``.
```
gtkwave myHammingDist.vcd
```
### Description of Waveform
I take the instruction ``lw a0, 0(s2)`` as an example and analyze how **MyCPU** processes it in each stage.
#### Instruction Fetch

* Instruction ``00092503`` is ``lw a0, 0(s2)``. (Line 52 in my code.)
* It is neither a branch (``SB-type``) nor a jump (``UJ-type``) instruction, so:
* ``jump_flag_id = 0``,
* ``PC = PC + 4``.
::: success
:mag: Please check [here](https://luplab.gitlab.io/rvcodecjs/) for instruction conversion.
:::
#### Instruction Decode

* ``lw`` loads the word at memory address ``rs1 + imm`` into ``rd``, so (register numbers are shown in hexadecimal, as in the waveform):
* ``memory_read_enable = 1``
* ``memory_write_enable = 0``
* ``reg_write_address = A`` (``a0`` = ``x10``)
* ``reg_write_enable = 1``
* ``rs1 = 12`` (``s2`` = ``x18``)
* ``rd = A``
* ``immediate = 0``
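
These fields can be reproduced by slicing ``0x00092503`` according to the RV32I I-type layout. The following plain-Scala sketch is only for checking the numbers and is not the decoder from the lab; ``LoadDecode`` and its methods are illustrative names.

```scala
// Bit-field extraction for an RV32I I-type (load) instruction (illustrative names).
object LoadDecode {
  // Bits [hi:lo] of a 32-bit instruction word (for fields narrower than 32 bits).
  def bits(inst: Int, hi: Int, lo: Int): Int =
    (inst >>> lo) & ((1 << (hi - lo + 1)) - 1)

  def opcode(inst: Int): Int = bits(inst, 6, 0)
  def rd(inst: Int): Int     = bits(inst, 11, 7)
  def funct3(inst: Int): Int = bits(inst, 14, 12)
  def rs1(inst: Int): Int    = bits(inst, 19, 15)

  // I-type immediate: inst[31:20], sign-extended by the arithmetic shift.
  def immI(inst: Int): Int = inst >> 20
}
```

For ``0x00092503`` this gives opcode ``0x03`` (load), ``funct3 = 2`` (``lw``), ``rd = 10`` (hex ``A``), ``rs1 = 18`` (hex ``12``) and an immediate of ``0``.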
#### Execute

* According to [``ALUControl``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/ALUControl.scala) and [``ALU``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/ALU.scala), ``alu_funct = ALUFunctions.add``, namely ``1``.
```scala
is(InstructionTypes.L) {
  io.alu_funct := ALUFunctions.add
}
```
```scala
object ALUFunctions extends ChiselEnum {
  val zero, add, sub, sll, slt, xor, or, and, srl, sra, sltu = Value
}
```
* ``aluop1_source = 0``, so ``op1 = reg1_data``.
* ``aluop2_source = 1``, so ``op2 = immediate``.
* ``ALU`` adds ``op1 = FFFFFFF4`` and ``op2 = 00000000``, getting the ``result = FFFFFFF4``.
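
The operand selection and addition can be written out as a small software model. This is a plain-Scala sketch rather than the Chisel ``Execute`` module; ``ExecuteModel`` and ``aluAdd`` are illustrative names, and the selector encodings follow ``ALUOp1Source`` and ``ALUOp2Source`` (``0`` selects the register, ``1`` selects the instruction address or the immediate).

```scala
// Software model of operand selection followed by a 32-bit add (illustrative names).
object ExecuteModel {
  def aluAdd(
      aluop1Source: Int,
      aluop2Source: Int,
      reg1Data: Long,
      reg2Data: Long,
      instructionAddress: Long,
      immediate: Long
  ): Long = {
    val op1 = if (aluop1Source == 1) instructionAddress else reg1Data
    val op2 = if (aluop2Source == 1) immediate else reg2Data
    (op1 + op2) & 0xFFFFFFFFL // wrap to 32 bits like the hardware adder
  }
}
```

For the ``lw`` above, ``ExecuteModel.aluAdd(0, 1, 0xFFFFFFF4L, 0, 0, 0)`` is ``0xFFFFFFF4``; the instruction address argument is ignored because ``aluop1Source`` is ``0``.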
#### Memory Access

* ``memory_read_enable = 1``, so ``Memory`` outputs ``read_data = 00000000``, which is the word stored at memory address ``FFFFFFF4``.
#### Write Back

* According to [``WriteBack``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/WriteBack.scala) and [``InstructionDecode``](https://github.com/GliAmanti/ComputerArchitecture_HW3/blob/main/src/main/scala/riscv/core/InstructionDecode.scala), ``regs_write_source = 1``, so ``regs_write_data = memory_read_data``, namely ``00000000``.
```scala
io.regs_write_data := MuxLookup(
  io.regs_write_source,
  io.alu_result,
  IndexedSeq(
    RegWriteSource.Memory                 -> io.memory_read_data,
    RegWriteSource.NextInstructionAddress -> (io.instruction_address + 4.U)
  )
)
```
```scala
object RegWriteSource {
  val ALUResult              = 0.U(2.W)
  val Memory                 = 1.U(2.W)
  val NextInstructionAddress = 3.U(2.W)
}
```
* The result will be written back to ``write_address = A``.
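
The ``MuxLookup`` above behaves like a match with a default branch. The sketch below models it in plain Scala so the three sources can be tried out without running the simulator; ``WriteBackModel`` and ``regsWriteData`` are illustrative names, and the integer encodings are copied from ``RegWriteSource``.

```scala
// Software model of the write-back source selection (illustrative names).
object WriteBackModel {
  val ALUResult              = 0
  val Memory                 = 1
  val NextInstructionAddress = 3

  def regsWriteData(
      source: Int,
      aluResult: Long,
      memoryReadData: Long,
      instructionAddress: Long
  ): Long =
    source match {
      case Memory                 => memoryReadData
      case NextInstructionAddress => (instructionAddress + 4) & 0xFFFFFFFFL
      case _                      => aluResult // default entry of the MuxLookup
    }
}
```

For the ``lw`` example, ``WriteBackModel.regsWriteData(1, 0xFFFFFFF4L, 0, 0)`` returns ``0``, the value read from memory, while source ``0`` would return the ALU result ``0xFFFFFFF4`` instead.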