# Final project: Rework Homework3
contributed by <[00853029/ca2023-lab3](https://github.com/00853029/ca2023-lab3)>
## Environment and Setup
> Follow [Lab3: Construct a single-cycle RISC-V CPU with Chisel](https://hackmd.io/@sysprog/r1mlr3I7p)
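* VirtualBox guest: **Ubuntu 22.04.3**
* Install **SDKMAN**
* Install **Eclipse Temurin JDK 8** (the `sdk install java` command below selects an 8.x Temurin build, and the test log reports Java 1.8.0_392)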
```shell
$ sudo apt install build-essential verilator gtkwave
```
* Installing **sbt** from **SDKMAN**
```shell
$ sdk install java $(sdk list java | grep -o "\b8\.[0-9]*\.[0-9]*\-tem" | head -1)
$ sdk install sbt
```
* Alternatively, install **sbt** from its apt repository (Ubuntu and other Debian-based distributions)
```shell
sudo apt-get update
sudo apt-get install apt-transport-https curl gnupg -yqq
echo "deb https://repo.scala-sbt.org/scalasbt/debian all main" | sudo tee /etc/apt/sources.list.d/sbt.list
echo "deb https://repo.scala-sbt.org/scalasbt/debian /" | sudo tee /etc/apt/sources.list.d/sbt_old.list
curl -sL "https://keyserver.ubuntu.com/pks/lookup?op=get&search=0x2EE0EA64E40A89B84B2DF73499E82A75642AC823" | sudo -H gpg --no-default-keyring --keyring gnupg-ring:/etc/apt/trusted.gpg.d/scalasbt-release.gpg --import
sudo chmod 644 /etc/apt/trusted.gpg.d/scalasbt-release.gpg
sudo apt-get update
sudo apt-get install sbt
```
## Single-cycle RISC-V CPU
#### Get the repository:
```shell
$ git clone https://github.com/sysprog21/ca2023-lab3
```

### Finish MyCPU
#### 1. InstructionFetch
The provided skeleton left the PC (program counter) update unimplemented.
I added a condition on `jump_flag_id`: when it is asserted, the PC is loaded with `jump_address_id`; otherwise it advances to `pc + 4`, the next instruction.
> Source: `./ca2023-lab3/src/main/scala/riscv/core/InstructionFetch.scala`
```java=
// mycpu is freely redistributable under the MIT License. See the file
// "LICENSE" for information on usage and redistribution of this file.
package riscv.core
import chisel3._
import riscv.Parameters
object ProgramCounter {
val EntryAddress = Parameters.EntryAddress
}
class InstructionFetch extends Module {
val io = IO(new Bundle {
val jump_flag_id = Input(Bool())
val jump_address_id = Input(UInt(Parameters.AddrWidth))
val instruction_read_data = Input(UInt(Parameters.DataWidth))
val instruction_valid = Input(Bool())
val instruction_address = Output(UInt(Parameters.AddrWidth))
val instruction = Output(UInt(Parameters.InstructionWidth))
})
val pc = RegInit(ProgramCounter.EntryAddress)
when(io.instruction_valid) {
io.instruction := io.instruction_read_data
// lab3(InstructionFetch) begin
when(io.jump_flag_id){
pc := io.jump_address_id
}.otherwise {
pc := pc + 0x4.U
}
// lab3(InstructionFetch) end
}.otherwise {
pc := pc
io.instruction := 0x00000013.U
}
io.instruction_address := pc
}
```
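The next-PC rule can also be checked in plain Scala, without Chisel. The sketch below is only a software model of the `when`/`otherwise` logic above; `NextPcModel` and `nextPc` are hypothetical names that do not exist in the lab sources, and the 32-bit wraparound is an assumption about the address width.

```scala
// Software model of the next-PC selection in InstructionFetch (not Chisel).
// NextPcModel and nextPc are illustrative names, not part of the lab sources.
object NextPcModel {
  // Assumption: addresses are 32 bits wide, so results wrap at 2^32.
  private val Mask32 = 0xFFFFFFFFL

  def nextPc(pc: Long, instructionValid: Boolean, jumpFlag: Boolean, jumpAddress: Long): Long =
    if (!instructionValid) pc                // hold the PC while no valid instruction arrives
    else if (jumpFlag) jumpAddress & Mask32  // taken jump or branch
    else (pc + 4) & Mask32                   // sequential execution
}
```

For example, `NextPcModel.nextPc(0x1000L, true, false, 0x2000L)` models sequential execution, and passing `jumpFlag = true` models a taken jump to `0x2000`.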
#### 2. InstructionDecode
The skeleton did not drive the `memory_read_enable` (*MemRead*) and `memory_write_enable` (*MemWrite*) control signals.
I derive both from the opcode: `memory_read_enable` is asserted for load instructions (`InstructionTypes.L`) and `memory_write_enable` for store instructions (`InstructionTypes.S`).

> Source: `src/main/scala/riscv/core/InstructionDecode.scala`
```java=
// mycpu is freely redistributable under the MIT License. See the file
// "LICENSE" for information on usage and redistribution of this file.
package riscv.core
import scala.collection.immutable.ArraySeq
import chisel3._
import chisel3.util._
import riscv.Parameters
object InstructionTypes {
val L = "b0000011".U
val I = "b0010011".U
val S = "b0100011".U
val RM = "b0110011".U
val B = "b1100011".U
}
object Instructions {
val lui = "b0110111".U
val nop = "b0000001".U
val jal = "b1101111".U
val jalr = "b1100111".U
val auipc = "b0010111".U
val csr = "b1110011".U
val fence = "b0001111".U
}
object InstructionsTypeL {
val lb = "b000".U
val lh = "b001".U
val lw = "b010".U
val lbu = "b100".U
val lhu = "b101".U
}
object InstructionsTypeI {
val addi = 0.U
val slli = 1.U
val slti = 2.U
val sltiu = 3.U
val xori = 4.U
val sri = 5.U
val ori = 6.U
val andi = 7.U
}
object InstructionsTypeS {
val sb = "b000".U
val sh = "b001".U
val sw = "b010".U
}
object InstructionsTypeR {
val add_sub = 0.U
val sll = 1.U
val slt = 2.U
val sltu = 3.U
val xor = 4.U
val sr = 5.U
val or = 6.U
val and = 7.U
}
object InstructionsTypeM {
val mul = 0.U
val mulh = 1.U
val mulhsu = 2.U
val mulhum = 3.U
val div = 4.U
val divu = 5.U
val rem = 6.U
val remu = 7.U
}
object InstructionsTypeB {
val beq = "b000".U
val bne = "b001".U
val blt = "b100".U
val bge = "b101".U
val bltu = "b110".U
val bgeu = "b111".U
}
object InstructionsTypeCSR {
val csrrw = "b001".U
val csrrs = "b010".U
val csrrc = "b011".U
val csrrwi = "b101".U
val csrrsi = "b110".U
val csrrci = "b111".U
}
object InstructionsNop {
val nop = 0x00000013L.U(Parameters.DataWidth)
}
object InstructionsRet {
val mret = 0x30200073L.U(Parameters.DataWidth)
val ret = 0x00008067L.U(Parameters.DataWidth)
}
object InstructionsEnv {
val ecall = 0x00000073L.U(Parameters.DataWidth)
val ebreak = 0x00100073L.U(Parameters.DataWidth)
}
object ALUOp1Source {
val Register = 0.U(1.W)
val InstructionAddress = 1.U(1.W)
}
object ALUOp2Source {
val Register = 0.U(1.W)
val Immediate = 1.U(1.W)
}
object RegWriteSource {
val ALUResult = 0.U(2.W)
val Memory = 1.U(2.W)
// val CSR = 2.U(2.W)
val NextInstructionAddress = 3.U(2.W)
}
class InstructionDecode extends Module {
val io = IO(new Bundle {
val instruction = Input(UInt(Parameters.InstructionWidth))
val regs_reg1_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
val regs_reg2_read_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
val ex_immediate = Output(UInt(Parameters.DataWidth))
val ex_aluop1_source = Output(UInt(1.W))
val ex_aluop2_source = Output(UInt(1.W))
val memory_read_enable = Output(Bool())
val memory_write_enable = Output(Bool())
val wb_reg_write_source = Output(UInt(2.W))
val reg_write_enable = Output(Bool())
val reg_write_address = Output(UInt(Parameters.PhysicalRegisterAddrWidth))
})
val opcode = io.instruction(6, 0)
val funct3 = io.instruction(14, 12)
val funct7 = io.instruction(31, 25)
val rd = io.instruction(11, 7)
val rs1 = io.instruction(19, 15)
val rs2 = io.instruction(24, 20)
io.regs_reg1_read_address := Mux(opcode === Instructions.lui, 0.U(Parameters.PhysicalRegisterAddrWidth), rs1)
io.regs_reg2_read_address := rs2
val immediate = MuxLookup(
opcode,
Cat(Fill(20, io.instruction(31)), io.instruction(31, 20)),
IndexedSeq(
InstructionTypes.I -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
InstructionTypes.L -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
Instructions.jalr -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 20)),
InstructionTypes.S -> Cat(Fill(21, io.instruction(31)), io.instruction(30, 25), io.instruction(11, 7)),
InstructionTypes.B -> Cat(
Fill(20, io.instruction(31)),
io.instruction(7),
io.instruction(30, 25),
io.instruction(11, 8),
0.U(1.W)
),
Instructions.lui -> Cat(io.instruction(31, 12), 0.U(12.W)),
Instructions.auipc -> Cat(io.instruction(31, 12), 0.U(12.W)),
// jal's imm represents a multiple of 2 bytes.
Instructions.jal -> Cat(
Fill(12, io.instruction(31)),
io.instruction(19, 12),
io.instruction(20),
io.instruction(30, 21),
0.U(1.W)
)
)
)
io.ex_immediate := immediate
io.ex_aluop1_source := Mux(
opcode === Instructions.auipc || opcode === InstructionTypes.B || opcode === Instructions.jal,
ALUOp1Source.InstructionAddress,
ALUOp1Source.Register
)
// ALU op2 from reg: R-type,
// ALU op2 from imm: L-Type (I-type subtype),
// I-type (nop=addi, jalr, csr-class, fence),
// J-type (jal),
// U-type (lui, auipc),
// S-type (rs2 value sent to MemControl, ALU computes rs1 + imm.)
// B-type (rs2 compares with rs1 in jump judge unit, ALU computes jump address PC+imm.)
io.ex_aluop2_source := Mux(
opcode === InstructionTypes.RM,
ALUOp2Source.Register,
ALUOp2Source.Immediate
)
// lab3(InstructionDecode) begin
// Loads read from data memory; stores write to it.
io.memory_read_enable := opcode === InstructionTypes.L
io.memory_write_enable := opcode === InstructionTypes.S
// lab3(InstructionDecode) end
io.wb_reg_write_source := MuxCase(
RegWriteSource.ALUResult,
ArraySeq(
(opcode === InstructionTypes.RM || opcode === InstructionTypes.I ||
opcode === Instructions.lui || opcode === Instructions.auipc) -> RegWriteSource.ALUResult, // same as default
(opcode === InstructionTypes.L) -> RegWriteSource.Memory,
(opcode === Instructions.jal || opcode === Instructions.jalr) -> RegWriteSource.NextInstructionAddress
)
)
io.reg_write_enable := (opcode === InstructionTypes.RM) || (opcode === InstructionTypes.I) ||
(opcode === InstructionTypes.L) || (opcode === Instructions.auipc) || (opcode === Instructions.lui) ||
(opcode === Instructions.jal) || (opcode === Instructions.jalr)
io.reg_write_address := rd
}
```
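As a quick sanity check of the two memory control signals, the same opcode test can be modeled in plain Scala. This is a sketch rather than the lab's code: `DecodeModel` and its methods are hypothetical names, and the instruction words used in the usage note (`lw x5, 0(x6)` and `sw x5, 0(x6)`) were encoded by hand for illustration.

```scala
// Software model of the two memory control signals (not Chisel).
// DecodeModel is an illustrative name, not part of the lab sources.
object DecodeModel {
  val OpcodeLoad  = 0x03 // 0b0000011, same value as InstructionTypes.L
  val OpcodeStore = 0x23 // 0b0100011, same value as InstructionTypes.S

  // Field extraction mirrors io.instruction(6, 0) and io.instruction(14, 12).
  def opcode(instruction: Long): Int = (instruction & 0x7F).toInt
  def funct3(instruction: Long): Int = ((instruction >> 12) & 0x7).toInt

  def memoryReadEnable(instruction: Long): Boolean  = opcode(instruction) == OpcodeLoad
  def memoryWriteEnable(instruction: Long): Boolean = opcode(instruction) == OpcodeStore
}
```

With these definitions, `0x00032283L` (`lw x5, 0(x6)`) enables only the read signal, `0x00532023L` (`sw x5, 0(x6)`) enables only the write signal, and the `nop` word `0x00000013L` enables neither.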
#### 3. Execute
The skeleton did not connect any inputs to the ALU.
I wired the `func` control signal from `ALUControl` and selected `op1` and `op2` with the `aluop1_source` and `aluop2_source` signals. The existing code then connects `alu.io.result` to the `mem_alu_result` output.
> Source: `src/main/scala/riscv/core/Execute.scala`
```java=
// mycpu is freely redistributable under the MIT License. See the file
// "LICENSE" for information on usage and redistribution of this file.
package riscv.core
import chisel3._
import chisel3.util.Cat
import chisel3.util.MuxLookup
import riscv.Parameters
class Execute extends Module {
val io = IO(new Bundle {
val instruction = Input(UInt(Parameters.InstructionWidth))
val instruction_address = Input(UInt(Parameters.AddrWidth))
val reg1_data = Input(UInt(Parameters.DataWidth))
val reg2_data = Input(UInt(Parameters.DataWidth))
val immediate = Input(UInt(Parameters.DataWidth))
val aluop1_source = Input(UInt(1.W))
val aluop2_source = Input(UInt(1.W))
val mem_alu_result = Output(UInt(Parameters.DataWidth))
val if_jump_flag = Output(Bool())
val if_jump_address = Output(UInt(Parameters.DataWidth))
})
val opcode = io.instruction(6, 0)
val funct3 = io.instruction(14, 12)
val funct7 = io.instruction(31, 25)
val rd = io.instruction(11, 7)
val uimm = io.instruction(19, 15)
val alu = Module(new ALU)
val alu_ctrl = Module(new ALUControl)
alu_ctrl.io.opcode := opcode
alu_ctrl.io.funct3 := funct3
alu_ctrl.io.funct7 := funct7
// lab3(Execute) begin
alu.io.func := alu_ctrl.io.alu_funct
alu.io.op1 := Mux(io.aluop1_source.asBool, io.instruction_address, io.reg1_data)
alu.io.op2 := Mux(io.aluop2_source.asBool, io.immediate, io.reg2_data)
// lab3(Execute) end
io.mem_alu_result := alu.io.result
io.if_jump_flag := opcode === Instructions.jal ||
(opcode === Instructions.jalr) ||
(opcode === InstructionTypes.B) && MuxLookup(
funct3,
false.B,
IndexedSeq(
InstructionsTypeB.beq -> (io.reg1_data === io.reg2_data),
InstructionsTypeB.bne -> (io.reg1_data =/= io.reg2_data),
InstructionsTypeB.blt -> (io.reg1_data.asSInt < io.reg2_data.asSInt),
InstructionsTypeB.bge -> (io.reg1_data.asSInt >= io.reg2_data.asSInt),
InstructionsTypeB.bltu -> (io.reg1_data.asUInt < io.reg2_data.asUInt),
InstructionsTypeB.bgeu -> (io.reg1_data.asUInt >= io.reg2_data.asUInt)
)
)
io.if_jump_address := io.immediate + Mux(opcode === Instructions.jalr, io.reg1_data, io.instruction_address)
}
```
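The branch decision in `Execute` depends on whether the comparison is signed or unsigned, which is easy to get wrong. The following plain-Scala sketch mirrors the `MuxLookup` on `funct3` and the jump-address adder as written above; `BranchModel`, `taken`, and `target` are hypothetical names, and register values are modeled as raw 32-bit patterns held in an `Int`.

```scala
// Software model of the branch decision in Execute (not Chisel).
// BranchModel is an illustrative name, not part of the lab sources.
object BranchModel {
  // rs1 and rs2 are raw 32-bit register patterns stored in an Int.
  def taken(funct3: Int, rs1: Int, rs2: Int): Boolean = funct3 match {
    case 0 => rs1 == rs2                                        // beq
    case 1 => rs1 != rs2                                        // bne
    case 4 => rs1 < rs2                                         // blt, signed
    case 5 => rs1 >= rs2                                        // bge, signed
    case 6 => java.lang.Integer.compareUnsigned(rs1, rs2) < 0   // bltu
    case 7 => java.lang.Integer.compareUnsigned(rs1, rs2) >= 0  // bgeu
    case _ => false                                             // other encodings never branch
  }

  // Jump target as in the source: jalr adds the immediate to rs1, all others to the PC.
  def target(isJalr: Boolean, pc: Int, rs1: Int, immediate: Int): Int =
    immediate + (if (isJalr) rs1 else pc)
}
```

Comparing the pattern `0xFFFFFFFF` (that is, `-1`) with `1` shows the difference: `blt` is taken because -1 is less than 1 when signed, while `bltu` is not taken because the same pattern is the largest unsigned value.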
#### 4. CPU
I added the connections into the `ex` module that the skeleton left unconnected: the instruction and its address, the two register values, and the immediate and ALU operand-source signals from `id`.
> Source: `src/main/scala/riscv/core/CPU.scala`
```java=
// mycpu is freely redistributable under the MIT License. See the file
// "LICENSE" for information on usage and redistribution of this file.
package riscv.core
import chisel3._
import chisel3.util.Cat
import riscv.CPUBundle
import riscv.Parameters
class CPU extends Module {
val io = IO(new CPUBundle)
val regs = Module(new RegisterFile)
val inst_fetch = Module(new InstructionFetch)
val id = Module(new InstructionDecode)
val ex = Module(new Execute)
val mem = Module(new MemoryAccess)
val wb = Module(new WriteBack)
io.deviceSelect := mem.io.memory_bundle
.address(Parameters.AddrBits - 1, Parameters.AddrBits - Parameters.SlaveDeviceCountBits)
inst_fetch.io.jump_address_id := ex.io.if_jump_address
inst_fetch.io.jump_flag_id := ex.io.if_jump_flag
inst_fetch.io.instruction_valid := io.instruction_valid
inst_fetch.io.instruction_read_data := io.instruction
io.instruction_address := inst_fetch.io.instruction_address
regs.io.write_enable := id.io.reg_write_enable
regs.io.write_address := id.io.reg_write_address
regs.io.write_data := wb.io.regs_write_data
regs.io.read_address1 := id.io.regs_reg1_read_address
regs.io.read_address2 := id.io.regs_reg2_read_address
regs.io.debug_read_address := io.debug_read_address
io.debug_read_data := regs.io.debug_read_data
id.io.instruction := inst_fetch.io.instruction
// lab3(cpu) begin
ex.io.instruction := id.io.instruction
ex.io.instruction_address := inst_fetch.io.instruction_address
ex.io.reg1_data := regs.io.read_data1
ex.io.reg2_data := regs.io.read_data2
ex.io.immediate := id.io.ex_immediate
ex.io.aluop1_source := id.io.ex_aluop1_source
ex.io.aluop2_source := id.io.ex_aluop2_source
// lab3(cpu) end
mem.io.alu_result := ex.io.mem_alu_result
mem.io.reg2_data := regs.io.read_data2
mem.io.memory_read_enable := id.io.memory_read_enable
mem.io.memory_write_enable := id.io.memory_write_enable
mem.io.funct3 := inst_fetch.io.instruction(14, 12)
io.memory_bundle.address := Cat(
0.U(Parameters.SlaveDeviceCountBits.W),
mem.io.memory_bundle.address(Parameters.AddrBits - 1 - Parameters.SlaveDeviceCountBits, 0)
)
io.memory_bundle.write_enable := mem.io.memory_bundle.write_enable
io.memory_bundle.write_data := mem.io.memory_bundle.write_data
io.memory_bundle.write_strobe := mem.io.memory_bundle.write_strobe
mem.io.memory_bundle.read_data := io.memory_bundle.read_data
wb.io.instruction_address := inst_fetch.io.instruction_address
wb.io.alu_result := ex.io.mem_alu_result
wb.io.memory_read_data := mem.io.wb_memory_read_data
wb.io.regs_write_source := id.io.wb_reg_write_source
}
```
## MyCPU testing
```shell=
$ make test
sbt test
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/vboxuser/chisel-tutorial/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/vboxuser/chisel-tutorial/ca2023-lab3/)
[info] compiling 1 Scala source to /home/vboxuser/chisel-tutorial/ca2023-lab3/target/scala-2.13/test-classes ...
[info] done compiling
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] Run completed in 17 seconds, 631 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 23 s, completed Jan 12, 2024 10:04:35 PM
```
### Modify assembly code
[Assembly code](https://github.com/00853029/ca2023-lab3/blob/main/csrc/clz.S)
- To check the HW2 assembly code, the test confirms the values left in two designated registers: `s2` (high 32 bits of the number) and `s3` (low 32 bits).
```=
li s2, 0 #s2: high 32 of number
li s3, 0 #s3: low 32 of number
.
.
.
add s2, s2, s6
mv s3, s7
```
### Modify Makefile
Add `main.asmbin \` to `BINS`
```=
BINS = \
main.asmbin \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin
```
### Modify CPUTest.scala
- Add the following code in `CPUTest.scala`:
```java=116
class main extends AnyFlatSpec with ChiselScalatestTester {
behavior.of("Single Cycle CPU")
it should "answer is s2 = 0x1234540a , s3 = 0x8f5c3d98." in {
test(new TestTopModule("main.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
for (i <- 1 to 1000) {
c.clock.step()
c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
}
c.io.regs_debug_read_address.poke(18.U)
c.io.regs_debug_read_data.expect(0x1234540aL.U)
c.io.regs_debug_read_address.poke(19.U)
c.io.regs_debug_read_data.expect(0x8f5c3d98L.U)
}
}
}
```
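Two details of this test are worth spelling out: the debug port is addressed by register number, so `s2` and `s3` are read as `18` and `19`, and the expected values carry an `L` suffix. The plain-Scala sketch below illustrates both points; `AbiModel` and `savedRegisterIndex` are hypothetical names, and the remark that Chisel rejects negative values for `.U` literals is stated from memory rather than from the lab sources.

```scala
// Why the test pokes 18 and 19, and why the expected values carry an L suffix.
// AbiModel is an illustrative name, not part of the lab sources.
object AbiModel {
  // RISC-V integer ABI: s0-s1 are x8-x9, s2-s11 are x18-x27.
  def savedRegisterIndex(n: Int): Int = {
    require(n >= 0 && n <= 11, "saved registers are s0..s11")
    if (n <= 1) 8 + n else 16 + n
  }

  // Without the suffix, this hex literal is a negative 32-bit Int,
  // which (as far as I know) Chisel refuses to turn into a UInt literal.
  val asInt: Int   = 0x8f5c3d98
  val asLong: Long = 0x8f5c3d98L
}
```

`AbiModel.savedRegisterIndex(2)` and `AbiModel.savedRegisterIndex(3)` give the two addresses poked in the test, and `AbiModel.asInt` is negative while `AbiModel.asLong` keeps the intended unsigned value.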
### Testing the previous assembly on MyCPU
- Summary: 10 tests across 8 suites ran in 19 seconds, 659 milliseconds; all passed.
- Full output:
```=
sbt test
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/vboxuser/chisel-tutorial/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/vboxuser/chisel-tutorial/ca2023-lab3/)
[info] compiling 1 Scala source to /home/vboxuser/chisel-tutorial/ca2023-lab3/target/scala-2.13/test-classes ...
[info] done compiling
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] main:
[info] Single Cycle CPU
[info] - should answer is s2 = 0x1234540a , s3 = 0x8f5c3d98.
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] Run completed in 19 seconds, 659 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 24 s, completed Jan 12, 2024 10:56:09 PM
```
:::warning
No usage of Verilator nor extended requirements.
:::