# Assignment3: single-cycle RISC-V CPU
contributed by < [`CSIE523`](https://github.com/CSIE523/ca2023-lab3) >
## Complete Lab3: Construct a single-cycle RISC-V CPU with Chisel
### InstructionFetch
InstructionFetchTest.scala
```scala
class InstructionFetchTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionFetch of Single Cycle CPU")
  it should "fetch instruction" in {
    test(new InstructionFetch).withAnnotations(TestAnnotations.annos) { c =>
      val entry = 0x1000
      var pre   = entry
      var cur   = pre
      c.io.instruction_valid.poke(true.B)
      var x = 0
      for (x <- 0 to 100) {
        Random.nextInt(2) match {
          case 0 => // no jump
            cur = pre + 4
            c.io.jump_flag_id.poke(false.B)
            c.clock.step()
            c.io.instruction_address.expect(cur)
            pre = pre + 4
          case 1 => // jump
            c.io.jump_flag_id.poke(true.B)
            c.io.jump_address_id.poke(entry)
            c.clock.step()
            c.io.instruction_address.expect(entry)
            pre = entry
        }
      }
    }
  }
}
```
According to `InstructionFetchTest`, the CPU's instruction address starts at 0x1000. The loop runs 101 times (`0 to 100` is inclusive), and each iteration randomly picks 0 or 1 to decide whether `jump_flag_id` is asserted. When the flag is set, the program counter jumps to the specified address; otherwise it simply advances by 4.
For example:
If `jump_flag_id == 0`:
The previous instruction address is 0x1000. Because `jump_flag_id` is 0, the program counter adds 4 to get the next instruction address, so the current address is 0x1004.

If `jump_flag_id == 1`:
Although the previous instruction address is 0x1004, `jump_flag_id` is 1 in the next cycle, so the program counter is assigned the jump address directly. In the test this address is `entry`, whose value is 0x1000.
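
The update rule described above can be sketched as a plain Scala software model. This is only an illustration, not the Chisel module from the lab; the names `PcModel` and `nextPc` are made up for this example.

```scala
// Software model of the program-counter update; not the lab's Chisel module.
object PcModel {
  val Entry = 0x1000 // entry address used by the test

  // Next PC: the jump target when jumpFlag is set, otherwise pc + 4.
  def nextPc(pc: Int, jumpFlag: Boolean, jumpAddress: Int): Int =
    if (jumpFlag) jumpAddress else pc + 4

  def main(args: Array[String]): Unit = {
    val afterNoJump = nextPc(Entry, jumpFlag = false, jumpAddress = Entry)
    val afterJump   = nextPc(afterNoJump, jumpFlag = true, jumpAddress = Entry)
    println(f"0x$afterNoJump%04x 0x$afterJump%04x") // prints "0x1004 0x1000"
  }
}
```

The two calls in `main` reproduce the two example cases: one step without a jump, followed by one jump back to the entry address.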

### InstructionDecode
InstructionDecoderTest.scala
```scala
class InstructionDecoderTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("InstructionDecoder of Single Cycle CPU")
  it should "produce correct control signal" in {
    test(new InstructionDecode).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x00a02223L.U) // S-type
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.regs_reg2_read_address.expect(10.U)
      c.clock.step()

      c.io.instruction.poke(0x000022b7L.U) // lui
      c.io.regs_reg1_read_address.expect(0.U)
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Immediate)
      c.clock.step()

      c.io.instruction.poke(0x002081b3L.U) // add
      c.io.ex_aluop1_source.expect(ALUOp1Source.Register)
      c.io.ex_aluop2_source.expect(ALUOp2Source.Register)
      c.clock.step()
    }
  }
}
```
The decode stage test has three cases: `0x00a02223`, `0x000022b7`, and `0x002081b3`. I used the [Online RISC-V Decoder tool](https://luplab.gitlab.io/rvcodecjs/) to analyze the corresponding instructions.
The decoder first inspects specific bits of the instruction coming from the instruction fetch stage to determine the instruction type.
#### `0x00a02223`: `sw x10, 4(x0)`

In the waveform, the opcode is `0100011`, so the instruction is S-type. The funct3 is `010`, so it's `sw`.

regs_reg1_read_address: rs1 => 00000
regs_reg2_read_address: rs2 => 01010
aluop1_source: ALUOp1Source.Register
aluop2_source: ALUOp2Source.Immediate
ex_immediate: `Cat(Fill(21, io.instruction(31)), io.instruction(30, 25), io.instruction(11, 7))` => bit 31 replicated 21 times for sign extension + bits 30~25 + bits 11~7, and the value is 0x00000004
An S-type instruction writes to memory, so `memory_write_enable` needs to be 1.
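
As a cross-check of the fields listed above, here is a small plain-Scala sketch that extracts rs1, rs2, and the sign-extended S-type immediate from a 32-bit instruction word. It mirrors the bit slices of the `Cat(...)` expression but is not the lab's decoder; `STypeDecode` and its method names are hypothetical.

```scala
// Plain-Scala model of S-type field extraction; not the Chisel decoder.
object STypeDecode {
  def rs1(inst: Int): Int = (inst >>> 15) & 0x1f // bits 19..15
  def rs2(inst: Int): Int = (inst >>> 20) & 0x1f // bits 24..20

  // imm[11:5] = inst[31:25], imm[4:0] = inst[11:7].
  // The arithmetic shift (>>) copies bit 31 downwards, which performs the sign extension.
  def immediate(inst: Int): Int = ((inst >> 25) << 5) | ((inst >>> 7) & 0x1f)

  def main(args: Array[String]): Unit = {
    val sw = 0x00a02223 // sw x10, 4(x0)
    println(s"rs1=${rs1(sw)} rs2=${rs2(sw)} imm=${immediate(sw)}") // prints "rs1=0 rs2=10 imm=4"
  }
}
```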
#### `0x000022b7`: `lui x5, 2`

In the waveform, the opcode is `0110111`, so the instruction is `lui`.

regs_reg1_read_address: 0.U(Parameters.PhysicalRegisterAddrWidth)
aluop1_source: ALUOp1Source.Register
aluop2_source: ALUOp2Source.Immediate
reg_write_address: rd => 00101
ex_immediate: `Cat(io.instruction(31, 12), 0.U(12.W))` => bits 31~12 followed by 12 zeros, and the value is 0x00002000
`lui` writes to a register, so `reg_write_enable` needs to be 1.
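
The U-type immediate is simply the upper 20 bits of the instruction with the lower 12 bits cleared. A minimal plain-Scala sketch, with the hypothetical name `UTypeDecode`, confirms the values above:

```scala
// Plain-Scala model of U-type field extraction; not the Chisel decoder.
object UTypeDecode {
  def rd(inst: Int): Int = (inst >>> 7) & 0x1f // bits 11..7

  // Keep bits 31..12 and zero the lower 12 bits.
  def immediate(inst: Int): Int = inst & 0xfffff000

  def main(args: Array[String]): Unit = {
    val lui = 0x000022b7 // lui x5, 2
    println(f"rd=${rd(lui)}%d imm=0x${immediate(lui)}%08x") // prints "rd=5 imm=0x00002000"
  }
}
```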
#### `0x002081b3`: `add x3, x1, x2`

In the waveform, the opcode is `0110011`, so the instruction is R-type (grouped as `RM` in the lab code). The funct3 is `000` and the funct7 is `0000000`, so it's `add`.

regs_reg1_read_address: rs1 => 00001
regs_reg2_read_address: rs2 => 00010
aluop1_source: ALUOp1Source.Register
aluop2_source: ALUOp2Source.Register
reg_write_address: rd => 00011
`add` writes to a register, so `reg_write_enable` needs to be 1.
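
The R-type layout can be checked the same way. The sketch below splits an instruction word into its six R-type fields and tests for `add`; `RTypeDecode`, `Fields`, and `isAdd` are names invented for this example.

```scala
// Plain-Scala model of R-type field extraction; not the Chisel decoder.
object RTypeDecode {
  final case class Fields(opcode: Int, rd: Int, funct3: Int, rs1: Int, rs2: Int, funct7: Int)

  def decode(inst: Int): Fields = Fields(
    opcode = inst & 0x7f,           // bits 6..0
    rd     = (inst >>> 7) & 0x1f,   // bits 11..7
    funct3 = (inst >>> 12) & 0x7,   // bits 14..12
    rs1    = (inst >>> 15) & 0x1f,  // bits 19..15
    rs2    = (inst >>> 20) & 0x1f,  // bits 24..20
    funct7 = (inst >>> 25) & 0x7f   // bits 31..25
  )

  // add: opcode 0110011, funct3 000, funct7 0000000
  def isAdd(f: Fields): Boolean = f.opcode == 0x33 && f.funct3 == 0 && f.funct7 == 0

  def main(args: Array[String]): Unit = {
    val f = decode(0x002081b3) // add x3, x1, x2
    println(s"add=${isAdd(f)} rd=${f.rd} rs1=${f.rs1} rs2=${f.rs2}") // prints "add=true rd=3 rs1=1 rs2=2"
  }
}
```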
### Execute
ExecuteTest.scala
```scala
class ExecuteTest extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Execution of Single Cycle CPU")
  it should "execute correctly" in {
    test(new Execute).withAnnotations(TestAnnotations.annos) { c =>
      c.io.instruction.poke(0x001101b3L.U) // x3 = x2 + x1

      var x = 0
      for (x <- 0 to 100) {
        val op1    = scala.util.Random.nextInt(429496729)
        val op2    = scala.util.Random.nextInt(429496729)
        val result = op1 + op2
        val addr   = scala.util.Random.nextInt(32)

        c.io.reg1_data.poke(op1.U)
        c.io.reg2_data.poke(op2.U)

        c.clock.step()
        c.io.mem_alu_result.expect(result.U)
        c.io.if_jump_flag.expect(0.U)
      }

      // beq test
      c.io.instruction.poke(0x00208163L.U) // pc + 2 if x1 === x2
      c.io.instruction_address.poke(2.U)
      c.io.immediate.poke(2.U)
      c.io.aluop1_source.poke(1.U)
      c.io.aluop2_source.poke(1.U)
      c.clock.step()

      // equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(9.U)
      c.clock.step()
      c.io.if_jump_flag.expect(1.U)
      c.io.if_jump_address.expect(4.U)

      // not equ
      c.io.reg1_data.poke(9.U)
      c.io.reg2_data.poke(19.U)
      c.clock.step()
      c.io.if_jump_flag.expect(0.U)
      c.io.if_jump_address.expect(4.U)
    }
  }
}
```
The Execute stage test covers two instruction types, R-type and B-type.

For the R-type case, the instruction is `0x001101b3`, which corresponds to `add x3, x2, x1`. The testbench performs 101 random additions (`0 to 100` is inclusive) and, since no branch is involved, `if_jump_flag` stays 0.

For the B-type case, the instruction is `0x00208163`, which corresponds to `beq x1, x2, 2`: if x1 and x2 are equal, the program counter advances by 2.
The test pokes 1 into both `aluop1_source` and `aluop2_source`. Assuming the lab's encoding, where 1 means `ALUOp1Source.InstructionAddress` and `ALUOp2Source.Immediate`, `alu.op1` and `alu.op2` select `instruction_address` and `immediate`, so the ALU produces the branch target 2 + 2 = 4, while the equality check itself compares `reg1_data` with `reg2_data`. When both registers hold 9, `if_jump_flag` is 1 and the program counter jumps to `4`. When they hold 9 and 19, `if_jump_flag` is 0; `if_jump_address` still reads `4`, but the jump is not taken.
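
The `beq` case can be reproduced with a short plain-Scala model that decodes the B-type immediate and computes the jump flag and jump address. This is a sketch under the assumption that the branch target is `instruction_address + immediate`, which matches the values the test expects; `BranchModel` and its methods are hypothetical names, not part of the lab code.

```scala
// Plain-Scala model of the beq test case; not the lab's Execute module.
object BranchModel {
  // B-type immediate: imm[12] = inst[31], imm[11] = inst[7],
  // imm[10:5] = inst[30:25], imm[4:1] = inst[11:8], imm[0] = 0.
  def immediate(inst: Int): Int = {
    val imm12   = (inst >> 31) << 12 // arithmetic shift sign-extends bit 31
    val imm11   = ((inst >>> 7) & 0x1) << 11
    val imm10_5 = ((inst >>> 25) & 0x3f) << 5
    val imm4_1  = ((inst >>> 8) & 0xf) << 1
    imm12 | imm11 | imm10_5 | imm4_1
  }

  // Returns (jump flag, jump address) for beq.
  def beq(pc: Int, imm: Int, reg1: Int, reg2: Int): (Boolean, Int) =
    (reg1 == reg2, pc + imm)

  def main(args: Array[String]): Unit = {
    val imm = immediate(0x00208163) // beq x1, x2, 2
    println(beq(pc = 2, imm = imm, reg1 = 9, reg2 = 9))  // prints "(true,4)"
    println(beq(pc = 2, imm = imm, reg1 = 9, reg2 = 19)) // prints "(false,4)"
  }
}
```

In both cases the computed address is 4; only the flag decides whether the program counter actually takes it.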
After filling in the blanks, I ran `sbt test` on the command line to verify my answers and got the following result.
```
$ sbt test
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/)
[info] compiling 1 Scala source to /home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/target/scala-2.13/test-classes ...
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] Run completed in 12 seconds, 744 milliseconds.
[info] Total number of tests run: 9
[info] Suites: completed 7, aborted 0
[info] Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
## Modify the handwritten RISC-V assembly code in Homework2
To run Homework2 on the single-cycle RISC-V CPU built with Chisel, I modified the related code as well as the Makefile so that `binarization.c` is translated into `binarization.asmbin`. The program stores the binarized values at fixed memory addresses so that the test in `CPUTest.scala` can read them back and verify the answer.
```c
#include <stdint.h>
#include <stdio.h>

uint16_t count_leading_zeros_16(uint16_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x -= ((x >> 1) & 0x5555);
    x = ((x >> 2) & 0x3333) + (x & 0x3333);
    x = ((x >> 4) + x) & 0x0f0f;
    x += (x >> 8);
    return (16 - (x & 0x1f)); // change 0x3f to 0x1f
}

static int binarization(uint16_t *arr, uint16_t threshold, int i){
    uint16_t sub = threshold - *(arr+i);
    uint16_t clz = count_leading_zeros_16(sub);
    return (clz) ? 0 : 255;
}

int main(){
    // pixel test
    // 8-bit color depth for black and white photo
    uint16_t picture[5] = {0,80,127,150,231};
    uint16_t threshold = 127;
    uint16_t *pixel = picture;
    *((volatile uint16_t *) (2)) = binarization(pixel, threshold, 0);
    *((volatile uint16_t *) (4)) = binarization(pixel, threshold, 1);
    *((volatile uint16_t *) (6)) = binarization(pixel, threshold, 2);
    *((volatile uint16_t *) (8)) = binarization(pixel, threshold, 3);
    *((volatile uint16_t *) (10)) = binarization(pixel, threshold, 4);
    return 0;
}
```
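
To see which values the C program should produce, its logic can be mirrored in plain Scala. This is only a software model for checking the expected results: `BinarizationModel` is a hypothetical name, and the explicit `& 0xffff` masks stand in for `uint16_t` wrap-around.

```scala
// Plain-Scala mirror of the C routines, used to compute the expected outputs.
object BinarizationModel {
  // 16-bit count-leading-zeros, following the C version step by step.
  def clz16(v: Int): Int = {
    var x = v & 0xffff
    // Smear the highest set bit into every lower position.
    x |= x >> 1
    x |= x >> 2
    x |= x >> 4
    x |= x >> 8
    // Population count of the smeared value.
    x -= (x >> 1) & 0x5555
    x = ((x >> 2) & 0x3333) + (x & 0x3333)
    x = ((x >> 4) + x) & 0x0f0f
    x = (x + (x >> 8)) & 0xffff
    16 - (x & 0x1f)
  }

  // A pixel above the threshold makes the 16-bit difference wrap around,
  // which sets the top bit and gives zero leading zeros.
  def binarize(pixel: Int, threshold: Int): Int = {
    val sub = (threshold - pixel) & 0xffff
    if (clz16(sub) != 0) 0 else 255
  }

  def main(args: Array[String]): Unit = {
    val picture = Seq(0, 80, 127, 150, 231)
    println(picture.map(binarize(_, 127)).mkString(", ")) // prints "0, 0, 0, 255, 255"
  }
}
```

The trick is that the sign of `threshold - pixel` is read off the leading-zero count, so no comparison instruction is needed.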
The following test is added to `CPUTest.scala`. My Homework2 binarizes the 5 values 0, 80, 127, 150, 231 with threshold 127, so the expected answer is 0, 0, 0, 255, 255.
```scala
class hw2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "binarize the pixel" in {
    test(new TestTopModule("binarization.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 500) {
        c.clock.step(100)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }

      c.io.mem_debug_read_address.poke(2.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0x0.U)

      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0x0.U)

      c.io.mem_debug_read_address.poke(6.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0x0.U)

      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0xff.U)

      c.io.mem_debug_read_address.poke(10.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(0xff.U)
    }
  }
}
```
This is the successful test result of hw2Test captured from the `sbt test` command line.
```
$ sbt test
[info] welcome to sbt 1.9.7 (Temurin Java 1.8.0_392)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/projec
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/dejuva9487/Desktop/2023_Computer_Architecture_FALL/hw3/ca2023-lab3/)
...
[info] hw2Test:
[info] Single Cycle CPU
[info] - should binarize the pixel
...
[info] Run completed in 12 seconds, 349 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
Here is the waveform.
