# Assignment3: single-cycle RISC-V CPU
contributed by < [`fan1071221`](https://github.com/fan1071221/ca2023-lab3) >
## **Engage with the Chisel Tutorial**
### **Describe the operation of Hello World in Chisel**
```scala
import chisel3._

class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val blkReg  = RegInit(0.U(1.W))
  cntReg := cntReg + 1.U
  when(cntReg === CNT_MAX) {
    cntReg := 0.U
    blkReg := ~blkReg
  }
  io.led := blkReg
}
```
1. **Class Declaration**: The program defines a class Hello that extends the Module class, the basic construct in Chisel for defining a hardware module.
2. **IO Interface**: Inside the Hello class, an IO interface is declared. It has a single output, led, which is a one-bit unsigned integer (UInt(1.W)). This output will be connected to an LED on the hardware.
3. **Constants and Registers**:
    * **CNT_MAX**: The constant 50000000 / 2 - 1 (24,999,999), converted to a hardware unsigned integer with .U. It sets the blink rate: the counter wraps every 25,000,000 clock cycles, so on a 50 MHz clock (the frequency the constant suggests) the LED would toggle every 0.5 s.
    * **cntReg**: A 32-bit register (UInt(32.W)) initialized to 0. This register is used as a counter.
    * **blkReg**: A 1-bit register (UInt(1.W)) initialized to 0. This register holds the state of the LED.
4. **Counter Logic**:
    * The counter cntReg is incremented by 1 on each clock cycle: cntReg := cntReg + 1.U.
    * When cntReg equals CNT_MAX, two actions occur, and the assignment inside the when block overrides the increment:
        1. The counter is reset to 0: cntReg := 0.U.
        2. The blkReg register is toggled: blkReg := ~blkReg.
:::info
The tilde (~) is a bitwise negation operator, so if blkReg was 0, it becomes 1, and vice versa.
:::
5. **LED Control**:
    * The state of blkReg is assigned to io.led, so blkReg controls whether the LED is on or off. When blkReg is 1, the LED is on, and when it is 0, the LED is off.
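The register behaviour described above can be checked with a small software model. The sketch below is plain Scala rather than Chisel, so it generates no hardware. `BlinkModel`, `State`, `step`, and `run` are hypothetical names, and `cntMax = 3` is a scaled-down stand-in for `50000000 / 2 - 1` so that the toggle shows up within a few cycles.

```scala
// Plain-Scala software model of the blink logic (an illustration, not Chisel).
object BlinkModel {
  // State held by the two registers: the counter and the 1-bit LED value.
  final case class State(cnt: Long, led: Int)

  // One rising clock edge: wrap the counter at cntMax and toggle the LED on wrap,
  // otherwise count up and hold the LED value.
  def step(s: State, cntMax: Long): State =
    if (s.cnt == cntMax) State(0L, s.led ^ 1)
    else State(s.cnt + 1, s.led)

  // Run `cycles` clock edges from reset; return the LED value after each edge.
  def run(cycles: Int, cntMax: Long): Vector[Int] =
    Iterator
      .iterate(State(0L, 0))(step(_, cntMax))
      .drop(1) // skip the reset state itself
      .take(cycles)
      .map(_.led)
      .toVector

  def main(args: Array[String]): Unit =
    // cntMax = 3 is an assumed, scaled-down value for demonstration only.
    println(run(12, 3L).mkString(" "))
}
```

With `cntMax = 3` the LED value changes every 4 clock edges, that is, every `cntMax + 1` cycles. The same relationship gives 25,000,000 cycles per toggle for the real constant.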
### **Enhance it by incorporating logic circuit**
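```scala
class Hello extends Module {
  val io = IO(new Bundle {
    val led = Output(UInt(1.W))
  })
  val CNT_MAX = (50000000 / 2 - 1).U
  val cntReg  = RegInit(0.U(32.W))
  val ledReg  = RegInit(0.U(1.W)) // holds the LED state between clock edges
  // Wrap the counter at CNT_MAX, otherwise count up.
  cntReg := Mux(cntReg === CNT_MAX, 0.U, cntReg + 1.U)
  // Toggle the LED state on wrap, otherwise keep it.
  ledReg := Mux(cntReg === CNT_MAX, ~ledReg, ledReg)
  io.led := ledReg
}
```
1. **Registers and Constants**:
    * Retains the CNT_MAX constant and the cntReg counter. The LED state is still kept in a 1-bit register (ledReg), because io.led is a plain output wire: feeding io.led back into its own Mux would form a combinational loop with no storage element.
2. **Combined Counter**:
    * Uses Mux to implement the counter reset logic directly in the assignment to cntReg, replacing the when block.
3. **LED Control**:
    * A second Mux selects between the toggled and the held LED state based on the value of cntReg, and io.led is driven from ledReg.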
## **Construct a single-cycle RISC-V CPU with Chisel**
### **Running Unit Tests**
Run the unit tests:
```shell
$ sbt test
```
Output:
```shell
[info] welcome to sbt 1.9.7 (Ubuntu Java 11.0.20.1)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/hongwei/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/hongwei/ca2023-lab3/)
[info] InstructionFetchTest:
[info] InstructionFetch of Single Cycle CPU
[info] - should fetch instruction
[info] InstructionDecoderTest:
[info] InstructionDecoder of Single Cycle CPU
[info] - should produce correct control signal
[info] ExecuteTest:
[info] Execution of Single Cycle CPU
[info] - should execute correctly
[info] ByteAccessTest:
[info] Single Cycle CPU
[info] - should store and load a single byte
[info] QuicksortTest:
[info] Single Cycle CPU
[info] - should perform a quicksort on 10 numbers
[info] FibonacciTest:
[info] Single Cycle CPU
[info] - should recursively calculate Fibonacci(10)
[info] hw2Test:
[info] Single Cycle CPU
[info] - should Perform hw2
[info] RegisterFileTest:
[info] Register File of Single Cycle CPU
[info] - should read the written content
[info] - should x0 always be zero
[info] - should read the writing content
[info] Run completed in 21 seconds, 748 milliseconds.
[info] Total number of tests run: 10
[info] Suites: completed 8, aborted 0
[info] Tests: succeeded 10, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 23 s, completed Nov 30, 2023, 5:25:58 PM
```
### **Descriptions of each unit test**:
#### **InstructionFetchTest**
##### **Purpose of testing the "Instruction Fetch" phase**:
* The processor can correctly fetch instructions from memory.
* The program counter is correctly updated, whether for sequential execution or due to a jump instruction changing the flow of execution.
* Jump and branch instructions are properly handled, ensuring that the program flow executes as expected.
##### **InstructionFetchTest.scala**
:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::
##### **Test Implementation**
* It sets up a scenario to test instruction fetching with and without jump conditions.
* The entry variable is the initial address, and pre and cur are used to track the previous and current instruction addresses, respectively.
* The loop (for (x <- 0 to 100)) simulates 101 instruction fetch cycles, since the range 0 to 100 is inclusive.
* Within each cycle, it randomly decides whether to perform a jump or not (using Random.nextInt(2)).
* If there's no jump (case 0), it increments the address by 4 (standard for RISC-V as each instruction is 32 bits or 4 bytes).
* If a jump is simulated (case 1), it sets the instruction address back to the entry.
* The test checks if the module's output (instruction_address) matches the expected address (either incremented or reset to entry).
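The expected-address bookkeeping in this test can be sketched as a plain Scala model. This is only an illustration of the checked behaviour, not the lab's Chisel module or its test code. `FetchModel`, `nextPc`, and `randomWalk` are hypothetical names, and the jump target is assumed to always be the entry address, as in the description above.

```scala
import scala.util.Random

// Plain-Scala model of the program-counter update that the test checks.
object FetchModel {
  // Next PC: the jump target when the jump flag is set,
  // otherwise the address of the next 4-byte instruction.
  def nextPc(pc: Long, jumpFlag: Boolean, jumpTarget: Long): Long =
    if (jumpFlag) jumpTarget else pc + 4

  // Replay the randomized scenario: each cycle either falls through
  // or jumps back to `entry`. Returns the PC produced after each cycle.
  def randomWalk(entry: Long, cycles: Int, rng: Random): Vector[Long] = {
    var pc = entry
    Vector.fill(cycles) {
      val jump = rng.nextInt(2) == 1 // 0 = sequential, 1 = jump
      pc = nextPc(pc, jump, entry)
      pc
    }
  }

  def main(args: Array[String]): Unit =
    // 0x1000 as the entry address is an assumption for the demonstration.
    println(randomWalk(0x1000L, 10, new Random(42)).map(_.toHexString).mkString(" "))
}
```

Every address produced by the model is either the previous address plus 4 or the entry address, which is the property the test asserts on instruction_address.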
##### **Waveform**

* When io_jump_flag_id is 0, the program counter (PC) advances sequentially by the instruction size, which is 4 bytes for 32-bit instructions, as in the step from 0x1008 to 0x100C. When io_jump_flag_id is 1, a jump is taken and the PC is loaded with the jump target instead of PC + 4. In the example provided, the PC goes from 0x1014 to 0x1000.
* * *
#### **InstructionDecoderTest**
##### **Purpose of testing the "Instruction Decode" phase**:
* Correct Signal Generation: The decoder generates appropriate control signals for different types of instructions (e.g., S-type, LUI, ADD).
* Accurate Register Addressing: It correctly identifies source and destination register addresses.
* Proper ALU Operation Setup: The ALU operation sources are correctly determined based on the instruction type.
##### **InstructionDecoderTest.scala**
:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::
##### **Test Implementation**
* The test drives the decoder's instruction input with one encoding at a time, covering the instruction kinds listed above: an S-type store, lui, and add.
* For each instruction it checks the decoded outputs: the ALU operand sources (ex_aluop1_source and ex_aluop2_source) and the register addresses.
* The waveform below shows the lui case.
##### **Waveform**

At clock cycle 3, the following events take place:
* The lui instruction with the hexadecimal encoding 0x22B7 (lui x5, 2) is decoded. It places the immediate value 2 in the upper 20 bits of register x5, giving 0x2000.
* It is a U-type instruction that does not access memory, so memory_read_enable and memory_write_enable remain inactive.
* The reg_write_enable signal is active so that the value can be written to register x5.
* io_ex_aluop1_source is 0, which selects the register input rather than the instruction address as the first ALU operand; lui has no source register of its own.
* io_reg_write_address is 0x5, targeting register x5.
* io_ex_aluop2_source is 1, so the immediate is used as the second ALU operand.
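The field values read off the waveform can be reproduced by slicing the instruction word by hand. The sketch below is plain Scala, not the lab's decoder. `LuiDecode` and its method names are hypothetical, and the bit positions are those of the RISC-V U-type format.

```scala
// Plain-Scala field extraction for a U-type instruction such as lui.
object LuiDecode {
  // opcode occupies bits 6..0.
  def opcode(inst: Long): Int = (inst & 0x7F).toInt

  // rd occupies bits 11..7.
  def rd(inst: Long): Int = ((inst >> 7) & 0x1F).toInt

  // U-type immediate: bits 31..12 stay in place, the low 12 bits are zero.
  def immU(inst: Long): Long = inst & 0xFFFFF000L

  def main(args: Array[String]): Unit = {
    val inst = 0x000022B7L // the lui encoding from the waveform
    println(f"opcode=0x${opcode(inst)}%02X rd=x${rd(inst)} imm=0x${immU(inst)}%X")
  }
}
```

For 0x22B7 this yields opcode 0x37 (lui), destination register 5, and an immediate of 0x2000, which matches io_reg_write_address = 0x5 in the waveform.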
* * *
#### **ExecuteTest**
##### **Purpose of testing the “Execute” phase:**
* Ensure arithmetic and logic instructions are processed correctly by the ALU.
* Verify that results are properly written back to registers.
* Check immediate value instructions accurately manipulate and store results.
* Confirm that branch decisions and program counter updates occur correctly.
* Validate that all control signals are set and interpreted correctly for proper operation.
##### **ExecuteTest.scala**
:::danger
:warning: **Refrain from copying and pasting your solution directly into the HackMD note**. Instead, provide a concise summary of the various test cases, outlining the aspects of the CPU they evaluate, the techniques employed for loading test program instructions, and the outcomes of these test cases.
:::
##### **Test Implementation**
* Test for Add Operation:
    * The test pokes an ADD instruction (x3 = x2 + x1) into the instruction input of the Execute module. It then loops 100 times with random input values for the register data (reg1_data and reg2_data). In each iteration it checks the mem_alu_result output against the expected sum and asserts that no jump is flagged (if_jump_flag).
* Test for Branch Equal (BEQ) Operation:
    * The test sets up a BEQ instruction (branch to pc + 2 if x1 === x2) and steps the clock once to process it. It then drives equal values on reg1_data and reg2_data and expects if_jump_flag to be set (a taken branch) and if_jump_address to be 4, the sum of the instruction address (2) and the immediate (2). It also tests the case where the registers are not equal, in which if_jump_flag should not be set.
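The two checks above can be sketched as a plain Scala reference model. This is an illustration of the expected values only, not the lab's Execute module. `ExecuteModel`, `add`, and `beq` are hypothetical names, and the bound on the random operands is an assumption chosen so the sum stays within 32 bits.

```scala
import scala.util.Random

// Plain-Scala reference model for the ADD and BEQ expectations.
object ExecuteModel {
  private val Mask32 = 0xFFFFFFFFL

  // ADD: 32-bit wrap-around sum; an arithmetic instruction never requests a jump.
  def add(reg1: Long, reg2: Long): (Long, Boolean) =
    ((reg1 + reg2) & Mask32, false)

  // BEQ: jump when the operands are equal; target = instruction address + immediate.
  def beq(reg1: Long, reg2: Long, instructionAddress: Long, immediate: Long): (Boolean, Long) =
    (reg1 == reg2, (instructionAddress + immediate) & Mask32)

  def main(args: Array[String]): Unit = {
    val rng = new Random(0)
    // Mirror the 100 randomized ADD checks (operand bound is an assumption).
    for (_ <- 1 to 100) {
      val a = rng.nextInt(1 << 30).toLong
      val b = rng.nextInt(1 << 30).toLong
      val (sum, jump) = add(a, b)
      assert(sum == a + b && !jump)
    }
    println(beq(9L, 9L, 2L, 2L)) // equal operands: branch taken
    println(beq(9L, 8L, 2L, 2L)) // different operands: branch not taken
  }
}
```

In the equal case the model returns a set jump flag and address 4, matching the if_jump_flag and if_jump_address expectations described above.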
##### **Waveform**

* The opcode is 0x33, the RISC-V opcode for register-register (R-type) arithmetic; in this test it belongs to the add instruction. The operands are io_op1 = 0x0BFABBC4 and io_op2 = 0x153495EA, and their sum, 0x212F51AE, appears on io_result.
## **Modify the handwritten RISC-V assembly code on the single-cycle RISC-V CPU**
- **Remove the RDCYCLE or RDCYCLEH instruction and store the result in memory**
- **Modify the makefile in csrc directory to generate corresponding .asmbin file**
```
BINS = \
fibonacci.asmbin \
hello.asmbin \
mmio.asmbin \
quicksort.asmbin \
sb.asmbin \
hw2.asmbin
```
- **Add test in CPUTest.scala**
```scala
class hw2Test extends AnyFlatSpec with ChiselScalatestTester {
  behavior.of("Single Cycle CPU")
  it should "Perform hw2" in {
    test(new TestTopModule("hw2.asmbin")).withAnnotations(TestAnnotations.annos) { c =>
      for (i <- 1 to 50) {
        c.clock.step(1000)
        c.io.mem_debug_read_address.poke((i * 4).U) // Avoid timeout
      }
      c.io.mem_debug_read_address.poke(4.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(16.U)
      c.io.mem_debug_read_address.poke(8.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(23.U)
      c.io.mem_debug_read_address.poke(12.U)
      c.clock.step()
      c.io.mem_debug_read_data.expect(46.U)
    }
  }
}
```
- **Result**
```shell
$ sbt "testOnly riscv.singlecycle.hw2Test"
[info] welcome to sbt 1.9.7 (Ubuntu Java 11.0.21)
[info] loading settings for project ca2023-lab3-build from plugins.sbt ...
[info] loading project definition from /home/hongwei/ca2023-lab3/project
[info] loading settings for project root from build.sbt ...
[info] set current project to mycpu (in build file:/home/hongwei/ca2023-lab3/)
[info] hw2Test:
[info] Single Cycle CPU
[info] - should Perform hw2
[info] Run completed in 6 seconds, 415 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 7 s, completed Dec 1, 2023, 4:05:15 PM
```
## **Using Verilator to Run the Assembly**
Use the following command to generate the simulation executable file of the CPU:
```
$ make verilator
```
Use the following command to run `hw2.asmbin` on the simulated CPU:
```
$ ./run-verilator.sh -instruction src/main/resources/hw2.asmbin -time 2000 -vcd hw2.vcd
```
Output:
```
-time 2000
-memory 1048576
-instruction src/main/resources/hw2.asmbin
[-------------------->] 100%
```
## **Analyze the waveform data by loading the file into GTKWave**
### **Instruction Fetch**

When io_jump_flag_id equals 1, a jump is taken and the PC is loaded with the jump target instead of advancing by 4. In the example provided, the PC jumps from 0x10EC to 0x1024.
### **Instruction Decode**

The io_instruction value 0x00452483 is the RISC-V instruction lw s1, 4(a0), which loads a word from memory into register s1. Memory read is enabled (io_memory_read_enable = 1) and memory write is disabled (io_memory_write_enable = 0). Register a0 (x10, register number 0x0A) supplies the base address for the memory access, and the offset is 4 (io_ex_immediate = 0x4). The loaded data is written to register s1 (rd = 0x9), with register write enabled (io_reg_write_enable = 1).
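The fields quoted above follow from the RISC-V I-type layout and can be checked by slicing the instruction word. The sketch below is plain Scala, not the lab's decoder. `LoadDecode`, `IType`, and `decode` are hypothetical names.

```scala
// Plain-Scala field extraction for an I-type instruction such as lw.
object LoadDecode {
  final case class IType(opcode: Int, rd: Int, funct3: Int, rs1: Int, imm: Int)

  def decode(inst: Long): IType = {
    val raw = ((inst >> 20) & 0xFFF).toInt // imm[11:0] sits in bits 31..20
    // Sign-extend the 12-bit immediate to a signed Int.
    val imm = if ((raw & 0x800) != 0) raw - 0x1000 else raw
    IType(
      opcode = (inst & 0x7F).toInt,         // bits 6..0
      rd     = ((inst >> 7) & 0x1F).toInt,  // bits 11..7
      funct3 = ((inst >> 12) & 0x7).toInt,  // bits 14..12
      rs1    = ((inst >> 15) & 0x1F).toInt, // bits 19..15
      imm    = imm
    )
  }

  def main(args: Array[String]): Unit =
    println(decode(0x00452483L)) // the lw encoding from the waveform
}
```

Decoding 0x00452483 gives opcode 0x03 (load), funct3 2 (word), rs1 = 10 (a0), rd = 9 (s1), and an immediate of 4, in agreement with the signals described above.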
### **Execute**

The alu_ctrl_io_alu_funct signal with a value of 0x0001 specifies an addition operation in the arithmetic logic unit (ALU) of the system. When the instruction type is a load operation (indicated by InstructionTypes.L), the ALU function is set to perform an addition (ALUFunctions.add). This is seen in the ALU control code.
In the ALU's Scala implementation, when io.func is set to ALUFunctions.add, the ALU performs an addition of io.op1 and io.op2. This operation is used to calculate the effective memory address for the load instruction.
The effective memory address, or alu_io_result, is computed by adding the base register value rs (0x00001004) to the immediate offset (0x000000EC), resulting in a calculated memory address of 0x000010F0. This is the address from which data will be loaded into the register.
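The path from instruction type to ALU result can be sketched in plain Scala. This is a simplified illustration, not the lab's ALUControl or ALU source. `AluModel`, `InstType`, `AluFunc`, `control`, and `alu` are hypothetical stand-ins, and only the load case described above is modelled.

```scala
// Plain-Scala model of "load selects add, add computes base + offset".
object AluModel {
  // Hypothetical stand-ins for the lab's instruction-type and ALU-function enumerations.
  sealed trait InstType
  case object Load extends InstType

  sealed trait AluFunc
  case object Add extends AluFunc

  // ALU control: a load needs base + offset, so it selects addition.
  def control(t: InstType): AluFunc = t match {
    case Load => Add
  }

  // ALU datapath: 32-bit wrap-around result for the selected function.
  def alu(func: AluFunc, op1: Long, op2: Long): Long = func match {
    case Add => (op1 + op2) & 0xFFFFFFFFL
  }

  def main(args: Array[String]): Unit = {
    // Operand values taken from the waveform description.
    val address = alu(control(Load), 0x00001004L, 0x000000ECL)
    println(f"0x$address%08X")
  }
}
```

With the waveform's operands the model returns 0x000010F0, the effective address quoted above.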
### **Memory Access**
The assembly program counts the number of zeros in a number and stores the result in memory. The count of zeros is calculated in the counting_zero function and stored in the t0 register. After counting, the program calls the print_result function, which stores the result from t0 into memory at the address pointed to by the s7 register. The s7 register is set before each counting operation to point to memory locations immediately following num1, num2, or num3 data areas, depending on which number is being processed.
- **Test case 1 (Answer is 16)**
```scala
c.io.mem_debug_read_address.poke(4.U)
c.clock.step()
c.io.mem_debug_read_data.expect(16.U)
```

- **Test case 2 (Answer is 23)**
```scala
c.io.mem_debug_read_address.poke(8.U)
c.clock.step()
c.io.mem_debug_read_data.expect(23.U)
```

- **Test case 3 (Answer is 46)**
```scala
c.io.mem_debug_read_address.poke(12.U)
c.clock.step()
c.io.mem_debug_read_data.expect(46.U)
```
