# Assignment3: single-cycle RISC-V CPU
contributed by <[`shhung`](https://github.com/shhung/ca2023-lab3)>
## Chisel
### Hello World in Chisel
The `Hello` circuit drives an LED signal and increments a counter until it reaches the maximum value of 25,000,000, that is, 25,000,000 clock cycles. Once the counter reaches the maximum, the circuit inverts the LED signal. The LED signal starts at 0 and alternates between 0 and 1 as the counter repeatedly runs up to its maximum.
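The counting behaviour described above can be sketched as a small software model. This is not the Chisel source; the function names and the exact cycle on which the toggle happens are assumptions made for illustration.

```python
MAX_COUNT = 25_000_000  # counter limit described in the text


def step(counter, led, max_count=MAX_COUNT):
    """Advance the model by one clock cycle and return (counter, led)."""
    if counter >= max_count:
        # Limit reached: restart the counter and invert the LED.
        return 0, led ^ 1
    return counter + 1, led


def simulate(cycles, max_count=MAX_COUNT):
    """Run the model for `cycles` clock cycles and return the LED value."""
    counter, led = 0, 0  # the LED starts at 0
    for _ in range(cycles):
        counter, led = step(counter, led, max_count)
    return led
```

Passing a small `max_count` (for example 3) makes it easy to watch the LED toggle without simulating millions of cycles.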
## Construct a single-cycle RISC-V CPU with Chisel
To complete the work in Lab3, the most crucial aspect is understanding the Single-cycle CPU architecture diagram. Each block in the diagram corresponds to a Scala file in `src/main/scala/riscv/core/`.
By following the I/O depicted in the diagram and the internal logic outlined in the Lab3 materials, we can successfully accomplish the tasks.
### Single-cycle CPU architecture diagram
- [ ] Full

### Instruction Fetch

The key signal in the IF stage is `jump_flag_id`, which determines the value of `pc` in the next clock cycle. Observing the waveform reveals that `pc` consistently increments by 4 in the absence of `jump_flag_id`. However, when `jump_flag_id` is set, `pc` jumps back to a fixed value of 4096 as defined in the test.
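The next-`pc` selection seen in the waveform can be written as a minimal behavioural model. The name `jump_address_id` is an assumption for the jump target input; only `jump_flag_id`, `pc`, and the value 4096 come from the text.

```python
ENTRY_ADDRESS = 4096  # jump target used by the test (0x1000)


def next_pc(pc, jump_flag_id, jump_address_id=ENTRY_ADDRESS):
    """Return the value of pc in the next clock cycle (32-bit wraparound)."""
    if jump_flag_id:
        # A taken jump replaces pc with the target address.
        return jump_address_id & 0xFFFFFFFF
    # Otherwise pc moves on to the next 4-byte instruction.
    return (pc + 4) & 0xFFFFFFFF
```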
### Instruction Decode
The testcase for ID provides three instructions in the same order as the waveform below.
- sw x10, 4(x0)
- lui x5, 2
- add x3, x1, x2

The blank left for us to fill in is the logic for the output signals `memory_read_enable` and `memory_write_enable`. The decoder checks the `opcode` to decide whether to assert `memory_read_enable` or `memory_write_enable`.
The testcase does not check the `memory_write_enable` signal. After completing the assignment, we can extend the test for more comprehensive coverage.
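As a rough illustration of the opcode check, the sketch below derives both enables from the low 7 bits of an instruction word. It is a Python model rather than the Chisel decoder; the opcode values are the standard RV32I load and store opcodes, and the function name is hypothetical.

```python
OPCODE_LOAD = 0b0000011   # lb, lh, lw, lbu, lhu
OPCODE_STORE = 0b0100011  # sb, sh, sw


def memory_enables(instruction):
    """Return (memory_read_enable, memory_write_enable) for a 32-bit instruction."""
    opcode = instruction & 0x7F  # the opcode sits in bits 6..0
    return opcode == OPCODE_LOAD, opcode == OPCODE_STORE


# The three instructions from the ID testcase, encoded by hand.
SW_X10_4_X0 = 0x00A02223    # sw x10, 4(x0)
LUI_X5_2 = 0x000022B7       # lui x5, 2
ADD_X3_X1_X2 = 0x002081B3   # add x3, x1, x2
```

Only the `sw` encoding should raise the write enable; none of the three raises the read enable, so a load would have to be added to exercise that path.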
### Execution

To determine the two operands of the ALU, we examine `aluop_source`. The waveform shows that when `aluop_source` is 0, the data read from the registers are assigned to the operands. When `aluop_source` is 1, `instruction_address` and `immediate` are assigned to `op1` and `op2`, respectively.
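The operand selection amounts to two 2-to-1 multiplexers. The sketch below models them in Python; it assumes one select signal per operand (named `aluop1_source` and `aluop2_source` here) and hypothetical names `reg1_data` and `reg2_data` for the register values.

```python
def select_operands(aluop1_source, aluop2_source,
                    reg1_data, reg2_data,
                    instruction_address, immediate):
    """Return (op1, op2) as chosen by the two source-select signals."""
    # Select 0 takes the register value, select 1 takes the alternative input.
    op1 = instruction_address if aluop1_source == 1 else reg1_data
    op2 = immediate if aluop2_source == 1 else reg2_data
    return op1, op2
```

For example, an R-type instruction would use `(0, 0)` and read both operands from registers, while a store would keep `op1` from the register and take `op2` from the immediate.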
### Combining into a CPU
When the above components have been completed and are functioning properly, we can combine each part to build our CPU. According to the Single-cycle CPU architecture diagram shown before, we can connect the output of one component to the input of another. If everything goes well, we should pass all the tests.
## Run handwritten RISC-V assembly code on MyCPU
Since I already turned my assembly code into a callable function and integrated it with a C program in [Homework2](https://hackmd.io/vHoQpw69R_eddq4WrcT0UA), all I have to do is add a main function in assembly that calls it.
To verify the functionality of the code, specific test cases need to be written. Due to the lack of floating-point support, I modified the C code to output the result in integer format instead of floating point.
In [HW2](https://github.com/shhung/Image-scaling-with-Bilinear-interpolation-by-float32-multiplication/tree/hw3), the code has been modified. Simply building it with `make` and executing it on `rv32emu` will yield the following result:
```shell
$ rv32emu ImgScaleFromC.elf
...
1064594550 1063304507 1062014466 1060724424 1059434382
1064042902 1062445378 1060847855 1059250332 1057652809
1063491255 1061586249 1059681246 1057776241 1054777865
1062939607 1060727120 1058514636 1055639692 1051214720
1062387961 1059867992 1057348026 1052691510 1046727151
```
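Assuming the integers printed above are the raw IEEE 754 single-precision bit patterns of the results (the output was switched to integer format because floating-point printing is unavailable), they can be converted back on the host for a quick sanity check. The helper name is hypothetical.

```python
import struct


def bits_to_float(word):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 float32."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


# First row of the output shown above.
first_row = [1064594550, 1063304507, 1062014466, 1060724424, 1059434382]

for word in first_row:
    print(f"{word:#010x} -> {bits_to_float(word):.6f}")
```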
To run the code on MyCPU, make sure to place your code in the `csrc` directory and use `make update` to copy the program into MyCPU.
The final step is modifying the number of cycles to ensure that the program can fully execute to the end. An easy but effective approach is to load an immediate value into a register just before the end of the program, then peek it in the test. If we obtain the value we assigned, it indicates that the program has executed to the end.
Running the test for imgScale then prints the same result.
```shell
$ sbt "testOnly riscv.singlecycle.ImgScaleTest"
...
UInt<32>(1064594550), UInt<32>(1063304507), UInt<32>(1062014466), UInt<32>(1060724424), UInt<32>(1059434382),
UInt<32>(1064042902), UInt<32>(1062445378), UInt<32>(1060847855), UInt<32>(1059250332), UInt<32>(1057652809),
UInt<32>(1063491255), UInt<32>(1061586249), UInt<32>(1059681246), UInt<32>(1057776241), UInt<32>(1054777865),
UInt<32>(1062939607), UInt<32>(1060727120), UInt<32>(1058514636), UInt<32>(1055639692), UInt<32>(1051214720),
UInt<32>(1062387961), UInt<32>(1059867992), UInt<32>(1057348026), UInt<32>(1052691510), UInt<32>(1046727151),
...
```
## Examining the waveform
To observe how the signals change over time, we can examine the waveform generated by Verilator, which completes the simulation quickly.
By following the instructions in [lab3-waveform](https://hackmd.io/@sysprog/r1mlr3I7p#Waveform), we can use the tool without needing to know much about it.
### GTKWave
Since I'm working in the WSL environment, instead of installing GTKWave in Ubuntu, I downloaded [gtkwave-3.3.90-bin-win64](https://sourceforge.net/projects/gtkwave/files/gtkwave-3.3.90-bin-win64/gtkwave-3.3.90-bin-win64.zip/download) on Windows. To interact with the GUI, simply click the GTKWave icon and open the VCD file in the GUI.




---
Let's move on to examine the waveform for different instruction formats and types. I will pick instructions from my code and find their signals in GTKWave. The steps I followed are listed below:
1. Choose the instruction I want to examine in my code
2. Utilize the [online tool for RISC-V Instruction Encoder/Decoder](https://luplab.gitlab.io/rvcodecjs/) to decode the instruction into hexadecimal
3. Search for the `io_instruction` signal using the hexadecimal value obtained from the previous step

The binary value of each instruction is split into its fields so that `rs1`, `rs2`, `funct3`, `opcode`, `immediate`, etc. are easy to distinguish.
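
The field split can also be done programmatically instead of by hand. The sketch below extracts the fixed-position RV32I fields from an instruction word; the function name is hypothetical, and the bit positions follow the standard base instruction formats.

```python
def decode_fields(instruction):
    """Split a 32-bit RV32I instruction word into its fixed-position fields."""
    return {
        "opcode": instruction & 0x7F,          # bits 6..0
        "rd": (instruction >> 7) & 0x1F,       # bits 11..7
        "funct3": (instruction >> 12) & 0x7,   # bits 14..12
        "rs1": (instruction >> 15) & 0x1F,     # bits 19..15
        "rs2": (instruction >> 20) & 0x1F,     # bits 24..20
        "funct7": (instruction >> 25) & 0x7F,  # bits 31..25
    }


# sub x6, x6, x8
fields = decode_fields(0x40830333)
print({name: bin(value) for name, value in fields.items()})
```

Not every field is meaningful for every format; for I-type instructions, for instance, the `rs2` and `funct7` positions hold the immediate instead.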
### R-type
#### sub t1, t1, s0
Assembly: sub x6, x6, x8
Hexadecimal: 0x40830333
Binary: 0100000 01000 00110 000 00110 0110011
##### ID

R-type instructions need `reg_write_enable` to be set.
##### EX

The ALU operands (`alu_op`) hold the values read from the registers.
##### REG

The register value changes on the next clock cycle.
### I-type
#### slli t1, t1, 2
Assembly: slli x6, x6, 2
Hexadecimal: 0x00231313
Binary: 0000000 00010 00110 001 00110 0010011
##### ID

`regs_reg2` is set to an immediate. The important thing to notice is that shift-by-immediate instructions encode the shift amount in the lower 5 bits of `imm`.
The explanation is available at [RV32 I-Format Arithmetic Instructions](https://docs.google.com/presentation/d/1k5qJyfuyk3ITzOfK0cIosyNl0nh1o9UC/edit#slide=id.p25)
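
To make the shift-amount encoding concrete, the sketch below pulls the 12-bit I-type immediate out of an instruction word and splits it into the 5-bit shift amount and the remaining upper 7 bits. The function name is hypothetical; in RV32I the upper bits are `0000000` for `slli`/`srli` and `0100000` for `srai`.

```python
def decode_shift_immediate(instruction):
    """Return (shamt, upper_bits) of a shift-by-immediate instruction."""
    imm = (instruction >> 20) & 0xFFF  # I-type immediate, bits 31..20
    shamt = imm & 0x1F                 # shift amount, imm[4:0]
    upper_bits = imm >> 5              # imm[11:5], selects logical or arithmetic shift
    return shamt, upper_bits


# slli x6, x6, 2
print(decode_shift_immediate(0x00231313))
```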
#### lw a1, 0(t1)
Assembly: lw x11, 0(x6)
Hexadecimal: 0x00032583
Binary: 000000000000 00110 010 01011 0000011
##### ID

`mem_read_enable` and `reg_write_enable` are set
##### EX

The ALU produces the memory address `0x1674`, from which the data is read.
##### MEM

MEM gets the data halfway through the clock cycle.
##### REG

The register gets the value on the next clock cycle.
### S-type
#### sw s6, 28(sp)
Assembly: sw x22, 28(x2)
Hexadecimal: 0x01612e23
Binary: 0000000 10110 00010 010 11100 0100011
##### ID

`memory_write_enable` is set to 1
##### EX

The ALU operands (`alu_op`) are selected by `aluop_source`.
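
For a store, the second ALU operand is the S-type immediate, which is split across two fields of the instruction word. The sketch below reassembles and sign-extends it; the function name is hypothetical, and the bit positions follow the standard S-type format.

```python
def s_type_immediate(instruction):
    """Reassemble the sign-extended 12-bit immediate of an S-type instruction."""
    imm_11_5 = (instruction >> 25) & 0x7F  # bits 31..25
    imm_4_0 = (instruction >> 7) & 0x1F    # bits 11..7
    imm = (imm_11_5 << 5) | imm_4_0
    # Bit 11 is the sign bit of the 12-bit immediate.
    return imm - 0x1000 if imm & 0x800 else imm


# sw x22, 28(x2): the store address is the value of x2 plus this offset.
print(s_type_immediate(0x01612E23))
```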