# Assignment3: SoftCPU
## Get srv32 Wavefile
After looking at [dhrystone](https://github.com/sysprog21/srv32/tree/devel/sw/dhrystone), I found that we can get the srv32 wave file with the following steps:
* make a `plusone` folder in the `sw` directory
* place our code in the `plusone` folder
* write a Makefile in the `plusone` folder
* execute `make plusone` in the `srv32` folder
### C code
```c=
#include <stdio.h>

int main() {
    int carry = 0, nd = 10;
    int digits[] = {9,9,9,9,9,9,9,9,9,9};
    int newdigits[nd+1];

    for(int i = 0; i < nd; i++)
        printf("%d",digits[i]);
    printf(" plus one is ");

    /* add one to the last digit, then propagate the carry leftwards */
    digits[nd-1] += 1;
    for(int i = nd-1; i >= 0 ; i--) {
        if(digits[i] == 10) {
            digits[i] = 0;
            if(i == 0)
                carry = 1;
            else
                digits[i-1] += 1;
        }
        else
            break;
    }

    /* a carry out of the first digit needs one extra leading digit */
    if(carry) {
        newdigits[0] = 1;
        for(int i = 0; i < nd; i++) {
            newdigits[i+1] = digits[i];
        }
        for(int i = 0; i < nd+1; i++) {
            printf("%d",newdigits[i]);
        }
    }
    else {
        for(int i = 0; i < nd; i++)
            printf("%d",digits[i]);
    }
    printf("\n");
}
```
### Makefile
```
include ../common/Makefile.common
EXE = .elf
SRC = plusone.c
CFLAGS += -DTIME -DRISCV -DHZ=100000000 -L../common
CFLAGS += -Wno-return-type -Wno-implicit-function-declaration -Wno-implicit-int
LDFLAGS += -T ../common/default.ld
TARGET = plusone
OUTPUT = $(TARGET)$(EXE)
.PHONY: all clean
all: $(TARGET)
$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $(OUTPUT) $(SRC) $(LDFLAGS)
	$(OBJCOPY) -j .text -O binary $(OUTPUT) imem.bin
	$(OBJCOPY) -j .data -O binary $(OUTPUT) dmem.bin
	$(OBJCOPY) -O binary $(OUTPUT) memory.bin
	$(OBJDUMP) -d $(OUTPUT) > $(TARGET).dis
	$(READELF) -a $(OUTPUT) > $(TARGET).symbol
clean:
	$(RM) *.o $(OUTPUT) $(TARGET).dis $(TARGET).symbol [id]mem.bin memory.bin
```
### Result
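After running `make plusone`, we get the `wave.fst` file in the `sim` directory, along with output like the following. I guess it simulates how our code runs on the srv32 processor: first in the RTL simulation (`sim`), then in the ISS (`tools/rvsim`), and finally the two traces are compared.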
```
make[1]: Entering directory '/home/korin/srv32-devel/sw'
make -C common
make[2]: Entering directory '/home/korin/srv32-devel/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/korin/srv32-devel/sw/common'
make[2]: Entering directory '/home/korin/srv32-devel/sw/plusone'
riscv-none-embed-gcc -O1 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -DTIME -DRISCV -DHZ=100000000 -L../common -Wno-return-type -Wno-implicit-function-declaration -Wno-implicit-int -o plusone.elf plusone.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary plusone.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary plusone.elf dmem.bin
riscv-none-embed-objcopy -O binary plusone.elf memory.bin
riscv-none-embed-objdump -d plusone.elf > plusone.dis
riscv-none-embed-readelf -a plusone.elf > plusone.symbol
make[2]: Leaving directory '/home/korin/srv32-devel/sw/plusone'
make[1]: Leaving directory '/home/korin/srv32-devel/sw'
make[1]: Entering directory '/home/korin/srv32-devel/sim'
9999999999 plus one is 10000000000
Excuting 9937 instructions, 12649 cycles, 1.272 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.161 s
Simulation cycles: 12660
Simulation speed : 0.0786335 MHz
make[1]: Leaving directory '/home/korin/srv32-devel/sim'
make[1]: Entering directory '/home/korin/srv32-devel/tools'
./rvsim --memsize 128 -l trace.log ../sw/plusone/plusone.elf
9999999999 plus one is 10000000000
Excuting 9937 instructions, 12649 cycles, 1.273 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.011 s
Simulation cycles: 12649
Simulation speed : 1.118 MHz
make[1]: Leaving directory '/home/korin/srv32-devel/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
## Optimization
### C code
Originally, I added 1 to a digit and then checked whether it had become 10. Instead, I can check whether each digit equals 9 and replace it with 0, which saves the extra addition.
```c=
#include <stdio.h>

int main() {
    int nd = 10;
    int digits[] = {9,9,9,9,9,9,9,9,9,9};
    int newdigits[nd+1];
    int carry = 0;

    for(int i = 0; i < nd; i++)
        printf("%d",digits[i]);
    printf(" plus one is ");

    int i = nd - 1;
    if(digits[i] == 9) {
        digits[i] = 0;
        /* use >= 0 so digits[0] is checked too; with > 0 the loop stops
           at i == 0, carry is never set and digits[0] becomes 10 */
        while(--i >= 0 && digits[i] == 9) {
            digits[i] = 0;
        }
    }
    if(i == -1)
        carry = 1;      /* every digit was 9 */
    else
        ++digits[i];    /* first non-9 digit absorbs the carry */

    if(carry) {
        newdigits[0] = 1;
        for(int i = 0; i < nd; ++i) {
            newdigits[i+1] = digits[i];
        }
        for(int i = 0; i <= nd; ++i) {
            printf("%d",newdigits[i]);
        }
    }
    else {
        for(int i = 0; i < nd; ++i)
            printf("%d",digits[i]);
    }
    printf("\n");
}
```
### Result
The instruction count decreases from 9937 to 9720, and the cycle count from 12649 to 12364.
```
make[1]: Entering directory '/home/korin/srv32-devel/sw'
make -C common
make[2]: Entering directory '/home/korin/srv32-devel/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/korin/srv32-devel/sw/common'
make[2]: Entering directory '/home/korin/srv32-devel/sw/plusone2'
riscv-none-embed-gcc -O1 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -DTIME -DRISCV -DHZ=100000000 -L../common -Wno-return-type -Wno-implicit-function-declaration -Wno-implicit-int -o plusone2.elf plusone2.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary plusone2.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary plusone2.elf dmem.bin
riscv-none-embed-objcopy -O binary plusone2.elf memory.bin
riscv-none-embed-objdump -d plusone2.elf > plusone2.dis
riscv-none-embed-readelf -a plusone2.elf > plusone2.symbol
make[2]: Leaving directory '/home/korin/srv32-devel/sw/plusone2'
make[1]: Leaving directory '/home/korin/srv32-devel/sw'
make[1]: Entering directory '/home/korin/srv32-devel/sim'
9999999999 plus one is 10000000000
Excuting 9720 instructions, 12364 cycles, 1.272 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.15 s
Simulation cycles: 12375
Simulation speed : 0.0825 MHz
make[1]: Leaving directory '/home/korin/srv32-devel/sim'
make[1]: Entering directory '/home/korin/srv32-devel/tools'
./rvsim --memsize 128 -l trace.log ../sw/plusone2/plusone2.elf
9999999999 plus one is 10000000000
Excuting 9720 instructions, 12364 cycles, 1.272 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.011 s
Simulation cycles: 12364
Simulation speed : 1.115 MHz
make[1]: Leaving directory '/home/korin/srv32-devel/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```
## Checking how my program executes in the srv32 simulation with GTKWave
### PC and Instruction
I found that when the processor fetches PC `00000004`, the instruction at that address (`6e028293`) is executed in the next cycle. srv32 also records `if_pc`, `ex_pc`, and `wb_pc` together with the corresponding instructions, one for each of its three pipeline stages.

### Branch Prediction
I found that `next_pc` is always predicted as `PC+4`, which amounts to predicting every branch as not taken. In this case `t0` (`000217dc`) is less than `t1` (`0002181c`), so the branch goes to `20` rather than `2c`, and the waveform shows the wrongly fetched PCs `2c` and `30`.
The next PC is corrected during the EX stage, which keeps the stall caused by a wrong prediction short.

### Forwarding
I found that when `and a4,a2,a4` is in the EX stage, the `a4` register has not been updated yet (we can see it is `ffffe000`), but the instruction still gets the correct value because `ex_result` is forwarded.

### dmem read
I found that when the `lw t1,-216(t1)` instruction is in the EX stage, `dmem_rready` is 1. The data `dmem_rdata` (`00020e40`) read from `dmem_raddr` (`00020030`) arrives in the WB stage, and the `t1` register is updated on the next clock.

### dmem write
I found that when the `sw zero,0(t0)` instruction is in the WB stage, `dmem_wready` is 1, so `dmem_wdata` is stored to `dmem_waddr`.
