# Assignment 3: SoftCPU

## Get srv32 Wavefile

After looking at [dhrystone](https://github.com/sysprog21/srv32/tree/devel/sw/dhrystone), I found that we can get the srv32 wave file with the following steps:

* make a `plusone` folder in the `sw` directory
* place our code in the `plusone` folder
* write a Makefile in the `plusone` folder
* execute `make plusone` in the `srv32` folder

### C code

```c
#include <stdio.h>

int main()
{
    int carry = 0, nd = 10;
    int digits[] = {9,9,9,9,9,9,9,9,9,9};
    int newdigits[nd+1];

    for(int i = 0; i < nd; i++)
        printf("%d",digits[i]);
    printf(" plus one is ");

    /* add one to the least significant digit */
    digits[nd-1] += 1;
    /* propagate the carry toward the most significant digit */
    for(int i = nd-1; i >= 0 ; i--) {
        if(digits[i] == 10) {
            digits[i] = 0;
            if(i == 0)
                carry = 1;
            else
                digits[i-1] += 1;
        }
        else
            break;
    }

    if(carry) {
        /* the result needs one more digit: a leading 1 */
        newdigits[0] = 1;
        for(int i = 0; i < nd; i++) {
            newdigits[i+1] = digits[i];
        }
        for(int i = 0; i < nd+1; i++) {
            printf("%d",newdigits[i]);
        }
    }
    else {
        for(int i = 0; i < nd; i++)
            printf("%d",digits[i]);
    }
    printf("\n");
}
```

### Makefile

```
include ../common/Makefile.common

EXE      = .elf
SRC      = plusone.c
CFLAGS  += -DTIME -DRISCV -DHZ=100000000 -L../common
CFLAGS  += -Wno-return-type -Wno-implicit-function-declaration -Wno-implicit-int
LDFLAGS += -T ../common/default.ld
TARGET   = plusone
OUTPUT   = $(TARGET)$(EXE)

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $(OUTPUT) $(SRC) $(LDFLAGS)
	$(OBJCOPY) -j .text -O binary $(OUTPUT) imem.bin
	$(OBJCOPY) -j .data -O binary $(OUTPUT) dmem.bin
	$(OBJCOPY) -O binary $(OUTPUT) memory.bin
	$(OBJDUMP) -d $(OUTPUT) > $(TARGET).dis
	$(READELF) -a $(OUTPUT) > $(TARGET).symbol

clean:
	$(RM) *.o $(OUTPUT) $(TARGET).dis $(TARGET).symbol [id]mem.bin memory.bin
```

### Result

After running `make plusone`, we get the `wave.fst` file in the `sim` directory, and we also see output like the following. I guess it shows how our code runs on the srv32 processor, first in the RTL simulation and then in the ISS.

```
make[1]: Entering directory '/home/korin/srv32-devel/sw'
make -C common
make[2]: Entering directory '/home/korin/srv32-devel/sw/common'
make[2]: 
Nothing to be done for 'all'.
make[2]: Leaving directory '/home/korin/srv32-devel/sw/common'
make[2]: Entering directory '/home/korin/srv32-devel/sw/plusone'
riscv-none-embed-gcc -O1 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -DTIME -DRISCV -DHZ=100000000 -L../common -Wno-return-type -Wno-implicit-function-declaration -Wno-implicit-int -o plusone.elf plusone.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary plusone.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary plusone.elf dmem.bin
riscv-none-embed-objcopy -O binary plusone.elf memory.bin
riscv-none-embed-objdump -d plusone.elf > plusone.dis
riscv-none-embed-readelf -a plusone.elf > plusone.symbol
make[2]: Leaving directory '/home/korin/srv32-devel/sw/plusone'
make[1]: Leaving directory '/home/korin/srv32-devel/sw'
make[1]: Entering directory '/home/korin/srv32-devel/sim'
9999999999 plus one is 10000000000
Excuting 9937 instructions, 12649 cycles, 1.272 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.161 s
Simulation cycles: 12660
Simulation speed : 0.0786335 MHz
make[1]: Leaving directory '/home/korin/srv32-devel/sim'
make[1]: Entering directory '/home/korin/srv32-devel/tools'
./rvsim --memsize 128 -l trace.log ../sw/plusone/plusone.elf
9999999999 plus one is 10000000000
Excuting 9937 instructions, 12649 cycles, 1.273 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.011 s
Simulation cycles: 12649
Simulation speed : 1.118 MHz
make[1]: Leaving directory '/home/korin/srv32-devel/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```

## Optimization

### C code

Originally, I added 1 to a digit and then checked whether it had reached 10. Instead, I can check whether each digit equals 9 and replace it with 0, which saves the extra addition.

```c
#include <stdio.h>

int main()
{
    int nd = 10;
    int 
digits[] = {9,9,9,9,9,9,9,9,9,9};
    int newdigits[nd+1];
    int carry = 0;

    for(int i = 0; i < nd; i++)
        printf("%d",digits[i]);
    printf(" plus one is ");

    /* turn trailing 9s into 0s instead of adding first and comparing with 10 */
    int i = nd - 1;
    if(digits[i] == 9) {
        digits[i] = 0;
        /* note: this loop stops at i == 0, so digits[0] is never cleared here;
           for the all-9 input used here, digits[0] becomes 10 below and is
           printed as "10", and the carry branch is not taken */
        while(--i > 0 && digits[i] == 9) {
            digits[i] = 0;
        }
    }
    if(i == -1)
        carry = 1;
    else
        ++digits[i];

    if(carry) {
        /* the result needs one more digit: a leading 1 */
        newdigits[0] = 1;
        for(int i = 0; i < nd; ++i) {
            newdigits[i+1] = digits[i];
        }
        for(int i = 0; i <= nd; ++i) {
            printf("%d",newdigits[i]);
        }
    }
    else {
        for(int i = 0; i < nd; ++i)
            printf("%d",digits[i]);
    }
    printf("\n");
}
```

### Result

The instruction count decreases from 9937 to 9720, and the cycle count from 12649 to 12364.

```
make[1]: Entering directory '/home/korin/srv32-devel/sw'
make -C common
make[2]: Entering directory '/home/korin/srv32-devel/sw/common'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/korin/srv32-devel/sw/common'
make[2]: Entering directory '/home/korin/srv32-devel/sw/plusone2'
riscv-none-embed-gcc -O1 -Wall -march=rv32im -mabi=ilp32 -nostartfiles -nostdlib -DTIME -DRISCV -DHZ=100000000 -L../common -Wno-return-type -Wno-implicit-function-declaration -Wno-implicit-int -o plusone2.elf plusone2.c -lc -lm -lgcc -lsys -T ../common/default.ld
riscv-none-embed-objcopy -j .text -O binary plusone2.elf imem.bin
riscv-none-embed-objcopy -j .data -O binary plusone2.elf dmem.bin
riscv-none-embed-objcopy -O binary plusone2.elf memory.bin
riscv-none-embed-objdump -d plusone2.elf > plusone2.dis
riscv-none-embed-readelf -a plusone2.elf > plusone2.symbol
make[2]: Leaving directory '/home/korin/srv32-devel/sw/plusone2'
make[1]: Leaving directory '/home/korin/srv32-devel/sw'
make[1]: Entering directory '/home/korin/srv32-devel/sim'
9999999999 plus one is 10000000000
Excuting 9720 instructions, 12364 cycles, 1.272 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish
Simulation statistics
=====================
Simulation time : 0.15 s
Simulation cycles: 12375
Simulation speed : 0.0825 MHz
make[1]: Leaving directory '/home/korin/srv32-devel/sim'
make[1]: Entering 
directory '/home/korin/srv32-devel/tools'
./rvsim --memsize 128 -l trace.log ../sw/plusone2/plusone2.elf
9999999999 plus one is 10000000000
Excuting 9720 instructions, 12364 cycles, 1.272 CPI
Program terminate
Simulation statistics
=====================
Simulation time : 0.011 s
Simulation cycles: 12364
Simulation speed : 1.115 MHz
make[1]: Leaving directory '/home/korin/srv32-devel/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```

## Check how my program is executed along with srv32 Simulation by GTKWave

### PC and Instruction

I find that when the core fetches PC `00000004`, the instruction at that address (`6e028293`) is executed in the next cycle. srv32 also records `if_pc`, `ex_pc`, and `wb_pc` together with the corresponding instructions, one for each of its 3 pipeline stages.

![](https://i.imgur.com/BpjICJb.png)

### Branch Prediction

I find that `next_pc` is always `PC+4`. However, since `t0` (`000217dc`) is less than `t1` (`0002181c`), the branch goes to `20` rather than `2c`, so we can see some wrongly fetched PCs (`2c`, `30`). We can also see that the next PC is changed during the `ex` phase, which reduces the stall caused by the wrong prediction.

![](https://i.imgur.com/GwtmJBy.png)

### Forwarding

I find that when `and a4,a2,a4` is in the `ex` phase, the `a4` register has not been updated yet (it still shows `ffffe000`), but the instruction still gets the correct value by forwarding `ex_result`.

![](https://i.imgur.com/W4l4pGu.png)

### dmem read

I find that when the `lw t1,-216(t1)` instruction is in the `ex` phase, `dmem_rready` is 1. In the `wb` phase it gets `dmem_rdata` (`00020e40`) from `dmem_raddr` (`00020030`), and the `t1` register changes in the next clock cycle.

![](https://i.imgur.com/9phEN1x.png)

### dmem write

I find that when the `sw zero,0(t0)` instruction is in the `wb` phase, `dmem_wready` is 1, so `dmem_wdata` is stored to `dmem_waddr`.

![](https://i.imgur.com/kCjPvZN.png)