# Assignment 3: SoftCPU

## Setting up the Environment

* Download and install Verilator

```Bash=
# Prerequisites:
#sudo apt-get install git perl python3 make autoconf g++ flex bison ccache
#sudo apt-get install libgoogle-perftools-dev numactl perl-doc
#sudo apt-get install libfl2  # Ubuntu only (ignore if gives error)
#sudo apt-get install libfl-dev  # Ubuntu only (ignore if gives error)
#sudo apt-get install zlibc zlib1g zlib1g-dev  # Ubuntu only (ignore if gives error)

git clone https://github.com/verilator/verilator  # Only first time
unset VERILATOR_ROOT  # For bash
cd verilator
autoconf         # Create ./configure script
./configure      # Configure and create Makefile
make -j `nproc`  # Build Verilator itself (if error, try just 'make')
sudo make install
```

* Download srv32

```Bash=
git clone https://github.com/sysprog21/srv32.git
```

* Download and install the RISC-V toolchain

```Bash=
sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev
git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
mkdir -p build && cd build
…/configure --prefix=/opt/riscv --enable-multilib
sudo make -j$(nproc)
```

* Install GTKWave

```Bash=
sudo apt install gtkwave
```

## Requirement 1

For this homework, I reused my HW1 code. To use it with the RISC-V compiler, I rewrote it with `volatile int` variables, so that the compiler keeps the computation in the program instead of evaluating it at compile time from the constant starting value `n = 2`.
* C code

```c=
#include <stdio.h>
#include <stdlib.h>

int main(){
    volatile int n = 2, sum = 0, rem; /* sum must start at 0 */
    while(n > 0){
        rem = n % 10;
        sum += rem * rem;
        n = n / 10;
        /* all digits consumed: repeat with the new sum while it has 2+ digits */
        if(n == 0 && sum >= 10){
            n = sum;
            sum = 0;
        }
    }
    if(sum == 1){
        printf("true");
    }else{
        printf("false");
    }
    return 0;
}
```

* Create the project directory

```Bash=
# in srv32/
mkdir sw/hw3
# Copy the existing Makefile to our directory
cp sw/hello/Makefile sw/hw3/
# Write the code
vim sw/hw3/hw3.c
```

* Modify the Makefile

```Bash=
# in srv32/sw/hw3 (the Makefile copied from sw/hello; it includes ../common/Makefile.common)
vim Makefile
# then modify the Makefile as below
include ../common/Makefile.common

EXE      = .elf
SRC      = hw3.c
CFLAGS  += -L../common
LDFLAGS += -T ../common/default.ld
TARGET   = hw3
OUTPUT   = $(TARGET)$(EXE)

.PHONY: all clean

all: $(TARGET)

$(TARGET): $(SRC)
	$(CC) $(CFLAGS) -o $(OUTPUT) $(SRC) $(LDFLAGS)
	$(OBJCOPY) -j .text -O binary $(OUTPUT) imem.bin
	$(OBJCOPY) -j .data -O binary $(OUTPUT) dmem.bin
	$(OBJCOPY) -O binary $(OUTPUT) memory.bin
	$(OBJDUMP) -d $(OUTPUT) > $(TARGET).dis
	$(READELF) -a $(OUTPUT) > $(TARGET).symbol

clean:
	$(RM) *.o $(OUTPUT) $(TARGET).dis $(TARGET).symbol [id]mem.bin memory.>
```

* Run the code

```Bash=
# Run it in the srv32 directory
make hw3
```

* Result

```Bash=
./rvsim --memsize 128 -l trace.log ../sw/hw3/hw3.elf
false
Excuting 1007 instructions, 1307 cycles, 1.298 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.009 s
Simulation cycles: 1307
Simulation speed : 0.140 MHz

make[1]: Leaving directory '/home/p76107077/srv32/tools'
Compare the trace between RTL and ISS simulator
```

## Requirement 2

* The assembly code of `hw3.elf`, disassembled by the `$(OBJDUMP) -d` step of the Makefile into `hw3.dis`, is as follows.
:::spoiler
```Bash=
hw3.elf:     file format elf32-littleriscv


Disassembly of section .text:

00000000 <_start>:
  0: 00003297  auipc t0,0x3
  4: f9028293  addi t0,t0,-112 # 2f90 <trap_handler>
  8: 30529073  csrw mtvec,t0
  c: 3050e073  csrsi mtvec,1
 10: 00021297  auipc t0,0x21
 14: 85828293  addi t0,t0,-1960 # 20868 <__malloc_max_sbrked_mem>
 18: 00021317  auipc t1,0x21
 1c: 88c30313  addi t1,t1,-1908 # 208a4 <_bss_end>

00000020 <_bss_clear>:
 20: 0002a023  sw zero,0(t0)
 24: 00428293  addi t0,t0,4
 28: fe62ece3  bltu t0,t1,20 <_bss_clear>
 2c: 00040117  auipc sp,0x40
 30: fd410113  addi sp,sp,-44 # 40000 <_stack>
 34: 008000ef  jal ra,3c <main>
 38: 2680306f  j 32a0 <exit>

0000003c <main>:
 3c: ff010113  addi sp,sp,-16
 40: 00112623  sw ra,12(sp)
 44: 00200713  li a4,2
 48: 00a00613  li a2,10
 4c: 00900593  li a1,9
 50: 02c767b3  rem a5,a4,a2
 54: 02c74733  div a4,a4,a2
 58: 02f787b3  mul a5,a5,a5
 5c: 00f686b3  add a3,a3,a5
 60: fe0718e3  bnez a4,50 <main+0x14>
 64: 02d5c463  blt a1,a3,8c <main+0x50>
 68: 00100793  li a5,1
 6c: 02f68663  beq a3,a5,98 <main+0x5c>
 70: 00020537  lui a0,0x20
 74: 02850513  addi a0,a0,40 # 20028 <__malloc_trim_threshold+0xc>
 78: 0fc000ef  jal ra,174 <puts>
 7c: 00c12083  lw ra,12(sp)
 80: 00000513  li a0,0
 84: 01010113  addi sp,sp,16
 88: 00008067  ret
 8c: 00068713  mv a4,a3
 90: 00000693  li a3,0
 94: fbdff06f  j 50 <main+0x14>
 98: 00020537  lui a0,0x20
 9c: 02050513  addi a0,a0,32 # 20020 <__malloc_trim_threshold+0x4>
 a0: 0d4000ef  jal ra,174 <puts>
 a4: fd9ff06f  j 7c <main+0x40>

000000a8 <_puts_r>:
 a8: fc010113  addi sp,sp,-64
 ac: 02812c23  sw s0,56(sp)
 b0: 00050413  mv s0,a0
 b4: 00058513  mv a0,a1
 b8: 02912a23  sw s1,52(sp)
 bc: 02112e23  sw ra,60(sp)
 c0: 00058493  mv s1,a1
 c4: 0c0000ef  jal ra,184 <strlen>
 c8: 00150713  addi a4,a0,1
 cc: 00020697  auipc a3,0x20
 d0: f6468693  addi a3,a3,-156 # 20030 <__malloc_trim_threshold+0x14>
 d4: 00e12e23  sw a4,28(sp)
 d8: 03842783  lw a5,56(s0)
 dc: 02010713  addi a4,sp,32
 e0: 02d12423  sw a3,40(sp)
 e4: 00e12a23  sw a4,20(sp)
 e8: 00100693  li a3,1
 ec: 00200713  li a4,2
 f0: 02912023  sw s1,32(sp)
 f4: 02a12223  sw a0,36(sp)
 f8: 02d12623  sw a3,44(sp)
 fc: 00e12c23  sw a4,24(sp)
100: 00842583  lw a1,8(s0)
104: 04078e63  beqz a5,160 <_puts_r+0xb8>
108: 00c59783  lh a5,12(a1)
10c: 01279713  slli a4,a5,0x12
110: 02074263  bltz a4,134 <_puts_r+0x8c>
114: 0645a703  lw a4,100(a1)
118: 000026b7  lui a3,0x2
11c: 00d7e7b3  or a5,a5,a3
120: ffffe6b7  lui a3,0xffffe
124: fff68693  addi a3,a3,-1 # ffffdfff <_stack+0xfffbdfff>
128: 00d77733  and a4,a4,a3
12c: 00f59623  sh a5,12(a1)
130: 06e5a223  sw a4,100(a1)
134: 01410613  addi a2,sp,20
138: 00040513  mv a0,s0
13c: 474000ef  jal ra,5b0 <__sfvwrite_r>
140: 03c12083  lw ra,60(sp)
144: 03812403  lw s0,56(sp)
148: 00a03533  snez a0,a0
14c: 40a00533  neg a0,a0
150: 03412483  lw s1,52(sp)
154: 00a56513  ori a0,a0,10
158: 04010113  addi sp,sp,64
15c: 00008067  ret
160: 00040513  mv a0,s0
164: 00b12623  sw a1,12(sp)
168: 400000ef  jal ra,568 <__sinit>
16c: 00c12583  lw a1,12(sp)
170: f99ff06f  j 108 <_puts_r+0x60>

00000174 <puts>:
174: 00050593  mv a1,a0
178: 00020517  auipc a0,0x20
17c: e9c52503  lw a0,-356(a0) # 20014 <_impure_ptr>
180: f29ff06f  j a8 <_puts_r>

00000184 <strlen>:
184: 00357793  andi a5,a0,3
188: 00050713  mv a4,a0
18c: 04079c63  bnez a5,1e4 <strlen+0x60>
190: 7f7f86b7  lui a3,0x7f7f8
194: f7f68693  addi a3,a3,-129 # 7f7f7f7f <_stack+0x7f7b7f7f>
198: fff00593  li a1,-1
19c: 00072603  lw a2,0(a4)
1a0: 00470713  addi a4,a4,4
1a4: 00d677b3  and a5,a2,a3
1a8: 00d787b3  add a5,a5,a3
1ac: 00c7e7b3  or a5,a5,a2
1b0: 00d7e7b3  or a5,a5,a3
1b4: feb784e3  beq a5,a1,19c <strlen+0x18>
1b8: ffc74683  lbu a3,-4(a4)
1bc: ffd74603  lbu a2,-3(a4)
1c0: ffe74783  lbu a5,-2(a4)
1c4: 40a70733  sub a4,a4,a0
1c8: 04068063  beqz a3,208 <strlen+0x84>
1cc: 02060a63  beqz a2,200 <strlen+0x7c>
1d0: 00f03533  snez a0,a5
1d4: 00e50533  add a0,a0,a4
1d8: ffe50513  addi a0,a0,-2
1dc: 00008067  ret
1e0: fa0688e3  beqz a3,190 <strlen+0xc>
1e4: 00074783  lbu a5,0(a4)
1e8: 00170713  addi a4,a4,1
1ec: 00377693  andi a3,a4,3
1f0: fe0798e3  bnez a5,1e0 <strlen+0x5c>
1f4: 40a70733  sub a4,a4,a0
1f8: fff70513  addi a0,a4,-1
1fc: 00008067  ret
200: ffd70513  addi a0,a4,-3
204: 00008067  ret
208: ffc70513  addi a0,a4,-4
20c: 00008067  ret

00000210 <__fp_lock>:
210: 00000513  li a0,0
214: 00008067  ret

00000218 <_cleanup_r>:
218: 00002597  auipc a1,0x2
21c: f5c58593  addi a1,a1,-164 # 2174 <_fclose_r>
220: 0cd0006f  j aec <_fwalk_reent>

00000224 <__fp_unlock>:
224: 00000513  li a0,0
228: 00008067  ret

0000022c <__sinit.part.0>:
22c: fe010113  addi sp,sp,-32
230: 00112e23  sw ra,28(sp)
234: 00812c23  sw s0,24(sp)
238: 00912a23  sw s1,20(sp)
23c: 01212823  sw s2,16(sp)
240: 01312623  sw s3,12(sp)
244: 01412423  sw s4,8(sp)
248: 01512223  sw s5,4(sp)
24c: 01612023  sw s6,0(sp)
250: 00452403  lw s0,4(a0)
254: 00000717  auipc a4,0x0
258: fc470713  addi a4,a4,-60 # 218 <_cleanup_r>
25c: 02e52e23  sw a4,60(a0)
260: 2ec50793  addi a5,a0,748
264: 00300713  li a4,3
268: 2ee52223  sw a4,740(a0)
26c: 2ef52423  sw a5,744(a0)
270: 2e052023  sw zero,736(a0)
274: 00400793  li a5,4
278: 00050913  mv s2,a0
27c: 00f42623  sw a5,12(s0)
280: 00800613  li a2,8
284: 00000593  li a1,0
288: 06042223  sw zero,100(s0)
28c: 00042023  sw zero,0(s0)
290: 00042223  sw zero,4(s0)
294: 00042423  sw zero,8(s0)
298: 00042823  sw zero,16(s0)
29c: 00042a23  sw zero,20(s0)
2a0: 00042c23  sw zero,24(s0)
2a4: 05c40513  addi a0,s0,92
2a8: 3d4010ef  jal ra,167c <memset>
2ac: 00892483  lw s1,8(s2)
2b0: 00002b17  auipc s6,0x2
2b4: a80b0b13  addi s6,s6,-1408 # 1d30 <__sread>
2b8: 00002a97  auipc s5,0x2
2bc: adca8a93  addi s5,s5,-1316 # 1d94 <__swrite>
2c0: 00002a17  auipc s4,0x2
2c4: b5ca0a13  addi s4,s4,-1188 # 1e1c <__sseek>
2c8: 00002997  auipc s3,0x2
2cc: bbc98993  addi s3,s3,-1092 # 1e84 <__sclose>
2d0: 000107b7  lui a5,0x10
2d4: 03642023  sw s6,32(s0)
2d8: 03542223  sw s5,36(s0)
2dc: 03442423  sw s4,40(s0)
2e0: 03342623  sw s3,44(s0)
2e4: 00842e23  sw s0,28(s0)
2e8: 00978793  addi a5,a5,9 # 10009 <_text_end+0xcced>
2ec: 00f4a623  sw a5,12(s1)
2f0: 00800613  li a2,8
2f4: 00000593  li a1,0
2f8: 0604a223  sw zero,100(s1)
2fc: 0004a023  sw zero,0(s1)
300: 0004a223  sw zero,4(s1)
304: 0004a423  sw zero,8(s1)
308: 0004a823  sw zero,16(s1)
30c: 0004aa23  sw zero,20(s1)
310: 0004ac23  sw zero,24(s1)
314: 05c48513  addi a0,s1,92
318: 364010ef  jal ra,167c <memset>
31c: 00c92403  lw s0,12(s2)
320: 000207b7  lui a5,0x20
324: 0364a023  sw s6,32(s1)
328: 0354a223  sw s5,36(s1)
32c: 0344a423  sw s4,40(s1)
330: 0334a623  sw s3,44(s1)
334: 0094ae23  sw s1,28(s1)
338: 01278793  addi a5,a5,18 # 20012 <_global_impure_ptr+0x2>
33c: 00f42623  sw a5,12(s0)
340: 06042223  sw zero,100(s0)
344: 00042023  sw zero,0(s0)
348: 00042223  sw zero,4(s0)
34c: 00042423  sw zero,8(s0)
350: 00042823  sw zero,16(s0)
354: 00042a23  sw zero,20(s0)
358: 00042c23  sw zero,24(s0)
35c: 05c40513  addi a0,s0,92
360: 00800613  li a2,8
364: 00000593  li a1,0
368: 314010ef  jal ra,167c <memset>
36c: 01c12083  lw ra,28(sp)
370: 03642023  sw s6,32(s0)
374: 03542223  sw s5,36(s0)
378: 03442423  sw s4,40(s0)
37c: 03342623  sw s3,44(s0)
380: 00842e23  sw s0,28(s0)
384: 01812403  lw s0,24(sp)
388: 00100793  li a5,1
38c: 02f92c23  sw a5,56(s2)
390: 01412483  lw s1,20(sp)
394: 01012903  lw s2,16(sp)
398: 00c12983  lw s3,12(sp)
39c: 00812a03  lw s4,8(sp)
3a0: 00412a83  lw s5,4(sp)
3a4: 00012b03  lw s6,0(sp)
3a8: 02010113  addi sp,sp,32
3ac: 00008067  ret
```
:::

* srv32 architecture

![srv32 architecture](https://i.imgur.com/E84iDjT.jpg)

1. As the diagram above shows, srv32 is a three-stage pipelined processor (IF/ID, EX, WB) that passes the RV32IM compliance tests.
2. srv32 implements full forwarding to avoid data hazards: a result in the write-back stage, including data the previous instruction just loaded from memory, can be forwarded to the execute stage.
3. A branch misprediction is detected in the execute stage and costs two cycles, during which the pipeline flushes the wrongly fetched instructions.

* GTKWave

We can analyze the pipeline from the simulated signals. First, open the `.fst` waveform file in GTKWave; then select and arrange the signals according to the three-stage pipeline of srv32.

![](https://i.imgur.com/fo3h4aO.png)

The arrows in each cycle mark the instruction being processed in the decode (DEC) and execute (EXEC) stages of the pipeline.

![](https://i.imgur.com/FPv4tEg.png)

From the waveform, a failed (mispredicted) branch costs a two-cycle penalty.

![](https://i.imgur.com/PIOAStc.png)

The two instructions fetched after the branch instruction are flushed, as the waveform below shows.

![](https://i.imgur.com/VKGjMAC.png)

```Bash=
1c48: 01042703  lw a4,16(s0)
1c4c: fad610e3  bne a2,a3,1bec <_realloc_r+0x48c>
1c50: 00ec2c23  sw a4,24(s8)
1c54: 01442703  lw a4,20(s0)
1c58: 020c0793  addi a5,s8,32
1c5c: 01840413  addi s0,s0,24
1c60: 00ec2e23  sw a4,28(s8)
1c64: 00042703  lw a4,0(s0)
```

| Address | Instruction    | Cycle 1 | Cycle 2 | Cycle 3 | Cycle 4 | Cycle 5 | Cycle 6 |
| ------- | -------------- | ------- | ------- | ------- | ------- | ------- | ------- |
| 1c4c    | bne a2,a3,1bec | IF/ID   | EXE     | WB      |         |         |         |
| 1c50    | sw a4,24(s8)   |         | nop     | nop     | nop     |         |         |
| 1c54    | lw a4,20(s0)   |         |         | nop     | nop     | nop     |         |
| 1c58    | addi a5,s8,32  |         |         |         | IF/ID   | EXE     | WB      |

## Requirement 3

* Original result (RTL simulation)

```Bash=
false
Excuting 1007 instructions, 1307 cycles, 1.297 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.059 s
Simulation cycles: 1318
Simulation speed : 0.022339 MHz
```

* New code with optimization

The new version moves the single-digit check out of the inner digit loop, so it is evaluated once per round instead of once per digit.

```c=
#include <stdio.h>
#include <stdlib.h>

int main(){
    volatile int n = 2, sum, rem;
    do{
        sum = 0; /* sum must be reset before each round of digit squaring */
        while(n > 0){
            rem = n % 10;
            n = n / 10;
            sum += rem * rem;
        }
        n = sum;
    } while(!(n >= 0 && n <= 9)); /* repeat until a single digit is left */
    if(n == 1){
        printf("true");
    }else{
        printf("false");
    }
    return 0;
}
```

* Result after optimization

```Bash=
false
Excuting 945 instructions, 1209 cycles, 1.279 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.045 s
Simulation cycles: 1220
Simulation speed : 0.0271111 MHz
```

Cycles: the optimized code reduces the simulation cycle count by 1318 - 1220 = 98 cycles (about 7.4%), and the instruction count drops from 1007 to 945.

## Requirement 4

* The `RISC-V compliance test` (now the RISC-V Architecture Test) checks whether an implementation, either hardware such as a RISC-V core or software written for RISC-V, behaves as the RISC-V specification requires, so that it can run within the RISC-V ecosystem. Each test is RISC-V assembly code that runs on the processor and writes its results to a designated memory region, the `signature`, which is then compared with a reference signature. A test uses as few instructions as possible, and only instructions and registers from the target ISA. Passing the tests assures users that the specification has been interpreted correctly and that the device under test (DUT) conforms to the RISC-V Architecture Test.
* srv32 is a three-stage RISC-V core, and `Verilator` is a Verilog simulator: it compiles Verilog HDL into a cycle-accurate behavioral model in C++. Simulating srv32 with `Verilator` therefore checks the implementation by executing RISC-V binaries at the register-transfer level (RTL), and measures the cycle counts of the design, such as those reported above.