
# Assignment3: SoftCPU

## Environment

#### RISC-V toolchains

```
$ sudo apt install autoconf automake autotools-dev curl gawk git \
    build-essential bison flex texinfo gperf libtool patchutils bc \
    libmpc-dev libmpfr-dev libgmp-dev zlib1g-dev libexpat1-dev
$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain
$ cd riscv-gnu-toolchain
$ mkdir -p build && cd build
$ ../configure --prefix=/opt/riscv --enable-multilib
$ make -j$(nproc)

# Install the dependent packages.
$ sudo apt install build-essential ccache
```

#### srv32

```
$ git clone https://github.com/sysprog21/srv32.git
$ cd ~/srv32/tools/
$ make
$ cd ~/srv32/sim/
$ make

# Check the riscv-none-embed-gcc toolchain installed in Lab2
$ cd $HOME
$ source riscv-none-embed-gcc/setenv
$ riscv-none-embed-gcc -v
```

#### Run compliance tests on the software simulator (`make tests-sw`) and RTL (`make tests`)

##### Result:

```
$ cd ~/srv32/tests/
$ make tests-sw
================================
OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
--------------------------------
$ make tests
================================
OK: 48/48 RISCV_TARGET=srv32 RISCV_DEVICE=rv32i RISCV_ISA=rv32i
OK: 8/8 RISCV_TARGET=srv32 RISCV_DEVICE=rv32im RISCV_ISA=rv32im
OK: 6/6 RISCV_TARGET=srv32 RISCV_DEVICE=rv32Zicsr RISCV_ISA=rv32Zicsr
```

## Assignment1 - MoveZeroes

* Rewrite the source code, changing `int` to **`volatile int`**.
* Create the directory `hw3` under `sw/`, and put `hw3.c` and a `Makefile` in it.
:::spoiler Code
```C=
#include <stdio.h>
#include <stdlib.h>

void moveZeroes(volatile int nums[], volatile int numsSize){
    volatile int index=0;
    for(volatile int i = 0; i < numsSize; ++i){
        if(nums[i])
            nums[index++] = nums[i];
    }
    while(index < numsSize)
        nums[index++] = 0;
}

int main()
{
    volatile int nums[] = {0,1,0,3,12};
    volatile int numsSize = 5;
    printf("\nBefore move zeroes = ");
    volatile int i;
    for(i = 0; i < numsSize; ++i){
        printf("%d ", nums[i]);
    }
    moveZeroes(nums, numsSize);
    printf("\nAfter move zeroes = ");
    for(i = 0; i < numsSize; ++i){
        printf("%d ", nums[i]);
    }
}
```
:::

```
$ cd $HOME/srv32
$ make hw3
make[2]: Leaving directory '/home/jeff/srv32/sw/hw3'
make[1]: Leaving directory '/home/jeff/srv32/sw'
make[1]: Entering directory '/home/jeff/srv32/sim'

Before move zeroes = 0 1 0 3 12
After move zeroes = 1 3 12 0 0
Excuting 9454 instructions, 12126 cycles, 1.282 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.094 s
Simulation cycles: 12137
Simulation speed : 0.129117 MHz

make[1]: Leaving directory '/home/jeff/srv32/sim'
make[1]: Entering directory '/home/jeff/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/hw3/hw3.elf

Before move zeroes = 0 1 0 3 12
After move zeroes = 1 3 12 0 0
Excuting 9454 instructions, 12126 cycles, 1.283 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.005 s
Simulation cycles: 12126
Simulation speed : 2.680 MHz

make[1]: Leaving directory '/home/jeff/srv32/tools'
```

## Waveform

:::spoiler MoveZeroes Assembly Code Generated By SRV32
```
0000003c <moveZeroes>:
  3c:   fe010113    addi  sp,sp,-32
  40:   00b12623    sw    a1,12(sp)
  44:   00012c23    sw    zero,24(sp)
  48:   00012e23    sw    zero,28(sp)
  4c:   01c12703    lw    a4,28(sp)
  50:   00c12783    lw    a5,12(sp)
  54:   06f75e63    bge   a4,a5,d0 <moveZeroes+0x94>
  58:   01c12783    lw    a5,28(sp)
  5c:   00279793    slli  a5,a5,0x2
  60:   00f507b3    add   a5,a0,a5
  64:   0007a783    lw    a5,0(a5)
  68:   02078663    beqz  a5,94 <moveZeroes+0x58>
  6c:   01c12703    lw    a4,28(sp)
  70:   01812783    lw    a5,24(sp)
  74:   00271713    slli  a4,a4,0x2
  78:   00178693    addi  a3,a5,1
  7c:   00d12c23    sw    a3,24(sp)
  80:   00e50733    add   a4,a0,a4
  84:   00072703    lw    a4,0(a4)
  88:   00279793    slli  a5,a5,0x2
  8c:   00f507b3    add   a5,a0,a5
  90:   00e7a023    sw    a4,0(a5)
  94:   01c12783    lw    a5,28(sp)
  98:   00178793    addi  a5,a5,1
  9c:   00f12e23    sw    a5,28(sp)
  a0:   01c12703    lw    a4,28(sp)
  a4:   00c12783    lw    a5,12(sp)
  a8:   faf748e3    blt   a4,a5,58 <moveZeroes+0x1c>
  ac:   01812703    lw    a4,24(sp)
  b0:   00c12783    lw    a5,12(sp)
  b4:   02f75463    bge   a4,a5,dc <moveZeroes+0xa0>
  b8:   01812783    lw    a5,24(sp)
  bc:   00178713    addi  a4,a5,1
  c0:   00279793    slli  a5,a5,0x2
  c4:   00e12c23    sw    a4,24(sp)
  c8:   00f507b3    add   a5,a0,a5
  cc:   0007a023    sw    zero,0(a5)
  d0:   01812703    lw    a4,24(sp)
  d4:   00c12783    lw    a5,12(sp)
  d8:   fef740e3    blt   a4,a5,b8 <moveZeroes+0x7c>
  dc:   02010113    addi  sp,sp,32
  e0:   00008067    ret
```
:::

Install GTKWave and start it:

```
$ sudo apt install gtkwave
$ gtkwave
```

Then load the file `$HOME/srv32/sim/wave.fst`.

* srv32 is a three-stage pipelined processor with IF/ID, EXE, and WB stages.
* Branches are resolved in the EXE stage. When a branch is taken, the two instructions that have already been fetched into the pipeline must be flushed, which costs a two-cycle penalty.

![branch_penalty](https://i.imgur.com/8JWLA4c.png)

In the MoveZeroes program, we focus on the branch instructions. Whenever a branch is taken, the instructions already fetched behind it must be flushed.

```
68: 02078663 beqz a5,94 <moveZeroes+0x58>
6c: 01c12703 lw a4,28(sp)
70: 01812783 lw a5,24(sp)
 .
 .
94: 01c12783 lw a5,28(sp)
 .
```

When **beqz a5,94** is taken, the instructions at 0x6c and 0x70 are flushed, and the pipeline fetches the instruction at 0x94 instead.
![68-94wave](https://i.imgur.com/fsAuckU.png)

| Address | Instruction  | Cycle 1 | Cycle 2   | Cycle 3   | Cycle 4  | Cycle 5 | Cycle 6 |
|---------|--------------|---------|-----------|-----------|----------|---------|---------|
| 68      | beqz a5,94   | IF/ID   | EXE       | WB        |          |         |         |
| 6c      | lw a4,28(sp) |         | ~~IF/ID~~ | **NOP**   | **NOP**  |         |         |
| 70      | lw a5,24(sp) |         |           | ~~IF/ID~~ | **NOP**  |         |         |
| 94      | lw a5,28(sp) |         |           |           | IF/ID    | EXE     | WB      |

A jump behaves like a taken branch: the two instructions fetched after it are also flushed.

![jumpWave](https://i.imgur.com/4F53oRE.png)

The jump shown here is the `jal` that calls `moveZeroes`:

```
0000003c <moveZeroes>:
  3c:   fe010113    addi  sp,sp,-32
   .
   .
   .
 184:   eb9ff0ef    jal   ra,3c <moveZeroes>
 188:   00020537    lui   a0,0x20
 18c:   05850513    addi  a0,a0,88 # 20058 <__malloc_trim_threshold+0x20>
```

## Software Optimizations

Because every taken branch or jump costs two flushed instructions, we can reduce these penalties, and therefore the cycle count, by unrolling the loop and inlining the function.

* First, since the array size is known, we fully unroll the loop and test each element without a `for` loop.
* Second, we move the body of `moveZeroes` into `main`, which removes the jumps (`jal` and `ret`) of the function call.

:::spoiler optimal_hw3 Code
```C=
#include <stdio.h>

int main()
{
    volatile int nums[] = {0,1,0,3,12};
    volatile int numsSize = 5;
    volatile int i;
    volatile int index=0;
    printf("\nBefore move zeroes = ");
    for(i = 0; i < numsSize; ++i){
        printf("%d ", nums[i]);
    }
    /* Fully unrolled: one test per element, no loop branch. */
    if(nums[0]){nums[index++] = nums[0];}
    if(nums[1]){nums[index++] = nums[1];}
    if(nums[2]){nums[index++] = nums[2];}
    if(nums[3]){nums[index++] = nums[3];}
    if(nums[4]){nums[index++] = nums[4];}
    /* Fill the remaining slots with zeros. */
    while(index < numsSize){
        nums[index++] = 0;
    }
    printf("\nAfter move zeroes = ");
    for(i = 0; i < numsSize; ++i){
        printf("%d ", nums[i]);
    }
    printf("\n");
}
```
:::

After optimization, the total instruction and cycle counts are as follows.
```
make[2]: Leaving directory '/home/jeff/srv32/sw/optimal_hw3'
make[1]: Leaving directory '/home/jeff/srv32/sw'
make[1]: Entering directory '/home/jeff/srv32/sim'

Before move zeroes = 0 1 0 3 12
After move zeroes = 1 3 12 0 0
Excuting 9109 instructions, 11673 cycles, 1.281 CPI
Program terminate
- ../rtl/../testbench/testbench.v:418: Verilog $finish

Simulation statistics
=====================
Simulation time  : 0.086 s
Simulation cycles: 11684
Simulation speed : 0.13586 MHz

make[1]: Leaving directory '/home/jeff/srv32/sim'
make[1]: Entering directory '/home/jeff/srv32/tools'
./rvsim --memsize 128 -l trace.log ../sw/optimal_hw3/optimal_hw3.elf

Before move zeroes = 0 1 0 3 12
After move zeroes = 1 3 12 0 0
Excuting 9109 instructions, 11673 cycles, 1.281 CPI
Program terminate

Simulation statistics
=====================
Simulation time  : 0.004 s
Simulation cycles: 11673
Simulation speed : 2.904 MHz

make[1]: Leaving directory '/home/jeff/srv32/tools'
Compare the trace between RTL and ISS simulator
=== Simulation passed ===
```

* The cycle count drops from 12126 to 11673, a saving of 453 cycles (about 3.7%), and the instruction count drops from 9454 to 9109.
