# Assignment2: RISC-V Toolchain
contributed by < [`dhiptmc`](https://github.com/dhiptmc) >
###### tags: `RISC-V` `Computer Architecture 2022`
## Problem
Problem was chosen from the work done by [王昱承](https://hackmd.io/@zKOCm3SSTKyUyiPV-nfEjw/ryC8TgjZi) in Assignment 1.
* Running Sum of 1d Array([LeetCode1480](https://leetcode.com/problems/running-sum-of-1d-array/))
>Given an array nums. We define a running sum of an array as runningSum[i] = sum(nums[0]…nums[i]).
>
>Return the running sum of nums.
* Motivation:
In the [last assignment](https://hackmd.io/@XiaXia/ComputerArchitecture_Hw1), I spent lots of time dealing with arrays.
I would like to examine whether I can make use of the experiences learnt from previous attempts.
#### Example 1:
```
Input: nums = [1,2,3,4]
Output: [1,3,6,10]
Explanation: Running sum is obtained as follows: [1, 1+2, 1+2+3, 1+2+3+4].
```
#### Example 2:
```
Input: nums = [1,1,1,1,1]
Output: [1,2,3,4,5]
Explanation: Running sum is obtained as follows: [1, 1+1, 1+1+1, 1+1+1+1, 1+1+1+1+1].
```
#### Example 3:
```
Input: nums = [3,1,2,10,1]
Output: [3,4,6,16,17]
```
## Solution
The solution is based on the work done by [王昱承](https://hackmd.io/@zKOCm3SSTKyUyiPV-nfEjw/ryC8TgjZi).
>Initially, pass the first element, i.e., nums[0], to result[0]. Add the result[i-1] element with nums[i], then pass the value to the result.
>
>Return the result array eventually.
## Implementation
### C code
```clike=
#include <stdlib.h>
#include <stdio.h>
int* runningSum(int* nums, int numsSize){
int* result = (int*)malloc(sizeof(int)*numsSize);
for(int i=0; i<numsSize; i++){
if(i==0)
result[0] = nums[0];
else
result[i] = result[i-1] + nums[i];
}
return result;
}
int main(int argc, char *argv[]){
int nums1[5] = {1,2,3,4,5};
int nums2[6] = {1,3,5,7,9,11};
int nums3[7] = {0,2,4,6,8,10,12};
int* result1 = runningSum(nums1, 5);
int* result2 = runningSum(nums2, 6);
int* result3 = runningSum(nums3, 7);
for(int j=0; j<5; j++){
printf("%d ", result1[j]);
}
printf("\n");
for(int j=0; j<6; j++){
printf("%d ", result2[j]);
}
printf("\n");
for(int j=0; j<7; j++){
printf("%d ", result3[j]);
}
printf("\n");
}
```
### Assembly code
``` clike=
.data
nums1: .word 1,2,3,4,5
nums2: .word 1,3,5,7,9,11
nums3: .word 0,2,4,6,8,10,12
res1: .word 0,0,0,0,0,
res2: .word 0,0,0,0,0,0
res3: .word 0,0,0,0,0,0,0
space: .string " "
nl: .string "\n"
.text
main: # t2: nums head, t3: result head, t4: numsSize
la t2, nums1 # load arr base to t2
la t3, res1 # load res base to t3
li t4, 5 # load the numSize to t4
jal ra, runningSum
la t2, nums2 # load arr base to t2
la t3, res2 # load res base to t3
li t4, 6 # load the numSize to t4
jal ra, runningSum
la t2, nums3 # load arr base to t2
la t3, res3 # load res base to t3
li t4, 7 # load the numSize to t4
jal ra, runningSum
j exit
runningSum: # t5: i
li t5, 1 # load i with 1
lw a3, 0(t2) # load nums[0] to a3
sw a3, 0(t3) # save num[0] to result[0]
end.L1:
slli a0, t5, 2 # i*4 (cause of byte size)
add a1, t3, a0 # add result's address with i*4 to get result[i]'s address
lw a3, -4(a1) # load result[i-1] to a3
add a2, t2, a0 # add nums's address with i*4 to get nums[i]'s address
lw a2, 0(a2) # load nums[i] to a5
add a3, a3, a2 # add result[i-1] + nums[i]
sw a3, 0(a1) # save to result[i]
addi t5, t5, 1 # i = i + 1
blt t5, t4, end.L1 # if i < numsSize then branch to end.L1
li t5, 0
print:
slli a1, t5, 2
add a1, t3, a1
lw a0, 0(a1) # load result out
li a7, 1 # pint a0
ecall
la a0, space # load space
li a7, 4 # print string
ecall
addi t5, t5, 1 # j++
blt t5, t4, print # loop
la a0, nl # load next line
li a7, 4 # print string
ecall
jr ra
exit:
li a7, 10 # exit
ecall
```
:::info
There are possible ways to use fewer instructions.
Modified version of Assembly code is shown below.
``` clike=
.data
nums1: .word 1,2,3,4,5
nums2: .word 1,3,5,7,9,11
nums3: .word 0,2,4,6,8,10,12
res1: .word 0,0,0,0,0,
res2: .word 0,0,0,0,0,0
res3: .word 0,0,0,0,0,0,0
space: .string " "
nl: .string "\n"
.text
main: # t2: nums head, t3: result head, t4: numsSize
la t2, nums1 # load arr base to t2
la t3, res1 # load res base to t3
li t4, 5 # load the numSize to t4
jal ra, runningSum
la t2, nums2 # load arr base to t2
la t3, res2 # load res base to t3
li t4, 6 # load the numSize to t4
jal ra, runningSum
la t2, nums3 # load arr base to t2
la t3, res3 # load res base to t3
li t4, 7 # load the numSize to t4
jal ra, runningSum
j exit
runningSum: # t5: i, t6: current result addr
add t6, t3, zero
li t5, 1 # load i with 1
lw a3, 0(t2) # load nums[0] to a3
sw a3, 0(t6) # save nums[0] to result[0]
loop:
addi t2, t2, 4 # nums[i] addr
lw a2, 0(t2) # nums[i]
add a3, a3, a2 # add result[i-1] + nums[i]
addi t6, t6, 4 # result[i]
addi t5, t5, 1 # i = i + 1
sw a3, 0(t6) # save result[0] + nums[1] to result[1]
blt t5, t4, loop # if i < numsSize then branch to loop
print:
lw a0, 0(t3) # load result out
li a7, 1 # pint a0
ecall
la a0, space # load space
li a7, 4 # print string
ecall
addi t3, t3, 4
addi t5, t5, -1 # i--
blt zero, t5, print
la a0, nl # load next line
li a7, 4 # print string
ecall
jr ra
exit:
li a7, 10 # exit
ecall
```
As a result, clock cycle as the figure shown below decreases compared to the original work.
Origin:

After modified:

:::
## Environment setup
Following the instructions of [Lab2: RISC-V RV32I[MACF] emulator with ELF support](https://hackmd.io/@sysprog/SJAR5XMmi),
```shell
$ git clone https://github.com/sysprog21/rv32emu
$ cd rv32emu
$ make
```
however, there is an error occured:
```shell
/bin/sh: 1: cc: not found
/bin/sh: 1: cc: not found
CC build/map.o
make: cc: No such file or directory
make: *** [Makefile:110: build/map.o] Error 127
```
Credit to TA(yucheng871011@gmail.com) who help solving this problem for me. That is, some tools should be installed first.
```shell
$ sudo apt install build-essential
```
## Instructions
* Compile
```shell
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -compilation options(e.g., -Ofast, -Os, etc.) -o output input.c
```
* Objdump
```shell
$ riscv-none-elf-objdump -d output > output.txt
```
* Readelf
```shell
$ riscv-none-elf-readelf -h output
```
* Size
```shell
$ riscv-none-elf-size output
```
* Execute and Statistics of execution
```shell
~/rv32emu$ build/rv32emu --stats ./Hw2/output
```
## Compare Assembly code
In this section, C code for the problem is compiled with different compiler optimization level (-Os to -Ofast).
The assembly code is lengthy and hard to read. We focus on the main section of the code to better demonstrate the effect of compiler optimization.
### Handwritten Assembly
``` clike=
runningSum: # t5: i, t6: current result addr
add t6, t3, zero
li t5, 1 # load i with 1
lw a3, 0(t2) # load nums[0] to a3
sw a3, 0(t6) # save nums[0] to result[0]
loop:
addi t2, t2, 4 # nums[i] addr
lw a2, 0(t2) # nums[i]
add a3, a3, a2 # add result[i-1] + nums[i]
addi t6, t6, 4 # result[i]
addi t5, t5, 1 # i = i + 1
sw a3, 0(t6) # save result[0] + nums[1] to result[1]
blt t5, t4, loop # if i < numsSize then branch to loop
```
#### Observation
* LOC(Line of code): 13
* Register used:
* *tx* registers: **t2**, **t3**, **t4**, **t5**, **t6**
* *ax* registers: **a2**, **a3**
* *sx* registers: none
* *others*: none
* Stack used(bytes): 0 bytes
* Number of lw and sw: lw=2 sw=2
* Number of branch/jump instructions: 1
### -Os
``` clike=
00010290 <runningSum>:
10290: ff010113 addi sp,sp,-16
10294: 00812423 sw s0,8(sp)
10298: 00912223 sw s1,4(sp)
1029c: 00058413 mv s0,a1
102a0: 00050493 mv s1,a0
102a4: 00259513 slli a0,a1,0x2
102a8: 00112623 sw ra,12(sp)
102ac: 158000ef jal ra,10404 <malloc>
102b0: 00805e63 blez s0,102cc <runningSum+0x3c>
102b4: 0004a783 lw a5,0(s1)
102b8: 00050713 mv a4,a0
102bc: 00f52023 sw a5,0(a0)
102c0: 00000793 li a5,0
102c4: 00178793 addi a5,a5,1
102c8: 0087cc63 blt a5,s0,102e0 <runningSum+0x50>
102cc: 00c12083 lw ra,12(sp)
102d0: 00812403 lw s0,8(sp)
102d4: 00412483 lw s1,4(sp)
102d8: 01010113 addi sp,sp,16
102dc: 00008067 ret
102e0: 00279693 slli a3,a5,0x2
102e4: 00d486b3 add a3,s1,a3
102e8: 0006a683 lw a3,0(a3)
102ec: 00072603 lw a2,0(a4)
102f0: 00470713 addi a4,a4,4
102f4: 00c686b3 add a3,a3,a2
102f8: 00d72023 sw a3,0(a4)
102fc: fc9ff06f j 102c4 <runningSum+0x34>
```
#### Observation
* LOC(Line of code): 29
* Register used:
* *tx* registers: none
* *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5**
* *sx* registers: **s0**, **s1**
* *others*: **sp**, **ra**
* Stack used(bytes): 16 bytes
* Number of lw and sw: lw=6 sw=5
* Number of branch/jump instructions: 5
1. ELF file header:
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x101e8
Start of program headers: 52 (bytes into file)
Start of section headers: 95232 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
2. Size:
```
text data bss dec hex filename
75512 2816 812 79140 13524 Hw2_s
```
3. Execute and Statistics of execution
```
1 3 6 10 15
1 4 9 16 25 36
0 2 6 12 20 30 42
inferior exit code 0
CSR cycle count: 16503
```
### -O0
``` clike=
00010184 <runningSum>:
10184: fd010113 addi sp,sp,-48
10188: 02112623 sw ra,44(sp)
1018c: 02812423 sw s0,40(sp)
10190: 03010413 addi s0,sp,48
10194: fca42e23 sw a0,-36(s0)
10198: fcb42c23 sw a1,-40(s0)
1019c: fd842783 lw a5,-40(s0)
101a0: 00279793 slli a5,a5,0x2
101a4: 00078513 mv a0,a5
101a8: 3a8000ef jal ra,10550 <malloc>
101ac: 00050793 mv a5,a0
101b0: fef42423 sw a5,-24(s0)
101b4: fe042623 sw zero,-20(s0)
101b8: 0780006f j 10230 <runningSum+0xac>
101bc: fec42783 lw a5,-20(s0)
101c0: 00079c63 bnez a5,101d8 <runningSum+0x54>
101c4: fdc42783 lw a5,-36(s0)
101c8: 0007a703 lw a4,0(a5)
101cc: fe842783 lw a5,-24(s0)
101d0: 00e7a023 sw a4,0(a5)
101d4: 0500006f j 10224 <runningSum+0xa0>
101d8: fec42703 lw a4,-20(s0)
101dc: 400007b7 lui a5,0x40000
101e0: fff78793 addi a5,a5,-1 # 3fffffff <__BSS_END__+0x3ffdc1cf>
101e4: 00f707b3 add a5,a4,a5
101e8: 00279793 slli a5,a5,0x2
101ec: fe842703 lw a4,-24(s0)
101f0: 00f707b3 add a5,a4,a5
101f4: 0007a683 lw a3,0(a5)
101f8: fec42783 lw a5,-20(s0)
101fc: 00279793 slli a5,a5,0x2
10200: fdc42703 lw a4,-36(s0)
10204: 00f707b3 add a5,a4,a5
10208: 0007a703 lw a4,0(a5)
1020c: fec42783 lw a5,-20(s0)
10210: 00279793 slli a5,a5,0x2
10214: fe842603 lw a2,-24(s0)
10218: 00f607b3 add a5,a2,a5
1021c: 00e68733 add a4,a3,a4
10220: 00e7a023 sw a4,0(a5)
10224: fec42783 lw a5,-20(s0)
10228: 00178793 addi a5,a5,1
1022c: fef42623 sw a5,-20(s0)
10230: fec42703 lw a4,-20(s0)
10234: fd842783 lw a5,-40(s0)
10238: f8f742e3 blt a4,a5,101bc <runningSum+0x38>
1023c: fe842783 lw a5,-24(s0)
10240: 00078513 mv a0,a5
10244: 02c12083 lw ra,44(sp)
10248: 02812403 lw s0,40(sp)
1024c: 03010113 addi sp,sp,48
10250: 00008067 ret
```
#### Observation
* LOC(Line of code): 53
* Register used:
* *tx* registers: none
* *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5**
* *sx* registers: **s0**
* *others*: **sp**, **ra**, **zero**
* Stack used(bytes): 48 bytes
* Number of lw and sw: lw=19 sw=9
* Number of branch/jump instructions: 6
1. ELF file header:
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100dc
Start of program headers: 52 (bytes into file)
Start of section headers: 95216 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
2. Size:
```
text data bss dec hex filename
75844 2816 812 79472 13670 Hw2_0
```
3. Execute and Statistics of execution
```
1 3 6 10 15
1 4 9 16 25 36
0 2 6 12 20 30 42
inferior exit code 0
CSR cycle count: 16889
```
### -O1
``` clike=
00010184 <runningSum>:
10184: ff010113 addi sp,sp,-16
10188: 00112623 sw ra,12(sp)
1018c: 00812423 sw s0,8(sp)
10190: 00912223 sw s1,4(sp)
10194: 00050493 mv s1,a0
10198: 00058413 mv s0,a1
1019c: 00259513 slli a0,a1,0x2
101a0: 2e0000ef jal ra,10480 <malloc>
101a4: 04805263 blez s0,101e8 <runningSum+0x64>
101a8: 00050713 mv a4,a0
101ac: 00048693 mv a3,s1
101b0: 00000793 li a5,0
101b4: 01c0006f j 101d0 <runningSum+0x4c>
101b8: 0004a603 lw a2,0(s1)
101bc: 00c52023 sw a2,0(a0)
101c0: 00178793 addi a5,a5,1
101c4: 00470713 addi a4,a4,4
101c8: 00468693 addi a3,a3,4
101cc: 00f40e63 beq s0,a5,101e8 <runningSum+0x64>
101d0: fe0784e3 beqz a5,101b8 <runningSum+0x34>
101d4: ffc72603 lw a2,-4(a4)
101d8: 0006a583 lw a1,0(a3)
101dc: 00b60633 add a2,a2,a1
101e0: 00c72023 sw a2,0(a4)
101e4: fddff06f j 101c0 <runningSum+0x3c>
101e8: 00c12083 lw ra,12(sp)
101ec: 00812403 lw s0,8(sp)
101f0: 00412483 lw s1,4(sp)
101f4: 01010113 addi sp,sp,16
101f8: 00008067 ret
```
#### Observation
* LOC(Line of code): 31
* Register used:
* *tx* registers: none
* *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5**
* *sx* registers: **s0**, **s1**
* *others*: **sp**, **ra**
* Stack used(bytes): 16 bytes
* Number of lw and sw: lw=6 sw=5
* Number of branch/jump instructions: 7
1. ELF file header:
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100dc
Start of program headers: 52 (bytes into file)
Start of section headers: 95216 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
2. Size:
```
text data bss dec hex filename
75636 2816 812 79264 135a0 Hw2_1
```
3. Execute and Statistics of execution
```
1 3 6 10 15
1 4 9 16 25 36
0 2 6 12 20 30 42
inferior exit code 0
CSR cycle count: 16406
```
### -O2
``` clike=
000102f0 <runningSum>:
102f0: ff010113 addi sp,sp,-16
102f4: 00812423 sw s0,8(sp)
102f8: 00912223 sw s1,4(sp)
102fc: 00050413 mv s0,a0
10300: 00058493 mv s1,a1
10304: 00259513 slli a0,a1,0x2
10308: 00112623 sw ra,12(sp)
1030c: 158000ef jal ra,10464 <malloc>
10310: 02905e63 blez s1,1034c <runningSum+0x5c>
10314: 00042683 lw a3,0(s0)
10318: 00040713 mv a4,s0
1031c: 00050793 mv a5,a0
10320: 00000613 li a2,0
10324: 00d52023 sw a3,0(a0)
10328: 0140006f j 1033c <runningSum+0x4c>
1032c: ffc7a683 lw a3,-4(a5)
10330: 00072803 lw a6,0(a4)
10334: 010686b3 add a3,a3,a6
10338: 00d7a023 sw a3,0(a5)
1033c: 00160613 addi a2,a2,1
10340: 00478793 addi a5,a5,4
10344: 00470713 addi a4,a4,4
10348: fec492e3 bne s1,a2,1032c <runningSum+0x3c>
1034c: 00c12083 lw ra,12(sp)
10350: 00812403 lw s0,8(sp)
10354: 00412483 lw s1,4(sp)
10358: 01010113 addi sp,sp,16
1035c: 00008067 ret
```
#### Observation
* LOC(Line of code): 29
* Register used:
* *tx* registers: none
* *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5**, **a6**
* *sx* registers: **s0**, **s1**
* *others*: **sp**, **ra**
* Stack used(bytes): 16 bytes
* Number of lw and sw: lw=6 sw=5
* Number of branch/jump instructions: 5
1. ELF file header:
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x10248
Start of program headers: 52 (bytes into file)
Start of section headers: 95232 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
2. Size:
```
text data bss dec hex filename
75608 2816 812 79236 13584 Hw2_2
```
3. Execute and Statistics of execution
```
1 3 6 10 15
1 4 9 16 25 36
0 2 6 12 20 30 42
inferior exit code 0
CSR cycle count: 16401
```
### -Ofast
``` clike=
000102f0 <runningSum>:
102f0: ff010113 addi sp,sp,-16
102f4: 00812423 sw s0,8(sp)
102f8: 00912223 sw s1,4(sp)
102fc: 00050413 mv s0,a0
10300: 00058493 mv s1,a1
10304: 00259513 slli a0,a1,0x2
10308: 00112623 sw ra,12(sp)
1030c: 15c000ef jal ra,10468 <malloc>
10310: 04905063 blez s1,10350 <runningSum+0x60>
10314: 00042683 lw a3,0(s0)
10318: 00100713 li a4,1
1031c: 00450793 addi a5,a0,4
10320: 00d52023 sw a3,0(a0)
10324: 00440693 addi a3,s0,4
10328: 02e48463 beq s1,a4,10350 <runningSum+0x60>
1032c: 00100613 li a2,1
10330: ffc7a703 lw a4,-4(a5)
10334: 0006a803 lw a6,0(a3)
10338: 00478793 addi a5,a5,4
1033c: 00160613 addi a2,a2,1
10340: 01070733 add a4,a4,a6
10344: fee7ae23 sw a4,-4(a5)
10348: 00468693 addi a3,a3,4
1034c: fec492e3 bne s1,a2,10330 <runningSum+0x40>
10350: 00c12083 lw ra,12(sp)
10354: 00812403 lw s0,8(sp)
10358: 00412483 lw s1,4(sp)
1035c: 01010113 addi sp,sp,16
10360: 00008067 ret
```
#### Observation
* LOC(Line of code): 30
* Register used:
* *tx* registers: none
* *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5**, **a6**
* *sx* registers: **s0**, **s1**
* *others*: **sp**, **ra**
* Stack used(bytes): 16 bytes
* Number of lw and sw: lw=6 sw=5
* Number of branch/jump instructions: 5
1. ELF file header:
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x10248
Start of program headers: 52 (bytes into file)
Start of section headers: 95232 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
2. Size:
```
text data bss dec hex filename
75612 2816 812 79240 13588 Hw2_f
```
3. Execute and Statistics of execution
```
1 3 6 10 15
1 4 9 16 25 36
0 2 6 12 20 30 42
inferior exit code 0
CSR cycle count: 16395
```
## Summary: Observed Compiler Behavior
:::info
* From `-O0` to `-O1`
* Additional usage of **s1** which significantly decreases lw/sw count and stack space.
* From `-O1` to `-O2`
* Unroll the loop to avoid branch misprediction.
* From `-O2` to `-Ofast`
* Nearly identical.
:::
| | -Os | -O0 | -O1 | -O2 | -Ofast |
|--------------------|-----------|-----------|-----------|-----------|--------------|
| Line of Code | 29 | 53 | 31 | 29 | 30 |
| Register used | 10 | 10 | 10 | 11 | 11 |
| Stack used (bytes) | 16 | 48 | 16 | 16 | 16 |
| lw/sw count | 11 | 28 | 11 | 11 | 11 |
| branch/jump count | 5 | 6 | 7 | 5 | 5 |
| CSR cycle count | 16503 | 16889 | 16406 | 16401 | 16395 |
* Size among different optimization levels:
While the data and bss section sizes share the same value across all different optimization levels, we can obtain the tendency in other sections.



* Cycle count among different optimization levels:
