# Assignment2: RISC-V Toolchain contributed by < [`dhiptmc`](https://github.com/dhiptmc) > ###### tags: `RISC-V` `Computer Architecture 2022` ## Problem Problem was chosen from the work done by [王昱承](https://hackmd.io/@zKOCm3SSTKyUyiPV-nfEjw/ryC8TgjZi) in Assignment 1. * Running Sum of 1d Array([LeetCode1480](https://leetcode.com/problems/running-sum-of-1d-array/)) >Given an array nums. We define a running sum of an array as runningSum[i] = sum(nums[0]…nums[i]). > >Return the running sum of nums. * Motivation: In the [last assignment](https://hackmd.io/@XiaXia/ComputerArchitecture_Hw1), I spent lots of time dealing with arrays. I would like to examine whether I can make use of the experiences learnt from previous attempts. #### Example 1: ``` Input: nums = [1,2,3,4] Output: [1,3,6,10] Explanation: Running sum is obtained as follows: [1, 1+2, 1+2+3, 1+2+3+4]. ``` #### Example 2: ``` Input: nums = [1,1,1,1,1] Output: [1,2,3,4,5] Explanation: Running sum is obtained as follows: [1, 1+1, 1+1+1, 1+1+1+1, 1+1+1+1+1]. ``` #### Example 3: ``` Input: nums = [3,1,2,10,1] Output: [3,4,6,16,17] ``` ## Solution The solution is based on the work done by [王昱承](https://hackmd.io/@zKOCm3SSTKyUyiPV-nfEjw/ryC8TgjZi). >Initially, pass the first element, i.e., nums[0], to result[0]. Add the result[i-1] element with nums[i], then pass the value to the result. > >Return the result array eventually. ## Implementation ### C code ```clike= #include <stdlib.h> #include <stdio.h> int* runningSum(int* nums, int numsSize){ int* result = (int*)malloc(sizeof(int)*numsSize); for(int i=0; i<numsSize; i++){ if(i==0) result[0] = nums[0]; else result[i] = result[i-1] + nums[i]; } return result; } int main(int argc, char *argv[]){ int nums1[5] = {1,2,3,4,5}; int nums2[6] = {1,3,5,7,9,11}; int nums3[7] = {0,2,4,6,8,10,12}; int* result1 = runningSum(nums1, 5); int* result2 = runningSum(nums2, 6); int* result3 = runningSum(nums3, 7); for(int j=0; j<5; j++){ printf("%d ", result1[j]); } printf("\n"); for(int j=0; j<6; j++){ printf("%d ", result2[j]); } printf("\n"); for(int j=0; j<7; j++){ printf("%d ", result3[j]); } printf("\n"); } ``` ### Assembly code ``` clike= .data nums1: .word 1,2,3,4,5 nums2: .word 1,3,5,7,9,11 nums3: .word 0,2,4,6,8,10,12 res1: .word 0,0,0,0,0, res2: .word 0,0,0,0,0,0 res3: .word 0,0,0,0,0,0,0 space: .string " " nl: .string "\n" .text main: # t2: nums head, t3: result head, t4: numsSize la t2, nums1 # load arr base to t2 la t3, res1 # load res base to t3 li t4, 5 # load the numSize to t4 jal ra, runningSum la t2, nums2 # load arr base to t2 la t3, res2 # load res base to t3 li t4, 6 # load the numSize to t4 jal ra, runningSum la t2, nums3 # load arr base to t2 la t3, res3 # load res base to t3 li t4, 7 # load the numSize to t4 jal ra, runningSum j exit runningSum: # t5: i li t5, 1 # load i with 1 lw a3, 0(t2) # load nums[0] to a3 sw a3, 0(t3) # save num[0] to result[0] end.L1: slli a0, t5, 2 # i*4 (cause of byte size) add a1, t3, a0 # add result's address with i*4 to get result[i]'s address lw a3, -4(a1) # load result[i-1] to a3 add a2, t2, a0 # add nums's address with i*4 to get nums[i]'s address lw a2, 0(a2) # load nums[i] to a5 add a3, a3, a2 # add result[i-1] + nums[i] sw a3, 0(a1) # save to result[i] addi t5, t5, 1 # i = i + 1 blt t5, t4, end.L1 # if i < numsSize then branch to end.L1 li t5, 0 print: slli a1, t5, 2 add a1, t3, a1 lw a0, 0(a1) # load result out li a7, 1 # pint a0 ecall la a0, space # load space li a7, 4 # print string ecall addi t5, t5, 1 # j++ blt t5, t4, print # loop la a0, nl # load next line li a7, 4 # print string ecall jr ra exit: li a7, 10 # exit ecall ``` :::info There are possible ways to use fewer instructions. Modified version of Assembly code is shown below. ``` clike= .data nums1: .word 1,2,3,4,5 nums2: .word 1,3,5,7,9,11 nums3: .word 0,2,4,6,8,10,12 res1: .word 0,0,0,0,0, res2: .word 0,0,0,0,0,0 res3: .word 0,0,0,0,0,0,0 space: .string " " nl: .string "\n" .text main: # t2: nums head, t3: result head, t4: numsSize la t2, nums1 # load arr base to t2 la t3, res1 # load res base to t3 li t4, 5 # load the numSize to t4 jal ra, runningSum la t2, nums2 # load arr base to t2 la t3, res2 # load res base to t3 li t4, 6 # load the numSize to t4 jal ra, runningSum la t2, nums3 # load arr base to t2 la t3, res3 # load res base to t3 li t4, 7 # load the numSize to t4 jal ra, runningSum j exit runningSum: # t5: i, t6: current result addr add t6, t3, zero li t5, 1 # load i with 1 lw a3, 0(t2) # load nums[0] to a3 sw a3, 0(t6) # save nums[0] to result[0] loop: addi t2, t2, 4 # nums[i] addr lw a2, 0(t2) # nums[i] add a3, a3, a2 # add result[i-1] + nums[i] addi t6, t6, 4 # result[i] addi t5, t5, 1 # i = i + 1 sw a3, 0(t6) # save result[0] + nums[1] to result[1] blt t5, t4, loop # if i < numsSize then branch to loop print: lw a0, 0(t3) # load result out li a7, 1 # pint a0 ecall la a0, space # load space li a7, 4 # print string ecall addi t3, t3, 4 addi t5, t5, -1 # i-- blt zero, t5, print la a0, nl # load next line li a7, 4 # print string ecall jr ra exit: li a7, 10 # exit ecall ``` As a result, clock cycle as the figure shown below decreases compared to the original work. Origin: ![](https://i.imgur.com/9eUYD6b.png "Original") After modified: ![](https://i.imgur.com/R0PkcSK.png "Modified") ::: ## Environment setup Following the instructions of [Lab2: RISC-V RV32I[MACF] emulator with ELF support](https://hackmd.io/@sysprog/SJAR5XMmi), ```shell $ git clone https://github.com/sysprog21/rv32emu $ cd rv32emu $ make ``` however, there is an error occured: ```shell /bin/sh: 1: cc: not found /bin/sh: 1: cc: not found CC build/map.o make: cc: No such file or directory make: *** [Makefile:110: build/map.o] Error 127 ``` Credit to TA(yucheng871011@gmail.com) who help solving this problem for me. That is, some tools should be installed first. ```shell $ sudo apt install build-essential ``` ## Instructions * Compile ```shell $ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -compilation options(e.g., -Ofast, -Os, etc.) -o output input.c ``` * Objdump ```shell $ riscv-none-elf-objdump -d output > output.txt ``` * Readelf ```shell $ riscv-none-elf-readelf -h output ``` * Size ```shell $ riscv-none-elf-size output ``` * Execute and Statistics of execution ```shell ~/rv32emu$ build/rv32emu --stats ./Hw2/output ``` ## Compare Assembly code In this section, C code for the problem is compiled with different compiler optimization level (-Os to -Ofast). The assembly code is lengthy and hard to read. We focus on the main section of the code to better demonstrate the effect of compiler optimization. ### Handwritten Assembly ``` clike= runningSum: # t5: i, t6: current result addr add t6, t3, zero li t5, 1 # load i with 1 lw a3, 0(t2) # load nums[0] to a3 sw a3, 0(t6) # save nums[0] to result[0] loop: addi t2, t2, 4 # nums[i] addr lw a2, 0(t2) # nums[i] add a3, a3, a2 # add result[i-1] + nums[i] addi t6, t6, 4 # result[i] addi t5, t5, 1 # i = i + 1 sw a3, 0(t6) # save result[0] + nums[1] to result[1] blt t5, t4, loop # if i < numsSize then branch to loop ``` #### Observation * LOC(Line of code): 13 * Register used: * *tx* registers: **t2**, **t3**, **t4**, **t5**, **t6** * *ax* registers: **a2**, **a3** * *sx* registers: none * *others*: none * Stack used(bytes): 0 bytes * Number of lw and sw: lw=2 sw=2 * Number of branch/jump instructions: 1 ### -Os ``` clike= 00010290 <runningSum>: 10290: ff010113 addi sp,sp,-16 10294: 00812423 sw s0,8(sp) 10298: 00912223 sw s1,4(sp) 1029c: 00058413 mv s0,a1 102a0: 00050493 mv s1,a0 102a4: 00259513 slli a0,a1,0x2 102a8: 00112623 sw ra,12(sp) 102ac: 158000ef jal ra,10404 <malloc> 102b0: 00805e63 blez s0,102cc <runningSum+0x3c> 102b4: 0004a783 lw a5,0(s1) 102b8: 00050713 mv a4,a0 102bc: 00f52023 sw a5,0(a0) 102c0: 00000793 li a5,0 102c4: 00178793 addi a5,a5,1 102c8: 0087cc63 blt a5,s0,102e0 <runningSum+0x50> 102cc: 00c12083 lw ra,12(sp) 102d0: 00812403 lw s0,8(sp) 102d4: 00412483 lw s1,4(sp) 102d8: 01010113 addi sp,sp,16 102dc: 00008067 ret 102e0: 00279693 slli a3,a5,0x2 102e4: 00d486b3 add a3,s1,a3 102e8: 0006a683 lw a3,0(a3) 102ec: 00072603 lw a2,0(a4) 102f0: 00470713 addi a4,a4,4 102f4: 00c686b3 add a3,a3,a2 102f8: 00d72023 sw a3,0(a4) 102fc: fc9ff06f j 102c4 <runningSum+0x34> ``` #### Observation * LOC(Line of code): 29 * Register used: * *tx* registers: none * *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5** * *sx* registers: **s0**, **s1** * *others*: **sp**, **ra** * Stack used(bytes): 16 bytes * Number of lw and sw: lw=6 sw=5 * Number of branch/jump instructions: 5 1. ELF file header: ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x101e8 Start of program headers: 52 (bytes into file) Start of section headers: 95232 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` 2. Size: ``` text data bss dec hex filename 75512 2816 812 79140 13524 Hw2_s ``` 3. Execute and Statistics of execution ``` 1 3 6 10 15 1 4 9 16 25 36 0 2 6 12 20 30 42 inferior exit code 0 CSR cycle count: 16503 ``` ### -O0 ``` clike= 00010184 <runningSum>: 10184: fd010113 addi sp,sp,-48 10188: 02112623 sw ra,44(sp) 1018c: 02812423 sw s0,40(sp) 10190: 03010413 addi s0,sp,48 10194: fca42e23 sw a0,-36(s0) 10198: fcb42c23 sw a1,-40(s0) 1019c: fd842783 lw a5,-40(s0) 101a0: 00279793 slli a5,a5,0x2 101a4: 00078513 mv a0,a5 101a8: 3a8000ef jal ra,10550 <malloc> 101ac: 00050793 mv a5,a0 101b0: fef42423 sw a5,-24(s0) 101b4: fe042623 sw zero,-20(s0) 101b8: 0780006f j 10230 <runningSum+0xac> 101bc: fec42783 lw a5,-20(s0) 101c0: 00079c63 bnez a5,101d8 <runningSum+0x54> 101c4: fdc42783 lw a5,-36(s0) 101c8: 0007a703 lw a4,0(a5) 101cc: fe842783 lw a5,-24(s0) 101d0: 00e7a023 sw a4,0(a5) 101d4: 0500006f j 10224 <runningSum+0xa0> 101d8: fec42703 lw a4,-20(s0) 101dc: 400007b7 lui a5,0x40000 101e0: fff78793 addi a5,a5,-1 # 3fffffff <__BSS_END__+0x3ffdc1cf> 101e4: 00f707b3 add a5,a4,a5 101e8: 00279793 slli a5,a5,0x2 101ec: fe842703 lw a4,-24(s0) 101f0: 00f707b3 add a5,a4,a5 101f4: 0007a683 lw a3,0(a5) 101f8: fec42783 lw a5,-20(s0) 101fc: 00279793 slli a5,a5,0x2 10200: fdc42703 lw a4,-36(s0) 10204: 00f707b3 add a5,a4,a5 10208: 0007a703 lw a4,0(a5) 1020c: fec42783 lw a5,-20(s0) 10210: 00279793 slli a5,a5,0x2 10214: fe842603 lw a2,-24(s0) 10218: 00f607b3 add a5,a2,a5 1021c: 00e68733 add a4,a3,a4 10220: 00e7a023 sw a4,0(a5) 10224: fec42783 lw a5,-20(s0) 10228: 00178793 addi a5,a5,1 1022c: fef42623 sw a5,-20(s0) 10230: fec42703 lw a4,-20(s0) 10234: fd842783 lw a5,-40(s0) 10238: f8f742e3 blt a4,a5,101bc <runningSum+0x38> 1023c: fe842783 lw a5,-24(s0) 10240: 00078513 mv a0,a5 10244: 02c12083 lw ra,44(sp) 10248: 02812403 lw s0,40(sp) 1024c: 03010113 addi sp,sp,48 10250: 00008067 ret ``` #### Observation * LOC(Line of code): 53 * Register used: * *tx* registers: none * *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5** * *sx* registers: **s0** * *others*: **sp**, **ra**, **zero** * Stack used(bytes): 48 bytes * Number of lw and sw: lw=19 sw=9 * Number of branch/jump instructions: 6 1. ELF file header: ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100dc Start of program headers: 52 (bytes into file) Start of section headers: 95216 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` 2. Size: ``` text data bss dec hex filename 75844 2816 812 79472 13670 Hw2_0 ``` 3. Execute and Statistics of execution ``` 1 3 6 10 15 1 4 9 16 25 36 0 2 6 12 20 30 42 inferior exit code 0 CSR cycle count: 16889 ``` ### -O1 ``` clike= 00010184 <runningSum>: 10184: ff010113 addi sp,sp,-16 10188: 00112623 sw ra,12(sp) 1018c: 00812423 sw s0,8(sp) 10190: 00912223 sw s1,4(sp) 10194: 00050493 mv s1,a0 10198: 00058413 mv s0,a1 1019c: 00259513 slli a0,a1,0x2 101a0: 2e0000ef jal ra,10480 <malloc> 101a4: 04805263 blez s0,101e8 <runningSum+0x64> 101a8: 00050713 mv a4,a0 101ac: 00048693 mv a3,s1 101b0: 00000793 li a5,0 101b4: 01c0006f j 101d0 <runningSum+0x4c> 101b8: 0004a603 lw a2,0(s1) 101bc: 00c52023 sw a2,0(a0) 101c0: 00178793 addi a5,a5,1 101c4: 00470713 addi a4,a4,4 101c8: 00468693 addi a3,a3,4 101cc: 00f40e63 beq s0,a5,101e8 <runningSum+0x64> 101d0: fe0784e3 beqz a5,101b8 <runningSum+0x34> 101d4: ffc72603 lw a2,-4(a4) 101d8: 0006a583 lw a1,0(a3) 101dc: 00b60633 add a2,a2,a1 101e0: 00c72023 sw a2,0(a4) 101e4: fddff06f j 101c0 <runningSum+0x3c> 101e8: 00c12083 lw ra,12(sp) 101ec: 00812403 lw s0,8(sp) 101f0: 00412483 lw s1,4(sp) 101f4: 01010113 addi sp,sp,16 101f8: 00008067 ret ``` #### Observation * LOC(Line of code): 31 * Register used: * *tx* registers: none * *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5** * *sx* registers: **s0**, **s1** * *others*: **sp**, **ra** * Stack used(bytes): 16 bytes * Number of lw and sw: lw=6 sw=5 * Number of branch/jump instructions: 7 1. ELF file header: ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x100dc Start of program headers: 52 (bytes into file) Start of section headers: 95216 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` 2. Size: ``` text data bss dec hex filename 75636 2816 812 79264 135a0 Hw2_1 ``` 3. Execute and Statistics of execution ``` 1 3 6 10 15 1 4 9 16 25 36 0 2 6 12 20 30 42 inferior exit code 0 CSR cycle count: 16406 ``` ### -O2 ``` clike= 000102f0 <runningSum>: 102f0: ff010113 addi sp,sp,-16 102f4: 00812423 sw s0,8(sp) 102f8: 00912223 sw s1,4(sp) 102fc: 00050413 mv s0,a0 10300: 00058493 mv s1,a1 10304: 00259513 slli a0,a1,0x2 10308: 00112623 sw ra,12(sp) 1030c: 158000ef jal ra,10464 <malloc> 10310: 02905e63 blez s1,1034c <runningSum+0x5c> 10314: 00042683 lw a3,0(s0) 10318: 00040713 mv a4,s0 1031c: 00050793 mv a5,a0 10320: 00000613 li a2,0 10324: 00d52023 sw a3,0(a0) 10328: 0140006f j 1033c <runningSum+0x4c> 1032c: ffc7a683 lw a3,-4(a5) 10330: 00072803 lw a6,0(a4) 10334: 010686b3 add a3,a3,a6 10338: 00d7a023 sw a3,0(a5) 1033c: 00160613 addi a2,a2,1 10340: 00478793 addi a5,a5,4 10344: 00470713 addi a4,a4,4 10348: fec492e3 bne s1,a2,1032c <runningSum+0x3c> 1034c: 00c12083 lw ra,12(sp) 10350: 00812403 lw s0,8(sp) 10354: 00412483 lw s1,4(sp) 10358: 01010113 addi sp,sp,16 1035c: 00008067 ret ``` #### Observation * LOC(Line of code): 29 * Register used: * *tx* registers: none * *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5**, **a6** * *sx* registers: **s0**, **s1** * *others*: **sp**, **ra** * Stack used(bytes): 16 bytes * Number of lw and sw: lw=6 sw=5 * Number of branch/jump instructions: 5 1. ELF file header: ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x10248 Start of program headers: 52 (bytes into file) Start of section headers: 95232 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` 2. Size: ``` text data bss dec hex filename 75608 2816 812 79236 13584 Hw2_2 ``` 3. Execute and Statistics of execution ``` 1 3 6 10 15 1 4 9 16 25 36 0 2 6 12 20 30 42 inferior exit code 0 CSR cycle count: 16401 ``` ### -Ofast ``` clike= 000102f0 <runningSum>: 102f0: ff010113 addi sp,sp,-16 102f4: 00812423 sw s0,8(sp) 102f8: 00912223 sw s1,4(sp) 102fc: 00050413 mv s0,a0 10300: 00058493 mv s1,a1 10304: 00259513 slli a0,a1,0x2 10308: 00112623 sw ra,12(sp) 1030c: 15c000ef jal ra,10468 <malloc> 10310: 04905063 blez s1,10350 <runningSum+0x60> 10314: 00042683 lw a3,0(s0) 10318: 00100713 li a4,1 1031c: 00450793 addi a5,a0,4 10320: 00d52023 sw a3,0(a0) 10324: 00440693 addi a3,s0,4 10328: 02e48463 beq s1,a4,10350 <runningSum+0x60> 1032c: 00100613 li a2,1 10330: ffc7a703 lw a4,-4(a5) 10334: 0006a803 lw a6,0(a3) 10338: 00478793 addi a5,a5,4 1033c: 00160613 addi a2,a2,1 10340: 01070733 add a4,a4,a6 10344: fee7ae23 sw a4,-4(a5) 10348: 00468693 addi a3,a3,4 1034c: fec492e3 bne s1,a2,10330 <runningSum+0x40> 10350: 00c12083 lw ra,12(sp) 10354: 00812403 lw s0,8(sp) 10358: 00412483 lw s1,4(sp) 1035c: 01010113 addi sp,sp,16 10360: 00008067 ret ``` #### Observation * LOC(Line of code): 30 * Register used: * *tx* registers: none * *ax* registers: **a0**, **a1**, **a2**, **a3**, **a4**, **a5**, **a6** * *sx* registers: **s0**, **s1** * *others*: **sp**, **ra** * Stack used(bytes): 16 bytes * Number of lw and sw: lw=6 sw=5 * Number of branch/jump instructions: 5 1. ELF file header: ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x10248 Start of program headers: 52 (bytes into file) Start of section headers: 95232 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 3 Size of section headers: 40 (bytes) Number of section headers: 15 Section header string table index: 14 ``` 2. Size: ``` text data bss dec hex filename 75612 2816 812 79240 13588 Hw2_f ``` 3. Execute and Statistics of execution ``` 1 3 6 10 15 1 4 9 16 25 36 0 2 6 12 20 30 42 inferior exit code 0 CSR cycle count: 16395 ``` ## Summary: Observed Compiler Behavior :::info * From `-O0` to `-O1` * Additional usage of **s1** which significantly decreases lw/sw count and stack space. * From `-O1` to `-O2` * Unroll the loop to avoid branch misprediction. * From `-O2` to `-Ofast` * Nearly identical. ::: | | -Os | -O0 | -O1 | -O2 | -Ofast | |--------------------|-----------|-----------|-----------|-----------|--------------| | Line of Code | 29 | 53 | 31 | 29 | 30 | | Register used | 10 | 10 | 10 | 11 | 11 | | Stack used (bytes) | 16 | 48 | 16 | 16 | 16 | | lw/sw count | 11 | 28 | 11 | 11 | 11 | | branch/jump count | 5 | 6 | 7 | 5 | 5 | | CSR cycle count | 16503 | 16889 | 16406 | 16401 | 16395 | * Size among different optimization levels: While the data and bss section sizes share the same value across all different optimization levels, we can obtain the tendency in other sections. ![](https://i.imgur.com/ANzgwXU.png "text") ![](https://i.imgur.com/LbV9nhQ.png "dec") ![](https://i.imgur.com/JlCd6Yj.png "hex") * Cycle count among different optimization levels: ![](https://i.imgur.com/VyMeKHm.png "Cycle count")