Lab2: RISC-V RV32I[MA] emulator with ELF support

Lab2: RISC-V RV32I[MA] emulator with ELF support for Convolution Sum
===

###### tags: `RISC-V` `Computer Architecture`

[TOC]

**[Original Document](https://hackmd.io/hXZ0XVxTTLexGeQXfcj6aA)**

## Rewritten Code in C

I rewrote the original program from the document linked above as `Conv_Sum.c`, shown below, with one modification: the `*` operator is replaced by a function. The target is the RV32I base ISA without the M extension, so there is no `mul` instruction (GCC would instead emit calls to the libgcc helper `__mulsi3`, which is not linked under `-nostdlib`). I therefore used the shift-and-add code credited to [dck9661](https://hackmd.io/@oR8-QX4TQzGKDJ72DmqDUg/SJrNq1QuB) and changed it slightly to support signed operands, giving `int mul(int, int)` below.

```c=
int mul(int a, int b){
    int result = 0, minus_flag = ((a < 0) != (b < 0)) ? 1 : 0;  // 1 if the product is negative
    a = (a < 0) ? -a : a;  // work with magnitudes
    b = (b < 0) ? -b : b;
    // Shift-and-add: for every set bit i of b, add a << i
    for(int i=0;i<31;i++) {
        if(b & (0x1 << i))
            result = result + (a << i);
    }
    if(minus_flag)
        return -result;
    return result;
}

int Conv_Sum(int A[], int B[], unsigned int len) {
    int result = 0;
    for(int i = 0; i < len; i ++) {
        result += mul(A[i], B[i]);
    }
    return result;
}

int _start(){
    int signal[10] = { 39, 174, 243, 51, 73, 184, 33, 137, 82, 114 };
    int weight[10] = { 6, 10, 0, -2, -8, 8, 8, 10, 4, -10 };
    int result = Conv_Sum(signal, weight, 10);

    // Printing: write the result in hex, one character at a time,
    // to the emulator's memory-mapped output at 0x40002000
    int charIndex = 0;
    char charStack[12];
    volatile char* tx = (volatile char*) 0x40002000;
    if(result < 0){
        *tx = '-';
        result = -result;
    }
    // Collect hex digits, least significant first
    // (a result of 0 collects no digits, so only "0x" is printed)
    for(; result; result >>=4, charIndex ++){
        if((result & 0xf) > 9)
            charStack[charIndex] = (result & 0xf) + 'A' - 10;
        else
            charStack[charIndex] = (result & 0xf) + '0';
    }
    *tx = '0';
    *tx = 'x';
    // Emit the digits most significant first
    while(charIndex--){
        *tx = charStack[charIndex];
    }
    return 0;
}
```

## Execution with rv32emu

1. I compiled `Conv_Sum.c` with optimization for speed using `riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib ./Conv_Sum.c -o Conv_Sum3` and ran it with `emu-rv32i Conv_Sum3`, which gave the result below.
```bash
0xDFE
>>> Execution time: 3924561 ns
>>> Instruction count: 2147 (IPS=547067)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```

2. Compiling for code size with `riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -Os -nostdlib ./Conv_Sum.c -o Conv_Sums` fails at link time: to initialize the local arrays `signal` and `weight`, the compiler emits calls to `memcpy`, which is not available under `-nostdlib`. To resolve the link error, I added the minimal `memcpy` below to `Conv_Sum.c`.

```c=
#include <stddef.h>

// Byte-by-byte copy of n bytes from str2 to str1.
// Note: ISO C memcpy returns str1; this version returns the end of the
// destination instead, which is harmless here because the compiler-generated
// calls ignore the return value.
void* memcpy(void* str1, const void* str2, size_t n){
    char* d = str1;
    const char* s = str2;
    while(n --)
        *d ++ = *s ++;
    return d;
}
```

After compiling `Conv_Sum.c` with `-Os`, I ran it with `emu-rv32i Conv_Sums` and got the result below.

```bash
0xDFE
>>> Execution time: 5061702 ns
>>> Instruction count: 2482 (IPS=490348)
>>> Jumps: 824 (33.20%) - 414 forwards, 410 backwards
>>> Branching T=701 (95.37%) F=34 (4.63%)
```

## Checking Code Size

I listed the section headers of both binaries with `riscv-none-embed-objdump -h Conv_Sum3 > Conv_Sum3h` and `riscv-none-embed-objdump -h Conv_Sums > Conv_Sumsh`; the results are below. Although the `-Os` build executes more instructions than the `-O3` build (2482 versus 2147), it is much smaller: `.text` shrinks from 0x494 (1172) bytes to 0x1d0 (464) bytes, while `.rodata` stays at 0x50 (80) bytes.
* Conv_Sum with -O3

```bash
Conv_Sum3:     file format elf32-littleriscv

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000494  00010054  00010054  00000054  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000050  000104e8  000104e8  000004e8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .comment      00000033  00000000  00000000  00000538  2**0
                  CONTENTS, READONLY
```

* Conv_Sum with -Os

```bash
Conv_Sums:     file format elf32-littleriscv

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         000001d0  00010054  00010054  00000054  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000050  00010224  00010224  00000224  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .comment      00000033  00000000  00000000  00000274  2**0
                  CONTENTS, READONLY
```

## Checking Disassembled Program

I disassembled both ELF files with `riscv-none-embed-objdump -d Conv_Sum3 > Conv_Sum3.as` and `riscv-none-embed-objdump -d Conv_Sums > Conv_Sums.as`; the listings are below. In the `-O3` build, `mul` is inlined into `Conv_Sum`, both are inlined into `_start`, and the hex-printing loop is fully unrolled, whereas the `-Os` build keeps `memcpy`, `Conv_Sum`, and `mul` as separate functions called with `jal`.

* Conv_Sum with -O3

```bash
Conv_Sum3:     file format elf32-littleriscv


Disassembly of section .text:

00010054 <mul>:
   10054:  41f5d793  srai a5,a1,0x1f
   10058:  41f55713  srai a4,a0,0x1f
   1005c:  00b7c633  xor  a2,a5,a1
   10060:  00a74833  xor  a6,a4,a0
   10064:  01f55313  srli t1,a0,0x1f
   10068:  01f5d893  srli a7,a1,0x1f
   1006c:  40f60633  sub  a2,a2,a5
   10070:  40e80833  sub  a6,a6,a4
   10074:  00000793  li   a5,0
   10078:  00000513  li   a0,0
   1007c:  01f00593  li   a1,31
   10080:  40f65733  sra  a4,a2,a5
   10084:  00177713  andi a4,a4,1
   10088:  00f816b3  sll  a3,a6,a5
   1008c:  00178793  addi a5,a5,1
   10090:  00070463  beqz a4,10098 <mul+0x44>
   10094:  00d50533  add  a0,a0,a3
   10098:  feb794e3  bne  a5,a1,10080 <mul+0x2c>
   1009c:  01130463  beq  t1,a7,100a4 <mul+0x50>
   100a0:  40a00533  neg  a0,a0
   100a4:  00008067  ret

000100a8 <Conv_Sum>:
   100a8:  08060663  beqz a2,10134 <Conv_Sum+0x8c>
   100ac:  00261613  slli a2,a2,0x2
   100b0:  00050f13  mv   t5,a0
   100b4:  00c502b3  add  t0,a0,a2
   100b8:  01f00313  li   t1,31
   100bc:  00000513  li   a0,0
   100c0:  0005a783  lw   a5,0(a1)
   100c4:  000f2703  lw   a4,0(t5)
   100c8:  00000893  li   a7,0
   100cc:  41f7d693  srai a3,a5,0x1f
   100d0:  41f75f93  srai t6,a4,0x1f
   100d4:  00f6c633  xor  a2,a3,a5
   100d8:  00efc833  xor  a6,t6,a4
   100dc:  01f7de13  srli t3,a5,0x1f
   100e0:  01f75e93  srli t4,a4,0x1f
   100e4:  41f80833  sub  a6,a6,t6
   100e8:  40d60633  sub  a2,a2,a3
   100ec:  00000793  li   a5,0
   100f0:  40f65733  sra  a4,a2,a5
   100f4:  00177713  andi a4,a4,1
   100f8:  00f816b3  sll  a3,a6,a5
   100fc:  00178793  addi a5,a5,1
   10100:  00070463  beqz a4,10108 <Conv_Sum+0x60>
   10104:  00d888b3  add  a7,a7,a3
   10108:  fe6794e3  bne  a5,t1,100f0 <Conv_Sum+0x48>
   1010c:  004f0f13  addi t5,t5,4
   10110:  01ce8a63  beq  t4,t3,10124 <Conv_Sum+0x7c>
   10114:  41150533  sub  a0,a0,a7
   10118:  00458593  addi a1,a1,4
   1011c:  fa5f12e3  bne  t5,t0,100c0 <Conv_Sum+0x18>
   10120:  00008067  ret
   10124:  01150533  add  a0,a0,a7
   10128:  00458593  addi a1,a1,4
   1012c:  f9e29ae3  bne  t0,t5,100c0 <Conv_Sum+0x18>
   10130:  00008067  ret
   10134:  00000513  li   a0,0
   10138:  00008067  ret

0001013c <_start>:
   1013c:  000107b7  lui  a5,0x10
   10140:  4e878793  addi a5,a5,1256 # 104e8 <_start+0x3ac>
   10144:  0007a703  lw   a4,0(a5)
   10148:  fa010113  addi sp,sp,-96
   1014c:  01c7af83  lw   t6,28(a5)
   10150:  00e12823  sw   a4,16(sp)
   10154:  0047a703  lw   a4,4(a5)
   10158:  0207af03  lw   t5,32(a5)
   1015c:  0287ae03  lw   t3,40(a5)
   10160:  00e12a23  sw   a4,20(sp)
   10164:  0087a703  lw   a4,8(a5)
   10168:  02c7a503  lw   a0,44(a5)
   1016c:  0147a383  lw   t2,20(a5)
   10170:  00e12c23  sw   a4,24(sp)
   10174:  00c7a703  lw   a4,12(a5)
   10178:  0187a283  lw   t0,24(a5)
   1017c:  0247ae83  lw   t4,36(a5)
   10180:  00e12e23  sw   a4,28(sp)
   10184:  0107a703  lw   a4,16(a5)
   10188:  0347a303  lw   t1,52(a5)
   1018c:  0387a883  lw   a7,56(a5)
   10190:  03c7a803  lw   a6,60(a5)
   10194:  0407a583  lw   a1,64(a5)
   10198:  0447a603  lw   a2,68(a5)
   1019c:  0487a683  lw   a3,72(a5)
   101a0:  02e12023  sw   a4,32(sp)
   101a4:  04c7a703  lw   a4,76(a5)
   101a8:  0307a783  lw   a5,48(a5)
   101ac:  03f12623  sw   t6,44(sp)
   101b0:  03e12823  sw   t5,48(sp)
   101b4:  03c12c23  sw   t3,56(sp)
   101b8:  02a12e23  sw   a0,60(sp)
   101bc:  02712223  sw   t2,36(sp)
   101c0:  02512423  sw   t0,40(sp)
   101c4:  03d12a23  sw   t4,52(sp)
   101c8:  03810e13  addi t3,sp,56
   101cc:  04f12023  sw   a5,64(sp)
   101d0:  01010513  addi a0,sp,16
   101d4:  04612223  sw   t1,68(sp)
   101d8:  05112423  sw   a7,72(sp)
   101dc:  05012623  sw   a6,76(sp)
   101e0:  04b12823  sw   a1,80(sp)
   101e4:  04c12a23  sw   a2,84(sp)
   101e8:  04d12c23  sw   a3,88(sp)
   101ec:  04e12e23  sw   a4,92(sp)
   101f0:  000e0f93  mv   t6,t3
   101f4:  00000313  li   t1,0
   101f8:  01f00f13  li   t5,31
   101fc:  00052703  lw   a4,0(a0)
   10200:  000e2783  lw   a5,0(t3)
   10204:  00000593  li   a1,0
   10208:  41f75e93  srai t4,a4,0x1f
   1020c:  41f7d613  srai a2,a5,0x1f
   10210:  00eec8b3  xor  a7,t4,a4
   10214:  00f64833  xor  a6,a2,a5
   10218:  01f75693  srli a3,a4,0x1f
   1021c:  41d888b3  sub  a7,a7,t4
   10220:  01f7d713  srli a4,a5,0x1f
   10224:  40c80833  sub  a6,a6,a2
   10228:  00000793  li   a5,0
   1022c:  40f85633  sra  a2,a6,a5
   10230:  00167613  andi a2,a2,1
   10234:  00f89eb3  sll  t4,a7,a5
   10238:  00178793  addi a5,a5,1
   1023c:  00060463  beqz a2,10244 <_start+0x108>
   10240:  01d585b3  add  a1,a1,t4
   10244:  ffe794e3  bne  a5,t5,1022c <_start+0xf0>
   10248:  00450513  addi a0,a0,4
   1024c:  1ce68463  beq  a3,a4,10414 <_start+0x2d8>
   10250:  40b30333  sub  t1,t1,a1
   10254:  004e0e13  addi t3,t3,4
   10258:  fbf512e3  bne  a0,t6,101fc <_start+0xc0>
   1025c:  1c034463  bltz t1,10424 <_start+0x2e8>
   10260:  26030063  beqz t1,104c0 <_start+0x384>
   10264:  00f37793  andi a5,t1,15
   10268:  0ff7f693  andi a3,a5,255
   1026c:  00900613  li   a2,9
   10270:  03068713  addi a4,a3,48
   10274:  1cf64a63  blt  a2,a5,10448 <_start+0x30c>
   10278:  00e10223  sb   a4,4(sp)
   1027c:  40435793  srai a5,t1,0x4
   10280:  1e078863  beqz a5,10470 <_start+0x334>
   10284:  00f7f793  andi a5,a5,15
   10288:  0ff7f693  andi a3,a5,255
   1028c:  00900613  li   a2,9
   10290:  03068713  addi a4,a3,48
   10294:  1af64e63  blt  a2,a5,10450 <_start+0x314>
   10298:  00e102a3  sb   a4,5(sp)
   1029c:  40835793  srai a5,t1,0x8
   102a0:  1e078863  beqz a5,10490 <_start+0x354>
   102a4:  00f7f793  andi a5,a5,15
   102a8:  0ff7f693  andi a3,a5,255
   102ac:  00900613  li   a2,9
   102b0:  03068713  addi a4,a3,48
   102b4:  1af64263  blt  a2,a5,10458 <_start+0x31c>
   102b8:  00e10323  sb   a4,6(sp)
   102bc:  40c35793  srai a5,t1,0xc
   102c0:  1e078863  beqz a5,104b0 <_start+0x374>
   102c4:  00f7f793  andi a5,a5,15
   102c8:  0ff7f693  andi a3,a5,255
   102cc:  00900613  li   a2,9
   102d0:  03068713  addi a4,a3,48
   102d4:  18f64663  blt  a2,a5,10460 <_start+0x324>
   102d8:  00e103a3  sb   a4,7(sp)
   102dc:  41035793  srai a5,t1,0x10
   102e0:  1c078c63  beqz a5,104b8 <_start+0x37c>
   102e4:  00f7f793  andi a5,a5,15
   102e8:  0ff7f693  andi a3,a5,255
   102ec:  00900613  li   a2,9
   102f0:  03068713  addi a4,a3,48
   102f4:  00f65463  bge  a2,a5,102fc <_start+0x1c0>
   102f8:  03768713  addi a4,a3,55
   102fc:  00e10423  sb   a4,8(sp)
   10300:  41435793  srai a5,t1,0x14
   10304:  16078263  beqz a5,10468 <_start+0x32c>
   10308:  00f7f793  andi a5,a5,15
   1030c:  0ff7f693  andi a3,a5,255
   10310:  00900613  li   a2,9
   10314:  03068713  addi a4,a3,48
   10318:  00f65463  bge  a2,a5,10320 <_start+0x1e4>
   1031c:  03768713  addi a4,a3,55
   10320:  00e104a3  sb   a4,9(sp)
   10324:  41835793  srai a5,t1,0x18
   10328:  1a078863  beqz a5,104d8 <_start+0x39c>
   1032c:  00f7f793  andi a5,a5,15
   10330:  0ff7f693  andi a3,a5,255
   10334:  00900613  li   a2,9
   10338:  03068713  addi a4,a3,48
   1033c:  00f65463  bge  a2,a5,10344 <_start+0x208>
   10340:  03768713  addi a4,a3,55
   10344:  00e10523  sb   a4,10(sp)
   10348:  41c35313  srai t1,t1,0x1c
   1034c:  18030a63  beqz t1,104e0 <_start+0x3a4>
   10350:  03030313  addi t1,t1,48
   10354:  006105a3  sb   t1,11(sp)
   10358:  00700713  li   a4,7
   1035c:  400027b7  lui  a5,0x40002
   10360:  03000693  li   a3,48
   10364:  00d78023  sb   a3,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   10368:  07800693  li   a3,120
   1036c:  00d78023  sb   a3,0(a5)
   10370:  06010693  addi a3,sp,96
   10374:  00e68633  add  a2,a3,a4
   10378:  fa464503  lbu  a0,-92(a2)
   1037c:  ffe70693  addi a3,a4,-2
   10380:  06010593  addi a1,sp,96
   10384:  00a78023  sb   a0,0(a5)
   10388:  fa364503  lbu  a0,-93(a2)
   1038c:  00d585b3  add  a1,a1,a3
   10390:  ffd70613  addi a2,a4,-3
   10394:  00a78023  sb   a0,0(a5)
   10398:  fa45c583  lbu  a1,-92(a1)
   1039c:  00b78023  sb   a1,0(a5)
   103a0:  06068463  beqz a3,10408 <_start+0x2cc>
   103a4:  06010693  addi a3,sp,96
   103a8:  00c686b3  add  a3,a3,a2
   103ac:  fa46c583  lbu  a1,-92(a3)
   103b0:  ffc70693  addi a3,a4,-4
   103b4:  00b78023  sb   a1,0(a5)
   103b8:  04060863  beqz a2,10408 <_start+0x2cc>
   103bc:  06010613  addi a2,sp,96
   103c0:  00d60633  add  a2,a2,a3
   103c4:  fa464583  lbu  a1,-92(a2)
   103c8:  ffb70613  addi a2,a4,-5
   103cc:  00b78023  sb   a1,0(a5)
   103d0:  02068c63  beqz a3,10408 <_start+0x2cc>
   103d4:  06010693  addi a3,sp,96
   103d8:  00c686b3  add  a3,a3,a2
   103dc:  fa46c683  lbu  a3,-92(a3)
   103e0:  ffa70713  addi a4,a4,-6
   103e4:  00d78023  sb   a3,0(a5)
   103e8:  02060063  beqz a2,10408 <_start+0x2cc>
   103ec:  06010693  addi a3,sp,96
   103f0:  00e686b3  add  a3,a3,a4
   103f4:  fa46c683  lbu  a3,-92(a3)
   103f8:  00d78023  sb   a3,0(a5)
   103fc:  00070663  beqz a4,10408 <_start+0x2cc>
   10400:  00414703  lbu  a4,4(sp)
   10404:  00e78023  sb   a4,0(a5)
   10408:  00000513  li   a0,0
   1040c:  06010113  addi sp,sp,96
   10410:  00008067  ret
   10414:  00b30333  add  t1,t1,a1
   10418:  004e0e13  addi t3,t3,4
   1041c:  dff510e3  bne  a0,t6,101fc <_start+0xc0>
   10420:  e40350e3  bgez t1,10260 <_start+0x124>
   10424:  400027b7  lui  a5,0x40002
   10428:  02d00713  li   a4,45
   1042c:  40600333  neg  t1,t1
   10430:  00e78023  sb   a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   10434:  00f37793  andi a5,t1,15
   10438:  0ff7f693  andi a3,a5,255
   1043c:  00900613  li   a2,9
   10440:  03068713  addi a4,a3,48
   10444:  e2f65ae3  bge  a2,a5,10278 <_start+0x13c>
   10448:  03768713  addi a4,a3,55
   1044c:  e2dff06f  j    10278 <_start+0x13c>
   10450:  03768713  addi a4,a3,55
   10454:  e45ff06f  j    10298 <_start+0x15c>
   10458:  03768713  addi a4,a3,55
   1045c:  e5dff06f  j    102b8 <_start+0x17c>
   10460:  03768713  addi a4,a3,55
   10464:  e75ff06f  j    102d8 <_start+0x19c>
   10468:  00400713  li   a4,4
   1046c:  ef1ff06f  j    1035c <_start+0x220>
   10470:  400027b7  lui  a5,0x40002
   10474:  03000713  li   a4,48
   10478:  00e78023  sb   a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   1047c:  07800713  li   a4,120
   10480:  00e78023  sb   a4,0(a5)
   10484:  00414703  lbu  a4,4(sp)
   10488:  00e78023  sb   a4,0(a5)
   1048c:  f7dff06f  j    10408 <_start+0x2cc>
   10490:  400027b7  lui  a5,0x40002
   10494:  03000713  li   a4,48
   10498:  00e78023  sb   a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   1049c:  07800713  li   a4,120
   104a0:  00e78023  sb   a4,0(a5)
   104a4:  00514703  lbu  a4,5(sp)
   104a8:  00e78023  sb   a4,0(a5)
   104ac:  f55ff06f  j    10400 <_start+0x2c4>
   104b0:  00200713  li   a4,2
   104b4:  ea9ff06f  j    1035c <_start+0x220>
   104b8:  00300713  li   a4,3
   104bc:  ea1ff06f  j    1035c <_start+0x220>
   104c0:  400027b7  lui  a5,0x40002
   104c4:  03000713  li   a4,48
   104c8:  00e78023  sb   a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   104cc:  07800713  li   a4,120
   104d0:  00e78023  sb   a4,0(a5)
   104d4:  f35ff06f  j    10408 <_start+0x2cc>
   104d8:  00500713  li   a4,5
   104dc:  e81ff06f  j    1035c <_start+0x220>
   104e0:  00600713  li   a4,6
   104e4:  e79ff06f  j    1035c <_start+0x220>
```

* Conv_Sum with -Os

```bash
Conv_Sums:     file format elf32-littleriscv


Disassembly of section .text:

00010054 <memcpy>:
   10054:  00000793  li   a5,0
   10058:  00f61663  bne  a2,a5,10064 <memcpy+0x10>
   1005c:  00c50533  add  a0,a0,a2
   10060:  00008067  ret
   10064:  00f58733  add  a4,a1,a5
   10068:  00074683  lbu  a3,0(a4)
   1006c:  00f50733  add  a4,a0,a5
   10070:  00178793  addi a5,a5,1
   10074:  00d70023  sb   a3,0(a4)
   10078:  fe1ff06f  j    10058 <memcpy+0x4>

0001007c <mul>:
   1007c:  41f55713  srai a4,a0,0x1f
   10080:  00a747b3  xor  a5,a4,a0
   10084:  40e787b3  sub  a5,a5,a4
   10088:  41f5d713  srai a4,a1,0x1f
   1008c:  01f5d613  srli a2,a1,0x1f
   10090:  00b745b3  xor  a1,a4,a1
   10094:  01f55813  srli a6,a0,0x1f
   10098:  40e585b3  sub  a1,a1,a4
   1009c:  00000513  li   a0,0
   100a0:  00000713  li   a4,0
   100a4:  01f00893  li   a7,31
   100a8:  40e5d6b3  sra  a3,a1,a4
   100ac:  0016f693  andi a3,a3,1
   100b0:  00068663  beqz a3,100bc <mul+0x40>
   100b4:  00e796b3  sll  a3,a5,a4
   100b8:  00d50533  add  a0,a0,a3
   100bc:  00170713  addi a4,a4,1
   100c0:  ff1714e3  bne  a4,a7,100a8 <mul+0x2c>
   100c4:  00c80463  beq  a6,a2,100cc <mul+0x50>
   100c8:  40a00533  neg  a0,a0
   100cc:  00008067  ret

000100d0 <Conv_Sum>:
   100d0:  fe010113  addi sp,sp,-32
   100d4:  00812c23  sw   s0,24(sp)
   100d8:  00912a23  sw   s1,20(sp)
   100dc:  01212823  sw   s2,16(sp)
   100e0:  01312623  sw   s3,12(sp)
   100e4:  01412423  sw   s4,8(sp)
   100e8:  00112e23  sw   ra,28(sp)
   100ec:  00050993  mv   s3,a0
   100f0:  00058a13  mv   s4,a1
   100f4:  00261913  slli s2,a2,0x2
   100f8:  00000413  li   s0,0
   100fc:  00000493  li   s1,0
   10100:  03241463  bne  s0,s2,10128 <Conv_Sum+0x58>
   10104:  01c12083  lw   ra,28(sp)
   10108:  01812403  lw   s0,24(sp)
   1010c:  00048513  mv   a0,s1
   10110:  01012903  lw   s2,16(sp)
   10114:  01412483  lw   s1,20(sp)
   10118:  00c12983  lw   s3,12(sp)
   1011c:  00812a03  lw   s4,8(sp)
   10120:  02010113  addi sp,sp,32
   10124:  00008067  ret
   10128:  008a0733  add  a4,s4,s0
   1012c:  008987b3  add  a5,s3,s0
   10130:  00072583  lw   a1,0(a4)
   10134:  0007a503  lw   a0,0(a5)
   10138:  00440413  addi s0,s0,4
   1013c:  f41ff0ef  jal  ra,1007c <mul>
   10140:  00a484b3  add  s1,s1,a0
   10144:  fbdff06f  j    10100 <Conv_Sum+0x30>

00010148 <_start>:
   10148:  f9010113  addi sp,sp,-112
   1014c:  000105b7  lui  a1,0x10
   10150:  06812423  sw   s0,104(sp)
   10154:  02800613  li   a2,40
   10158:  22458413  addi s0,a1,548 # 10224 <_start+0xdc>
   1015c:  01010513  addi a0,sp,16
   10160:  22458593  addi a1,a1,548
   10164:  06112623  sw   ra,108(sp)
   10168:  eedff0ef  jal  ra,10054 <memcpy>
   1016c:  02800613  li   a2,40
   10170:  02840593  addi a1,s0,40
   10174:  03810513  addi a0,sp,56
   10178:  eddff0ef  jal  ra,10054 <memcpy>
   1017c:  00a00613  li   a2,10
   10180:  03810593  addi a1,sp,56
   10184:  01010513  addi a0,sp,16
   10188:  f49ff0ef  jal  ra,100d0 <Conv_Sum>
   1018c:  00055a63  bgez a0,101a0 <_start+0x58>
   10190:  400027b7  lui  a5,0x40002
   10194:  02d00713  li   a4,45
   10198:  00e78023  sb   a4,0(a5) # 40002000 <__global_pointer$+0x3fff058c>
   1019c:  40a00533  neg  a0,a0
   101a0:  00000793  li   a5,0
   101a4:  00900593  li   a1,9
   101a8:  02051e63  bnez a0,101e4 <_start+0x9c>
   101ac:  40002737  lui  a4,0x40002
   101b0:  03000693  li   a3,48
   101b4:  00d70023  sb   a3,0(a4) # 40002000 <__global_pointer$+0x3fff058c>
   101b8:  07800693  li   a3,120
   101bc:  00d70023  sb   a3,0(a4)
   101c0:  fff00713  li   a4,-1
   101c4:  400026b7  lui  a3,0x40002
   101c8:  fff78793  addi a5,a5,-1
   101cc:  04e79263  bne  a5,a4,10210 <_start+0xc8>
   101d0:  06c12083  lw   ra,108(sp)
   101d4:  06812403  lw   s0,104(sp)
   101d8:  00000513  li   a0,0
   101dc:  07010113  addi sp,sp,112
   101e0:  00008067  ret
   101e4:  00f57613  andi a2,a0,15
   101e8:  0ff67693  andi a3,a2,255
   101ec:  03068713  addi a4,a3,48 # 40002030 <__global_pointer$+0x3fff05bc>
   101f0:  00c5d463  bge  a1,a2,101f8 <_start+0xb0>
   101f4:  03768713  addi a4,a3,55
   101f8:  00410693  addi a3,sp,4
   101fc:  00f686b3  add  a3,a3,a5
   10200:  00e68023  sb   a4,0(a3)
   10204:  40455513  srai a0,a0,0x4
   10208:  00178793  addi a5,a5,1
   1020c:  f9dff06f  j    101a8 <_start+0x60>
   10210:  00410613  addi a2,sp,4
   10214:  00f60633  add  a2,a2,a5
   10218:  00064603  lbu  a2,0(a2)
   1021c:  00c68023  sb   a2,0(a3)
   10220:  fa9ff06f  j    101c8 <_start+0x80>
```

## Interesting Findings and Further Tests

* Comparison with inline subroutines

Since `-O3` already inlines functions aggressively, I wanted to check whether explicitly adding the `inline` keyword changes the generated code in this case. Adding `inline` to `Conv_Sum`, to `mul`, and to both gives three cases; their execution results, together with the plain `-O3` build, are below.

* -O3 flag

```bash
0xDFE
>>> Execution time: 3924561 ns
>>> Instruction count: 2147 (IPS=547067)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```

* inline Conv_Sum

```bash
0xDFE
>>> Execution time: 131317 ns
>>> Instruction count: 2147 (IPS=16349749)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```

* inline mul

```bash
0xDFE
>>> Execution time: 69190 ns
>>> Instruction count: 2147 (IPS=31030495)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```

* inline both functions

```bash
0xDFE
>>> Execution time: 86672 ns
>>> Instruction count: 2147 (IPS=24771552)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```

Comparing the disassembly of these four builds, I found the generated code identical except for the target addresses of jumps and branches. Consistent with this, all four report the same instruction count and the same jump and branch statistics, so the large differences in reported execution time (and therefore IPS) appear to come from host-side timing variation in the emulator rather than from the generated code. Thus, under `-O3`, GCC seems to inline these functions by default.
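A further test that follows from this finding: to see what `-O3` produces when inlining is *not* allowed, `mul` can be marked with GCC's `noinline` function attribute. The sketch below is a hypothetical variant of `Conv_Sum.c`, not one of the builds measured above; it keeps the original shift-and-add logic, shows only `mul` and `Conv_Sum`, and assumes `_start` stays unchanged. Because it is plain C, it can also be compiled on a host to confirm that `Conv_Sum` returns 0xDFE (3582) for the `signal` and `weight` arrays.

```c
/* Hypothetical variant of Conv_Sum.c (not one of the measured builds).
   __attribute__((noinline)) is a GCC function attribute that stops the
   compiler from inlining mul(), so even at -O3 it should remain a call. */
__attribute__((noinline)) int mul(int a, int b)
{
    int result = 0;
    int minus_flag = (a < 0) != (b < 0);   /* 1 if the product is negative */

    a = (a < 0) ? -a : a;                  /* work with magnitudes */
    b = (b < 0) ? -b : b;

    /* Shift-and-add: add a << i for every set bit i of b (bits 0..30). */
    for (int i = 0; i < 31; i++) {
        if (b & (1 << i))
            result += a << i;
    }
    return minus_flag ? -result : result;
}

int Conv_Sum(const int A[], const int B[], unsigned int len)
{
    int result = 0;
    for (unsigned int i = 0; i < len; i++)   /* unsigned index matches len */
        result += mul(A[i], B[i]);
    return result;
}
```

Compiled with the same `-O3` command as `Conv_Sum3`, the disassembly should then show `jal` calls to `mul` instead of the inlined loop; the instruction count of such a build would have to be measured separately.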

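The printing code in `_start` can only run inside the emulator because it writes each character to the memory-mapped transmit address `0x40002000`. As an additional host-side test, the sketch below reproduces the same hex formatting but writes into a buffer; `format_hex` is a hypothetical helper, not part of the original program. It deliberately differs in two places: it negates through `unsigned int` so that `INT_MIN` does not overflow, and it uses a `do`/`while` loop so that a result of 0 prints as `0x0` instead of just `0x`.

```c
#include <stddef.h>

/* Hypothetical host-side helper: formats value the way _start in Conv_Sum.c
   prints it ("-" if negative, then "0x", then upper-case hex digits), but into
   out instead of the emulator's tx register. out needs at least 12 bytes:
   '-' + "0x" + 8 digits + '\0'. Returns the length of the string. */
size_t format_hex(int value, char *out)
{
    char digits[8];          /* a 32-bit value has at most 8 hex digits */
    int count = 0;
    size_t pos = 0;
    unsigned int mag;

    if (value < 0) {
        out[pos++] = '-';
        mag = 0u - (unsigned int)value;   /* magnitude; well-defined for INT_MIN */
    } else {
        mag = (unsigned int)value;
    }

    /* Collect digits least significant first, like the original loop.
       do/while (instead of for) emits one digit when the value is 0. */
    do {
        unsigned int nibble = mag & 0xfu;
        digits[count++] = (char)(nibble > 9 ? nibble - 10 + 'A' : nibble + '0');
        mag >>= 4;
    } while (mag != 0);

    out[pos++] = '0';
    out[pos++] = 'x';
    while (count--)                       /* emit most significant digit first */
        out[pos++] = digits[count];
    out[pos] = '\0';
    return pos;
}
```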