---
title: "Lab2: RISC-V RV32I[MA] emulator with ELF support"
---
Lab2: RISC-V RV32I[MA] emulator with ELF support for Convolution Sum
===
###### tags: `RISC-V` `Computer Architecture`
[TOC]
**[Original Document](https://hackmd.io/hXZ0XVxTTLexGeQXfcj6aA)**
## Rewritten Code in C
The original program is rewritten below as `Conv_Sum.c`.
I made one change to the original `*` operator. The target is the base RV32I instruction set, which has no `mul` instruction, so multiplication is done in software. The shift-and-add code is credited to [dck9661](https://hackmd.io/@oR8-QX4TQzGKDJ72DmqDUg/SJrNq1QuB); I adapted it to handle signed operands as `int mul(int, int)` below.
```c=
// Software multiply for RV32I: shift-and-add on the magnitudes, sign fixed at the end
int mul(int a, int b){
    int result = 0, minus_flag = ((a < 0) != (b < 0)) ? 1 : 0;
    a = (a < 0) ? -a : a;
    b = (b < 0) ? -b : b;
    for(int i = 0; i < 31; i++) {
        if(b & (0x1 << i))
            result = result + (a << i);
    }
    if(minus_flag)
        return -result;
    return result;
}

int Conv_Sum(int A[], int B[], unsigned int len) {
    int result = 0;
    for(int i = 0; i < len; i ++) {
        result += mul(A[i], B[i]);
    }
    return result;
}

int _start(){
    int signal[10] = {39, 174, 243, 51, 73, 184, 33, 137, 82, 114};
    int weight[10] = {6, 10, 0, -2, -8, 8, 8, 10, 4, -10};
    int result = Conv_Sum(signal, weight, 10);

    // Printing: characters are written one at a time to address 0x40002000
    int charIndex = 0;
    char charStack[12];
    volatile char* tx = (volatile char*) 0x40002000;
    if(result < 0){
        *tx = '-';
        result = -result;
    }
    // Push hex digits, least significant nibble first
    for(; result; result >>= 4, charIndex ++){
        if((result & 0xf) > 9)
            charStack[charIndex] = (result & 0xf) + 'A' - 10;
        else
            charStack[charIndex] = (result & 0xf) + '0';
    }
    *tx = '0';
    *tx = 'x';
    // Pop digits so the most significant one is written first
    while(charIndex--){
        *tx = charStack[charIndex];
    }
    return 0;
}
```
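As a quick sanity check of the shift-and-add routine, the sketch below can be built with a host compiler instead of the RISC-V toolchain. The names `mul_shift_add` and `check_against_operator` are hypothetical: the first mirrors `mul` above, the second compares it with the native `*` operator over a small range. It assumes neither operand is `INT_MIN` and that the product fits in `int`, which holds for the `signal` and `weight` values used here.

```c
#include <assert.h>

/* Hypothetical host-side copy of mul(): shift-and-add on the magnitudes,
 * with the sign applied at the end.
 * Assumes neither operand is INT_MIN and the product fits in int. */
int mul_shift_add(int a, int b) {
    int negative = (a < 0) != (b < 0);
    int result = 0;
    a = (a < 0) ? -a : a;
    b = (b < 0) ? -b : b;
    for (int i = 0; i < 31; i++) {
        if (b & (0x1 << i))          /* bit i of |b| is set */
            result += a << i;        /* add |a| * 2^i */
    }
    return negative ? -result : result;
}

/* Asserts that mul_shift_add agrees with * for every pair in [-limit, limit]. */
void check_against_operator(int limit) {
    for (int a = -limit; a <= limit; a++)
        for (int b = -limit; b <= limit; b++)
            assert(mul_shift_add(a, b) == a * b);
}
```

Calling `check_against_operator(300)` from a host-side `main` covers every `signal`/`weight` pair in this lab, since all magnitudes are below 300.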
## Execution with rv32emu
1. Compiling `Conv_Sum.c` with `riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib ./Conv_Sum.c -o Conv_Sum3` (optimizing for speed) and running it with `emu-rv32i Conv_Sum3` gives the result below.
```bash
0xDFE
>>> Execution time: 3924561 ns
>>> Instruction count: 2147 (IPS=547067)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
2. Compiling with `riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -Os -nostdlib ./Conv_Sum.c -o Conv_Sums` (optimizing for code size) runs into a link problem: the compiler emits calls to `memcpy` to initialize the local arrays, and with `-nostdlib` nothing provides that symbol. To resolve this, I added the small `memcpy` implementation below.
```c=
#include <stddef.h>
void* memcpy(void* str1, const void* str2, size_t n){
    char* d = str1;
    const char* s = str2;
    // copy n bytes, one at a time
    while(n --)
        *d ++ = *s ++;
    return d; // d now points past the copied bytes; standard memcpy returns str1
}
```
After successfully compiling `Conv_Sum.c` with the `-Os` flag, I ran it with `emu-rv32i Conv_Sums` and got the result below.
```bash
0xDFE
>>> Execution time: 5061702 ns
>>> Instruction count: 2482 (IPS=490348)
>>> Jumps: 824 (33.20%) - 414 forwards, 410 backwards
>>> Branching T=701 (95.37%) F=34 (4.63%)
```
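One detail of the `memcpy` replacement is worth noting: the standard `memcpy` returns the original destination pointer, whereas the version above returns the pointer after it has been advanced. The calls in the `-Os` disassembly do not use the return value, so the printed result is unaffected. A minimal sketch of a variant that keeps the standard return value follows. It is named `memcpy_sketch` (a hypothetical name) so it can be tried on a host without clashing with the library function; in the `-nostdlib` build it would have to be called `memcpy` for the linker to pick it up.

```c
#include <stddef.h>

/* Hypothetical variant: byte-wise copy that returns the original
 * destination pointer, as ISO C memcpy does.
 * Assumes the two regions do not overlap. */
void *memcpy_sketch(void *dest, const void *src, size_t n) {
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}
```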
## Checking Code Size
Using `riscv-none-embed-objdump -h Conv_Sum3 > Conv_Sum3h` and `riscv-none-embed-objdump -h Conv_Sums > Conv_Sumsh` to list the sections of each ELF file gives the results below. The `-Os` build is significantly smaller: `.text` shrinks from 0x494 to 0x1d0 bytes, even though it now includes the extra `memcpy` function.
* Conv_Sum with -O3
```bash
Conv_Sum3: file format elf32-littleriscv
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00000494 00010054 00010054 00000054 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 00000050 000104e8 000104e8 000004e8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .comment 00000033 00000000 00000000 00000538 2**0
CONTENTS, READONLY
```
* Conv_Sum with -Os
```bash
Conv_Sums: file format elf32-littleriscv
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 000001d0 00010054 00010054 00000054 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
1 .rodata 00000050 00010224 00010224 00000224 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .comment 00000033 00000000 00000000 00000274 2**0
CONTENTS, READONLY
```
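To put a number on the difference, the `.text` sizes reported above are 0x494 bytes for `-O3` and 0x1d0 bytes for `-Os`. The small helper below (the name `percent_smaller` is hypothetical) does the arithmetic in integers; it is only a worked calculation, not part of the lab program.

```c
/* Hypothetical helper: percentage by which `after` is smaller than
 * `before`, rounded down. Assumes 0 < before and after <= before. */
unsigned int percent_smaller(unsigned int before, unsigned int after) {
    return (before - after) * 100u / before;
}
```

`percent_smaller(0x494, 0x1d0)` evaluates to 60: the `.text` section shrinks from 1172 to 464 bytes, while `.rodata` stays at 0x50 bytes in both builds.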
## Checking Disassembled Program
Using `riscv-none-embed-objdump -d Conv_Sum3 > Conv_Sum3.as` and `riscv-none-embed-objdump -d Conv_Sums > Conv_Sums.as` to disassemble the two ELF files gives the results below.
* Conv_Sum with -O3
```bash
Conv_Sum3: file format elf32-littleriscv
Disassembly of section .text:
00010054 <mul>:
10054: 41f5d793 srai a5,a1,0x1f
10058: 41f55713 srai a4,a0,0x1f
1005c: 00b7c633 xor a2,a5,a1
10060: 00a74833 xor a6,a4,a0
10064: 01f55313 srli t1,a0,0x1f
10068: 01f5d893 srli a7,a1,0x1f
1006c: 40f60633 sub a2,a2,a5
10070: 40e80833 sub a6,a6,a4
10074: 00000793 li a5,0
10078: 00000513 li a0,0
1007c: 01f00593 li a1,31
10080: 40f65733 sra a4,a2,a5
10084: 00177713 andi a4,a4,1
10088: 00f816b3 sll a3,a6,a5
1008c: 00178793 addi a5,a5,1
10090: 00070463 beqz a4,10098 <mul+0x44>
10094: 00d50533 add a0,a0,a3
10098: feb794e3 bne a5,a1,10080 <mul+0x2c>
1009c: 01130463 beq t1,a7,100a4 <mul+0x50>
100a0: 40a00533 neg a0,a0
100a4: 00008067 ret
000100a8 <Conv_Sum>:
100a8: 08060663 beqz a2,10134 <Conv_Sum+0x8c>
100ac: 00261613 slli a2,a2,0x2
100b0: 00050f13 mv t5,a0
100b4: 00c502b3 add t0,a0,a2
100b8: 01f00313 li t1,31
100bc: 00000513 li a0,0
100c0: 0005a783 lw a5,0(a1)
100c4: 000f2703 lw a4,0(t5)
100c8: 00000893 li a7,0
100cc: 41f7d693 srai a3,a5,0x1f
100d0: 41f75f93 srai t6,a4,0x1f
100d4: 00f6c633 xor a2,a3,a5
100d8: 00efc833 xor a6,t6,a4
100dc: 01f7de13 srli t3,a5,0x1f
100e0: 01f75e93 srli t4,a4,0x1f
100e4: 41f80833 sub a6,a6,t6
100e8: 40d60633 sub a2,a2,a3
100ec: 00000793 li a5,0
100f0: 40f65733 sra a4,a2,a5
100f4: 00177713 andi a4,a4,1
100f8: 00f816b3 sll a3,a6,a5
100fc: 00178793 addi a5,a5,1
10100: 00070463 beqz a4,10108 <Conv_Sum+0x60>
10104: 00d888b3 add a7,a7,a3
10108: fe6794e3 bne a5,t1,100f0 <Conv_Sum+0x48>
1010c: 004f0f13 addi t5,t5,4
10110: 01ce8a63 beq t4,t3,10124 <Conv_Sum+0x7c>
10114: 41150533 sub a0,a0,a7
10118: 00458593 addi a1,a1,4
1011c: fa5f12e3 bne t5,t0,100c0 <Conv_Sum+0x18>
10120: 00008067 ret
10124: 01150533 add a0,a0,a7
10128: 00458593 addi a1,a1,4
1012c: f9e29ae3 bne t0,t5,100c0 <Conv_Sum+0x18>
10130: 00008067 ret
10134: 00000513 li a0,0
10138: 00008067 ret
0001013c <_start>:
1013c: 000107b7 lui a5,0x10
10140: 4e878793 addi a5,a5,1256 # 104e8 <_start+0x3ac>
10144: 0007a703 lw a4,0(a5)
10148: fa010113 addi sp,sp,-96
1014c: 01c7af83 lw t6,28(a5)
10150: 00e12823 sw a4,16(sp)
10154: 0047a703 lw a4,4(a5)
10158: 0207af03 lw t5,32(a5)
1015c: 0287ae03 lw t3,40(a5)
10160: 00e12a23 sw a4,20(sp)
10164: 0087a703 lw a4,8(a5)
10168: 02c7a503 lw a0,44(a5)
1016c: 0147a383 lw t2,20(a5)
10170: 00e12c23 sw a4,24(sp)
10174: 00c7a703 lw a4,12(a5)
10178: 0187a283 lw t0,24(a5)
1017c: 0247ae83 lw t4,36(a5)
10180: 00e12e23 sw a4,28(sp)
10184: 0107a703 lw a4,16(a5)
10188: 0347a303 lw t1,52(a5)
1018c: 0387a883 lw a7,56(a5)
10190: 03c7a803 lw a6,60(a5)
10194: 0407a583 lw a1,64(a5)
10198: 0447a603 lw a2,68(a5)
1019c: 0487a683 lw a3,72(a5)
101a0: 02e12023 sw a4,32(sp)
101a4: 04c7a703 lw a4,76(a5)
101a8: 0307a783 lw a5,48(a5)
101ac: 03f12623 sw t6,44(sp)
101b0: 03e12823 sw t5,48(sp)
101b4: 03c12c23 sw t3,56(sp)
101b8: 02a12e23 sw a0,60(sp)
101bc: 02712223 sw t2,36(sp)
101c0: 02512423 sw t0,40(sp)
101c4: 03d12a23 sw t4,52(sp)
101c8: 03810e13 addi t3,sp,56
101cc: 04f12023 sw a5,64(sp)
101d0: 01010513 addi a0,sp,16
101d4: 04612223 sw t1,68(sp)
101d8: 05112423 sw a7,72(sp)
101dc: 05012623 sw a6,76(sp)
101e0: 04b12823 sw a1,80(sp)
101e4: 04c12a23 sw a2,84(sp)
101e8: 04d12c23 sw a3,88(sp)
101ec: 04e12e23 sw a4,92(sp)
101f0: 000e0f93 mv t6,t3
101f4: 00000313 li t1,0
101f8: 01f00f13 li t5,31
101fc: 00052703 lw a4,0(a0)
10200: 000e2783 lw a5,0(t3)
10204: 00000593 li a1,0
10208: 41f75e93 srai t4,a4,0x1f
1020c: 41f7d613 srai a2,a5,0x1f
10210: 00eec8b3 xor a7,t4,a4
10214: 00f64833 xor a6,a2,a5
10218: 01f75693 srli a3,a4,0x1f
1021c: 41d888b3 sub a7,a7,t4
10220: 01f7d713 srli a4,a5,0x1f
10224: 40c80833 sub a6,a6,a2
10228: 00000793 li a5,0
1022c: 40f85633 sra a2,a6,a5
10230: 00167613 andi a2,a2,1
10234: 00f89eb3 sll t4,a7,a5
10238: 00178793 addi a5,a5,1
1023c: 00060463 beqz a2,10244 <_start+0x108>
10240: 01d585b3 add a1,a1,t4
10244: ffe794e3 bne a5,t5,1022c <_start+0xf0>
10248: 00450513 addi a0,a0,4
1024c: 1ce68463 beq a3,a4,10414 <_start+0x2d8>
10250: 40b30333 sub t1,t1,a1
10254: 004e0e13 addi t3,t3,4
10258: fbf512e3 bne a0,t6,101fc <_start+0xc0>
1025c: 1c034463 bltz t1,10424 <_start+0x2e8>
10260: 26030063 beqz t1,104c0 <_start+0x384>
10264: 00f37793 andi a5,t1,15
10268: 0ff7f693 andi a3,a5,255
1026c: 00900613 li a2,9
10270: 03068713 addi a4,a3,48
10274: 1cf64a63 blt a2,a5,10448 <_start+0x30c>
10278: 00e10223 sb a4,4(sp)
1027c: 40435793 srai a5,t1,0x4
10280: 1e078863 beqz a5,10470 <_start+0x334>
10284: 00f7f793 andi a5,a5,15
10288: 0ff7f693 andi a3,a5,255
1028c: 00900613 li a2,9
10290: 03068713 addi a4,a3,48
10294: 1af64e63 blt a2,a5,10450 <_start+0x314>
10298: 00e102a3 sb a4,5(sp)
1029c: 40835793 srai a5,t1,0x8
102a0: 1e078863 beqz a5,10490 <_start+0x354>
102a4: 00f7f793 andi a5,a5,15
102a8: 0ff7f693 andi a3,a5,255
102ac: 00900613 li a2,9
102b0: 03068713 addi a4,a3,48
102b4: 1af64263 blt a2,a5,10458 <_start+0x31c>
102b8: 00e10323 sb a4,6(sp)
102bc: 40c35793 srai a5,t1,0xc
102c0: 1e078863 beqz a5,104b0 <_start+0x374>
102c4: 00f7f793 andi a5,a5,15
102c8: 0ff7f693 andi a3,a5,255
102cc: 00900613 li a2,9
102d0: 03068713 addi a4,a3,48
102d4: 18f64663 blt a2,a5,10460 <_start+0x324>
102d8: 00e103a3 sb a4,7(sp)
102dc: 41035793 srai a5,t1,0x10
102e0: 1c078c63 beqz a5,104b8 <_start+0x37c>
102e4: 00f7f793 andi a5,a5,15
102e8: 0ff7f693 andi a3,a5,255
102ec: 00900613 li a2,9
102f0: 03068713 addi a4,a3,48
102f4: 00f65463 bge a2,a5,102fc <_start+0x1c0>
102f8: 03768713 addi a4,a3,55
102fc: 00e10423 sb a4,8(sp)
10300: 41435793 srai a5,t1,0x14
10304: 16078263 beqz a5,10468 <_start+0x32c>
10308: 00f7f793 andi a5,a5,15
1030c: 0ff7f693 andi a3,a5,255
10310: 00900613 li a2,9
10314: 03068713 addi a4,a3,48
10318: 00f65463 bge a2,a5,10320 <_start+0x1e4>
1031c: 03768713 addi a4,a3,55
10320: 00e104a3 sb a4,9(sp)
10324: 41835793 srai a5,t1,0x18
10328: 1a078863 beqz a5,104d8 <_start+0x39c>
1032c: 00f7f793 andi a5,a5,15
10330: 0ff7f693 andi a3,a5,255
10334: 00900613 li a2,9
10338: 03068713 addi a4,a3,48
1033c: 00f65463 bge a2,a5,10344 <_start+0x208>
10340: 03768713 addi a4,a3,55
10344: 00e10523 sb a4,10(sp)
10348: 41c35313 srai t1,t1,0x1c
1034c: 18030a63 beqz t1,104e0 <_start+0x3a4>
10350: 03030313 addi t1,t1,48
10354: 006105a3 sb t1,11(sp)
10358: 00700713 li a4,7
1035c: 400027b7 lui a5,0x40002
10360: 03000693 li a3,48
10364: 00d78023 sb a3,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
10368: 07800693 li a3,120
1036c: 00d78023 sb a3,0(a5)
10370: 06010693 addi a3,sp,96
10374: 00e68633 add a2,a3,a4
10378: fa464503 lbu a0,-92(a2)
1037c: ffe70693 addi a3,a4,-2
10380: 06010593 addi a1,sp,96
10384: 00a78023 sb a0,0(a5)
10388: fa364503 lbu a0,-93(a2)
1038c: 00d585b3 add a1,a1,a3
10390: ffd70613 addi a2,a4,-3
10394: 00a78023 sb a0,0(a5)
10398: fa45c583 lbu a1,-92(a1)
1039c: 00b78023 sb a1,0(a5)
103a0: 06068463 beqz a3,10408 <_start+0x2cc>
103a4: 06010693 addi a3,sp,96
103a8: 00c686b3 add a3,a3,a2
103ac: fa46c583 lbu a1,-92(a3)
103b0: ffc70693 addi a3,a4,-4
103b4: 00b78023 sb a1,0(a5)
103b8: 04060863 beqz a2,10408 <_start+0x2cc>
103bc: 06010613 addi a2,sp,96
103c0: 00d60633 add a2,a2,a3
103c4: fa464583 lbu a1,-92(a2)
103c8: ffb70613 addi a2,a4,-5
103cc: 00b78023 sb a1,0(a5)
103d0: 02068c63 beqz a3,10408 <_start+0x2cc>
103d4: 06010693 addi a3,sp,96
103d8: 00c686b3 add a3,a3,a2
103dc: fa46c683 lbu a3,-92(a3)
103e0: ffa70713 addi a4,a4,-6
103e4: 00d78023 sb a3,0(a5)
103e8: 02060063 beqz a2,10408 <_start+0x2cc>
103ec: 06010693 addi a3,sp,96
103f0: 00e686b3 add a3,a3,a4
103f4: fa46c683 lbu a3,-92(a3)
103f8: 00d78023 sb a3,0(a5)
103fc: 00070663 beqz a4,10408 <_start+0x2cc>
10400: 00414703 lbu a4,4(sp)
10404: 00e78023 sb a4,0(a5)
10408: 00000513 li a0,0
1040c: 06010113 addi sp,sp,96
10410: 00008067 ret
10414: 00b30333 add t1,t1,a1
10418: 004e0e13 addi t3,t3,4
1041c: dff510e3 bne a0,t6,101fc <_start+0xc0>
10420: e40350e3 bgez t1,10260 <_start+0x124>
10424: 400027b7 lui a5,0x40002
10428: 02d00713 li a4,45
1042c: 40600333 neg t1,t1
10430: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
10434: 00f37793 andi a5,t1,15
10438: 0ff7f693 andi a3,a5,255
1043c: 00900613 li a2,9
10440: 03068713 addi a4,a3,48
10444: e2f65ae3 bge a2,a5,10278 <_start+0x13c>
10448: 03768713 addi a4,a3,55
1044c: e2dff06f j 10278 <_start+0x13c>
10450: 03768713 addi a4,a3,55
10454: e45ff06f j 10298 <_start+0x15c>
10458: 03768713 addi a4,a3,55
1045c: e5dff06f j 102b8 <_start+0x17c>
10460: 03768713 addi a4,a3,55
10464: e75ff06f j 102d8 <_start+0x19c>
10468: 00400713 li a4,4
1046c: ef1ff06f j 1035c <_start+0x220>
10470: 400027b7 lui a5,0x40002
10474: 03000713 li a4,48
10478: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
1047c: 07800713 li a4,120
10480: 00e78023 sb a4,0(a5)
10484: 00414703 lbu a4,4(sp)
10488: 00e78023 sb a4,0(a5)
1048c: f7dff06f j 10408 <_start+0x2cc>
10490: 400027b7 lui a5,0x40002
10494: 03000713 li a4,48
10498: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
1049c: 07800713 li a4,120
104a0: 00e78023 sb a4,0(a5)
104a4: 00514703 lbu a4,5(sp)
104a8: 00e78023 sb a4,0(a5)
104ac: f55ff06f j 10400 <_start+0x2c4>
104b0: 00200713 li a4,2
104b4: ea9ff06f j 1035c <_start+0x220>
104b8: 00300713 li a4,3
104bc: ea1ff06f j 1035c <_start+0x220>
104c0: 400027b7 lui a5,0x40002
104c4: 03000713 li a4,48
104c8: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
104cc: 07800713 li a4,120
104d0: 00e78023 sb a4,0(a5)
104d4: f35ff06f j 10408 <_start+0x2cc>
104d8: 00500713 li a4,5
104dc: e81ff06f j 1035c <_start+0x220>
104e0: 00600713 li a4,6
104e4: e79ff06f j 1035c <_start+0x220>
```
* Conv_Sum with -Os
```bash
Conv_Sums: file format elf32-littleriscv
Disassembly of section .text:
00010054 <memcpy>:
10054: 00000793 li a5,0
10058: 00f61663 bne a2,a5,10064 <memcpy+0x10>
1005c: 00c50533 add a0,a0,a2
10060: 00008067 ret
10064: 00f58733 add a4,a1,a5
10068: 00074683 lbu a3,0(a4)
1006c: 00f50733 add a4,a0,a5
10070: 00178793 addi a5,a5,1
10074: 00d70023 sb a3,0(a4)
10078: fe1ff06f j 10058 <memcpy+0x4>
0001007c <mul>:
1007c: 41f55713 srai a4,a0,0x1f
10080: 00a747b3 xor a5,a4,a0
10084: 40e787b3 sub a5,a5,a4
10088: 41f5d713 srai a4,a1,0x1f
1008c: 01f5d613 srli a2,a1,0x1f
10090: 00b745b3 xor a1,a4,a1
10094: 01f55813 srli a6,a0,0x1f
10098: 40e585b3 sub a1,a1,a4
1009c: 00000513 li a0,0
100a0: 00000713 li a4,0
100a4: 01f00893 li a7,31
100a8: 40e5d6b3 sra a3,a1,a4
100ac: 0016f693 andi a3,a3,1
100b0: 00068663 beqz a3,100bc <mul+0x40>
100b4: 00e796b3 sll a3,a5,a4
100b8: 00d50533 add a0,a0,a3
100bc: 00170713 addi a4,a4,1
100c0: ff1714e3 bne a4,a7,100a8 <mul+0x2c>
100c4: 00c80463 beq a6,a2,100cc <mul+0x50>
100c8: 40a00533 neg a0,a0
100cc: 00008067 ret
000100d0 <Conv_Sum>:
100d0: fe010113 addi sp,sp,-32
100d4: 00812c23 sw s0,24(sp)
100d8: 00912a23 sw s1,20(sp)
100dc: 01212823 sw s2,16(sp)
100e0: 01312623 sw s3,12(sp)
100e4: 01412423 sw s4,8(sp)
100e8: 00112e23 sw ra,28(sp)
100ec: 00050993 mv s3,a0
100f0: 00058a13 mv s4,a1
100f4: 00261913 slli s2,a2,0x2
100f8: 00000413 li s0,0
100fc: 00000493 li s1,0
10100: 03241463 bne s0,s2,10128 <Conv_Sum+0x58>
10104: 01c12083 lw ra,28(sp)
10108: 01812403 lw s0,24(sp)
1010c: 00048513 mv a0,s1
10110: 01012903 lw s2,16(sp)
10114: 01412483 lw s1,20(sp)
10118: 00c12983 lw s3,12(sp)
1011c: 00812a03 lw s4,8(sp)
10120: 02010113 addi sp,sp,32
10124: 00008067 ret
10128: 008a0733 add a4,s4,s0
1012c: 008987b3 add a5,s3,s0
10130: 00072583 lw a1,0(a4)
10134: 0007a503 lw a0,0(a5)
10138: 00440413 addi s0,s0,4
1013c: f41ff0ef jal ra,1007c <mul>
10140: 00a484b3 add s1,s1,a0
10144: fbdff06f j 10100 <Conv_Sum+0x30>
00010148 <_start>:
10148: f9010113 addi sp,sp,-112
1014c: 000105b7 lui a1,0x10
10150: 06812423 sw s0,104(sp)
10154: 02800613 li a2,40
10158: 22458413 addi s0,a1,548 # 10224 <_start+0xdc>
1015c: 01010513 addi a0,sp,16
10160: 22458593 addi a1,a1,548
10164: 06112623 sw ra,108(sp)
10168: eedff0ef jal ra,10054 <memcpy>
1016c: 02800613 li a2,40
10170: 02840593 addi a1,s0,40
10174: 03810513 addi a0,sp,56
10178: eddff0ef jal ra,10054 <memcpy>
1017c: 00a00613 li a2,10
10180: 03810593 addi a1,sp,56
10184: 01010513 addi a0,sp,16
10188: f49ff0ef jal ra,100d0 <Conv_Sum>
1018c: 00055a63 bgez a0,101a0 <_start+0x58>
10190: 400027b7 lui a5,0x40002
10194: 02d00713 li a4,45
10198: 00e78023 sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff058c>
1019c: 40a00533 neg a0,a0
101a0: 00000793 li a5,0
101a4: 00900593 li a1,9
101a8: 02051e63 bnez a0,101e4 <_start+0x9c>
101ac: 40002737 lui a4,0x40002
101b0: 03000693 li a3,48
101b4: 00d70023 sb a3,0(a4) # 40002000 <__global_pointer$+0x3fff058c>
101b8: 07800693 li a3,120
101bc: 00d70023 sb a3,0(a4)
101c0: fff00713 li a4,-1
101c4: 400026b7 lui a3,0x40002
101c8: fff78793 addi a5,a5,-1
101cc: 04e79263 bne a5,a4,10210 <_start+0xc8>
101d0: 06c12083 lw ra,108(sp)
101d4: 06812403 lw s0,104(sp)
101d8: 00000513 li a0,0
101dc: 07010113 addi sp,sp,112
101e0: 00008067 ret
101e4: 00f57613 andi a2,a0,15
101e8: 0ff67693 andi a3,a2,255
101ec: 03068713 addi a4,a3,48 # 40002030 <__global_pointer$+0x3fff05bc>
101f0: 00c5d463 bge a1,a2,101f8 <_start+0xb0>
101f4: 03768713 addi a4,a3,55
101f8: 00410693 addi a3,sp,4
101fc: 00f686b3 add a3,a3,a5
10200: 00e68023 sb a4,0(a3)
10204: 40455513 srai a0,a0,0x4
10208: 00178793 addi a5,a5,1
1020c: f9dff06f j 101a8 <_start+0x60>
10210: 00410613 addi a2,sp,4
10214: 00f60633 add a2,a2,a5
10218: 00064603 lbu a2,0(a2)
1021c: 00c68023 sb a2,0(a3)
10220: fa9ff06f j 101c8 <_start+0x80>
```
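In both listings, `mul` begins with an `srai`/`xor`/`sub` sequence rather than a branch. This appears to be the compiler's branch-free form of the `(a < 0) ? -a : a` expressions in the C source. A sketch of the same idea in C follows. The function name `abs_branchless` is hypothetical, and the code assumes a 32-bit `int` and an arithmetic right shift of negative values, which is implementation-defined in C but is what `srai` performs.

```c
/* Hypothetical illustration of the srai/xor/sub pattern.
 * Assumes 32-bit int, arithmetic right shift, and x != INT_MIN. */
int abs_branchless(int x) {
    int mask = x >> 31;          /* srai: 0 if x >= 0, -1 (all ones) if x < 0 */
    return (x ^ mask) - mask;    /* xor flips the bits when negative, sub adds 1 */
}
```

For a non-negative `x` the mask is 0 and the value passes through unchanged; for a negative `x` the two steps amount to two's-complement negation.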
## Interesting Findings and Further Tests
* Comparison with inline subroutine
Because `-O3` appears to inline functions much as the `inline` keyword requests, I wanted to check whether marking the functions `inline` explicitly changes anything in this case.
Adding the `inline` keyword to `Conv_Sum` only, to `mul` only, and to both gives three cases to compare with the plain `-O3` build. The execution results are below.
* -O3 flag
```bash
0xDFE
>>> Execution time: 3924561 ns
>>> Instruction count: 2147 (IPS=547067)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
* inline Conv_Sum
```bash
0xDFE
>>> Execution time: 131317 ns
>>> Instruction count: 2147 (IPS=16349749)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
* inline mul
```bash
0xDFE
>>> Execution time: 69190 ns
>>> Instruction count: 2147 (IPS=31030495)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
* inline both functions
```bash
0xDFE
>>> Execution time: 86672 ns
>>> Instruction count: 2147 (IPS=24771552)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
Comparing the code generated for these four builds, I found it identical except for jump and branch target addresses; the instruction counts and branch statistics above match as well. Thus, the compiler already seems to inline these functions by default under `-O3`.
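For reference, the "inline both functions" case can be sketched as follows. The exact declarations used for these runs are not shown above, so the `static inline` form here is an assumption. `static` is added because a plain `inline` definition in C99 does not by itself provide an external definition, which can cause link errors when the compiler chooses not to inline a call.

```c
/* Assumed form of the "inline both" variant; the function bodies are the
 * same as in Conv_Sum.c, only the declarations differ. */
static inline int mul(int a, int b) {
    int result = 0, minus_flag = ((a < 0) != (b < 0)) ? 1 : 0;
    a = (a < 0) ? -a : a;
    b = (b < 0) ? -b : b;
    for (int i = 0; i < 31; i++) {
        if (b & (0x1 << i))
            result = result + (a << i);
    }
    return minus_flag ? -result : result;
}

static inline int Conv_Sum(int A[], int B[], unsigned int len) {
    int result = 0;
    for (unsigned int i = 0; i < len; i++)
        result += mul(A[i], B[i]);
    return result;
}
```

With the `signal` and `weight` arrays from `_start`, `Conv_Sum(signal, weight, 10)` returns 3582, which is the `0xDFE` printed in every run above.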