魏晉成
---
title: "Lab2: RISC-V RV32I[MA] emulator with ELF support"
---

Lab2: RISC-V RV32I[MA] emulator with ELF support for Convolution Sum
===
###### tags: `RISC-V` `Computer Architecture`

[TOC]

**[Original Document](https://hackmd.io/hXZ0XVxTTLexGeQXfcj6aA)**

## Rewritten Code in C
I rewrote the program from the original document in C as `Conv_Sum.c`, shown below. The only real change concerns the `*` operator: the target is the RV32I base ISA, which has no `mul` instruction (multiplication belongs to the M extension). I therefore use the shift-and-add code credited to [dck9661](https://hackmd.io/@oR8-QX4TQzGKDJ72DmqDUg/SJrNq1QuB), slightly modified to handle signed operands, as `int mul(int, int)` below.
```c
/* Signed multiply by shift-and-add, since RV32I has no mul instruction. */
int mul(int a, int b){
    int result = 0, minus_flag = ((a < 0) != (b < 0)) ? 1 : 0;
    /* Work on the magnitudes; the sign is restored at the end. */
    a = (a < 0) ? -a : a;
    b = (b < 0) ? -b : b;
    /* Bits 0..30 hold the magnitude of a non-negative int. */
    for(int i = 0; i < 31; i++) {
        if(b & (0x1 << i))
            result = result + (a << i);
    }
    if(minus_flag)
        return -result;
    return result;
}

/* Sum of the element-wise products of A and B. */
int Conv_Sum(int A[], int B[], unsigned int len) {
    int result = 0;
    for(int i = 0; i < len; i++) {
        result += mul(A[i], B[i]);
    }
    return result;
}

int _start(){
    int signal[10] = { 39, 174, 243, 51, 73, 184, 33, 137, 82, 114 };
    int weight[10] = { 6, 10, 0, -2, -8, 8, 8, 10, 4, -10 };
    int result = Conv_Sum(signal, weight, 10);

    // Printing: write the result in hexadecimal, one character at a time,
    // to the output address 0x40002000
    int charIndex = 0;
    char charStack[12];
    volatile char* tx = (volatile char*) 0x40002000;
    if(result < 0){
        *tx = '-';
        result = -result;
    }
    // Collect the hex digits, least significant first
    for(; result; result >>= 4, charIndex++){
        if((result & 0xf) > 9)
            charStack[charIndex] = (result & 0xf) + 'A' - 10;
        else
            charStack[charIndex] = (result & 0xf) + '0';
    }
    *tx = '0';
    *tx = 'x';
    // Emit the digits, most significant first
    while(charIndex--){
        *tx = charStack[charIndex];
    }
    return 0;
}
```

## Execution with rv32emu
1. Compiling `Conv_Sum.c` with optimization for speed, using `riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -O3 -nostdlib ./Conv_Sum.c -o Conv_Sum3`, and running it with `emu-rv32i Conv_Sum3` gives the result below.
```bash
0xDFE
>>> Execution time: 3924561 ns
>>> Instruction count: 2147 (IPS=547067)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
2. Compiling `Conv_Sum.c` with optimization for code size, using `riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -Os -nostdlib ./Conv_Sum.c -o Conv_Sums`, fails at link time: the compiler emits calls to `memcpy` to initialize the two arrays, and `-nostdlib` leaves that symbol undefined. I therefore added the small `memcpy` implementation below to resolve the reference.
```c
#include <stddef.h>
/* Minimal byte-wise memcpy to satisfy the compiler-generated calls.
 * Note: the standard memcpy returns the original destination pointer;
 * this version returns the advanced pointer, and the disassembly below
 * was produced from exactly this code. */
void* memcpy(void* str1, const void* str2, size_t n){
    char* d = str1;
    const char* s = str2;
    while(n --)
        *d ++ = *s ++;
    return d;
}
```
After `Conv_Sum.c` compiled successfully with the `-Os` flag, I ran it with `emu-rv32i Conv_Sums` and got the result below.
```bash
0xDFE
>>> Execution time: 5061702 ns
>>> Instruction count: 2482 (IPS=490348)
>>> Jumps: 824 (33.20%) - 414 forwards, 410 backwards
>>> Branching T=701 (95.37%) F=34 (4.63%)
```

## Checking Code Size
I used `riscv-none-embed-objdump -h Conv_Sum3 > Conv_Sum3h` and `riscv-none-embed-objdump -h Conv_Sums > Conv_Sumsh` to list the sections of the two binaries; the results follow. Even with the extra `memcpy` function, the `-Os` flag shrinks `.text` from 0x494 (1172) bytes to 0x1d0 (464) bytes, roughly 60% smaller.
* Conv_Sum with -O3
```bash
Conv_Sum3:     file format elf32-littleriscv

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000494  00010054  00010054  00000054  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000050  000104e8  000104e8  000004e8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .comment      00000033  00000000  00000000  00000538  2**0
                  CONTENTS, READONLY
```
* Conv_Sum with -Os
```bash
Conv_Sums:     file format elf32-littleriscv

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         000001d0  00010054  00010054  00000054  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000050  00010224  00010224  00000224  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .comment      00000033  00000000  00000000  00000274  2**0
                  CONTENTS, READONLY
```

## Checking Disassembled Program
I used `riscv-none-embed-objdump -d Conv_Sum3 > Conv_Sum3.as` and `riscv-none-embed-objdump -d Conv_Sums > Conv_Sums.as` to disassemble the two ELF files; the results follow.
* Conv_Sum with -O3
```bash
Conv_Sum3:     file format elf32-littleriscv

Disassembly of section .text:

00010054 <mul>:
   10054:  41f5d793    srai a5,a1,0x1f
   10058:  41f55713    srai a4,a0,0x1f
   1005c:  00b7c633    xor a2,a5,a1
   10060:  00a74833    xor a6,a4,a0
   10064:  01f55313    srli t1,a0,0x1f
   10068:  01f5d893    srli a7,a1,0x1f
   1006c:  40f60633    sub a2,a2,a5
   10070:  40e80833    sub a6,a6,a4
   10074:  00000793    li a5,0
   10078:  00000513    li a0,0
   1007c:  01f00593    li a1,31
   10080:  40f65733    sra a4,a2,a5
   10084:  00177713    andi a4,a4,1
   10088:  00f816b3    sll a3,a6,a5
   1008c:  00178793    addi a5,a5,1
   10090:  00070463    beqz a4,10098 <mul+0x44>
   10094:  00d50533    add a0,a0,a3
   10098:  feb794e3    bne a5,a1,10080 <mul+0x2c>
   1009c:  01130463    beq t1,a7,100a4 <mul+0x50>
   100a0:  40a00533    neg a0,a0
   100a4:  00008067    ret

000100a8 <Conv_Sum>:
   100a8:  08060663    beqz a2,10134 <Conv_Sum+0x8c>
   100ac:  00261613    slli a2,a2,0x2
   100b0:  00050f13    mv t5,a0
   100b4:  00c502b3    add t0,a0,a2
   100b8:  01f00313    li t1,31
   100bc:  00000513    li a0,0
   100c0:  0005a783    lw a5,0(a1)
   100c4:  000f2703    lw a4,0(t5)
   100c8:  00000893    li a7,0
   100cc:  41f7d693    srai a3,a5,0x1f
   100d0:  41f75f93    srai t6,a4,0x1f
   100d4:  00f6c633    xor a2,a3,a5
   100d8:  00efc833    xor a6,t6,a4
   100dc:  01f7de13    srli t3,a5,0x1f
   100e0:  01f75e93    srli t4,a4,0x1f
   100e4:  41f80833    sub a6,a6,t6
   100e8:  40d60633    sub a2,a2,a3
   100ec:  00000793    li a5,0
   100f0:  40f65733    sra a4,a2,a5
   100f4:  00177713    andi a4,a4,1
   100f8:  00f816b3    sll a3,a6,a5
   100fc:  00178793    addi a5,a5,1
   10100:  00070463    beqz a4,10108 <Conv_Sum+0x60>
   10104:  00d888b3    add a7,a7,a3
   10108:  fe6794e3    bne a5,t1,100f0 <Conv_Sum+0x48>
   1010c:  004f0f13    addi t5,t5,4
   10110:  01ce8a63    beq t4,t3,10124 <Conv_Sum+0x7c>
   10114:  41150533    sub a0,a0,a7
   10118:  00458593    addi a1,a1,4
   1011c:  fa5f12e3    bne t5,t0,100c0 <Conv_Sum+0x18>
   10120:  00008067    ret
   10124:  01150533    add a0,a0,a7
   10128:  00458593    addi a1,a1,4
   1012c:  f9e29ae3    bne t0,t5,100c0 <Conv_Sum+0x18>
   10130:  00008067    ret
   10134:  00000513    li a0,0
   10138:  00008067    ret

0001013c <_start>:
   1013c:  000107b7    lui a5,0x10
   10140:  4e878793    addi a5,a5,1256 # 104e8 <_start+0x3ac>
   10144:  0007a703    lw a4,0(a5)
   10148:  fa010113    addi sp,sp,-96
   1014c:  01c7af83    lw t6,28(a5)
   10150:  00e12823    sw a4,16(sp)
   10154:  0047a703    lw a4,4(a5)
   10158:  0207af03    lw t5,32(a5)
   1015c:  0287ae03    lw t3,40(a5)
   10160:  00e12a23    sw a4,20(sp)
   10164:  0087a703    lw a4,8(a5)
   10168:  02c7a503    lw a0,44(a5)
   1016c:  0147a383    lw t2,20(a5)
   10170:  00e12c23    sw a4,24(sp)
   10174:  00c7a703    lw a4,12(a5)
   10178:  0187a283    lw t0,24(a5)
   1017c:  0247ae83    lw t4,36(a5)
   10180:  00e12e23    sw a4,28(sp)
   10184:  0107a703    lw a4,16(a5)
   10188:  0347a303    lw t1,52(a5)
   1018c:  0387a883    lw a7,56(a5)
   10190:  03c7a803    lw a6,60(a5)
   10194:  0407a583    lw a1,64(a5)
   10198:  0447a603    lw a2,68(a5)
   1019c:  0487a683    lw a3,72(a5)
   101a0:  02e12023    sw a4,32(sp)
   101a4:  04c7a703    lw a4,76(a5)
   101a8:  0307a783    lw a5,48(a5)
   101ac:  03f12623    sw t6,44(sp)
   101b0:  03e12823    sw t5,48(sp)
   101b4:  03c12c23    sw t3,56(sp)
   101b8:  02a12e23    sw a0,60(sp)
   101bc:  02712223    sw t2,36(sp)
   101c0:  02512423    sw t0,40(sp)
   101c4:  03d12a23    sw t4,52(sp)
   101c8:  03810e13    addi t3,sp,56
   101cc:  04f12023    sw a5,64(sp)
   101d0:  01010513    addi a0,sp,16
   101d4:  04612223    sw t1,68(sp)
   101d8:  05112423    sw a7,72(sp)
   101dc:  05012623    sw a6,76(sp)
   101e0:  04b12823    sw a1,80(sp)
   101e4:  04c12a23    sw a2,84(sp)
   101e8:  04d12c23    sw a3,88(sp)
   101ec:  04e12e23    sw a4,92(sp)
   101f0:  000e0f93    mv t6,t3
   101f4:  00000313    li t1,0
   101f8:  01f00f13    li t5,31
   101fc:  00052703    lw a4,0(a0)
   10200:  000e2783    lw a5,0(t3)
   10204:  00000593    li a1,0
   10208:  41f75e93    srai t4,a4,0x1f
   1020c:  41f7d613    srai a2,a5,0x1f
   10210:  00eec8b3    xor a7,t4,a4
   10214:  00f64833    xor a6,a2,a5
   10218:  01f75693    srli a3,a4,0x1f
   1021c:  41d888b3    sub a7,a7,t4
   10220:  01f7d713    srli a4,a5,0x1f
   10224:  40c80833    sub a6,a6,a2
   10228:  00000793    li a5,0
   1022c:  40f85633    sra a2,a6,a5
   10230:  00167613    andi a2,a2,1
   10234:  00f89eb3    sll t4,a7,a5
   10238:  00178793    addi a5,a5,1
   1023c:  00060463    beqz a2,10244 <_start+0x108>
   10240:  01d585b3    add a1,a1,t4
   10244:  ffe794e3    bne a5,t5,1022c <_start+0xf0>
   10248:  00450513    addi a0,a0,4
   1024c:  1ce68463    beq a3,a4,10414 <_start+0x2d8>
   10250:  40b30333    sub t1,t1,a1
   10254:  004e0e13    addi t3,t3,4
   10258:  fbf512e3    bne a0,t6,101fc <_start+0xc0>
   1025c:  1c034463    bltz t1,10424 <_start+0x2e8>
   10260:  26030063    beqz t1,104c0 <_start+0x384>
   10264:  00f37793    andi a5,t1,15
   10268:  0ff7f693    andi a3,a5,255
   1026c:  00900613    li a2,9
   10270:  03068713    addi a4,a3,48
   10274:  1cf64a63    blt a2,a5,10448 <_start+0x30c>
   10278:  00e10223    sb a4,4(sp)
   1027c:  40435793    srai a5,t1,0x4
   10280:  1e078863    beqz a5,10470 <_start+0x334>
   10284:  00f7f793    andi a5,a5,15
   10288:  0ff7f693    andi a3,a5,255
   1028c:  00900613    li a2,9
   10290:  03068713    addi a4,a3,48
   10294:  1af64e63    blt a2,a5,10450 <_start+0x314>
   10298:  00e102a3    sb a4,5(sp)
   1029c:  40835793    srai a5,t1,0x8
   102a0:  1e078863    beqz a5,10490 <_start+0x354>
   102a4:  00f7f793    andi a5,a5,15
   102a8:  0ff7f693    andi a3,a5,255
   102ac:  00900613    li a2,9
   102b0:  03068713    addi a4,a3,48
   102b4:  1af64263    blt a2,a5,10458 <_start+0x31c>
   102b8:  00e10323    sb a4,6(sp)
   102bc:  40c35793    srai a5,t1,0xc
   102c0:  1e078863    beqz a5,104b0 <_start+0x374>
   102c4:  00f7f793    andi a5,a5,15
   102c8:  0ff7f693    andi a3,a5,255
   102cc:  00900613    li a2,9
   102d0:  03068713    addi a4,a3,48
   102d4:  18f64663    blt a2,a5,10460 <_start+0x324>
   102d8:  00e103a3    sb a4,7(sp)
   102dc:  41035793    srai a5,t1,0x10
   102e0:  1c078c63    beqz a5,104b8 <_start+0x37c>
   102e4:  00f7f793    andi a5,a5,15
   102e8:  0ff7f693    andi a3,a5,255
   102ec:  00900613    li a2,9
   102f0:  03068713    addi a4,a3,48
   102f4:  00f65463    bge a2,a5,102fc <_start+0x1c0>
   102f8:  03768713    addi a4,a3,55
   102fc:  00e10423    sb a4,8(sp)
   10300:  41435793    srai a5,t1,0x14
   10304:  16078263    beqz a5,10468 <_start+0x32c>
   10308:  00f7f793    andi a5,a5,15
   1030c:  0ff7f693    andi a3,a5,255
   10310:  00900613    li a2,9
   10314:  03068713    addi a4,a3,48
   10318:  00f65463    bge a2,a5,10320 <_start+0x1e4>
   1031c:  03768713    addi a4,a3,55
   10320:  00e104a3    sb a4,9(sp)
   10324:  41835793    srai a5,t1,0x18
   10328:  1a078863    beqz a5,104d8 <_start+0x39c>
   1032c:  00f7f793    andi a5,a5,15
   10330:  0ff7f693    andi a3,a5,255
   10334:  00900613    li a2,9
   10338:  03068713    addi a4,a3,48
   1033c:  00f65463    bge a2,a5,10344 <_start+0x208>
   10340:  03768713    addi a4,a3,55
   10344:  00e10523    sb a4,10(sp)
   10348:  41c35313    srai t1,t1,0x1c
   1034c:  18030a63    beqz t1,104e0 <_start+0x3a4>
   10350:  03030313    addi t1,t1,48
   10354:  006105a3    sb t1,11(sp)
   10358:  00700713    li a4,7
   1035c:  400027b7    lui a5,0x40002
   10360:  03000693    li a3,48
   10364:  00d78023    sb a3,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   10368:  07800693    li a3,120
   1036c:  00d78023    sb a3,0(a5)
   10370:  06010693    addi a3,sp,96
   10374:  00e68633    add a2,a3,a4
   10378:  fa464503    lbu a0,-92(a2)
   1037c:  ffe70693    addi a3,a4,-2
   10380:  06010593    addi a1,sp,96
   10384:  00a78023    sb a0,0(a5)
   10388:  fa364503    lbu a0,-93(a2)
   1038c:  00d585b3    add a1,a1,a3
   10390:  ffd70613    addi a2,a4,-3
   10394:  00a78023    sb a0,0(a5)
   10398:  fa45c583    lbu a1,-92(a1)
   1039c:  00b78023    sb a1,0(a5)
   103a0:  06068463    beqz a3,10408 <_start+0x2cc>
   103a4:  06010693    addi a3,sp,96
   103a8:  00c686b3    add a3,a3,a2
   103ac:  fa46c583    lbu a1,-92(a3)
   103b0:  ffc70693    addi a3,a4,-4
   103b4:  00b78023    sb a1,0(a5)
   103b8:  04060863    beqz a2,10408 <_start+0x2cc>
   103bc:  06010613    addi a2,sp,96
   103c0:  00d60633    add a2,a2,a3
   103c4:  fa464583    lbu a1,-92(a2)
   103c8:  ffb70613    addi a2,a4,-5
   103cc:  00b78023    sb a1,0(a5)
   103d0:  02068c63    beqz a3,10408 <_start+0x2cc>
   103d4:  06010693    addi a3,sp,96
   103d8:  00c686b3    add a3,a3,a2
   103dc:  fa46c683    lbu a3,-92(a3)
   103e0:  ffa70713    addi a4,a4,-6
   103e4:  00d78023    sb a3,0(a5)
   103e8:  02060063    beqz a2,10408 <_start+0x2cc>
   103ec:  06010693    addi a3,sp,96
   103f0:  00e686b3    add a3,a3,a4
   103f4:  fa46c683    lbu a3,-92(a3)
   103f8:  00d78023    sb a3,0(a5)
   103fc:  00070663    beqz a4,10408 <_start+0x2cc>
   10400:  00414703    lbu a4,4(sp)
   10404:  00e78023    sb a4,0(a5)
   10408:  00000513    li a0,0
   1040c:  06010113    addi sp,sp,96
   10410:  00008067    ret
   10414:  00b30333    add t1,t1,a1
   10418:  004e0e13    addi t3,t3,4
   1041c:  dff510e3    bne a0,t6,101fc <_start+0xc0>
   10420:  e40350e3    bgez t1,10260 <_start+0x124>
   10424:  400027b7    lui a5,0x40002
   10428:  02d00713    li a4,45
   1042c:  40600333    neg t1,t1
   10430:  00e78023    sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   10434:  00f37793    andi a5,t1,15
   10438:  0ff7f693    andi a3,a5,255
   1043c:  00900613    li a2,9
   10440:  03068713    addi a4,a3,48
   10444:  e2f65ae3    bge a2,a5,10278 <_start+0x13c>
   10448:  03768713    addi a4,a3,55
   1044c:  e2dff06f    j 10278 <_start+0x13c>
   10450:  03768713    addi a4,a3,55
   10454:  e45ff06f    j 10298 <_start+0x15c>
   10458:  03768713    addi a4,a3,55
   1045c:  e5dff06f    j 102b8 <_start+0x17c>
   10460:  03768713    addi a4,a3,55
   10464:  e75ff06f    j 102d8 <_start+0x19c>
   10468:  00400713    li a4,4
   1046c:  ef1ff06f    j 1035c <_start+0x220>
   10470:  400027b7    lui a5,0x40002
   10474:  03000713    li a4,48
   10478:  00e78023    sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   1047c:  07800713    li a4,120
   10480:  00e78023    sb a4,0(a5)
   10484:  00414703    lbu a4,4(sp)
   10488:  00e78023    sb a4,0(a5)
   1048c:  f7dff06f    j 10408 <_start+0x2cc>
   10490:  400027b7    lui a5,0x40002
   10494:  03000713    li a4,48
   10498:  00e78023    sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   1049c:  07800713    li a4,120
   104a0:  00e78023    sb a4,0(a5)
   104a4:  00514703    lbu a4,5(sp)
   104a8:  00e78023    sb a4,0(a5)
   104ac:  f55ff06f    j 10400 <_start+0x2c4>
   104b0:  00200713    li a4,2
   104b4:  ea9ff06f    j 1035c <_start+0x220>
   104b8:  00300713    li a4,3
   104bc:  ea1ff06f    j 1035c <_start+0x220>
   104c0:  400027b7    lui a5,0x40002
   104c4:  03000713    li a4,48
   104c8:  00e78023    sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff02c8>
   104cc:  07800713    li a4,120
   104d0:  00e78023    sb a4,0(a5)
   104d4:  f35ff06f    j 10408 <_start+0x2cc>
   104d8:  00500713    li a4,5
   104dc:  e81ff06f    j 1035c <_start+0x220>
   104e0:  00600713    li a4,6
   104e4:  e79ff06f    j 1035c <_start+0x220>
```
* Conv_Sum with -Os
```bash
Conv_Sums:     file format elf32-littleriscv

Disassembly of section .text:

00010054 <memcpy>:
   10054:  00000793    li a5,0
   10058:  00f61663    bne a2,a5,10064 <memcpy+0x10>
   1005c:  00c50533    add a0,a0,a2
   10060:  00008067    ret
   10064:  00f58733    add a4,a1,a5
   10068:  00074683    lbu a3,0(a4)
   1006c:  00f50733    add a4,a0,a5
   10070:  00178793    addi a5,a5,1
   10074:  00d70023    sb a3,0(a4)
   10078:  fe1ff06f    j 10058 <memcpy+0x4>

0001007c <mul>:
   1007c:  41f55713    srai a4,a0,0x1f
   10080:  00a747b3    xor a5,a4,a0
   10084:  40e787b3    sub a5,a5,a4
   10088:  41f5d713    srai a4,a1,0x1f
   1008c:  01f5d613    srli a2,a1,0x1f
   10090:  00b745b3    xor a1,a4,a1
   10094:  01f55813    srli a6,a0,0x1f
   10098:  40e585b3    sub a1,a1,a4
   1009c:  00000513    li a0,0
   100a0:  00000713    li a4,0
   100a4:  01f00893    li a7,31
   100a8:  40e5d6b3    sra a3,a1,a4
   100ac:  0016f693    andi a3,a3,1
   100b0:  00068663    beqz a3,100bc <mul+0x40>
   100b4:  00e796b3    sll a3,a5,a4
   100b8:  00d50533    add a0,a0,a3
   100bc:  00170713    addi a4,a4,1
   100c0:  ff1714e3    bne a4,a7,100a8 <mul+0x2c>
   100c4:  00c80463    beq a6,a2,100cc <mul+0x50>
   100c8:  40a00533    neg a0,a0
   100cc:  00008067    ret

000100d0 <Conv_Sum>:
   100d0:  fe010113    addi sp,sp,-32
   100d4:  00812c23    sw s0,24(sp)
   100d8:  00912a23    sw s1,20(sp)
   100dc:  01212823    sw s2,16(sp)
   100e0:  01312623    sw s3,12(sp)
   100e4:  01412423    sw s4,8(sp)
   100e8:  00112e23    sw ra,28(sp)
   100ec:  00050993    mv s3,a0
   100f0:  00058a13    mv s4,a1
   100f4:  00261913    slli s2,a2,0x2
   100f8:  00000413    li s0,0
   100fc:  00000493    li s1,0
   10100:  03241463    bne s0,s2,10128 <Conv_Sum+0x58>
   10104:  01c12083    lw ra,28(sp)
   10108:  01812403    lw s0,24(sp)
   1010c:  00048513    mv a0,s1
   10110:  01012903    lw s2,16(sp)
   10114:  01412483    lw s1,20(sp)
   10118:  00c12983    lw s3,12(sp)
   1011c:  00812a03    lw s4,8(sp)
   10120:  02010113    addi sp,sp,32
   10124:  00008067    ret
   10128:  008a0733    add a4,s4,s0
   1012c:  008987b3    add a5,s3,s0
   10130:  00072583    lw a1,0(a4)
   10134:  0007a503    lw a0,0(a5)
   10138:  00440413    addi s0,s0,4
   1013c:  f41ff0ef    jal ra,1007c <mul>
   10140:  00a484b3    add s1,s1,a0
   10144:  fbdff06f    j 10100 <Conv_Sum+0x30>

00010148 <_start>:
   10148:  f9010113    addi sp,sp,-112
   1014c:  000105b7    lui a1,0x10
   10150:  06812423    sw s0,104(sp)
   10154:  02800613    li a2,40
   10158:  22458413    addi s0,a1,548 # 10224 <_start+0xdc>
   1015c:  01010513    addi a0,sp,16
   10160:  22458593    addi a1,a1,548
   10164:  06112623    sw ra,108(sp)
   10168:  eedff0ef    jal ra,10054 <memcpy>
   1016c:  02800613    li a2,40
   10170:  02840593    addi a1,s0,40
   10174:  03810513    addi a0,sp,56
   10178:  eddff0ef    jal ra,10054 <memcpy>
   1017c:  00a00613    li a2,10
   10180:  03810593    addi a1,sp,56
   10184:  01010513    addi a0,sp,16
   10188:  f49ff0ef    jal ra,100d0 <Conv_Sum>
   1018c:  00055a63    bgez a0,101a0 <_start+0x58>
   10190:  400027b7    lui a5,0x40002
   10194:  02d00713    li a4,45
   10198:  00e78023    sb a4,0(a5) # 40002000 <__global_pointer$+0x3fff058c>
   1019c:  40a00533    neg a0,a0
   101a0:  00000793    li a5,0
   101a4:  00900593    li a1,9
   101a8:  02051e63    bnez a0,101e4 <_start+0x9c>
   101ac:  40002737    lui a4,0x40002
   101b0:  03000693    li a3,48
   101b4:  00d70023    sb a3,0(a4) # 40002000 <__global_pointer$+0x3fff058c>
   101b8:  07800693    li a3,120
   101bc:  00d70023    sb a3,0(a4)
   101c0:  fff00713    li a4,-1
   101c4:  400026b7    lui a3,0x40002
   101c8:  fff78793    addi a5,a5,-1
   101cc:  04e79263    bne a5,a4,10210 <_start+0xc8>
   101d0:  06c12083    lw ra,108(sp)
   101d4:  06812403    lw s0,104(sp)
   101d8:  00000513    li a0,0
   101dc:  07010113    addi sp,sp,112
   101e0:  00008067    ret
   101e4:  00f57613    andi a2,a0,15
   101e8:  0ff67693    andi a3,a2,255
   101ec:  03068713    addi a4,a3,48 # 40002030 <__global_pointer$+0x3fff05bc>
   101f0:  00c5d463    bge a1,a2,101f8 <_start+0xb0>
   101f4:  03768713    addi a4,a3,55
   101f8:  00410693    addi a3,sp,4
   101fc:  00f686b3    add a3,a3,a5
   10200:  00e68023    sb a4,0(a3)
   10204:  40455513    srai a0,a0,0x4
   10208:  00178793    addi a5,a5,1
   1020c:  f9dff06f    j 101a8 <_start+0x60>
   10210:  00410613    addi a2,sp,4
   10214:  00f60633    add a2,a2,a5
   10218:  00064603    lbu a2,0(a2)
   1021c:  00c68023    sb a2,0(a3)
   10220:  fa9ff06f    j 101c8 <_start+0x80>
```

## Interesting Findings and Further Tests
* Comparison with inline subroutine

Because `-O3` appeared to inline the functions just as the `inline` keyword would, I wanted to check whether the compiler really generates the same code in both situations. Adding the `inline` keyword to `Conv_Sum` only, to `mul` only, and to both `Conv_Sum` and `mul` gives three cases to compare against plain `-O3`. Below are the execution results of all four builds.

* -O3 flag
```bash
0xDFE
>>> Execution time: 3924561 ns
>>> Instruction count: 2147 (IPS=547067)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
* inline Conv_Sum
```bash
0xDFE
>>> Execution time: 131317 ns
>>> Instruction count: 2147 (IPS=16349749)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
* inline mul
```bash
0xDFE
>>> Execution time: 69190 ns
>>> Instruction count: 2147 (IPS=31030495)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```
* inline both functions
```bash
0xDFE
>>> Execution time: 86672 ns
>>> Instruction count: 2147 (IPS=24771552)
>>> Jumps: 623 (29.02%) - 309 forwards, 314 backwards
>>> Branching T=618 (95.22%) F=31 (4.78%)
```

Comparing the four compiler-generated programs, I found them identical except for the addresses used by jumps and branches; the instruction counts and branch statistics above match exactly. The large differences in reported execution time therefore most likely reflect timing noise on the host rather than any change in the generated code. Thus, under the `-O3` flag the compiler seems to inline these functions by default.
