# Lab2: RISC-V RV32I[MA] emulator with ELF support : Bubble Sort ###### tags: `Computer Architecture` [TOC] ### Environment & Introduction #### [Lab2: RISC-V RV32I[MA] emulator with ELF support](https://hackmd.io/@hsieh22/2020_CA_lab2-1) ### Rewrite C code ```c= #include <stdio.h> void BBsort(int *a, unsigned int arrSize) { for (int i = 0; i < arrSize; i++) { for (int j = 0; j < arrSize - i - 1; j++) { if (a[j] > a[j + 1]) { int temp = a[j]; a[j] = a[j + 1]; a[j + 1] = temp; } } } } int _start() { int arr[5] = {2, 3, 7, 4, 1}; unsigned int arraySize = 5; BBsort(arr, arraySize); volatile char *tx = (volatile char*)0x40002000; for (int i = 0; i < arraySize; i++) { *tx = (arr[i] + '0'); *tx = ' '; } return 0; } ``` Compile without optimization ``` >>> Execution time: 29625 ns >>> Instruction count: 79 (IPS=2666666) >>> Jumps: 12 (15.19%) - 8 forwards, 4 backwards >>> Branching T=8 (80.00%) F=2 (20.00%) ``` ### Disassemble the ELF file ``` BBsort: file format elf32-littleriscv Sections: Idx Name Size VMA LMA File off Algn 0 .text 000001d8 00010054 00010054 00000054 2**2 CONTENTS, ALLOC, LOAD, READONLY, CODE 1 .rodata 00000014 0001022c 0001022c 0000022c 2**2 CONTENTS, ALLOC, LOAD, READONLY, DATA 2 .comment 00000033 00000000 00000000 00000240 2**0 CONTENTS, READONLY ``` ### Header of ELF file ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x1009c Start of program headers: 52 (bytes into file) Start of section headers: 1004 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 1 Size of section headers: 40 (bytes) Number of section headers: 7 Section header string table index: 6 ``` RISC-V assembly code ```= BBsort: file format elf32-littleriscv Disassembly of section .text: 00010054 <BBsort>: 10054: 04058263 beqz a1,10098 <BBsort+0x44> 10058: 00259613 slli a2,a1,0x2 1005c: fff58593 addi a1,a1,-1 10060: 00c50633 add a2,a0,a2 10064: 02058863 beqz a1,10094 <BBsort+0x40> 10068: 00450793 addi a5,a0,4 1006c: ffc7a703 lw a4,-4(a5) 10070: 0007a683 lw a3,0(a5) 10074: 00e6d663 bge a3,a4,10080 <BBsort+0x2c> 10078: fed7ae23 sw a3,-4(a5) 1007c: 00e7a023 sw a4,0(a5) 10080: 00478793 addi a5,a5,4 10084: fec794e3 bne a5,a2,1006c <BBsort+0x18> 10088: fff58593 addi a1,a1,-1 1008c: ffc60613 addi a2,a2,-4 10090: fc059ce3 bnez a1,10068 <BBsort+0x14> 10094: 00008067 ret 10098: 00008067 ret 0001009c <_start>: 1009c: 000107b7 lui a5,0x10 100a0: 22c78793 addi a5,a5,556 # 1022c <_start+0x190> 100a4: 0007a703 lw a4,0(a5) 100a8: 0047a683 lw a3,4(a5) 100ac: 0087a583 lw a1,8(a5) 100b0: 00c7a603 lw a2,12(a5) 100b4: 0107a783 lw a5,16(a5) 100b8: fe010113 addi sp,sp,-32 100bc: 00e12623 sw a4,12(sp) 100c0: 00d12823 sw a3,16(sp) 100c4: 00b12a23 sw a1,20(sp) 100c8: 00c12c23 sw a2,24(sp) 100cc: 00f12e23 sw a5,28(sp) 100d0: 00e6d663 bge a3,a4,100dc <_start+0x40> 100d4: 00d12623 sw a3,12(sp) 100d8: 00e12823 sw a4,16(sp) 100dc: 01012783 lw a5,16(sp) 100e0: 01412703 lw a4,20(sp) 100e4: 00f75c63 bge a4,a5,100fc <_start+0x60> 100e8: 00078693 mv a3,a5 100ec: 00e12823 sw a4,16(sp) 100f0: 00f12a23 sw a5,20(sp) 100f4: 00070793 mv a5,a4 100f8: 00068713 mv a4,a3 100fc: 01812603 lw a2,24(sp) 10100: 00e65c63 bge a2,a4,10118 <_start+0x7c> 10104: 00070693 mv a3,a4 10108: 00c12a23 sw a2,20(sp) 1010c: 00e12c23 sw a4,24(sp) 10110: 00060713 mv a4,a2 10114: 00068613 mv a2,a3 10118: 01c12683 lw a3,28(sp) 1011c: 10c6c063 blt a3,a2,1021c <_start+0x180> 10120: 00c12683 lw a3,12(sp) 10124: 00d7dc63 bge a5,a3,1013c <_start+0xa0> 10128: 00078593 mv a1,a5 1012c: 00f12623 sw a5,12(sp) 10130: 00d12823 sw a3,16(sp) 10134: 00068793 mv a5,a3 10138: 00058693 mv a3,a1 1013c: 00f75c63 bge a4,a5,10154 <_start+0xb8> 10140: 00078593 mv a1,a5 10144: 00e12823 sw a4,16(sp) 10148: 00f12a23 sw a5,20(sp) 1014c: 00070793 mv a5,a4 10150: 00058713 mv a4,a1 10154: 0ae64c63 blt a2,a4,1020c <_start+0x170> 10158: 00d7dc63 bge a5,a3,10170 <_start+0xd4> 1015c: 00078613 mv a2,a5 10160: 00f12623 sw a5,12(sp) 10164: 00d12823 sw a3,16(sp) 10168: 00068793 mv a5,a3 1016c: 00060693 mv a3,a2 10170: 08f74663 blt a4,a5,101fc <_start+0x160> 10174: 00d7d663 bge a5,a3,10180 <_start+0xe4> 10178: 00f12623 sw a5,12(sp) 1017c: 00d12823 sw a3,16(sp) 10180: 00c12683 lw a3,12(sp) 10184: 03a00713 li a4,58 10188: 400027b7 lui a5,0x40002 1018c: 40d706b3 sub a3,a4,a3 10190: 0ff6f693 andi a3,a3,255 10194: 00d78023 sb a3,0(a5) # 40002000 <__global_pointer$+0x3fff05c0> 10198: 02000693 li a3,32 1019c: 00d78023 sb a3,0(a5) 101a0: 01012603 lw a2,16(sp) 101a4: 00000513 li a0,0 101a8: 40c70633 sub a2,a4,a2 101ac: 0ff67613 andi a2,a2,255 101b0: 00c78023 sb a2,0(a5) 101b4: 00d78023 sb a3,0(a5) 101b8: 01412603 lw a2,20(sp) 101bc: 40c70633 sub a2,a4,a2 101c0: 0ff67613 andi a2,a2,255 101c4: 00c78023 sb a2,0(a5) 101c8: 00d78023 sb a3,0(a5) 101cc: 01812603 lw a2,24(sp) 101d0: 40c70633 sub a2,a4,a2 101d4: 0ff67613 andi a2,a2,255 101d8: 00c78023 sb a2,0(a5) 101dc: 00d78023 sb a3,0(a5) 101e0: 01c12603 lw a2,28(sp) 101e4: 40c70733 sub a4,a4,a2 101e8: 0ff77713 andi a4,a4,255 101ec: 00e78023 sb a4,0(a5) 101f0: 00d78023 sb a3,0(a5) 101f4: 02010113 addi sp,sp,32 101f8: 00008067 ret 101fc: 00f12a23 sw a5,20(sp) 10200: 00e12823 sw a4,16(sp) 10204: 00070793 mv a5,a4 10208: f6dff06f j 10174 <_start+0xd8> 1020c: 00e12c23 sw a4,24(sp) 10210: 00c12a23 sw a2,20(sp) 10214: 00060713 mv a4,a2 10218: f41ff06f j 10158 <_start+0xbc> 1021c: 00c12e23 sw a2,28(sp) 10220: 00d12c23 sw a3,24(sp) 10224: 00068613 mv a2,a3 10228: ef9ff06f j 10120 <_start+0x84> ``` Compare to assembly code credit to [ηŽ‹ε‚‘δΈ–](https://hackmd.io/4-oWQOprRnCLu3ZeJy31kA?view#Bubble-Sort) ``` .data arr: .word 2, 3, 7, 4, 1 .text main: la s0, arr mv t3, s0 addi s1, s1, 4 addi t0, zero, -1 jal ra, bbsort mv t0, zero addi s1, s1, 1 mv s0, t3 j print bbsort: addi sp, sp, -12 sw ra, 8(sp) sw s1, 4(sp) sw s0, 0(sp) oloop: addi t0, t0, 1 mv t1, zero sub t2, s1, t0 blt t0, s1, iloop addi sp, sp, 12 jr ra iloop: mv s0, t3 bge t1, t2, oloop slli t4, t1, 2 add s0, s0, t4 lw a2, 0(s0) lw a3, 4(s0) addi t1, t1, 1 blt a2, a3, swap j iloop swap: sw a2, 4(s0) sw a3, 0(s0) j iloop print: bge t0, s1, end lw a0, 0(s0) li a7, 1 ecall addi t0, t0, 1 addi s0, s0, 4 j print end: ``` ### Optimization We just test O2 and O3 here #### O2 ##### result ``` >>> Execution time: 36077 ns >>> Instruction count: 144 (IPS=3991462) >>> Jumps: 21 (14.58%) - 6 forwards, 15 backwards >>> Branching T=18 (58.06%) F=13 (41.94%) ``` ##### assenbly code: ```= BBsort: file format elf32-littleriscv Disassembly of section .text: 00010054 <BBsort>: 10054: 04058263 beqz a1,10098 <BBsort+0x44> 10058: 00259613 slli a2,a1,0x2 1005c: fff58593 addi a1,a1,-1 10060: 00c50633 add a2,a0,a2 10064: 02058863 beqz a1,10094 <BBsort+0x40> 10068: 00450793 addi a5,a0,4 1006c: ffc7a703 lw a4,-4(a5) 10070: 0007a683 lw a3,0(a5) 10074: 00e6d663 bge a3,a4,10080 <BBsort+0x2c> 10078: fed7ae23 sw a3,-4(a5) 1007c: 00e7a023 sw a4,0(a5) 10080: 00478793 addi a5,a5,4 10084: fec794e3 bne a5,a2,1006c <BBsort+0x18> 10088: fff58593 addi a1,a1,-1 1008c: ffc60613 addi a2,a2,-4 10090: fc059ce3 bnez a1,10068 <BBsort+0x14> 10094: 00008067 ret 10098: 00008067 ret 0001009c <_start>: 1009c: 000107b7 lui a5,0x10 100a0: 12078793 addi a5,a5,288 # 10120 <_start+0x84> 100a4: 0047a603 lw a2,4(a5) 100a8: 0087a683 lw a3,8(a5) 100ac: 00c7a703 lw a4,12(a5) 100b0: 0007a803 lw a6,0(a5) 100b4: 0107a783 lw a5,16(a5) 100b8: fd010113 addi sp,sp,-48 100bc: 00500593 li a1,5 100c0: 00c10513 addi a0,sp,12 100c4: 00c12823 sw a2,16(sp) 100c8: 00d12a23 sw a3,20(sp) 100cc: 00e12c23 sw a4,24(sp) 100d0: 02112623 sw ra,44(sp) 100d4: 01012623 sw a6,12(sp) 100d8: 00f12e23 sw a5,28(sp) 100dc: f79ff0ef jal ra,10054 <BBsort> 100e0: 00c10713 addi a4,sp,12 100e4: 02010513 addi a0,sp,32 100e8: 03a00593 li a1,58 100ec: 400026b7 lui a3,0x40002 100f0: 02000613 li a2,32 100f4: 00072783 lw a5,0(a4) 100f8: 00470713 addi a4,a4,4 100fc: 40f587b3 sub a5,a1,a5 10100: 0ff7f793 andi a5,a5,255 10104: 00f68023 sb a5,0(a3) # 40002000 <__global_pointer$+0x3fff06cc> 10108: 00c68023 sb a2,0(a3) 1010c: fea714e3 bne a4,a0,100f4 <_start+0x58> 10110: 02c12083 lw ra,44(sp) 10114: 00000513 li a0,0 10118: 03010113 addi sp,sp,48 1011c: 00008067 ret ``` #### O3 ##### result ``` >>> Execution time: 30067 ns >>> Instruction count: 79 (IPS=2627465) >>> Jumps: 12 (15.19%) - 8 forwards, 4 backwards >>> Branching T=8 (80.00%) F=2 (20.00%) ``` ##### assembly code ```= BBsort: file format elf32-littleriscv Disassembly of section .text: 00010054 <BBsort>: 10054: 04058263 beqz a1,10098 <BBsort+0x44> 10058: 00259613 slli a2,a1,0x2 1005c: fff58593 addi a1,a1,-1 10060: 00c50633 add a2,a0,a2 10064: 02058863 beqz a1,10094 <BBsort+0x40> 10068: 00450793 addi a5,a0,4 1006c: ffc7a703 lw a4,-4(a5) 10070: 0007a683 lw a3,0(a5) 10074: 00e6d663 bge a3,a4,10080 <BBsort+0x2c> 10078: fed7ae23 sw a3,-4(a5) 1007c: 00e7a023 sw a4,0(a5) 10080: 00478793 addi a5,a5,4 10084: fec794e3 bne a5,a2,1006c <BBsort+0x18> 10088: fff58593 addi a1,a1,-1 1008c: ffc60613 addi a2,a2,-4 10090: fc059ce3 bnez a1,10068 <BBsort+0x14> 10094: 00008067 ret 10098: 00008067 ret 0001009c <_start>: 1009c: 000107b7 lui a5,0x10 100a0: 22c78793 addi a5,a5,556 # 1022c <_start+0x190> 100a4: 0007a703 lw a4,0(a5) 100a8: 0047a683 lw a3,4(a5) 100ac: 0087a583 lw a1,8(a5) 100b0: 00c7a603 lw a2,12(a5) 100b4: 0107a783 lw a5,16(a5) 100b8: fe010113 addi sp,sp,-32 100bc: 00e12623 sw a4,12(sp) 100c0: 00d12823 sw a3,16(sp) 100c4: 00b12a23 sw a1,20(sp) 100c8: 00c12c23 sw a2,24(sp) 100cc: 00f12e23 sw a5,28(sp) 100d0: 00e6d663 bge a3,a4,100dc <_start+0x40> 100d4: 00d12623 sw a3,12(sp) 100d8: 00e12823 sw a4,16(sp) 100dc: 01012783 lw a5,16(sp) 100e0: 01412703 lw a4,20(sp) 100e4: 00f75c63 bge a4,a5,100fc <_start+0x60> 100e8: 00078693 mv a3,a5 100ec: 00e12823 sw a4,16(sp) 100f0: 00f12a23 sw a5,20(sp) 100f4: 00070793 mv a5,a4 100f8: 00068713 mv a4,a3 100fc: 01812603 lw a2,24(sp) 10100: 00e65c63 bge a2,a4,10118 <_start+0x7c> 10104: 00070693 mv a3,a4 10108: 00c12a23 sw a2,20(sp) 1010c: 00e12c23 sw a4,24(sp) 10110: 00060713 mv a4,a2 10114: 00068613 mv a2,a3 10118: 01c12683 lw a3,28(sp) 1011c: 10c6c063 blt a3,a2,1021c <_start+0x180> 10120: 00c12683 lw a3,12(sp) 10124: 00d7dc63 bge a5,a3,1013c <_start+0xa0> 10128: 00078593 mv a1,a5 1012c: 00f12623 sw a5,12(sp) 10130: 00d12823 sw a3,16(sp) 10134: 00068793 mv a5,a3 10138: 00058693 mv a3,a1 1013c: 00f75c63 bge a4,a5,10154 <_start+0xb8> 10140: 00078593 mv a1,a5 10144: 00e12823 sw a4,16(sp) 10148: 00f12a23 sw a5,20(sp) 1014c: 00070793 mv a5,a4 10150: 00058713 mv a4,a1 10154: 0ae64c63 blt a2,a4,1020c <_start+0x170> 10158: 00d7dc63 bge a5,a3,10170 <_start+0xd4> 1015c: 00078613 mv a2,a5 10160: 00f12623 sw a5,12(sp) 10164: 00d12823 sw a3,16(sp) 10168: 00068793 mv a5,a3 1016c: 00060693 mv a3,a2 10170: 08f74663 blt a4,a5,101fc <_start+0x160> 10174: 00d7d663 bge a5,a3,10180 <_start+0xe4> 10178: 00f12623 sw a5,12(sp) 1017c: 00d12823 sw a3,16(sp) 10180: 00c12683 lw a3,12(sp) 10184: 03a00713 li a4,58 10188: 400027b7 lui a5,0x40002 1018c: 40d706b3 sub a3,a4,a3 10190: 0ff6f693 andi a3,a3,255 10194: 00d78023 sb a3,0(a5) # 40002000 <__global_pointer$+0x3fff05c0> 10198: 02000693 li a3,32 1019c: 00d78023 sb a3,0(a5) 101a0: 01012603 lw a2,16(sp) 101a4: 00000513 li a0,0 101a8: 40c70633 sub a2,a4,a2 101ac: 0ff67613 andi a2,a2,255 101b0: 00c78023 sb a2,0(a5) 101b4: 00d78023 sb a3,0(a5) 101b8: 01412603 lw a2,20(sp) 101bc: 40c70633 sub a2,a4,a2 101c0: 0ff67613 andi a2,a2,255 101c4: 00c78023 sb a2,0(a5) 101c8: 00d78023 sb a3,0(a5) 101cc: 01812603 lw a2,24(sp) 101d0: 40c70633 sub a2,a4,a2 101d4: 0ff67613 andi a2,a2,255 101d8: 00c78023 sb a2,0(a5) 101dc: 00d78023 sb a3,0(a5) 101e0: 01c12603 lw a2,28(sp) 101e4: 40c70733 sub a4,a4,a2 101e8: 0ff77713 andi a4,a4,255 101ec: 00e78023 sb a4,0(a5) 101f0: 00d78023 sb a3,0(a5) 101f4: 02010113 addi sp,sp,32 101f8: 00008067 ret 101fc: 00f12a23 sw a5,20(sp) 10200: 00e12823 sw a4,16(sp) 10204: 00070793 mv a5,a4 10208: f6dff06f j 10174 <_start+0xd8> 1020c: 00e12c23 sw a4,24(sp) 10210: 00c12a23 sw a2,20(sp) 10214: 00060713 mv a4,a2 10218: f41ff06f j 10158 <_start+0xbc> 1021c: 00c12e23 sw a2,28(sp) 10220: 00d12c23 sw a3,24(sp) 10224: 00068613 mv a2,a3 10228: ef9ff06f j 10120 <_start+0x84> ``` ### Comparison | - | O2 | O3 | |:-------------------------:|:--------:|:-------:| | Execution time | 36077 | 30067 | | Instruction count | 144 | 79 | | Jumps(Forwards/backwards) | 21(6/15) | 12(8/4) | | Branching(T/F) | 18/13 | 8/2 | | line of asem code | 59 | 127 | * As we can see, in this case all of the parts in O3 are better than O2, especially on instruction count, O3 is almost half of O2. * But there is an interesting point that the generated assembly code in O3 is more than O2, the O2 generated assembly code is just half of O3, I think the reason is that O3 has unrolling the loop, which can decrease branch condition to boost the program, but relatively it will hugely increase the line of assembly code. So we can see the above table, the branch and jump condition are more than O3. ### problem * I'm still confused about which optimization is better(O2 or O3) in which case, in last case (CLZ) O2 is better in most of the part, but in this case the result is reversed. * I'm not sure how to compare two assembly code.