owned this note
owned this note
Published
Linked with GitHub
# Lab2: RISC-V `RV32I[MA]` emulator with ELF support
###### tags: `108-1_RISC-V`
## [Bubble Sort](https://hackmd.io/4-oWQOprRnCLu3ZeJy31kA?both)
Array: `2, 3, 7, 4, 9`
The following assembly program was implemented by 王傑世同學.
(link in the title)
```
.data
arr: .word 2, 3, 7, 4, 1
.text
main:
la s0, arr
mv t3, s0
addi s1, s1, 4
addi t0, zero, -1
jal ra, bbsort
mv t0, zero
addi s1, s1, 1
mv s0, t3
j print
bbsort:
addi sp, sp, -12
sw ra, 8(sp)
sw s1, 4(sp)
sw s0, 0(sp)
oloop:
addi t0, t0, 1
mv t1, zero
sub t2, s1, t0
blt t0, s1, iloop
addi sp, sp, 12
jr ra
iloop:
mv s0, t3
bge t1, t2, oloop
slli t4, t1, 2
add s0, s0, t4
lw a2, 0(s0)
lw a3, 4(s0)
addi t1, t1, 1
blt a2, a3, swap
j iloop
swap:
sw a2, 4(s0)
sw a3, 0(s0)
j iloop
print:
bge t0, s1, end
lw a0, 0(s0)
li a7, 1
ecall
addi t0, t0, 1
addi s0, s0, 4
j print
end:
```
#### Rewrite Bubble Sort function in C
main() was replaced by _start() meaning that the code will start at this position.
```cpp
#include <stdio.h>
void BBSort(int *arr, int size) {
int temp;
for (int i = 0; i < size; ++i)
for (int j = 0; j < size - i; ++j)
if (arr[j] < arr[j + 1]) {
temp = arr[j];
arr[j] = arr[j + 1];
arr[j + 1] = temp;
}
}
int _start() {
int arr[5] = {2, 3, 7, 4, 9};
int size = 5;
BBSort(arr, size - 1);
volatile char *tx = (volatile char *)0x40002000;
const char *result1 = "Sorted array from big to small: ";
while (*result1)
{
*tx = *result1;
result1++;
}
for (int i = 0; i < size; i++)
{
*tx = (char)((arr[i]+'0') & 0x000000ff);
*tx = ' ';
}
*tx = '\n';
return 0;
}
```
### run it
```make check```
* Printing output version:
```
riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -nostdlib -O3 -Wall -std=gnu99 -o BubbleSort BubbleSort.c
./emu-rv32i BubbleSort
Sorted array from big to small: 9 7 4 3 2
>>> Execution time: 87677 ns
>>> Instruction count: 227 (IPS=2589048)
>>> Jumps: 37 (16.30%) - 3 forwards, 34 backwards
>>> Branching T=34 (80.95%) F=8 (19.05%)
```
* No printing output version:
```
riscv-none-embed-gcc -march=rv32i -mabi=ilp32 -nostdlib -O3 -Wall -std=gnu99 -o BubbleSort BubbleSort.c
./emu-rv32i BubbleSort
>>> Execution time: 883 ns
>>> Instruction count: 3 (IPS=3397508)
>>> Jumps: 1 (33.33%) - 0 forwards, 1 backwards
>>> Branching T=0 (-nan%) F=0 (-nan%)
```
---
### objdump with -O3 not print out the result
I see the generated code which print out the result will make it much longer. So i decide not to print that code here but i will analyse it in the table at the end of this page.
```riscv-none-embed-objdump -d BubbleSort```
```
00010054 <BBSort>:
10054: 02b05a63 blez a1,10088 <BBSort+0x34>
10058: 00259613 slli a2,a1,0x2
1005c: 00c50633 add a2,a0,a2
10060: 00050793 mv a5,a0
10064: 0007a703 lw a4,0(a5)
10068: 0047a683 lw a3,4(a5)
1006c: 00d75663 bge a4,a3,10078 <BBSort+0x24>
10070: 00d7a023 sw a3,0(a5)
10074: 00e7a223 sw a4,4(a5)
10078: 00478793 addi a5,a5,4
1007c: fec794e3 bne a5,a2,10064 <BBSort+0x10>
10080: ffc60613 addi a2,a2,-4
10084: fcc51ee3 bne a0,a2,10060 <BBSort+0xc>
10088: 00008067 ret
0001008c <_start>:
1008c: 00000513 li a0,0
10090: 00008067 ret
```
---
### readelf
```riscv-none-embed-readelf -h BubbleSort```
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x1008c
Start of program headers: 52 (bytes into file)
Start of section headers: 556 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 1
Size of section headers: 40 (bytes)
Number of section headers: 6
Section header string table index: 5
```
---
### size
```riscv-none-embed-size BubbleSort```
```
text data bss dec hex filename
64 0 0 64 40 BubbleSort
```
---
:::success
### Optimization comparison
Because of printing out the result will make the code much more longer and the this HW require rewrite a code from another student. However, the code of 王傑世同學 will print out the sorted array after execution so i make two tables of comparison: printing result table and not printing result table.
#### Not printing out result table:
| Compile | without optimization | with -O1 | with -O2 | with -O3 |
| ----------------- | -------- | -------------------- | --- | ---------------- |
| Execution Time(ns)| 31320 | 7440 | 6878| 883 |
| Total instruction | 516 | 126 | 106| 3 |
| Jump | 23 | 23 | 13| 1 |
| Jump forwards | 7 | 7 | 2 | 0 |
| Jump backwards | 16 | 16 | 11| 1 |
| Branch True | 15 | 10 | 10| 0 |
| Branch False | 14 | 19 | 15 | 0 |
#### Printing out result table:
| Compile | without optimization | with -O1 | with -O2 | with -O3 |
| ----------------- | -------- | -------------------- | --- | ---------------- |
| Execution Time(ns)| 138693 | 97567 |76322| 87677 |
| Total instruction | 948 | 300 |279| 227 |
| Jump | 62 | 58 |48| 37 |
| Jump forwards | 9 | 7 |2 | 3 |
| Jump backwards | 53 | 51 |46| 34 |
| Branch True | 52 | 45 |45| 34 |
| Branch False | 16 | 21 |17| 8 |
:::