# [Assignment2: RISC-V Toolchain](https://hackmd.io/@sysprog/2022-arch-homework2)
reference: [黃冠予](https://hackmd.io/@ZLQisilvQvOh2DclLmk1bg/SyjKI7sZi#Optimization) [leetcode1720](https://leetcode.com/problems/decode-xored-array/)
Motivation: While tracing his assembly code, I found some useless instructions, like the ones below (line 1 ~ line 5). Also, the C version uses malloc to get memory for the result, but the assembly version stores the result starting at address 0, which I think is not safe.
Some useless instructions:
```c
index_zero:
addi a4, x0, 0 // a4 = 0
print:
slli t0, a4, 2 // t0 = 0 << 2 = 0
lw a0, 0(t0) // load address(t0)
li a4,0
sw t0,0(a4) #result[0] = first
```
In Ripes, he uses address 0 (the .text section) to store the result, so his program actually overwrites the .text section.
The memory map shows that addresses 0x00000000 - 0x000000f8 belong to the .text section.

Before execution

After executing Q1

## original code
### c
```c
int* decode(int* encoded, int encodedSize, int first, int* returnSize){
int* result = (int*)malloc(sizeof(int)*(encodedSize+1));
result[0]=first;
for(int i=0;i<encodedSize;i++){
result[i+1]=result[i] ^ encoded[i];
}
*returnSize = encodedSize+1;
return result;
}
```
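
The decode above relies on the problem's definition `encoded[i] = arr[i] ^ arr[i + 1]`: XOR-ing both sides with `arr[i]` gives `arr[i + 1] = arr[i] ^ encoded[i]`. A minimal sketch of that round trip follows. The names `encode` and `decode_no_malloc` are hypothetical helpers introduced only for this illustration, and the output goes to a caller-provided array so the sketch needs no malloc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not in the original note): builds encoded[] from arr[]
 * using the problem's definition encoded[i] = arr[i] ^ arr[i + 1]. */
void encode(const int *arr, size_t arr_size, int *encoded)
{
    assert(arr_size >= 1);                 /* need at least arr[0] */
    for (size_t i = 0; i + 1 < arr_size; i++)
        encoded[i] = arr[i] ^ arr[i + 1];
}

/* Same recurrence as decode() above, but writing into a caller-provided
 * array. out must have room for encoded_size + 1 ints. */
void decode_no_malloc(const int *encoded, size_t encoded_size, int first, int *out)
{
    out[0] = first;
    for (size_t i = 0; i < encoded_size; i++)
        out[i + 1] = out[i] ^ encoded[i];  /* arr[i+1] = arr[i] ^ encoded[i] */
}
```

For example, encoding `{1, 0, 2, 1}` gives `{1, 2, 3}`, which is the first test input used in this note, and decoding it with `first = 1` restores the original array.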
### assembly code
```c
.data
num1: .word 1,2,3
numSize1: .word 3
first1: .word 1
num2: .word 6,2,7,3
numSize2: .word 4
first2: .word 4
num3: .word 5,6,7,8,9
numSize3: .word 5
first3: .word 2
nextline: .string "\n"
space: .string " "
.text
main:
la a1,num1 #get num1 address
la a2,numSize1 #get numSize1 address
la a3,first1 #get first1 address
lw a2,0(a2) #a2 = numSize1 = 3
lw a3,0(a3) #a3 = first1 = 1
li a4,0 #int i = 0
li a5,0 #int count = 0
li a6,3 #input set =3
jal ra,decode.L1 #goto decode.L1
la a1,num2 #get num2 address
la a2,numSize2 #get numSize2 address
la a3,first2 #get first2 address
lw a2,0(a2) #a2 = numSize2 = 4
lw a3,0(a3) #a3 = first2 = 4
jal ra,decode.L1 #goto decode.L1
la a1,num3 #get num3 address
la a2,numSize3 #get numSize3 address
la a3,first3 #get first3 address
lw a2,0(a2) #a2 = numSize3 = 5
lw a3,0(a3) #a3 = first3 = 2
decode.L1:
add t0,a3,x0
sw t0,0(a4) #result[0] = first
decode.L2: #loop
beq a4,a2,index_zero #if i=numSize then goto index_zero
slli t0,a4,2 #a4*4
add t2,t0,a1 #get num1[i]
lw t1,0(t2) #get num1[i]
lw t2,0(t0) #get result[i]
xor t1,t1,t2 #result[i] ^ num[i]
addi t2,t0,4 #a4*4,k = i + 1
sw t1,0(t2) #result[k] = result[i] ^ num[i]
addi a4,a4,1 #i++
j decode.L2 #goto decode.L2
index_zero:
addi a4,x0,0 #i=0
print:
slli t0,a4,2 #a4*4
lw a0,0(t0) #get result[i]
li a7,1 #systemcall print
ecall #execute
addi a4,a4,1 #i++
la a0,space #print " "
li a7,4
ecall
ble a4,a2,print #if i<numSize then goto print
la a0,nextline #nextline
li a7,4
ecall
addi a5,a5,1 #count++
beq a5,a6,exit #excute all dataset
li a4,0 #int i = 0
jr ra
exit:
li a7,10
ecall
```
## My_Improvements
### c
I added test data and the extra code (main and an output function) needed to make it executable.
```c
#include <stdio.h>
#include <stdlib.h>
void output(int* result, int result_size) {
for (int i = 0; i < result_size; i++) {
printf("%d ", result[i]);
}
printf("\n");
return;
}
int* decode(int* encoded, int encodedSize, int first, int* returnSize){
int* result = (int*)malloc(sizeof(int)*(encodedSize+1));
result[0]=first;
for(int i=0;i<encodedSize;i++){
result[i+1]=result[i] ^ encoded[i];
}
*returnSize = encodedSize+1;
return result;
}
int main()
{
int num1[] = {1,2,3};
int num2[] = {6,2,7,3};
int num3[] = {5,6,7,8,9};
int num1_size = 3;
int num2_size = 4;
int num3_size = 5;
int first_1 = 1;
int first_2 = 4;
int first_3 = 2;
int *result = NULL;
int result_size = 0;
result = decode(num1, num1_size, first_1, &result_size);
output(result, result_size);
result = decode(num2, num2_size, first_2, &result_size);
output(result, result_size);
result = decode(num3, num3_size, first_3, &result_size);
output(result, result_size);
}
```
### handwritten assembly code
I use a buffer to store the result, and try to keep the same execution steps.
Improvements:
1. Replace the index counter with an end address (address of arry[0] + arry_size), so the loop does not need a separate counter++
```c
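int arry[3] = {0, 0, 0};
int arry_size = 3;
/* before: index counter */
int i = 0;
while (i < arry_size) {
    /* do something with arry[i] */
    i++;
}
/* after: compare the pointer against the end address */
int *p = arry;
int *end = arry + arry_size;
while (p != end) {
    /* do something with *p */
    p++;
}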
```
2. Change the branch condition so the loop does not need the "j label" instruction
```c
#before, using beq and j instruction
label:
beq a0, a1, exit
do something
j label
exit:
do something
#after, only use bne instruction
label:
do something
bne a0, a1, label
exit:
do something
```
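
The two improvements above can be sketched together in C. This is only an illustration of the loop shapes, not the exact code that was assembled: `decode_indexed` and `decode_end_pointer` are hypothetical names, and both write into a caller-provided array. Note that the bottom-tested form always runs the body once, so it assumes at least one input element, which holds for all three test inputs in this note.

```c
#include <assert.h>

/* Before: index counter, condition tested at the top (beq ... / j label). */
void decode_indexed(const int *encoded, int n, int first, int *result)
{
    result[0] = first;
    for (int i = 0; i < n; i++)
        result[i + 1] = result[i] ^ encoded[i];
}

/* After: end pointer instead of a counter, condition tested at the bottom
 * (single bne). The body always runs once, so n must be >= 1. */
void decode_end_pointer(const int *encoded, int n, int first, int *result)
{
    assert(n >= 1);
    const int *end = encoded + n;   /* one past the last input word */
    int value = first;              /* running XOR value */
    *result = value;
    do {
        value ^= *encoded++;        /* load input word, advance input pointer */
        *++result = value;          /* advance output pointer, store result */
    } while (encoded != end);
}
```

Both functions produce the same output for non-empty input; the second one needs one branch per iteration instead of a branch plus a jump.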
Execution steps: load q1 -> decode.L1 -> decode.L2 -> index_zero -> print -> return -> load q2......
```c
.data
buffer: .word 0,0,0,0,0,0,0
num1: .word 1,2,3
numSize1: .word 3
first1: .word 1
num2: .word 6,2,7,3
numSize2: .word 4
first2: .word 4
num3: .word 5,6,7,8,9
numSize3: .word 5
first3: .word 2
nextline: .string "\n"
space: .string " "
.text
main:
la a1, num1 #get num1 address
la a2, numSize1 #get numSize1 address
la a3, first1 #get first1 address
lw a2, 0(a2) #a2 = numSize1 = 3
lw a3, 0(a3) #a3 = first1 = 1
jal ra, decode.L1 #goto decode.L1
la a1,num2 #get num2 address
la a2,numSize2 #get numSize2 address
la a3,first2 #get first2 address
lw a2,0(a2) #a2 = numSize2 = 4
lw a3,0(a3) #a3 = first2 = 4
jal ra,decode.L1 #goto decode.L1
la a1,num3 #get num3 address
la a2,numSize3 #get numSize3 address
la a3,first3 #get first3 address
lw a2,0(a2) #a2 = numSize3 = 5
lw a3,0(a3) #a3 = first3 = 2
jal ra,decode.L1 #goto decode.L1
li a7,10
ecall
decode.L1:
slli a2, a2, 2 #a2 = numSize * 4(size of int)
la a0, buffer #get address buffer
sw a3,0(a0) #result[0] = first
add t0, a1, a2 #t0 = num end = address of num[0] + 4 * numSize
decode.L2: #loop
lw t1,0(a1) #get num[i]
addi a0, a0, 4 #result++
xor a3, a3, t1 #a3 = result[i] ^ num[i]
addi a1, a1, 4 #num++
sw a3,0(a0) #result[i + 1] = result[i] ^ num[i]
bne t0, a1, decode.L2 #if num pointer != num end then goto decode.L2
index_zero:
la a1, buffer #a1 = address of buffer[0]
addi a2, a2, 4 #a2 = bufferSize * 4 = (numSize + 1) * 4
add t0, a1, a2 #t0 = buffer end = address of buffer[0] + 4 * bufferSize
print:
lw a0,0(a1) #get result[i]
addi a1, a1, 4 #buffer++
li a7,1 #systemcall print
ecall #execute
la a0, space #print " "
li a7, 4
ecall
bne a1,t0,print #if a1 not equal buffer end goto print
la a0, nextline #nextline
li a7,4
ecall
jr ra #return to caller
```
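
The fixed `buffer` in the listing above holds 7 words, while the largest test input (numSize = 5) needs numSize + 1 = 6 words, so all three inputs fit. A rough C model of that design is sketched below. The capacity check and the name `decode_to_buffer` are additions for illustration only; the assembly itself performs no such check.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_WORDS 7            /* matches "buffer: .word 0,0,0,0,0,0,0" */

static int buffer[BUFFER_WORDS];  /* shared result area, overwritten by every call */

/* Returns the number of words written, or 0 if the result would not fit.
 * The capacity check is an assumption added here; the assembly has none. */
size_t decode_to_buffer(const int *num, size_t num_size, int first)
{
    assert(num != NULL);
    if (num_size + 1 > BUFFER_WORDS)
        return 0;
    buffer[0] = first;
    for (size_t i = 0; i < num_size; i++)
        buffer[i + 1] = buffer[i] ^ num[i];
    return num_size + 1;
}
```

Because the buffer is reused, each result has to be printed before the next decode call, which is the order the assembly follows.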
output

## compare original code and My_Improvement
### performance on Ripes
original code


My_Improvement
I got fewer execution cycles and fewer retired instructions.


### TO DO
- [ ] Make the original and handwritten assembly code run on rv32emu
## result with GNU Toolchain
### O0
#### compile
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -O0 -o temp.elf temp.c
#### objdump
The assembly code is too long to show in full, so I only show the main, decode and output functions.
```
$ riscv-none-elf-objdump -d temp.elf
00010184 <output>:
10184: fd010113 addi sp,sp,-48
10188: 02112623 sw ra,44(sp)
1018c: 02812423 sw s0,40(sp)
10190: 03010413 addi s0,sp,48
10194: fca42e23 sw a0,-36(s0)
10198: fcb42c23 sw a1,-40(s0)
1019c: fe042623 sw zero,-20(s0)
101a0: 0340006f j 101d4 <output+0x50>
101a4: fec42783 lw a5,-20(s0)
101a8: 00279793 slli a5,a5,0x2
101ac: fdc42703 lw a4,-36(s0)
101b0: 00f707b3 add a5,a4,a5
101b4: 0007a783 lw a5,0(a5)
101b8: 00078593 mv a1,a5
101bc: 000227b7 lui a5,0x22
101c0: ad078513 addi a0,a5,-1328 # 21ad0 <__clzsi2+0x8c>
101c4: 435000ef jal ra,10df8 <printf>
101c8: fec42783 lw a5,-20(s0)
101cc: 00178793 addi a5,a5,1
101d0: fef42623 sw a5,-20(s0)
101d4: fec42703 lw a4,-20(s0)
101d8: fd842783 lw a5,-40(s0)
101dc: fcf744e3 blt a4,a5,101a4 <output+0x20>
101e0: 00a00513 li a0,10
101e4: 469000ef jal ra,10e4c <putchar>
101e8: 00000013 nop
101ec: 02c12083 lw ra,44(sp)
101f0: 02812403 lw s0,40(sp)
101f4: 03010113 addi sp,sp,48
101f8: 00008067 ret
000101fc <decode>:
101fc: fd010113 addi sp,sp,-48
10200: 02112623 sw ra,44(sp)
10204: 02812423 sw s0,40(sp)
10208: 03010413 addi s0,sp,48
1020c: fca42e23 sw a0,-36(s0)
10210: fcb42c23 sw a1,-40(s0)
10214: fcc42a23 sw a2,-44(s0)
10218: fcd42823 sw a3,-48(s0)
1021c: fd842783 lw a5,-40(s0)
10220: 00178793 addi a5,a5,1
10224: 00279793 slli a5,a5,0x2
10228: 00078513 mv a0,a5
1022c: 308000ef jal ra,10534 <malloc>
10230: 00050793 mv a5,a0
10234: fef42423 sw a5,-24(s0)
10238: fe842783 lw a5,-24(s0)
1023c: fd442703 lw a4,-44(s0)
10240: 00e7a023 sw a4,0(a5)
10244: fe042623 sw zero,-20(s0)
10248: 0540006f j 1029c <decode+0xa0>
1024c: fec42783 lw a5,-20(s0)
10250: 00279793 slli a5,a5,0x2
10254: fe842703 lw a4,-24(s0)
10258: 00f707b3 add a5,a4,a5
1025c: 0007a683 lw a3,0(a5)
10260: fec42783 lw a5,-20(s0)
10264: 00279793 slli a5,a5,0x2
10268: fdc42703 lw a4,-36(s0)
1026c: 00f707b3 add a5,a4,a5
10270: 0007a703 lw a4,0(a5)
10274: fec42783 lw a5,-20(s0)
10278: 00178793 addi a5,a5,1
1027c: 00279793 slli a5,a5,0x2
10280: fe842603 lw a2,-24(s0)
10284: 00f607b3 add a5,a2,a5
10288: 00e6c733 xor a4,a3,a4
1028c: 00e7a023 sw a4,0(a5)
10290: fec42783 lw a5,-20(s0)
10294: 00178793 addi a5,a5,1
10298: fef42623 sw a5,-20(s0)
1029c: fec42703 lw a4,-20(s0)
102a0: fd842783 lw a5,-40(s0)
102a4: faf744e3 blt a4,a5,1024c <decode+0x50>
102a8: fd842783 lw a5,-40(s0)
102ac: 00178713 addi a4,a5,1
102b0: fd042783 lw a5,-48(s0)
102b4: 00e7a023 sw a4,0(a5)
102b8: fe842783 lw a5,-24(s0)
102bc: 00078513 mv a0,a5
102c0: 02c12083 lw ra,44(sp)
102c4: 02812403 lw s0,40(sp)
102c8: 03010113 addi sp,sp,48
102cc: 00008067 ret
000102d0 <main>:
102d0: fa010113 addi sp,sp,-96
102d4: 04112e23 sw ra,92(sp)
102d8: 04812c23 sw s0,88(sp)
102dc: 06010413 addi s0,sp,96
102e0: 00100793 li a5,1
102e4: fcf42423 sw a5,-56(s0)
102e8: 00200793 li a5,2
102ec: fcf42623 sw a5,-52(s0)
102f0: 00300793 li a5,3
102f4: fcf42823 sw a5,-48(s0)
102f8: 000227b7 lui a5,0x22
102fc: ad478793 addi a5,a5,-1324 # 21ad4 <__clzsi2+0x90>
10300: 0007a603 lw a2,0(a5)
10304: 0047a683 lw a3,4(a5)
10308: 0087a703 lw a4,8(a5)
1030c: 00c7a783 lw a5,12(a5)
10310: fac42c23 sw a2,-72(s0)
10314: fad42e23 sw a3,-68(s0)
10318: fce42023 sw a4,-64(s0)
1031c: fcf42223 sw a5,-60(s0)
10320: 000227b7 lui a5,0x22
10324: ae478793 addi a5,a5,-1308 # 21ae4 <__clzsi2+0xa0>
10328: 0007a583 lw a1,0(a5)
1032c: 0047a603 lw a2,4(a5)
10330: 0087a683 lw a3,8(a5)
10334: 00c7a703 lw a4,12(a5)
10338: 0107a783 lw a5,16(a5)
1033c: fab42223 sw a1,-92(s0)
10340: fac42423 sw a2,-88(s0)
10344: fad42623 sw a3,-84(s0)
10348: fae42823 sw a4,-80(s0)
1034c: faf42a23 sw a5,-76(s0)
10350: 00300793 li a5,3
10354: fef42623 sw a5,-20(s0)
10358: 00400793 li a5,4
1035c: fef42423 sw a5,-24(s0)
10360: 00500793 li a5,5
10364: fef42223 sw a5,-28(s0)
10368: 00100793 li a5,1
1036c: fef42023 sw a5,-32(s0)
10370: 00400793 li a5,4
10374: fcf42e23 sw a5,-36(s0)
10378: 00200793 li a5,2
1037c: fcf42c23 sw a5,-40(s0)
10380: fc042a23 sw zero,-44(s0)
10384: fa042023 sw zero,-96(s0)
10388: fa040713 addi a4,s0,-96
1038c: fc840793 addi a5,s0,-56
10390: 00070693 mv a3,a4
10394: fe042603 lw a2,-32(s0)
10398: fec42583 lw a1,-20(s0)
1039c: 00078513 mv a0,a5
103a0: e5dff0ef jal ra,101fc <decode>
103a4: fca42a23 sw a0,-44(s0)
103a8: fa042783 lw a5,-96(s0)
103ac: 00078593 mv a1,a5
103b0: fd442503 lw a0,-44(s0)
103b4: dd1ff0ef jal ra,10184 <output>
103b8: fa040713 addi a4,s0,-96
103bc: fb840793 addi a5,s0,-72
103c0: 00070693 mv a3,a4
103c4: fdc42603 lw a2,-36(s0)
103c8: fe842583 lw a1,-24(s0)
103cc: 00078513 mv a0,a5
103d0: e2dff0ef jal ra,101fc <decode>
103d4: fca42a23 sw a0,-44(s0)
103d8: fa042783 lw a5,-96(s0)
103dc: 00078593 mv a1,a5
103e0: fd442503 lw a0,-44(s0)
103e4: da1ff0ef jal ra,10184 <output>
103e8: fa040713 addi a4,s0,-96
103ec: fa440793 addi a5,s0,-92
103f0: 00070693 mv a3,a4
103f4: fd842603 lw a2,-40(s0)
103f8: fe442583 lw a1,-28(s0)
103fc: 00078513 mv a0,a5
10400: dfdff0ef jal ra,101fc <decode>
10404: fca42a23 sw a0,-44(s0)
10408: fa042783 lw a5,-96(s0)
1040c: 00078593 mv a1,a5
10410: fd442503 lw a0,-44(s0)
10414: d71ff0ef jal ra,10184 <output>
10418: 00000793 li a5,0
1041c: 00078513 mv a0,a5
10420: 05c12083 lw ra,92(sp)
10424: 05812403 lw s0,88(sp)
10428: 06010113 addi sp,sp,96
1042c: 00008067 ret
```
#### observation
main function
- stack use: 96 bytes
- registers used: a0 ~ a5, s0, ra, sp
- saves all variables (like num1[0], num1[1], first_1......) on the stack
- uses s0 as the base for stack addressing
- uses a4, a5 to compute addresses or load values from the stack
- uses a0, a1, a2, a3 to pass arguments to the decode function
- uses a0, a1 to pass arguments to the output function
- passes num1~3 as stack addresses rather than data-section addresses

decode function
- stack use: 48 bytes
- registers used: a0 ~ a5, s0, ra, sp
- uses a5 to hold the address returned by malloc

output function
- stack use: 48 bytes
- registers used: a0, a1, a4, a5, s0, ra, sp
#### rv32emu --stats
```
$ $HOME/rv32emu/build/rv32emu --stats temp.elf
1 0 2 1
4 2 0 7 4
2 7 1 6 14 7
inferior exit code 0
CSR cycle count: 10767
```
#### readelf
```
$ riscv-none-elf-readelf -h temp.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100dc
Start of program headers: 52 (bytes into file)
Start of section headers: 95236 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
#### elf size
```
$ riscv-none-elf-size temp.elf
text data bss dec hex filename
75776 2816 812 79404 1362c temp.elf
```
### Ofast
#### compile
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -Ofast -o temp_fast.elf temp.c
#### objdump
```
$ riscv-none-elf-objdump -d temp_fast.elf
000100c4 <main>:
100c4: 000227b7 lui a5,0x22
100c8: a0478793 addi a5,a5,-1532 # 21a04 <__clzsi2+0x90>
100cc: fb010113 addi sp,sp,-80
100d0: 0007af83 lw t6,0(a5)
100d4: 0047af03 lw t5,4(a5)
100d8: 0087ae83 lw t4,8(a5)
100dc: 00c7ae03 lw t3,12(a5)
100e0: 0107a303 lw t1,16(a5)
100e4: 0147a883 lw a7,20(a5)
100e8: 0187a803 lw a6,24(a5)
100ec: 01c7a703 lw a4,28(a5)
100f0: 00100293 li t0,1
100f4: 0207a783 lw a5,32(a5)
100f8: 00512823 sw t0,16(sp)
100fc: 00200293 li t0,2
10100: 00c10693 addi a3,sp,12
10104: 00100613 li a2,1
10108: 00512a23 sw t0,20(sp)
1010c: 00300593 li a1,3
10110: 00300293 li t0,3
10114: 01010513 addi a0,sp,16
10118: 04112623 sw ra,76(sp)
1011c: 00512c23 sw t0,24(sp)
10120: 01f12e23 sw t6,28(sp)
10124: 03e12023 sw t5,32(sp)
10128: 03d12223 sw t4,36(sp)
1012c: 03c12423 sw t3,40(sp)
10130: 02612623 sw t1,44(sp)
10134: 03112823 sw a7,48(sp)
10138: 03012a23 sw a6,52(sp)
1013c: 02e12c23 sw a4,56(sp)
10140: 02f12e23 sw a5,60(sp)
10144: 00012623 sw zero,12(sp)
10148: 180000ef jal ra,102c8 <decode>
1014c: 00c12583 lw a1,12(sp)
10150: 10c000ef jal ra,1025c <output>
10154: 00c10693 addi a3,sp,12
10158: 00400613 li a2,4
1015c: 00400593 li a1,4
10160: 01c10513 addi a0,sp,28
10164: 164000ef jal ra,102c8 <decode>
10168: 00c12583 lw a1,12(sp)
1016c: 0f0000ef jal ra,1025c <output>
10170: 00c10693 addi a3,sp,12
10174: 00200613 li a2,2
10178: 00500593 li a1,5
1017c: 02c10513 addi a0,sp,44
10180: 148000ef jal ra,102c8 <decode>
10184: 00c12583 lw a1,12(sp)
10188: 0d4000ef jal ra,1025c <output>
1018c: 04c12083 lw ra,76(sp)
10190: 00000513 li a0,0
10194: 05010113 addi sp,sp,80
10198: 00008067 ret
0001025c <output>:
1025c: 06b05263 blez a1,102c0 <output+0x64>
10260: fe010113 addi sp,sp,-32
10264: 00812c23 sw s0,24(sp)
10268: 00912a23 sw s1,20(sp)
1026c: 01212823 sw s2,16(sp)
10270: 01312623 sw s3,12(sp)
10274: 00112e23 sw ra,28(sp)
10278: 00058913 mv s2,a1
1027c: 00050413 mv s0,a0
10280: 00000493 li s1,0
10284: 000229b7 lui s3,0x22
10288: 00042583 lw a1,0(s0)
1028c: a0098513 addi a0,s3,-1536 # 21a00 <__clzsi2+0x8c>
10290: 00148493 addi s1,s1,1
10294: 295000ef jal ra,10d28 <printf>
10298: 00440413 addi s0,s0,4
1029c: fe9916e3 bne s2,s1,10288 <output+0x2c>
102a0: 01812403 lw s0,24(sp)
102a4: 01c12083 lw ra,28(sp)
102a8: 01412483 lw s1,20(sp)
102ac: 01012903 lw s2,16(sp)
102b0: 00c12983 lw s3,12(sp)
102b4: 00a00513 li a0,10
102b8: 02010113 addi sp,sp,32
102bc: 2c10006f j 10d7c <putchar>
102c0: 00a00513 li a0,10
102c4: 2b90006f j 10d7c <putchar>
000102c8 <decode>:
102c8: fe010113 addi sp,sp,-32
102cc: 01512223 sw s5,4(sp)
102d0: 00158a93 addi s5,a1,1
102d4: 01212823 sw s2,16(sp)
102d8: 002a9913 slli s2,s5,0x2
102dc: 00812c23 sw s0,24(sp)
102e0: 00050413 mv s0,a0
102e4: 00090513 mv a0,s2
102e8: 00912a23 sw s1,20(sp)
102ec: 01312623 sw s3,12(sp)
102f0: 01412423 sw s4,8(sp)
102f4: 00060493 mv s1,a2
102f8: 00112e23 sw ra,28(sp)
102fc: 00058993 mv s3,a1
10300: 00068a13 mv s4,a3
10304: 160000ef jal ra,10464 <malloc>
10308: 00952023 sw s1,0(a0)
1030c: 03305663 blez s3,10338 <decode+0x70>
10310: ffc90593 addi a1,s2,-4
10314: 00040793 mv a5,s0
10318: 00450713 addi a4,a0,4
1031c: 00b405b3 add a1,s0,a1
10320: 0007a603 lw a2,0(a5)
10324: 00470713 addi a4,a4,4
10328: 00478793 addi a5,a5,4
1032c: 00c4c4b3 xor s1,s1,a2
10330: fe972e23 sw s1,-4(a4)
10334: fef596e3 bne a1,a5,10320 <decode+0x58>
10338: 01c12083 lw ra,28(sp)
1033c: 01812403 lw s0,24(sp)
10340: 015a2023 sw s5,0(s4)
10344: 01412483 lw s1,20(sp)
10348: 01012903 lw s2,16(sp)
1034c: 00c12983 lw s3,12(sp)
10350: 00812a03 lw s4,8(sp)
10354: 00412a83 lw s5,4(sp)
10358: 02010113 addi sp,sp,32
1035c: 00008067 ret
```
#### observation
main function
- stack use: 80 bytes
- saves all variables (like num1[0], num1[1], first_1......) on the stack
- registers used: t0, t1, t3 ~ t6, a0 ~ a7, ra, sp
- loads the data-section address only once, because num2 and num3 are contiguous in the data section
- num1, num2 and num3 are also stored contiguously on the stack
- uses sp for stack addressing

decode function
- stack use: 32 bytes
- registers used: a0 ~ a5, s0 ~ s5, ra, sp
- the loop condition is address != address_end

output function
- stack use: 32 bytes
- registers used: a0, a1, s0 ~ s3, ra, sp
- uses more registers instead of the stack
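
The loop that -Ofast generates for decode (addresses 1030c ~ 10334 in the listing above) can be read back into C roughly as follows. This is a hand-written reconstruction for illustration, with the hypothetical name `decode_ofast_shape`; the register names in the comments are taken from the objdump output. Unlike a plain bottom-tested loop, the compiler keeps a guard (`blez`) in front, so an empty input is still handled.

```c
#include <assert.h>
#include <stdlib.h>

int *decode_ofast_shape(const int *encoded, int encoded_size, int first,
                        int *return_size)
{
    int *result = malloc(sizeof(int) * (size_t)(encoded_size + 1));
    assert(result != NULL);                        /* sketch only: no error path */
    result[0] = first;                             /* sw s1,0(a0) */
    if (encoded_size > 0) {                        /* blez s3: skip loop if empty */
        const int *p = encoded;                    /* a5 */
        const int *end = encoded + encoded_size;   /* a1 */
        int *out = result + 1;                     /* a4 */
        do {
            first ^= *p++;                         /* lw a2,0(a5); xor s1,s1,a2 */
            *out++ = first;                        /* sw s1,-4(a4) */
        } while (p != end);                        /* bne a1,a5 */
    }
    *return_size = encoded_size + 1;               /* sw s5,0(s4) */
    return result;
}
```

The running XOR value stays in one register (s1), so result[i] is never loaded back from memory inside the loop.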
#### rv32emu --stats
```shell
$ $HOME/rv32emu/build/rv32emu --stats temp_fast.elf
1 0 2 1
4 2 0 7 4
2 7 1 6 14 7
inferior exit code 0
CSR cycle count: 10403
```
#### readelf
```shell
$ riscv-none-elf-readelf -h temp_fast.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x101b4
Start of program headers: 52 (bytes into file)
Start of section headers: 95252 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
#### elf size
```shell
$ riscv-none-elf-size temp_fast.elf
text data bss dec hex filename
75568 2816 812 79196 1355c temp_fast.elf
```
### Os
#### compile
```shell
riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -Os -o temp_store.elf temp.c
```
#### objdump
```shell
$ riscv-none-elf-objdump -d temp_store.elf
000100c4 <main>:
100c4: fb010113 addi sp,sp,-80
100c8: 000225b7 lui a1,0x22
100cc: 04812423 sw s0,72(sp)
100d0: 00c00613 li a2,12
100d4: 9cc58413 addi s0,a1,-1588 # 219cc <__clzsi2+0x8c>
100d8: 01010513 addi a0,sp,16
100dc: 9cc58593 addi a1,a1,-1588
100e0: 04112623 sw ra,76(sp)
100e4: 2e5000ef jal ra,10bc8 <memcpy>
100e8: 00c40593 addi a1,s0,12
100ec: 01000613 li a2,16
100f0: 01c10513 addi a0,sp,28
100f4: 2d5000ef jal ra,10bc8 <memcpy>
100f8: 01c40593 addi a1,s0,28
100fc: 01400613 li a2,20
10100: 02c10513 addi a0,sp,44
10104: 2c5000ef jal ra,10bc8 <memcpy>
10108: 00c10693 addi a3,sp,12
1010c: 00100613 li a2,1
10110: 00300593 li a1,3
10114: 01010513 addi a0,sp,16
10118: 00012623 sw zero,12(sp)
1011c: 180000ef jal ra,1029c <decode>
10120: 00c12583 lw a1,12(sp)
10124: 110000ef jal ra,10234 <output>
10128: 00c10693 addi a3,sp,12
1012c: 00400613 li a2,4
10130: 00400593 li a1,4
10134: 01c10513 addi a0,sp,28
10138: 164000ef jal ra,1029c <decode>
1013c: 00c12583 lw a1,12(sp)
10140: 0f4000ef jal ra,10234 <output>
10144: 00c10693 addi a3,sp,12
10148: 00200613 li a2,2
1014c: 00500593 li a1,5
10150: 02c10513 addi a0,sp,44
10154: 148000ef jal ra,1029c <decode>
10158: 00c12583 lw a1,12(sp)
1015c: 0d8000ef jal ra,10234 <output>
10160: 04c12083 lw ra,76(sp)
10164: 04812403 lw s0,72(sp)
10168: 00000513 li a0,0
1016c: 05010113 addi sp,sp,80
10170: 00008067 ret
00010234 <output>:
10234: fe010113 addi sp,sp,-32
10238: 00812c23 sw s0,24(sp)
1023c: 00912a23 sw s1,20(sp)
10240: 01212823 sw s2,16(sp)
10244: 01312623 sw s3,12(sp)
10248: 00112e23 sw ra,28(sp)
1024c: 00050913 mv s2,a0
10250: 00058493 mv s1,a1
10254: 00000413 li s0,0
10258: 000229b7 lui s3,0x22
1025c: 02944263 blt s0,s1,10280 <output+0x4c>
10260: 01812403 lw s0,24(sp)
10264: 01c12083 lw ra,28(sp)
10268: 01412483 lw s1,20(sp)
1026c: 01012903 lw s2,16(sp)
10270: 00c12983 lw s3,12(sp)
10274: 00a00513 li a0,10
10278: 02010113 addi sp,sp,32
1027c: 4710006f j 10eec <putchar>
10280: 00241793 slli a5,s0,0x2
10284: 00f907b3 add a5,s2,a5
10288: 0007a583 lw a1,0(a5)
1028c: 9c898513 addi a0,s3,-1592 # 219c8 <__clzsi2+0x88>
10290: 00140413 addi s0,s0,1
10294: 405000ef jal ra,10e98 <printf>
10298: fc5ff06f j 1025c <output+0x28>
0001029c <decode>:
1029c: fe010113 addi sp,sp,-32
102a0: 01412423 sw s4,8(sp)
102a4: 00158a13 addi s4,a1,1
102a8: 01212823 sw s2,16(sp)
102ac: 00050913 mv s2,a0
102b0: 002a1513 slli a0,s4,0x2
102b4: 00812c23 sw s0,24(sp)
102b8: 00912a23 sw s1,20(sp)
102bc: 01312623 sw s3,12(sp)
102c0: 00112e23 sw ra,28(sp)
102c4: 00060993 mv s3,a2
102c8: 00058413 mv s0,a1
102cc: 00068493 mv s1,a3
102d0: 160000ef jal ra,10430 <malloc>
102d4: 01352023 sw s3,0(a0)
102d8: 00050713 mv a4,a0
102dc: 00000793 li a5,0
102e0: 00470713 addi a4,a4,4
102e4: 0287c463 blt a5,s0,1030c <decode+0x70>
102e8: 01c12083 lw ra,28(sp)
102ec: 01812403 lw s0,24(sp)
102f0: 0144a023 sw s4,0(s1)
102f4: 01012903 lw s2,16(sp)
102f8: 01412483 lw s1,20(sp)
102fc: 00c12983 lw s3,12(sp)
10300: 00812a03 lw s4,8(sp)
10304: 02010113 addi sp,sp,32
10308: 00008067 ret
1030c: 00279693 slli a3,a5,0x2
10310: 00d906b3 add a3,s2,a3
10314: 0006a683 lw a3,0(a3)
10318: ffc72603 lw a2,-4(a4)
1031c: 00178793 addi a5,a5,1
10320: 00c6c6b3 xor a3,a3,a2
10324: 00d72023 sw a3,0(a4)
10328: fb9ff06f j 102e0 <decode+0x44>
```
#### observation
main function
- stack use: 80 bytes
- saves all variables (like num1[0], num1[1], first_1......) on the stack
- registers used: a0 ~ a3, s0, ra, sp
- uses memcpy to copy num1, num2 and num3 onto the stack
- loads the data-section address only once, because num1, num2 and num3 are contiguous in the data section
- num1, num2 and num3 are also stored contiguously on the stack
- uses sp for stack addressing

decode function
- stack use: 32 bytes
- registers used: a0 ~ a5, s0 ~ s4, ra, sp
- the loop condition compares the index with the size (blt a5,s0) instead of address != address_end

output function
- stack use: 32 bytes
- registers used: a0, a1, a5, s0 ~ s3, ra, sp
- calls more functions (memcpy) instead of using more registers
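
The three memcpy calls in the -Os main can be pictured in C as below. This is a sketch under the assumption that the constant data for num1, num2 and num3 sits back to back, as the source offsets 0, 12 and 28 in the listing suggest. `init_data` and `init_arrays` are hypothetical names used only here.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the constant data placed in the data section:
 * num1, num2 and num3 back to back (3 + 4 + 5 words). */
static const int init_data[12] = {1, 2, 3, 6, 2, 7, 3, 5, 6, 7, 8, 9};

void init_arrays(int num1[3], int num2[4], int num3[5])
{
    assert(num1 != NULL && num2 != NULL && num3 != NULL);
    memcpy(num1, init_data,     3 * sizeof(int));  /* 12 bytes from base + 0  */
    memcpy(num2, init_data + 3, 4 * sizeof(int));  /* 16 bytes from base + 12 */
    memcpy(num3, init_data + 7, 5 * sizeof(int));  /* 20 bytes from base + 28 */
}
```

Each call costs only a few instructions in main, which is smaller than the load/store sequence that -Ofast emits, at the price of three extra function calls.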
#### rv32emu --stats
```shell
$ $HOME/rv32emu/build/rv32emu --stats temp_store.elf
1 0 2 1
4 2 0 7 4
2 7 1 6 14 7
inferior exit code 0
CSR cycle count: 10528
```
#### readelf
```
$ riscv-none-elf-readelf -h temp_store.elf
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x1018c
Start of program headers: 52 (bytes into file)
Start of section headers: 95252 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
#### elf size
```shell
$ riscv-none-elf-size temp_store.elf
text data bss dec hex filename
75532 2816 812 79160 13538 temp_store.elf
```
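
To compare the three builds, the cycle counts from rv32emu and the text sizes from riscv-none-elf-size reported above can be put side by side. The helper below is a small sketch with hypothetical names (`build_stats`, `saved_permille`, `print_comparison`); it only does integer arithmetic on the numbers already shown in this note.

```c
#include <assert.h>
#include <stdio.h>

struct build_stats {
    const char *flag;
    long cycles;      /* "CSR cycle count" from rv32emu --stats */
    long text_bytes;  /* "text" column from riscv-none-elf-size */
};

static const struct build_stats builds[3] = {
    {"-O0",    10767, 75776},
    {"-Ofast", 10403, 75568},
    {"-Os",    10528, 75532},
};

/* Saving of value relative to base, in parts per thousand (truncated). */
long saved_permille(long base, long value)
{
    assert(base > 0);
    return (base - value) * 1000 / base;
}

/* Prints each build's savings relative to the -O0 build. */
void print_comparison(void)
{
    for (int i = 0; i < 3; i++)
        printf("%-6s cycles saved: %ld/1000, text saved: %ld/1000\n",
               builds[i].flag,
               saved_permille(builds[0].cycles, builds[i].cycles),
               saved_permille(builds[0].text_bytes, builds[i].text_bytes));
}
```

Relative to -O0, -Ofast saves about 3.3% of the cycles and -Os about 2.2%, while the text size shrinks by less than 0.5% in both cases. Most of the cycles and code size appear to come from the C library (printf, malloc) rather than from decode itself, which would explain the small differences.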
### question
#### question:
When I compare O0, Os and Ofast, I find that "num1" is not stored in the data section when using O0 and Ofast: the compiler uses "li" instructions to get the values of "num1", but uses "lui" and "lw" instructions to get the values of "num2" and "num3".
I used the command below to get the assembly code. I don't know why the compiler uses two ways to store the arrays in O0 and Ofast.
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -Os -S
[c code](https://hackmd.io/7JFyoJ59R7m7aFopLthAKA?view#c1)
```
int num1[] = {1,2,3};
int num2[] = {6,2,7,3};
int num3[] = {5,6,7,8,9};
```
in O0
```
# temp.c:23: int num1[] = {1,2,3};
li a5,1 # tmp77,
sw a5,-56(s0) # tmp77, num1[0]
li a5,2 # tmp78,
sw a5,-52(s0) # tmp78, num1[1]
li a5,3 # tmp79,
sw a5,-48(s0) # tmp79, num1[2]
# temp.c:24: int num2[] = {6,2,7,3};
lui a5,%hi(.LC0) # tmp80,
addi a5,a5,%lo(.LC0) # tmp81, tmp80,
lw a2,0(a5) # tmp82,
lw a3,4(a5) # tmp83,
lw a4,8(a5) # tmp84,
lw a5,12(a5) # tmp85,
sw a2,-72(s0) # tmp82, num2
sw a3,-68(s0) # tmp83, num2
sw a4,-64(s0) # tmp84, num2
sw a5,-60(s0) # tmp85, num2
.LC0:
.word 6
.word 2
.word 7
.word 3
.align 2
.LC1:
.word 5
.word 6
.word 7
.word 8
.word 9
.text
.align 2
.globl main
.type main, @function
```
in Ofast
```
.LC0:
.word 6
.word 2
.word 7
.word 3
.LC1:
.word 5
.word 6
.word 7
.word 8
.word 9
.ident "GCC: (xPack GNU RISC-V Embedded GCC x86_64) 12.2.0"
```
in Os
```
.LC0:
.word 1
.word 2
.word 3
.LC1:
.word 6
.word 2
.word 7
.word 3
.LC2:
.word 5
.word 6
.word 7
.word 8
.word 9
.ident "GCC: (xPack GNU RISC-V Embedded GCC x86_64) 12.2.0"
```