Before Start
We need to install the RISC-V toolchain on our virtual machine or computer and add it to the system PATH variable. The instructions in lab2 do not change PATH permanently, so we would have to source riscv-none-elf-gcc/setenv every time we log in. I find that quite annoying, so I modified ~/.profile to add the toolchain to the user PATH automatically at each login.
If you followed the instructions, the riscv-none-elf-gcc directory should be under "/home/YourUserName", so the following instruction should be added to ~/.profile:
Add this instruction to ~/.profile. It checks whether the toolchain directory exists and, if it does, adds it to the user PATH.
Then you can restart the terminal or run source ~/.profile to update the PATH variable.
Check $PATH
You should be able to see this in your path variable:
The following is my result:
I chose Concatenation of Array by tonych1997.
Motivation: My homework 1 was an exercise in reducing array elements, and this time I want to practice growing an array in assembly code. Also, tonych1997 wrote enough comments that the assembly code is easy to understand.
Before I started rewriting the assembly program, I ran into a problem. In the emulator's system call implementation, the syscall_write function always outputs the input data byte by byte, so the emulator cannot print a 32-bit integer unless we first convert it into a string.
In rv32emu/src/state.h we can see that the files this emulator opens by default are stdin, stdout, and stderr, in that order, so the sample code in rv32emu/tests/asm-hello/hello.S sets a0 to 1 before ecall to get access to standard output.
There is no spare argument for selecting between integer and string output unless we rewrite some structures in emulator.c, which I think is too complicated. So I decided to implement integer output in a very naive way.
I modified syscall.c as follows:
If 0xfff is passed in the a0 register, the syscall_write function switches to integer mode, which uses fprintf rather than fwrite to write the data to stdout. The drawback is that it cannot print an integer to a specified file.
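The idea can be sketched in plain C as follows. This is only a minimal illustration, not the actual patch to syscall.c: the function name write_sketch, its parameters, and the INT_MODE_FD macro are hypothetical, and it assumes the host is little-endian like the guest so that the four bytes can be copied straight into an int32_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker: the value of a0 that selects integer mode. */
#define INT_MODE_FD 0xfff

/* Hypothetical stand-in for the emulator's write handler.
 * fd    : value the guest passed in a0
 * buf   : host pointer to the guest buffer (a1)
 * count : number of bytes requested (a2)
 * out   : stream to write to (stdout in the real use case)
 * Returns the number of bytes consumed from buf, or -1 on error. */
long write_sketch(uint32_t fd, const void *buf, size_t count, FILE *out)
{
    assert(buf != NULL && out != NULL);

    if (fd == INT_MODE_FD) {
        /* Integer mode: treat the first 4 bytes as a 32-bit integer
         * and print it as decimal text. */
        int32_t value;
        if (count < sizeof value)
            return -1;
        memcpy(&value, buf, sizeof value);
        if (fprintf(out, "%d", (int) value) < 0)
            return -1;
        return (long) sizeof value;
    }

    /* Default mode: pass the bytes through unchanged. */
    return (long) fwrite(buf, 1, count, out);
}
```

Calling write_sketch(0xfff, &v, 4, stdout) prints the decimal value of v, while any other fd value falls through to the byte-by-byte path.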
After modifying all of the system calls to fit rv32emu's spec, I ran make and got these errors:

The reason is that rv32emu and Ripes are not totally compatible: the former does not support some of the instruction syntactic sugar, so we can fix the code simply by adding commas between operands.
The system calls were changed to fit rv32emu, and I also changed some single-character strings into chars to reduce memory usage. Here is my modification:
.org 0
.global _start
.set STDOUT, 1
.set SYSEXIT, 93
.set SYSWRITE, 64
.set SYSBRK, 214
.data
arr: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
space: .byte ' ' # space
comma: .byte ','
nline: .byte '\n'
.text
# s1 = arr a base address
# s2 = arr b base address
# s3 = array length of a
# s4 = array length of b
# t0 = i for loopCon1, loopCon2, print
_start:
la s1, arr # s1 = address of a
lw s3, len1 # s3 = length of a
add s4, s3, s3 # s4 = length of b (a*2)
add t0, x0, x0 # i = 0
jal ra, loopCon1
add t0, x0, x0 # i = 0
jal ra, loopCon2
add t0, x0, x0 # i = 0
jal ra, print
jal ra, exit
loopCon1:
add t1, t0, x0 # t1 = i
add t1, t1, t1 # t1 = t1*2
add t1, t1, t1 # t1 = t1*2 (t1*4)
add t1, t1, s1 # address of a[i] (base addr. + 4i)
lw t2, 0(t1) # t2 = s1 (content of a[i])
add t1, t0, x0 # t1 = i
add t1, t1, t1 # t1 = t1*2
add t1, t1, t1 # t1 = t1*2 (t1*4)
add t1, t1, s2 # address of b[i] (base addr. + 4i)
sw t2, 0(t1) # b[i] = t2 (content of a[i])
addi t0, t0, 1 # i++
blt t0, s3, loopCon1 # if i < length, go to loopCon1
ret # else, return to main
loopCon2:
add t1, t0, x0 # t1 = i
add t1, t1, t1 # t1 i*2
add t1, t1, t1 # t1 i*2 (t1*4)
add t1, t1, s1 # t1=i*4+base_of_arr
lw t2, 0(t1) # t2 = s1 (content of a[i])
add t1, t0, s3 # t1 = i + length
add t1, t1, t1 # t1 = t1*2
add t1, t1, t1 # t1 = t1*2 (t1*4)
add t1, t1, s2 # t1 = address of b[i+length] (base addr. + 4*(i+length))
sw t2, 0(t1) # t2 = content in arr[n+1]
addi t0, t0, 1 # i++
blt t0, s3, loopCon2 # if i < length, go to loopCon2
ret # else, return to main
print:
addi sp, sp, -4
sw ra, 0(sp)
lw t2, 0(s2) # t2 = content of b[i]
add a1, t2, x0 # load result of array b
call printInt
la a1, space # load string - space
call printChar
addi s2, s2, 4 # address move forward
addi t0, t0, 1 # i++
blt t0, s4, print
lw ra, 0(sp)
addi sp, sp, 4
ret
exit:
li a0, 0
li a7, SYSEXIT # end
ecall
# a1 is the value of int
printInt:
addi sp, sp, -4
li a0, 0xfff
sw a1, 0(sp)
mv a1, sp
li a2, 4
li a7, SYSWRITE
addi sp, sp, 4
ecall
ret
# a1 is the address of char
printChar:
li a0, STDOUT
li a2, 1
li a7, SYSWRITE
ecall
ret
Compile:
Execution and results:
The assembly provided here has the following output on Ripes:
Analysis

The CSR cycle count in the picture is 225, and this implementation has 66 lines of code.
There are some problems in this program:
- Tonych1997 used two functions (loopCon1 and loopCon2) for what could be a single inline for loop in the main function, which costs a lot of time. They stated that the result might be wrong if the two loops were combined, but after some study I realized that this is because they did not initialize the base address of array 2.
- Because that register (s2) was never initialized, array 2 is stored starting at address 0x00, which is where the code section is! The following pictures show the instruction memory.

before execution

after execution
- The first picture shows the memory contents before execution and the second shows them after. The program even overwrites instructions that have not been executed yet. I think that without an instruction cache holding the unmodified instructions, the modified code would be executed and lead to unpredictable behavior.
- In a modern system this situation is unlikely to happen, because the operating system monitors memory usage and denies invalid memory accesses, causing a segmentation fault.
Optimization
To solve the problems mentioned above, I combined loopCon1 and loopCon2 into a single inline loop. I also extended the stack in _start to store the new array.
.org 0
.global _start
.set STDOUT, 1
.set SYSEXIT, 93
.set SYSWRITE, 64
.set SYSBRK, 214
.data
arr: .word 1, 2, 3 # a[3] = {1, 2, 3}
len1: .word 3 # array length of a is 3
space: .byte ' ' # space
comma: .byte ','
nline: .byte '\n'
.text
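_start:
addi sp, sp, -24 # reserve 24 bytes on the stack for the result (2 * 3 words)
la s1, arr # s1 = base address of the source array a
mv s2, sp # s2 = base address of the result array b
lw s3, len1 # s3 = length of a
li t0, 0 # t0 = byte offset of the current element
slli t1, s3, 2 # t1 = length * 4 (size of a in bytes)
for1:
add t2, t0, s1 # t2 = address of a[i]
add t3, t0, s2 # t3 = address of b[i]
add t4, t3, t1 # t4 = address of b[i + length]
lw t2, 0(t2) # t2 = a[i]
sw t2, 0(t3) # b[i] = a[i]
sw t2, 0(t4) # b[i + length] = a[i]
addi t0, t0, 4 # move to the next word
blt t0, t1, for1 # loop while offset < length * 4
li t0, 0 # reset the byte offset
slli t1, s3, 3 # t1 = length * 8 (size of b in bytes)
forPrint:
add t2, s2, t0 # t2 = address of b[i]
lw a1, 0(t2) # a1 = b[i]
call printInt
la a1, space # a1 = address of the space character
call printChar
addi t0, t0, 4 # move to the next word
blt t0, t1, forPrint # loop while offset < length * 8
j exit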
exit:
li a0, 0
li a7, SYSEXIT
ecall
printInt:
addi sp, sp, -4
li a0, 0xfff
sw a1, 0(sp)
mv a1, sp
li a2, 4
li a7, SYSWRITE
addi sp, sp, 4
ecall
ret
printChar:
li a0, STDOUT
li a2, 1
li a7, SYSWRITE
ecall
ret

After the optimization, the CSR cycle count dropped to 142 and the LOC dropped to 41.
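For reference, the fused for1 loop corresponds roughly to the C function below. It is only a sketch for comparison: the name concat_twice and its signature are made up for illustration, and unlike the assembly (which always runs the loop body at least once) it also handles a length of zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy src[0..len) into both halves of dst, which must hold 2 * len words.
 * Mirrors the for1 loop: one load and two stores per element. */
void concat_twice(const int32_t *src, size_t len, int32_t *dst)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < len; i++) {
        int32_t v = src[i];  /* lw t2, 0(t2) */
        dst[i] = v;          /* sw t2, 0(t3) */
        dst[i + len] = v;    /* sw t2, 0(t4) */
    }
}
```

Because dst is an explicit argument, the destination base address can never be left uninitialized, which was the root cause of the bug in the two-loop version.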
Observation: My implementation of printInt pushes and pops the stack every time it is called, which is unnecessary because no other function is called from there. Each iteration performs one push and one pop, while only 2 stack adjustments in total would be necessary. I also want to modify the code to avoid the function call overhead of storing the return address.
.org 0
.global _start
.set STDOUT, 1
.set SYSEXIT, 93
.set SYSWRITE, 64
.set SYSBRK, 214
.data
arr1: .word 1, 2, 3
len1: .word 3
space: .byte ' '
comma: .byte ','
nline: .byte '\n'
.text
_start:
addi sp, sp, -24
la s1, arr1
mv s2, sp
lw s3, len1
li t0, 0
slli t1, s3, 2
for1:
add t2, t0, s1
add t3, t0, s2
add t4, t3, t1
lw t2, 0(t2)
sw t2, 0(t3)
sw t2, 0(t4)
addi t0, t0, 4
blt t0, t1, for1
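li t0, 0 # reset the byte offset
slli t1, s3, 3 # t1 = length * 8 (size of b in bytes)
li a7, SYSWRITE # set once: both ecalls in the loop are writes
forPrint:
add t2, s2, t0 # t2 = address of b[i]
li a0, 0xfff # select the integer mode of the modified syscall_write
mv a1, t2 # a1 = address of the integer to print
li a2, 4 # an integer is 4 bytes long
ecall
li a0, STDOUT
la a1, space # a1 = address of the space character
li a2, 1 # one byte
ecall
addi t0, t0, 4 # move to the next word
blt t0, t1, forPrint # loop while offset < length * 8
j exit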
exit:
li a0, 0
li a7, SYSEXIT
ecall

The cycle count becomes 107 and the LOC becomes 32. I realized that I can move li a7, SYSWRITE out of the for loop because SYSWRITE is the only system call used there, so there is no need to set it in each iteration. There is also no need to push the value onto the stack, because we already have the address of the value we want to show on screen. After this optimization, the function calls, the function returns, the stack operations, and the per-iteration a7 setup are eliminated. The instruction count is reduced by 10n - 1, where n is twice the input array's length.
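The 10n - 1 figure can be checked by diffing the two forPrint loops instruction by instruction. The tally below is a sketch under one assumption: each call assembles to a single jal. The enum names and the function saved_instructions are only for illustration. It is a static count of executed instructions and need not match the difference in measured cycle counts.

```c
#include <assert.h>

/* Instructions removed from each forPrint iteration. */
enum {
    CALLS_REMOVED    = 2, /* call printInt, call printChar            */
    RETS_REMOVED     = 2, /* the two matching ret instructions        */
    STACK_REMOVED    = 4, /* lw a1, plus addi/sw/addi inside printInt */
    A7_LOADS_REMOVED = 2, /* li a7, SYSWRITE in both helpers          */
    HOISTED_ADDED    = 1  /* the single li a7 placed before the loop  */
};

/* n is the number of loop iterations (twice the input array's length). */
long saved_instructions(long n)
{
    assert(n >= 0);
    long per_iteration = CALLS_REMOVED + RETS_REMOVED
                       + STACK_REMOVED + A7_LOADS_REMOVED; /* = 10 */
    return per_iteration * n - HOISTED_ADDED;
}
```

For the three-element input used here, n is 6, so the sketch gives 10 * 6 - 1 = 59 fewer instructions.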
Compile C Code
To build an executable from the C code, simply type:
Read the header:
I will explain some of the lines I am interested in.
The first line is the magic number. Wikipedia describes a magic number as a constant numerical or text value used to identify a file format or protocol. The first byte of this line is 0x7f, and the reason for that choice is described in this Stack Overflow page. We can see it at the start of the file's hexadecimal dump produced by the hexdump command:
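As a small illustration of how such a magic number is used, the sketch below checks whether a buffer begins with the four ELF identification bytes 0x7f 'E' 'L' 'F'. The function name has_elf_magic is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the ELF identification bytes
 * 0x7f 'E' 'L' 'F', and 0 otherwise. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

A loader or a tool such as readelf can perform this kind of check before trusting the rest of the header.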
The entry point is the address of _start rather than of our main function. The toolchain adds a _start function to our program. When execution begins, _start performs some necessary initialization and then invokes the main function.
For more information, there is a document named ELF format that details each part of an ELF file.
Read the disassembly file:
hw2/concatenation_of_array.elf: file format elf32-littleriscv
Disassembly of section .text:
00010094 <exit>:
10094: 1141 addi sp,sp,-16
10096: 4581 li a1,0
10098: c422 sw s0,8(sp)
1009a: c606 sw ra,12(sp)
1009c: 842a mv s0,a0
1009e: 4a0020ef jal ra,1253e <__call_exitprocs>
100a2: 2281a503 lw a0,552(gp) # 1da38 <_global_impure_ptr>
100a6: 5d5c lw a5,60(a0)
100a8: c391 beqz a5,100ac <exit+0x18>
100aa: 9782 jalr a5
100ac: 8522 mv a0,s0
100ae: 6cc090ef jal ra,1977a <_exit>
000100c4 <_start>:
100c4: 0000d197 auipc gp,0xd
100c8: 74c18193 addi gp,gp,1868 # 1d810 <__global_pointer$>
100cc: 24418513 addi a0,gp,580 # 1da54 <completed.1>
100d0: 57018613 addi a2,gp,1392 # 1dd80 <__BSS_END__>
100d4: 8e09 sub a2,a2,a0
100d6: 4581 li a1,0
100d8: 22d9 jal 1029e <memset>
100da: 00000517 auipc a0,0x0
100de: 12250513 addi a0,a0,290 # 101fc <__libc_fini_array>
100e2: 2239 jal 101f0 <atexit>
100e4: 2a81 jal 10234 <__libc_init_array>
100e6: 4502 lw a0,0(sp)
100e8: 004c addi a1,sp,4
100ea: 4601 li a2,0
100ec: 2891 jal 10140 <main>
100ee: b75d j 10094 <exit>
00010140 <main>:
10140: 7139 addi sp,sp,-64
10142: de06 sw ra,60(sp)
10144: dc22 sw s0,56(sp)
10146: 0080 addi s0,sp,64
10148: 4785 li a5,1
1014a: fef42023 sw a5,-32(s0)
1014e: 4789 li a5,2
10150: fef42223 sw a5,-28(s0)
10154: 478d li a5,3
10156: fef42423 sw a5,-24(s0)
1015a: fe042623 sw zero,-20(s0)
1015e: a0a9 j 101a8 <main+0x68>
10160: fec42783 lw a5,-20(s0)
10164: 078a slli a5,a5,0x2
10166: 17c1 addi a5,a5,-16
10168: 97a2 add a5,a5,s0
1016a: ff07a703 lw a4,-16(a5)
1016e: fec42783 lw a5,-20(s0)
10172: 078a slli a5,a5,0x2
10174: 17c1 addi a5,a5,-16
10176: 97a2 add a5,a5,s0
10178: fce7ac23 sw a4,-40(a5)
1017c: fec42783 lw a5,-20(s0)
10180: 00378693 addi a3,a5,3
10184: fec42783 lw a5,-20(s0)
10188: 078a slli a5,a5,0x2
1018a: 17c1 addi a5,a5,-16
1018c: 97a2 add a5,a5,s0
1018e: ff07a703 lw a4,-16(a5)
10192: 00269793 slli a5,a3,0x2
10196: 17c1 addi a5,a5,-16
10198: 97a2 add a5,a5,s0
1019a: fce7ac23 sw a4,-40(a5)
1019e: fec42783 lw a5,-20(s0)
101a2: 0785 addi a5,a5,1
101a4: fef42623 sw a5,-20(s0)
101a8: fec42703 lw a4,-20(s0)
101ac: 4789 li a5,2
101ae: fae7d9e3 bge a5,a4,10160 <main+0x20>
101b2: fe042623 sw zero,-20(s0)
101b6: a015 j 101da <main+0x9a>
101b8: fec42783 lw a5,-20(s0)
101bc: 078a slli a5,a5,0x2
101be: 17c1 addi a5,a5,-16
101c0: 97a2 add a5,a5,s0
101c2: fd87a783 lw a5,-40(a5)
101c6: 85be mv a1,a5
101c8: 67f1 lui a5,0x1c
101ca: 8e078513 addi a0,a5,-1824 # 1b8e0 <__clzsi2+0x74>
101ce: 2a61 jal 10366 <printf>
101d0: fec42783 lw a5,-20(s0)
101d4: 0785 addi a5,a5,1
101d6: fef42623 sw a5,-20(s0)
101da: fec42703 lw a4,-20(s0)
101de: 4795 li a5,5
101e0: fce7dce3 bge a5,a4,101b8 <main+0x78>
101e4: 4781 li a5,0
101e6: 853e mv a0,a5
101e8: 50f2 lw ra,60(sp)
101ea: 5462 lw s0,56(sp)
101ec: 6121 addi sp,sp,64
101ee: 8082 ret
The disassembly has 17092 lines, which is too long to show, so I only picked the functions relevant to our C implementation.
As shown, if no flags are specified the compiler automatically uses the compressed extension of RV32 to compile our C code, but the result is still very large.
As we can see in the object dump, the toolchain automatically adds a _start function to invoke our main function. The OS first loads the _start function into physical memory, and the position of _start is the same as the entry point address reported by the readelf command; in this example it is 000100c4.
c4 is the offset of the _start function, which we can confirm by dumping the file. From the byte order in the dump we can also see that this RISC-V target is little-endian.
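To make the byte order concrete, here is a sketch that reads the entry point from a raw ELF32 header. It relies on the standard ELF32 layout, in which the 4-byte e_entry field sits at byte offset 24 (0x18). The helper names read_le32 and elf32_entry are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 32-bit value from 4 bytes stored least significant first. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}

/* Read e_entry from a little-endian ELF32 header.
 * Returns 0 on success, -1 if the buffer is too short. */
int elf32_entry(const unsigned char *hdr, size_t len, uint32_t *entry)
{
    enum { E_ENTRY_OFFSET = 24 };

    assert(hdr != NULL && entry != NULL);
    if (len < E_ENTRY_OFFSET + 4)
        return -1;
    *entry = read_le32(hdr + E_ENTRY_OFFSET);
    return 0;
}
```

With the bytes c4 00 01 00 stored at offset 24, the function yields 0x000100c4: the lowest byte, c4, comes first in the file.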


The 0x10000 is the default virtual memory address at which the program is loaded into RAM, but I could not find the reason why this address is used rather than 0x00000000.
Comparing Optimization Levels
-march=rv32i, -mabi=ilp32: select the base integer instruction set, so every instruction is 4 bytes long, together with the ILP32 ABI.
-fdata-sections, -ffunction-sections: place each function and data item in its own section, so that unused ones can be separated out.
-Wl,--gc-sections: -Wl tells gcc to pass the arguments after the comma to the linker, and --gc-sections tells the linker not to link unused sections. (References: what is -Wl; how to link used functions only)


The pictures show that after removing unused data and functions from the program, the code section shrank from 74740 to 60120, the data section from 2816 to 2776, and the bss section from 812 to 104.
-O0

00010094 <exit>:
10094: ff010113 addi sp,sp,-16
10098: 00000593 li a1,0
1009c: 00812423 sw s0,8(sp)
100a0: 00112623 sw ra,12(sp)
100a4: 00050413 mv s0,a0
100a8: 3b0030ef jal ra,13458 <__call_exitprocs>
100ac: 2081a503 lw a0,520(gp) # 1fac8 <_global_impure_ptr>
100b0: 03c52783 lw a5,60(a0)
100b4: 00078463 beqz a5,100bc <exit+0x28>
100b8: 000780e7 jalr a5
100bc: 00040513 mv a0,s0
100c0: 6800a0ef jal ra,1a740 <_exit>
000100dc <_start>:
100dc: 0000f197 auipc gp,0xf
100e0: 7e418193 addi gp,gp,2020 # 1f8c0 <__global_pointer$>
100e4: 21c18513 addi a0,gp,540 # 1fadc <completed.1>
100e8: 28418613 addi a2,gp,644 # 1fb44 <__BSS_END__>
100ec: 40a60633 sub a2,a2,a0
100f0: 00000593 li a1,0
100f4: 28c000ef jal ra,10380 <memset>
100f8: 00000517 auipc a0,0x0
100fc: 19850513 addi a0,a0,408 # 10290 <__libc_fini_array>
10100: 17c000ef jal ra,1027c <atexit>
10104: 1e8000ef jal ra,102ec <__libc_init_array>
10108: 00012503 lw a0,0(sp)
1010c: 00410593 addi a1,sp,4
10110: 00000613 li a2,0
10114: 070000ef jal ra,10184 <main>
10118: f7dff06f j 10094 <exit>
00010184 <main>:
10184: fc010113 addi sp,sp,-64
10188: 02112e23 sw ra,60(sp)
1018c: 02812c23 sw s0,56(sp)
10190: 04010413 addi s0,sp,64
10194: 00100793 li a5,1
10198: fef42023 sw a5,-32(s0)
1019c: 00200793 li a5,2
101a0: fef42223 sw a5,-28(s0)
101a4: 00300793 li a5,3
101a8: fef42423 sw a5,-24(s0)
101ac: fe042623 sw zero,-20(s0)
101b0: 0640006f j 10214 <main+0x90>
101b4: fec42783 lw a5,-20(s0)
101b8: 00279793 slli a5,a5,0x2
101bc: ff078793 addi a5,a5,-16
101c0: 008787b3 add a5,a5,s0
101c4: ff07a703 lw a4,-16(a5)
101c8: fec42783 lw a5,-20(s0)
101cc: 00279793 slli a5,a5,0x2
101d0: ff078793 addi a5,a5,-16
101d4: 008787b3 add a5,a5,s0
101d8: fce7ac23 sw a4,-40(a5)
101dc: fec42783 lw a5,-20(s0)
101e0: 00378693 addi a3,a5,3
101e4: fec42783 lw a5,-20(s0)
101e8: 00279793 slli a5,a5,0x2
101ec: ff078793 addi a5,a5,-16
101f0: 008787b3 add a5,a5,s0
101f4: ff07a703 lw a4,-16(a5)
101f8: 00269793 slli a5,a3,0x2
101fc: ff078793 addi a5,a5,-16
10200: 008787b3 add a5,a5,s0
10204: fce7ac23 sw a4,-40(a5)
10208: fec42783 lw a5,-20(s0)
1020c: 00178793 addi a5,a5,1
10210: fef42623 sw a5,-20(s0)
10214: fec42703 lw a4,-20(s0)
10218: 00200793 li a5,2
1021c: f8e7dce3 bge a5,a4,101b4 <main+0x30>
10220: fe042623 sw zero,-20(s0)
10224: 0340006f j 10258 <main+0xd4>
10228: fec42783 lw a5,-20(s0)
1022c: 00279793 slli a5,a5,0x2
10230: ff078793 addi a5,a5,-16
10234: 008787b3 add a5,a5,s0
10238: fd87a783 lw a5,-40(a5)
1023c: 00078593 mv a1,a5
10240: 0001e7b7 lui a5,0x1e
10244: 2a078513 addi a0,a5,672 # 1e2a0 <__clzsi2+0x8c>
10248: 214000ef jal ra,1045c <printf>
1024c: fec42783 lw a5,-20(s0)
10250: 00178793 addi a5,a5,1
10254: fef42623 sw a5,-20(s0)
10258: fec42703 lw a4,-20(s0)
1025c: 00500793 li a5,5
10260: fce7d4e3 bge a5,a4,10228 <main+0xa4>
10264: 00000793 li a5,0
10268: 00078513 mv a0,a5
1026c: 03c12083 lw ra,60(sp)
10270: 03812403 lw s0,56(sp)
10274: 04010113 addi sp,sp,64
10278: 00008067 ret
-O1

00010094 <exit>:
10094: ff010113 addi sp,sp,-16
10098: 00000593 li a1,0
1009c: 00812423 sw s0,8(sp)
100a0: 00112623 sw ra,12(sp)
100a4: 00050413 mv s0,a0
100a8: 32c030ef jal ra,133d4 <__call_exitprocs>
100ac: 2081a503 lw a0,520(gp) # 1fac8 <_global_impure_ptr>
100b0: 03c52783 lw a5,60(a0)
100b4: 00078463 beqz a5,100bc <exit+0x28>
100b8: 000780e7 jalr a5
100bc: 00040513 mv a0,s0
100c0: 5fc0a0ef jal ra,1a6bc <_exit>
000100dc <_start>:
100dc: 0000f197 auipc gp,0xf
100e0: 7e418193 addi gp,gp,2020 # 1f8c0 <__global_pointer$>
100e4: 21c18513 addi a0,gp,540 # 1fadc <completed.1>
100e8: 28418613 addi a2,gp,644 # 1fb44 <__BSS_END__>
100ec: 40a60633 sub a2,a2,a0
100f0: 00000593 li a1,0
100f4: 208000ef jal ra,102fc <memset>
100f8: 00000517 auipc a0,0x0
100fc: 11450513 addi a0,a0,276 # 1020c <__libc_fini_array>
10100: 0f8000ef jal ra,101f8 <atexit>
10104: 164000ef jal ra,10268 <__libc_init_array>
10108: 00012503 lw a0,0(sp)
1010c: 00410593 addi a1,sp,4
10110: 00000613 li a2,0
10114: 070000ef jal ra,10184 <main>
10118: f7dff06f j 10094 <exit>
00010184 <main>:
10184: fd010113 addi sp,sp,-48
10188: 02112623 sw ra,44(sp)
1018c: 02812423 sw s0,40(sp)
10190: 02912223 sw s1,36(sp)
10194: 03212023 sw s2,32(sp)
10198: 00100793 li a5,1
1019c: 00f12423 sw a5,8(sp)
101a0: 00f12a23 sw a5,20(sp)
101a4: 00200793 li a5,2
101a8: 00f12623 sw a5,12(sp)
101ac: 00f12c23 sw a5,24(sp)
101b0: 00300793 li a5,3
101b4: 00f12823 sw a5,16(sp)
101b8: 00f12e23 sw a5,28(sp)
101bc: 00810413 addi s0,sp,8
101c0: 02010913 addi s2,sp,32
101c4: 0001e4b7 lui s1,0x1e
101c8: 00042583 lw a1,0(s0)
101cc: 21848513 addi a0,s1,536 # 1e218 <__clzsi2+0x88>
101d0: 208000ef jal ra,103d8 <printf>
101d4: 00440413 addi s0,s0,4
101d8: ff2418e3 bne s0,s2,101c8 <main+0x44>
101dc: 00000513 li a0,0
101e0: 02c12083 lw ra,44(sp)
101e4: 02812403 lw s0,40(sp)
101e8: 02412483 lw s1,36(sp)
101ec: 02012903 lw s2,32(sp)
101f0: 03010113 addi sp,sp,48
101f4: 00008067 ret
-O2

00010094 <exit>:
10094: ff010113 addi sp,sp,-16
10098: 00000593 li a1,0
1009c: 00812423 sw s0,8(sp)
100a0: 00112623 sw ra,12(sp)
100a4: 00050413 mv s0,a0
100a8: 32c030ef jal ra,133d4 <__call_exitprocs>
100ac: 2081a503 lw a0,520(gp) # 1fac8 <_global_impure_ptr>
100b0: 03c52783 lw a5,60(a0)
100b4: 00078463 beqz a5,100bc <exit+0x28>
100b8: 000780e7 jalr a5
100bc: 00040513 mv a0,s0
100c0: 5fc0a0ef jal ra,1a6bc <_exit>
000100c4 <main>:
100c4: fd010113 addi sp,sp,-48
100c8: 00100693 li a3,1
100cc: 00200713 li a4,2
100d0: 00300793 li a5,3
100d4: 02812423 sw s0,40(sp)
100d8: 02912223 sw s1,36(sp)
100dc: 03212023 sw s2,32(sp)
100e0: 02112623 sw ra,44(sp)
100e4: 00d12423 sw a3,8(sp)
100e8: 00d12a23 sw a3,20(sp)
100ec: 00e12623 sw a4,12(sp)
100f0: 00e12c23 sw a4,24(sp)
100f4: 00f12823 sw a5,16(sp)
100f8: 00f12e23 sw a5,28(sp)
100fc: 00810413 addi s0,sp,8
10100: 02010913 addi s2,sp,32
10104: 0001e4b7 lui s1,0x1e
10108: 00042583 lw a1,0(s0)
1010c: 21848513 addi a0,s1,536 # 1e218 <__clzsi2+0x88>
10110: 00440413 addi s0,s0,4
10114: 2c4000ef jal ra,103d8 <printf>
10118: ff2418e3 bne s0,s2,10108 <main+0x44>
1011c: 02c12083 lw ra,44(sp)
10120: 02812403 lw s0,40(sp)
10124: 02412483 lw s1,36(sp)
10128: 02012903 lw s2,32(sp)
1012c: 00000513 li a0,0
10130: 03010113 addi sp,sp,48
10134: 00008067 ret
00010150 <_start>:
10150: 0000f197 auipc gp,0xf
10154: 77018193 addi gp,gp,1904 # 1f8c0 <__global_pointer$>
10158: 21c18513 addi a0,gp,540 # 1fadc <completed.1>
1015c: 28418613 addi a2,gp,644 # 1fb44 <__BSS_END__>
10160: 40a60633 sub a2,a2,a0
10164: 00000593 li a1,0
10168: 194000ef jal ra,102fc <memset>
1016c: 00000517 auipc a0,0x0
10170: 0a050513 addi a0,a0,160 # 1020c <__libc_fini_array>
10174: 084000ef jal ra,101f8 <atexit>
10178: 0f0000ef jal ra,10268 <__libc_init_array>
1017c: 00012503 lw a0,0(sp)
10180: 00410593 addi a1,sp,4
10184: 00000613 li a2,0
10188: f3dff0ef jal ra,100c4 <main>
1018c: f09ff06f j 10094 <exit>
-O3

00010094 <exit>:
10094: ff010113 addi sp,sp,-16
10098: 00000593 li a1,0
1009c: 00812423 sw s0,8(sp)
100a0: 00112623 sw ra,12(sp)
100a4: 00050413 mv s0,a0
100a8: 32c030ef jal ra,133d4 <__call_exitprocs>
100ac: 2081a503 lw a0,520(gp) # 1fac8 <_global_impure_ptr>
100b0: 03c52783 lw a5,60(a0)
100b4: 00078463 beqz a5,100bc <exit+0x28>
100b8: 000780e7 jalr a5
100bc: 00040513 mv a0,s0
100c0: 5fc0a0ef jal ra,1a6bc <_exit>
000100c4 <main>:
100c4: fd010113 addi sp,sp,-48
100c8: 00100693 li a3,1
100cc: 00200713 li a4,2
100d0: 00300793 li a5,3
100d4: 02812423 sw s0,40(sp)
100d8: 02912223 sw s1,36(sp)
100dc: 03212023 sw s2,32(sp)
100e0: 02112623 sw ra,44(sp)
100e4: 00d12423 sw a3,8(sp)
100e8: 00d12a23 sw a3,20(sp)
100ec: 00e12623 sw a4,12(sp)
100f0: 00e12c23 sw a4,24(sp)
100f4: 00f12823 sw a5,16(sp)
100f8: 00f12e23 sw a5,28(sp)
100fc: 00810413 addi s0,sp,8
10100: 02010913 addi s2,sp,32
10104: 0001e4b7 lui s1,0x1e
10108: 00042583 lw a1,0(s0)
1010c: 21848513 addi a0,s1,536 # 1e218 <__clzsi2+0x88>
10110: 00440413 addi s0,s0,4
10114: 2c4000ef jal ra,103d8 <printf>
10118: ff2418e3 bne s0,s2,10108 <main+0x44>
1011c: 02c12083 lw ra,44(sp)
10120: 02812403 lw s0,40(sp)
10124: 02412483 lw s1,36(sp)
10128: 02012903 lw s2,32(sp)
1012c: 00000513 li a0,0
10130: 03010113 addi sp,sp,48
10134: 00008067 ret
00010150 <_start>:
10150: 0000f197 auipc gp,0xf
10154: 77018193 addi gp,gp,1904 # 1f8c0 <__global_pointer$>
10158: 21c18513 addi a0,gp,540 # 1fadc <completed.1>
1015c: 28418613 addi a2,gp,644 # 1fb44 <__BSS_END__>
10160: 40a60633 sub a2,a2,a0
10164: 00000593 li a1,0
10168: 194000ef jal ra,102fc <memset>
1016c: 00000517 auipc a0,0x0
10170: 0a050513 addi a0,a0,160 # 1020c <__libc_fini_array>
10174: 084000ef jal ra,101f8 <atexit>
10178: 0f0000ef jal ra,10268 <__libc_init_array>
1017c: 00012503 lw a0,0(sp)
10180: 00410593 addi a1,sp,4
10184: 00000613 li a2,0
10188: f3dff0ef jal ra,100c4 <main>
1018c: f09ff06f j 10094 <exit>
-Os

00010094 <exit>:
10094: ff010113 addi sp,sp,-16
10098: 00000593 li a1,0
1009c: 00812423 sw s0,8(sp)
100a0: 00112623 sw ra,12(sp)
100a4: 00050413 mv s0,a0
100a8: 324030ef jal ra,133cc <__call_exitprocs>
100ac: 2081a503 lw a0,520(gp) # 1fac8 <_global_impure_ptr>
100b0: 03c52783 lw a5,60(a0)
100b4: 00078463 beqz a5,100bc <exit+0x28>
100b8: 000780e7 jalr a5
100bc: 00040513 mv a0,s0
100c0: 5f40a0ef jal ra,1a6b4 <_exit>
000100c4 <main>:
100c4: fd010113 addi sp,sp,-48
100c8: 00100793 li a5,1
100cc: 00f12423 sw a5,8(sp)
100d0: 00f12a23 sw a5,20(sp)
100d4: 00200793 li a5,2
100d8: 00f12623 sw a5,12(sp)
100dc: 00f12c23 sw a5,24(sp)
100e0: 00300793 li a5,3
100e4: 02812423 sw s0,40(sp)
100e8: 02912223 sw s1,36(sp)
100ec: 02112623 sw ra,44(sp)
100f0: 00f12823 sw a5,16(sp)
100f4: 00f12e23 sw a5,28(sp)
100f8: 00810413 addi s0,sp,8
100fc: 0001e4b7 lui s1,0x1e
10100: 00042583 lw a1,0(s0)
10104: 21048513 addi a0,s1,528 # 1e210 <__clzsi2+0x88>
10108: 00440413 addi s0,s0,4
1010c: 2c4000ef jal ra,103d0 <printf>
10110: 02010793 addi a5,sp,32
10114: fef416e3 bne s0,a5,10100 <main+0x3c>
10118: 02c12083 lw ra,44(sp)
1011c: 02812403 lw s0,40(sp)
10120: 02412483 lw s1,36(sp)
10124: 00000513 li a0,0
10128: 03010113 addi sp,sp,48
1012c: 00008067 ret
00010148 <_start>:
10148: 0000f197 auipc gp,0xf
1014c: 77818193 addi gp,gp,1912 # 1f8c0 <__global_pointer$>
10150: 21c18513 addi a0,gp,540 # 1fadc <completed.1>
10154: 28418613 addi a2,gp,644 # 1fb44 <__BSS_END__>
10158: 40a60633 sub a2,a2,a0
1015c: 00000593 li a1,0
10160: 194000ef jal ra,102f4 <memset>
10164: 00000517 auipc a0,0x0
10168: 0a050513 addi a0,a0,160 # 10204 <__libc_fini_array>
1016c: 084000ef jal ra,101f0 <atexit>
10170: 0f0000ef jal ra,10260 <__libc_init_array>
10174: 00012503 lw a0,0(sp)
10178: 00410593 addi a1,sp,4
1017c: 00000613 li a2,0
10180: f45ff0ef jal ra,100c4 <main>
10184: f11ff06f j 10094 <exit>
-Ofast

00010094 <exit>:
10094: ff010113 addi sp,sp,-16
10098: 00000593 li a1,0
1009c: 00812423 sw s0,8(sp)
100a0: 00112623 sw ra,12(sp)
100a4: 00050413 mv s0,a0
100a8: 32c030ef jal ra,133d4 <__call_exitprocs>
100ac: 2081a503 lw a0,520(gp) # 1fac8 <_global_impure_ptr>
100b0: 03c52783 lw a5,60(a0)
100b4: 00078463 beqz a5,100bc <exit+0x28>
100b8: 000780e7 jalr a5
100bc: 00040513 mv a0,s0
100c0: 5fc0a0ef jal ra,1a6bc <_exit>
000100c4 <main>:
100c4: fd010113 addi sp,sp,-48
100c8: 00100693 li a3,1
100cc: 00200713 li a4,2
100d0: 00300793 li a5,3
100d4: 02812423 sw s0,40(sp)
100d8: 02912223 sw s1,36(sp)
100dc: 03212023 sw s2,32(sp)
100e0: 02112623 sw ra,44(sp)
100e4: 00d12423 sw a3,8(sp)
100e8: 00d12a23 sw a3,20(sp)
100ec: 00e12623 sw a4,12(sp)
100f0: 00e12c23 sw a4,24(sp)
100f4: 00f12823 sw a5,16(sp)
100f8: 00f12e23 sw a5,28(sp)
100fc: 00810413 addi s0,sp,8
10100: 02010913 addi s2,sp,32
10104: 0001e4b7 lui s1,0x1e
10108: 00042583 lw a1,0(s0)
1010c: 21848513 addi a0,s1,536 # 1e218 <__clzsi2+0x88>
10110: 00440413 addi s0,s0,4
10114: 2c4000ef jal ra,103d8 <printf>
10118: ff2418e3 bne s0,s2,10108 <main+0x44>
1011c: 02c12083 lw ra,44(sp)
10120: 02812403 lw s0,40(sp)
10124: 02412483 lw s1,36(sp)
10128: 02012903 lw s2,32(sp)
1012c: 00000513 li a0,0
10130: 03010113 addi sp,sp,48
10134: 00008067 ret
00010150 <_start>:
10150: 0000f197 auipc gp,0xf
10154: 77018193 addi gp,gp,1904 # 1f8c0 <__global_pointer$>
10158: 21c18513 addi a0,gp,540 # 1fadc <completed.1>
1015c: 28418613 addi a2,gp,644 # 1fb44 <__BSS_END__>
10160: 40a60633 sub a2,a2,a0
10164: 00000593 li a1,0
10168: 194000ef jal ra,102fc <memset>
1016c: 00000517 auipc a0,0x0
10170: 0a050513 addi a0,a0,160 # 1020c <__libc_fini_array>
10174: 084000ef jal ra,101f8 <atexit>
10178: 0f0000ef jal ra,10268 <__libc_init_array>
1017c: 00012503 lw a0,0(sp)
10180: 00410593 addi a1,sp,4
10184: 00000613 li a2,0
10188: f3dff0ef jal ra,100c4 <main>
1018c: f09ff06f j 10094 <exit>
Analysis
| level | text size | cycle count | address of self-written code | size of main (bytes, first to last instruction address) |
|---|---|---|---|---|
| O0 | 60120 | 4455 | 0x10184 | 244 |
| O1 | 59988 | 4104 | 0x10184 | 112 |
| O2 | 59988 | 4104 | 0x100c4 | 112 |
| O3 | 59988 | 4104 | 0x100c4 | 112 |
| Os | 59988 | 4107 | 0x100c4 | 104 |
| Ofast | 59988 | 4104 | 0x100c4 | 112 |
We can see that main starts at 0x10184 when O0 and O1 are specified, and at 0x100c4 for O2, O3, Os, and Ofast. Since 0x100c4 was the entry point address read earlier, I first assumed that at O2 and above the entry point pointed directly at the self-written code and that the start routine provided by the library was no longer needed: usually a _start function is added to the ELF file, but for the sake of optimization it might be excluded so that main runs directly. Under that assumption I guessed that main stays comparatively large (except with Os) because, although efficiency may increase by eliminating _start, main would have to handle by itself some conditions that could endanger the system. I tried to read the assembly of main at these levels, but it is too hard to understand without the help of the O1 and Os versions.
That reasoning has a problem: the entry point can differ between ELF files. After reviewing my homework again, I realized that the _start function exists in every ELF file. In the listings above it is at 0x100dc for O0 and O1, at 0x10150 for O2, O3, and Ofast, and at 0x10148 for Os; at the higher levels main is simply placed before _start.
We can also see that from O1 onward the text size stays the same and the cycle count hardly changes (only Os differs, by 3 cycles). There is a limit to what compiler optimization can achieve here. The code size is also significantly larger than the handwritten assembly because of the heavy work done by printf.
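Reading the -O1 listing of main, the copy loops appear to have been folded into six constant stores, and the print loop walks a pointer from the start of the array to its end instead of keeping an index in memory. A C rendering of that shape might look like the sketch below. The function name print_unrolled and the "%d " format string are assumptions, since the original C source and its format string are not shown here.

```c
#include <assert.h>
#include <stdio.h>

/* Prints the six values the way the -O1 main appears to.
 * Returns the number of values printed. */
int print_unrolled(FILE *out)
{
    assert(out != NULL);

    /* Both halves are filled with constant stores; no copy loop remains. */
    int ans[6];
    ans[0] = 1; ans[3] = 1;
    ans[1] = 2; ans[4] = 2;
    ans[2] = 3; ans[5] = 3;

    /* Pointer loop: p plays the role of s0, ans + 6 the role of s2. */
    int printed = 0;
    for (const int *p = ans; p != ans + 6; p++) {
        fprintf(out, "%d ", *p); /* format string is an assumption */
        printed++;
    }
    return printed;
}
```

Once the compiler has reached this shape there is little left to remove apart from the printf calls themselves, which fits the flat numbers in the table.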