Assignment2: RISC-V Toolchain

contributed by < dhiptmc >

tags: `RISC-V` `Computer Architecture 2022`

Problem

Problem was chosen from the work done by 王昱承 in Assignment 1.

Running Sum of 1d Array(LeetCode1480)

Given an array nums. We define a running sum of an array as runningSum[i] = sum(nums[0]…nums[i]).

Return the running sum of nums.

Motivation:
In the last assignment, I spent lots of time dealing with arrays.
I would like to examine whether I can make use of the experiences learnt from previous attempts.

Example 1:

Input: nums = [1,2,3,4]
Output: [1,3,6,10]
Explanation: Running sum is obtained as follows: [1, 1+2, 1+2+3, 1+2+3+4].

Example 2:

Input: nums = [1,1,1,1,1]
Output: [1,2,3,4,5]
Explanation: Running sum is obtained as follows: [1, 1+1, 1+1+1, 1+1+1+1, 1+1+1+1+1].

Example 3:

Input: nums = [3,1,2,10,1]
Output: [3,4,6,16,17]

Solution

The solution is based on the work done by 王昱承.

Initially, pass the first element, i.e., nums[0], to result[0]. Add the result[i-1] element with nums[i], then pass the value to the result.

Return the result array eventually.

Implementation

C code









































#include <stdlib.h>
#include <stdio.h>

int* runningSum(int* nums, int numsSize){
    int* result = (int*)malloc(sizeof(int)*numsSize);
    
    
    for(int i=0; i<numsSize; i++){
        if(i==0)
            result[0] = nums[0];
        else
            result[i] = result[i-1] + nums[i];
    }
    
    return result;
}

int main(int argc, char *argv[]){
    int nums1[5] = {1,2,3,4,5};
    int nums2[6] = {1,3,5,7,9,11};
    int nums3[7] = {0,2,4,6,8,10,12};

    int* result1 = runningSum(nums1, 5);
    int* result2 = runningSum(nums2, 6);
    int* result3 = runningSum(nums3, 7);
    
    for(int j=0; j<5; j++){
        printf("%d ", result1[j]);
    }
    printf("\n");
    
    for(int j=0; j<6; j++){
        printf("%d ", result2[j]);
    }
    printf("\n");
    
    for(int j=0; j<7; j++){
        printf("%d ", result3[j]);
    }
    printf("\n");
}

Assembly code

























































.data
nums1:     .word 1,2,3,4,5
nums2:     .word 1,3,5,7,9,11
nums3:     .word 0,2,4,6,8,10,12
res1:      .word 0,0,0,0,0,
res2:      .word 0,0,0,0,0,0
res3:      .word 0,0,0,0,0,0,0
space:     .string " "
nl:        .string "\n"
.text
main:                      # t2: nums head, t3: result head, t4: numsSize
    la      t2, nums1      # load arr base to t2
    la      t3, res1       # load res base to t3
    li      t4, 5          # load the numSize to t4
    jal     ra, runningSum
    la      t2, nums2      # load arr base to t2
    la      t3, res2       # load res base to t3
    li      t4, 6          # load the numSize to t4
    jal     ra, runningSum
    la      t2, nums3      # load arr base to t2
    la      t3, res3       # load res base to t3
    li      t4, 7          # load the numSize to t4
    jal     ra, runningSum
    j       exit
runningSum:                # t5: i
    li      t5, 1          # load i with 1
    lw      a3, 0(t2)      # load nums[0] to a3
    sw      a3, 0(t3)      # save num[0] to result[0]
end.L1:
    slli    a0, t5, 2      # i*4 (cause of byte size)
    add     a1, t3, a0     # add result's address with i*4 to get result[i]'s address
    lw      a3, -4(a1)     # load result[i-1] to a3
    add     a2, t2, a0     # add nums's address with i*4 to get nums[i]'s address
    lw      a2, 0(a2)      # load nums[i] to a5
    add     a3, a3, a2     # add result[i-1] + nums[i]
    sw      a3, 0(a1)      # save to result[i]
    addi    t5, t5, 1      # i = i + 1 
    blt     t5, t4, end.L1 # if i < numsSize then branch to end.L1
    li      t5, 0
print:
    slli    a1, t5, 2
    add     a1, t3, a1
    lw      a0, 0(a1)      # load result out
    li      a7, 1          # pint a0
    ecall
    la      a0, space      # load space
    li      a7, 4          # print string
    ecall
    addi    t5, t5, 1      # j++
    blt     t5, t4, print  # loop
    la      a0, nl         # load next line
    li      a7,  4         # print string
    ecall
    jr      ra
exit:
    li      a7, 10         # exit
    ecall

There are possible ways to use fewer instructions.
Modified version of Assembly code is shown below.






















































.data
nums1:     .word 1,2,3,4,5
nums2:     .word 1,3,5,7,9,11
nums3:     .word 0,2,4,6,8,10,12
res1:      .word 0,0,0,0,0,
res2:      .word 0,0,0,0,0,0
res3:      .word 0,0,0,0,0,0,0
space:     .string " "
nl:        .string "\n"
.text
main:                      # t2: nums head, t3: result head, t4: numsSize
    la      t2, nums1      # load arr base to t2
    la      t3, res1       # load res base to t3
    li      t4, 5          # load the numSize to t4
    jal     ra, runningSum
    la      t2, nums2      # load arr base to t2
    la      t3, res2       # load res base to t3
    li      t4, 6          # load the numSize to t4
    jal     ra, runningSum
    la      t2, nums3      # load arr base to t2
    la      t3, res3       # load res base to t3
    li      t4, 7          # load the numSize to t4
    jal     ra, runningSum
    j       exit
runningSum:                # t5: i, t6: current result addr
    add     t6, t3, zero    
    li      t5, 1          # load i with 1
    lw      a3, 0(t2)      # load nums[0] to a3
    sw      a3, 0(t6)      # save nums[0] to result[0]
loop:
    addi    t2, t2, 4      # nums[i] addr
    lw      a2, 0(t2)      # nums[i]
    add     a3, a3, a2     # add result[i-1] + nums[i]
    addi    t6, t6, 4      # result[i]
    addi    t5, t5, 1      # i = i + 1 
    sw      a3, 0(t6)      # save result[0] + nums[1] to result[1]
    blt     t5, t4, loop   # if i < numsSize then branch to loop
print:
    lw      a0, 0(t3)      # load result out
    li      a7, 1          # pint a0
    ecall
    la      a0, space      # load space
    li      a7, 4          # print string
    ecall
    addi    t3, t3,  4
    addi    t5, t5, -1     # i--
    blt     zero, t5, print
    la      a0, nl         # load next line
    li      a7, 4          # print string
    ecall
    jr      ra
exit:
    li      a7, 10         # exit
    ecall

As a result, clock cycle as the figure shown below decreases compared to the original work.

Origin:

After modified:

Environment setup

Following the instructions of Lab2: RISC-V RV32I[MACF] emulator with ELF support,

$ git clone https://github.com/sysprog21/rv32emu
$ cd rv32emu
$ make

however, there is an error occured:

/bin/sh: 1: cc: not found
/bin/sh: 1: cc: not found
  CC    build/map.o
make: cc: No such file or directory
make: *** [Makefile:110: build/map.o] Error 127

Credit to TA(yucheng871011@gmail.com) who help solving this problem for me. That is, some tools should be installed first.

$ sudo apt install build-essential

Instructions

Compile

$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -compilation options(e.g., -Ofast, -Os, etc.) -o output input.c

Objdump

$ riscv-none-elf-objdump -d output > output.txt

Readelf

$ riscv-none-elf-readelf -h output

Size

$ riscv-none-elf-size output

Execute and Statistics of execution

~/rv32emu$ build/rv32emu --stats ./Hw2/output

Compare Assembly code

In this section, C code for the problem is compiled with different compiler optimization level (-Os to -Ofast).
The assembly code is lengthy and hard to read. We focus on the main section of the code to better demonstrate the effect of compiler optimization.

Handwritten Assembly













runningSum:                # t5: i, t6: current result addr
    add     t6, t3, zero    
    li      t5, 1          # load i with 1
    lw      a3, 0(t2)      # load nums[0] to a3
    sw      a3, 0(t6)      # save nums[0] to result[0]
loop:
    addi    t2, t2, 4      # nums[i] addr
    lw      a2, 0(t2)      # nums[i]
    add     a3, a3, a2     # add result[i-1] + nums[i]
    addi    t6, t6, 4      # result[i]
    addi    t5, t5, 1      # i = i + 1 
    sw      a3, 0(t6)      # save result[0] + nums[1] to result[1]
    blt     t5, t4, loop   # if i < numsSize then branch to loop

Observation

LOC(Line of code): 13
Register used:
- tx registers: t2, t3, t4, t5, t6
- ax registers: a2, a3
- sx registers: none
- others: none
Stack used(bytes): 0 bytes
Number of lw and sw: lw=2 sw=2
Number of branch/jump instructions: 1

-Os





























00010290 <runningSum>:
   10290:	ff010113          	addi	sp,sp,-16
   10294:	00812423          	sw	s0,8(sp)
   10298:	00912223          	sw	s1,4(sp)
   1029c:	00058413          	mv	s0,a1
   102a0:	00050493          	mv	s1,a0
   102a4:	00259513          	slli	a0,a1,0x2
   102a8:	00112623          	sw	ra,12(sp)
   102ac:	158000ef          	jal	ra,10404 <malloc>
   102b0:	00805e63          	blez	s0,102cc <runningSum+0x3c>
   102b4:	0004a783          	lw	a5,0(s1)
   102b8:	00050713          	mv	a4,a0
   102bc:	00f52023          	sw	a5,0(a0)
   102c0:	00000793          	li	a5,0
   102c4:	00178793          	addi	a5,a5,1
   102c8:	0087cc63          	blt	a5,s0,102e0 <runningSum+0x50>
   102cc:	00c12083          	lw	ra,12(sp)
   102d0:	00812403          	lw	s0,8(sp)
   102d4:	00412483          	lw	s1,4(sp)
   102d8:	01010113          	addi	sp,sp,16
   102dc:	00008067          	ret
   102e0:	00279693          	slli	a3,a5,0x2
   102e4:	00d486b3          	add	a3,s1,a3
   102e8:	0006a683          	lw	a3,0(a3)
   102ec:	00072603          	lw	a2,0(a4)
   102f0:	00470713          	addi	a4,a4,4
   102f4:	00c686b3          	add	a3,a3,a2
   102f8:	00d72023          	sw	a3,0(a4)
   102fc:	fc9ff06f          	j	102c4 <runningSum+0x34>

Observation

LOC(Line of code): 29
Register used:
- tx registers: none
- ax registers: a0, a1, a2, a3, a4, a5
- sx registers: s0, s1
- others: sp, ra
Stack used(bytes): 16 bytes
Number of lw and sw: lw=6 sw=5
Number of branch/jump instructions: 5

ELF file header:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x101e8
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95232 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Size:

   text	   data	    bss	    dec	    hex	filename
  75512	   2816	    812	  79140	  13524	Hw2_s

Execute and Statistics of execution

1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16503

-O0





















































00010184 <runningSum>:
   10184:	fd010113          	addi	sp,sp,-48
   10188:	02112623          	sw	ra,44(sp)
   1018c:	02812423          	sw	s0,40(sp)
   10190:	03010413          	addi	s0,sp,48
   10194:	fca42e23          	sw	a0,-36(s0)
   10198:	fcb42c23          	sw	a1,-40(s0)
   1019c:	fd842783          	lw	a5,-40(s0)
   101a0:	00279793          	slli	a5,a5,0x2
   101a4:	00078513          	mv	a0,a5
   101a8:	3a8000ef          	jal	ra,10550 <malloc>
   101ac:	00050793          	mv	a5,a0
   101b0:	fef42423          	sw	a5,-24(s0)
   101b4:	fe042623          	sw	zero,-20(s0)
   101b8:	0780006f          	j	10230 <runningSum+0xac>
   101bc:	fec42783          	lw	a5,-20(s0)
   101c0:	00079c63          	bnez	a5,101d8 <runningSum+0x54>
   101c4:	fdc42783          	lw	a5,-36(s0)
   101c8:	0007a703          	lw	a4,0(a5)
   101cc:	fe842783          	lw	a5,-24(s0)
   101d0:	00e7a023          	sw	a4,0(a5)
   101d4:	0500006f          	j	10224 <runningSum+0xa0>
   101d8:	fec42703          	lw	a4,-20(s0)
   101dc:	400007b7          	lui	a5,0x40000
   101e0:	fff78793          	addi	a5,a5,-1 # 3fffffff <__BSS_END__+0x3ffdc1cf>
   101e4:	00f707b3          	add	a5,a4,a5
   101e8:	00279793          	slli	a5,a5,0x2
   101ec:	fe842703          	lw	a4,-24(s0)
   101f0:	00f707b3          	add	a5,a4,a5
   101f4:	0007a683          	lw	a3,0(a5)
   101f8:	fec42783          	lw	a5,-20(s0)
   101fc:	00279793          	slli	a5,a5,0x2
   10200:	fdc42703          	lw	a4,-36(s0)
   10204:	00f707b3          	add	a5,a4,a5
   10208:	0007a703          	lw	a4,0(a5)
   1020c:	fec42783          	lw	a5,-20(s0)
   10210:	00279793          	slli	a5,a5,0x2
   10214:	fe842603          	lw	a2,-24(s0)
   10218:	00f607b3          	add	a5,a2,a5
   1021c:	00e68733          	add	a4,a3,a4
   10220:	00e7a023          	sw	a4,0(a5)
   10224:	fec42783          	lw	a5,-20(s0)
   10228:	00178793          	addi	a5,a5,1
   1022c:	fef42623          	sw	a5,-20(s0)
   10230:	fec42703          	lw	a4,-20(s0)
   10234:	fd842783          	lw	a5,-40(s0)
   10238:	f8f742e3          	blt	a4,a5,101bc <runningSum+0x38>
   1023c:	fe842783          	lw	a5,-24(s0)
   10240:	00078513          	mv	a0,a5
   10244:	02c12083          	lw	ra,44(sp)
   10248:	02812403          	lw	s0,40(sp)
   1024c:	03010113          	addi	sp,sp,48
   10250:	00008067          	ret

Observation

LOC(Line of code): 53
Register used:
- tx registers: none
- ax registers: a0, a1, a2, a3, a4, a5
- sx registers: s0
- others: sp, ra, zero
Stack used(bytes): 48 bytes
Number of lw and sw: lw=19 sw=9
Number of branch/jump instructions: 6

ELF file header:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100dc
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95216 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Size:

   text	   data	    bss	    dec	    hex	filename
  75844	   2816	    812	  79472	  13670	Hw2_0

Execute and Statistics of execution

1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16889

-O1































00010184 <runningSum>:
   10184:	ff010113          	addi	sp,sp,-16
   10188:	00112623          	sw	ra,12(sp)
   1018c:	00812423          	sw	s0,8(sp)
   10190:	00912223          	sw	s1,4(sp)
   10194:	00050493          	mv	s1,a0
   10198:	00058413          	mv	s0,a1
   1019c:	00259513          	slli	a0,a1,0x2
   101a0:	2e0000ef          	jal	ra,10480 <malloc>
   101a4:	04805263          	blez	s0,101e8 <runningSum+0x64>
   101a8:	00050713          	mv	a4,a0
   101ac:	00048693          	mv	a3,s1
   101b0:	00000793          	li	a5,0
   101b4:	01c0006f          	j	101d0 <runningSum+0x4c>
   101b8:	0004a603          	lw	a2,0(s1)
   101bc:	00c52023          	sw	a2,0(a0)
   101c0:	00178793          	addi	a5,a5,1
   101c4:	00470713          	addi	a4,a4,4
   101c8:	00468693          	addi	a3,a3,4
   101cc:	00f40e63          	beq	s0,a5,101e8 <runningSum+0x64>
   101d0:	fe0784e3          	beqz	a5,101b8 <runningSum+0x34>
   101d4:	ffc72603          	lw	a2,-4(a4)
   101d8:	0006a583          	lw	a1,0(a3)
   101dc:	00b60633          	add	a2,a2,a1
   101e0:	00c72023          	sw	a2,0(a4)
   101e4:	fddff06f          	j	101c0 <runningSum+0x3c>
   101e8:	00c12083          	lw	ra,12(sp)
   101ec:	00812403          	lw	s0,8(sp)
   101f0:	00412483          	lw	s1,4(sp)
   101f4:	01010113          	addi	sp,sp,16
   101f8:	00008067          	ret

Observation

LOC(Line of code): 31
Register used:
- tx registers: none
- ax registers: a0, a1, a2, a3, a4, a5
- sx registers: s0, s1
- others: sp, ra
Stack used(bytes): 16 bytes
Number of lw and sw: lw=6 sw=5
Number of branch/jump instructions: 7

ELF file header:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100dc
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95216 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Size:

   text	   data	    bss	    dec	    hex	filename
  75636	   2816	    812	  79264	  135a0	Hw2_1

Execute and Statistics of execution

1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16406

-O2





























000102f0 <runningSum>:
   102f0:	ff010113          	addi	sp,sp,-16
   102f4:	00812423          	sw	s0,8(sp)
   102f8:	00912223          	sw	s1,4(sp)
   102fc:	00050413          	mv	s0,a0
   10300:	00058493          	mv	s1,a1
   10304:	00259513          	slli	a0,a1,0x2
   10308:	00112623          	sw	ra,12(sp)
   1030c:	158000ef          	jal	ra,10464 <malloc>
   10310:	02905e63          	blez	s1,1034c <runningSum+0x5c>
   10314:	00042683          	lw	a3,0(s0)
   10318:	00040713          	mv	a4,s0
   1031c:	00050793          	mv	a5,a0
   10320:	00000613          	li	a2,0
   10324:	00d52023          	sw	a3,0(a0)
   10328:	0140006f          	j	1033c <runningSum+0x4c>
   1032c:	ffc7a683          	lw	a3,-4(a5)
   10330:	00072803          	lw	a6,0(a4)
   10334:	010686b3          	add	a3,a3,a6
   10338:	00d7a023          	sw	a3,0(a5)
   1033c:	00160613          	addi	a2,a2,1
   10340:	00478793          	addi	a5,a5,4
   10344:	00470713          	addi	a4,a4,4
   10348:	fec492e3          	bne	s1,a2,1032c <runningSum+0x3c>
   1034c:	00c12083          	lw	ra,12(sp)
   10350:	00812403          	lw	s0,8(sp)
   10354:	00412483          	lw	s1,4(sp)
   10358:	01010113          	addi	sp,sp,16
   1035c:	00008067          	ret

Observation

LOC(Line of code): 29
Register used:
- tx registers: none
- ax registers: a0, a1, a2, a3, a4, a5, a6
- sx registers: s0, s1
- others: sp, ra
Stack used(bytes): 16 bytes
Number of lw and sw: lw=6 sw=5
Number of branch/jump instructions: 5

ELF file header:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x10248
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95232 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Size:

   text	   data	    bss	    dec	    hex	filename
  75608	   2816	    812	  79236	  13584	Hw2_2

Execute and Statistics of execution

1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16401

-Ofast






























000102f0 <runningSum>:
   102f0:	ff010113          	addi	sp,sp,-16
   102f4:	00812423          	sw	s0,8(sp)
   102f8:	00912223          	sw	s1,4(sp)
   102fc:	00050413          	mv	s0,a0
   10300:	00058493          	mv	s1,a1
   10304:	00259513          	slli	a0,a1,0x2
   10308:	00112623          	sw	ra,12(sp)
   1030c:	15c000ef          	jal	ra,10468 <malloc>
   10310:	04905063          	blez	s1,10350 <runningSum+0x60>
   10314:	00042683          	lw	a3,0(s0)
   10318:	00100713          	li	a4,1
   1031c:	00450793          	addi	a5,a0,4
   10320:	00d52023          	sw	a3,0(a0)
   10324:	00440693          	addi	a3,s0,4
   10328:	02e48463          	beq	s1,a4,10350 <runningSum+0x60>
   1032c:	00100613          	li	a2,1
   10330:	ffc7a703          	lw	a4,-4(a5)
   10334:	0006a803          	lw	a6,0(a3)
   10338:	00478793          	addi	a5,a5,4
   1033c:	00160613          	addi	a2,a2,1
   10340:	01070733          	add	a4,a4,a6
   10344:	fee7ae23          	sw	a4,-4(a5)
   10348:	00468693          	addi	a3,a3,4
   1034c:	fec492e3          	bne	s1,a2,10330 <runningSum+0x40>
   10350:	00c12083          	lw	ra,12(sp)
   10354:	00812403          	lw	s0,8(sp)
   10358:	00412483          	lw	s1,4(sp)
   1035c:	01010113          	addi	sp,sp,16
   10360:	00008067          	ret

Observation

LOC(Line of code): 30
Register used:
- tx registers: none
- ax registers: a0, a1, a2, a3, a4, a5, a6
- sx registers: s0, s1
- others: sp, ra
Stack used(bytes): 16 bytes
Number of lw and sw: lw=6 sw=5
Number of branch/jump instructions: 5

ELF file header:

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x10248
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95232 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Size:

   text	   data	    bss	    dec	    hex	filename
  75612	   2816	    812	  79240	  13588	Hw2_f

Execute and Statistics of execution

1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16395

Summary: Observed Compiler Behavior

From -O0 to -O1
- Additional usage of s1 which significantly decreases lw/sw count and stack space.
From -O1 to -O2
- Unroll the loop to avoid branch misprediction.
From -O2 to -Ofast
- Nearly identical.

	-Os	-O0	-O1	-O2	-Ofast
Line of Code	29	53	31	29	30
Register used	10	10	10	11	11
Stack used (bytes)	16	48	16	16	16
lw/sw count	11	28	11	11	11
branch/jump count	5	6	7	5	5
CSR cycle count	16503	16889	16406	16401	16395

Size among different optimization levels:
While the data and bss section sizes share the same value across all different optimization levels, we can obtain the tendency in other sections.
Cycle count among different optimization levels:

Assignment2: RISC-V Toolchain

tags: RISC-V Computer Architecture 2022

Problem

Example 1:

Example 2:

Example 3:

Solution

Implementation

C code

Assembly code

Environment setup

Instructions

Compare Assembly code

Handwritten Assembly

Observation

-Os

Observation

-O0

Observation

-O1

Observation

-O2

Observation

-Ofast

Observation

Summary: Observed Compiler Behavior

Read more

Assignment3: SoftCPU

Assignment1: RISC-V Assembly and Instruction Pipeline

tags: `RISC-V` `Computer Architecture 2022`