Try   HackMD

Assignment2: RISC-V Toolchain

contributed by < dhiptmc >

tags: RISC-V Computer Architecture 2022

Problem

Problem was chosen from the work done by 王昱承 in Assignment 1.

Given an array nums. We define a running sum of an array as runningSum[i] = sum(nums[0]…nums[i]).

Return the running sum of nums.

  • Motivation:
    In the last assignment, I spent lots of time dealing with arrays.
    I would like to examine whether I can make use of the experiences learnt from previous attempts.

Example 1:

Input: nums = [1,2,3,4]
Output: [1,3,6,10]
Explanation: Running sum is obtained as follows: [1, 1+2, 1+2+3, 1+2+3+4].

Example 2:

Input: nums = [1,1,1,1,1]
Output: [1,2,3,4,5]
Explanation: Running sum is obtained as follows: [1, 1+1, 1+1+1, 1+1+1+1, 1+1+1+1+1].

Example 3:

Input: nums = [3,1,2,10,1]
Output: [3,4,6,16,17]

Solution

The solution is based on the work done by 王昱承.

Initially, pass the first element, i.e., nums[0], to result[0]. Add the result[i-1] element with nums[i], then pass the value to the result.

Return the result array eventually.

Implementation

C code

#include <stdlib.h> #include <stdio.h> int* runningSum(int* nums, int numsSize){ int* result = (int*)malloc(sizeof(int)*numsSize); for(int i=0; i<numsSize; i++){ if(i==0) result[0] = nums[0]; else result[i] = result[i-1] + nums[i]; } return result; } int main(int argc, char *argv[]){ int nums1[5] = {1,2,3,4,5}; int nums2[6] = {1,3,5,7,9,11}; int nums3[7] = {0,2,4,6,8,10,12}; int* result1 = runningSum(nums1, 5); int* result2 = runningSum(nums2, 6); int* result3 = runningSum(nums3, 7); for(int j=0; j<5; j++){ printf("%d ", result1[j]); } printf("\n"); for(int j=0; j<6; j++){ printf("%d ", result2[j]); } printf("\n"); for(int j=0; j<7; j++){ printf("%d ", result3[j]); } printf("\n"); }

Assembly code

.data nums1: .word 1,2,3,4,5 nums2: .word 1,3,5,7,9,11 nums3: .word 0,2,4,6,8,10,12 res1: .word 0,0,0,0,0, res2: .word 0,0,0,0,0,0 res3: .word 0,0,0,0,0,0,0 space: .string " " nl: .string "\n" .text main: # t2: nums head, t3: result head, t4: numsSize la t2, nums1 # load arr base to t2 la t3, res1 # load res base to t3 li t4, 5 # load the numSize to t4 jal ra, runningSum la t2, nums2 # load arr base to t2 la t3, res2 # load res base to t3 li t4, 6 # load the numSize to t4 jal ra, runningSum la t2, nums3 # load arr base to t2 la t3, res3 # load res base to t3 li t4, 7 # load the numSize to t4 jal ra, runningSum j exit runningSum: # t5: i li t5, 1 # load i with 1 lw a3, 0(t2) # load nums[0] to a3 sw a3, 0(t3) # save num[0] to result[0] end.L1: slli a0, t5, 2 # i*4 (cause of byte size) add a1, t3, a0 # add result's address with i*4 to get result[i]'s address lw a3, -4(a1) # load result[i-1] to a3 add a2, t2, a0 # add nums's address with i*4 to get nums[i]'s address lw a2, 0(a2) # load nums[i] to a5 add a3, a3, a2 # add result[i-1] + nums[i] sw a3, 0(a1) # save to result[i] addi t5, t5, 1 # i = i + 1 blt t5, t4, end.L1 # if i < numsSize then branch to end.L1 li t5, 0 print: slli a1, t5, 2 add a1, t3, a1 lw a0, 0(a1) # load result out li a7, 1 # pint a0 ecall la a0, space # load space li a7, 4 # print string ecall addi t5, t5, 1 # j++ blt t5, t4, print # loop la a0, nl # load next line li a7, 4 # print string ecall jr ra exit: li a7, 10 # exit ecall

There are possible ways to use fewer instructions.
Modified version of Assembly code is shown below.

.data nums1: .word 1,2,3,4,5 nums2: .word 1,3,5,7,9,11 nums3: .word 0,2,4,6,8,10,12 res1: .word 0,0,0,0,0, res2: .word 0,0,0,0,0,0 res3: .word 0,0,0,0,0,0,0 space: .string " " nl: .string "\n" .text main: # t2: nums head, t3: result head, t4: numsSize la t2, nums1 # load arr base to t2 la t3, res1 # load res base to t3 li t4, 5 # load the numSize to t4 jal ra, runningSum la t2, nums2 # load arr base to t2 la t3, res2 # load res base to t3 li t4, 6 # load the numSize to t4 jal ra, runningSum la t2, nums3 # load arr base to t2 la t3, res3 # load res base to t3 li t4, 7 # load the numSize to t4 jal ra, runningSum j exit runningSum: # t5: i, t6: current result addr add t6, t3, zero li t5, 1 # load i with 1 lw a3, 0(t2) # load nums[0] to a3 sw a3, 0(t6) # save nums[0] to result[0] loop: addi t2, t2, 4 # nums[i] addr lw a2, 0(t2) # nums[i] add a3, a3, a2 # add result[i-1] + nums[i] addi t6, t6, 4 # result[i] addi t5, t5, 1 # i = i + 1 sw a3, 0(t6) # save result[0] + nums[1] to result[1] blt t5, t4, loop # if i < numsSize then branch to loop print: lw a0, 0(t3) # load result out li a7, 1 # pint a0 ecall la a0, space # load space li a7, 4 # print string ecall addi t3, t3, 4 addi t5, t5, -1 # i-- blt zero, t5, print la a0, nl # load next line li a7, 4 # print string ecall jr ra exit: li a7, 10 # exit ecall

As a result, clock cycle as the figure shown below decreases compared to the original work.

Origin:

After modified:

Environment setup

Following the instructions of Lab2: RISC-V RV32I[MACF] emulator with ELF support,

$ git clone https://github.com/sysprog21/rv32emu
$ cd rv32emu
$ make

however, there is an error occured:

/bin/sh: 1: cc: not found
/bin/sh: 1: cc: not found
  CC    build/map.o
make: cc: No such file or directory
make: *** [Makefile:110: build/map.o] Error 127

Credit to TA(yucheng871011@gmail.com) who help solving this problem for me. That is, some tools should be installed first.

$ sudo apt install build-essential

Instructions

  • Compile
$ riscv-none-elf-gcc -march=rv32i -mabi=ilp32 -compilation options(e.g., -Ofast, -Os, etc.) -o output input.c
  • Objdump
$ riscv-none-elf-objdump -d output > output.txt
  • Readelf
$ riscv-none-elf-readelf -h output
  • Size
$ riscv-none-elf-size output
  • Execute and Statistics of execution
~/rv32emu$ build/rv32emu --stats ./Hw2/output

Compare Assembly code

In this section, C code for the problem is compiled with different compiler optimization level (-Os to -Ofast).
The assembly code is lengthy and hard to read. We focus on the main section of the code to better demonstrate the effect of compiler optimization.

Handwritten Assembly

runningSum: # t5: i, t6: current result addr add t6, t3, zero li t5, 1 # load i with 1 lw a3, 0(t2) # load nums[0] to a3 sw a3, 0(t6) # save nums[0] to result[0] loop: addi t2, t2, 4 # nums[i] addr lw a2, 0(t2) # nums[i] add a3, a3, a2 # add result[i-1] + nums[i] addi t6, t6, 4 # result[i] addi t5, t5, 1 # i = i + 1 sw a3, 0(t6) # save result[0] + nums[1] to result[1] blt t5, t4, loop # if i < numsSize then branch to loop

Observation

  • LOC(Line of code): 13
  • Register used:
    • tx registers: t2, t3, t4, t5, t6
    • ax registers: a2, a3
    • sx registers: none
    • others: none
  • Stack used(bytes): 0 bytes
  • Number of lw and sw: lw=2 sw=2
  • Number of branch/jump instructions: 1

-Os

00010290 <runningSum>: 10290: ff010113 addi sp,sp,-16 10294: 00812423 sw s0,8(sp) 10298: 00912223 sw s1,4(sp) 1029c: 00058413 mv s0,a1 102a0: 00050493 mv s1,a0 102a4: 00259513 slli a0,a1,0x2 102a8: 00112623 sw ra,12(sp) 102ac: 158000ef jal ra,10404 <malloc> 102b0: 00805e63 blez s0,102cc <runningSum+0x3c> 102b4: 0004a783 lw a5,0(s1) 102b8: 00050713 mv a4,a0 102bc: 00f52023 sw a5,0(a0) 102c0: 00000793 li a5,0 102c4: 00178793 addi a5,a5,1 102c8: 0087cc63 blt a5,s0,102e0 <runningSum+0x50> 102cc: 00c12083 lw ra,12(sp) 102d0: 00812403 lw s0,8(sp) 102d4: 00412483 lw s1,4(sp) 102d8: 01010113 addi sp,sp,16 102dc: 00008067 ret 102e0: 00279693 slli a3,a5,0x2 102e4: 00d486b3 add a3,s1,a3 102e8: 0006a683 lw a3,0(a3) 102ec: 00072603 lw a2,0(a4) 102f0: 00470713 addi a4,a4,4 102f4: 00c686b3 add a3,a3,a2 102f8: 00d72023 sw a3,0(a4) 102fc: fc9ff06f j 102c4 <runningSum+0x34>

Observation

  • LOC(Line of code): 29
  • Register used:
    • tx registers: none
    • ax registers: a0, a1, a2, a3, a4, a5
    • sx registers: s0, s1
    • others: sp, ra
  • Stack used(bytes): 16 bytes
  • Number of lw and sw: lw=6 sw=5
  • Number of branch/jump instructions: 5
  1. ELF file header:
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x101e8
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95232 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14
  1. Size:
   text	   data	    bss	    dec	    hex	filename
  75512	   2816	    812	  79140	  13524	Hw2_s
  1. Execute and Statistics of execution
1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16503

-O0

00010184 <runningSum>: 10184: fd010113 addi sp,sp,-48 10188: 02112623 sw ra,44(sp) 1018c: 02812423 sw s0,40(sp) 10190: 03010413 addi s0,sp,48 10194: fca42e23 sw a0,-36(s0) 10198: fcb42c23 sw a1,-40(s0) 1019c: fd842783 lw a5,-40(s0) 101a0: 00279793 slli a5,a5,0x2 101a4: 00078513 mv a0,a5 101a8: 3a8000ef jal ra,10550 <malloc> 101ac: 00050793 mv a5,a0 101b0: fef42423 sw a5,-24(s0) 101b4: fe042623 sw zero,-20(s0) 101b8: 0780006f j 10230 <runningSum+0xac> 101bc: fec42783 lw a5,-20(s0) 101c0: 00079c63 bnez a5,101d8 <runningSum+0x54> 101c4: fdc42783 lw a5,-36(s0) 101c8: 0007a703 lw a4,0(a5) 101cc: fe842783 lw a5,-24(s0) 101d0: 00e7a023 sw a4,0(a5) 101d4: 0500006f j 10224 <runningSum+0xa0> 101d8: fec42703 lw a4,-20(s0) 101dc: 400007b7 lui a5,0x40000 101e0: fff78793 addi a5,a5,-1 # 3fffffff <__BSS_END__+0x3ffdc1cf> 101e4: 00f707b3 add a5,a4,a5 101e8: 00279793 slli a5,a5,0x2 101ec: fe842703 lw a4,-24(s0) 101f0: 00f707b3 add a5,a4,a5 101f4: 0007a683 lw a3,0(a5) 101f8: fec42783 lw a5,-20(s0) 101fc: 00279793 slli a5,a5,0x2 10200: fdc42703 lw a4,-36(s0) 10204: 00f707b3 add a5,a4,a5 10208: 0007a703 lw a4,0(a5) 1020c: fec42783 lw a5,-20(s0) 10210: 00279793 slli a5,a5,0x2 10214: fe842603 lw a2,-24(s0) 10218: 00f607b3 add a5,a2,a5 1021c: 00e68733 add a4,a3,a4 10220: 00e7a023 sw a4,0(a5) 10224: fec42783 lw a5,-20(s0) 10228: 00178793 addi a5,a5,1 1022c: fef42623 sw a5,-20(s0) 10230: fec42703 lw a4,-20(s0) 10234: fd842783 lw a5,-40(s0) 10238: f8f742e3 blt a4,a5,101bc <runningSum+0x38> 1023c: fe842783 lw a5,-24(s0) 10240: 00078513 mv a0,a5 10244: 02c12083 lw ra,44(sp) 10248: 02812403 lw s0,40(sp) 1024c: 03010113 addi sp,sp,48 10250: 00008067 ret

Observation

  • LOC(Line of code): 53
  • Register used:
    • tx registers: none
    • ax registers: a0, a1, a2, a3, a4, a5
    • sx registers: s0
    • others: sp, ra, zero
  • Stack used(bytes): 48 bytes
  • Number of lw and sw: lw=19 sw=9
  • Number of branch/jump instructions: 6
  1. ELF file header:
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100dc
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95216 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14
  1. Size:
   text	   data	    bss	    dec	    hex	filename
  75844	   2816	    812	  79472	  13670	Hw2_0
  1. Execute and Statistics of execution
1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16889

-O1

00010184 <runningSum>: 10184: ff010113 addi sp,sp,-16 10188: 00112623 sw ra,12(sp) 1018c: 00812423 sw s0,8(sp) 10190: 00912223 sw s1,4(sp) 10194: 00050493 mv s1,a0 10198: 00058413 mv s0,a1 1019c: 00259513 slli a0,a1,0x2 101a0: 2e0000ef jal ra,10480 <malloc> 101a4: 04805263 blez s0,101e8 <runningSum+0x64> 101a8: 00050713 mv a4,a0 101ac: 00048693 mv a3,s1 101b0: 00000793 li a5,0 101b4: 01c0006f j 101d0 <runningSum+0x4c> 101b8: 0004a603 lw a2,0(s1) 101bc: 00c52023 sw a2,0(a0) 101c0: 00178793 addi a5,a5,1 101c4: 00470713 addi a4,a4,4 101c8: 00468693 addi a3,a3,4 101cc: 00f40e63 beq s0,a5,101e8 <runningSum+0x64> 101d0: fe0784e3 beqz a5,101b8 <runningSum+0x34> 101d4: ffc72603 lw a2,-4(a4) 101d8: 0006a583 lw a1,0(a3) 101dc: 00b60633 add a2,a2,a1 101e0: 00c72023 sw a2,0(a4) 101e4: fddff06f j 101c0 <runningSum+0x3c> 101e8: 00c12083 lw ra,12(sp) 101ec: 00812403 lw s0,8(sp) 101f0: 00412483 lw s1,4(sp) 101f4: 01010113 addi sp,sp,16 101f8: 00008067 ret

Observation

  • LOC(Line of code): 31
  • Register used:
    • tx registers: none
    • ax registers: a0, a1, a2, a3, a4, a5
    • sx registers: s0, s1
    • others: sp, ra
  • Stack used(bytes): 16 bytes
  • Number of lw and sw: lw=6 sw=5
  • Number of branch/jump instructions: 7
  1. ELF file header:
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100dc
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95216 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14
  1. Size:
   text	   data	    bss	    dec	    hex	filename
  75636	   2816	    812	  79264	  135a0	Hw2_1
  1. Execute and Statistics of execution
1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16406

-O2

000102f0 <runningSum>: 102f0: ff010113 addi sp,sp,-16 102f4: 00812423 sw s0,8(sp) 102f8: 00912223 sw s1,4(sp) 102fc: 00050413 mv s0,a0 10300: 00058493 mv s1,a1 10304: 00259513 slli a0,a1,0x2 10308: 00112623 sw ra,12(sp) 1030c: 158000ef jal ra,10464 <malloc> 10310: 02905e63 blez s1,1034c <runningSum+0x5c> 10314: 00042683 lw a3,0(s0) 10318: 00040713 mv a4,s0 1031c: 00050793 mv a5,a0 10320: 00000613 li a2,0 10324: 00d52023 sw a3,0(a0) 10328: 0140006f j 1033c <runningSum+0x4c> 1032c: ffc7a683 lw a3,-4(a5) 10330: 00072803 lw a6,0(a4) 10334: 010686b3 add a3,a3,a6 10338: 00d7a023 sw a3,0(a5) 1033c: 00160613 addi a2,a2,1 10340: 00478793 addi a5,a5,4 10344: 00470713 addi a4,a4,4 10348: fec492e3 bne s1,a2,1032c <runningSum+0x3c> 1034c: 00c12083 lw ra,12(sp) 10350: 00812403 lw s0,8(sp) 10354: 00412483 lw s1,4(sp) 10358: 01010113 addi sp,sp,16 1035c: 00008067 ret

Observation

  • LOC(Line of code): 29
  • Register used:
    • tx registers: none
    • ax registers: a0, a1, a2, a3, a4, a5, a6
    • sx registers: s0, s1
    • others: sp, ra
  • Stack used(bytes): 16 bytes
  • Number of lw and sw: lw=6 sw=5
  • Number of branch/jump instructions: 5
  1. ELF file header:
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x10248
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95232 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14
  1. Size:
   text	   data	    bss	    dec	    hex	filename
  75608	   2816	    812	  79236	  13584	Hw2_2
  1. Execute and Statistics of execution
1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16401

-Ofast

000102f0 <runningSum>: 102f0: ff010113 addi sp,sp,-16 102f4: 00812423 sw s0,8(sp) 102f8: 00912223 sw s1,4(sp) 102fc: 00050413 mv s0,a0 10300: 00058493 mv s1,a1 10304: 00259513 slli a0,a1,0x2 10308: 00112623 sw ra,12(sp) 1030c: 15c000ef jal ra,10468 <malloc> 10310: 04905063 blez s1,10350 <runningSum+0x60> 10314: 00042683 lw a3,0(s0) 10318: 00100713 li a4,1 1031c: 00450793 addi a5,a0,4 10320: 00d52023 sw a3,0(a0) 10324: 00440693 addi a3,s0,4 10328: 02e48463 beq s1,a4,10350 <runningSum+0x60> 1032c: 00100613 li a2,1 10330: ffc7a703 lw a4,-4(a5) 10334: 0006a803 lw a6,0(a3) 10338: 00478793 addi a5,a5,4 1033c: 00160613 addi a2,a2,1 10340: 01070733 add a4,a4,a6 10344: fee7ae23 sw a4,-4(a5) 10348: 00468693 addi a3,a3,4 1034c: fec492e3 bne s1,a2,10330 <runningSum+0x40> 10350: 00c12083 lw ra,12(sp) 10354: 00812403 lw s0,8(sp) 10358: 00412483 lw s1,4(sp) 1035c: 01010113 addi sp,sp,16 10360: 00008067 ret

Observation

  • LOC(Line of code): 30
  • Register used:
    • tx registers: none
    • ax registers: a0, a1, a2, a3, a4, a5, a6
    • sx registers: s0, s1
    • others: sp, ra
  • Stack used(bytes): 16 bytes
  • Number of lw and sw: lw=6 sw=5
  • Number of branch/jump instructions: 5
  1. ELF file header:
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x10248
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95232 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14
  1. Size:
   text	   data	    bss	    dec	    hex	filename
  75612	   2816	    812	  79240	  13588	Hw2_f
  1. Execute and Statistics of execution
1 3 6 10 15 
1 4 9 16 25 36 
0 2 6 12 20 30 42 
inferior exit code 0
CSR cycle count: 16395

Summary: Observed Compiler Behavior

  • From -O0 to -O1
    • Additional usage of s1 which significantly decreases lw/sw count and stack space.
  • From -O1 to -O2
    • Unroll the loop to avoid branch misprediction.
  • From -O2 to -Ofast
    • Nearly identical.
-Os -O0 -O1 -O2 -Ofast
Line of Code 29 53 31 29 30
Register used 10 10 10 11 11
Stack used (bytes) 16 48 16 16 16
lw/sw count 11 28 11 11 11
branch/jump count 5 6 7 5 5
CSR cycle count 16503 16889 16406 16401 16395
  • Size among different optimization levels:
    While the data and bss section sizes share the same value across all different optimization levels, we can obtain the tendency in other sections.



  • Cycle count among different optimization levels: