# Assignment2: Complete Applications contributed by < [RayYang421](https://github.com/RayYang421/ca2025-Homework2) > ## RV32 emu ### ``` $ source $HOME/riscv-none-elf-gcc/setenv ``` ### Linker **OUTPUT_ARCH( "riscv" )** Specifies that the output ELF file is built for the RISC-V architecture. **ENTRY(_start)** * Defines the program entry point — the address where execution begins. * _start is usually defined in start.S, and it sets up the stack and initializes memory before calling main. **SECTIONS** The main part of the script. It tells the linker how to arrange code and data in memory. | Section | Purpose | Notes | | -------- | ----------------------- | -------------------------- | | `.text` | Program code | Starts at 0x10000 | | `.data` | Initialized variables | Loaded from ELF | | `.bss` | Uninitialized variables | Zeroed at startup | | `.stack` | Stack area | 4 KB, top at `__stack_top` | **gcc compiler** | Option | Main Goal | Code Size | Execution Speed | Description | | :--------: | :--------------------------- | :-------: | :---------------------: | :----------------------------------------------------------------- | | **-O0** | No optimization | Largest | Slowest | Keeps original structure; best for debugging | | **-O1** | Basic optimization | Smaller | Faster | Removes unnecessary code; slight speed-up | | **-O2** | Balanced optimization | Medium | Fast | Common for release builds; safe and efficient | | **-O3** | Maximum performance | Larger | Faster | Enables aggressive optimizations (e.g., inlining, vectorization) | | **-Ofast** | Extreme performance | Larger | Fastest | Ignores strict standards (e.g., floating-point precision) | | **-Os** | Optimize for size | Smallest | Fast or slightly faster | Disables size-increasing optimizations; ideal for embedded systems | | **-Oz** | Minimize size (mainly Clang) | Smallest | Slightly slower | More aggressive size reduction than `-Os` | ## Problem 1 ### objdump ``` $ riscv-none-elf-objdump -d test.elf ``` #### Result :::spoiler ``` test.elf: file format elf32-littleriscv Disassembly of section .text: 00010000 <main>: 10000: 168000ef jal 10168 <test> 10004: 00154513 xori a0,a0,1 10008: 1e00006f j 101e8 <Exit> 0001000c <clz>: 1000c: 00000313 li t1,0 10010: 00001e37 lui t3,0x1 10014: 01c533b3 sltu t2,a0,t3 10018: 00439393 slli t2,t2,0x4 1001c: 00730333 add t1,t1,t2 10020: 00751533 sll a0,a0,t2 10024: 01000e37 lui t3,0x1000 10028: 01c533b3 sltu t2,a0,t3 1002c: 00339393 slli t2,t2,0x3 10030: 00730333 add t1,t1,t2 10034: 00751533 sll a0,a0,t2 10038: 10000e37 lui t3,0x10000 1003c: 01c533b3 sltu t2,a0,t3 10040: 00239393 slli t2,t2,0x2 10044: 00730333 add t1,t1,t2 10048: 00751533 sll a0,a0,t2 1004c: 40000e37 lui t3,0x40000 10050: 01c533b3 sltu t2,a0,t3 10054: 00139393 slli t2,t2,0x1 10058: 00730333 add t1,t1,t2 1005c: 00751533 sll a0,a0,t2 10060: 80000e37 lui t3,0x80000 10064: 01c533b3 sltu t2,a0,t3 10068: 00730333 add t1,t1,t2 1006c: 00751533 sll a0,a0,t2 10070: 00153e13 seqz t3,a0 10074: 01c30333 add t1,t1,t3 10078: 00030513 mv a0,t1 1007c: 00008067 ret 00010080 <uf8_encode>: 10080: 01000293 li t0,16 10084: 0a556a63 bltu a0,t0,10138 <encode_done> 10088: ff810113 addi sp,sp,-8 1008c: 00112223 sw ra,4(sp) 10090: 00812023 sw s0,0(sp) 10094: 00050413 mv s0,a0 10098: f75ff0ef jal 1000c <clz> 1009c: 01f00293 li t0,31 100a0: 40a282b3 sub t0,t0,a0 100a4: 00000313 li t1,0 100a8: 00000393 li t2,0 100ac: 00500e13 li t3,5 100b0: 05c2c463 blt t0,t3,100f8 <find_exact_exponent> 100b4: ffc28313 addi t1,t0,-4 100b8: 00f00e13 li t3,15 100bc: 006e4463 blt t3,t1,100c4 <clamp_exponent> 100c0: 0080006f j 100c8 <calc_overflow> 000100c4 <clamp_exponent>: 100c4: 000e0313 mv t1,t3 000100c8 <calc_overflow>: 100c8: 00000e93 li t4,0 000100cc <calc_overflow_loop>: 100cc: 006eda63 bge t4,t1,100e0 <adjust_down> 100d0: 00139393 slli t2,t2,0x1 100d4: 01038393 addi t2,t2,16 100d8: 001e8e93 addi t4,t4,1 100dc: ff1ff06f j 100cc <calc_overflow_loop> 000100e0 <adjust_down>: 100e0: 00605c63 blez t1,100f8 <find_exact_exponent> 100e4: 00745a63 bge s0,t2,100f8 <find_exact_exponent> 100e8: ff038393 addi t2,t2,-16 100ec: 0013d393 srli t2,t2,0x1 100f0: fff30313 addi t1,t1,-1 100f4: fedff06f j 100e0 <adjust_down> 000100f8 <find_exact_exponent>: 100f8: 00f00e13 li t3,15 100fc: 01c35e63 bge t1,t3,10118 <final_calc> 10100: 00139e93 slli t4,t2,0x1 10104: 010e8e93 addi t4,t4,16 10108: 01d44863 blt s0,t4,10118 <final_calc> 1010c: 000e8393 mv t2,t4 10110: 00130313 addi t1,t1,1 10114: fe5ff06f j 100f8 <find_exact_exponent> 00010118 <final_calc>: 10118: 40740f33 sub t5,s0,t2 1011c: 006f5f33 srl t5,t5,t1 10120: 00431313 slli t1,t1,0x4 10124: 01e36533 or a0,t1,t5 10128: 00012403 lw s0,0(sp) 1012c: 00412083 lw ra,4(sp) 10130: 00810113 addi sp,sp,8 10134: 00008067 ret 00010138 <encode_done>: 10138: 00008067 ret 0001013c <uf8_decode>: 1013c: 00f57293 andi t0,a0,15 10140: 00455313 srli t1,a0,0x4 10144: 00f00393 li t2,15 10148: 406383b3 sub t2,t2,t1 1014c: 00008e37 lui t3,0x8 10150: fffe0e13 addi t3,t3,-1 # 7fff <main-0x8001> 10154: 007e53b3 srl t2,t3,t2 10158: 00439393 slli t2,t2,0x4 1015c: 00629533 sll a0,t0,t1 10160: 00750533 add a0,a0,t2 10164: 00008067 ret 00010168 <test>: 10168: ffc10113 addi sp,sp,-4 1016c: 00112023 sw ra,0(sp) 10170: 00100f93 li t6,1 10174: 00000513 li a0,0 10178: f09ff0ef jal 10080 <uf8_encode> 1017c: 00050313 mv t1,a0 10180: 00030513 mv a0,t1 10184: fb9ff0ef jal 1013c <uf8_decode> 10188: 01000393 li t2,16 1018c: 00750463 beq a0,t2,10194 <test0> 10190: 00000f93 li t6,0 00010194 <test0>: 10194: 00f00513 li a0,15 10198: ee9ff0ef jal 10080 <uf8_encode> 1019c: 00050313 mv t1,a0 101a0: 00030513 mv a0,t1 101a4: f99ff0ef jal 1013c <uf8_decode> 101a8: 00f00393 li t2,15 101ac: 00750463 beq a0,t2,101b4 <test15> 101b0: 00000f93 li t6,0 000101b4 <test15>: 101b4: 40000513 li a0,1024 101b8: ec9ff0ef jal 10080 <uf8_encode> 101bc: 00050313 mv t1,a0 101c0: 00030513 mv a0,t1 101c4: f79ff0ef jal 1013c <uf8_decode> 101c8: 3f000393 li t2,1008 101cc: 00750463 beq a0,t2,101d4 <test1024> 101d0: 00000f93 li t6,0 000101d4 <test1024>: 101d4: 000f8263 beqz t6,101d8 <test_ret> 000101d8 <test_ret>: 101d8: 000f8513 mv a0,t6 101dc: 00012083 lw ra,0(sp) 101e0: 00410113 addi sp,sp,4 101e4: 00008067 ret 000101e8 <Exit>: 101e8: 05d00893 li a7,93 101ec: 00000073 ecall ``` ::: ### readelf ``` $ riscv-none-elf-readelf -h test.elf ``` #### Result ``` ELF Header: Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 Class: ELF32 Data: 2's complement, little endian Version: 1 (current) OS/ABI: UNIX - System V ABI Version: 0 Type: EXEC (Executable file) Machine: RISC-V Version: 0x1 Entry point address: 0x10000 Start of program headers: 52 (bytes into file) Start of section headers: 6732 (bytes into file) Flags: 0x0 Size of this header: 52 (bytes) Size of program headers: 32 (bytes) Number of program headers: 2 Size of section headers: 40 (bytes) Number of section headers: 14 Section header string table index: 13 ``` ### Size ``` $ riscv-none-elf-size test.elf ``` #### Result ``` text data bss dec hex filename 496 17 4111 4624 1210 test.elf ``` ## Quiz 2 Problem A ### Hand ``` $ riscv-none-elf-size Problem2_Hand.elf ``` ``` text data bss dec hex filename 616 26 4110 4752 1290 Problem2_Hand.elf ``` ### Compiler-Optimized #### Compare size ``` $ riscv-none-elf-size Problem2_O0.elf Problem2_O3.elf Problem2_Os.elf ``` ``` text data bss dec hex filename 1674 1 4101 5776 1690 Problem2_O0.elf 1130 1 4101 5232 1470 Problem2_O3.elf 1062 1 4105 5168 1430 Problem2_Os.elf ``` #### Analysis | Attribute | -O0 | -O3 | -Os | | --------------------- | ---------------------------------------- | ------------------------------------------- | -------------------------------- | | **Compiler Behavior** | Keeps the original C logic | Removes redundancy, inlines small functions | Prioritizes minimizing code size | | **Instruction Count** | High (each variable stored on the stack) | Lower (more register usage) | Lowest (compact structure) | | **Execution Speed** | Slowest | Fastest | Moderate to fast | | **Program Size** | Largest (1674 bytes) | Reduced by ~33% (1130 bytes) | Smallest (1062 bytes) | ### Comparison between Os and Ofast #### Compare size ``` $ riscv-none-elf-size Problem2_Ofast.elf Problem2_Os.elf ``` ``` text data bss dec hex filename 1130 1 4101 5232 1470 Problem2_Ofast.elf 1062 1 4105 5168 1430 Problem2_Os.elf ``` ### Compare object dump ``` $ riscv-none-elf-objdump -d Problem2_Ofast.elf Problem2_Os.elf ``` #### .text section * Ofast:≈ 0x418 (=1048) bytes * Os: ≈ 0x3F0 (=1008) bytes With the `-Os` optimization flag, the program size was slightly reduced by about 40 bytes. ## Quiz 3 Problem C ### rsqrt `rsqrt` provides a fast reciprocal square root implementation optimized for RV32I Newton's method is an algorithm to compute the root of a function. To compute the reciprocal square root, we want to find $𝑦$ such that $y = \frac{1}{\sqrt{x}}$ Final formula: $$y_{n+1} = y_n \left(\frac{3 - xy_n^2}{2}\right) = y_n \left(\frac{3}{2} - \frac{xy_n^2}{2}\right)$$ **Newton Step** ```c static void newton_step(uint32_t *rec_inv_sqrt, uint32_t x) { uint32_t invsqrt, invsqrt2; uint64_t val; invsqrt = *rec_inv_sqrt; /* Dereference pointer */ invsqrt2 = ((uint64_t)invsqrt * invsqrt) >> 32; val = (3LL << 32) - ((uint64_t)x * invsqrt2); val >>= 2; /* Avoid overflow in following multiply */ val = (val * invsqrt) >> 31; /* Right shift by 31 = (32 - 2 + 1) */ *rec_inv_sqrt = (uint32_t)val; } ``` **Test Result** ``` ./print x = 1 | raw = 4294967295 | approx = 0.9999999998 | exact = 1.0000000000 x = 2 | raw = 3037000500 | approx = 0.7071067812 | exact = 0.7071067812 x = 3 | raw = 2479700525 | approx = 0.5773502693 | exact = 0.5773502692 x = 4 | raw = 2147483647 | approx = 0.4999999998 | exact = 0.5000000000 x = 5 | raw = 1920767767 | approx = 0.4472135955 | exact = 0.4472135955 x = 7 | raw = 1623345051 | approx = 0.3779644731 | exact = 0.3779644730 x = 8 | raw = 1518500250 | approx = 0.3535533906 | exact = 0.3535533906 x = 15 | raw = 1108955788 | approx = 0.2581988899 | exact = 0.2581988897 x = 16 | raw = 1073741824 | approx = 0.2500000000 | exact = 0.2500000000 x = 31 | raw = 771398899 | approx = 0.1796053022 | exact = 0.1796053020 x = 25 | raw = 858993459 | approx = 0.2000000000 | exact = 0.2000000000 x = 49 | raw = 613566759 | approx = 0.1428571434 | exact = 0.1428571429 x = 64 | raw = 536870912 | approx = 0.1250000000 | exact = 0.1250000000 x = 81 | raw = 477218592 | approx = 0.1111111119 | exact = 0.1111111111 x = 100 | raw = 429496731 | approx = 0.1000000003 | exact = 0.1000000000 x = 256 | raw = 268435456 | approx = 0.0625000000 | exact = 0.0625000000 x = 1024 | raw = 134217728 | approx = 0.0312500000 | exact = 0.0312500000 x = 12345 | raw = 38655794 | approx = 0.0090002534 | exact = 0.0090002475 x = 45678 | raw = 20095871 | approx = 0.0046789346 | exact = 0.0046789291 x = 99999 | raw = 13582036 | approx = 0.0031623142 | exact = 0.0031622935 x = 1456465 | raw = 3559000 | approx = 0.0008286443 | exact = 0.0008286096 x = 2000000 | raw = 3037581 | approx = 0.0007072419 | exact = 0.0007071068 x = 5000000 | raw = 1921748 | approx = 0.0004474418 | exact = 0.0004472136 x = 100000000 | raw = 432687 | approx = 0.0001007428 | exact = 0.0001000000 x = 400000000 | raw = 223486 | approx = 0.0000520344 | exact = 0.0000500000 x = 2147483647 | raw = 95721 | approx = 0.0000222868 | exact = 0.0000215792 x = 0 | raw = 4294967295 | approx = 0.9999999998 | exact = (undefined) ``` ### Compiler * 15 Newton iterations * 64-bit software multiplication **Compiler Optimization** | Optimization | text | data | bss | dec | hex | | ------------ | ----- | ---- | ---- | ----- | ------ | | **Os** | 15136 | 32 | 4096 | 19264 | 0x4b40 | | **O2** | 15216 | 32 | 4096 | 19344 | 0x4b90 | | **Ofast** | 16540 | 32 | 4096 | 20668 | 0x50bc | **Program Size Comparison** | Optimization | text size | Difference vs Os | | ------------ | --------- | ---------------- | | **Os** | 15136 | baseline | | **O2** | 15216 | +80 bytes | | **Ofast** | 16540 | +1404 bytes | * `Os` is the smallest — its goal is minimize code size, disabling many optimizations that enlarge code. * `O2` is slightly larger — it enables more aggressive optimizations for speed. * `Ofast` is much larger — it turns on the most aggressive optimizations: loop unrolling, vectorization, fast-math, etc. ### Hand <s>**Test**</s> :::danger Provide test suite and validation plans. ::: ### Compare ## Reference 1. [Lab2: RISC-V Instruction Set Simulator and System Emulator](https://hackmd.io/@sysprog/Sko2Ja5pel)