# Assignment2: Complete Applications
contributed by < [RayYang421](https://github.com/RayYang421/ca2025-Homework2) >
## RV32 emu
###
```
$ source $HOME/riscv-none-elf-gcc/setenv
```
### Linker
**OUTPUT_ARCH( "riscv" )**
Specifies that the output ELF file is built for the RISC-V architecture.
**ENTRY(_start)**
* Defines the program entry point — the address where execution begins.
* _start is usually defined in start.S, and it sets up the stack and initializes memory before calling main.
**SECTIONS**
The main part of the script. It tells the linker how to arrange code and data in memory.
| Section | Purpose | Notes |
| -------- | ----------------------- | -------------------------- |
| `.text` | Program code | Starts at 0x10000 |
| `.data` | Initialized variables | Loaded from ELF |
| `.bss` | Uninitialized variables | Zeroed at startup |
| `.stack` | Stack area | 4 KB, top at `__stack_top` |
**gcc compiler**
| Option | Main Goal | Code Size | Execution Speed | Description |
| :--------: | :--------------------------- | :-------: | :---------------------: | :----------------------------------------------------------------- |
| **-O0** | No optimization | Largest | Slowest | Keeps original structure; best for debugging |
| **-O1** | Basic optimization | Smaller | Faster | Removes unnecessary code; slight speed-up |
| **-O2** | Balanced optimization | Medium | Fast | Common for release builds; safe and efficient |
| **-O3** | Maximum performance | Larger | Faster | Enables aggressive optimizations (e.g., inlining, vectorization) |
| **-Ofast** | Extreme performance | Larger | Fastest | Ignores strict standards (e.g., floating-point precision) |
| **-Os** | Optimize for size | Smallest | Fast or slightly faster | Disables size-increasing optimizations; ideal for embedded systems |
| **-Oz** | Minimize size (mainly Clang) | Smallest | Slightly slower | More aggressive size reduction than `-Os` |
## Problem 1
### objdump
```
$ riscv-none-elf-objdump -d test.elf
```
#### Result
:::spoiler
```
test.elf: file format elf32-littleriscv
Disassembly of section .text:
00010000 <main>:
10000: 168000ef jal 10168 <test>
10004: 00154513 xori a0,a0,1
10008: 1e00006f j 101e8 <Exit>
0001000c <clz>:
1000c: 00000313 li t1,0
10010: 00001e37 lui t3,0x1
10014: 01c533b3 sltu t2,a0,t3
10018: 00439393 slli t2,t2,0x4
1001c: 00730333 add t1,t1,t2
10020: 00751533 sll a0,a0,t2
10024: 01000e37 lui t3,0x1000
10028: 01c533b3 sltu t2,a0,t3
1002c: 00339393 slli t2,t2,0x3
10030: 00730333 add t1,t1,t2
10034: 00751533 sll a0,a0,t2
10038: 10000e37 lui t3,0x10000
1003c: 01c533b3 sltu t2,a0,t3
10040: 00239393 slli t2,t2,0x2
10044: 00730333 add t1,t1,t2
10048: 00751533 sll a0,a0,t2
1004c: 40000e37 lui t3,0x40000
10050: 01c533b3 sltu t2,a0,t3
10054: 00139393 slli t2,t2,0x1
10058: 00730333 add t1,t1,t2
1005c: 00751533 sll a0,a0,t2
10060: 80000e37 lui t3,0x80000
10064: 01c533b3 sltu t2,a0,t3
10068: 00730333 add t1,t1,t2
1006c: 00751533 sll a0,a0,t2
10070: 00153e13 seqz t3,a0
10074: 01c30333 add t1,t1,t3
10078: 00030513 mv a0,t1
1007c: 00008067 ret
00010080 <uf8_encode>:
10080: 01000293 li t0,16
10084: 0a556a63 bltu a0,t0,10138 <encode_done>
10088: ff810113 addi sp,sp,-8
1008c: 00112223 sw ra,4(sp)
10090: 00812023 sw s0,0(sp)
10094: 00050413 mv s0,a0
10098: f75ff0ef jal 1000c <clz>
1009c: 01f00293 li t0,31
100a0: 40a282b3 sub t0,t0,a0
100a4: 00000313 li t1,0
100a8: 00000393 li t2,0
100ac: 00500e13 li t3,5
100b0: 05c2c463 blt t0,t3,100f8 <find_exact_exponent>
100b4: ffc28313 addi t1,t0,-4
100b8: 00f00e13 li t3,15
100bc: 006e4463 blt t3,t1,100c4 <clamp_exponent>
100c0: 0080006f j 100c8 <calc_overflow>
000100c4 <clamp_exponent>:
100c4: 000e0313 mv t1,t3
000100c8 <calc_overflow>:
100c8: 00000e93 li t4,0
000100cc <calc_overflow_loop>:
100cc: 006eda63 bge t4,t1,100e0 <adjust_down>
100d0: 00139393 slli t2,t2,0x1
100d4: 01038393 addi t2,t2,16
100d8: 001e8e93 addi t4,t4,1
100dc: ff1ff06f j 100cc <calc_overflow_loop>
000100e0 <adjust_down>:
100e0: 00605c63 blez t1,100f8 <find_exact_exponent>
100e4: 00745a63 bge s0,t2,100f8 <find_exact_exponent>
100e8: ff038393 addi t2,t2,-16
100ec: 0013d393 srli t2,t2,0x1
100f0: fff30313 addi t1,t1,-1
100f4: fedff06f j 100e0 <adjust_down>
000100f8 <find_exact_exponent>:
100f8: 00f00e13 li t3,15
100fc: 01c35e63 bge t1,t3,10118 <final_calc>
10100: 00139e93 slli t4,t2,0x1
10104: 010e8e93 addi t4,t4,16
10108: 01d44863 blt s0,t4,10118 <final_calc>
1010c: 000e8393 mv t2,t4
10110: 00130313 addi t1,t1,1
10114: fe5ff06f j 100f8 <find_exact_exponent>
00010118 <final_calc>:
10118: 40740f33 sub t5,s0,t2
1011c: 006f5f33 srl t5,t5,t1
10120: 00431313 slli t1,t1,0x4
10124: 01e36533 or a0,t1,t5
10128: 00012403 lw s0,0(sp)
1012c: 00412083 lw ra,4(sp)
10130: 00810113 addi sp,sp,8
10134: 00008067 ret
00010138 <encode_done>:
10138: 00008067 ret
0001013c <uf8_decode>:
1013c: 00f57293 andi t0,a0,15
10140: 00455313 srli t1,a0,0x4
10144: 00f00393 li t2,15
10148: 406383b3 sub t2,t2,t1
1014c: 00008e37 lui t3,0x8
10150: fffe0e13 addi t3,t3,-1 # 7fff <main-0x8001>
10154: 007e53b3 srl t2,t3,t2
10158: 00439393 slli t2,t2,0x4
1015c: 00629533 sll a0,t0,t1
10160: 00750533 add a0,a0,t2
10164: 00008067 ret
00010168 <test>:
10168: ffc10113 addi sp,sp,-4
1016c: 00112023 sw ra,0(sp)
10170: 00100f93 li t6,1
10174: 00000513 li a0,0
10178: f09ff0ef jal 10080 <uf8_encode>
1017c: 00050313 mv t1,a0
10180: 00030513 mv a0,t1
10184: fb9ff0ef jal 1013c <uf8_decode>
10188: 01000393 li t2,16
1018c: 00750463 beq a0,t2,10194 <test0>
10190: 00000f93 li t6,0
00010194 <test0>:
10194: 00f00513 li a0,15
10198: ee9ff0ef jal 10080 <uf8_encode>
1019c: 00050313 mv t1,a0
101a0: 00030513 mv a0,t1
101a4: f99ff0ef jal 1013c <uf8_decode>
101a8: 00f00393 li t2,15
101ac: 00750463 beq a0,t2,101b4 <test15>
101b0: 00000f93 li t6,0
000101b4 <test15>:
101b4: 40000513 li a0,1024
101b8: ec9ff0ef jal 10080 <uf8_encode>
101bc: 00050313 mv t1,a0
101c0: 00030513 mv a0,t1
101c4: f79ff0ef jal 1013c <uf8_decode>
101c8: 3f000393 li t2,1008
101cc: 00750463 beq a0,t2,101d4 <test1024>
101d0: 00000f93 li t6,0
000101d4 <test1024>:
101d4: 000f8263 beqz t6,101d8 <test_ret>
000101d8 <test_ret>:
101d8: 000f8513 mv a0,t6
101dc: 00012083 lw ra,0(sp)
101e0: 00410113 addi sp,sp,4
101e4: 00008067 ret
000101e8 <Exit>:
101e8: 05d00893 li a7,93
101ec: 00000073 ecall
```
:::
### readelf
```
$ riscv-none-elf-readelf -h test.elf
```
#### Result
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x10000
Start of program headers: 52 (bytes into file)
Start of section headers: 6732 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 14
Section header string table index: 13
```
### Size
```
$ riscv-none-elf-size test.elf
```
#### Result
```
text data bss dec hex filename
496 17 4111 4624 1210 test.elf
```
## Quiz 2 Problem A
### Hand
```
$ riscv-none-elf-size Problem2_Hand.elf
```
```
text data bss dec hex filename
616 26 4110 4752 1290 Problem2_Hand.elf
```
### Compiler-Optimized
#### Compare size
```
$ riscv-none-elf-size Problem2_O0.elf Problem2_O3.elf Problem2_Os.elf
```
```
text data bss dec hex filename
1674 1 4101 5776 1690 Problem2_O0.elf
1130 1 4101 5232 1470 Problem2_O3.elf
1062 1 4105 5168 1430 Problem2_Os.elf
```
#### Analysis
| Attribute | -O0 | -O3 | -Os |
| --------------------- | ---------------------------------------- | ------------------------------------------- | -------------------------------- |
| **Compiler Behavior** | Keeps the original C logic | Removes redundancy, inlines small functions | Prioritizes minimizing code size |
| **Instruction Count** | High (each variable stored on the stack) | Lower (more register usage) | Lowest (compact structure) |
| **Execution Speed** | Slowest | Fastest | Moderate to fast |
| **Program Size** | Largest (1674 bytes) | Reduced by ~33% (1130 bytes) | Smallest (1062 bytes) |
### Comparison between Os and Ofast
#### Compare size
```
$ riscv-none-elf-size Problem2_Ofast.elf Problem2_Os.elf
```
```
text data bss dec hex filename
1130 1 4101 5232 1470 Problem2_Ofast.elf
1062 1 4105 5168 1430 Problem2_Os.elf
```
### Compare object dump
```
$ riscv-none-elf-objdump -d Problem2_Ofast.elf Problem2_Os.elf
```
#### .text section
* Ofast:≈ 0x418 (=1048) bytes
* Os: ≈ 0x3F0 (=1008) bytes
With the `-Os` optimization flag, the program size was slightly reduced by about 40 bytes.
## Quiz 3 Problem C
### rsqrt
`rsqrt` provides a fast reciprocal square root implementation optimized for RV32I
Newton's method is an algorithm to compute the root of a function.
To compute the reciprocal square root, we want to find $𝑦$ such that $y = \frac{1}{\sqrt{x}}$
Final formula:
$$y_{n+1} = y_n \left(\frac{3 - xy_n^2}{2}\right) = y_n \left(\frac{3}{2} - \frac{xy_n^2}{2}\right)$$
**Newton Step**
```c
static void newton_step(uint32_t *rec_inv_sqrt, uint32_t x)
{
uint32_t invsqrt, invsqrt2;
uint64_t val;
invsqrt = *rec_inv_sqrt; /* Dereference pointer */
invsqrt2 = ((uint64_t)invsqrt * invsqrt) >> 32;
val = (3LL << 32) - ((uint64_t)x * invsqrt2);
val >>= 2; /* Avoid overflow in following multiply */
val = (val * invsqrt) >> 31; /* Right shift by 31 = (32 - 2 + 1) */
*rec_inv_sqrt = (uint32_t)val;
}
```
**Test Result**
```
./print
x = 1 | raw = 4294967295 | approx = 0.9999999998 | exact = 1.0000000000
x = 2 | raw = 3037000500 | approx = 0.7071067812 | exact = 0.7071067812
x = 3 | raw = 2479700525 | approx = 0.5773502693 | exact = 0.5773502692
x = 4 | raw = 2147483647 | approx = 0.4999999998 | exact = 0.5000000000
x = 5 | raw = 1920767767 | approx = 0.4472135955 | exact = 0.4472135955
x = 7 | raw = 1623345051 | approx = 0.3779644731 | exact = 0.3779644730
x = 8 | raw = 1518500250 | approx = 0.3535533906 | exact = 0.3535533906
x = 15 | raw = 1108955788 | approx = 0.2581988899 | exact = 0.2581988897
x = 16 | raw = 1073741824 | approx = 0.2500000000 | exact = 0.2500000000
x = 31 | raw = 771398899 | approx = 0.1796053022 | exact = 0.1796053020
x = 25 | raw = 858993459 | approx = 0.2000000000 | exact = 0.2000000000
x = 49 | raw = 613566759 | approx = 0.1428571434 | exact = 0.1428571429
x = 64 | raw = 536870912 | approx = 0.1250000000 | exact = 0.1250000000
x = 81 | raw = 477218592 | approx = 0.1111111119 | exact = 0.1111111111
x = 100 | raw = 429496731 | approx = 0.1000000003 | exact = 0.1000000000
x = 256 | raw = 268435456 | approx = 0.0625000000 | exact = 0.0625000000
x = 1024 | raw = 134217728 | approx = 0.0312500000 | exact = 0.0312500000
x = 12345 | raw = 38655794 | approx = 0.0090002534 | exact = 0.0090002475
x = 45678 | raw = 20095871 | approx = 0.0046789346 | exact = 0.0046789291
x = 99999 | raw = 13582036 | approx = 0.0031623142 | exact = 0.0031622935
x = 1456465 | raw = 3559000 | approx = 0.0008286443 | exact = 0.0008286096
x = 2000000 | raw = 3037581 | approx = 0.0007072419 | exact = 0.0007071068
x = 5000000 | raw = 1921748 | approx = 0.0004474418 | exact = 0.0004472136
x = 100000000 | raw = 432687 | approx = 0.0001007428 | exact = 0.0001000000
x = 400000000 | raw = 223486 | approx = 0.0000520344 | exact = 0.0000500000
x = 2147483647 | raw = 95721 | approx = 0.0000222868 | exact = 0.0000215792
x = 0 | raw = 4294967295 | approx = 0.9999999998 | exact = (undefined)
```
### Compiler
* 15 Newton iterations
* 64-bit software multiplication
**Compiler Optimization**
| Optimization | text | data | bss | dec | hex |
| ------------ | ----- | ---- | ---- | ----- | ------ |
| **Os** | 15136 | 32 | 4096 | 19264 | 0x4b40 |
| **O2** | 15216 | 32 | 4096 | 19344 | 0x4b90 |
| **Ofast** | 16540 | 32 | 4096 | 20668 | 0x50bc |
**Program Size Comparison**
| Optimization | text size | Difference vs Os |
| ------------ | --------- | ---------------- |
| **Os** | 15136 | baseline |
| **O2** | 15216 | +80 bytes |
| **Ofast** | 16540 | +1404 bytes |
* `Os` is the smallest — its goal is minimize code size, disabling many optimizations that enlarge code.
* `O2` is slightly larger — it enables more aggressive optimizations for speed.
* `Ofast` is much larger — it turns on the most aggressive optimizations: loop unrolling, vectorization, fast-math, etc.
### Hand
<s>**Test**</s>
:::danger
Provide test suite and validation plans.
:::
### Compare
## Reference
1. [Lab2: RISC-V Instruction Set Simulator and System Emulator](https://hackmd.io/@sysprog/Sko2Ja5pel)