# Assignment2 : Complete Applications
contributed by < [kaihsiang092](https://github.com/kaihsiang092/Assignment2-Complete-Applications) >
## Setup
According to the Lab 2 instructions, set up **rv32emu**, a RISC-V RV32 emulator that allows you to learn how to execute bare-metal programs.
1. Install RISC-V GNU Toolchain
2. Install Dependencies
3. Build rv32emu
4. System Emulation
Build with system emulation enabled:
```c=
make -C tests/system/alignment
make ENABLE_ELF_LOADER=1 ENABLE_EXT_C=0 ENABLE_SYSTEM=1 misalign-in-blk-emu
```
## RISC-V calling conventions
he assembly code in this assignment follows the system call conventions defined in [docs/syscall.md ](https://github.com/sysprog21/rv32emu/blob/master/docs/syscall.md), using only the system call syntax supported by rv32emu, ensuring correct execution on the emulator.
### 1. The write System Call
The write system call is used to output data — typically a string — to a file descriptor such as standard output.
Its C function signature is:
```c=
ssize_t write(int fd, const void *buf, size_t count);
```
* fd (file descriptor) : the output target — 1 represents standard output.
* buf : the address of the buffer (the string to print).
* count : the number of bytes to write (not including the null terminator).
### RISC-V Register Mapping
| Parameter | Register | Meaning |
| -------- | -------- | -------- |
| fd | a0 | 1 for stdout |
| buf | a1 | address of the data |
| count |a2 | number of bytes to write |
| syscall number | a7 | 64 for write |
### 2.The exit System Call
The exit system call terminates the program and returns a status code to the operating system or emulator.
In C, it corresponds to:
```c=
void exit(int status);
```
### RISC-V Register Mapping
| Parameter | Register | Meaning |
| --------- | -------- | ------- |
| status | a0 | 0 = success) |
| syscall number | a7 | 93 for exit |
## Homework 1 q1-uf8
Hand Written Assembly Code < [q1-uf8.S](https://github.com/kaihsiang092/Assignment2-Complete-Applications/blob/main/Homework%201/q1-uf8.S) >
### Result

### Examine ELF Headers
This is a 32-bit RISC-V executable ELF file (little endian) that starts execution at address 0x10000, with two program segments and fourteen section headers

### Display Section Sizes
text = 15156 bytes — the program code (instructions).
data = 136 bytes — initialized global or static data.
bss = 4112 bytes — uninitialized data that will be allocated at runtime.
dec = 19404 bytes — total memory usage in decimal (sum of text, data, and bss).
hex = 4bcc — total size in hexadecimal.
the executable q1-uf8.elf occupies about 19 KB of memory in total.

## Problem A from Quiz 2
Hand Written Assembly Code < [hanoi.S](https://github.com/kaihsiang092/Assignment2-Complete-Applications/blob/main/quiz2_problem%20A/hanoi.S) >
The iterative Gray code method provides a mathematically equivalent but computationally simpler way to solve the Tower of Hanoi. It exploits the bit-flip property of Gray codes to systematically generate all valid moves without recursion, making it ideal for efficient, low-level assembly implementation.
### Relationship between Tower of Hanoi and Gray Code
1. In the Tower of Hanoi, each move involves changing the position of exactly one disk while maintaining the rule that no larger disk can be placed on a smaller one.
2. In Gray codes, each consecutive value differs by exactly one bit, meaning only one binary digit changes per step.
3. If each bit in a Gray code represents the position of a disk, then each bit flip corresponds to moving one disk.
4. Thus, the Gray code sequence can directly generate the correct sequence of moves for solving the Tower of Hanoi without recursion.
### Advantages of the Iterative (Gray Code–based) Method
1. Eliminates recursion:
The traditional recursive solution uses function calls, stack frames, and base cases.
The Gray code approach replaces this with a simple loop, improving efficiency and making it suitable for low-level architectures like RISC-V that have limited stack space.
2. Deterministic and easy to control:
The iterative version computes each move directly from the Gray code pattern, avoiding recursive overhead and potential stack overflows.
3. One move per loop iteration:
Since Gray codes guarantee only one bit changes at a time, each loop iteration corresponds neatly to one valid move in the Tower of Hanoi.
4. Efficient for system emulation:
On a bare-metal RISC-V environment (like rv32emu), iterative control is easier to implement and debug compared to recursive function calls, which would require full call-stack handling.
### Result

### Examine ELF Headers
the file is a 32-bit ELF executable (ELF32) for the RISC-V architecture, using little-endian byte order and following the UNIX System V ABI. The entry point address is 0x10000, which is where program execution begins. The file contains 2 program headers and 13 section headers, with standard header sizes (52 bytes for the ELF header, 32 bytes per program header, and 40 bytes per section header).

### Display Section Sizes
text = 15056 bytes — the size of the program code (instructions).
data = 59 bytes — the size of initialized global/static variables.
bss = 4101 bytes — the size of uninitialized variables (allocated at runtime).
dec = 19216 bytes — the total memory size in decimal (sum of text, data, and bss).
hex = 4b10 — the total size in hexadecimal.
the compiled hanoi.elf program occupies about 19 KB of memory in total.

## Problem C from Quiz 3
Hand Written Assembly Code < [fast_rsqrt.c](https://github.com/kaihsiang092/Assignment2-Complete-Applications/blob/main/quiz3_problem%20C/fast_rsqrt.c) >
The fast_rsqrt algorithm combines three stages lookup table, linear interpolation, and Newton-Raphson iteration to efficiently approximate the reciprocal square root using only integer arithmetic on RV32I (without a floating-point unit).
### 1. Lookup Table (Initial Estimate)
The algorithm first uses a precomputed lookup table (rsqrt_table) that contains approximate values$$ \frac{65536}{\sqrt{2^{exp}}} $$ for powers of two.
The position of the most significant bit (MSB) of the input number determines which entry to use.
This provides a rough initial estimate but has limited precision—typically around 20% error.
### 2. Linear Interpolation
Because most numbers are not exact powers of two, the algorithm improves accuracy by interpolating between adjacent table values.
This reduces the error to about 3–5%, already a big improvement.
### 3. Newton-Raphson Iteration
Finally, the algorithm applies Newton’s method to iteratively refine the estimate.
For the reciprocal square root, the Newton update formula is:
$$ y_{n+1} = y_n \cdot \frac{3 - x \cdot y_n^2 / 2^{16}}{2} $$
Each iteration rapidly increases accuracy by adjusting y toward the true root based on the function’s slope.
The first iteration reduces the error significantly (from 20% → ~8–10%)
The second iteration further refines it to 3–8% error, which is sufficient for real-time applications like graphics or physics simulations.
### Result

### Examine ELF Headers
the file is a 32-bit ELF executable (ELF32) for the RISC-V architecture, using little-endian byte order and following the UNIX System V ABI. The entry point address is 0x2c4, meaning program execution begins at that memory location. The file contains 3 program headers and 7 section headers, with standard ELF header sizes (52 bytes for the ELF header, 32 bytes per program header, and 40 bytes per section header).

### **Comparison of Different Optimization Compilation Results**
#### Handwritten

### -Ofast (optimized for speed)

### -Os (optimized for size)

| | Cycles | text size (bytes) |
| ---- | -------- | -------- |
| Handwritten | 1495 | 2764 |
| -Ofast | 4 | 4712 |
| -Os | 4 | 1588 |
**Execution speed:**
The handwritten assembly version with **1495 cycles** is a normal value, but the optimized versions showing **4 cycles** are not realistic results.
**Program size:**
`-Ofast` performs aggressive optimizations such as **loop unrolling**, **function inlining**, and **instruction reordering**, which increase the total number of instructions.
`-Os`, on the other hand, focuses on **code size optimization**, reusing code segments and reducing inlining or unrolling to minimize the final binary size.
**Conclusion:**
`-Ofast`, in pursuit of maximum speed, produces a binary size of **4712 bytes**, roughly **three times larger** than that of `-Os`.
In contrast, `-Os` maintains high execution speed while reducing the size to **1588 bytes**, making it more practical for **embedded systems** or **memory-constrained environments**.