# Assignment1: RISC-V Assembly and Instruction Pipeline
:::danger
Check the requirements carefully.
:::
## Fibonacci Number ([LeetCode 509](https://leetcode.com/problems/fibonacci-number/description/))
Fibonacci numbers are a sequence introduced by the Italian mathematician Leonardo of Pisa (commonly known as Fibonacci) in 1202.
Fibonacci numbers can be expressed using the recursive formula:
$$
F(n) = F(n-1) + F(n-2)
$$
Where:
$$
F(0) = 0 、F(1) = 1
$$
### Applications of Fibonacci Numbers
Fibonacci numbers have various applications in mathematics and computer science, including the following examples:
1. Mathematical Research:
- Fibonacci numbers are related to the Golden Ratio, where the ratio F(n+1)/F(n) approaches 1.618 as n approaches infinity.
2. Algorithms and Data Structures:
- The Fibonacci heap is a data structure that optimizes merge and decrease-key operations, making it very efficient for certain graph algorithms.
3. Financial Markets:
- In technical analysis, Fibonacci retracement levels are used to predict asset price bounce-back and pull-back levels.
4. Art and Architecture:
- Fibonacci numbers and the Golden Ratio are often applied in art and architectural design to achieve aesthetic balance.
## Implementation
### C code
Presenting Fibonacci using `recursion`.
```c
#include <stdio.h>
int fib(int n) {
if(n <= 1){
return n;
}
else{
return fib(n - 1) + fib(n - 2);
}
}
int main() {
int n = 3;
printf("Fibonacci(%d) = %d\n", n, fib(n));
return 0;
}
```
Presenting Fibonacci using `loops`.
```c
#include <stdio.h>
int main() {
int n = 3;
int a = 0, b = 1, next;
printf("Fibonacci(%d) = ", n);
if (n == 0) {
printf("%d\n", a);
return 0;
}
if (n == 1) {
printf("%d\n", b);
return 0;
}
for (int i = 2; i <= n; i++) {
next = a + b;
a = b;
b = next;
}
printf("%d\n", b);
return 0;
}
```
### Assembly
Presenting Fibonacci using `recursion`.
```asm
.data
input_num: .word 3 # Store the input value for Fibonacci calculation
prefix: .string "Fibonacci(" # String to print before the result
suffix: .string ") = " # String to print after the result
.text
main: # Main entry point
lw a0, input_num # Load input number (n = 3)
li s0, 1 # Initialize base case (for comparison n <= 1)
jal ra, fibonacci # Call fibonacci function
mv a1, a0 # Move final Fibonacci value to a1
lw a0, input_num # Load input number again
jal ra, Result # Call function to display result
j terminate # Jump to exit
fibonacci: # Fibonacci function
ble a0, s0, L1 # If n <= 1, go to base case
addi sp, sp, -12 # Allocate stack space
sw ra, 8(sp) # Save return address
sw a0, 4(sp) # Save argument n
addi a0, a0, -1 # n = n - 1
jal ra, fibonacci # Recursive call fib(n - 1)
sw a0, 0(sp) # Save fib(n - 1) result
lw a0, 4(sp) # Load original n
addi a0, a0, -2 # n = n - 2
jal ra, fibonacci # Recursive call fib(n - 2)
lw t0, 0(sp) # Load fib(n - 1) result
add a0, a0, t0 # Calculate fib(n) = fib(n - 1) + fib(n - 2)
lw ra, 8(sp) # Restore return address
addi sp, sp, 12 # Deallocate stack space
ret # Return from function
L1: # Base case label
ret # Return for base case
Result: # Function to display the result
mv t0, a0 # Move Fibonacci result to t0
mv t1, a1 # Move input number to t1
la a0, prefix # Load prefix string address
li a7, 4 # Set syscall for print string
ecall # Print prefix
mv a0, t0 # Move Fibonacci result to a0
li a7, 1 # Set syscall for print integer
ecall # Print Fibonacci result
la a0, suffix # Load suffix string address
li a7, 4 # Set syscall for print string
ecall # Print suffix
mv a0, t1 # Move input number to a0
li a7, 1 # Set syscall for print integer
ecall # Print input number
ret # Return from Result
terminate: # Exit procedure
li a7, 12 # Set syscall for exit
ecall # Exit the program
```
Presenting Fibonacci using `loops`.
```asm
.data
input_num: .word 12
prefix: .string "Fibonacci("
suffix: .string ") = "
.text
main:
lw a0, input_num # Load the input_num (N) from memory
li s0, 1 # Set s0 to 1 (this will be the first Fibonacci number)
li s1, 0 # Initialize s1 to 0 (this will be the second Fibonacci number)
li t0, 0 # Initialize counter t0 to 0
loop:
beq t0, a0, L1 # If counter (t0) equals N, exit loop
add t2, s0, s1 # t2 = s0 + s1 (Fibonacci calculation)
mv s0, s1 # Move s1 to s0 for next iteration
mv s1, t2 # Move t2 to s1 for next iteration
addi t0, t0, 1 # Increment counter t0
j loop # Jump back to start of loop
L1:
mv a0, s1 # The result is in s1 (the N-th Fibonacci number)
# Prepare to print the result
lw a1, input_num # Load the input_num (N) again for printing
jal ra, Result
j terminate
Result:
mv t0, a1 # Move N to t0 for printing
mv t1, a0 # Move the Fibonacci result to t1 for printing
la a0, prefix # Load address of prefix
li a7, 4 # System call for print string
ecall
mv a0, t0 # Move N to a0 for printing
li a7, 1 # System call for print integer
ecall
la a0, suffix # Load address of suffix
li a7, 4 # System call for print string
ecall
mv a0, t1 # Move Fibonacci result to a0 for printing
li a7, 1 # System call for print integer
ecall
ret
terminate:
li a7, 12 # Exit system call
ecall
```
recursion:
<img src="https://hackmd.io/_uploads/HJeDfnqy1e.png" alt="螢幕擷取畫面" width="300"/>
loops:
<img src="https://hackmd.io/_uploads/Hk5Prnc1Jx.png" alt="螢幕擷取畫面" width="300"/>
Using `loops` is generally better than `recursion` for the Fibonacci sequence for the following reasons:
1. Efficiency
- Execution Time: Recursion recalculates many values multiple times, leading to poor performance. Loops calculate each Fibonacci number only once.
- Memory Usage: Recursion uses the call stack, which can lead to stack overflow, especially for larger values of `n`. Loops require less memory.
2. Readability and Simplicity
- Loop implementations are often more straightforward, making it easier to understand the logic.
3. Avoiding Stack Overflow
- Recursion can cause stack overflow with large `n`, while loops do not have this issue.
4. Execution Speed
- Loops typically run faster because they avoid the overhead of multiple function calls.
## Analysis
We test our code using [Ripes](https://ripes.me/) simulator.
### Pseudo instruction
```
00000000 <main>:
0: 10000517 auipc x10 0x10000
4: 00052503 lw x10 0 x10
8: 00100413 addi x8 x0 1
c: 00000493 addi x9 x0 0
10: 00000293 addi x5 x0 0
00000014 <loop>:
14: 00a28c63 beq x5 x10 24 <L1>
18: 009403b3 add x7 x8 x9
1c: 00048413 addi x8 x9 0
20: 00038493 addi x9 x7 0
24: 00128293 addi x5 x5 1
28: fedff06f jal x0 -20 <loop>
0000002c <L1>:
2c: 00048513 addi x10 x9 0
30: 10000597 auipc x11 0x10000
34: fd05a583 lw x11 -48 x11
38: 008000ef jal x1 8 <Result>
3c: 0480006f jal x0 72 <terminate>
00000040 <Result>:
40: 00058293 addi x5 x11 0
44: 00050313 addi x6 x10 0
48: 10000517 auipc x10 0x10000
4c: fbc50513 addi x10 x10 -68
50: 00400893 addi x17 x0 4
54: 00000073 ecall
58: 00028513 addi x10 x5 0
5c: 00100893 addi x17 x0 1
60: 00000073 ecall
64: 10000517 auipc x10 0x10000
68: fab50513 addi x10 x10 -85
6c: 00400893 addi x17 x0 4
70: 00000073 ecall
74: 00030513 addi x10 x6 0
78: 00100893 addi x17 x0 1
7c: 00000073 ecall
80: 00008067 jalr x0 x1 0
00000084 <terminate>:
84: 00c00893 addi x17 x0 12
88: 00000073 ecall
```
## Problem C from [Quiz1](https://hackmd.io/@sysprog/arch2024-quiz1-sol)
### C code
```c
static inline float fabsf(float x) {
uint32_t i = *(uint32_t *)&x; // Read the bits of the float into an integer
i &= 0x7FFFFFFF; // Clear the sign bit to get the absolute value
x = *(float *)&i; // Write the modified bits back into the float
return x;
}
```
```c
static inline int my_clz(uint32_t x) {
int count = 0;
for (int i = 31; i >= 0; --i) {
if (x & (1U << i))
break;
count++;
}
return count;
}
```
```c
static inline uint32_t fp16_to_fp32(uint16_t h) {
/*
* Extends the 16-bit half-precision floating-point number to 32 bits
* by shifting it to the upper half of a 32-bit word:
* +---+-----+------------+-------------------+
* | S |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
* +---+-----+------------+-------------------+
* Bits 31 26-30 16-25 0-15
*
* S - sign bit, E - exponent bits, M - mantissa bits, 0 - zero bits.
*/
const uint32_t w = (uint32_t) h << 16;
/*
* Isolates the sign bit from the input number, placing it in the most
* significant bit of a 32-bit word:
*
* +---+----------------------------------+
* | S |0000000 00000000 00000000 00000000|
* +---+----------------------------------+
* Bits 31 0-31
*/
const uint32_t sign = w & UINT32_C(0x80000000);
/*
* Extracts the mantissa and exponent from the input number, placing
* them in bits 0-30 of the 32-bit word:
*
* +---+-----+------------+-------------------+
* | 0 |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
* +---+-----+------------+-------------------+
* Bits 30 27-31 17-26 0-16
*/
const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);
/*
* The renorm_shift variable indicates how many bits the mantissa
* needs to be shifted to normalize the half-precision number.
* For normalized numbers, renorm_shift will be 0. For denormalized
* numbers, renorm_shift will be greater than 0. Shifting a
* denormalized number will move the mantissa into the exponent,
* normalizing it.
*/
uint32_t renorm_shift = my_clz(nonsign);
renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0;
/*
* If the half-precision number has an exponent of 15, adding a
* specific value will cause overflow into bit 31, which converts
* the upper 9 bits into ones. Thus:
* inf_nan_mask ==
* 0x7F800000 if the half-precision number is
* NaN or infinity (exponent of 15)
* 0x00000000 otherwise
*/
const int32_t inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) &
INT32_C(0x7F800000);
/*
* If nonsign equals 0, subtracting 1 will cause overflow, setting
* bit 31 to 1. Otherwise, bit 31 will be 0. Shifting this result
* propagates bit 31 across all bits in zero_mask. Thus:
* zero_mask ==
* 0xFFFFFFFF if the half-precision number is
* zero (+0.0h or -0.0h)
* 0x00000000 otherwise
*/
const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31;
/*
* 1. Shifts nonsign left by renorm_shift to normalize it (for denormal
* inputs).
* 2. Shifts nonsign right by 3, adjusting the exponent to fit in the
* 8-bit exponent field and moving the mantissa into the correct
* position within the 23-bit mantissa field of the single-precision
* format.
* 3. Adds 0x70 to the exponent to account for the difference in bias
* between half-precision and single-precision.
* 4. Subtracts renorm_shift from the exponent to account for any
* renormalization that occurred.
* 5. ORs with inf_nan_mask to set the exponent to 0xFF if the input
* was NaN or infinity.
* 6. ANDs with the inverted zero_mask to set the mantissa and exponent
* to zero if the input was zero.
* 7. Combines everything with the sign bit of the input number.
*/
return sign | ((((nonsign << renorm_shift >> 3) +
((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask);
}
```
### Assembly
```asm
```