Assignment1: RISC-V Assembly and Instruction Pipeline

# Assignment1: RISC-V Assembly and Instruction Pipeline :::danger Check the requirements carefully. ::: ## Fibonacci Number ([LeetCode 509](https://leetcode.com/problems/fibonacci-number/description/)) Fibonacci numbers are a sequence introduced by the Italian mathematician Leonardo of Pisa (commonly known as Fibonacci) in 1202. Fibonacci numbers can be expressed using the recursive formula: $$ F(n) = F(n-1) + F(n-2) $$ Where: $$ F(0) = 0 、F(1) = 1 $$ ### Applications of Fibonacci Numbers Fibonacci numbers have various applications in mathematics and computer science, including the following examples: 1. Mathematical Research: - Fibonacci numbers are related to the Golden Ratio, where the ratio F(n+1)/F(n) approaches 1.618 as n approaches infinity. 2. Algorithms and Data Structures: - The Fibonacci heap is a data structure that optimizes merge and decrease-key operations, making it very efficient for certain graph algorithms. 3. Financial Markets: - In technical analysis, Fibonacci retracement levels are used to predict asset price bounce-back and pull-back levels. 4. Art and Architecture: - Fibonacci numbers and the Golden Ratio are often applied in art and architectural design to achieve aesthetic balance. ## Implementation ### C code Presenting Fibonacci using `recursion`. ```c #include <stdio.h> int fib(int n) { if(n <= 1){ return n; } else{ return fib(n - 1) + fib(n - 2); } } int main() { int n = 3; printf("Fibonacci(%d) = %d\n", n, fib(n)); return 0; } ``` Presenting Fibonacci using `loops`. ```c #include <stdio.h> int main() { int n = 3; int a = 0, b = 1, next; printf("Fibonacci(%d) = ", n); if (n == 0) { printf("%d\n", a); return 0; } if (n == 1) { printf("%d\n", b); return 0; } for (int i = 2; i <= n; i++) { next = a + b; a = b; b = next; } printf("%d\n", b); return 0; } ``` ### Assembly Presenting Fibonacci using `recursion`. ```asm .data input_num: .word 3 # Store the input value for Fibonacci calculation prefix: .string "Fibonacci(" # String to print before the result suffix: .string ") = " # String to print after the result .text main: # Main entry point lw a0, input_num # Load input number (n = 3) li s0, 1 # Initialize base case (for comparison n <= 1) jal ra, fibonacci # Call fibonacci function mv a1, a0 # Move final Fibonacci value to a1 lw a0, input_num # Load input number again jal ra, Result # Call function to display result j terminate # Jump to exit fibonacci: # Fibonacci function ble a0, s0, L1 # If n <= 1, go to base case addi sp, sp, -12 # Allocate stack space sw ra, 8(sp) # Save return address sw a0, 4(sp) # Save argument n addi a0, a0, -1 # n = n - 1 jal ra, fibonacci # Recursive call fib(n - 1) sw a0, 0(sp) # Save fib(n - 1) result lw a0, 4(sp) # Load original n addi a0, a0, -2 # n = n - 2 jal ra, fibonacci # Recursive call fib(n - 2) lw t0, 0(sp) # Load fib(n - 1) result add a0, a0, t0 # Calculate fib(n) = fib(n - 1) + fib(n - 2) lw ra, 8(sp) # Restore return address addi sp, sp, 12 # Deallocate stack space ret # Return from function L1: # Base case label ret # Return for base case Result: # Function to display the result mv t0, a0 # Move Fibonacci result to t0 mv t1, a1 # Move input number to t1 la a0, prefix # Load prefix string address li a7, 4 # Set syscall for print string ecall # Print prefix mv a0, t0 # Move Fibonacci result to a0 li a7, 1 # Set syscall for print integer ecall # Print Fibonacci result la a0, suffix # Load suffix string address li a7, 4 # Set syscall for print string ecall # Print suffix mv a0, t1 # Move input number to a0 li a7, 1 # Set syscall for print integer ecall # Print input number ret # Return from Result terminate: # Exit procedure li a7, 12 # Set syscall for exit ecall # Exit the program ``` Presenting Fibonacci using `loops`. ```asm .data input_num: .word 12 prefix: .string "Fibonacci(" suffix: .string ") = " .text main: lw a0, input_num # Load the input_num (N) from memory li s0, 1 # Set s0 to 1 (this will be the first Fibonacci number) li s1, 0 # Initialize s1 to 0 (this will be the second Fibonacci number) li t0, 0 # Initialize counter t0 to 0 loop: beq t0, a0, L1 # If counter (t0) equals N, exit loop add t2, s0, s1 # t2 = s0 + s1 (Fibonacci calculation) mv s0, s1 # Move s1 to s0 for next iteration mv s1, t2 # Move t2 to s1 for next iteration addi t0, t0, 1 # Increment counter t0 j loop # Jump back to start of loop L1: mv a0, s1 # The result is in s1 (the N-th Fibonacci number) # Prepare to print the result lw a1, input_num # Load the input_num (N) again for printing jal ra, Result j terminate Result: mv t0, a1 # Move N to t0 for printing mv t1, a0 # Move the Fibonacci result to t1 for printing la a0, prefix # Load address of prefix li a7, 4 # System call for print string ecall mv a0, t0 # Move N to a0 for printing li a7, 1 # System call for print integer ecall la a0, suffix # Load address of suffix li a7, 4 # System call for print string ecall mv a0, t1 # Move Fibonacci result to a0 for printing li a7, 1 # System call for print integer ecall ret terminate: li a7, 12 # Exit system call ecall ``` recursion： <img src="https://hackmd.io/_uploads/HJeDfnqy1e.png" alt="螢幕擷取畫面" width="300"/> loops： <img src="https://hackmd.io/_uploads/Hk5Prnc1Jx.png" alt="螢幕擷取畫面" width="300"/> Using `loops` is generally better than `recursion` for the Fibonacci sequence for the following reasons: 1. Efficiency - Execution Time: Recursion recalculates many values multiple times, leading to poor performance. Loops calculate each Fibonacci number only once. - Memory Usage: Recursion uses the call stack, which can lead to stack overflow, especially for larger values of `n`. Loops require less memory. 2. Readability and Simplicity - Loop implementations are often more straightforward, making it easier to understand the logic. 3. Avoiding Stack Overflow - Recursion can cause stack overflow with large `n`, while loops do not have this issue. 4. Execution Speed - Loops typically run faster because they avoid the overhead of multiple function calls. ## Analysis We test our code using [Ripes](https://ripes.me/) simulator. ### Pseudo instruction ``` 00000000 <main>: 0: 10000517 auipc x10 0x10000 4: 00052503 lw x10 0 x10 8: 00100413 addi x8 x0 1 c: 00000493 addi x9 x0 0 10: 00000293 addi x5 x0 0 00000014 <loop>: 14: 00a28c63 beq x5 x10 24 <L1> 18: 009403b3 add x7 x8 x9 1c: 00048413 addi x8 x9 0 20: 00038493 addi x9 x7 0 24: 00128293 addi x5 x5 1 28: fedff06f jal x0 -20 <loop> 0000002c <L1>: 2c: 00048513 addi x10 x9 0 30: 10000597 auipc x11 0x10000 34: fd05a583 lw x11 -48 x11 38: 008000ef jal x1 8 <Result> 3c: 0480006f jal x0 72 <terminate> 00000040 <Result>: 40: 00058293 addi x5 x11 0 44: 00050313 addi x6 x10 0 48: 10000517 auipc x10 0x10000 4c: fbc50513 addi x10 x10 -68 50: 00400893 addi x17 x0 4 54: 00000073 ecall 58: 00028513 addi x10 x5 0 5c: 00100893 addi x17 x0 1 60: 00000073 ecall 64: 10000517 auipc x10 0x10000 68: fab50513 addi x10 x10 -85 6c: 00400893 addi x17 x0 4 70: 00000073 ecall 74: 00030513 addi x10 x6 0 78: 00100893 addi x17 x0 1 7c: 00000073 ecall 80: 00008067 jalr x0 x1 0 00000084 <terminate>: 84: 00c00893 addi x17 x0 12 88: 00000073 ecall ``` ## Problem C from [Quiz1](https://hackmd.io/@sysprog/arch2024-quiz1-sol) ### C code ```c static inline float fabsf(float x) { uint32_t i = *(uint32_t *)&x; // Read the bits of the float into an integer i &= 0x7FFFFFFF; // Clear the sign bit to get the absolute value x = *(float *)&i; // Write the modified bits back into the float return x; } ``` ```c static inline int my_clz(uint32_t x) { int count = 0; for (int i = 31; i >= 0; --i) { if (x & (1U << i)) break; count++; } return count; } ``` ```c static inline uint32_t fp16_to_fp32(uint16_t h) { /* * Extends the 16-bit half-precision floating-point number to 32 bits * by shifting it to the upper half of a 32-bit word: * +---+-----+------------+-------------------+ * | S |EEEEE|MM MMMM MMMM|0000 0000 0000 0000| * +---+-----+------------+-------------------+ * Bits 31 26-30 16-25 0-15 * * S - sign bit, E - exponent bits, M - mantissa bits, 0 - zero bits. */ const uint32_t w = (uint32_t) h << 16; /* * Isolates the sign bit from the input number, placing it in the most * significant bit of a 32-bit word: * * +---+----------------------------------+ * | S |0000000 00000000 00000000 00000000| * +---+----------------------------------+ * Bits 31 0-31 */ const uint32_t sign = w & UINT32_C(0x80000000); /* * Extracts the mantissa and exponent from the input number, placing * them in bits 0-30 of the 32-bit word: * * +---+-----+------------+-------------------+ * | 0 |EEEEE|MM MMMM MMMM|0000 0000 0000 0000| * +---+-----+------------+-------------------+ * Bits 30 27-31 17-26 0-16 */ const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF); /* * The renorm_shift variable indicates how many bits the mantissa * needs to be shifted to normalize the half-precision number. * For normalized numbers, renorm_shift will be 0. For denormalized * numbers, renorm_shift will be greater than 0. Shifting a * denormalized number will move the mantissa into the exponent, * normalizing it. */ uint32_t renorm_shift = my_clz(nonsign); renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0; /* * If the half-precision number has an exponent of 15, adding a * specific value will cause overflow into bit 31, which converts * the upper 9 bits into ones. Thus: * inf_nan_mask == * 0x7F800000 if the half-precision number is * NaN or infinity (exponent of 15) * 0x00000000 otherwise */ const int32_t inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) & INT32_C(0x7F800000); /* * If nonsign equals 0, subtracting 1 will cause overflow, setting * bit 31 to 1. Otherwise, bit 31 will be 0. Shifting this result * propagates bit 31 across all bits in zero_mask. Thus: * zero_mask == * 0xFFFFFFFF if the half-precision number is * zero (+0.0h or -0.0h) * 0x00000000 otherwise */ const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31; /* * 1. Shifts nonsign left by renorm_shift to normalize it (for denormal * inputs). * 2. Shifts nonsign right by 3, adjusting the exponent to fit in the * 8-bit exponent field and moving the mantissa into the correct * position within the 23-bit mantissa field of the single-precision * format. * 3. Adds 0x70 to the exponent to account for the difference in bias * between half-precision and single-precision. * 4. Subtracts renorm_shift from the exponent to account for any * renormalization that occurred. * 5. ORs with inf_nan_mask to set the exponent to 0xFF if the input * was NaN or infinity. * 6. ANDs with the inverted zero_mask to set the mantissa and exponent * to zero if the input was zero. * 7. Combines everything with the sign bit of the input number. */ return sign | ((((nonsign << renorm_shift >> 3) + ((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask); } ``` ### Assembly ```asm ```