# Assignment1: RISC-V Assembly and Instruction Pipeline contributed by <[吳睿秉](https://github.com/XTX7900XTX/Computer-Architecture-2024.git)> :::danger Be aware of Markdown headings. ::: ## Quiz1 [Problem C](https://hackmd.io/@sysprog/arch2024-quiz1-sol#Problem-C) :::danger Use color sparingly, only when necessary to differentiate from plain text. Prioritize content over formatting. ::: ### <font color="white">fabsf</font> The function fabsf is used to obtain the absolute value of a floating-point number. This is achieved by applying the AND operation with the mask `0x7FFFFFFF`, which sets the sign bit to 0. #### C In this C program, the input floating point number's address is first cast to a `uint32_t` type to enable bitwise operations. After this, the `uint32_t` value is masked with `0x7FFFFFFF` to clear the sign bit. Finally, the result is cast back to the original floating point type, which gives the absolute value of the floating point number. ```c static inline float fabsf(float x) { uint32_t i = *(uint32_t *)&x; i &= 0x7FFFFFFF; x = *(float *)&i; return x; } ``` :::danger Don't paste code snip without comprehensive discussions. ::: #### RISC-V There are three test cases, so the len is set to `3` in advance. and the process will print the input to the console before calling the function `fabsf`. After the function `fabsf` returns, the output will be stored in `a0` and then moved to `a1`. Next, the value in `a1` will be compared to the expected answer in `a2`. The console will print `1` if the output equals the expected answer. | register name | content | |:------------- |:------------------------------------------------- | | t0 | data start address (input) | | t1 | number of data | | t2 | data start address (expect output) | | t3 | hold the value in a0 during a syscall temporarily | | t4 | hold 0x7FFFFFFF temporarily | | a1 | fabsf output | | a2 | expect output | ```c .data data: .word 0x3f000000, 0xc1420000, 0xf0700000 len: .word 0x3 expect: .word 0x3f000000, 0x41420000, 0x70700000 endl: .string "\n" .text la t0, data lw t1, len la t2, expect main: lw a0, 0(t0) # Show input value li a7, 2 # ecall # jal ra, newline # jal ra, fabsf # Call fabsf add a1, zero, a0 # Show output value li a7, 2 # ecall # jal ra, newline # lw a2, 0(t2) # Check if the output matches the expected result li a7, 1 # bne a1, a2, wrong # right: # If output == expected, print 1 li a0, 1 # j goback wrong: # If output != expected, print 0 li a0, 0 # goback: ecall jal ra newline addi t0, t0, 4 # Iterate data array addi t1, t1,-1 addi t2, t2, 4 # Iterate expect array bne t1, zero, main li a7, 10 ecall fabsf: # fabsf function li t4, 0x7fffffff and a0, a0, t4 # x &= 0x7FFFFFFF; ret newline: # print "\n" mv t3, a0 la a0, endl li a7, 4 ecall mv a0, t3 ret ``` #### Testcase * ##### Case1: > Input: 0x3f000000 (0.5 in decimal) > Output: 0x3f000000 (0.5 in decimal) * ##### Case2: > Input: 0xc1420000 (-12.125 in decimal) > Output: 0x41420000 (12.125 in decimal) * ##### Case3: > Input: 0xf0700000 (-2.97106e+29 in decimal) > Output: 0x70700000 (2.97106e+29 in decimal) #### Console ``` 0.5 0.5 1 -12.125 12.125 1 -2.97106e+29 2.97106e+29 1 Program exited with code: 0 ``` :::danger Use fewer instructions. ::: ### <font color="white">my_clz</font> The `my_clz` function counts the leading zeros in a 32-bit digit. Starting from the leftmost digit and moving toward the right, it counts the zeros until a '1' appears. The function then returns the number of zeros in `t4` register. Note that if the 32-bit digit is all zeros (i.e., 0), the leading zeros are considered undefined. #### C This C program simply counts the leading zeros in the input `x`. It starts by setting the `count` to `0`. Then, it iterates from left to right through all 32 bits. Using a left shift operation (`1U << i`), for `i=31` to `0`, it performs an AND operation with `x` to check if the bit is 1. As soon as the first 1 appears from the left side, the loop breaks and returns `count`, which stores the number of leading zeros. ```c static inline int my_clz(uint32_t x) { int count = 0; for (int i = 31; i >= 0; --i) { if (x & (1U << i)) break; count++; } return count; } ``` #### RISC-V The RISC-V program for counting leading zeros works as follows. There are four inputs. The program displays the 32-bit input before starting `my_clz`. It uses a loop to accumulate the number of leading zeros until the `t6` register holds a 1, meaning the condition `(x & (1 << i))` is true. At this point, the loop stops, and the count is stored in `t4`. The program automatically checks the answer, and if it's correct, it will print "correct." | register name | content | | ------------- |:----------------------------------------------------------------- | | t0 | data start address (input) | | t1 | number of data | | t2 | data start address (expect output) | | t3 | hold the value in a0 during a syscall temporarily | | t4 | count | | t5 | i | | t6 | (x & (1 << i)) | | a1 | my_clz output | | a2 | expect output | | s0 | store the immediate value 32 to check if the result is undefined. | ```c .data data: .word 0x0000002, 0x000000FF, 0xFFFFFFFF, 0x00000000 len: .word 0x4 expect: .word 30, 24, 0, 32 endl: .string "\n" undef: .string "undefined" correct: .string " correct" incorrect: .string " incorrect" .text la t0, data lw t1, len la t2, expect main: lw a0, 0(t0) # Show input value li a7, 1 # ecall # jal ra, newline # jal ra, my_clz # Call my_clz mv a1, a0 # Show output value li s0, 32 # Check if the result is undefined beq a1, s0, isundef # li a7, 1 # Show output value ecall # lw a2, 0(t2) # Check if the output matches the expected result li a7, 4 # bne a1, a2, wrong # right: # If output == expected, print correct la a0, correct # j goback # wrong: # If output != expected, print incorrect la a0, incorrect # j goback # isundef: # undefined la a0, undef # goback: ecall jal ra, newline addi t0, t0, 4 addi t1, t1,-1 addi t2, t2, 4 bne t1, zero, main li a7, 10 ecall my_clz: # my_clz function li t4 0 li t5 31 loop: li t6, 1 # t6 = 1 sll t6, t6, t5 # t6 = (1 << i) and t6, a0, t6 # t6 = (x & (1 << i)) bnez t6, quit # if (x & (1U << i)) break; addi t4, t4, 1 addi t5, t5, -1 bge t5, zero, loop quit: mv a0, t4 ret newline: # to print "\n" mv t3, a0 la a0, endl li a7, 4 ecall mv a0, t3 ret ``` #### Testcase * ##### Case1: > Input: 0x0000002 (2 in decimal, `00000000 00000000 00000000 00000010` in binary) > Output: 30 * ##### Case2: > Input: 0x000000FF (255 in decimal, `00000000 00000000 00000000 11111111` in binary) > Output: 24 * ##### Case3: > Input: 0xFFFFFFFF (-1 in decimal, `11111111 11111111 11111111 11111111` in binary) > Output: 0 * ##### Case4: > Input: 0x00000000 (0) > Output: undefined #### Console ``` 2 30 correct 255 24 correct -1 0 correct 0 undefined Program exited with code: 0 ``` ### <font color="white">fp16_to_fp32</font> The purpose of fp16_to_fp32 is to convert a 16-bit floating-point number to a 32-bit floating-point number. Due to the format differences between FP16 and FP32, certain transformations are required during the conversion process. There are several steps involved. First, the input `h` is shifted left by 16 bits into a 32-bit `uint32_t`, where the upper bits hold the FP16 data and the lower bits remain zero. Second, the `sign` bit is extracted, and the `nonsign` bits (i.e., exponent and mantissa) are separated. The third step involves calculating the renormalization shift. The function `my_clz`, as mentioned earlier, is used to count the leading zeros in the nonsign part (exponent and mantissa). The result is adjusted by subtracting 5 if it's greater than 5, otherwise it's set to 0. This shift value is used to normalize the mantissa by shifting it left and then right. The fourth step handles the special cases for the exponent being all ones (infinity or NaN). In FP16, an exponent of 0x1F (31 in decimal) indicates either infinity or NaN. The result is right-shifted by 8 bits to move the exponent to the correct position in the 32-bit representation. The mask `0x7F800000` ensures that only the exponent bits are retained in the final result. In the fifth step, the zero_mask is used to checks if `nonsign` is zero. If so, equals 0, subtracting 1 causes overflow, setting bit 31 to 1. If nonsign is not zero, bit 31 remains 0. Finally, the function combines the sign, the normalized nonsign, and applies the `inf_nan_mask` and `zero_mask` to handle cases of `infinity`, `NaN`, and `zero`, ultimately returning the 32-bit floating-point value. #### C ```c static inline uint32_t fp16_to_fp32(uint16_t h) { const uint32_t w = (uint32_t) h << 16; const uint32_t sign = w & UINT32_C(0x80000000); const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF); uint32_t renorm_shift = my_clz(nonsign); renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0; const int32_t inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) & INT32_C(0x7F800000); const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31; return sign | ((((nonsign << renorm_shift >> 3) + ((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask); } ``` #### RISC-V This RISC-V program executes the process described in the C program, utilizing the previously mentioned `my_clz` function to convert from FP16 to FP32. | register name | content | |:------------- |:---------------------------------------------------- | | s1 | h, w, renorm_shift | | s2 | sign | | s3 | nonsign | | s4 | inf_nan_mask | | s5 | zero_mask | | s6 | (nonsign << renorm_shift >> 3) | | s7 | ((0x70 - renorm_shift) << 23) | | s8 | result | | s11 | Temporarily store an immediate value for operations. | ```c .data data: .word 0x3C00, 0xCC10, 0x962C len: .word 0x3 expect: .word 0x3F800000, 0xC1820000, 0xBAC58000 endl: .string "\n" correct: .string " correct" incorrect: .string " incorrect" .text la t0, data lw t1, len la t2, expect main: lw a0, 0(t0) # Show input value li a7, 1 # ecall # jal ra, newline # jal ra, fp16_to_fp32 # Call my_clz mv a1, a0 # li a7, 1 # Show output value ecall # lw a2, 0(t2) # Check if the output matches the expected result li a7, 4 # bne a1, a2, wrong # right: # If output == expected, print correct la a0, correct # j goback # wrong: # If output != expected, print incorrect la a0, incorrect # goback: ecall jal ra, newline addi t0, t0, 4 addi t1, t1,-1 addi t2, t2, 4 bne t1, zero, main li a7, 10 ecall fp16_to_fp32: slli a0 a0 16 li s11 0x80000000 and s2 a0 s11 li s11 0x7FFFFFFF and s3 a0 s11 mv a0 s3 my_clz: # my_clz function li t4 0 li t5 31 loop: li t6, 1 # t6 = 1 sll t6, t6, t5 # t6 = (1 << i) and t6, a0, t6 # t6 = (x & (1 << i)) bnez t6, quit # if (x & (1U << i)) break; addi t4, t4, 1 addi t5, t5, -1 bge t5, zero, loop quit: mv s1 t4 li s11 5 blt s11 s1 Sub_Five # if (5 < renorm_shift) li s1 0 j continue Sub_Five: addi s1 s1 -5 # renorm_shift -= 5 continue: li s11 0x04000000 add s4 s3 s11 srli s4 s4 8 li s11 0x7F800000 and s4 s4 s11 addi s5 s3 -1 srli s5 s5 31 sll s6 s3 s1 srli s6 s6 3 li s11 0x70 sub s7 s11 s1 slli s7 s7 23 add s8 s6 s7 or s8 s8 s4 xori s5 s5 -1 and s8 s8 s5 or s8 s8 s2 mv a0 s8 ret newline: # to print "\n" mv t3, a0 la a0, endl li a7, 4 ecall mv a0, t3 ret ``` #### Testcase * ##### Case1: > Input: 0x3C00 (1 in fp16, 15360 in decimal on the console) > Output: 0x3F800000 (1 in fp32, 1065353216 in decimal on the console) * ##### Case2: > Input: 0xCC10 (-16.25 in fp16, 52240 in decimal on the console) > Output: 0xC1820000 (-16.25 in fp32, -1048444928 in decimal on the console) * ##### Case3: > Input: 0x962C (-0.001507 in fp16, 38444 in decimal on the console) > Output: 0xBAC58000 (-0.0015068054 in fp32, -1161461760 in decimal on the console) #### Console ``` 15360 1065353216 correct 52240 -1048444928 correct 38444 -1161461760 correct Program exited with code: 0 ``` ## [3158. Find the XOR of Numbers Which Appear Twice](https://leetcode.com/problems/find-the-xor-of-numbers-which-appear-twice/description/) ### <font color="white">Description</font> You are given an array `nums`, where each number in the array appears either once or twice. Return the bitwise `XOR` of all the numbers that appear twice in the array, or 0 if no number appears twice. #### Example: > Input: nums = [1,2,2,1] > Output: 3 > Explanation: Numbers 1 and 2 appeared twice. `1 XOR 2 == 3`. #### Constraints: * `1 <= nums.length <= 50` * `1 <= nums[i] <= 50` * Each number in `nums` appears either once or twice. --- ### <font color="white">Solution - Naive Approach</font> This problem can easily be solved with a nested loop that checks the values in the array. If a value repeats, perform the XOR operation to get the final answer. This approach is straightforward and easy to understand. #### C ```c int duplicateNumbersXOR(int* nums, int numsSize) { int xor = 0; for(int i=0;i<numsSize;++i) for(int j=i+1;j<numsSize;++j) if(nums[i]==nums[j]) xor ^= nums[i]; return xor; } ``` #### RISC-V | register name | content | |:------------- |:--------------- | | s3 | nums[i] address | | s9 | nums[i] value | | s11 | i | | s4 | nums[j] address | | t2 | nums[j] value | | t1 | j | ```c .text lw s1, len la s2, size la s3, dataset # for nums[i] main: lw a1, 0(s2) # a1 = numsSize li s5, 0 jal ra, dupNumXOR # call dupNumXOR li a7, 1 # show output ecall # addi s2, s2, 4 # the next dataset addi s1, s1, -1 # < len keep doing bne s1, zero, main li a7, 10 ecall dupNumXOR: li s6, 0 li s11, 0 loop_i: bge s11, a1, loop_i_exit addi t1, s11, 1 addi s4, s3, 4 # for nums[j] loop_j: bge t1, a1, loop_j_exit lw s9, 0(s3) lw t2, 0(s4) bne s9, t2, skipxor xor s6, s6, s9 skipxor: addi t1, t1, 1 # j++ addi s4, s4, 4 j loop_j loop_j_exit: addi s11, s11, 1 # i++ addi s3, s3, 4 j loop_i loop_i_exit: mv a0, s6 ret ``` | Execution info | Value | |:--------------- |:----- | | Cycles | 3047 | | Instrs. retired | 1816 | | CPI | 1.68 | | IPC | 0.596 | This problem can easily be solved with **Naive Approach**. While it’s easy to understand, it is far from being the most time-efficient solution. After studying Problem C in Quiz 1, I applied a similar concept by using bitwise shift and AND operations to efficiently maintain information. ### <font color="white">Solution - Optimized Bitwise Approach</font> This problem has the constraint that `nums[i] < 50`. Instead of using a naive `O(n²)` solution, a 64-bit variable (uint64_t) can be used to record whether each number has appeared. Since 64 bits exceed the range of 50, using this method is efficient and sufficient for this problem. If a number appears for the first time, set the corresponding bit in the uint64_t using an OR operation (i.e., `set |= (1ULL << nums[i])` at line 8). To check if the number has appeared a second time, use an AND operation(i.e., `set & (1ULL << nums[i])` at line 6). If it has, then perform the XOR operation. #### C ```c= #include <stdint.h> int duplicateNumbersXOR(int *nums, int numsSize){ uint64_t set = 0ULL; int xor = 0; for (int i = 0; i < numsSize; ++i){ if (set & (1ULL << nums[i])) xor ^= nums[i]; set |= (1ULL << nums[i]); } return xor; } ``` ```c #include <stdio.h> #define test1_Size 4 #define test1_ans 1 int test1_data[test1_Size] = {1, 2, 1, 3}; #define test2_Size 14 #define test2_ans 12 int test2_data[test2_Size] = {1, 2, 3, 4, 6, 7, 8, 2, 9, 9, 3, 5, 1, 5}; #define test3_Size 16 #define test3_ans 43 int test3_data[test3_Size] = { 3, 49, 49, 3, 8, 12, 19, 19, 28, 27, 28, 50, 31, 36, 50, 36}; int main() { int ans1 = duplicateNumbersXOR(test1_data, test1_Size); printf("%2d, %2d, %2d\n", ans1, test1_ans, ans1 == test1_ans); int ans2 = duplicateNumbersXOR(test2_data, test2_Size); printf("%2d, %2d, %2d\n", ans2, test2_ans, ans2 == test2_ans); int ans3 = duplicateNumbersXOR(test3_data, test3_Size); printf("%2d, %2d, %2d\n", ans3, test3_ans, ans3 == test3_ans); } ``` #### Console ``` 1, 1, 1 12, 12, 1 43, 43, 1 ``` In the C program, the XOR result is stored in a uint64_t variable. However, in RISC-V, the registers are only 32 bits wide. If we use only one 32-bit register (`s7` register in **Wrong version of dupNumXOR function**)to store the result, it could lead to incorrect answers. #### Wrong version of dupNumXOR function ```c dupNumXOR: li s6 0 # xor result li s7 0 # set li s11 0 # i li t5 1 # const 1 loop: lw s9, 0(s3) # s9 = nums[i] sll s10, t5, s9 # set &= (1 << nums[i]) and s10, s7, s10 # beqz s10, skipxor xor s6, s6 ,s9 # xor ^= nums[i]; skipxor: sll s10, t5, s9 # set |= (1ULL << nums[i]) or s7, s7, s10 # addi s3, s3, 4 # iterate from nums[i] to nums[i+1] addi s11, s11 1 # i<numsSize; i++; bge s11, a1, quit # j loop quit: mv a0 s6 # s6 is the xor result ret ``` This scenario is similar to the **C** code, where changing` uint64_t set = 0ULL;` to `uint32_t set = 0ULL;` in line 3 would also lead to an incorrect result. In this case, if `nums[i] > 31`, the information from `(1ULL << nums[i])` cannot be stored in a single 32-bit register. To correctly store a 64-bit unsigned integer in RISC-V, we need to utilize two 32-bit registers(in **RISC-V Corrected version**, register `s7` and `s8`). #### RISC-V Corrected version ```c .data len: .word 3 # number of dataset size: .word 4, 14, 16 # length of every dataset dataset: .word 1, 2, 1, 3 .word 1, 2, 3, 4, 6, 7, 8, 2, 9, 9, 3, 5, 1, 5 .word 3, 49, 49, 3, 8, 12, 19, 19, 28, 27, 28, 50, 31, 36, 50, 36 expect: .word 1, 12, 43 endl: .string "\n" .text lw s1, len la s2, size la s3, dataset main: lw a1, 0(s2) # numsSize li s5, 0 jal ra dupNumXOR # (s5 < numsSize) print(nums[i]) li a7, 1 ecall goback: addi s2, s2,4 jal ra newline addi s1, s1,-1 # < len keep doing bne s1, zero, main li a7, 10 ecall dupNumXOR: # duplicateNumbersXOR function li s6 0 # xor result li s7 0 # set_A (the left half bits) li s8 0 # set_B (the right half bits) li s11 0 li t5 1 loop: lw s9, 0(s3) # s9 = nums[i] li t4 32 bge s9 t4 left_half_bits right_half_bits: # s8: set_B (the right half bits) sll s10, t5, s9 and s10, s8, s10 beqz s10, right_skipxor # if not(set & (1ULL << nums[i])) xor s6, s6 ,s9 right_skipxor: sll s10, t5, s9 or s8, s8, s10 j loop_goback left_half_bits: # s7: set_A (the left half bits) addi t4 s9 -31 # (nums[i] - 31) sll s10, t5, t4 and s10, s7, s10 beqz s10, left_skipxor # if not(set & (1ULL << nums[i])) xor s6, s6 ,s9 left_skipxor: sll s10, t5, t4 or s7, s7, s10 loop_goback: addi s3, s3, 4 # iter nums[i] -> nums[i+1] addi s11 s11 1 # i<numsSize; i++; bge s11, a1, quit # j loop quit: mv a0 s6 ret newline: # to print "\n" mv t6, a0 la a0, endl li a7, 4 ecall mv a0, t6 ret ``` | Execution info | Value | |:--------------- |:----- | | Cycles | 754 | | Instrs. retired | 528 | | CPI | 1.43 | | IPC | 0.7 | #### RISC-V Full Revised Version Eventually, after slightly rearranging the registers in the checks for upper and lower bits and improving the branch conditions, I revised the code, reducing the number of instructions and cycles. | register name | content | |:------------- |:------------------------------------------------- | | t4 | immediate 32 for comparision, (nums[i] - 31) | | t5 | const 1 | | t6 | hold the value in a0 during a syscall temporarily | | a1 | array length (numsSize) | | s1 | number of dataset | | s2 | length of every dataset base address | | s3 | dataset base address | | s4 | expect base address | | s5 | currently doing x-th in a dataset | | s6 | xor result | | s7 | set_A (the left half bits) | | s8 | set_B (the right half bits) | | s9 | nums[i] | | s10 | (set & (1ULL << nums[i])) | | s11 | i | ```c .data len: .word 3 # number of dataset size: .word 4, 14, 16 # length of every dataset dataset: .word 1, 2, 1, 3 .word 1, 2, 3, 4, 6, 7, 8, 2, 9, 9, 3, 5, 1, 5 .word 3, 49, 49, 3, 8, 12, 19, 19, 28, 27, 28, 50, 31, 36, 50, 36 expect: .word 1, 12, 43 endl: .string "\n" .text lw s1, len la s2, size la s3, dataset main: lw a1, 0(s2) # numsSize li s5, 0 jal ra dupNumXOR # (s5 < numsSize) print(nums[i]) li a7, 1 ecall jal ra newline # print "\n" addi s2, s2,4 # the next dataset addi s1, s1,-1 # < len keep doing bne s1, zero, main # goback li a7, 10 # quit ecall dupNumXOR: # duplicateNumbersXOR function li s6 0 # xor result li s7 0 # set_A (the left half bits) li s8 0 # set_B (the right half bits) li s11 0 li t5 1 loop: lw s9, 0(s3) # s9 = nums[i] li t4 32 bge s9 t4 left_half_bits # If nums[i] > 31, it belongs to the upper half of the bits right_half_bits: # s8: set_B (the right half bits) sll t4, t5, s9 and s10, s8, t4 beqz s10, right_skipxor # if not(set & (1ULL << nums[i])) xor s6, s6 ,s9 right_skipxor: or s8, s8, t4 j loop_goback left_half_bits: # s7: set_A (the left half bits) addi t4, s9, -31 # (nums[i] - 31) sll t4, t5, t4 and s10, s7, t4 beqz s10, left_skipxor # if not(set & (1ULL << nums[i])) xor s6, s6, s9 left_skipxor: or s7, s7, t4 loop_goback: addi s3, s3, 4 # iter nums[i] -> nums[i+1] addi s11, s11, 1 # i<numsSize; i++; blt s11, a1, loop mv a0 s6 # quit ret newline: # to print "\n" mv t6, a0 la a0, endl li a7, 4 ecall mv a0, t6 ret ``` | Execution info | Value | |:--------------- |:----- | | Cycles | 683 | | Instrs. retired | 463 | | CPI | 1.48 | | IPC | 0.678 | #### Testcase * ##### Case1: > Input: nums = [1, 2, 1, 3] > Output: 1 * ##### Case2: > Input: nums = [1, 2, 3, 4, 6, 7, 8, 2, 9, 9, 3, 5, 1, 5] > Output: 12 * ##### Case3: > Input: nums = [3, 49, 49, 3, 8, 12, 19, 19, 28, 27, 28, 50, 31, 36, 50, 36] > Output: 43 #### Console ``` 1 12 43 Program exited with code: 0 ``` --- ### <font color="white">Analysis</font> In the [Ripes](https://ripes.me/) simulator, the CPU is organized as a classic five-stage pipeline, with pipeline registers between each stage. An instruction will sequentially go through these stages, allowing for different instructions to overlap in execution. For instance, consider the instruction `lw a1, 0(s2)` , which retrieves the starting address of the `size` array and stores it in the a1 register. This example illustrates how the pipeline operates in the following five stages: ![2024_10_13_205708_FiveStagePipeline](https://hackmd.io/_uploads/HJk3rDYyJl.png) #### **1. Instruction fetch (IF):** In RISC-V, the register `a1` is represented as `x11`, and the register `s2` is represented as `x18`. Therefore, the instruction becomes `lw x11, 0 x18`, as shown below, which means loading a word from the address specified by `s2` (in `x18`) into `a1` (in `x11`). The `lw` instruction format is as follows: `lw rd, offset(rs1)` | 31 ~ 20 | 19 ~ 15 | 14 ~ 12 | 11 ~ 7 | 6 ~ 0 | |:------------:|:-------:|:-------:|:------:|:-------:| | imm [11:0] | rs1 | funct3 | rd | opcode | | 0 | x18 | | x11 | | | 000000000000 | 10010 | 010 | 01011 | 0000011 | According to the Program Counter (PC), the instruction address in instruction memory is `0x00000018`, which retrieves the instruction `00092583`. The PC will then load the address of the next instruction, which is the current instruction address plus 4 bytes (i.e., 0x0000001C). ![2024_10_13_233817_](https://hackmd.io/_uploads/BJhiFwFyke.png) #### 2. Instruction decode (ID) The instruction is decoded from binary according to the above table, and control signals are generated to determine what the instruction is supposed to do. Additionally, this stage reads the values from the source registers in the register file. In this case, `rs1`, which corresponds to `s2`, contains the starting address of the `size` array in the data section. The starting address is exactly `0x10000004`, and the offset immediate is `0`. ![2024_10_14_000400_](https://hackmd.io/_uploads/S1HtJuFkye.png) In memory, the orange area indicates the address and the words of the `size` array that are stored. The starting address is `0x10000004`, with space for 3 words. The corresponding data is `4, 14, 16`, as previously defined. ![2024_10_14_001830_](https://hackmd.io/_uploads/Syp1mOFy1l.png) #### 3. Execute (EX): Arithmetic or logical operations are performed in this stage. The address calculation uses the base address from the source register `s2` and the offset (which is `0`). The multiplexer selects the register value from the previous step (i.e., the ID stage). The ALU adds these two values together, resulting in an effective address that is the same as the value stored in `s2`. ![2024_10_14_002541_](https://hackmd.io/_uploads/BkU9NuYkkl.png) #### 4. Memory access (MEM) For memory-related instructions, this stage handles reading from or writing to memory. In this stage, it retrieves the word stored at the address specified by the offset in data memory. As a result, the first word in the `size` array, which is `4`, will be retrieved. ![2024_10_14_003100_](https://hackmd.io/_uploads/B111IuKJkg.png) #### 5. Writeback (WB) The word from data memory is written back to the register file. The result `4`, is stored in the destination register `a1` (`0b`). ![2024_10_14_004903_](https://hackmd.io/_uploads/HJr7q_Fkyx.png) ## Reference * Assignment1 https://hackmd.io/@sysprog/2024-arch-homework1 * Ripes simulator https://ripes.me/ <s> * RISC-V 指令集架構介紹 - Integer Calling convention https://tclin914.github.io/77838749/ * Classic RISC pipeline https://en.wikipedia.org/wiki/Classic_RISC_pipeline </s> :::danger Always refer to primary sources, such as official RISC-V documentation. :::