# Assignment1: RISC-V Assembly and Instruction Pipeline contributed by <[naan1](https://hackmd.io/@naan1)> ## Quiz 1 problem C ### fabsf function #### C code ``` c= static inline float fabsf(float x) { uint32_t i = *(uint32_t *)&x; i &= 0x7FFFFFFF; x = *(float *)&i; return x; } ``` #### Assembly code ``` risc-v= .data test1: .word 0xC0600000 # -3.5 in IEEE-754 single point format expected1: .word 0x40600000 # 3.5, expected result after fabsf equal_msg: .string "Values are equal\n" notequal_msg: .string "Values are not equal\n" .text fabsf: la t0, test1 lw a0, 0(t0) li t3, 0x7FFFFFFF and a0, a0, t3 la t1, expected1 lw t1, 0(t1) beq a0, t1, values_equal # If equal, branch to values_equal values_equal: # Print "Values are equal" la a0, equal_msg li a7, 4 ecall end_program: # Exit the program li a7, 10 ecall ``` ### my_clz function #### C code ``` c= static inline int my_clz(uint32_t x) { int count = 0; for (int i = 31; i >= 0; --i) { if (x & (1U << i)) break; count++; } return count; } ``` #### Assembly code ``` risc-v= .data test_data: .word 0x0000F000 expected: .word 16 equal_msg: .string "\nValues are equal\n" notequal_msg: .string "\nValues are not equal\n" .text my_clz: la t0, test_data lw a0, 0(t0) addi sp, sp, -8 sw s0, 0(sp) sw s1, 4(sp) li s0, 0 li s1, 31 clz_loop: blt s1, zero, clz_end li t0, 1 sll t0, t0, s1 and t0, a0, t0 bne t0, zero, clz_end addi s0, s0, 1 addi s1, s1, -1 j clz_loop clz_end: mv a0, s0 la t1, expected lw t2, 0(t1) beq a0, t2, values_equal la a0, notequal_msg li a7, 4 ecall j end_program values_equal: # Print "Values are equal" la a0, equal_msg li a7, 4 ecall end_program: li a7, 10 ecall ``` ### fp16_to_fp32 function #### C code ``` c= static inline uint32_t fp16_to_fp32(uint16_t h) { const uint32_t w = (uint32_t) h << 16; const uint32_t sign = w & UINT32_C(0x80000000); const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF); uint32_t renorm_shift = my_clz(nonsign); renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0; const int32_t inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) & INT32_C(0x7F800000); const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31; return sign | ((((nonsign << renorm_shift >> 3) + ((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask); } ``` #### Assembly code ``` risc-v= .data testdata: .word 0x3C00 expect: .word 0x3F800000 equal_msg: .string "value equal\n" unequal_msg: .string "value not equal\n" .text .globl main main: # Load testdata and expect la t0, testdata # Load address of testdata lw t0, 0(t0) # Load testdata into t0 la s6, expect lw s5, 0(s6) slli t0, t0, 16 # Shift left 16 bits to align with 32-bit float li t1, 0x80000000 # Load the sign mask (0x80000000) and t2, t0, t1 # Extract sign bit (t2 = sign) li t1, 0x7FFFFFFF # Load nonsign mask (0x7FFFFFFF) and t3, t0, t1 # Extract nonsign part (t3 = nonsign) # Function to count leading zeros in a 32-bit value my_clz: li s0, 0 li t0, 31 clz_loop: blt t0, zero, clz_end srl t1, t3, t0 andi t1, t1, 1 bnez t1, clz_end addi s0, s0, 1 # Increment count addi t0, t0, -1 # Decrement index j clz_loop # Continue loop clz_end: li t1, 5 blt s0, t1, no_adjust # If renorm_shift < 5, skip adjustment sub s0, s0, t1 # s0 = renorm_shift = renorm_shift - 5 j apply_shift # Jump to apply_shift no_adjust: li s0, 0 # renorm_shift = 0 apply_shift: li t5, 0x04000000 add t5, t5, t3 # nonsign + 0x04000000 srli t5, t5, 8 li t6, 0x7F800000 # Load inf_nan mask and t5, t5, t6 # inf_nan_mask=t5 = ((nonsign + 0x04000000) >> 8) & 0x7F800000 # Compute zero_mask addi t6, t3, -1 # nonsign - 1 srai t6, t6, 31 # zero_mask = t6 = (nonsign - 1) >> 31 sll t3, t3, s0 # nonsign << renorm_shift srli t3, t3, 3 # nonsign >> 3 li t1, 0x70 # Load constant 0x70 sub t1, t1, s0 # 0x70 - renorm_shift slli t1, t1, 23 add t3, t3, t1 or t3, t3, t5 # Add inf_nan_mask not t6, t6 # ~zero_mask and t3, t3, t6 # &~zero_mask or s4, t2, t3 # beq s4, s5, equal j unequal equal: la a0, equal_msg li a7, 4 ecall j end unequal: la a0, unequal_msg li a7, 4 ecall end: li a7, 10 ecall ``` ## Leetcode 89. Gray Code ### Motivation > Gray code is widely used for error detection and correction in digital communication, data storage systems or k-map. It represents a way of sequencing binary numbers such that adjacent values vary by only one binary digit. And there are some ways to generate gray codes, thus I choose this problem as a practice to compare two of methods and practice the bitwise operation from quiz. ### Problem > For the [89. Gray Code](https://leetcode.com/problems/gray-code/) problem from LeetCode, the objective is to generate a sequence of integers that represents a Gray code, where each successive value in the sequence differs by exactly one bit from the previous value. ## Gray Code Solution > The sequence of a Gray code for `n` bits follows a specific pattern: > - For a 1-bit Gray code, the sequence starts with 0 and 1. > - For a 2-bit Gray code, the sequence builds upon the 1-bit sequence, resulting in `00`, `01`, `11`, and `10`. > - For a 3-bit Gray code, it extends the 2-bit sequence by reflecting the previous sequence and adding a leading 1 to each reflected element. > > The fundamental idea is that each integer in the sequence should only differ by one binary digit from its adjacent integers, ensuring a minimal transition state. The structure of Gray code can be recursively generated or constructed using specific formulas that guarantee these properties. > > I think that there are 2 methods to generate the gray codes > ## Recursive way: > This code first generates an n-1 bit Gray Code recursively. Then, it reflects the current generated result and adds 1 to the most significant bit of the reflected part. This expands the smaller Gray Code result into a full n-bit Gray Code. > ## Bitset > Using bitwise operations to generate Gray Code: for each i, it performs an XOR operation between i and the result of i shifted right by one bit. This directly calculates the Gray Code representation of i. ### Test data | Input | Output | | -------- | -------- | | n = 1 | [0,1]| | n = 2 | [0,1,3,2]| | n = 3 | [0,1,3,2,6,7,5,4]| ### C Code in recursive ``` c= #include <stdio.h> #include <stdio.h> void generateGrayCode(int n, int* result) { if (n == 0) { result[0] = 0; return; } generateGrayCode(n - 1, result); int size = 1 << (n - 1); for (int i = 0; i < size; i++) { result[size + i] = result[size - 1 - i] | (1 << (n - 1)); } } int main() { int n = 3; int size = 1 << n; int result[size]; generateGrayCode(n, result); printf("Gray Code for n=%d:\n", n); for (int i = 0; i < size; i++) { printf("%d ", result[i]); } return 0; } ``` ### C Code in bitset ``` c= #include <stdio.h> void grayCode(int n) { int size = 1 << n; int result[size]; for (int i = 0; i < size; i++) { result[i] = i ^ (i >> 1); printf("%d ", result[i]); } } int main() { printf("Gray Code for n=3:\n"); grayCode(3); // n = 3 return 0; } ``` ### Assembly Code in recursive way ``` risc-v= .data arr: .word 32 index: .word 0 .text main: la t0, index li t1, 0 sw t1, 0(t0) la t2, arr # Prepare arguments for generateGray(3, 0, arr, &index) li a0, 3 li a1, 0 mv a2, t2 la a3, index # Call the recursive function jal ra, generateGray # Exit the program properly li a7, 10 ecall generateGray: addi sp, sp, -8 sw ra, 4(sp) sw s0, 0(sp) mv s0, a0 mv s1, a3 beqz s0, generateGray_base_case addi a0, s0, -1 mv a1, a1 mv a2, a2 mv a3, s1 jal ra, generateGray # First recursive call li t0, 1 addi t1, s0, -1 sll t0, t0, t1 xor a1, a1, t0 addi a0, s0, -1 mv a2, a2 mv a3, s1 jal ra, generateGray lw ra, 4(sp) lw s0, 0(sp) addi sp, sp, 8 ret generateGray_base_case: lw t0, 0(s1) slli t1, t0, 2 add t2, a2, t1 sw a1, 0(t2) addi t0, t0, 1 sw t0, 0(s1) mv a0, a1 li a7, 1 ecall li a0, 10 li a7, 11 ecall lw ra, 4(sp) lw s0, 0(sp) addi sp, sp, 8 ret ``` ![image](https://hackmd.io/_uploads/ByRkhwKkJx.png) ### Assembly Code in bitset way ``` risc-v= .data gray_codes: .word 0 num: .word 8 .text .globl _start _start: la t0, gray_codes la t1, num lw t2, 0(t1) li t3, 0 loop: bge t3, t2, end mv t4, t3 srai t5, t3, 1 xor t6, t4, t5 sw t6, 0(t0) addi t0, t0, 4 addi t3, t3, 1 j loop end: li a7, 10 ecall ``` ![image](https://hackmd.io/_uploads/rkvzJOtk1e.png) ## Analysis ### comparison | Implementation | Cycles | Instrs. retired | CPI (Cycles Per Instruction) | IPC (Instructions Per Cycle) | Clock rate | | -------------- | ------ | ---------------- | ---------------------------- | ---------------------------- | ---------- | | Recursive | 488 | 351 | 1.39 | 0.719 | 9.09 Hz | | Bitwise | 97 | 73 | 1.33 | 0.753 | 11.11 Hz | ## Ripes Simulator Here we take the srai command as an example ### Instruction fetch ![image](https://hackmd.io/_uploads/SyyFiZq11g.png) * The location of instruction `srai x30 x18 1` is `0x00000020`(PC currently points to the address). * After fetching this instruction, the PC will be incremented by 4 (to `0x00000024`) The fetched instruction is `0x401e5f13`, which is the encoded form of the `srai x30, x28, 1` instruction. ### Instruction decode ![image](https://hackmd.io/_uploads/H1FQ9G9y1e.png) * The instruction `0x401e5f13` is decoded, extracting the opcode (0x01 for srai), source register (x28), destination register (x30). * The immediate value 1 is extracted as `0x00000001`. ### Execution ![image](https://hackmd.io/_uploads/BJXJbM9Jye.png) ALU takes in two operands: the value from register `x28` (currently shown as `0x00000000`) and the immediate value 1 (0x00000001). The ALU will perform `srai` operation on `x28` by 1 bit. ### DataMemory Access ![image](https://hackmd.io/_uploads/SJAiMM9k1x.png) Because this instruction does not save the value back to memory, it will not be used. Instead, the value is sent to the next stage through the pipeline. ### Write back ![image](https://hackmd.io/_uploads/Syg_1QG9ykl.png) Finally we will write the value `0x00000000` back to the target register ## Reference [Generate n-bit Gray Codes](https://www.geeksforgeeks.org/generate-n-bit-gray-codes/) [89. Gray Code](https://leetcode.com/problems/gray-code/submissions/1413462004/)