Lab1: RV32I Assembly

# Assignment 1 RISC-V Assembly

## **Quiz1 problem C**

### **C code for function fabsf**

```c
#include <stdint.h>

static inline float fabsf(float x) {
    uint32_t i = *(uint32_t *)&x;  // Read the bits of the float into an integer
    i &= 0x7FFFFFFF;               // Clear the sign bit to get the absolute value
    x = *(float *)&i;              // Write the modified bits back into the float
    return x;
}
```

:::danger
Don't paste code snip without comprehensive discussions.
:::

### **assembly code for function fabsf**

```asm
.data
argument: .word 0x00000000      # Placeholder for the input float value (IEEE 754 bits)

.text
.globl fabsf
fabsf:
    # Load the input float value from memory
    la   t0, argument           # Load address of argument into t0
    lw   t1, 0(t0)              # Load the bits of the float into t1
    li   t2, 0x7FFFFFFF         # Load mask to clear the sign bit into t2
    and  t1, t1, t2             # Clear the sign bit: absolute value of the float

    # Print the result
    mv   a0, t1                 # Move the result to a0
    li   a7, 2                  # System call for printing a float
    ecall                       # Make the system call to print the value

    # Exit the program
    li   a7, 10                 # System call for exit
    ecall
```

**testing data**

| Input (Decimal) | Input (IEEE 754) | Output (Decimal) | Output (IEEE 754) |
|-----------------|------------------|------------------|-------------------|
| -10.0           | 0xC1200000       | 10.0             | 0x41200000        |
| -8.5            | 0xC1080000       | 8.5              | 0x41080000        |
| 15.0            | 0x41700000       | 15.0             | 0x41700000        |
| -3.14           | 0xC048F5C3       | 3.14             | 0x4048F5C3        |
| 0.0             | 0x00000000       | 0.0              | 0x00000000        |

### **C code for function `__builtin_clz`**

```c
#include <stdint.h>

static inline int my_clz(uint32_t x) {
    int count = 0;
    for (int i = 31; i >= 0; --i) {
        if (x & (1U << i))
            break;          // Found the most significant set bit
        count++;
    }
    return count;           // 32 when x == 0
}
```

### **assembly code for function `__builtin_clz`**

```asm
.data
test_data: .word 0x00000000     # Placeholder for input data

.text
.globl my_clz
my_clz:
    li   t0, 0                  # Initialize count to 0
    li   t1, 31                 # Start from bit 31 (for a 32-bit integer)
    la   t4, test_data          # Load address of test_data into t4
    lw   t4, 0(t4)              # Load the input value once, before the loop
clz_loop:
    bltz t1, clz_end            # If bit index < 0, exit loop
    li   t3, 1                  # t3 = 1
    sll  t3, t3, t1             # t3 = 1 << t1
    and  t2, t4, t3             # Check whether bit t1 is set
    bnez t2, clz_end            # If set, break the loop (found the leading one)
    addi t0, t0, 1              # Increment count (leading zero count)
    addi t1, t1, -1             # Decrement bit index
    j    clz_loop               # Jump back to start of loop
clz_end:
    mv   a0, t0                 # Move count to a0
    li   a7, 1                  # System call for printing an integer
    ecall                       # Print the value
    li   a7, 10                 # System call for exit
    ecall                       # Exit the program
```

**testing data**

`my_clz` counts the leading zeros of the raw 32-bit integer, so each input is listed with its two's-complement bit pattern rather than an IEEE 754 encoding. A negative input has bit 31 set and therefore returns 0.

| Input (Decimal) | Input (Hex, `uint32_t`) | Output (Decimal) |
|-----------------|-------------------------|------------------|
| 0               | 0x00000000              | 32               |
| 8               | 0x00000008              | 28               |
| 531             | 0x00000213              | 22               |
| 1000            | 0x000003E8              | 22               |
| -10             | 0xFFFFFFF6              | 0                |

### **C code for function fp16_to_fp32**

```c
#include <stdint.h>

/* Uses my_clz() defined above. */
static inline uint32_t fp16_to_fp32(uint16_t h)
{
    /*
     * Extends the 16-bit half-precision floating-point number to 32 bits
     * by shifting it to the upper half of a 32-bit word:
     * +---+-----+------------+-------------------+
     * | S |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
     * +---+-----+------------+-------------------+
     * Bits  31  26-30    16-25           0-15
     *
     * S - sign bit, E - exponent bits, M - mantissa bits, 0 - zero bits.
     */
    const uint32_t w = (uint32_t) h << 16;
    /*
     * Isolates the sign bit from the input number, placing it in the most
     * significant bit of a 32-bit word:
     *
     * +---+----------------------------------+
     * | S |0000000 00000000 00000000 00000000|
     * +---+----------------------------------+
     * Bits  31                 0-30
     */
    const uint32_t sign = w & UINT32_C(0x80000000);
    /*
     * Extracts the mantissa and exponent from the input number, placing
     * them in bits 0-30 of the 32-bit word:
     *
     * +---+-----+------------+-------------------+
     * | 0 |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
     * +---+-----+------------+-------------------+
     * Bits  31  26-30    16-25           0-15
     */
    const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);
    /*
     * The renorm_shift variable indicates how many bits the mantissa
     * needs to be shifted to normalize the half-precision number.
     * For normalized numbers, renorm_shift will be 0. For denormalized
     * numbers, renorm_shift will be greater than 0. Shifting a
     * denormalized number will move the mantissa into the exponent,
     * normalizing it.
     */
    uint32_t renorm_shift = my_clz(nonsign);
    renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0;
    /*
     * If the half-precision number has an exponent field of 31 (all ones),
     * adding 0x04000000 overflows into bit 31, and the arithmetic shift
     * right by 8 then turns the upper 9 bits into ones. Thus:
     * inf_nan_mask ==
     *     0x7F800000 if the half-precision number is NaN or infinity
     *     0x00000000 otherwise
     */
    const int32_t inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) & INT32_C(0x7F800000);
    /*
     * If nonsign equals 0, subtracting 1 wraps around to 0xFFFFFFFF,
     * setting bit 31 to 1. Otherwise, bit 31 will be 0. The arithmetic
     * shift right by 31 propagates bit 31 across all bits. Thus:
     * zero_mask ==
     *     0xFFFFFFFF if the half-precision number is zero (+0.0h or -0.0h)
     *     0x00000000 otherwise
     */
    const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31;
    /*
     * 1. Shifts nonsign left by renorm_shift to normalize it (for denormal
     *    inputs).
     * 2. Shifts nonsign right by 3, so that the 5-bit exponent lands in the
     *    low bits of the 8-bit exponent field and the 10-bit mantissa lands
     *    in the top bits of the 23-bit mantissa field of the single-precision
     *    format.
     * 3. Adds 0x70 (112 = 127 - 15) to the exponent to account for the
     *    difference in bias between half precision and single precision.
     * 4. Subtracts renorm_shift from the exponent to account for any
     *    renormalization that occurred.
     * 5. ORs with inf_nan_mask to set the exponent to 0xFF if the input
     *    was NaN or infinity.
     * 6. ANDs with the inverted zero_mask to set the mantissa and exponent
     *    to zero if the input was zero.
     * 7. Combines everything with the sign bit of the input number.
     */
    return sign | ((((nonsign << renorm_shift >> 3) + ((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask);
}
```

### **assembly code for function fp16_to_fp32 (and my_clz)**

```asm
# Upper part: fp16_to_fp32
.data
# Test input: a half-precision floating-point value (16 bits) stored in a word
testing_data: .word 0x7c00

.text
.globl fp16_to_fp32
fp16_to_fp32:
    la   t1, testing_data       # Load the address of testing_data into t1
    lw   t1, 0(t1)              # Load the half-precision value
    slli t1, t1, 16             # w = h << 16 (move it to the upper half of the word)
    li   t2, 0x80000000         # Sign bit mask
    and  t2, t2, t1             # sign = w & 0x80000000
    li   t3, 0x7FFFFFFF         # Exponent/mantissa mask
    and  t3, t3, t1             # nonsign = w & 0x7FFFFFFF

    # renorm_shift = my_clz(nonsign) > 5 ? my_clz(nonsign) - 5 : 0
    mv   a0, t3                 # Pass nonsign to my_clz in a0
    jal  ra, my_clz             # a0 = leading zero count (my_clz leaves t1-t4 untouched)
    addi t4, a0, -5             # t4 = clz - 5
    bgtz t4, skip_renorm        # clz > 5 (zero or denormal input): keep clz - 5
    li   t4, 0                  # clz <= 5 (normal, infinity or NaN): renorm_shift = 0
skip_renorm:
    # inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) & 0x7F800000
    li   t6, 0x04000000
    add  t6, t6, t3             # nonsign + 0x04000000
    srai t6, t6, 8              # Arithmetic shift spreads an overflow into bit 31
    li   a1, 0x7F800000
    and  t6, t6, a1             # t6 = inf_nan_mask

    # zero_mask = (int32_t)(nonsign - 1) >> 31
    addi a1, t3, -1             # nonsign - 1 (bit 31 is set only when nonsign == 0)
    srai a1, a1, 31             # Propagate bit 31 across all bits

    # Normalize, adjust the exponent, apply the masks and combine with the sign
    sll  t3, t3, t4             # nonsign << renorm_shift
    srli t3, t3, 3              # >> 3: move exponent and mantissa into fp32 positions
    li   a2, 0x70               # Bias adjustment 0x70 = 127 - 15
    sub  a2, a2, t4             # 0x70 - renorm_shift
    slli a2, a2, 23             # Shift into the exponent field
    add  t3, t3, a2             # Add the adjusted exponent
    or   t3, t3, t6             # Combine with inf_nan_mask
    not  a1, a1                 # ~zero_mask
    and  t3, t3, a1             # Clear exponent and mantissa if the input was zero
    or   a0, t2, t3             # Combine with the sign bit: final result

    # Print the result
    li   a7, 34                 # System call for printing an integer in hexadecimal
    ecall

    # Exit the program
    li   a7, 10                 # System call for exit
    ecall

#-------------------------------------------------------------------------------#
# Lower part: my_clz (input in a0, leading zero count returned in a0)
.text
.globl my_clz
my_clz:
    li   t0, 0                  # count = 0
    li   t5, 31                 # Start from bit index 31
clz_loop:
    bltz t5, clz_end            # If bit index < 0, every bit was zero: exit loop
    li   a1, 1
    sll  a1, a1, t5             # a1 = 1 << t5, mask for the current bit
    and  t6, a0, a1             # Check whether bit t5 of the input is set
    bnez t6, clz_end            # If set, stop counting
    addi t0, t0, 1              # Otherwise, increment the leading zero count
    addi t5, t5, -1             # Decrement the bit index
    j    clz_loop               # Jump back to the start of the loop
clz_end:
    mv   a0, t0                 # Return the leading zero count in a0
    ret
```

:::danger
Use fewer instructions.
:::

### Input/Output Table

| Input (fp16, hex) | Value      | Output (fp32, IEEE 754) |
|-------------------|------------|-------------------------|
| 0x8000            | -0.0       | 0x80000000              |
| 0x6A0C            | 3096.0     | 0x45418000              |
| 0x7C00            | +infinity  | 0x7F800000              |
| 0xC400            | -4.0       | 0xC0800000              |

### **LeetCode 2595. Number of Even and Odd Bits**

[https://leetcode.com/problems/number-of-even-and-odd-bits/description/](https://leetcode.com/problems/number-of-even-and-odd-bits/description/)

Problem description:

You are given a positive integer `n`.

Let `even` denote the number of even indices in the binary representation of `n` with value 1.

Let `odd` denote the number of odd indices in the binary representation of `n` with value 1.

Note that bits are indexed from right to left in the binary representation of a number.

Return the array `[even, odd]`.

Example 1:

* Input: `n = 50`
* Output: `[1,2]`
* Explanation: The binary representation of 50 is `110010`. It contains 1 on indices 1, 4, and 5.

Example 2:

* Input: `n = 2`
* Output: `[0,1]`
* Explanation: The binary representation of 2 is `10`. It contains 1 only on index 1.
Constraints: `1 <= n <= 1000`

### **Code Implementation**

### **C code**

```c
#include <stdio.h>

/* Count set bits at even indices 0, 2, 4, 6, 8 (enough for n <= 1000 < 2^10). */
int even(int x) {
    int count = 0;
    for (int i = 0; i < 9; i = i + 2) {
        if (x & (1U << i)) {
            count++;
        }
    }
    return count;
}

/* Count set bits at odd indices 1, 3, 5, 7, 9. */
int odd(int x) {
    int count = 0;
    for (int i = 1; i < 10; i = i + 2) {
        if (x & (1U << i)) {
            count++;
        }
    }
    return count;
}

int main() {
    printf("please enter an integer in range 0 to 1000: ");
    int x;
    if (scanf("%d", &x) != 1)
        return 1;                       /* Input was not a number */
    while (x < 0 || x > 1000) {
        printf("please enter again in range 0 to 1000: ");
        if (scanf("%d", &x) != 1)
            return 1;
    }
    printf("[%d,%d]\n", even(x), odd(x));
    return 0;
}
```

**Verifying input and output with the C code**

```
please enter an integer in range 0 to 1000: 8
[0,1]
please enter an integer in range 0 to 1000: 531
[2,2]
please enter an integer in range 0 to 1000: 1000
[2,4]
```

### **Assembly code**

```asm
.data
.align 2
test_value1:       .word 8          # Define test value 1 (8)
.align 2
test_value2:       .word 531        # Define test value 2 (531)
.align 2
test_value3:       .word 1000       # Define test value 3 (1000)
.align 2
prompt_msg:        .string "testing three integers 8,531,1000 "   # Header message to display
.align 2
left_parenthesis:  .string "["      # Left bracket for result display
.align 2
right_parenthesis: .string "]"      # Right bracket for result display
.align 2
comma_space:       .string ", "     # Comma and space separator
.align 2
newline:           .string "\n"     # Newline character

.text
.globl main
main:
    # Print the header message
    la   a0, prompt_msg             # Load address of the header message
    li   a7, 4                      # System call for print string (uses a0 for the address)
    ecall                           # Print the message
    la   a0, newline                # Load address of newline
    li   a7, 4                      # System call for print string
    ecall                           # Print newline

process_input1:
    # Test value 1: set input value to 8
    la   t0, test_value1            # Load address of test_value1
    lw   t0, 0(t0)                  # Load value of test_value1 into t0
    li   t6, 0                      # t6 = 0 marks the first test case
    j    even_loop_arg              # Jump to the even loop initialization

process_input2:
    # Test value 2: set input value to 531
    li   t6, 1                      # t6 = 1 marks the second test case
    la   t0, test_value2            # Load address of test_value2
    lw   t0, 0(t0)                  # Load value of test_value2 into t0
    j    even_loop_arg              # Jump to the even loop initialization

process_input3:
    # Test value 3: set input value to 1000
    li   t6, 2                      # t6 = 2 marks the third test case
    la   t0, test_value3            # Load address of test_value3
    lw   t0, 0(t0)                  # Load value of test_value3 into t0
    j    even_loop_arg              # Jump to the even loop initialization

even_loop_arg:
    li   t1, 0                      # t1 holds the even count, initialize to 0
    li   t2, 0                      # Start with bit index 0

even_loop:
    li   t4, 9                      # Upper limit for even indices (0, 2, ..., 8)
    bge  t2, t4, odd_loop           # If index >= 9, move on to the odd count
    li   t3, 1                      # Load 1 into t3 for bitwise shifting
    sll  t3, t3, t2                 # t3 = 1 << t2
    and  t4, t0, t3                 # t4 = t0 & (1 << t2): is bit t2 set?
    beqz t4, skip_even_inc          # If t4 == 0, skip the increment
    addi t1, t1, 1                  # Increment even count
skip_even_inc:
    addi t2, t2, 2                  # Move to the next even index
    j    even_loop                  # Repeat even loop

# Count the 1s at odd indices
odd_loop:
    li   t2, 1                      # Start with bit index 1
    li   t5, 0                      # t5 holds the odd count, initialize to 0
odd_loop_inner:
    li   t4, 10                     # Upper limit for odd indices (1, 3, ..., 9)
    bge  t2, t4, print_result       # If index >= 10, print the result
    li   t3, 1                      # Load 1 into t3 for bitwise shifting
    sll  t3, t3, t2                 # t3 = 1 << t2
    and  t4, t0, t3                 # t4 = t0 & (1 << t2): is bit t2 set?
    beqz t4, skip_odd_inc           # If t4 == 0, skip the increment
    addi t5, t5, 1                  # Increment odd count
skip_odd_inc:
    addi t2, t2, 2                  # Move to the next odd index
    j    odd_loop_inner             # Repeat odd loop

print_result:
    # Print result as [even, odd]
    la   a0, left_parenthesis       # Load address of "["
    li   a7, 4                      # System call for print string
    ecall                           # Print "["
    mv   a0, t1                     # Move even count to a0
    li   a7, 1                      # System call for print integer (value in a0)
    ecall                           # Print even count
    la   a0, comma_space            # Load address of ", "
    li   a7, 4                      # System call for print string
    ecall                           # Print ", "
    mv   a0, t5                     # Move odd count to a0
    li   a7, 1                      # System call for print integer
    ecall                           # Print odd count
    la   a0, right_parenthesis      # Load address of "]"
    li   a7, 4                      # System call for print string
    ecall                           # Print "]"
    la   a0, newline                # Load address of newline
    li   a7, 4                      # System call for print string
    ecall                           # Print newline

    # Select the next test case
    li   t2, 0
    beq  t6, t2, process_input2     # If t6 == 0, run the second test case
    li   t2, 1
    beq  t6, t2, process_input3     # If t6 == 1, run the third test case

    # Exit
    li   a7, 10                     # System call for exit
    ecall                           # Exit the program
```

**testing data**

### Input/Output Table

| Input (Decimal) | Output `[even, odd]` |
|-----------------|----------------------|
| 8               | [0, 1]               |
| 531             | [2, 2]               |
| 1000            | [2, 4]               |

![Screenshot 2024-10-11 030319](https://hackmd.io/_uploads/ryg7BjB1yg.png)

### **RISC-V Program Execution Metrics**

The execution statistics give an overall picture of the program's performance on the simulated RISC-V CPU. Executing 177 instructions took 262 cycles, so the average number of cycles per instruction (CPI) is 262 / 177 ≈ 1.48 and the number of instructions per cycle (IPC) is 177 / 262 ≈ 0.676. A single-issue 5-stage pipeline can at best approach an IPC of 1, so the 85 extra cycles come from filling the pipeline and from stalls and flushes, such as the instructions discarded after taken branches and jumps. The clock rate of 0 Hz indicates that the processor was simulated without a fixed clock frequency.
Overall, these metrics point to where the program could be optimized: fewer taken branches and jumps (and therefore fewer flushed instructions) would lower the CPI and raise the IPC.

![Screenshot 2024-10-11 031759](https://hackmd.io/_uploads/H1zutsBkyx.png)

### **5-Stage RISC-V Processor Pipeline Diagram**

This diagram illustrates the flow of instructions through a 5-stage RISC-V pipeline. In every cycle, each stage (Fetch, Decode, Execute, Memory, and Write Back) works on a different instruction, so several instructions are processed at the same time.

![Screenshot 2024-10-11 032426](https://hackmd.io/_uploads/S1Uscjry1l.png)

* ### **Instruction Fetch (IF)**

This diagram shows the Instruction Fetch (IF) stage, where the Program Counter (PC) is used to fetch the next instruction from instruction memory. It highlights how the PC is updated, and shows the instruction `bge x7 x29 32 <print_result>` being fetched for further processing.

![Screenshot 2024-10-11 032948](https://hackmd.io/_uploads/S1HUhjByJe.png)

* ### **Instruction Decode (ID)**

This diagram illustrates the Instruction Decode (ID) stage. In this stage, the instruction word `0x00a00e93` of the current instruction `addi x29 x0 10` is decoded (its opcode field is `0x13`), and the source register values are read from the register file and passed on to the next pipeline stage. The immediate value `0x0000000a` is also extracted for use in the following stages.

![Screenshot 2024-10-11 033306](https://hackmd.io/_uploads/r1EcniBJkl.png)

* ### **Execution (EX)**

This diagram represents the Execute (EX) stage of the 5-stage RISC-V pipeline. In this stage, the ALU performs the arithmetic or logic operation selected by the control signals on the inputs `Op1` and `Op2` provided by the decode stage, producing the result `Res`. Here, the operation appears to be `addi x30 x0 0`. In addition, the branch logic (`Branch taken`) decides whether a branch is taken, which controls the program flow.
![Screenshot 2024-10-11 033559](https://hackmd.io/_uploads/rJgupoSk1g.png)

* ### **Memory Access (MEM)**

This diagram shows the Memory Access (MEM) stage of the 5-stage RISC-V pipeline. In this stage, load and store instructions access data memory. The `Wr` signal determines whether a write is performed, `Data in` holds the data to be written, and `Read out` carries the value read from memory on to the next stage (Write Back). Here, the instruction in this stage is `addi x7 x0 1`, which does not access memory, and the diagram also shows a `nop` marked `flush`: a no-operation (bubble) caused by a pipeline hazard, most likely the instructions discarded after a taken branch.

![Screenshot 2024-10-11 033809](https://hackmd.io/_uploads/BkZ26sH1Je.png)

* ### **Write Back (WB)**

This diagram represents the Write Back (WB) stage of the RISC-V 5-stage pipeline. This stage writes results back to the register file. The value to be written comes either from the MEM stage (for loads) or from the ALU, and the MEM/WB pipeline register holds it until it is written to the destination register, where subsequent instructions can use it. The value on the write-back path here is `0xdeadbeef`; together with the `0x00000000` shown, it suggests that no real result is written in this cycle (for example, because the slot holds a flushed `nop`) or that the register is still in a reset state. This stage completes the execution of an instruction.

![Screenshot 2024-10-11 034051](https://hackmd.io/_uploads/Sy1BAjSkJx.png)

* ### **General Purpose Register (GPR) Status Snapshot**

![Screenshot 2024-10-11 034339](https://hackmd.io/_uploads/S1NaAir11l.png)

* ### **Motivation and Connection:**

Both Problem C and the LeetCode problem on even and odd bits involve working with binary representations and bit manipulation.
Specifically, Problem C converts a half-precision floating-point value to single precision by manipulating specific bits: it isolates the sign bit, the exponent, and the mantissa, then renormalizes them to fit into a 32-bit word. This requires careful use of bitwise operations such as masking, shifting, and combining bits to build the final 32-bit value. In the LeetCode problem, the task is to count the bits set at even and odd positions of an integer's binary representation, which relies on the same bit shifting and masking to test which bits are set at specific positions.

The motivation behind both problems is to develop a deep understanding of how numbers are represented at the bit level and how to manipulate bits to achieve specific tasks. In both cases, efficient bit manipulation is key, which underlines the importance of binary arithmetic and bitwise operations in lower-level programming.

* ### **Learnings**

1. Bitwise Arithmetic: Both problems reinforce the importance of bitwise arithmetic for manipulating binary data. Bitwise AND, OR, shifts, and negation are key tools when working with raw data at the bit level.
2. Normalization and Masking: Problem C shows how to normalize floating-point numbers and build specific values with bitwise masks, while the LeetCode problem uses masks to select the bits at even or odd positions.
3. Working with Low-Level Data: Both tasks require being comfortable with low-level data operations. Working directly with memory and data representations helped me become more proficient at writing lower-level code, such as the `fabsf` and `my_clz` functions, without relying on built-in functions.
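The normalization and masking ideas in the learnings above can be exercised together in plain C. The sketch below is an illustration only, assuming a hosted C99 compiler: it reuses the quiz's `fp16_to_fp32` bit logic with a portable leading-zero counter, and it solves the even/odd-bits problem with a different technique than the loops used in this write-up, namely two constant masks (`0x55555555` selects even indices, `0xAAAAAAAA` odd ones) followed by a population count. The helper names `clz32`, `popcount32`, `even_odd_bits`, and `self_check` are hypothetical, and the expected values in `self_check` are hand-computed IEEE 754 encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Portable leading-zero count with the same contract as my_clz: returns 32 for 0. */
static int clz32(uint32_t x)
{
    int n = 0;
    for (uint32_t bit = UINT32_C(0x80000000); bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;
}

/* Same bit manipulation as the quiz's fp16_to_fp32, using clz32 instead of my_clz. */
static uint32_t fp16_to_fp32(uint16_t h)
{
    const uint32_t w = (uint32_t)h << 16;
    const uint32_t sign = w & UINT32_C(0x80000000);
    const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);
    uint32_t renorm_shift = (uint32_t)clz32(nonsign);
    renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0; /* nonzero only for subnormals and zero */
    const int32_t inf_nan_mask =
        ((int32_t)(nonsign + 0x04000000) >> 8) & INT32_C(0x7F800000);
    const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31;
    return sign | ((((nonsign << renorm_shift >> 3) + ((0x70 - renorm_shift) << 23))
                    | inf_nan_mask) & ~zero_mask);
}

/* Kernighan's trick: each iteration clears the lowest set bit. */
static int popcount32(uint32_t x)
{
    int count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* LeetCode 2595 with masks instead of per-bit loops: result = {even, odd}. */
static void even_odd_bits(uint32_t n, int result[2])
{
    result[0] = popcount32(n & UINT32_C(0x55555555)); /* bits 0, 2, 4, ... */
    result[1] = popcount32(n & UINT32_C(0xAAAAAAAA)); /* bits 1, 3, 5, ... */
}

/* Spot checks with hand-computed expected values (no-op if NDEBUG is defined). */
void self_check(void)
{
    int r[2];
    assert(fp16_to_fp32(0x3C00) == UINT32_C(0x3F800000)); /* 1.0 */
    assert(fp16_to_fp32(0x0001) == UINT32_C(0x33800000)); /* smallest subnormal, 2^-24 */
    assert(fp16_to_fp32(0x8000) == UINT32_C(0x80000000)); /* -0.0 */
    assert(fp16_to_fp32(0x7C00) == UINT32_C(0x7F800000)); /* +infinity */
    assert(fp16_to_fp32(0xC400) == UINT32_C(0xC0800000)); /* -4.0 */
    assert(fp16_to_fp32(0x6A0C) == UINT32_C(0x45418000)); /* 3096.0 */
    even_odd_bits(50, r);
    assert(r[0] == 1 && r[1] == 2);
    even_odd_bits(531, r);
    assert(r[0] == 2 && r[1] == 2);
    even_odd_bits(1000, r);
    assert(r[0] == 2 && r[1] == 4);
}
```

Calling `self_check()` from a `main` built without `NDEBUG` runs the spot checks. The `0x0001` case is the one that exercises `renorm_shift` (a subnormal input). The mask version replaces the two bounded loops with two ANDs and two population counts, and because the masks cover all 32 bits it does not depend on the `1 <= n <= 1000` bound.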
---

## Reference

* [2024 Architecture Homework 1](https://hackmd.io/@sysprog/2024-arch-homework1)
* [Accomdemy Note](https://hackmd.io/@accomdemy/SyoatR-sc)
* [LeetCode Problem: Number of Even and Odd Bits](https://leetcode.com/problems/number-of-even-and-odd-bits/)
* [HackMD: 2024-arch-homework1](https://hackmd.io/@sysprog/H1TpVYMdB)
* [Wiki: Arch Schedule](https://wiki.csie.ncku.edu.tw/arch/schedule)