# Assignment 1 RISC-V Assembly
## **Quiz1 problem C**
### C code for function fabsf
```c
static inline float fabsf(float x) {
    uint32_t i = *(uint32_t *)&x; // Read the bits of the float into an integer
    i &= 0x7FFFFFFF;              // Clear the sign bit to get the absolute value
    x = *(float *)&i;             // Write the modified bits back into the float
    return x;
}
```
:::danger
Don't paste code snip without comprehensive discussions.
:::
### **assembly code for function fabsf**
```c
.data
argument: .word 0x00000000 # Placeholder for input float value
.text
.globl fabsf
fabsf:
# Load the input float value from memory
la t0, argument # Load address of argument into t0
lw t1, 0(t0) # Load the bits of the float into t1
li t2, 0x7FFFFFFF # Load mask to clear the sign bit into t2
and t1, t1, t2 # Get the absolute value of the float
# Print the modified value and exit
mv a0, t1 # Move the result to a0
li a7, 2 # System call for printing a float
ecall # Make the system call to print the value
li a7, 10 # System call for exit
ecall # Make the system call to exit the program
```
**testing data**
| Input (Decimal) | Input (IEEE 754) | Output (Decimal) |
|-----------------|------------------|------------------|
| -10.0 | 0xC1200000 | 10.0 |
| -8.5 | 0xC1080000 | 8.5 |
| 15.0 | 0x41700000 | 15.0 |
| -3.14 | 0xC048F5C3 | 3.14 |
| 0.0 | 0x00000000 | 0.0 |
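
The rows above can be checked on a host machine with a short C sketch. The names `fabs_bits`, `fabsf_bitmask`, and `check_fabs_table` are hypothetical helpers introduced here for illustration, not part of the quiz code. The sketch uses `memcpy` instead of pointer casts, assuming a platform where `float` is a 32-bit IEEE 754 value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clear bit 31 (the IEEE 754 sign bit) of a raw binary32 pattern. */
uint32_t fabs_bits(uint32_t bits) {
    return bits & UINT32_C(0x7FFFFFFF);
}

/* Same operation on a float. memcpy copies the bit pattern without
 * the strict-aliasing problem of casting float* to uint32_t*. */
float fabsf_bitmask(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = fabs_bits(bits);
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical self-check based on rows of the table. */
void check_fabs_table(void) {
    assert(fabs_bits(UINT32_C(0xC1200000)) == UINT32_C(0x41200000)); /* -10.0 -> 10.0 */
    assert(fabs_bits(UINT32_C(0xC048F5C3)) == UINT32_C(0x4048F5C3)); /* -3.14 -> 3.14 */
    assert(fabs_bits(UINT32_C(0x41700000)) == UINT32_C(0x41700000)); /* 15.0 unchanged */
    assert(fabsf_bitmask(-8.5f) == 8.5f);
    assert(fabsf_bitmask(0.0f) == 0.0f);
}
```

Calling `check_fabs_table()` from a `main` function aborts on the first mismatch and returns silently when every row agrees.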
### **C code for function __builtin_clz**
```c
static inline int my_clz(uint32_t x) {
    int count = 0;
    for (int i = 31; i >= 0; --i) {
        if (x & (1U << i))
            break;
        count++;
    }
    return count;
}
```
### **assembly code for function __builtin_clz**
```c
.data
test_data:.word 0x00000000 # Placeholder for input data
.text
.globl my_clz
my_clz:
la t4, test_data # Load address of test_data into t4
lw t4, 0(t4) # Load the input value once, before the loop
li t0, 0 # Initialize count to 0
li t1, 31 # Start from bit 31 (for 32-bit integer)
clz_loop:
bltz t1, clz_end # If bit index < 0, exit loop
li t3, 1 # Store integer 1 into t3
sll t3, t3, t1 # t3 = 1 << t1
and t2, t4, t3 # Check whether bit t1 is set
bnez t2, clz_end # If t2 is non-zero, break the loop (found a set bit)
addi t0, t0, 1 # Increment count (leading zero count)
addi t1, t1, -1 # Decrement bit index
j clz_loop # Jump back to start of loop
clz_end:
mv a0, t0 # Move count to a0 for printing
li a7, 1 # System call for printing an integer
ecall # Make the system call to print the value
li a7, 10 # System call for exit
ecall # Make the system call to exit the program
```
**testing data**
| Input (Decimal) | Input (32-bit Hex) | Output (Decimal) |
|-----------------|--------------------|------------------|
| 0 | 0x00000000 | 32 |
| 8 | 0x00000008 | 28 |
| 531 | 0x00000213 | 22 |
| 1000 | 0x000003E8 | 22 |
| -10 | 0xFFFFFFF6 | 0 |
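
The loop version tests one bit per iteration, so it can take up to 32 iterations. A common alternative is a binary search over the word, which needs at most five comparisons. The sketch below is one possible form of that idea; `clz_binary_search` and `check_clz` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros by halving the search window: 16, 8, 4, 2, 1 bits. */
int clz_binary_search(uint32_t x) {
    if (x == 0)
        return 32;                      /* no set bit at all */
    int count = 0;
    if ((x & UINT32_C(0xFFFF0000)) == 0) { count += 16; x <<= 16; }
    if ((x & UINT32_C(0xFF000000)) == 0) { count += 8;  x <<= 8;  }
    if ((x & UINT32_C(0xF0000000)) == 0) { count += 4;  x <<= 4;  }
    if ((x & UINT32_C(0xC0000000)) == 0) { count += 2;  x <<= 2;  }
    if ((x & UINT32_C(0x80000000)) == 0) { count += 1; }
    return count;
}

/* Hypothetical self-check using integer inputs. */
void check_clz(void) {
    assert(clz_binary_search(0) == 32);
    assert(clz_binary_search(8) == 28);
    assert(clz_binary_search(531) == 22);
    assert(clz_binary_search(1) == 31);
    assert(clz_binary_search(UINT32_C(0x80000000)) == 0);
}
```

Each `if` maps to a short mask, branch, add, and shift sequence in RISC-V, so the instruction count is fixed rather than dependent on the position of the leading one.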
### **code for function fp16_to_fp32**
```c
static inline uint32_t fp16_to_fp32(uint16_t h) {
/*
* Extends the 16-bit half-precision floating-point number to 32 bits
* by shifting it to the upper half of a 32-bit word:
* +---+-----+------------+-------------------+
* | S |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
* +---+-----+------------+-------------------+
* Bits 31 26-30 16-25 0-15
*
* S - sign bit, E - exponent bits, M - mantissa bits, 0 - zero bits.
*/
const uint32_t w = (uint32_t) h << 16;
/*
* Isolates the sign bit from the input number, placing it in the most
* significant bit of a 32-bit word:
*
* +---+----------------------------------+
* | S |0000000 00000000 00000000 00000000|
* +---+----------------------------------+
* Bits 31 0-30
*/
const uint32_t sign = w & UINT32_C(0x80000000);
/*
* Extracts the mantissa and exponent from the input number, placing
* them in bits 0-30 of the 32-bit word:
*
* +---+-----+------------+-------------------+
* | 0 |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
* +---+-----+------------+-------------------+
* Bits 31 26-30 16-25 0-15
*/
const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);
/*
* The renorm_shift variable indicates how many bits the mantissa
* needs to be shifted to normalize the half-precision number.
* For normalized numbers, renorm_shift will be 0. For denormalized
* numbers, renorm_shift will be greater than 0. Shifting a
* denormalized number will move the mantissa into the exponent,
* normalizing it.
*/
uint32_t renorm_shift = my_clz(nonsign);
renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0;
/*
* If the half-precision number has an all-ones exponent field (0x1F),
* adding 0x04000000 overflows into bit 31, and the arithmetic right
* shift then turns the upper 9 bits into ones. Thus:
* inf_nan_mask ==
* 0x7F800000 if the half-precision number is
* NaN or infinity (exponent field of 0x1F)
* 0x00000000 otherwise
*/
const int32_t inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) &
INT32_C(0x7F800000);
/*
* If nonsign equals 0, subtracting 1 will cause overflow, setting
* bit 31 to 1. Otherwise, bit 31 will be 0. Shifting this result
* propagates bit 31 across all bits in zero_mask. Thus:
* zero_mask ==
* 0xFFFFFFFF if the half-precision number is
* zero (+0.0h or -0.0h)
* 0x00000000 otherwise
*/
const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31;
/*
* 1. Shifts nonsign left by renorm_shift to normalize it (for denormal
* inputs).
* 2. Shifts nonsign right by 3, adjusting the exponent to fit in the
* 8-bit exponent field and moving the mantissa into the correct
* position within the 23-bit mantissa field of the single-precision
* format.
* 3. Adds 0x70 to the exponent to account for the difference in bias
* between half-precision and single-precision.
* 4. Subtracts renorm_shift from the exponent to account for any
* renormalization that occurred.
* 5. ORs with inf_nan_mask to set the exponent to 0xFF if the input
* was NaN or infinity.
* 6. ANDs with the inverted zero_mask to set the mantissa and exponent
* to zero if the input was zero.
* 7. Combines everything with the sign bit of the input number.
*/
return sign | ((((nonsign << renorm_shift >> 3) +
((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask);
}
```
### **assembly code for function fp16_to_fp32(and my_clz)**
```c
#upper code for function fp16_to_fp32
.data
# Define the test data with half-precision floating-point value (16-bit)
testing_data: .word 0x7c00
.text
.globl fp16_to_fp32
fp16_to_fp32:
# Load the address of testing_data into t1
la t1, testing_data
# Load the value at the address stored in t1
lw t1, 0(t1)
# Shift the 16-bit half-precision value to the upper half of a 32-bit word
slli t1, t1, 16
# Load sign bit mask into t2 (0x80000000)
li t2, 0x80000000
# Isolate the sign bit from the input number (bit 31)
and t2, t2, t1
# Load mask to extract mantissa and exponent (0x7FFFFFFF)
li t3, 0x7FFFFFFF
# Apply the mask to extract the mantissa and exponent from the input value
and t3, t3, t1
# Calculate renorm_shift using my_clz
# Move value of t3 to a0 for my_clz function call
mv a0, t3
# Call my_clz function to calculate the leading zero count
jal ra, my_clz
# Move the result of my_clz from a0 to t4 (renorm_shift)
mv t4, a0
# Continue with the rest of the code
# renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0
addi t4, t4, -5 # t4 = renorm_shift - 5
# If renorm_shift > 5, keep the difference
bgtz t4, skip_renorm
# Otherwise (normalized input), renorm_shift must be 0
li t4, 0
skip_renorm:
# Calculate inf_nan_mask
li t6, 0x04000000 # Load value 0x04000000 to t6
add t6, t6, t3 # nonsign + 0x04000000
srai t6, t6, 8 # Shift right by 8 to adjust for exponent position
li a1, 0x7F800000 # Load mask for inf_nan_mask
and t6, t6, a1 # Apply mask to calculate inf_nan_mask
# Calculate zero_mask
addi a1, t3, -1 # nonsign - 1
srai a1, a1, 31 # Shift right by 31 to propagate bit 31 across all bits
# Normalize, adjust exponent, apply masks and combine with sign
sll t3, t3, t4 # Shift left by renorm_shift
srli t3, t3, 3 # Shift right by 3 to adjust exponent and mantissa
li a2, 0x70 # Load bias adjustment value 0x70
sub a2, a2, t4 # Subtract renorm_shift from exponent bias
slli a2, a2, 23 # Shift exponent into the correct position
add t3, t3, a2 # Add adjusted exponent to the value
or t3, t3, t6 # Combine with inf_nan_mask
not a1, a1 # Invert zero_mask
and t3, t3, a1 # Apply inverted zero_mask to clear bits if needed
or a0, t2, t3 # Combine with the sign bit to get final result
# Print the result
li a7, 34 # System call for printing an integer in hexadecimal
ecall
# Exit the program
li a7, 10 # System call for exit
ecall
#-------------------------------------------------------------------------------#
#lower code for function my_clz
.text
.globl my_clz
my_clz:
# Initialize count to 0 in t0
li t0, 0
# Start from bit index 31 for 32-bit integer
li t5, 31
clz_loop:
# Load value from a0 to t4 (current input value)
addi t4, a0, 0
# Set t6 to 0 for comparison
li t6, 0
# If bit index < 0, exit loop
blt t5, t6, clz_end
# Set a1 to 1 for masking specific bit
li a1, 1
# Shift left a1 by t5 to create the mask for checking specific bit
sll a1, a1, t5
# AND operation to check if bit t5 is set
and t6, t4, a1
# If bit is set, break loop
bnez t6, clz_end
# Otherwise, increment the leading zero count
addi t0, t0, 1
# Decrement the bit index
addi t5, t5, -1
# Jump back to the start of the loop
j clz_loop
clz_end:
# Move leading zero count to a0 for return
mv a0, t0
# Return from the function
ret
```
:::danger
Use fewer instructions.
:::
### Input/Output Table
| Input (FP16 bits) | Output (FP32 bits) |
|-------------------|--------------------|
| 0x8000 | 0x80000000 |
| 0x6A0C | 0x45418000 |
| 0x7C00 | 0x7F800000 |
| 0xC400 | 0xC0800000 |
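
To cross-check the conversion independently of the branchless code, the value can also be rebuilt field by field from the sign, exponent, and mantissa. This is a minimal sketch of that reference approach, not the quiz implementation. `fp16_to_fp32_fields` and `check_fp16_to_fp32_fields` are hypothetical names, and the test inputs `0x3C00` (1.0) and `0x0001` (the smallest subnormal) are extra cases chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field-by-field reference conversion (hypothetical helper). */
uint32_t fp16_to_fp32_fields(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* bit 15 -> bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent field */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit mantissa field */

    if (exp == 0x1Fu)                              /* infinity or NaN */
        return sign | UINT32_C(0x7F800000) | (man << 13);
    if (exp == 0) {
        if (man == 0)                              /* +0.0 or -0.0 */
            return sign;
        /* Subnormal: shift until the leading one reaches bit 10. */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            shift++;
        }
        man &= 0x3FFu;                             /* drop the now-implicit one */
        return sign | ((113u - shift) << 23) | (man << 13);
    }
    /* Normal: rebias the exponent from 15 to 127 (add 112 = 0x70). */
    return sign | ((exp + 112u) << 23) | (man << 13);
}

/* Hypothetical self-check. */
void check_fp16_to_fp32_fields(void) {
    assert(fp16_to_fp32_fields(0x8000) == UINT32_C(0x80000000)); /* -0.0 */
    assert(fp16_to_fp32_fields(0x7C00) == UINT32_C(0x7F800000)); /* +infinity */
    assert(fp16_to_fp32_fields(0x3C00) == UINT32_C(0x3F800000)); /* 1.0 */
    assert(fp16_to_fp32_fields(0x0001) == UINT32_C(0x33800000)); /* 2^-24 */
}
```

The branches make this version slower than the mask-based code, but each case is easy to verify by hand, which makes it useful as an oracle when testing the assembly.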
### **LeetCode 2595. Number of Even and Odd Bits**
[https://leetcode.com/problems/number-of-even-and-odd-bits/description/](https://leetcode.com/problems/number-of-even-and-odd-bits/description/)

Problem description:

You are given a positive integer `n`.
Let `even` denote the number of even indices in the binary representation of `n` with value 1.
Let `odd` denote the number of odd indices in the binary representation of `n` with value 1.
Note that bits are indexed from right to left in the binary representation of a number.
Return the array `[even, odd]`.

Example 1:

* Input: n = 50
* Output: [1,2]
* Explanation: The binary representation of 50 is 110010. It contains 1 on indices 1, 4, and 5.

Example 2:

* Input: n = 2
* Output: [0,1]
* Explanation: The binary representation of 2 is 10. It contains 1 only on index 1.

Constraints:

* 1 <= n <= 1000
### **Code Implementation**
### **C code**
```c
#include <stdio.h>
int even(int x){
    int count = 0;
    for(int i = 0;i<9;i = i+2){
        if(x & (1U<<i)){
            count++;
        }
    }
    return count;
}
int odd(int x){
    int count = 0;
    for(int i = 1;i<10;i = i+2){
        if(x & (1U<<i)){
            count++;
        }
    }
    return count;
}
int main(){
    printf("please enter an integer in range 0 to 1000: ");
    int x;
    scanf("%d",&x);
    while(x < 0 || x > 1000){
        printf("please enter again in range 0 to 1000: ");
        scanf("%d",&x);
    }
    printf("[%d,%d]\n",even(x),odd(x));
    return 0;
}
```
**Verifying input and output with the C code**
```
please enter an integer in range 0 to 1000: 8
[0,1]
please enter an integer in range 0 to 1000: 531
[2,2]
please enter an integer in range 0 to 1000: 1000
[2,4]
```
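
The two loops can also be replaced by masks that select all even or all odd bit positions at once. The following is a minimal sketch of that variant, not the submitted solution. `popcount32`, `even_odd_bits`, and `check_even_odd_bits` are hypothetical names, and the population count uses the well-known `x &= x - 1` trick, which clears the lowest set bit on each step.

```c
#include <assert.h>
#include <stdint.h>

/* Count set bits: each x &= x - 1 clears the lowest set bit. */
static int popcount32(uint32_t x) {
    int count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}

/* out[0] = number of set bits at even indices, out[1] = at odd indices. */
void even_odd_bits(uint32_t n, int out[2]) {
    out[0] = popcount32(n & UINT32_C(0x55555555)); /* bits 0, 2, 4, ... */
    out[1] = popcount32(n & UINT32_C(0xAAAAAAAA)); /* bits 1, 3, 5, ... */
}

/* Hypothetical self-check against the three test values and Example 1. */
void check_even_odd_bits(void) {
    int r[2];
    even_odd_bits(8, r);    assert(r[0] == 0 && r[1] == 1);
    even_odd_bits(531, r);  assert(r[0] == 2 && r[1] == 2);
    even_odd_bits(1000, r); assert(r[0] == 2 && r[1] == 4);
    even_odd_bits(50, r);   assert(r[0] == 1 && r[1] == 2);
}
```

With this form the loop runs once per set bit instead of once per bit position, which may also reduce the instruction count in an assembly translation.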
### **Assembly code**
```c
.data
.align 2
test_value1: .word 8 # Define test value 1 (8)
.align 2
test_value2: .word 531 # Define test value 2 (531)
.align 2
test_value3: .word 1000 # Define test value 3 (1000)
.align 2
prompt_msg: .string "testing three integers 8,531,1000 " # Prompt message to display
.align 2
left_parenthesis: .string "[" # Left square bracket for result display
.align 2
right_parenthesis: .string "]" # Right square bracket for result display
.align 2
comma_space: .string ", " # Comma and space separator
.align 2
newline: .string "\n" # Newline character
.text
.globl main
main:
# Prompt for input
la a0, prompt_msg # Load address of prompt message
li a7, 4 # System call for print string (uses a0 for address)
ecall # Print the prompt
la a0, newline # Load address of newline
li a7, 4 # System call for print string
ecall # Print newline
process_input1:
# Test value 1: Set input value to 8
la t0, test_value1 # Load address of test_value1
lw t0, 0(t0) # Load value of test_value1 into t0
li t6, 0 # Set t6 to 0 to indicate the first test case
j even_loop_arg # Jump to the even loop argument initialization
process_input2:
# Test value 2: Set input value to 531
li t6, 1 # Set t6 to 1 to indicate the second test case
la t0, test_value2 # Load address of test_value2
lw t0, 0(t0) # Load value of test_value2 into t0
j even_loop_arg # Jump to the even loop argument initialization
process_input3:
# Test value 3: Set input value to 1000
li t6, 2 # Set t6 to 2 to indicate the third test case
la t0, test_value3 # Load address of test_value3
lw t0, 0(t0) # Load value of test_value3 into t0
j even_loop_arg # Jump to the even loop argument initialization
even_loop_arg:
li t1, 0 # t1 will hold the even count, initialize to 0
li t2, 0 # Start with bit index 0
even_loop:
li t4, 9 # Set upper limit for even index to 9
bge t2, t4, odd_loop # If index >= 9, move to odd count
li t3, 1 # Load 1 into t3 for bitwise shifting
sll t3, t3, t2 # t3 = 1 << t2 (1 shifted by t2 positions)
and t4, t0, t3 # t4 = t0 & (1 << t2), check if bit at position t2 is 1
beqz t4, skip_even_inc # If t4 == 0, skip increment of even count
addi t1, t1, 1 # Increment even count
skip_even_inc:
addi t2, t2, 2 # Increment index by 2 to move to the next even bit
j even_loop # Repeat even loop
# Calculate number of 1s at odd indices
odd_loop:
li t2, 1 # Start with bit index 1 for odd bits
li t5, 0 # t5 will hold the odd count, initialize to 0
odd_loop_inner:
li t4, 10 # Set upper limit for odd index to 10
bge t2, t4, print_result # If index >= 10, move to printing result
li t3, 1 # Load 1 into t3 for bitwise shifting
sll t3, t3, t2 # t3 = 1 << t2 (1 shifted by t2 positions)
and t4, t0, t3 # t4 = t0 & (1 << t2), check if bit at position t2 is 1
beqz t4, skip_odd_inc # If t4 == 0, skip increment of odd count
addi t5, t5, 1 # Increment odd count
skip_odd_inc:
addi t2, t2, 2 # Increment index by 2 to move to the next odd bit
j odd_loop_inner # Repeat odd loop
print_result:
# Print result [even, odd]
la a0, left_parenthesis # Load address of left parenthesis
li a7, 4 # System call for print string
ecall # Print left parenthesis
mv a0, t1 # Move even count to a0
li a7, 1 # System call number 1 for printing an integer (a0 contains the value to print)
ecall # Print even count
la a0, comma_space # Load address of comma and space
li a7, 4 # System call for print string
ecall # Print comma and space
mv a0, t5 # Move odd count to a0
li a7, 1 # System call for print integer
ecall # Print odd count
la a0, right_parenthesis # Load address of right parenthesis
li a7, 4 # System call for print string
ecall # Print right parenthesis
la a0, newline # Load address of newline
li a7, 4 # System call for print string
ecall # Print newline
li t2, 0 # Set t2 to 0 to check the current test case
beq t6, t2, process_input2 # If t6 == 0, jump to process_input2
li t2, 1 # Set t2 to 1 to check the next test case
beq t6, t2, process_input3 # If t6 == 1, jump to process_input3
# Exit
li a7, 10 # System call for exit
ecall # Exit the program
```
**testing data**
### Input/Output Table
| Input (Decimal) | Output [even, odd] |
|--------------|--------------|
| 8 | [ 0, 1 ] |
| 531 | [ 2, 2 ] |
| 1000 | [ 2, 4 ] |

### **RISC-V Program Execution Metrics**
The execution statistics summarize how the program performed on the simulated RISC-V CPU. The program completed in 262 cycles and executed 177 instructions, which gives an average of 1.48 cycles per instruction (CPI), or equivalently 0.676 instructions per cycle (IPC). An ideal 5-stage pipeline approaches a CPI of 1, so the extra cycles are most likely spent on stalls and flushes caused by hazards and taken branches. The reported clock rate of 0 Hz suggests that the simulator was running without a set frequency. These metrics point to possible optimizations, such as reducing the number of branches and jumps in the loops to lower the CPI.
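
As a quick check of the reported figures, both ratios follow directly from the cycle and instruction counts:

$$
\text{CPI} = \frac{\text{cycles}}{\text{instructions}} = \frac{262}{177} \approx 1.48,
\qquad
\text{IPC} = \frac{\text{instructions}}{\text{cycles}} = \frac{177}{262} \approx 0.676
$$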
### **5-Stage RISC-V Processor Pipeline Diagram**
This diagram illustrates the flow of instructions through a 5-stage RISC-V pipeline. The five stages (Fetch, Decode, Execute, Memory, and Write Back) each work on a different instruction at the same time, which keeps every part of the processor busy.

* ### **Instruction Fetch (IF)**
This diagram shows the Instruction Fetch (IF) stage, where the Program Counter (PC) is used to retrieve the instruction from memory. It highlights how the PC is updated, and the instruction `bge x7 x29 32 <print_result>` is fetched from the instruction memory for further processing.

* ### **Instruction Decode (ID)**
This diagram illustrates the Instruction Decode (ID) stage. In this stage, the instruction word `0x00a00e93` of the current instruction `addi x29 x0 10` is decoded into its fields, and the source register values are read from the register file and passed to the next pipeline stage. The immediate value `0x0000000a` is also extracted for use in the following stages.
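
The field extraction done by the decoder can be sketched in C for the I-type format used by `addi`. The struct `itype_fields` and the functions `decode_itype` and `check_decode_itype` are hypothetical names for this illustration. The bit positions follow the standard RV32I I-type layout (opcode in bits 0-6, rd in bits 7-11, funct3 in bits 12-14, rs1 in bits 15-19, immediate in bits 20-31).

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an RV32I I-type instruction (hypothetical helper type). */
typedef struct {
    uint32_t opcode;
    uint32_t rd;
    uint32_t funct3;
    uint32_t rs1;
    int32_t  imm;
} itype_fields;

itype_fields decode_itype(uint32_t insn) {
    itype_fields f;
    f.opcode = insn & 0x7Fu;          /* bits 0-6 */
    f.rd     = (insn >> 7) & 0x1Fu;   /* bits 7-11 */
    f.funct3 = (insn >> 12) & 0x7u;   /* bits 12-14 */
    f.rs1    = (insn >> 15) & 0x1Fu;  /* bits 15-19 */
    f.imm    = (int32_t)(insn >> 20); /* bits 20-31, still unsigned here */
    if (f.imm & 0x800)                /* sign-extend the 12-bit immediate */
        f.imm -= 0x1000;
    return f;
}

/* Hypothetical self-check: addi x29 x0 10 is encoded as 0x00a00e93. */
void check_decode_itype(void) {
    itype_fields f = decode_itype(UINT32_C(0x00A00E93));
    assert(f.opcode == 0x13); /* OP-IMM */
    assert(f.rd == 29);
    assert(f.funct3 == 0);    /* addi */
    assert(f.rs1 == 0);
    assert(f.imm == 10);
}
```

The decoded values match the text above: destination register x29, source register x0, and immediate 10.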

* ### **Execution**
This diagram represents the Execute (EX) stage of the 5-stage RISC-V pipeline. In this stage, the ALU operates on the inputs `Op1` and `Op2` provided by the decode stage, according to the control signals, and produces the result `Res`. Here, the instruction in this stage appears to be `addi x30 x0 0`. In addition, the branch logic (`Branch taken`) determines whether a branch should be taken based on the comparison result, which controls the program flow.

* ### **Memory Access (MEM)**
This diagram shows the Memory Access (MEM) stage of the 5-stage RISC-V pipeline. In this stage, the data memory is read or written according to the instruction decoded in the earlier stages. The `Wr` signal determines whether a write is performed, `Data in` carries the data to be written, and `Read out` holds the value that is passed on to the Write Back stage. Here, the stage holds the instruction `addi x7 x0 1`, which does not access memory, so its ALU result simply passes through. It is accompanied by a nop (flush), a no-operation bubble that is possibly due to a pipeline hazard.

* ### **Write Back (WB)**
This diagram represents the Write Back (WB) stage of the RISC-V 5-stage pipeline. The purpose of this stage is to write results back to the register file. The data to be written back (`0xdeadbeef` in this case) comes either from the Memory (MEM) stage or from the ALU, and the MEM/WB register holds it until it is written to the destination register. The value `0x00000000` may indicate that no operation was performed or that the register is still in its reset state. This stage completes the execution of the instruction and makes its result available to subsequent instructions.

* ### **General Purpose Register (GPR) Status Snapshot**

* ### **Motivation and Connection:**
Both Problem C and the LeetCode problem on even and odd bits work with binary representations and bit manipulation. Problem C converts a half-precision floating-point value to a single-precision value by isolating the sign bit, the exponent, and the mantissa, and then normalizing them to fit into a 32-bit word. This requires careful use of bitwise operations such as masking, shifting, and combining bits to create the final 32-bit value.
The LeetCode problem counts the bits that are set at even and odd positions in the binary representation of an integer. It relies on the same operations, bit shifting and bit masking, to test whether the bit at a given position is set.
The motivation behind both problems is to understand how numbers are represented at the bit level and how to manipulate those bits to achieve a specific task. In both cases efficient bit manipulation is the key, which shows why binary arithmetic matters in low-level programming.
* ### **Learnings**
1. Bitwise Arithmetic: Both problems reinforce the importance of bitwise arithmetic for manipulating binary data. Knowing how to use bitwise AND, OR, shifts, and negation is a key skill when working with raw data at the bit level.
2. Normalization and Masking: Problem C shows how to normalize floating-point numbers and apply bit masks to build specific values, while the LeetCode problem uses masking to identify the bits at even or odd positions.
3. Working with Low-Level Data: Both tasks require being comfortable with low-level data operations. Working directly with memory and data representation makes it easier to write optimized low-level code, such as the fabsf function, without relying on built-in functions.
---
## Reference
* [2024 Architecture Homework 1](https://hackmd.io/@sysprog/2024-arch-homework1)
* [Accomdemy Note](https://hackmd.io/@accomdemy/SyoatR-sc)
* [LeetCode Problem: Number of Even and Odd Bits](https://leetcode.com/problems/number-of-even-and-odd-bits/)
* [HackMD: 2024-arch-homework1](https://hackmd.io/@sysprog/H1TpVYMdB)
* [Wiki: Arch Schedule](https://wiki.csie.ncku.edu.tw/arch/schedule)