# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by < `bhtang` >
## Problem C in [Quiz1](https://hackmd.io/@sysprog/arch2024-quiz1-sol)
### C Code
```c
static inline int my_clz(uint32_t x) {
int count = 0;
for (int i = 31; i >= 0; --i) {
if (x & (1U << i))
break;
count++;
}
return count;
}
static inline uint32_t fp16_to_fp32(uint16_t h) {
const uint32_t w = (uint32_t) h << 16;
const uint32_t sign = w & UINT32_C(0x80000000);
const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);
uint32_t renorm_shift = my_clz(nonsign);
renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0;
const int32_t inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) &
INT32_C(0x7F800000);
const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31;
return sign | ((((nonsign << renorm_shift >> 3) +
((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask);
}
```
### Assembly Code for function fp16_to_fp32
```c
.data
argument: .word 0x0000
endl: .string "\n"
.text
la t0, argument
lw t0, 0(t0) # store x in t0
slli t0, t0, 16
li s1, 0x80000000
and s1, s1, t0 # s1 = x and with 0x80000000(sign)
li s2, 0x7FFFFFFF
and s2, s2, t0 # s2 = x and with 0x7FFFFFFF(unsign)
addi t1, t1, 31 # t1 = 31 (t1 is i)
addi t2, t2, 1 # t2 = 1
clz_loop:
sll t4, t2, t1 # t4 record 1U shift i
addi t1, t1, -1 # i = i - 1
and t4, t4, s1 # x & (1U << i)
bnez t4, out_clz # break
addi t5, t5, 1 # did not enter the if statement and incremented count
bge t1, zero, clz_loop # return to the clz_loop
out_clz:
addi s3 ,s3 , 5
addi t5, t5, -5
bge t5, s3, out
add t5, x0, x0 # t5 = renorm_shift
out:
li t3, 0x4000000
add t3, s2, t3
srli t3, t3, 8
li s3, 0x7F800000
and t3, t3, s3 # s3 = inf_nan_mask
addi s4, s2, -1
srli s4, s4, 31 # s4 = zero mask
sll s2, s2, t5
srli s2, s2, 3
sub t5, zero, t5
addi t5, t5, 0x70
slli t5, t5, 23 # 0x70 - renorm << 23
or t5, t5, s3
xori s4, s4, -1
and t5, t5, s4
add s2, t5, s2
or s1, s1, s2
mv a0, s1
li a7,34
ecall
```
## Add Digits ([LeetCode 258](https://leetcode.com/problems/add-digits/description/))
### Problem
Given an integer num, repeatedly add all its digits until the result has only one digit, and return it.
- **Example 1:**
Input: num = 38
Output: 2
Explanation: The process is
38 --> 3 + 8 --> 11
11 --> 1 + 1 --> 2
Since 2 has only one digit, return it.
- **Example 2:**
Input: num = 199
Output: 1
## Solution
### Idea for problem solving
1. **Digit extraction and summation:**
- The function repeatedly extracts the digits of the integer `num` and sums them. This is done by using the operation `num % 10` to get the last digit of `num`, and then adding it to `sum` with `sum += num % 10`.
- After extracting the last digit, the number is divided by 10 (`num /= 10`) to remove the last digit, and the process is repeated until `num` becomes 0.
2. **Repeat until a single-digit result:**
- After summing all the digits of `num`, the resulting sum might still be a two-digit number or larger. In this case, the function checks if `sum > 9`.
- If `sum` is greater than 9, the function recursively calls itself with `sum` as the new input (`addDigits(sum)`), repeating the digit extraction and summation process on the new `sum`.
- If `sum <= 9`, the function returns the sum as the final result, because it’s already a single-digit number.
3. **Base case/termination condition:**
- When the sum of digits is less than or equal to 9, the recursion stops, and the function returns this result as it is already a single-digit number.
You can find the source code [here](https://github.com/BeeeeeHu/Computer_Architecture.git).
[leetcode 258](https://leetcode.com/problems/add-digits/description/)
### C Code
```C=
int addDigits(int num) {
int sum = 0;
while(num!=0)
{
sum += num%10;
num /=10;
}
if(sum>9)
return addDigits(sum);
else
return sum;
}
```
### Assembly Code
```Assembly=
.globl main
main:
li a0, 38 # Test case 1: 38
jal ra, addDigits
# Save the result
mv s0, a0
li a0, 199 # Test case 2: 199
jal ra, addDigits
# Save the result
mv s1, a0
# s0 should be 2 (result for test case 1), s1 should be 1 (result for test case 2)
li a7, 10 # Set the system call number to 10 (exit program)
ecall # End the program
.globl addDigits
addDigits:
# a0 contains the input number (num)
addi sp, sp, -12 # Allocate stack space
sw ra, 8(sp) # Save return address
sw s0, 4(sp) # Save s0
sw s1, 0(sp) # Save s1
mv s0, a0 # s0 = num
li s1, 0 # s1 = sum = 0
loop:
beqz s0, check_sum # If num == 0, exit loop
# sum += num % 10
li t0, 10
rem t1, s0, t0 # t1 = num % 10
add s1, s1, t1 # sum += t1
# num /= 10
div s0, s0, t0
j loop
check_sum:
li t0, 9
ble s1, t0, return_sum # If sum <= 9, return sum
# Recursive call: return addDigits(sum)
mv a0, s1 # Set argument to sum
jal ra, addDigits # Call addDigits recursively
j done
return_sum:
mv a0, s1 # Set return value to sum
done:
lw s1, 0(sp) # Restore s1
lw s0, 4(sp) # Restore s0
lw ra, 8(sp) # Restore return address
addi sp, sp, 12 # Deallocate stack space
ret # Return to caller
```
## Result


## Performance

## Analysis
### Pseudo instruction
```Pseudo instruction=
00000000 <main>:
0: 02600513 addi x10 x0 38
4: 01c000ef jal x1 28 <addDigits>
8: 00050413 addi x8 x10 0
c: 0c700513 addi x10 x0 199
10: 010000ef jal x1 16 <addDigits>
14: 00050493 addi x9 x10 0
18: 00a00893 addi x17 x0 10
1c: 00000073 ecall
00000020 <addDigits>:
20: ff410113 addi x2 x2 -12
24: 00112423 sw x1 8 x2
28: 00812223 sw x8 4 x2
2c: 00912023 sw x9 0 x2
30: 00050413 addi x8 x10 0
34: 00000493 addi x9 x0 0
00000038 <loop>:
38: 00040c63 beq x8 x0 24 <check_sum>
3c: 00a00293 addi x5 x0 10
40: 02546333 rem x6 x8 x5
44: 006484b3 add x9 x9 x6
48: 02544433 div x8 x8 x5
4c: fedff06f jal x0 -20 <loop>
00000050 <check_sum>:
50: 00900293 addi x5 x0 9
54: 0092d863 bge x5 x9 16 <return_sum>
58: 00048513 addi x10 x9 0
5c: fc5ff0ef jal x1 -60 <addDigits>
60: 0080006f jal x0 8 <done>
00000064 <return_sum>:
64: 00048513 addi x10 x9 0
00000068 <done>:
68: 00012483 lw x9 0 x2
6c: 00412403 lw x8 4 x2
70: 00812083 lw x1 8 x2
74: 00c10113 addi x2 x2 12
78: 00008067 jalr x0 x1 0
```
### 5-stage pipelined processor
The 5 stages is designed to deal with the instructions in pipeline approach to enhance the execution efficiency. The 5 stages is shown as below:

- Instruction fetch (IF)
- Instruction decode and register fetch (ID)
- Execute (EX)
- Memory access (MEM)
- Register write back (WB)
The five-stage pipeline in a processor is used to parallelize the execution of instructions, allowing multiple instructions to be processed simultaneously at different stages. Here’s an overview of the pipeline stages and an explanation of how an R-type instruction progresses through these stages:
### R-Type Instruction Example
To analyze the **R-cycle** instruction `rem x6, x8, x5` (remainder operation, where `x6` receives the remainder of `x8` divided by `x5`) as it moves through the five stages of a pipeline processor, we will break down each stage as follows:
### 1. **Instruction Fetch (IF)**:

- **Task**: The processor fetches the instruction `rem x6, x8, x5` from memory.
- **Details**:
- The Program Counter (PC) points to the memory address where the instruction resides.
- The instruction is fetched from instruction memory and placed in a pipeline register, ready for the next stage.
- **Output**: The instruction `rem x6, x8, x5` is fetched and stored in the pipeline register.
### 2. **Instruction Decode and Register Fetch (ID)**:

- **Task**: Decode the instruction and read the operands from the register file.
- **Details**:
- The `rem` instruction is decoded as a remainder operation, identified as an R-type instruction.
- The required registers `x8` (dividend) and `x5` (divisor) are read from the register file.
- Control signals are generated to inform the ALU that it needs to perform a remainder operation.
- **Control Signals**:
- The ALU control signal specifies that a remainder operation should be performed.
- **Output**: The values from `x8` and `x5`, along with the control signals, are passed to the next stage.
### 3. **Execute (EX)**:

- **Task**: The ALU performs the remainder operation on the values from the registers.
- **Details**:
- The ALU receives the values from registers `x8` and `x5` and computes the remainder (`x8 % x5`).
- For example, if `x8 = 15` and `x5 = 4`, the result will be `15 % 4 = 3`.
- **Output**: The result `3` is stored in a pipeline register, ready for writing back to the destination register `x6`.
### 4. **Memory Access (MEM)**:

- **Task**: This stage involves memory access, but for the `rem` instruction, no memory access is required.
- **Output**: The result from the EX stage is passed to the next stage for write-back.
### 5. **Write Back (WB)**:

- **Task**: Write the result of the ALU operation back to the destination register.
- **Details**:
- The result `3` from the ALU is written into the destination register `x6`.
- **Output**: The register `x6` is updated with the value `3`, completing the operation.
### Brief Summary:
The R-cycle instruction `rem x6, x8, x5` focuses on performing a register-based operation, specifically the remainder calculation. The most significant action takes place in the EX (execute) stage, where the ALU computes the remainder, and the result is finally written back to the destination register `x6`.
### memory update:

This image shows a portion of a register file from a RISC-V processor, where various general-purpose registers (GPRs) are listed along with their current values in hexadecimal format.
Here’s a breakdown of what the image represents:
- **Register Names**: Each register is labeled with a standard RISC-V name, such as `x0`, `x1`, `x2`, and so on. The names in this view are split into two columns: "Name" (the register number) and "Alias" (a more readable name commonly associated with the register’s use).
- Example: `x2` is also called `sp` (stack pointer), which typically points to the top of the stack in memory.
- **Values**: The current value stored in each register is shown in hexadecimal format. For example:
- The register `x2` (stack pointer, `sp`) has a value of `0x7fffffd8`, indicating its current location in the stack.
- `x1` (return address, `ra`) holds `0x00000060`, which is the return address for a function call.
- `x5` (temporary register, `t0`) contains `0x00000009`, which could be used in computations.
- **Registers in use**: Some registers have values other than zero, indicating they are currently in use. For example:
- `x2` (stack pointer, `sp`) is actively pointing to an address in the stack.
- `x5` and `x6` (temporary registers `t0` and `t1`) hold non-zero values, which suggests that they may be used for intermediate computations in the current instruction execution.
- **Zero register**: `x0` is always zero in RISC-V, and as expected, it has the value `0x00000000`.
## Summary
- **Problem Overview**: The problem involves repeatedly adding the digits of a given number until a single-digit result is achieved.
- **C Code**: A simple iterative solution that continuously sums the digits of a number using modulo (`%`) and division (`/`) operations.
- **Assembly Code**: The RISC-V assembly implementation mimics the C code. Key steps include:
- Using a loop to extract digits and sum them until the sum is a single digit.
- If the result is greater than 9, the function is called recursively.
- The result is returned once a single-digit number is reached.
#### **Pipeline Analysis**:
- The five-stage pipeline of a RISC-V processor is explained with respect to both problems, particularly focusing on the instruction `rem x6, x8, x5` (remainder operation).
- Each instruction passes through the following pipeline stages:
1. **Instruction Fetch (IF)**: The instruction is fetched from memory.
2. **Instruction Decode (ID)**: The instruction is decoded, and registers are read.
3. **Execute (EX)**: The ALU performs the remainder operation.
4. **Memory Access (MEM)**: No memory access for this instruction.
5. **Write Back (WB)**: The result is written back to the destination register.
The pipeline's structure and operation are illustrated, highlighting how the processor efficiently processes multiple instructions simultaneously.
#### **Registers Analysis**:
- A snapshot of the general-purpose registers (GPRs) is shown, with explanations on the role and current values of each register.
- Key registers like the stack pointer (`sp`), return address (`ra`), and temporary registers (`t0`, `t1`) are examined to show their role in the program’s execution.
### Key Takeaways:
- The project provides an in-depth understanding of **RISC-V assembly language**, including how common operations (like floating-point conversion and digit summing) are implemented and optimized.
- The **five-stage pipeline** is crucial in optimizing instruction throughput by handling multiple instructions concurrently, improving overall processor efficiency.
- Practical examples of instruction decoding, register usage, and ALU operations are demonstrated to deepen the understanding of assembly-level programming on the RISC-V architecture.
## Reference
> [Example RISC-V Assembly Programs](https://marz.utk.edu/my-courses/cosc230/book/example-risc-v-assembly-programs/)
[Leetcode 258 ](https://leetcode.com/problems/add-digits/description/)
[RISC-V Assembly Programmer’s Manual](https://github.com/riscv-non-isa/riscv-asm-manual/blob/main/src/asm-manual.adoc)