# Assignment1: RISC-V Assembly and Instruction Pipeline
## Problem B
The C program below shows how an fp32 value is converted to bf16:
```c
static inline bf16_t fp32_to_bf16(float s)
{
    bf16_t h;
    union {
        float f;
        uint32_t i;
    } u = {.f = s};
    if ((u.i & 0x7fffffff) > 0x7f800000) { /* NaN */
        h.bits = (u.i >> 16) | 64; /* force to quiet */
        return h;
    }
    /* round to nearest, ties to even, then keep the upper 16 bits */
    h.bits = (u.i + (0x7fff + ((u.i >> 0x10) & 1))) >> 0x10;
    return h;
}
```
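As a minimal sketch of the rounding step, the variant below works directly on the raw 32-bit pattern so that tie cases can be tried by hand. The `bf16_t` layout (a struct with a single `uint16_t bits` field) and the helper name `bf16_from_bits` are assumptions for illustration; the snippet above does not show them.

```c
#include <stdint.h>

/* Assumed definition: the original snippet uses bf16_t without showing it. */
typedef struct {
    uint16_t bits;
} bf16_t;

/* Hypothetical helper: same logic as fp32_to_bf16, but takes the raw fp32 bits. */
static inline bf16_t bf16_from_bits(uint32_t u)
{
    bf16_t h;
    if ((u & 0x7fffffffu) > 0x7f800000u) {     /* NaN: exponent all ones, mantissa nonzero */
        h.bits = (uint16_t)((u >> 16) | 64);   /* set the quiet bit */
        return h;
    }
    uint32_t lsb = (u >> 16) & 1;              /* LSB of the truncated bf16 value */
    h.bits = (uint16_t)((u + 0x7fffu + lsb) >> 16); /* round to nearest, ties to even */
    return h;
}
```

With this sketch, `0x3f800000` (1.0f) maps to `0x3f80`, `0x41200000` (10.0f) to `0x4120`, and `0x7f800001` (NaN) to `0x7fc0`. The exact ties `0x3f808000` and `0x3f818000` round to `0x3f80` and `0x3f82`, the neighbors with an even LSB.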
## RISC-V assembly program
```c
.data                       # data section: variables used by the program, stored in RAM
test_data: .word 0x3f800000, 0x41200000, 0x7f800001 # test inputs: 1.0f, 10.0f, NaN
result:    .word 0, 0, 0    # storage for the three results
fp32_str:  .string "fp32 0x" # output text
bf16_str:  .string " to bf16: 0x"
newline:   .string "\n"
.text                       # code section
.globl main                 # make main a global symbol so it is visible as the program entry point
main:
    la   s0, test_data      # s0 = address of the first test value
    la   s1, result         # s1 = address of the first result slot
    li   s2, 3              # s2 = loop counter (number of test values)
loop:
    lw   a0, 0(s0)          # load the current fp32 test value from 0(s0); s0 advances by 4 for the next one
    jal  ra, fp32_to_bf16   # call fp32_to_bf16; the return address is saved in ra, see figure (a)
    sw   a0, 0(s1)          # store the bf16 result returned in a0
    # print result
    lw   a1, 0(s0)          # a1 = original fp32 value
    mv   a2, a0             # a2 = bf16 value
    jal  ra, printResult    # call printResult; the address of the next instruction is saved in ra, see figure (b)
    addi s0, s0, 4          # advance to the next test value
    addi s1, s1, 4          # advance to the next result slot
    addi s2, s2, -1         # decrement the loop counter
    bnez s2, loop           # repeat while the counter != 0
    li   a7, 10             # ecall 10: exit
    ecall
fp32_to_bf16:
    li   t0, 0x7fffffff     # t0 = mask that clears the sign bit
    and  t1, a0, t0         # t1 = fp32 bits (in a0) with the sign bit cleared
    li   t2, 0x7f800000     # t2 = bit pattern of +infinity (exponent all ones, mantissa zero)
    bgt  t1, t2, handle_nan # exponent all ones and mantissa nonzero means NaN: go to handle_nan
    li   t0, 0x7fff         # t0 = rounding bias
    srli t1, a0, 16         # t1 = upper 16 bits of the fp32 value
    andi t1, t1, 1          # t1 = LSB of the truncated bf16 value (ties round to even)
    add  t0, t0, t1         # t0 = 0x7fff + LSB
    add  a0, a0, t0         # add the rounding bias to the fp32 bits
    srli a0, a0, 16         # keep the upper 16 bits as the bf16 result
    ret
handle_nan:
    srli a0, a0, 16         # keep the upper 16 bits
    ori  a0, a0, 64         # set the quiet bit to force a quiet NaN
    ret
printResult:
    mv   t0, a1             # t0 = fp32 value
    mv   t1, a2             # t1 = bf16 value
    la   a0, fp32_str       # print the fp32 label
    li   a7, 4              # ecall 4: print string
    ecall
    mv   a0, t0             # print the fp32 value
    li   a7, 34             # ecall 34: print integer in hexadecimal
    ecall
    la   a0, bf16_str       # print the bf16 label
    li   a7, 4
    ecall
    mv   a0, t1             # print the bf16 value
    li   a7, 34
    ecall
    la   a0, newline        # move to the next line
    li   a7, 4
    ecall
    ret
```
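One detail of the assembly above: the C code compares unsigned values, while `bgt` is a signed comparison. Because the sign bit is masked off before the branch, both comparisons give the same answer. The sketch below illustrates that reasoning; the function names are hypothetical.

```c
#include <stdint.h>

/* NaN test as written in the C source: unsigned comparison. */
int is_nan_unsigned(uint32_t bits)
{
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

/* NaN test as the assembly performs it: signed comparison (bgt).
 * The mask clears bit 31, so the value always fits in a non-negative int32_t. */
int is_nan_signed(uint32_t bits)
{
    int32_t masked = (int32_t)(bits & 0x7fffffffu);
    return masked > (int32_t)0x7f800000;
}
```

Both functions agree for every input, including negative NaNs such as `0xffc00000` and negative infinity `0xff800000`. Without the mask, a signed comparison would treat every pattern with bit 31 set as negative and miss negative NaNs.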
:::danger
Do not use screenshots for plain text content, as this is inaccessible to visually impaired users.
:::
## Pipeline diagram

- IF - Instruction fetch
- ID - Instruction decode and register read
- EX - Execute or address calculation
- MEM - Data memory access
- WB - Write back to register
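As a rough model of how instructions move through these five stages, the sketch below assumes an ideal pipeline with no stalls or flushes, where one instruction enters IF per cycle. The function names are hypothetical.

```c
/* Stage indices in pipeline order. */
enum { STAGE_IF, STAGE_ID, STAGE_EX, STAGE_MEM, STAGE_WB, STAGE_COUNT };

/* Returns the stage index of instruction `instr` (0-based issue order) at
 * cycle `cycle` (0-based), or -1 if it is not in the pipeline at that cycle.
 * Assumption: no stalls or flushes. */
int stage_index(int instr, int cycle)
{
    int s = cycle - instr;
    return (s >= 0 && s < STAGE_COUNT) ? s : -1;
}

/* Maps a stage index to its name, or "-" for an invalid index. */
const char *stage_name(int stage)
{
    static const char *const names[STAGE_COUNT] = {"IF", "ID", "EX", "MEM", "WB"};
    return (stage >= 0 && stage < STAGE_COUNT) ? names[stage] : "-";
}
```

Under this model, at cycle 4 instruction 0 is in WB, instruction 2 is in EX, and instruction 4 is in IF, so five consecutive instructions occupy the five stages at once.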
:::info
Take the following pipeline state as an example.
:::


- The instruction decoder holds `0x00042503`, the encoding of `lw x10 0(x8)`, so this instruction is in the ID stage.

- The next instruction (`jal x1 44.....`), whose encoding is `0x02c000ef`, is being fetched from the instruction memory (Instr. memory) in the IF stage and will be passed to the Compressed decoder next.

- The EX stage holds `addi x18 x0 3`: the ALU adds `0x00000000` (the value of x0) and the immediate `0x00000003`.

- In the WB stage, the result `0x10000008` is written into x9.
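The two machine words mentioned above can be checked against the standard RV32I I-type and J-type layouts. The sketch below is a minimal illustration; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of an I-type instruction (used by lw, addi, ...). */
typedef struct {
    uint32_t opcode, rd, funct3, rs1;
    int32_t imm;
} itype_t;

/* Splits a 32-bit word into its I-type fields. */
itype_t decode_itype(uint32_t w)
{
    itype_t d;
    d.opcode = w & 0x7f;
    d.rd     = (w >> 7) & 0x1f;
    d.funct3 = (w >> 12) & 0x7;
    d.rs1    = (w >> 15) & 0x1f;
    d.imm    = (int32_t)(w >> 20);
    if (d.imm & 0x800)          /* sign-extend the 12-bit immediate */
        d.imm -= 0x1000;
    return d;
}

/* Encodes `jal rd, offset` using the J-type immediate order
 * imm[20 | 10:1 | 11 | 19:12]. */
uint32_t encode_jal(uint32_t rd, int32_t offset)
{
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1) << 31)
         | (((imm >> 1) & 0x3ff) << 21)
         | (((imm >> 11) & 0x1) << 20)
         | (((imm >> 12) & 0xff) << 12)
         | ((rd & 0x1f) << 7)
         | 0x6f;                /* JAL opcode */
}
```

`decode_itype(0x00042503)` yields opcode `0x03` (load), `funct3 = 2` (`lw`), `rd = 10`, `rs1 = 8`, and `imm = 0`, which is `lw x10, 0(x8)`. `encode_jal(1, 44)` yields `0x02c000ef`.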
## Reference
- [RISC-V User Level ISA](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf)
- [RISC-V Unprivileged ISA](https://riscv.org/wp-content/uploads/2019/06/riscv-spec.pdf)
- [RISC-V optimization guide](https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html)
:::danger
Always refer to primary sources, such as official RISC-V documentation.
:::