Computer Architecture - Lab1: RV32I Assembly

# Assignment1: RISC-V Assembly and Instruction Pipeline

## Problem B

The C function below shows how an fp32 value is converted to bf16:

```c
#include <stdint.h>

/* Assumed definition: the original snippet uses bf16_t without declaring it. */
typedef struct {
    uint16_t bits;
} bf16_t;

static inline bf16_t fp32_to_bf16(float s)
{
    bf16_t h;
    union {
        float f;
        uint32_t i;
    } u = {.f = s};
    if ((u.i & 0x7fffffff) > 0x7f800000) { /* NaN */
        h.bits = (u.i >> 16) | 64;         /* force to quiet */
        return h;
    }
    /* Round to nearest, ties to even, then keep the upper 16 bits. */
    h.bits = (u.i + (0x7fff + ((u.i >> 0x10) & 1))) >> 0x10;
    return h;
}
```

## RISC-V assembly program

```asm
.data                                  # variables used by the program, stored in RAM
test_data: .word 0x3f800000, 0x41200000, 0x7f800001  # test data: 1.0f, 10.0f, NaN
result:    .word 0, 0, 0               # storage for the bf16 results
fp32_str:  .string "fp32 0x"           # output text
bf16_str:  .string " to bf16: 0x"
newline:   .string "\n"

.text                                  # code section
.globl main                            # make main a global symbol so execution can start there

main:
    la   s0, test_data                 # s0 = address of the test data
    la   s1, result                    # s1 = address of the result array
    li   s2, 3                         # s2 = loop counter

loop:
    lw   a0, 0(s0)                     # load the current fp32 test value; s0 advances by 4 each iteration
    jal  ra, fp32_to_bf16              # call fp32_to_bf16; the return address is stored in ra, see figure (a)
    sw   a0, 0(s1)                     # store the bf16 result returned by the function

    # print result
    lw   a1, 0(s0)                     # reload the original fp32 value
    mv   a2, a0                        # move the bf16 value to a2
    jal  ra, printResult               # call printResult; the return address is stored in ra, see figure (b)

    addi s0, s0, 4                     # move to the next test value
    addi s1, s1, 4                     # move to the next result slot
    addi s2, s2, -1                    # decrement the loop counter
    bnez s2, loop                      # loop while the counter != 0

    li   a7, 10                        # exit with ecall
    ecall

fp32_to_bf16:
    li   t0, 0x7fffffff                # mask that clears the sign bit
    and  t1, a0, t0                    # t1 = fp32 value in a0 without its sign bit
    li   t2, 0x7f800000                # bit pattern of +infinity
    bgt  t1, t2, handle_nan            # exponent all ones and non-zero mantissa: NaN, go to handle_nan
    li   t0, 0x7fff                    # rounding bias
    srli t1, a0, 16                    # shift the fp32 value in a0 right by 16 bits into t1
    andi t1, t1, 1                     # t1 = LSB of the truncated bf16 value, so ties round to even
    add  t0, t0, t1
    add  a0, a0, t0                    # add the bias to the fp32 value
    srli a0, a0, 16                    # shift right by 16 bits to get the bf16 result
    ret

handle_nan:
    srli a0, a0, 16
    ori  a0, a0, 64                    # force to quiet NaN
    ret

printResult:
    mv   t0, a1                        # move the fp32 value to t0
    mv   t1, a2                        # move the bf16 value to t1
    la   a0, fp32_str                  # print the fp32 string
    li   a7, 4
    ecall
    mv   a0, t0                        # print the fp32 value in hex
    li   a7, 34
    ecall
    la   a0, bf16_str                  # print the bf16 string
    li   a7, 4
    ecall
    mv   a0, t1                        # print the bf16 value in hex
    li   a7, 34
    ecall
    la   a0, newline                   # move to the next line
    li   a7, 4
    ecall
    ret
```

:::danger
Do not use screenshots for plain text content, as this is inaccessible to visually impaired users.
:::

## Pipeline diagram

![image](https://hackmd.io/_uploads/HJgjUpKyJg.png)

- IF - Instruction Fetch
- ID - Instruction Decode & Register read
- EX - Execute or address calculation
- MEM - Data memory access
- WB - Write back to register

:::info
The snapshot below is used to explain the stages.
:::

![image](https://hackmd.io/_uploads/S1ApuTYJkl.png)

![image](https://hackmd.io/_uploads/HJIndTtJJe.png)

- The decoder holds `0x00042503`, the encoding of `lw x10 0(x8)` (the `lw a0, 0(s0)` at `loop`), which shows that this instruction is in the ID stage.

![image](https://hackmd.io/_uploads/r1s45TYy1g.png)

- The next instruction (`jal x1 44.....`), whose machine code is `0x02c000ef`, is being read from the instruction memory (Instr. memory) in the IF stage and will be loaded into the compressed decoder next.

![image](https://hackmd.io/_uploads/rJ2tjTKkJl.png)

- EX holds `addi x18 x0 3` (from `li s2, 3`): the ALU adds `0x00000000` (the value of x0) and the immediate `0x00000003`, and the sum is later written to x18.

![image](https://hackmd.io/_uploads/rksoyRY1Jl.png)

- In the WB stage, the result `0x10000008` is written into x9. This is the `auipc` half of `la s1, result`.

## Reference

- [RISC-V User Level ISA](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf)
- [RISC-V Unprivileged ISA](https://riscv.org/wp-content/uploads/2019/06/riscv-spec.pdf)
- [RISC-V optimization guide](https://riscv-optimization-guide-riseproject-c94355ae3e6872252baa952524.gitlab.io/riscv-optimization-guide.html)

:::danger
Always refer to primary sources, such as official RISC-V documentation.
:::
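
## Appendix: checking the expected results

The conversion in Problem B can be worked through by hand for the three words in `test_data`. The block below is a minimal sketch rather than part of the assignment: the names `fp32_bits_to_bf16_bits` and `check_examples` are hypothetical, and the helper operates on raw 32-bit patterns instead of `float` so that the NaN payload is not altered on the way in. It also includes two tie cases, which are not in the original test data, to show the round-to-nearest-even behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: same steps as fp32_to_bf16, applied to raw bits. */
static uint16_t fp32_bits_to_bf16_bits(uint32_t bits)
{
    /* Exponent all ones and non-zero mantissa: NaN. */
    if ((bits & 0x7fffffffu) > 0x7f800000u)
        return (uint16_t)((bits >> 16) | 64u); /* set the quiet bit */

    /* Bias is 0x7fff, plus 1 when the kept LSB is 1 (ties to even). */
    uint32_t bias = 0x7fffu + ((bits >> 16) & 1u);
    return (uint16_t)((bits + bias) >> 16);
}

/* Hypothetical self-check covering the test_data words and two ties. */
static void check_examples(void)
{
    /* The three words from test_data. */
    assert(fp32_bits_to_bf16_bits(0x3f800000u) == 0x3f80); /* 1.0f  */
    assert(fp32_bits_to_bf16_bits(0x41200000u) == 0x4120); /* 10.0f */
    assert(fp32_bits_to_bf16_bits(0x7f800001u) == 0x7fc0); /* NaN   */

    /* Exact ties (low half == 0x8000), not in the original test data. */
    assert(fp32_bits_to_bf16_bits(0x3f808000u) == 0x3f80); /* even LSB: rounds down */
    assert(fp32_bits_to_bf16_bits(0x3f818000u) == 0x3f82); /* odd LSB: rounds up    */
}
```

Calling `check_examples()` from a `main` function runs all five checks. The first three values are what the assembly program should print for its three test words.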