# Computer Architecture - Assignment 1

:::warning
__Notice !__
In this assignment, I used ChatGPT to fix my code, understand the problem requirements, and improve my grammar.
:::

contributed by <[Hao2152](https://github.com/Hao2152)>

[Source Code](https://github.com/Hao2152/ca2025-quizzes)

## Quiz1 - Problem B

### Problem B

uf8 is a logarithmic 8-bit codec that maps 20-bit unsigned integers ([0, 1,015,792]) to 8-bit symbols via logarithmic quantization, delivering 2.5:1 compression and ≤6.25% relative error.

The uf8 encoding scheme is ideal for sensor data applications such as temperature and distance measurements, where range matters more than precision. In graphics applications, it efficiently represents level-of-detail (LOD) distances and fog density values. The scheme is also well suited to timer implementations that use exponential backoff strategies. However, uf8 should not be used for financial calculations where exact precision is required and rounding errors are unacceptable, and it is inappropriate for cryptographic applications that require a uniform distribution of values for security purposes.

- __Requirement__
    - Take a 20-bit value, encode it into the 8-bit format, and then decode the 8-bit value back into a 20-bit value.
- __Input__
    - 20-bit value (Hex/Decimal).
- __Output__
    - Encoded 8-bit value (Hex/Decimal).
    - Decoded 20-bit value (Hex/Decimal).

---------------

- [ ] Decoding

$$D(b) = m \cdot 2^e + (2^e - 1) \cdot 16$$

Where $e = \lfloor b/16 \rfloor$ and $m = b \bmod 16$

- [ ] Encoding

$$E(v) = \begin{cases} v, & \text{if } v < 16 \\ 16e + \lfloor(v - \text{offset}(e))/2^e\rfloor, & \text{otherwise} \end{cases}$$

Where $\text{offset}(e) = (2^e - 1) \cdot 16$

### Problem B - Source Code

> This is the decode and encode source code from [Quiz1](https://hackmd.io/@sysprog/arch2025-quiz1-sol).

:::spoiler open
```c=
/* Decode uf8 to uint32_t */
uint32_t uf8_decode(uf8 fl)
{
    uint32_t mantissa = fl & 0x0f;
    uint8_t exponent = fl >> 4;
    uint32_t offset = (0x7FFF >> (15 - exponent)) << 4;
    return (mantissa << exponent) + offset;
}

/* Encode uint32_t to uf8 */
uf8 uf8_encode(uint32_t value)
{
    /* Use CLZ for fast exponent calculation */
    if (value < 16)
        return value;

    /* Find appropriate exponent using CLZ hint */
    int lz = clz(value);
    int msb = 31 - lz;

    /* Start from a good initial guess */
    uint8_t exponent = 0;
    uint32_t overflow = 0;

    if (msb >= 5) {
        /* Estimate exponent - the formula is empirical */
        exponent = msb - 4;
        if (exponent > 15)
            exponent = 15;

        /* Calculate overflow for estimated exponent */
        for (uint8_t e = 0; e < exponent; e++)
            overflow = (overflow << 1) + 16;

        /* Adjust if estimate was off */
        while (exponent > 0 && value < overflow) {
            overflow = (overflow - 16) >> 1;
            exponent--;
        }
    }

    /* Find exact exponent */
    while (exponent < 15) {
        uint32_t next_overflow = (overflow << 1) + 16;
        if (value < next_overflow)
            break;
        overflow = next_overflow;
        exponent++;
    }

    uint8_t mantissa = (value - overflow) >> exponent;
    return (exponent << 4) | mantissa;
}
```
:::

### Translating C code into RISC-V assembly code (a part of my code)

:::spoiler open
```asm=
uf8_decode:
    andi t0, a0, 0x0f       # mantissa = b & 0x0F
    srli t1, a0, 4          # exponent = b >> 4
    li   t2, 1              # init t2
    sll  t2, t2, t1         # t2 = 2^e
    addi t2, t2, -1         # t2 = 2^e - 1
    slli t2, t2, 4          # t2 = (2^e - 1) * 16
    sll  t0, t0, t1         # t0 = m * 2^e
    add  a0, t0, t2         # a0 = m*2^e + (2^e - 1)*16
    ret

uf8_encode:
    li   t4, 16
    bltu a0, t4, v_small    # if v < 16 return v
    addi sp, sp, -8
    sw   ra, 4(sp)
    sw   s0, 0(sp)
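    # The following mirrors the C reference above: clz(value) returns the
    # number of leading zeros, so msb = 31 - clz(value); the exponent is first
    # estimated as e = msb - 4 (capped at 15) and then refined against the
    # overflow threshold offset(e) = (2^e - 1) * 16 before extracting the
    # mantissa.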
    mv   s0, a0
    mv   a0, s0
    jal  ra, clz            # call clz (count leading zeros) to find the leading-1 position
    li   t0, 31
    sub  t3, t0, a0         # t3 = msb position
    li   t4, 0              # e = 0
    li   t5, 0              # offset = 0
    li   t0, 5
    blt  t3, t0, Lskip_est  # skip the estimate when msb < 5 (the value is in the safe range)
    li   t0, 4
    sub  t4, t3, t0         # e = msb - 4
    li   t0, 15
    ble  t4, t0, Lcap_ok
    li   t4, 15             # if e > 15, then set e = 15
```
:::

### Result (Ripes console with 7 test data)

```=
Input value: 0 (hex) = 0x00000
Encoded uf8: 0 (hex) = 0x00
Decoded value: 0 (hex) = 0x00000

Input value: 15 (hex) = 0x0000F
Encoded uf8: 15 (hex) = 0x0F
Decoded value: 15 (hex) = 0x0000F

Input value: 16 (hex) = 0x00010
Encoded uf8: 16 (hex) = 0x10
Decoded value: 16 (hex) = 0x00010

Input value: 255 (hex) = 0x000FF
Encoded uf8: 64 (hex) = 0x40
Decoded value: 240 (hex) = 0x000F0

Input value: 1024 (hex) = 0x00400
Encoded uf8: 96 (hex) = 0x60
Decoded value: 1008 (hex) = 0x003F0

Input value: 65535 (hex) = 0x0FFFF
Encoded uf8: 192 (hex) = 0xC0
Decoded value: 65520 (hex) = 0x0FFF0

Input value: 1000000 (hex) = 0xF4240
Encoded uf8: 254 (hex) = 0xFE
Decoded value: 983024 (hex) = 0xEFFF0
```

## Quiz1 - Problem C

### Problem C

The bfloat16 format (16-bit, from Google Brain) preserves float32's dynamic range by keeping the same 8-bit exponent, but reduces precision to a 7-bit significand (vs. 23 bits).

Bit Layout
```
┌─────────┬──────────────┬──────────────┐
│Sign (1) │ Exponent (8) │ Mantissa (7) │
└─────────┴──────────────┴──────────────┘
 15        14          7  6            0

S: Sign bit (0 = positive, 1 = negative)
E: Exponent bits (8 bits, bias = 127)
M: Mantissa/fraction bits (7 bits)
```

The value $v$ of a bfloat16 number is calculated as:

$$v = (-1)^S \times 2^{E-127} \times \left(1 + \frac{M}{128}\right)$$

where:
- $S \in \{0, 1\}$ is the sign bit
- $E \in [1, 254]$ is the biased exponent
- $M \in [0, 127]$ is the mantissa value

Special Cases
- Zero : $E = 0, M = 0 \Rightarrow v = (-1)^S \times 0$
- Inf : $E = 255, M = 0 \Rightarrow v = (-1)^S \times \infty$
- NaN : $E = 255, M \neq 0 \Rightarrow v = \text{NaN}$
- Denormal : Not supported (flush to zero)

### F32 to BF16 conversion (C Code)

:::spoiler open
```c=
static inline bf16_t f32_to_bf16(float val)
{
    uint32_t f32bits;
    memcpy(&f32bits, &val, sizeof(float));
    if (((f32bits >> 23) & 0xFF) == 0xFF)
        return (bf16_t) {.bits = (f32bits >> 16) & 0xFFFF};
    f32bits += ((f32bits >> 16) & 1) + 0x7FFF;
    return (bf16_t) {.bits = f32bits >> 16};
}
```
:::

### F32 to BF16 conversion (Assembly Code)

> Input : F32
> Output : BF16

:::spoiler open
```asm=
f32_to_bf16:
    addi sp, sp, -16
    sw   ra, 12(sp)
    sw   t0, 8(sp)
    sw   t1, 4(sp)
    sw   t2, 0(sp)

    mv   t0, a0                # t0 = f32bits
    srli t1, t0, 23
    andi t1, t1, 0xFF          # t1 = exp
    li   t2, 0xFF
    beq  t1, t2, f_is_special  # exp==255 => NaN/Inf, just take high16

    # rounding: f32bits += ((f32bits>>16)&1) + 0x7FFF; then >>16
    srli t1, t0, 16
    andi t1, t1, 1
    add  t0, t0, t1
    li   t1, 0x7FFF
    add  t0, t0, t1
    srli t0, t0, 16
    li   t1, 0xFFFF
    and  t0, t0, t1
    mv   a0, t0
    j    f_ret

f_is_special:
    srli t0, t0, 16
    li   t1, 0xFFFF
    and  t0, t0, t1
    mv   a0, t0

f_ret:
    lw   t2, 0(sp)
    lw   t1, 4(sp)
    lw   t0, 8(sp)
    lw   ra, 12(sp)
    addi sp, sp, 16
    ret
```
:::

### Special Case Check (C Code)

:::spoiler open
```c=
static inline bool bf16_isnan(bf16_t a)
{
    return ((a.bits & BF16_EXP_MASK) == BF16_EXP_MASK) &&
           (a.bits & BF16_MANT_MASK);
}
static inline bool bf16_isinf(bf16_t a)
{
    return ((a.bits & BF16_EXP_MASK) == BF16_EXP_MASK) &&
           !(a.bits & BF16_MANT_MASK);
}
static inline bool bf16_iszero(bf16_t a)
{
    return !(a.bits & 0x7FFF);
}
```
:::

### Special Case
Check(Assembly Code) > Input : BF16 > Output : Zero/Inf/Nan/Normal :::spoiler open ```asm= # iszero: (bits & 0x7FFF) == 0 li t1, 0x7FFF and t2, t0, t1 beqz t2, is_zero # isinf / isnan: (exp==0xFF) and (mant==0/!=0) # get exp = (bits>>7)&0xFF srli t1, t0, 7 andi t1, t1, 0xFF li t2, 0xFF bne t1, t2, is_normal # exp != 0xFF -> normal # exp==0xFF => check mant andi t1, t0, 0x7F # mant beqz t1, is_inf # nan la a0, msg_nan li a7, 4 ecall j done_flags is_inf: la a0, msg_inf li a7, 4 ecall j done_flags is_zero: la a0, msg_zero li a7, 4 ecall j done_flags is_normal: la a0, msg_norm li a7, 4 ecall done_flags: lw t2, 0(sp) lw t1, 4(sp) lw t0, 8(sp) lw ra, 12(sp) addi sp, sp, 16 ret ``` ::: ### BF16 Operations(C Code) :::spoiler open ``` c= static inline bf16_t bf16_add(bf16_t a, bf16_t b) { uint16_t sign_a = (a.bits >> 15) & 1; uint16_t sign_b = (b.bits >> 15) & 1; int16_t exp_a = ((a.bits >> 7) & 0xFF); int16_t exp_b = ((b.bits >> 7) & 0xFF); uint16_t mant_a = a.bits & 0x7F; uint16_t mant_b = b.bits & 0x7F; if (exp_a == 0xFF) { if (mant_a) return a; if (exp_b == 0xFF) return (mant_b || sign_a == sign_b) ? b : BF16_NAN(); return a; } if (exp_b == 0xFF) return b; if (!exp_a && !mant_a) return b; if (!exp_b && !mant_b) return a; if (exp_a) mant_a |= 0x80; if (exp_b) mant_b |= 0x80; int16_t exp_diff = exp_a - exp_b; uint16_t result_sign; int16_t result_exp; uint32_t result_mant; if (exp_diff > 0) { result_exp = exp_a; if (exp_diff > 8) return a; mant_b >>= exp_diff; } else if (exp_diff < 0) { result_exp = exp_b; if (exp_diff < -8) return b; mant_a >>= -exp_diff; } else { result_exp = exp_a; } if (sign_a == sign_b) { result_sign = sign_a; result_mant = (uint32_t) mant_a + mant_b; if (result_mant & 0x100) { result_mant >>= 1; if (++result_exp >= 0xFF) return (bf16_t) {.bits = (result_sign << 15) | 0x7F80}; } } else { if (mant_a >= mant_b) { result_sign = sign_a; result_mant = mant_a - mant_b; } else { result_sign = sign_b; result_mant = mant_b - mant_a; } if (!result_mant) return BF16_ZERO(); while (!(result_mant & 0x80)) { result_mant <<= 1; if (--result_exp <= 0) return BF16_ZERO(); } } return (bf16_t) { .bits = (result_sign << 15) | ((result_exp & 0xFF) << 7) | (result_mant & 0x7F), }; } static inline bf16_t bf16_sub(bf16_t a, bf16_t b) { b.bits ^= BF16_SIGN_MASK; return bf16_add(a, b); } static inline bf16_t bf16_mul(bf16_t a, bf16_t b) { uint16_t sign_a = (a.bits >> 15) & 1; uint16_t sign_b = (b.bits >> 15) & 1; int16_t exp_a = ((a.bits >> 7) & 0xFF); int16_t exp_b = ((b.bits >> 7) & 0xFF); uint16_t mant_a = a.bits & 0x7F; uint16_t mant_b = b.bits & 0x7F; uint16_t result_sign = sign_a ^ sign_b; if (exp_a == 0xFF) { if (mant_a) return a; if (!exp_b && !mant_b) return BF16_NAN(); return (bf16_t) {.bits = (result_sign << 15) | 0x7F80}; } if (exp_b == 0xFF) { if (mant_b) return b; if (!exp_a && !mant_a) return BF16_NAN(); return (bf16_t) {.bits = (result_sign << 15) | 0x7F80}; } if ((!exp_a && !mant_a) || (!exp_b && !mant_b)) return (bf16_t) {.bits = result_sign << 15}; int16_t exp_adjust = 0; if (!exp_a) { while (!(mant_a & 0x80)) { mant_a <<= 1; exp_adjust--; } exp_a = 1; } else mant_a |= 0x80; if (!exp_b) { while (!(mant_b & 0x80)) { mant_b <<= 1; exp_adjust--; } exp_b = 1; } else mant_b |= 0x80; uint32_t result_mant = (uint32_t) mant_a * mant_b; int32_t result_exp = (int32_t) exp_a + exp_b - BF16_EXP_BIAS + exp_adjust; if (result_mant & 0x8000) { result_mant = (result_mant >> 8) & 0x7F; result_exp++; } else result_mant = (result_mant >> 7) & 0x7F; if (result_exp >= 0xFF) return (bf16_t) {.bits = 
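        /* result_exp >= 0xFF cannot be encoded in the 8-bit exponent field:
           saturate to infinity, keeping the sign of the product */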
(result_sign << 15) | 0x7F80}; if (result_exp <= 0) { if (result_exp < -6) return (bf16_t) {.bits = result_sign << 15}; result_mant >>= (1 - result_exp); result_exp = 0; } return (bf16_t) {.bits = (result_sign << 15) | ((result_exp & 0xFF) << 7) | (result_mant & 0x7F)}; } static inline bf16_t bf16_div(bf16_t a, bf16_t b) { uint16_t sign_a = (a.bits >> 15) & 1; uint16_t sign_b = (b.bits >> 15) & 1; int16_t exp_a = ((a.bits >> 7) & 0xFF); int16_t exp_b = ((b.bits >> 7) & 0xFF); uint16_t mant_a = a.bits & 0x7F; uint16_t mant_b = b.bits & 0x7F; uint16_t result_sign = sign_a ^ sign_b; if (exp_b == 0xFF) { if (mant_b) return b; /* Inf/Inf = NaN */ if (exp_a == 0xFF && !mant_a) return BF16_NAN(); return (bf16_t) {.bits = result_sign << 15}; } if (!exp_b && !mant_b) { if (!exp_a && !mant_a) return BF16_NAN(); return (bf16_t) {.bits = (result_sign << 15) | 0x7F80}; } if (exp_a == 0xFF) { if (mant_a) return a; return (bf16_t) {.bits = (result_sign << 15) | 0x7F80}; } if (!exp_a && !mant_a) return (bf16_t) {.bits = result_sign << 15}; if (exp_a) mant_a |= 0x80; if (exp_b) mant_b |= 0x80; uint32_t dividend = (uint32_t) mant_a << 15; uint32_t divisor = mant_b; uint32_t quotient = 0; for (int i = 0; i < 16; i++) { quotient <<= 1; if (dividend >= (divisor << (15 - i))) { dividend -= (divisor << (15 - i)); quotient |= 1; } } int32_t result_exp = (int32_t) exp_a - exp_b + BF16_EXP_BIAS; if (!exp_a) result_exp--; if (!exp_b) result_exp++; if (quotient & 0x8000) quotient >>= 8; else { while (!(quotient & 0x8000) && result_exp > 1) { quotient <<= 1; result_exp--; } quotient >>= 8; } quotient &= 0x7F; if (result_exp >= 0xFF) return (bf16_t) {.bits = (result_sign << 15) | 0x7F80}; if (result_exp <= 0) return (bf16_t) {.bits = result_sign << 15}; return (bf16_t) {.bits = (result_sign << 15) | ((result_exp & 0xFF) << 7) | (quotient & 0x7F)}; } static inline bf16_t bf16_sqrt(bf16_t a) { uint16_t sign = (a.bits >> 15) & 1; int16_t exp = ((a.bits >> 7) & 0xFF); uint16_t mant = a.bits & 0x7F; /* Handle special cases */ if (exp == 0xFF) { if (mant) return a; /* NaN propagation */ if (sign) return BF16_NAN(); /* sqrt(-Inf) = NaN */ return a; /* sqrt(+Inf) = +Inf */ } /* sqrt(0) = 0 (handle both +0 and -0) */ if (!exp && !mant) return BF16_ZERO(); /* sqrt of negative number is NaN */ if (sign) return BF16_NAN(); /* Flush denormals to zero */ if (!exp) return BF16_ZERO(); /* Direct bit manipulation square root algorithm */ /* For sqrt: new_exp = (old_exp - bias) / 2 + bias */ int32_t e = exp - BF16_EXP_BIAS; int32_t new_exp; /* Get full mantissa with implicit 1 */ uint32_t m = 0x80 | mant; /* Range [128, 256) representing [1.0, 2.0) */ /* Adjust for odd exponents: sqrt(2^odd * m) = 2^((odd-1)/2) * sqrt(2*m) */ if (e & 1) { m <<= 1; /* Double mantissa for odd exponent */ new_exp = ((e - 1) >> 1) + BF16_EXP_BIAS; } else { new_exp = (e >> 1) + BF16_EXP_BIAS; } /* Now m is in range [128, 256) or [256, 512) if exponent was odd */ /* Binary search for integer square root */ /* We want result where result^2 = m * 128 (since 128 represents 1.0) */ uint32_t low = 90; /* Min sqrt (roughly sqrt(128)) */ uint32_t high = 256; /* Max sqrt (roughly sqrt(512)) */ uint32_t result = 128; /* Default */ /* Binary search for square root of m */ while (low <= high) { uint32_t mid = (low + high) >> 1; uint32_t sq = (mid * mid) / 128; /* Square and scale */ if (sq <= m) { result = mid; /* This could be our answer */ low = mid + 1; } else { high = mid - 1; } } /* result now contains sqrt(m) * sqrt(128) / sqrt(128) = sqrt(m) */ /* But we need to 
adjust the scale */ /* Since m is scaled where 128=1.0, result should also be scaled same way */ /* Normalize to ensure result is in [128, 256) */ if (result >= 256) { result >>= 1; new_exp++; } else if (result < 128) { while (result < 128 && new_exp > 1) { result <<= 1; new_exp--; } } /* Extract 7-bit mantissa (remove implicit 1) */ uint16_t new_mant = result & 0x7F; /* Check for overflow/underflow */ if (new_exp >= 0xFF) return (bf16_t) {.bits = 0x7F80}; /* +Inf */ if (new_exp <= 0) return BF16_ZERO(); return (bf16_t) {.bits = ((new_exp & 0xFF) << 7) | new_mant}; } ``` ::: ### BF16 Operations - Add&Sub(Assembly Code) :::spoiler open ```asm= bf16_add: addi sp, sp, -40 sw ra, 36(sp) sw s0, 32(sp) sw s1, 28(sp) sw s2, 24(sp) sw s3, 20(sp) sw s4, 16(sp) sw s5, 12(sp) sw s6, 8(sp) sw s7, 4(sp) mv s0, a0 # a mv s1, a1 # b # extract fields srli s2, s0, 15 # sign_a andi s2, s2, 1 srli s3, s1, 15 # sign_b andi s3, s3, 1 srli s4, s0, 7 # exp_a andi s4, s4, 0xFF srli s5, s1, 7 # exp_b andi s5, s5, 0xFF andi s6, s0, 0x7F # mant_a andi s7, s1, 0x7F # mant_b # handle specials (NaN/Inf/zero) - same logic as C # if exp_a==0xFF: li t0, 0xFF bne s4, t0, add_chk_b_inf # exp_a==0xFF bnez s6, add_ret_a # NaN -> return a # a is Inf bne s5, t0, add_ret_a # if b not Inf -> return a # both inf bnez s7, add_ret_b # b NaN -> return b # both inf with signs: same sign -> b, else NaN bne s2, s3, add_nan j add_ret_b add_chk_b_inf: bne s5, t0, add_chk_a_zero # b is Inf/NaN bnez s7, add_ret_b j add_ret_b add_chk_a_zero: # if a is zero (exp==0 && mant==0) return b bnez s4, add_chk_b_zero beqz s6, add_ret_b add_chk_b_zero: # if b is zero -> return a bnez s5, add_prep beqz s7, add_ret_a add_prep: # add hidden 1 if normalized beqz s4, add_nohid_a ori s6, s6, 0x80 add_nohid_a: beqz s5, add_nohid_b ori s7, s7, 0x80 add_nohid_b: # align exponents sub t0, s4, s5 # exp_diff = exp_a - exp_b mv t1, s4 # result_exp bgtz t0, add_shift_b bltz t0, add_shift_a j add_same_exp add_shift_b: li t2, 8 bgt t0, t2, add_ret_a # if diff>8 return a (underflow of b side) mv t1, s4 srl s7, s7, t0 j add_do add_shift_a: neg t3, t0 # -exp_diff li t2, 8 bgt t3, t2, add_ret_b # if diff<-8 return b mv t1, s5 srl s6, s6, t3 j add_do add_same_exp: mv t1, s4 add_do: # same sign -> add mant; else subtract bne s2, s3, add_diff_sign # same sign add t4, s6, s7 # result_mant # if overflow (bit8) li t5, 0x100 and t2, t4, t5 beqz t2, add_pack_same srli t4, t4, 1 addi t1, t1, 1 li t2, 0xFF blt t1, t2, add_pack_same # overflow to INF slli t6, s2, 15 li t1, 0x7F80 or t6, t6, t1 mv a0, t6 j add_epilogue add_pack_same: # pack with same sign slli t6, s2, 15 slli t2, t1, 7 andi t4, t4, 0x7F or t6, t6, t2 or t6, t6, t4 mv a0, t6 j add_epilogue add_diff_sign: # subtract larger - smaller bge s6, s7, add_a_ge_b mv t6, s3 # result_sign sub t4, s7, s6 j add_norm_sub add_a_ge_b: mv t6, s2 sub t4, s6, s7 add_norm_sub: beqz t4, add_zero # normalize until mant has bit7 set li t2, 0x80 add_norm_loop: and t5, t4, t2 bnez t5, add_pack_sub slli t4, t4, 1 addi t1, t1, -1 blez t1, add_zero j add_norm_loop add_pack_sub: slli t5, t6, 15 slli t2, t1, 7 andi t4, t4, 0x7F or t5, t5, t2 or t5, t5, t4 mv a0, t5 j add_epilogue add_zero: li a0, 0x0000 j add_epilogue add_nan: li a0, 0x7FC0 j add_epilogue add_ret_a: mv a0, s0 j add_epilogue add_ret_b: mv a0, s1 add_epilogue: lw s7, 4(sp) lw s6, 8(sp) lw s5, 12(sp) lw s4, 16(sp) lw s3, 20(sp) lw s2, 24(sp) lw s1, 28(sp) lw s0, 32(sp) lw ra, 36(sp) addi sp, sp, 40 ret bf16_sub: addi sp, sp, -8 sw ra, 4(sp) sw t1, 0(sp) li t1, 0x8000 xor a1, a1, t1 
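    # a - b == a + (-b): flipping b's sign bit (0x8000) negates it, and the
    # call below lets bf16_add handle rounding and all special cases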
jal ra, bf16_add lw t1, 0(sp) lw ra, 4(sp) addi sp, sp, 8 ret ``` ::: ### BF16 Operations - Mul&Div(Assembly Code) :::spoiler open ```asm= bf16_mul: addi sp, sp, -48 sw ra, 44(sp) sw s0, 40(sp) sw s1, 36(sp) sw s2, 32(sp) sw s3, 28(sp) sw s4, 24(sp) sw s5, 20(sp) sw s6, 16(sp) sw s7, 12(sp) sw s8, 8(sp) sw s9, 4(sp) mv s0, a0 # a mv s1, a1 # b # fields srli s2, s0, 15 # sign_a andi s2, s2, 1 srli s3, s1, 15 # sign_b andi s3, s3, 1 xor s4, s2, s3 # result_sign srli s5, s0, 7 # exp_a andi s5, s5, 0xFF srli s6, s1, 7 # exp_b andi s6, s6, 0xFF andi s7, s0, 0x7F # mant_a andi s8, s1, 0x7F # mant_b # specials li t0, 0xFF beq s5, t0, mul_a_inf_nan beq s6, t0, mul_b_inf_nan # zero check beqz s5, mul_a_sub_chk j mul_a_ok mul_a_sub_chk: beqz s7, mul_ret_zero mul_a_ok: beqz s6, mul_b_sub_chk j mul_b_ok mul_b_sub_chk: beqz s8, mul_ret_zero mul_b_ok: # subnormal normalization + hidden 1 li s9, 0 # exp_adjust beqz s5, mul_norm_a ori s7, s7, 0x80 j mul_norm_b mul_norm_a: # shift left until bit7 set, dec exp_adjust li t1, 0x80 mul_na_loop: and t2, s7, t1 bnez t2, mul_na_done slli s7, s7, 1 addi s9, s9, -1 j mul_na_loop mul_na_done: li s5, 1 mul_norm_b: beqz s6, mul_norm_b2 ori s8, s8, 0x80 j mul_mul mul_norm_b2: li t1, 0x80 mul_nb_loop: and t2, s8, t1 bnez t2, mul_nb_done slli s8, s8, 1 addi s9, s9, -1 j mul_nb_loop mul_nb_done: li s6, 1 mul_mul: # 8-bit * 8-bit -> up to 16-bit mul t3, s7, s8 # result_mant (16-bit max) # result_exp = exp_a + exp_b - 127 + exp_adjust la t4, BF16_EXP_BIAS lw t4, 0(t4) # 127 add t5, s5, s6 sub t5, t5, t4 add t5, t5, s9 # result_exp # normalize: if (mant & 0x8000) -> shift>>8, exp++ li t6, 0x8000 and t1, t3, t6 beqz t1, mul_shift7 srli t3, t3, 8 addi t5, t5, 1 j mul_pack_check mul_shift7: srli t3, t3, 7 mul_pack_check: # overflow / underflow li t1, 0xFF bge t5, t1, mul_ret_inf blez t5, mul_underflow_chk # normal pack andi t3, t3, 0x7F slli t0, s4, 15 slli t1, t5, 7 or t0, t0, t1 or t0, t0, t3 mv a0, t0 j mul_done mul_underflow_chk: # if result_exp < -6 -> zero else shift mant right and set exp=0 li t1, -6 bgt t5, t1, mul_denorm mul_ret_zero: slli t0, s4, 15 mv a0, t0 j mul_done mul_denorm: neg t2, t5 # 0 - exp addi t2, t2, 1 # 1 - result_exp srl t3, t3, t2 andi t3, t3, 0x7F slli t0, s4, 15 or t0, t0, t3 mv a0, t0 j mul_done mul_ret_inf: slli t0, s4, 15 li t1, 0x7F80 or t0, t0, t1 mv a0, t0 j mul_done mul_a_inf_nan: andi t1, s0, 0x7F bnez t1, mul_ret_a # NaN -> a # a=Inf # if b==0 -> NaN; else Inf with sign beqz s6, mul_b0_chk_m j mul_ret_sign_inf mul_b0_chk_m: beqz s8, mul_ret_nan j mul_ret_sign_inf mul_b_inf_nan: andi t1, s1, 0x7F bnez t1, mul_ret_b beqz s5, mul_a0_chk_m j mul_ret_sign_inf mul_a0_chk_m: beqz s7, mul_ret_nan j mul_ret_sign_inf mul_ret_a: mv a0, s0 j mul_done mul_ret_b: mv a0, s1 j mul_done mul_ret_sign_inf: slli t0, s4, 15 li t1, 0x7F80 or t0, t0, t1 mv a0, t0 j mul_done mul_ret_nan: li a0, 0x7FC0 mul_done: lw s9, 4(sp) lw s8, 8(sp) lw s7, 12(sp) lw s6, 16(sp) lw s5, 20(sp) lw s4, 24(sp) lw s3, 28(sp) lw s2, 32(sp) lw s1, 36(sp) lw s0, 40(sp) lw ra, 44(sp) addi sp, sp, 48 ret ######################################### bf16_div: addi sp, sp, -56 sw ra, 52(sp) sw s0, 48(sp) sw s1, 44(sp) sw s2, 40(sp) sw s3, 36(sp) sw s4, 32(sp) sw s5, 28(sp) sw s6, 24(sp) sw s7, 20(sp) sw s8, 16(sp) sw s9, 12(sp) sw s10,8(sp) sw s11,4(sp) mv s0, a0 # a mv s1, a1 # b # fields srli s2, s0, 15 # sign_a andi s2, s2, 1 srli s3, s1, 15 # sign_b andi s3, s3, 1 xor s4, s2, s3 # result_sign srli s5, s0, 7 # exp_a andi s5, s5, 0xFF srli s6, s1, 7 # exp_b andi s6, s6, 0xFF andi s7, s0, 
0x7F # mant_a
    andi s8, s1, 0x7F       # mant_b

    # special cases (same handling as the C code)
    li   t0, 0xFF
    beq  s6, t0, div_b_inf_nan
    beqz s6, div_b_zero_chk
    j    div_chk_a
div_b_zero_chk:
    beqz s8, div_div_by_zero
    j    div_chk_a
div_b_inf_nan:
    bnez s8, div_ret_b      # NaN
    # b = Inf
    beq  s5, t0, div_inf_inf_nan
    slli t1, s4, 15         # result is signed zero
    mv   a0, t1
    j    div_done
div_inf_inf_nan:
    # a Inf and b Inf => NaN
    andi t1, s0, 0x7F
    bnez t1, div_ret_a
    li   a0, 0x7FC0
    j    div_done
div_div_by_zero:
    # a/0 -> Inf (unless a==0 -> NaN handled below)
    beqz s5, div_a_zero_chk_m
    slli t1, s4, 15
    li   t2, 0x7F80
    or   t1, t1, t2
    mv   a0, t1
    j    div_done
div_a_zero_chk_m:
    beqz s7, div_ret_nan
    slli t1, s4, 15
    li   t2, 0x7F80
    or   t1, t1, t2
    mv   a0, t1
    j    div_done
div_chk_a:
    beq  s5, t0, div_a_inf_nan
    beqz s5, div_a_zero_sub_chk
    j    div_prep
div_a_zero_sub_chk:
    beqz s7, div_ret_zero
div_a_inf_nan:
    bnez s7, div_ret_a      # NaN
    # a = Inf
    slli t1, s4, 15
    li   t2, 0x7F80
    or   t1, t1, t2
    mv   a0, t1
    j    div_done
div_ret_zero:
    slli t1, s4, 15
    mv   a0, t1
    j    div_done
div_ret_a:
    mv   a0, s0
    j    div_done
div_ret_b:
    mv   a0, s1
    j    div_done
div_ret_nan:
    li   a0, 0x7FC0
    j    div_done
div_prep:
    # add hidden 1 if normalized
    beqz s5, div_nohid_a
    ori  s7, s7, 0x80
div_nohid_a:
    beqz s6, div_nohid_b
    ori  s8, s8, 0x80
div_nohid_b:
    # long division to 16-bit quotient
    slli t2, s7, 15         # dividend
    mv   t3, s8             # divisor
    li   t4, 0              # quotient
    li   t5, 0
    li   t6, 16             # run 16 iterations: shift = 15..0
div_loop:
    addi t6, t6, -1         # decrement first, so t6 = 15..0
    slli t4, t4, 1          # quotient <<= 1
    mv   t5, t3             # t5 = divisor
    addi t1, t6, 0          # t1 = current shift amount (15..0)
    sll  t5, t5, t1         # t5 = divisor << shift
    bgeu t2, t5, div_sub_ok
    j    div_next
div_sub_ok:
    sub  t2, t2, t5         # dividend -= (divisor << shift)
    ori  t4, t4, 1          # quotient |= 1
div_next:
    bnez t6, div_loop

    # result_exp = exp_a - exp_b + 127; adjust for subnormals
    la   t0, BF16_EXP_BIAS
    lw   t0, 0(t0)
    sub  t1, s5, s6
    add  t1, t1, t0
    beqz s5, div_adj_a
    j    div_adj_b
div_adj_a:
    addi t1, t1, -1
div_adj_b:
    beqz s6, div_adj_b2
    j    div_norm
div_adj_b2:
    addi t1, t1, 1
div_norm:
    # normalize quotient to have bit15 set, then >>8 to 7-bit mant
    li   t0, 0x8000
    and  t2, t4, t0
    bnez t2, div_hi
    # shift left until bit15 set, dec exp
    li   t2, 0x8000
div_norm2:
    and  t5, t4, t2
    bnez t5, div_shift_done
    slli t4, t4, 1
    addi t1, t1, -1
    j    div_norm2
div_shift_done:
    srli t4, t4, 8
    j    div_pack
div_hi:
    srli t4, t4, 8
div_pack:
    andi t4, t4, 0x7F
    li   t2, 0xFF
    bge  t1, t2, div_ret_inf
    blez t1, div_underflow
    slli t0, s4, 15
    slli t2, t1, 7
    or   t0, t0, t2
    or   t0, t0, t4
    mv   a0, t0
    j    div_done
div_ret_inf:
    slli t0, s4, 15
    li   t1, 0x7F80
    or   t0, t0, t1
    mv   a0, t0
    j    div_done
div_underflow:
    slli t0, s4, 15
    mv   a0, t0
div_done:
    lw   s11, 4(sp)
    lw   s10, 8(sp)
    lw   s9, 12(sp)
    lw   s8, 16(sp)
    lw   s7, 20(sp)
    lw   s6, 24(sp)
    lw   s5, 28(sp)
    lw   s4, 32(sp)
    lw   s3, 36(sp)
    lw   s2, 40(sp)
    lw   s1, 44(sp)
    lw   s0, 48(sp)
    lw   ra, 52(sp)
    addi sp, sp, 56
    ret
```
:::

### BF16 Operations - Sqrt (Assembly Code)

:::spoiler open
```asm=
bf16_sqrt:
    addi sp, sp, -40
    sw   ra, 36(sp)
    sw   s0, 32(sp)
    sw   s1, 28(sp)
    sw   s2, 24(sp)
    sw   s3, 20(sp)
    sw   s4, 16(sp)
    sw   s5, 12(sp)
    sw   s6, 8(sp)
    sw   s7, 4(sp)

    mv   s0, a0
    srli s1, s0, 15
    andi s1, s1, 1          # sign
    srli s2, s0, 7
    andi s2, s2, 0xFF       # exp
    andi s3, s0, 0x7F       # mant

    # if exp==0xFF
    li   t0, 0xFF
    bne  s2, t0, sqrt_check_zero
    bnez s3, sqrt_ret_a     # NaN propagate
    bnez s1, sqrt_ret_nan   # sqrt(-Inf) = NaN
    mv   a0, s0             # +Inf
    j    sqrt_done

sqrt_check_zero:
    beqz s2, sqrt_chk_mant
    j    sqrt_check_sign
sqrt_chk_mant:
    beqz s3, sqrt_ret_zero
sqrt_check_sign:
    bnez s1, sqrt_ret_nan   # negative -> NaN
    beqz s2, sqrt_ret_zero  # denorm -> 0

    # e = exp - 127
    la   t1, BF16_EXP_BIAS
    lw   t1, 0(t1)
    sub  s4, s2, t1         # s4 = e

    # m = 0x80 | mant
    ori  s5, s3, 0x80
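    # sqrt halves the unbiased exponent e; if e is odd, m is doubled and e is
    # decremented first so that the arithmetic shift below halves it exactly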
    # odd exponent?
    andi t2, s4, 1
    beqz t2, sqrt_even_exp
    slli s5, s5, 1
    addi s4, s4, -1
sqrt_even_exp:
    srai s4, s4, 1
    add  s4, s4, t1         # new_exp = (e>>1)+bias

    # binary search for sqrt(m)
    li   t3, 90
    li   t4, 256
    li   s6, 128
sqrt_loop:
    bgt  t3, t4, sqrt_loop_end
    add  t5, t3, t4
    srli t5, t5, 1
    mul  t6, t5, t5
    srli t6, t6, 7
    ble  t6, s5, sqrt_set
    j    sqrt_high
sqrt_set:
    mv   s6, t5
    addi t3, t5, 1
    j    sqrt_loop
sqrt_high:
    addi t4, t5, -1
    j    sqrt_loop
sqrt_loop_end:
    mv   s5, s6

    # normalize to [128,256]
    li   t0, 256
    bge  s5, t0, sqrt_shift_r
    li   t0, 128
    blt  s5, t0, sqrt_shift_l
    j    sqrt_pack
sqrt_shift_r:
    srli s5, s5, 1
    addi s4, s4, 1
    j    sqrt_pack
sqrt_shift_l:
    li   t0, 128
sqrt_l_loop:
    bge  s5, t0, sqrt_pack
    slli s5, s5, 1
    addi s4, s4, -1
    j    sqrt_l_loop
sqrt_pack:
    andi s5, s5, 0x7F
    li   t0, 0xFF
    bge  s4, t0, sqrt_ret_inf
    blez s4, sqrt_ret_zero
    slli t1, s4, 7
    or   a0, t1, s5
    j    sqrt_done
sqrt_ret_a:
    mv   a0, s0
    j    sqrt_done
sqrt_ret_inf:
    li   a0, 0x7F80
    j    sqrt_done
sqrt_ret_nan:
    li   a0, 0x7FC0
    j    sqrt_done
sqrt_ret_zero:
    li   a0, 0x0000
sqrt_done:
    lw   s7, 4(sp)
    lw   s6, 8(sp)
    lw   s5, 12(sp)
    lw   s4, 16(sp)
    lw   s3, 20(sp)
    lw   s2, 24(sp)
    lw   s1, 28(sp)
    lw   s0, 32(sp)
    lw   ra, 36(sp)
    addi sp, sp, 40
    ret
```
:::

### Result (Ripes console with 4 test data)

> Test1 : a = 4.0, b = 2.0
> Expected : $a+b=6, a-b=2, a*b=8, a/b=2, a^{0.5}=2$
> Test2 : a = 9.0, b = 3.0
> Expected : $a+b=12, a-b=6, a*b=27, a/b=3, a^{0.5}=3$
> Test3 : a = $\infty$, b = 2.0
> Expected : $a+b=\infty, a-b=\infty, a*b=\infty, a/b=\infty, a^{0.5}=\infty$
> Test4 : a = 1.0, b = 0.0
> Expected : $a+b=1, a-b=1, a*b=0, a/b=\infty, a^{0.5}=1$

```
A (f32 hex): 0x40800000 -> BF16: 0x4080 [normal]
B (f32 hex): 0x40000000 -> BF16: 0x4000 [normal]
ADD (A+B) BF16: 0x40C0 [normal]
SUB (A-B) BF16: 0x4000 [normal]
MUL (A*B) BF16: 0x4100 [normal]
DIV (A/B) BF16: 0x4000 [normal]
SQRT (A^0.5) BF16: 0x4000 [normal]

A (f32 hex): 0x41100000 -> BF16: 0x4110 [normal]
B (f32 hex): 0x40400000 -> BF16: 0x4040 [normal]
ADD (A+B) BF16: 0x4140 [normal]
SUB (A-B) BF16: 0x40C0 [normal]
MUL (A*B) BF16: 0x41D8 [normal]
DIV (A/B) BF16: 0x4040 [normal]
SQRT (A^0.5) BF16: 0x4040 [normal]

A (f32 hex): 0x7F7FFFFF -> BF16: 0x7F80 [inf]
B (f32 hex): 0x40000000 -> BF16: 0x4000 [normal]
ADD (A+B) BF16: 0x7F80 [inf]
SUB (A-B) BF16: 0x7F80 [inf]
MUL (A*B) BF16: 0x7F80 [inf]
DIV (A/B) BF16: 0x7F80 [inf]
SQRT (A^0.5) BF16: 0x7F80 [inf]

A (f32 hex): 0x3F800000 -> BF16: 0x3F80 [normal]
B (f32 hex): 0x00000000 -> BF16: 0x0000 [zero]
ADD (A+B) BF16: 0x3F80 [normal]
SUB (A-B) BF16: 0x3F80 [normal]
MUL (A*B) BF16: 0x0000 [zero]
DIV (A/B) BF16: 0x7F80 [inf]
SQRT (A^0.5) BF16: 0x3F80 [normal]
```

## Optimizing LeetCode [Problem #342](https://leetcode.com/problems/power-of-four/description/) using CLZ to reduce execution time

### 342. Power of Four

Given an integer n, return true if it is a power of four. Otherwise, return false.

An integer n is a power of four if there exists an integer x such that n == 4^x.

Example 1:
> Input: n = 16
> Output: true

Example 2:
> Input: n = 5
> Output: false

Example 3:
> Input: n = 1
> Output: true

Constraints:
> -2^31 <= n <= 2^31 - 1

### Without using CLZ (C Code)

:::spoiler open
```c=
bool isPowerOfFour(int n)
{
    if (n <= 0)
        return false;
    while (n % 4 == 0)
        n /= 4;
    return n == 1;
}
```
:::

### Without using CLZ (Assembly Code)

:::spoiler open
```asm=
isPowerOfFour:
    blez a0, ret_false
div_loop:
    # t0 = n % 4
    andi t0, a0, 3
    bnez t0, check_one
    # n /= 4
    srli a0, a0, 2
    j    div_loop
check_one:
    li   t0, 1
    beq  a0, t0, ret_true
    j    ret_false
ret_true:
    li   a0, 1
    ret
ret_false:
    li   a0, 0
    ret
```
:::

### Using CLZ (C Code)

:::spoiler open
```c=
static inline unsigned clz(uint32_t x)
{
    int n = 32, c = 16;
    do {
        uint32_t y = x >> c;
        if (y) {
            n -= c;
            x = y;
        }
        c >>= 1;
    } while (c);
    return n - x;
}

bool isPowerOfFour(int n)
{
    if (n <= 0)
        return false;
    if (n & (n - 1))
        return false;
    int msb = 31 - clz((uint32_t) n);
    return (msb % 2 == 0);
}
```
:::

### Using CLZ (Assembly Code)

:::spoiler open
```asm=
clz:
    addi sp, sp, -12
    sw   ra, 8(sp)
    sw   t0, 4(sp)
    sw   t1, 0(sp)

    beqz a0, clz_zero       # if x==0 -> return 32
    li   t0, 0              # count = 0
    li   t2, 0x20           # limit = 32
clz_loop:
    sll  t1, a0, t0         # left shift to leading 1
    bltz t1, clz_done       # if sign bit=1 -> found MSB
    addi t0, t0, 1          # count++
    blt  t0, t2, clz_loop   # count < 32 -> continue
clz_done:
    mv   a0, t0
    j    clz_exit
clz_zero:
    li   a0, 0x20           # return 32
clz_exit:
    lw   t1, 0(sp)
    lw   t0, 4(sp)
    lw   ra, 8(sp)
    addi sp, sp, 12
    ret
```
:::

### Using Optimized CLZ (C Code)

:::spoiler open
```c=
int clz(uint32_t x)
{
    int n = 0;
    if (x >> 16) { n += 16; x >>= 16; }
    if (x >> 8)  { n += 8;  x >>= 8;  }
    if (x >> 4)  { n += 4;  x >>= 4;  }
    if (x >> 2)  { n += 2;  x >>= 2;  }
    if (x >> 1)  { n += 1; }
    return 31 - n;
}
```
:::

### Using Optimized CLZ (Assembly Code)

:::spoiler open
```asm=
clz:
    addi sp, sp, -20
    sw   ra, 16(sp)
    sw   t0, 12(sp)
    sw   t1, 8(sp)
    sw   t2, 4(sp)
    sw   t3, 0(sp)

    beqz a0, clz_zero       # if x == 0 -> return 32
    li   t3, 0              # n = 0

    # Step 1: check upper 16 bits
    srli t1, a0, 16
    beqz t1, clz_step2
    mv   a0, t1             # x = x >> 16
    li   t2, 16
    add  t3, t3, t2         # n += 16
clz_step2:
    # Step 2: check upper 8 bits
    srli t1, a0, 8
    beqz t1, clz_step3
    mv   a0, t1
    li   t2, 8
    add  t3, t3, t2
clz_step3:
    # Step 3: check upper 4 bits
    srli t1, a0, 4
    beqz t1, clz_step4
    mv   a0, t1
    li   t2, 4
    add  t3, t3, t2
clz_step4:
    # Step 4: check upper 2 bits
    srli t1, a0, 2
    beqz t1, clz_step5
    mv   a0, t1
    li   t2, 2
    add  t3, t3, t2
clz_step5:
    # Step 5: final correction
    srli t1, a0, 1
    beqz t1, clz_done
    li   t2, 1
    add  t3, t3, t2
clz_done:
    li   t2, 31
    sub  a0, t2, t3         # a0 = 31 - n
    j    clz_exit
clz_zero:
    li   a0, 32
clz_exit:
    lw   t3, 0(sp)
    lw   t2, 4(sp)
    lw   t1, 8(sp)
    lw   t0, 12(sp)
    lw   ra, 16(sp)
    addi sp, sp, 20
    ret
```
:::

### Analysis

Linear CLZ and Binary Search CLZ
- __Linear CLZ (bit-by-bit scan): checks every bit from the MSB (bit 31) toward the LSB one at a time — up to 32 iterations.__
- __Binary Search CLZ: halves the search range at each step (16 / 8 / 4 / 2 / 1), finishing in at most 5 steps.__

Binary Search CLZ reduces the number of shifts and branches from up to 32 down to 5, which gives it a clear advantage on real hardware or in pipelined systems. In Ripes (pure RV32I, no CLZ instruction, no branch prediction), however, Linear CLZ can appear comparable or even slightly faster on some inputs — for example values whose MSB is high, where the scan terminates after only a few iterations — because the binary-search version pays extra branch and stack overhead.
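
To double-check that the two approaches agree (the "Result Accuracy" row in the table below), a small host-side C harness can run them side by side. This is a minimal sketch, not part of the graded code: it restates the division-based check and the binary-search CLZ from the listings above under the local names `is_pow4_div`, `is_pow4_clz`, and `clz32` so that it compiles standalone.

```c=
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Reference version: repeated division by 4 */
static bool is_pow4_div(int n)
{
    if (n <= 0)
        return false;
    while (n % 4 == 0)
        n /= 4;
    return n == 1;
}

/* Binary-search CLZ (same idea as the optimized version above) */
static int clz32(uint32_t x)
{
    int n = 0;
    if (x >> 16) { n += 16; x >>= 16; }
    if (x >> 8)  { n += 8;  x >>= 8;  }
    if (x >> 4)  { n += 4;  x >>= 4;  }
    if (x >> 2)  { n += 2;  x >>= 2;  }
    if (x >> 1)  { n += 1; }
    return 31 - n;                      /* leading zeros for x != 0 */
}

/* CLZ version: a power of two whose single set bit is at an even position */
static bool is_pow4_clz(int n)
{
    if (n <= 0)
        return false;
    if (n & (n - 1))
        return false;                   /* not a power of two */
    int msb = 31 - clz32((uint32_t) n); /* position of the set bit */
    return (msb % 2) == 0;              /* 4^x => bit 0, 2, 4, ... */
}

int main(void)
{
    /* 1, 16, 64 are powers of four; 0, 2, 5, 8, 32 are not */
    int tests[] = {0, 1, 2, 5, 8, 16, 32, 64};
    for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
        int n = tests[i];
        printf("n = %2d  div: %d  clz: %d\n", n, is_pow4_div(n), is_pow4_clz(n));
    }
    return 0;
}
```

Both columns print the same results for every test value, which matches the accuracy claim; only the cycle and instruction counts differ.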
### Compare Linear CLZ and Binary Search CLZ

| Item | Linear CLZ | Binary Search CLZ |
| --------------------- | ------------------- | -------------------------------- |
| Number of Shifts | Up to 32 times | Up to 5 times |
| Time Complexity | Fixed O(32) | O(log₂ 32) = 5 |
| Efficiency | Slow | Fast |
| Result Accuracy | Same | Same |
| Cycles (data = 64) | 235 | 107 |
| Instructions (data = 64) | 157 | 75 |
| Result | ![image](https://hackmd.io/_uploads/S1hUxk_Teg.png) | ![image](https://hackmd.io/_uploads/ryhmlJOTge.png) |

## References

- [Quiz 1](https://hackmd.io/@sysprog/arch2025-quiz1-sol)
- [How to Write a Git Commit Message](https://cbea.ms/git-commit/)
- [ca2025-quizzes](https://github.com/sysprog21/ca2025-quizzes)
- [Ripes](https://github.com/mortbopet/Ripes)
- [HackMD Features](https://hackmd.io/features-tw?view)