# Computer Architecture - Assignment 1
:::warning
__Notice!__
In this assignment, I used ChatGPT to fix my code, understand the problem requirements, and improve my grammar.
:::
contributed by <[Hao2152](https://github.com/Hao2152)>
[Source Code](https://github.com/Hao2152/ca2025-quizzes)
## Quiz1 - Problem B
### Problem B
uf8 is an 8-bit codec that maps 20-bit unsigned integers ([0, 1,015,792]) to 8-bit symbols via logarithmic quantization, delivering 2.5:1 compression with at most 6.25% relative error.
The uf8 encoding scheme is ideal for sensor data applications such as temperature and distance measurements where range is more important than precision. In graphics applications, it efficiently represents level-of-detail (LOD) distances and fog density values. The scheme is also well-suited for timer implementations that use exponential backoff strategies. However, the uf8 encoding should not be used for financial calculations where exact precision is required and rounding errors are unacceptable. It is inappropriate for cryptographic applications that require uniform distribution of values for security purposes.
- __Requirement__ - Take a 20-bit value, encode it into an 8-bit code, and then decode that code back to a 20-bit value.
- __Input__ - 20-bit value (hex/decimal).
- __Output__ -
    - The input value (20-bit) encoded as an 8-bit value (hex/decimal).
    - The 8-bit value decoded back to a 20-bit value (hex/decimal).
---------------
- [ ] Decoding
$$D(b) = m \cdot 2^e + (2^e - 1) \cdot 16$$
Where $e = \lfloor b/16 \rfloor$ and $m = b \bmod 16$
- [ ] Encoding
$$E(v) = \begin{cases}
v, & \text{if } v < 16 \\
16e + \lfloor(v - \text{offset}(e))/2^e\rfloor, & \text{otherwise}
\end{cases}$$
Where $\text{offset}(e) = (2^e - 1) \cdot 16$
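As a quick sanity check of these formulas, encode $v = 255$ and decode the result (this matches the fourth test case in the Ripes output below):
$$\text{offset}(4) = (2^4 - 1)\cdot 16 = 240 \le 255 < \text{offset}(5) = 496 \;\Rightarrow\; E(255) = 16\cdot 4 + \lfloor (255 - 240)/2^4 \rfloor = 64 = \text{0x40}$$
$$D(\text{0x40}) = 0 \cdot 2^4 + (2^4 - 1)\cdot 16 = 240$$
The round trip $255 \to \text{0x40} \to 240$ loses $15/255 \approx 5.9\%$, within the 6.25% bound.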
### Problem B - Source Code
> This is the decode and encode source code from [Quiz1](https://hackmd.io/@sysprog/arch2025-quiz1-sol).
:::spoiler open
```c=
/* Decode uf8 to uint32_t */
uint32_t uf8_decode(uf8 fl)
{
uint32_t mantissa = fl & 0x0f;
uint8_t exponent = fl >> 4;
uint32_t offset = (0x7FFF >> (15 - exponent)) << 4;
return (mantissa << exponent) + offset;
}
/* Encode uint32_t to uf8 */
uf8 uf8_encode(uint32_t value)
{
/* Use CLZ for fast exponent calculation */
if (value < 16)
return value;
/* Find appropriate exponent using CLZ hint */
int lz = clz(value);
int msb = 31 - lz;
/* Start from a good initial guess */
uint8_t exponent = 0;
uint32_t overflow = 0;
if (msb >= 5) {
/* Estimate exponent - the formula is empirical */
exponent = msb - 4;
if (exponent > 15)
exponent = 15;
/* Calculate overflow for estimated exponent */
for (uint8_t e = 0; e < exponent; e++)
overflow = (overflow << 1) + 16;
/* Adjust if estimate was off */
while (exponent > 0 && value < overflow) {
overflow = (overflow - 16) >> 1;
exponent--;
}
}
/* Find exact exponent */
while (exponent < 15) {
uint32_t next_overflow = (overflow << 1) + 16;
if (value < next_overflow)
break;
overflow = next_overflow;
exponent++;
}
uint8_t mantissa = (value - overflow) >> exponent;
return (exponent << 4) | mantissa;
}
```
:::
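Before translating to assembly, the C reference can be sanity-checked on the host. This is only a sketch under my assumptions: `uf8` is a `uint8_t` typedef, `clz` maps to the GCC/Clang builtin, and `uf8_encode`/`uf8_decode` from the listing above are compiled in the same translation unit (only prototypes are repeated here).
```c=
#include <stdint.h>
#include <stdio.h>

typedef uint8_t uf8;                 /* assumed typedef from the quiz code */
#define clz(x) __builtin_clz(x)      /* host stand-in for the assembly clz */

uint32_t uf8_decode(uf8 fl);         /* defined in the listing above */
uf8 uf8_encode(uint32_t value);

int main(void)
{
    /* Every 8-bit code must survive decode -> encode unchanged. */
    for (int b = 0; b < 256; b++)
        if (uf8_encode(uf8_decode((uf8) b)) != b)
            printf("round-trip failed for code 0x%02X\n", b);

    /* Relative error of encode -> decode on the test inputs used below. */
    const uint32_t tests[] = {0, 15, 16, 255, 1024, 65535, 1000000};
    for (size_t i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        uint32_t v = tests[i];
        uint32_t d = uf8_decode(uf8_encode(v));
        double err = v ? 100.0 * (v - d) / v : 0.0;
        printf("%7u -> 0x%02X -> %7u  (%.2f%% error)\n",
               v, uf8_encode(v), d, err);
    }
    return 0;
}
```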
### Translating the C code into RISC-V assembly (excerpt of my code)
:::spoiler open
```asm=
uf8_decode:
andi t0, a0, 0x0f # mantissa = b & 0x0F
srli t1, a0, 4 # exponent = b >> 4
li t2, 1 # init t2
sll t2, t2, t1 # t2 = 2^e
addi t2, t2, -1 # t2 = 2^e - 1
slli t2, t2, 4 # t2 = (2^e - 1) * 16
sll t0, t0, t1 # t0 = m * 2^e
add a0, t0, t2 # a0 = m*2^e + (2^e -1)*16
ret
uf8_encode:
li t4, 16
bltu a0, t4, v_small # if v < 16 return v
addi sp, sp, -8 #
sw ra, 4(sp) #
sw s0, 0(sp) #
mv s0, a0 #
mv a0, s0 #
jal ra, clz # call clz(count leading zeros) to find leading 1 position
li t0, 31 #
sub t3, t0, a0 # t3 = msb position
li t4, 0 # e = 0
li t5, 0 # offset = 0
li t0, 5 #
blt t3, t0, Lskip_est # skip the estimate when msb < 5 (the value is small enough to start from e = 0)
li t0, 4 #
sub t4, t3, t0 # e = msb - 4
li t0, 15 #
ble t4, t0, Lcap_ok #
li t4, 15 # if e > 15, then set e = 15
```
:::
### Result (Ripes console, 7 test cases)
```=
Input value: 0 (hex) = 0x00000
Encoded uf8: 0 (hex) = 0x00
Decoded value: 0 (hex) = 0x00000
Input value: 15 (hex) = 0x0000F
Encoded uf8: 15 (hex) = 0x0F
Decoded value: 15 (hex) = 0x0000F
Input value: 16 (hex) = 0x00010
Encoded uf8: 16 (hex) = 0x10
Decoded value: 16 (hex) = 0x00010
Input value: 255 (hex) = 0x000FF
Encoded uf8: 64 (hex) = 0x40
Decoded value: 240 (hex) = 0x000F0
Input value: 1024 (hex) = 0x00400
Encoded uf8: 96 (hex) = 0x60
Decoded value: 1008 (hex) = 0x003F0
Input value: 65535 (hex) = 0x0FFFF
Encoded uf8: 192 (hex) = 0xC0
Decoded value: 65520 (hex) = 0x0FFF0
Input value: 1000000 (hex) = 0xF4240
Encoded uf8: 254 (hex) = 0xFE
Decoded value: 983024 (hex) = 0xEFFF0
```
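These numbers confirm the stated error bound: the worst of the seven cases is $255 \to 240$, a relative error of $15/255 \approx 5.9\%$, and even the largest input stays well inside the bound, $(1{,}000{,}000 - 983{,}024)/1{,}000{,}000 \approx 1.7\%$, both under 6.25%.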
## Quiz1 - Problem C
### Problem C
The bfloat16 format (16-bit, from Google Brain) preserves float32’s dynamic range by keeping the same 8-bit exponent, but reduces precision to a 7-bit significand (vs. 23).
Bit Layout
```
┌─────────┬──────────────┬──────────────┐
│Sign (1) │ Exponent (8) │ Mantissa (7) │
└─────────┴──────────────┴──────────────┘
15 14 7 6 0
S: Sign bit (0 = positive, 1 = negative)
E: Exponent bits (8 bits, bias = 127)
M: Mantissa/fraction bits (7 bits)
```
The value $v$ of a bfloat16 number is calculated as:
$$v = (-1)^S \times 2^{E-127} \times \left(1 + \frac{M}{128}\right)$$
where:
- $S \in \{0, 1\}$ is the sign bit
- $E \in [1, 254]$ is the biased exponent
- $M \in [0, 127]$ is the mantissa value
Special Cases
- Zero : $E = 0, M = 0 \Rightarrow v = (-1)^S \times 0$
- Inf : $E = 255, M = 0 \Rightarrow v = (-1)^S \times \infty$
- NaN : $E = 255, M \neq 0 \Rightarrow v = \text{NaN}$
- Denormal : Not supported (flush to zero)
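For example, the pattern 0x4040 (operand B of Test 2 in the Ripes results below) has $S = 0$, $E = \text{0x80} = 128$, $M = \text{0x40} = 64$, so
$$v = (-1)^0 \times 2^{128 - 127} \times \left(1 + \frac{64}{128}\right) = 2 \times 1.5 = 3.0$$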
### F32 to BF16 Conversion (C Code)
:::spoiler open
```c=
static inline bf16_t f32_to_bf16(float val)
{
    uint32_t f32bits;
    memcpy(&f32bits, &val, sizeof(float));
    /* NaN/Inf (exponent all ones): just truncate to the top 16 bits */
    if (((f32bits >> 23) & 0xFF) == 0xFF)
        return (bf16_t) {.bits = (f32bits >> 16) & 0xFFFF};
    /* Round to nearest even: add 0x7FFF plus the LSB of the bits that survive */
    f32bits += ((f32bits >> 16) & 1) + 0x7FFF;
    return (bf16_t) {.bits = f32bits >> 16};
}
```
:::
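A small host-side check of this conversion (a sketch only; I assume the wrapper type is `typedef struct { uint16_t bits; } bf16_t;`, and the routine is copied verbatim from above):
```c=
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint16_t bits; } bf16_t;    /* assumed wrapper type */

static inline bf16_t f32_to_bf16(float val)  /* verbatim copy of the routine above */
{
    uint32_t f32bits;
    memcpy(&f32bits, &val, sizeof(float));
    if (((f32bits >> 23) & 0xFF) == 0xFF)
        return (bf16_t) {.bits = (f32bits >> 16) & 0xFFFF};
    f32bits += ((f32bits >> 16) & 1) + 0x7FFF;
    return (bf16_t) {.bits = f32bits >> 16};
}

int main(void)
{
    printf("0x%04X\n", f32_to_bf16(1.0f).bits);   /* 0x3F80 */
    printf("0x%04X\n", f32_to_bf16(3.0f).bits);   /* 0x4040 */
    printf("0x%04X\n", f32_to_bf16(4.0f).bits);   /* 0x4080 */
    return 0;
}
```
These are the same bit patterns that show up as operands in the Ripes test output at the end of this section.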
### F32 to BF16 Conversion (Assembly Code)
> Input : F32
> Output : BF16
:::spoiler open
```asm=
f32_to_bf16:
addi sp, sp, -16
sw ra, 12(sp)
sw t0, 8(sp)
sw t1, 4(sp)
sw t2, 0(sp)
mv t0, a0 # t0 = f32bits
srli t1, t0, 23
andi t1, t1, 0xFF # t1 = exp
li t2, 0xFF
beq t1, t2, f_is_special # exp==255 => NaN/Inf, just take high16
# rounding: f32bits += ((f32bits>>16)&1) + 0x7FFF; then >>16
srli t1, t0, 16
andi t1, t1, 1
add t0, t0, t1
li t1, 0x7FFF
add t0, t0, t1
srli t0, t0, 16
li t1, 0xFFFF
and t0, t0, t1
mv a0, t0
j f_ret
f_is_special:
srli t0, t0, 16
li t1, 0xFFFF
and t0, t0, t1
mv a0, t0
f_ret:
lw t2, 0(sp)
lw t1, 4(sp)
lw t0, 8(sp)
lw ra, 12(sp)
addi sp, sp, 16
ret
```
:::
### Special Case Check (C Code)
:::spoiler open
```c=
static inline bool bf16_isnan(bf16_t a)
{
return ((a.bits & BF16_EXP_MASK) == BF16_EXP_MASK) &&
(a.bits & BF16_MANT_MASK);
}
static inline bool bf16_isinf(bf16_t a)
{
return ((a.bits & BF16_EXP_MASK) == BF16_EXP_MASK) &&
!(a.bits & BF16_MANT_MASK);
}
static inline bool bf16_iszero(bf16_t a)
{
return !(a.bits & 0x7FFF);
}
```
:::
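The `BF16_*` constants used above are not shown in this excerpt. Reconstructed from the bit layout earlier (my assumption, though these are the conventional values), they would be:
```c=
#define BF16_SIGN_MASK 0x8000U   /* bit 15 */
#define BF16_EXP_MASK  0x7F80U   /* bits 14..7 */
#define BF16_MANT_MASK 0x007FU   /* bits 6..0 */
#define BF16_EXP_BIAS  127       /* also loaded by the assembly via la/lw */
```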
### Special Case Check (Assembly Code)
> Input : BF16
> Output : Zero/Inf/Nan/Normal
:::spoiler open
```asm=
# iszero: (bits & 0x7FFF) == 0
li t1, 0x7FFF
and t2, t0, t1
beqz t2, is_zero
# isinf / isnan: (exp==0xFF) and (mant==0/!=0)
# get exp = (bits>>7)&0xFF
srli t1, t0, 7
andi t1, t1, 0xFF
li t2, 0xFF
bne t1, t2, is_normal # exp != 0xFF -> normal
# exp==0xFF => check mant
andi t1, t0, 0x7F # mant
beqz t1, is_inf
# nan
la a0, msg_nan
li a7, 4
ecall
j done_flags
is_inf:
la a0, msg_inf
li a7, 4
ecall
j done_flags
is_zero:
la a0, msg_zero
li a7, 4
ecall
j done_flags
is_normal:
la a0, msg_norm
li a7, 4
ecall
done_flags:
lw t2, 0(sp)
lw t1, 4(sp)
lw t0, 8(sp)
lw ra, 12(sp)
addi sp, sp, 16
ret
```
:::
### BF16 Operations (C Code)
:::spoiler open
``` c=
static inline bf16_t bf16_add(bf16_t a, bf16_t b)
{
uint16_t sign_a = (a.bits >> 15) & 1;
uint16_t sign_b = (b.bits >> 15) & 1;
int16_t exp_a = ((a.bits >> 7) & 0xFF);
int16_t exp_b = ((b.bits >> 7) & 0xFF);
uint16_t mant_a = a.bits & 0x7F;
uint16_t mant_b = b.bits & 0x7F;
if (exp_a == 0xFF) {
if (mant_a)
return a;
if (exp_b == 0xFF)
return (mant_b || sign_a == sign_b) ? b : BF16_NAN();
return a;
}
if (exp_b == 0xFF)
return b;
if (!exp_a && !mant_a)
return b;
if (!exp_b && !mant_b)
return a;
if (exp_a)
mant_a |= 0x80;
if (exp_b)
mant_b |= 0x80;
int16_t exp_diff = exp_a - exp_b;
uint16_t result_sign;
int16_t result_exp;
uint32_t result_mant;
if (exp_diff > 0) {
result_exp = exp_a;
if (exp_diff > 8)
return a;
mant_b >>= exp_diff;
} else if (exp_diff < 0) {
result_exp = exp_b;
if (exp_diff < -8)
return b;
mant_a >>= -exp_diff;
} else {
result_exp = exp_a;
}
if (sign_a == sign_b) {
result_sign = sign_a;
result_mant = (uint32_t) mant_a + mant_b;
if (result_mant & 0x100) {
result_mant >>= 1;
if (++result_exp >= 0xFF)
return (bf16_t) {.bits = (result_sign << 15) | 0x7F80};
}
} else {
if (mant_a >= mant_b) {
result_sign = sign_a;
result_mant = mant_a - mant_b;
} else {
result_sign = sign_b;
result_mant = mant_b - mant_a;
}
if (!result_mant)
return BF16_ZERO();
while (!(result_mant & 0x80)) {
result_mant <<= 1;
if (--result_exp <= 0)
return BF16_ZERO();
}
}
return (bf16_t) {
.bits = (result_sign << 15) | ((result_exp & 0xFF) << 7) |
(result_mant & 0x7F),
};
}
static inline bf16_t bf16_sub(bf16_t a, bf16_t b)
{
b.bits ^= BF16_SIGN_MASK;
return bf16_add(a, b);
}
static inline bf16_t bf16_mul(bf16_t a, bf16_t b)
{
uint16_t sign_a = (a.bits >> 15) & 1;
uint16_t sign_b = (b.bits >> 15) & 1;
int16_t exp_a = ((a.bits >> 7) & 0xFF);
int16_t exp_b = ((b.bits >> 7) & 0xFF);
uint16_t mant_a = a.bits & 0x7F;
uint16_t mant_b = b.bits & 0x7F;
uint16_t result_sign = sign_a ^ sign_b;
if (exp_a == 0xFF) {
if (mant_a)
return a;
if (!exp_b && !mant_b)
return BF16_NAN();
return (bf16_t) {.bits = (result_sign << 15) | 0x7F80};
}
if (exp_b == 0xFF) {
if (mant_b)
return b;
if (!exp_a && !mant_a)
return BF16_NAN();
return (bf16_t) {.bits = (result_sign << 15) | 0x7F80};
}
if ((!exp_a && !mant_a) || (!exp_b && !mant_b))
return (bf16_t) {.bits = result_sign << 15};
int16_t exp_adjust = 0;
if (!exp_a) {
while (!(mant_a & 0x80)) {
mant_a <<= 1;
exp_adjust--;
}
exp_a = 1;
} else
mant_a |= 0x80;
if (!exp_b) {
while (!(mant_b & 0x80)) {
mant_b <<= 1;
exp_adjust--;
}
exp_b = 1;
} else
mant_b |= 0x80;
uint32_t result_mant = (uint32_t) mant_a * mant_b;
int32_t result_exp = (int32_t) exp_a + exp_b - BF16_EXP_BIAS + exp_adjust;
if (result_mant & 0x8000) {
result_mant = (result_mant >> 8) & 0x7F;
result_exp++;
} else
result_mant = (result_mant >> 7) & 0x7F;
if (result_exp >= 0xFF)
return (bf16_t) {.bits = (result_sign << 15) | 0x7F80};
if (result_exp <= 0) {
if (result_exp < -6)
return (bf16_t) {.bits = result_sign << 15};
result_mant >>= (1 - result_exp);
result_exp = 0;
}
return (bf16_t) {.bits = (result_sign << 15) | ((result_exp & 0xFF) << 7) |
(result_mant & 0x7F)};
}
static inline bf16_t bf16_div(bf16_t a, bf16_t b)
{
uint16_t sign_a = (a.bits >> 15) & 1;
uint16_t sign_b = (b.bits >> 15) & 1;
int16_t exp_a = ((a.bits >> 7) & 0xFF);
int16_t exp_b = ((b.bits >> 7) & 0xFF);
uint16_t mant_a = a.bits & 0x7F;
uint16_t mant_b = b.bits & 0x7F;
uint16_t result_sign = sign_a ^ sign_b;
if (exp_b == 0xFF) {
if (mant_b)
return b;
/* Inf/Inf = NaN */
if (exp_a == 0xFF && !mant_a)
return BF16_NAN();
return (bf16_t) {.bits = result_sign << 15};
}
if (!exp_b && !mant_b) {
if (!exp_a && !mant_a)
return BF16_NAN();
return (bf16_t) {.bits = (result_sign << 15) | 0x7F80};
}
if (exp_a == 0xFF) {
if (mant_a)
return a;
return (bf16_t) {.bits = (result_sign << 15) | 0x7F80};
}
if (!exp_a && !mant_a)
return (bf16_t) {.bits = result_sign << 15};
if (exp_a)
mant_a |= 0x80;
if (exp_b)
mant_b |= 0x80;
uint32_t dividend = (uint32_t) mant_a << 15;
uint32_t divisor = mant_b;
uint32_t quotient = 0;
for (int i = 0; i < 16; i++) {
quotient <<= 1;
if (dividend >= (divisor << (15 - i))) {
dividend -= (divisor << (15 - i));
quotient |= 1;
}
}
int32_t result_exp = (int32_t) exp_a - exp_b + BF16_EXP_BIAS;
if (!exp_a)
result_exp--;
if (!exp_b)
result_exp++;
if (quotient & 0x8000)
quotient >>= 8;
else {
while (!(quotient & 0x8000) && result_exp > 1) {
quotient <<= 1;
result_exp--;
}
quotient >>= 8;
}
quotient &= 0x7F;
if (result_exp >= 0xFF)
return (bf16_t) {.bits = (result_sign << 15) | 0x7F80};
if (result_exp <= 0)
return (bf16_t) {.bits = result_sign << 15};
return (bf16_t) {.bits = (result_sign << 15) | ((result_exp & 0xFF) << 7) |
(quotient & 0x7F)};
}
static inline bf16_t bf16_sqrt(bf16_t a)
{
uint16_t sign = (a.bits >> 15) & 1;
int16_t exp = ((a.bits >> 7) & 0xFF);
uint16_t mant = a.bits & 0x7F;
/* Handle special cases */
if (exp == 0xFF) {
if (mant)
return a; /* NaN propagation */
if (sign)
return BF16_NAN(); /* sqrt(-Inf) = NaN */
return a; /* sqrt(+Inf) = +Inf */
}
/* sqrt(0) = 0 (handle both +0 and -0) */
if (!exp && !mant)
return BF16_ZERO();
/* sqrt of negative number is NaN */
if (sign)
return BF16_NAN();
/* Flush denormals to zero */
if (!exp)
return BF16_ZERO();
/* Direct bit manipulation square root algorithm */
/* For sqrt: new_exp = (old_exp - bias) / 2 + bias */
int32_t e = exp - BF16_EXP_BIAS;
int32_t new_exp;
/* Get full mantissa with implicit 1 */
uint32_t m = 0x80 | mant; /* Range [128, 256) representing [1.0, 2.0) */
/* Adjust for odd exponents: sqrt(2^odd * m) = 2^((odd-1)/2) * sqrt(2*m) */
if (e & 1) {
m <<= 1; /* Double mantissa for odd exponent */
new_exp = ((e - 1) >> 1) + BF16_EXP_BIAS;
} else {
new_exp = (e >> 1) + BF16_EXP_BIAS;
}
/* Now m is in range [128, 256) or [256, 512) if exponent was odd */
/* Binary search for integer square root */
/* We want result where result^2 = m * 128 (since 128 represents 1.0) */
uint32_t low = 90; /* Min sqrt (roughly sqrt(128)) */
uint32_t high = 256; /* Max sqrt (roughly sqrt(512)) */
uint32_t result = 128; /* Default */
/* Binary search for square root of m */
while (low <= high) {
uint32_t mid = (low + high) >> 1;
uint32_t sq = (mid * mid) / 128; /* Square and scale */
if (sq <= m) {
result = mid; /* This could be our answer */
low = mid + 1;
} else {
high = mid - 1;
}
}
/* result now contains sqrt(m) * sqrt(128) / sqrt(128) = sqrt(m) */
/* But we need to adjust the scale */
/* Since m is scaled where 128=1.0, result should also be scaled same way */
/* Normalize to ensure result is in [128, 256) */
if (result >= 256) {
result >>= 1;
new_exp++;
} else if (result < 128) {
while (result < 128 && new_exp > 1) {
result <<= 1;
new_exp--;
}
}
/* Extract 7-bit mantissa (remove implicit 1) */
uint16_t new_mant = result & 0x7F;
/* Check for overflow/underflow */
if (new_exp >= 0xFF)
return (bf16_t) {.bits = 0x7F80}; /* +Inf */
if (new_exp <= 0)
return BF16_ZERO();
return (bf16_t) {.bits = ((new_exp & 0xFF) << 7) | new_mant};
}
```
:::
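To cross-check the C operations against the Ripes output further below, a small host-side driver can replay Test 1. This is only a sketch under my assumptions: the `bf16_t` wrapper and helper macros are as reconstructed here (`BF16_NAN()` = 0x7FC0, `BF16_ZERO()` = 0x0000), and `f32_to_bf16` plus the `bf16_*` routines from the listings above are pasted into the same file.
```c=
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint16_t bits; } bf16_t;        /* assumed wrapper type */
#define BF16_SIGN_MASK 0x8000U
#define BF16_EXP_MASK  0x7F80U
#define BF16_MANT_MASK 0x007FU
#define BF16_EXP_BIAS  127
#define BF16_NAN()  ((bf16_t) {.bits = 0x7FC0})  /* assumed quiet-NaN pattern */
#define BF16_ZERO() ((bf16_t) {.bits = 0x0000})

/* f32_to_bf16() and the bf16_add/sub/mul/div/sqrt routines above go here. */

int main(void)
{
    bf16_t a = f32_to_bf16(4.0f), b = f32_to_bf16(2.0f);    /* Test 1 */
    printf("A=0x%04X B=0x%04X\n", a.bits, b.bits);          /* 0x4080 0x4000 */
    printf("ADD  0x%04X\n", bf16_add(a, b).bits);           /* 0x40C0 (6.0)  */
    printf("SUB  0x%04X\n", bf16_sub(a, b).bits);           /* 0x4000 (2.0)  */
    printf("MUL  0x%04X\n", bf16_mul(a, b).bits);           /* 0x4100 (8.0)  */
    printf("DIV  0x%04X\n", bf16_div(a, b).bits);           /* 0x4000 (2.0)  */
    printf("SQRT 0x%04X\n", bf16_sqrt(a).bits);             /* 0x4000 (2.0)  */
    return 0;
}
```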
### BF16 Operations - Add & Sub (Assembly Code)
:::spoiler open
```asm=
bf16_add:
addi sp, sp, -40
sw ra, 36(sp)
sw s0, 32(sp)
sw s1, 28(sp)
sw s2, 24(sp)
sw s3, 20(sp)
sw s4, 16(sp)
sw s5, 12(sp)
sw s6, 8(sp)
sw s7, 4(sp)
mv s0, a0 # a
mv s1, a1 # b
# extract fields
srli s2, s0, 15 # sign_a
andi s2, s2, 1
srli s3, s1, 15 # sign_b
andi s3, s3, 1
srli s4, s0, 7 # exp_a
andi s4, s4, 0xFF
srli s5, s1, 7 # exp_b
andi s5, s5, 0xFF
andi s6, s0, 0x7F # mant_a
andi s7, s1, 0x7F # mant_b
# handle specials (NaN/Inf/zero) - same logic as C
# if exp_a==0xFF:
li t0, 0xFF
bne s4, t0, add_chk_b_inf
# exp_a==0xFF
bnez s6, add_ret_a # NaN -> return a
# a is Inf
bne s5, t0, add_ret_a # if b not Inf -> return a
# both inf
bnez s7, add_ret_b # b NaN -> return b
# both inf with signs: same sign -> b, else NaN
bne s2, s3, add_nan
j add_ret_b
add_chk_b_inf:
bne s5, t0, add_chk_a_zero
# b is Inf/NaN
bnez s7, add_ret_b
j add_ret_b
add_chk_a_zero:
# if a is zero (exp==0 && mant==0) return b
bnez s4, add_chk_b_zero
beqz s6, add_ret_b
add_chk_b_zero:
# if b is zero -> return a
bnez s5, add_prep
beqz s7, add_ret_a
add_prep:
# add hidden 1 if normalized
beqz s4, add_nohid_a
ori s6, s6, 0x80
add_nohid_a:
beqz s5, add_nohid_b
ori s7, s7, 0x80
add_nohid_b:
# align exponents
sub t0, s4, s5 # exp_diff = exp_a - exp_b
mv t1, s4 # result_exp
bgtz t0, add_shift_b
bltz t0, add_shift_a
j add_same_exp
add_shift_b:
li t2, 8
bgt t0, t2, add_ret_a # if diff>8 return a (underflow of b side)
mv t1, s4
srl s7, s7, t0
j add_do
add_shift_a:
neg t3, t0 # -exp_diff
li t2, 8
bgt t3, t2, add_ret_b # if diff<-8 return b
mv t1, s5
srl s6, s6, t3
j add_do
add_same_exp:
mv t1, s4
add_do:
# same sign -> add mant; else subtract
bne s2, s3, add_diff_sign
# same sign
add t4, s6, s7 # result_mant
# if overflow (bit8)
li t5, 0x100
and t2, t4, t5
beqz t2, add_pack_same
srli t4, t4, 1
addi t1, t1, 1
li t2, 0xFF
blt t1, t2, add_pack_same
# overflow to INF
slli t6, s2, 15
li t1, 0x7F80
or t6, t6, t1
mv a0, t6
j add_epilogue
add_pack_same:
# pack with same sign
slli t6, s2, 15
slli t2, t1, 7
andi t4, t4, 0x7F
or t6, t6, t2
or t6, t6, t4
mv a0, t6
j add_epilogue
add_diff_sign:
# subtract larger - smaller
bge s6, s7, add_a_ge_b
mv t6, s3 # result_sign
sub t4, s7, s6
j add_norm_sub
add_a_ge_b:
mv t6, s2
sub t4, s6, s7
add_norm_sub:
beqz t4, add_zero
# normalize until mant has bit7 set
li t2, 0x80
add_norm_loop:
and t5, t4, t2
bnez t5, add_pack_sub
slli t4, t4, 1
addi t1, t1, -1
blez t1, add_zero
j add_norm_loop
add_pack_sub:
slli t5, t6, 15
slli t2, t1, 7
andi t4, t4, 0x7F
or t5, t5, t2
or t5, t5, t4
mv a0, t5
j add_epilogue
add_zero:
li a0, 0x0000
j add_epilogue
add_nan:
li a0, 0x7FC0
j add_epilogue
add_ret_a:
mv a0, s0
j add_epilogue
add_ret_b:
mv a0, s1
add_epilogue:
lw s7, 4(sp)
lw s6, 8(sp)
lw s5, 12(sp)
lw s4, 16(sp)
lw s3, 20(sp)
lw s2, 24(sp)
lw s1, 28(sp)
lw s0, 32(sp)
lw ra, 36(sp)
addi sp, sp, 40
ret
bf16_sub:
addi sp, sp, -8
sw ra, 4(sp)
sw t1, 0(sp)
li t1, 0x8000
xor a1, a1, t1
jal ra, bf16_add
lw t1, 0(sp)
lw ra, 4(sp)
addi sp, sp, 8
ret
```
:::
### BF16 Operations - Mul & Div (Assembly Code)
:::spoiler open
```asm=
bf16_mul:
addi sp, sp, -48
sw ra, 44(sp)
sw s0, 40(sp)
sw s1, 36(sp)
sw s2, 32(sp)
sw s3, 28(sp)
sw s4, 24(sp)
sw s5, 20(sp)
sw s6, 16(sp)
sw s7, 12(sp)
sw s8, 8(sp)
sw s9, 4(sp)
mv s0, a0 # a
mv s1, a1 # b
# fields
srli s2, s0, 15 # sign_a
andi s2, s2, 1
srli s3, s1, 15 # sign_b
andi s3, s3, 1
xor s4, s2, s3 # result_sign
srli s5, s0, 7 # exp_a
andi s5, s5, 0xFF
srli s6, s1, 7 # exp_b
andi s6, s6, 0xFF
andi s7, s0, 0x7F # mant_a
andi s8, s1, 0x7F # mant_b
# specials
li t0, 0xFF
beq s5, t0, mul_a_inf_nan
beq s6, t0, mul_b_inf_nan
# zero check
beqz s5, mul_a_sub_chk
j mul_a_ok
mul_a_sub_chk:
beqz s7, mul_ret_zero
mul_a_ok:
beqz s6, mul_b_sub_chk
j mul_b_ok
mul_b_sub_chk:
beqz s8, mul_ret_zero
mul_b_ok:
# subnormal normalization + hidden 1
li s9, 0 # exp_adjust
beqz s5, mul_norm_a
ori s7, s7, 0x80
j mul_norm_b
mul_norm_a:
# shift left until bit7 set, dec exp_adjust
li t1, 0x80
mul_na_loop:
and t2, s7, t1
bnez t2, mul_na_done
slli s7, s7, 1
addi s9, s9, -1
j mul_na_loop
mul_na_done:
li s5, 1
mul_norm_b:
beqz s6, mul_norm_b2
ori s8, s8, 0x80
j mul_mul
mul_norm_b2:
li t1, 0x80
mul_nb_loop:
and t2, s8, t1
bnez t2, mul_nb_done
slli s8, s8, 1
addi s9, s9, -1
j mul_nb_loop
mul_nb_done:
li s6, 1
mul_mul:
# 8-bit * 8-bit -> up to 16-bit
mul t3, s7, s8 # result_mant (16-bit max)
# result_exp = exp_a + exp_b - 127 + exp_adjust
la t4, BF16_EXP_BIAS
lw t4, 0(t4) # 127
add t5, s5, s6
sub t5, t5, t4
add t5, t5, s9 # result_exp
# normalize: if (mant & 0x8000) -> shift>>8, exp++
li t6, 0x8000
and t1, t3, t6
beqz t1, mul_shift7
srli t3, t3, 8
addi t5, t5, 1
j mul_pack_check
mul_shift7:
srli t3, t3, 7
mul_pack_check:
# overflow / underflow
li t1, 0xFF
bge t5, t1, mul_ret_inf
blez t5, mul_underflow_chk
# normal pack
andi t3, t3, 0x7F
slli t0, s4, 15
slli t1, t5, 7
or t0, t0, t1
or t0, t0, t3
mv a0, t0
j mul_done
mul_underflow_chk:
# if result_exp < -6 -> zero else shift mant right and set exp=0
li t1, -6
bge t5, t1, mul_denorm # result_exp >= -6 -> denormal path (C: result_exp < -6 returns zero)
mul_ret_zero:
slli t0, s4, 15
mv a0, t0
j mul_done
mul_denorm:
neg t2, t5 # 0 - exp
addi t2, t2, 1 # 1 - result_exp
srl t3, t3, t2
andi t3, t3, 0x7F
slli t0, s4, 15
or t0, t0, t3
mv a0, t0
j mul_done
mul_ret_inf:
slli t0, s4, 15
li t1, 0x7F80
or t0, t0, t1
mv a0, t0
j mul_done
mul_a_inf_nan:
andi t1, s0, 0x7F
bnez t1, mul_ret_a # NaN -> a
# a=Inf
# if b==0 -> NaN; else Inf with sign
beqz s6, mul_b0_chk_m
j mul_ret_sign_inf
mul_b0_chk_m:
beqz s8, mul_ret_nan
j mul_ret_sign_inf
mul_b_inf_nan:
andi t1, s1, 0x7F
bnez t1, mul_ret_b
beqz s5, mul_a0_chk_m
j mul_ret_sign_inf
mul_a0_chk_m:
beqz s7, mul_ret_nan
j mul_ret_sign_inf
mul_ret_a:
mv a0, s0
j mul_done
mul_ret_b:
mv a0, s1
j mul_done
mul_ret_sign_inf:
slli t0, s4, 15
li t1, 0x7F80
or t0, t0, t1
mv a0, t0
j mul_done
mul_ret_nan:
li a0, 0x7FC0
mul_done:
lw s9, 4(sp)
lw s8, 8(sp)
lw s7, 12(sp)
lw s6, 16(sp)
lw s5, 20(sp)
lw s4, 24(sp)
lw s3, 28(sp)
lw s2, 32(sp)
lw s1, 36(sp)
lw s0, 40(sp)
lw ra, 44(sp)
addi sp, sp, 48
ret
#########################################
bf16_div:
addi sp, sp, -56
sw ra, 52(sp)
sw s0, 48(sp)
sw s1, 44(sp)
sw s2, 40(sp)
sw s3, 36(sp)
sw s4, 32(sp)
sw s5, 28(sp)
sw s6, 24(sp)
sw s7, 20(sp)
sw s8, 16(sp)
sw s9, 12(sp)
sw s10,8(sp)
sw s11,4(sp)
mv s0, a0 # a
mv s1, a1 # b
# fields
srli s2, s0, 15 # sign_a
andi s2, s2, 1
srli s3, s1, 15 # sign_b
andi s3, s3, 1
xor s4, s2, s3 # result_sign
srli s5, s0, 7 # exp_a
andi s5, s5, 0xFF
srli s6, s1, 7 # exp_b
andi s6, s6, 0xFF
andi s7, s0, 0x7F # mant_a
andi s8, s1, 0x7F # mant_b
# special cases (same order as the C code)
li t0, 0xFF
beq s6, t0, div_b_inf_nan
beqz s6, div_b_zero_chk
j div_chk_a
div_b_zero_chk:
beqz s8, div_div_by_zero
j div_chk_a
div_b_inf_nan:
bnez s8, div_ret_b # NaN
# b=Inf
beq s5, t0, div_inf_inf_nan
slli t1, s4, 15 # result is signed zero
mv a0, t1
j div_done
div_inf_inf_nan:
# a Inf and b Inf => NaN
andi t1, s0, 0x7F
bnez t1, div_ret_a
li a0, 0x7FC0
j div_done
div_div_by_zero:
# a/0 -> Inf (unless a==0 -> NaN handled below)
beqz s5, div_a_zero_chk_m
slli t1, s4, 15
li t2, 0x7F80
or t1, t1, t2
mv a0, t1
j div_done
div_a_zero_chk_m:
beqz s7, div_ret_nan
slli t1, s4, 15
li t2, 0x7F80
or t1, t1, t2
mv a0, t1
j div_done
div_chk_a:
beq s5, t0, div_a_inf_nan
beqz s5, div_a_zero_sub_chk
j div_prep
div_a_zero_sub_chk:
beqz s7, div_ret_zero
div_a_inf_nan:
bnez s7, div_ret_a # NaN
# a=Inf
slli t1, s4, 15
li t2, 0x7F80
or t1, t1, t2
mv a0, t1
j div_done
div_ret_zero:
slli t1, s4, 15
mv a0, t1
j div_done
div_ret_a:
mv a0, s0
j div_done
div_ret_b:
mv a0, s1
j div_done
div_ret_nan:
li a0, 0x7FC0
j div_done
div_prep:
# add hidden 1 if normalized
beqz s5, div_nohid_a
ori s7, s7, 0x80
div_nohid_a:
beqz s6, div_nohid_b
ori s8, s8, 0x80
div_nohid_b:
# long division to 16-bit quotient
slli t2, s7, 15 # dividend
mv t3, s8 # divisor
li t4, 0 # quotient
li t5, 0
li t6, 16 # loop runs 16 times: shift = 15..0
div_loop:
addi t6, t6, -1 # decrement first, so t6 = 15..0
slli t4, t4, 1 # quotient <<= 1
mv t5, t3 # t5 = divisor
addi t1, t6, 0 # t1 = current shift amount (15..0)
sll t5, t5, t1 # t5 = divisor << shift
bgeu t2, t5, div_sub_ok
j div_next
div_sub_ok:
sub t2, t2, t5 # dividend -= (divisor << shift)
ori t4, t4, 1 # quotient|=1
div_next:
bnez t6, div_loop
# result_exp = exp_a - exp_b + 127; adjust for subnormals
la t0, BF16_EXP_BIAS
lw t0, 0(t0)
sub t1, s5, s6
add t1, t1, t0
beqz s5, div_adj_a
j div_adj_b
div_adj_a:
addi t1, t1, -1
div_adj_b:
beqz s6, div_adj_b2
j div_norm
div_adj_b2:
addi t1, t1, 1
div_norm:
# normalize quotient to have bit15 set then >>8 to 7-bit mant
li t0, 0x8000
and t2, t4, t0
bnez t2, div_hi
# shift left until bit15 set, dec exp
li t2, 0x8000
div_norm2:
and t5, t4, t2
bnez t5, div_shift_done
slli t4, t4, 1
addi t1, t1, -1
j div_norm2
div_shift_done:
srli t4, t4, 8
j div_pack
div_hi:
srli t4, t4, 8
div_pack:
andi t4, t4, 0x7F
li t2, 0xFF
bge t1, t2, div_ret_inf
blez t1, div_underflow
slli t0, s4, 15
slli t2, t1, 7
or t0, t0, t2
or t0, t0, t4
mv a0, t0
j div_done
div_ret_inf:
slli t0, s4, 15
li t1, 0x7F80
or t0, t0, t1
mv a0, t0
j div_done
div_underflow:
slli t0, s4, 15
mv a0, t0
div_done:
lw s11,4(sp)
lw s10,8(sp)
lw s9, 12(sp)
lw s8, 16(sp)
lw s7, 20(sp)
lw s6, 24(sp)
lw s5, 28(sp)
lw s4, 32(sp)
lw s3, 36(sp)
lw s2, 40(sp)
lw s1, 44(sp)
lw s0, 48(sp)
lw ra, 52(sp)
addi sp, sp, 56
ret
```
:::
### BF16 Operations - Sqrt (Assembly Code)
:::spoiler open
```asm=
bf16_sqrt:
addi sp, sp, -40
sw ra, 36(sp)
sw s0, 32(sp)
sw s1, 28(sp)
sw s2, 24(sp)
sw s3, 20(sp)
sw s4, 16(sp)
sw s5, 12(sp)
sw s6, 8(sp)
sw s7, 4(sp)
mv s0, a0
srli s1, s0, 15
andi s1, s1, 1 # sign
srli s2, s0, 7
andi s2, s2, 0xFF # exp
andi s3, s0, 0x7F # mant
# if exp==0xFF
li t0, 0xFF
bne s2, t0, sqrt_check_zero
bnez s3, sqrt_ret_a # NaN propagate
bnez s1, sqrt_ret_nan # sqrt(-Inf) = NaN
mv a0, s0 # +Inf
j sqrt_done
sqrt_check_zero:
beqz s2, sqrt_chk_mant
j sqrt_check_sign
sqrt_chk_mant:
beqz s3, sqrt_ret_zero
sqrt_check_sign:
bnez s1, sqrt_ret_nan # negative -> NaN
beqz s2, sqrt_ret_zero # denorm -> 0
# e = exp - 127
la t1, BF16_EXP_BIAS
lw t1, 0(t1)
sub s4, s2, t1 # s4 = e
# m = 0x80 | mant
ori s5, s3, 0x80
# odd exponent?
andi t2, s4, 1
beqz t2, sqrt_even_exp
slli s5, s5, 1
addi s4, s4, -1
sqrt_even_exp:
srai s4, s4, 1
add s4, s4, t1 # new_exp = (e>>1)+bias
# binary search for sqrt(m)
li t3, 90
li t4, 256
li s6, 128
sqrt_loop:
bgt t3, t4, sqrt_loop_end
add t5, t3, t4
srli t5, t5, 1
mul t6, t5, t5
srli t6, t6, 7
ble t6, s5, sqrt_set
j sqrt_high
sqrt_set:
mv s6, t5
addi t3, t5, 1
j sqrt_loop
sqrt_high:
addi t4, t5, -1
j sqrt_loop
sqrt_loop_end:
mv s5, s6
# normalize to [128,256]
li t0, 256
bge s5, t0, sqrt_shift_r
li t0, 128
blt s5, t0, sqrt_shift_l
j sqrt_pack
sqrt_shift_r:
srli s5, s5, 1
addi s4, s4, 1
j sqrt_pack
sqrt_shift_l:
li t0, 128
sqrt_l_loop:
bge s5, t0, sqrt_pack
slli s5, s5, 1
addi s4, s4, -1
j sqrt_l_loop
sqrt_pack:
andi s5, s5, 0x7F
li t0, 0xFF
bge s4, t0, sqrt_ret_inf
blez s4, sqrt_ret_zero
slli t1, s4, 7
or a0, t1, s5
j sqrt_done
sqrt_ret_a:
mv a0, s0
j sqrt_done
sqrt_ret_inf:
li a0, 0x7F80
j sqrt_done
sqrt_ret_nan:
li a0, 0x7FC0
j sqrt_done
sqrt_ret_zero:
li a0, 0x0000
sqrt_done:
lw s7, 4(sp)
lw s6, 8(sp)
lw s5, 12(sp)
lw s4, 16(sp)
lw s3, 20(sp)
lw s2, 24(sp)
lw s1, 28(sp)
lw s0, 32(sp)
lw ra, 36(sp)
addi sp, sp, 40
ret
```
:::
### Result (Ripes console, 4 test cases)
>
Test 1 : a = 4.0, b = 2.0 Expected : $a+b=6, a-b=2, a*b=8, a/b=2, a^{0.5}=2$
Test 2 : a = 9.0, b = 3.0 Expected : $a+b=12, a-b=6, a*b=27, a/b=3, a^{0.5}=3$
Test 3 : a = $\infty$, b = 2.0 Expected : $a+b=\infty, a-b=\infty, a*b=\infty, a/b=\infty, a^{0.5}=\infty$
Test 4 : a = 1.0, b = 0.0 Expected : $a+b=1, a-b=1, a*b=0, a/b=\infty, a^{0.5}=1$
```
A (f32 hex): 0x40800000 -> BF16: 0x4080 [normal]
B (f32 hex): 0x40000000 -> BF16: 0x4000 [normal]
ADD (A+B) BF16: 0x40C0 [normal]
SUB (A-B) BF16: 0x4000 [normal]
MUL (A*B) BF16: 0x4100 [normal]
DIV (A/B) BF16: 0x4000 [normal]
SQRT (A^0.5) BF16: 0x4000 [normal]
A (f32 hex): 0x41100000 -> BF16: 0x4110 [normal]
B (f32 hex): 0x40400000 -> BF16: 0x4040 [normal]
ADD (A+B) BF16: 0x4140 [normal]
SUB (A-B) BF16: 0x40C0 [normal]
MUL (A*B) BF16: 0x41D8 [normal]
DIV (A/B) BF16: 0x4040 [normal]
SQRT (A^0.5) BF16: 0x4040 [normal]
A (f32 hex): 0x7F7FFFFF -> BF16: 0x7F80 [inf]
B (f32 hex): 0x40000000 -> BF16: 0x4000 [normal]
ADD (A+B) BF16: 0x7F80 [inf]
SUB (A-B) BF16: 0x7F80 [inf]
MUL (A*B) BF16: 0x7F80 [inf]
DIV (A/B) BF16: 0x7F80 [inf]
SQRT (A^0.5) BF16: 0x7F80 [inf]
A (f32 hex): 0x3F800000 -> BF16: 0x3F80 [normal]
B (f32 hex): 0x00000000 -> BF16: 0x0000 [zero]
ADD (A+B) BF16: 0x3F80 [normal]
SUB (A-B) BF16: 0x3F80 [normal]
MUL (A*B) BF16: 0x0000 [zero]
DIV (A/B) BF16: 0x7F80 [inf]
SQRT (A^0.5) BF16: 0x3F80 [normal]
```
## Optimize LeetCode [Problem #342](https://leetcode.com/problems/power-of-four/description/) using CLZ to reduce execution time
### 342. Power of Four
Given an integer n, return true if it is a power of four. Otherwise, return false.
An integer n is a power of four if there exists an integer x such that $n = 4^x$.
Example 1:
>Input: n = 16
Output: true
Example 2:
>Input: n = 5
Output: false
Example 3:
>Input: n = 1
Output: true
Constraints:
> $-2^{31} \le n \le 2^{31} - 1$
### Without using CLZ (C Code)
:::spoiler open
```c=
bool isPowerOfFour(int n) {
if (n <= 0) return false;
while (n % 4 == 0)
n /= 4;
return n == 1;
}
```
:::
### Without using CLZ (Assembly Code)
:::spoiler open
```asm=
isPowerOfFour:
blez a0, ret_false
div_loop:
# t0 = n % 4
andi t0, a0, 3
bnez t0, check_one
# n /= 4
srli a0, a0, 2
j div_loop
check_one:
li t0, 1
beq a0, t0, ret_true
j ret_false
ret_true:
li a0, 1
ret
ret_false:
li a0, 0
ret
```
:::
### Using CLZ (C Code)
:::spoiler open
```c=
static inline unsigned clz(uint32_t x)
{
int n = 32, c = 16;
do {
uint32_t y = x >> c;
if (y) {
n -= c;
x = y;
}
c >>= 1;
} while (c);
return n - x;
}
bool isPowerOfFour(int n) {
if (n <= 0) return false;
if (n & (n - 1)) return false;
int msb = 31 - clz((uint32_t)n);
return (msb % 2 == 0);
}
```
:::
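As a worked example: for n = 16 (0x10), n & (n - 1) = 0, clz(16) = 27, so msb = 31 - 27 = 4; since 4 is even, 16 = 4² is accepted. For n = 8, the power-of-two test passes but msb = 3 is odd, so 8 is rejected.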
### Using CLZ (Assembly Code, linear scan)
:::spoiler open
```asm=
clz:
addi sp, sp, -12
sw ra, 8(sp)
sw t0, 4(sp)
sw t1, 0(sp)
beqz a0, clz_zero # if x==0 -> return 32
li t0, 0 # count = 0
li t2, 0x20 # limit = 32
clz_loop:
sll t1, a0, t0 # left shift to leading 1
bltz t1, clz_done # if sign bit=1 -> found MSB
addi t0, t0, 1 # count++
blt t0, t2, clz_loop # count < 32 → continue
clz_done:
mv a0, t0
j clz_exit
clz_zero:
li a0, 0x20 # return 32
clz_exit:
lw t1, 0(sp)
lw t0, 4(sp)
lw ra, 8(sp)
addi sp, sp, 12
ret
```
:::
### Using Optimized CLZ (C Code, binary search)
:::spoiler open
```c=
int clz(uint32_t x) {
    if (!x) return 32;                     /* guard x == 0, matching the assembly below */
    int n = 0;                             /* n accumulates the MSB position */
    if (x >> 16) { n += 16; x >>= 16; }
    if (x >> 8)  { n += 8;  x >>= 8;  }
    if (x >> 4)  { n += 4;  x >>= 4;  }
    if (x >> 2)  { n += 2;  x >>= 2;  }
    if (x >> 1)  { n += 1; }
    return 31 - n;                         /* leading zeros = 31 - MSB position */
}
```
:::
### Using Optimized CLZ (Assembly Code, binary search)
:::spoiler open
```asm=
clz:
addi sp, sp, -20
sw ra, 16(sp)
sw t0, 12(sp)
sw t1, 8(sp)
sw t2, 4(sp)
sw t3, 0(sp)
beqz a0, clz_zero # if x == 0 → return 32
li t3, 0 # n = 0
# Step 1: check upper 16 bits
srli t1, a0, 16
beqz t1, clz_step2
mv a0, t1 # x = x >> 16
li t2, 16
add t3, t3, t2 # n += 16
clz_step2:
# Step 2: check upper 8 bits
srli t1, a0, 8
beqz t1, clz_step3
mv a0, t1
li t2, 8
add t3, t3, t2
clz_step3:
# Step 3: check upper 4 bits
srli t1, a0, 4
beqz t1, clz_step4
mv a0, t1
li t2, 4
add t3, t3, t2
clz_step4:
# Step 4: check upper 2 bits
srli t1, a0, 2
beqz t1, clz_step5
mv a0, t1
li t2, 2
add t3, t3, t2
clz_step5:
# Step 5: final correction
srli t1, a0, 1
beqz t1, clz_done
li t2, 1
add t3, t3, t2
clz_done:
li t2, 31
sub a0, t2, t3 # a0 = 31 - n
j clz_exit
clz_zero:
li a0, 32
clz_exit:
lw t3, 0(sp)
lw t2, 4(sp)
lw t1, 8(sp)
lw t0, 12(sp)
lw ra, 16(sp)
addi sp, sp, 20
ret
```
:::
### Analysis: Linear CLZ vs. Binary Search CLZ
- __Linear CLZ (bit-by-bit scan)__: checks one bit per iteration from the MSB (bit 31) downward, so it can take up to 32 iterations.
- __Binary Search CLZ__: halves the search range each step (16 / 8 / 4 / 2 / 1) and always finishes within 5 steps.

Binary Search CLZ cuts the number of shifts and branches from up to 32 down to 5, a clear advantage on real hardware or in a pipelined core. In Ripes (pure RV32I, no CLZ instruction, no branch prediction), however, the two can look comparable: for inputs whose leading 1 is already near bit 31, the linear scan exits after only a few iterations, and the fixed branch and stack overhead of the binary-search version can make the linear version appear equally fast or even slightly faster.
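The cycle and instruction counts in the table below are measured in Ripes. The following host-side C sketch (hypothetical instrumentation, not part of the assignment code) only illustrates the 32-versus-5 iteration gap between the two approaches:
```c=
#include <stdint.h>
#include <stdio.h>

/* Linear scan: shift left one bit per iteration until the sign bit is set. */
static int clz_linear(uint32_t x, int *steps)
{
    *steps = 0;
    if (!x)
        return 32;
    int n = 0;
    while (!(x & 0x80000000u)) {   /* up to 31 iterations for this sketch */
        x <<= 1;
        n++;
        (*steps)++;
    }
    return n;
}

/* Binary search: halve the remaining width each step (16/8/4/2/1). */
static int clz_binary(uint32_t x, int *steps)
{
    if (!x) {
        *steps = 0;
        return 32;
    }
    int n = 0;
    if (x >> 16) { n += 16; x >>= 16; }
    if (x >> 8)  { n += 8;  x >>= 8;  }
    if (x >> 4)  { n += 4;  x >>= 4;  }
    if (x >> 2)  { n += 2;  x >>= 2;  }
    if (x >> 1)  { n += 1; }
    *steps = 5;                    /* always exactly five width tests */
    return 31 - n;
}

int main(void)
{
    uint32_t samples[] = {1u, 64u, 0x00010000u, 0x80000000u};
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int s1, s2;
        int a = clz_linear(samples[i], &s1);
        int b = clz_binary(samples[i], &s2);
        printf("x=0x%08X  clz=%d/%d  linear steps=%d  binary steps=%d\n",
               samples[i], a, b, s1, s2);
    }
    return 0;
}
```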
### Comparison: Linear CLZ vs. Binary Search CLZ
| Item | Linear CLZ | Binary Search CLZ |
| --- | --- | --- |
| Number of shifts | up to 32 | up to 5 |
| Time complexity | O(32) worst case | O(log₂ 32) = 5 steps |
| Efficiency | slower | faster |
| Result accuracy | same | same |
| Cycles (input = 64) | 235 | 107 |
| Instructions (input = 64) | 157 | 75 |
## References
[Quiz 1](https://hackmd.io/@sysprog/arch2025-quiz1-sol)
[How to Write a Git Commit Message](https://cbea.ms/git-commit/)
[ca2025-quizzes](https://github.com/sysprog21/ca2025-quizzes)
[Ripes](https://github.com/mortbopet/Ripes)
[Hackmd Features](https://hackmd.io/features-tw?view)