contributed by < 周姵彣 >
quiz1 - problem B
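The problem requires converting a 32-bit floating-point number (IEEE 754 single precision, i.e. float) into the 16-bit bfloat16 format. bfloat16 keeps the sign bit and all 8 exponent bits of a float and shortens the mantissa from 23 bits to 7 bits, so a bfloat16 value is essentially the upper 16 bits of the corresponding float32 bit pattern.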
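A minimal C sketch of the conversion, assuming round-to-nearest-even (the common choice, instead of plain truncation) and assuming NaN inputs must stay NaN. The function name fp32_to_bf16 and the use of a raw uint16_t for the result are illustrative choices, not necessarily the quiz's reference solution.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a float32 to bfloat16 bits,
   rounding to nearest with ties to even. */
static uint16_t fp32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* reinterpret the float's bits without UB */

    /* NaN: exponent all ones and a nonzero mantissa. Plain truncation could
       drop every mantissa bit and turn the NaN into infinity, so keep the
       high half and force the quiet bit (bit 6 of the bf16 mantissa). */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Adding 0x7FFF plus the lowest kept bit carries into bit 16 exactly when
       the discarded low half is above 0x8000, or equal to 0x8000 while the
       kept part is odd (ties to even). Finite values that round past the
       largest finite bf16 become infinity through the same carry. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7FFFu + lsb;
    return (uint16_t)(bits >> 16);
}
```

For example, 1.0f maps to 0x3F80, and pi (float bits 0x40490FDB) maps to 0x4049 because the discarded half 0x0FDB is below the halfway point 0x8000.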
Purpose
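Reduce memory usage: bfloat16 uses 16 bits instead of 32 bits, halving the storage needed per value.
Increase computational speed: the smaller data size reduces memory bandwidth and lets more values fit in caches and vector registers, which can speed up processing.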
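Maintain dynamic range: bfloat16 keeps the same 8 exponent bits as float32, so it covers roughly the same numerical range.
Sacrifice precision: bfloat16 keeps only 7 explicit mantissa bits (versus 23 in float32), giving about 2 to 3 significant decimal digits. This is much lower precision, but it is generally sufficient for workloads such as deep learning.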
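To illustrate the range and precision trade-off, the sketch below decodes bfloat16 bit patterns back to float. The reverse direction is exact: it only places the 16 bits in the top half of a float32 word. The helper name bf16_to_fp32 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: widen bfloat16 bits to a float32.
   The low 16 mantissa bits of the result are zero, so no rounding occurs. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret the bits as a float */
    return f;
}
```

For example, bf16_to_fp32(0x7F7F), the largest finite bfloat16, is about 3.39e38, close to float32's maximum of about 3.40e38, and bf16_to_fp32(0x0080) equals FLT_MIN (about 1.18e-38), so the range is preserved. In contrast, bf16_to_fp32(0x3F81), the next value after 1.0, is 1.0078125: near 1.0 the step size is 2^-7, and pi can only be stored as 3.140625 (0x4049).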