Computer Architecture HW1

contributed by < 周姵彣 >

quiz1 - problem B

The problem requires converting a 32-bit floating-point number (IEEE 754 single precision) into the 16-bit bfloat16 format.

Purpose

- Reduce memory usage: bfloat16 uses 16 bits instead of 32.
- Increase computational speed: the smaller data size means less data to move and process.
- Maintain dynamic range: bfloat16 keeps the same 8 exponent bits as float32, so it covers the same numerical range.
- Sacrifice precision: bfloat16 retains only 7 mantissa bits (float32 has 23), so precision is lower, but this is generally sufficient.
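To make the conversion concrete, here is a minimal C sketch. It is not the quiz's reference solution: the function names `fp32_to_bf16` and `bf16_to_fp32` are hypothetical, and the choice of round-to-nearest-even (rather than plain truncation) is an assumption. The idea is that bfloat16 is simply the upper 16 bits of the float32 bit pattern (1 sign bit, 8 exponent bits, 7 mantissa bits), with the discarded lower 16 bits used only to decide rounding.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a float32 value to bfloat16 bits.
 * Assumption: round-to-nearest-even; NaN inputs stay NaN. */
static uint16_t fp32_to_bf16(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);      /* reinterpret the float's bit pattern */

    /* NaN: exponent all ones and a nonzero mantissa.
     * Set a mantissa bit in the upper half so the result is still NaN. */
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
        return (uint16_t)((bits >> 16) | 0x0040u);

    /* Round to nearest even: add 0x7FFF plus the lowest bit that will be kept,
     * then drop the lower 16 bits. */
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);
    bits += rounding;
    return (uint16_t)(bits >> 16);
}

/* Hypothetical helper: widen bfloat16 bits back to float32.
 * This direction is exact: the lower 16 mantissa bits are zero-filled. */
static float bf16_to_fp32(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, `1.0f` has the bit pattern `0x3F800000`, so its bfloat16 encoding is `0x3F80`. A value such as `3.14159265f` (`0x40490FDB`) becomes `0x4049`, which widens back to `3.140625`, illustrating the precision lost by keeping only 7 mantissa bits.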