contributed by <RBing>(Rong-Bing, Fang)
:::danger
Be aware of the headings with Markdown syntax.
:::
Abstract
Problem C of quiz1 defines a set of functions that convert a 16-bit half-precision floating-point number (fp16) to a 32-bit single-precision floating-point number (fp32). The core function, `fp16_to_fp32`, shifts the 16-bit input into the upper half of a 32-bit word, separates the sign bit, and normalizes the mantissa and exponent. It then adjusts for the difference in exponent bias between the two formats (15 for half precision, 127 for single precision). The function also handles the special cases of denormalized numbers, zero, NaN, and infinity by setting the appropriate bits for each one. The helper function `my_clz` counts the leading zeros of the input word so that denormalized numbers can be normalized.
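The steps above can be sketched in C as follows. This is a minimal sketch, not necessarily identical to the quiz's code: the loop-based `my_clz` is an assumed stand-in for the quiz's helper, and the constants follow the common branch-free conversion technique. It also assumes that right-shifting a negative signed integer is an arithmetic shift, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* Assumed stand-in for the quiz's helper: count leading zeros of a
 * 32-bit word, returning 32 when x is 0. */
static inline uint32_t my_clz(uint32_t x)
{
    uint32_t count = 0;
    for (uint32_t mask = UINT32_C(0x80000000);
         mask != 0 && (x & mask) == 0; mask >>= 1)
        count++;
    return count;
}

/* Convert the bit pattern of an fp16 value to the bit pattern of the
 * equivalent fp32 value. */
static inline uint32_t fp16_to_fp32(uint16_t h)
{
    /* Move the half-precision bits into the upper half of the word. */
    const uint32_t w = (uint32_t) h << 16;

    /* Split the sign bit from the exponent and mantissa bits. */
    const uint32_t sign = w & UINT32_C(0x80000000);
    const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);

    /* A normalized input has at most 5 leading zeros in nonsign
     * (1 cleared sign bit + up to 4 zero exponent bits). Anything
     * beyond that is the shift needed to renormalize a denormal. */
    uint32_t renorm_shift = my_clz(nonsign);
    renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0;

    /* 0x7F800000 if the fp16 exponent is all ones (Inf/NaN), else 0.
     * Relies on arithmetic right shift of a negative value. */
    const int32_t inf_nan_mask =
        ((int32_t) (nonsign + UINT32_C(0x04000000)) >> 8) &
        INT32_C(0x7F800000);

    /* All ones if nonsign is 0 (signed zero input), else 0. */
    const int32_t zero_mask = (int32_t) (nonsign - 1) >> 31;

    /* Shift exponent and mantissa into fp32 position, add the bias
     * difference (127 - 15 = 0x70) corrected by the renormalization
     * shift, then apply the special-case masks. */
    return sign |
           ((((nonsign << renorm_shift >> 3) +
              ((0x70 - renorm_shift) << 23)) |
             inf_nan_mask) &
            ~zero_mask);
}
```

With this sketch, `fp16_to_fp32(0x3C00)` (1.0 in fp16) yields `0x3F800000`, `fp16_to_fp32(0x7C00)` (positive infinity) yields `0x7F800000`, and the smallest denormal `fp16_to_fp32(0x0001)` yields `0x33800000`, which is 2^-24 in fp32.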
The following sections walk through each function in Problem C of quiz1 and its corresponding RISC-V assembly.