rbing

@rbing

Joined on Mar 4, 2024

  • contributed by <RBing> (Rong-Bing, Fang) :::danger Be aware of the headings with Markdown syntax. ::: Abstract: Problem C in Quiz 1 defines a set of functions that convert a 16-bit half-precision floating-point number (fp16) to a 32-bit single-precision floating-point number (fp32). The core function, fp16_to_fp32, shifts the 16-bit input into the upper half of a 32-bit word, separates the sign bit, and normalizes the mantissa and exponent. It then adjusts for the difference in exponent bias between the half-precision format (bias 15) and the single-precision format (bias 127). The function handles the special cases of denormalized numbers, zero, NaN, and infinity by setting the correct bits for each. The helper function my_clz counts leading zeros so that denormalized inputs can be normalized. The following sections illustrate each function in Problem C of Quiz 1 and its corresponding RISC-V assembly.
  • WSL, VS Code, and OpenCV
     0. Setup
     a. Download the compiler and build tools
     $ sudo apt-get install -y g++
     $ sudo apt-get install -y cmake
     $ sudo apt-get install -y make
     $ sudo apt-get install -y wget
     $ sudo apt-get install -y unzip
     $ sudo apt-get install -y git
     b. Install various dependent libraries
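The fp16-to-fp32 conversion summarized in the first note can be sketched in C roughly as follows. This is a minimal sketch reconstructed from the abstract, not the quiz's actual source: the names fp16_to_fp32 and my_clz come from the note, while the return type (raw fp32 bits as uint32_t), the local variable names, and the use of plain comparisons for the special cases are assumptions made here for readability.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit word; returns 32 for x == 0.
   (Loop-based sketch; the quiz's version may differ.) */
static inline int my_clz(uint32_t x)
{
    int count = 0;
    for (int i = 31; i >= 0; --i) {
        if (x & (UINT32_C(1) << i))
            break;
        count++;
    }
    return count;
}

/* Convert fp16 bits to fp32 bits (assumed signature). */
static inline uint32_t fp16_to_fp32(uint16_t h)
{
    /* Move the 16-bit input into the upper half of a 32-bit word. */
    const uint32_t w = (uint32_t) h << 16;
    const uint32_t sign = w & UINT32_C(0x80000000);
    const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);

    /* A normal fp16 value has at most 5 leading zeros in nonsign
       (1 sign position + up to 4 zero exponent bits); anything beyond
       that means the input is denormalized and must be shifted left. */
    uint32_t renorm_shift = (uint32_t) my_clz(nonsign);
    renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0;

    /* Infinity and NaN: fp16 exponent is all ones. */
    const uint32_t inf_nan_mask =
        nonsign >= UINT32_C(0x7C000000) ? UINT32_C(0x7F800000) : 0;
    /* Zero: keep only the sign bit. */
    const uint32_t zero_mask = nonsign == 0 ? UINT32_C(0xFFFFFFFF) : 0;

    /* Shift exponent and mantissa into fp32 position, then add the
       bias difference 127 - 15 = 0x70, reduced by the renormalization. */
    const uint32_t magnitude =
        ((nonsign << renorm_shift) >> 3) + ((0x70 - renorm_shift) << 23);

    return sign | ((magnitude | inf_nan_mask) & ~zero_mask);
}
```

For example, fp16_to_fp32(0x3C00) yields 0x3F800000 (1.0), and the smallest denormalized input 0x0001 yields 0x33800000 (2^-24). The comparisons used for the masks trade the branchless style for portability, since they avoid relying on arithmetic right shifts of signed values.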