contributed by <shhung>
This assignment implements the conversion from a 64-bit integer to a 64-bit floating-point data type.
Instead of accumulating shifts through iteration, it uses clz, which is more efficient.
A 64-bit value is represented using `dword`, and `num` represents the test data.
`mask` is used to filter the mantissa of the integer. Although a double-precision floating-point number under the IEEE 754 standard has a 53-bit mantissa (52 bits explicitly stored), in the context of RV32 two registers are used to store the value. Therefore, processing only needs to be done on the upper half of the bits.
`maskclz` is used to implement popCount in `clz`.
itof
clz
The program stores a 64-bit value in two registers, with the upper half in `a0` and the lower half in `a1`. `itof` first checks whether the value exceeds the range that double storage allows, and if it does, the conversion process begins. It starts by determining the shift amount, from which the exponent is calculated, and then aligns and retrieves the mantissa based on that shift amount. Finally, it combines these components to obtain the value in IEEE 754 double format.
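The three steps (find the shift, derive the exponent, align the mantissa) can be sketched in C. This is a minimal illustration, not the assignment's RV32 code: the name `itof_bits` is hypothetical, it assumes a non-negative input, it uses a native 64-bit type instead of a register pair, and it truncates rather than rounds when the input has more than 53 significant bits.

```c
#include <stdint.h>

/* Hypothetical sketch: return the IEEE 754 double bit pattern of a
 * non-negative 64-bit integer. Low bits beyond the 53-bit mantissa
 * are truncated, not rounded (an assumption of this sketch). */
uint64_t itof_bits(uint64_t x)
{
    if (x == 0)
        return 0; /* +0.0 has an all-zero bit pattern */

    /* Step 1: normalize so the leading 1 sits in bit 63.
     * The loop counts leading zeros; a clz routine would replace it. */
    unsigned shift = 0;
    while (!(x & (UINT64_C(1) << 63))) {
        x <<= 1;
        shift++;
    }

    /* Step 2: the leading 1 was originally at bit (63 - shift);
     * add the double-precision bias of 1023. */
    uint64_t exponent = UINT64_C(1023) + 63 - shift;

    /* Step 3: drop the implicit leading 1, keep the next 52 bits. */
    uint64_t mantissa = (x << 1) >> 12;

    return (exponent << 52) | mantissa;
}
```

For example, `itof_bits(17)` yields `0x4031000000000000`, the bit pattern of `17.0`.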
Since two registers are used to store the value, operations such as addition, subtraction, bit shifting, etc., need to take into account the crossing of bits between the two registers.
Addition
For addition, it's relatively straightforward: we only need to check whether there is a carry from the lower half of the registers to the upper half. If the sum of the lower halves is less than either of the operands, a carry is needed.
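The carry check can be written out as a short C sketch. The function name `add64` and its parameter layout are assumptions made for illustration; each 64-bit value is passed as a separate upper and lower 32-bit half, as it would sit in a register pair.

```c
#include <stdint.h>

/* Hypothetical sketch: add two 64-bit values held as 32-bit halves. */
void add64(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo,
           uint32_t *r_hi, uint32_t *r_lo)
{
    uint32_t lo = a_lo + b_lo;   /* wraps modulo 2^32 on overflow */
    uint32_t carry = lo < a_lo;  /* 1 if the low halves overflowed */

    *r_lo = lo;
    *r_hi = a_hi + b_hi + carry; /* propagate the carry upward */
}
```

Comparing the sum against one operand is enough: an unsigned sum that wrapped around is smaller than both operands.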
Subtraction
For subtraction, in this assignment, the minuend is guaranteed to be greater than the subtrahend. Therefore, it's only necessary to check whether borrowing is needed from the upper half of the registers to the lower half. If borrowing is needed, then swap the minuend and subtrahend, and subtract the difference with the borrow.
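For comparison, here is a minimal C sketch of two-register subtraction. Note that it uses the conventional borrow propagation (subtract the halves, then subtract the borrow from the upper half) rather than the swap-based variant described above; the name `sub64` and the parameter layout are assumptions, and it relies on the same guarantee that the minuend is not smaller than the subtrahend.

```c
#include <stdint.h>

/* Hypothetical sketch: compute a - b on 32-bit halves, assuming a >= b. */
void sub64(uint32_t a_hi, uint32_t a_lo, uint32_t b_hi, uint32_t b_lo,
           uint32_t *r_hi, uint32_t *r_lo)
{
    uint32_t borrow = a_lo < b_lo; /* 1 if the low half must borrow */

    *r_lo = a_lo - b_lo;           /* wraps modulo 2^32 when borrowing */
    *r_hi = a_hi - b_hi - borrow;  /* take the borrow from the upper half */
}
```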
Shifting
Shifting, whether left or right, will always encounter cases where the upper half crosses over to the lower half of the registers, or vice versa.
In the case of the left shift used in this assignment, if the shift is less than 32, the top bits of the lower half move into the lower bits of the upper half of the registers. Therefore, the lower half is shifted in the reverse direction to the correct position, and then the OR operation is used to set those bits in the upper half of the registers.
When the shift is greater than or equal to 32, the bits of the lower half move into the higher bits of the upper half of the registers. In this case, the shift direction matches the original operation, but 32 must be subtracted from the shift amount to obtain the correct bits. These corrected bits are then directly assigned to the upper half of the registers, and the lower half is set to zero.
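Both left-shift cases can be sketched in C as follows. The name `shl64` is hypothetical, and the shift amount is assumed to be in the range 0 to 63. A shift of zero is handled separately because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Hypothetical sketch: shift the pair hi:lo left by n, with 0 <= n <= 63. */
void shl64(uint32_t hi, uint32_t lo, unsigned n,
           uint32_t *r_hi, uint32_t *r_lo)
{
    if (n == 0) {
        *r_hi = hi;
        *r_lo = lo;
    } else if (n < 32) {
        /* Reverse-shift the low half to find the bits that cross over,
         * then OR them into the low bits of the upper half. */
        *r_hi = (hi << n) | (lo >> (32 - n));
        *r_lo = lo << n;
    } else {
        /* Everything left comes from the low half, shifted by n - 32. */
        *r_hi = lo << (n - 32);
        *r_lo = 0;
    }
}
```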
Since Ripes doesn't provide output for 64-bit integers and 64-bit floating-point numbers, it's necessary to inspect the values in registers to verify the results.
The results are stored in two registers, with `a0` holding the upper half and `a1` holding the lower half.
Here are two tools for quick format conversion. The Hexadecimal to Decimal converter is used to obtain the decimal representation, while FractionConvert provides the IEEE 754 standard representation.
To have a clear comparison of results, I've stored the input values in the `s0` and `s1` registers.
Comparing the execution results of different test cases, it can be observed that for larger values, the performance of conversion based on clz is inferior to the naive approach. However, for smaller values, it outperforms the naive conversion, and the performance difference becomes more pronounced as the values decrease.
The main difference is that the clz method obtains the shift amount in a fixed number of cycles, while the naive approach calculates it iteratively, so the number of iterations increases as the values decrease.
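The contrast between the two approaches can be illustrated in C. This is a sketch under assumptions, not the assignment's code: the function names are hypothetical, the popcount uses the widely known mask-and-add technique, and a native 64-bit argument stands in for the register pair.

```c
#include <stdint.h>

/* Population count of a 32-bit word using mask-and-add steps. */
static uint32_t popcount32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    x += x >> 8;
    x += x >> 16;
    return x & 0x3Fu;
}

/* Fixed-cost clz: smear the leading 1 into every lower bit, then
 * count the set bits. The work is the same for every input. */
static uint32_t clz32(uint32_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return 32u - popcount32(x);
}

/* 64-bit clz built from the two 32-bit halves. */
uint32_t clz64(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;
    return hi ? clz32(hi) : 32u + clz32(lo);
}

/* Naive clz: one loop iteration per leading zero, so smaller values
 * (more leading zeros) take more iterations. */
uint32_t clz64_naive(uint64_t x)
{
    uint32_t n = 0;
    while (n < 64 && !(x & (UINT64_C(1) << 63))) {
        x <<= 1;
        n++;
    }
    return n;
}
```

With the three test inputs, the naive loop runs 24 times for `0xBBFFFFFFFF`, 48 times for `0x84f2`, and 59 times for `0x11`, while `clz64` does the same fixed amount of work in each case.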
0xBBFFFFFFFF
| | itof_clz | itof |
|---|---|---|
| Cycles | 234 | 156 |
| Instrs. retired | 182 | 111 |
| CPI | 1.29 | 1.41 |
| IPC | 0.774 | 0.712 |
0x84f2
| | itof_clz | itof |
|---|---|---|
| Cycles | 234 | 342 |
| Instrs. retired | 181 | 253 |
| CPI | 1.29 | 1.35 |
| IPC | 0.774 | 0.74 |
0x11
| | itof_clz | itof |
|---|---|---|
| Cycles | 234 | 430 |
| Instrs. retired | 181 | 319 |
| CPI | 1.29 | 1.35 |
| IPC | 0.774 | 0.742 |
Thanks to population count, we were able to achieve constant time complexity for CLZ.
In this assignment, we leveraged this optimization to improve the conversion from integers to floating-point numbers.