contributed by < ranvd >
The sine function holds significant importance in both mathematics and computer graphics. In this project, the sine and cosine functions are employed to determine the phase at which lines should be drawn.
Since the sine function is defined as the ratio of the side opposite an angle to the hypotenuse, its value always sits in $[-1, 1]$. We can generalize the sine value into a function like the image below.
According to Taylor's theorem, a smooth function such as sine can be approximated by its Taylor series. So, in computer science, we can use this technique to compute the sine value after fixing the domain to a bounded interval. From the Taylor expansion, we know that

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!}\,x^{2n+1}$$
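As a minimal sketch of the math only (using plain C floats rather than the project's custom `fmul32`/`fadd32` routines; the cutoff at six terms is an illustrative choice, not the project's):

```c
// Hedged sketch: truncated Taylor series for sine in native floats.
// taylor_sin is a hypothetical name, not the project's function.
float taylor_sin(float x) {
    float term = x;   // current term: x^(2i+1) / (2i+1)!
    float sum  = x;
    for (int i = 1; i <= 5; i++) {
        // each term is the previous one times -x^2 / ((2i)(2i+1))
        term *= -(x * x) / (float)((2 * i) * (2 * i + 1));
        sum  += term;
    }
    return sum;
}
```

Because consecutive terms are related by a simple ratio, no factorial or power function is needed, which is why the series is attractive on hardware that only has multiply and add.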
### fmul32
Since we do not support NaN, Inf, or denormalized numbers as input, we can still provide a "ready to use" floating-point multiplication. We can return zero early by checking whether either `a` or `b` equals zero.
Furthermore, fundamental overflow detection can be carried out by examining the MSB of the multiplication result and checking that it matches the sign implied by the operands. Consequently, the expression `r ^ ia ^ ib` can be used: if an overflow occurs, the variable `ovfl` is set to -1; otherwise, it is set to 0.
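A hedged sketch of this check on a signed 32-bit multiply (names are illustrative; note this basic test only catches overflows that flip the result's sign bit, not every possible overflow):

```c
#include <stdint.h>

// Hedged sketch: the product's sign should equal sign(ia) XOR sign(ib).
// If it does not, (r ^ ia ^ ib) has its MSB set, and ovfl becomes -1.
int32_t mul_ovfl(int32_t ia, int32_t ib, int32_t *out) {
    int32_t r = (int32_t)((uint32_t)ia * (uint32_t)ib); // wrapping multiply
    *out = r;
    // shift the MSB of (r ^ ia ^ ib) down and negate: 0 = ok, -1 = overflow
    return -(int32_t)(((uint32_t)(r ^ ia ^ ib)) >> 31);
}
```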
### imul32
In floating-point multiplication, only the upper 25 bits are necessary when performing mantissa multiplication. To accommodate this, I have renamed the function from `imul32` to `imul24` in the following code. Instead of shifting the addend to the right, the approach shifts the result left, ensuring the upper bits are consistently preserved.
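A minimal C sketch of this shift-left scheme (names are illustrative, not the project's exact code):

```c
#include <stdint.h>

// Hedged sketch: shift-add multiply that walks b's 24 bits from the MSB
// down, shifting the *result* left each step instead of shifting the
// addend right, so high-order bits accumulate without being lost.
uint64_t imul24(uint32_t a, uint32_t b) {
    uint64_t r = 0;
    for (int i = 23; i >= 0; i--) {
        r <<= 1;                 // make room for the next partial product
        if (b & (1u << i))       // bit i of b set: add a into the result
            r += a;
    }
    return r;
}
```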
Since the input argument `b` is always 24 bits long, I can easily apply the loop-unrolling technique as below to reduce the execution cycles.
### fdiv32
In the `fdiv32` function, NaN, Inf, and denormalized numbers are not allowed, just as in `fmul32`. I still provide basic floating-point division using the techniques mentioned above.
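A hedged sketch of such a division (no NaN/Inf/denormal handling, truncating mantissa division; variable names are illustrative, not the project's exact code):

```c
#include <stdint.h>
#include <string.h>

// Hedged sketch of fdiv32: subtract exponents, divide mantissas with
// extra fraction bits, renormalize, and truncate (round down).
float fdiv32(float a, float b) {
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    uint32_t sign = (ia ^ ib) & 0x80000000u;
    if ((ia & 0x7fffffffu) == 0)     // 0 / b = 0 (early return, as in fmul32)
        return a;

    int32_t ea = (ia >> 23) & 0xff;
    int32_t eb = (ib >> 23) & 0xff;
    uint64_t ma = (ia & 0x7fffffu) | 0x800000u;  // restore implicit leading 1
    uint64_t mb = (ib & 0x7fffffu) | 0x800000u;

    // divide with 24 extra fraction bits; quotient lies in [2^23, 2^25)
    uint64_t q = (ma << 24) / mb;
    int32_t exp = ea - eb + 127;
    if (q & (1ull << 24))
        q >>= 1;      // ma >= mb: quotient already in [1, 2)
    else
        exp -= 1;     // ma <  mb: quotient in [0.5, 1), renormalize

    uint32_t out = sign | ((uint32_t)exp << 23) | ((uint32_t)q & 0x7fffffu);
    float r;
    memcpy(&r, &out, sizeof r);
    return r;
}
```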
### idiv24
`idiv24` can handle 32-bit integer division. The dividend and quotient are updated by shifting 1 bit left; if the shifted dividend is greater than or equal to the divisor, the LSB of the quotient should be 1. This is basically the same as short division.
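A hedged sketch of this short-division loop (names are illustrative; the remainder register takes one dividend bit per step):

```c
#include <stdint.h>

// Hedged sketch of restoring (short) division: shift one dividend bit
// into the remainder each step; if the divisor fits, subtract it and
// set the quotient's LSB.
uint32_t idiv24(uint32_t n, uint32_t d, uint32_t *rem) {
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);  // shift next dividend bit in
        q <<= 1;
        if (r >= d) {                   // divisor fits: subtract, set LSB
            r -= d;
            q |= 1;
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```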
In the above code, we can see that `a` always needs to be shifted left; the only difference is whether `b` should be subtracted from `a` or not. Going the other way, we can use `-(a < 0)` to determine whether `b` should be added back to `a` or not, which is called restoring division. Also, since the loop count is fixed, the loop-unrolling technique fits very well.
The loop-unrolling example is omitted here (it can be produced by simple copy-and-paste).
### fadd32
In floating-point addition, it is essential to align the smaller value with the greater value. In the straightforward approach, the comparison typically checks the exponent first and then the mantissa. However, by leveraging the IEEE 754 format, this comparison can be performed in a single step, which literally means comparing the bit patterns directly, as `cmp_a` and `cmp_b` show.
After aligning the values, it is necessary to check whether their sign bits are equal. If the sign bits differ, subtraction should be applied instead of addition.
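The single-comparison idea can be sketched as follows (a hedged illustration of the same trick `cmp_a`/`cmp_b` rely on; `mag_less` is a hypothetical name): because IEEE 754 stores the exponent in higher bits than the mantissa, one integer comparison on the sign-masked bit patterns orders finite floats by magnitude.

```c
#include <stdint.h>
#include <string.h>

// Hedged sketch: compare the magnitudes of two finite floats with a
// single integer comparison on their bit patterns (sign bit masked off).
int mag_less(float a, float b) {
    uint32_t ia, ib;
    memcpy(&ia, &a, sizeof ia);   // well-defined type punning in C
    memcpy(&ib, &b, sizeof ib);
    return (ia & 0x7fffffffu) < (ib & 0x7fffffffu);
}
```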
### f2i32 & i2f32
The functions `i2f32` and `f2i32` convert values between floating-point and integer representations; the "2" means "to", so `i2f` means "int to float".
### Sin
The sine value can be obtained using the Taylor series. The following code is the corresponding Taylor-series implementation for sine calculation.
With the `-O0` option:

With the `-O2` option:

With the `-O3` option:
### fmul32 & imul24
By directly incorporating the logic of `imul24` within `fmul32`, we eliminate the need for an additional function call, which improves performance and reduces store/load overhead. This inlining technique is often employed to enhance the efficiency of code execution.
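Hand-embedding the loop is manual inlining; in C source the same effect can often be requested from the compiler with `static inline` instead (a hedged sketch with illustrative names, not the project's code):

```c
#include <stdint.h>

// Hedged sketch: a `static inline` helper invites the compiler to paste
// the body at each call site, like manually embedding the shift-add
// loop inside fmul32, so no call/return or stack traffic remains.
static inline uint64_t mul_step(uint64_t r, uint32_t a, uint32_t b, int i) {
    r <<= 1;
    if (b & (1u << i))
        r += a;
    return r;
}

uint64_t mant_mul(uint32_t a, uint32_t b) {
    uint64_t r = 0;
    for (int i = 23; i >= 0; i--)
        r = mul_step(r, a, b, i);   // expanded in place, no function call
    return r;
}
```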
### fdiv32 & idiv24
The same idea applies to `fdiv32` and `idiv24`. We embedded `idiv24` into `fdiv32` to prevent additional load/store overhead.
### fadd32
Upon analysis, it is clear that the CPI and the number of executed instructions (Instrs. Retired) fall somewhere between the values achieved with the `-O2` and `-O0` options in the GCC compiler. This indicates there is ample room for improvement.
In the selected Ripes CPU architecture, there is no branch-prediction mechanism, and branch addresses are computed in the 3rd stage. Consequently, a branch miss incurs a penalty of 2 cycles, which is notably high in this case.
Given this scenario, loop unrolling appears to be an ideal optimization technique. I have applied the unrolling method to the `imul24` and `idiv24` functions, both of which are used extensively in this assignment.
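A hedged sketch of what the unrolled multiply might look like (illustrative names, written with a macro for brevity; each `STEP` expands inline, so the loop counter, the compare, and the backward branch that would cost 2 cycles per miss all disappear):

```c
#include <stdint.h>

// Hedged sketch: fully unrolled 24-step shift-add multiply.
#define STEP(i) do { r <<= 1; if (b & (1u << (i))) r += a; } while (0)

uint64_t imul24_unrolled(uint32_t a, uint32_t b) {
    uint64_t r = 0;
    STEP(23); STEP(22); STEP(21); STEP(20); STEP(19); STEP(18);
    STEP(17); STEP(16); STEP(15); STEP(14); STEP(13); STEP(12);
    STEP(11); STEP(10); STEP(9);  STEP(8);  STEP(7);  STEP(6);
    STEP(5);  STEP(4);  STEP(3);  STEP(2);  STEP(1);  STEP(0);
    return r;
}
```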
Upon analysis, it is apparent that the CPI is lower than at the `-O3` optimization level, yet the number of executed instructions (Instrs. Retired) is higher: we still execute about 2000 more instructions in this implementation. This implies there is still room for optimizing our assembly code; some redundant code can likely be eliminated for better efficiency.
The values exhibit slight differences when compared to `sinf` in the C math library. The primary deviation arises from the rounding strategy employed: our implementation consistently rounds down (truncates), whereas IEEE 754 defaults to "round to nearest even".
However, these discrepancies are acceptable, since single-precision floating point only guarantees accuracy to about 6 to 7 significant digits.
| degree | mySin | sinf (math.h) |
|---|---|---|
| 45.0 | 0.7071068287 | 0.7071067691 |
| 20.0 | 0.3420201540 | 0.3420201242 |
| 90.0 | 1.0000035763 | 1.0000000000 |