---
tags: jserv, 2023-arch, RISC-V
---
# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by [freshLiver](https://gist.github.com/freshLiver/1b2300a91d466a7f2cc0a78b53fa5075)
## Quiz 1 Problem C
According to <https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc#integer-calling-convention>:
> Scalars that are 2×XLEN bits wide are passed in a pair of argument registers, with the low-order XLEN bits in the lower-numbered register and the high-order XLEN bits in the higher-numbered register. If no argument registers are available, the scalar is passed on the stack by value.
### Mask the trailing 1s
The function `mask_lowest_zero()` takes a 64-bit integer as input. To make its idea easier to follow, we first look at an 8-bit version:
```c
uint8_t mask_lowest_zero(uint8_t x)
{
uint8_t mask = x;
mask &= (mask << 1) | 0x1;
mask &= (mask << 2) | 0x3;
mask &= (mask << 4) | 0xF;
return mask;
}
```
And the 8-bit input value `x`, i.e. the initial value of `mask`, can be written as:
$$
mask = (b_7, b_6, ..., b_1, b_0), \quad b_i \in \{0,1\}
$$
Then, the expression `mask &= (mask << 1) | 0x1` (`mask & ((mask << 1) | 0x1)`) can be viewed as:
```text
b7 b6 b5 b4 b3 b2 b1 b0
&) b6 b5 b4 b3 b2 b1 b0 1
```
And the result (the new `mask`) could be represented as:
$$
mask = (\Pi^{7}_{i=6}b_i, \Pi^{6}_{i=5}b_i, ..., \Pi^{1}_{i=0}b_i, \Pi^{0}_{i=0}b_i)
$$
Similarly, the next expression `mask &= (mask << 2) | 0x3` (`mask & ((mask << 2) | 0x3)`) could be viewed as:
```text
b7&b6 b6&b5 b5&b4 b4&b3 b3&b2 b2&b1 b1&b0 b0&1
&) b5&b4 b4&b3 b3&b2 b2&b1 b1&b0 b0&1 1 1
```
And its result can be represented as:
$$
mask = (\Pi^{7}_{i=4}b_i, \Pi^{6}_{i=3}b_i, \Pi^{5}_{i=2}b_i, \Pi^{4}_{i=1}b_i, \Pi^{3}_{i=0}b_i, \Pi^{2}_{i=0}b_i, \Pi^{1}_{i=0}b_i, \Pi^{0}_{i=0}b_i)
$$
Then, the next operation `mask &= (mask << 4) | 0xF` (`mask & ((mask << 4) | 0xF)`) is:
```text
b7&b6&b5&b4 b6&b5&b4&b3 b5&b4&b3&b2 b4&b3&b2&b1 b3&b2&b1&b0 b2&b1&b0&1 b1&b0&1 b0&1&1
&) b3&b2&b1&b0 b2&b1&b0&1 b1&b0&1 b0&1&1 1 1 1 1
```
So, the result of the 8-bit version is:
$$
mask = (\Pi^{7}_{i=0}b_i, \Pi^{6}_{i=0}b_i, \Pi^{5}_{i=0}b_i, \Pi^{4}_{i=0}b_i, \Pi^{3}_{i=0}b_i, \Pi^{2}_{i=0}b_i, \Pi^{1}_{i=0}b_i, \Pi^{0}_{i=0}b_i)
$$
Now, back to the 64-bit version of `mask_lowest_zero`, the output is:
$$
mask = (b'_{63}, b'_{62}, ..., b'_1, b'_0)
= (\Pi^{63}_{i=0}b_i, \Pi^{62}_{i=0}b_i, ..., \Pi^{1}_{i=0}b_i, \Pi^{0}_{i=0}b_i)
$$
For each bit $b'_i$ of the result, $b'_i$ is 1 only if all of the bits $b_0, ..., b_i$ are 1.
In other words, this function returns a mask that covers all the trailing 1s of the given `x` ($b'_0, ..., b'_{k-1}$, where $b_k$ is the lowest zero bit).
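The whole derivation can be checked directly by running the 8-bit version (the same code as above, made self-contained):

```c
#include <stdint.h>

/* 8-bit version: returns a mask covering the trailing 1s of x */
uint8_t mask_lowest_zero(uint8_t x)
{
    uint8_t mask = x;
    mask &= (mask << 1) | 0x1;
    mask &= (mask << 2) | 0x3;
    mask &= (mask << 4) | 0xF;
    return mask;
}
```

For example, `x = 0b10110111` has its first zero at bit 3, so the function returns `0b00000111`; for `x = 0b00000110` (bit 0 is already 0) it returns `0`.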
#### Implement `mask_lowest_zero` with rv32i
```asm
mask_lowest_zero:
# prologue (1 64 bits arg, no func call)
# uint64_t mask = x
mv t0, a0 # lower
mv t1, a1 # upper
li t5, 0xFFFF
li t6, 0xFFFFFFFF
# mask &= (mask << 1) | 0x1
srli t2, t0, 31 # move MSB to LSB (other bits are 0)
slli t3, t0, 1
slli t4, t1, 1
or t4, t4, t2 # apply MSB of lower part
ori t3, t3, 0x1
and t0, t0, t3
and t1, t1, t4
# mask &= (mask << 2) | 0x3
srli t2, t0, 30
slli t3, t0, 2
slli t4, t1, 2
or t4, t4, t2
ori t3, t3, 0x3
and t0, t0, t3
and t1, t1, t4
# mask &= (mask << 4) | 0xF
srli t2, t0, 28
slli t3, t0, 4
slli t4, t1, 4
or t4, t4, t2
ori t3, t3, 0xF
and t0, t0, t3
and t1, t1, t4
# mask &= (mask << 8) | 0xFF
srli t2, t0, 24
slli t3, t0, 8
slli t4, t1, 8
or t4, t4, t2
ori t3, t3, 0xFF
and t0, t0, t3
and t1, t1, t4
# mask &= (mask << 16) | 0xFFFF
srli t2, t0, 16
slli t3, t0, 16
slli t4, t1, 16
or t4, t4, t2
or t3, t3, t5
and t0, t0, t3
and t1, t1, t4
# mask &= (mask << 32) | 0xFFFFFFFF
li t3, 0
mv t4, t1
or t3, t3, t6
and a0, t0, t3
and a1, t1, t4
# epilogue (return mask)
ret
```
To test this function, add the following code:
```asm
.data
str_true: .string "assertion passed!"
str_false: .string "assertion failed..."
.text
main:
li a0, 0xFFFFFFFF # input lower
li a1, 0xFFFFFFFF # input upper
jal mask_lowest_zero
li a2, 0xFFFFFFFF # expected lower
li a3, 0xFFFFFFFF # expected upper
jal assert_result
li a7, 93
ecall
assert_result:
mv t0, a0 # actual result lower
mv t1, a1 # actual result upper
mv t2, a2 # expected result lower
mv t3, a3 # expected result upper
bne t0, t2, assert_result_false # not expected lower
bne t1, t3, assert_result_false # not expected upper
assert_result_true:
li a0, 1 # file descriptor
la a1, str_true # address of string
li a2, 17 # length of string
li a7, 64 # syscall number for write
ecall
li a0, 0
ret
assert_result_false:
li a0, 1 # file descriptor
la a1, str_false # address of string
li a2, 19 # length of string
li a7, 64 # syscall number for write
ecall
li a0, 1
ret
```
### Increment an integer with bitwise operations only
```c
int64_t inc(int64_t x)
{
if (~x == 0)
return 0;
/* TODO: Carry flag */
int64_t mask = mask_lowest_zero(x);
int64_t z1 = mask ^ ((mask << 1) | 1);
return (x & ~mask) | z1;
}
```
Regardless of whether the input value is positive or negative, incrementing it just sets the lowest zero bit ($b_k$) and clears all the bits below it ($b_{[0,k)}$).
1. Set the lowest zero bit $b_k$
Since `mask_lowest_zero()` only masks the bits below $b_k$, the expression `mask ^ ((mask << 1) | 1)` masks exactly the lowest zero bit $b_k$.
2. Clear all the bits below $b_k$
All the bits below $b_k$ can be cleared by ANDing with `~mask`.
:::warning
**Is `~x == 0` redundant ?**
This function returns right away when `~x == 0` (i.e. `x == 0xFFFFFFFFFFFFFFFF`, the 64-bit `-1`).
However, when the value of `x` is `-1`, `mask` and `z1` evaluate to `0xFFFFFFFFFFFFFFFF` and `0`, respectively, so `(x & ~mask) | z1` becomes `(-1 & 0) | 0 = 0`, which is the correct incremented value. Therefore, the branch condition should be redundant.
:::
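To sanity-check the two steps (and the redundancy claim in the note above), here is a minimal C sketch of the 64-bit `inc` built on a written-out 64-bit `mask_lowest_zero64` (the name, and the use of unsigned arithmetic internally to avoid shifting a negative value, are choices made here, not from the original):

```c
#include <stdint.h>

/* 64-bit version of mask_lowest_zero, written out in full */
uint64_t mask_lowest_zero64(uint64_t x)
{
    uint64_t mask = x;
    mask &= (mask << 1)  | 0x1;
    mask &= (mask << 2)  | 0x3;
    mask &= (mask << 4)  | 0xF;
    mask &= (mask << 8)  | 0xFF;
    mask &= (mask << 16) | 0xFFFF;
    mask &= (mask << 32) | 0xFFFFFFFF;
    return mask;
}

/* increment with bitwise operations; the ~x == 0 early return is
 * deliberately dropped, matching the argument that it is redundant */
int64_t inc(int64_t x)
{
    uint64_t mask = mask_lowest_zero64((uint64_t) x);
    uint64_t z1 = mask ^ ((mask << 1) | 1);       /* only the lowest zero bit */
    return (int64_t) (((uint64_t) x & ~mask) | z1); /* set it, clear bits below */
}
```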
#### Implement `inc` with rv32i
```asm
inc:
# prologue
addi sp, sp, -12
sw s0, -8(sp)
sw s1, -4(sp)
sw ra, 0(sp)
mv s0, a0
mv s1, a1
# int64_t mask = mask_lowest_zero(x);
jal mask_lowest_zero
# int64_t z1 = mask ^ ((mask << 1) | 1);
slli t0, a0, 1 # a0 is mask's lower
slli t1, a1, 1 # a1 is mask's upper
srli t2, a0, 31
or t1, t1, t2
ori t0, t0, 1
xor t0, a0, t0 # t0 is z1's lower
xor t1, a1, t1 # t1 is z1's upper
# return (x & ~mask) | z1;
li t2, 0xFFFFFFFF
xor t3, a0, t2 # t3 is ~mask's lower
xor t4, a1, t2 # t4 is ~mask's upper
and t3, t3, s0
and t4, t4, s1
or a0, t0, t3
or a1, t1, t4
# epilogue:
lw ra, 0(sp)
lw s1, -4(sp)
lw s0, -8(sp)
addi sp, sp, 12
ret
```
To verify this, replace the main function with the following code:
```asm
main:
li a0, 0xFFFFFFFF # input lower
li a1, 0xFFFFFFFF # input upper
jal inc
li a2, 0 # expected lower
li a3, 0 # expected upper
jal assert_result
li a7, 93
ecall
```
### Get n-th Bit
```c
static inline int64_t getbit(int64_t value, int n)
{
return (value >> n) & 1;
}
```
#### Implement `getbit` with rv32i
```asm
getbit:
addi t0, a2, -32
bltz t0, getbit_lower
getbit_upper: # target at upper part
srl t1, a1, t0
andi a0, t1, 1
li a1, 0
ret
getbit_lower: # target at lower part
srl t1, a0, a2
andi a0, t1, 1
li a1, 0
ret
```
If the given `n` is less than 32, the target bit lies in the lower word, so we shift the lower 32 bits by `n`; otherwise we shift the upper 32 bits by `n - 32`.
`a1` is always set to zero, since the result can only be 0 or 1.
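The same split can be written in C over the two 32-bit halves, mirroring how the 64-bit value arrives in `a0`/`a1` (the helper name `getbit_halves` is made up for illustration):

```c
#include <stdint.h>

/* Illustrative helper mirroring the rv32i getbit: the 64-bit value is
 * split into two 32-bit halves, as it is across a0/a1. */
uint32_t getbit_halves(uint32_t lower, uint32_t upper, int n)
{
    if (n < 32)
        return (lower >> n) & 1;    /* bit lives in the lower word */
    return (upper >> (n - 32)) & 1; /* bit lives in the upper word */
}
```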
To verify the implementation, replace the main function with the following code:
```asm
main:
li a0, 0x7FFFFFFF # input lower
li a1, 0x80000000 # input upper
li a2, 63 # input n
jal getbit
li a2, 1 # expected lower
li a3, 0 # expected upper
jal assert_result
li a7, 93
ecall
```
### Int32 Multiplication
```c
int64_t imul32(int32_t a, int32_t b)
{
int64_t r = 0, a64 = (int64_t) a, b64 = (int64_t) b;
for (int i = 0; i < 32; i++) {
if (getbit(b64, i))
r += a64 << i;
}
return r;
}
```
This function performs long multiplication: it loops over the bits of the multiplier (`b64`) from LSB to MSB, and for each set bit it adds the correspondingly shifted multiplicand (`a64 << i`) to the result.
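On rv32i the 64-bit accumulation `r += a64 << i` has to be carried out on two 32-bit words, detecting the carry out of the low word the way `sltu` does in the assembly below. A C sketch of that add-with-carry step (the helper name `add64` is illustrative):

```c
#include <stdint.h>

/* Illustrative add-with-carry over two 32-bit words:
 * (hi:lo) += (addhi:addlo), detecting the low-word carry the same way
 * the assembly does with sltu. */
void add64(uint32_t *lo, uint32_t *hi, uint32_t addlo, uint32_t addhi)
{
    uint32_t old = *lo;
    *lo += addlo;
    *hi += addhi + (*lo < old); /* carry is set iff the low word wrapped */
}
```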
#### Implement `imul32` with rv32i
```asm
imul32:
# prologue
addi sp, sp, -40
sw s0, -36(sp) # s0 for a
sw s1, -32(sp) # s1 for b
sw s2, -28(sp) # s2 for lower r
sw s3, -24(sp) # s3 for upper r
sw s4, -20(sp) # s4 for lower a64
sw s5, -16(sp) # s5 for upper a64
sw s6, -12(sp) # s6 for lower b64
sw s7, -8(sp) # s7 for upper b64
sw s8, -4(sp) # s8 for i
sw ra, 0(sp)
mv s0, a0
mv s1, a1
li s2, 0
li s3, 0
li s8, 0
mv s4, s0
slti s5, s0, 0 # test sign for upper a64
neg s5, s5 # use 2's complement to sign-extend
mv s6, s1
slti s7, s1, 0 # test sign for upper b64
neg s7, s7 # use 2's complement to sign-extend
imul32_loop:
mv a0, s6
mv a1, s7
mv a2, s8
jal getbit
beqz a0, imul32_loop_cont
li t2, 31
sub t2, t2, s8 # 31 - i
srl t2, s4, t2 # shift in two steps so that i == 0 yields 0
srli t2, t2, 1 # t2 = lower a64 >> (32 - i)
sll t0, s4, s8 # t0 for shifted lower a64
sll t1, s5, s8
or t1, t1, t2 # t1 for shifted upper a64
mv t3, s2 # keep old lower r for testing overflow
add s2, s2, t0
sltu t0, s2, t3 # set carry if lower overflow
add s3, s3, t1
add s3, s3, t0 # add carry bit from lower
imul32_loop_cont:
addi s8, s8, 1
slti t0, s8, 32
bnez t0, imul32_loop
# epilogue
mv a0 ,s2
mv a1 ,s3
lw ra, 0(sp)
lw s8, -4(sp)
lw s7, -8(sp)
lw s6, -12(sp)
lw s5, -16(sp)
lw s4, -20(sp)
lw s3, -24(sp)
lw s2, -28(sp)
lw s1, -32(sp)
lw s0, -36(sp)
addi sp, sp, 40
ret
```
To verify this implementation, replace the main function with:
```asm
main:
li a0, 0x81234567 # input a
li a1, 0x90ABCDEF # input b
jal imul32
li a2, 0x1A4E4629 # expected lower
li a3, 0xB84EB38C # expected upper
jal assert_result
li a7, 93
ecall
```
### Float32 Multiplication
```c
float fmul32(float a, float b)
{
/* TODO: Special values like NaN and INF */
int32_t ia = *(int32_t *) &a, ib = *(int32_t *) &b;
/* sign */
int sa = ia >> 31;
int sb = ib >> 31;
/* mantissa */
int32_t ma = (ia & 0x7FFFFF) | 0x800000;
int32_t mb = (ib & 0x7FFFFF) | 0x800000;
/* exponent */
int32_t ea = ((ia >> 23) & 0xFF);
int32_t eb = ((ib >> 23) & 0xFF);
```
The first part simply extracts the 1-bit sign, the 8-bit exponent, and the 23-bit fraction from each of the two 32-bit floating-point inputs.
Note that in the IEEE 754 single-precision format, a normalized mantissa has the form $1.fraction$ with the leading 1 left implicit, so the lower 23 bits alone are not the full mantissa; the code therefore ORs the hidden 1 back into `ma` and `mb`.
```c
/* 'r' = result */
int64_t mrtmp = imul32(ma, mb) >> 23;
int mshift = getbit(mrtmp, C01);
int64_t mr = mrtmp >> mshift;
int32_t ertmp = ea + eb - C02;
int32_t er = mshift ? inc(ertmp) : ertmp;
/* TODO: Overflow ^ */
int sr = sa ^ sb;
int32_t r = (sr << C03) | ((er & 0xFF) << C04) | (mr & 0x7FFFFF);
return *(float *) &r;
}
```
And then, to multiply the two floating-point values: the product is negative exactly when one of the two inputs is negative, so the result sign bit is simply `sr = sa ^ sb`.
For the exponent part, we need to add the two exponents together. However, the exponent fields extracted from the inputs are biased:
$$
biased\ exponent = real\ exponent + 127
$$
If we simply add the two exponent fields `ea` and `eb` together, the bias is counted twice, and we get
$$
real\ exponent_a + 127 + real\ exponent_b + 127 = real\ exponent_a + real\ exponent_b + 254
$$
Instead of:
$$
real\ exponent_a + real\ exponent_b + 127
$$
Therefore, we need to additionally subtract 127 from the result of `ea + eb`, which means `C02` should be 127.
And for the mantissa part, we need to multiply the two mantissas (including the hidden 1s). Since the hidden 1s were already ORed back into `ma` and `mb`, we can multiply them directly to get `mrtmp`.
:::info
**Ignore the lower 23 bits**
After multiplying the two 24-bit mantissas, the result can be up to 48 bits wide, far more than the 23 fraction bits a `float` can hold.
Therefore, the code simply discards the lower 23 bits and uses the remaining 25 bits to compute the normalized result mantissa.
:::
Then, since the result mantissa (`mrtmp`) represents $1.frac_a \times 1.frac_b$, its value must be smaller than 4, so we can simply test bit 24 (zero-based; the second bit to the left of the binary point) to determine whether the result mantissa is already normalized, which means `C01` should be 24. If that bit is 1, we increment the result exponent `ertmp` and shift one more bit off the result mantissa `mrtmp`.
Finally, we combine the result sign bit (`sr`), the result exponent (`er`, already biased and incremented if needed), and the result mantissa (`mr`, normalized). Therefore, `C03` and `C04` should be 31 (the MSB, for the sign bit) and 23 (the exponent field occupies the 8 bits below the MSB) respectively.
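Putting the constants together (`C01 = 24`, `C02 = 127`, `C03 = 31`, `C04 = 23`), here is a self-contained sketch of the function. Note the substitutions made here: a native 64-bit multiply stands in for `imul32`, and native `>>`/`+` stand in for `getbit` and `inc`; special values such as NaN and Inf remain unhandled, as in the original TODO.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of fmul32 with C01..C04 filled in; native 64-bit arithmetic
 * replaces the imul32/getbit/inc helpers used in the original. */
float fmul32(float a, float b)
{
    int32_t ia, ib;
    memcpy(&ia, &a, 4);
    memcpy(&ib, &b, 4);

    uint32_t sa = ((uint32_t) ia) >> 31;        /* sign bits */
    uint32_t sb = ((uint32_t) ib) >> 31;
    int32_t ma = (ia & 0x7FFFFF) | 0x800000;    /* restore the hidden 1 */
    int32_t mb = (ib & 0x7FFFFF) | 0x800000;
    int32_t ea = (ia >> 23) & 0xFF;
    int32_t eb = (ib >> 23) & 0xFF;

    int64_t mrtmp = ((int64_t) ma * mb) >> 23;  /* drop the low 23 bits */
    int mshift = (int) ((mrtmp >> 24) & 1);     /* C01 = 24: normalized yet? */
    int64_t mr = mrtmp >> mshift;
    int32_t er = ea + eb - 127 + mshift;        /* C02 = 127: remove double bias */

    uint32_t r = ((sa ^ sb) << 31)              /* C03 = 31: sign */
               | (((uint32_t) er & 0xFF) << 23) /* C04 = 23: exponent */
               | ((uint32_t) mr & 0x7FFFFF);
    float out;
    memcpy(&out, &r, 4);
    return out;
}
```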
#### Implement `fmul32` with rv32i
```asm
fmul32:
# prologue
addi sp, sp, -36
sw s0, -32(sp) # for a
sw s1, -28(sp) # for b
sw s2, -24(sp) # for sa
sw s3, -20(sp) # for sb
sw s4, -16(sp) # for ma
sw s5, -12(sp) # for mb
sw s6, -8(sp) # for ea
sw s7, -4(sp) # for eb
sw ra, 0(sp)
mv s0, a0
mv s1, a1
srli s2, s0, 31 # uint32_t sa = ia >> 31
srli s3, s1, 31 # uint32_t sb = ib >> 31
li t0, 0x7FFFFF # & 0x7FFFFF
and s4, s0, t0
and s5, s1, t0
li t0, 0x800000 # | 0x800000
or s4, s4, t0 # ma
or s5, s5, t0 # mb
srli s6, s0, 23 # int32_t ea = ((ia >> 23) & 0xFF)
andi s6, s6, 0xFF
srli s7, s1, 23 # int32_t eb = ((ib >> 23) & 0xFF)
andi s7, s7, 0xFF
mv a0, s4
mv a1, s5
jal imul32
li t0, 0x7FFFFF # tX is not safe after func call!
and t0, a1, t0 # int64_t mrtmp = imul32(ma, mb) >> 23
slli t0, t0, 9
srli a0, a0, 23
srli a1, a1, 23
or a0, a0, t0
addi sp, sp, -8 # use stack to save the 64b mrtmp
sw a0, -4(sp)
sw a1, 0(sp)
li a2, 24
jal getbit # int mshift = getbit(mrtmp, C01)
lw t0, -4(sp) # lower mrtmp
lw t1, 0(sp) # upper mrtmp
beqz a0, fmul32_normalize
andi t2, t1, 1 # LSB of upper, mshift must be 1 or 0!
slli t2, t2, 31 # move to MSB
srli t0, t0, 1 # mrtmp >> mshift
srli t1, t1, 1
or t0, t0, t2
fmul32_normalize: # t1 t0 for mr
mv t2, a0 # t2 for mshift
add a0, s6, s7 # int32_t ertmp = ea + eb - C02
addi a0, a0, -127 # a0 for ertmp (or er if branch)
beqz t2, fmul32_no_inc_exp
sw t1, -4(sp)
sw t0, 0(sp)
jal inc # int32_t er = mshift ? inc(ertmp) : ertmp
lw t0, 0(sp)
lw t1, -4(sp)
fmul32_no_inc_exp: # a0 for er, t1 t0 for mr (upper, t1, not used)
xor t2, s2, s3 # int sr = sa ^ sb
slli t2, t2, 31 # (sr << C03)
andi a0, a0, 0xFF # | ((er & 0xFF) << C04)
slli a0, a0, 23
or t2, t2, a0
li t1, 0x7FFFFF # | (mr & 0x7FFFFF), lower mr only
and t0, t0, t1
or a0, t0, t2
# epilogue
addi sp, sp, 8
lw ra, 0(sp)
lw s7, -4(sp) # for eb
lw s6, -8(sp) # for ea
lw s5, -12(sp) # for mb
lw s4, -16(sp) # for ma
lw s3, -20(sp) # for sb
lw s2, -24(sp) # for sa
lw s1, -28(sp) # for b
lw s0, -32(sp) # for a
addi sp, sp, 36
ret
```
To test the implementation, use the following main function:
```asm
main:
li a0, 0x47F12064 # input a
li a1, 0x4940BD02 # input b
jal fmul32
li a2, 0x51B58A51 # expected
li a3, 0 # no upper
li a1, 0 # only 32b
jal assert_result
li a7, 93
ecall
```
## Assignment 1
Multiplication of a floating-point and an integer (in limited value range).
### Integer to Float
```c
uint32_t bits_before_frac(uint32_t x) // FIXME: 0 is undefined
{
int n = 1;
if ((x >> 16) == 0) { n += 16; x <<= 16; }
if ((x >> 24) == 0) { n += 8; x <<= 8; }
if ((x >> 28) == 0) { n += 4; x <<= 4; }
if ((x >> 30) == 0) { n += 2; x <<= 2; }
n = n - (x >> 31);
return n + 1; // the leading 1
}
```
To get the fraction part of the converted floating-point value (with the leading 1 excluded), we need a function similar to `clz`, except that it also counts the leading 1 itself.
```c
uint32_t itof(const int32_t i) // FIXME: only valid for range -2^24 ~ 2^24
{
if (i == 0)
return 0;
uint32_t neg = !!(i < 0) << 31;
uint32_t ans = neg ? -i : i;
uint32_t bbf = bits_before_frac(ans);
ans <<= bbf;
ans >>= 9;
ans = neg | ((32 - bbf + 127) << 23) | ans;
return ans;
}
```
Then, we can combine the three parts (sign, exponent, and fraction) to get the resulting floating-point value.
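Putting the two functions together, the conversion can be checked against concrete bit patterns (this is the same C code as above, made self-contained):

```c
#include <stdint.h>

uint32_t bits_before_frac(uint32_t x) /* x must be nonzero */
{
    int n = 1;
    if ((x >> 16) == 0) { n += 16; x <<= 16; }
    if ((x >> 24) == 0) { n += 8;  x <<= 8; }
    if ((x >> 28) == 0) { n += 4;  x <<= 4; }
    if ((x >> 30) == 0) { n += 2;  x <<= 2; }
    n = n - (x >> 31);     /* n is now the number of leading zeros... */
    return n + 1;          /* ...plus one for the leading 1 */
}

uint32_t itof(const int32_t i) /* valid for the limited range -2^24 ~ 2^24 */
{
    if (i == 0)
        return 0;
    uint32_t neg = (uint32_t) (i < 0) << 31;
    uint32_t ans = neg ? -(uint32_t) i : (uint32_t) i;
    uint32_t bbf = bits_before_frac(ans);
    ans <<= bbf;                         /* shift the leading 1 out... */
    ans >>= 9;                           /* ...and place the fraction in bits 22..0 */
    return neg | ((32 - bbf + 127) << 23) | ans;
}
```

For example, `itof(124)` yields `0x42F80000`, matching the value that appears in the multiplication test further below.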
#### Implement `bits_before_frac` in rv32i
```asm
bits_before_frac:
# no prologue
li t0, 1 # int n = 1;
srli t1, a0, 16 # if ((x >> 16) == 0) { n += 16; x <<= 16; }
bnez t1, upper16
addi t0, t0, 16
slli a0, a0, 16
upper16:
srli t1, a0, 24 # if ((x >> 24) == 0) { n += 8; x <<= 8; }
bnez t1, upper8
addi t0, t0, 8
slli a0, a0, 8
upper8:
srli t1, a0, 28 # if ((x >> 28) == 0) { n += 4; x <<= 4; }
bnez t1, upper4
addi t0, t0, 4
slli a0, a0, 4
upper4:
srli t1, a0, 30 # if ((x >> 30) == 0) { n += 2; x <<= 2; }
bnez t1, upper2
addi t0, t0, 2
slli a0, a0, 2
upper2:
srli a0, a0, 31 # n = n - (x >> 31);
neg a0, a0
add t0, t0, a0
addi a0, t0, 1 # return n + 1;
# no epilogue
ret
```
To verify this implementation, replace main with the following instructions:
```asm
main:
li a0, 0x003FFF00 # input x
li a1, 0
jal bits_before_frac
li a2, 11 # expected
mv a3, a1 # no upper
jal assert_result
li a7, 93
ecall
```
#### Implement `itof` in rv32i
```asm
itof:
addi sp, sp, -8 # prologue
sw s1, -8(sp) # s1 for ans
sw s0, -4(sp) # s0 for neg
sw ra, 0(sp)
beqz a0, itof_out # if (i == 0) return 0;
li t0, 0x80000000 # uint32_t neg = !!(i < 0) << 31;
and s0, a0, t0
beqz s0, itof_pos # uint32_t ans = neg ? -i : i;
neg a0, a0
itof_pos:
mv s1, a0
jal bits_before_frac # uint32_t bbf = bits_before_frac(ans);
sll s1, s1, a0 # ans <<= bbf;
srli s1, s1, 9 # ans >>= 9;
or s1, s1, s0 # ans = neg | ((32 - bbf + 127) << 23) | ans;
neg a0, a0
addi a0, a0, 159
slli a0, a0, 23
or a0, s1, a0
lw ra, 0(sp) # epilogue
lw s0, -4(sp)
lw s1, -8(sp)
itof_out:
addi sp, sp, 8
ret
```
To verify this implementation:
```asm
main:
li a0, 0x00F12064 # input i, limited range (+-16777216)
li a1, 0
jal itof
li a2, 0x4b712064 # expected
mv a3, a1 # only 32b
jal assert_result
li a7, 93
ecall
```
### Multiply
```c
float ifmul(uint32_t i, uint32_t f)
{
    // convert the integer to a float (itof returns the raw bit pattern)
    uint32_t fi = itof(*(int32_t *) &i);
    return fmul32(*(float *) &fi, *(float *) &f);
}
```
To multiply an integer and a floating-point value, we first convert the integer to a floating-point value, and then reuse `fmul32` to perform the floating-point multiplication.
#### Implement `ifmul` in rv32i
```asm
ifmul:
addi sp, sp, -4 # prologue
sw s0, -4(sp) # s0 for f
sw ra, 0(sp)
mv s0, a1
jal itof
mv a1, s0
jal fmul32
lw ra, 0(sp) # epilogue
lw s0, -4(sp) # s0 for f
addi sp, sp, 4
ret
```
And to verify this implementation:
```asm
main:
li a0, 124 # input i, limited in +-16777216
li a1, 0x42f6e979 # input f
jal ifmul
li a2, 0x466f322d # expected
mv a3, a1 # only 32b
jal assert_result
li a7, 93
ecall
```
FIXME: use array for multiple test data
### Multiplication Error
```c
typedef union
{
float f;
int32_t i;
uint32_t raw;
} value_t;
int main(int argc, char const *argv[])
{
value_t i, fi, f, o, a;
i.i = atoi(argv[1]);
fi.f = (float)atoi(argv[1]);
i.raw = itof(i.i);
f.f = atof(argv[2]);
if (i.raw != fi.raw)
printf("Expected itof=%f(0x%08x), but got 0x%08x...\n", i.f, i.raw, fi.raw);
o.f = fmul32(i.f, f.f);
a.f = f.f * i.f;
printf("0x%x x 0x%x = 0x%08x (ans: 0x%08x)\n", i.raw, f.raw, o.raw, a.raw);
printf("%s\n", o.raw == a.raw ? "correct!" : "wrong...");
return 0;
}
```
However, the result of `fmul32` and that of the native `*` operator may differ slightly, because `fmul32` truncates the mantissa product instead of rounding it to nearest:
```text
$ ./a.out 123 123.456
0x42f60000 x 0x42f6e979 = 0x466d445a (ans: 0x466d445a)
correct!
$ ./a.out 124 123.456
0x42f80000 x 0x42f6e979 = 0x466f322d (ans: 0x466f322d)
correct!
$ ./a.out 12345 123.456
0x4640e400 x 0x42f6e979 = 0x49ba0b02 (ans: 0x49ba0b03)
wrong...
```
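The gap is a rounding artifact: `fmul32` discards the low 23 bits of the 48-bit mantissa product, while hardware multiplication rounds to nearest, so results can differ by one ulp. A minimal illustration (round-half-up is used here for simplicity, and the helper names are made up):

```c
#include <stdint.h>

/* trunc23: keep only the top bits of a mantissa product, as fmul32 does.
 * round23: round on the discarded low 23 bits instead (half-up, for brevity). */
uint32_t trunc23(uint64_t prod) { return (uint32_t) (prod >> 23); }
uint32_t round23(uint64_t prod) { return (uint32_t) ((prod + (1u << 22)) >> 23); }
```

Whenever the discarded low bits amount to half an ulp or more, the two results differ by one, which is exactly the kind of `0x49ba0b02` vs `0x49ba0b03` mismatch shown above.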
ongoing...
### Analysis
#### Execution Info

## References
- [Lab1: RV32I Simulator](https://hackmd.io/@sysprog/H1TpVYMdB#Lab1-RV32I-Simulator)
- [IEEE-754 Floating Point Converter](https://www.h-schmidt.net/FloatConverter/IEEE754.html)