# Optimize sqrtf without FPU
> 黃書堯
## Problem C explanation
This `sqrtf` RISC-V code originates from [Quiz3 - Problem C](https://hackmd.io/@sysprog/arch2024-quiz3-sol#Problem-C).
Below is an explanation of the code logic.
We have a floating point number $x$ and we want to calculate its square root.
A floating-point number can be represented as : $x = 2^e\times m$.
where $2^e$ is the exponent part of $x$ and $m$ is the mantissa part of $x$.
### Exponent part
Since
$$
\sqrt{2^e}=2^{\frac{e}{2}}
$$
The square root of the exponent part can be easily calculated by right-shifting the exponent by 1 bit.
### Mantissa part
We aim to compute $\sqrt{m}$, without using division instructions.
Instead, we can achieve this using multiplication operations only with:
$$
\sqrt{m}=m\times{\frac{1}{\sqrt{m}}}
$$
From the above, we realize that we only need to calculate $\frac{1}{\sqrt{m}}$.
### Newton-Raphson method
The formula of the Newton-Raphson method is as follows:
$$
X_{n+1} = X_n -\frac{f(x)}{f'(x)}
$$
Assuming the function is:
$$
f(x) = x^2 - \frac{1}{m}
$$
The iterative form becomes:
$$
x_{n+1} = x_n + \frac{1}{2}x_n(1 - mx_n^2)
$$
## Remove B extension
The B extension part shows as follows:
```c
bseti a1, a1, 31
```
This line sets the hidden leading bit of the mantissa. To remove the B extension dependency, we replace it with `lui` and `or`.
```diff
-bseti a1, a1, 31
+lui t0, 0x80000
+or a1, a1, t0
```
## Running with rv32emu
### rv32emu
To emulate the code, follow these steps:
1. Clone the repository:
```shell
$ git clone https://github.com/sysprog21/rv32emu
$ cd rv32emu
$ make
```
2. To verify the installation, run:
```shell
$ make check
```
Initially, an error might occur:
```shell
$ make check
Running hello.elf ... [OK]
Fetching SHA-1 of prebuilt binaries ... [OK]
Checking SHA-1 of prebuilt binaries ... [OK]
rv32emu: src/riscv.c:415: rv_create: Assertion `elf && elf_open(elf, attr->data.
user.elf_program)' failed.
Aborted (core dumped)
rv32emu: src/riscv.c:415: rv_create: Assertion `elf && elf_open(elf, attr->data.
user.elf_program)' failed.
Aborted (core dumped)
rv32emu: src/riscv.c:415: rv_create: Assertion `elf && elf_open(elf, attr->data.
user.elf_program)' failed.
Aborted (core dumped)
Running puzzle ... Failed.
make: *** [Makefile:354: check] 錯誤 1
```
To resolve this, change the system language to English (United States) via:
Select `Setting` -> `Region & Language` -> `Language English(United States)`.
After changing the language, rerun:
```shell
$ make check
Check the file build/.config for configured items.
Running hello.elf ... [OK]
Fetching SHA-1 of prebuilt binaries ... [OK]
Checking SHA-1 of prebuilt binaries ...
SHA-1 verification fails! Re-fetching prebuilt binaries from "rv32emu-prebuilt" ...
- 100%[===================>] 4.49M 14.7MB/s in 0.3s
Running puzzle ... [OK]
Running fcalc ... [OK]
Running pi ... [OK]
```
This should pass all tests.
### Performance Measurement
To evaluate the performance of the `sqrt.S` implementation, we can utilize the performance counter example available in `rv32emu/tests/perfcounter`.
1. Copy the example to the main directory:
```shell
$ cp -r tests/perfcounter .
```
2. Modify `Makefile`:
```diff
.PHONY: clean
include ../mk/toolchain.mk
-CFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32 -O2 -Wall
+CFLAGS = -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall
OBJS = \
getcycles.o \
getinstret.o \
- sparkle.o \
+ sqrtf.o \
main.o \
BIN = perfcount.elf
%.o: %.S
$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<
%.o: %.c
$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<
all: $(BIN)
$(BIN): $(OBJS)
$(CROSS_COMPILE)gcc -o $@ $^
clean:
$(RM) $(BIN) $(OBJS)
```
3. Update `main.c`:
```diff
#include <stdint.h>
#include <stdio.h>
#include <string.h>
extern uint64_t get_cycles();
extern uint64_t get_instret();
/*
* Taken from the Sparkle-suite which is a collection of lightweight symmetric
* cryptographic algorithms currently in the final round of the NIST
* standardization effort.
* See https://sparkle-lwc.github.io/
*/
-extern void sparkle_asm(unsigned int *state, unsigned int ns);
+extern float my_sqrtf(float input);
-#define WORDS 12
-#define ROUNDS 7
int main(void)
{
- unsigned int state[WORDS] = {0};
+ float input = 12.4;
/* measure cycles */
uint64_t instret = get_instret();
uint64_t oldcount = get_cycles();
- sparkle_asm(state, ROUNDS);
+ float result = sqrt(x);
uint64_t cyclecount = get_cycles() - oldcount;
+ printf("square root of %f is %f\n", input, result);
printf("cycle count: %u\n", (unsigned int) cyclecount);
printf("instret: %x\n", (unsigned) (instret & 0xffffffff));
- memset(state, 0, WORDS * sizeof(uint32_t));
- sparkle_asm(state, ROUNDS);
- printf("Sparkle state:\n");
- for (int i = 0; i < WORDS; i += 2)
- printf("%X %X\n", state[i], state[i + 1]);
return 0;
}
```
4. Compile and run the test:
```shell
$ cd perfcounter
$ make
$ ../build/rv32emu perfcount.elf
```
Initially, the compilation might result in an error:
```shell
(leetcode) (base) phlab@phlab-System-Product-Name:~/rv32emu/perfcounter$ make
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o getcycles.o getcycles.S
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o getinstret.o getinstret.S
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o sparkle.o sparkle.S
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o main.o main.c
main.c:7:13: warning: conflicting types for built-in function 'sqrtf'; expected 'float(float)' [-Wbuiltin-declaration-mismatch]
7 | extern void sqrtf();
| ^~~~~
main.c:4:1: note: 'sqrtf' is declared in header '<math.h>'
3 | #include <string.h>
+++ |+#include <math.h>
4 |
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o sqrtf.o sqrtf.S
sqrtf.S: Assembler messages:
sqrtf.S:24: Error: unrecognized opcode `bseti a1,a1,31', extension `zbs' required
make: *** [Makefile:16: sqrtf.o] Error 1
```
To resolve this, rename the function and file from `sqrtf` to `my_sqrt`.
## Performance of Square root
After removing the B extension, the test output is:
```shell
../build/rv32emu perfcount.elf
square root of 16.900000 is 4.110961
cycle count: 67
instret: 2d6
inferior exit code 0
```