Optimize sqrtf without FPU

黃書堯

Problem C explanation

This sqrtf RISC-V code originates from Quiz3 - Problem C.
Below is an explanation of the code logic.
We have a floating point number

x and we want to calculate its square root.
A floating-point number can be represented as :
x=2e×m
.
where
2e
is the exponent part of
x
and
m
is the mantissa part of
x
.

Exponent part

Since

2e=2e2
The square root of the exponent part can be easily calculated by right-shifting the exponent by 1 bit.

Mantissa part

We aim to compute

m, without using division instructions.
Instead, we can achieve this using multiplication operations only with:
m=m×1m

From the above, we realize that we only need to calculate

1m.

Newton-Raphson method

The formula of the Newton-Raphson method is as follows:

Xn+1=Xnf(x)f(x)

Assuming the function is:

f(x)=x21m

The iterative form becomes:

xn+1=xn+12xn(1mxn2)

Remove B extension

The B extension part shows as follows:

bseti    a1, a1, 31

This line sets the hidden leading bit of the mantissa. To remove the B extension dependency, we replace it with lui and or.

-bseti    a1, a1, 31
+lui     t0, 0x80000
+or      a1, a1, t0

Running with rv32emu

rv32emu

To emulate the code, follow these steps:

  1. Clone the repository:
$ git clone https://github.com/sysprog21/rv32emu
$ cd rv32emu
$ make
  1. To verify the installation, run:
$ make check

Initially, an error might occur:

$ make check
Running hello.elf ... [OK]
Fetching SHA-1 of prebuilt binaries ... [OK]
Checking SHA-1 of prebuilt binaries ... [OK]
rv32emu: src/riscv.c:415: rv_create: Assertion `elf && elf_open(elf, attr->data.
user.elf_program)' failed.
Aborted (core dumped)
rv32emu: src/riscv.c:415: rv_create: Assertion `elf && elf_open(elf, attr->data.
user.elf_program)' failed.
Aborted (core dumped)
rv32emu: src/riscv.c:415: rv_create: Assertion `elf && elf_open(elf, attr->data.
user.elf_program)' failed.
Aborted (core dumped)
Running puzzle ... Failed.
make: *** [Makefile:354: check] 錯誤 1

To resolve this, change the system language to English (United States) via:
Select Setting -> Region & Language -> Language English(United States).
After changing the language, rerun:

$ make check
Check the file build/.config for configured items.
Running hello.elf ... [OK]
Fetching SHA-1 of prebuilt binaries ... [OK]
Checking SHA-1 of prebuilt binaries ... 
SHA-1 verification fails! Re-fetching prebuilt binaries from "rv32emu-prebuilt" ...
-                   100%[===================>]   4.49M  14.7MB/s    in 0.3s    
Running puzzle ... [OK]
Running fcalc ... [OK]
Running pi ... [OK]

This should pass all tests.

Performance Measurement

To evaluate the performance of the sqrt.S implementation, we can utilize the performance counter example available in rv32emu/tests/perfcounter.

  1. Copy the example to the main directory:
$ cp -r tests/perfcounter .
  1. Modify Makefile:
.PHONY: clean

include ../mk/toolchain.mk

-CFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32 -O2 -Wall
+CFLAGS = -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall

OBJS = \
    getcycles.o \
    getinstret.o \
-   sparkle.o \
+   sqrtf.o \
    main.o \
	
BIN = perfcount.elf

%.o: %.S
	$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<

%.o: %.c
	$(CROSS_COMPILE)gcc $(CFLAGS) -c -o $@ $<

all: $(BIN)

$(BIN): $(OBJS)
	 $(CROSS_COMPILE)gcc -o $@ $^

clean:
	$(RM) $(BIN) $(OBJS)
  1. Update main.c:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

extern uint64_t get_cycles();
extern uint64_t get_instret();

/*
 * Taken from the Sparkle-suite which is a collection of lightweight symmetric
 * cryptographic algorithms currently in the final round of the NIST
 * standardization effort.
 * See https://sparkle-lwc.github.io/
 */
-extern void sparkle_asm(unsigned int *state, unsigned int ns);
+extern float my_sqrtf(float input);

-#define WORDS 12
-#define ROUNDS 7

int main(void)
{
-   unsigned int state[WORDS] = {0};
+   float input = 12.4;
    /* measure cycles */
    uint64_t instret = get_instret();
    uint64_t oldcount = get_cycles();
-   sparkle_asm(state, ROUNDS);
+   float result = sqrt(x);
    uint64_t cyclecount = get_cycles() - oldcount;

+   printf("square root of %f is %f\n", input, result);
    printf("cycle count: %u\n", (unsigned int) cyclecount);
    printf("instret: %x\n", (unsigned) (instret & 0xffffffff));

-   memset(state, 0, WORDS * sizeof(uint32_t));

-   sparkle_asm(state, ROUNDS);

-   printf("Sparkle state:\n");
-   for (int i = 0; i < WORDS; i += 2)
-       printf("%X %X\n", state[i], state[i + 1]);

    return 0;
}
  1. Compile and run the test:
$ cd perfcounter
$ make
$ ../build/rv32emu perfcount.elf

Initially, the compilation might result in an error:

(leetcode) (base) phlab@phlab-System-Product-Name:~/rv32emu/perfcounter$ make
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o getcycles.o getcycles.S
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o getinstret.o getinstret.S
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o sparkle.o sparkle.S
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o main.o main.c
main.c:7:13: warning: conflicting types for built-in function 'sqrtf'; expected 'float(float)' [-Wbuiltin-declaration-mismatch]
    7 | extern void sqrtf();
      |             ^~~~~
main.c:4:1: note: 'sqrtf' is declared in header '<math.h>'
    3 | #include <string.h>
  +++ |+#include <math.h>
    4 | 
riscv-none-elf-gcc -march=rv32im_zicsr_zifencei -mabi=ilp32 -O2 -Wall -c -o sqrtf.o sqrtf.S
sqrtf.S: Assembler messages:
sqrtf.S:24: Error: unrecognized opcode `bseti a1,a1,31', extension `zbs' required
make: *** [Makefile:16: sqrtf.o] Error 1

To resolve this, rename the function and file from sqrtf to my_sqrt.

Performance of Square root

After removing the B extension, the test output is:

../build/rv32emu perfcount.elf
square root of 16.900000 is 4.110961
cycle count: 67
instret: 2d6
inferior exit code 0