# Assignment2: GNU Toolchain
contributed by < [`Hotmercury`](https://github.com/Hotmercury) >
> [Assignment requirements](https://hackmd.io/@sysprog/2023-arch-homework2)
## [lab2](https://hackmd.io/@sysprog/SJAR5XMmi) step
**Basic commands**
Compile `{}.c`
```
$ riscv-none-elf-gcc {}.c
```
Run the ELF file
```
$ build/rv32emu {file}.elf
```
Display the disassembly
```
$ riscv-none-elf-objdump -d {file}.elf
```
Display the ELF file header
```
$ riscv-none-elf-readelf -h {file}.elf
```
List section sizes
```
$ riscv-none-elf-size {}.elf
```
## Get sine value without floating point multiplication support
I chose the problem from [戴鈞彥](https://hackmd.io/@ranvd/computer-arch-hw1) because I want to learn more about IEEE 754 conversion between integers and floating-point numbers.

The Taylor series approximation of sine is
$$
\sin(x)\approx \sum^n_{i = 0}\frac{(-1)^{i}}{(2i+1)!}x^{2i+1}
$$
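As a reference point for the series above, here is a minimal sketch that evaluates it with native `double` arithmetic. The name `taylor_sin` and the `terms` parameter are assumptions for illustration; they are not part of the original code, which avoids hardware floating-point multiplication. Note that `terms` counts the summed terms, so `terms = n + 1` in the formula's notation.

```c
/* Evaluate sin(x) with the first `terms` terms of the Taylor series.
 * Native double arithmetic is used only as a reference for checking results. */
double taylor_sin(double x, int terms)
{
    double term = x;  /* i = 0 term: x^1 / 1! */
    double sum = 0.0;
    for (int i = 0; i < terms; i++) {
        sum += term;
        /* next term = -term * x^2 / ((2i+2)(2i+3)) */
        term = -term * x * x / (double)((2 * i + 2) * (2 * i + 3));
    }
    return sum;
}
```

With five terms the truncation error for small inputs such as `x = 0.5` is far below single-precision resolution, which is why the later code also stops at five terms.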
**original c code**
`fmul32`
### Fix makefile
**problem**
> Running `make` sometimes fails.
:::warning
:warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
:notes: jserv
:::
:::success
The toolchain prefix (`riscv-none-elf-`) is only available after the environment is set up, so it has to be exported again in each new shell or added to the global environment.
$ source ~/riscv-none-elf-gcc/setenv
:::
> Where are `$(CROSS_COMPILE)` and `$(RM)` defined?
Following the asm-hello Makefile flow shows how the `{}.o` and `{}.elf` files are produced.
```
$(CROSS_COMPILE) = riscv-none-elf-
$(RM) = rm -rf
```
Two steps:
```
# -R folds the data section into the text section
$ riscv-none-elf-as -R -march=rv32i -mabi=ilp32 -o hello.o hello.S
$ riscv-none-elf-ld -o hello.elf -T hello.ld --oformat=elf32-littleriscv hello.o
```

:::spoiler Makefile that produces the error
```
.PHONY: clean
include ../../mk/toolchain.mk

ASFLAGS = -march=rv32i -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

%.o: %.S
	$(CROSS_COMPILE)as -R $(ASFLAGS) -o $@ $<

all: sine.elf

sine.elf : sine.o
	$(CROSS_COMPILE)ld -o $@ -T sine.ld $(LDFLAGS) $<

sine.S : sine.c
	$(CROSS_COMPILE)gcc -S $(ASFLAGS) -o $@ $<

clean:
	$(RM) $(TARGET) $(OBJ) $(ASM)
```
:::
```
riscv-none-elf-gcc -S {}.c                  # {}.c -> {}.s
riscv-none-elf-as -R -o {}.o {}.s           # {}.s -> {}.o
riscv-none-elf-ld -T {}.ld -o {}.elf {}.o   # {}.o -> {}.elf
```

Running `$ make` produces the following error.
:::danger
warning: cannot find entry symbol _start; defaulting to 00000000
undefined reference to `puts`
:::
:::success
The compiler rewrites a `printf` call that only prints a constant string into `puts`, but `ld` is invoked here without a C library, so the `puts` symbol cannot be resolved.
:::
So I appended `-fno-builtin` to the GCC flags as below, but still got the following error.
`$ riscv-none-elf-gcc -S $(ASFLAGS) -fno-builtin -o {}.s {}.c`
`$ riscv-none-elf-as -R $(ASFLAGS) -o {}.o {}.s`
`$ riscv-none-elf-ld -T sine.ld $(LDFLAGS) -o {}.elf {}.o`
:::danger
sine.c:(.text+0xd18): undefined reference to `printf'
:::
> I think the error comes from `riscv-none-elf-ld`.
## code
[Get sine value without floating point multiplication support](https://github.com/ranvd/ComputerArch/tree/main/hw1)
**Assembly in c**
> Where can I find complete documentation for this method?
We use the keyword `asm volatile`, and end each instruction with `\n` to separate it from the next instruction.
```
asm volatile( asm instructions
: output operands (optional)
: input operands (optional)
: clobbers (optional));
```
Output
Each `:` starts a new operand section.
There are two ways to refer to operands:
1. `:[out1]"=r"(output1), [out2]"=r"(output2)`
refer to each operand by its symbolic name, e.g. `%[out1]`
2. `"=r"(output)`
refer to each operand by position: `%0`, `%1`, ...
So with inline asm we can bind C variables to the corresponding registers:
```c
: "=r"(h), "=r"(l), "=r"(h2));
```
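To make the operand syntax concrete, here is a small sketch using symbolic operand names. The function `add_regs` and the operand names `sum`, `lhs`, and `rhs` are hypothetical; the inline assembly is only compiled for RISC-V targets, and a plain C fallback is assumed elsewhere so the file still builds on a host machine.

```c
#include <stdint.h>

/* Add two integers with a single RISC-V `add` instruction.
 * Hypothetical example: names are illustrative, not from the original code. */
int32_t add_regs(int32_t a, int32_t b)
{
    int32_t sum;
#if defined(__riscv)
    /* __asm__ is the spelling that also works under strict ISO C modes */
    __asm__ volatile("add %[sum], %[lhs], %[rhs]\n"
                     : [sum] "=r"(sum)          /* output operand */
                     : [lhs] "r"(a), [rhs] "r"(b) /* input operands */);
#else
    sum = a + b; /* fallback so the sketch compiles on non-RISC-V hosts */
#endif
    return sum;
}
```

Symbolic names keep the template readable when operands are added or reordered, whereas positional `%0`, `%1` references have to be renumbered by hand.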
---
### fmul32
We can reduce the number of branches from 2 to 1:
```c
float fmul32(float a, float b)
{
    /* TODO: Special values like NaN and INF */
    int32_t ia = *(int32_t *)&a, ib = *(int32_t *)&b;
    /* was: if (ia == 0 || ib == 0) return 0; */
    /* bitwise OR of the two tests gives the same result with one branch */
    if (!ia | !ib) return 0;
```
```c
    /* mantissa */
    int32_t ma = (ia & 0x7FFFFF) | 0x800000;
    int32_t mb = (ib & 0x7FFFFF) | 0x800000;
    /* exponent */
    int32_t sea = ia & 0xFF800000;
    int32_t seb = ib & 0xFF800000;
    /* result of mantissa */
    int32_t m = imul24(ma, mb);
    int32_t mshift = getbit(m, 24);
    m >>= mshift;
    int32_t r = ((sea - 0x3f800000 + seb) & 0xFF800000) +
                (m & (0x7fffff | mshift << 23));
```
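We can detect overflow from the sign bits with `Sr ^ Sa ^ Sb`: the sign of the result should equal `Sa ^ Sb`, so a mismatch means the exponent sum carried into the sign bit. In that case `ovfl` is -1 (all ones), `r ^ ((r ^ 0x7f800000) & ovfl)` cancels `r`, and the result becomes `0x7f800000` (infinity). Otherwise `ovfl` is 0 and `r` is unchanged.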
```c
    int32_t ovfl = (r ^ seb ^ sea) >> 31;
    r = r ^ ((r ^ 0x7f800000) & ovfl);
    return *(float *)&r;
}
```
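The masking step at the end of `fmul32` is a general branchless select. A minimal sketch, assuming a hypothetical helper named `select_by_mask` that is not part of the original code:

```c
#include <stdint.h>

/* Return `alt` when mask is all ones (-1) and `value` when mask is 0.
 * value ^ (value ^ alt) == alt, so the XOR cancels `value` only when
 * the mask lets the inner term through. */
int32_t select_by_mask(int32_t value, int32_t alt, int32_t mask)
{
    return value ^ ((value ^ alt) & mask);
}
```

In `fmul32` the mask is `ovfl`, `value` is the computed result `r`, and `alt` is the constant `0x7f800000`.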
### imul24
```c
static int32_t imul24(int32_t a, int32_t b)
{
    uint32_t r = 0;
    for (; b; b >>= 1)
        r = (r >> 1) + (a & -getbit(b, 0));
    return r;
}
```
I think we can test the multiplier bit directly, which may save the cycles spent on the `jal` that calls `getbit`. The bit must still be negated so that it becomes an all-ones or all-zeros mask.
```c
static int32_t imul24(int32_t a, int32_t b){
    uint32_t r = 0;
    for(; b; b >>= 1)
        r = (r >> 1) + (a & -(b & 1)); /* mask is 0 or 0xFFFFFFFF */
    return r;
}
```
Loop unrolling can be applied here to reduce the execution cycles further.
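A sketch of what such unrolling could look like, under the assumption that the multiplier is always a normalized 24-bit mantissa (bit 23 set, no higher bits), so the loop runs exactly 24 times. The names `imul24_ref` and `imul24_unrolled` and the unroll factor of 4 are illustrative choices, not taken from the original repository.

```c
#include <stdint.h>

/* Reference: one multiplier bit per iteration, as in the text above. */
uint32_t imul24_ref(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (; b; b >>= 1)
        r = (r >> 1) + (a & -(b & 1));
    return r;
}

/* Unrolled by 4: assumes b has bit 23 set and nothing above it,
 * so exactly 24 steps (6 iterations x 4 steps) are needed. */
uint32_t imul24_unrolled(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 6; i++) {
        r = (r >> 1) + (a & -(b & 1)); b >>= 1;
        r = (r >> 1) + (a & -(b & 1)); b >>= 1;
        r = (r >> 1) + (a & -(b & 1)); b >>= 1;
        r = (r >> 1) + (a & -(b & 1)); b >>= 1;
    }
    return r;
}
```

The fixed trip count removes the data-dependent loop test; whether it actually saves cycles depends on the compiler flags and would need to be measured.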
### fdiv32
I removed `getbit()`.
```c
float fdiv32(float a, float b){
    int32_t ia = *(int32_t *)&a, ib = *(int32_t *)&b;
    if (a == 0) return a;
    if (b == 0) return *(float*)&(int){0x7f800000};
    /* mantissa */
    int32_t ma = (ia & 0x7FFFFF) | 0x800000;
    int32_t mb = (ib & 0x7FFFFF) | 0x800000;
    /* sign and exponent */
    int32_t sea = ia & 0xFF800000;
    int32_t seb = ib & 0xFF800000;
    /* result of mantissa */
    int32_t m = idiv24(ma, mb);
    int32_t mshift = !getbit(m, 31);
    m <<= mshift;
    int32_t r = ((sea - seb + 0x3f800000) - (0x800000 & -mshift)) |
                ((m & 0x7fffff00) >> 8);
    int32_t ovfl = (ia ^ ib ^ r) >> 31;
    r = r ^ ((r ^ 0x7f800000) & ovfl);
    return *(float *) &r; // return a / b;
}
```
### idiv24
```c
static int32_t idiv24(int32_t a, int32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        a -= b;
        r = (r << 1) | (a >= 0);
        a = (a + (b & -(a < 0))) << 1;
    }
    return r;
}
```

### fadd32
```c
float fadd32(float a, float b) {
    int32_t ia = *(int32_t *)&a, ib = *(int32_t *)&b;
    int32_t cmp_a = ia & 0x7fffffff;
    int32_t cmp_b = ib & 0x7fffffff;
    if (cmp_a < cmp_b)
        iswap(ia, ib);
    /* exponent */
    int32_t ea = (ia >> 23) & 0xff;
    int32_t eb = (ib >> 23) & 0xff;
    /* mantissa */
    int32_t ma = (ia & 0x7fffff) | 0x800000;
    int32_t mb = (ib & 0x7fffff) | 0x800000;
    int32_t align = (ea - eb > 24) ? 24 : (ea - eb);
    mb >>= align;
    if ((ia ^ ib) >> 31) {
        ma -= mb;
    } else {
        ma += mb;
    }
    int32_t clz = count_leading_zeros(ma);
    int32_t shift = 0;
    if (clz <= 8) {
        shift = 8 - clz;
        ma >>= shift;
        ea += shift;
    } else {
        shift = clz - 8;
        ma <<= shift;
        ea -= shift;
    }
    int32_t r = (ia & 0x80000000) | (ea << 23) | (ma & 0x7fffff);
    float tr = a + b;
    return *(float *)&r;
}
```
### f2i32 & i2f32
This code converts between integers and floating-point numbers. I noticed that the original `f2i32` ignores the sign bit, so negative inputs come back positive. Therefore, I added a check so that the result is negated when the sign bit is set.
```c
int f2i32(int x) {
    int32_t a = *(int *)&x;
    int32_t ma = (a & 0x7FFFFF) | 0x800000;
    int32_t ea = ((a >> 23) & 0xFF) - 127;
    if (ea < 0)
        return 0;
    else if (ea <= 23)
        ma >>= (23 - ea);
    else
        ma <<= (ea - 23);
    return ma;
}
```
With the sign check added:
```c
int f2i32(float x) {
    int32_t a = *(int *)&x;
    int32_t ma = (a & 0x7FFFFF) | 0x800000;
    int32_t ea = ((a >> 23) & 0xFF) - 127;
    int32_t sa = a & 0x80000000;
    if (ea < 0)
        return 0;
    else if (ea <= 23)
        ma >>= (23 - ea);
    else
        ma <<= (ea - 23);
    /* negate only when the sign bit is set */
    if (sa) return -ma;
    return ma;
}
```
```c
int i2f32(int x) {
    if (x == 0) return 0;
    int32_t s = x & 0x80000000;
    if (s) x = -x;
    int32_t clz = count_leading_zeros(x);
    int32_t e = 31 - clz + 127;
    if (clz <= 8) {
        x >>= 8 - clz;
    } else {
        x <<= clz - 8;
    }
    int r = s | (e << 23) | (x & 0x7fffff);
    return r;
}
```
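One way to check the two conversions is to compare them against the compiler's own casts on a host machine. The sketch below works on raw bit patterns and uses `memcpy` for the type punning; the names `clz32`, `i2f32_bits`, `f2i32_bits`, and `float_bits` are hypothetical stand-ins, and `clz32` is a portable replacement for `count_leading_zeros`, whose definition is not shown in this note.

```c
#include <stdint.h>
#include <string.h>

/* Portable stand-in for count_leading_zeros (hypothetical helper). */
static inline int clz32(uint32_t v)
{
    int n = 0;
    if (v == 0) return 32;
    while (!(v & 0x80000000u)) { v <<= 1; n++; }
    return n;
}

/* Integer -> IEEE 754 single-precision bit pattern (truncates, no rounding). */
uint32_t i2f32_bits(int32_t x)
{
    if (x == 0) return 0;
    uint32_t s = (uint32_t)x & 0x80000000u;
    uint32_t mag = s ? 0u - (uint32_t)x : (uint32_t)x; /* magnitude */
    int clz = clz32(mag);
    uint32_t e = (uint32_t)(31 - clz + 127);            /* biased exponent */
    if (clz <= 8) mag >>= 8 - clz;
    else mag <<= clz - 8;
    return s | (e << 23) | (mag & 0x7fffffu);
}

/* IEEE 754 bit pattern -> integer, truncating toward zero. */
int32_t f2i32_bits(uint32_t a)
{
    int32_t ma = (int32_t)((a & 0x7fffffu) | 0x800000u);
    int32_t ea = (int32_t)((a >> 23) & 0xffu) - 127;
    if (ea < 0) return 0;
    if (ea <= 23) ma >>= 23 - ea;
    else ma <<= ea - 23; /* values of 2^31 and above are not handled here */
    return (a & 0x80000000u) ? -ma : ma;
}

/* Bit pattern of a native float, for comparison with the compiler's cast. */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For integers that fit in 24 bits, `i2f32_bits(x)` should match `float_bits((float)x)` exactly; larger values can differ in the last bit because this sketch truncates instead of rounding.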
### sin
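`int s = 1 ^ ((-2) & -(n & 0x1));` selects the sign of each term without an `if` branch. When `n` is odd, `-(n & 0x1)` is -1 (0xFFFFFFFF), so `(-2) & -1` is 0xFFFFFFFE and `1 ^ 0xFFFFFFFE` is 0xFFFFFFFF, which gives `s = -1`. When `n` is even, the mask is 0 and `s = 1`.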
```c
float myPow(float x, int n) {
    float r = 1.0;
    while (n) {
        if (n & 0x1) {
            r = fmul32(r, x);
            n -= 1;
        } else {
            x = fmul32(x, x);
            n >>= 1;
        }
    }
    return r;
}
// n!
float factorial(int n) {
    float r = 1.0;
    for (int i = 1; i <= n; i++) {
        r = fmul32(r, i2f32(i));
    }
    return r;
}
// Sine by Taylor series
float mySin(float x) {
    float r = 0.0;
    for (int n = 0; n < 5; n++) {
        int k = f2i32(fadd32(fmul32(i2f32(2), i2f32(n)), i2f32(1)));
        int s = 1 ^ ((-2) & -(n & 0x1));
        r = fadd32(r, fdiv32(fmul32(i2f32(s), myPow(x, k)), factorial(k)));
    }
    return r;
}
```
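The sign expression used in `mySin` can be checked in isolation. This is a small sketch with hypothetical function names, assuming two's complement integers as on RV32:

```c
/* Branchless sign of the n-th series term: +1 for even n, -1 for odd n. */
int term_sign(int n)
{
    return 1 ^ ((-2) & -(n & 0x1));
}

/* Equivalent version with a branch, for comparison. */
int term_sign_branch(int n)
{
    return (n & 0x1) ? -1 : 1;
}
```

Both functions agree for every `n`; the first form simply trades the conditional for three bitwise operations.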
## Analyze
Elapsed cycles of the [original C code](https://github.com/ranvd/ComputerArch/blob/main/hw1/Ripes.c) at each optimization level:
```shell
ASFLAGS = -march=rv32i -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv
riscv-none-elf-gcc $(ASFLAGS) -O0 -o sine sine.c
../../build/rv32emu sine
```
| O0 | O1 | O2 | O3 |
| -------- | -------- | -------- | -------- |
| 110635 | 16830 | 15326 | 13727 |
:::warning
Improve your English writing via ChatGPT or similar tools.
:notes: jserv
:::
Next is the cycle count of the [original assembly code](https://github.com/ranvd/ComputerArch/blob/main/hw1/Ripes.s). I ran this code in Ripes because I didn't know how to convert floating-point numbers to strings. According to the [rv32emu syscall](https://github.com/sysprog21/rv32emu/blob/master/src/syscall.c) implementation, the only way to print data is the `write` operation, and it can only print strings.
So, after verifying the correctness of the code, I removed the `printf` for floating-point values and used [perfcounter](https://github.com/sysprog21/rv32emu/tree/master/tests/perfcounter) to print only the cycle count.

:::warning
You shall use RDCYCLE/RDCYCLEH instruction for the statistics of your program’s execution.
:notes: jserv
:::

Assembling the [unrolled code](https://github.com/ranvd/ComputerArch/blob/main/hw1/O_unrollRipes.s) with the GNU toolchain gives these errors:

:::warning
original_sine.s:286: Error: illegal operands bgez t1 2f
original_sine.s:433: Error: illegal operands li a1 0x40000000
original_sine.s:483: Error: illegal operands la a0 sine
original_sine.s:484: Error: illegal operands lw a0 0(a0)
original_sine.s:485: Error: illegal operands li a7 2
:::
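So I rewrote the [pseudoinstructions](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf) (p. 122) as base instructions. Note that GNU `as` also expects commas between operands, which the Ripes source omits; even the base instruction `lw a0 0(a0)` is rejected for that reason.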
```diff
- bgez t1 2f
+ bge t1, x0, 2f
- li a1 0x40000000
+ lui a1, 0x40000
- la a0 sine
+ lui a0,%hi(sine)
+ addi a0,a0,%lo(sine)
- print
```
I compiled [ticks](https://github.com/sysprog21/rv32emu/blob/master/tests/ticks.c) into an {}.s file and simulated it.
The command below generates `sineO3.S`; after adding the cycle counter, it prints the elapsed cycles of the sine code when run in the rv32emu simulator.
```shell
# -S: compile only; do not assemble or link
$ $(CROSS_COMPILE)gcc $(ASFLAGS) -O3 -S -o sineO3.S sine.c
```
Add the RDCYCLE/RDCYCLEH instructions to the `main` function:
```asm
main:
# get tick
rdcycleh s1
rdcycle s0
rdcycleh a5
sub s1, s1, a5
seqz s1, s1
sub s1, zero, s1
and s0, s0, s1
# do sine function
la t0, rads
lw a0, 0(t0)
jal mySin
rdcycleh a3
rdcycle a5
rdcycleh a4
sub a3, a3, a4
seqz a3, a3
sub a3, zero, a3
and a5, a5, a3
sub a2,a5,s0
sgtu a5,a2,a5
sub a3,a3,s1
lui a0,%hi(.LC0)
sub a3,a3,a5
addi a0,a0,%lo(.LC0)
call printf
li a7, 93 # exit
ecall
```
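For comparison, here is a C sketch of reading the 64-bit cycle counter on RV32. It uses the retry loop recommended in the RISC-V specification (read high, low, high again, and repeat if the high word changed) rather than the masking sequence in the listing above. All names are hypothetical; the `rdcycle` wrappers are only compiled for 32-bit RISC-V targets, and the fake readers exist so the logic can be exercised on a host machine.

```c
#include <stdint.h>

typedef uint32_t (*read32_fn)(void);

/* Combine two 32-bit counter halves into one consistent 64-bit value. */
uint64_t read_counter64(read32_fn read_hi, read32_fn read_lo)
{
    uint32_t hi, lo, hi2;
    do {
        hi = read_hi();
        lo = read_lo();
        hi2 = read_hi(); /* changed? then lo wrapped between the reads */
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

#if defined(__riscv) && __riscv_xlen == 32
uint32_t rdcycle_lo(void)
{
    uint32_t v;
    __asm__ volatile("rdcycle %0" : "=r"(v));
    return v;
}
uint32_t rdcycle_hi(void)
{
    uint32_t v;
    __asm__ volatile("rdcycleh %0" : "=r"(v));
    return v;
}
#endif

/* Host-side fakes: the high word changes once, forcing one retry. */
static uint32_t fake_calls;
uint32_t fake_hi(void) { return fake_calls++ == 0 ? 0u : 1u; }
uint32_t fake_lo(void) { return 5u; }
```

On the target, elapsed cycles would be the difference between two calls to `read_counter64(rdcycle_hi, rdcycle_lo)` placed around the code being measured.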
The `.LC0` string and the `.globl main` directive are shown below. If `{}.s` is built without `.globl main`, linking fails with `undefined reference to main`.
```asm
.LC0:
.string "elapsed cycle: %llu\n"
.section .text.startup,"ax",@progbits
.align 2
.globl main
.type main, @function
```
This gives the elapsed cycle count:
| original sine.s |
| -------- |
| 17997 |
---
I compared the different assembly files to figure out what causes the difference in elapsed cycles, and made some modifications to reduce the cycle count.
Because `j printAns` only jumps to the label that immediately follows it, it can be removed.
```diff
main:
...
jal mySin
- j printAns
printAns:
mv t0, a0
...
```
With a `static inline` function, the compiler copies the function body directly into the calling function instead of emitting a separate call, which removes the jump instructions for the call and the return.
```c
static inline int64_t getbit(int64_t value, int n)
{ return (value >> n) & 1; }
```
Since `getbit` is declared `inline`, we can copy the function body below into each caller by hand.
This function extracts the bit at a given position counted from the LSB. It only handles 32 bits; for a 64-bit value we need two 32-bit registers.
```asm
getbit:
# prologue
srl a0, a0, a1
andi a0, a0, 0x1
ret
```
Running the code again, the elapsed cycles drop from 18058 to 17713.
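The remark about 64-bit values can be sketched in C by treating the value as two 32-bit halves, which is how an RV32 register pair would hold it. The function name `getbit64_halves` is hypothetical, and `n` is assumed to be in the range 0 to 63.

```c
#include <stdint.h>

/* Bit n (0 = LSB) of the 64-bit value hi:lo, using only 32-bit operations. */
uint32_t getbit64_halves(uint32_t hi, uint32_t lo, unsigned n)
{
    uint32_t word = (n & 32) ? hi : lo; /* pick the half that holds bit n */
    return (word >> (n & 31)) & 1u;     /* shift within that half */
}
```

For positions below 32 this reduces to the same shift and mask as the 32-bit `getbit` assembly.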

## related
https://mathworld.wolfram.com/Sine.html
[asm inline](https://www.cnblogs.com/sureZ-learning/p/16286560.html)