# Assignment2: GNU Toolchain
contributed by < [`Hotmercury`](https://github.com/Hotmercury) >

> [Assignment requirements](https://hackmd.io/@sysprog/2023-arch-homework2)

## [lab2](https://hackmd.io/@sysprog/SJAR5XMmi) step

**Basic commands**

Compile `{}.c`:
```
$ riscv-none-elf-gcc {}.c
```

Run an ELF file:
```
$ build/rv32emu {file}.elf
```

Disassemble:
```
$ riscv-none-elf-objdump -d {file}.elf
```

Display the ELF file header:
```
$ riscv-none-elf-readelf -h {file}.elf
```

List section sizes:
```
$ riscv-none-elf-size {}.elf
```

## Get sine value without floating point multiplication support

I chose the problem from [戴鈞彥](https://hackmd.io/@ranvd/computer-arch-hw1) because I want to learn more about converting between integers and IEEE 754 floating-point numbers.

![](https://hackmd.io/_uploads/ryvvxHGfT.png)

The sine function can be approximated by its Taylor series:

$$
\sin(x) \approx \sum_{i=0}^{n} \frac{(-1)^{i}}{(2i+1)!} x^{2i+1}
$$

**Original C code**

`fmul32`

### Fix Makefile

**Problem**

> Running `make` sometimes fails.

<s>
![](https://hackmd.io/_uploads/r1UzkKzza.png)
</s>

:::warning
:warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
:notes: jserv
:::

:::success
The cause: the `riscv-none-elf-` toolchain has to be added to the environment again in every new shell (or set in the global environment):
```
$ source ~/riscv-none-elf-gcc/setenv
```
:::

> Where are `$(CROSS_COMPILE)` and `$(RM)` defined?

From the `asm-hello` Makefile flow, we can see how the `{}.o` and `{}.elf` files are produced:
```
$(CROSS_COMPILE) = riscv-none-elf-
$(RM) = rm -rf
```

Two steps:
```
# -R: fold the data section into the text section
$ riscv-none-elf-as -R -march=rv32i -mabi=ilp32 -o hello.o hello.S
$ riscv-none-elf-ld -o hello.elf -T hello.ld --oformat=elf32-littleriscv hello.o
```

![](https://hackmd.io/_uploads/ry88GdGz6.png)

:::spoiler Makefile that fails
```makefile
.PHONY: clean

include ../../mk/toolchain.mk

ASFLAGS = -march=rv32i -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

%.o: %.S
	$(CROSS_COMPILE)as -R $(ASFLAGS) -o $@ $<

all: sine.elf

sine.elf : sine.o
	$(CROSS_COMPILE)ld -o $@ -T sine.ld $(LDFLAGS) $<

sine.S : sine.c
	$(CROSS_COMPILE)gcc -S $(ASFLAGS) -o $@ $<

clean:
	$(RM) $(TARGET) $(OBJ) $(ASM)
```
:::

The build pipeline:
```
riscv-none-elf-gcc -S {}.c                  // {}.c -> {}.S
riscv-none-elf-as -R -o {}.o {}.S           // {}.S -> {}.o
riscv-none-elf-ld -T {}.ld -o {}.elf {}.o   // {}.o -> {}.elf
```

![](https://hackmd.io/_uploads/Sk3_XtMzT.png)

Running `make` produces these errors:

:::danger
warning: cannot find entry symbol _start; defaulting to 00000000
undefined reference to `puts'
:::

:::success
GCC replaces `printf("...\n")` (a constant string with no conversion specifiers) with a call to `puts`. `puts` is a C library function, not a system call, and since `ld` links only our object file, nothing provides it.
:::

So I appended `-fno-builtin` to the GCC flags as below, but still got the following error:
```
$ riscv-none-elf-gcc -S $(ASFLAGS) -fno-builtin -o {}.s {}.c
$ riscv-none-elf-as -R $(ASFLAGS) -o {}.o {}.s
$ riscv-none-elf-ld -T sine.ld $(LDFLAGS) -o {}.elf {}.o
```

:::danger
sine.c:(.text+0xd18): undefined reference to `printf'
:::

> I think the error comes from `riscv-none-elf-ld`.

`-fno-builtin` only stops the `printf` to `puts` rewrite; `ld` is still invoked on `sine.o` alone, so no C library provides `printf` either.

## Code

[Get sine value without floating point multiplication support](https://github.com/ranvd/ComputerArch/tree/main/hw1)

**Assembly in C**

> Where can I find the complete documentation for this method?
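A minimal sketch of GCC extended inline asm, for illustration only: `asm_add` is a hypothetical helper, not part of the assignment code. On a RISC-V target it emits a single `add` using named operands; on any other target it falls back to plain C (an assumption added only so the sketch also compiles on a host machine).

```c
#include <stdint.h>

/* Hypothetical helper: add two integers through extended inline asm.
 * General form: asm volatile("template" : outputs : inputs : clobbers); */
static int32_t asm_add(int32_t a, int32_t b)
{
    int32_t sum;
#if defined(__riscv)
    /* Named operands: %[out] is the output, %[x] and %[y] are inputs.
     * No registers are clobbered, so the third ':' section is omitted. */
    __asm__ volatile("add %[out], %[x], %[y]"
                     : [out] "=r"(sum)           /* output operands */
                     : [x] "r"(a), [y] "r"(b));  /* input operands */
#else
    sum = a + b; /* host fallback so the sketch builds off-target */
#endif
    return sum;
}
```

The positional form `"add %0, %1, %2" : "=r"(sum) : "r"(a), "r"(b)` is equivalent; named operands simply stay readable when the operand list changes.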
Use the `asm volatile` keyword, and separate consecutive instructions with `\n`:
```
asm volatile(
    "asm instructions"
    : output operands   /* optional */
    : input operands    /* optional */
    : clobbers          /* optional */);
```

**Output**

The output operands follow the first `:`. There are two ways to refer to them in the template:
1. Named operands: `: [out1] "=r"(output1), [out2] "=r"(output2)`, referenced as `%[out1]` and `%[out2]`.
2. Positional operands: `: "=r"(output)`, referenced as `%0`, `%1`, ... in order.

With inline asm, the compiler sets up the corresponding registers for the listed outputs, for example:
```c
: "=r"(h), "=r"(l), "=r"(h2));
```

---

### fmul32

We can reduce the number of branches from two to one by combining the two tests with a bitwise `|`, which does not short-circuit. (Note that `!(ia | ib)` is not equivalent: it is true only when *both* operands are zero.)
```c
float fmul32(float a, float b)
{
    /* TODO: Special values like NaN and INF */
    int32_t ia = *(int32_t *)&a, ib = *(int32_t *)&b;

    // if (ia == 0 || ib == 0) return 0;   /* original: two branches */
    if ((ia == 0) | (ib == 0))              /* one branch */
        return 0;
```

```c
    /* mantissa */
    int32_t ma = (ia & 0x7FFFFF) | 0x800000;
    int32_t mb = (ib & 0x7FFFFF) | 0x800000;

    /* sign and exponent */
    int32_t sea = ia & 0xFF800000;
    int32_t seb = ib & 0xFF800000;

    /* result of mantissa */
    int32_t m = imul24(ma, mb);
    int32_t mshift = getbit(m, 24);
    m >>= mshift;

    int32_t r = ((sea - 0x3f800000 + seb) & 0xFF800000) + (m & (0x7fffff | mshift << 23));
```

We can detect overflow from the sign bits with `r ^ sea ^ seb`. If the exponent overflows into the sign bit, the sign of `r` differs from `sign(a) ^ sign(b)`, so `ovfl` (an arithmetic shift right by 31) becomes -1 (all ones). Then `r ^ (r ^ 0x7f800000)` cancels `r` and leaves `0x7f800000`, i.e. +infinity. Otherwise `ovfl` is 0 and `r` is unchanged.
```c
    int32_t ovfl = (r ^ seb ^ sea) >> 31;
    r = r ^ ((r ^ 0x7f800000) & ovfl);
    return *(float *)&r;
}
```

### imul24

```c
static int32_t imul24(int32_t a, int32_t b)
{
    uint32_t r = 0;
    for (; b; b >>= 1)
        r = (r >> 1) + (a & -getbit(b, 0));
    return r;
}
```

We can test the multiplier bit directly with `b & 1` instead of calling `getbit`, which may save the cycles spent on the `jal`/`ret` pair. The bit must still be negated into a mask, since `a & (b & 1)` would keep only bit 0 of `a`.
```c
static int32_t imul24(int32_t a, int32_t b)
{
    uint32_t r = 0;
    for (; b; b >>= 1)
        r = (r >> 1) + (a & -(b & 1)); /* -(b & 1) is all ones when the bit is set: adds a or 0 */
    return r;
}
```

Loop unrolling could reduce the execution cycles further.
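A minimal sketch of what unrolling could look like, assuming `b` is always a 24-bit mantissa with the implicit leading 1 (bit 23) set and no higher bits, as in `fmul32`. Under that assumption the loop always runs exactly 24 iterations, so it can be unrolled by four into six rounds without changing the result. `imul24_loop` and `imul24_unrolled` are hypothetical names, and the unroll factor of 4 is an arbitrary choice.

```c
#include <stdint.h>

/* Reference: the shift-and-add loop with the bit test inlined. */
static int32_t imul24_loop(int32_t a, int32_t b)
{
    uint32_t r = 0;
    for (; b; b >>= 1)
        r = (r >> 1) + (a & -(b & 1));
    return r;
}

/* Unrolled by 4. Assumes bit 23 of b is set and bits above 23 are clear,
 * so exactly 24 bits (6 rounds x 4) are processed, LSB first. */
static int32_t imul24_unrolled(int32_t a, int32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 24; i += 4) {
        r = (r >> 1) + (a & -(b & 1));
        r = (r >> 1) + (a & -((b >> 1) & 1));
        r = (r >> 1) + (a & -((b >> 2) & 1));
        r = (r >> 1) + (a & -((b >> 3) & 1));
        b >>= 4;
    }
    return r;
}
```

Unrolling removes most of the loop-counter updates and branches at the cost of larger code; whether it pays off on rv32emu would have to be measured with the cycle counter.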
### fdiv32

I removed `getbit()`.
```c
float fdiv32(float a, float b)
{
    int32_t ia = *(int32_t *)&a, ib = *(int32_t *)&b;
    if (a == 0) return a;
    if (b == 0) return *(float*)&(int){0x7f800000};

    /* mantissa */
    int32_t ma = (ia & 0x7FFFFF) | 0x800000;
    int32_t mb = (ib & 0x7FFFFF) | 0x800000;

    /* sign and exponent */
    int32_t sea = ia & 0xFF800000;
    int32_t seb = ib & 0xFF800000;

    /* result of mantissa */
    int32_t m = idiv24(ma, mb);
    int32_t mshift = !getbit(m, 31);
    m <<= mshift;

    int32_t r = ((sea - seb + 0x3f800000) - (0x800000 & -mshift)) | (m & 0x7fffff00) >> 8;
    int32_t ovfl = (ia ^ ib ^ r) >> 31;
    r = r ^ ((r ^ 0x7f800000) & ovfl);

    return *(float *) &r;
    // return a / b;
}
```

### idiv24

```c
static int32_t idiv24(int32_t a, int32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        a -= b;
        r = (r << 1) | (a >= 0);
        a = (a + (b & -(a < 0))) << 1;
    }
    return r;
}
```

![](https://hackmd.io/_uploads/ry6BGaoMa.png)

### fadd32

```c
float fadd32(float a, float b)
{
    int32_t ia = *(int32_t *)&a, ib = *(int32_t *)&b;

    int32_t cmp_a = ia & 0x7fffffff;
    int32_t cmp_b = ib & 0x7fffffff;
    if (cmp_a < cmp_b)
        iswap(ia, ib);

    /* exponent */
    int32_t ea = (ia >> 23) & 0xff;
    int32_t eb = (ib >> 23) & 0xff;

    /* mantissa */
    int32_t ma = (ia & 0x7fffff) | 0x800000;
    int32_t mb = (ib & 0x7fffff) | 0x800000;

    int32_t align = (ea - eb > 24) ? 24 : (ea - eb);

    mb >>= align;
    if ((ia ^ ib) >> 31) {
        ma -= mb;
    } else {
        ma += mb;
    }

    int32_t clz = count_leading_zeros(ma);
    int32_t shift = 0;
    if (clz <= 8) {
        shift = 8 - clz;
        ma >>= shift;
        ea += shift;
    } else {
        shift = clz - 8;
        ma <<= shift;
        ea -= shift;
    }

    int32_t r = (ia & 0x80000000) | (ea << 23) | (ma & 0x7fffff);
    // float tr = a + b;   /* hardware reference result, unused */
    return *(float *)&r;
}
```

### f2i32 & i2f32

These functions convert between integers and floating-point numbers. I noticed that `f2i32` alone cannot handle the sign: it never reads bit 31, so a negative input produces a positive result. Therefore, I added a check that negates the result when the sign bit is set.
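Both conversions start from the same IEEE 754 single-precision field split: sign (bit 31), biased exponent (bits 30 to 23, bias 127) and mantissa (bits 22 to 0). A minimal sketch of that split; `f32_fields`, `f32_bits` and `f32_decode` are hypothetical names. It reads the bit pattern with `memcpy`, which is the well-defined way to reinterpret a `float` in C, whereas the `*(int32_t *)&x` casts used in this code break the strict-aliasing rule.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical type: the three IEEE 754 single-precision fields. */
typedef struct {
    uint32_t sign;     /* bit 31 */
    int32_t exponent;  /* bits 30..23 with the bias (127) removed */
    uint32_t mantissa; /* bits 22..0, without the implicit leading 1 */
} f32_fields;

/* Reinterpret the bits of a float as a 32-bit integer. */
static uint32_t f32_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static f32_fields f32_decode(float f)
{
    uint32_t u = f32_bits(f);
    f32_fields r;
    r.sign = u >> 31;
    r.exponent = (int32_t)((u >> 23) & 0xFF) - 127;
    r.mantissa = u & 0x7FFFFF;
    return r;
}
```

For a normal number (not zero, denormal, infinity or NaN) the value is $(-1)^{\text{sign}} \times (1 + \text{mantissa}/2^{23}) \times 2^{\text{exponent}}$; `f2i32` restores the implicit 1 with `| 0x800000` and then shifts by `23 - exponent`.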
The original `f2i32`:
```c
int f2i32(int x)
{
    int32_t a = *(int *)&x;
    int32_t ma = (a & 0x7FFFFF) | 0x800000;
    int32_t ea = ((a >> 23) & 0xFF) - 127;
    if (ea < 0)
        return 0;
    else if (ea <= 23)
        ma >>= (23 - ea);
    else
        ma <<= (ea - 23);

    return ma;
}
```

With the sign check added:
```c
int f2i32(float x)
{
    int32_t a = *(int *)&x;
    int32_t ma = (a & 0x7FFFFF) | 0x800000;
    int32_t ea = ((a >> 23) & 0xFF) - 127;
    int32_t sa = a & 0x80000000; /* sign bit */
    if (ea < 0)
        return 0;
    else if (ea <= 23)
        ma >>= (23 - ea);
    else
        ma <<= (ea - 23);

    if (sa) /* negative input: negate the magnitude */
        return -ma;
    return ma;
}
```

`i2f32`:
```c
int i2f32(int x)
{
    if (x == 0)
        return 0;

    int32_t s = x & 0x80000000;
    if (s)
        x = -x;

    int32_t clz = count_leading_zeros(x);
    int32_t e = 31 - clz + 127;

    if (clz <= 8) {
        x >>= 8 - clz;
    } else {
        x <<= clz - 8;
    }

    int r = s | (e << 23) | (x & 0x7fffff);
    return r;
}
```

### sin

`int s = 1 ^ ((-2) & -(n & 0x1));` computes the sign of each term without a branch. When `n` is odd, `-(n & 1)` is `0xFFFFFFFF`, so the mask keeps `-2` (`0xFFFFFFFE`) and `1 ^ 0xFFFFFFFE = 0xFFFFFFFF`, i.e. `s = -1`. When `n` is even, the mask is 0 and `s = 1`.

```c
float myPow(float x, int n)
{
    float r = 1.0;
    while (n) {
        if (n & 0x1) {
            r = fmul32(r, x);
            n -= 1;
        } else {
            x = fmul32(x, x);
            n >>= 1;
        }
    }
    return r;
}

// n!
float factorial(int n)
{
    float r = 1.0;
    for (int i = 1; i <= n; i++) {
        r = fmul32(r, i2f32(i));
    }
    return r;
}

// Sine by Taylor series
float mySin(float x)
{
    float r = 0.0;
    for (int n = 0; n < 5; n++) {
        /* k = 2n + 1 */
        int k = f2i32(fadd32(fmul32(i2f32(2), i2f32(n)), i2f32(1)));
        /* s = (-1)^n */
        int s = 1 ^ ((-2) & -(n & 0x1));
        r = fadd32(r, fdiv32(fmul32(i2f32(s), myPow(x, k)), factorial(k)));
    }
    return r;
}
```

## Analyze

Elapsed cycles of the [original C code](https://github.com/ranvd/ComputerArch/blob/main/hw1/Ripes.c) at each optimization level:
```shell
ASFLAGS = -march=rv32i -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv
riscv-none-elf-gcc $(ASFLAGS) -O0 -o sine sine.c
../../build/rv32emu sine
```

| Optimization | -O0 | -O1 | -O2 | -O3 |
| -------- | -------- | -------- | -------- | -------- |
| Cycles | 110635 | 16830 | 15326 | 13727 |

:::warning
Improve your English writing via ChatGPT or similar tools.
:notes: jserv
:::

Elapsed cycles of the [original assembly code](https://github.com/ranvd/ComputerArch/blob/main/hw1/Ripes.s). I ran this code in Ripes because I didn't know how to convert floating-point numbers to strings. According to the [rv32emu syscall](https://github.com/sysprog21/rv32emu/blob/master/src/syscall.c) implementation, the only way to print data is the `write` system call, which only prints strings. Here are the results.

After verifying the correctness of the code, I removed the `printf` of floating-point values and used [perfcounter](https://github.com/sysprog21/rv32emu/tree/master/tests/perfcounter) to print only the cycle count.

![image.png](https://hackmd.io/_uploads/ByF-JWMm6.png)

:::warning
You shall use RDCYCLE/RDCYCLEH instruction for the statistics of your program’s execution.
:notes: jserv
:::

![](https://hackmd.io/_uploads/rkcUxWpMT.png)

And the [unrolled code](https://github.com/ranvd/ComputerArch/blob/main/hw1/O_unrollRipes.s):

![](https://hackmd.io/_uploads/BkU2eWpz6.png)

:::warning
original_sine.s:286: Error: illegal operands bgez t1 2f
original_sine.s:433: Error: illegal operands li a1 0x40000000
original_sine.s:483: Error: illegal operands la a0 sine
original_sine.s:484: Error: illegal operands lw a0 0(a0)
original_sine.s:485: Error: illegal operands li a7 2
:::

The GNU assembler requires operands to be separated by commas, which the Ripes source omits; `lw` is a base instruction, yet it fails too. So I rewrote these lines with commas and, at the same time, replaced the [pseudo-instructions](https://riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf) (p. 122) with base instructions:
```diff
- bgez t1 2f
+ bge t1, x0, 2f
- li a1 0x40000000
+ lui a1, 0x40000
- la a0 sine
+ lui a0,%hi(sine)
+ addi a0,a0,%lo(sine)
- print
```

I compiled [ticks](https://github.com/sysprog21/rv32emu/blob/master/tests/ticks.c) into a `{}.s` file and simulated it. With the command below, I generate `sineO3.S` and observe how it prints the elapsed cycles of `sine.s` when run in the rv32emu simulator.
```shell
# -S: compile only; do not assemble or link.
$ $(CROSS_COMPILE)gcc -S $(ASFLAGS) -O3 -o sineO3.S sine.c
```

Add RDCYCLE/RDCYCLEH instructions to the `main` function:
```asm
main:
    # get start tick
    rdcycleh s1
    rdcycle  s0
    rdcycleh a5
    sub      s1, s1, a5
    seqz     s1, s1
    sub      s1, zero, s1
    and      s0, s0, s1

    # do sine function
    la       t0, rads
    lw       a0, 0(t0)
    jal      mySin

    # get end tick
    rdcycleh a3
    rdcycle  a5
    rdcycleh a4
    sub      a3, a3, a4
    seqz     a3, a3
    sub      a3, zero, a3
    and      a5, a5, a3

    # 64-bit difference (end - start) in a3:a2 for "%llu"
    sub      a2, a5, s0      # low word; s0 holds the start low word
    sgtu     a5, a2, a5      # borrow
    sub      a3, a3, s1
    lui      a0, %hi(.LC0)
    sub      a3, a3, a5
    addi     a0, a0, %lo(.LC0)
    call     printf

    li       a7, 93          # exit
    ecall
```

`.LC0`: if we assemble `{}.s` without declaring `.globl main`, linking fails with `undefined reference to main`:
```asm
.LC0:
    .string "elapsed cycle: %llu\n"
    .section .text.startup,"ax",@progbits
    .align 2
    .globl main
    .type main, @function
```

Elapsed cycles:

| original sine.s |
| -------- |
| 17997 |

---

I compared the different assembly files to figure out what causes the difference in elapsed cycles, and made some modifications to reduce the cycle count.

`j printAns` jumps to the label immediately after it, so it can be removed:
```diff
 main:
     ...
     jal mySin
-    j printAns
 printAns:
     mv t0, a0
     ...
```

With a `static inline` function, the compiler can copy the function body directly into the caller instead of emitting a separate function in memory, which removes the call and return jumps.
```c
static inline int64_t getbit(int64_t value, int n)
{
    return (value >> n) & 1;
}
```

Because `getbit` is inline, we can likewise copy the assembly below into each caller. It returns bit `n` of `value`, counting from the LSB. This assembly version only handles 32-bit values; a 64-bit value would need a pair of 32-bit registers.
```asm
getbit:
    # prologue
    srl  a0, a0, a1
    andi a0, a0, 0x1
    ret
```

Running the code again, the elapsed cycles drop from 18058 to 17713.

![](https://hackmd.io/_uploads/rJw-BQ6G6.png)

## Related

- https://mathworld.wolfram.com/Sine.html
- [asm inline](https://www.cnblogs.com/sureZ-learning/p/16286560.html)