contributed by < paulpeng-popo
>
I selected Bitwise AND of Numbers Range from JoshuaLee
, because its concept is similar to my topic which I have chosen in assignment 1.
This question is asking us to find the common prefix of left and right 's binary code.
To find the common prefix of left and right, we can shift both of them to the right until they are equal.
int rangeBitwiseAnd(int left, int right) {
int res = 0;
while (left != right) {
left >>= 1;
right >>= 1;
res += 1;
}
return left << res;
}
After making optimization by JoshuaLee
it looks like
int rangeBitwiseAnd(int L, int R) {
return L == R ? L : L & ~(0xffffffff >> clz(L ^ R));
}
He uses bitmask trick to remove loops in previous implementation.
To compare between these two function, I add read_cycles
to calculate how much clock cycles will take during execution.
static inline uint64_t
read_cycles(void)
{
uint64_t cycles;
__asm__ volatile("rdcycle %0" : "=r"(cycles));
return cycles;
}
And modify main()
int main()
{
int left = 26, right = 30;
uint64_t cnt0 = read_cycles();
printf("%d\n", rangeBitwiseAnd(left, right));
uint64_t cyclecount = read_cycles() - cnt0;
printf("Cycle Count: %u\n", (unsigned int)cyclecount);
return 0;
}
Compile it
$ riscv-none-elf-gcc -march=rv32i_zicsr_zifencei -mabi=ilp32 -Wall -S -o range.s range.c
$ riscv-none-elf-gcc -march=rv32i_zicsr_zifencei -mabi=ilp32 -Wall -c -o range.o range.s
$ riscv-none-elf-gcc -o a.elf range.o
Result of execution on rv32emu
24
Cycle Count: 1573
Disassembly of a.elf
a.elf: file format elf32-littleriscv
Disassembly of section .text:
00010094 <exit>:
10094: 1141 add sp,sp,-16
10096: 4581 li a1,0
10098: c422 sw s0,8(sp)
1009a: c606 sw ra,12(sp)
1009c: 842a mv s0,a0
1009e: 2f1000ef jal 10b8e
...
00010142 <clz>:
10142: 1101 add sp,sp,-32
10144: ce22 sw s0,28(sp)
10146: 1000 add s0,sp,32
10148: fea42623 sw a0,-20(s0)
1014c: fec42783 lw a5,-20(s0)
10150: 8385 srl a5,a5,0x1
10152: fec42703 lw a4,-20(s0)
10156: 8fd9 or a5,a5,a4
10158: fef42623 sw a5,-20(s0)
1015c: fec42783 lw a5,-20(s0)
10160: 8389 srl a5,a5,0x2
...
00010264 <rangeBitwiseAnd2>:
10264: 1101 add sp,sp,-32
10266: ce06 sw ra,28(sp)
10268: cc22 sw s0,24(sp)
1026a: 1000 add s0,sp,32
1026c: fea42623 sw a0,-20(s0)
10270: feb42423 sw a1,-24(s0)
10274: fec42703 lw a4,-20(s0)
10278: fe842783 lw a5,-24(s0)
1027c: 00f71563 bne a4,a5,10286
...
Display the a.elf
header
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100c2
Start of program headers: 52 (bytes into file)
Start of section headers: 69304 (bytes into file)
Flags: 0x1, RVC, soft-float ABI
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
Program size
text data bss dec hex filename
51524 1876 1528 54928 d690 a.elf
According to rv32emu docs, rv32emu
does not support printf system call.
Error will be like
$ rv32emu a.elf
unknown syscall 1
TODO: Modify assembly code
From gcc man page, there’s 8 different valid -O options
Optimization Options
-faggressive-loop-optimizations
…
-fweb -fwhole-program -fwpa -fuse-linker-plugin –param
name=value -O -O0 -O1 -O2 -O3 -Os -Ofast -Og
In Makefile, we add OPT
for variable to change compilation settings.
Default is -O0
SRC = range
OUT = a.elf
OPT = -O0
.PHONY: clean
ASFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32
$(OUT): $(SRC).o
riscv-none-elf-gcc -o $(OUT) $(SRC).o
$(SRC).o: $(SRC).s
riscv-none-elf-gcc $(OPT) -c -o $(SRC).o $(SRC).s
$(SRC).s: $(SRC).c
riscv-none-elf-gcc $(OPT) -S -o $(SRC).s $(SRC).c
clean:
rm -f $(SRC).s $(SRC).o $(OUT)
Below we can observe at code size and cycle count it took
$ make OPT=-O0
$ riscv-none-elf-size a.elf > sizeO0
Cycle Count: 1573
text data bss dec hex filename
51524 1876 1528 54928 d690 a.elf
$ make OPT=-O1
$ riscv-none-elf-size a.elf > sizeO1
Cycle Count: 1493
text data bss dec hex filename
51194 1876 1528 54598 d546 a.elf
$ make OPT=-O2
$ riscv-none-elf-size a.elf > sizeO2
Cycle Count: 1493
text data bss dec hex filename
51186 1876 1528 54590 d53e a.elf
$ make OPT=-O3
$ riscv-none-elf-size a.elf > sizeO3
Cycle Count: 1452
text data bss dec hex filename
51256 1876 1528 54660 d584 a.elf
$ make OPT=-Os
$ riscv-none-elf-size a.elf > sizeOs
Cycle Count: 1493
text data bss dec hex filename
51188 1876 1528 54592 d540 a.elf
$ make OPT=-Ofast
$ riscv-none-elf-size a.elf > sizeOfast
Cycle Count: 1452
text data bss dec hex filename
51256 1876 1528 54660 d584 a.elf
$ make OPT=-Og
$ riscv-none-elf-size a.elf > sizeOg
Cycle Count: 1506
text data bss dec hex filename
51176 1876 1528 54580 d534 a.elf
CSR
Name | cycle count | size |
---|---|---|
O0 | 1573 | 51524 |
O1 | 1493 | 51194 |
O2 | 1493 | 51186 |
O3 | 1452 | 51256 |
Os | 1493 | 51188 |
Ofast | 1452 | 51256 |
Besides option -O0
, other options explicitly reduce the cycle count. The reason could be that the code is too simple, there is no more room for optimization.
And with option -Os
, it indeed reduce the code size compare to others.
Improve your English writing. Explain more according to the experiments.
jserv