Try   HackMD

Assignment2: RISC-V Toolchain

contributed by < paulpeng-popo >

Choose a question

I selected Bitwise AND of Numbers Range from JoshuaLee, because its concept is similar to my topic which I have chosen in assignment 1.

Analyzation of Original code

This question is asking us to find the common prefix of left and right 's binary code.
To find the common prefix of left and right, we can shift both of them to the right until they are equal.

int rangeBitwiseAnd(int left, int right) {
    int res = 0;
    while (left != right) {
        left >>= 1;
        right >>= 1;
        res += 1;
    }
    return left << res;   
}

After making optimization by JoshuaLee it looks like

int rangeBitwiseAnd(int L, int R) {
    return L == R ? L : L & ~(0xffffffff >> clz(L ^ R));
}

He uses bitmask trick to remove loops in previous implementation.

To compare between these two function, I add read_cycles to calculate how much clock cycles will take during execution.

static inline uint64_t
read_cycles(void)
{
    uint64_t cycles;
    __asm__ volatile("rdcycle %0" : "=r"(cycles));
    return cycles;
}

And modify main()

int main()
{
    int left = 26, right = 30;
    uint64_t cnt0 = read_cycles();
    printf("%d\n", rangeBitwiseAnd(left, right));
    uint64_t cyclecount = read_cycles() - cnt0;
    printf("Cycle Count: %u\n", (unsigned int)cyclecount);
    return 0;
}

Compile it

$ riscv-none-elf-gcc -march=rv32i_zicsr_zifencei -mabi=ilp32 -Wall -S -o range.s range.c
$ riscv-none-elf-gcc -march=rv32i_zicsr_zifencei -mabi=ilp32 -Wall -c -o range.o range.s
$ riscv-none-elf-gcc -o a.elf range.o

Result of execution on rv32emu

24
Cycle Count: 1573

Disassembly of a.elf

a.elf:     file format elf32-littleriscv


Disassembly of section .text:

00010094 <exit>:
   10094:	1141                	add	sp,sp,-16
   10096:	4581                	li	a1,0
   10098:	c422                	sw	s0,8(sp)
   1009a:	c606                	sw	ra,12(sp)
   1009c:	842a                	mv	s0,a0
   1009e:	2f1000ef          	jal	10b8e 
   ...

00010142 <clz>:
   10142:	1101                	add	sp,sp,-32
   10144:	ce22                	sw	s0,28(sp)
   10146:	1000                	add	s0,sp,32
   10148:	fea42623          	sw	a0,-20(s0)
   1014c:	fec42783          	lw	a5,-20(s0)
   10150:	8385                	srl	a5,a5,0x1
   10152:	fec42703          	lw	a4,-20(s0)
   10156:	8fd9                	or	a5,a5,a4
   10158:	fef42623          	sw	a5,-20(s0)
   1015c:	fec42783          	lw	a5,-20(s0)
   10160:	8389                	srl	a5,a5,0x2
   ...

00010264 <rangeBitwiseAnd2>:
   10264:	1101                	add	sp,sp,-32
   10266:	ce06                	sw	ra,28(sp)
   10268:	cc22                	sw	s0,24(sp)
   1026a:	1000                	add	s0,sp,32
   1026c:	fea42623          	sw	a0,-20(s0)
   10270:	feb42423          	sw	a1,-24(s0)
   10274:	fec42703          	lw	a4,-20(s0)
   10278:	fe842783          	lw	a5,-24(s0)
   1027c:	00f71563          	bne	a4,a5,10286
   ...

Display the a.elf header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100c2
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69304 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Program size

   text	   data	    bss	    dec	    hex	filename
  51524	   1876	   1528	  54928	   d690	a.elf

Make the original Assembly run properly on rv32emu

According to rv32emu docs, rv32emu does not support printf system call.

Error will be like

$ rv32emu a.elf

unknown syscall 1

TODO: Modify assembly code

Observe different compilation options result

From gcc man page, there’s 8 different valid -O options

Optimization Options
-faggressive-loop-optimizations

-fweb -fwhole-program -fwpa -fuse-linker-plugin param
name=value -O -O0 -O1 -O2 -O3 -Os -Ofast -Og

  • -O/O1 (Optimize)
  • -O0 (Do no optimization. Default)
  • -O2 (Optimize even more)
  • -O3 (Optimize yet more)
  • -Os (Optimize for size)
  • -Ofast (Disregard strict standards compliance)
  • -Og (Optimize debugging experience)

In Makefile, we add OPT for variable to change compilation settings.

Default is -O0

SRC = range OUT = a.elf OPT = -O0 .PHONY: clean ASFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32 $(OUT): $(SRC).o riscv-none-elf-gcc -o $(OUT) $(SRC).o $(SRC).o: $(SRC).s riscv-none-elf-gcc $(OPT) -c -o $(SRC).o $(SRC).s $(SRC).s: $(SRC).c riscv-none-elf-gcc $(OPT) -S -o $(SRC).s $(SRC).c clean: rm -f $(SRC).s $(SRC).o $(OUT)

Below we can observe at code size and cycle count it took

-O0 option

$ make OPT=-O0
$ riscv-none-elf-size a.elf > sizeO0

Cycle Count: 1573

   text	   data	    bss	    dec	    hex	filename
  51524	   1876	   1528	  54928	   d690	a.elf

-O1 option

$ make OPT=-O1
$ riscv-none-elf-size a.elf > sizeO1

Cycle Count: 1493

   text	   data	    bss	    dec	    hex	filename
  51194	   1876	   1528	  54598	   d546	a.elf

-O2 option

$ make OPT=-O2
$ riscv-none-elf-size a.elf > sizeO2

Cycle Count: 1493

   text	   data	    bss	    dec	    hex	filename
  51186	   1876	   1528	  54590	   d53e	a.elf

-O3 option

$ make OPT=-O3
$ riscv-none-elf-size a.elf > sizeO3

Cycle Count: 1452

   text	   data	    bss	    dec	    hex	filename
  51256	   1876	   1528	  54660	   d584	a.elf

-Os option

$ make OPT=-Os
$ riscv-none-elf-size a.elf > sizeOs

Cycle Count: 1493

   text	   data	    bss	    dec	    hex	filename
  51188	   1876	   1528	  54592	   d540	a.elf

-Ofast option

$ make OPT=-Ofast
$ riscv-none-elf-size a.elf > sizeOfast

Cycle Count: 1452

   text	   data	    bss	    dec	    hex	filename
  51256	   1876	   1528	  54660	   d584	a.elf

-Og option

$ make OPT=-Og
$ riscv-none-elf-size a.elf > sizeOg

Cycle Count: 1506

   text	   data	    bss	    dec	    hex	filename
  51176	   1876	   1528	  54580	   d534	a.elf

Conclusion

CSR

Name cycle count size
O0 1573 51524
O1 1493 51194
O2 1493 51186
O3 1452 51256
Os 1493 51188
Ofast 1452 51256

Besides option -O0, other options explicitly reduce the cycle count. The reason could be that the code is too simple, there is no more room for optimization.

And with option -Os, it indeed reduce the code size compare to others.

Improve your English writing. Explain more according to the experiments.
:notes: jserv