Try   HackMD

Assignment2: RISC-V Toolchain

contributed by kk908676

Question Selection

I refer to the Find First String of 1-bits of a Given Length by CLZ from 林柏全

The following is the source code:

#include <stdint.h>
#include <stdio.h>

uint16_t count_leading_zeros(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    x -= ((x >> 1) & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);

    return (64 - (x & 0x7f));
}

int find_string(uint64_t x, int n){  
    int clz;    // leading zeros of x

    int pos = 0;    // position of fist '1' bit from significant bit

    while(x != 0){
        clz = count_leading_zeros(x);
        x = x << clz;
        pos = pos + clz;
        clz = count_leading_zeros(~x);
        if (clz >= n)
            return pos;
        x = x << clz;
        pos = pos + clz;
    }

    return -1;
}

int main() {
    
    uint64_t test_data[] = {0x0f00000000000000, 
                            0x0000000000000000, 
                            0x0123456789abcdef};

    for (int i = 0; i < 3; i++) {
        uint64_t x = test_data[i];
        int n = 4; 
        int result = find_string(x, n);

        printf("Test Case %d: Input: 0x%016lx, Result: %d\n", i+1, x, result);
    }

    return 0;  
}

Motivation

The importance of "Find First String of 1-bits of a Given Length by CLZ" is primarily manifested in its efficiency and relevance in bitwise operations, particularly in programming contexts where performance is emphasized. This includes optimizing algorithms, performance improvement, and applications in bit-level security and cryptography.

Compile and excecute the code

First, we need to write a Makefile that suits our needs.

.PHONY: clean

include /home/an/rv32emu/mk/toolchain.mk

ASFLAGS = -march=rv32i_zicsr -mabi=ilp32
SOURCE = main.o initial.o getcycles.o

VAR = -O0
CC = riscv-none-elf-gcc

%.s: %.c
	$(CC) $(ASFLAGS) $(VAR) -s -o $@ $<
	
%.o: %.s
	$(CC) $(ASFLAGS) -c -o $@ $<

all: main.elf

main.elf: $(SOURCE)
	$(CC) -o $@ $(SOURCE)

clean:
	$(RM) main.elf main.o initial.o getcycles.o

Next, we execute the following instructions

$ ~/rv32emu/build/rv32emu main.elf

Here, we obtain the O0 version

![](https://hackmd.io/_uploads/HkxAy7LzT.png)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

Display the main.elf header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100c2
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69400 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Program size

    text	   data	    bss	    dec	    hex	filename
    52590	   1876	   1528	  55994	   daba	main.elf

O1 optimization

cycle count

![](https://hackmd.io/_uploads/SkjF_XUGT.png)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv

Display the main.elf header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x100c2
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69408 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Program size

    text	   data	    bss	    dec	    hex	filename
    51812	   1884	   1528	  55224	   d7b8	main.elf
  • It can be observed that both the cycle count and text size have decreased.

O2 optimization

cycle count

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

main.elf header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x1016e
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69408 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Program size

    text	   data	    bss	    dec	    hex	filename
    51834	   1884	   1528	  55246	   d7ce	main.elf
  • O2 didn't perform as well as expected; overall, it didn't outperform O1.

O3 optimization

cycle count

main.elf header

ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x1016e
  Start of program headers:          52 (bytes into file)
  Start of section headers:          69408 (bytes into file)
  Flags:                             0x1, RVC, soft-float ABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         15
  Section header string table index: 14

Program size

    text	   data	    bss	    dec	    hex	filename
    52380	   1884	   1528	  55792	   d9f0	main.elf
  • O3 is similar to O2.

Modified the original Assembly on the rv32emu

First,I need to rewrite the code to fit the format of rv32emu.Because integer output is not supported, I attempted to modify the system call to output the cycle count.

In this code, I have added
.set SYSPRINTFCYCLE, 32

And added it to syscall.c

static void syscall_printfcycle(riscv_t *rv)
{
    uint32_t buffer = rv_get_reg(rv, rv_reg_a3);

    fprintf(stdout, "Cycle count: %d\n", buffer);

    rv_set_reg(rv, rv_reg_a3, 0);
}
an@an-Standard-PC-i440FX-PIIX-1996:~/Desktop/Hw2_asm$ ~/rv32emu/build/rv32emu main.elf 
Cycle count: 348
inferior exit code 0

Assembly optimization

Later, I attempted to make changes to the overflow handling to reduce the cycle count. Eventually, I found that the overflow check here was unnecessary for this program and proceeded to remove it.

I made reductions to these aspects:

blt   t1, t3, 16    # if underflow then jump

beq   zero, zero, 12
addi  t0, t0, -1    # underflow at lower bits
sub   t0, t0, t2        #[t0, t1] => x -= ((x>>1) & 0x5555555555555555)

or t4, t1, zero
xor t4, s4, t4    # nor t5, t1, zero    (t4 = ~(s4 | zero))
bgeu t4, t3, 8    # if no overflow then jump
addi t0, t0, 1    # if overflow upper bits plus 1
    
or t4, t1, zero
xor t4, s4, t4    # nor t5, t1, zero    (t4 = ~(s4 | zero))
bgeu t4, t3, 8    # if no overflow then jump
addi t0, t0, 1    # if overflow upper bits plus 1
    
or      t4, t1, zero
xor     t4, s4, t4
bgeu    t4, t3, 8
addi    t0, t0, 1

or      t4, t1, zero
xor     t4, s4, t4
bgeu    t4, t3, 8
addi    t0, t0, 1
    
or      t4, t1, zero
xor     t4, s4, t4
bgeu    t4, t3, 8
addi    t0, t0, 1

This is the cycle count obtained after the reductions

$ ~/rv32emu/build/rv32emu main.elf 
Cycle count: 314
inferior exit code 0

Next, I want to try reducing the text size.

  • Before
# x |= (x>>1);
    srli  t3, t1, 1    # shift lower bits of test data right with 1 bit
    slli  t2, t0, 31   # shift upper bits of test data left with 31 bits 
    or    t3, t2, t3   # combine to get new lower bits of test data
    srli  t2, t0, 1    # shift upper bound of test data right with 1 bit
    or    t0, t0, t2   # [0~31]x | [0~31](x >> 1)
    or    t1, t1, t3   # [32~63]x | [32~63](x >> 1)
    # x |= (x>>2);
    srli  t3, t1, 2  
    slli  t2, t0, 30  
    or    t3, t2, t3  
    srli  t2, t0, 2   
    or    t0, t0, t2  
    or    t1, t1, t3 
    # x |= (x>>4);
    srli  t3, t1, 4  
    slli  t2, t0, 28  
    or    t3, t2, t3  
    srli  t2, t0, 4   
    or    t0, t0, t2  
    or    t1, t1, t3  
    # x |= (x>>8);
    srli  t3, t1, 8  
    slli  t2, t0, 24  
    or    t3, t2, t3  
    srli  t2, t0, 8   
    or    t0, t0, t2  
    or    t1, t1, t3
    # x |= (x>>16);
    srli  t3, t1, 16  
    slli  t2, t0, 16  
    or    t3, t2, t3  
    srli  t2, t0, 16   
    or    t0, t0, t2  
    or    t1, t1, t3
  • After
    li    a5, 1
    li    a6, 32
loop:
    sub   a7, a6, a5
    srl   t3, t1, a5    # shift lower bits of test data right with n bit
    sll  t2, t0, a7   # shift upper bits of test data left with 31-n bits 
    or    t3, t2, t3   # combine to get new lower bits of test data
    srl   t2, t0, a5    # shift upper bound of test data right with n bit
    or    t0, t0, t2   # [0~31]x | [0~31](x >> n)
    or    t1, t1, t3   # [32~63]x | [32~63](x >> n)
    slli  a5, a5, 1
    beq   a5, a6, CE 
    j     loop

Similarly, we execute it in the terminal to obtain the result.

    text	data	 bss	  dec	     hex filename
     640          0       0      640       280  main.elf

Compare to the gcc compile

analyze

target cycle size
O0 8918 52590
O1 6796 51812
O2 6788 51834
O3 6883 52380
assembly 314 640

You shall explain why C implementation is significantly slower.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
jserv