Try   HackMD

Assignment2: GNU Toolchain

contributed by < RayChen >

Selected Question

I chose the Find Leftmost 0-byte using CLZ as my assignment from ι™³ε·ζ›œ.

The motivation is because I'm interested in his project, especially since my HW1 also involved the application of CLZ. I didn't realize there could be this kind of application, and I'd like to gain a deeper understanding of his code. I want to explore if there are ways to enhance its efficiency and contribute to his project.

Analysis Origin Code

Checking cycle counts from C code

1. C code

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// count cycle
typedef uint64_t ticks;
static inline ticks getticks(void)

{   uint64_t result;
    uint32_t l, h, h2;
    asm volatile(
        "rdcycleh %0\n"
        "rdcycle %1\n"
        "rdcycleh %2\n"
        "sub %0, %0, %2\n"
        "seqz %0, %0\n"
        "sub %0, zero, %0\n"
        "and %1, %1, %0\n"
        : "=r"(h), "=r"(l), "=r"(h2));
    result = (((uint64_t) h) << 32) | ((uint64_t) l);
    return result;
}

uint16_t count_leading_zeros(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & 0x5555555555555555);
    x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);
    x += (x >> 32);
    return (64 - (x & 0x7f));
}

uint16_t zbytel(uint64_t x) {
    uint64_t y;
    uint16_t n;
    y = (x & 0x7F7F7F7F7F7F7F7F)+ 0x7F7F7F7F7F7F7F7F;   // convert each 0-byte to 0x80
    y = ~(y | x |0x7F7F7F7F7F7F7F7F);                  //and each nonzero byte to 0x00
    n = count_leading_zeros(y) >> 3  ;  // use number of leading zeros
    return n;                          // n = 0 ... 8 , 8 if x has no 0-byte.
}

int main(int argc, char *argv[ ] ){

    ticks t0 = getticks();
    uint64_t a = 0x1122334455007700; //In this example,
    uint16_t zbla = zbytel(a);       //the value is 5.
    uint64_t b = 0x1122334455667788; //Another example,
    uint16_t zblb = zbytel(b);       //the value is 8
    ticks t1 = getticks();
    printf("elapsed cycle: %d\n" , t1 - t0);    //cycle number

    printf("%llx\n",a);
    printf("%d\n",zbla);
    printf("%llx\n",b);
    printf("%d\n",zblb);
    return 0;

}

2. Makefile for C code

Using the gcc command converting .c file -> .S file -> .o file -> .elf file.

.PHONY: clean

include ../../../mk/toolchain.mk

CFLAGS = -march=rv32i -mabi=ilp32
ASFLAGS = -march=rv32i -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

%.S: %.c
	$(CROSS_COMPILE)gcc $(CFLAGS) -o $@ -S $<

%.o: %.S
	$(CROSS_COMPILE)as $(ASFLAGS) -o $@ $<

all: hugohw.elf

hugohw.S: hugohw.c
	$(CROSS_COMPILE)gcc $(CFLAGS) -o $@ -S $<

hugohw.elf: hugohw.o
	 $(CROSS_COMPILE)gcc -o $@ $<

clean:
	$(RM) hugohw.elf hugohw.o hugohw.S

3. Output of cycle

The red underlined part represents the printed cycle count.

![](https://hackmd.io/_uploads/rk86kzFGp.png)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’
Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’
jserv

4. Problem encountered during the process

![](https://hackmd.io/_uploads/B11TlfKM6.png)

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’
Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text.
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More β†’
jserv

Solution: change the uppercase 'S' to lowercase in the makefile.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More β†’

Using optimization to optimize cycle count

1. Add -On command(n represent 0~3 or fast)

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More β†’

2. Print out cycle count

Optimization level Cycles count
-O0 741
-O1 204
-O2 204
-O3 0
-Ofast 0

Analysis: I think the cycle count is 0 because the optimizer considers the part where I calculate the cycle count as non-essential code, so it optimizes it out.

Checking cycle counts from Assembly code

1. Using get_cycles function

.text
.globl get_cycles
.align 2
get_cycles:
    csrr a1, cycleh
    csrr a0, cycle
    csrr a2, cycleh
    bne a1, a2, get_cycles
    ret

.size get_cycles,.-get_cycles

2. Makefile for Assembly code

Using the assembler command converting .S file -> .o file
Using the linker command converting .o file -> .elf file.

.PHONY: clean

include ../../../mk/toolchain.mk

CFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32
ASFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

%.o: %.S
	$(CROSS_COMPILE)as $(ASFLAGS) -o $@ $<

all: hugohw.elf 

hugohw.elf: hugohw.o
	 $(CROSS_COMPILE)ld -o $@ -T hugohw.ld $(LDFLAGS) $<

clean:
	$(RM) hugohw.elf hugohw.o hugohw.S

3. rv32emu print out decimal

Although the code mentioned above can print the cycle count, but the rv32emu doesn't support the printf function. Therefore, we need to first store the cycle count's individual digits and convert them into ASCII codes to display it.
The code is display bellow:

# a0: integer of cycle
# a1: number of bytes in buffer

print_ascii:
    mv t0, a0     # load integer
    li t1, 0      # t1 = quotient
    li t2, 0      # t2 = reminder
    li t3, 10     # t3 = divisor
    mv t4, a1     # t4 = count round

check_less_then_ten:
    bge t0, t3, divide
    mv t2, t0
    mv t0, t1    # t0 = quotient
    j to_ascii

divide:
    sub t0, t0, t3
    addi t1, t1, 1
    j check_less_then_ten

to_ascii:
    addi t2, t2, 48	# reminder to ascii
    la t5, buffer  # t5 = buffer addr
    addi t4, t4, -1
    add t5, t5, t4
    sb t2, 0(t5)

    

    # counter = 0 exit
    beqz t4, convert_loop_done
    li t1, 0 # refresh quotient
    j check_less_then_ten

convert_loop_done:
    ret

4. Output of cycle

5. Problem encountered during the process

  Problem 1:
   The error said that I redefined _start argument
   Reason : No linking file specified for the linker.


  Solution 1:
   Add -T hugohw.ld in command.



  Problem 2:
   When executing the Makefile, the "csrr" instruction cannot be executed.
   Reason : Incorrect compiler architecture

  Solution 2:



Rewrite and Optimize Assembly code