# Assignment2: GNU Toolchain contributed by < [`RayChen`](https://github.com/padaray) > ## Selected Question I chose the `Find Leftmost 0-byte using CLZ` as my assignment from [陳川曜](https://hackmd.io/@cychen/computer_architecture_hw1). The motivation is because I'm interested in his project, especially since my HW1 also involved the application of CLZ. I didn't realize there could be this kind of application, and I'd like to gain a deeper understanding of his code. I want to explore if there are ways to enhance its efficiency and contribute to his project. ## Analysis Origin Code ### Checking cycle counts from C code #### 1. C code ```c #include <inttypes.h> #include <stdint.h> #include <stdio.h> // count cycle typedef uint64_t ticks; static inline ticks getticks(void) { uint64_t result; uint32_t l, h, h2; asm volatile( "rdcycleh %0\n" "rdcycle %1\n" "rdcycleh %2\n" "sub %0, %0, %2\n" "seqz %0, %0\n" "sub %0, zero, %0\n" "and %1, %1, %0\n" : "=r"(h), "=r"(l), "=r"(h2)); result = (((uint64_t) h) << 32) | ((uint64_t) l); return result; } uint16_t count_leading_zeros(uint64_t x) { x |= (x >> 1); x |= (x >> 2); x |= (x >> 4); x |= (x >> 8); x |= (x >> 16); x |= (x >> 32); /* count ones (population count) */ x -= ((x >> 1) & 0x5555555555555555); x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333); x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f; x += (x >> 8); x += (x >> 16); x += (x >> 32); return (64 - (x & 0x7f)); } uint16_t zbytel(uint64_t x) { uint64_t y; uint16_t n; y = (x & 0x7F7F7F7F7F7F7F7F)+ 0x7F7F7F7F7F7F7F7F; // convert each 0-byte to 0x80 y = ~(y | x |0x7F7F7F7F7F7F7F7F); //and each nonzero byte to 0x00 n = count_leading_zeros(y) >> 3 ; // use number of leading zeros return n; // n = 0 ... 8 , 8 if x has no 0-byte. } int main(int argc, char *argv[ ] ){ ticks t0 = getticks(); uint64_t a = 0x1122334455007700; //In this example, uint16_t zbla = zbytel(a); //the value is 5. uint64_t b = 0x1122334455667788; //Another example, uint16_t zblb = zbytel(b); //the value is 8 ticks t1 = getticks(); printf("elapsed cycle: %d\n" , t1 - t0); //cycle number printf("%llx\n",a); printf("%d\n",zbla); printf("%llx\n",b); printf("%d\n",zblb); return 0; } ``` #### 2. Makefile for C code Using the gcc command converting .c file -> .S file -> .o file -> .elf file. ```c .PHONY: clean include ../../../mk/toolchain.mk CFLAGS = -march=rv32i -mabi=ilp32 ASFLAGS = -march=rv32i -mabi=ilp32 LDFLAGS = --oformat=elf32-littleriscv %.S: %.c $(CROSS_COMPILE)gcc $(CFLAGS) -o $@ -S $< %.o: %.S $(CROSS_COMPILE)as $(ASFLAGS) -o $@ $< all: hugohw.elf hugohw.S: hugohw.c $(CROSS_COMPILE)gcc $(CFLAGS) -o $@ -S $< hugohw.elf: hugohw.o $(CROSS_COMPILE)gcc -o $@ $< clean: $(RM) hugohw.elf hugohw.o hugohw.S ``` #### 3. Output of cycle The red underlined part represents the printed cycle count. <s> ![](https://hackmd.io/_uploads/rk86kzFGp.png) </s> :::warning :warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text. :notes: jserv ::: #### 4. Problem encountered during the process <s> ![](https://hackmd.io/_uploads/B11TlfKM6.png) </s> :::warning :warning: Don't put the screenshots which contain plain text only. Instead, utilize HackMD syntax to annotate the text. :notes: jserv ::: Solution: change the uppercase 'S' to lowercase in the makefile. ![](https://hackmd.io/_uploads/HJ7F-GtfT.png) ### Using optimization to optimize cycle count #### 1. Add -On command(n represent 0~3 or fast) ![](https://hackmd.io/_uploads/HkNR4MFfa.png) #### 2. Print out cycle count | Optimization level | Cycles count | | ----------------- | ------------ | | -O0 | 741 | | -O1 | 204 | | -O2 | 204 | | -O3 | 0 | | -Ofast | 0 | Analysis: I think the cycle count is 0 because the optimizer considers the part where I calculate the cycle count as non-essential code, so it optimizes it out. ### Checking cycle counts from Assembly code #### 1. Using get_cycles function ```c .text .globl get_cycles .align 2 get_cycles: csrr a1, cycleh csrr a0, cycle csrr a2, cycleh bne a1, a2, get_cycles ret .size get_cycles,.-get_cycles ``` #### 2. Makefile for Assembly code Using the assembler command converting .S file -> .o file Using the linker command converting .o file -> .elf file. ```c .PHONY: clean include ../../../mk/toolchain.mk CFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32 ASFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32 LDFLAGS = --oformat=elf32-littleriscv %.o: %.S $(CROSS_COMPILE)as $(ASFLAGS) -o $@ $< all: hugohw.elf hugohw.elf: hugohw.o $(CROSS_COMPILE)ld -o $@ -T hugohw.ld $(LDFLAGS) $< clean: $(RM) hugohw.elf hugohw.o hugohw.S ``` #### 3. rv32emu print out decimal Although the code mentioned above can print the cycle count, but the rv32emu doesn't support the printf function. Therefore, we need to first store the cycle count's individual digits and convert them into ASCII codes to display it. The code is display bellow: ```c # a0: integer of cycle # a1: number of bytes in buffer print_ascii: mv t0, a0 # load integer li t1, 0 # t1 = quotient li t2, 0 # t2 = reminder li t3, 10 # t3 = divisor mv t4, a1 # t4 = count round check_less_then_ten: bge t0, t3, divide mv t2, t0 mv t0, t1 # t0 = quotient j to_ascii divide: sub t0, t0, t3 addi t1, t1, 1 j check_less_then_ten to_ascii: addi t2, t2, 48 # reminder to ascii la t5, buffer # t5 = buffer addr addi t4, t4, -1 add t5, t5, t4 sb t2, 0(t5) # counter = 0 exit beqz t4, convert_loop_done li t1, 0 # refresh quotient j check_less_then_ten convert_loop_done: ret ``` #### 4. Output of cycle ![](https://hackmd.io/_uploads/SyWpfHYf6.png) #### 5. Problem encountered during the process &emsp; **Problem 1:** &emsp;&emsp; The error said that I redefined **`_start`** argument &emsp;&emsp; Reason : No linking file specified for the linker. ![](https://hackmd.io/_uploads/B12bqmKMT.png) &emsp; **Solution 1:** &emsp;&emsp; Add -T hugohw.ld in command. ![](https://hackmd.io/_uploads/H1alj7tMp.png) </br> &emsp; **Problem 2:** &emsp;&emsp; When executing the Makefile, the "csrr" instruction cannot be executed. &emsp;&emsp; Reason : Incorrect compiler architecture ![](https://hackmd.io/_uploads/S1q4yVYfT.png) &emsp; **Solution 2:** ![](https://hackmd.io/_uploads/ryC4J4FzT.png) </br> ### Rewrite and Optimize Assembly code